L1Normalizer
The L1Normalizer adjusts each sample so that its sum of absolute values (L1 norm) equals 1. This scaling method removes differences in total signal between samples, making them directly comparable regardless of overall intensity while preserving the relative proportions of features within each sample.
Overview
L1 normalization scales each sample by dividing all values by the sum of their absolute values. This creates a probabilistic interpretation where each feature represents a proportion of the total signal. The method is particularly useful when:
Total signal varies between samples due to technical factors
You want to focus on relative feature proportions rather than absolute intensities
Samples need to contribute equally to downstream analyses
Working with compositional data or when interested in relative abundances
This approach is commonly used in:
Proteomics for correcting sample loading differences
Genomics for normalizing library size effects
Machine learning for feature scaling
Any analysis where relative proportions are more important than absolute values
Key Features
Unit norm: All samples have exactly the same L1 norm (sum of absolute values = 1)
Proportion preservation: Relative relationships between features within samples are maintained
Simple and fast: Computationally efficient with no parameters to tune
Interpretable: Normalized values represent proportions of the total signal
Algorithm Details
For a data matrix X with shape (n_samples, n_features):
Calculate L1 norm: For each sample i, compute L1_norm_i = Σ|X[i, j]|
Scale sample: X_normalized[i, :] = X[i, :] / L1_norm_i
Mathematical representation:
Example: For sample [10, 20, 30, 40]: - L1 norm = |10| + |20| + |30| + |40| = 100 - Normalized = [0.1, 0.2, 0.3, 0.4]
Parameters
- class pronoms.normalizers.L1Normalizer[source]
Bases:
objectNormalizer that scales each sample to have an L1 norm of 1.
L1 normalization divides each value in a sample by the sum of absolute values in that sample. This is also known as “sum normalization” in proteomics.
- scaling_factors
Scaling factors used for normalization (L1 norm of each sample). Only available after calling normalize().
- Type:
Optional[np.ndarray]
- mean_of_scaling_factors
Mean of scaling factors used to preserve original scale. Only available after calling normalize().
- Type:
Optional[float]
- normalize(X: ndarray) ndarray[source]
Perform L1 normalization on input data X.
- Parameters:
X (np.ndarray) – Input data matrix with shape (n_samples, n_features). Each row represents a sample, each column represents a feature/protein.
- Returns:
Normalized data matrix with the same shape as X.
- Return type:
np.ndarray
- Raises:
ValueError – If input data contains NaN or Inf values.
- plot_comparison(before_data: ndarray, after_data: ndarray, figsize: tuple[int, int] = (10, 8), title: str = 'L1 Normalization Comparison') Figure[source]
Plot data before vs after normalization using a 2D hexbin density plot.
- Parameters:
before_data (np.ndarray) – Data before normalization, shape (n_samples, n_features).
after_data (np.ndarray) – Data after normalization, shape (n_samples, n_features).
figsize (Tuple[int, int], optional) – Figure size, by default (10, 8).
title (str, optional) – Plot title, by default “L1 Normalization Comparison”.
- Returns:
Figure object containing the hexbin density plot.
- Return type:
plt.Figure
Usage Example
Basic L1 normalization:
import numpy as np
from pronoms.normalizers import L1Normalizer
# Create sample data with different total intensities
data = np.array([
[10, 20, 30, 40], # Sample 1: total = 100
[5, 10, 15, 20], # Sample 2: total = 50
[100, 200, 300, 400] # Sample 3: total = 1000
])
# Create and apply normalizer
normalizer = L1Normalizer()
normalized_data = normalizer.normalize(data)
print("Original data:")
print(data)
print("\nNormalized data:")
print(normalized_data)
# Verify L1 norms are 1
print("\nL1 norms after normalization:")
for i, sample in enumerate(normalized_data):
l1_norm = np.sum(np.abs(sample))
print(f"Sample {i+1}: {l1_norm:.6f}")
Visualization:
# Visualize the normalization effect
fig = normalizer.plot_comparison(data, normalized_data)
fig.show()
When to Use
L1Normalizer is particularly useful when:
Sample loading varies: Different amounts of total protein/material across samples
Library size effects: In genomics, when sequencing depth varies between samples
Compositional analysis: When interested in relative proportions rather than absolute amounts
Equal contribution needed: When samples should contribute equally to downstream analyses
Sparse data: Works well with data containing many zeros
Considerations
Scale dependency: Results depend on the absolute scale of the original data
Zero handling: All-zero rows are passed through as zero (the divisor is guarded against division by zero);
scaling_factorsreports the true L1 norm (0for these rows), andmean_of_scaling_factorsis the unbiased mean of the true normsOutlier sensitivity: Large outliers can dominate the L1 norm and compress other values
Information loss: Absolute magnitude information is lost, only proportions are preserved
Not suitable for negative-dominant data: Less meaningful when most values are negative
See Also
MedianNormalizer: For scaling-based normalization that preserves overall scale
QuantileNormalizer: For making distributions identical across samples
MADNormalizer: For robust normalization using median absolute deviation
VSNNormalizer: For variance-stabilizing normalization