L1Normalizer

The L1Normalizer adjusts each sample so that its sum of absolute values (L1 norm) equals 1. This scaling method removes differences in total signal between samples, making them directly comparable regardless of overall intensity while preserving the relative proportions of features within each sample.

Overview

L1 normalization scales each sample by dividing all of its values by the sum of their absolute values. When all values are non-negative, each normalized value can be read as that feature's proportion of the sample's total signal. The method is particularly useful when:

  • Total signal varies between samples due to technical factors

  • You want to focus on relative feature proportions rather than absolute intensities

  • Samples need to contribute equally to downstream analyses

  • You are working with compositional data or are interested in relative abundances

This approach is commonly used in:

  • Proteomics for correcting sample loading differences

  • Genomics for normalizing library size effects

  • Machine learning for feature scaling

  • Any analysis where relative proportions are more important than absolute values

Key Features

  • Unit norm: Every sample with a nonzero total has an L1 norm (sum of absolute values) of 1 after normalization

  • Proportion preservation: Relative relationships between features within samples are maintained

  • Simple and fast: Computationally efficient with no parameters to tune

  • Interpretable: Normalized values represent proportions of the total signal
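The unit-norm and proportion-preservation properties can be checked numerically. The following is a minimal NumPy sketch that applies the plain divide-by-sum-of-absolute-values definition directly; it does not call pronoms, and the variable names are illustrative only.

```python
import numpy as np

# Three samples with identical proportions but different totals (100, 50, 1000)
data = np.array([
    [10, 20, 30, 40],
    [5, 10, 15, 20],
    [100, 200, 300, 400],
], dtype=float)

# Divide each row by the sum of its absolute values
normalized = data / np.abs(data).sum(axis=1, keepdims=True)

# Unit norm: the absolute values in every row sum to 1
row_norms = np.abs(normalized).sum(axis=1)

# Proportion preservation: within-row feature ratios are unchanged
ratios_before = data[:, 1] / data[:, 0]
ratios_after = normalized[:, 1] / normalized[:, 0]

print(np.allclose(row_norms, 1.0))                # True
print(np.allclose(ratios_before, ratios_after))   # True
```

Because the three rows share the same proportions, they map to the same normalized vector even though their totals differ by a factor of up to 20.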

Algorithm Details

For a data matrix X with shape (n_samples, n_features):

  1. Calculate L1 norm: For each sample i, compute L1_norm_i = Σ_j |X[i, j]|, summing over all features j

  2. Scale sample: X_normalized[i, :] = X[i, :] / L1_norm_i

Mathematical representation:

\[X_{\text{normalized}}[i,j] = \frac{X[i,j]}{\sum_{k=1}^{n_{\text{features}}} |X[i,k]|}\]

Example: For sample [10, 20, 30, 40]:

  • L1 norm = |10| + |20| + |30| + |40| = 100

  • Normalized = [0.1, 0.2, 0.3, 0.4]
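The two steps can be sketched in a few lines of NumPy. This is an illustration of the formula, not the pronoms implementation: the helper name l1_normalize_rows is hypothetical, and the sketch assumes no sample is entirely zero.

```python
import numpy as np

def l1_normalize_rows(X):
    """Divide each row of X by the sum of its absolute values (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Step 1: L1 norm of each sample (row); keepdims gives shape (n_samples, 1)
    l1_norms = np.abs(X).sum(axis=1, keepdims=True)
    # Step 2: scale each row by its own norm (assumes no all-zero rows)
    return X / l1_norms

result = l1_normalize_rows([[10, 20, 30, 40]])
print(result)  # [[0.1 0.2 0.3 0.4]]
```

Keeping the norms as a column vector lets broadcasting divide every row by its own norm without an explicit loop.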

Parameters

class pronoms.normalizers.L1Normalizer

Bases: object

Normalizer that scales each sample to have an L1 norm of 1.

L1 normalization divides each value in a sample by the sum of absolute values in that sample. This is also known as “sum normalization” in proteomics.

scaling_factors

Scaling factors used for normalization (L1 norm of each sample). Only available after calling normalize().

Type:

Optional[np.ndarray]

mean_of_scaling_factors

Mean of scaling factors used to preserve original scale. Only available after calling normalize().

Type:

Optional[float]

normalize(X: ndarray) → ndarray

Perform L1 normalization on input data X.

Parameters:

X (np.ndarray) – Input data matrix with shape (n_samples, n_features). Each row represents a sample, each column represents a feature/protein.

Returns:

Normalized data matrix with the same shape as X.

Return type:

np.ndarray

Raises:

ValueError – If input data contains NaN or Inf values.
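Callers who prefer to detect non-finite values before calling normalize() could run a pre-check along these lines. This is a sketch only: the helper check_finite and its error message are assumptions for illustration and are not part of pronoms.

```python
import numpy as np

def check_finite(X):
    """Raise ValueError if X contains NaN or Inf (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # np.isfinite is False for NaN, +Inf and -Inf
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or Inf values.")
    return X

# Finite input passes through unchanged
clean = check_finite([[1.0, 2.0], [3.0, 4.0]])

# Non-finite input is rejected
try:
    check_finite([[1.0, np.nan], [3.0, np.inf]])
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```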

plot_comparison(before_data: ndarray, after_data: ndarray, figsize: tuple[int, int] = (10, 8), title: str = 'L1 Normalization Comparison') → Figure

Plot data before vs after normalization using a 2D hexbin density plot.

Parameters:
  • before_data (np.ndarray) – Data before normalization, shape (n_samples, n_features).

  • after_data (np.ndarray) – Data after normalization, shape (n_samples, n_features).

  • figsize (Tuple[int, int], optional) – Figure size, by default (10, 8).

  • title (str, optional) – Plot title, by default “L1 Normalization Comparison”.

Returns:

Figure object containing the hexbin density plot.

Return type:

plt.Figure

Usage Example

Basic L1 normalization:

import numpy as np
from pronoms.normalizers import L1Normalizer

# Create sample data with different total intensities
data = np.array([
    [10, 20, 30, 40],    # Sample 1: total = 100
    [5, 10, 15, 20],     # Sample 2: total = 50
    [100, 200, 300, 400] # Sample 3: total = 1000
])

# Create and apply normalizer
normalizer = L1Normalizer()
normalized_data = normalizer.normalize(data)

print("Original data:")
print(data)
print("\nNormalized data:")
print(normalized_data)

# Verify L1 norms are 1
print("\nL1 norms after normalization:")
for i, sample in enumerate(normalized_data):
    l1_norm = np.sum(np.abs(sample))
    print(f"Sample {i+1}: {l1_norm:.6f}")

Visualization:

# Visualize the normalization effect
fig = normalizer.plot_comparison(data, normalized_data)
fig.show()

When to Use

L1Normalizer is particularly useful when:

  • Sample loading varies: Different amounts of total protein/material across samples

  • Library size effects: In genomics, when sequencing depth varies between samples

  • Compositional analysis: When interested in relative proportions rather than absolute amounts

  • Equal contribution needed: When samples should contribute equally to downstream analyses

  • Sparse data: Works well with data containing many zeros

Considerations

  • Scale dependency: Results depend on the absolute scale of the original data

  • Zero handling: All-zero rows are returned as all zeros, because the divisor is guarded against division by zero. scaling_factors still reports the true L1 norm (0 for these rows), and mean_of_scaling_factors is the mean of those true norms, so it is not biased by the guard

  • Outlier sensitivity: Large outliers can dominate the L1 norm and compress other values

  • Information loss: Absolute magnitude information is lost; only proportions are preserved

  • Not suitable for negative-dominant data: Less meaningful when most values are negative
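The zero-handling and outlier points can be illustrated with plain NumPy. The sketch below mirrors the behavior described above but is not the pronoms code: replacing a zero norm with 1 in the divisor is an assumed way of guarding the division, and all variable names are illustrative.

```python
import numpy as np

X = np.array([
    [1.0, 1.0, 2.0],
    [0.0, 0.0, 0.0],   # all-zero sample
])

# True L1 norms, reported as-is: 4 for the first row, 0 for the second
norms = np.abs(X).sum(axis=1)
# Assumed guard: divide all-zero rows by 1 so they stay zero
safe_divisor = np.where(norms == 0, 1.0, norms)
normalized = X / safe_divisor[:, np.newaxis]
# Mean taken over the true norms, not the guarded divisor
mean_of_norms = norms.mean()

# Outlier sensitivity: one large value compresses the others
typical = np.array([1.0, 1.0, 1.0, 1.0])
with_outlier = np.array([1.0, 1.0, 1.0, 97.0])
typical_normalized = typical / np.abs(typical).sum()
outlier_normalized = with_outlier / np.abs(with_outlier).sum()

print(typical_normalized)   # [0.25 0.25 0.25 0.25]
print(outlier_normalized)   # [0.01 0.01 0.01 0.97]
```

In the outlier case the three unchanged features drop from 0.25 to 0.01 each, purely because the fourth feature inflates the norm.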

See Also