L1Normalizer

The L1Normalizer adjusts each sample so that its sum of absolute values (L1 norm) equals 1. This scaling method removes differences in total signal between samples, making them directly comparable regardless of overall intensity while preserving the relative proportions of features within each sample.

Overview

L1 normalization scales each sample by dividing all values by the sum of their absolute values. For non-negative data, each normalized value can then be read as a proportion of the sample's total signal. The method is particularly useful when:

Total signal varies between samples due to technical factors
You want to focus on relative feature proportions rather than absolute intensities
Samples need to contribute equally to downstream analyses
You are working with compositional data or are interested in relative abundances

This approach is commonly used in:

Proteomics for correcting sample loading differences
Genomics for normalizing library size effects
Machine learning for feature scaling
Any analysis where relative proportions are more important than absolute values

Key Features

Unit norm: After normalization, every sample with at least one non-zero value has an L1 norm (sum of absolute values) of 1
Proportion preservation: Relative relationships between features within samples are maintained
Simple and fast: Computationally efficient with no parameters to tune
Interpretable: Normalized values represent proportions of the total signal
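
The unit-norm and proportion-preservation properties can be checked directly with NumPy. The sketch below is not the pronoms implementation: `l1_scale` is a hypothetical helper written only for illustration, and it assumes a non-zero input row.

```python
import numpy as np

def l1_scale(row):
    """Hypothetical helper: divide a 1-D sample by its L1 norm (assumes a non-zero row)."""
    row = np.asarray(row, dtype=float)
    return row / np.abs(row).sum()

sample = np.array([10.0, 20.0, 30.0, 40.0])
scaled = l1_scale(sample)

# Ratios between features within a sample are unchanged by the scaling.
ratio_before = sample[1] / sample[0]
ratio_after = scaled[1] / scaled[0]

# The same sample measured at 10x the intensity normalizes to the same proportions.
scaled_10x = l1_scale(10 * sample)

print(np.abs(scaled).sum())          # L1 norm after scaling
print(ratio_before, ratio_after)     # within-sample ratio before and after
print(np.allclose(scaled, scaled_10x))
```

Because the divisor is a single number per sample, every feature in that sample is multiplied by the same factor, which is why within-sample ratios survive while differences in total intensity do not.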

Algorithm Details

For a data matrix X with shape (n_samples, n_features):

Calculate L1 norm: For each sample i, compute L1_norm_i = Σ|X[i, j]|
Scale sample: X_normalized[i, :] = X[i, :] / L1_norm_i

Mathematical representation:

\[X_{\text{normalized}}[i,j] = \frac{X[i,j]}{\sum_{k=1}^{n_{\text{features}}} |X[i,k]|}\]

Example: For sample [10, 20, 30, 40]:

L1 norm = |10| + |20| + |30| + |40| = 100
Normalized = [0.1, 0.2, 0.3, 0.4]
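
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula, not the pronoms source code; the function name `l1_normalize` and the choice to divide all-zero rows by 1 are assumptions made for the example.

```python
import numpy as np

def l1_normalize(X):
    """Sketch: scale each row of X by its L1 norm (illustrative, not the pronoms source)."""
    X = np.asarray(X, dtype=float)
    # Step 1: L1 norm of each sample (row), kept as a column so it broadcasts over features.
    norms = np.abs(X).sum(axis=1, keepdims=True)
    # Assumption: rows whose norm is 0 are divided by 1 so they stay all-zero.
    safe_norms = np.where(norms == 0, 1.0, norms)
    # Step 2: divide every value by the norm of its own row.
    return X / safe_norms

X = np.array([[10, 20, 30, 40],
              [5, 10, 15, 20]])
X_normalized = l1_normalize(X)
print(X_normalized)
```

Using `keepdims=True` keeps the norms with shape (n_samples, 1), so the division applies row by row without an explicit loop.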

Parameters

class pronoms.normalizers.L1Normalizer

Bases: object

Normalizer that scales each sample to have an L1 norm of 1.

L1 normalization divides each value in a sample by the sum of absolute values in that sample. This is also known as “sum normalization” in proteomics.

scaling_factors

Scaling factors used for normalization (L1 norm of each sample). Only available after calling normalize().

Type: Optional[np.ndarray]

mean_of_scaling_factors

Mean of scaling factors used to preserve original scale. Only available after calling normalize().

Type: Optional[float]
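
To make the two attribute descriptions concrete, the sketch below computes the quantities they describe (the per-sample L1 norms and their mean) with plain NumPy. It does not read the attributes from an L1Normalizer instance; the variable names `l1_norms` and `mean_l1_norm` are illustrative only.

```python
import numpy as np

data = np.array([
    [10, 20, 30, 40],
    [5, 10, 15, 20],
    [100, 200, 300, 400],
], dtype=float)

# Per-sample L1 norms: the quantity described for scaling_factors.
l1_norms = np.abs(data).sum(axis=1)   # one value per row

# Mean of those norms: the quantity described for mean_of_scaling_factors.
mean_l1_norm = l1_norms.mean()

print(l1_norms)
print(mean_l1_norm)
```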

normalize(X: ndarray) → ndarray

Perform L1 normalization on input data X.

Parameters: X (np.ndarray) – Input data matrix with shape (n_samples, n_features). Each row represents a sample, each column represents a feature/protein.
Returns: Normalized data matrix with the same shape as X.
Return type: np.ndarray
Raises: ValueError – If input data contains NaN or Inf values.
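
Since normalize() is documented to raise ValueError on NaN or Inf input, it can help to check the data beforehand. The sketch below shows one way to do that with `np.isfinite`; `check_finite` is a hypothetical helper that mirrors the documented behavior, not a function provided by pronoms, and dropping incomplete features is only one possible cleanup strategy.

```python
import numpy as np

def check_finite(X):
    """Hypothetical pre-check mirroring the documented ValueError on NaN/Inf input."""
    X = np.asarray(X, dtype=float)
    if not np.isfinite(X).all():
        raise ValueError("Input data contains NaN or Inf values.")
    return X

bad = np.array([[1.0, np.nan],
                [2.0, 3.0]])

try:
    check_finite(bad)
    rejected = False
except ValueError:
    rejected = True

# One possible cleanup (an assumption, not a recommendation from the library):
# keep only the features (columns) that are finite in every sample.
clean = bad[:, np.isfinite(bad).all(axis=0)]
print(rejected, clean.shape)
```

Whether to drop, impute, or otherwise handle missing values depends on the dataset; the point here is only that non-finite values must be dealt with before normalization.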

plot_comparison(before_data: ndarray, after_data: ndarray, figsize: tuple[int, int] = (10, 8), title: str = 'L1 Normalization Comparison') → Figure

Plot data before vs after normalization using a 2D hexbin density plot.

Parameters:

before_data (np.ndarray) – Data before normalization, shape (n_samples, n_features).
after_data (np.ndarray) – Data after normalization, shape (n_samples, n_features).
figsize (Tuple[int, int], optional) – Figure size, by default (10, 8).
title (str, optional) – Plot title, by default “L1 Normalization Comparison”.

Returns:

Figure object containing the hexbin density plot.

Return type:

plt.Figure

Usage Example

Basic L1 normalization:

import numpy as np
from pronoms.normalizers import L1Normalizer

# Create sample data with different total intensities
data = np.array([
    [10, 20, 30, 40],    # Sample 1: total = 100
    [5, 10, 15, 20],     # Sample 2: total = 50
    [100, 200, 300, 400] # Sample 3: total = 1000
])

# Create and apply normalizer
normalizer = L1Normalizer()
normalized_data = normalizer.normalize(data)

print("Original data:")
print(data)
print("\nNormalized data:")
print(normalized_data)

# Verify L1 norms are 1
print("\nL1 norms after normalization:")
for i, sample in enumerate(normalized_data):
    l1_norm = np.sum(np.abs(sample))
    print(f"Sample {i+1}: {l1_norm:.6f}")

Visualization:

# Visualize the normalization effect
fig = normalizer.plot_comparison(data, normalized_data)
fig.show()

When to Use

L1Normalizer is particularly useful when:

Sample loading varies: Different amounts of total protein/material across samples
Library size effects: In genomics, when sequencing depth varies between samples
Compositional analysis: When interested in relative proportions rather than absolute amounts
Equal contribution needed: When samples should contribute equally to downstream analyses
Sparse data: Works well with data containing many zeros

Considerations

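Transformation dependency: Multiplying a whole sample by a positive constant does not change its normalized values, but additive offsets and nonlinear transforms (such as taking logs) applied before normalization do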
Zero handling: All-zero rows are returned as all zeros because the divisor is guarded against division by zero. scaling_factors still reports the true L1 norm (0 for these rows), and mean_of_scaling_factors is computed from the true norms
Outlier sensitivity: Large outliers can dominate the L1 norm and compress other values
Information loss: Absolute magnitude information is lost; only proportions are preserved
Not suitable for negative-dominant data: Less meaningful when most values are negative
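
Two of the considerations above, outlier sensitivity and zero handling, are easy to see on a small matrix. The sketch below uses plain NumPy rather than L1Normalizer itself, and the guarded divisor is an assumption about how a division-by-zero guard could be written, not the library's code.

```python
import numpy as np

X = np.array([
    [1.0, 1.0, 1.0, 1.0],    # balanced sample
    [1.0, 1.0, 1.0, 97.0],   # same first three values plus one large outlier
    [0.0, 0.0, 0.0, 0.0],    # all-zero sample
])

# True per-sample L1 norms; the all-zero row has norm 0.
norms = np.abs(X).sum(axis=1, keepdims=True)

# Guard the divisor so the all-zero row stays zero instead of becoming NaN.
safe = np.where(norms == 0, 1.0, norms)
normalized = X / safe

print(norms.ravel())
print(normalized)
```

In the second row the outlier takes 0.97 of the total, so the other three features shrink from 0.25 each to 0.01 each even though their raw values are identical to those in the first row.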