MedianPolishNormalizer ====================== The ``MedianPolishNormalizer`` implements Tukey's Median Polish algorithm to iteratively remove row and column medians from a data matrix. This method decomposes the data into overall, row, column, and residual effects, effectively removing systematic biases from both samples (rows) and features (columns). Overview -------- Median Polish is a robust exploratory data analysis technique that decomposes a two-way table into additive components: **Data = Overall + Row Effect + Column Effect + Residual** The algorithm works by iteratively: 1. **Removing row medians**: Subtracting the median of each row from all values in that row 2. **Removing column medians**: Subtracting the median of each column from all values in that column 3. **Updating overall effect**: Tracking the cumulative median adjustments 4. **Repeating until convergence**: Continuing until changes become negligible This approach is particularly effective for: - Removing systematic biases affecting entire samples or features - Exploratory analysis of two-way structured data - Preprocessing for downstream analyses that assume additive effects - Microarray and proteomics data where both sample and feature effects are present Key Features ------------ - **Dual bias removal**: Corrects for both row (sample) and column (feature) effects simultaneously - **Robust method**: Uses medians instead of means, making it resistant to outliers - **Additive decomposition**: Provides interpretable components (overall, row, column, residual) - **Iterative convergence**: Continues until stable solution is reached - **Log-space option**: Can automatically log-transform data for multiplicative effects Algorithm Details ----------------- The Median Polish algorithm iteratively removes medians until convergence: 1. **Initialize**: Start with the original data matrix 2. **Row sweep**: For each row, subtract its median from all values 3. **Column sweep**: For each column, subtract its median from all values 4. **Update overall**: Add the median of medians to the overall effect 5. **Check convergence**: Repeat steps 2-4 until changes are below threshold 6. **Return residuals**: Final result is overall + residuals **Mathematical representation**: After convergence: X[i,j] = Overall + Row[i] + Column[j] + Residual[i,j] The normalized output typically returns: Overall + Residual[i,j] Parameters ---------- .. autoclass:: pronoms.normalizers.MedianPolishNormalizer :members: :undoc-members: :show-inheritance: Usage Example ------------- Basic median polish normalization: .. code-block:: python import numpy as np from pronoms.normalizers import MedianPolishNormalizer # Create sample data with row and column effects np.random.seed(42) base_data = np.random.normal(100, 10, (4, 5)) # Add systematic row effects (sample biases) row_effects = np.array([0, 20, -10, 15]).reshape(-1, 1) # Add systematic column effects (feature biases) col_effects = np.array([0, 50, -20, 30, 10]) # Combine effects data = base_data + row_effects + col_effects # Create and apply normalizer normalizer = MedianPolishNormalizer(log_transform=False, max_iter=10) normalized_data = normalizer.normalize(data) print("Original data:") print(data) print("\nNormalized data (residuals + overall):") print(normalized_data) # Examine the decomposition print(f"\nOverall effect: {normalizer.overall_:.2f}") print(f"Row effects: {normalizer.row_effects_}") print(f"Column effects: {normalizer.col_effects_}") With log transformation: .. code-block:: python # For multiplicative effects, use log transformation normalizer_log = MedianPolishNormalizer(log_transform=True) normalized_log = normalizer_log.normalize(data) print("Log-transformed normalization:") print(normalized_log) Visualization: .. code-block:: python # Visualize the normalization effect fig = normalizer.plot_comparison(data, normalized_data) fig.show() When to Use ----------- MedianPolishNormalizer is particularly useful when: - **Two-way effects present**: Both sample (row) and feature (column) biases exist - **Exploratory analysis**: Understanding the structure of systematic effects in data - **Microarray data**: Classic application for gene expression data - **Proteomics preprocessing**: When both sample preparation and protein-specific effects are present - **Robust normalization needed**: When outliers might affect mean-based methods Considerations -------------- - **Additive assumption**: Assumes effects are additive (or multiplicative if log-transformed) - **Convergence**: May require multiple iterations to reach stable solution - **Interpretation**: Results are residuals plus overall effect, not original scale - **Missing values**: Algorithm may not handle missing data well - **Small datasets**: May be unstable with very small sample or feature numbers See Also -------- - :doc:`median_normalizer`: For simple median-based scaling without two-way decomposition - :doc:`mad_normalizer`: For robust normalization using median absolute deviation - :doc:`quantile_normalizer`: For making distributions identical across samples - :doc:`vsn_normalizer`: For variance-stabilizing normalization