nirs4all.operators.augmentation.scattering module
Scattering effects augmentation operators for spectral data.
This module provides wavelength-aware augmentation operators that simulate light scattering effects on NIR spectra, including particle size effects and EMSC-style distortions.
These operators inherit from SpectraTransformerMixin and automatically receive wavelength information from the dataset when used in nirs4all pipelines.
References
Martens, H., Nielsen, J. P., & Engelsen, S. B. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry, 75(3), 394-404.
Dahm, D. J., & Dahm, K. D. (2007). Interpreting Diffuse Reflectance and Transmittance. NIR Publications.
Burger, J., & Geladi, P. (2005). Hyperspectral NIR image regression. Journal of Chemometrics, 19(5‐7), 355-363.
- class nirs4all.operators.augmentation.scattering.EMSCDistortionAugmenter(multiplicative_range: Tuple[float, float] = (0.9, 1.1), additive_range: Tuple[float, float] = (-0.05, 0.05), polynomial_order: int = 2, polynomial_strength: float = 0.02, correlation: float = 0.3, random_state: int | None = None)
Bases: SpectraTransformerMixin
Apply EMSC-style scatter distortions for data augmentation.
Simulates the spectral distortions that Extended Multiplicative Scatter Correction (EMSC) is designed to correct:
x_distorted = a + b*x + c1*λ + c2*λ² + c3*λ³ + …
where:
  - a is the additive offset
  - b is the multiplicative gain
  - c1, c2, … are the polynomial scattering coefficients
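The distortion model above can be sketched in plain NumPy. This is a minimal illustration of the formula only, not the library's implementation: the function name `emsc_style_distort`, the per-sample arrays `a`, `b`, `coeffs`, and the rescaling of wavelengths to [-1, 1] are all assumptions made for this example.

```python
import numpy as np


def emsc_style_distort(X, wavelengths, a, b, coeffs):
    """Hypothetical helper: a + b*x + c1*lam + c2*lam**2 + ... per sample.

    X: (n_samples, n_features); a, b: (n_samples,);
    coeffs: (n_samples, order), one column per polynomial term.
    """
    X = np.asarray(X, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Assumption: rescale wavelengths to [-1, 1] so the polynomial terms keep
    # a comparable magnitude whatever the nm range is.
    lam = 2.0 * (wl - wl.min()) / (wl.max() - wl.min()) - 1.0
    order = coeffs.shape[1]
    # Columns are lam, lam**2, ..., lam**order -> shape (n_features, order).
    basis = lam[:, None] ** np.arange(1, order + 1)
    poly = coeffs @ basis.T  # (n_samples, n_features)
    return a[:, None] + b[:, None] * X + poly


rng = np.random.default_rng(0)
wavelengths = np.linspace(1000.0, 2500.0, 200)
X = rng.random((5, 200))
a = rng.uniform(-0.05, 0.05, size=5)       # additive offsets
b = rng.uniform(0.9, 1.1, size=5)          # multiplicative gains
coeffs = rng.normal(0.0, 0.02, size=(5, 2))  # second-order polynomial terms
X_distorted = emsc_style_distort(X, wavelengths, a, b, coeffs)
```

With `a = 0`, `b = 1`, and no polynomial columns, the sketch returns the input unchanged, which is a quick sanity check on the formula.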
- Parameters:
multiplicative_range (tuple of (float, float), default=(0.9, 1.1)) – Range for multiplicative gain factor (b term).
additive_range (tuple of (float, float), default=(-0.05, 0.05)) – Range for additive offset (a term).
polynomial_order (int, default=2) – Order of wavelength polynomial (0 = no polynomial term).
polynomial_strength (float, default=0.02) – Base strength of polynomial scattering terms.
correlation (float, default=0.3) – Correlation between multiplicative and additive terms. Higher values create more realistic scatter patterns.
random_state (int, optional) – Random seed for reproducibility.
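One plausible reading of the `correlation` parameter is that the gain and offset are drawn as a correlated pair rather than independently. The sketch below shows that idea with NumPy. The helper `sample_gain_offset`, the Gaussian mixing, and the choice to treat each half-range as three standard deviations are assumptions for illustration, and the augmenter's actual sampling scheme may differ.

```python
import numpy as np


def sample_gain_offset(n_samples, multiplicative_range=(0.9, 1.1),
                       additive_range=(-0.05, 0.05), correlation=0.3,
                       seed=None):
    """Hypothetical helper: draw correlated (offset a, gain b) pairs."""
    rng = np.random.default_rng(seed)
    z_b = rng.standard_normal(n_samples)
    z_ind = rng.standard_normal(n_samples)
    # Mix two independent normals so corr(z_a, z_b) equals `correlation`
    # in expectation.
    z_a = correlation * z_b + np.sqrt(1.0 - correlation ** 2) * z_ind

    def to_range(z, lo, hi):
        # Assumption: the half-width of the range is three standard
        # deviations; rare values beyond that are clipped to the range.
        centre, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
        return np.clip(centre + z * half / 3.0, lo, hi)

    a = to_range(z_a, *additive_range)
    b = to_range(z_b, *multiplicative_range)
    return a, b


a, b = sample_gain_offset(1000, correlation=0.3, seed=0)
```

Setting `correlation=0.0` in this sketch yields independent draws, while values near 1 make samples with a high gain also tend to have a high offset.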
Examples
>>> from nirs4all.operators.augmentation import EMSCDistortionAugmenter
>>> aug = EMSCDistortionAugmenter(multiplicative_range=(0.85, 1.15))
>>> X_aug = aug.transform(X, wavelengths=wavelengths)

>>> # Use in pipeline for data augmentation
>>> aug = EMSCDistortionAugmenter(polynomial_order=3)
>>> pipeline = [aug, SNV(), PLSRegression(10)]
Notes
This augmenter is particularly useful when:
  - Training models that need to be robust to scatter variations
  - Simulating data from different instruments or sample presentations
  - Creating training data for transfer learning
References
Martens et al. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → EMSCDistortionAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to transform.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray
Apply EMSC-style distortions to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with EMSC-style distortions applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.augmentation.scattering.ParticleSizeAugmenter(mean_size_um: float = 50.0, size_variation_um: float = 15.0, size_range_um: Tuple[float, float] | None = None, reference_size_um: float = 50.0, wavelength_exponent: float = 1.5, size_effect_strength: float = 0.1, include_path_length: bool = True, path_length_sensitivity: float = 0.5, random_state: int | None = None)
Bases: SpectraTransformerMixin
Simulate particle size effects on scattering for data augmentation.
Particle size affects NIR spectra through wavelength-dependent baseline scattering, typically following a λ^(-n) relationship where n depends on the particle size regime (Rayleigh vs Mie).
Smaller particles cause:
  - Increased scattering baseline (especially at shorter wavelengths)
  - Reduced effective optical path length
  - Additional sample-to-sample variation
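The λ^(-n) baseline can be illustrated with a few lines of NumPy. This sketch covers only the additive scattering baseline, not the path-length effect, and it is not the augmenter's implementation: the function `particle_scatter_baseline` and the amplitude rule `strength * (reference_size / size - 1)` are assumptions chosen so that smaller particles raise the baseline.

```python
import numpy as np


def particle_scatter_baseline(wavelengths, size_um, reference_size_um=50.0,
                              wavelength_exponent=1.5, strength=0.1):
    """Hypothetical helper: additive lambda^(-n) scattering baseline."""
    wl = np.asarray(wavelengths, dtype=float)
    # lambda^(-n) shape, normalised to 1 at the shortest wavelength so the
    # effect is strongest there and decays toward longer wavelengths.
    shape = (wl / wl.min()) ** (-wavelength_exponent)
    # Assumption: amplitude is zero at the reference size, positive for
    # smaller particles, and negative for larger ones.
    amplitude = strength * (reference_size_um / size_um - 1.0)
    return amplitude * shape


wavelengths = np.linspace(1000.0, 2500.0, 200)
fine = particle_scatter_baseline(wavelengths, size_um=25.0)
coarse = particle_scatter_baseline(wavelengths, size_um=100.0)
```

In this sketch, `fine` is positive and largest at 1000 nm, while `coarse` is negative. Setting `wavelength_exponent=0.0` gives a flat offset with no wavelength dependence.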
- Parameters:
mean_size_um (float, default=50.0) – Mean particle size in micrometers.
size_variation_um (float, default=15.0) – Standard deviation of particle size.
size_range_um (tuple of (float, float), optional) – If provided, randomly sample particle sizes from this range. Overrides mean_size_um and size_variation_um.
reference_size_um (float, default=50.0) – Reference particle size for baseline calculations.
wavelength_exponent (float, default=1.5) – Exponent for wavelength dependence (higher = finer particles).
  - 4.0 = Rayleigh regime (particles << wavelength)
  - 1.0-2.0 = Typical for NIR powder samples
  - 0.0 = No wavelength dependence
size_effect_strength (float, default=0.1) – Overall strength of the scattering effect (0-1).
include_path_length (bool, default=True) – Whether to include path length effects (multiplicative).
path_length_sensitivity (float, default=0.5) – How strongly particle size affects effective path length.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import ParticleSizeAugmenter
>>> aug = ParticleSizeAugmenter(mean_size_um=30.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)

>>> # Random particle size in pipeline
>>> aug = ParticleSizeAugmenter(size_range_um=(20, 100))
>>> pipeline = [aug, PLSRegression(10)]
References
Dahm & Dahm (2007). Interpreting Diffuse Reflectance and Transmittance.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → ParticleSizeAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
  - True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
  - False: metadata is not requested and the meta-estimator will not pass it to transform.
  - None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
  - str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray
Apply particle size effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with particle size effects applied.
- Return type:
ndarray of shape (n_samples, n_features)