nirs4all.operators.augmentation.scattering module

Scattering effects augmentation operators for spectral data.

This module provides wavelength-aware augmentation operators that simulate light scattering effects on NIR spectra, including particle size effects and EMSC-style distortions.

These operators inherit from SpectraTransformerMixin and automatically receive wavelength information from the dataset when used in nirs4all pipelines.

References

  • Martens, H., Nielsen, J. P., & Engelsen, S. B. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry, 75(3), 394-404.

  • Dahm, D. J., & Dahm, K. D. (2007). Interpreting Diffuse Reflectance and Transmittance. NIR Publications.

  • Burger, J., & Geladi, P. (2005). Hyperspectral NIR image regression. Journal of Chemometrics, 19(5-7), 355-363.

class nirs4all.operators.augmentation.scattering.EMSCDistortionAugmenter(multiplicative_range: Tuple[float, float] = (0.9, 1.1), additive_range: Tuple[float, float] = (-0.05, 0.05), polynomial_order: int = 2, polynomial_strength: float = 0.02, correlation: float = 0.3, random_state: int | None = None)

Bases: SpectraTransformerMixin

Apply EMSC-style scatter distortions for data augmentation.

Simulates the spectral distortions that Extended Multiplicative Scatter Correction (EMSC) is designed to correct:

x_distorted = a + b*x + c1*λ + c2*λ² + c3*λ³ + …

where:

  • a is the additive offset

  • b is the multiplicative gain

  • c1, c2, … are the polynomial scattering coefficients
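The distortion formula above can be sketched directly in NumPy. This is an illustrative sketch only, not the operator's actual implementation: the helper name emsc_distort is hypothetical, and rescaling the wavelength axis to [-1, 1] before building the polynomial is an assumption made here to keep the terms comparable in magnitude.

```python
import numpy as np


def emsc_distort(x, wavelengths, a, b, coeffs):
    """Return a + b*x + sum_k coeffs[k-1] * lam**k for one spectrum.

    Hypothetical helper. `lam` is the wavelength axis rescaled to [-1, 1];
    that rescaling is an assumption of this sketch, not documented behaviour.
    """
    x = np.asarray(x, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    lam = 2.0 * (wl - wl.min()) / (wl.max() - wl.min()) - 1.0

    # Accumulate the polynomial scattering terms c1*lam + c2*lam**2 + ...
    poly = np.zeros_like(lam)
    for k, c in enumerate(coeffs, start=1):
        poly += c * lam**k

    return a + b * x + poly


wavelengths = np.linspace(1000.0, 2500.0, 5)
x = np.full(5, 0.5)  # a flat toy spectrum
x_distorted = emsc_distort(x, wavelengths, a=0.02, b=1.1, coeffs=[0.01, 0.005])
```

With a flat input spectrum, the output shows the additive offset, the gain, and the polynomial tilt and curvature in isolation.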

Parameters:
  • multiplicative_range (tuple of (float, float), default=(0.9, 1.1)) – Range for multiplicative gain factor (b term).

  • additive_range (tuple of (float, float), default=(-0.05, 0.05)) – Range for additive offset (a term).

  • polynomial_order (int, default=2) – Order of wavelength polynomial (0 = no polynomial term).

  • polynomial_strength (float, default=0.02) – Base strength of polynomial scattering terms.

  • correlation (float, default=0.3) – Correlation between multiplicative and additive terms. Higher values create more realistic scatter patterns.

  • random_state (int, optional) – Random seed for reproducibility.
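How the parameters above could translate into per-sample coefficients is sketched below. Everything here is an assumption for illustration: the function name sample_emsc_params, the use of a Gaussian copula to couple the a and b terms, and the 1/k decay of the polynomial coefficient scale are not taken from the library.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy


def sample_emsc_params(n_samples, multiplicative_range=(0.9, 1.1),
                       additive_range=(-0.05, 0.05), polynomial_order=2,
                       polynomial_strength=0.02, correlation=0.3,
                       random_state=None):
    """Draw (a, b, coeffs) per sample. Hypothetical sketch, not the library code."""
    rng = np.random.default_rng(random_state)

    # Two standard-normal draws mixed so that z_a and z_b have the requested
    # correlation (assumed coupling scheme).
    z_b = rng.standard_normal(n_samples)
    z_indep = rng.standard_normal(n_samples)
    z_a = correlation * z_b + math.sqrt(1.0 - correlation**2) * z_indep

    # Normal CDF maps each draw to (0, 1), which is then scaled to its range.
    u_b = 0.5 * (1.0 + _erf(z_b / math.sqrt(2.0)))
    u_a = 0.5 * (1.0 + _erf(z_a / math.sqrt(2.0)))
    b = multiplicative_range[0] + u_b * (multiplicative_range[1] - multiplicative_range[0])
    a = additive_range[0] + u_a * (additive_range[1] - additive_range[0])

    # Polynomial coefficients: zero-mean normal, scale shrinking with order
    # (the 1/k decay is an assumption of this sketch).
    scale = polynomial_strength / np.arange(1, polynomial_order + 1)
    coeffs = rng.normal(0.0, scale, size=(n_samples, polynomial_order))
    return a, b, coeffs


a, b, coeffs = sample_emsc_params(200, random_state=0)
```

Passing the same random_state twice yields identical draws, which is the reproducibility behaviour the random_state parameter describes.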

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from nirs4all.operators.augmentation import EMSCDistortionAugmenter
>>> aug = EMSCDistortionAugmenter(multiplicative_range=(0.85, 1.15))
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Use in pipeline for data augmentation
>>> # (SNV is assumed to be already imported from nirs4all)
>>> aug = EMSCDistortionAugmenter(polynomial_order=3)
>>> pipeline = [aug, SNV(), PLSRegression(10)]

Notes

This augmenter is particularly useful when:

  • Training models that need to be robust to scatter variations

  • Simulating data from different instruments or sample presentations

  • Creating training data for transfer learning

References

  • Martens et al. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → EMSCDistortionAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray

Apply EMSC-style distortions to spectra.

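Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelength associated with each spectral feature.

Returns: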

X_transformed – Spectra with EMSC-style distortions applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.scattering.ParticleSizeAugmenter(mean_size_um: float = 50.0, size_variation_um: float = 15.0, size_range_um: Tuple[float, float] | None = None, reference_size_um: float = 50.0, wavelength_exponent: float = 1.5, size_effect_strength: float = 0.1, include_path_length: bool = True, path_length_sensitivity: float = 0.5, random_state: int | None = None)

Bases: SpectraTransformerMixin

Simulate particle size effects on scattering for data augmentation.

Particle size affects NIR spectra through wavelength-dependent baseline scattering, typically following a λ^(-n) relationship where n depends on the particle size regime (Rayleigh vs Mie).

Smaller particles cause:

  • Increased scattering baseline (especially at shorter wavelengths)

  • Reduced effective optical path length

  • Additional sample-to-sample variation
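The λ^(-n) baseline described above can be illustrated with a small NumPy sketch. The function scatter_baseline is hypothetical, and both the normalisation at the longest wavelength and the reference_size / size scaling are assumptions chosen to reproduce the qualitative behaviour (finer particles and shorter wavelengths give a larger baseline); the operator's actual formula may differ.

```python
import numpy as np


def scatter_baseline(wavelengths, size_um, reference_size_um=50.0,
                     wavelength_exponent=1.5, size_effect_strength=0.1):
    """Additive scattering baseline for one particle size (hypothetical sketch)."""
    wl = np.asarray(wavelengths, dtype=float)

    # lambda^(-n) shape, normalised to 1 at the longest wavelength (assumption).
    shape = (wl / wl.max()) ** (-wavelength_exponent)

    # Assumed size scaling: particles smaller than the reference scatter more.
    size_factor = reference_size_um / size_um
    return size_effect_strength * size_factor * shape


wavelengths = np.linspace(1000.0, 2500.0, 4)
fine = scatter_baseline(wavelengths, size_um=25.0)     # finer than reference
coarse = scatter_baseline(wavelengths, size_um=100.0)  # coarser than reference
```

In this sketch the fine-particle baseline lies above the coarse one at every wavelength, and both decrease toward longer wavelengths.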

Parameters:
  • mean_size_um (float, default=50.0) – Mean particle size in micrometers.

  • size_variation_um (float, default=15.0) – Standard deviation of particle size.

  • size_range_um (tuple of (float, float), optional) – If provided, randomly sample particle sizes from this range. Overrides mean_size_um and size_variation_um.

  • reference_size_um (float, default=50.0) – Reference particle size for baseline calculations.

  • wavelength_exponent (float, default=1.5) – Exponent for wavelength dependence (higher = finer particles): 4.0 = Rayleigh regime (particles << wavelength); 1.0-2.0 = typical for NIR powder samples; 0.0 = no wavelength dependence.

  • size_effect_strength (float, default=0.1) – Overall strength of the scattering effect (0-1).

  • include_path_length (bool, default=True) – Whether to include path length effects (multiplicative).

  • path_length_sensitivity (float, default=0.5) – How strongly particle size affects effective path length.

  • random_state (int, optional) – Random seed for reproducibility.
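The interplay between mean_size_um, size_variation_um and size_range_um, and the multiplicative path-length effect, might look roughly like the following. Both helpers are hypothetical: uniform sampling within size_range_um, the 1 µm lower clip, and the power-law path-length factor are assumptions of this sketch rather than documented behaviour.

```python
import numpy as np


def sample_particle_sizes(n_samples, mean_size_um=50.0, size_variation_um=15.0,
                          size_range_um=None, random_state=None):
    """Draw one particle size per sample (hypothetical sketch)."""
    rng = np.random.default_rng(random_state)
    if size_range_um is not None:
        # The range overrides mean and variation; uniform sampling is assumed.
        low, high = size_range_um
        return rng.uniform(low, high, size=n_samples)
    sizes = rng.normal(mean_size_um, size_variation_um, size=n_samples)
    return np.clip(sizes, 1.0, None)  # assumed floor to keep sizes positive


def path_length_factor(sizes_um, reference_size_um=50.0,
                       path_length_sensitivity=0.5):
    """Multiplicative factor below 1 for particles smaller than the reference."""
    ratio = np.asarray(sizes_um, dtype=float) / reference_size_um
    return ratio ** path_length_sensitivity


ranged = sample_particle_sizes(100, size_range_um=(20, 100), random_state=0)
normal = sample_particle_sizes(100, random_state=0)
factors = path_length_factor(ranged)
```

A factor of exactly 1 at the reference size means a sample at reference_size_um is left unscaled, which matches the role of a reference in the parameter list.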

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from nirs4all.operators.augmentation import ParticleSizeAugmenter
>>> aug = ParticleSizeAugmenter(mean_size_um=30.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Random particle size in pipeline
>>> aug = ParticleSizeAugmenter(size_range_um=(20, 100))
>>> pipeline = [aug, PLSRegression(10)]

References

  • Dahm & Dahm (2007). Interpreting Diffuse Reflectance and Transmittance.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → ParticleSizeAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray

Apply particle size effects to spectra.

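Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelength associated with each spectral feature.

Returns: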

X_transformed – Spectra with particle size effects applied.

Return type:

ndarray of shape (n_samples, n_features)