nirs4all.operators.augmentation package

Submodules

Module contents

Augmentation operators for spectral data.

This module provides data augmentation operators for NIRS spectra, including:

Noise and Distortion:
  • GaussianAdditiveNoise: Add Gaussian noise

  • MultiplicativeNoise: Apply random gain factors

Baseline Effects:
  • LinearBaselineDrift: Add linear baseline

  • PolynomialBaselineDrift: Add polynomial baseline

Wavelength Distortions:
  • WavelengthShift: Shift spectra along wavelength axis

  • WavelengthStretch: Stretch/compress wavelength axis

Environmental Effects (require wavelengths):
  • TemperatureAugmenter: Simulate temperature-induced spectral changes

  • MoistureAugmenter: Simulate moisture/water activity effects

Scattering Effects (require wavelengths):
  • ParticleSizeAugmenter: Simulate particle size scattering

  • EMSCDistortionAugmenter: Apply EMSC-style distortions

  • ScatterSimulationMSC: Simple MSC-style scatter (legacy)

Edge Artifacts (require wavelengths):
  • DetectorRollOffAugmenter: Simulate detector sensitivity roll-off at edges

  • StrayLightAugmenter: Simulate stray light effects (peak truncation)

  • EdgeCurvatureAugmenter: Simulate edge curvature/baseline bending

  • TruncatedPeakAugmenter: Add truncated peaks at spectral boundaries

  • EdgeArtifactsAugmenter: Combined edge artifacts augmenter

class nirs4all.operators.augmentation.Augmenter(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Base class for data augmentation transformers.

abstractmethod augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

fit(X, y=None)[source]

Fit to data.

Parameters:
  • X (array-like) – Input data to fit.

  • y (array-like or None) – Target variable (unused).

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X, y=None, **fit_params)[source]

Fit to data and transform it.

Parameters:
  • X (array-like) – Input data to fit and transform.

  • y (array-like or None) – Target variable (unused).

  • **fit_params (dict) – Additional fitting parameters (unused).

Returns:

Transformed data.

Return type:

array-like

transform(X)[source]

Transform the input data by applying data augmentation.

Parameters:

X (array-like) – Input data to transform.

Returns:

Transformed data after augmentation.

Return type:

array-like

class nirs4all.operators.augmentation.DetectorRollOffAugmenter(detector_model: str = 'generic_nir', effect_strength: float = 1.0, noise_amplification: float = 0.02, include_baseline_distortion: bool = True, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate detector sensitivity roll-off at spectral edges.

NIR detectors have wavelength-dependent sensitivity curves that typically roll off at the edges of their spectral range. This causes:
  • Increased noise at edge wavelengths (lower SNR)

  • Apparent baseline curvature near spectral boundaries

  • Reduced peak heights at the edges

The effect is modeled as an exponential decay of detector sensitivity outside the optimal wavelength range, which manifests as multiplicative noise amplification and slight baseline distortion.
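The model described above can be illustrated with a short NumPy sketch: a sensitivity curve equal to 1 inside an optimal range and decaying exponentially outside it, followed by noise whose amplitude grows as sensitivity falls. The function names, the `decay_nm` constant, and the noise scaling are assumptions chosen for illustration; they are not taken from the nirs4all implementation.

```python
import numpy as np


def sensitivity_curve(wavelengths, optimal_min, optimal_max, decay_nm=100.0):
    """Hypothetical sensitivity: 1 inside the optimal range, exponential decay outside."""
    wl = np.asarray(wavelengths, dtype=float)
    # Distance (nm) from the optimal range; zero for wavelengths inside it.
    distance = np.maximum(optimal_min - wl, 0.0) + np.maximum(wl - optimal_max, 0.0)
    return np.exp(-distance / decay_nm)


def apply_roll_off(X, wavelengths, optimal_min=1000.0, optimal_max=1600.0,
                   noise_amplification=0.02, random_state=None):
    """Add noise whose standard deviation grows where sensitivity drops."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    sens = sensitivity_curve(wavelengths, optimal_min, optimal_max)
    # Noise std is zero at full sensitivity and increases as sensitivity falls.
    noise_std = noise_amplification * (1.0 / sens - 1.0)
    return X + rng.normal(size=X.shape) * noise_std


wavelengths = np.linspace(900.0, 1700.0, 81)
X = np.ones((5, wavelengths.size))
X_aug = apply_roll_off(X, wavelengths, random_state=0)
```

In this sketch the spectra are left untouched between 1000 and 1600 nm and become progressively noisier toward 900 and 1700 nm.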

Parameters:
  • detector_model (str, default="generic_nir") – Detector type to simulate. Available models:

      • “ingaas_standard”: Standard InGaAs (1000-1600 nm optimal)

      • “ingaas_extended”: Extended InGaAs (1100-2200 nm optimal)

      • “pbs”: Lead sulfide (1000-2800 nm optimal)

      • “silicon_ccd”: Silicon CCD (400-900 nm optimal)

      • “generic_nir”: Generic NIR detector

  • effect_strength (float, default=1.0) – Scaling factor for the roll-off effect (0-2).

  • noise_amplification (float, default=0.02) – Additional noise added at low-sensitivity wavelengths.

  • include_baseline_distortion (bool, default=True) – Whether to include slight baseline distortion at edges.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import DetectorRollOffAugmenter
>>> aug = DetectorRollOffAugmenter(detector_model="ingaas_standard")
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Stronger effect for portable spectrometers
>>> aug = DetectorRollOffAugmenter(effect_strength=1.5)
>>> pipeline = [aug, SNV(), PLSRegression(10)]

References

  • JASCO (2020). Advantages of high-sensitivity InGaAs detector.

  • LaserComponents InGaAs Photodiodes specifications.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → DetectorRollOffAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

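transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply detector roll-off effects to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: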

X_transformed – Spectra with detector roll-off effects applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.EMSCDistortionAugmenter(multiplicative_range: Tuple[float, float] = (0.9, 1.1), additive_range: Tuple[float, float] = (-0.05, 0.05), polynomial_order: int = 2, polynomial_strength: float = 0.02, correlation: float = 0.3, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Apply EMSC-style scatter distortions for data augmentation.

Simulates the spectral distortions that Extended Multiplicative Scatter Correction (EMSC) is designed to correct:

x_distorted = a + b*x + c1*λ + c2*λ² + c3*λ³ + …

where:
  • a is the additive offset

  • b is the multiplicative gain

  • c1, c2, … are the polynomial scattering coefficients

Parameters:
  • multiplicative_range (tuple of (float, float), default=(0.9, 1.1)) – Range for multiplicative gain factor (b term).

  • additive_range (tuple of (float, float), default=(-0.05, 0.05)) – Range for additive offset (a term).

  • polynomial_order (int, default=2) – Order of wavelength polynomial (0 = no polynomial term).

  • polynomial_strength (float, default=0.02) – Base strength of polynomial scattering terms.

  • correlation (float, default=0.3) – Correlation between multiplicative and additive terms. Higher values create more realistic scatter patterns.

  • random_state (int, optional) – Random seed for reproducibility.
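The distortion formula x_distorted = a + b*x + c1*λ + c2*λ² + … can be sketched directly in NumPy. The version below is a simplified illustration, not the library's code: it assumes wavelengths are rescaled to [-1, 1], draws a and b independently (the `correlation` parameter is ignored), and draws the polynomial coefficients from a normal distribution scaled by `polynomial_strength`. The function name `emsc_distort` is hypothetical.

```python
import numpy as np


def emsc_distort(X, wavelengths, multiplicative_range=(0.9, 1.1),
                 additive_range=(-0.05, 0.05), polynomial_order=2,
                 polynomial_strength=0.02, random_state=None):
    """Apply x_distorted = a + b*x + c1*lam + c2*lam**2 + ... per sample."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    n_samples = X.shape[0]

    # Assumption: rescale wavelengths to [-1, 1] so polynomial terms stay small.
    lam = 2.0 * (wl - wl.min()) / (wl.max() - wl.min()) - 1.0

    # One gain (b) and one offset (a) per sample, drawn independently here.
    b = rng.uniform(*multiplicative_range, size=(n_samples, 1))
    a = rng.uniform(*additive_range, size=(n_samples, 1))
    X_distorted = a + b * X

    # Polynomial scattering terms c_k * lam**k for k = 1..polynomial_order.
    for k in range(1, polynomial_order + 1):
        c_k = rng.normal(0.0, polynomial_strength, size=(n_samples, 1))
        X_distorted = X_distorted + c_k * lam**k
    return X_distorted


wavelengths = np.linspace(1000.0, 2500.0, 151)
X = np.tile(np.linspace(0.2, 0.8, wavelengths.size), (4, 1))
X_aug = emsc_distort(X, wavelengths, polynomial_order=0, random_state=0)
```

With `polynomial_order=0` only the offset and gain are applied, which makes the effect of `additive_range` and `multiplicative_range` easy to inspect in isolation.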

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import EMSCDistortionAugmenter
>>> aug = EMSCDistortionAugmenter(multiplicative_range=(0.85, 1.15))
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Use in pipeline for data augmentation
>>> aug = EMSCDistortionAugmenter(polynomial_order=3)
>>> pipeline = [aug, SNV(), PLSRegression(10)]

Notes

This augmenter is particularly useful when:
  • Training models that need to be robust to scatter variations

  • Simulating data from different instruments or sample presentations

  • Creating training data for transfer learning

References

  • Martens et al. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → EMSCDistortionAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

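transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply EMSC-style distortions to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: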

X_transformed – Spectra with EMSC-style distortions applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.EdgeArtifactsAugmenter(detector_roll_off: bool = True, stray_light: bool = True, edge_curvature: bool = True, truncated_peaks: bool = True, overall_strength: float = 1.0, detector_model: str = 'generic_nir', random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Combined augmenter for edge-related spectral artifacts.

This is a convenience class that combines multiple edge artifact effects:
  • Detector roll-off

  • Stray light

  • Edge curvature

  • Truncated peaks

Each effect can be individually enabled/disabled.

Parameters:
  • detector_roll_off (bool, default=True) – Enable detector sensitivity roll-off effect.

  • stray_light (bool, default=True) – Enable stray light effect.

  • edge_curvature (bool, default=True) – Enable edge curvature/bending effect.

  • truncated_peaks (bool, default=True) – Enable truncated peak effect at boundaries.

  • overall_strength (float, default=1.0) – Scaling factor for all effects (0-2).

  • detector_model (str, default="generic_nir") – Detector model for roll-off simulation.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import EdgeArtifactsAugmenter
>>> aug = EdgeArtifactsAugmenter(overall_strength=0.8)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Only detector and stray light effects
>>> aug = EdgeArtifactsAugmenter(
...     detector_roll_off=True,
...     stray_light=True,
...     edge_curvature=False,
...     truncated_peaks=False
... )
>>> pipeline = [aug, SNV(), PLSRegression(10)]

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → EdgeArtifactsAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

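transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply all enabled edge artifact effects to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: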

X_transformed – Spectra with edge artifacts applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.EdgeCurvatureAugmenter(curvature_strength: float = 0.02, curvature_type: str = 'random', asymmetry: float = 0.0, edge_focus: float = 0.7, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate edge curvature and baseline bending at spectral boundaries.

Edge curvature can arise from various sources:
  • Optical aberrations in the spectrometer

  • Wavelength-dependent baseline drift

  • Polynomial baseline correction artifacts

  • Sample holder effects

This operator adds smooth curvature that increases towards the spectral edges, mimicking the characteristic “smile” or “frown” patterns often seen in real spectra.
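A "smile" or "frown" baseline of this kind can be sketched as a symmetric power curve that is zero at the centre of the spectral range and reaches its maximum at both edges. The function name and the mapping from `edge_focus` to an exponent are assumptions made for illustration; the library may use a different functional form.

```python
import numpy as np


def edge_curvature(wavelengths, strength=0.02, kind="smile", edge_focus=0.7):
    """Hypothetical baseline that is zero at the centre and largest at the edges."""
    wl = np.asarray(wavelengths, dtype=float)
    # Position in [-1, 1]: -1 at the first wavelength, +1 at the last.
    t = 2.0 * (wl - wl.min()) / (wl.max() - wl.min()) - 1.0
    # Assumption: a higher edge_focus maps to a higher power, which keeps the
    # centre flat and concentrates the bend near the edges.
    power = 2.0 + 6.0 * edge_focus
    shape = np.abs(t) ** power
    sign = 1.0 if kind == "smile" else -1.0
    return sign * strength * shape


wavelengths = np.linspace(1000.0, 2500.0, 301)
X = np.zeros((3, wavelengths.size))
X_smile = X + edge_curvature(wavelengths, kind="smile")
X_frown = X + edge_curvature(wavelengths, kind="frown")
```

Here `strength` plays the role of the maximum curvature amplitude: the added baseline never exceeds it in magnitude and reaches it only at the first and last wavelengths.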

Parameters:
  • curvature_strength (float, default=0.02) – Maximum curvature amplitude (in absorbance units).

  • curvature_type (str, default="random") – Type of curvature pattern:

      • “random”: Randomly choose smile/frown/asymmetric

      • “smile”: Upward curvature at edges (convex)

      • “frown”: Downward curvature at edges (concave)

      • “asymmetric”: Different curvature at each edge

  • asymmetry (float, default=0.0) – For “asymmetric” type, ratio of left/right curvature (-1 to 1). Positive values emphasize left edge, negative emphasize right.

  • edge_focus (float, default=0.7) – How concentrated the curvature is at edges (0-1). Higher values create sharper edge effects.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import EdgeCurvatureAugmenter
>>> aug = EdgeCurvatureAugmenter(curvature_strength=0.03)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Simulate baseline correction artifacts
>>> aug = EdgeCurvatureAugmenter(
...     curvature_type="asymmetric",
...     asymmetry=0.5,
...     edge_focus=0.8
... )
>>> pipeline = [aug, Detrend(), PLSRegression(10)]

References

  • Cao, A., et al. (2007). A robust method for automated background subtraction of tissue fluorescence. Journal of Raman Spectroscopy.

  • NIRPY Research (2019). Two methods for baseline correction of spectral data.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → EdgeCurvatureAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

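transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply edge curvature effects to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: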

X_transformed – Spectra with edge curvature applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.IdentityAugmenter(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: Augmenter

An augmenter that returns the input data without any changes.

augment(X, _)[source]

Perform identity augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • _ (str) – Placeholder for unused parameter.

Returns:

Augmented data (same as input data).

Return type:

array-like

class nirs4all.operators.augmentation.MoistureAugmenter(water_activity_delta: float = 0.1, water_activity_range: Tuple[float, float] | None = None, reference_water_activity: float = 0.5, free_water_fraction: float = 0.3, bound_water_shift: float = 25.0, moisture_content: float = 0.1, enable_shift: bool = True, enable_intensity: bool = True, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate moisture-induced spectral changes for data augmentation.

Water activity and moisture content affect NIR spectra through shifts in water bands between free and bound states. Higher water activity leads to more free water, while lower water activity means more water is hydrogen-bonded to the sample matrix.
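To make the free/bound picture concrete, the sketch below models the first-overtone water band as a mixture of two Gaussians centred on the free and bound peak positions listed in the class constants (1410 nm and 1460 nm). The Gaussian band shape, the 30 nm width, and the helper names are assumptions for illustration only; they do not describe how MoistureAugmenter is implemented.

```python
import numpy as np

FREE_WATER_PEAK_1ST = 1410.0   # nm, from the class constants
BOUND_WATER_PEAK_1ST = 1460.0  # nm, from the class constants


def water_band(wavelengths, free_fraction, width_nm=30.0):
    """Hypothetical first-overtone water band as a free/bound Gaussian mixture."""
    wl = np.asarray(wavelengths, dtype=float)
    free = np.exp(-0.5 * ((wl - FREE_WATER_PEAK_1ST) / width_nm) ** 2)
    bound = np.exp(-0.5 * ((wl - BOUND_WATER_PEAK_1ST) / width_nm) ** 2)
    return free_fraction * free + (1.0 - free_fraction) * bound


def band_centroid(wavelengths, band):
    """Intensity-weighted mean wavelength of a band."""
    return float(np.sum(wavelengths * band) / np.sum(band))


wavelengths = np.linspace(1300.0, 1600.0, 301)
dry = water_band(wavelengths, free_fraction=0.3)   # more bound water
wet = water_band(wavelengths, free_fraction=0.5)   # more free water
centroid_dry = band_centroid(wavelengths, dry)
centroid_wet = band_centroid(wavelengths, wet)
```

Raising the free-water fraction moves the band centroid toward the shorter free-water wavelength, which is the direction of shift the description associates with higher water activity.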

Parameters:
  • water_activity_delta (float, default=0.1) – Change in water activity from reference (0-1 scale).

  • water_activity_range (tuple of (float, float), optional) – If provided, randomly sample water_activity_delta from this range for each sample.

  • reference_water_activity (float, default=0.5) – Reference water activity for the input spectra.

  • free_water_fraction (float, default=0.3) – Base fraction of water that is “free” vs. bound (0-1).

  • bound_water_shift (float, default=25.0) – Wavelength shift (nm) for bound water relative to free water.

  • moisture_content (float, default=0.1) – Base moisture content as a fraction (affects intensity).

  • enable_shift (bool, default=True) – Apply water band position shifts.

  • enable_intensity (bool, default=True) – Apply water band intensity changes based on moisture content.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import MoistureAugmenter
>>> aug = MoistureAugmenter(water_activity_delta=0.2)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Random moisture variation in pipeline
>>> aug = MoistureAugmenter(water_activity_range=(-0.2, 0.2))
>>> pipeline = [aug, PLSRegression(10)]

References

  • Büning-Pfaue, H. (2003). Analysis of water in food by near infrared spectroscopy. Food Chemistry, 82(1), 107-115.

  • Luck, W. A. P. (1998). The importance of cooperativity for the properties of liquid water. Journal of Molecular Structure.

BOUND_WATER_PEAK_1ST = 1460

BOUND_WATER_PEAK_COMB = 1940

FREE_WATER_PEAK_1ST = 1410

FREE_WATER_PEAK_COMB = 1920

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → MoistureAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

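transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply moisture effects to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: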

X_transformed – Spectra with moisture effects applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.ParticleSizeAugmenter(mean_size_um: float = 50.0, size_variation_um: float = 15.0, size_range_um: Tuple[float, float] | None = None, reference_size_um: float = 50.0, wavelength_exponent: float = 1.5, size_effect_strength: float = 0.1, include_path_length: bool = True, path_length_sensitivity: float = 0.5, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate particle size effects on scattering for data augmentation.

Particle size affects NIR spectra through wavelength-dependent baseline scattering, typically following a λ^(-n) relationship where n depends on the particle size regime (Rayleigh vs Mie).

Smaller particles cause:
  • Increased scattering baseline (especially at shorter wavelengths)

  • Reduced effective optical path length

  • Additional sample-to-sample variation

Parameters:
  • mean_size_um (float, default=50.0) – Mean particle size in micrometers.

  • size_variation_um (float, default=15.0) – Standard deviation of particle size.

  • size_range_um (tuple of (float, float), optional) – If provided, randomly sample particle sizes from this range. Overrides mean_size_um and size_variation_um.

  • reference_size_um (float, default=50.0) – Reference particle size for baseline calculations.

  • wavelength_exponent (float, default=1.5) – Exponent for wavelength dependence (higher = finer particles):

      • 4.0 = Rayleigh regime (particles << wavelength)

      • 1.0-2.0 = Typical for NIR powder samples

      • 0.0 = No wavelength dependence

  • size_effect_strength (float, default=0.1) – Overall strength of the scattering effect (0-1).

  • include_path_length (bool, default=True) – Whether to include path length effects (multiplicative).

  • path_length_sensitivity (float, default=0.5) – How strongly particle size affects effective path length.

  • random_state (int, optional) – Random seed for reproducibility.
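The λ^(-n) relationship can be sketched as an additive baseline whose shape is set by `wavelength_exponent` and whose amplitude grows as particles get smaller than the reference size. The amplitude rule (reference size divided by actual size) and the function name are assumptions for illustration, not the library's formula.

```python
import numpy as np


def scattering_baseline(wavelengths, size_um, reference_size_um=50.0,
                        wavelength_exponent=1.5, size_effect_strength=0.1):
    """Hypothetical lambda**(-n) baseline that grows as particles get smaller."""
    wl = np.asarray(wavelengths, dtype=float)
    # Normalise by the shortest wavelength so the shape factor is at most 1.
    shape = (wl / wl.min()) ** (-wavelength_exponent)
    # Assumption: the amplitude scales with the reference/actual size ratio.
    amplitude = size_effect_strength * (reference_size_um / size_um)
    return amplitude * shape


wavelengths = np.linspace(1000.0, 2500.0, 151)
fine = scattering_baseline(wavelengths, size_um=25.0)
coarse = scattering_baseline(wavelengths, size_um=100.0)
flat = scattering_baseline(wavelengths, size_um=50.0, wavelength_exponent=0.0)
```

With an exponent of 0.0 the baseline is a constant offset, while positive exponents tilt it so that shorter wavelengths are raised more than longer ones.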

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from nirs4all.operators.augmentation import ParticleSizeAugmenter
>>> aug = ParticleSizeAugmenter(mean_size_um=30.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Random particle size in pipeline
>>> aug = ParticleSizeAugmenter(size_range_um=(20, 100))
>>> pipeline = [aug, PLSRegression(10)]

References

  • Dahm & Dahm (2007). Interpreting Diffuse Reflectance and Transmittance.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') → ParticleSizeAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

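transform_with_wavelengths(X: ndarray, wavelengths: ndarray) → ndarray[source]

Apply particle size effects to spectra.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelengths corresponding to the columns of X.

Returns: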

X_transformed – Spectra with particle size effects applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.Random_X_Operation(apply_on='global', random_state=None, *, copy=True, operator_func=<built-in function mul>, operator_range=(0.97, 1.03))[source]

Bases: Augmenter

Augmenter that applies a random operation to the data.

Parameters:
  • apply_on (str, optional) – Level at which the augmentation is applied (for example “features” or “samples”). Default is “global”.

  • random_state (int or None, optional) – Random seed for reproducibility. Default is None.

  • copy (bool, optional) – If True, creates a copy of the input data. Default is True.

  • operator_func (function, optional) – Operator function to be applied. Default is operator.mul.

  • operator_range (tuple, optional) – Range for generating random values for the operator. Default is (0.97, 1.03).
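The roles of `operator_func` and `operator_range` can be illustrated with a standalone function that draws random values from the range and combines them with the data using the given operator. This is a sketch under one stated assumption: a separate random value is drawn for every element of X, whereas the class may draw values per sample, per feature, or globally depending on `apply_on`. The function name is hypothetical.

```python
import operator

import numpy as np


def random_x_operation(X, operator_func=operator.mul,
                       operator_range=(0.97, 1.03), random_state=None):
    """Combine every element of X with a random value drawn from operator_range."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    # Assumption: one independent random value per element of X.
    factors = rng.uniform(operator_range[0], operator_range[1], size=X.shape)
    return operator_func(X, factors)


X = np.full((4, 10), 2.0)
X_mul = random_x_operation(X, random_state=0)
X_add = random_x_operation(X, operator_func=operator.add,
                           operator_range=(-0.1, 0.1), random_state=0)
```

With the default `operator.mul` and range (0.97, 1.03), each value is scaled by at most 3 percent; swapping in `operator.add` turns the same mechanism into a bounded random offset.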

augment(X, apply_on='global')[source]

Augment the data by applying random operation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Level at which the augmentation is applied (for example “features” or “samples”). Default is “global”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Rotate_Translate(apply_on='samples', random_state=None, *, copy=True, p_range=2, y_factor=3)[source]

Bases: Augmenter

Augmenter that rotates and translates the signal.

Vectorized implementation that processes all samples in batch.

Parameters:
  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

  • random_state (int or None, optional) – Random seed for reproducibility. Default is None.

  • copy (bool, optional) – If True, creates a copy of the input data. Default is True.

  • p_range (int, optional) – Range for generating random slope values. Default is 2.

  • y_factor (int, optional) – Scaling factor for the initial value. Default is 3.

augment(X, apply_on='samples')[source]

Augment the data by rotating and translating the signal.

Vectorized implementation using NumPy broadcasting.

Parameters:
  • X (ndarray) – Input data to be augmented, shape (n_samples, n_features).

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Spline_Curve_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]

Bases: Augmenter

Class to simplify a 1D signal using B-spline interpolation along the curve.

Optimized implementation with pre-allocated output arrays.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.

  • uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.

augment(X, apply_on='samples')[source]

Select regularly spaced points on the x-axis and fit a spline through them.

Optimized with pre-allocated output array.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “features” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Spline_Smoothing(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: Augmenter

Class to apply a smoothing spline to a 1D signal.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

augment(X, apply_on='samples')[source]

Apply a smoothing spline to the data.

Optimized implementation with pre-allocated output array.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Spline_X_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_degree=3, perturbation_density=0.05, perturbation_range=(-10, 10))[source]

Bases: Augmenter

Class to apply a perturbation to a 1D signal using B-spline interpolation.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_degree (int, optional) – Degree of the spline. Default is 3 (cubic).

  • perturbation_density (float, optional) – Density of perturbation points relative to data size. Default is 0.05.

  • perturbation_range (tuple, optional) – Range of perturbation values (min, max). Default is (-10, 10).

augment(X, apply_on='samples')[source]

Augment the data with a perturbation using B-spline interpolation.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Spline_X_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]

Bases: Augmenter

Class to simplify a 1D signal using B-spline interpolation along the x-axis.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.

  • uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.

augment(X, apply_on='samples')[source]

Select randomly spaced points along the x-axis and fit a spline through them.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.augmentation.Spline_Y_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_points=None, perturbation_intensity=0.005)[source]

Bases: Augmenter

Augment the data with a perturbation on the y-axis using B-spline interpolation.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points. Default is None (uses sample length / 2).

  • perturbation_intensity (float, optional) – Intensity of perturbation relative to max value. Default is 0.005.

augment(X, apply_on='samples')[source]

Augment the data with a perturbation on the y-axis using B-spline interpolation.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray
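A minimal sketch of a y-axis spline perturbation for one spectrum is shown below. It is an assumption-laden illustration rather than the library's code: the helper name perturb_spectrum_y, the evenly spaced control points, the uniform distribution of control values, and the use of scipy.interpolate.make_interp_spline are hypothetical. Only the scaling of the perturbation by the signal's maximum and the default of sample length / 2 control points come from the parameter descriptions above.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def perturb_spectrum_y(y, n_points=None, intensity=0.005, rng=None):
    """Hypothetical helper: add a smooth random offset curve to a 1D signal."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    if n_points is None:
        n_points = max(n // 2, 4)  # documented default: sample length / 2
    x = np.arange(n, dtype=float)
    # Control points spread evenly over the x-axis (assumption).
    x_ctrl = np.linspace(0.0, n - 1.0, n_points)
    # Control values bounded by intensity * max |y| (uniform draw is an assumption).
    scale = intensity * np.max(np.abs(y))
    y_ctrl = rng.uniform(-scale, scale, size=n_points)
    # Smooth perturbation curve through the control values.
    perturbation = make_interp_spline(x_ctrl, y_ctrl, k=3)(x)
    return y + perturbation


grid = np.linspace(0.0, 4.0 * np.pi, 200)
y = 1.0 + 0.5 * np.sin(grid)
y_aug = perturb_spectrum_y(y, rng=0)
y_aug_again = perturb_spectrum_y(y, rng=0)
```

Passing the same seed twice yields the same augmented spectrum, which is the behaviour random_state is meant to provide.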

class nirs4all.operators.augmentation.StrayLightAugmenter(stray_light_fraction: float = 0.001, edge_enhancement: float = 2.0, edge_width: float = 0.1, include_peak_truncation: bool = True, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate stray light effects on NIR spectra.

Stray light is unwanted radiation that reaches the detector without passing through the intended optical path. Its effects are most pronounced:

  • At high-absorbance wavelengths (peaks appear truncated)

  • At spectral edges where instrument sensitivity is lower

  • Near the limits of the detector’s wavelength range

The primary effect is a reduction in observed peak height, causing apparent negative deviations from Beer’s law. This is particularly problematic at the edges of spectra where stray light often constitutes a larger fraction of the total signal.

Parameters:
  • stray_light_fraction (float, default=0.001) – Base stray light as fraction of total signal (0.001 = 0.1%). Typical values: 0.0001-0.01 depending on instrument quality.

  • edge_enhancement (float, default=2.0) – Factor by which stray light increases at spectral edges.

  • edge_width (float, default=0.1) – Fraction of spectral range considered “edge” (0-0.5).

  • include_peak_truncation (bool, default=True) – Whether to simulate peak height reduction at high absorbance.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from nirs4all.operators.augmentation import StrayLightAugmenter
>>> aug = StrayLightAugmenter(stray_light_fraction=0.005)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # High stray light (older/portable instruments)
>>> aug = StrayLightAugmenter(stray_light_fraction=0.01, edge_enhancement=3.0)
>>> # MSC is assumed to be imported already (import path not shown here)
>>> pipeline = [aug, MSC(), PLSRegression(10)]

Notes

The observed transmittance with stray light is:

T_obs = (T_true + s) / (1 + s)

where s is the stray light fraction. This causes:

  • At high absorbance (low T_true): T_obs ≈ s, creating a floor effect

  • At low absorbance (high T_true): minimal effect

Converting to absorbance:

A_obs = -log10(T_obs) < A_true
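The two formulas above can be checked numerically with a short sketch. The helper name apply_stray_light is hypothetical, and a single constant s is used for all wavelengths; the edge enhancement applied by the operator is not modelled here.

```python
import numpy as np


def apply_stray_light(absorbance, s):
    """Hypothetical helper: observed absorbance for stray light fraction s."""
    t_true = 10.0 ** (-np.asarray(absorbance, dtype=float))  # A -> T
    t_obs = (t_true + s) / (1.0 + s)                         # T_obs = (T_true + s) / (1 + s)
    return -np.log10(t_obs)                                  # T -> A


a_true = np.array([0.1, 1.0, 2.0, 4.0])
s = 0.001
a_obs = apply_stray_light(a_true, s)

# Deviation from the true absorbance grows with absorbance.
deviation = a_true - a_obs

# As T_true -> 0, T_obs -> s / (1 + s), which caps the observable absorbance.
a_ceiling = -np.log10(s / (1.0 + s))
```

With s = 0.001 the observed absorbance cannot exceed roughly 3, whatever the true value, which is the floor effect on transmittance expressed in absorbance units.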

References

  • Applied Optics (1975). Resolution and stray light in near infrared spectroscopy, 14(8), 1977.

  • Chalmers & Griffiths (2001). Mid-Infrared Spectroscopy: Anomalies, Artifacts and Common Errors.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') StrayLightAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]

Apply stray light effects to spectra.

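Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelength associated with each spectral feature.
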
Returns:

X_transformed – Spectra with stray light effects applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.TemperatureAugmenter(temperature_delta: float = 5.0, temperature_range: Tuple[float, float] | None = None, reference_temperature: float = 25.0, enable_shift: bool = True, enable_intensity: bool = True, enable_broadening: bool = True, region_specific: bool = True, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate temperature-induced spectral changes for data augmentation.

Temperature affects NIR spectra through:

  • Peak position shifts (especially O-H, N-H bands)

  • Intensity changes (hydrogen bonding disruption)

  • Band broadening (thermal motion)

This operator applies region-specific temperature effects based on literature values for NIR spectroscopy.

Parameters:
  • temperature_delta (float, default=5.0) – Temperature change from reference (°C). Positive = heating.

  • temperature_range (tuple of (float, float), optional) – If provided, randomly sample temperature_delta from this range for each sample. Overrides temperature_delta parameter.

  • reference_temperature (float, default=25.0) – Reference temperature for the input spectra (°C).

  • enable_shift (bool, default=True) – Apply peak position shifts.

  • enable_intensity (bool, default=True) – Apply intensity changes.

  • enable_broadening (bool, default=True) – Apply band broadening.

  • region_specific (bool, default=True) – Apply region-specific effects (recommended). If False, applies uniform average effects across all wavelengths.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from nirs4all.operators.augmentation import TemperatureAugmenter
>>> aug = TemperatureAugmenter(temperature_delta=10.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Random temperature variation in pipeline
>>> aug = TemperatureAugmenter(temperature_range=(-5, 10))
>>> pipeline = [aug, PLSRegression(10)]
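To make the per-sample behaviour of temperature_range concrete, here is a rough sketch of drawing one temperature delta per spectrum and applying a uniform wavelength shift. Everything beyond the parameter names is an assumption: the helper names, the uniform sampling, and the nm_per_degree coefficient of 0.3 are hypothetical placeholders, not the literature values or region-specific logic the operator uses.

```python
import numpy as np


def sample_temperature_deltas(n_samples, temperature_delta=5.0,
                              temperature_range=None, rng=None):
    """Hypothetical helper: one delta per sample; the range overrides the fixed delta."""
    rng = np.random.default_rng(rng)
    if temperature_range is None:
        return np.full(n_samples, float(temperature_delta))
    low, high = temperature_range
    return rng.uniform(low, high, size=n_samples)  # uniform draw is an assumption


def shift_spectra(X, wavelengths, deltas, nm_per_degree=0.3):
    """Hypothetical helper: shift each spectrum by nm_per_degree * delta."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for i, delta in enumerate(deltas):
        # A band at w moves to w + shift, so the value at w comes from w - shift.
        out[i] = np.interp(wavelengths - nm_per_degree * delta, wavelengths, X[i])
    return out


wavelengths = np.linspace(1000.0, 2500.0, 301)
band = np.exp(-0.5 * ((wavelengths - 1450.0) / 40.0) ** 2)
X = np.tile(band, (4, 1))

deltas = sample_temperature_deltas(4, temperature_range=(-5, 10), rng=0)
X_shifted = shift_spectra(X, wavelengths, deltas)
```

Because each row receives its own delta, identical input spectra come out slightly different from one another, which is the point of using a range instead of a fixed temperature_delta.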

References

  • Maeda et al. (1995). Journal of Near Infrared Spectroscopy, 3(4), 191-201.

  • Segtnan et al. (2001). Analytical Chemistry, 73(13), 3153-3161.

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') TemperatureAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]

Apply temperature effects to spectra.

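Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelength associated with each spectral feature.
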
Returns:

X_transformed – Spectra with temperature effects applied.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.augmentation.TruncatedPeakAugmenter(peak_probability: float = 0.3, amplitude_range: Tuple[float, float] = (0.01, 0.1), width_range: Tuple[float, float] = (50, 200), left_edge: bool = True, right_edge: bool = True, random_state: int | None = None)[source]

Bases: SpectraTransformerMixin

Simulate truncated absorption peaks at spectral boundaries.

When measuring NIR spectra, absorption bands that have their centers outside the measured wavelength range will appear as partial peaks at the spectral edges. This creates characteristic rising or falling baselines at the spectrum boundaries.

This effect is common when:

  • The spectrometer range doesn’t cover the full absorption band

  • Strong absorbers (e.g., water) have peaks just outside the range

  • Mid-IR absorption bands tail into the NIR region

Parameters:
  • peak_probability (float, default=0.3) – Probability of adding truncated peaks (0-1).

  • amplitude_range (tuple of (float, float), default=(0.01, 0.1)) – Range of peak amplitudes (in absorbance units).

  • width_range (tuple of (float, float), default=(50, 200)) – Range of peak widths (in nm). Controls how fast the edge rises/falls.

  • left_edge (bool, default=True) – Whether a truncated peak may be added at the left (low-wavelength) edge.

  • right_edge (bool, default=True) – Whether a truncated peak may be added at the right (high-wavelength) edge.

  • random_state (int, optional) – Random seed for reproducibility.

_requires_wavelengths

Always True - this operator requires wavelength information.

Type:

bool

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from nirs4all.operators.augmentation import TruncatedPeakAugmenter
>>> aug = TruncatedPeakAugmenter(peak_probability=0.5)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Strong truncated peaks (e.g., water band edge)
>>> aug = TruncatedPeakAugmenter(
...     amplitude_range=(0.05, 0.2),
...     width_range=(100, 300)
... )
>>> # SNV is assumed to be imported already (import path not shown here)
>>> pipeline = [aug, SNV(), PLSRegression(10)]

Notes

The truncated peak is modeled as a Gaussian band with its center positioned outside the measured wavelength range. Only the “tail” of this band appears in the spectrum.
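The Gaussian-tail model described above can be sketched as follows. The helper name truncated_peak_tail, the offset parameter placing the band center outside the range, and the reading of width as the Gaussian standard deviation are assumptions made for illustration; the operator may define these differently.

```python
import numpy as np


def truncated_peak_tail(wavelengths, amplitude, width, offset, left=True):
    """Hypothetical helper: tail of a Gaussian band centered outside the range."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    if left:
        center = wavelengths.min() - offset  # center below the measured range
    else:
        center = wavelengths.max() + offset  # center above the measured range
    return amplitude * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


wavelengths = np.linspace(1000.0, 2500.0, 301)
tail = truncated_peak_tail(wavelengths, amplitude=0.1, width=100.0,
                           offset=50.0, left=True)

# Adding the tail to a flat spectrum gives a baseline that falls away from the left edge.
spectrum = np.full_like(wavelengths, 0.5) + tail
```

Since the center lies outside the grid, the tail never reaches its full amplitude inside the measured range and decays monotonically away from the edge.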

set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') TruncatedPeakAugmenter

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]

Apply truncated peak effects to spectra.

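Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input spectra.

  • wavelengths (ndarray of shape (n_features,)) – Wavelength associated with each spectral feature.
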
Returns:

X_transformed – Spectra with truncated peaks at edges.

Return type:

ndarray of shape (n_samples, n_features)