nirs4all.operators.transforms package

Submodules

Module contents

class nirs4all.operators.transforms.ASLSBaseline(lam: float = 1000000.0, p: float = 0.01, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Asymmetric Least Squares (AsLS) baseline correction.

Convenience class for AsLS baseline correction. This is equivalent to PyBaselineCorrection(method='asls', …).

Parameters:
  • lam (float, default=1e6) – Smoothness parameter (lambda).

  • p (float, default=0.01) – Asymmetry parameter (0 < p < 1).

  • max_iter (int, default=50) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Eilers, P.H.C. and Boelens, H.F.M. (2005). Baseline Correction with Asymmetric Least Squares Smoothing.
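For orientation, the iteration that lam, p, max_iter and tol control can be sketched in plain NumPy. This is a minimal dense-matrix illustration of the scheme described by Eilers and Boelens, not the code that ASLSBaseline runs; the name asls_baseline_sketch and the stopping rule on the weight vector are assumptions made for this example.

```python
import numpy as np


def asls_baseline_sketch(y, lam=1e6, p=0.01, max_iter=50, tol=1e-3):
    """Estimate a baseline for one spectrum (hypothetical helper, dense algebra)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator; lam * D.T @ D penalizes curvature.
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        # Weighted, penalized least squares: (W + lam * D'D) z = W y.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get the small weight p.
        w_new = np.where(y > z, p, 1.0 - p)
        # Assumed stopping rule: relative change of the weight vector.
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z


# Synthetic spectrum: sloped baseline plus one narrow peak.
x = np.linspace(0.0, 1.0, 200)
true_baseline = 0.5 + 0.3 * x
spectrum = true_baseline + np.exp(-((x - 0.5) / 0.02) ** 2)
corrected = spectrum - asls_baseline_sketch(spectrum)
```

A small p keeps the fit underneath the peaks, while a larger lam makes the baseline stiffer. Dense matrices are only practical for short spectra; sparse solvers are the usual choice for real data.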

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ASLSBaseline

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.AirPLS(lam: float = 1000000.0, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Adaptive Iteratively Reweighted Penalized Least Squares baseline correction.

A robust baseline correction method that adaptively adjusts weights based on the difference between the fitted baseline and the data.

Parameters:
  • lam (float, default=1e6) – Smoothness parameter. Larger values produce smoother baselines.

  • max_iter (int, default=50) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Zhang, Z.M., et al. (2010). Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 135(5), 1138-1146.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') AirPLS

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.ArPLS(lam: float = 1000000.0, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Asymmetrically Reweighted Penalized Least Squares baseline correction.

Parameters:
  • lam (float, default=1e6) – Smoothness parameter.

  • max_iter (int, default=50) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Baek, S.J., et al. (2015). Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 140(1), 250-257.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ArPLS

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.Augmenter(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Base class for data augmentation transformers.

abstractmethod augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

fit(X, y=None)[source]

Fit to data.

Parameters:
  • X (array-like) – Input data to fit.

  • y (array-like or None) – Target variable (unused).

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X, y=None, **fit_params)[source]

Fit to data and transform it.

Parameters:
  • X (array-like) – Input data to fit and transform.

  • y (array-like or None) – Target variable (unused).

  • **fit_params (dict) – Additional fitting parameters (unused).

Returns:

Transformed data.

Return type:

array-like

transform(X)[source]

Transform the input data by applying data augmentation.

Parameters:

X (array-like) – Input data to transform.

Returns:

Transformed data after augmentation.

Return type:

array-like

class nirs4all.operators.transforms.BEADS(lam_0: float = 1.0, lam_1: float = 1.0, lam_2: float = 1.0, max_iter: int = 50, tol: float = 0.01, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Baseline Estimation And Denoising with Sparsity.

Simultaneously estimates baseline and removes noise using sparsity constraints.

Parameters:
  • lam_0 (float, default=1.0) – Regularization parameter for the baseline.

  • lam_1 (float, default=1.0) – Regularization parameter for the first derivative.

  • lam_2 (float, default=1.0) – Regularization parameter for the second derivative.

  • max_iter (int, default=50) – Maximum number of iterations.

  • tol (float, default=1e-2) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Ning, X., et al. (2014). Chromatogram baseline estimation and denoising using sparsity (BEADS). Chemometrics and Intelligent Laboratory Systems, 139, 156-167.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') BEADS

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.BandMasking(apply_on='samples', random_state=None, *, copy=True, n_bands_range: Tuple[int, int] = (1, 3), bandwidth_range: Tuple[int, int] = (5, 20), mode: str = 'interp')[source]

Bases: Augmenter

Masks out bands of the spectrum.

Optimized with pre-generated random parameters.
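The effect of n_bands_range, bandwidth_range and mode can be illustrated with a short NumPy sketch. It is not the BandMasking implementation: the helper name mask_bands_sketch, the inclusive treatment of both ranges, and the zero-fill fallback for any mode other than 'interp' are assumptions made for this example.

```python
import numpy as np


def mask_bands_sketch(x, rng, n_bands_range=(1, 3), bandwidth_range=(5, 20), mode="interp"):
    """Mask random contiguous bands of one spectrum (hypothetical helper)."""
    out = np.asarray(x, dtype=float).copy()
    n = out.size
    # Assumption: both ranges are inclusive.
    n_bands = int(rng.integers(n_bands_range[0], n_bands_range[1] + 1))
    for _ in range(n_bands):
        width = int(rng.integers(bandwidth_range[0], bandwidth_range[1] + 1))
        # Keep one untouched point on each side to anchor the interpolation.
        start = int(rng.integers(1, n - width))
        stop = start + width
        if mode == "interp":
            out[start:stop] = np.interp(
                np.arange(start, stop), [start - 1, stop], [out[start - 1], out[stop]]
            )
        else:
            # Assumption: any other mode simply zeroes the band.
            out[start:stop] = 0.0
    return out


rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 100)
masked_interp = mask_bands_sketch(ramp, rng)
masked_zero = mask_bands_sketch(np.ones(100), rng, mode="zero")
```

On a straight line, interpolating across a masked band reproduces the original values, which makes a convenient sanity check.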

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.BandPerturbation(apply_on='samples', random_state=None, *, copy=True, n_bands: int = 3, bandwidth_range: Tuple[int, int] = (5, 20), gain_range: Tuple[float, float] = (0.9, 1.1), offset_range: Tuple[float, float] = (-0.01, 0.01))[source]

Bases: Augmenter

Perturbs specific bands of the spectrum.

Optimized with pre-generated random parameters.
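A rough picture of what the constructor parameters describe, written in plain NumPy. This is a sketch under stated assumptions rather than the BandPerturbation code: the helper name perturb_bands_sketch is hypothetical, and it assumes that the gain multiplies and the offset is added inside each randomly placed band.

```python
import numpy as np


def perturb_bands_sketch(x, rng, n_bands=3, bandwidth_range=(5, 20),
                         gain_range=(0.9, 1.1), offset_range=(-0.01, 0.01)):
    """Apply a random gain and offset to random bands (hypothetical helper)."""
    out = np.asarray(x, dtype=float).copy()
    n = out.size
    for _ in range(n_bands):
        # Assumption: bandwidth_range is inclusive on both ends.
        width = int(rng.integers(bandwidth_range[0], bandwidth_range[1] + 1))
        start = int(rng.integers(0, n - width + 1))
        gain = rng.uniform(*gain_range)
        offset = rng.uniform(*offset_range)
        # Assumption: gain multiplies and offset is added within the band.
        out[start:start + width] = gain * out[start:start + width] + offset
    return out


rng = np.random.default_rng(0)
perturbed = perturb_bands_sketch(np.ones(100), rng)
```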

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.Baseline(*, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Removes baseline (mean) from each spectrum.

Parameters:

copy (bool, optional) – Flag to indicate whether to make a copy of the object, by default True.

fit(X, y=None)[source]

Compute the mean used for later baseline removal.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The data used to compute the mean that transform subtracts.

  • y (None) – Ignored.

Returns:

self – Fitted Baseline object.

Return type:

object

inverse_transform(X, y=None)[source]

partial_fit(X, y=None)[source]

transform(X, y=None)[source]

class nirs4all.operators.transforms.CARS(n_components: int = 10, n_sampling_runs: int = 50, n_variables_ratio_start: float = 1.0, n_variables_ratio_end: float = 0.1, cv_folds: int = 5, subset_ratio: float = 0.8, random_state: int | None = None)[source]

Bases: TransformerMixin, BaseEstimator

Competitive Adaptive Reweighted Sampling (CARS) for wavelength selection.

CARS is a variable selection method that iteratively selects important wavelengths by:

  1. Fitting PLS models on subsets of samples.

  2. Calculating variable importance weights from the regression coefficients.

  3. Using an exponentially decreasing function to reduce the variable count.

  4. Applying adaptive reweighted sampling based on importance.

The method was introduced by Li et al. (2009) and is widely used for NIRS wavelength selection.

Parameters:
  • n_components (int, default=10) – Number of PLS components for the internal PLS model.

  • n_sampling_runs (int, default=50) – Number of Monte-Carlo sampling runs.

  • n_variables_ratio_start (float, default=1.0) – Starting ratio of variables to keep (1.0 = all variables).

  • n_variables_ratio_end (float, default=0.1) – Ending ratio of variables to keep.

  • cv_folds (int, default=5) – Number of cross-validation folds for RMSECV calculation.

  • subset_ratio (float, default=0.8) – Ratio of samples to use in each Monte-Carlo run.

  • random_state (int or None, default=None) – Random seed for reproducibility.

selected_indices_

Indices of selected features/wavelengths.

Type:

ndarray of shape (n_selected,)

selection_mask_

Boolean mask indicating selected features.

Type:

ndarray of shape (n_features,)

n_features_in_

Number of features in input data.

Type:

int

n_features_out_

Number of selected features.

Type:

int

rmsecv_history_

RMSECV values at each iteration.

Type:

ndarray of shape (n_sampling_runs,)

n_variables_history_

Number of variables at each iteration.

Type:

ndarray of shape (n_sampling_runs,)

optimal_run_idx_

Index of the run with minimum RMSECV.

Type:

int

Examples

>>> from nirs4all.operators.transforms import CARS
>>> import numpy as np
>>>
>>> # Spectral data with 200 wavelengths
>>> X = np.random.randn(100, 200)
>>> y = np.random.randn(100)
>>>
>>> # Select informative wavelengths
>>> cars = CARS(n_components=10, n_sampling_runs=30)
>>> cars.fit(X, y)
>>> X_selected = cars.transform(X)
>>> print(f"Selected {X_selected.shape[1]} from {X.shape[1]} wavelengths")

References

Li, H., Liang, Y., Xu, Q., & Cao, D. (2009). Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Analytica Chimica Acta, 648(1), 77-84.

Notes

  • CARS works best with standardized/scaled data

  • The exponential decay function ensures smooth variable reduction

  • Final selection is based on minimum cross-validated RMSECV
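The exponential reduction mentioned in the notes can be sketched as a schedule that decays from n_variables_ratio_start to n_variables_ratio_end over the sampling runs. The exact schedule used by CARS is not shown in this reference, so the formula below and the helper name cars_variable_counts_sketch are assumptions made for illustration.

```python
import numpy as np


def cars_variable_counts_sketch(n_features, n_sampling_runs=50,
                                ratio_start=1.0, ratio_end=0.1):
    """Number of variables kept at each run (hypothetical helper)."""
    runs = np.arange(n_sampling_runs)
    # Decay rate chosen so the first run keeps ratio_start and the last ratio_end.
    k = np.log(ratio_start / ratio_end) / max(n_sampling_runs - 1, 1)
    ratios = ratio_start * np.exp(-k * runs)
    # Never keep fewer than one variable.
    return np.maximum(1, np.round(ratios * n_features).astype(int))


counts = cars_variable_counts_sketch(200, n_sampling_runs=50)
```

With 200 wavelengths and the default ratios, the schedule starts at all 200 variables and ends at 20, dropping quickly in the early runs and slowly in the late ones.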

__repr__()[source]

String representation of the selector.

fit(X, y=None, wavelengths: ndarray | None = None)[source]

Fit the CARS selector to identify important wavelengths.

Parameters:
Returns:

self – Fitted selector.

Return type:

CARS

get_feature_names_out(input_features=None)[source]

Get output feature names (selected wavelengths as strings).

Parameters:

input_features (array-like of str or None, default=None) – Input feature names. If None, uses indices.

Returns:

feature_names_out – Selected feature names.

Return type:

ndarray of str

get_support(indices: bool = False)[source]

Get a mask or indices of selected features.

Parameters:

indices (bool, default=False) – If True, return indices instead of boolean mask.

Returns:

support – Boolean mask or indices of selected features.

Return type:

ndarray

set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') CARS

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in fit.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Transform data by selecting only the important wavelengths.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X_selected – Data with only selected wavelengths.

Return type:

ndarray of shape (n_samples, n_selected)

class nirs4all.operators.transforms.ChannelDropout(apply_on='samples', random_state=None, *, copy=True, dropout_prob: float = 0.01, mode: str = 'interp')[source]

Bases: Augmenter

Drops individual wavelengths (sets to zero or interpolates).

Optimized with vectorized mask generation.
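The vectorized mask idea can be sketched as follows. This is an illustration, not the ChannelDropout code: the helper name channel_dropout_sketch is hypothetical, and treating every mode other than 'interp' as zero-fill is an assumption.

```python
import numpy as np


def channel_dropout_sketch(X, rng, dropout_prob=0.01, mode="interp"):
    """Drop individual wavelengths at random (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    # One vectorized draw decides which (sample, wavelength) cells are dropped.
    dropped = rng.random(X.shape) < dropout_prob
    if mode != "interp":
        # Assumption: any other mode sets dropped values to zero.
        out[dropped] = 0.0
        return out
    idx = np.arange(X.shape[1])
    for i in np.flatnonzero(dropped.any(axis=1)):
        keep = ~dropped[i]
        if keep.any():
            # Fill dropped wavelengths from the kept neighbours of the same row.
            out[i, dropped[i]] = np.interp(idx[dropped[i]], idx[keep], X[i, keep])
    return out


rng = np.random.default_rng(0)
zeroed = channel_dropout_sketch(np.ones((200, 500)), rng, mode="zero")
filled = channel_dropout_sketch(np.ones((200, 500)), rng, mode="interp")
```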

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.CropTransformer(start: int = 0, end: int = None)[source]

Bases: BaseEstimator, TransformerMixin

fit(X, y=None)[source]

transform(X)[source]

class nirs4all.operators.transforms.Derivate(order=1, delta=1, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

fit(X, y=None)[source]

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Derivate

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

class nirs4all.operators.transforms.Detrend(bp=0, *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Perform spectral detrending to remove linear trend from data.

Parameters:
  • bp (int, optional) – Breakpoints for piecewise linear detrending. Default is 0.

  • copy (bool, optional) – Whether to make a copy of the input data. Default is True.
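For the default case without interior breakpoints (bp=0), linear detrending amounts to subtracting a least-squares straight line from each spectrum. The NumPy sketch below illustrates that operation only; detrend_linear_sketch is a hypothetical helper, and this reference does not state how Detrend implements it internally.

```python
import numpy as np


def detrend_linear_sketch(X):
    """Subtract a least-squares straight line from each row (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    t = np.arange(X.shape[1], dtype=float)
    # Design matrix with a slope column and an intercept column.
    A = np.column_stack([t, np.ones_like(t)])
    # One (slope, intercept) pair per spectrum, solved in a single call.
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)
    return X - (A @ coef).T


t = np.arange(50, dtype=float)
lines = np.vstack([0.2 * t + 1.0, -0.05 * t + 3.0])
residual = detrend_linear_sketch(lines)
```

Purely linear spectra are reduced to zeros (up to rounding), so anything left after detrending is the non-linear part of the signal.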

fit(X, y=None)[source]

Fit the transformer to the data.

Parameters:
Returns:

self – Returns self.

Return type:

object

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Detrend

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Transform the data by removing linear trend.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data.

  • copy (bool or None, optional) – Whether to make a copy of the input data. If None, self.copy is used. Default is None.

Returns:

The transformed data.

Return type:

numpy.ndarray

class nirs4all.operators.transforms.FirstDerivative(delta: float = 1.0, edge_order: int = 2, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

First numerical derivative using numpy.gradient.

Parameters:
  • delta (float, default=1.0) – Sampling step along the feature axis.

  • edge_order (int, default=2) – Order of accuracy (1 or 2) of the differences used at the boundaries.

  • copy (bool, default=True) – Whether to copy input.
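Since the class is described as a wrapper around numpy.gradient, the underlying call can be sketched directly. The helper name first_derivative_sketch is hypothetical, and differentiating along axis=1 is an assumption based on the "feature axis" wording above.

```python
import numpy as np


def first_derivative_sketch(X, delta=1.0, edge_order=2):
    """First derivative of each row via numpy.gradient (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # axis=1 differentiates along the feature (wavelength) axis.
    return np.gradient(X, delta, axis=1, edge_order=edge_order)


x = np.arange(10) * 0.5
X = np.vstack([x ** 2, 3.0 * x])
dX = first_derivative_sketch(X, delta=0.5)
```

With edge_order=2 the boundary estimates are exact for quadratics, so the first row of dX equals 2 * x everywhere, including the two end points.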

fit(X, y=None)[source]

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') FirstDerivative

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

class nirs4all.operators.transforms.FlattenPreprocessing(sources: str | int | List[int] = 'all')[source]

Bases: BaseEstimator, TransformerMixin

Flatten the preprocessing dimension of a 3D feature array.

Transforms a 3D array of shape (samples, preprocessings, features) into a 2D array of shape (samples, preprocessings * features) by horizontally concatenating all preprocessing views.

This is useful after feature_augmentation when you want to flatten multiple preprocessing views into a single feature vector for models that expect 2D input.

Parameters:

sources – Which sources to apply the flattening to:

  • “all” (default): Apply to all sources.

  • List of indices: [0, 2] to apply only to sources 0 and 2.

  • Single int: Apply to only that source.

If a source is not in the list, it is passed through unchanged.

Example

>>> import numpy as np
>>> from nirs4all.operators.transforms import FlattenPreprocessing
>>>
>>> # Input: (100, 4, 2151) - 4 preprocessing views of 2151 features each
>>> X = np.zeros((100, 4, 2151))
>>> flattener = FlattenPreprocessing()
>>> output = flattener.transform(X)
>>> # Output: (100, 8604) - 4 * 2151 = 8604 features
>>>
>>> # Apply only to specific sources
>>> flattener = FlattenPreprocessing(sources=[0, 2])
>>> # Only sources 0 and 2 will be flattened

Note

  • If input is already 2D, it is returned unchanged.

  • The transformer is stateless (fit does nothing).
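The shape change described above can be reproduced with plain NumPy, which may help when checking the column order of the flattened output. This is a sketch of the array operation only; flatten_views_sketch is a hypothetical helper and does not cover the sources selection.

```python
import numpy as np


def flatten_views_sketch(X):
    """Flatten (samples, preprocessings, features) to 2D (hypothetical helper)."""
    X = np.asarray(X)
    if X.ndim == 2:
        return X  # already flat, returned unchanged
    n_samples = X.shape[0]
    return X.reshape(n_samples, -1)


X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
flat = flatten_views_sketch(X)
# Horizontal concatenation of the views gives the same column order.
stacked = np.concatenate([X[:, i, :] for i in range(X.shape[1])], axis=1)
```

In this sketch, all features of the first preprocessing view come first, followed by those of the second view, and so on.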

fit(X, y=None)[source]

Fit is a no-op for this transformer.

transform(X)[source]

Flatten the preprocessing dimension.

Parameters:

X – Input array. Can be:

  • 2D array (samples, features): returned unchanged.

  • 3D array (samples, preprocessings, features): flattened to 2D.

Returns:

2D numpy array of shape (samples, preprocessings * features).

class nirs4all.operators.transforms.FractionToPercent[source]

Bases: TransformerMixin, BaseEstimator

Convert fractional [0, 1] values to percentage [0, 100] range.

Simply multiplies by 100.

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms import FractionToPercent
>>>
>>> transformer = FractionToPercent()
>>> X_frac = np.array([[0.5, 0.6], [0.7, 0.8]])
>>> X_pct = transformer.fit_transform(X_frac)
>>> # X_pct = [[50, 60], [70, 80]]

fit(X, y=None)[source]

Fit the transformer.

inverse_transform(X, y=None)[source]

Transform percent to fraction.

transform(X, y=None)[source]

Transform fraction to percent.

class nirs4all.operators.transforms.FromAbsorbance(target_type: str | SignalType = 'reflectance')[source]

Bases: TransformerMixin, BaseEstimator

Convert absorbance to reflectance or transmittance.

Applies the inverse log transform: R/T = 10^(-A)

Parameters:

target_type (str or SignalType) – Output signal type. Valid: “reflectance”, “reflectance%”, “transmittance”, “transmittance%”

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import FromAbsorbance
>>>
>>> transformer = FromAbsorbance(target_type="reflectance")
>>> A = np.array([[0.301, 0.398], [0.222, 0.301]])
>>> R = transformer.fit_transform(A)
>>> # R ≈ [[0.5, 0.4], [0.6, 0.5]]

fit(X, y=None)[source]

Fit the transformer.

inverse_transform(X, y=None)[source]

Convert back to absorbance.

transform(X, y=None)[source]

Transform absorbance to reflectance/transmittance.

class nirs4all.operators.transforms.Gaussian(order=2, sigma=1, *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

fit(X, y=None)[source]

Fit the Gaussian filter.

Parameters:
Returns:

self – Returns the instance itself.

Return type:

object

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Gaussian

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Transform the input data using the Gaussian filter.

Parameters:
  • X (numpy.ndarray) – Input data.

  • copy (bool, default=None) – Whether to make a copy of the input data.

Returns:

Transformed data.

Return type:

numpy.ndarray

class nirs4all.operators.transforms.GaussianAdditiveNoise(apply_on='samples', random_state=None, *, copy=True, sigma: float = 0.01, smoothing_kernel_width: int = 1)[source]

Bases: Augmenter

Adds Gaussian noise to the spectra. X_aug = X + noise

Vectorized implementation using batch convolution.
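The relation X_aug = X + noise, together with the role of sigma and smoothing_kernel_width, can be sketched in NumPy. This is not the class's batch-convolution code: add_gaussian_noise_sketch is a hypothetical helper, and smoothing the noise with a moving-average kernel is an assumption about what smoothing_kernel_width means.

```python
import numpy as np


def add_gaussian_noise_sketch(X, rng, sigma=0.01, smoothing_kernel_width=1):
    """Add (optionally smoothed) Gaussian noise to spectra (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    noise = rng.normal(0.0, sigma, size=X.shape)
    if smoothing_kernel_width > 1:
        # Assumption: the noise is smoothed with a moving-average kernel.
        kernel = np.ones(smoothing_kernel_width) / smoothing_kernel_width
        noise = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, noise
        )
    return X + noise


rng = np.random.default_rng(0)
X = np.zeros((50, 200))
noisy = add_gaussian_noise_sketch(X, rng)
smooth_noisy = add_gaussian_noise_sketch(X, rng, smoothing_kernel_width=5)
```

Averaging neighbouring noise values lowers the noise amplitude and correlates it along the wavelength axis.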

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.GaussianSmoothingJitter(apply_on='samples', random_state=None, *, copy=True, sigma_range: Tuple[float, float] = (0.5, 2.0), kernel_width: int = 11)[source]

Bases: Augmenter

Applies Gaussian smoothing with random sigma.

Random parameters are pre-generated in a single batch. Because each sample needs its own kernel, the smoothing itself still runs in a per-sample loop.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.Haar(*, copy: bool = True)[source]

Bases: Wavelet

Shortcut to the Wavelet haar transform.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Haar

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.IASLS(lam: float = 1000000.0, p: float = 0.01, lam_1: float = 0.0001, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Improved Asymmetric Least Squares baseline correction.

An improvement over ASLS that uses a different weighting scheme.

Parameters:
  • lam (float, default=1e6) – Smoothness parameter.

  • p (float, default=0.01) – Asymmetry parameter.

  • lam_1 (float, default=1e-4) – First derivative smoothing parameter.

  • max_iter (int, default=50) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

He, S., et al. (2014). Baseline correction for Raman spectra using an improved asymmetric least squares method. Analytical Methods, 6(12), 4402-4407.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') IASLS

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.IModPoly(poly_order: int = 5, max_iter: int = 250, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Improved Modified Polynomial baseline correction.

A polynomial-based baseline correction that iteratively fits and removes points above the baseline.

Parameters:
  • poly_order (int, default=5) – Polynomial order for fitting.

  • max_iter (int, default=250) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Zhao, J., et al. (2007). Automated autofluorescence background subtraction algorithm for biomedical Raman spectroscopy. Applied Spectroscopy, 61(11), 1225-1232.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') IModPoly

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.IdentityAugmenter(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: Augmenter

An augmenter that returns the input data without any changes.

augment(X, _)[source]

Perform identity augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • _ (str) – Placeholder for unused parameter.

Returns:

Augmented data (same as input data).

Return type:

array-like

nirs4all.operators.transforms.IdentityTransformer

alias of FunctionTransformer

class nirs4all.operators.transforms.IntegerKBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')[source]

Bases: BaseEstimator, TransformerMixin

KBinsDiscretizer that returns integers instead of floats.

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X)[source]
class nirs4all.operators.transforms.KubelkaMunk(source_type: str | SignalType = 'reflectance', epsilon: float = 1e-10)[source]

Bases: TransformerMixin, BaseEstimator

Apply Kubelka-Munk transformation for diffuse reflectance.

The Kubelka-Munk function: F(R) = (1-R)² / (2R)

This is theoretically more appropriate for scattering media (powders) than simple log(1/R), though in NIR the benefit is dataset-dependent.

Parameters:
  • source_type (str or SignalType) – Input signal type. Valid: “reflectance”, “reflectance%”

  • epsilon (float, default=1e-10) – Small value to avoid division by zero

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import KubelkaMunk
>>> transformer = KubelkaMunk(source_type="reflectance")
>>> R = np.array([[0.5, 0.4], [0.6, 0.5]])
>>> F_R = transformer.fit_transform(R)
>>> # F_R[0,0] = (1-0.5)² / (2*0.5) = 0.25 / 1 = 0.25
fit(X, y=None)[source]

Fit the transformer.

inverse_transform(X, y=None)[source]

Inverse Kubelka-Munk to recover reflectance.

From F(R) = (1-R)² / (2R), solving for R: R = 1 + F - sqrt(F² + 2F)
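The forward and inverse formulas can be checked against each other with a few lines of plain Python. This is a minimal sketch that does not call nirs4all: the helper names are hypothetical, and clamping R to epsilon is an assumption about how the division by zero is avoided.

```python
import math

def kubelka_munk(r, epsilon=1e-10):
    """F(R) = (1 - R)^2 / (2R). Clamping R to epsilon is an assumed guard against R = 0."""
    r = max(r, epsilon)
    return (1.0 - r) ** 2 / (2.0 * r)

def inverse_kubelka_munk(f):
    """R = 1 + F - sqrt(F^2 + 2F), the root of the quadratic that lies in (0, 1]."""
    return 1.0 + f - math.sqrt(f * f + 2.0 * f)

reflectance = [0.5, 0.4, 0.6]
f_values = [kubelka_munk(r) for r in reflectance]
recovered = [inverse_kubelka_munk(f) for f in f_values]
```

The minus sign in front of the square root selects the physically meaningful solution, since the other root of the quadratic is greater than 1.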

transform(X, y=None)[source]

Apply Kubelka-Munk transformation.

class nirs4all.operators.transforms.LinearBaselineDrift(apply_on='samples', random_state=None, *, copy=True, offset_range: Tuple[float, float] = (-0.1, 0.1), slope_range: Tuple[float, float] = (-0.001, 0.001), lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Adds a linear baseline drift. X_aug = X + a + b * lambda
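The formula X_aug = X + a + b * lambda can be sketched in plain Python as follows. The function name is hypothetical, and drawing a and b uniformly from offset_range and slope_range once per spectrum is an assumption; the library may sample or scale the wavelength axis differently.

```python
import random

def linear_baseline_drift(spectrum, lambda_axis,
                          offset_range=(-0.1, 0.1),
                          slope_range=(-0.001, 0.001),
                          rng=None):
    """Add a + b * lambda to one spectrum; a and b are drawn uniformly (assumption)."""
    rng = rng or random.Random()
    a = rng.uniform(*offset_range)   # constant offset
    b = rng.uniform(*slope_range)    # slope along the wavelength axis
    drifted = [x + a + b * lam for x, lam in zip(spectrum, lambda_axis)]
    return drifted, a, b

lambda_axis = [1000.0, 1002.0, 1004.0, 1006.0]
spectrum = [0.50, 0.52, 0.55, 0.53]
augmented, a, b = linear_baseline_drift(spectrum, lambda_axis, rng=random.Random(0))
```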

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.LocalClipping(apply_on='samples', random_state=None, *, copy=True, n_regions: int = 1, width_range: Tuple[int, int] = (5, 20))[source]

Bases: Augmenter

Clips values in a local region to simulate saturation.

Optimized with pre-generated random parameters.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.LocalMixupAugmenter(apply_on='samples', random_state=None, *, copy=True, alpha: float = 0.2, k_neighbors: int = 5)[source]

Bases: Augmenter

Mixup with nearest neighbors.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

fit(X, y=None)[source]

Fit to data.

Parameters:
  • X (array-like) – Input data to fit.

  • y (array-like or None) – Target variable (unused).

Returns:

self – Returns the instance itself.

Return type:

object

class nirs4all.operators.transforms.LocalStandardNormalVariate(window=11, pad_mode='reflect', constant_values=0.0, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Local Standard Normal Variate (LSNV).

Per-sample local normalization with a sliding window along features. For each sample and feature j:

mean_w = mean(X[..., j-w//2 : j+w//2+1])

std_w = std(X[..., j-w//2 : j+w//2+1])

X'[j] = (X[j] - mean_w) / std_w

Parameters:
  • window (int, default=11) – Odd positive window size along features.

  • pad_mode ({'reflect','edge','constant'}, default='reflect') – Padding mode at boundaries.

  • constant_values (float, default=0.0) – Used only if pad_mode=’constant’.

  • copy (bool, default=True) – If False, try in-place.

Notes

  • Operates row-wise (axis=1). Input must be (n_samples, n_features).

  • std_w==0 → divide by 1 to avoid NaN.
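The sliding-window normalization described above can be sketched for a single spectrum in plain Python. The helper names are hypothetical, only the 'reflect' padding mode is shown, and the use of the population standard deviation is an assumption.

```python
import statistics

def reflect_pad(row, half):
    """Mirror the row at both ends without repeating the edge value."""
    left = row[1:half + 1][::-1]
    right = row[-half - 1:-1][::-1]
    return left + row + right

def local_snv(row, window=11):
    """Centre and scale each point by the mean and std of its local window."""
    half = window // 2
    padded = reflect_pad(row, half)
    out = []
    for j, value in enumerate(row):
        win = padded[j:j + window]          # window centred on original index j
        mean_w = statistics.fmean(win)
        std_w = statistics.pstdev(win)      # population std (assumption)
        if std_w == 0:
            std_w = 1.0                     # avoid NaN on flat windows
        out.append((value - mean_w) / std_w)
    return out

row = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = local_snv(row, window=3)
```

The row must be longer than window // 2 for the reflection to be well defined.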

fit(X, y=None)[source]
fit_transform(X, y=None)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

transform(X)[source]
class nirs4all.operators.transforms.LocalWavelengthWarp(apply_on='samples', random_state=None, *, copy=True, n_control_points: int = 5, max_shift: float = 1.0, lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Applies a non-linear warp to the wavelength axis.

Optimized implementation with pre-computed control points.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.LogTransform(base: float = 2.718281828459045, offset: float = 0.0, auto_offset: bool = True, min_value: float = 1e-08, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Elementwise logarithm with automatic handling of edge cases.

Parameters:
  • base (float, default=np.e) – Logarithm base.

  • offset (float, default=0.0) – Fixed value added before log to handle non-positives.

  • auto_offset (bool, default=True) – If True, automatically add offset to handle zeros/negatives.

  • min_value (float, default=1e-8) – Minimum value after offset when auto_offset=True.

  • copy (bool, default=True) – Whether to copy input.
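One way the offset parameters could interact is sketched below in plain Python. The function names are hypothetical, and the rule used for auto_offset (shift the data so that its minimum equals min_value) is an assumption, not a statement about the library's behaviour.

```python
import math

def fit_offset(values, offset=0.0, auto_offset=True, min_value=1e-8):
    """Return the total offset so that every shifted value is >= min_value (assumed rule)."""
    lowest = min(values) + offset
    if auto_offset and lowest < min_value:
        offset += min_value - lowest
    return offset

def log_transform(values, total_offset, base=math.e):
    """Elementwise log of the shifted values in the given base."""
    return [math.log(v + total_offset, base) for v in values]

def inverse_log_transform(values, total_offset, base=math.e):
    """Undo the log, then remove the offset."""
    return [base ** v - total_offset for v in values]

raw = [-0.5, 0.0, 2.0]
total_offset = fit_offset(raw)
logged = log_transform(raw, total_offset)
restored = inverse_log_transform(logged, total_offset)
```

Storing the total offset at fit time is what makes the inverse exact up to floating-point error.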

fit(X, y=None)[source]
inverse_transform(X)[source]

Exact inverse of the forward transform.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') LogTransform

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]
class nirs4all.operators.transforms.MCUVE(n_components: int = 10, n_iterations: int = 100, subset_ratio: float = 0.8, n_noise_variables: int | None = None, threshold_method: Literal['percentile', 'fixed', 'auto'] = 'auto', threshold_percentile: float = 99, threshold_value: float = 2.0, random_state: int | None = None)[source]

Bases: TransformerMixin, BaseEstimator

Monte-Carlo Uninformative Variable Elimination (MC-UVE) for wavelength selection.

MC-UVE identifies uninformative variables by comparing the stability of regression coefficients between real variables and random noise variables. Variables with low stability (similar to noise) are eliminated.

The method works by:

  1. Augmenting X with noise variables (same distribution as X)

  2. Performing multiple PLS fits on bootstrap samples

  3. Calculating stability (mean/std) of regression coefficients

  4. Selecting variables with stability significantly higher than noise
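Steps 3 and 4 can be illustrated without running any PLS fit. In the sketch below the coefficient draws are synthetic stand-ins for the coefficients of repeated PLS fits, the helper names are hypothetical, and both the absolute value in the stability score and the nearest-rank percentile are assumptions.

```python
import math
import random
import statistics

def stability(coef_draws):
    """|mean| / std of each variable's coefficient across Monte-Carlo draws."""
    scores = []
    for j in range(len(coef_draws[0])):
        column = [draw[j] for draw in coef_draws]
        std = statistics.pstdev(column)
        scores.append(abs(statistics.fmean(column)) / std if std > 0 else 0.0)
    return scores

def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for q in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Synthetic coefficients: two informative variables, one uninformative,
# followed by 20 noise variables whose true coefficient is zero.
rng = random.Random(0)
true_means = [1.0, -0.8, 0.0]
n_noise = 20
draws = [
    [rng.gauss(mu, 0.1) for mu in true_means]
    + [rng.gauss(0.0, 0.1) for _ in range(n_noise)]
    for _ in range(200)
]

scores = stability(draws)
real_stability = scores[:len(true_means)]
noise_stability = scores[len(true_means):]
threshold = nearest_rank_percentile(noise_stability, 99)
selected = [j for j, s in enumerate(real_stability) if s > threshold]
```

Variables whose coefficients keep the same sign and magnitude across draws score far above the noise threshold, while a variable with a true coefficient of zero scores in the same range as the noise variables.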

Parameters:
  • n_components (int, default=10) – Number of PLS components for the internal PLS model.

  • n_iterations (int, default=100) – Number of Monte-Carlo iterations (bootstrap samples).

  • subset_ratio (float, default=0.8) – Ratio of samples to use in each bootstrap iteration.

  • n_noise_variables (int or None, default=None) – Number of noise variables to add. If None, uses n_features.

  • threshold_method ({'percentile', 'fixed', 'auto'}, default='auto') –

    Method to determine selection threshold:

    • ’percentile’: Use percentile of noise stability as threshold

    • ’fixed’: Use fixed stability threshold

    • ’auto’: Automatically select based on noise distribution

  • threshold_percentile (float, default=99) – Percentile of noise stability used as threshold (for ‘percentile’ method).

  • threshold_value (float, default=2.0) – Fixed stability threshold value (for ‘fixed’ method).

  • random_state (int or None, default=None) – Random seed for reproducibility.

selected_indices_

Indices of selected features/wavelengths.

Type:

ndarray of shape (n_selected,)

selection_mask_

Boolean mask indicating selected features.

Type:

ndarray of shape (n_features,)

n_features_in_

Number of features in input data.

Type:

int

n_features_out_

Number of selected features.

Type:

int

stability_

Stability values for each real variable.

Type:

ndarray of shape (n_features,)

noise_stability_

Stability values for noise variables.

Type:

ndarray of shape (n_noise_variables,)

threshold_

Threshold value used for selection.

Type:

float

mean_coefs_

Mean regression coefficients across iterations.

Type:

ndarray of shape (n_features,)

std_coefs_

Standard deviation of coefficients across iterations.

Type:

ndarray of shape (n_features,)

Examples

>>> from nirs4all.operators.transforms import MCUVE
>>> import numpy as np
>>>
>>> # Spectral data with 200 wavelengths
>>> X = np.random.randn(100, 200)
>>> y = np.random.randn(100)
>>>
>>> # Select informative wavelengths
>>> mcuve = MCUVE(n_components=10, n_iterations=100)
>>> mcuve.fit(X, y)
>>> X_selected = mcuve.transform(X)
>>> print(f"Selected {X_selected.shape[1]} from {X.shape[1]} wavelengths")

References

Cai, W., Li, Y., & Shao, X. (2008). A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometrics and Intelligent Laboratory Systems, 90(2), 188-194.

Notes

  • MC-UVE is robust against random noise

  • Higher stability indicates more informative variables

  • The noise comparison ensures a principled selection threshold

__repr__()[source]

String representation of the selector.

fit(X, y=None, wavelengths: ndarray | None = None)[source]

Fit the MC-UVE selector to identify important wavelengths.

Parameters:
Returns:

self – Fitted selector.

Return type:

MCUVE

get_feature_names_out(input_features=None)[source]

Get output feature names (selected wavelengths as strings).

Parameters:

input_features (array-like of str or None, default=None) – Input feature names. If None, uses indices.

Returns:

feature_names_out – Selected feature names.

Return type:

ndarray of str

get_support(indices: bool = False)[source]

Get a mask or indices of selected features.

Parameters:

indices (bool, default=False) – If True, return indices instead of boolean mask.

Returns:

support – Boolean mask or indices of selected features.

Return type:

ndarray

set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') MCUVE

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in fit.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Transform data by selecting only the important wavelengths.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X_selected – Data with only selected wavelengths.

Return type:

ndarray of shape (n_samples, n_selected)

class nirs4all.operators.transforms.MixupAugmenter(apply_on='samples', random_state=None, *, copy=True, alpha: float = 0.2)[source]

Bases: Augmenter

Mixup augmentation. Note: This modifies both X and y. Standard transform() only returns X.
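Why mixup touches both X and y can be seen in a short plain-Python sketch. The function name is hypothetical, and pairing each sample with a randomly permuted partner and drawing one Beta(alpha, alpha) weight per sample are assumptions about the scheme.

```python
import random

def mixup(X, y, alpha=0.2, rng=None):
    """Blend each sample with a random partner using a Beta(alpha, alpha) weight."""
    rng = rng or random.Random()
    partners = list(range(len(X)))
    rng.shuffle(partners)
    X_mix, y_mix = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)   # mixing weight in [0, 1]
        X_mix.append([lam * a + (1.0 - lam) * b for a, b in zip(X[i], X[j])])
        y_mix.append(lam * y[i] + (1.0 - lam) * y[j])   # targets are blended too
    return X_mix, y_mix

X = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.6]]
y = [1.0, 2.0, 3.0]
X_mix, y_mix = mixup(X, y, rng=random.Random(0))
```

A transformer whose transform() returns only X cannot hand the blended targets back to the caller, which is the limitation the note above refers to.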

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.ModPoly(poly_order: int = 5, max_iter: int = 250, tol: float = 0.001, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Modified Polynomial baseline correction.

Parameters:
  • poly_order (int, default=5) – Polynomial order for fitting.

  • max_iter (int, default=250) – Maximum number of iterations.

  • tol (float, default=1e-3) – Convergence tolerance.

  • copy (bool, default=True) – Whether to copy input data.

References

Lieber, C.A. and Mahadevan-Jansen, A. (2003). Automated method for subtraction of fluorescence from biological Raman spectra. Applied Spectroscopy, 57(11), 1363-1367.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ModPoly

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.MultiplicativeNoise(apply_on='samples', random_state=None, *, copy=True, sigma_gain: float = 0.05, per_wavelength: bool = False)[source]

Bases: Augmenter

Multiplies spectra by a random gain factor. X_aug = (1 + epsilon) * X
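A plain-Python sketch of X_aug = (1 + epsilon) * X is given below. The function name is hypothetical, and drawing epsilon from a normal distribution with standard deviation sigma_gain is an assumption; per_wavelength switches between one gain per spectrum and one gain per point.

```python
import random

def multiplicative_noise(X, sigma_gain=0.05, per_wavelength=False, rng=None):
    """Scale each spectrum by (1 + epsilon), epsilon ~ N(0, sigma_gain) (assumption)."""
    rng = rng or random.Random()
    out = []
    for row in X:
        if per_wavelength:
            gains = [1.0 + rng.gauss(0.0, sigma_gain) for _ in row]
        else:
            gains = [1.0 + rng.gauss(0.0, sigma_gain)] * len(row)
        out.append([g * x for g, x in zip(gains, row)])
    return out

X = [[0.5, 0.6, 0.7], [0.2, 0.3, 0.4]]
X_aug = multiplicative_noise(X, rng=random.Random(0))
```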

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.MultiplicativeScatterCorrection(scale=True, *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

fit(X, y=None)[source]
inverse_transform(X)[source]
partial_fit(X, y=None)[source]
transform(X)[source]
class nirs4all.operators.transforms.Normalize(feature_range=(-1, 1), *, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Normalize spectrum using either a custom range or linalg normalization.

Parameters:
  • feature_range (tuple (min, max), default=(-1, 1)) – Desired range of transformed data. If the range min and max both equal -1, linalg normalization is applied; otherwise the data are scaled to the user-defined range.

  • copy (bool, default=True) – Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).

fit(X, y=None)[source]

Fit the Normalize transformer on the training data.

Parameters:
Returns:

self – Returns the instance itself.

Return type:

object

inverse_transform(X)[source]

Transform the normalized data back to the original representation.

Parameters:

X (array-like of shape (n_samples, n_features)) – The normalized data to be transformed back.

Returns:

X – The inverse transformed data.

Return type:

ndarray of shape (n_samples, n_features)

partial_fit(X, y=None)[source]

Perform incremental fit on the training data.

Parameters:
Returns:

self – Returns the instance itself.

Return type:

object

transform(X)[source]

Transform the input data.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input data to be transformed.

Returns:

X – The transformed data.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.transforms.PercentToFraction[source]

Bases: TransformerMixin, BaseEstimator

Convert percentage values to fractional [0, 1] range.

Simply divides by 100.

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms import PercentToFraction
>>> transformer = PercentToFraction()
>>> X_pct = np.array([[50, 60], [70, 80]])
>>> X_frac = transformer.fit_transform(X_pct)
>>> # X_frac = [[0.5, 0.6], [0.7, 0.8]]
fit(X, y=None)[source]

Fit the transformer.

inverse_transform(X, y=None)[source]

Transform fraction to percent.

transform(X, y=None)[source]

Transform percent to fraction.

class nirs4all.operators.transforms.PolynomialBaselineDrift(apply_on='samples', random_state=None, *, copy=True, degree: int = 3, coeff_ranges: List[Tuple[float, float]] | None = None, lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Adds a polynomial baseline drift.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.PyBaselineCorrection(method: str = 'asls', *, copy: bool = True, **method_params)[source]

Bases: TransformerMixin, BaseEstimator

General baseline correction using pybaselines library.

A flexible wrapper for the pybaselines library that provides access to numerous baseline correction algorithms. This transformer allows easy integration of any pybaselines method into sklearn pipelines.

Parameters:
  • method (str, default='asls') –

    The baseline correction method to use. Available methods by category:

    Whittaker-based (smooth baselines with asymmetric weighting):
    • ’asls’: Asymmetric Least Squares

    • ’iasls’: Improved Asymmetric Least Squares

    • ’airpls’: Adaptive Iteratively Reweighted Penalized Least Squares

    • ’arpls’: Asymmetrically Reweighted Penalized Least Squares

    • ’drpls’: Doubly Reweighted Penalized Least Squares

    • ’iarpls’: Improved ARPLS

    • ’aspls’: Adaptive Smoothness Penalized Least Squares

    • ’psalsa’: Peaked Signal’s Asymmetric Least Squares

    • ’derpsalsa’: Derivative PSALSA

    Polynomial (polynomial fitting):
    • ’poly’: Regular polynomial

    • ’modpoly’: Modified polynomial

    • ’imodpoly’: Improved modified polynomial

    • ’penalized_poly’: Penalized polynomial

    • ’loess’: Locally estimated scatterplot smoothing

    • ’quant_reg’: Quantile regression

    Morphological (morphological operations):
    • ’mor’: Morphological

    • ’imor’: Improved morphological

    • ’mormol’: Morphological and mollified

    • ’amormol’: Averaging morphological and mollified

    • ’rolling_ball’: Rolling ball algorithm

    • ’mwmv’: Moving window minimum value

    • ’tophat’: Top-hat transform

    • ’mpspline’: Morphological penalized spline

    • ’jbcd’: Joint baseline correction and denoising

    Spline (spline-based methods):
    • ’mixture_model’: Mixture model

    • ’irsqr’: Iteratively reweighted spline quantile regression

    • ’corner_cutting’: Corner-cutting

    • ’pspline_asls’, ‘pspline_iasls’, ‘pspline_airpls’, etc.

    Smooth (smoothing-based):
    • ’noise_median’: Noise median

    • ’snip’: Statistics-sensitive Non-linear Iterative Peak-clipping

    • ’swima’: Small-Window Moving Average

    • ’ipsa’: Iterative Polynomial Smoothing Algorithm

    Misc:
    • ’beads’: Baseline estimation and denoising with sparsity

    • ’interp_pts’: Interpolation between points

  • copy (bool, default=True) – Whether to copy input data.

  • **method_params (dict) –

    Additional parameters passed to the specific baseline method. Common parameters include:

    • lam (float): Smoothness parameter for Whittaker methods

    • p (float): Asymmetry parameter for ASLS-type methods

    • poly_order (int): Polynomial order for polynomial methods

    • max_half_window (int): Window size for morphological/smooth methods

    • max_iter (int): Maximum iterations

    • tol (float): Convergence tolerance

n_features_in_

Number of features seen during fit.

Type:

int

Examples

>>> from nirs4all.operators.transforms.nirs import PyBaselineCorrection
>>> import numpy as np

Basic usage with ASLS:

>>> transformer = PyBaselineCorrection(method='asls', lam=1e6, p=0.01)
>>> corrected = transformer.fit_transform(spectra)

Using airPLS:

>>> transformer = PyBaselineCorrection(method='airpls', lam=1e5)
>>> corrected = transformer.fit_transform(spectra)

Using improved modified polynomial:

>>> transformer = PyBaselineCorrection(method='imodpoly', poly_order=3)
>>> corrected = transformer.fit_transform(spectra)

Using SNIP for Raman-like data:

>>> transformer = PyBaselineCorrection(method='snip', max_half_window=40)
>>> corrected = transformer.fit_transform(spectra)

Using rolling ball:

>>> transformer = PyBaselineCorrection(method='rolling_ball', half_window=50)
>>> corrected = transformer.fit_transform(spectra)

In a pipeline:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> pipeline = Pipeline([
...     ('baseline', PyBaselineCorrection(method='airpls', lam=1e5)),
...     ('scale', StandardScaler()),
... ])

References

pybaselines documentation: https://pybaselines.readthedocs.io/

fit(X, y=None)[source]

Fit the transformer (validates method and stores number of features).

Parameters:
Returns:

self – Fitted transformer.

Return type:

PyBaselineCorrection

get_params(deep=True)[source]

Get parameters for this estimator.

static list_methods()[source]

List all available baseline correction methods.

Returns:

Dictionary with method categories as keys and list of methods as values.

Return type:

dict

set_params(**params)[source]

Set parameters for this estimator.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') PyBaselineCorrection

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Apply baseline correction to the data.

Parameters:
Returns:

X_corrected – Baseline-corrected spectra.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.transforms.Random_X_Operation(apply_on='global', random_state=None, *, copy=True, operator_func=<built-in function mul>, operator_range=(0.97, 1.03))[source]

Bases: Augmenter

Augmenter that applies a random operation to the data.

Parameters:
  • apply_on (str, optional) – Level at which the augmentation is applied, such as “features” or “samples”. Default is “global”.

  • random_state (int or None, optional) – Random seed for reproducibility. Default is None.

  • copy (bool, optional) – If True, creates a copy of the input data. Default is True.

  • operator_func (function, optional) – Operator function to be applied. Default is operator.mul.

  • operator_range (tuple, optional) – Range for generating random values for the operator. Default is (0.97, 1.03).
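How operator_func and operator_range could combine is sketched below in plain Python. The function name is hypothetical, one uniform draw per element is an assumption, and the handling of apply_on is left out.

```python
import operator
import random

def random_x_operation(X, operator_func=operator.mul,
                       operator_range=(0.97, 1.03), rng=None):
    """Apply operator_func(x, r) to every element, r ~ U(operator_range) (assumption)."""
    rng = rng or random.Random()
    low, high = operator_range
    return [[operator_func(x, rng.uniform(low, high)) for x in row] for row in X]

X = [[0.5, 0.6, 0.7], [0.2, 0.3, 0.4]]
X_scaled = random_x_operation(X, rng=random.Random(0))
X_shifted = random_x_operation(X, operator.add, (-0.01, 0.01), rng=random.Random(0))
```

With the default operator.mul and a range close to 1, each value is perturbed by at most a few percent; swapping in operator.add turns the same mechanism into a small additive jitter.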

augment(X, apply_on='global')[source]

Augment the data by applying random operation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Level at which the augmentation is applied, such as “features” or “samples”. Default is “global”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.RangeDiscretizer(bins)[source]

Bases: BaseEstimator, TransformerMixin

__sklearn_clone__()[source]

Custom cloning method for sklearn compatibility.

fit(X, y=None)[source]
get_params(deep=True)[source]

Get parameters for this estimator.

inverse_transform(X)[source]
set_params(**params)[source]

Set the parameters of this estimator.

transform(X)[source]
class nirs4all.operators.transforms.ReflectanceToAbsorbance(min_value: float = 1e-08, percent: bool = False, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Convert reflectance spectra to absorbance using the Beer-Lambert law.

Applies the transformation: A = -log10(R) = log10(1/R) where R is reflectance and A is absorbance.

This is a fundamental transformation in NIR spectroscopy, as absorbance is linearly related to concentration (Beer-Lambert law), while reflectance is not.

Parameters:
  • min_value (float, default=1e-8) – Minimum value to clamp reflectance to avoid log(0). Values below this threshold will be set to min_value before applying the log transform.

  • percent (bool, default=False) – If True, assumes input reflectance is in percentage (0-100) and divides by 100 before conversion.

  • copy (bool, default=True) – Whether to copy input data.

Notes

  • Input reflectance values should be positive.

  • For reflectance in range (0, 1], output absorbance is non-negative.

  • For reflectance > 1 (e.g., percentage values), set percent=True.
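The conversion and its inverse can be sketched in plain Python. The function names are hypothetical, and rescaling back to percent in the inverse is an assumption; the clamp to min_value and the division by 100 follow the parameter descriptions above.

```python
import math

def reflectance_to_absorbance(R, min_value=1e-8, percent=False):
    """A = -log10(R), with R clamped to min_value to avoid log(0)."""
    out = []
    for row in R:
        scaled = [r / 100.0 for r in row] if percent else list(row)
        out.append([-math.log10(max(r, min_value)) for r in scaled])
    return out

def absorbance_to_reflectance(A, percent=False):
    """R = 10^(-A); multiplying by 100 for percent input is an assumption."""
    scale = 100.0 if percent else 1.0
    return [[scale * 10.0 ** (-a) for a in row] for row in A]

R = [[0.5, 0.25, 0.1], [0.8, 0.4, 0.2]]
A = reflectance_to_absorbance(R)
R_back = absorbance_to_reflectance(A)
```

Values at or below zero are clamped before the logarithm, so they cannot be recovered exactly by the inverse.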

Examples

>>> from nirs4all.operators.transforms.nirs import ReflectanceToAbsorbance
>>> import numpy as np
>>> R = np.array([[0.5, 0.25, 0.1], [0.8, 0.4, 0.2]])
>>> transformer = ReflectanceToAbsorbance()
>>> A = transformer.fit_transform(R)
>>> # A ≈ [[0.301, 0.602, 1.0], [0.097, 0.398, 0.699]]
fit(X, y=None)[source]

Fit the transformer (no-op, included for API compatibility).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Reflectance spectra.

  • y (None) – Ignored.

Returns:

self – Fitted transformer.

Return type:

ReflectanceToAbsorbance

inverse_transform(X)[source]

Convert absorbance back to reflectance.

Parameters:

X (array-like of shape (n_samples, n_features)) – Absorbance spectra.

Returns:

X_reflectance – Reflectance spectra.

Return type:

ndarray of shape (n_samples, n_features)

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ReflectanceToAbsorbance

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Convert reflectance to absorbance.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Reflectance spectra.

  • copy (bool or None, optional) – Whether to copy the input data.

Returns:

X_transformed – Absorbance spectra.

Return type:

ndarray of shape (n_samples, n_features)
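
As a rough illustration of the conversion described above, the following NumPy sketch applies A = -log10(R) with clamping and then inverts it. The helper names, the order of the percent division and the clamp, and the inverse R = 10**(-A) are assumptions made for this example, not the library's implementation.

```python
import numpy as np

def reflectance_to_absorbance(R, min_value=1e-8, percent=False):
    """Sketch of A = -log10(R) with clamping (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    if percent:
        R = R / 100.0  # percentage (0-100) -> fraction (0-1)
    R = np.maximum(R, min_value)  # clamp so that log10 never sees 0
    return -np.log10(R)

def absorbance_to_reflectance(A, percent=False):
    """Assumed inverse: R = 10**(-A), rescaled to percent if requested."""
    R = 10.0 ** (-np.asarray(A, dtype=float))
    return R * 100.0 if percent else R

R = np.array([[0.5, 0.25, 0.1], [0.8, 0.4, 0.2]])
A = reflectance_to_absorbance(R)
R_back = absorbance_to_reflectance(A)
```

With the default min_value, a reflectance of exactly 0 maps to an absorbance of 8 in this sketch instead of infinity.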

class nirs4all.operators.transforms.ResampleTransformer(num_samples: int)[source]

Bases: BaseEstimator, TransformerMixin

fit(X, y=None)[source]

transform(X)[source]

class nirs4all.operators.transforms.Resampler(target_wavelengths: ndarray, method: Literal['linear', 'nearest', 'cubic', 'quadratic', 'slinear', 'zero'] = 'linear', crop_range: Tuple[float, float] | None = None, fill_value: float | str = 0.0, bounds_error: bool = False, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Resample spectral data to new wavelength grid using interpolation.

This transformer interpolates NIRS spectral data from the original wavelength grid to a target wavelength grid using scipy interpolation methods.

Parameters:
  • target_wavelengths (array-like) – Target wavelengths for resampling. Must be 1D array.

  • method (str, default='linear') – Interpolation method. Supported methods:

      - ‘linear’: Linear interpolation
      - ‘nearest’: Nearest neighbor interpolation
      - ‘cubic’: Cubic spline interpolation
      - ‘quadratic’: Quadratic spline interpolation
      - ‘slinear’: Linear spline (order 1)
      - ‘zero’: Zero-order spline (piecewise constant)

    Additional scipy methods may be supported in the future.

  • crop_range (tuple of (float, float) or None, default=None) – Optional (min_wavelength, max_wavelength) to crop original data before resampling.

  • fill_value (float or 'extrapolate', default=0.0) – Value to use for target wavelengths outside the original range.

      - float: Use this constant value for out-of-range wavelengths
      - ‘extrapolate’: Extrapolate using the interpolation method
      - 0.0: Default padding with zeros (safe choice)

  • bounds_error (bool, default=False) – If True, raise error when target wavelengths are outside original range. If False, use fill_value for out-of-bounds points.

  • copy (bool, default=True) – Whether to copy input data or modify in place.

original_wavelengths_

Original wavelength grid from fit data

Type:

ndarray of shape (n_features,)

n_features_in_

Number of features (wavelengths) in input data

Type:

int

n_features_out_

Number of features (wavelengths) in output data

Type:

int

interpolator_params_

Stored interpolation parameters for reconstruction

Type:

dict

Examples

>>> from nirs4all.operators.transforms import Resampler
>>> import numpy as np
>>>
>>> # Original data at 1000-2500 nm with 200 points
>>> X = np.random.randn(100, 200)
>>> original_wl = np.linspace(1000, 2500, 200)
>>>
>>> # Resample to 100 evenly-spaced wavelengths
>>> target_wl = np.linspace(1000, 2500, 100)
>>> resampler = Resampler(target_wavelengths=target_wl, method='cubic')
>>> resampler.fit(X, wavelengths=original_wl)
>>> X_resampled = resampler.transform(X)
>>> X_resampled.shape
(100, 100)

Notes

  • Wavelengths must be strictly increasing

  • Warns if target wavelengths extend beyond original range

  • Raises error if no wavelengths overlap between original and target
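
To make the resampling idea concrete, here is a minimal sketch of the 'linear' case with a constant fill_value. It uses np.interp as a stand-in for the scipy interpolation the class is documented to use, and resample_linear is a hypothetical helper, not part of nirs4all.

```python
import numpy as np

def resample_linear(X, original_wl, target_wl, fill_value=0.0):
    """Linearly interpolate each spectrum onto target_wl (illustrative only)."""
    X = np.asarray(X, dtype=float)
    original_wl = np.asarray(original_wl, dtype=float)
    target_wl = np.asarray(target_wl, dtype=float)
    if np.any(np.diff(original_wl) <= 0):
        raise ValueError("original wavelengths must be strictly increasing")
    # np.interp handles one 1D spectrum at a time; left/right set the value
    # used for target wavelengths outside the original range.
    return np.vstack([
        np.interp(target_wl, original_wl, row, left=fill_value, right=fill_value)
        for row in X
    ])

original_wl = np.linspace(1000, 2500, 200)
X = np.random.default_rng(0).normal(size=(100, 200))
target_wl = np.linspace(1000, 2500, 100)
X_resampled = resample_linear(X, original_wl, target_wl)
```

Target wavelengths below 1000 nm or above 2500 nm would receive fill_value here, which mirrors the documented behaviour for bounds_error=False.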

__repr__()[source]

String representation of the resampler.

fit(X, y=None, wavelengths: ndarray | None = None)[source]

Fit the resampler by storing original wavelength grid.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (None) – Ignored. Present for API consistency.

  • wavelengths (array-like of shape (n_features,), optional) – Original wavelength grid. If None, will be extracted from dataset headers by the controller.

Returns:

self – Fitted resampler.

Return type:

Resampler

get_feature_names_out(input_features=None)[source]

Get output feature names (target wavelengths as strings).

Parameters:

input_features (array-like of str or None, default=None) – Ignored. Present for API consistency.

Returns:

feature_names_out – Target wavelengths as strings.

Return type:

ndarray of str

set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') Resampler

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

wavelengths (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for wavelengths parameter in fit.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Resample spectral data to target wavelength grid.

Parameters:

X (array-like of shape (n_samples, n_features)) – Spectral data to resample. Should have same number of features as training data.

Returns:

X_resampled – Resampled spectral data.

Return type:

ndarray of shape (n_samples, n_features_out_)

class nirs4all.operators.transforms.RobustStandardNormalVariate(axis=1, with_center=True, with_scale=True, k=1.4826, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Robust Standard Normal Variate (RSNV).

Per-sample robust centering and scaling using median and MAD:

    med = median(X, axis=1, keepdims=True)
    mad = median(|X - med|, axis=1, keepdims=True)
    X' = (X - med) / (k * mad)

Parameters:
  • axis (int, default=1) – 1 for row-wise (spectroscopy default). 0 for column-wise.

  • with_center (bool, default=True) – If True, subtract median.

  • with_scale (bool, default=True) – If True, divide by k * MAD.

  • k (float, default=1.4826) – Consistency constant to make MAD a robust estimator of std for Gaussian data.

  • copy (bool, default=True) – If False, try in-place.

Notes

  • If the MAD is 0, the data is divided by 1 instead, which avoids NaN values.
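
The formulas above translate almost directly into NumPy. The sketch below covers the row-wise case (axis=1) with centering and scaling both enabled; robust_snv is a hypothetical helper, and replacing a zero scale by 1 is one reading of the note on MAD==0.

```python
import numpy as np

def robust_snv(X, k=1.4826):
    """Row-wise median/MAD centering and scaling (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=1, keepdims=True)
    mad = np.median(np.abs(X - med), axis=1, keepdims=True)
    scale = k * mad
    scale[scale == 0] = 1.0  # constant rows: divide by 1 instead of 0
    return (X - med) / scale

# The first row contains an outlier; the second row is constant (MAD == 0).
X = np.array([[1.0, 2.0, 3.0, 4.0, 100.0], [5.0, 5.0, 5.0, 5.0, 5.0]])
X_rsnv = robust_snv(X)
```

Because the median and MAD of the first row are 3 and 1, the outlier at 100 does not inflate the scale the way it would with a mean and standard deviation.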

fit(X, y=None)[source]

fit_transform(X, y=None)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

transform(X)[source]

class nirs4all.operators.transforms.RollingBall(half_window: int = 50, smooth_half_window: int = None, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Rolling Ball baseline correction.

A morphological approach that simulates rolling a ball beneath the spectrum.

Parameters:
  • half_window (int, default=50) – Half-window size for the rolling ball.

  • smooth_half_window (int or None, default=None) – Half-window for smoothing. None means no smoothing.

  • copy (bool, default=True) – Whether to copy input data.

References

Kneen, M.A. and Annegarn, H.J. (1996). Algorithm for fitting XRF, SEM and PIXE X-ray spectra backgrounds. Nuclear Instruments and Methods in Physics Research B, 109, 209-213.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') RollingBall

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.Rotate_Translate(apply_on='samples', random_state=None, *, copy=True, p_range=2, y_factor=3)[source]

Bases: Augmenter

Data augmentation that rotates and translates the signal.

Vectorized implementation that processes all samples in batch.

Parameters:
  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

  • random_state (int or None, optional) – Random seed for reproducibility. Default is None.

  • copy (bool, optional) – If True, creates a copy of the input data. Default is True.

  • p_range (int, optional) – Range for generating random slope values. Default is 2.

  • y_factor (int, optional) – Scaling factor for the initial value. Default is 3.

augment(X, apply_on='samples')[source]

Augment the data by rotating and translating the signal.

Vectorized implementation using NumPy broadcasting.

Parameters:
  • X (ndarray) – Input data to be augmented, shape (n_samples, n_features).

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.SNIP(max_half_window: int = 40, decreasing: bool = True, smooth_half_window: int = None, *, copy: bool = True)[source]

Bases: _BaselineMethodAlias

Statistics-sensitive Non-linear Iterative Peak-clipping baseline correction.

Particularly effective for spectra with many peaks (e.g., Raman, XRF).

Parameters:
  • max_half_window (int, default=40) – Maximum half-window size for the algorithm.

  • decreasing (bool, default=True) – Whether to use decreasing window sizes.

  • smooth_half_window (int or None, default=None) – Half-window for smoothing. None means no smoothing.

  • copy (bool, default=True) – Whether to copy input data.

References

Ryan, C.G., et al. (1988). SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nuclear Instruments and Methods in Physics Research B, 34(3), 396-402.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SNIP

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.transforms.SavitzkyGolay(window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

A class for smoothing and differentiating data using the Savitzky-Golay filter.

Parameters:

  • window_length (int, default=11) – The length of the window used for smoothing.

  • polyorder (int, default=3) – The order of the polynomial used for fitting the samples within the window.

  • deriv (int, default=0) – The order of the derivative to compute.

  • delta (float, default=1.0) – The sampling distance of the data.

  • copy (bool, default=True) – Whether to copy the input data.

Methods:

fit(X, y=None)

Fits the transformer to the data X.

transform(X, copy=None)

Applies the Savitzky-Golay filter to the data X.

fit(X, y=None)[source]

Verify the X data compliance with Savitzky-Golay filter.

Parameters:
  • X (array-like) – The data to transform.

  • y (None) – Ignored.

Raises:

ValueError – If the input X is a sparse matrix.

Returns:

The fitted object.

Return type:

SavitzkyGolay

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SavitzkyGolay

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Apply the Savitzky-Golay filter to the data X.

Parameters:
  • X (array-like) – The data to transform.

  • copy (bool or None, optional) – Whether to copy the input data.

Returns:

The transformed data.

Return type:

numpy.ndarray
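
For readers unfamiliar with how window_length, polyorder, deriv and delta interact, the following NumPy-only sketch derives Savitzky-Golay weights from a least-squares polynomial fit. It is a textbook construction under stated assumptions, not the code used by SavitzkyGolay; both helper names are hypothetical and edge handling is left out.

```python
import numpy as np
from math import factorial

def savgol_coefficients(window_length=11, polyorder=3, deriv=0, delta=1.0):
    """Least-squares filter weights for the centre of an odd-length window."""
    half = window_length // 2
    positions = np.arange(-half, half + 1, dtype=float)
    # Columns are positions**0, positions**1, ..., positions**polyorder.
    A = np.vander(positions, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps window values to the fitted
    # polynomial coefficient c_deriv; the derivative at 0 is deriv! * c_deriv.
    return np.linalg.pinv(A)[deriv] * factorial(deriv) / delta**deriv

def savgol_valid(x, **kwargs):
    """Apply the weights only where the full window fits (no edge handling)."""
    h = savgol_coefficients(**kwargs)
    # np.convolve flips its kernel, so reverse h to get a sliding dot product.
    return np.convolve(x, h[::-1], mode="valid")

smooth_weights = savgol_coefficients(window_length=5, polyorder=2)
slope = savgol_valid(2.0 * np.arange(20.0), window_length=5, polyorder=2, deriv=1)
```

For a 5-point quadratic smoother this reproduces the classic weights (-3, 12, 17, 12, -3) / 35, and the first derivative of the ramp 2x evaluates to 2 at every interior point.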

class nirs4all.operators.transforms.ScatterSimulationMSC(apply_on='samples', random_state=None, *, copy=True, reference_mode: str = 'self', a_range: Tuple[float, float] = (-0.1, 0.1), b_range: Tuple[float, float] = (0.9, 1.1))[source]

Bases: Augmenter

Simulates scatter variation: x_aug = a + b * x
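
The relation x_aug = a + b * x can be sketched as follows. Drawing one offset a and one gain b per sample, uniformly from a_range and b_range, is an assumption made for illustration; simulate_scatter is a hypothetical helper and the reference_mode option is not modelled.

```python
import numpy as np

def simulate_scatter(X, a_range=(-0.1, 0.1), b_range=(0.9, 1.1), random_state=None):
    """Apply a random additive offset and multiplicative gain to each row."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    a = rng.uniform(*a_range, size=(n_samples, 1))  # one offset per sample
    b = rng.uniform(*b_range, size=(n_samples, 1))  # one gain per sample
    return a + b * X, a, b

# Flat unit spectra make the effect of a and b easy to read off.
X_aug, a, b = simulate_scatter(np.ones((4, 10)), random_state=0)
```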

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

fit(X, y=None)[source]

Fit to data.

Parameters:
  • X (array-like) – Input data to fit.

  • y (array-like or None) – Target variable (unused).

Returns:

self – Returns the instance itself.

Return type:

object

class nirs4all.operators.transforms.SecondDerivative(delta: float = 1.0, edge_order: int = 2, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Second numerical derivative using numpy.gradient.

Parameters:
  • delta (float, default=1.0) – Sampling step along the feature axis.

  • edge_order (int, default=2) – 1 or 2, order of accuracy at the boundaries.

  • copy (bool, default=True) – Whether to copy input.
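
One plausible way to obtain a second derivative from numpy.gradient is to apply it twice along the feature axis, as sketched below. Whether the class does exactly this is an assumption; second_derivative is a hypothetical helper.

```python
import numpy as np

def second_derivative(X, delta=1.0, edge_order=2):
    """Apply np.gradient twice along the feature axis (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    first = np.gradient(X, delta, axis=1, edge_order=edge_order)
    return np.gradient(first, delta, axis=1, edge_order=edge_order)

x = np.arange(0.0, 5.0, 0.5)      # feature axis sampled every 0.5 units
X = np.vstack([x**2, 3.0 * x])    # a parabola and a straight line
d2 = second_derivative(X, delta=0.5)
```

With edge_order=2 the boundary estimates are exact for quadratics, so the parabola yields 2 everywhere and the straight line yields 0.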

fit(X, y=None)[source]

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SecondDerivative

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

class nirs4all.operators.transforms.SignalTypeConverter(source_type: str | SignalType = 'reflectance', target_type: str | SignalType = 'absorbance', epsilon: float = 1e-10)[source]

Bases: TransformerMixin, BaseEstimator

General-purpose signal type converter.

Automatically determines the conversion path between source and target signal types and applies the appropriate transformation.

Parameters:
  • source_type (str or SignalType) – Input signal type

  • target_type (str or SignalType) – Output signal type

  • epsilon (float, default=1e-10) – Small value to avoid numerical issues

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import SignalTypeConverter
>>> converter = SignalTypeConverter(
...     source_type="reflectance%",
...     target_type="absorbance"
... )
>>> R_pct = np.array([[50, 40], [60, 50]])
>>> A = converter.fit_transform(R_pct)

fit(X, y=None)[source]

Fit the converter by determining the conversion path.

inverse_transform(X, y=None)[source]

Apply inverse transformation.

transform(X, y=None)[source]

Apply the conversion transformation.

class nirs4all.operators.transforms.SimpleScale(copy=True)[source]

Bases: TransformerMixin, BaseEstimator

fit(X, y=None)[source]

inverse_transform(X)[source]

partial_fit(X, y=None)[source]

transform(X)[source]

class nirs4all.operators.transforms.SmoothMagnitudeWarp(apply_on='samples', random_state=None, *, copy=True, n_control_points: int = 5, gain_range: Tuple[float, float] = (0.9, 1.1), lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Multiplies the spectrum by a smooth curve.

Optimized implementation with pre-computed control points.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.SpikeNoise(apply_on='samples', random_state=None, *, copy=True, n_spikes_range: Tuple[int, int] = (1, 3), amplitude_range: Tuple[float, float] = (-0.5, 0.5))[source]

Bases: Augmenter

Adds spikes to the spectrum.

Optimized with pre-generated random parameters.

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.Spline_Curve_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]

Bases: Augmenter

Class to simplify a 1D signal using B-spline interpolation along the curve.

Optimized implementation with pre-allocated output arrays.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points for simplification. Default is None, which uses the sample length / 4.

  • uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.

augment(X, apply_on='samples')[source]

Select regularly spaced points on the x-axis and adjust a spline.

Optimized with pre-allocated output array.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “features” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.Spline_Smoothing(apply_on='samples', random_state=None, *, copy=True)[source]

Bases: Augmenter

Class to apply a smoothing spline to a 1D signal.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

augment(X, apply_on='samples')[source]

Apply a smoothing spline to the data.

Optimized implementation with pre-allocated output array.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.Spline_X_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_degree=3, perturbation_density=0.05, perturbation_range=(-10, 10))[source]

Bases: Augmenter

Class to apply a perturbation to a 1D signal using B-spline interpolation.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_degree (int, optional) – Degree of the spline. Default is 3 (cubic).

  • perturbation_density (float, optional) – Density of perturbation points relative to data size. Default is 0.05.

  • perturbation_range (tuple, optional) – Range of perturbation values (min, max). Default is (-10, 10).

augment(X, apply_on='samples')[source]

Augment the data with a perturbation using B-spline interpolation.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.Spline_X_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]

Bases: Augmenter

Class to simplify a 1D signal using B-spline interpolation along the x-axis.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points for simplification. Default is None, which uses the sample length / 4.

  • uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.

augment(X, apply_on='samples')[source]

Select randomly spaced points along the x-axis and adjust a spline.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.Spline_Y_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_points=None, perturbation_intensity=0.005)[source]

Bases: Augmenter

Augment the data with a perturbation on the y-axis using B-spline interpolation.

Optimized implementation with pre-generated random parameters.

Parameters:
  • X (ndarray) – Input data.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).

  • spline_points (int, optional) – Number of spline points. Default is None (uses sample length / 2).

  • perturbation_intensity (float, optional) – Intensity of perturbation relative to max value. Default is 0.005.

augment(X, apply_on='samples')[source]

Augment the data with a perturbation on the y-axis using B-spline interpolation.

Optimized with pre-allocated arrays and batch random generation.

Parameters:
  • X (ndarray) – Input data to be augmented.

  • apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.

Returns:

Augmented data.

Return type:

ndarray

class nirs4all.operators.transforms.StandardNormalVariate(axis=1, with_mean=True, with_std=True, ddof=0, copy=True)[source]

Bases: TransformerMixin, BaseEstimator

Standard Normal Variate (SNV) transformation.

SNV is a row-wise normalization technique commonly used in spectroscopy to remove scatter effects. Each sample (row) is centered and scaled independently.

For each sample: SNV = (X - mean(X)) / std(X)

Parameters:
  • axis (int, default=1) – Axis along which to compute mean and standard deviation.

      - axis=1: Row-wise (default, standard SNV behavior for spectroscopy)
      - axis=0: Column-wise (equivalent to StandardScaler)

  • with_mean (bool, default=True) – If True, center the data before scaling.

  • with_std (bool, default=True) – If True, scale the data to unit variance.

  • ddof (int, default=0) – Delta Degrees of Freedom for standard deviation calculation.

  • copy (bool, default=True) – If False, try to avoid a copy and do inplace scaling instead.

Examples

>>> from nirs4all.operators.transforms import StandardNormalVariate
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> snv = StandardNormalVariate()
>>> X_transformed = snv.fit_transform(X)

fit(X, y=None)[source]

Fit the StandardNormalVariate transformer.

For SNV, this is a no-op as the transformation is computed independently for each sample.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (None) – Ignored.

Returns:

self – Returns the instance itself.

Return type:

object

fit_transform(X, y=None)[source]

Fit to data, then transform it.

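Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data to be transformed.

  • y (None) – Ignored.

Returns: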

X_transformed – The transformed data.

Return type:

ndarray of shape (n_samples, n_features)

transform(X)[source]

Perform SNV transformation.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input data to be transformed.

Returns:

X_transformed – The transformed data.

Return type:

ndarray of shape (n_samples, n_features)
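
The row-wise formula SNV = (X - mean(X)) / std(X) can be written in a few lines of NumPy. This sketch covers only axis=1 with centering and scaling enabled, and it does not guard against rows with zero standard deviation; snv is a hypothetical helper, not the class itself.

```python
import numpy as np

def snv(X, ddof=0):
    """Centre and scale each row by its own mean and standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=ddof, keepdims=True)
    return (X - mean) / std

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
X_snv = snv(X)
```

All three rows differ only by an offset, so after SNV they become identical, roughly [-1.225, 0, 1.225].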

class nirs4all.operators.transforms.ToAbsorbance(source_type: str | SignalType = 'reflectance', epsilon: float = 1e-10, clip_negative: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Convert reflectance or transmittance to absorbance.

Applies the log transform: A = -log10(X)

For reflectance, this gives “pseudo-absorbance” which is widely used in NIR but is not identical to true absorbance in transmission.

Parameters:
  • source_type (str or SignalType) – Input signal type. If “auto”, attempts to detect. Valid: “reflectance”, “reflectance%”, “transmittance”, “transmittance%”

  • epsilon (float, default=1e-10) – Small value to add to avoid log(0)

  • clip_negative (bool, default=True) – If True, clips negative values to epsilon before log transform

source_type_

Detected or specified source signal type

Type:

SignalType

is_percent_

Whether source was in percent (requires /100)

Type:

bool

Examples

>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import ToAbsorbance
>>> transformer = ToAbsorbance(source_type="reflectance")
>>> R = np.array([[0.5, 0.4, 0.3], [0.6, 0.5, 0.4]])
>>> A = transformer.fit_transform(R)
>>> # A ≈ [[0.301, 0.398, 0.523], [0.222, 0.301, 0.398]]

fit(X, y=None)[source]

Fit the transformer.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input spectral data

  • y (None) – Ignored.

Return type:

self

inverse_transform(X, y=None)[source]

Convert absorbance back to reflectance/transmittance.

Parameters:

X (array-like of shape (n_samples, n_features)) – Absorbance values

Returns:

X_original – Reflectance or transmittance values

Return type:

ndarray of shape (n_samples, n_features)

transform(X, y=None)[source]

Transform reflectance/transmittance to absorbance.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input spectral data

Returns:

X_transformed – Absorbance values

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.transforms.UnsharpSpectralMask(apply_on='samples', random_state=None, *, copy=True, amount_range: Tuple[float, float] = (0.1, 0.5), sigma: float = 1.0, kernel_width: int = 11)[source]

Bases: Augmenter

Applies unsharp masking (sharpening). X_aug = X + k * (X - smooth(X))

Vectorized implementation using batch convolution.
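
The sharpening rule X_aug = X + k * (X - smooth(X)) is sketched below with a Gaussian kernel built from sigma and kernel_width. The Gaussian smoother, the edge padding, and the use of a single fixed amount in place of a draw from amount_range are all assumptions for this example; unsharp_mask is a hypothetical helper.

```python
import numpy as np

def unsharp_mask(X, amount=0.3, sigma=1.0, kernel_width=11):
    """Sharpen each row by adding back its high-frequency residual."""
    X = np.asarray(X, dtype=float)
    half = kernel_width // 2
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()  # normalise so flat regions are preserved
    # Repeat edge values so the smoothed spectrum keeps the original length.
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    smooth = np.vstack([np.convolve(row, kernel, mode="valid") for row in padded])
    return X + amount * (X - smooth)

flat = np.full((2, 30), 0.7)
spike = np.zeros((1, 31))
spike[0, 15] = 1.0
flat_aug = unsharp_mask(flat)
spike_aug = unsharp_mask(spike)
```

A flat spectrum passes through unchanged, while an isolated peak is amplified and gains small negative side lobes.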

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.WavelengthShift(apply_on='samples', random_state=None, *, copy=True, shift_range: Tuple[float, float] = (-2.0, 2.0), lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Shifts the wavelength axis.

Vectorized implementation using batch interpolation.
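
One way to picture a wavelength shift is to treat each spectrum as recorded on a slightly offset axis and re-read it on the nominal axis. The sketch below does this with np.interp and one uniform shift per sample; the sign convention, the edge behaviour, and the shift_wavelengths helper itself are assumptions, not the library's method.

```python
import numpy as np

def shift_wavelengths(X, lambda_axis, shift_range=(-2.0, 2.0), random_state=None):
    """Shift each spectrum along the wavelength axis by a random amount."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    lambda_axis = np.asarray(lambda_axis, dtype=float)
    shifts = rng.uniform(*shift_range, size=X.shape[0])
    # Treat each spectrum as recorded on a shifted axis, then read it back
    # on the nominal axis; np.interp holds the end values at the edges.
    return np.vstack([
        np.interp(lambda_axis, lambda_axis + s, row) for s, row in zip(shifts, X)
    ])

lambda_axis = np.linspace(1000, 2500, 50)
X = np.tile(np.sin(lambda_axis / 200.0), (3, 1))
X_aug = shift_wavelengths(X, lambda_axis, random_state=0)
X_same = shift_wavelengths(X, lambda_axis, shift_range=(0.0, 0.0))
```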

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.WavelengthStretch(apply_on='samples', random_state=None, *, copy=True, stretch_range: Tuple[float, float] = (0.99, 1.01), lambda_axis: ndarray | None = None)[source]

Bases: Augmenter

Stretches or compresses the wavelength axis.

Vectorized implementation using batch interpolation.
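
Stretching can be sketched in the same spirit: the axis is scaled by a factor and the spectrum is interpolated back onto the original grid. Scaling around the centre of the axis is an assumption made here for illustration, and `stretch_wavelengths` is a hypothetical helper name.

```python
import numpy as np


def stretch_wavelengths(X, lambda_axis, factors):
    """Scale the axis around its centre by factors[i], then resample row i."""
    center = lambda_axis.mean()
    out = np.empty(X.shape, dtype=float)
    for i, (f, row) in enumerate(zip(factors, X)):
        stretched_axis = center + f * (lambda_axis - center)
        out[i] = np.interp(lambda_axis, stretched_axis, row)
    return out


lambda_axis = np.linspace(1000.0, 2500.0, 301)
peak = np.exp(-0.5 * ((lambda_axis - 1700.0) / 40.0) ** 2)
X = np.tile(peak, (3, 1))
rng = np.random.default_rng(0)
factors = rng.uniform(0.99, 1.01, size=X.shape[0])  # one factor per sample
X_aug = stretch_wavelengths(X, lambda_axis, factors)
```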

augment(X, apply_on='samples')[source]

Perform data augmentation.

Parameters:
  • X (array-like) – Input data to augment.

  • apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.

Returns:

Augmented data.

Return type:

array-like

class nirs4all.operators.transforms.Wavelet(wavelet: str = 'haar', mode: str = 'periodization', *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Single level Discrete Wavelet Transform.

Performs a discrete wavelet transform on data, using a wavelet function.

Parameters:
  • wavelet (Wavelet object or name, default='haar') – Wavelet to use: [‘Haar’, ‘Daubechies’, ‘Symlets’, ‘Coiflets’, ‘Biorthogonal’, ‘Reverse biorthogonal’, ‘Discrete Meyer (FIR Approximation)’…]

  • mode (str, optional, default='periodization') – Signal extension mode.
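
To make "single level" concrete, here is a minimal NumPy sketch of one Haar decomposition step on an even-length signal. It shows the idea only: the class itself relies on a wavelet library, and how it arranges or resamples the coefficients is not shown here. `haar_dwt_single_level` is a hypothetical helper.

```python
import numpy as np


def haar_dwt_single_level(X):
    """One Haar step per row: pairwise sums (approximation) and differences (detail)."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] % 2:
        raise ValueError("this sketch assumes an even number of features")
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


X = np.array([[1.0, 2.0, 3.0, 4.0]])
approx, detail = haar_dwt_single_level(X)
# approx = [[3/sqrt(2), 7/sqrt(2)]], detail = [[-1/sqrt(2), -1/sqrt(2)]]
```

Each output has half as many columns as the input, and the transform preserves the signal energy.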

fit(X, y=None)[source]

Verify the X data compliance with wavelet transform.

Parameters:
  • X (array-like, spectra) – The data to transform.

  • y (None) – Ignored.

Raises:

ValueError – If the input X is a sparse matrix.

Returns:

The fitted object.

Return type:

Wavelet

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Wavelet

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Apply wavelet transform to the data X.

Parameters:
  • X (array-like) – The data to transform.

  • copy (bool or None, optional) – Whether to copy the input data.

Returns:

The transformed data.

Return type:

numpy.ndarray

class nirs4all.operators.transforms.WaveletFeatures(wavelet: str = 'db4', max_level: int = 5, n_coeffs_per_level: int = 10, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Discrete Wavelet Transform feature extractor for spectral data.

Decomposes spectra into approximation (smooth trends) and detail (sharp features) coefficients at multiple scales, then extracts statistical features from each level. This captures both global baseline variations and local absorption peaks.

Scientific basis:
  • Multi-resolution analysis captures features at different scales

  • Daubechies wavelets (db4) are well-suited for smooth signals

  • Wavelet coefficients are partially decorrelated

Parameters:
  • wavelet (str, default='db4') – Wavelet to use (e.g., ‘haar’, ‘db4’, ‘coif3’, ‘sym4’).

  • max_level (int, default=5) – Maximum decomposition level.

  • n_coeffs_per_level (int, default=10) – Number of top coefficients (by magnitude) to extract per level.

  • copy (bool, default=True) – Whether to copy input data.
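
The interaction between max_level, signal length, and n_coeffs_per_level can be sketched as follows. This is a simplified stand-in: it uses a hand-written Haar step instead of db4, keeps only the largest detail magnitudes per level, and ignores any other statistics the class may compute. All function names are hypothetical.

```python
import numpy as np


def haar_step(x):
    """One Haar decomposition step on a 1-D, even-length signal."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def top_coeff_features(spectrum, max_level=5, n_coeffs=4):
    """Keep the n_coeffs largest detail magnitudes at each reachable level."""
    approx = np.asarray(spectrum, dtype=float)
    features, level = [], 0
    # Stop early when the signal becomes too short (or odd-length) to split.
    while level < max_level and approx.size >= 2 and approx.size % 2 == 0:
        approx, detail = haar_step(approx)
        top = np.sort(np.abs(detail))[::-1][:n_coeffs]
        # Zero-pad so every level contributes exactly n_coeffs values.
        features.append(np.pad(top, (0, n_coeffs - top.size)))
        level += 1
    out = np.concatenate(features) if features else np.empty(0)
    return out, level


spectrum = np.sin(np.linspace(0.0, 3.0, 12))
feats, actual_level = top_coeff_features(spectrum, max_level=5, n_coeffs=4)
# 12 -> 6 -> 3 samples: only 2 levels are reachable, so 2 * 4 features result.
```

This mirrors why the level actually used can be lower than max_level for short spectra.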

actual_level_

Actual decomposition level used (may be less than max_level depending on signal length).

Type:

int

n_features_out_

Number of output features.

Type:

int

References

Mallat (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE PAMI.

fit(X, y=None)[source]

Fit the wavelet feature extractor.

Returns:

self – Fitted transformer.

Return type:

WaveletFeatures

get_feature_names_out(input_features=None)[source]

Get output feature names.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletFeatures

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Extract wavelet features from spectra.

Returns:

X_transformed – Wavelet features.

Return type:

ndarray of shape (n_samples, n_features_out_)

class nirs4all.operators.transforms.WaveletPCA(wavelet: str = 'db4', max_level: int = 4, n_components_per_level: int = 3, whiten: bool = True, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Multi-scale PCA on wavelet coefficients.

Applies PCA separately to each wavelet decomposition level, creating a compact multi-scale representation where each scale contributes a few principal components. This preserves frequency-specific information while reducing dimensionality.

Scientific basis:
  • Combines multi-resolution analysis with decorrelation

  • Each scale captures different frequency information

  • PCA per scale reduces redundancy within each frequency band

  • Results in a compact, interpretable feature set

Parameters:
  • wavelet (str, default='db4') – Wavelet to use (e.g., ‘haar’, ‘db4’, ‘coif3’, ‘sym4’).

  • max_level (int, default=4) – Maximum decomposition level.

  • n_components_per_level (int, default=3) – Number of PCA components to keep per decomposition level.

  • whiten (bool, default=True) – Whether to whiten the PCA components.

  • copy (bool, default=True) – Whether to copy input data.
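
The per-level PCA idea can be sketched with NumPy alone. The sketch below assumes a Haar decomposition in place of db4, computes PCA scores through an SVD of the centered coefficients, and omits the scaling and whitening steps. `haar_bands` and `pca_scores` are hypothetical helpers.

```python
import numpy as np


def haar_bands(X, max_level):
    """Return the detail matrix of each level followed by the final approximation."""
    bands = []
    approx = np.asarray(X, dtype=float)
    for _ in range(max_level):
        if approx.shape[1] < 2 or approx.shape[1] % 2:
            break  # signal too short to decompose further
        even, odd = approx[:, 0::2], approx[:, 1::2]
        bands.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    bands.append(approx)
    return bands


def pca_scores(C, n_components):
    """PCA scores of the columns of C via SVD of the centered matrix."""
    Cc = C - C.mean(axis=0)
    U, s, _ = np.linalg.svd(Cc, full_matrices=False)
    k = min(n_components, s.size)
    return U[:, :k] * s[:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))
bands = haar_bands(X, max_level=4)  # 4 detail bands + 1 approximation band
features = np.hstack([pca_scores(C, 3) for C in bands])  # shape (20, 15)
```

Each band contributes at most three columns, so the output width is bounded by (number of bands) × n_components_per_level.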

actual_level_

Actual decomposition level used.

Type:

int

pcas_

Fitted PCA objects per level.

Type:

dict

scalers_

Fitted StandardScaler objects per level.

Type:

dict

n_features_out_

Number of output features.

Type:

int

References

Trygg & Wold (1998). PLS regression on wavelet compressed NIR spectra.

fit(X, y=None)[source]

Fit the wavelet-PCA transformer.

Returns:

self – Fitted transformer.

Return type:

WaveletPCA

get_feature_names_out(input_features=None)[source]

Get output feature names.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletPCA

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Transform spectra to wavelet-PCA features.

Returns:

X_transformed – Wavelet-PCA features.

Return type:

ndarray of shape (n_samples, n_features_out_)

class nirs4all.operators.transforms.WaveletSVD(wavelet: str = 'db4', max_level: int = 4, n_components_per_level: int = 3, *, copy: bool = True)[source]

Bases: TransformerMixin, BaseEstimator

Multi-scale SVD on wavelet coefficients.

Applies truncated SVD separately to each wavelet decomposition level, creating a compact multi-scale representation. Similar to WaveletPCA, but it uses SVD, which does not center the data and works better for sparse data.

Scientific basis:
  • Combines multi-resolution analysis with dimensionality reduction

  • Each scale captures different frequency information

  • SVD per scale reduces redundancy within each frequency band

  • Results in a compact feature set

Parameters:
  • wavelet (str, default='db4') – Wavelet to use (e.g., ‘haar’, ‘db4’, ‘coif3’, ‘sym4’).

  • max_level (int, default=4) – Maximum decomposition level.

  • n_components_per_level (int, default=3) – Number of SVD components to keep per decomposition level.

  • copy (bool, default=True) – Whether to copy input data.
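
The difference from the PCA variant, no centering, can be seen on a single coefficient matrix. The sketch below is an assumption-laden illustration using NumPy's SVD. `truncated_svd_scores` and `pca_scores` are hypothetical helpers, and `C` stands in for the coefficients of one decomposition level.

```python
import numpy as np


def truncated_svd_scores(C, n_components):
    """Scores from an SVD of the raw (uncentered) matrix."""
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def pca_scores(C, n_components):
    """Scores from an SVD of the column-centered matrix (PCA)."""
    Cc = C - C.mean(axis=0)
    U, s, _ = np.linalg.svd(Cc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
C = rng.normal(loc=5.0, size=(30, 8))  # coefficients with a non-zero mean
svd_feats = truncated_svd_scores(C, 3)
pca_feats = pca_scores(C, 3)
```

Without centering, the first SVD component mostly encodes the mean level of the coefficients, whereas the PCA scores have zero mean by construction.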

actual_level_

Actual decomposition level used.

Type:

int

svds_

Fitted TruncatedSVD objects per level.

Type:

dict

n_features_out_

Number of output features.

Type:

int

References

Trygg & Wold (1998). PLS regression on wavelet compressed NIR spectra.

fit(X, y=None)[source]

Fit the wavelet-SVD transformer.

Returns:

self – Fitted transformer.

Return type:

WaveletSVD

get_feature_names_out(input_features=None)[source]

Get output feature names.

set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletSVD

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(X, copy=None)[source]

Transform spectra to wavelet-SVD features.

Returns:

X_transformed – Wavelet-SVD features.

Return type:

ndarray of shape (n_samples, n_features_out_)

nirs4all.operators.transforms.asls_baseline(spectra: ndarray, lam: float = 1000000.0, p: float = 0.01, max_iter: int = 50, tol: float = 0.001) ndarray[source]

Apply baseline correction using Asymmetric Least Squares (AsLS) smoothing.

This is a convenience wrapper around pybaseline_correction with method=‘asls’.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).

  • lam (float) – Smoothness parameter (lambda). Default is 1e6.

  • p (float) – Asymmetry parameter (0 < p < 1). Default is 0.01.

  • max_iter (int) – Maximum number of iterations. Default is 50.

  • tol (float) – Convergence tolerance. Default is 1e-3.

Returns:

Baseline-corrected spectra with same shape as input.

Return type:

numpy.ndarray
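
For readers unfamiliar with AsLS, the following is a minimal single-spectrum sketch of the iteration described by Eilers and Boelens, written with SciPy sparse matrices. It is not the pybaselines code that the wrapper calls. The function name `asls_baseline_1d` and the stopping rule on the relative change of the weights are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline_1d(y, lam=1e6, p=0.01, max_iter=50, tol=1e-3):
    """Estimate a smooth baseline z under one spectrum y."""
    n = y.size
    # Second-order difference operator: penalizes curvature of the baseline.
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        W = sparse.diags(w, 0)
        z = spsolve((W + penalty).tocsc(), w * y)
        # Points above the baseline (peaks) get the small weight p.
        w_new = np.where(y > z, p, 1.0 - p)
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z


x = np.arange(200, dtype=float)
y = 0.01 * x + np.exp(-0.5 * ((x - 100.0) / 5.0) ** 2)  # ramp + one peak
corrected = y - asls_baseline_1d(y)
```

A larger lam makes the baseline stiffer, and a smaller p makes it hug the lower envelope of the spectrum more closely.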

nirs4all.operators.transforms.baseline(spectra)[source]

Removes baseline (mean) from each spectrum.

Parameters:

spectra (numpy.ndarray) – NIRS data matrix.

Returns:

Mean-centered NIRS data matrix.

Return type:

numpy.ndarray
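
In NumPy terms, removing the mean from each spectrum amounts to a row-wise subtraction. The one-liner below is a sketch of that operation, not necessarily the function's exact code.

```python
import numpy as np

spectra = np.array([[1.0, 2.0, 3.0], [10.0, 10.0, 13.0]])
# Subtract each row's own mean; keepdims=True keeps the shape broadcastable.
centered = spectra - spectra.mean(axis=1, keepdims=True)
# centered = [[-1, 0, 1], [-1, -1, 2]]
```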

nirs4all.operators.transforms.decon_set()[source]

nirs4all.operators.transforms.derivate(spectra, order=1, delta=1)[source]

Computes Nth order derivatives with the desired spacing using numpy.gradient.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • order (float, optional) – Order of the derivative, by default 1.

  • delta (int, optional) – Delta of the derivative (in samples), by default 1.

Returns:

spectra – Differentiated NIR spectra.

Return type:

numpy.ndarray

nirs4all.operators.transforms.detrend(spectra, bp=0)[source]

Perform spectral detrending to remove linear trend from data.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • bp (list, optional) – A sequence of break points. If given, an individual linear fit is performed for each part of data between two break points. Break points are specified as indices into data. Default is 0.

Returns:

Detrended NIR spectra.

Return type:

numpy.ndarray
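
The bp argument reads like the one in scipy.signal.detrend. Assuming this function behaves similarly (an assumption, since the implementation is not shown here), the effect of break points can be sketched directly with SciPy:

```python
import numpy as np
from scipy.signal import detrend as scipy_detrend

x = np.linspace(0.0, 1.0, 100)
spectra = np.vstack([2.0 * x + 0.5, np.sin(6.0 * x) + 3.0 * x])

# A single linear fit per spectrum (bp=0 means no interior break point).
detrended = scipy_detrend(spectra, axis=-1, type="linear", bp=0)

# Separate linear fits on the index ranges [0, 50) and [50, 100).
piecewise = scipy_detrend(spectra, axis=-1, type="linear", bp=[0, 50, 100])
```

The first spectrum is a pure ramp, so both variants reduce it to (numerically) zero.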

nirs4all.operators.transforms.dumb_and_dumber_set()[source]

nirs4all.operators.transforms.dumb_set()[source]

nirs4all.operators.transforms.dumb_set_2D()[source]

nirs4all.operators.transforms.fat_set()[source]

nirs4all.operators.transforms.first_derivative(spectra: ndarray, delta: float = 1.0, edge_order: int = 2) ndarray[source]

First numerical derivative along feature axis using central differences.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).

  • delta (float) – Sampling step along the feature axis.

  • edge_order (int) – 1 or 2, order of accuracy at the boundaries.

Returns:

First derivative dX/dλ with same shape as input.

Return type:

numpy.ndarray
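
A central-difference derivative along the feature axis can be reproduced with numpy.gradient. The sketch below assumes the function maps delta and edge_order onto the corresponding numpy.gradient arguments, which the parameter names suggest but which is not confirmed here.

```python
import numpy as np

wavelengths = np.arange(0.0, 10.0, 0.5)  # sampling step of 0.5
spectra = np.vstack([wavelengths ** 2, 3.0 * wavelengths])

# Central differences in the interior, second-order accurate at the edges.
d1 = np.gradient(spectra, 0.5, axis=1, edge_order=2)
# Row 0 recovers 2 * wavelengths, row 1 is the constant 3.
```

With edge_order=2 the result is exact for quadratic signals, including at the boundaries.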

nirs4all.operators.transforms.gaussian(spectra, order=2, sigma=1)[source]

Applies a 1D Gaussian filter using scipy.ndimage.gaussian_filter1d.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • order (float, optional) – Order of the derivative of the Gaussian kernel, by default 2.

  • sigma (int, optional) – Standard deviation (sigma) of the Gaussian kernel, by default 1.

Returns:

Gaussian-filtered NIR spectra.

Return type:

numpy.ndarray
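
The underlying SciPy call can be used directly to see what order and sigma do. This sketch assumes filtering along the last (feature) axis. The variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

x = np.linspace(0.0, 1.0, 200)
spectra = np.vstack([np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2), np.full(x.size, 3.0)])

# order=0 smooths; order=2 convolves with the second derivative of a Gaussian.
smoothed = gaussian_filter1d(spectra, sigma=1, order=0, axis=-1)
second_deriv = gaussian_filter1d(spectra, sigma=1, order=2, axis=-1)
```

With order=0 a flat spectrum is returned unchanged, while order=2 highlights curvature such as peak tops.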

nirs4all.operators.transforms.haar_only()[source]

nirs4all.operators.transforms.id_preprocessing()[source]

nirs4all.operators.transforms.list_of_2D_sets()[source]

nirs4all.operators.transforms.log_transform(spectra: ndarray, base: float = 2.718281828459045, offset: float = 0.0, auto_offset: bool = True, min_value: float = 1e-08) ndarray[source]

Apply elementwise logarithm with automatic handling of edge cases.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • base (float) – Logarithm base. Default is e.

  • offset (float) – Fixed value added before log to handle non-positives.

  • auto_offset (bool) – If True, automatically add offset for problematic values.

  • min_value (float) – Minimum value after offset when auto_offset=True.

Returns:

Log-transformed spectra.

Return type:

numpy.ndarray

nirs4all.operators.transforms.msc(spectra, scaled=True)[source]

Performs multiplicative scatter correction to the mean.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • scaled (bool) – Whether to scale the data. Defaults to True.

Returns:

Scatter-corrected NIR spectra.

Return type:

numpy.ndarray
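
Multiplicative scatter correction to the mean is commonly implemented by regressing each spectrum on the mean spectrum and undoing the fitted slope and intercept. The sketch below follows that textbook recipe. It does not model the scaled option, and `msc_to_mean` is a hypothetical name.

```python
import numpy as np


def msc_to_mean(spectra):
    """Fit row ≈ slope * mean_spectrum + intercept, then return (row - intercept) / slope."""
    reference = spectra.mean(axis=0)
    corrected = np.empty(spectra.shape, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


x = np.linspace(0.0, 1.0, 50)
base = np.sin(4.0 * x) + 2.0
gains = np.array([0.8, 1.0, 1.3])[:, None]
offsets = np.array([0.1, -0.2, 0.4])[:, None]
spectra = gains * base + offsets  # same shape, different scatter
corrected = msc_to_mean(spectra)
```

When the spectra differ only by gain and offset, as in this toy data, every corrected spectrum collapses onto the mean spectrum.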

nirs4all.operators.transforms.nicon_set()[source]

nirs4all.operators.transforms.norml(spectra, feature_range=(-1, 1))[source]

Perform spectral normalization with user-defined limits.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • feature_range (tuple (min, max), default=(-1, 1)) – Desired range of the transformed data. If both the min and the max of the range equal -1, linalg (norm-based) normalization is applied; otherwise, the data are scaled to the user-defined bounds.

Returns:

spectra – Normalized NIR spectra.

Return type:

numpy.ndarray

nirs4all.operators.transforms.optimal_set_2D()[source]

nirs4all.operators.transforms.preprocessing_list()[source]

nirs4all.operators.transforms.pybaseline_correction(spectra: ndarray, method: str = 'asls', **kwargs) ndarray[source]

Apply baseline correction using pybaselines library.

This is a general wrapper for all pybaselines methods, allowing flexible baseline correction with various algorithms.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).

  • method (str) – Baseline correction method. Available methods:

      • Whittaker: ‘asls’, ‘iasls’, ‘airpls’, ‘arpls’, ‘drpls’, ‘iarpls’, ‘aspls’, ‘psalsa’, ‘derpsalsa’

      • Polynomial: ‘poly’, ‘modpoly’, ‘imodpoly’, ‘penalized_poly’, ‘loess’, ‘quant_reg’

      • Morphological: ‘mor’, ‘imor’, ‘mormol’, ‘amormol’, ‘rolling_ball’, ‘mwmv’, ‘tophat’, ‘mpspline’, ‘jbcd’

      • Spline: ‘mixture_model’, ‘irsqr’, ‘corner_cutting’, ‘pspline_asls’, etc.

      • Smooth: ‘noise_median’, ‘snip’, ‘swima’, ‘ipsa’

      • Classification: ‘dietrich’, ‘golotvin’, ‘std_distribution’, ‘fastchrom’, ‘cwt_br’

      • Optimizers: ‘collab_pls’, ‘optimize_extended_range’, ‘adaptive_minmax’

      • Misc: ‘interp_pts’, ‘beads’

  • **kwargs – Additional parameters passed to the specific baseline method.

Returns:

Baseline-corrected spectra with same shape as input.

Return type:

numpy.ndarray

Examples

>>> from nirs4all.operators.transforms.nirs import pybaseline_correction
>>> # spectra: numpy.ndarray of shape (n_samples, n_features)
>>> corrected = pybaseline_correction(spectra, method='airpls', lam=1e5)
>>> corrected = pybaseline_correction(spectra, method='imodpoly', poly_order=3)
>>> corrected = pybaseline_correction(spectra, method='snip', max_half_window=30)

nirs4all.operators.transforms.reflectance_to_absorbance(spectra: ndarray, min_value: float = 1e-08) ndarray[source]

Convert reflectance spectra to absorbance.

Applies the Beer-Lambert law: A = -log10(R) = log10(1/R) where R is reflectance and A is absorbance.

Parameters:
  • spectra (numpy.ndarray) – Reflectance NIRS data matrix (n_samples, n_features). Values should be in range (0, 1] or as percentages (0, 100].

  • min_value (float) – Minimum value to clamp reflectance to avoid log(0). Default is 1e-8.

Returns:

Absorbance spectra with same shape as input.

Return type:

numpy.ndarray
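
The role of min_value can be sketched as a clamp applied before the logarithm. How the real function recognizes percent-scaled input is not stated here, so the sketch exposes it as an explicit, hypothetical `percent` flag. The function name is hypothetical as well.

```python
import numpy as np


def reflectance_to_absorbance_sketch(R, min_value=1e-8, percent=False):
    """A = -log10(R), with R clamped from below to avoid log(0)."""
    R = np.asarray(R, dtype=float)
    if percent:  # hypothetical flag: rescale (0, 100] to (0, 1]
        R = R / 100.0
    return -np.log10(np.clip(R, min_value, None))


A = reflectance_to_absorbance_sketch([1.0, 0.1, 0.0])
# A = [0, 1, 8]: the zero reflectance is clamped to 1e-8 before the log.
A_pct = reflectance_to_absorbance_sketch([50.0], percent=True)
```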

nirs4all.operators.transforms.savgol(spectra: ndarray, window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0) ndarray[source]

Perform Savitzky–Golay filtering on the data (also calculates derivatives). This function is a wrapper for scipy.signal.savgol_filter.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • window_length (int) – Size of the filter window in samples (default 11).

  • polyorder (int) – Order of the polynomial estimation (default 3).

  • deriv (int) – Order of the derivative to compute (default 0, which means smoothing only).

  • delta (float) – Sampling distance of the data (default 1.0).

Returns:

NIRS data smoothed with Savitzky-Golay filtering.

Return type:

numpy.ndarray
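
Because this function wraps scipy.signal.savgol_filter, the effect of the parameters can be explored with SciPy directly. The sketch assumes filtering along the last axis. The toy polynomials are chosen so the expected result is known analytically.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(-1.0, 1.0, 51)
spectra = np.vstack([x ** 3 - x, 2.0 * x ** 2 + 1.0])

# Smoothing only: polynomials of degree <= polyorder pass through unchanged.
smoothed = savgol_filter(spectra, window_length=11, polyorder=3, deriv=0, axis=-1)

# First derivative; delta is the spacing between consecutive samples.
first_deriv = savgol_filter(
    spectra, window_length=11, polyorder=3, deriv=1, delta=x[1] - x[0], axis=-1
)
```

For real spectra, window_length trades noise suppression against peak broadening, and polyorder must be smaller than window_length.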

nirs4all.operators.transforms.savgol_only()[source]

nirs4all.operators.transforms.second_derivative(spectra: ndarray, delta: float = 1.0, edge_order: int = 2) ndarray[source]

Second numerical derivative along feature axis.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).

  • delta (float) – Sampling step along the feature axis.

  • edge_order (int) – 1 or 2, order of accuracy at the boundaries.

Returns:

Second derivative d²X/dλ² with same shape as input.

Return type:

numpy.ndarray

nirs4all.operators.transforms.senseen_set()[source]

nirs4all.operators.transforms.small_set()[source]

nirs4all.operators.transforms.special_set()[source]

nirs4all.operators.transforms.spl_norml(spectra)[source]

Perform simple spectral normalization.

Parameters:

spectra (numpy.ndarray) – NIRS data matrix.

Returns:

spectra – Normalized NIR spectra.

Return type:

numpy.ndarray

nirs4all.operators.transforms.transf_set()[source]

nirs4all.operators.transforms.wavelet_transform(spectra: ndarray, wavelet: str, mode: str = 'periodization') ndarray[source]

Computes the discrete wavelet transform using PyWavelets.

Parameters:
  • spectra (numpy.ndarray) – NIRS data matrix.

  • wavelet (str) – Wavelet family to use for the transform.

  • mode (str) – Signal extension mode. Default is ‘periodization’.

Returns:

Wavelet-transformed and resampled spectra.

Return type:

numpy.ndarray