nirs4all.operators.transforms package
Submodules
- nirs4all.operators.transforms.feature_selection module
CARS, MCUVE, MCUVE.selected_indices_, MCUVE.selection_mask_, MCUVE.n_features_in_, MCUVE.n_features_out_, MCUVE.stability_, MCUVE.noise_stability_, MCUVE.threshold_, MCUVE.mean_coefs_, MCUVE.std_coefs_, MCUVE.__repr__(), MCUVE.fit(), MCUVE.get_feature_names_out(), MCUVE.get_support(), MCUVE.set_fit_request(), MCUVE.transform()
- nirs4all.operators.transforms.features module
- nirs4all.operators.transforms.nirs module
ASLSBaseline, AirPLS, ArPLS, AreaNormalization, BEADS, ExtendedMultiplicativeScatterCorrection, FirstDerivative, Haar, IASLS, IModPoly, LogTransform, ModPoly, MultiplicativeScatterCorrection, PyBaselineCorrection, ReflectanceToAbsorbance, RollingBall, SNIP, SavitzkyGolay, SecondDerivative, Wavelet, WaveletFeatures, WaveletPCA, WaveletSVD, asls_baseline(), first_derivative(), log_transform(), msc(), pybaseline_correction(), reflectance_to_absorbance(), savgol(), second_derivative(), wavelet_transform()
- nirs4all.operators.transforms.presets module
- nirs4all.operators.transforms.resampler module
- nirs4all.operators.transforms.scalers module
- nirs4all.operators.transforms.signal module
- nirs4all.operators.transforms.signal_conversion module
- nirs4all.operators.transforms.targets module
Module contents
- class nirs4all.operators.transforms.ASLSBaseline(lam: float = 1000000.0, p: float = 0.01, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Asymmetric Least Squares (AsLS) baseline correction.
Convenience class for ASLS baseline correction. This is equivalent to PyBaselineCorrection(method='asls', ...).
- Parameters:
References
Eilers, P.H.C. and Boelens, H.F.M. (2005). Baseline Correction with Asymmetric Least Squares Smoothing.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ASLSBaseline
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
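As an illustration of what ASLSBaseline estimates, the asymmetric least squares idea of Eilers and Boelens can be sketched in plain NumPy. This is a dense-matrix toy for short spectra, not the code nirs4all runs; the function name asls_baseline_sketch and the stopping rule on the weights are assumptions made for the example. The parameters lam, p, max_iter and tol mirror the class signature.

```python
import numpy as np

def asls_baseline_sketch(y, lam=1e6, p=0.01, max_iter=50, tol=1e-3):
    """Estimate a baseline for one spectrum (hypothetical helper, dense algebra)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(max_iter):
        # Weighted least squares with a smoothness penalty on the baseline z.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline get weight p, points below get 1 - p.
        w_new = np.where(y > z, p, 1.0 - p)
        # Assumed stopping rule: relative change of the weight vector.
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z

# Synthetic spectrum: a linear baseline plus one Gaussian peak.
x = np.arange(100.0)
true_line = 1.0 + 0.01 * x
peak = np.exp(-0.5 * ((x - 50.0) / 5.0) ** 2)
spectrum = true_line + peak
baseline = asls_baseline_sketch(spectrum)
corrected = spectrum - baseline
```

A small p pushes the fit under the peaks, while a large lam keeps the baseline stiff, so the peak survives the subtraction almost unchanged.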
- class nirs4all.operators.transforms.AirPLS(lam: float = 1000000.0, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Adaptive Iteratively Reweighted Penalized Least Squares baseline correction.
A robust baseline correction method that adaptively adjusts weights based on the difference between the fitted baseline and the data.
- Parameters:
References
Zhang, Z.M., et al. (2010). Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 135(5), 1138-1146.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') AirPLS
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.ArPLS(lam: float = 1000000.0, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Asymmetrically Reweighted Penalized Least Squares baseline correction.
- Parameters:
References
Baek, S.J., et al. (2015). Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 140(1), 250-257.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ArPLS
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.Augmenter(apply_on='samples', random_state=None, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Base class for data augmentation transformers.
- abstractmethod augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- fit(X, y=None)[source]
Fit to data.
- Parameters:
X (array-like) – Input data to fit.
y (array-like or None) – Target variable (unused).
- Returns:
self – Returns the instance itself.
- Return type:
- fit_transform(X, y=None, **fit_params)[source]
Fit to data and transform it.
- Parameters:
X (array-like) – Input data to fit and transform.
y (array-like or None) – Target variable (unused).
**fit_params (dict) – Additional fitting parameters (unused).
- Returns:
Transformed data.
- Return type:
array-like
- class nirs4all.operators.transforms.BEADS(lam_0: float = 1.0, lam_1: float = 1.0, lam_2: float = 1.0, max_iter: int = 50, tol: float = 0.01, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Baseline Estimation And Denoising with Sparsity.
Simultaneously estimates baseline and removes noise using sparsity constraints.
- Parameters:
lam_0 (float, default=1.0) – Regularization parameter for the baseline.
lam_1 (float, default=1.0) – Regularization parameter for the first derivative.
lam_2 (float, default=1.0) – Regularization parameter for the second derivative.
max_iter (int, default=50) – Maximum number of iterations.
tol (float, default=1e-2) – Convergence tolerance.
copy (bool, default=True) – Whether to copy input data.
References
Ning, X., et al. (2014). Chromatogram baseline estimation and denoising using sparsity (BEADS). Chemometrics and Intelligent Laboratory Systems, 139, 156-167.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') BEADS
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.BandMasking(apply_on='samples', random_state=None, *, copy=True, n_bands_range: Tuple[int, int] = (1, 3), bandwidth_range: Tuple[int, int] = (5, 20), mode: str = 'interp')[source]
Bases: Augmenter
Masks out bands of the spectrum.
Optimized with pre-generated random parameters.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.BandPerturbation(apply_on='samples', random_state=None, *, copy=True, n_bands: int = 3, bandwidth_range: Tuple[int, int] = (5, 20), gain_range: Tuple[float, float] = (0.9, 1.1), offset_range: Tuple[float, float] = (-0.01, 0.01))[source]
Bases: Augmenter
Perturbs specific bands of the spectrum.
Optimized with pre-generated random parameters.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.Baseline(*, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Removes baseline (mean) from each spectrum.
- Parameters:
copy (bool, optional) – Flag to indicate whether to make a copy of the object, by default True.
- fit(X, y=None)[source]
Fit the Baseline transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The spectra used to fit the transformer.
y (None) – Ignored.
- Returns:
self – Fitted Baseline object.
- Return type:
- class nirs4all.operators.transforms.CARS(n_components: int = 10, n_sampling_runs: int = 50, n_variables_ratio_start: float = 1.0, n_variables_ratio_end: float = 0.1, cv_folds: int = 5, subset_ratio: float = 0.8, random_state: int | None = None)[source]
Bases: TransformerMixin, BaseEstimator
Competitive Adaptive Reweighted Sampling (CARS) for wavelength selection.
CARS is a variable selection method that iteratively selects important wavelengths by:
1. Fitting PLS models on subsets of samples
2. Calculating variable importance weights from regression coefficients
3. Using an exponentially decreasing function to reduce the variable count
4. Applying adaptive reweighted sampling based on importance
The method was introduced by Li et al. (2009) and is widely used for NIRS wavelength selection.
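The exponentially decreasing reduction of the variable count can be sketched as a geometric interpolation between the starting and ending ratios. The helper cars_variable_schedule and the exact interpolation formula are assumptions made for illustration; the schedule nirs4all actually uses (and the one in Li et al.) may differ in detail.

```python
def cars_variable_schedule(n_features, n_sampling_runs=50,
                           ratio_start=1.0, ratio_end=0.1):
    """Number of variables kept at each sampling run (hypothetical helper)."""
    counts = []
    for i in range(n_sampling_runs):
        # Progress through the runs, from 0.0 (first) to 1.0 (last).
        t = i / (n_sampling_runs - 1) if n_sampling_runs > 1 else 1.0
        # Geometric interpolation is an exponential decay in t.
        ratio = ratio_start * (ratio_end / ratio_start) ** t
        # Always keep at least one variable.
        counts.append(max(1, round(ratio * n_features)))
    return counts

schedule = cars_variable_schedule(n_features=200, n_sampling_runs=30)
```

With these settings the schedule starts at all 200 wavelengths and ends at 20, dropping quickly in the early runs and slowly in the late ones.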
- Parameters:
n_components (int, default=10) – Number of PLS components for the internal PLS model.
n_sampling_runs (int, default=50) – Number of Monte-Carlo sampling runs.
n_variables_ratio_start (float, default=1.0) – Starting ratio of variables to keep (1.0 = all variables).
n_variables_ratio_end (float, default=0.1) – Ending ratio of variables to keep.
cv_folds (int, default=5) – Number of cross-validation folds for RMSECV calculation.
subset_ratio (float, default=0.8) – Ratio of samples to use in each Monte-Carlo run.
random_state (int or None, default=None) – Random seed for reproducibility.
- selection_mask_
Boolean mask indicating selected features.
- Type:
ndarray of shape (n_features,)
- n_variables_history_
Number of variables at each iteration.
- Type:
ndarray of shape (n_sampling_runs,)
Examples
>>> from nirs4all.operators.transforms import CARS
>>> import numpy as np
>>>
>>> # Spectral data with 200 wavelengths
>>> X = np.random.randn(100, 200)
>>> y = np.random.randn(100)
>>>
>>> # Select informative wavelengths
>>> cars = CARS(n_components=10, n_sampling_runs=30)
>>> cars.fit(X, y)
>>> X_selected = cars.transform(X)
>>> print(f"Selected {X_selected.shape[1]} from {X.shape[1]} wavelengths")
References
Li, H., Liang, Y., Xu, Q., & Cao, D. (2009). Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Analytica Chimica Acta, 648(1), 77-84.
Notes
CARS works best with standardized/scaled data
The exponential decay function ensures smooth variable reduction
Final selection is the variable subset with the lowest RMSECV
- fit(X, y=None, wavelengths: ndarray | None = None)[source]
Fit the CARS selector to identify important wavelengths.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, 1)) – Target values. Required for CARS.
wavelengths (array-like of shape (n_features,), optional) – Original wavelength grid. Stored for reference but not required.
- Returns:
self – Fitted selector.
- Return type:
- get_feature_names_out(input_features=None)[source]
Get output feature names (selected wavelengths as strings).
- get_support(indices: bool = False)[source]
Get a mask or indices of selected features.
- Parameters:
indices (bool, default=False) – If True, return indices instead of boolean mask.
- Returns:
support – Boolean mask or indices of selected features.
- Return type:
ndarray
- set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') CARS
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.ChannelDropout(apply_on='samples', random_state=None, *, copy=True, dropout_prob: float = 0.01, mode: str = 'interp')[source]
Bases: Augmenter
Drops individual wavelengths (sets to zero or interpolates).
Optimized with vectorized mask generation.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
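The dropout behaviour described for ChannelDropout can be approximated with one vectorized mask draw followed by either zeroing or linear interpolation. This is a sketch under assumptions: the function name, the "zero" mode label, the seed argument and the returned mask are all hypothetical, and only dropout_prob and the "interp" mode name come from the class signature.

```python
import numpy as np

def channel_dropout_sketch(X, dropout_prob=0.01, mode="interp", seed=None):
    """Drop random wavelengths, then zero or interpolate them (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X_aug = np.array(X, dtype=float)  # always work on a copy
    # One vectorized draw decides which (sample, wavelength) cells are dropped.
    mask = rng.random(X_aug.shape) < dropout_prob
    if mode == "zero":
        X_aug[mask] = 0.0
        return X_aug, mask
    idx = np.arange(X_aug.shape[1])
    for row, dropped in zip(X_aug, mask):
        kept = ~dropped
        # Rows with nothing dropped, or nothing left to interpolate from, are skipped.
        if dropped.any() and kept.any():
            # Fill dropped cells by linear interpolation between surviving neighbours.
            row[dropped] = np.interp(idx[dropped], idx[kept], row[kept])
    return X_aug, mask

X = np.ones((20, 50))
X_zero, mask_zero = channel_dropout_sketch(X, dropout_prob=0.1, mode="zero", seed=0)
X_interp, mask_interp = channel_dropout_sketch(X, dropout_prob=0.1, mode="interp", seed=0)
```

Interpolation keeps the spectrum smooth, whereas zeroing introduces sharp spikes that a model must learn to ignore.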
- class nirs4all.operators.transforms.CropTransformer(start: int = 0, end: int = None)[source]
Bases: BaseEstimator, TransformerMixin
- class nirs4all.operators.transforms.Derivate(order=1, delta=1, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Derivate
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.Detrend(bp=0, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Perform spectral detrending to remove linear trend from data.
- Parameters:
- fit(X, y=None)[source]
Fit the transformer to the data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
y (None) – Ignored.
- Returns:
self – Returns self.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Detrend
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform the data by removing linear trend.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
copy (bool or None, optional) – Whether to make a copy of the input data. If None, self.copy is used. Default is None.
- Returns:
The transformed data.
- Return type:
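Removing a linear trend from each spectrum amounts to fitting a straight line per row and subtracting it. The sketch below does this with a least-squares fit in NumPy; detrend_linear_sketch is a hypothetical name, the feature index is used as the abscissa, and the bp breakpoint parameter of Detrend is not modelled here.

```python
import numpy as np

def detrend_linear_sketch(X):
    """Subtract a per-row least-squares line (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    idx = np.arange(X.shape[1])
    # polyfit accepts a 2-D y and fits each column, so spectra are passed as columns.
    slope, intercept = np.polyfit(idx, X.T, deg=1)
    trend = slope[:, None] * idx + intercept[:, None]
    return X - trend

idx = np.arange(50)
X = np.vstack([
    2.0 + 0.1 * idx,                          # a pure line
    1.0 - 0.05 * idx + np.sin(idx / 5.0),     # a line plus an oscillation
])
X_detrended = detrend_linear_sketch(X)
```

The first row collapses to zero, and the second keeps its oscillation minus whatever linear component the fit attributes to it.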
- class nirs4all.operators.transforms.FirstDerivative(delta: float = 1.0, edge_order: int = 2, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
First numerical derivative using numpy.gradient.
- Parameters:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') FirstDerivative
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
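Since FirstDerivative is described as a wrapper around numpy.gradient, the underlying call can be shown directly. The snippet assumes the derivative is taken along the feature axis (axis=1) and that delta and edge_order are forwarded unchanged; both are assumptions about the wrapper, while the numpy.gradient call itself is standard.

```python
import numpy as np

delta = 0.5  # spacing between consecutive wavelengths
wavelengths = np.arange(0.0, 10.0, delta)

# Two toy spectra: a parabola and a straight line.
X = np.vstack([wavelengths ** 2, 3.0 * wavelengths])

# Central differences inside, second-order one-sided differences at the edges.
dX = np.gradient(X, delta, axis=1, edge_order=2)
```

With edge_order=2 the result is exact for polynomials up to degree two, so the first row equals 2 * wavelengths and the second is constant at 3.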
- class nirs4all.operators.transforms.FlattenPreprocessing(sources: str | int | List[int] = 'all')[source]
Bases: BaseEstimator, TransformerMixin
Flatten the preprocessing dimension of a 3D feature array.
Transforms a 3D array of shape (samples, preprocessings, features) into a 2D array of shape (samples, preprocessings * features) by horizontally concatenating all preprocessing views.
This is useful after feature_augmentation when you want to flatten multiple preprocessing views into a single feature vector for models that expect 2D input.
- Parameters:
sources – Which sources to apply the flattening to.
“all” (default): Apply to all sources.
List of indices: [0, 2] to apply only to sources 0 and 2.
Single int: Apply to only that source.
If a source is not in the list, it is passed through unchanged.
Example
>>> import numpy as np
>>> from nirs4all.operators.transforms import FlattenPreprocessing
>>> # Input: (100, 4, 2151) - 4 preprocessing views of 2151 features each
>>> X = np.zeros((100, 4, 2151))  # placeholder data
>>> flattener = FlattenPreprocessing()
>>> output = flattener.transform(X)
>>> # Output: (100, 8604) - 4 * 2151 = 8604 features

>>> # Apply only to specific sources
>>> flattener = FlattenPreprocessing(sources=[0, 2])
>>> # Only sources 0 and 2 will be flattened
Note
If input is already 2D, it is returned unchanged.
The transformer is stateless (fit does nothing).
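The shape change described above is equivalent to a C-order reshape of a single 3D array, which can be checked in NumPy without the library. This covers only the single-array case; how FlattenPreprocessing handles multiple sources is not modelled.

```python
import numpy as np

# 100 samples, 4 preprocessing views, 2151 features per view.
X = np.arange(100 * 4 * 2151, dtype=float).reshape(100, 4, 2151)
n_samples, n_views, n_features = X.shape

# A C-order reshape lays the views side by side for each sample.
X_flat = X.reshape(n_samples, n_views * n_features)

# Same result, written as an explicit horizontal concatenation of the views.
X_concat = np.concatenate([X[:, v, :] for v in range(n_views)], axis=1)
```

The first 2151 columns of X_flat are the first view, the next 2151 the second view, and so on.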
- class nirs4all.operators.transforms.FractionToPercent[source]
Bases: TransformerMixin, BaseEstimator
Convert fractional [0, 1] values to percentage [0, 100] range.
Simply multiplies by 100.
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms import FractionToPercent
>>> transformer = FractionToPercent()
>>> X_frac = np.array([[0.5, 0.6], [0.7, 0.8]])
>>> X_pct = transformer.fit_transform(X_frac)
>>> # X_pct = [[50, 60], [70, 80]]
- class nirs4all.operators.transforms.FromAbsorbance(target_type: str | SignalType = 'reflectance')[source]
Bases: TransformerMixin, BaseEstimator
Convert absorbance to reflectance or transmittance.
Applies the inverse log transform: R/T = 10^(-A)
- Parameters:
target_type (str or SignalType) – Output signal type. Valid: “reflectance”, “reflectance%”, “transmittance”, “transmittance%”
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import FromAbsorbance
>>> transformer = FromAbsorbance(target_type="reflectance")
>>> A = np.array([[0.301, 0.398], [0.222, 0.301]])
>>> R = transformer.fit_transform(A)
>>> # R ≈ [[0.5, 0.4], [0.6, 0.5]]
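The inverse log transform R/T = 10^(-A) can be checked independently of the library. In this sketch the function name and the percent flag are hypothetical; the flag assumes that the "reflectance%" and "transmittance%" targets simply scale the fraction by 100.

```python
import numpy as np

def from_absorbance_sketch(A, percent=False):
    """Apply R = 10**(-A), optionally on a 0-100 scale (hypothetical helper)."""
    R = np.power(10.0, -np.asarray(A, dtype=float))
    # Assumption: the percent targets are the fraction multiplied by 100.
    return 100.0 * R if percent else R

A = np.array([[0.301, 0.398], [0.222, 0.301]])
R = from_absorbance_sketch(A)
R_pct = from_absorbance_sketch(A, percent=True)

# Going back with A = -log10(R) recovers the input.
A_back = -np.log10(R)
```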
- class nirs4all.operators.transforms.Gaussian(order=2, sigma=1, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
- fit(X, y=None)[source]
Fit the Gaussian filter.
- Parameters:
X (numpy.ndarray) – Input data.
y (None) – Ignored.
- Returns:
self – Returns the instance itself.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Gaussian
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform the input data using the Gaussian filter.
- Parameters:
X (numpy.ndarray) – Input data.
copy (bool, default=None) – Whether to make a copy of the input data.
- Returns:
Transformed data.
- Return type:
- class nirs4all.operators.transforms.GaussianAdditiveNoise(apply_on='samples', random_state=None, *, copy=True, sigma: float = 0.01, smoothing_kernel_width: int = 1)[source]
Bases: Augmenter
Adds Gaussian noise to the spectra. X_aug = X + noise
Vectorized implementation using batch convolution.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
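The relation X_aug = X + noise can be sketched with NumPy's random generator. The function name and seed argument are hypothetical, and the treatment of smoothing_kernel_width as a moving-average filter applied to the noise is an assumption; the library may smooth differently.

```python
import numpy as np

def gaussian_additive_noise_sketch(X, sigma=0.01, smoothing_kernel_width=1, seed=None):
    """Add zero-mean Gaussian noise to each spectrum (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    noise = rng.normal(0.0, sigma, size=X.shape)
    if smoothing_kernel_width > 1:
        # Assumption: correlate the noise along wavelengths with a boxcar kernel.
        kernel = np.ones(smoothing_kernel_width) / smoothing_kernel_width
        noise = np.apply_along_axis(np.convolve, 1, noise, kernel, mode="same")
    return X + noise

X = np.zeros((200, 500))
X_noisy = gaussian_additive_noise_sketch(X, sigma=0.01, seed=0)
X_smooth = gaussian_additive_noise_sketch(X, sigma=0.01, smoothing_kernel_width=5, seed=0)
```

Smoothing lowers the effective noise amplitude, so sigma may need to be raised when a wide kernel is used.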
- class nirs4all.operators.transforms.GaussianSmoothingJitter(apply_on='samples', random_state=None, *, copy=True, sigma_range: Tuple[float, float] = (0.5, 2.0), kernel_width: int = 11)[source]
Bases: Augmenter
Applies Gaussian smoothing with random sigma.
Optimized with pre-generated random parameters. Note: Due to per-sample kernel requirements, this still uses a loop but with pre-generated random values.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.Haar(*, copy: bool = True)[source]
Bases: Wavelet
Shortcut to the Wavelet haar transform.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Haar
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.IASLS(lam: float = 1000000.0, p: float = 0.01, lam_1: float = 0.0001, max_iter: int = 50, tol: float = 0.001, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Improved Asymmetric Least Squares baseline correction.
An improvement over ASLS that uses a different weighting scheme.
- Parameters:
lam (float, default=1e6) – Smoothness parameter.
p (float, default=0.01) – Asymmetry parameter.
lam_1 (float, default=1e-4) – First derivative smoothing parameter.
max_iter (int, default=50) – Maximum number of iterations.
tol (float, default=1e-3) – Convergence tolerance.
copy (bool, default=True) – Whether to copy input data.
References
He, S., et al. (2014). Baseline correction for Raman spectra using an improved asymmetric least squares method. Analytical Methods, 6(12), 4402-4407.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') IASLS
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.IModPoly(poly_order: int = 5, max_iter: int = 250, tol: float = 0.001, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Improved Modified Polynomial baseline correction.
A polynomial-based baseline correction that iteratively fits and removes points above the baseline.
- Parameters:
References
Zhao, J., et al. (2007). Automated autofluorescence background subtraction algorithm for biomedical Raman spectroscopy. Applied Spectroscopy, 61(11), 1225-1232.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') IModPoly
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.IdentityAugmenter(apply_on='samples', random_state=None, *, copy=True)[source]
Bases: Augmenter
An augmenter that returns the input data without any changes.
- nirs4all.operators.transforms.IdentityTransformer
alias of FunctionTransformer
- class nirs4all.operators.transforms.IntegerKBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')[source]
Bases: BaseEstimator, TransformerMixin
KBinsDiscretizer that returns integers instead of floats.
- class nirs4all.operators.transforms.KubelkaMunk(source_type: str | SignalType = 'reflectance', epsilon: float = 1e-10)[source]
Bases: TransformerMixin, BaseEstimator
Apply Kubelka-Munk transformation for diffuse reflectance.
The Kubelka-Munk function: F(R) = (1-R)² / (2R)
This is theoretically more appropriate for scattering media (powders) than simple log(1/R), though in NIR the benefit is dataset-dependent.
- Parameters:
source_type (str or SignalType) – Input signal type. Valid: “reflectance”, “reflectance%”
epsilon (float, default=1e-10) – Small value to avoid division by zero
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import KubelkaMunk
>>> transformer = KubelkaMunk(source_type="reflectance")
>>> R = np.array([[0.5, 0.4], [0.6, 0.5]])
>>> F_R = transformer.fit_transform(R)
>>> # F_R[0,0] = (1-0.5)² / (2*0.5) = 0.25 / 1 = 0.25
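The function F(R) = (1-R)² / (2R) is easy to evaluate directly, which helps when checking transformer output by hand. In this sketch the function name is hypothetical, and using epsilon as a lower clip on R is an assumption about how the division by zero is avoided.

```python
import numpy as np

def kubelka_munk_sketch(R, epsilon=1e-10):
    """Evaluate F(R) = (1 - R)**2 / (2 * R) on fractional reflectance (hypothetical helper)."""
    # Assumption: epsilon acts as a floor on R so that R = 0 stays finite.
    R = np.clip(np.asarray(R, dtype=float), epsilon, None)
    return (1.0 - R) ** 2 / (2.0 * R)

R = np.array([[0.5, 0.4], [0.6, 0.5]])
F_R = kubelka_munk_sketch(R)

# Edge cases: a perfect reflector maps to 0, a perfect absorber stays finite.
F_edges = kubelka_munk_sketch(np.array([1.0, 0.0]))
```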
- class nirs4all.operators.transforms.LinearBaselineDrift(apply_on='samples', random_state=None, *, copy=True, offset_range: Tuple[float, float] = (-0.1, 0.1), slope_range: Tuple[float, float] = (-0.001, 0.001), lambda_axis: ndarray | None = None)[source]
Bases: Augmenter
Adds a linear baseline drift. X_aug = X + a + b * lambda
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
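The drift model X_aug = X + a + b * lambda can be written with one offset a and one slope b per sample. The sketch below assumes uniform sampling of a and b within offset_range and slope_range, and assumes that a missing lambda_axis defaults to the feature index; the function name and seed argument are hypothetical.

```python
import numpy as np

def linear_baseline_drift_sketch(X, offset_range=(-0.1, 0.1),
                                 slope_range=(-0.001, 0.001),
                                 lambda_axis=None, seed=None):
    """Add a random per-sample line a + b * lambda (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    # Assumption: without an explicit wavelength axis, use 0, 1, ..., n_features - 1.
    if lambda_axis is None:
        lam = np.arange(n_features, dtype=float)
    else:
        lam = np.asarray(lambda_axis, dtype=float)
    a = rng.uniform(*offset_range, size=(n_samples, 1))  # per-sample offset
    b = rng.uniform(*slope_range, size=(n_samples, 1))   # per-sample slope
    return X + a + b * lam

X = np.zeros((10, 100))
drift = linear_baseline_drift_sketch(X, seed=0) - X
```

Passing real wavelengths as lambda_axis changes the meaning of the slope, so slope_range should be scaled to the units of that axis.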
- class nirs4all.operators.transforms.LocalClipping(apply_on='samples', random_state=None, *, copy=True, n_regions: int = 1, width_range: Tuple[int, int] = (5, 20))[source]
Bases: Augmenter
Clips values in a local region to simulate saturation.
Optimized with pre-generated random parameters.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.LocalMixupAugmenter(apply_on='samples', random_state=None, *, copy=True, alpha: float = 0.2, k_neighbors: int = 5)[source]
Bases:
Augmenter
Mixup with nearest neighbors.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.LocalStandardNormalVariate(window=11, pad_mode='reflect', constant_values=0.0, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Local Standard Normal Variate (LSNV).
Per-sample local normalization with a sliding window of width w along features. For each sample and feature j:
mean_w = mean(X[…, j-w//2 : j+w//2+1])
std_w = std(X[…, j-w//2 : j+w//2+1])
X’[j] = (X[j] - mean_w) / std_w
- Parameters:
Notes
Operates row-wise (axis=1). Input must be (n_samples, n_features).
std_w==0 → divide by 1 to avoid NaN.
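The sliding-window formulas above can be sketched with NumPy's `sliding_window_view`. This is a hedged illustration, not the class's implementation: the function name is hypothetical, and padding each row by `window // 2` on both sides before windowing is an assumption about how the edges are handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_snv(X, window=11, pad_mode="reflect"):
    """Hypothetical sketch of LSNV on an (n_samples, n_features) array."""
    X = np.asarray(X, dtype=float)
    half = window // 2
    # Assumption: pad along the feature axis so every feature gets a full window.
    Xp = np.pad(X, ((0, 0), (half, half)), mode=pad_mode)
    # Shape (n_samples, n_features, 2 * half + 1): one window per feature.
    windows = sliding_window_view(Xp, 2 * half + 1, axis=1)
    mean_w = windows.mean(axis=-1)
    std_w = windows.std(axis=-1)
    std_w = np.where(std_w == 0, 1.0, std_w)  # std_w == 0 -> divide by 1
    return (X - mean_w) / std_w

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 40))
X_lsnv = local_snv(X, window=5)
```

Away from the edges the result matches the direct per-feature computation; near the edges it depends on the padding mode.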
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
- class nirs4all.operators.transforms.LocalWavelengthWarp(apply_on='samples', random_state=None, *, copy=True, n_control_points: int = 5, max_shift: float = 1.0, lambda_axis: ndarray | None = None)[source]
Bases:
Augmenter
Applies a non-linear warp to the wavelength axis.
Optimized implementation with pre-computed control points.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.LogTransform(base: float = 2.718281828459045, offset: float = 0.0, auto_offset: bool = True, min_value: float = 1e-08, *, copy: bool = True)[source]
Bases:
TransformerMixin, BaseEstimator
Elementwise logarithm with automatic handling of edge cases.
- Parameters:
base (float, default=np.e) – Logarithm base.
offset (float, default=0.0) – Fixed value added before log to handle non-positives.
auto_offset (bool, default=True) – If True, automatically add offset to handle zeros/negatives.
min_value (float, default=1e-8) – Minimum value after offset when auto_offset=True.
copy (bool, default=True) – Whether to copy input.
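One plausible reading of how these parameters interact is sketched below. The exact rule used by LogTransform is not stated here, so the shift logic is an assumption: the fixed `offset` is added first, and when `auto_offset` is True the data are shifted further so that the smallest value equals `min_value`. The function name is hypothetical.

```python
import numpy as np

def log_transform(X, base=np.e, offset=0.0, auto_offset=True, min_value=1e-8):
    """Hypothetical sketch of an elementwise log with offset handling."""
    X = np.asarray(X, dtype=float) + offset
    if auto_offset:
        x_min = X.min()
        if x_min < min_value:
            # Assumed rule: shift so the smallest value becomes min_value.
            X = X + (min_value - x_min)
    return np.log(X) / np.log(base)  # change of base

decades = log_transform([[1.0, 10.0, 100.0]], base=10)
shifted = log_transform([[-1.0, 0.0, 1.0]])
```

With base 10 and strictly positive input no shift is applied, so the first call returns the decades 0, 1 and 2.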
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') LogTransform
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.MCUVE(n_components: int = 10, n_iterations: int = 100, subset_ratio: float = 0.8, n_noise_variables: int | None = None, threshold_method: Literal['percentile', 'fixed', 'auto'] = 'auto', threshold_percentile: float = 99, threshold_value: float = 2.0, random_state: int | None = None)[source]
Bases:
TransformerMixin, BaseEstimator
Monte-Carlo Uninformative Variable Elimination (MC-UVE) for wavelength selection.
MC-UVE identifies uninformative variables by comparing the stability of regression coefficients between real variables and random noise variables. Variables with low stability (similar to noise) are eliminated.
The method works by:
1. Augmenting X with noise variables (same distribution as X)
2. Performing multiple PLS fits on bootstrap samples
3. Calculating stability (mean/std) of regression coefficients
4. Selecting variables with stability significantly higher than noise
- Parameters:
n_components (int, default=10) – Number of PLS components for the internal PLS model.
n_iterations (int, default=100) – Number of Monte-Carlo iterations (bootstrap samples).
subset_ratio (float, default=0.8) – Ratio of samples to use in each bootstrap iteration.
n_noise_variables (int or None, default=None) – Number of noise variables to add. If None, uses n_features.
threshold_method ({'percentile', 'fixed', 'auto'}, default='auto') – Method to determine selection threshold:
- ‘percentile’: Use percentile of noise stability as threshold
- ‘fixed’: Use fixed stability threshold
- ‘auto’: Automatically select based on noise distribution
threshold_percentile (float, default=99) – Percentile of noise stability used as threshold (for ‘percentile’ method).
threshold_value (float, default=2.0) – Fixed stability threshold value (for ‘fixed’ method).
random_state (int or None, default=None) – Random seed for reproducibility.
- selection_mask_
Boolean mask indicating selected features.
- Type:
ndarray of shape (n_features,)
- stability_
Stability values for each real variable.
- Type:
ndarray of shape (n_features,)
- mean_coefs_
Mean regression coefficients across iterations.
- Type:
ndarray of shape (n_features,)
- std_coefs_
Standard deviation of coefficients across iterations.
- Type:
ndarray of shape (n_features,)
Examples
>>> from nirs4all.operators.transforms import MCUVE
>>> import numpy as np
>>>
>>> # Spectral data with 200 wavelengths
>>> X = np.random.randn(100, 200)
>>> y = np.random.randn(100)
>>>
>>> # Select informative wavelengths
>>> mcuve = MCUVE(n_components=10, n_iterations=100)
>>> mcuve.fit(X, y)
>>> X_selected = mcuve.transform(X)
>>> print(f"Selected {X_selected.shape[1]} from {X.shape[1]} wavelengths")
References
Cai, W., Li, Y., & Shao, X. (2008). A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemometrics and Intelligent Laboratory Systems, 90(2), 188-194.
Notes
MC-UVE is robust against random noise
Higher stability indicates more informative variables
The noise comparison ensures a principled selection threshold
- fit(X, y=None, wavelengths: ndarray | None = None)[source]
Fit the MC-UVE selector to identify important wavelengths.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, 1)) – Target values. Required for MC-UVE.
wavelengths (array-like of shape (n_features,), optional) – Original wavelength grid. Stored for reference but not required.
- Returns:
self – Fitted selector.
- Return type:
- get_feature_names_out(input_features=None)[source]
Get output feature names (selected wavelengths as strings).
- get_support(indices: bool = False)[source]
Get a mask or indices of selected features.
- Parameters:
indices (bool, default=False) – If True, return indices instead of boolean mask.
- Returns:
support – Boolean mask or indices of selected features.
- Return type:
ndarray
- set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') MCUVE
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.MixupAugmenter(apply_on='samples', random_state=None, *, copy=True, alpha: float = 0.2)[source]
Bases:
Augmenter
Mixup augmentation.
Note: This modifies both X and y. Standard transform() only returns X.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
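Because mixup blends targets as well as spectra, a standalone sketch makes the X and y coupling explicit. The version below follows the standard mixup formulation and is an assumption about what the class does internally: the function name is hypothetical, the mixing weight is drawn from Beta(alpha, alpha), and each sample is paired with a randomly permuted partner.

```python
import numpy as np

def mixup(X, y, alpha=0.2, random_state=None):
    """Hypothetical sketch of standard mixup on 2D X and 1D y."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples = X.shape[0]
    lam = rng.beta(alpha, alpha, size=n_samples)  # one mixing weight per sample
    partner = rng.permutation(n_samples)          # random partner for each sample
    X_mix = lam[:, None] * X + (1.0 - lam[:, None]) * X[partner]
    y_mix = lam * y + (1.0 - lam) * y[partner]
    return X_mix, y_mix

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))
y = rng.normal(size=10)
X_mix, y_mix = mixup(X, y, alpha=0.2, random_state=1)
```

Every mixed value is a convex combination of two originals, so it stays within the range spanned by the input data.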
- class nirs4all.operators.transforms.ModPoly(poly_order: int = 5, max_iter: int = 250, tol: float = 0.001, *, copy: bool = True)[source]
Bases:
_BaselineMethodAlias
Modified Polynomial baseline correction.
- Parameters:
References
Lieber, C.A. and Mahadevan-Jansen, A. (2003). Automated method for subtraction of fluorescence from biological Raman spectra. Applied Spectroscopy, 57(11), 1363-1367.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ModPoly
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.MultiplicativeNoise(apply_on='samples', random_state=None, *, copy=True, sigma_gain: float = 0.05, per_wavelength: bool = False)[source]
Bases:
Augmenter
Multiplies spectra by a random gain factor: X_aug = (1 + epsilon) * X
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
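The gain model X_aug = (1 + epsilon) * X can be sketched as below. This is not the class's code: the function name is hypothetical, epsilon is assumed to be drawn from a zero-mean normal distribution with standard deviation `sigma_gain`, and `per_wavelength` is assumed to switch between one gain per sample and one gain per value.

```python
import numpy as np

def multiplicative_noise(X, sigma_gain=0.05, per_wavelength=False, random_state=None):
    """Hypothetical sketch: scale spectra by a random gain (1 + epsilon)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    # Assumption: one epsilon per sample, or one per value when per_wavelength=True.
    shape = X.shape if per_wavelength else (X.shape[0], 1)
    epsilon = rng.normal(0.0, sigma_gain, size=shape)
    return (1.0 + epsilon) * X

X = np.ones((5, 10))
X_gain = multiplicative_noise(X, random_state=0)
X_gain_pw = multiplicative_noise(X, per_wavelength=True, random_state=0)
```

With a per-sample gain every row of a flat input stays flat, whereas the per-wavelength variant perturbs each value independently.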
- class nirs4all.operators.transforms.MultiplicativeScatterCorrection(scale=True, *, copy=True)[source]
Bases:
TransformerMixin,BaseEstimator
- class nirs4all.operators.transforms.Normalize(feature_range=(-1, 1), *, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Normalize spectra using either a custom range or linalg normalization.
- Parameters:
feature_range (tuple (min, max), default=(-1, 1)) – Desired range of transformed data. If both min and max equal -1, linalg normalization is applied; otherwise the data are scaled to the user-defined range.
copy (bool, default=True) – Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
- fit(X, y=None)[source]
Fit the Normalize transformer on the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- inverse_transform(X)[source]
Transform the normalized data back to the original representation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The normalized data to be transformed back.
- Returns:
X – The inverse transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- partial_fit(X, y=None)[source]
Perform incremental fit on the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- transform(X)[source]
Transform the input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be transformed.
- Returns:
X – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.transforms.PercentToFraction[source]
Bases:
TransformerMixin, BaseEstimator
Convert percentage values to fractional [0, 1] range.
Simply divides by 100.
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms import PercentToFraction
>>> transformer = PercentToFraction()
>>> X_pct = np.array([[50, 60], [70, 80]])
>>> X_frac = transformer.fit_transform(X_pct)
>>> # X_frac = [[0.5, 0.6], [0.7, 0.8]]
- class nirs4all.operators.transforms.PolynomialBaselineDrift(apply_on='samples', random_state=None, *, copy=True, degree: int = 3, coeff_ranges: List[Tuple[float, float]] | None = None, lambda_axis: ndarray | None = None)[source]
Bases:
Augmenter
Adds a polynomial baseline drift.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.PyBaselineCorrection(method: str = 'asls', *, copy: bool = True, **method_params)[source]
Bases:
TransformerMixin, BaseEstimator
General baseline correction using the pybaselines library.
A flexible wrapper for the pybaselines library that provides access to numerous baseline correction algorithms. This transformer allows easy integration of any pybaselines method into sklearn pipelines.
- Parameters:
method (str, default='asls') –
The baseline correction method to use. Available methods by category:
- Whittaker-based (smooth baselines with asymmetric weighting):
’asls’: Asymmetric Least Squares
’iasls’: Improved Asymmetric Least Squares
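’airpls’: Adaptive Iteratively Reweighted Penalized Least Squares
’arpls’: Asymmetrically Reweighted Penalized Least Squares
’drpls’: Doubly Reweighted Penalized Least Squares
’iarpls’: Improved Asymmetrically Reweighted Penalized Least Squares
’aspls’: Adaptive Smoothness Penalized Least Squares
’psalsa’: Peaked Signal’s Asymmetric Least Squares Algorithm
’derpsalsa’: Derivative Peak-Screening Asymmetric Least Squares Algorithm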
- Polynomial (polynomial fitting):
’poly’: Regular polynomial
’modpoly’: Modified polynomial
’imodpoly’: Improved modified polynomial
’penalized_poly’: Penalized polynomial
’loess’: Locally estimated scatterplot smoothing
’quant_reg’: Quantile regression
- Morphological (morphological operations):
’mor’: Morphological
’imor’: Improved morphological
’mormol’: Morphological and mollified
’amormol’: Averaging morphological and mollified
’rolling_ball’: Rolling ball algorithm
’mwmv’: Moving window minimum value
’tophat’: Top-hat transform
’mpspline’: Morphological penalized spline
’jbcd’: Joint baseline correction and denoising
- Spline (spline-based methods):
’mixture_model’: Mixture model
’irsqr’: Iteratively reweighted spline quantile regression
’corner_cutting’: Corner-cutting
’pspline_asls’, ‘pspline_iasls’, ‘pspline_airpls’, etc.
- Smooth (smoothing-based):
’noise_median’: Noise median
’snip’: Statistics-sensitive Non-linear Iterative Peak-clipping
’swima’: Small-Window Moving Average
’ipsa’: Iterative Polynomial Smoothing Algorithm
- Misc:
’beads’: Baseline estimation and denoising with sparsity
’interp_pts’: Interpolation between points
copy (bool, default=True) – Whether to copy input data.
**method_params (dict) – Additional parameters passed to the specific baseline method. Common parameters include:
- lam (float): Smoothness parameter for Whittaker methods
- p (float): Asymmetry parameter for ASLS-type methods
- poly_order (int): Polynomial order for polynomial methods
- max_half_window (int): Window size for morphological/smooth methods
- max_iter (int): Maximum iterations
- tol (float): Convergence tolerance
Examples
>>> from nirs4all.operators.transforms.nirs import PyBaselineCorrection
>>> import numpy as np
Basic usage with ASLS:
>>> transformer = PyBaselineCorrection(method='asls', lam=1e6, p=0.01)
>>> corrected = transformer.fit_transform(spectra)
Using airPLS:
>>> transformer = PyBaselineCorrection(method='airpls', lam=1e5)
>>> corrected = transformer.fit_transform(spectra)
Using improved modified polynomial:
>>> transformer = PyBaselineCorrection(method='imodpoly', poly_order=3)
>>> corrected = transformer.fit_transform(spectra)
Using SNIP for Raman-like data:
>>> transformer = PyBaselineCorrection(method='snip', max_half_window=40)
>>> corrected = transformer.fit_transform(spectra)
Using rolling ball:
>>> transformer = PyBaselineCorrection(method='rolling_ball', half_window=50)
>>> corrected = transformer.fit_transform(spectra)
In a pipeline:
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> pipeline = Pipeline([
...     ('baseline', PyBaselineCorrection(method='airpls', lam=1e5)),
...     ('scale', StandardScaler()),
... ])
References
pybaselines documentation: https://pybaselines.readthedocs.io/
- fit(X, y=None)[source]
Fit the transformer (validates method and stores number of features).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
- static list_methods()[source]
List all available baseline correction methods.
- Returns:
Dictionary with method categories as keys and list of methods as values.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') PyBaselineCorrection
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Apply baseline correction to the data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectra.
copy (bool or None, optional) – Whether to copy the input data.
- Returns:
X_corrected – Baseline-corrected spectra.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.transforms.Random_X_Operation(apply_on='global', random_state=None, *, copy=True, operator_func=<built-in function mul>, operator_range=(0.97, 1.03))[source]
Bases:
Augmenter
Applies a random operation to the data for augmentation.
- Parameters:
apply_on (str, optional) – Apply augmentation on “features” or “samples” data. Default is “global”, per the class signature.
random_state (int or None, optional) – Random seed for reproducibility. Default is None.
copy (bool, optional) – If True, creates a copy of the input data. Default is True.
operator_func (function, optional) – Operator function to be applied. Default is operator.mul.
operator_range (tuple, optional) – Range for generating random values for the operator. Default is (0.97, 1.03).
- class nirs4all.operators.transforms.RangeDiscretizer(bins)[source]
Bases:
BaseEstimator,TransformerMixin
- class nirs4all.operators.transforms.ReflectanceToAbsorbance(min_value: float = 1e-08, percent: bool = False, *, copy: bool = True)[source]
Bases:
TransformerMixin, BaseEstimator
Convert reflectance spectra to absorbance using the Beer-Lambert law.
Applies the transformation: A = -log10(R) = log10(1/R) where R is reflectance and A is absorbance.
This is a fundamental transformation in NIR spectroscopy, as absorbance is linearly related to concentration (Beer-Lambert law), while reflectance is not.
- Parameters:
min_value (float, default=1e-8) – Minimum value to clamp reflectance to avoid log(0). Values below this threshold will be set to min_value before applying the log transform.
percent (bool, default=False) – If True, assumes input reflectance is in percentage (0-100) and divides by 100 before conversion.
copy (bool, default=True) – Whether to copy input data.
Notes
Input reflectance values should be positive.
For reflectance in range (0, 1], output absorbance is non-negative.
For reflectance > 1 (e.g., percentage values), set percent=True.
Examples
>>> from nirs4all.operators.transforms.nirs import ReflectanceToAbsorbance
>>> import numpy as np
>>> R = np.array([[0.5, 0.25, 0.1], [0.8, 0.4, 0.2]])
>>> transformer = ReflectanceToAbsorbance()
>>> A = transformer.fit_transform(R)
>>> # A ≈ [[0.301, 0.602, 1.0], [0.097, 0.398, 0.699]]
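The conversion A = -log10(R) and its inverse R = 10^(-A) can be reproduced in plain NumPy to verify the numbers in the example. The two function names below are hypothetical, and the clamping and percent handling follow the parameter descriptions rather than the actual source.

```python
import numpy as np

def reflectance_to_absorbance(R, min_value=1e-8, percent=False):
    """Hypothetical sketch: A = -log10(R), with R clamped at min_value."""
    R = np.asarray(R, dtype=float)
    if percent:
        R = R / 100.0  # percentage input is rescaled to the (0, 1] range
    return -np.log10(np.clip(R, min_value, None))

def absorbance_to_reflectance(A, percent=False):
    """Hypothetical inverse: R = 10 ** (-A)."""
    R = 10.0 ** (-np.asarray(A, dtype=float))
    return R * 100.0 if percent else R

R = np.array([[0.5, 0.25, 0.1], [0.8, 0.4, 0.2]])
A = reflectance_to_absorbance(R)
R_back = absorbance_to_reflectance(A)
```

The round trip recovers R exactly only for values that were not clamped at `min_value`.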
- fit(X, y=None)[source]
Fit the transformer (no-op, included for API compatibility).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Reflectance spectra.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
- inverse_transform(X)[source]
Convert absorbance back to reflectance.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Absorbance spectra.
- Returns:
X_reflectance – Reflectance spectra.
- Return type:
ndarray of shape (n_samples, n_features)
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') ReflectanceToAbsorbance
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Convert reflectance to absorbance.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Reflectance spectra.
copy (bool or None, optional) – Whether to copy the input data.
- Returns:
X_transformed – Absorbance spectra.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.transforms.ResampleTransformer(num_samples: int)[source]
Bases:
BaseEstimator,TransformerMixin
- class nirs4all.operators.transforms.Resampler(target_wavelengths: ndarray, method: Literal['linear', 'nearest', 'cubic', 'quadratic', 'slinear', 'zero'] = 'linear', crop_range: Tuple[float, float] | None = None, fill_value: float | str = 0.0, bounds_error: bool = False, copy: bool = True)[source]
Bases:
TransformerMixin, BaseEstimator
Resample spectral data to a new wavelength grid using interpolation.
This transformer interpolates NIRS spectral data from the original wavelength grid to a target wavelength grid using scipy interpolation methods.
- Parameters:
target_wavelengths (array-like) – Target wavelengths for resampling. Must be 1D array.
method (str, default='linear') – Interpolation method. Supported methods:
- ‘linear’: Linear interpolation
- ‘nearest’: Nearest neighbor interpolation
- ‘cubic’: Cubic spline interpolation
- ‘quadratic’: Quadratic spline interpolation
- ‘slinear’: Linear spline (order 1)
- ‘zero’: Zero-order spline (piecewise constant)
Future: May support additional scipy methods
crop_range (tuple of (float, float) or None, default=None) – Optional (min_wavelength, max_wavelength) to crop original data before resampling.
fill_value (float or 'extrapolate', default=0.0) – Value to use for target wavelengths outside the original range.
- float: Use this constant value for extrapolation
- ‘extrapolate’: Extrapolate using the interpolation method
- 0.0: Default padding with zeros (safe choice)
bounds_error (bool, default=False) – If True, raise error when target wavelengths are outside original range. If False, use fill_value for out-of-bounds points.
copy (bool, default=True) – Whether to copy input data or modify in place.
- original_wavelengths_
Original wavelength grid from fit data
- Type:
ndarray of shape (n_features,)
Examples
>>> from nirs4all.operators.transforms import Resampler
>>> import numpy as np
>>>
>>> # Original data at 1000-2500 nm with 200 points
>>> X = np.random.randn(100, 200)
>>> original_wl = np.linspace(1000, 2500, 200)
>>>
>>> # Resample to 100 evenly-spaced wavelengths
>>> target_wl = np.linspace(1000, 2500, 100)
>>> resampler = Resampler(target_wavelengths=target_wl, method='cubic')
>>> resampler.fit(X, wavelengths=original_wl)
>>> X_resampled = resampler.transform(X)
>>> X_resampled.shape
(100, 100)
Notes
Wavelengths must be strictly increasing
Warns if target wavelengths extend beyond original range
Raises error if no wavelengths overlap between original and target
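To make the resampling behaviour concrete, the sketch below covers only the 'linear' case with a constant `fill_value`, using `np.interp` as a stand-in for the scipy interpolation the class description mentions. The function name is hypothetical, and the strictly-increasing check mirrors the first note above.

```python
import numpy as np

def resample_linear(X, original_wl, target_wl, fill_value=0.0):
    """Hypothetical sketch: row-wise linear interpolation onto target_wl."""
    X = np.asarray(X, dtype=float)
    original_wl = np.asarray(original_wl, dtype=float)
    target_wl = np.asarray(target_wl, dtype=float)
    if np.any(np.diff(original_wl) <= 0):
        raise ValueError("original wavelengths must be strictly increasing")
    # Targets outside the original range receive fill_value on both sides.
    return np.vstack([
        np.interp(target_wl, original_wl, row, left=fill_value, right=fill_value)
        for row in X
    ])

original_wl = np.linspace(1000, 2500, 200)
X = np.vstack([2.0 * original_wl + 1.0, -original_wl])  # two linear "spectra"
target_wl = np.array([900.0, 1000.0, 1750.0, 2500.0, 2600.0])
X_resampled = resample_linear(X, original_wl, target_wl)
```

Linear input is reproduced exactly inside the original range, and the two out-of-range targets (900 and 2600) are filled with zeros.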
- fit(X, y=None, wavelengths: ndarray | None = None)[source]
Fit the resampler by storing original wavelength grid.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (None) – Ignored. Present for API consistency.
wavelengths (array-like of shape (n_features,), optional) – Original wavelength grid. If None, will be extracted from dataset headers by the controller.
- Returns:
self – Fitted resampler.
- Return type:
- get_feature_names_out(input_features=None)[source]
Get output feature names (target wavelengths as strings).
- set_fit_request(*, wavelengths: bool | None | str = '$UNCHANGED$') Resampler
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X)[source]
Resample spectral data to target wavelength grid.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Spectral data to resample. Should have same number of features as training data.
- Returns:
X_resampled – Resampled spectral data.
- Return type:
ndarray of shape (n_samples, n_features_out_)
- class nirs4all.operators.transforms.RobustStandardNormalVariate(axis=1, with_center=True, with_scale=True, k=1.4826, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Robust Standard Normal Variate (RSNV).
Per-sample robust centering and scaling using median and MAD:
med = median(X, axis=1, keepdims=True)
mad = median(|X - med|, axis=1, keepdims=True)
X’ = (X - med) / (k * mad)
- Parameters:
axis (int, default=1) – 1 for row-wise (spectroscopy default). 0 for column-wise.
with_center (bool, default=True) – If True, subtract median.
with_scale (bool, default=True) – If True, divide by k * MAD.
k (float, default=1.4826) – Consistency constant to make MAD a robust estimator of std for Gaussian data.
copy (bool, default=True) – If False, try in-place.
Notes
MAD==0 → divide by 1 to avoid NaN.
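The median and MAD formulas translate almost directly into NumPy. The sketch below covers the row-wise case (axis=1) only; the function name is hypothetical, and treating MAD == 0 as a scale of 1 follows the note above.

```python
import numpy as np

def robust_snv(X, k=1.4826, with_center=True, with_scale=True):
    """Hypothetical sketch of row-wise RSNV using median and MAD."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=1, keepdims=True)
    mad = np.median(np.abs(X - med), axis=1, keepdims=True)
    out = X - med if with_center else X.copy()
    if with_scale:
        scale = np.where(mad == 0, 1.0, k * mad)  # MAD == 0 -> divide by 1
        out = out / scale
    return out

X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [7.0, 7.0, 7.0, 7.0, 7.0]])
X_rsnv = robust_snv(X)
```

For the first row the median is 3 and the MAD is 1, so the result is (X - 3) / 1.4826; the constant second row maps to zeros.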
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
- class nirs4all.operators.transforms.RollingBall(half_window: int = 50, smooth_half_window: int = None, *, copy: bool = True)[source]
Bases:
_BaselineMethodAlias
Rolling Ball baseline correction.
A morphological approach that simulates rolling a ball beneath the spectrum.
- Parameters:
References
Kneen, M.A. and Annegarn, H.J. (1996). Algorithm for fitting XRF, SEM and PIXE X-ray spectra backgrounds. Nuclear Instruments and Methods in Physics Research B, 109, 209-213.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') RollingBall
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.transforms.Rotate_Translate(apply_on='samples', random_state=None, *, copy=True, p_range=2, y_factor=3)[source]
Bases:
Augmenter
Augmentation that rotates and translates the signal.
Vectorized implementation that processes all samples in batch.
- Parameters:
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
random_state (int or None, optional) – Random seed for reproducibility. Default is None.
copy (bool, optional) – If True, creates a copy of the input data. Default is True.
p_range (int, optional) – Range for generating random slope values. Default is 2.
y_factor (int, optional) – Scaling factor for the initial value. Default is 3.
- augment(X, apply_on='samples')[source]
Augment the data by rotating and translating the signal.
Vectorized implementation using NumPy broadcasting.
- Parameters:
X (ndarray) – Input data to be augmented, shape (n_samples, n_features).
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.SNIP(max_half_window: int = 40, decreasing: bool = True, smooth_half_window: int = None, *, copy: bool = True)[source]
Bases: _BaselineMethodAlias
Statistics-sensitive Non-linear Iterative Peak-clipping baseline correction.
Particularly effective for spectra with many peaks (e.g., Raman, XRF).
- Parameters:
max_half_window (int, default=40) – Maximum half-window size for the algorithm.
decreasing (bool, default=True) – Whether to use decreasing window sizes.
smooth_half_window (int or None, default=None) – Half-window for smoothing. None means no smoothing.
copy (bool, default=True) – Whether to copy input data.
References
Ryan, C.G., et al. (1988). SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nuclear Instruments and Methods in Physics Research B, 34(3), 396-402.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SNIP
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
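The peak-clipping step behind SNIP can be sketched in a few lines of NumPy. This is an illustrative version only: `snip_baseline` is a hypothetical name, the optional smoothing is omitted, and no log-log-square-root transform is applied, so it should not be read as the code this class runs. The parameters `max_half_window` and `decreasing` mirror the ones documented above.

```python
import numpy as np


def snip_baseline(y, max_half_window=40, decreasing=True):
    """Illustrative SNIP baseline for a single 1D spectrum.

    For each half-window w, every point is clipped to the mean of its two
    neighbors at distance w whenever that mean is lower than the point.
    """
    z = np.asarray(y, dtype=float).copy()
    n = z.size
    half_windows = range(1, max_half_window + 1)
    if decreasing:
        half_windows = reversed(half_windows)
    for w in half_windows:
        if 2 * w >= n:
            continue  # window does not fit inside the signal
        # Mean of the neighbors at i - w and i + w for every i in [w, n - w).
        neighbor_mean = 0.5 * (z[: n - 2 * w] + z[2 * w:])
        z[w: n - w] = np.minimum(z[w: n - w], neighbor_mean)
    return z


x = np.arange(300, dtype=float)
line = 0.02 * x + 1.0
peak = 5.0 * np.exp(-0.5 * ((x - 150) / 4.0) ** 2)
baseline = snip_baseline(line + peak, max_half_window=40)
corrected = line + peak - baseline
```

Because each step only ever lowers a point, the estimate stays at or below the input, and a purely linear background is left unchanged.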
- class nirs4all.operators.transforms.SavitzkyGolay(window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
A class for smoothing and differentiating data using the Savitzky-Golay filter.
- Parameters:
window_length (int, optional, default=11) – The length of the window used for smoothing.
polyorder (int, optional, default=3) – The order of the polynomial used for fitting the samples within the window.
deriv (int, optional, default=0) – The order of the derivative to compute.
delta (float, optional, default=1.0) – The sampling distance of the data.
copy (bool, optional, default=True) – Whether to copy the input data.
- Methods:
fit(X, y=None) – Fits the transformer to the data X.
transform(X, copy=None) – Applies the Savitzky-Golay filter to the data X.
- fit(X, y=None)[source]
Verify the X data compliance with Savitzky-Golay filter.
- Parameters:
X (array-like) – The data to transform.
y (None) – Ignored.
- Raises:
ValueError – If the input X is a sparse matrix.
- Returns:
The fitted object.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SavitzkyGolay
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
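To make the roles of window_length, polyorder, deriv and delta in SavitzkyGolay concrete, the filter can be written as a local least-squares polynomial fit. The sketch below is a NumPy-only illustration under stated assumptions: the function names are hypothetical, window_length is assumed odd, and the first and last `window_length // 2` points are left as NaN rather than extrapolated, which may differ from how the class treats edges.

```python
import numpy as np
from math import factorial


def savgol_coeffs_sketch(window_length=11, polyorder=3, deriv=0, delta=1.0):
    """Filter weights from a least-squares polynomial fit over one window."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Column k of the Vandermonde matrix holds offsets**k.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse estimates the coefficient of t**deriv;
    # multiplying by deriv! gives the derivative of the fit at the window center.
    return np.linalg.pinv(A)[deriv] * factorial(deriv) / delta ** deriv


def savgol_sketch(X, window_length=11, polyorder=3, deriv=0, delta=1.0):
    """Apply the weights to each row of X; edge points stay NaN."""
    X = np.asarray(X, dtype=float)
    coeffs = savgol_coeffs_sketch(window_length, polyorder, deriv, delta)
    half = window_length // 2
    out = np.full_like(X, np.nan)
    for i, row in enumerate(X):
        # np.convolve flips its kernel, so reverse the weights to correlate.
        out[i, half:-half] = np.convolve(row, coeffs[::-1], mode="valid")
    return out


t = np.arange(30, dtype=float)
X = (0.01 * t**3 - 0.2 * t**2 + t)[None, :]
smoothed = savgol_sketch(X, window_length=11, polyorder=3)
first_deriv = savgol_sketch(X, window_length=11, polyorder=3, deriv=1)
```

A cubic input is reproduced exactly in the interior when polyorder=3, which is a convenient sanity check for any Savitzky-Golay implementation.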
- class nirs4all.operators.transforms.ScatterSimulationMSC(apply_on='samples', random_state=None, *, copy=True, reference_mode: str = 'self', a_range: Tuple[float, float] = (-0.1, 0.1), b_range: Tuple[float, float] = (0.9, 1.1))[source]
Bases: Augmenter
Simulates scatter variation: x_aug = a + b * x
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
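The relation x_aug = a + b * x given for ScatterSimulationMSC can be sketched as one additive offset and one multiplicative gain drawn per sample. This is an illustration only: `simulate_scatter` is a hypothetical helper, uniform sampling is an assumption, and the reference_mode option is not modeled.

```python
import numpy as np


def simulate_scatter(X, a_range=(-0.1, 0.1), b_range=(0.9, 1.1), random_state=None):
    """Return a + b * X with one (a, b) pair drawn per sample (row)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Column vectors so that each row gets its own offset and gain.
    a = rng.uniform(*a_range, size=(n_samples, 1))
    b = rng.uniform(*b_range, size=(n_samples, 1))
    return a + b * X, a, b


X = np.linspace(0.2, 0.8, 40).reshape(4, 10)
X_aug, a, b = simulate_scatter(X, random_state=0)
```

Returning a and b alongside the augmented data is a choice made for this sketch so that the transformation can be inverted and inspected.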
- class nirs4all.operators.transforms.SecondDerivative(delta: float = 1.0, edge_order: int = 2, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Second numerical derivative using numpy.gradient.
- Parameters:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SecondDerivative
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
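One plausible reading of "second numerical derivative using numpy.gradient" for SecondDerivative is to apply numpy.gradient twice along the feature axis. The sketch below assumes exactly that; `second_derivative_sketch` is a hypothetical name and the class may be implemented differently.

```python
import numpy as np


def second_derivative_sketch(spectra, delta=1.0, edge_order=2):
    """Differentiate twice along axis 1 (the feature axis)."""
    spectra = np.asarray(spectra, dtype=float)
    first = np.gradient(spectra, delta, axis=1, edge_order=edge_order)
    return np.gradient(first, delta, axis=1, edge_order=edge_order)


t = np.arange(0.0, 10.0, 0.5)
X = np.vstack([t**2, 3.0 * t**2])          # analytic second derivatives: 2 and 6
d2 = second_derivative_sketch(X, delta=0.5)
```

With edge_order=2 the boundary estimates are second-order accurate, so a quadratic input yields a constant second derivative over the whole axis.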
- class nirs4all.operators.transforms.SignalTypeConverter(source_type: str | SignalType = 'reflectance', target_type: str | SignalType = 'absorbance', epsilon: float = 1e-10)[source]
Bases: TransformerMixin, BaseEstimator
General-purpose signal type converter.
Automatically determines the conversion path between source and target signal types and applies the appropriate transformation.
- Parameters:
source_type (str or SignalType) – Input signal type
target_type (str or SignalType) – Output signal type
epsilon (float, default=1e-10) – Small value to avoid numerical issues
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import SignalTypeConverter
>>> converter = SignalTypeConverter(
...     source_type="reflectance%",
...     target_type="absorbance"
... )
>>> R_pct = np.array([[50, 40], [60, 50]])
>>> A = converter.fit_transform(R_pct)
- class nirs4all.operators.transforms.SimpleScale(copy=True)[source]
Bases: TransformerMixin, BaseEstimator
- class nirs4all.operators.transforms.SmoothMagnitudeWarp(apply_on='samples', random_state=None, *, copy=True, n_control_points: int = 5, gain_range: Tuple[float, float] = (0.9, 1.1), lambda_axis: ndarray | None = None)[source]
Bases: Augmenter
Multiplies the spectrum by a smooth curve.
Optimized implementation with pre-computed control points.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.SpikeNoise(apply_on='samples', random_state=None, *, copy=True, n_spikes_range: Tuple[int, int] = (1, 3), amplitude_range: Tuple[float, float] = (-0.5, 0.5))[source]
Bases: Augmenter
Adds spikes to the spectrum.
Optimized with pre-generated random parameters.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.Spline_Curve_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]
Bases: Augmenter
Class to simplify a 1D signal using B-spline interpolation along the curve.
Optimized implementation with pre-allocated output arrays.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.
uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.
- augment(X, apply_on='samples')[source]
Select regularly spaced points on the x-axis and adjust a spline.
Optimized with pre-allocated output array.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “features” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.Spline_Smoothing(apply_on='samples', random_state=None, *, copy=True)[source]
Bases: Augmenter
Class to apply a smoothing spline to a 1D signal.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- augment(X, apply_on='samples')[source]
Apply a smoothing spline to the data.
Optimized implementation with pre-allocated output array.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.Spline_X_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_degree=3, perturbation_density=0.05, perturbation_range=(-10, 10))[source]
Bases: Augmenter
Class to apply a perturbation to a 1D signal using B-spline interpolation.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_degree (int, optional) – Degree of the spline. Default is 3 (cubic).
perturbation_density (float, optional) – Density of perturbation points relative to data size. Default is 0.05.
perturbation_range (tuple, optional) – Range of perturbation values (min, max). Default is (-10, 10).
- augment(X, apply_on='samples')[source]
Augment the data with a perturbation using B-spline interpolation.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data to be augmented.
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.Spline_X_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]
Bases: Augmenter
Class to simplify a 1D signal using B-spline interpolation along the x-axis.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.
uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.
- augment(X, apply_on='samples')[source]
Select randomly spaced points along the x-axis and adjust a spline.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.Spline_Y_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_points=None, perturbation_intensity=0.005)[source]
Bases: Augmenter
Augment the data with a perturbation on the y-axis using B-spline interpolation.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points. Default is None (uses sample length / 2).
perturbation_intensity (float, optional) – Intensity of perturbation relative to max value. Default is 0.005.
- augment(X, apply_on='samples')[source]
Augment the data with a perturbation on the y-axis using B-spline interpolation.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data to be augmented.
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.transforms.StandardNormalVariate(axis=1, with_mean=True, with_std=True, ddof=0, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Standard Normal Variate (SNV) transformation.
SNV is a row-wise normalization technique commonly used in spectroscopy to remove scatter effects. Each sample (row) is centered and scaled independently.
For each sample: SNV = (X - mean(X)) / std(X)
- Parameters:
axis (int, default=1) – Axis along which to compute mean and standard deviation.
- axis=1: Row-wise (default, standard SNV behavior for spectroscopy).
- axis=0: Column-wise (equivalent to StandardScaler).
with_mean (bool, default=True) – If True, center the data before scaling.
with_std (bool, default=True) – If True, scale the data to unit variance.
ddof (int, default=0) – Delta Degrees of Freedom for standard deviation calculation.
copy (bool, default=True) – If False, try to avoid a copy and do inplace scaling instead.
Examples
>>> from nirs4all.operators.transforms import StandardNormalVariate
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> snv = StandardNormalVariate()
>>> X_transformed = snv.fit_transform(X)
- fit(X, y=None)[source]
Fit the StandardNormalVariate transformer.
For SNV, this is a no-op as the transformation is computed independently for each sample.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
y (None) – Ignored variable.
- Returns:
X_transformed – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- transform(X)[source]
Perform SNV transformation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be transformed.
- Returns:
X_transformed – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
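The formula SNV = (X - mean(X)) / std(X) documented for StandardNormalVariate can be checked by hand with plain NumPy. The sketch below covers only the row-wise case (axis=1); `snv_sketch` is a hypothetical name, and rows with zero standard deviation are not handled.

```python
import numpy as np


def snv_sketch(X, with_mean=True, with_std=True, ddof=0):
    """Row-wise Standard Normal Variate: center and scale each spectrum."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    if with_mean:
        out = out - X.mean(axis=1, keepdims=True)
    if with_std:
        out = out / X.std(axis=1, ddof=ddof, keepdims=True)
    return out


X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
X_snv = snv_sketch(X)
# Every row has the same shape after SNV: approximately [-1.2247, 0.0, 1.2247]
```

The three rows differ only by an additive offset, so SNV maps them to the same values, which illustrates how the transform removes per-sample offsets.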
- class nirs4all.operators.transforms.ToAbsorbance(source_type: str | SignalType = 'reflectance', epsilon: float = 1e-10, clip_negative: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Convert reflectance or transmittance to absorbance.
Applies the log transform: A = -log10(X)
For reflectance, this gives “pseudo-absorbance” which is widely used in NIR but is not identical to true absorbance in transmission.
- Parameters:
source_type (str or SignalType) – Input signal type. If “auto”, attempts to detect. Valid: “reflectance”, “reflectance%”, “transmittance”, “transmittance%”
epsilon (float, default=1e-10) – Small value to add to avoid log(0)
clip_negative (bool, default=True) – If True, clips negative values to epsilon before log transform
- source_type_
Detected or specified source signal type
- Type:
Examples
>>> import numpy as np
>>> from nirs4all.operators.transforms.signal_conversion import ToAbsorbance
>>> transformer = ToAbsorbance(source_type="reflectance")
>>> R = np.array([[0.5, 0.4, 0.3], [0.6, 0.5, 0.4]])
>>> A = transformer.fit_transform(R)
>>> # A ≈ [[0.301, 0.398, 0.523], [0.222, 0.301, 0.398]]
- fit(X, y=None)[source]
Fit the transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectral data
y (None) – Ignored
- Return type:
self
- inverse_transform(X, y=None)[source]
Convert absorbance back to reflectance/transmittance.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Absorbance values
- Returns:
X_original – Reflectance or transmittance values
- Return type:
ndarray of shape (n_samples, n_features)
- transform(X, y=None)[source]
Transform reflectance/transmittance to absorbance.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectral data
- Returns:
X_transformed – Absorbance values
- Return type:
ndarray of shape (n_samples, n_features)
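The conversion A = -log10(X) used by ToAbsorbance, and its inverse X = 10^(-A), can be written out directly. The sketch below is illustrative: the function names and the `percent` flag are hypothetical, dividing percent inputs by 100 is an assumption, and clipping to epsilon is one way to read the epsilon and clip_negative parameters described above.

```python
import numpy as np


def to_absorbance_sketch(X, percent=False, epsilon=1e-10, clip_negative=True):
    """Convert reflectance or transmittance to (pseudo-)absorbance."""
    X = np.asarray(X, dtype=float)
    if percent:
        X = X / 100.0  # assumption: percent inputs are rescaled to fractions
    if clip_negative:
        X = np.clip(X, epsilon, None)  # keeps the argument of log10 positive
    return -np.log10(X)


def from_absorbance_sketch(A, percent=False):
    """Inverse conversion: absorbance back to reflectance or transmittance."""
    R = 10.0 ** (-np.asarray(A, dtype=float))
    return R * 100.0 if percent else R


R = np.array([[0.5, 0.4, 0.3], [0.6, 0.5, 0.4]])
A = to_absorbance_sketch(R)
R_back = from_absorbance_sketch(A)
```

For inputs that are already positive, the round trip recovers the original values, which matches the purpose of inverse_transform.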
- class nirs4all.operators.transforms.UnsharpSpectralMask(apply_on='samples', random_state=None, *, copy=True, amount_range: Tuple[float, float] = (0.1, 0.5), sigma: float = 1.0, kernel_width: int = 11)[source]
Bases: Augmenter
Applies unsharp masking (sharpening). X_aug = X + k * (X - smooth(X))
Vectorized implementation using batch convolution.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
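The formula X_aug = X + k * (X - smooth(X)) given for UnsharpSpectralMask can be sketched with a Gaussian smoothing kernel. This is an illustration under assumptions: the function names are hypothetical, kernel_width is assumed odd, edge padding is a choice made here, and a fixed `amount` stands in for a value that the class presumably draws from amount_range.

```python
import numpy as np


def gaussian_kernel(kernel_width=11, sigma=1.0):
    """Normalized Gaussian weights centered on the middle of the window."""
    offsets = np.arange(kernel_width) - (kernel_width - 1) / 2.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def unsharp_mask_sketch(X, amount=0.3, sigma=1.0, kernel_width=11):
    """Sharpen each spectrum: X + amount * (X - smooth(X))."""
    X = np.asarray(X, dtype=float)
    kernel = gaussian_kernel(kernel_width, sigma)
    half = kernel_width // 2
    # Pad with edge values so the smoothed rows keep the original length.
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    smooth = np.stack([np.convolve(row, kernel, mode="valid") for row in padded])
    return X + amount * (X - smooth)


t = np.arange(40, dtype=float)
peak = np.exp(-0.5 * ((t - 20) / 2.0) ** 2)[None, :]
sharpened = unsharp_mask_sketch(peak, amount=0.5)
```

Flat regions are left unchanged because X equals smooth(X) there, while peak maxima are amplified.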
- class nirs4all.operators.transforms.WavelengthShift(apply_on='samples', random_state=None, *, copy=True, shift_range: Tuple[float, float] = (-2.0, 2.0), lambda_axis: ndarray | None = None)[source]
Bases: Augmenter
Shifts the wavelength axis.
Vectorized implementation using batch interpolation.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
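A wavelength shift of the kind WavelengthShift describes can be sketched by resampling each spectrum at displaced wavelengths with linear interpolation. The sketch below is an assumption-laden illustration: `wavelength_shift_sketch` is a hypothetical name, the sign convention of the shift and the uniform sampling are choices made here, and values outside the axis are clamped to the end points by np.interp.

```python
import numpy as np


def wavelength_shift_sketch(X, shift_range=(-2.0, 2.0), lambda_axis=None, random_state=None):
    """Resample each spectrum at lambda + shift, one shift per sample."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if lambda_axis is None:
        # Without a wavelength axis, shifts are expressed in index units.
        lambda_axis = np.arange(n_features, dtype=float)
    shifts = rng.uniform(*shift_range, size=n_samples)
    out = np.empty_like(X)
    for i in range(n_samples):
        out[i] = np.interp(lambda_axis + shifts[i], lambda_axis, X[i])
    return out, shifts


lam = np.arange(50, dtype=float)
X = np.vstack([2.0 * lam, 2.0 * lam + 1.0])
X_shifted, shifts = wavelength_shift_sketch(X, shift_range=(1.5, 1.5))
```

On a linear spectrum the effect is easy to verify: away from the clamped end, each value equals the original line evaluated 1.5 units further along the axis.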
- class nirs4all.operators.transforms.WavelengthStretch(apply_on='samples', random_state=None, *, copy=True, stretch_range: Tuple[float, float] = (0.99, 1.01), lambda_axis: ndarray | None = None)[source]
Bases: Augmenter
Stretches or compresses the wavelength axis.
Vectorized implementation using batch interpolation.
- augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- class nirs4all.operators.transforms.Wavelet(wavelet: str = 'haar', mode: str = 'periodization', *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Single level Discrete Wavelet Transform.
Performs a discrete wavelet transform on data, using a wavelet function.
- Parameters:
- fit(X, y=None)[source]
Verify the X data compliance with wavelet transform.
- Parameters:
X (array-like, spectra) – The data to transform.
y (None) – Ignored.
- Raises:
ValueError – If the input X is a sparse matrix.
- Returns:
The fitted object.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Wavelet
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
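For the default 'haar' wavelet, a single-level discrete wavelet transform as performed by Wavelet reduces to scaled sums and differences of neighboring points. The sketch below shows that arithmetic in NumPy; `haar_dwt_sketch` is a hypothetical name, repeating the last point for odd-length spectra is an assumption, and which coefficients the class actually returns is not stated above.

```python
import numpy as np


def haar_dwt_sketch(X):
    """Single-level Haar DWT along axis 1: returns (approximation, detail)."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] % 2:
        # Assumption: odd-length spectra are extended by repeating the last point.
        X = np.concatenate([X, X[:, -1:]], axis=1)
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass: smooth trend
    detail = (even - odd) / np.sqrt(2.0)   # high-pass: local differences
    return approx, detail


X = np.array([[1.0, 2.0, 3.0, 4.0]])
approx, detail = haar_dwt_sketch(X)
# approx ≈ [[2.1213, 4.9497]], detail ≈ [[-0.7071, -0.7071]]
```

The transform is orthonormal for even-length inputs, so the sum of squared coefficients equals the sum of squared input values.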
- class nirs4all.operators.transforms.WaveletFeatures(wavelet: str = 'db4', max_level: int = 5, n_coeffs_per_level: int = 10, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Discrete Wavelet Transform feature extractor for spectral data.
Decomposes spectra into approximation (smooth trends) and detail (sharp features) coefficients at multiple scales, then extracts statistical features from each level. This captures both global baseline variations and local absorption peaks.
- Scientific basis:
Multi-resolution analysis captures features at different scales
Daubechies wavelets (db4) are well-suited for smooth signals
Wavelet coefficients are partially decorrelated
- Parameters:
wavelet (str, default='db4') – Wavelet to use (e.g., ‘haar’, ‘db4’, ‘coif3’, ‘sym4’).
max_level (int, default=5) – Maximum decomposition level.
n_coeffs_per_level (int, default=10) – Number of top coefficients (by magnitude) to extract per level.
copy (bool, default=True) – Whether to copy input data.
- actual_level_
Actual decomposition level used (may be less than max_level depending on signal length).
- Type:
References
Mallat (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE PAMI.
- fit(X, y=None)[source]
Fit the wavelet feature extractor.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletFeatures
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Extract wavelet features from spectra.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectra.
copy (bool or None, optional) – Ignored (for API compatibility).
- Returns:
X_transformed – Wavelet features.
- Return type:
ndarray of shape (n_samples, n_features_out_)
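The idea behind WaveletFeatures, a multi-level decomposition followed by keeping the largest coefficients per level, can be sketched with a Haar wavelet in NumPy. This is a simplified illustration, not the class's feature set: the function names are hypothetical, Haar replaces the default 'db4', only detail-coefficient magnitudes are kept, and decomposition stops once the approximation has odd length, which is one way the actual level can end up below max_level.

```python
import numpy as np


def haar_step(X):
    """One Haar decomposition step along axis 1 (even length required)."""
    even, odd = X[:, 0::2], X[:, 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def wavelet_features_sketch(X, max_level=5, n_coeffs_per_level=10):
    """Top detail-coefficient magnitudes per level, concatenated per sample."""
    approx = np.asarray(X, dtype=float)
    features = []
    level = 0
    while level < max_level and approx.shape[1] >= 2 and approx.shape[1] % 2 == 0:
        approx, detail = haar_step(approx)
        k = min(n_coeffs_per_level, detail.shape[1])
        # Sort magnitudes in descending order and keep the k largest per sample.
        top = -np.sort(-np.abs(detail), axis=1)[:, :k]
        features.append(top)
        level += 1
    return np.hstack(features), level


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 64))
feats, actual_level = wavelet_features_sketch(X, max_level=3, n_coeffs_per_level=4)
```

The returned level plays the role of the actual_level_ attribute described above: short signals cannot be decomposed as deeply as max_level requests.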
- class nirs4all.operators.transforms.WaveletPCA(wavelet: str = 'db4', max_level: int = 4, n_components_per_level: int = 3, whiten: bool = True, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Multi-scale PCA on wavelet coefficients.
Applies PCA separately to each wavelet decomposition level, creating a compact multi-scale representation where each scale contributes a few principal components. This preserves frequency-specific information while reducing dimensionality.
- Scientific basis:
Combines multi-resolution analysis with decorrelation
Each scale captures different frequency information
PCA per scale reduces redundancy within each frequency band
Results in a compact, interpretable feature set
- Parameters:
wavelet (str, default='db4') – Wavelet to use (e.g., ‘haar’, ‘db4’, ‘coif3’, ‘sym4’).
max_level (int, default=4) – Maximum decomposition level.
n_components_per_level (int, default=3) – Number of PCA components to keep per decomposition level.
whiten (bool, default=True) – Whether to whiten the PCA components.
copy (bool, default=True) – Whether to copy input data.
References
Trygg & Wold (1998). PLS regression on wavelet compressed NIR spectra.
- fit(X, y=None)[source]
Fit the wavelet-PCA transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletPCA
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform spectra to wavelet-PCA features.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectra.
copy (bool or None, optional) – Ignored (for API compatibility).
- Returns:
X_transformed – Wavelet-PCA features.
- Return type:
ndarray of shape (n_samples, n_features_out_)
- class nirs4all.operators.transforms.WaveletSVD(wavelet: str = 'db4', max_level: int = 4, n_components_per_level: int = 3, *, copy: bool = True)[source]
Bases: TransformerMixin, BaseEstimator
Multi-scale SVD on wavelet coefficients.
Applies Truncated SVD separately to each wavelet decomposition level, creating a compact multi-scale representation. Similar to WaveletPCA but uses SVD which doesn’t center data and works better for sparse data.
- Scientific basis:
Combines multi-resolution analysis with dimensionality reduction
Each scale captures different frequency information
SVD per scale reduces redundancy within each frequency band
Results in a compact feature set
- Parameters:
References
Trygg & Wold (1998). PLS regression on wavelet compressed NIR spectra.
- fit(X, y=None)[source]
Fit the wavelet-SVD transformer.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') WaveletSVD
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform spectra to wavelet-SVD features.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectra.
copy (bool or None, optional) – Ignored (for API compatibility).
- Returns:
X_transformed – Wavelet-SVD features.
- Return type:
ndarray of shape (n_samples, n_features_out_)
- nirs4all.operators.transforms.asls_baseline(spectra: ndarray, lam: float = 1000000.0, p: float = 0.01, max_iter: int = 50, tol: float = 0.001) ndarray[source]
Compute baseline using Asymmetric Least Squares Smoothing.
This is a convenience wrapper around pybaseline_correction with method=’asls’.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).
lam (float) – Smoothness parameter (lambda). Default is 1e6.
p (float) – Asymmetry parameter (0 < p < 1). Default is 0.01.
max_iter (int) – Maximum number of iterations. Default is 50.
tol (float) – Convergence tolerance. Default is 1e-3.
- Returns:
Baseline-corrected spectra with same shape as input.
- Return type:
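The roles of lam, p, max_iter and tol in asls_baseline can be seen in a compact version of the Eilers and Boelens iteration: solve a penalized weighted least-squares problem, then reweight points above the baseline with p and points below with 1 - p. The sketch below uses dense matrices for a single short spectrum and returns the baseline itself; `asls_baseline_sketch` is a hypothetical name, and the stopping rule is an assumption rather than the wrapped library's exact criterion.

```python
import numpy as np


def asls_baseline_sketch(y, lam=1e6, p=0.01, max_iter=50, tol=1e-3):
    """Illustrative AsLS baseline for one short 1D spectrum (dense algebra)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator of shape (n - 2, n); penalizes curvature.
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * D.T @ D
    w = np.ones(n)
    for _ in range(max_iter):
        # Minimize sum(w * (y - z)**2) + lam * sum((D @ z)**2) over z.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w_new = np.where(y > z, p, 1.0 - p)
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol:
            break
        w = w_new
    return z


x = np.arange(100, dtype=float)
line = 0.01 * x + 1.0
spectrum = line + np.exp(-0.5 * ((x - 50) / 2.0) ** 2)
baseline = asls_baseline_sketch(spectrum, lam=1e5, p=0.01)
corrected = spectrum - baseline
```

A larger lam gives a stiffer baseline, while a smaller p makes the fit ignore points that lie above it, so peaks are left out of the baseline estimate.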
- nirs4all.operators.transforms.baseline(spectra)[source]
Removes baseline (mean) from each spectrum.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
- Returns:
Mean-centered NIRS data matrix.
- Return type:
- nirs4all.operators.transforms.derivate(spectra, order=1, delta=1)[source]
Computes Nth order derivatives with the desired spacing using numpy.gradient.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
order (float, optional) – Order of the derivative, by default 1.
delta (int, optional) – Delta of the derivative (in samples), by default 1.
- Returns:
spectra – Differentiated NIR spectra.
- Return type:
- nirs4all.operators.transforms.detrend(spectra, bp=0)[source]
Perform spectral detrending to remove linear trend from data.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
bp (list, optional) – A sequence of break points. If given, an individual linear fit is performed for each part of data between two break points. Break points are specified as indices into data. Default is 0.
- Returns:
Detrended NIR spectra.
- Return type:
- nirs4all.operators.transforms.first_derivative(spectra: ndarray, delta: float = 1.0, edge_order: int = 2) ndarray[source]
First numerical derivative along feature axis using central differences.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).
delta (float) – Sampling step along the feature axis.
edge_order (int) – 1 or 2, order of accuracy at the boundaries.
- Returns:
First derivative dX/dλ with same shape as input.
- Return type:
- nirs4all.operators.transforms.gaussian(spectra, order=2, sigma=1)[source]
Computes 1D gaussian filter using scipy.ndimage gaussian 1d filter.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
order (float, optional) – Order of the derivative.
sigma (int, optional) – Sigma of the Gaussian.
- Returns:
Gaussian-filtered NIR spectra.
- Return type:
- nirs4all.operators.transforms.log_transform(spectra: ndarray, base: float = 2.718281828459045, offset: float = 0.0, auto_offset: bool = True, min_value: float = 1e-08) ndarray[source]
Apply elementwise logarithm with automatic handling of edge cases.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
base (float) – Logarithm base. Default is e.
offset (float) – Fixed value added before log to handle non-positives.
auto_offset (bool) – If True, automatically add offset for problematic values.
min_value (float) – Minimum value after offset when auto_offset=True.
- Returns:
Log-transformed spectra.
- Return type:
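One plausible reading of these parameters is sketched below. The helper log_transform_sketch is hypothetical, and the auto-offset policy (shift the whole matrix so that its smallest value becomes min_value) is an assumption; the actual function may handle non-positive values differently.

```python
import numpy as np

def log_transform_sketch(spectra, base=np.e, offset=0.0,
                         auto_offset=True, min_value=1e-8):
    """Hypothetical helper illustrating a log transform with offsets."""
    x = np.asarray(spectra, dtype=float) + offset
    if auto_offset:
        lowest = x.min()
        if lowest <= 0:
            # Assumed policy: shift so the smallest value equals min_value.
            x = x + (min_value - lowest)
    # Change of base: log_b(x) = ln(x) / ln(b).
    return np.log(x) / np.log(base)

decades = log_transform_sketch(np.array([[1.0, 10.0, 100.0]]), base=10.0)
with_zero = log_transform_sketch(np.array([[0.0, 0.5, 1.0]]))
```

In the second call the zero entry would otherwise map to negative infinity; after the assumed shift every output is finite.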
- nirs4all.operators.transforms.msc(spectra, scaled=True)[source]
Performs multiplicative scatter correction (MSC) against the mean spectrum.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
scaled (bool) – Whether to scale the data. Defaults to True.
- Returns:
Scatter-corrected NIR spectra.
- Return type:
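The textbook form of MSC can be sketched as follows: each spectrum is regressed on the mean spectrum, then the fitted intercept is subtracted and the result divided by the fitted slope. The helper msc_sketch is hypothetical and does not model the scaled option, whose exact effect the source does not describe.

```python
import numpy as np

def msc_sketch(spectra):
    """Hypothetical helper: textbook MSC against the mean spectrum."""
    x = np.asarray(spectra, dtype=float)
    reference = x.mean(axis=0)  # mean spectrum over all samples
    corrected = np.empty_like(x)
    for i, row in enumerate(x):
        # Fit row ≈ intercept + slope * reference.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected

base = np.sin(np.linspace(0.0, 3.0, 40))
# The same underlying shape with different additive and multiplicative effects.
spectra = np.vstack([base, 2.0 * base + 0.5, 0.5 * base - 0.2])
corrected = msc_sketch(spectra)
```

Because the three toy spectra differ only by an offset and a gain, all corrected rows collapse onto the mean spectrum.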
- nirs4all.operators.transforms.norml(spectra, feature_range=(-1, 1))[source]
Perform spectral normalization with user-defined limits.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
feature_range (tuple (min, max), default=(-1, 1)) – Desired range of the transformed data. If both min and max equal -1, linalg normalization is applied; otherwise, the data are normalized to the user-defined bounds.
- Returns:
spectra – Normalized NIR spectra.
- Return type:
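The two branches described for feature_range might look like the sketch below. The helper norml_sketch is hypothetical, and normalizing each spectrum (row) independently is an assumption; the source does not say along which axis the norm or the min/max are taken.

```python
import numpy as np

def norml_sketch(spectra, feature_range=(-1, 1)):
    """Hypothetical helper illustrating the two normalization branches."""
    x = np.asarray(spectra, dtype=float)
    low, high = feature_range
    if low == -1 and high == -1:
        # Assumed "linalg" branch: divide each row by its Euclidean norm.
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    # Assumed bounds branch: min-max scale each row into [low, high].
    x_min = x.min(axis=1, keepdims=True)
    x_max = x.max(axis=1, keepdims=True)
    return low + (x - x_min) * (high - low) / (x_max - x_min)

spectra = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 40.0]])
bounded = norml_sketch(spectra, feature_range=(0, 1))
unit = norml_sketch(spectra, feature_range=(-1, -1))
```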
- nirs4all.operators.transforms.pybaseline_correction(spectra: ndarray, method: str = 'asls', **kwargs) ndarray[source]
Apply baseline correction using pybaselines library.
This is a general wrapper for all pybaselines methods, allowing flexible baseline correction with various algorithms.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).
method (str) – Baseline correction method. Available methods:
Whittaker: ‘asls’, ‘iasls’, ‘airpls’, ‘arpls’, ‘drpls’, ‘iarpls’, ‘aspls’, ‘psalsa’, ‘derpsalsa’
Polynomial: ‘poly’, ‘modpoly’, ‘imodpoly’, ‘penalized_poly’, ‘loess’, ‘quant_reg’
Morphological: ‘mor’, ‘imor’, ‘mormol’, ‘amormol’, ‘rolling_ball’, ‘mwmv’, ‘tophat’, ‘mpspline’, ‘jbcd’
Spline: ‘mixture_model’, ‘irsqr’, ‘corner_cutting’, ‘pspline_asls’, etc.
Smooth: ‘noise_median’, ‘snip’, ‘swima’, ‘ipsa’
Classification: ‘dietrich’, ‘golotvin’, ‘std_distribution’, ‘fastchrom’, ‘cwt_br’
Optimizers: ‘collab_pls’, ‘optimize_extended_range’, ‘adaptive_minmax’
Misc: ‘interp_pts’, ‘beads’
**kwargs – Additional parameters passed to the specific baseline method.
- Returns:
Baseline-corrected spectra with same shape as input.
- Return type:
- Raises:
ImportError – If pybaselines is not installed.
ValueError – If an unknown method is specified.
Examples
>>> from nirs4all.operators.transforms.nirs import pybaseline_correction
>>> corrected = pybaseline_correction(spectra, method='airpls', lam=1e5)
>>> corrected = pybaseline_correction(spectra, method='imodpoly', poly_order=3)
>>> corrected = pybaseline_correction(spectra, method='snip', max_half_window=30)
- nirs4all.operators.transforms.reflectance_to_absorbance(spectra: ndarray, min_value: float = 1e-08) ndarray[source]
Convert reflectance spectra to absorbance.
Applies the Beer-Lambert law: A = -log10(R) = log10(1/R) where R is reflectance and A is absorbance.
- Parameters:
spectra (numpy.ndarray) – Reflectance NIRS data matrix (n_samples, n_features). Values should be in range (0, 1] or as percentages (0, 100].
min_value (float) – Minimum value to clamp reflectance to avoid log(0). Default is 1e-8.
- Returns:
Absorbance spectra with same shape as input.
- Return type:
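The formula A = -log10(R) with clamping can be sketched in a few lines. The helper name is hypothetical, and the sketch only covers fractional reflectance in (0, 1]; how the library treats percentage input is not stated in the source and is not modelled here.

```python
import numpy as np

def reflectance_to_absorbance_sketch(reflectance, min_value=1e-8):
    """Hypothetical helper: A = -log10(R) with R clamped from below."""
    r = np.asarray(reflectance, dtype=float)
    # Clamp to min_value so that log10 never sees zero or negatives.
    r = np.clip(r, min_value, None)
    return -np.log10(r)

reflectance = np.array([[1.0, 0.1, 0.01, 0.0]])
absorbance = reflectance_to_absorbance_sketch(reflectance)
```

Each tenfold drop in reflectance adds one absorbance unit, and the zero entry is clamped to min_value, which caps the absorbance at 8 for the default setting.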
- nirs4all.operators.transforms.savgol(spectra: ndarray, window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0) ndarray[source]
Perform Savitzky–Golay filtering on the data (also calculates derivatives). This function is a wrapper for scipy.signal.savgol_filter.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
window_length (int) – Size of the filter window in samples (default 11).
polyorder (int) – Order of the polynomial estimation (default 3).
deriv (int) – Order of the derivative to compute (default 0).
delta (float) – Sampling distance of the data.
- Returns:
NIRS data smoothed with Savitzky-Golay filtering.
- Return type:
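Since this function wraps scipy.signal.savgol_filter, the sketch below calls the SciPy routine directly to show how deriv and delta interact. The axis choice and the toy cubic spectrum are assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

wavelengths = np.linspace(0.0, 1.0, 50)
step = wavelengths[1] - wavelengths[0]
spectra = (wavelengths ** 3)[np.newaxis, :]  # one cubic "spectrum"

# Smoothing only: a cubic is reproduced exactly by a polyorder=3 fit.
smoothed = savgol_filter(spectra, window_length=11, polyorder=3, axis=1)

# First derivative: delta must equal the sampling step to get d/dλ units.
first = savgol_filter(spectra, window_length=11, polyorder=3,
                      deriv=1, delta=step, axis=1)
```

Leaving delta at 1.0 when the true step differs rescales the derivative by a constant factor, which matters if derivatives from differently sampled instruments are compared.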
- nirs4all.operators.transforms.second_derivative(spectra: ndarray, delta: float = 1.0, edge_order: int = 2) ndarray[source]
Second numerical derivative along feature axis.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix (n_samples, n_features).
delta (float) – Sampling step along the feature axis.
edge_order (int) – 1 or 2, order of accuracy at the boundaries.
- Returns:
Second derivative d²X/dλ² with same shape as input.
- Return type:
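One simple way to obtain a second derivative with these parameters is to apply numpy.gradient twice, as sketched below. This is an assumption about the approach; the source does not say whether the function differentiates twice or uses a dedicated second-difference stencil.

```python
import numpy as np

def second_derivative_sketch(spectra, delta=1.0, edge_order=2):
    """Hypothetical helper: numpy.gradient applied twice along axis 1."""
    x = np.asarray(spectra, dtype=float)
    first = np.gradient(x, delta, axis=1, edge_order=edge_order)
    return np.gradient(first, delta, axis=1, edge_order=edge_order)

delta = 0.5
wavelengths = np.arange(0.0, 5.0, delta)
spectra = (wavelengths ** 2)[np.newaxis, :]
d2 = second_derivative_sketch(spectra, delta=delta)
```

For the quadratic toy spectrum the result is the constant 2 everywhere, including the boundaries, because edge_order=2 is exact for polynomials up to degree two.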
- nirs4all.operators.transforms.spl_norml(spectra)[source]
Perform simple spectral normalization.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
- Returns:
spectra – Normalized NIR spectra.
- Return type:
- nirs4all.operators.transforms.wavelet_transform(spectra: ndarray, wavelet: str, mode: str = 'periodization') ndarray[source]
Computes the wavelet transform of the spectra using PyWavelets.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
wavelet (str) – Name of the wavelet family to use for the transform.
mode (str) – Signal extension mode.
- Returns:
Wavelet-transformed and resampled spectra.
- Return type: