nirs4all.operators package
Subpackages
- nirs4all.operators.augmentation package
- Submodules
- nirs4all.operators.augmentation.abc_augmenter module
- nirs4all.operators.augmentation.edge_artifacts module
- nirs4all.operators.augmentation.environmental module
- nirs4all.operators.augmentation.random module
- nirs4all.operators.augmentation.scattering module
- nirs4all.operators.augmentation.spectral module
BandMasking, BandPerturbation, ChannelDropout, GaussianAdditiveNoise, GaussianSmoothingJitter, LinearBaselineDrift, LocalClipping, LocalMixupAugmenter, LocalWavelengthWarp, MixupAugmenter, MultiplicativeNoise, PolynomialBaselineDrift, ScatterSimulationMSC, SmoothMagnitudeWarp, SpikeNoise, UnsharpSpectralMask, WavelengthShift, WavelengthStretch
- nirs4all.operators.augmentation.splines module
- Module contents
Augmenter, DetectorRollOffAugmenter, EMSCDistortionAugmenter, EdgeArtifactsAugmenter, EdgeCurvatureAugmenter, IdentityAugmenter, MoistureAugmenter, ParticleSizeAugmenter, Random_X_Operation, Rotate_Translate, Spline_Curve_Simplification, Spline_Smoothing, Spline_X_Perturbations, Spline_X_Simplification, Spline_Y_Perturbations, StrayLightAugmenter, TemperatureAugmenter, TruncatedPeakAugmenter
- Submodules
- nirs4all.operators.base package
- nirs4all.operators.data package
- Submodules
- Module contents
AggregationStrategy
BranchPredictionConfig: branch, select, metric, aggregate, weight_metric, proba, sources, __post_init__(), get_aggregation_strategy(), get_selection_strategy()
BranchType
DisjointSelectionCriterion: MSE, RMSE, MAE, R2, ORDER
MergeConfig: collect_features, feature_branches, collect_predictions, prediction_branches, prediction_configs, model_filter, use_proba, include_original, on_missing, on_shape_mismatch, unsafe, output_as, source_names, __post_init__(), from_dict(), get_feature_branches(), get_merge_mode(), get_prediction_configs(), get_selection_criterion(), get_shape_mismatch_strategy(), has_per_branch_config(), n_columns, select_by, to_dict()
MergeMode
RepetitionConfig: column, on_unequal, expected_reps, source_names, pp_names, preserve_order, aggregate_metadata, __post_init__(), from_dict(), from_step_value(), get_pp_name(), get_source_name(), get_unequal_strategy(), is_y_grouping, resolve_column(), to_dict(), uses_dataset_aggregate
SelectionStrategy, ShapeMismatchStrategy, SourceIncompatibleStrategy
SourceMergeConfig: strategy, sources, on_incompatible, output_name, preserve_source_info, __post_init__(), from_dict(), get_incompatible_strategy(), get_source_indices(), get_strategy(), to_dict()
SourceMergeStrategy, UnequelRepsStrategy
- nirs4all.operators.filters package
- Submodules
- nirs4all.operators.filters.base module
- nirs4all.operators.filters.high_leverage module
- nirs4all.operators.filters.metadata module
- nirs4all.operators.filters.report module
- nirs4all.operators.filters.spectral_quality module
- nirs4all.operators.filters.x_outlier module
- nirs4all.operators.filters.y_outlier module
- Module contents
CompositeFilter
FilterResult: filter_name, reason, n_samples, n_excluded, n_kept, exclusion_rate, excluded_indices, stats, to_dict()
FilteringReport: dataset_name, partition, timestamp, filter_results, combined_mode, n_total_samples, n_final_excluded, n_final_kept, cascade_to_augmented, n_augmented_excluded, add_filter_result(), final_exclusion_rate, print_report(), summary(), to_dict(), to_json()
FilteringReportGenerator, HighLeverageFilter, MetadataFilter, SampleFilter
SpectralQualityFilter: max_nan_ratio, max_zero_ratio, min_variance, max_value, min_value, __repr__(), exclusion_reason, fit(), get_filter_stats(), get_mask(), get_quality_breakdown()
XOutlierFilter, YOutlierFilter
- Submodules
- nirs4all.operators.models package
- Subpackages
- Submodules
- Module contents
AllPreviousModelsSelector, BaseModelOperator, BranchScope, CoverageStrategy, DiPLS, DiversitySelector, ExplicitModelSelector, FCKPLS, FractionalConvFeaturizer, FractionalPLS, IKPLS, IdentityFeaturizer, IntervalPLS, KOPLS, KPLS, KernelPLS, LWPLS, MBPLS, MetaModel
ModelCandidate: model_name, model_classname, step_idx, fold_id, branch_id, branch_name, val_score, metric, predictions
NLPLS, OKLMPLS, OPLS, OPLSDA, PLSDA, PolynomialFeaturizer, RBFFeaturizer
RecursivePLS: n_features_in_, n_components_, n_samples_seen_, x_mean_, x_std_, y_mean_, y_std_, x_weights_, x_loadings_, y_loadings_, coef_, __repr__(), fit(), get_params(), partial_fit(), predict(), set_params(), set_score_request(), transform()
RobustPLS: n_features_in_, n_components_, x_mean_, x_std_, y_mean_, y_std_, x_scores_, y_scores_, x_weights_, x_loadings_, y_loadings_, coef_, sample_weights_, __repr__(), fit(), get_outlier_mask(), get_params(), predict(), set_params(), set_predict_request(), set_score_request(), transform()
SIMPLS: n_features_in_, n_components_, x_mean_, x_std_, y_mean_, y_std_, x_scores_, y_scores_, x_weights_, x_loadings_, y_loadings_, coef_, __repr__(), fit(), get_params(), predict(), set_params(), set_predict_request(), set_score_request(), transform()
SelectorFactory, SourceModelSelector, SparsePLS
StackingConfig: coverage_strategy, test_aggregation, branch_scope, allow_no_cv, min_coverage_ratio, level, allow_meta_sources, max_level, __post_init__()
StackingLevel, TestAggregation, TopKByMetricSelector
- nirs4all.operators.splitters package
- nirs4all.operators.transforms package
- Submodules
- nirs4all.operators.transforms.feature_selection module
- nirs4all.operators.transforms.features module
- nirs4all.operators.transforms.nirs module
ASLSBaseline, AirPLS, ArPLS, AreaNormalization, BEADS, ExtendedMultiplicativeScatterCorrection, FirstDerivative, Haar, IASLS, IModPoly, LogTransform, ModPoly, MultiplicativeScatterCorrection, PyBaselineCorrection, ReflectanceToAbsorbance, RollingBall, SNIP, SavitzkyGolay, SecondDerivative, Wavelet, WaveletFeatures, WaveletPCA, WaveletSVD, asls_baseline(), first_derivative(), log_transform(), msc(), pybaseline_correction(), reflectance_to_absorbance(), savgol(), second_derivative(), wavelet_transform()
- nirs4all.operators.transforms.presets module
- nirs4all.operators.transforms.resampler module
- nirs4all.operators.transforms.scalers module
- nirs4all.operators.transforms.signal module
- nirs4all.operators.transforms.signal_conversion module
- nirs4all.operators.transforms.targets module
- Module contents
ASLSBaseline, AirPLS, ArPLS, Augmenter, BEADS, BandMasking, BandPerturbation, Baseline, CARS, ChannelDropout, CropTransformer, Derivate, Detrend, FirstDerivative, FlattenPreprocessing, FractionToPercent, FromAbsorbance, Gaussian, GaussianAdditiveNoise, GaussianSmoothingJitter, Haar, IASLS, IModPoly, IdentityAugmenter, IdentityTransformer, IntegerKBinsDiscretizer, KubelkaMunk, LinearBaselineDrift, LocalClipping, LocalMixupAugmenter, LocalStandardNormalVariate, LocalWavelengthWarp, LogTransform
MCUVE: selected_indices_, selection_mask_, n_features_in_, n_features_out_, stability_, noise_stability_, threshold_, mean_coefs_, std_coefs_, __repr__(), fit(), get_feature_names_out(), get_support(), set_fit_request(), transform()
MixupAugmenter, ModPoly, MultiplicativeNoise, MultiplicativeScatterCorrection, Normalize, PercentToFraction, PolynomialBaselineDrift, PyBaselineCorrection, Random_X_Operation, RangeDiscretizer, ReflectanceToAbsorbance, ResampleTransformer, Resampler, RobustStandardNormalVariate, RollingBall, Rotate_Translate, SNIP, SavitzkyGolay, ScatterSimulationMSC, SecondDerivative, SignalTypeConverter, SimpleScale, SmoothMagnitudeWarp, SpikeNoise, Spline_Curve_Simplification, Spline_Smoothing, Spline_X_Perturbations, Spline_X_Simplification, Spline_Y_Perturbations, StandardNormalVariate, ToAbsorbance, UnsharpSpectralMask, WavelengthShift, WavelengthStretch, Wavelet, WaveletFeatures, WaveletPCA, WaveletSVD
asls_baseline(), baseline(), decon_set(), derivate(), detrend(), dumb_and_dumber_set(), dumb_set(), dumb_set_2D(), fat_set(), first_derivative(), gaussian(), haar_only(), id_preprocessing(), list_of_2D_sets(), log_transform(), msc(), nicon_set(), norml(), optimal_set_2D(), preprocessing_list(), pybaseline_correction(), reflectance_to_absorbance(), savgol(), savgol_only(), second_derivative(), senseen_set(), small_set(), special_set(), spl_norml(), transf_set(), wavelet_transform()
- Submodules
Module contents
- class nirs4all.operators.Augmenter(apply_on='samples', random_state=None, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Base class for data augmentation transformers.
- abstractmethod augment(X, apply_on='samples')[source]
Perform data augmentation.
- Parameters:
X (array-like) – Input data to augment.
apply_on (str) – The level at which augmentation is applied. Can be one of ‘samples’, ‘features’, ‘subsets’, or ‘global’. Defaults to ‘samples’.
- Returns:
Augmented data.
- Return type:
array-like
- fit(X, y=None)[source]
Fit to data.
- Parameters:
X (array-like) – Input data to fit.
y (array-like or None) – Target variable (unused).
- Returns:
self – Returns the instance itself.
- Return type:
- fit_transform(X, y=None, **fit_params)[source]
Fit to data and transform it.
- Parameters:
X (array-like) – Input data to fit and transform.
y (array-like or None) – Target variable (unused).
**fit_params (dict) – Additional fitting parameters (unused).
- Returns:
Transformed data.
- Return type:
array-like
- class nirs4all.operators.Baseline(*, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Removes baseline (mean) from each spectrum.
- Parameters:
copy (bool, optional) – Flag to indicate whether to make a copy of the object, by default True.
- fit(X, y=None)[source]
Compute the minimum and maximum to be used for later scaling.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
y (None) – Ignored.
- Returns:
self – Fitted Baseline object.
- Return type:
- class nirs4all.operators.CropTransformer(start: int = 0, end: int = None)[source]
Bases: BaseEstimator, TransformerMixin
- class nirs4all.operators.Derivate(order=1, delta=1, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Derivate
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.DetectorRollOffAugmenter(detector_model: str = 'generic_nir', effect_strength: float = 1.0, noise_amplification: float = 0.02, include_baseline_distortion: bool = True, random_state: int | None = None)[source]
Bases: SpectraTransformerMixin
Simulate detector sensitivity roll-off at spectral edges.
NIR detectors have wavelength-dependent sensitivity curves that typically roll off at the edges of their spectral range. This causes:
- Increased noise at edge wavelengths (lower SNR)
- Apparent baseline curvature near spectral boundaries
- Reduced peak heights at the edges
The effect is modeled as an exponential decay of detector sensitivity outside the optimal wavelength range, which manifests as multiplicative noise amplification and slight baseline distortion.
- Parameters:
detector_model (str, default="generic_nir") – Detector type to simulate. Available models:
- "ingaas_standard": Standard InGaAs (1000-1600 nm optimal)
- "ingaas_extended": Extended InGaAs (1100-2200 nm optimal)
- "pbs": Lead sulfide (1000-2800 nm optimal)
- "silicon_ccd": Silicon CCD (400-900 nm optimal)
- "generic_nir": Generic NIR detector
effect_strength (float, default=1.0) – Scaling factor for the roll-off effect (0-2).
noise_amplification (float, default=0.02) – Additional noise added at low-sensitivity wavelengths.
include_baseline_distortion (bool, default=True) – Whether to include slight baseline distortion at edges.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import DetectorRollOffAugmenter
>>> aug = DetectorRollOffAugmenter(detector_model="ingaas_standard")
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Stronger effect for portable spectrometers
>>> aug = DetectorRollOffAugmenter(effect_strength=1.5)
>>> pipeline = [aug, SNV(), PLSRegression(10)]
References
JASCO (2020). Advantages of high-sensitivity InGaAs detector.
LaserComponents InGaAs Photodiodes specifications.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') DetectorRollOffAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply detector roll-off effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with detector roll-off effects applied.
- Return type:
ndarray of shape (n_samples, n_features)
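The exponential roll-off described for this class can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `rolloff_sensitivity` and `apply_rolloff`, the `decay_nm` constant, and the way noise grows with lost sensitivity are all hypothetical. Only the 1000-1600 nm optimal range is taken from the "ingaas_standard" entry above.

```python
import numpy as np


def rolloff_sensitivity(wavelengths, optimal_range=(1000.0, 1600.0), decay_nm=150.0):
    """Relative sensitivity: 1 inside the optimal range, exponential decay outside.

    decay_nm is a hypothetical decay length chosen for illustration.
    """
    lo, hi = optimal_range
    # Distance (in nm) from the optimal range; zero for wavelengths inside it.
    dist = np.maximum(lo - wavelengths, 0.0) + np.maximum(wavelengths - hi, 0.0)
    return np.exp(-dist / decay_nm)


def apply_rolloff(X, wavelengths, noise_amplification=0.02, effect_strength=1.0, seed=0):
    """Add multiplicative noise that grows where sensitivity drops (assumed model)."""
    rng = np.random.default_rng(seed)
    sens = rolloff_sensitivity(wavelengths)
    # Noise scale is zero where sensitivity is 1 and increases as it decays.
    noise_scale = effect_strength * noise_amplification * (1.0 / sens - 1.0)
    noise = rng.normal(0.0, 1.0, size=X.shape) * noise_scale
    return X * (1.0 + noise)


wavelengths = np.linspace(900.0, 1700.0, 81)
X = np.ones((5, wavelengths.size))
X_aug = apply_rolloff(X, wavelengths)
```

In this sketch the spectra are untouched inside the optimal range and become noisier towards both edges, which matches the qualitative behaviour the class description gives.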
- class nirs4all.operators.Detrend(bp=0, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
Perform spectral detrending to remove linear trend from data.
- Parameters:
- fit(X, y=None)[source]
Fit the transformer to the data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
y (None) – Ignored.
- Returns:
self – Returns self.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Detrend
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform the data by removing linear trend.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
copy (bool or None, optional) – Whether to make a copy of the input data. If None, self.copy is used. Default is None.
- Returns:
The transformed data.
- Return type:
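As an illustration of what removing a linear trend means for a spectrum, the sketch below fits a straight line to each row and subtracts it. It is a minimal NumPy version under assumptions: the helper name `detrend_linear` is hypothetical, the fit is done against the feature index rather than the wavelength values, and the `bp` (breakpoint) parameter of the class is not modeled.

```python
import numpy as np


def detrend_linear(X):
    """Subtract the least-squares straight line from every row of X."""
    X = np.asarray(X, dtype=float)
    idx = np.arange(X.shape[1])
    # polyfit handles all rows at once when y has shape (n_features, n_samples);
    # it returns one slope and one intercept per sample.
    slope, intercept = np.polyfit(idx, X.T, deg=1)
    trend = np.outer(slope, idx) + intercept[:, None]
    return X - trend


idx = np.arange(50)
X = np.vstack([2.0 * idx + 3.0, -0.5 * idx + 1.0])
X_detrended = detrend_linear(X)
```

Rows that are exactly linear, as in the usage lines above, come out numerically zero; real spectra keep their peaks while losing the sloping background.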
- class nirs4all.operators.EMSCDistortionAugmenter(multiplicative_range: Tuple[float, float] = (0.9, 1.1), additive_range: Tuple[float, float] = (-0.05, 0.05), polynomial_order: int = 2, polynomial_strength: float = 0.02, correlation: float = 0.3, random_state: int | None = None)[source]
Bases: SpectraTransformerMixin
Apply EMSC-style scatter distortions for data augmentation.
Simulates the spectral distortions that Extended Multiplicative Scatter Correction (EMSC) is designed to correct:
x_distorted = a + b*x + c1*λ + c2*λ² + c3*λ³ + …
where:
- a is the additive offset
- b is the multiplicative gain
- c1, c2, … are the polynomial scattering coefficients
- Parameters:
multiplicative_range (tuple of (float, float), default=(0.9, 1.1)) – Range for multiplicative gain factor (b term).
additive_range (tuple of (float, float), default=(-0.05, 0.05)) – Range for additive offset (a term).
polynomial_order (int, default=2) – Order of wavelength polynomial (0 = no polynomial term).
polynomial_strength (float, default=0.02) – Base strength of polynomial scattering terms.
correlation (float, default=0.3) – Correlation between multiplicative and additive terms. Higher values create more realistic scatter patterns.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import EMSCDistortionAugmenter
>>> aug = EMSCDistortionAugmenter(multiplicative_range=(0.85, 1.15))
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Use in pipeline for data augmentation
>>> aug = EMSCDistortionAugmenter(polynomial_order=3)
>>> pipeline = [aug, SNV(), PLSRegression(10)]
Notes
This augmenter is particularly useful when:
- Training models that need to be robust to scatter variations
- Simulating data from different instruments or sample presentation
- Creating training data for transfer learning
References
Martens et al. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Analytical Chemistry.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') EMSCDistortionAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply EMSC-style distortions to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with EMSC-style distortions applied.
- Return type:
ndarray of shape (n_samples, n_features)
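The distortion formula x_distorted = a + b*x + c1*λ + c2*λ² + … can be written out directly in NumPy. The sketch below is an assumption-laden illustration, not the class's code: the helper name `emsc_distort` is hypothetical, wavelengths are rescaled to [-1, 1] so the polynomial terms stay comparable, coefficients are drawn from simple uniform and normal distributions, and the `correlation` parameter is ignored.

```python
import numpy as np


def emsc_distort(X, wavelengths, mult_range=(0.9, 1.1), add_range=(-0.05, 0.05),
                 poly_order=2, poly_strength=0.02, seed=0):
    """Apply a + b*x + sum_k c_k * lam**k with one random draw per sample."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Rescale wavelengths to [-1, 1] (an assumption made for this sketch).
    span = wavelengths.max() - wavelengths.min()
    lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0
    a = rng.uniform(add_range[0], add_range[1], size=(n_samples, 1))    # additive offset
    b = rng.uniform(mult_range[0], mult_range[1], size=(n_samples, 1))  # multiplicative gain
    out = a + b * X
    for k in range(1, poly_order + 1):
        c_k = rng.normal(0.0, poly_strength, size=(n_samples, 1))       # polynomial term
        out = out + c_k * lam[None, :] ** k
    return out


wavelengths = np.linspace(1000.0, 2500.0, 50)
X = np.tile(np.linspace(0.0, 1.0, 50), (4, 1))
X_aug = emsc_distort(X, wavelengths)
```

With `poly_order=0` the result reduces to a pure offset-and-gain change per spectrum, which is a convenient way to check the a and b terms in isolation.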
- class nirs4all.operators.EdgeArtifactsAugmenter(detector_roll_off: bool = True, stray_light: bool = True, edge_curvature: bool = True, truncated_peaks: bool = True, overall_strength: float = 1.0, detector_model: str = 'generic_nir', random_state: int | None = None)[source]
Bases: SpectraTransformerMixin
Combined augmenter for edge-related spectral artifacts.
This is a convenience class that combines multiple edge artifact effects:
- Detector roll-off
- Stray light
- Edge curvature
- Truncated peaks
Each effect can be individually enabled/disabled.
- Parameters:
detector_roll_off (bool, default=True) – Enable detector sensitivity roll-off effect.
stray_light (bool, default=True) – Enable stray light effect.
edge_curvature (bool, default=True) – Enable edge curvature/bending effect.
truncated_peaks (bool, default=True) – Enable truncated peak effect at boundaries.
overall_strength (float, default=1.0) – Scaling factor for all effects (0-2).
detector_model (str, default="generic_nir") – Detector model for roll-off simulation.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import EdgeArtifactsAugmenter
>>> aug = EdgeArtifactsAugmenter(overall_strength=0.8)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Only detector and stray light effects
>>> aug = EdgeArtifactsAugmenter(
...     detector_roll_off=True,
...     stray_light=True,
...     edge_curvature=False,
...     truncated_peaks=False
... )
>>> pipeline = [aug, SNV(), PLSRegression(10)]
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') EdgeArtifactsAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply all enabled edge artifact effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with edge artifacts applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.EdgeCurvatureAugmenter(curvature_strength: float = 0.02, curvature_type: str = 'random', asymmetry: float = 0.0, edge_focus: float = 0.7, random_state: int | None = None)[source]
Bases: SpectraTransformerMixin
Simulate edge curvature and baseline bending at spectral boundaries.
Edge curvature can arise from various sources:
- Optical aberrations in the spectrometer
- Wavelength-dependent baseline drift
- Polynomial baseline correction artifacts
- Sample holder effects
This operator adds smooth curvature that increases towards the spectral edges, mimicking the characteristic “smile” or “frown” patterns often seen in real spectra.
- Parameters:
curvature_strength (float, default=0.02) – Maximum curvature amplitude (in absorbance units).
curvature_type (str, default="random") – Type of curvature pattern:
- "random": Randomly choose smile/frown/asymmetric
- "smile": Upward curvature at edges (convex)
- "frown": Downward curvature at edges (concave)
- "asymmetric": Different curvature at each edge
asymmetry (float, default=0.0) – For “asymmetric” type, ratio of left/right curvature (-1 to 1). Positive values emphasize left edge, negative emphasize right.
edge_focus (float, default=0.7) – How concentrated the curvature is at edges (0-1). Higher values create sharper edge effects.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import EdgeCurvatureAugmenter
>>> aug = EdgeCurvatureAugmenter(curvature_strength=0.03)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Simulate baseline correction artifacts
>>> aug = EdgeCurvatureAugmenter(
...     curvature_type="asymmetric",
...     asymmetry=0.5,
...     edge_focus=0.8
... )
>>> pipeline = [aug, Detrend(), PLSRegression(10)]
References
Cao, A., et al. (2007). A robust method for automated background subtraction of tissue fluorescence. Journal of Raman Spectroscopy.
NIRPY Research (2019). Two methods for baseline correction of spectral data.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') EdgeCurvatureAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply edge curvature effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with edge curvature applied.
- Return type:
ndarray of shape (n_samples, n_features)
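A "smile" or "frown" baseline of the kind described for this class can be sketched as a smooth curve that is flat in the middle of the range and bends at both ends. The code below is only an illustration under assumptions: the helper name `edge_curvature` and the mapping from `edge_focus` to a power-law exponent are hypothetical, and the "random" and "asymmetric" modes are not covered.

```python
import numpy as np


def edge_curvature(wavelengths, strength=0.02, kind="smile", edge_focus=0.7):
    """Return an additive baseline that grows towards both spectral edges."""
    span = wavelengths.max() - wavelengths.min()
    # Position in [-1, 1]; 0 is the centre of the wavelength range.
    u = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0
    # Hypothetical mapping: larger edge_focus gives a larger exponent, which
    # keeps the centre flatter and concentrates the bend at the edges.
    exponent = 2.0 + 4.0 * edge_focus
    curve = strength * np.abs(u) ** exponent
    return curve if kind == "smile" else -curve


wavelengths = np.linspace(1000.0, 2500.0, 151)
smile = edge_curvature(wavelengths, kind="smile")
frown = edge_curvature(wavelengths, kind="frown")
X = np.zeros((3, wavelengths.size))
X_aug = X + smile  # broadcast the same baseline onto every spectrum
```

The curve reaches `strength` exactly at the two boundary wavelengths, so `curvature_strength` in this sketch reads as the maximum amplitude, in line with the parameter description.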
- class nirs4all.operators.Gaussian(order=2, sigma=1, *, copy=True)[source]
Bases: TransformerMixin, BaseEstimator
- fit(X, y=None)[source]
Fit the Gaussian filter.
- Parameters:
X (numpy.ndarray) – Input data.
y (None) – Ignored.
- Returns:
self – Returns the instance itself.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Gaussian
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, copy=None)[source]
Transform the input data using the Gaussian filter.
- Parameters:
X (numpy.ndarray) – Input data.
copy (bool, default=None) – Whether to make a copy of the input data.
- Returns:
Transformed data.
- Return type:
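The class is documented only through its signature. Assuming it applies a 1-D Gaussian filter along the feature axis (an assumption: the `sigma` and `order` names resemble the parameters of scipy.ndimage.gaussian_filter1d, but the source does not confirm the link), plain order-0 smoothing could be sketched as follows. The helper name `gaussian_smooth` and its `truncate` argument are hypothetical, and derivative orders such as the default `order=2` are not covered.

```python
import numpy as np


def gaussian_smooth(X, sigma=1.0, truncate=4.0):
    """Smooth each row of X with a normalized Gaussian kernel (order 0 only)."""
    X = np.asarray(X, dtype=float)
    radius = int(truncate * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so constant signals are preserved
    # Reflect-pad along features so the output keeps the input width.
    padded = np.pad(X, ((0, 0), (radius, radius)), mode="reflect")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in padded])


rng = np.random.default_rng(0)
X = np.sin(np.linspace(0.0, 6.0, 100))[None, :] + rng.normal(0.0, 0.1, size=(3, 100))
X_smooth = gaussian_smooth(X, sigma=2.0)
```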
- class nirs4all.operators.Haar(*, copy: bool = True)[source]
Bases: Wavelet
Shortcut to the Wavelet haar transform.
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Haar
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.IdentityAugmenter(apply_on='samples', random_state=None, *, copy=True)[source]
Bases:
Augmenter
An augmenter that returns the input data without any changes.
- nirs4all.operators.IdentityTransformer
alias of
FunctionTransformer
- class nirs4all.operators.LocalStandardNormalVariate(window=11, pad_mode='reflect', constant_values=0.0, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Local Standard Normal Variate (LSNV).
Per-sample local normalization with a sliding window along features. For each sample and feature j, with window size w:
mean_w = mean(X[…, j-w//2 : j+w//2+1])
std_w = std(X[…, j-w//2 : j+w//2+1])
X’[j] = (X[j] - mean_w) / std_w
- Parameters:
Notes
Operates row-wise (axis=1). Input must be (n_samples, n_features).
If std_w == 0, the value is divided by 1 instead, to avoid NaN.
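The formulas above can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions, not the code of LocalStandardNormalVariate: the function name `local_snv` is hypothetical, the window is taken as 2*(window//2)+1 points so that it matches the slice `j-w//2 : j+w//2+1`, and padding is done with `numpy.pad`.

```python
import numpy as np


def local_snv(X, window=11, pad_mode="reflect"):
    """Sliding-window SNV along the feature axis (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    half = window // 2
    size = 2 * half + 1  # number of points in the slice j-half : j+half+1
    # Pad each row so that every feature has a full window around it.
    padded = np.pad(X, ((0, 0), (half, half)), mode=pad_mode)
    # windows[i, j, :] is the window centred on feature j of sample i.
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=1)
    mean_w = windows.mean(axis=-1)
    std_w = windows.std(axis=-1)
    std_w[std_w == 0] = 1.0  # flat window: divide by 1 to avoid NaN
    return (X - mean_w) / std_w


X_demo = np.arange(10, dtype=float).reshape(1, -1)
X_local = local_snv(X_demo, window=3)
```

On a perfectly linear spectrum, each interior point equals the mean of its own window, so the interior of `X_local` is zero; only the padded edges differ.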
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
- class nirs4all.operators.MoistureAugmenter(water_activity_delta: float = 0.1, water_activity_range: Tuple[float, float] | None = None, reference_water_activity: float = 0.5, free_water_fraction: float = 0.3, bound_water_shift: float = 25.0, moisture_content: float = 0.1, enable_shift: bool = True, enable_intensity: bool = True, random_state: int | None = None)[source]
Bases:
SpectraTransformerMixin
Simulate moisture-induced spectral changes for data augmentation.
Water activity and moisture content affect NIR spectra through shifts in water bands between free and bound states. Higher water activity leads to more free water, while lower water activity means more water is hydrogen-bonded to the sample matrix.
- Parameters:
water_activity_delta (float, default=0.1) – Change in water activity from reference (0-1 scale).
water_activity_range (tuple of (float, float), optional) – If provided, randomly sample water_activity_delta from this range for each sample.
reference_water_activity (float, default=0.5) – Reference water activity for the input spectra.
free_water_fraction (float, default=0.3) – Base fraction of water that is “free” vs. bound (0-1).
bound_water_shift (float, default=25.0) – Wavelength shift (nm) for bound water relative to free water.
moisture_content (float, default=0.10) – Base moisture content as fraction (affects intensity).
enable_shift (bool, default=True) – Apply water band position shifts.
enable_intensity (bool, default=True) – Apply water band intensity changes based on moisture content.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import MoistureAugmenter
>>> aug = MoistureAugmenter(water_activity_delta=0.2)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)

>>> # Random moisture variation in pipeline
>>> aug = MoistureAugmenter(water_activity_range=(-0.2, 0.2))
>>> pipeline = [aug, PLSRegression(10)]
References
Büning-Pfaue, H. (2003). Analysis of water in food by near infrared spectroscopy. Food Chemistry, 82(1), 107-115.
Luck, W. A. P. (1998). The importance of cooperativity for the properties of liquid water. Journal of Molecular Structure.
- BOUND_WATER_PEAK_1ST = 1460
- BOUND_WATER_PEAK_COMB = 1940
- FREE_WATER_PEAK_1ST = 1410
- FREE_WATER_PEAK_COMB = 1920
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') MoistureAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply moisture effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with moisture effects applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.MultiplicativeScatterCorrection(scale=True, *, copy=True)[source]
Bases:
TransformerMixin,BaseEstimator
- class nirs4all.operators.Normalize(feature_range=(-1, 1), *, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Normalize spectrum using either a custom range or linalg normalization.
- Parameters:
feature_range (tuple (min, max), default=(-1, 1)) – Desired range of transformed data. If both min and max equal -1, linalg normalization is applied; otherwise the data are scaled to the user-defined range.
copy (bool, default=True) – Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
- fit(X, y=None)[source]
Fit the Normalize transformer on the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- inverse_transform(X)[source]
Transform the normalized data back to the original representation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The normalized data to be transformed back.
- Returns:
X – The inverse transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- partial_fit(X, y=None)[source]
Perform incremental fit on the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- transform(X)[source]
Transform the input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be transformed.
- Returns:
X – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.ParticleSizeAugmenter(mean_size_um: float = 50.0, size_variation_um: float = 15.0, size_range_um: Tuple[float, float] | None = None, reference_size_um: float = 50.0, wavelength_exponent: float = 1.5, size_effect_strength: float = 0.1, include_path_length: bool = True, path_length_sensitivity: float = 0.5, random_state: int | None = None)[source]
Bases:
SpectraTransformerMixin
Simulate particle size effects on scattering for data augmentation.
Particle size affects NIR spectra through wavelength-dependent baseline scattering, typically following a λ^(-n) relationship where n depends on the particle size regime (Rayleigh vs Mie).
Smaller particles cause:
- Increased scattering baseline (especially at shorter wavelengths)
- Reduced effective optical path length
- Additional sample-to-sample variation
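To make the λ^(-n) relationship concrete, here is a minimal sketch of a wavelength-dependent additive baseline. It is not the formula used by ParticleSizeAugmenter: the function `scattering_baseline`, its arguments, and the normalization to the shortest wavelength are assumptions chosen only for illustration.

```python
import numpy as np


def scattering_baseline(wavelengths, exponent=1.5, strength=0.1):
    """Additive baseline proportional to wavelength**(-exponent) (sketch).

    Hypothetical helper: the baseline is normalized so that it equals
    `strength` at the shortest wavelength and decays towards longer ones.
    """
    wl = np.asarray(wavelengths, dtype=float)
    return strength * (wl / wl.min()) ** (-exponent)


wavelengths_demo = np.linspace(1000.0, 2500.0, 16)
baseline_powder = scattering_baseline(wavelengths_demo, exponent=1.5)
baseline_rayleigh = scattering_baseline(wavelengths_demo, exponent=4.0)
baseline_flat = scattering_baseline(wavelengths_demo, exponent=0.0)
```

A larger exponent concentrates the baseline at short wavelengths, while an exponent of 0 gives a wavelength-independent offset.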
- Parameters:
mean_size_um (float, default=50.0) – Mean particle size in micrometers.
size_variation_um (float, default=15.0) – Standard deviation of particle size.
size_range_um (tuple of (float, float), optional) – If provided, randomly sample particle sizes from this range. Overrides mean_size_um and size_variation_um.
reference_size_um (float, default=50.0) – Reference particle size for baseline calculations.
wavelength_exponent (float, default=1.5) – Exponent for wavelength dependence (higher = finer particles).
- 4.0 = Rayleigh regime (particles << wavelength)
- 1.0-2.0 = Typical for NIR powder samples
- 0.0 = No wavelength dependence
size_effect_strength (float, default=0.1) – Overall strength of the scattering effect (0-1).
include_path_length (bool, default=True) – Whether to include path length effects (multiplicative).
path_length_sensitivity (float, default=0.5) – How strongly particle size affects effective path length.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import ParticleSizeAugmenter
>>> aug = ParticleSizeAugmenter(mean_size_um=30.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)

>>> # Random particle size in pipeline
>>> aug = ParticleSizeAugmenter(size_range_um=(20, 100))
>>> pipeline = [aug, PLSRegression(10)]
References
Dahm & Dahm (2007). Interpreting Diffuse Reflectance and Transmittance.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') ParticleSizeAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply particle size effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with particle size effects applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.Random_X_Operation(apply_on='global', random_state=None, *, copy=True, operator_func=<built-in function mul>, operator_range=(0.97, 1.03))[source]
Bases:
Augmenter
Class for applying a random operation to the data for augmentation.
- Parameters:
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “global”.
random_state (int or None, optional) – Random seed for reproducibility. Default is None.
copy (bool, optional) – If True, creates a copy of the input data. Default is True.
operator_func (function, optional) – Operator function to be applied. Default is operator.mul.
operator_range (tuple, optional) – Range for generating random values for the operator. Default is (0.97, 1.03).
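The default parameters (operator.mul with a range of (0.97, 1.03)) amount to a small random rescaling. The sketch below shows one way this could look; it assumes uniform sampling and one random factor per sample, neither of which is confirmed by the documentation above.

```python
import operator

import numpy as np

rng = np.random.default_rng(0)
X_ones = np.ones((4, 6))

# Assumption: one factor per sample, drawn uniformly from operator_range.
low, high = 0.97, 1.03
factors = rng.uniform(low, high, size=(X_ones.shape[0], 1))

# operator.mul(a, b) is a * b, so NumPy broadcasting applies row by row.
X_scaled = operator.mul(X_ones, factors)
```

Passing a different two-argument function such as operator.add would turn the same random draw into an additive offset.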
- class nirs4all.operators.ResampleTransformer(num_samples: int)[source]
Bases:
BaseEstimator,TransformerMixin
- class nirs4all.operators.RobustStandardNormalVariate(axis=1, with_center=True, with_scale=True, k=1.4826, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Robust Standard Normal Variate (RSNV).
Per-sample robust centering and scaling using median and MAD:
med = median(X, axis=1, keepdims=True)
mad = median(|X - med|, axis=1, keepdims=True)
X’ = (X - med) / (k * mad)
- Parameters:
axis (int, default=1) – 1 for row-wise (spectroscopy default). 0 for column-wise.
with_center (bool, default=True) – If True, subtract median.
with_scale (bool, default=True) – If True, divide by k * MAD.
k (float, default=1.4826) – Consistency constant to make MAD a robust estimator of std for Gaussian data.
copy (bool, default=True) – If False, try in-place.
Notes
If MAD == 0, the value is divided by 1 instead, to avoid NaN.
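The median/MAD formulas above translate almost line for line into NumPy. The following is a sketch for the row-wise case only, with a hypothetical function name `robust_snv`; it is not the class implementation and ignores the with_center, with_scale and copy options.

```python
import numpy as np


def robust_snv(X, k=1.4826):
    """Row-wise median/MAD scaling (illustrative sketch of RSNV)."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=1, keepdims=True)
    mad = np.median(np.abs(X - med), axis=1, keepdims=True)
    scale = k * mad
    scale[scale == 0] = 1.0  # MAD == 0: divide by 1 to avoid NaN
    return (X - med) / scale


X_spike = np.array([[1.0, 2.0, 3.0, 4.0, 100.0]])
X_robust = robust_snv(X_spike)
```

For this row the median is 3 and the MAD is 1, so the spike at 100 does not inflate the scale the way a standard deviation would.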
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters. Pass only if the estimator accepts additional params in its fit method.
- Returns:
X_new – Transformed array.
- Return type:
- class nirs4all.operators.Rotate_Translate(apply_on='samples', random_state=None, *, copy=True, p_range=2, y_factor=3)[source]
Bases:
Augmenter
Class for augmenting data by rotating and translating the signal.
Vectorized implementation that processes all samples in batch.
- Parameters:
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
random_state (int or None, optional) – Random seed for reproducibility. Default is None.
copy (bool, optional) – If True, creates a copy of the input data. Default is True.
p_range (int, optional) – Range for generating random slope values. Default is 2.
y_factor (int, optional) – Scaling factor for the initial value. Default is 3.
- augment(X, apply_on='samples')[source]
Augment the data by rotating and translating the signal.
Vectorized implementation using NumPy broadcasting.
- Parameters:
X (ndarray) – Input data to be augmented, shape (n_samples, n_features).
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.SampleFilter(reason: str | None = None)[source]
Bases:
TransformerMixin, BaseEstimator, ABC
Base class for sample filtering operators.
Sample filters identify samples that should be excluded from training datasets. Unlike transformers that modify data, filters mark samples for exclusion without altering the underlying data.
The filtering pattern works as follows:
1. fit(): Learn filter criteria from training data (e.g., compute thresholds)
2. get_mask(): Return boolean mask indicating which samples to KEEP
3. transform(): No-op (filtering happens at indexer level, not data level)
All concrete filter implementations must override the get_mask() method.
- reason
Identifier for this filter type, used to track exclusion reasons in the indexer. Default is the class name.
- Type:
Example
>>> class MyFilter(SampleFilter):
...     def __init__(self, threshold: float = 1.0):
...         super().__init__()
...         self.threshold = threshold
...
...     def fit(self, X, y=None):
...         self.mean_ = np.mean(y)
...         self.std_ = np.std(y)
...         return self
...
...     def get_mask(self, X, y=None) -> np.ndarray:
...         z_scores = np.abs((y - self.mean_) / self.std_)
...         return z_scores <= self.threshold  # True = keep
- property exclusion_reason: str
Get the exclusion reason identifier for this filter.
- Returns:
Reason string to be stored in indexer’s exclusion_reason column.
- Return type:
- fit(X: ndarray, y: ndarray | None = None) SampleFilter[source]
Compute filter criteria from training data.
This method should learn any thresholds, statistics, or models needed to identify outliers/bad samples. Override in subclasses for filters that need to learn from data.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets). May be None for X-only filters.
- Returns:
The fitted filter instance.
- Return type:
self
- fit_transform(X: ndarray, y: ndarray | None = None, **fit_params) ndarray[source]
Fit to data and return unchanged (transform is no-op).
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets).
**fit_params – Additional fitting parameters (unused).
- Returns:
The unchanged input array.
- Return type:
np.ndarray
- get_excluded_indices(X: ndarray, y: ndarray | None = None) ndarray[source]
Get indices of samples to be excluded.
Convenience method that inverts get_mask() to return indices of samples marked for exclusion.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets).
- Returns:
Integer array of indices for samples to exclude.
- Return type:
np.ndarray
Example
>>> filter = YOutlierFilter(method="iqr")
>>> filter.fit(X_train, y_train)
>>> excluded_idx = filter.get_excluded_indices(X_train, y_train)
>>> print(f"Excluding {len(excluded_idx)} samples")
- get_filter_stats(X: ndarray, y: ndarray | None = None) Dict[str, Any][source]
Get statistics about filter application.
Override in subclasses to provide filter-specific statistics (e.g., thresholds used, distribution of values, etc.).
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets).
- Returns:
- Dictionary containing filter statistics:
n_samples: Total number of samples
n_excluded: Number of samples to exclude
n_kept: Number of samples to keep
exclusion_rate: Ratio of excluded to total
reason: Exclusion reason string
- Return type:
Dict[str, Any]
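The keys listed above can all be derived from the boolean keep-mask. The helper below is a hypothetical sketch of that bookkeeping (the name `stats_from_mask` and the handling of an empty mask are assumptions), not the method's actual body.

```python
import numpy as np


def stats_from_mask(mask, reason):
    """Build the statistics dictionary from a keep-mask (sketch)."""
    mask = np.asarray(mask, dtype=bool)
    n_samples = int(mask.size)
    n_kept = int(mask.sum())  # True means KEEP
    n_excluded = n_samples - n_kept
    return {
        "n_samples": n_samples,
        "n_excluded": n_excluded,
        "n_kept": n_kept,
        # Assumption: report 0.0 rather than dividing by zero on empty input.
        "exclusion_rate": n_excluded / n_samples if n_samples else 0.0,
        "reason": reason,
    }


demo_stats = stats_from_mask([True, False, True, True], reason="MyFilter")
```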
- get_kept_indices(X: ndarray, y: ndarray | None = None) ndarray[source]
Get indices of samples to be kept.
Convenience method that returns indices of samples NOT marked for exclusion.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets).
- Returns:
Integer array of indices for samples to keep.
- Return type:
np.ndarray
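Both index helpers are thin views of the same keep-mask. As a small illustration in plain NumPy (the mask values are made up, and this is not necessarily how the methods are written internally):

```python
import numpy as np

# Hypothetical keep-mask as returned by get_mask(): True = keep.
keep_mask = np.array([True, False, True, True, False])

kept_idx = np.flatnonzero(keep_mask)       # positions of True entries
excluded_idx = np.flatnonzero(~keep_mask)  # positions of False entries
```

Together the two arrays partition the sample indices, so every sample appears in exactly one of them.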
- abstractmethod get_mask(X: ndarray, y: ndarray | None = None) ndarray[source]
Compute boolean mask indicating which samples to KEEP.
This is the core method that must be implemented by all concrete filters. Returns True for samples that should be kept, False for samples to exclude.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array of shape (n_samples,) or (n_samples, n_targets). May be None for X-only filters.
- Returns:
- Boolean array of shape (n_samples,) where:
True means KEEP the sample
False means EXCLUDE the sample
- Return type:
np.ndarray
- Raises:
NotImplementedError – If the subclass doesn’t implement this method.
- transform(X: ndarray) ndarray[source]
Transform is a no-op for filters.
Filtering happens at the indexer level, not by modifying the data array. This method returns the input unchanged to maintain sklearn compatibility.
- Parameters:
X – Feature array of shape (n_samples, n_features).
- Returns:
The unchanged input array.
- Return type:
np.ndarray
- class nirs4all.operators.SavitzkyGolay(window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0, *, copy: bool = True)[source]
Bases:
TransformerMixin, BaseEstimator
A class for smoothing and differentiating data using the Savitzky-Golay filter.
- Parameters:
window_length (int, default=11) – The length of the window used for smoothing.
polyorder (int, default=3) – The order of the polynomial used for fitting the samples within the window.
deriv (int, default=0) – The order of the derivative to compute.
delta (float, default=1.0) – The sampling distance of the data.
copy (bool, default=True) – Whether to copy the input data.
- Methods:
fit(X, y=None) – Fits the transformer to the data X.
transform(X, copy=None) – Applies the Savitzky-Golay filter to the data X.
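To show what the four parameters mean, here is a from-scratch sketch of the Savitzky-Golay idea: fit a polynomial of degree polyorder to each window by least squares and read off the value (or derivative) at the window centre. The helper names are hypothetical, the window length is assumed odd, and only fully covered (interior) points are returned. The class itself may be implemented differently, for example on top of scipy.signal.savgol_filter.

```python
from math import factorial

import numpy as np


def savgol_coeffs(window_length=11, polyorder=3, deriv=0, delta=1.0):
    """Least-squares filter weights for the centre of an odd window (sketch)."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of pinv(A) estimates the polynomial coefficient a_deriv;
    # the deriv-th derivative at the centre is deriv! * a_deriv / delta**deriv.
    return np.linalg.pinv(A)[deriv] * factorial(deriv) / delta**deriv


def savgol_interior(x, **kwargs):
    """Apply the weights to every fully covered window of a 1D signal."""
    weights = savgol_coeffs(**kwargs)
    return np.correlate(np.asarray(x, dtype=float), weights, mode="valid")


t = np.arange(30, dtype=float)
cubic = t**3 - 2.0 * t
smoothed = savgol_interior(cubic, window_length=11, polyorder=3)
slope = savgol_interior(cubic, window_length=11, polyorder=3, deriv=1)
```

Because the test signal is itself a cubic, a filter with polyorder=3 reproduces it exactly at interior points, and deriv=1 returns its analytic derivative 3t² - 2.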
- fit(X, y=None)[source]
Verify the X data compliance with Savitzky-Golay filter.
- Parameters:
X (array-like) – The data to transform.
y (None) – Ignored.
- Raises:
ValueError – If the input X is a sparse matrix.
- Returns:
The fitted object.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') SavitzkyGolay
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.SimpleScale(copy=True)[source]
Bases:
TransformerMixin,BaseEstimator
- class nirs4all.operators.SpectraTransformerMixin[source]
Bases:
TransformerMixin, BaseEstimator
Base class for spectral transformations that require wavelength information.
This mixin extends sklearn’s TransformerMixin to support wavelength-aware transformations. The controller automatically provides wavelengths from the dataset when available and when the operator declares it needs them.
Subclasses must implement transform_with_wavelengths() instead of transform().
- Parameters:
None. This is a mixin class; subclasses define their own parameters.
- _requires_wavelengths
Class-level flag indicating whether this operator requires wavelengths. If True (default), transform() will raise ValueError if wavelengths are not provided. Subclasses can set this to False if wavelengths are optional.
- Type:
Examples
>>> class TemperatureAugmenter(SpectraTransformerMixin):
...     def __init__(self, temperature_delta: float = 5.0):
...         self.temperature_delta = temperature_delta
...
...     def transform_with_wavelengths(
...         self, X: np.ndarray, wavelengths: np.ndarray
...     ) -> np.ndarray:
...         # Apply temperature-dependent spectral changes
...         # ... implementation ...
...         return X_transformed
Notes
The controller detects SpectraTransformerMixin instances via:
needs_wavelengths = (
    isinstance(op, SpectraTransformerMixin)
    and getattr(op, '_requires_wavelengths', False)
)
Wavelengths are extracted from the dataset using dataset.wavelengths_nm(source).
- fit(X, y=None, **fit_params)[source]
Fit is a no-op for most spectral transformations.
- Parameters:
- Returns:
self – Returns self.
- Return type:
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') SpectraTransformerMixin
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X, wavelengths: ndarray | None = None)[source]
Transform method that delegates to transform_with_wavelengths.
If wavelengths are not provided and the operator requires them, this will raise a ValueError.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input spectra array.
wavelengths (ndarray of shape (n_features,) or None, default=None) – Wavelength array in nm. Required if _requires_wavelengths is True.
- Returns:
X_transformed – Transformed spectra array.
- Return type:
ndarray of shape (n_samples, n_features)
- Raises:
ValueError – If wavelengths are not provided and _requires_wavelengths is True.
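The delegation described here can be pictured with a small stand-in class. This is a self-contained sketch of the pattern only: `WavelengthAwareSketch` and `OffsetAbove` are hypothetical names, the sketch does not inherit from scikit-learn, and the error message is invented.

```python
import numpy as np


class WavelengthAwareSketch:
    """Stand-in for the delegation pattern (not the nirs4all class)."""

    _requires_wavelengths = True

    def transform(self, X, wavelengths=None):
        # Refuse to run without wavelengths when the operator needs them.
        if wavelengths is None and self._requires_wavelengths:
            raise ValueError("wavelengths are required by this operator")
        return self.transform_with_wavelengths(
            np.asarray(X, dtype=float), wavelengths
        )

    def transform_with_wavelengths(self, X, wavelengths):
        raise NotImplementedError


class OffsetAbove(WavelengthAwareSketch):
    """Hypothetical subclass: add an offset at and above a cutoff wavelength."""

    def __init__(self, cutoff_nm=1900.0, offset=0.01):
        self.cutoff_nm = cutoff_nm
        self.offset = offset

    def transform_with_wavelengths(self, X, wavelengths):
        out = X.copy()
        out[:, np.asarray(wavelengths) >= self.cutoff_nm] += self.offset
        return out


wl_demo = np.array([1800.0, 1900.0, 2000.0])
X_offset = OffsetAbove().transform(np.zeros((2, 3)), wavelengths=wl_demo)
```

Calling `OffsetAbove().transform(X)` without wavelengths raises ValueError, which mirrors the behaviour documented for operators whose `_requires_wavelengths` flag is True.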
- abstractmethod transform_with_wavelengths(X: ndarray, wavelengths: ndarray | None) ndarray[source]
Apply the transformation using wavelength information.
Subclasses must implement this method to perform the actual transformation.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,) or None) – Wavelength array in nm. May be None if _requires_wavelengths is False.
- Returns:
X_transformed – Transformed spectra.
- Return type:
ndarray of shape (n_samples, n_features)
- Raises:
NotImplementedError – If the subclass does not implement this method.
- class nirs4all.operators.Spline_Curve_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]
Bases:
Augmenter
Class to simplify a 1D signal using B-spline interpolation along the curve.
Optimized implementation with pre-allocated output arrays.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.
uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.
- augment(X, apply_on='samples')[source]
Select regularly spaced points on the x-axis and adjust a spline.
Optimized with pre-allocated output array.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.Spline_Smoothing(apply_on='samples', random_state=None, *, copy=True)[source]
Bases:
Augmenter
Class to apply a smoothing spline to a 1D signal.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- augment(X, apply_on='samples')[source]
Apply a smoothing spline to the data.
Optimized implementation with pre-allocated output array.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.Spline_X_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_degree=3, perturbation_density=0.05, perturbation_range=(-10, 10))[source]
Bases:
Augmenter
Class to apply a perturbation to a 1D signal using B-spline interpolation.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_degree (int, optional) – Degree of the spline. Default is 3 (cubic).
perturbation_density (float, optional) – Density of perturbation points relative to data size. Default is 0.05.
perturbation_range (tuple, optional) – Range of perturbation values (min, max). Default is (-10, 10).
- augment(X, apply_on='samples')[source]
Augment the data with a perturbation using B-spline interpolation.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data to be augmented.
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.Spline_X_Simplification(apply_on='samples', random_state=None, *, copy=True, spline_points=None, uniform=False)[source]
Bases:
Augmenter
Class to simplify a 1D signal using B-spline interpolation along the x-axis.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points for simplification. Default is None: the length of the sample / 4.
uniform (bool, optional) – If True, the spline points are uniformly spaced. Default is False.
- augment(X, apply_on='samples')[source]
Select randomly spaced points along the x-axis and adjust a spline.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.Spline_Y_Perturbations(apply_on='samples', random_state=None, *, copy=True, spline_points=None, perturbation_intensity=0.005)[source]
Bases:
Augmenter
Augment the data with a perturbation on the y-axis using B-spline interpolation.
Optimized implementation with pre-generated random parameters.
- Parameters:
X (ndarray) – Input data.
apply_on (str, optional) – Apply augmentation on “samples” or “global” (default: “samples”).
spline_points (int, optional) – Number of spline points. Default is None (uses sample length / 2).
perturbation_intensity (float, optional) – Intensity of perturbation relative to max value. Default is 0.005.
- augment(X, apply_on='samples')[source]
Augment the data with a perturbation on the y-axis using B-spline interpolation.
Optimized with pre-allocated arrays and batch random generation.
- Parameters:
X (ndarray) – Input data to be augmented.
apply_on (str, optional) – Apply augmentation on “samples” or “global” data. Default is “samples”.
- Returns:
Augmented data.
- Return type:
ndarray
- class nirs4all.operators.StandardNormalVariate(axis=1, with_mean=True, with_std=True, ddof=0, copy=True)[source]
Bases:
TransformerMixin, BaseEstimator
Standard Normal Variate (SNV) transformation.
SNV is a row-wise normalization technique commonly used in spectroscopy to remove scatter effects. Each sample (row) is centered and scaled independently.
For each sample: SNV = (X - mean(X)) / std(X)
- Parameters:
axis (int, default=1) – Axis along which to compute mean and standard deviation.
- axis=1: Row-wise (default, standard SNV behavior for spectroscopy)
- axis=0: Column-wise (equivalent to StandardScaler)
with_mean (bool, default=True) – If True, center the data before scaling.
with_std (bool, default=True) – If True, scale the data to unit variance.
ddof (int, default=0) – Delta Degrees of Freedom for standard deviation calculation.
copy (bool, default=True) – If False, try to avoid a copy and do inplace scaling instead.
Examples
>>> from nirs4all.operators.transforms import StandardNormalVariate
>>> import numpy as np
>>> X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> snv = StandardNormalVariate()
>>> X_transformed = snv.fit_transform(X)
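For comparison, the row-wise formula SNV = (X - mean(X)) / std(X) can be computed directly with NumPy on the same array. This sketch assumes the defaults (axis=1, ddof=0) and does not reproduce any special handling the class may apply to rows with zero standard deviation.

```python
import numpy as np

X_snv_in = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Per-sample (row-wise) statistics; keepdims allows broadcasting over columns.
row_mean = X_snv_in.mean(axis=1, keepdims=True)
row_std = X_snv_in.std(axis=1, ddof=0, keepdims=True)

X_snv = (X_snv_in - row_mean) / row_std
# every row becomes approximately [-1.2247, 0.0, 1.2247]
```

All three rows give the same result because they differ only by an offset, which SNV removes.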
- fit(X, y=None)[source]
Fit the StandardNormalVariate transformer.
For SNV, this is a no-op as the transformation is computed independently for each sample.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored variable.
- Returns:
self – Returns the instance itself.
- Return type:
- fit_transform(X, y=None)[source]
Fit to data, then transform it.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data.
y (None) – Ignored variable.
- Returns:
X_transformed – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
- transform(X)[source]
Perform SNV transformation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to be transformed.
- Returns:
X_transformed – The transformed data.
- Return type:
ndarray of shape (n_samples, n_features)
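As a rough illustration of the SNV formula given for StandardNormalVariate (not the library's implementation), the row-wise computation can be sketched in plain NumPy. The helper name snv_sketch is hypothetical; it mirrors only the default axis=1, with_mean=True, with_std=True behavior and does not guard against constant rows (zero standard deviation).

```python
import numpy as np


def snv_sketch(X, ddof=0):
    """Hypothetical helper: center and scale each row (sample) independently."""
    X = np.asarray(X, dtype=float)
    # Per-sample statistics, kept as column vectors so they broadcast over rows.
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=ddof, keepdims=True)
    return (X - mean) / std


X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
X_snv = snv_sketch(X)
```

The three rows of X differ only by a constant offset, so they map to identical values after SNV. This is the scatter-removal property the class description refers to.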
- class nirs4all.operators.StrayLightAugmenter(stray_light_fraction: float = 0.001, edge_enhancement: float = 2.0, edge_width: float = 0.1, include_peak_truncation: bool = True, random_state: int | None = None)[source]
Bases:
SpectraTransformerMixin
Simulate stray light effects on NIR spectra.
Stray light is unwanted radiation that reaches the detector without passing through the intended optical path. Its effects are most pronounced:
- At high-absorbance wavelengths (peaks appear truncated)
- At spectral edges where instrument sensitivity is lower
- Near the limits of the detector’s wavelength range
The primary effect is a reduction in observed peak height, causing apparent negative deviations from Beer’s law. This is particularly problematic at the edges of spectra where stray light often constitutes a larger fraction of the total signal.
- Parameters:
stray_light_fraction (float, default=0.001) – Base stray light as fraction of total signal (0.001 = 0.1%). Typical values: 0.0001-0.01 depending on instrument quality.
edge_enhancement (float, default=2.0) – Factor by which stray light increases at spectral edges.
edge_width (float, default=0.1) – Fraction of spectral range considered “edge” (0-0.5).
include_peak_truncation (bool, default=True) – Whether to simulate peak height reduction at high absorbance.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import StrayLightAugmenter
>>> aug = StrayLightAugmenter(stray_light_fraction=0.005)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # High stray light (older/portable instruments)
>>> aug = StrayLightAugmenter(stray_light_fraction=0.01, edge_enhancement=3.0)
>>> pipeline = [aug, MSC(), PLSRegression(10)]
Notes
- The observed transmittance with stray light is:
T_obs = (T_true + s) / (1 + s)
where s is the stray light fraction. This causes:
- At high absorbance (low T_true): T_obs ≈ s, creating a floor effect
- At low absorbance (high T_true): Minimal effect
- Converting to absorbance:
A_obs = -log10(T_obs) < A_true
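The two relations in these Notes can be checked numerically. The sketch below is a minimal illustration assuming a uniform stray light fraction s at every wavelength; the function name apply_stray_light is hypothetical, and the edge enhancement handled by StrayLightAugmenter is not modeled.

```python
import numpy as np


def apply_stray_light(A_true, s=0.001):
    """Hypothetical helper: apply T_obs = (T_true + s) / (1 + s) in absorbance units."""
    T_true = 10.0 ** (-np.asarray(A_true, dtype=float))  # absorbance -> transmittance
    T_obs = (T_true + s) / (1.0 + s)                      # add stray light, renormalize
    return -np.log10(T_obs)                               # transmittance -> absorbance


A_true = np.array([0.1, 1.0, 3.0, 5.0])
A_obs = apply_stray_light(A_true, s=0.001)
```

With s = 0.001 the observed absorbance cannot exceed -log10(s / (1 + s)), roughly 3.0, so a true absorbance of 5.0 is reported as just under 3, while the value at 0.1 is almost unchanged.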
References
Applied Optics (1975). Resolution and stray light in near infrared spectroscopy, 14(8), 1977.
Chalmers & Griffiths (2001). Mid-Infrared Spectroscopy: Anomalies, Artifacts and Common Errors.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') StrayLightAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply stray light effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra (in absorbance units).
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with stray light effects applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.TemperatureAugmenter(temperature_delta: float = 5.0, temperature_range: Tuple[float, float] | None = None, reference_temperature: float = 25.0, enable_shift: bool = True, enable_intensity: bool = True, enable_broadening: bool = True, region_specific: bool = True, random_state: int | None = None)[source]
Bases:
SpectraTransformerMixin
Simulate temperature-induced spectral changes for data augmentation.
Temperature affects NIR spectra through:
- Peak position shifts (especially O-H, N-H bands)
- Intensity changes (hydrogen bonding disruption)
- Band broadening (thermal motion)
This operator applies region-specific temperature effects based on literature values for NIR spectroscopy.
- Parameters:
temperature_delta (float, default=5.0) – Temperature change from reference (°C). Positive = heating.
temperature_range (tuple of (float, float), optional) – If provided, randomly sample temperature_delta from this range for each sample. Overrides temperature_delta parameter.
reference_temperature (float, default=25.0) – Reference temperature for the input spectra (°C).
enable_shift (bool, default=True) – Apply peak position shifts.
enable_intensity (bool, default=True) – Apply intensity changes.
enable_broadening (bool, default=True) – Apply band broadening.
region_specific (bool, default=True) – Apply region-specific effects (recommended). If False, applies uniform average effects across all wavelengths.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import TemperatureAugmenter
>>> aug = TemperatureAugmenter(temperature_delta=10.0)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Random temperature variation in pipeline
>>> aug = TemperatureAugmenter(temperature_range=(-5, 10))
>>> pipeline = [aug, PLSRegression(10)]
References
Maeda et al. (1995). JNIR Spectroscopy, 3(4), 191-201.
Segtnan et al. (2001). Analytical Chemistry, 73(13), 3153-3161.
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') TemperatureAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply temperature effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with temperature effects applied.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.TruncatedPeakAugmenter(peak_probability: float = 0.3, amplitude_range: Tuple[float, float] = (0.01, 0.1), width_range: Tuple[float, float] = (50, 200), left_edge: bool = True, right_edge: bool = True, random_state: int | None = None)[source]
Bases:
SpectraTransformerMixin
Simulate truncated absorption peaks at spectral boundaries.
When measuring NIR spectra, absorption bands that have their centers outside the measured wavelength range will appear as partial peaks at the spectral edges. This creates characteristic rising or falling baselines at the spectrum boundaries.
This effect is common when:
- The spectrometer range doesn’t cover the full absorption band
- Strong absorbers (e.g., water) have peaks just outside the range
- Mid-IR absorption bands tail into the NIR region
- Parameters:
peak_probability (float, default=0.3) – Probability of adding truncated peaks (0-1).
amplitude_range (tuple of (float, float), default=(0.01, 0.1)) – Range of peak amplitudes (in absorbance units).
width_range (tuple of (float, float), default=(50, 200)) – Range of peak widths (in nm). Controls how fast the edge rises/falls.
left_edge (bool, default=True) – Whether to potentially add truncated peak at left (low wavelength) edge.
right_edge (bool, default=True) – Whether to potentially add truncated peak at right (high wavelength) edge.
random_state (int, optional) – Random seed for reproducibility.
Examples
>>> from nirs4all.operators.augmentation import TruncatedPeakAugmenter
>>> aug = TruncatedPeakAugmenter(peak_probability=0.5)
>>> X_aug = aug.transform(X, wavelengths=wavelengths)
>>> # Strong truncated peaks (e.g., water band edge)
>>> aug = TruncatedPeakAugmenter(
...     amplitude_range=(0.05, 0.2),
...     width_range=(100, 300)
... )
>>> pipeline = [aug, SNV(), PLSRegression(10)]
Notes
The truncated peak is modeled as a Gaussian band with its center positioned outside the measured wavelength range. Only the “tail” of this band appears in the spectrum.
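The Gaussian-tail model described in these Notes can be sketched as follows. The wavelength grid, the center offset (50 nm beyond the left edge), and the reading of the width as a Gaussian standard deviation are assumptions chosen for illustration; how the augmenter actually samples these values is not shown in this reference.

```python
import numpy as np

# Assumed measurement grid: 1000-2500 nm in 5 nm steps.
wavelengths = np.linspace(1000.0, 2500.0, 301)

amplitude = 0.1                   # peak height in absorbance units
width = 100.0                     # assumed Gaussian standard deviation, in nm
center = wavelengths[0] - 50.0    # band center placed outside the measured range

# Only the tail of the Gaussian band falls inside the measured range.
tail = amplitude * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Add the same left-edge tail to a flat toy spectrum.
spectrum = np.full_like(wavelengths, 0.5)
spectrum_aug = spectrum + tail
```

Because the band center lies below the first wavelength, the added contribution is largest at the left edge and decays toward longer wavelengths. This yields the falling baseline described for a left-edge truncated peak.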
- set_transform_request(*, wavelengths: bool | None | str = '$UNCHANGED$') TruncatedPeakAugmenter
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform_with_wavelengths(X: ndarray, wavelengths: ndarray) ndarray[source]
Apply truncated peak effects to spectra.
- Parameters:
X (ndarray of shape (n_samples, n_features)) – Input spectra.
wavelengths (ndarray of shape (n_features,)) – Wavelength array in nm.
- Returns:
X_transformed – Spectra with truncated peaks at edges.
- Return type:
ndarray of shape (n_samples, n_features)
- class nirs4all.operators.Wavelet(wavelet: str = 'haar', mode: str = 'periodization', *, copy: bool = True)[source]
Bases:
TransformerMixin, BaseEstimator
Single level Discrete Wavelet Transform.
Performs a discrete wavelet transform on data, using a wavelet function.
- Parameters:
- fit(X, y=None)[source]
Verify the X data compliance with wavelet transform.
- Parameters:
X (array-like, spectra) – The data to transform.
y (None) – Ignored.
- Raises:
ValueError – If the input X is a sparse matrix.
- Returns:
The fitted object.
- Return type:
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$') Wavelet
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- class nirs4all.operators.YOutlierFilter(method: Literal['iqr', 'zscore', 'percentile', 'mad'] = 'iqr', threshold: float = 1.5, lower_percentile: float = 1.0, upper_percentile: float = 99.0, reason: str | None = None)[source]
Bases:
SampleFilter
Filter samples with outlier target values.
This filter identifies samples whose y-values are statistical outliers using one of several detection methods. It’s commonly used to remove samples with extreme or erroneous target values before training.
Supported methods:
- “iqr”: Interquartile Range method (default)
- “zscore”: Z-score (standard deviations from mean)
- “percentile”: Direct percentile cutoffs
- “mad”: Median Absolute Deviation (robust to outliers)
Example
>>> from nirs4all.operators.filters import YOutlierFilter
>>>
>>> # IQR method (default, threshold=1.5 is standard)
>>> filter_iqr = YOutlierFilter(method="iqr", threshold=1.5)
>>>
>>> # Z-score method (threshold=3.0 is common)
>>> filter_zscore = YOutlierFilter(method="zscore", threshold=3.0)
>>>
>>> # Percentile method
>>> filter_pct = YOutlierFilter(
...     method="percentile",
...     lower_percentile=1.0,
...     upper_percentile=99.0
... )
>>>
>>> # Fit and get mask
>>> filter_iqr.fit(X_train, y_train)
>>> mask = filter_iqr.get_mask(X_train, y_train)  # True = keep
- In Pipeline:
>>> pipeline = [
...     {
...         "sample_filter": {
...             "filters": [YOutlierFilter(method="iqr", threshold=1.5)],
...         }
...     },
...     "snv",
...     "model:PLSRegression",
... ]
- fit(X: ndarray, y: ndarray | None = None) YOutlierFilter[source]
Compute outlier detection bounds from training data.
- Parameters:
X – Feature array of shape (n_samples, n_features). Not used but required for sklearn compatibility.
y – Target array of shape (n_samples,) or (n_samples, n_targets). Required for Y-based filtering.
- Returns:
The fitted filter instance.
- Return type:
YOutlierFilter
- Raises:
ValueError – If y is None (required for Y-based filtering).
ValueError – If y has no valid (non-NaN) values.
- get_filter_stats(X: ndarray, y: ndarray | None = None) Dict[str, Any][source]
Get statistics about filter application including method-specific details.
- Parameters:
X – Feature array.
y – Target array.
- Returns:
Dict containing:
- Base stats (n_samples, n_excluded, n_kept, exclusion_rate)
- method: Detection method used
- threshold: Threshold value
- lower_bound: Computed lower bound
- upper_bound: Computed upper bound
- center: Central value (mean/median)
- scale: Scale measure (std/IQR/MAD)
- y_range: (min, max) of input y values
- Return type:
Dict[str, Any]
- get_mask(X: ndarray, y: ndarray | None = None) ndarray[source]
Compute boolean mask indicating which samples to KEEP.
- Parameters:
X – Feature array of shape (n_samples, n_features). Not used but required for API consistency.
y – Target array of shape (n_samples,) or (n_samples, n_targets). Required for Y-based filtering.
- Returns:
- Boolean array of shape (n_samples,) where:
True means KEEP the sample (within bounds)
False means EXCLUDE the sample (outside bounds)
- Return type:
np.ndarray
- Raises:
ValueError – If y is None.
ValueError – If filter has not been fitted (bounds not set).
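The “iqr” method and the keep-mask convention (True = keep) can be sketched with NumPy alone. This is a minimal sketch assuming the usual Tukey fences, Q1 - threshold × IQR and Q3 + threshold × IQR; the exact bound computation inside YOutlierFilter is not shown in this reference, and the helper name iqr_keep_mask is hypothetical.

```python
import numpy as np


def iqr_keep_mask(y, threshold=1.5):
    """Hypothetical helper: True where y lies inside the assumed Tukey fences."""
    y = np.asarray(y, dtype=float).ravel()
    # Quartiles computed while ignoring NaN targets.
    q1, q3 = np.nanpercentile(y, [25, 75])
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    return (y >= lower) & (y <= upper)


y = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 100.0])
mask = iqr_keep_mask(y)   # only the last sample (100.0) is flagged for exclusion
y_kept = y[mask]
```

In this sketch the bounds are computed and applied in one step. The documented filter separates the two: fit computes the bounds from the training targets, and get_mask applies them.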
- nirs4all.operators.baseline(spectra)[source]
Removes baseline (mean) from each spectrum.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
- Returns:
Mean-centered NIRS data matrix.
- Return type:
- nirs4all.operators.derivate(spectra, order=1, delta=1)[source]
Computes Nth order derivatives with the desired spacing using numpy.gradient.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
order (float, optional) – Order of the derivative, by default 1.
delta (int, optional) – Delta of the derivative (in samples), by default 1.
- Returns:
spectra – Derived NIR spectra.
- Return type:
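For derivate, the reference states only that numpy.gradient is used. One plausible reading, sketched below, applies numpy.gradient repeatedly along the wavelength axis. The axis choice (axis=1, one spectrum per row), the repeated application for higher orders, and the helper name derivate_sketch are all assumptions.

```python
import numpy as np


def derivate_sketch(spectra, order=1, delta=1):
    """Hypothetical helper: apply numpy.gradient `order` times along each row."""
    out = np.asarray(spectra, dtype=float)
    for _ in range(int(order)):
        # Central differences in the interior, one-sided differences at the edges.
        out = np.gradient(out, delta, axis=1)
    return out


x = np.arange(6, dtype=float)
spectra = np.vstack([x ** 2, 3.0 * x])   # two toy "spectra"
d1 = derivate_sketch(spectra, order=1)
d2 = derivate_sketch(spectra, order=2)
```

The linear row has a constant first derivative of 3, and the quadratic row gives 2x at interior points. Edge values are less accurate because numpy.gradient falls back to one-sided differences there.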
- nirs4all.operators.detrend(spectra, bp=0)[source]
Perform spectral detrending to remove linear trend from data.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
bp (list, optional) – A sequence of break points. If given, an individual linear fit is performed for each part of data between two break points. Break points are specified as indices into data. Default is 0.
- Returns:
Detrended NIR spectra.
- Return type:
- nirs4all.operators.gaussian(spectra, order=2, sigma=1)[source]
Applies a 1D Gaussian filter using scipy.ndimage.gaussian_filter1d.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
order (float, optional) – Order of the derivative.
sigma (int, optional) – Sigma of the gaussian.
- Returns:
Gaussian NIR spectra.
- Return type:
- nirs4all.operators.msc(spectra, scaled=True)[source]
Performs multiplicative scatter correction to the mean.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
scaled (bool) – Whether to scale the data. Defaults to True.
- Returns:
Scatter-corrected NIR spectra.
- Return type:
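For msc, multiplicative scatter correction to the mean is commonly formulated as a per-spectrum linear regression against the mean spectrum, followed by removal of the fitted offset and slope. The sketch below follows that textbook formulation; it is an assumption about what msc computes, the helper name msc_sketch is hypothetical, and the scaled option is not modeled.

```python
import numpy as np


def msc_sketch(spectra):
    """Hypothetical helper: regress each spectrum on the mean spectrum and undo the fit."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)              # mean spectrum over all samples
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ≈ slope * reference + intercept.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Toy data: one underlying shape with different multiplicative and additive scatter.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
spectra = np.vstack([1.0 * base, 1.5 * base + 0.2, 0.7 * base - 0.1])
corrected = msc_sketch(spectra)
```

Because every toy spectrum is an exact linear function of the mean spectrum, all corrected rows collapse onto that mean. Real spectra retain the chemical variation that the linear fit does not explain.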
- nirs4all.operators.norml(spectra, feature_range=(-1, 1))[source]
Perform spectral normalization with user-defined limits.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
feature_range (tuple (min, max), default=(-1, 1)) – Desired range of transformed data. If both min and max equal -1, linalg normalization is applied; otherwise, normalization to the user-defined bounds is applied.
- Returns:
spectra – Normalized NIR spectra.
- Return type:
- nirs4all.operators.savgol(spectra: ndarray, window_length: int = 11, polyorder: int = 3, deriv: int = 0, delta: float = 1.0) ndarray[source]
Perform Savitzky–Golay filtering on the data (also calculates derivatives). This function is a wrapper for scipy.signal.savgol_filter.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
window_length (int) – Size of the filter window in samples (default 11).
polyorder (int) – Order of the polynomial estimation (default 3).
deriv (int) – Order of the derivative (default 0).
delta (float) – Sampling distance of the data.
- Returns:
NIRS data smoothed with Savitzky-Golay filtering.
- Return type:
ndarray
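Because savgol is described as a wrapper for scipy.signal.savgol_filter, its behavior can be previewed by calling SciPy directly with the same default arguments. Filtering along axis=1 (one spectrum per row) is an assumption about how the wrapper orients the data, and the noisy toy spectra are made up for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Two noisy toy spectra, one per row.
clean = np.vstack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])
spectra = clean + rng.normal(0.0, 0.01, size=clean.shape)

# Smoothing only (deriv=0), using the defaults listed for savgol.
smoothed = savgol_filter(spectra, window_length=11, polyorder=3,
                         deriv=0, delta=1.0, axis=1)

# Smoothed first derivative with the same window and polynomial order.
first_deriv = savgol_filter(spectra, window_length=11, polyorder=3,
                            deriv=1, delta=1.0, axis=1)
```

In SciPy, window_length must not exceed the number of points along the filtered axis, and polyorder must be smaller than window_length.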
- nirs4all.operators.spl_norml(spectra)[source]
Perform simple spectral normalization.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
- Returns:
spectra – Normalized NIR spectra.
- Return type:
- nirs4all.operators.wavelet_transform(spectra: ndarray, wavelet: str, mode: str = 'periodization') ndarray[source]
Computes a wavelet transform of the spectra using PyWavelets.
- Parameters:
spectra (numpy.ndarray) – NIRS data matrix.
wavelet (str) – wavelet family transformation.
mode (str) – signal extension mode.
- Returns:
wavelet and resampled spectra.
- Return type:
ndarray