nirs4all.operators.models package

Module contents

Models module for presets.

This module contains model definitions and references organized by framework. TensorFlow and PyTorch models are loaded lazily to avoid importing heavy frameworks at package load time.

class nirs4all.operators.models.AllPreviousModelsSelector(include_averaged: bool = False, exclude_classnames: Set[str] | None = None)[source]

Bases: SourceModelSelector

Select all models from previous steps in current branch.

This is the default selector that includes all models trained before the meta-model step within the same branch context.

include_averaged

If True, include fold-averaged models. Default False (uses individual fold models).

exclude_classnames

Set of model class names to exclude.

Example

>>> selector = AllPreviousModelsSelector(include_averaged=True)

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) List[ModelCandidate][source]

Select all previous models in current branch.

Parameters:
  • candidates – List of candidate models.

  • context – Execution context with step and branch info.

  • prediction_store – Predictions store (unused in this selector).

Returns:

Filtered list of candidates ordered by step index.
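The selection rule described above can be sketched as follows. This is only an illustration, not the library's implementation: the `Candidate` dataclass and its fields (`step_idx`, `branch_id`, `is_averaged`) are hypothetical stand-ins for ModelCandidate and ExecutionContext, whose real attributes are not listed in this section.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (field names are assumptions)."""
    name: str
    classname: str
    step_idx: int
    branch_id: int
    is_averaged: bool = False


def select_all_previous(
    candidates: List[Candidate],
    current_step: int,
    current_branch: int,
    include_averaged: bool = False,
    exclude_classnames: Optional[Set[str]] = None,
) -> List[Candidate]:
    """Keep earlier-step models of the current branch, ordered by step index."""
    excluded = exclude_classnames or set()
    kept = [
        c for c in candidates
        if c.step_idx < current_step
        and c.branch_id == current_branch
        and c.classname not in excluded
        and (include_averaged or not c.is_averaged)
    ]
    return sorted(kept, key=lambda c: c.step_idx)


pool = [
    Candidate("rf", "RandomForestRegressor", step_idx=3, branch_id=0),
    Candidate("pls", "PLSRegression", step_idx=2, branch_id=0),
    Candidate("pls_avg", "PLSRegression", step_idx=2, branch_id=0, is_averaged=True),
    Candidate("svr", "SVR", step_idx=2, branch_id=1),
]
selected = select_all_previous(pool, current_step=4, current_branch=0)
print([c.name for c in selected])  # ['pls', 'rf']
```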

class nirs4all.operators.models.BaseModelOperator[source]

Bases: ABC

Abstract base class for all model operators.

Model operators are building blocks in pipelines that represent machine learning models (sklearn, tensorflow, pytorch, etc.).

The actual execution logic is handled by corresponding controllers in the nirs4all.controllers.models module.

abstractmethod get_controller_type() str[source]

Return the type of controller that handles this operator.

Returns:

Controller type identifier (e.g., ‘sklearn’, ‘tensorflow’, ‘pytorch’)

Return type:

str

abstractmethod get_params(deep: bool = True) Dict[str, Any][source]

Get parameters for this operator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this operator and contained subobjects that are estimators.

Returns:

Parameter names mapped to their values.

Return type:

dict

abstractmethod set_params(**params) BaseModelOperator[source]

Set the parameters of this operator.

Parameters:

**params (dict) – Operator parameters.

Returns:

self – Operator instance.

Return type:

BaseModelOperator
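As a rough sketch of what a concrete operator has to provide, the snippet below implements the three abstract methods on a local stand-in base class. `BaseModelOperatorSketch` and `RidgeOperator` are hypothetical names used so the example runs without nirs4all installed; a real operator would subclass BaseModelOperator instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseModelOperatorSketch(ABC):
    """Stand-in mirroring the three abstract methods documented above."""

    @abstractmethod
    def get_controller_type(self) -> str: ...

    @abstractmethod
    def get_params(self, deep: bool = True) -> Dict[str, Any]: ...

    @abstractmethod
    def set_params(self, **params) -> "BaseModelOperatorSketch": ...


class RidgeOperator(BaseModelOperatorSketch):
    """Hypothetical operator that only stores a hyperparameter."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def get_controller_type(self) -> str:
        # Identifier used to pick the matching controller.
        return "sklearn"

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"alpha": self.alpha}

    def set_params(self, **params) -> "RidgeOperator":
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Unknown parameter: {key}")
            setattr(self, key, value)
        return self  # returning self allows call chaining


op = RidgeOperator().set_params(alpha=0.1)
print(op.get_controller_type(), op.get_params())
```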

class nirs4all.operators.models.BranchScope(value)[source]

Bases: Enum

Which branches to include as source models.

Controls which branches’ predictions are used for stacking when the pipeline contains branching.

CURRENT_ONLY

Only use models from the current branch (default).

ALL_BRANCHES

Use models from all branches (requires compatible samples).

SPECIFIED

Use explicit list from source_models parameter.

ALL_BRANCHES = 'all_branches'
CURRENT_ONLY = 'current_only'
SPECIFIED = 'specified'

class nirs4all.operators.models.CoverageStrategy(value)[source]

Bases: Enum

Strategy for handling partial coverage in OOF reconstruction.

When some samples are missing predictions (e.g., from sample partitioning), this determines how to handle them.

STRICT

Raise error if any sample is missing predictions (default).

DROP_INCOMPLETE

Drop samples missing any source model predictions.

IMPUTE_ZERO

Fill missing predictions with zeros.

IMPUTE_MEAN

Fill missing predictions with mean of available predictions.

IMPUTE_FOLD_MEAN

Fill with mean from the same fold.

DROP_INCOMPLETE = 'drop_incomplete'
IMPUTE_FOLD_MEAN = 'impute_fold_mean'
IMPUTE_MEAN = 'impute_mean'
IMPUTE_ZERO = 'impute_zero'
STRICT = 'strict'
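The effect of each strategy can be illustrated on a small out-of-fold matrix in which NaN marks a missing prediction. This is a sketch under assumptions, not the library's code: `apply_coverage` is a hypothetical helper, and "mean of available predictions" is interpreted here as the per-model (column) mean.

```python
import numpy as np


def apply_coverage(oof: np.ndarray, folds: np.ndarray, strategy: str):
    """Handle NaN entries in an (n_samples, n_models) OOF matrix.

    Returns the processed matrix and a boolean mask of the rows kept.
    """
    missing = np.isnan(oof)
    keep = np.ones(oof.shape[0], dtype=bool)
    out = oof.copy()

    if strategy == "strict":
        if missing.any():
            raise ValueError("Some samples are missing predictions")
    elif strategy == "drop_incomplete":
        keep = ~missing.any(axis=1)
    elif strategy == "impute_zero":
        out[missing] = 0.0
    elif strategy == "impute_mean":
        col_means = np.nanmean(oof, axis=0)  # per-model mean (assumption)
        out = np.where(missing, col_means, out)
    elif strategy == "impute_fold_mean":
        for fold in np.unique(folds):
            rows = folds == fold
            fold_means = np.nanmean(oof[rows], axis=0)
            out[rows] = np.where(missing[rows], fold_means, oof[rows])
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return out[keep], keep


oof = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, 6.0], [5.0, np.nan]])
folds = np.array([0, 0, 1, 1])

for name in ("drop_incomplete", "impute_zero", "impute_mean", "impute_fold_mean"):
    filled, kept = apply_coverage(oof, folds, name)
    print(name, filled.tolist())
```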

class nirs4all.operators.models.DiPLS(n_components: int = 5, lags: int = 1, cv_splits: int = 7, tol: float = 1e-08, max_iter: int = 1000)[source]

Bases: BaseEstimator, RegressorMixin

Dynamic PLS (DiPLS) regressor.

DiPLS extends PLS to handle dynamic systems by including time-lagged variables. It uses the trendfitter package.

Parameters:
  • n_components (int, default=5) – Number of latent variables to extract.

  • lags (int, default=1) – Number of time lags to consider (s parameter in DiPLS).

  • cv_splits (int, default=7) – Number of cross-validation splits for automatic component selection.

  • tol (float, default=1e-8) – Convergence tolerance.

  • max_iter (int, default=1000) – Maximum number of iterations.

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

Examples

>>> from nirs4all.operators.models.sklearn.pls import DiPLS
>>> import numpy as np
>>> X = np.random.randn(100, 50)
>>> y = np.random.randn(100)
>>> model = DiPLS(n_components=5, lags=2)
>>> model.fit(X, y)
DiPLS(n_components=5, lags=2)
>>> predictions = model.predict(X)

Notes

Requires the trendfitter package: pip install trendfitter

DiPLS is particularly useful for:

  • Process monitoring with temporal dependencies

  • NIR data collected over time

  • Batch process analytics

See also

sklearn.cross_decomposition.PLSRegression

Standard PLS without dynamics.

fit(X, y)[source]

Fit the DiPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data (time-ordered measurements).

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

DiPLS

Raises:

ImportError – If trendfitter package is not installed.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

Predict using the DiPLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

Notes

DiPLS uses Hankelization which may produce fewer predictions than input samples. This implementation pads the beginning with the first predicted value to maintain compatibility with sklearn cross-validation.
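The padding behaviour can be pictured with a small NumPy sketch. The helpers `lagged_matrix` and `pad_front` are hypothetical and only illustrate the idea: a lag embedding with s lags leaves n_samples - s rows, and the missing leading predictions are filled with the first predicted value. The exact Hankel layout used by trendfitter may differ.

```python
import numpy as np


def lagged_matrix(X: np.ndarray, lags: int) -> np.ndarray:
    """Stack each row with its `lags` predecessors (newest first)."""
    n = X.shape[0]
    return np.hstack([X[lags - k : n - k] for k in range(lags + 1)])


def pad_front(y_short: np.ndarray, n_samples: int) -> np.ndarray:
    """Front-pad predictions with the first predicted value."""
    n_missing = n_samples - y_short.shape[0]
    if n_missing <= 0:
        return y_short
    pad = np.repeat(y_short[:1], n_missing, axis=0)
    return np.concatenate([pad, y_short], axis=0)


X = np.arange(6, dtype=float).reshape(6, 1)   # 6 time-ordered samples
X_lagged = lagged_matrix(X, lags=2)           # shape (4, 3): two rows are lost
y_short = X_lagged.sum(axis=1)                # stand-in for model predictions
y_full = pad_front(y_short, n_samples=X.shape[0])

print(X_lagged.shape)    # (4, 3)
print(y_full.tolist())   # [3.0, 3.0, 3.0, 6.0, 9.0, 12.0]
```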

set_params(**params)[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

DiPLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') DiPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.models.DiversitySelector(max_per_class: int = 1, preferred_classes: List[str] | None = None)[source]

Bases: SourceModelSelector

Select diverse models by class type to maximize ensemble diversity.

Ensures the stacking ensemble includes different types of models rather than multiple similar models.

max_per_class

Maximum models per class type.

preferred_classes

Optional list of preferred class names.

Example

>>> selector = DiversitySelector(
...     max_per_class=2,
...     preferred_classes=["PLSRegression", "RandomForestRegressor"]
... )

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) List[ModelCandidate][source]

Select diverse models by class type.

Parameters:
  • candidates – List of candidate models.

  • context – Execution context.

  • prediction_store – Predictions store (unused).

Returns:

List of diverse models with at most max_per_class per type.
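A minimal sketch of the per-class cap is shown below. It is not the library's implementation: candidates are reduced to hypothetical (model_name, classname) pairs, and the sketch assumes that preferred classes are simply ranked first while other candidates keep their input order.

```python
from collections import Counter
from typing import List, Optional, Tuple


def select_diverse(
    candidates: List[Tuple[str, str]],
    max_per_class: int = 1,
    preferred_classes: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Keep at most `max_per_class` candidates for each class name."""
    preferred = list(preferred_classes or [])

    def rank(item: Tuple[str, str]) -> int:
        # Preferred classes sort first, in the order given (assumption).
        classname = item[1]
        return preferred.index(classname) if classname in preferred else len(preferred)

    counts: Counter = Counter()
    selected = []
    for name, classname in sorted(candidates, key=rank):  # stable sort
        if counts[classname] < max_per_class:
            counts[classname] += 1
            selected.append((name, classname))
    return selected


pool = [
    ("pls_5", "PLSRegression"),
    ("pls_10", "PLSRegression"),
    ("rf", "RandomForestRegressor"),
    ("svr", "SVR"),
]
picked = select_diverse(pool, max_per_class=1,
                        preferred_classes=["RandomForestRegressor"])
print([name for name, _ in picked])  # ['rf', 'pls_5', 'svr']
```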

class nirs4all.operators.models.ExplicitModelSelector(model_names: List[str], strict: bool = True)[source]

Bases: SourceModelSelector

Select explicitly named models.

Uses a predefined list of model names to select sources. Model names must match exactly (case-sensitive).

model_names

List of model names to select.

strict

If True, raise error if any named model is not found.

Example

>>> selector = ExplicitModelSelector(
...     model_names=["PLS", "RandomForest", "XGBoost"],
...     strict=True
... )

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) List[ModelCandidate][source]

Select models matching the specified names.

Parameters:
  • candidates – List of candidate models.

  • context – Execution context.

  • prediction_store – Predictions store (unused).

Returns:

List of candidates matching specified names, in the order specified by model_names.

Raises:

ValueError – If strict=True and any model name is not found.
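The matching rules (exact, case-sensitive names, output in the order of model_names, error in strict mode) can be sketched on plain strings. `select_by_name` is a hypothetical helper; the assumption that missing names are silently skipped when strict=False is not stated in the source.

```python
from typing import List


def select_by_name(available: List[str], model_names: List[str],
                   strict: bool = True) -> List[str]:
    """Return the requested names that exist, in the order requested."""
    present = set(available)
    missing = [name for name in model_names if name not in present]
    if strict and missing:
        raise ValueError(f"Models not found: {missing}")
    # Assumption: in non-strict mode, unknown names are skipped.
    return [name for name in model_names if name in present]


available = ["RandomForest", "PLS", "SVR"]

print(select_by_name(available, ["PLS", "RandomForest"]))
# ['PLS', 'RandomForest']

# Matching is case-sensitive, so "pls" does not match "PLS".
print(select_by_name(available, ["pls", "SVR"], strict=False))
# ['SVR']
```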

class nirs4all.operators.models.FCKPLS(n_components: int = 10, alphas: Sequence[float] = (0.0, 0.5, 1.0, 1.5, 2.0), sigmas: Sequence[float] = (2.0,), kernel_size: int = 15, mode: Literal['same', 'valid'] = 'same', kernel_type: Literal['heuristic', 'grunwald'] = 'heuristic', standardize: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Fractional Convolutional Kernel PLS (FCK-PLS).

FCK-PLS builds spectral features by convolving input spectra with a bank of fractional order filters, then applies PLS regression on the expanded feature space. This approach captures derivative-like information at various fractional orders.

The pipeline is:

  1. Optional standardization of X and Y

  2. FractionalConvFeaturizer: X -> X_feat (feature expansion)

  3. PLSRegression: X_feat, Y -> predictions

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • alphas (sequence of float, default=(0.0, 0.5, 1.0, 1.5, 2.0)) – Fractional orders for the filter bank.

  • sigmas (sequence of float, default=(2.0,)) – Scale parameters for fractional kernels.

  • kernel_size (int, default=15) – Size of convolution kernels (must be odd).

  • mode (str, default='same') – Convolution mode: ‘same’ or ‘valid’.

  • kernel_type (str, default='heuristic') – Fractional kernel type: ‘heuristic’ or ‘grunwald’.

  • standardize (bool, default=True) – Whether to standardize X and Y before fitting.

  • backend (str, default='numpy') – Computational backend:

      • ‘numpy’: NumPy/SciPy backend (CPU)

      • ‘jax’: JAX backend (supports GPU/TPU)

n_features_in_

Number of input features.

Type:

int

n_features_out_

Number of features after convolution.

Type:

int

featurizer_

The fitted fractional featurizer.

Type:

FractionalConvFeaturizer

pls_

The fitted PLS model.

Type:

PLSRegression

Examples

>>> from nirs4all.operators.models.sklearn.fckpls import FCKPLS
>>> import numpy as np
>>> # Generate spectral data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 200)  # 100 samples, 200 wavelengths
>>> y = X[:, 50:60].mean(axis=1) + 0.1 * np.random.randn(100)
>>> # Fit FCK-PLS with default fractional orders
>>> model = FCKPLS(n_components=10, alphas=(0.0, 0.5, 1.0, 1.5, 2.0))
>>> model.fit(X, y)
FCKPLS(...)
>>> predictions = model.predict(X)
>>> # Use specific fractional orders
>>> model2 = FCKPLS(n_components=10, alphas=(0.0, 1.0, 2.0), sigmas=(3.0,))
>>> model2.fit(X, y)
FCKPLS(...)

Notes

The fractional order α controls the type of spectral feature extracted:

  • α ≈ 0: Smoothed spectrum (low-pass filtering)

  • α ≈ 1: First derivative-like (highlights slopes)

  • α ≈ 2: Second derivative-like (highlights peaks/valleys)

  • Fractional α: Intermediate behavior

The sigma parameter controls the scale of the filter. Larger sigma captures broader spectral features; smaller sigma captures local details.

FCK-PLS can be computationally expensive with many filters and large spectra. Consider using the JAX backend for GPU acceleration.

See also

SIMPLS

Standard PLS without feature expansion.

IntervalPLS

PLS with wavelength interval selection.

__repr__() str[source]

Return string representation.

fit(X: ArrayLike, y: ArrayLike) FCKPLS[source]

Fit the FCK-PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training spectra.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

FCKPLS

get_filter_info() dict[source]

Get information about the fractional filter bank.

Returns:

info – Dictionary containing filter parameters.

Return type:

dict

get_fractional_features(X: ArrayLike) NDArray[floating][source]

Get the fractional convolution features.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input spectra.

Returns:

X_feat – Fractional convolution features.

Return type:

ndarray of shape (n_samples, n_features_out)

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

predict(X: ArrayLike) NDArray[floating][source]

Predict using the FCK-PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) FCKPLS[source]

Set the parameters of this estimator.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') FCKPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) NDArray[floating][source]

Transform X to PLS score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – PLS scores in the feature-expanded space.

Return type:

ndarray of shape (n_samples, n_components)

class nirs4all.operators.models.FractionalConvFeaturizer(alphas: Sequence[float] = (0.0, 0.5, 1.0, 1.5, 2.0), sigmas: Sequence[float] = (2.0,), kernel_size: int = 15, mode: Literal['same', 'valid'] = 'same', kernel_type: Literal['heuristic', 'grunwald'] = 'heuristic')[source]

Bases: BaseEstimator, TransformerMixin

Convolutional featurizer using a bank of fractional filters.

Builds features by convolving input spectra with multiple fractional order filters at different scales. This captures derivative-like information at various fractional orders, which can be useful for identifying spectral features.

Parameters:
  • alphas (sequence of float, default=(0.0, 0.5, 1.0, 1.5, 2.0)) – Fractional orders for the filter bank:

      • 0: Smoothing/identity-like

      • 0.5: Half-derivative

      • 1: First derivative

      • 1.5: Fractional between 1st and 2nd derivative

      • 2: Second derivative

  • sigmas (sequence of float, default=(2.0,)) – Scale parameters. If a single value is given, the same sigma is used for all alphas. If the sequence has the same length as alphas, the values are paired as (alphas[i], sigmas[i]).

  • kernel_size (int, default=15) – Size of convolution kernels (should be odd).

  • mode (str, default='same') – Convolution mode:

      • ‘same’: Output same length as input

      • ‘valid’: Output shorter (no padding)

  • kernel_type (str, default='heuristic') – Type of fractional kernel:

      • ‘heuristic’: Gaussian-modulated fractional power

      • ‘grunwald’: Grünwald-Letnikov coefficients
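For reference, the standard Grünwald-Letnikov coefficients follow the recurrence w_0 = 1, w_k = w_{k-1} (k - 1 - α) / k. The sketch below computes them with a hypothetical helper, `grunwald_letnikov_coeffs`. How nirs4all truncates, centers or scales these coefficients into a kernel of length kernel_size is not described in this section, so the sketch covers only the textbook coefficients.

```python
import numpy as np


def grunwald_letnikov_coeffs(alpha: float, n_terms: int) -> np.ndarray:
    """First `n_terms` Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k)."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w


# Integer orders recover the usual finite-difference stencils.
print(grunwald_letnikov_coeffs(1.0, 4))   # first difference: 1, -1, 0, 0
print(grunwald_letnikov_coeffs(2.0, 4))   # second difference: 1, -2, 1, 0

# A fractional order gives a slowly decaying tail instead of a finite stencil.
print(grunwald_letnikov_coeffs(0.5, 4))
```

With α = 0 the weights reduce to (1, 0, 0, ...), the identity, which is consistent with the "smoothing/identity-like" description of order 0 above.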

kernels_

Precomputed convolution kernels.

Type:

list of ndarray

n_kernels_

Number of kernels in the filter bank.

Type:

int

fit(X, y=None)[source]

Precompute convolution kernels.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data (used only for validation).

  • y (ignored)

Returns:

self

Return type:

FractionalConvFeaturizer

get_kernel_info() dict[source]

Get information about the filter bank.

Returns:

info – Dictionary containing kernel parameters and shapes.

Return type:

dict

transform(X)[source]

Apply fractional convolution filter bank.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input spectra.

Returns:

X_feat – Convolved features. n_features_out depends on mode:

  • ‘same’: n_features * n_kernels

  • ‘valid’: (n_features - kernel_size + 1) * n_kernels

Return type:

ndarray of shape (n_samples, n_features_out)
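The output widths quoted above can be checked with plain NumPy convolutions. This is a sketch, not the featurizer's code: `conv_features` is a hypothetical helper, the three kernels are arbitrary illustrative filters rather than the library's fractional kernels, and the kernel-by-kernel concatenation order is an assumption.

```python
import numpy as np


def conv_features(X: np.ndarray, kernels, mode: str = "same") -> np.ndarray:
    """Convolve every row of X with every kernel and concatenate the results."""
    blocks = [
        np.stack([np.convolve(row, kernel, mode=mode) for row in X])
        for kernel in kernels
    ]
    return np.hstack(blocks)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))                    # 4 spectra, 50 wavelengths

kernel_size = 5
kernels = [
    np.ones(kernel_size) / kernel_size,         # moving average
    np.array([0.0, 0.5, 0.0, -0.5, 0.0]),       # central difference
    np.array([0.0, 1.0, -2.0, 1.0, 0.0]),       # second difference
]

same = conv_features(X, kernels, mode="same")
valid = conv_features(X, kernels, mode="valid")

print(same.shape)    # (4, 150) = 50 * 3
print(valid.shape)   # (4, 138) = (50 - 5 + 1) * 3
```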

nirs4all.operators.models.FractionalPLS

alias of FCKPLS

class nirs4all.operators.models.IKPLS(n_components: int = 10, algorithm: int = 1, center: bool = True, scale: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Improved Kernel PLS (IKPLS) regressor.

A sklearn-compatible wrapper for the ikpls package, which provides fast PLS implementations using NumPy or JAX (for GPU/TPU acceleration). IKPLS is significantly faster than sklearn’s PLSRegression, especially for cross-validation.

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • algorithm (int, default=1) – IKPLS algorithm variant (1 or 2). Algorithm 1 is generally faster.

  • center (bool, default=True) – Whether to center X and Y before fitting.

  • scale (bool, default=True) – Whether to scale X and Y before fitting.

  • backend (str, default='numpy') – Backend to use for computation. Options are:

      • ‘numpy’: Use NumPy backend (CPU only).

      • ‘jax’: Use JAX backend (supports GPU/TPU acceleration).

    The JAX backend requires JAX to be installed: pip install jax. For GPU support: pip install jax[cuda12]

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used (may be less than n_components if limited by data dimensions).

Type:

int

coef_

Regression coefficients.

Type:

ndarray of shape (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.pls import IKPLS
>>> import numpy as np
>>> X = np.random.randn(100, 50)
>>> y = np.random.randn(100)
>>> # NumPy backend (default)
>>> model = IKPLS(n_components=10)
>>> model.fit(X, y)
IKPLS(n_components=10)
>>> predictions = model.predict(X)
>>> # JAX backend for GPU acceleration
>>> model_jax = IKPLS(n_components=10, backend='jax')

Notes

Requires the ikpls package: pip install ikpls

For JAX backend with GPU support, install JAX with CUDA: pip install jax[cuda12]

The JAX backend is end-to-end differentiable, allowing gradient propagation when using PLS as a layer in a deep learning model.

See also

sklearn.cross_decomposition.PLSRegression

Standard sklearn PLS.

fit(X, y)[source]

Fit the IKPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

IKPLS

Raises:
  • ImportError – If ikpls package is not installed, or JAX is not available when using ‘jax’ backend.

  • ValueError – If backend is not ‘numpy’ or ‘jax’.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X, n_components=None)[source]

Predict using the IKPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.

Returns:

y_pred – Predicted values (always returns NumPy arrays).

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params)[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

IKPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') IKPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IKPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.models.IdentityFeaturizer[source]

Bases: BaseEstimator, TransformerMixin

Identity featurizer: ψ(x) = x.

This is the default featurizer for OKLMPLS when no nonlinear transformation is needed.

fit(X, y=None)[source]

Fit the featurizer (no-op for identity).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored)

Returns:

self

Return type:

IdentityFeaturizer

transform(X)[source]

Transform X (identity).

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X – Same as input.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.models.IntervalPLS(n_components: int = 5, n_intervals: int = 10, interval_width: int | None = None, cv: int = 5, scoring: str = 'r2', mode: Literal['single', 'forward', 'backward'] = 'forward', combination_method: Literal['best', 'union'] = 'union', backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Interval Partial Least Squares (iPLS) regressor.

iPLS evaluates PLS models on contiguous wavelength intervals to identify optimal spectral regions for prediction. This is particularly useful for NIR spectroscopy where not all wavelengths contribute equally to the prediction.

The algorithm divides the spectrum into intervals and evaluates each interval (or combination of intervals) using cross-validation. Different selection modes are available:

  • ‘single’: Select only the best performing interval

  • ‘forward’: Iteratively add intervals that improve performance

  • ‘backward’: Start with all intervals and remove those that don’t help
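The interval split and the ‘forward’ mode can be sketched generically. Both helpers below are hypothetical; the sketch assumes equal-width intervals with exclusive end indices, and `score_fn` stands in for the cross-validated PLS score of a set of intervals (a toy scoring function is used here so the example needs no model fitting).

```python
import numpy as np


def interval_bounds(n_features: int, n_intervals: int):
    """Split feature indices into (start, end) pairs, end exclusive."""
    edges = np.linspace(0, n_features, n_intervals + 1).astype(int)
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


def forward_select(n_intervals: int, score_fn):
    """Greedily add the interval that most improves the score; stop when none does."""
    selected, best = [], -np.inf
    remaining = list(range(n_intervals))
    while remaining:
        scores = {i: score_fn(selected + [i]) for i in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best


bounds = interval_bounds(n_features=200, n_intervals=10)
print(bounds[2], bounds[3])   # (40, 60) (60, 80)

# Toy score: intervals 2 and 3 carry signal, every interval costs a small penalty.
gains = {2: 0.5, 3: 0.3}


def toy_score(intervals):
    return sum(gains.get(i, 0.0) for i in intervals) - 0.05 * len(intervals)


selected, best = forward_select(len(bounds), toy_score)
print(selected)   # [2, 3]
```

In this toy setup the search stops after two intervals because adding any third one lowers the score, which is the stopping rule the ‘forward’ description implies.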

Parameters:
  • n_components (int, default=5) – Number of PLS components to extract for each interval model.

  • n_intervals (int, default=10) – Number of equal-width intervals to divide X into.

  • interval_width (int, optional) – Fixed width for each interval. If specified, overrides n_intervals.

  • cv (int, default=5) – Number of cross-validation folds for interval evaluation.

  • scoring (str, default='r2') – Scoring metric for cross-validation. Supports sklearn metrics like ‘r2’, ‘neg_mean_squared_error’, etc.

  • mode ({'single', 'forward', 'backward'}, default='forward') – Interval selection mode:

      • ‘single’: Use only the best single interval

      • ‘forward’: Forward selection of intervals

      • ‘backward’: Backward elimination of intervals

  • combination_method ({'best', 'union'}, default='union') – How to combine selected intervals for the final model:

      • ‘best’: Use only the single best interval

      • ‘union’: Use union of all selected intervals

  • backend (str, default='numpy') – Computational backend:

      • ‘numpy’: NumPy backend (CPU only, default)

      • ‘jax’: JAX backend (supports GPU/TPU acceleration)

    Note: the JAX backend accelerates interval evaluation, but final model fitting uses sklearn for compatibility.

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used in final model.

Type:

int

interval_scores_

Cross-validation scores for each interval.

Type:

ndarray of shape (n_intervals,)

interval_starts_

Start indices for each interval.

Type:

ndarray of shape (n_intervals,)

interval_ends_

End indices for each interval.

Type:

ndarray of shape (n_intervals,)

n_intervals_

Actual number of intervals.

Type:

int

selected_intervals_

Indices of selected intervals.

Type:

list of int

selected_regions_

(start, end) pairs for selected spectral regions.

Type:

list of tuple

coef_

Regression coefficients for selected features.

Type:

ndarray of shape (n_selected_features, n_targets)

feature_mask_

Boolean mask indicating selected features.

Type:

ndarray of shape (n_features,)

Examples

>>> from nirs4all.operators.models.sklearn.ipls import IntervalPLS
>>> import numpy as np
>>> # Generate sample spectral data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 200)  # 200 wavelengths
>>> y = X[:, 50:70].sum(axis=1) + 0.1 * np.random.randn(100)  # Signal in 50-70
>>> # Fit iPLS to find informative regions
>>> model = IntervalPLS(n_components=5, n_intervals=10, mode='forward')
>>> model.fit(X, y)
IntervalPLS(n_components=5, n_intervals=10, mode='forward')
>>> # See which intervals were selected
>>> print(f"Selected intervals: {model.selected_intervals_}")
>>> print(f"Selected regions: {model.selected_regions_}")
>>> # Predict
>>> predictions = model.predict(X)

Notes

iPLS is particularly effective for NIR spectroscopy because:

  1. Different spectral regions contain different chemical information

  2. Some regions may be dominated by noise or uninformative signals

  3. Selecting optimal intervals can improve both prediction and interpretation

The JAX backend provides acceleration for interval evaluation when using GPU/TPU, which is beneficial when evaluating many intervals.

See also

sklearn.cross_decomposition.PLSRegression

Standard PLS regression.

SIMPLS

SIMPLS algorithm implementation.

References

  • Norgaard, L., et al. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.

__repr__() str[source]

Return string representation.

fit(X: ArrayLike, y: ArrayLike) IntervalPLS[source]

Fit the IntervalPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data (e.g., spectral data).

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

IntervalPLS

Raises:
  • ValueError – If backend is invalid or mode is invalid.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_interval_info() dict[source]

Get detailed information about intervals and selection.

Returns:

info – Dictionary containing:

  • ‘n_intervals’: Number of intervals

  • ‘interval_scores’: CV scores for each interval

  • ‘interval_ranges’: List of (start, end) for each interval

  • ‘selected_intervals’: Indices of selected intervals

  • ‘selected_regions’: (start, end) pairs for selected regions

  • ‘n_selected_features’: Total number of selected features

Return type:

dict

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: ArrayLike) NDArray[floating][source]

Predict using the fitted IntervalPLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) IntervalPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

IntervalPLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IntervalPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to selected feature space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

X_selected – Selected features only.

Return type:

ndarray of shape (n_samples, n_selected_features)
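Conceptually, the transform keeps only the columns that fall inside the selected regions. The sketch below illustrates that column selection with plain NumPy, assuming half-open (start, end) regions. The helper name select_regions is hypothetical and is not part of nirs4all.

```python
import numpy as np

def select_regions(X, regions):
    """Keep only the columns covered by half-open (start, end) regions."""
    mask = np.zeros(X.shape[1], dtype=bool)
    for start, end in regions:
        mask[start:end] = True
    return X[:, mask]

# Toy matrix with 4 samples and 10 features.
X = np.arange(40, dtype=float).reshape(4, 10)
X_selected = select_regions(X, regions=[(0, 2), (6, 9)])
```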

class nirs4all.operators.models.KOPLS(n_components: int = 5, n_ortho_components: int = 1, kernel: Literal['linear', 'rbf', 'poly'] = 'rbf', gamma: float | None = None, degree: int = 3, coef0: float = 1.0, center: bool = True, scale: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Kernel Orthogonal PLS (K-OPLS) regressor.

K-OPLS combines kernel methods with Orthogonal PLS to handle nonlinear relationships in the data. It first removes Y-orthogonal variation from the kernel matrix, then fits a kernel PLS model on the filtered kernel.

This implementation follows the algorithm from the ConsensusOPLS R package, which is based on the original K-OPLS algorithm by Rantalainen et al.

Parameters:
  • n_components (int, default=5) – Number of predictive PLS components.

  • n_ortho_components (int, default=1) – Number of orthogonal components to remove. These represent Y-orthogonal variation that would hurt prediction.

  • kernel (str, default='rbf') – Kernel function to use:

    • ‘linear’: Linear kernel K(x,y) = x^T y

    • ‘rbf’: Radial basis function K(x,y) = exp(-gamma ||x-y||^2)

    • ‘poly’: Polynomial kernel K(x,y) = (gamma x^T y + coef0)^degree

  • gamma (float, optional) – Kernel coefficient for ‘rbf’ and ‘poly’ kernels. If None, uses 1/n_features.

  • degree (int, default=3) – Degree for polynomial kernel.

  • coef0 (float, default=1.0) – Independent term in polynomial kernel.

  • center (bool, default=True) – Whether to center the kernel matrix.

  • scale (bool, default=True) – Whether to scale Y to unit variance.

  • backend (str, default='numpy') – Computational backend to use:

    • ‘numpy’: NumPy backend (CPU only).

    • ‘jax’: JAX backend (supports GPU/TPU acceleration).
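The three kernel formulas listed for the kernel parameter can be written directly in NumPy. This is a sketch for illustration, not the library's internal code. The function name kernel_matrix is hypothetical, and the gamma=None fallback follows the 1/n_features rule documented for gamma.

```python
import numpy as np

def kernel_matrix(A, B, kernel="rbf", gamma=None, degree=3, coef0=1.0):
    """Evaluate K[i, j] = k(A[i], B[j]) for the linear, rbf and poly kernels."""
    if gamma is None:
        gamma = 1.0 / A.shape[1]  # documented default: 1 / n_features
    gram = A @ B.T  # pairwise inner products x^T y
    if kernel == "linear":
        return gram
    if kernel == "poly":
        return (gamma * gram + coef0) ** degree
    if kernel == "rbf":
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x^T y
        sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * gram
        # Clip tiny negative values caused by floating-point round-off.
        return np.exp(-gamma * np.maximum(sq_dist, 0.0))
    raise ValueError(f"unknown kernel: {kernel!r}")

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
K_rbf = kernel_matrix(X, X, kernel="rbf")
K_poly = kernel_matrix(X, X, kernel="poly", degree=2)
```

Passing the training data as B and new samples as A gives the rectangular kernel needed at predict time, which is why X_train_ is stored on the fitted estimator.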

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of predictive components used.

Type:

int

n_ortho_components_

Actual number of orthogonal components used.

Type:

int

X_train_

Training data (stored for kernel computation at predict time).

Type:

ndarray of shape (n_samples, n_features)

y_mean_

Mean of Y.

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)

x_scores_

X scores from filtered kernel PLS (T).

Type:

ndarray of shape (n_samples, n_components)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_samples, n_components)

y_loadings_

Y loadings (C).

Type:

ndarray of shape (n_targets, n_components)

ortho_scores_

Orthogonal scores (T_ortho).

Type:

ndarray of shape (n_samples, n_ortho_components)

Examples

>>> from nirs4all.operators.models.sklearn.kopls import KOPLS
>>> import numpy as np
>>> # Generate nonlinear data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = np.sin(X[:, :5].sum(axis=1)) + 0.1 * np.random.randn(100)
>>> # Fit K-OPLS with RBF kernel
>>> model = KOPLS(n_components=5, n_ortho_components=2, kernel='rbf')
>>> model.fit(X, y)
KOPLS(...)
>>> predictions = model.predict(X)
>>> # Transform to score space
>>> T = model.transform(X)
>>> print(T.shape)
(100, 5)

References

  • Rantalainen, M., Bylesjo, M., Cloarec, O., Nicholson, J. K., Holmes, E., & Trygg, J. (2007). Kernel-based orthogonal projections to latent structures (K-OPLS). Journal of Chemometrics, 21(7-9), 376-385.

  • ConsensusOPLS R package: https://github.com/sib-swiss/ConsensusOPLS

__repr__() str[source]

Return string representation.

fit(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], y: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) KOPLS[source]

Fit the K-OPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

KOPLS

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

predict(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the K-OPLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) KOPLS[source]

Set the parameters of this estimator.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KOPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to K-OPLS score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores in the filtered kernel PLS space.

Return type:

ndarray of shape (n_samples, n_components_)

nirs4all.operators.models.KPLS

alias of KernelPLS

class nirs4all.operators.models.KernelPLS(n_components: int = 10, kernel: Literal['rbf', 'linear', 'poly', 'sigmoid'] = 'rbf', gamma: float | None = None, degree: int = 3, coef0: float = 1.0, center_kernel: bool = True, scale_y: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Nonlinear PLS using Kernel Methods (Kernel PLS / NL-PLS).

Kernel PLS maps the input data X into a higher-dimensional feature space using a kernel function (RBF, polynomial, sigmoid) and then fits a PLS model on the kernel matrix K(X, X). This allows capturing nonlinear relationships between X and Y while retaining the interpretability of PLS.

The algorithm:

  1. Compute kernel matrix K = kernel(X_train, X_train)

  2. Center the kernel matrix

  3. Fit PLS on K with target Y

  4. For prediction: K_test = kernel(X_test, X_train), center, predict
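Step 2 and the centering part of step 4 are easy to get wrong, because the test kernel must be centered with training statistics only. The sketch below uses the standard feature-space centering formulas. The helper names are hypothetical, and whether nirs4all implements centering in exactly this way is an assumption. With a linear kernel, the result can be checked against explicitly mean-centered features.

```python
import numpy as np

def center_train_kernel(K):
    """Center an (n, n) training kernel: K_c = H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def center_test_kernel(K_test, K_train):
    """Center an (m, n) test kernel using training-set statistics only."""
    n = K_train.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    col_means = K_train.mean(axis=0, keepdims=True)  # shape (1, n)
    return (K_test - col_means) @ H

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 3))
X_test = rng.normal(size=(4, 3))

# Linear kernel, so the centered result has a simple closed form to compare with.
K_train = X_train @ X_train.T
K_test = X_test @ X_train.T

Kc_train = center_train_kernel(K_train)
Kc_test = center_test_kernel(K_test, K_train)

# Reference: linear kernel on features centered with the training mean.
mu = X_train.mean(axis=0)
Kc_train_ref = (X_train - mu) @ (X_train - mu).T
Kc_test_ref = (X_test - mu) @ (X_train - mu).T
```

Both centered kernels match their references up to round-off, which is the kernel-space counterpart of subtracting the training mean from X before a linear PLS fit.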

This is a simple and effective approach for nonlinear regression that combines the power of kernel methods with PLS dimensionality reduction.

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • kernel ({'rbf', 'linear', 'poly', 'sigmoid'}, default='rbf') – Kernel function to use:

    • ‘rbf’: Radial basis function K(x,y) = exp(-gamma ||x-y||^2)

    • ‘linear’: Linear kernel K(x,y) = x^T y (equivalent to standard PLS)

    • ‘poly’: Polynomial kernel K(x,y) = (gamma * x^T y + coef0)^degree

    • ‘sigmoid’: Sigmoid kernel K(x,y) = tanh(gamma * x^T y + coef0)

  • gamma (float, optional) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. If None, defaults to 1/n_features.

  • degree (int, default=3) – Degree for polynomial kernel.

  • coef0 (float, default=1.0) – Independent term in polynomial and sigmoid kernels.

  • center_kernel (bool, default=True) – Whether to center the kernel matrix. Recommended for most cases.

  • scale_y (bool, default=True) – Whether to center and scale Y to zero mean and unit variance.

  • backend (str, default='numpy') – Computational backend to use:

    • ‘numpy’: NumPy backend (CPU only).

    • ‘jax’: JAX backend (supports GPU/TPU acceleration).

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

X_train_

Training data (stored for kernel computation at predict time).

Type:

ndarray of shape (n_train, n_features)

K_train_

Raw (uncentered) training kernel matrix.

Type:

ndarray of shape (n_train, n_train)

y_mean_

Mean of Y (if scale_y=True).

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y (if scale_y=True).

Type:

ndarray of shape (n_targets,)

x_scores_

X scores in kernel space (T).

Type:

ndarray of shape (n_train, n_components)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_train, n_components)

coef_

Kernel regression coefficients.

Type:

ndarray of shape (n_train, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.nlpls import KernelPLS
>>> import numpy as np
>>> # Generate nonlinear data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = np.sin(X[:, :5].sum(axis=1)) + 0.1 * np.random.randn(100)
>>> # Fit Kernel PLS with RBF kernel
>>> model = KernelPLS(n_components=10, kernel='rbf', gamma=0.1)
>>> model.fit(X, y)
KernelPLS(...)
>>> predictions = model.predict(X)
>>> print(f"R^2 score: {model.score(X, y):.4f}")

Notes

Kernel PLS is particularly useful when:

  • The relationship between X and Y is nonlinear

  • Standard linear PLS gives poor predictions

  • You want to use kernel methods but need PLS-style dimensionality reduction

The choice of kernel and gamma parameter significantly affects performance. Cross-validation is recommended for hyperparameter tuning.

For NIRS data, the RBF kernel with small gamma often works well for capturing nonlinear spectral-property relationships.

See also

KOPLS

Kernel OPLS with orthogonal variation filtering.

sklearn.cross_decomposition.PLSRegression

Standard linear PLS.

References

  • Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2, 97-123.

__repr__() str[source]

Return string representation.

fit(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], y: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) KernelPLS[source]

Fit the Kernel PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

KernelPLS

Raises:
  • ValueError – If backend is not ‘numpy’ or ‘jax’, or if kernel is not one of the supported types.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], n_components: int | None = None) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the Kernel PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) KernelPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

KernelPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') KernelPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') KernelPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to kernel PLS score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores in kernel space.

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.LWPLS(n_components: int = 10, lambda_in_similarity: float = 1.0, scale: bool = True, backend: str = 'numpy', batch_size: int = 64)[source]

Bases: BaseEstimator, RegressorMixin

Locally-Weighted Partial Least Squares (LWPLS) regressor.

LWPLS builds a local PLS model for each query sample, weighting training samples by their similarity (proximity) to the query. This approach is useful for:

  • Data with local nonlinearity

  • Drifting processes where the relationship changes over time

  • Heterogeneous data where a single global model is inadequate

The similarity is computed using a Gaussian kernel based on Euclidean distance, controlled by the lambda_in_similarity parameter.
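To make the role of lambda_in_similarity concrete, here is a minimal sketch of distance-based sample weights for a single query sample. The exact functional form used inside nirs4all is not shown in this reference, so the exponential of a standard-deviation-scaled distance below, modeled on Kaneko's reference implementation, should be treated as an assumption. The function name similarity_weights is hypothetical.

```python
import numpy as np

def similarity_weights(X_train, x_query, lambda_in_similarity=1.0):
    """Weights of training samples for one query, based on Euclidean distance.

    Assumption: distances are divided by their standard deviation and by
    lambda_in_similarity before the exponential; nirs4all may differ in detail.
    """
    distance = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    return np.exp(-distance / (distance.std(ddof=1) * lambda_in_similarity))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 5))
x_query = X_train[0] + 0.01  # a query very close to the first training sample

w_local = similarity_weights(X_train, x_query, lambda_in_similarity=2.0**-3)
w_global = similarity_weights(X_train, x_query, lambda_in_similarity=2.0**5)

# Share of the total weight carried by the nearest training sample.
share_local = w_local[0] / w_local.sum()
share_global = w_global[0] / w_global.sum()
```

A small lambda concentrates almost all weight on the nearest samples, while a large lambda pushes every weight toward 1, so the local model approaches a global PLS fit.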

Parameters:
  • n_components (int, default=10) – Maximum number of PLS components to extract for each local model.

  • lambda_in_similarity (float, default=1.0) – Kernel width parameter. Smaller values create more localized models (more weight on nearby samples), larger values approach global PLS. Typical values range from 2^-9 to 2^5 depending on the data.

  • scale (bool, default=True) – Whether to standardize X and y before fitting. Strongly recommended as LWPLS uses Euclidean distances.

  • backend (str, default='numpy') – Computational backend to use. Options are:

    • ‘numpy’: NumPy backend (CPU only, default).

    • ‘jax’: JAX backend (supports GPU/TPU acceleration).

    • ‘torch’: PyTorch backend (supports GPU acceleration).

    JAX backend requires JAX to be installed: pip install jax. For GPU support: pip install jax[cuda12].

    PyTorch backend requires PyTorch: pip install torch. For GPU support, install a CUDA-enabled PyTorch build.

  • batch_size (int, default=64) – Number of test samples to process per batch (JAX/torch backends). Reduce this if running out of GPU memory on large datasets. Ignored for NumPy backend.

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used (limited by data dimensions).

Type:

int

X_train_

Stored training X data (standardized if scale=True).

Type:

ndarray of shape (n_samples, n_features)

y_train_

Stored training y data (standardized if scale=True).

Type:

ndarray of shape (n_samples,)

x_scaler_

Fitted scaler for X (if scale=True).

Type:

StandardScaler or None

y_scaler_

Fitted scaler for y (if scale=True).

Type:

StandardScaler or None

Examples

>>> from nirs4all.operators.models.sklearn.lwpls import LWPLS
>>> import numpy as np
>>> # Nonlinear data
>>> np.random.seed(42)
>>> X = 5 * np.random.rand(100, 2)
>>> y = 3 * X[:, 0]**2 + 10 * np.log(X[:, 1] + 0.1) + np.random.randn(100)
>>> # Split data
>>> X_train, X_test = X[:70], X[70:]
>>> y_train, y_test = y[:70], y[70:]
>>> # Fit LWPLS with NumPy backend (default)
>>> model = LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> model.fit(X_train, y_train)
LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> y_pred = model.predict(X_test)
>>> # Use JAX backend for GPU acceleration
>>> model_jax = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='jax')
>>> model_jax.fit(X_train, y_train)
>>> y_pred_jax = model_jax.predict(X_test)
>>> # Use PyTorch backend for GPU acceleration
>>> model_torch = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='torch')
>>> model_torch.fit(X_train, y_train)
>>> y_pred_torch = model_torch.predict(X_test)

Notes

LWPLS is computationally more expensive than standard PLS because it builds a separate weighted model for each prediction. The training data must be stored for prediction.

The JAX backend provides significant speedups on GPU by:

  • Vectorizing the per-sample loop using jax.vmap

  • JIT-compiling the prediction function

  • Running on GPU/TPU when available

The PyTorch backend provides GPU acceleration by:

  • Running tensor operations on CUDA or MPS devices

  • Batched processing to control memory usage

  • Automatic device selection when device=‘auto’

The optimal lambda_in_similarity should be tuned via cross-validation. A typical search grid is 2^k for integer k from -9 to 5, which matches the 2^-9 to 2^5 range given for the parameter.
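Reading the search range as the half-open interval used by np.arange(-9, 6), which matches the 2^-9 to 2^5 span given for lambda_in_similarity, the candidate grid can be generated as follows. The pick_lambda helper and the stand-in error curve are hypothetical; in practice the error would come from cross-validating LWPLS at each candidate value.

```python
import numpy as np

# Candidate kernel widths 2^k for k = -9, ..., 5 (np.arange excludes the stop value).
lambda_grid = 2.0 ** np.arange(-9, 6)

def pick_lambda(grid, cv_error):
    """Return the grid value with the lowest cross-validated error."""
    errors = np.array([cv_error(lam) for lam in grid])
    return float(grid[int(np.argmin(errors))])

# Stand-in error curve with its minimum at lambda = 0.25; replace it with a
# real cross-validation of LWPLS(lambda_in_similarity=lam).
best_lambda = pick_lambda(lambda_grid, cv_error=lambda lam: (np.log2(lam) + 2.0) ** 2)
```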

This implementation is adapted from the original code by Hiromasa Kaneko (https://github.com/hkaneko1985/lwpls), licensed under MIT License.

See also

sklearn.cross_decomposition.PLSRegression

Standard global PLS.

IKPLS

Fast PLS implementation.

References

  • Kim, S., et al. (2011). Estimation of active pharmaceutical ingredient content using locally weighted partial least squares. International Journal of Pharmaceutics, 421(2), 269-274.

__repr__() str[source]

Return string representation.

fit(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], y: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) LWPLS[source]

Fit the LWPLS model.

This stores the training data and fits scalers if requested. Actual model building happens lazily at prediction time.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, 1)) – Target values.

Returns:

self – Fitted estimator.

Return type:

LWPLS

Raises:
  • ValueError – If backend is not ‘numpy’, ‘jax’, or ‘torch’.

  • ImportError – If backend is ‘jax’ and JAX is not installed, or if backend is ‘torch’ and PyTorch is not installed.

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], n_components: int | None = None) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the LWPLS model.

Builds a local weighted PLS model for each test sample.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses n_components_ (all fitted components).

Returns:

y_pred – Predicted target values.

Return type:

ndarray of shape (n_samples,)

predict_all_components(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict with all component numbers (for component selection).

Returns predictions for each number of components, which can be used for cross-validation to select the optimal n_components.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred_all – Predictions where column i contains predictions using i+1 components.

Return type:

ndarray of shape (n_samples, n_components)
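Given a prediction matrix of this shape, choosing the number of components reduces to scoring each column against held-out targets. The sketch below uses RMSE as the criterion, which is an assumption, and a made-up matrix in place of a real predict_all_components call. The helper name best_n_components is hypothetical.

```python
import numpy as np

def best_n_components(y_true, y_pred_all):
    """Return (n_components, rmse_per_column) for the lowest-RMSE column.

    Column i of y_pred_all holds predictions made with i + 1 components.
    """
    residuals = y_pred_all - np.asarray(y_true)[:, None]
    rmse = np.sqrt((residuals ** 2).mean(axis=0))
    return int(np.argmin(rmse)) + 1, rmse

y_val = np.array([1.0, 2.0, 3.0, 4.0])
# Made-up predictions for 1, 2 and 3 components (stand-in for predict_all_components).
y_pred_all = np.column_stack([y_val + 1.0, y_val + 0.1, y_val - 0.5])

n_best, rmse = best_n_components(y_val, y_pred_all)
```

Because all component counts come from one call, the validation loop only has to run prediction once per fold rather than once per candidate n_components.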

set_params(**params) LWPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

LWPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') LWPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LWPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.models.MBPLS(n_components: int = 5, method: str = 'NIPALS', standardize: bool = True, max_tol: float = 1e-14, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Multiblock PLS (MB-PLS) regressor.

MB-PLS fuses multiple X blocks (e.g., different preprocessing variants, multiple sensors) into a single predictive model. Each block contributes to the latent variables according to its relevance to Y.
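When blocks differ in width or scale, a wide block can dominate a fused model simply because it has more columns. One common convention in the multiblock literature is to standardize each block and weight it by 1/sqrt(n_features) before fusing. The sketch below shows that preprocessing only. It is not taken from the MBPLS source, and whether MBPLS applies this exact weighting internally is an assumption. The helper name scale_and_concatenate is hypothetical.

```python
import numpy as np

def scale_and_concatenate(blocks):
    """Standardize each block, weight it by 1/sqrt(n_features), and stack columns."""
    scaled = []
    for block in blocks:
        z = (block - block.mean(axis=0)) / block.std(axis=0)
        scaled.append(z / np.sqrt(block.shape[1]))
    return np.hstack(scaled), scaled

rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 30))        # e.g. one preprocessing variant
X2 = 5.0 * rng.normal(size=(100, 20))  # a narrower block on a larger scale
X_super, scaled_blocks = scale_and_concatenate([X1, X2])

# After weighting, every block carries the same total variance.
block_variances = [float(b.var(axis=0).sum()) for b in scaled_blocks]
```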

Parameters:
  • n_components (int, default=5) – Number of latent variables to extract.

  • method (str, default='NIPALS') – Decomposition method. Currently only ‘NIPALS’ is supported.

  • standardize (bool, default=True) – Whether to standardize blocks before fitting.

  • max_tol (float, default=1e-14) – Convergence tolerance for NIPALS.

  • backend (str, default='numpy') –

    Backend to use for computation. Options are:

    • ‘numpy’: Use NumPy backend (CPU only).

    • ‘jax’: Use JAX backend (supports GPU/TPU acceleration).

    Note: JAX backend only supports single-block mode.

    JAX backend requires JAX: pip install jax. For GPU support: pip install jax[cuda12].

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

coef_

Regression coefficients.

Type:

ndarray of shape (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.pls import MBPLS
>>> import numpy as np
>>> X = np.random.randn(100, 50)
>>> y = np.random.randn(100)
>>> model = MBPLS(n_components=5)
>>> model.fit(X, y)
MBPLS(n_components=5)
>>> predictions = model.predict(X)
>>> # Multiblock usage
>>> X1 = np.random.randn(100, 30)
>>> X2 = np.random.randn(100, 20)
>>> model.fit([X1, X2], y)
>>> # JAX backend for GPU acceleration
>>> model_jax = MBPLS(n_components=5, backend='jax')

Notes

For JAX with GPU support: pip install jax[cuda12]

See also

sklearn.cross_decomposition.PLSRegression

Standard single-block PLS.

__repr__()[source]

Return string representation.

fit(X, y)[source]

Fit the MB-PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features) or list of arrays) – Training data. Can be a single matrix or a list of X blocks for true multiblock analysis (NumPy backend only).

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

MBPLS

Raises:
  • ImportError – If mbpls package is not installed (NumPy backend), or JAX is not available (JAX backend).

  • ValueError – If backend is not ‘numpy’ or ‘jax’, or if multiblock input is used with JAX backend.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

Predict using the MB-PLS model.

Parameters:

X (array-like of shape (n_samples, n_features) or list of arrays) – Samples to predict. Must match the format used in fit().

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params)[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

MBPLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') MBPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Transform X to latent space.

Parameters:

X (array-like of shape (n_samples, n_features) or list of arrays) – Samples to transform.

Returns:

T – Latent variables (scores).

Return type:

ndarray of shape (n_samples, n_components)

class nirs4all.operators.models.MetaModel(model: Any, source_models: str | List[str] = 'all', use_proba: bool = False, stacking_config: StackingConfig | None = None, selector: Any | None = None, name: str | None = None, finetune_space: Dict[str, Any] | None = None)[source]

Bases: BaseModelOperator

Wrapper for meta-model stacking using pipeline predictions.

Creates a meta-learner that uses predictions from previously trained models in the pipeline as input features. Implements stacked generalization with proper out-of-fold prediction handling to prevent data leakage.

The meta-model:

  1. Collects out-of-fold (OOF) predictions from the specified source models.

  2. Constructs training features from these predictions.

  3. Trains on these features using the provided sklearn-compatible model.

  4. For test data, aggregates source model predictions across folds.
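
The four steps can be sketched with plain NumPy. This is a minimal illustration, not the nirs4all implementation: the closed-form helper `ridge_fit`, the two ridge base models, the fold layout, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; returns the coefficient vector."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)
X_test = rng.normal(size=(10, 8))

alphas = [0.1, 10.0]                      # two hypothetical base models
folds = np.array_split(rng.permutation(60), 5)

oof = np.full((60, len(alphas)), np.nan)  # step 1: out-of-fold predictions
test_preds = np.zeros((10, len(alphas)))

for val_idx in folds:
    train_idx = np.setdiff1d(np.arange(60), val_idx)
    for j, alpha in enumerate(alphas):
        coef = ridge_fit(X[train_idx], y[train_idx], alpha)
        # Each sample is predicted only by a model that never saw it.
        oof[val_idx, j] = X[val_idx] @ coef
        # Step 4: average the fold models' predictions on the test set.
        test_preds[:, j] += X_test @ coef / len(folds)

# Steps 2-3: the OOF matrix is the meta-learner's training feature matrix.
meta_coef = ridge_fit(oof, y, alpha=1e-3)
y_test_stacked = test_preds @ meta_coef
```

Because every row of `oof` comes from a model trained without that row, the meta-learner never sees predictions that were fitted on its own targets.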

Multi-Level Stacking (Phase 7): MetaModel supports multi-level stacking, where meta-models can use predictions from other meta-models as sources. This enables hierarchical ensemble architectures:

  • Level 0: Base models (PLS, RF, XGBoost, etc.)

  • Level 1: First meta-models (stack on Level 0)

  • Level 2: Second meta-models (stack on Level 0 + Level 1)

  • Level 3: Third meta-models (stack on all previous levels)

The level is auto-detected by default but can be explicitly set via stacking_config.level. Circular dependencies are automatically prevented.
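
One way to picture level auto-detection and the circular-dependency check is as a walk over the graph of source models. The sketch below is an assumption about how such a check could work, not the library's code; `detect_level` and the `sources` mapping are hypothetical names.

```python
def detect_level(name, sources, _visiting=None):
    """Return the stacking level of `name`.

    `sources` maps each model name to the names it stacks on;
    base models map to an empty list and sit at level 0.
    """
    _visiting = _visiting or set()
    if name in _visiting:
        raise ValueError(f"circular dependency involving {name!r}")
    deps = sources.get(name, [])
    if not deps:
        return 0
    _visiting = _visiting | {name}
    # A meta-model sits one level above its highest-level source.
    return 1 + max(detect_level(d, sources, _visiting) for d in deps)

sources = {
    "PLS": [],
    "RF": [],
    "Meta1": ["PLS", "RF"],
    "Meta2": ["PLS", "RF", "Meta1"],
}
levels = {name: detect_level(name, sources) for name in sources}
```

Here `Meta1` resolves to level 1 and `Meta2` to level 2, while a mapping in which two models list each other as sources raises `ValueError`.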

model

Sklearn-compatible model to use as meta-learner.

source_models

Which models to use as sources (“all” or list of names).

use_proba

For classification, use probabilities instead of class predictions.

stacking_config

Configuration for OOF reconstruction and multi-level stacking.

selector

Optional custom source model selector.

finetune_space

Optional hyperparameter search space for Optuna finetuning.

Example

>>> # Basic usage - stack all previous models
>>> MetaModel(model=Ridge())
>>>
>>> # Explicit source selection
>>> MetaModel(
...     model=Ridge(),
...     source_models=["PLS", "RandomForest", "XGBoost"]
... )
>>>
>>> # Multi-level stacking
>>> pipeline = [
...     KFold(n_splits=5),
...     PLSRegression(n_components=5),         # Level 0
...     RandomForestRegressor(),               # Level 0
...     {"model": MetaModel(model=Ridge())},   # Level 1 (auto-detected)
...     {"model": MetaModel(                   # Level 2 (uses Level 0 + Level 1)
...         model=Lasso(),
...         stacking_config=StackingConfig(level=StackingLevel.LEVEL_2)
...     )},
... ]
>>>
>>> # With probability features for classification
>>> MetaModel(
...     model=LogisticRegression(),
...     use_proba=True
... )
>>>
>>> # With Optuna hyperparameter tuning
>>> MetaModel(
...     model=Ridge(),
...     finetune_space={"model__alpha": (0.001, 100.0)}
... )

Notes

  • Source models must be from earlier steps in the pipeline

  • In branched pipelines, only models from the current branch are used by default

  • For sample_partitioner branches, stacking is done within each partition

  • Multi-level stacking supports up to 3 levels by default (configurable)

  • Circular dependencies are automatically detected and prevented

__repr__() str[source]

Return string representation.

get_controller_type() str[source]

Return the type of controller that handles this operator.

Returns:

“meta” to indicate MetaModelController should handle this.

Return type:

str

get_finetune_params() Dict[str, Any] | None[source]

Get finetuning parameters for Optuna optimization.

Returns the finetune_space with proper formatting for the Optuna manager.

Returns:

Dict with finetune configuration or None if no finetuning configured.

get_params(deep: bool = True) Dict[str, Any][source]

Get parameters for this operator.

Parameters:

deep – If True, returns nested parameters from the model.

Returns:

Parameter names mapped to their values.

Return type:

dict

property level: int

Get the stacking level of this meta-model.

Returns the detected level if AUTO, otherwise the configured level.

Returns:

Stacking level (1, 2, or 3).

Return type:

int

property name: str

Get the display name for this meta-model.

Returns:

User-provided name or ‘MetaModel_<model_class>’.

Return type:

str

set_params(**params) MetaModel[source]

Set the parameters of this operator.

Parameters:

**params – Operator parameters. Supports nested parameters for the model using ‘model__param_name’ syntax.

Returns:

self – This MetaModel instance.

Return type:

MetaModel
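
The ‘model__param_name’ convention can be illustrated with a small stand-in. `Inner` and `Wrapper` below are hypothetical classes written for this sketch; they only mimic the routing idea and are not MetaModel itself.

```python
class Inner:
    """Stand-in for an sklearn-style model with one hyperparameter."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class Wrapper:
    """Hypothetical wrapper that routes 'model__<param>' to its inner model."""
    def __init__(self, model, use_proba=False):
        self.model = model
        self.use_proba = use_proba

    def set_params(self, **params):
        for key, value in params.items():
            prefix, sep, sub_key = key.partition("__")
            if sep and prefix == "model":
                setattr(self.model, sub_key, value)   # nested parameter
            elif hasattr(self, key):
                setattr(self, key, value)             # the wrapper's own parameter
            else:
                raise ValueError(f"invalid parameter {key!r}")
        return self  # returning self allows chained calls

wrapper = Wrapper(Inner()).set_params(model__alpha=0.5, use_proba=True)
```

The same double-underscore keys appear in the finetune_space example above ("model__alpha"), which is why a search space can address the wrapped model's hyperparameters directly.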

class nirs4all.operators.models.ModelCandidate(model_name: str, model_classname: str, step_idx: int, fold_id: str | None = None, branch_id: int | None = None, branch_name: str | None = None, val_score: float | None = None, metric: str | None = None, predictions: Dict[str, ndarray] | None = None)[source]

Bases: object

Information about a candidate source model.

Contains metadata and optionally predictions for a model that may be selected as a source for stacking.

model_name

Name of the model.

Type:

str

model_classname

Class name of the model (e.g., “PLSRegression”).

Type:

str

step_idx

Pipeline step index where the model was trained.

Type:

int

fold_id

Fold identifier (or “avg”/“w_avg” for averaged models).

Type:

str | None

branch_id

Branch identifier if in a branched pipeline.

Type:

int | None

branch_name

Human-readable branch name.

Type:

str | None

val_score

Validation score for the model.

Type:

float | None

metric

Metric used for scoring.

Type:

str | None

predictions

Optional dictionary with predictions data.

Type:

Dict[str, numpy.ndarray] | None

branch_id: int | None = None
branch_name: str | None = None
fold_id: str | None = None
metric: str | None = None
model_classname: str
model_name: str
predictions: Dict[str, ndarray] | None = None
step_idx: int
val_score: float | None = None
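
To show how these fields can drive source selection, here is a hedged sketch of a filter in the spirit of AllPreviousModelsSelector. `Candidate` is a stand-in dataclass with a subset of the fields above, and `select_previous` is a hypothetical helper; the real selector also receives an execution context and a prediction store.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Candidate:
    """Stand-in carrying a subset of the ModelCandidate fields."""
    model_name: str
    model_classname: str
    step_idx: int
    fold_id: Optional[str] = None
    branch_id: Optional[int] = None

def select_previous(
    candidates: List[Candidate],
    current_step: int,
    branch_id: Optional[int] = None,
    include_averaged: bool = False,
    exclude_classnames: Optional[Set[str]] = None,
) -> List[Candidate]:
    """Keep earlier-step candidates from one branch, ordered by step index."""
    excluded = exclude_classnames or set()
    kept = [
        c for c in candidates
        if c.step_idx < current_step
        and c.branch_id == branch_id
        and c.model_classname not in excluded
        and (include_averaged or c.fold_id not in ("avg", "w_avg"))
    ]
    return sorted(kept, key=lambda c: c.step_idx)

candidates = [
    Candidate("rf", "RandomForestRegressor", step_idx=3, fold_id="0"),
    Candidate("pls", "PLSRegression", step_idx=2, fold_id="0"),
    Candidate("pls", "PLSRegression", step_idx=2, fold_id="avg"),
    Candidate("late", "Ridge", step_idx=5, fold_id="0"),
]
selected = select_previous(candidates, current_step=4)
```

With these inputs the fold-averaged entry and the later-step model are dropped, leaving the PLS and random forest candidates in step order.
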
nirs4all.operators.models.NLPLS

alias of KernelPLS

class nirs4all.operators.models.OKLMPLS(n_components: int = 5, featurizer: TransformerMixin | None = None, lambda_dyn: float = 1.0, lambda_reg_y: float = 1.0, max_iter: int = 50, tol: float = 0.0001, warm_start_pls: bool = True, standardize: bool = True, backend: str = 'numpy', random_state: int | None = None)[source]

Bases: BaseEstimator, RegressorMixin

Online Koopman Latent-Mode Partial Least Squares (OKLM-PLS).

OKLM-PLS combines Koopman operator theory with PLS for time-series regression. It learns latent scores T = ψ(X) @ W and simultaneously:

  • Enforces dynamic coherence: T_{t+1} ≈ F @ T_t

  • Learns regression: Y_t ≈ T_t @ B

This is useful for spectral data collected over time where temporal coherence provides additional predictive information.
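
The two goals can be read as one weighted objective. The sketch below assumes the total loss is the sum of the dynamics and regression terms weighted by lambda_dyn and lambda_reg_y, and that rows of the score matrix are time steps; `oklm_objective` is a hypothetical helper, and the estimator's actual optimization may differ.

```python
import numpy as np

def oklm_objective(Z, Y, W, F, B, lambda_dyn=1.0, lambda_reg_y=1.0):
    """Weighted sum of the dynamics and regression losses.

    Z is the featurized input psi(X) with rows ordered in time.
    """
    T = Z @ W                               # latent scores, one row per time step
    dyn_residual = T[1:] - T[:-1] @ F.T     # T_{t+1} - F @ T_t for each t
    reg_residual = Y - T @ B                # Y_t - T_t @ B
    return (lambda_dyn * np.sum(dyn_residual ** 2)
            + lambda_reg_y * np.sum(reg_residual ** 2))

rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 6))    # 30 time steps, 6 features
W = rng.normal(size=(6, 2))     # 2 latent components
F = 0.9 * np.eye(2)
B = rng.normal(size=(2, 1))
Y = Z @ W @ B                   # targets that the regression term fits exactly

loss_full = oklm_objective(Z, Y, W, F, B)
loss_static = oklm_objective(Z, Y, W, F, B, lambda_dyn=0.0)
```

Setting lambda_dyn to zero leaves only the regression term, which is why `loss_static` vanishes for these perfectly fitted targets while `loss_full` does not.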

Parameters:
  • n_components (int, default=5) – Number of latent components.

  • featurizer (TransformerMixin, optional) – Feature map ψ: X -> Z. If None, identity is used. Options include PolynomialFeaturizer and RBFFeaturizer.

  • lambda_dyn (float, default=1.0) – Weight for dynamic consistency loss ||T_{t+1} - F @ T_t||². Higher values enforce stronger temporal coherence.

  • lambda_reg_y (float, default=1.0) – Weight for regression loss ||Y - T @ B||².

  • max_iter (int, default=50) – Maximum alternating optimization iterations.

  • tol (float, default=1e-4) – Convergence tolerance on the objective function.

  • warm_start_pls (bool, default=True) – If True, initialize W/B from a standard PLSRegression fit.

  • standardize (bool, default=True) – Whether to standardize X and Y before fitting.

  • backend (str, default='numpy') – Computational backend:

      • ‘numpy’: NumPy backend (CPU only).

      • ‘jax’: JAX backend (supports GPU/TPU).

  • random_state (int, optional) – Random seed for initialization.

n_features_in_

Number of features in input X.

Type:

int

n_components_

Actual number of components.

Type:

int

W_

Projection weights (in featurized space).

Type:

ndarray of shape (n_features_z, n_components_)

F_

Dynamics matrix for latent scores.

Type:

ndarray of shape (n_components_, n_components_)

B_

Regression coefficients.

Type:

ndarray of shape (n_components_, n_targets)

n_iter_

Number of iterations until convergence.

Type:

int

Examples

>>> from nirs4all.operators.models.sklearn.oklmpls import OKLMPLS
>>> import numpy as np
>>> # Generate time-series data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = X[:, :5].sum(axis=1) + 0.1 * np.random.randn(100)
>>> # Fit OKLM-PLS
>>> model = OKLMPLS(n_components=10, lambda_dyn=1.0, lambda_reg_y=1.0)
>>> model.fit(X, y)
OKLMPLS(...)
>>> predictions = model.predict(X)
>>> # Use with polynomial featurizer for nonlinearity
>>> from nirs4all.operators.models.sklearn.oklmpls import PolynomialFeaturizer
>>> model_poly = OKLMPLS(n_components=10, featurizer=PolynomialFeaturizer(degree=2))
>>> model_poly.fit(X, y)
OKLMPLS(...)

Notes

OKLM-PLS is designed for temporally-ordered data where samples are sequential in time. The dynamics constraint helps capture temporal patterns and can improve prediction when the underlying process has smooth temporal evolution.

For non-temporal data, set lambda_dyn=0 to disable the dynamics constraint (equivalent to standard PLS with optional featurization).

See also

SIMPLS

Standard PLS without dynamics.

RecursivePLS

Online PLS with forgetting factor.

__repr__() str[source]

Return string representation.

fit(X: ArrayLike, y: ArrayLike) OKLMPLS[source]

Fit the OKLM-PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Samples should be temporally ordered.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

OKLMPLS

Raises:
get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

predict(X: ArrayLike) NDArray[floating][source]

Predict using the OKLM-PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

predict_dynamic(X: ArrayLike, n_steps: int = 1) NDArray[floating][source]

Predict using dynamics model for future timesteps.

Given the last sample’s latent scores, predict future values using the learned dynamics T_{t+1} = F @ T_t.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Current data. Uses last sample for propagation.

  • n_steps (int, default=1) – Number of future timesteps to predict.

Returns:

y_future – Predicted future values.

Return type:

ndarray of shape (n_steps, n_targets)
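
The propagation step can be sketched as a simple loop. This is an illustration under assumptions, not the method's source: `rollout` is a hypothetical helper, the latent state is treated as a 1-D vector, and any standardization the estimator applies is ignored.

```python
import numpy as np

def rollout(t_last, F, B, n_steps=1):
    """Propagate a latent state with F and map each state to targets with B."""
    t = np.asarray(t_last, dtype=float)
    future = []
    for _ in range(n_steps):
        t = F @ t                 # T_{t+1} = F @ T_t
        future.append(t @ B)      # Y_t = T_t @ B
    return np.vstack(future)      # shape (n_steps, n_targets)

F = np.array([[0.5, 0.0],
              [0.0, 2.0]])
B = np.array([[1.0],
              [1.0]])
y_future = rollout(np.array([4.0, 1.0]), F, B, n_steps=2)
```

Each forecast depends only on the previous latent state, so errors in F compound as n_steps grows.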

set_params(**params) OKLMPLS[source]

Set the parameters of this estimator.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') OKLMPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) NDArray[floating][source]

Transform X to latent score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – Latent scores.

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.OPLS(n_components: int = 1, pls_components: int = 1, scale: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Orthogonal PLS (OPLS) regressor. (See pls.py for full docstring)

fit(X, y)[source]

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') OPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

class nirs4all.operators.models.OPLSDA(n_components: int = 1, pls_components: int = 5, scale: bool = True)[source]

Bases: BaseEstimator, ClassifierMixin

Orthogonal PLS Discriminant Analysis (OPLS-DA) classifier.

The estimator type is declared explicitly (_estimator_type = “classifier”) for sklearn compatibility, e.g. with StackingClassifier.

OPLS-DA combines OPLS filtering with PLS-DA classification. It removes Y-orthogonal variation from X before applying PLS-DA, improving class separation and model interpretability.
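
Removing Y-orthogonal variation can be sketched for a single orthogonal component and a single centered target. The code follows the textbook OPLS filtering step rather than pyopls or this class; `remove_orthogonal_component` is a hypothetical helper, and the 0/1 class coding is an assumption.

```python
import numpy as np

def remove_orthogonal_component(X, y):
    """Remove one Y-orthogonal component from centered X (single-target case)."""
    w = X.T @ y
    w /= np.linalg.norm(w)                 # predictive weight vector
    t = X @ w
    p = X.T @ t / (t @ t)                  # X loading of the predictive score
    w_ortho = p - (w @ p) * w              # part of p orthogonal to w
    w_ortho /= np.linalg.norm(w_ortho)
    t_ortho = X @ w_ortho                  # orthogonal score
    p_ortho = X.T @ t_ortho / (t_ortho @ t_ortho)
    X_filtered = X - np.outer(t_ortho, p_ortho)
    return X_filtered, t_ortho

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = (X[:, 0] > 0).astype(float)            # binary labels coded as 0/1
X -= X.mean(axis=0)
y -= y.mean()

X_filtered, t_ortho = remove_orthogonal_component(X, y)
```

The removed score `t_ortho` is uncorrelated with y by construction, so subtracting it discards structured variation in X that carries no class information.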

Parameters:
  • n_components (int, default=1) – Number of orthogonal components to remove.

  • pls_components (int, default=5) – Number of PLS components for the discriminant model.

  • scale (bool, default=True) – Whether to scale X before fitting.

classes_

Unique class labels.

Type:

ndarray of shape (n_classes,)

n_features_in_

Number of features seen during fit.

Type:

int

opls_

Fitted OPLS transformer.

Type:

pyopls.OPLS

plsda_

Fitted PLS-DA model on filtered data.

Type:

PLSDA

Examples

>>> from nirs4all.operators.models.sklearn.pls import OPLSDA
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=50, n_classes=2,
...                            n_informative=10, random_state=42)
>>> model = OPLSDA(n_components=1, pls_components=5)
>>> model.fit(X, y)
OPLSDA(n_components=1, pls_components=5)
>>> predictions = model.predict(X)

Notes

Requires the pyopls package: pip install pyopls

See also

PLSDA

Standard PLS-DA without orthogonal filtering.

OPLS

OPLS for regression tasks.

References

  • Bylesjö, M., et al. (2006). OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. Journal of Chemometrics, 20(8-10), 341-351.

fit(X, y)[source]

Fit the OPLS-DA model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target class labels.

Returns:

self – Fitted estimator.

Return type:

OPLSDA

Raises:

ImportError – If pyopls package is not installed.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

Predict class labels for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted class labels.

Return type:

ndarray of shape (n_samples,)

predict_proba(X)[source]

Return pseudo-probabilities (PLS responses).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples.

Returns:

proba – Pseudo-probability estimates.

Return type:

ndarray of shape (n_samples, n_classes)

set_params(**params)[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

OPLSDA

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') OPLSDA

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Transform X by removing orthogonal variation.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

X_filtered – Transformed samples with orthogonal variation removed.

Return type:

ndarray of shape (n_samples, n_features)

class nirs4all.operators.models.PLSDA(n_components: int = 5)[source]

Bases: BaseEstimator, ClassifierMixin

PLS Discriminant Analysis (PLS-DA) classifier. (See pls.py for full docstring)

fit(X, y)[source]

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]

predict_proba(X)[source]

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PLSDA

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class nirs4all.operators.models.PolynomialFeaturizer(degree: int = 2, include_original: bool = True)[source]

Bases: BaseEstimator, TransformerMixin

Polynomial featurizer for OKLM-PLS.

Creates polynomial features up to specified degree without interaction terms (for efficiency with high-dimensional spectral data).

Parameters:
  • degree (int, default=2) – Maximum degree of polynomial features.

  • include_original (bool, default=True) – Whether to include the original features (degree 1).

fit(X, y=None)[source]

Fit the featurizer.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored)

Returns:

self

Return type:

PolynomialFeaturizer

transform(X)[source]

Transform X to polynomial features.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X_poly – Polynomial features.

Return type:

ndarray of shape (n_samples, n_features * degree)
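
A featurizer without interaction terms amounts to stacking element-wise powers of X. The function below is a minimal sketch of that idea, assuming include_original=False simply drops the degree-1 block; `polynomial_features` is a hypothetical name, not the class's transform.

```python
import numpy as np

def polynomial_features(X, degree=2, include_original=True):
    """Stack element-wise powers of X column-wise, without interaction terms."""
    X = np.asarray(X, dtype=float)
    first_power = 1 if include_original else 2
    return np.hstack([X ** d for d in range(first_power, degree + 1)])

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_poly = polynomial_features(X, degree=3)
```

The output width grows linearly with the degree (here 2 features × 3 powers = 6 columns), whereas full polynomial expansion with cross terms grows combinatorially, which matters for spectra with hundreds of wavelengths.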

class nirs4all.operators.models.RBFFeaturizer(n_components: int = 100, gamma: float | None = None, random_state: int | None = None)[source]

Bases: BaseEstimator, TransformerMixin

Random Fourier Features (RBF approximation) featurizer for OKLM-PLS.

Approximates the RBF kernel using random Fourier features, which is useful for adding nonlinearity to the Koopman embedding.

Parameters:
  • n_components (int, default=100) – Number of random Fourier features.

  • gamma (float, optional) – Kernel coefficient. If None, uses 1/n_features.

  • random_state (int, optional) – Random seed for reproducibility.

fit(X, y=None)[source]

Fit the featurizer by sampling random frequencies.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored)

Returns:

self

Return type:

RBFFeaturizer

transform(X)[source]

Transform X to random Fourier features.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X_rff – Random Fourier features.

Return type:

ndarray of shape (n_samples, n_components)
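
Random Fourier features can be sketched in a few lines. The code assumes the kernel is exp(-gamma * ||x - y||²) and uses the standard construction with Gaussian frequencies and uniform phases; `random_fourier_features` is a hypothetical helper that samples on every call, unlike a fitted featurizer that would store its frequencies.

```python
import numpy as np

def random_fourier_features(X, n_components=100, gamma=None, seed=None):
    """Map X to features whose inner products approximate an RBF kernel."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if gamma is None:
        gamma = 1.0 / n_features
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the Fourier transform of exp(-gamma * ||d||^2).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Z = random_fourier_features(X, n_components=5000, seed=1)

approx_kernel = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
exact_kernel = np.exp(-0.25 * sq_dists)    # gamma = 1 / n_features = 0.25
max_error = np.abs(approx_kernel - exact_kernel).max()
```

The approximation error shrinks roughly with the inverse square root of n_components, so the default of 100 trades some kernel accuracy for a compact feature matrix.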

class nirs4all.operators.models.RecursivePLS(n_components: int = 10, forgetting_factor: float = 0.99, scale: bool = True, center: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Recursive Partial Least Squares (Recursive PLS) regressor.

Recursive PLS enables online model updates for drifting processes. It uses a forgetting factor to exponentially weight old samples, allowing the model to adapt to non-stationary data streams.

The algorithm maintains running covariance matrices that are updated incrementally with each new batch of samples. The PLS loadings are then recomputed from these updated covariances.
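
The incremental covariance update can be sketched as follows. The exact update used by RecursivePLS is not shown in this reference, so the form below (discount the old cross-products, add the new batch) is an assumption, `update_covariances` is a hypothetical helper, and centering and scaling are left out.

```python
import numpy as np

def update_covariances(C_xx, C_xy, X_new, Y_new, forgetting_factor=0.99):
    """Discount the running cross-products, then add the new batch."""
    C_xx = forgetting_factor * C_xx + X_new.T @ X_new
    C_xy = forgetting_factor * C_xy + X_new.T @ Y_new
    return C_xx, C_xy

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = X[:, :1] + 0.1 * rng.normal(size=(50, 1))

C_xx = np.zeros((4, 4))
C_xy = np.zeros((4, 1))
for start in range(0, 50, 10):             # five batches of ten samples
    batch = slice(start, start + 10)
    C_xx, C_xy = update_covariances(C_xx, C_xy, X[batch], Y[batch],
                                    forgetting_factor=1.0)

# First PLS weight vector: dominant left singular vector of C_xy.
first_weight = np.linalg.svd(C_xy, full_matrices=False)[0][:, 0]
```

With a forgetting factor of 1.0 the accumulated matrices equal the batch cross-products X.T @ X and X.T @ Y, which matches the statement that 1.0 corresponds to standard batch PLS.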

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • forgetting_factor (float, default=0.99) – Forgetting factor in (0, 1]. Controls the rate of adaptation:

      • 1.0: No forgetting, standard batch PLS.

      • <1.0: Exponential forgetting of old samples.

      • Typical values: 0.95-0.999, depending on drift speed.

  • scale (bool, default=True) – Whether to scale X and Y to unit variance.

  • center (bool, default=True) – Whether to center X and Y (subtract mean).

  • backend (str, default='numpy') – Computational backend to use:

      • ‘numpy’: NumPy backend (CPU only).

      • ‘jax’: JAX backend (supports GPU/TPU acceleration).

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

n_samples_seen_

Total number of samples seen (including partial_fit calls).

Type:

int

x_mean_

Mean of X (updated with exponential moving average).

Type:

ndarray of shape (n_features,)

x_std_

Standard deviation of X.

Type:

ndarray of shape (n_features,)

y_mean_

Mean of Y (updated with exponential moving average).

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)

x_weights_

X weights (W).

Type:

ndarray of shape (n_features, n_components_)

x_loadings_

X loadings (P).

Type:

ndarray of shape (n_features, n_components_)

y_loadings_

Y loadings (Q).

Type:

ndarray of shape (n_targets, n_components_)

coef_

Regression coefficients.

Type:

ndarray of shape (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.recursive_pls import RecursivePLS
>>> import numpy as np
>>> # Initial batch fit
>>> np.random.seed(42)
>>> X_init = np.random.randn(100, 50)
>>> y_init = X_init[:, :5].sum(axis=1) + 0.1 * np.random.randn(100)
>>> model = RecursivePLS(n_components=10, forgetting_factor=0.99)
>>> model.fit(X_init, y_init)
RecursivePLS(n_components=10)
>>> # Online update with new samples
>>> X_new = np.random.randn(10, 50)
>>> y_new = X_new[:, :5].sum(axis=1) + 0.1 * np.random.randn(10)
>>> model.partial_fit(X_new, y_new)
RecursivePLS(...)
>>> # Predict
>>> predictions = model.predict(X_new)
>>> print(f"Samples seen: {model.n_samples_seen_}")

Notes

Recursive PLS is particularly useful when:

  • Data arrives in streams and batch retraining is too expensive

  • Process conditions drift over time (sensor aging, raw material changes)

  • You need to adapt a calibration model to local conditions

The forgetting factor controls the adaptation speed:

  • Higher values (0.99-0.999): Slow adaptation, stable model

  • Lower values (0.9-0.95): Fast adaptation, may be unstable
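
A rough way to compare forgetting factors is the effective memory of exponential weighting: an update that is k steps old carries weight f**k, and those weights sum to 1 / (1 - f). The helper below is a sketch of that arithmetic only; whether one step means one sample or one batch in RecursivePLS is an assumption left open here.

```python
def effective_memory(forgetting_factor: float) -> float:
    """Sum of the weights 1, f, f**2, ... given to successive past updates."""
    if not 0.0 < forgetting_factor <= 1.0:
        raise ValueError("forgetting_factor must be in (0, 1]")
    if forgetting_factor == 1.0:
        return float("inf")       # no forgetting: every update counts fully
    return 1.0 / (1.0 - forgetting_factor)

memory = {f: effective_memory(f) for f in (0.9, 0.95, 0.99, 0.999)}
```

By this measure 0.9 remembers about 10 updates and 0.999 about 1000, which is one way to read the slow and fast adaptation ranges listed above.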

See also

SIMPLS

Batch SIMPLS algorithm.

sklearn.cross_decomposition.PLSRegression

sklearn’s batch PLS.

References

  • Qin, S. J. (1998). Recursive PLS algorithms for adaptive data modeling. Computers & Chemical Engineering, 22(4-5), 503-514.

  • Dayal, B. S., & MacGregor, J. F. (1997). Recursive exponentially weighted PLS and its applications to adaptive control and prediction. Journal of Process Control, 7(3), 169-179.

__repr__() str[source]

Return string representation.

fit(X: ArrayLike, y: ArrayLike) RecursivePLS[source]

Fit the Recursive PLS model with initial batch.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

RecursivePLS

Raises:
  • ValueError – If backend is not ‘numpy’ or ‘jax’, or if forgetting_factor is not in (0, 1].

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

partial_fit(X: ArrayLike, y: ArrayLike) RecursivePLS[source]

Update the Recursive PLS model with new samples.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – New training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – New target values.

Returns:

self – Updated estimator.

Return type:

RecursivePLS

Raises:

NotFittedError – If the model has not been fitted yet.

predict(X: ArrayLike) NDArray[floating][source]

Predict using the Recursive PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) RecursivePLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

RecursivePLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RecursivePLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) NDArray[floating][source]

Transform X to score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores.

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.RobustPLS(n_components: int = 10, weighting: Literal['huber', 'tukey'] = 'huber', c: float | None = None, max_iter: int = 100, tol: float = 1e-06, scale: bool = True, center: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Robust Partial Least Squares (Robust PLS) regressor.

Robust PLS uses iteratively reweighted least squares (IRLS) to down-weight outliers during model fitting. This makes the model more resistant to outliers in both X (leverage points) and Y (vertical outliers).

The algorithm iterates between:

  1. Fitting PLS with a weighted covariance matrix.

  2. Computing residuals and updating the weights using robust M-estimation.

Two weighting schemes are available:

  • ‘huber’: Huber’s psi function, a smooth transition from L2 to L1.

  • ‘tukey’: Tukey’s bisquare, which completely down-weights extreme outliers.
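
The two weighting schemes can be illustrated with their textbook weight functions. This is a minimal sketch, not the nirs4all implementation: it assumes residuals have already been divided by a robust scale estimate, and the function names are hypothetical. The default constants are the ones documented for the c parameter.

```python
def huber_weight(r, c=1.345):
    """Huber weight: 1 inside [-c, c], c/|r| outside (never reaches zero)."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def tukey_weight(r, c=4.685):
    """Tukey bisquare weight: (1 - (r/c)^2)^2 inside (-c, c), exactly 0 outside."""
    a = abs(r)
    if a >= c:
        return 0.0
    return (1.0 - (a / c) ** 2) ** 2


# Standardised residuals, from a perfect fit to a gross outlier (assumed values).
residuals = [0.0, 1.0, 2.69, 5.0, 10.0]
huber = [huber_weight(r) for r in residuals]
tukey = [tukey_weight(r) for r in residuals]
```

Huber weights decay gradually and stay positive, while Tukey weights drop to exactly zero once the standardised residual exceeds c.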

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • weighting ({'huber', 'tukey'}, default='huber') – Robust weighting scheme:

      • ‘huber’: Huber’s weight function, which down-weights large residuals smoothly but never to zero.

      • ‘tukey’: Tukey’s bisquare, a redescending function that gives zero weight to extreme outliers (hard rejection).

  • c (float or None, default=None) – Tuning constant for the weight function. Controls the threshold beyond which observations are down-weighted. If None, the default depends on the weighting scheme:

      • ‘huber’: 1.345 (95% efficiency).

      • ‘tukey’: 4.685 (95% efficiency).

  • max_iter (int, default=100) – Maximum number of IRLS iterations.

  • tol (float, default=1e-6) – Convergence tolerance for weight changes.

  • scale (bool, default=True) – Whether to scale X and Y to unit variance.

  • center (bool, default=True) – Whether to center X and Y (subtract mean).

  • backend (str, default='numpy') – Computational backend to use:

      • ‘numpy’: NumPy backend (CPU only).

      • ‘jax’: JAX backend (supports GPU/TPU acceleration).

    Note: IRLS weight computation is always done in NumPy for consistency. The backend affects only the final PLS fit and prediction.

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

x_mean_

Mean of X.

Type:

ndarray of shape (n_features,)

x_std_

Standard deviation of X.

Type:

ndarray of shape (n_features,)

y_mean_

Mean of Y.

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)

x_scores_

X scores (T).

Type:

ndarray of shape (n_samples, n_components_)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_samples, n_components_)

x_weights_

X weights (W).

Type:

ndarray of shape (n_features, n_components_)

x_loadings_

X loadings (P).

Type:

ndarray of shape (n_features, n_components_)

y_loadings_

Y loadings (Q).

Type:

ndarray of shape (n_targets, n_components_)

coef_

Regression coefficients.

Type:

ndarray of shape (n_features, n_targets)

sample_weights_

Final sample weights from IRLS. Low values indicate potential outliers.

Type:

ndarray of shape (n_samples,)

Examples

>>> from nirs4all.operators.models.sklearn.robust_pls import RobustPLS
>>> import numpy as np
>>> # Generate data with outliers
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = X[:, :5].sum(axis=1) + 0.1 * np.random.randn(100)
>>> # Add outliers
>>> y[0:5] = y[0:5] + 10  # Vertical outliers
>>> # Fit Robust PLS
>>> model = RobustPLS(n_components=10, weighting='huber')
>>> model.fit(X, y)
RobustPLS(n_components=10, weighting='huber')
>>> predictions = model.predict(X)
>>> # Check which samples were down-weighted (potential outliers)
>>> outlier_mask = model.sample_weights_ < 0.5
>>> print(f"Potential outliers: {np.where(outlier_mask)[0]}")

Notes

Robust PLS is particularly useful when:

  • Data contains outliers in X or Y.

  • Standard PLS gives poor predictions due to leverage points.

  • You want to identify potential outliers via sample weights.

The sample_weights_ attribute can be used to identify outliers after fitting. Samples with low weights (e.g., < 0.5) may be outliers worth investigating.
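
To show why low final weights point at outliers, here is a minimal IRLS sketch on a straight-line fit rather than on PLS. Everything in it is an illustrative assumption: the helper names, the MAD-based residual scale, and the toy data are not taken from nirs4all.

```python
import statistics


def huber_weight(r, c=1.345):
    a = abs(r)
    return 1.0 if a <= c else c / a


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def irls_line(x, y, max_iter=100, tol=1e-6):
    w = [1.0] * len(x)  # start from an ordinary (unweighted) fit
    for _ in range(max_iter):
        a, b = weighted_line_fit(x, y, w)
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale (assumption): median absolute deviation, normal-consistent.
        med = statistics.median(res)
        scale = max(1.4826 * statistics.median(abs(r - med) for r in res), 1e-12)
        new_w = [huber_weight(r / scale) for r in res]
        change = max(abs(n - o) for n, o in zip(new_w, w))
        w = new_w
        if change < tol:  # stop once the weights no longer move
            break
    return a, b, w


x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + 0.05 * ((i * 7) % 5 - 2) for i, xi in enumerate(x)]
y[3] += 10.0  # one vertical outlier
intercept, slope, weights = irls_line(x, y)
suspects = [i for i, w in enumerate(weights) if w < 0.5]
```

After convergence the contaminated sample carries a weight far below the others, which is the same signal that sample_weights_ exposes after fitting.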

See also

SIMPLS

Standard SIMPLS algorithm without robust weighting.

sklearn.cross_decomposition.PLSRegression

sklearn’s PLS implementation.

References

  • Hubert, M., & Vanden Branden, K. (2003). Robust procedures for partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 65(2), 101-121.

  • Gil, J. A., & Romera, R. (1998). On robust partial least squares (PLS) methods. Journal of Chemometrics, 12(6), 365-378.

__repr__() str[source]

Return string representation.

fit(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], y: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) RobustPLS[source]

Fit the Robust PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

RobustPLS

Raises:
  • ValueError – If backend is not ‘numpy’ or ‘jax’, or if weighting is not ‘huber’ or ‘tukey’.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_outlier_mask(threshold: float = 0.5) ndarray[tuple[Any, ...], dtype[bool]][source]

Get mask of potential outliers based on sample weights.

Parameters:

threshold (float, default=0.5) – Weight threshold below which samples are considered outliers.

Returns:

outlier_mask – Boolean mask where True indicates potential outlier.

Return type:

ndarray of shape (n_samples,)
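
As a rough sketch of what this mask amounts to, assuming it is a strict "weight below threshold" comparison as in the class example (sample_weights_ < 0.5). The function name and the weights are made up for illustration.

```python
def outlier_mask(sample_weights, threshold=0.5):
    """Return True where the final IRLS weight falls below the threshold."""
    return [w < threshold for w in sample_weights]


# Hypothetical final weights for five samples.
weights = [1.0, 0.93, 0.12, 0.5, 0.49]
mask = outlier_mask(weights)
flagged = [i for i, is_outlier in enumerate(mask) if is_outlier]
```

Raising the threshold flags more samples; a weight exactly equal to the threshold is not flagged under this assumption.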

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], n_components: int | None = None) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the Robust PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) RobustPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

RobustPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') RobustPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RobustPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores.

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.SIMPLS(n_components: int = 10, scale: bool = True, center: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

SIMPLS (Simple PLS) regressor.

SIMPLS is an alternative to NIPALS-based PLS that computes components via projections of the covariance matrix X’Y. It produces the same predictions as PLSRegression for univariate Y, and slightly different (but equivalent in terms of prediction accuracy) results for multivariate Y.

SIMPLS is often faster than NIPALS for high-dimensional data because it avoids the iterative deflation of X.

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

  • scale (bool, default=True) – Whether to scale X and Y to unit variance.

  • center (bool, default=True) – Whether to center X and Y (subtract mean).

  • backend (str, default='numpy') – Computational backend to use:

      • ‘numpy’: NumPy backend (CPU only).

      • ‘jax’: JAX backend (supports GPU/TPU acceleration).

    The JAX backend requires JAX to be installed (pip install jax). For GPU support, use pip install jax[cuda12].

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used (may be less than n_components if limited by data dimensions).

Type:

int

x_mean_

Mean of X.

Type:

ndarray of shape (n_features,)

x_std_

Standard deviation of X.

Type:

ndarray of shape (n_features,)

y_mean_

Mean of Y.

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)

x_scores_

X scores (T).

Type:

ndarray of shape (n_samples, n_components_)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_samples, n_components_)

x_weights_

X weights (W).

Type:

ndarray of shape (n_features, n_components_)

x_loadings_

X loadings (P).

Type:

ndarray of shape (n_features, n_components_)

y_loadings_

Y loadings (Q).

Type:

ndarray of shape (n_targets, n_components_)

coef_

Regression coefficients (using all components).

Type:

ndarray of shape (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.simpls import SIMPLS
>>> import numpy as np
>>> # Generate sample data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = X[:, :5].sum(axis=1) + 0.1 * np.random.randn(100)
>>> # Fit SIMPLS model
>>> model = SIMPLS(n_components=10)
>>> model.fit(X, y)
SIMPLS(n_components=10)
>>> predictions = model.predict(X)
>>> # Use JAX backend for GPU acceleration
>>> model_jax = SIMPLS(n_components=10, backend='jax')

Notes

SIMPLS differs from NIPALS in how the deflation is performed:

  • NIPALS deflates X after each component (X := X - t*p’).

  • SIMPLS deflates the covariance matrix S = X’Y.

For univariate Y, both methods produce identical predictions. For multivariate Y, SIMPLS produces Y loadings that span the same space as NIPALS but with slightly different orientations.
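
The covariance deflation can be sketched for a single target with plain NumPy. This follows the steps in de Jong (1993) as commonly presented; it is an illustrative assumption about the algorithm, not the nirs4all source, and it omits scaling and the multivariate case. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def simpls_univariate(X, y, n_components):
    """SIMPLS regression coefficients for one target (centering only)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    S = Xc.T @ yc                                # cross-covariance vector X'y
    R = np.zeros((n_features, n_components))     # X weights
    V = np.zeros((n_features, n_components))     # orthonormal basis of the loadings
    q = np.zeros(n_components)                   # y loadings
    for a in range(n_components):
        r = S.copy()                             # one target: the weight direction is S
        t = Xc @ r                               # scores
        norm_t = np.linalg.norm(t)
        t /= norm_t
        r /= norm_t
        p = Xc.T @ t                             # X loading
        q[a] = yc @ t
        v = p - V[:, :a] @ (V[:, :a].T @ p)      # orthogonalise against earlier loadings
        v /= np.linalg.norm(v)
        S = S - v * (v @ S)                      # deflate the covariance, never X
        R[:, a], V[:, a] = r, v
    coef = R @ q
    intercept = y.mean() - X.mean(axis=0) @ coef
    return coef, intercept


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 1.5 + 0.1 * rng.normal(size=60)

coef_full, intercept_full = simpls_univariate(X, y, n_components=4)

# Reference: ordinary least squares with an explicit intercept column.
design = np.column_stack([X, np.ones(len(X))])
ols = np.linalg.lstsq(design, y, rcond=None)[0]
ols_coef, ols_intercept = ols[:4], ols[4]
```

With as many components as features on full-rank data, the PLS solution coincides with ordinary least squares, which makes a convenient sanity check for a sketch like this.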

See also

sklearn.cross_decomposition.PLSRegression

sklearn’s NIPALS-based PLS.

IKPLS

Fast PLS implementation from the ikpls package.

References

  • de Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18(3), 251-263.

__repr__() str[source]

Return string representation.

fit(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], y: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) SIMPLS[source]

Fit the SIMPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

SIMPLS

Raises:
  • ValueError – If backend is not ‘numpy’ or ‘jax’.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str], n_components: int | None = None) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the SIMPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) SIMPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

SIMPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') SIMPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SIMPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores.

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.SelectorFactory[source]

Bases: object

Factory for creating source model selectors.

Provides a convenient way to instantiate selectors by name.

Example

>>> selector = SelectorFactory.create("all")
>>> selector = SelectorFactory.create("explicit", model_names=["PLS", "RF"])
>>> selector = SelectorFactory.create("top_k", k=5, metric="rmse")
classmethod create(selector_type: str, **kwargs) SourceModelSelector[source]

Create a selector by type name.

Parameters:
  • selector_type – Type name (e.g., “all”, “explicit”, “top_k”, “diversity”).

  • **kwargs – Arguments passed to the selector constructor.

Returns:

SourceModelSelector instance.

Raises:

ValueError – If selector_type is not recognized.

classmethod register(name: str, selector_class: type) None[source]

Register a custom selector type.

Parameters:
  • name – Name to register under.

  • selector_class – Selector class (must inherit from SourceModelSelector).

Raises:

TypeError – If selector_class doesn’t inherit from SourceModelSelector.
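
The create() and register() contract described above resembles a name-to-class registry. The sketch below is a self-contained stand-in that assumes nothing about the real internals: DemoSelector, DemoFactory, and FirstN are hypothetical names used only to show the documented ValueError and TypeError behaviour.

```python
from abc import ABC, abstractmethod


class DemoSelector(ABC):
    """Hypothetical stand-in for SourceModelSelector."""

    @abstractmethod
    def select(self, candidates, context=None, prediction_store=None):
        ...


class DemoAll(DemoSelector):
    def select(self, candidates, context=None, prediction_store=None):
        return list(candidates)


class DemoFactory:
    """Hypothetical registry mirroring the documented create()/register() behaviour."""

    _registry = {"all": DemoAll}

    @classmethod
    def register(cls, name, selector_class):
        is_selector = isinstance(selector_class, type) and issubclass(
            selector_class, DemoSelector
        )
        if not is_selector:
            raise TypeError(f"{selector_class!r} must inherit from DemoSelector")
        cls._registry[name] = selector_class

    @classmethod
    def create(cls, selector_type, **kwargs):
        try:
            selector_class = cls._registry[selector_type]
        except KeyError:
            raise ValueError(f"Unknown selector type: {selector_type!r}") from None
        return selector_class(**kwargs)  # kwargs go straight to the constructor


class FirstN(DemoSelector):
    """Custom selector: keep the first n candidates."""

    def __init__(self, n=1):
        self.n = n

    def select(self, candidates, context=None, prediction_store=None):
        return list(candidates)[: self.n]


DemoFactory.register("first_n", FirstN)
selector = DemoFactory.create("first_n", n=2)
picked = selector.select(["PLS", "RF", "SVR"])
```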

class nirs4all.operators.models.SourceModelSelector[source]

Bases: ABC

Abstract base class for source model selection strategies.

Defines the interface for selecting which models to include as sources in a stacking ensemble.

Subclasses must implement the select() method to define their selection logic.

Example

>>> class CustomSelector(SourceModelSelector):
...     def select(self, candidates, context, prediction_store):
...         # Custom selection logic
...         return [c for c in candidates if c.val_score > 0.9]
abstractmethod select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) List[ModelCandidate][source]

Select source models from candidates.

Parameters:
  • candidates – List of candidate models to select from.

  • context – Execution context with current step and branch info.

  • prediction_store – Predictions store for accessing model data.

Returns:

List of selected ModelCandidate objects in the order they should be used as features (determines column order in meta-features).

validate(selected: List[ModelCandidate], context: ExecutionContext) None[source]

Validate the selection (optional override).

Can raise ValueError if selection is invalid for the context.

Parameters:
  • selected – List of selected model candidates.

  • context – Execution context for validation.

Raises:

ValueError – If selection is invalid.
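
A subclass might override validate() to reject selections that are too small to stack on. The sketch below uses hypothetical stand-ins (DemoCandidate, DemoSelectorBase, HighScoreSelector) instead of the real nirs4all classes; only the val_score attribute name is borrowed from the class example.

```python
from dataclasses import dataclass


@dataclass
class DemoCandidate:
    """Hypothetical stand-in for ModelCandidate."""
    name: str
    val_score: float


class DemoSelectorBase:
    """Hypothetical stand-in for SourceModelSelector."""

    def select(self, candidates, context, prediction_store):
        raise NotImplementedError

    def validate(self, selected, context):
        return None  # default: accept any selection


class HighScoreSelector(DemoSelectorBase):
    def __init__(self, min_score, min_models=2):
        self.min_score = min_score
        self.min_models = min_models

    def select(self, candidates, context, prediction_store):
        # Keep the incoming order, since it determines the meta-feature columns.
        return [c for c in candidates if c.val_score > self.min_score]

    def validate(self, selected, context):
        if len(selected) < self.min_models:
            raise ValueError(
                f"Need at least {self.min_models} source models, got {len(selected)}"
            )


candidates = [
    DemoCandidate("pls", 0.95),
    DemoCandidate("rf", 0.92),
    DemoCandidate("svr", 0.80),
]
selector = HighScoreSelector(min_score=0.9)
selected = selector.select(candidates, context=None, prediction_store=None)
selector.validate(selected, context=None)  # passes: two models remain
```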

class nirs4all.operators.models.SparsePLS(n_components: int = 5, alpha: float = 1.0, max_iter: int = 500, tol: float = 1e-06, scale: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Sparse PLS (sPLS) regressor with L1 regularization.

Sparse PLS performs joint prediction and variable selection by applying L1 (Lasso) regularization to the PLS loadings. This produces sparse loadings where many wavelengths/features have zero weights, effectively selecting the most relevant variables.
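
L1 penalties are commonly applied through soft-thresholding, which is what drives small loadings to exactly zero. The sketch below shows only that operator. How alpha maps onto the threshold inside SparsePLS is not stated here, so the threshold values are assumptions chosen for illustration.

```python
def soft_threshold(values, lam):
    """Shrink each value toward zero by lam; values within [-lam, lam] become 0."""
    out = []
    for v in values:
        magnitude = abs(v) - lam
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


# Hypothetical loading vector for five wavelengths.
weights = [0.9, -0.05, 0.3, -0.6, 0.02]
sparse_small = soft_threshold(weights, 0.1)  # mild penalty
sparse_large = soft_threshold(weights, 0.5)  # strong penalty
n_zero_small = sum(1 for w in sparse_small if w == 0.0)
n_zero_large = sum(1 for w in sparse_large if w == 0.0)
```

A larger threshold zeroes more entries, which matches the documented effect of increasing alpha.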

Parameters:
  • n_components (int, default=5) – Number of latent variables to extract.

  • alpha (float, default=1.0) – Regularization strength. Higher values produce more sparsity.

  • max_iter (int, default=500) – Maximum number of iterations.

  • tol (float, default=1e-6) – Convergence tolerance.

  • scale (bool, default=True) – Whether to scale X and y before fitting.

  • backend (str, default='numpy') – Backend to use for computation:

      • ‘numpy’: NumPy backend (CPU only).

      • ‘jax’: JAX backend (supports GPU/TPU acceleration).

    The JAX backend requires JAX to be installed (pip install jax). For GPU support, use pip install jax[cuda12].

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

coef_

Regression coefficients (sparse).

Type:

ndarray of shape (n_features,) or (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.pls import SparsePLS
>>> import numpy as np
>>> X = np.random.randn(100, 50)
>>> y = np.random.randn(100)
>>> model = SparsePLS(n_components=5, alpha=0.5)
>>> model.fit(X, y)
SparsePLS(n_components=5, alpha=0.5)
>>> predictions = model.predict(X)
>>> # Check sparsity
>>> n_selected = np.sum(model.coef_ != 0)
>>> # JAX backend for GPU acceleration
>>> model_jax = SparsePLS(n_components=5, alpha=0.5, backend='jax')

Notes

For JAX with GPU support: pip install jax[cuda12]

The alpha parameter controls the trade-off between prediction accuracy and sparsity. Use cross-validation to find the optimal value.

See also

sklearn.cross_decomposition.PLSRegression

Standard non-sparse PLS.

References

  • Lê Cao, K.-A., Boitard, S., & Besse, P. (2011). Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics, 12, 253.

__repr__()[source]

Return string representation.

fit(X, y)[source]

Fit the Sparse PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

SparsePLS

Raises:
  • ImportError – If backend is ‘jax’ and JAX is not installed.

  • ValueError – If backend is not ‘numpy’ or ‘jax’.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_selected_features()[source]

Get indices of selected (non-zero) features.

Returns:

indices – Indices of features with non-zero coefficients.

Return type:

ndarray

predict(X)[source]

Predict using the Sparse PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params)[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

SparsePLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SparsePLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X)[source]

Transform X to latent space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – Latent variables (scores).

Return type:

ndarray of shape (n_samples, n_components_)

class nirs4all.operators.models.StackingConfig(coverage_strategy: CoverageStrategy = CoverageStrategy.STRICT, test_aggregation: TestAggregation = TestAggregation.MEAN, branch_scope: BranchScope = BranchScope.CURRENT_ONLY, allow_no_cv: bool = False, min_coverage_ratio: float = 1.0, level: StackingLevel = StackingLevel.AUTO, allow_meta_sources: bool = True, max_level: int = 3)[source]

Bases: object

Configuration for meta-model training set reconstruction.

Controls how out-of-fold predictions are collected and processed to build the training features for the meta-model.

coverage_strategy

How to handle samples with missing predictions.

Type:

nirs4all.operators.models.meta.CoverageStrategy

test_aggregation

How to aggregate test predictions across folds.

Type:

nirs4all.operators.models.meta.TestAggregation

branch_scope

Which branches to include as source models.

Type:

nirs4all.operators.models.meta.BranchScope

allow_no_cv

If True, allow stacking without cross-validation (with warning).

Type:

bool

min_coverage_ratio

Minimum ratio of source models required per sample.

Type:

float

level

Stacking level for multi-level stacking (AUTO, LEVEL_1, LEVEL_2, LEVEL_3).

Type:

nirs4all.operators.models.meta.StackingLevel

allow_meta_sources

If True, allow other MetaModels as source models.

Type:

bool

max_level

Maximum allowed stacking level (for validation).

Type:

int

Example

>>> config = StackingConfig(
...     coverage_strategy=CoverageStrategy.DROP_INCOMPLETE,
...     test_aggregation=TestAggregation.WEIGHTED_MEAN,
...     min_coverage_ratio=0.5,
...     level=StackingLevel.AUTO,
...     allow_meta_sources=True
... )
__post_init__()[source]

Validate configuration after initialization.
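
The presence of __post_init__ suggests a dataclass that checks its fields once they are set. The sketch below shows that pattern on a reduced, hypothetical DemoStackingConfig. The specific checks (ratio within [0, 1], max_level at least 1) are assumptions, not the documented rules.

```python
from dataclasses import dataclass


@dataclass
class DemoStackingConfig:
    """Hypothetical reduced config; the checks below are illustrative assumptions."""
    allow_no_cv: bool = False
    min_coverage_ratio: float = 1.0
    max_level: int = 3

    def __post_init__(self):
        # Runs automatically after the generated __init__ has assigned the fields.
        if not 0.0 <= self.min_coverage_ratio <= 1.0:
            raise ValueError("min_coverage_ratio must be between 0 and 1")
        if self.max_level < 1:
            raise ValueError("max_level must be at least 1")


config = DemoStackingConfig(min_coverage_ratio=0.5)

try:
    DemoStackingConfig(min_coverage_ratio=1.5)
    rejected = False
except ValueError:
    rejected = True
```

Validating in __post_init__ means an invalid configuration fails at construction time rather than later, during meta-model training.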

allow_meta_sources: bool = True
allow_no_cv: bool = False
branch_scope: BranchScope = 'current_only'
coverage_strategy: CoverageStrategy = 'strict'
level: StackingLevel = 'auto'
max_level: int = 3
min_coverage_ratio: float = 1.0
test_aggregation: TestAggregation = 'mean'
class nirs4all.operators.models.StackingLevel(value)[source]

Bases: Enum

Level of stacking in multi-level stacking architecture.

Indicates where this meta-model sits in a stacking hierarchy. Used for validation and dependency tracking.

AUTO

Automatically detect level based on source models (default).

LEVEL_1

First meta-level (stacks on base models only).

LEVEL_2

Second meta-level (can stack on LEVEL_1 meta-models).

LEVEL_3

Third meta-level (can stack on LEVEL_1 and LEVEL_2).
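
One plausible reading of AUTO is "one level above the highest source". The sketch below encodes that reading under stated assumptions: base models count as level 0, meta-models carry their own level, and exceeding max_level is an error. The function name and the rule itself are hypothetical, not taken from the library.

```python
def detect_level(source_levels, max_level=3):
    """Infer a meta-model's level from its sources (0 = base model, k = LEVEL_k)."""
    level = 1 + max(source_levels, default=0)
    if level > max_level:
        raise ValueError(f"Stacking level {level} exceeds max_level={max_level}")
    return level


only_base = detect_level([0, 0, 0])     # stacks on base models only
with_level_1 = detect_level([0, 1])     # one source is a LEVEL_1 meta-model
with_level_2 = detect_level([2, 1, 0])  # one source is a LEVEL_2 meta-model
```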

AUTO = 'auto'
LEVEL_1 = 1
LEVEL_2 = 2
LEVEL_3 = 3
class nirs4all.operators.models.TestAggregation(value)[source]

Bases: Enum

Strategy for aggregating test predictions from multiple folds.

When base models are trained with cross-validation, each fold produces predictions for the test set. This determines how to combine them.

MEAN

Simple average across folds (default).

WEIGHTED_MEAN

Weighted average by validation scores.

BEST_FOLD

Use prediction from best-scoring fold only.
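
The three strategies can be sketched on plain lists. This is an assumption-laden illustration: the function name is hypothetical, validation scores are taken to be positive with higher meaning better, and the weighted mean simply normalises those scores. The library may derive its weights differently.

```python
def aggregate_folds(fold_predictions, strategy="mean", val_scores=None):
    """Combine per-fold test predictions (one list of predictions per fold)."""
    n_folds = len(fold_predictions)
    n_samples = len(fold_predictions[0])
    if strategy == "best":
        # Keep only the fold with the highest validation score.
        best = max(range(n_folds), key=lambda k: val_scores[k])
        return list(fold_predictions[best])
    if strategy == "mean":
        weights = [1.0 / n_folds] * n_folds
    elif strategy == "weighted":
        total = sum(val_scores)
        weights = [s / total for s in val_scores]  # assumed normalisation
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [
        sum(w * preds[i] for w, preds in zip(weights, fold_predictions))
        for i in range(n_samples)
    ]


# Three folds, two test samples; scores are hypothetical (higher is better).
fold_predictions = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
val_scores = [0.5, 0.25, 0.25]

mean_pred = aggregate_folds(fold_predictions, "mean")
weighted_pred = aggregate_folds(fold_predictions, "weighted", val_scores)
best_pred = aggregate_folds(fold_predictions, "best", val_scores)
```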

BEST_FOLD = 'best'
MEAN = 'mean'
WEIGHTED_MEAN = 'weighted'
class nirs4all.operators.models.TopKByMetricSelector(k: int, metric: str = 'val_score', ascending: bool | None = None, per_class: bool = False)[source]

Bases: SourceModelSelector

Select top K models by a validation metric.

Ranks models by their validation score and selects the top K performers.

k

Number of top models to select.

metric

Metric to rank by (e.g., “rmse”, “r2”, “accuracy”).

ascending

Sort direction. If None, inferred from metric.

per_class

If True, select top K per model class (for diversity).

Example

>>> selector = TopKByMetricSelector(k=3, metric="rmse", ascending=True)
select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) List[ModelCandidate][source]

Select top K models by metric.

Parameters:
  • candidates – List of candidate models.

  • context – Execution context.

  • prediction_store – Predictions store (unused).

Returns:

Top K models sorted by metric.
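
The ranking logic can be sketched as a sort followed by a slice, optionally applied per model class. The sketch uses hypothetical stand-ins throughout: DemoCandidate, the scores dictionary, and the set of metrics treated as lower-is-better when ascending is None are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed set of error metrics where a smaller value is better.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


@dataclass
class DemoCandidate:
    """Hypothetical stand-in for ModelCandidate."""
    name: str
    classname: str
    scores: dict


def top_k(candidates, k, metric, ascending=None, per_class=False):
    if ascending is None:
        ascending = metric in LOWER_IS_BETTER  # infer the sort direction

    def rank(group):
        ordered = sorted(group, key=lambda c: c.scores[metric], reverse=not ascending)
        return ordered[:k]

    if not per_class:
        return rank(candidates)

    groups = defaultdict(list)
    for c in candidates:
        groups[c.classname].append(c)
    selected = []
    for group in groups.values():  # classes in first-seen order
        selected.extend(rank(group))
    return selected


candidates = [
    DemoCandidate("pls_5", "PLS", {"rmse": 0.40}),
    DemoCandidate("pls_10", "PLS", {"rmse": 0.35}),
    DemoCandidate("rf_a", "RF", {"rmse": 0.50}),
    DemoCandidate("rf_b", "RF", {"rmse": 0.45}),
]
best_two = [c.name for c in top_k(candidates, k=2, metric="rmse")]
best_per_class = [c.name for c in top_k(candidates, k=1, metric="rmse", per_class=True)]
```

Without per_class both selected models are PLS variants; with per_class each model family contributes its best member, which trades a little raw score for diversity.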

nirs4all.operators.models.__getattr__(name)[source]

Lazy attribute access for TensorFlow models.
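
Module-level __getattr__ (PEP 562) is the standard hook for this kind of lazy loading: the heavy import runs only when the attribute is first requested. The sketch below builds a throwaway module object to demonstrate the mechanism. The mapping and names are hypothetical, and the standard-library json module stands in for a heavy framework.

```python
import importlib
import types

# Hypothetical mapping: exported name -> module that actually defines it.
_LAZY_EXPORTS = {"JSONDecoder": "json"}


def _lazy_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name in _LAZY_EXPORTS:
        module = importlib.import_module(_LAZY_EXPORTS[name])  # import on first use
        return getattr(module, name)
    raise AttributeError(f"module 'demo_models' has no attribute {name!r}")


# In a real package this function would simply be named __getattr__ in __init__.py.
demo = types.ModuleType("demo_models")
demo.__getattr__ = _lazy_getattr

decoder_cls = demo.JSONDecoder        # triggers the deferred import
has_missing = hasattr(demo, "NotThere")  # unknown names still raise AttributeError
```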