nirs4all.operators.models.sklearn.ipls module

Interval PLS (iPLS) regressor for nirs4all.

A sklearn-compatible implementation of Interval PLS for wavelength interval selection in spectroscopic data. iPLS evaluates PLS models on contiguous wavelength windows to identify optimal spectral regions.

Supports both NumPy (CPU) and JAX (GPU/TPU) backends.

References

  • Nørgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.

  • Leardi, R., & Nørgaard, L. (2004). Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. Journal of Chemometrics, 18(11), 486-497.

class nirs4all.operators.models.sklearn.ipls.IntervalPLS(n_components: int = 5, n_intervals: int = 10, interval_width: int | None = None, cv: int = 5, scoring: str = 'r2', mode: Literal['single', 'forward', 'backward'] = 'forward', combination_method: Literal['best', 'union'] = 'union', backend: str = 'numpy')

Bases: BaseEstimator, RegressorMixin

Interval Partial Least Squares (iPLS) regressor.

iPLS evaluates PLS models on contiguous wavelength intervals to identify optimal spectral regions for prediction. This is particularly useful for NIR spectroscopy where not all wavelengths contribute equally to the prediction.

The algorithm divides the spectrum into intervals and evaluates each interval (or combination of intervals) using cross-validation. The following selection modes are available:

  • 'single': Select only the best-performing interval.

  • 'forward': Iteratively add intervals that improve performance.

  • 'backward': Start with all intervals and remove those that do not help.

Parameters:
  • n_components (int, default=5) – Number of PLS components to extract for each interval model.

  • n_intervals (int, default=10) – Number of equal-width intervals to divide X into.

  • interval_width (int, optional) – Fixed width for each interval. If specified, overrides n_intervals.

  • cv (int, default=5) – Number of cross-validation folds for interval evaluation.

  • scoring (str, default='r2') – Scoring metric for cross-validation. Accepts sklearn scorer names such as 'r2' or 'neg_mean_squared_error'.

  • mode ({'single', 'forward', 'backward'}, default='forward') – Interval selection mode:

      • 'single': Use only the best single interval.
      • 'forward': Forward selection of intervals.
      • 'backward': Backward elimination of intervals.

  • combination_method ({'best', 'union'}, default='union') – How to combine selected intervals for the final model:

      • 'best': Use only the single best interval.
      • 'union': Use the union of all selected intervals.

  • backend (str, default='numpy') – Computational backend:

      • 'numpy': NumPy backend (CPU only, default).
      • 'jax': JAX backend (supports GPU/TPU acceleration).

    Note: The JAX backend accelerates interval evaluation, but the final model is fitted with sklearn for compatibility.
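How `n_intervals` and `interval_width` might translate into column ranges can be illustrated as follows. The function name `interval_bounds`, the half-open `[start, end)` convention, and the treatment of a remainder (a shorter last interval when a fixed width is given) are assumptions made for this sketch; the library's actual boundary rules may differ.

```python
import numpy as np


def interval_bounds(n_features, n_intervals=10, interval_width=None):
    """Return (starts, ends) arrays of half-open column ranges [start, end)."""
    if interval_width is not None:
        # A fixed width takes precedence; the last interval may be shorter.
        starts = np.arange(0, n_features, interval_width)
        ends = np.minimum(starts + interval_width, n_features)
    else:
        # Near-equal widths: evenly spaced edges truncated to integers.
        edges = np.linspace(0, n_features, n_intervals + 1).astype(int)
        starts, ends = edges[:-1], edges[1:]
    return starts, ends


starts, ends = interval_bounds(200, n_intervals=10)
fixed_starts, fixed_ends = interval_bounds(200, interval_width=30)
```

With 200 features, the first call yields ten intervals of 20 columns each, while the second yields seven intervals, the last of which covers only columns 180-199.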

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used in final model.

Type:

int

interval_scores_

Cross-validation scores for each interval.

Type:

ndarray of shape (n_intervals,)

interval_starts_

Start indices for each interval.

Type:

ndarray of shape (n_intervals,)

interval_ends_

End indices for each interval.

Type:

ndarray of shape (n_intervals,)

n_intervals_

Actual number of intervals.

Type:

int

selected_intervals_

Indices of selected intervals.

Type:

list of int

selected_regions_

(start, end) pairs for selected spectral regions.

Type:

list of tuple

coef_

Regression coefficients for selected features.

Type:

ndarray of shape (n_selected_features, n_targets)

feature_mask_

Boolean mask indicating selected features.

Type:

ndarray of shape (n_features,)
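The relationship between a list of (start, end) regions and a boolean feature mask can be sketched as below. The helper `regions_to_mask` and the example regions are hypothetical, and the end index is assumed to be exclusive, which the attribute descriptions do not state.

```python
import numpy as np


def regions_to_mask(n_features, regions):
    """Boolean mask that is True inside any (start, end) region, end exclusive."""
    mask = np.zeros(n_features, dtype=bool)
    for start, end in regions:
        mask[start:end] = True
    return mask


regions = [(40, 60), (60, 80), (120, 140)]  # hypothetical selected regions
mask = regions_to_mask(200, regions)

X = np.arange(3 * 200, dtype=float).reshape(3, 200)
X_selected = X[:, mask]  # keep only the columns inside the selected regions
```

Indexing with the mask reduces the 200 input columns to the 60 selected ones, which matches the shape `(n_samples, n_selected_features)` documented for the selected feature space.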

Examples

>>> from nirs4all.operators.models.sklearn.ipls import IntervalPLS
>>> import numpy as np
>>> # Generate sample spectral data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 200)  # 200 wavelengths
>>> y = X[:, 50:70].sum(axis=1) + 0.1 * np.random.randn(100)  # Signal in 50-70
>>> # Fit iPLS to find informative regions
>>> model = IntervalPLS(n_components=5, n_intervals=10, mode='forward')
>>> model.fit(X, y)
IntervalPLS(n_components=5, n_intervals=10, mode='forward')
>>> # See which intervals were selected
>>> print(f"Selected intervals: {model.selected_intervals_}")
>>> print(f"Selected regions: {model.selected_regions_}")
>>> # Predict
>>> predictions = model.predict(X)

Notes

iPLS is particularly effective for NIR spectroscopy because:

  1. Different spectral regions contain different chemical information.
  2. Some regions may be dominated by noise or uninformative signals.
  3. Selecting optimal intervals can improve both prediction and interpretation.

The JAX backend provides acceleration for interval evaluation when using GPU/TPU, which is beneficial when evaluating many intervals.

See also

sklearn.cross_decomposition.PLSRegression

Standard PLS regression.

SIMPLS

SIMPLS algorithm implementation.

References

  • Nørgaard, L., et al. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.

__repr__() → str

Return string representation.

fit(X: ArrayLike, y: ArrayLike) → IntervalPLS

Fit the IntervalPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data (e.g., spectral data).

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

IntervalPLS

Raises:
  • ValueError – If backend or mode is not one of the supported values.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_interval_info() → dict

Get detailed information about intervals and selection.

Returns:

info – Dictionary containing:

  • 'n_intervals': Number of intervals.

  • 'interval_scores': CV scores for each interval.

  • 'interval_ranges': List of (start, end) for each interval.

  • 'selected_intervals': Indices of selected intervals.

  • 'selected_regions': (start, end) pairs for selected regions.

  • 'n_selected_features': Total number of selected features.

Return type:

dict

get_params(deep: bool = True) → dict

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: ArrayLike) → NDArray[floating]

Predict using the fitted IntervalPLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params) → IntervalPLS

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

IntervalPLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IntervalPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) → NDArray[floating]

Transform X to selected feature space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

X_selected – Selected features only.

Return type:

ndarray of shape (n_samples, n_selected_features)