nirs4all.operators.models.sklearn.ipls module

Interval PLS (iPLS) regressor for nirs4all.

A sklearn-compatible implementation of Interval PLS for wavelength interval selection in spectroscopic data. iPLS evaluates PLS models on contiguous wavelength windows to identify optimal spectral regions.

Supports both NumPy (CPU) and JAX (GPU/TPU) backends.

References

Nørgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.
Leardi, R., & Nørgaard, L. (2004). Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. Journal of Chemometrics, 18(11), 486-497.

class nirs4all.operators.models.sklearn.ipls.IntervalPLS(n_components: int = 5, n_intervals: int = 10, interval_width: int | None = None, cv: int = 5, scoring: str = 'r2', mode: Literal['single', 'forward', 'backward'] = 'forward', combination_method: Literal['best', 'union'] = 'union', backend: str = 'numpy')

Bases: BaseEstimator, RegressorMixin

Interval Partial Least Squares (iPLS) regressor.

iPLS evaluates PLS models on contiguous wavelength intervals to identify optimal spectral regions for prediction. This is particularly useful for NIR spectroscopy where not all wavelengths contribute equally to the prediction.

The algorithm divides the spectrum into intervals and evaluates each interval (or combination of intervals) using cross-validation. Different selection modes are available:

- ‘single’: Select only the best performing interval
- ‘forward’: Iteratively add intervals that improve performance
- ‘backward’: Start with all intervals and remove those that don’t help

Parameters:

n_components (int, default=5) – Number of PLS components to extract for each interval model.
n_intervals (int, default=10) – Number of equal-width intervals to divide X into.
interval_width (int, optional) – Fixed width for each interval. If specified, overrides n_intervals.
cv (int, default=5) – Number of cross-validation folds for interval evaluation.
scoring (str, default='r2') – Scoring metric for cross-validation. Supports sklearn metrics like ‘r2’, ‘neg_mean_squared_error’, etc.
mode ({'single', 'forward', 'backward'}, default='forward') – Interval selection mode:
    - ‘single’: Use only the best single interval
    - ‘forward’: Forward selection of intervals
    - ‘backward’: Backward elimination of intervals
combination_method ({'best', 'union'}, default='union') – How to combine selected intervals for the final model:
    - ‘best’: Use only the single best interval
    - ‘union’: Use union of all selected intervals
backend (str, default='numpy') – Computational backend:
    - ‘numpy’: NumPy backend (CPU only, default)
    - ‘jax’: JAX backend (supports GPU/TPU acceleration)
    Note: JAX backend accelerates interval evaluation but final model fitting uses sklearn for compatibility.
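To make the relationship between n_intervals and interval_width concrete, here is one plausible way to derive interval boundaries. `interval_bounds` is a hypothetical helper, and both the half-open `[start, end)` convention and the handling of a shorter last window are assumptions, not taken from the nirs4all source.

```python
import numpy as np


def interval_bounds(n_features, n_intervals=10, interval_width=None):
    """Return (starts, ends) as half-open [start, end) column ranges.

    Hypothetical helper: the treatment of the last, possibly shorter
    window is an assumption made for illustration.
    """
    if interval_width is not None:
        # A fixed width takes precedence; tile windows until the spectrum is covered.
        starts = np.arange(0, n_features, interval_width)
        ends = np.minimum(starts + interval_width, n_features)
    else:
        # Near-equal widths: evenly spaced edges rounded to integer indices.
        edges = np.linspace(0, n_features, n_intervals + 1).round().astype(int)
        starts, ends = edges[:-1], edges[1:]
    return starts, ends


starts, ends = interval_bounds(200, n_intervals=10)
print(list(zip(starts.tolist(), ends.tolist()))[:3])  # [(0, 20), (20, 40), (40, 60)]

starts, ends = interval_bounds(200, interval_width=30)
print(len(starts), (int(starts[-1]), int(ends[-1])))  # 7 (180, 200)
```

Under these assumptions, a width that does not divide the number of features yields a different interval count than n_intervals, which may be why the fitted estimator reports the actual count separately.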

n_features_in_

Number of features seen during fit.

Type: int

n_components_

Actual number of components used in final model.

Type: int

interval_scores_

Cross-validation scores for each interval.

Type: ndarray of shape (n_intervals,)

interval_starts_

Start indices for each interval.

Type: ndarray of shape (n_intervals,)

interval_ends_

End indices for each interval.

Type: ndarray of shape (n_intervals,)

n_intervals_

Actual number of intervals.

Type: int

selected_intervals_

Indices of selected intervals.

Type: list of int

selected_regions_

(start, end) pairs for selected spectral regions.

Type: list of tuple

coef_

Regression coefficients for selected features.

Type: ndarray of shape (n_selected_features, n_targets)

feature_mask_

Boolean mask indicating selected features.

Type: ndarray of shape (n_features,)
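The link between (start, end) region pairs and a boolean feature mask can be illustrated with NumPy alone. The region values below are made up for the example, and treating `end` as exclusive is an assumption rather than something the documentation states.

```python
import numpy as np

n_features = 200
# Example (start, end) pairs; end is treated as exclusive (assumption).
selected_regions = [(40, 60), (120, 140)]

# Mark every column that falls inside a selected region.
feature_mask = np.zeros(n_features, dtype=bool)
for start, end in selected_regions:
    feature_mask[start:end] = True

X = np.random.default_rng(0).normal(size=(5, n_features))
X_selected = X[:, feature_mask]  # keep only the selected wavelengths
print(feature_mask.sum(), X_selected.shape)  # 40 (5, 40)
```

With a mask like this, the first dimension of a coefficient array of shape (n_selected_features, n_targets) would match `feature_mask.sum()`.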

Examples

>>> from nirs4all.operators.models.sklearn.ipls import IntervalPLS
>>> import numpy as np
>>> # Generate sample spectral data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 200)  # 200 wavelengths
>>> y = X[:, 50:70].sum(axis=1) + 0.1 * np.random.randn(100)  # Signal in 50-70
>>> # Fit iPLS to find informative regions
>>> model = IntervalPLS(n_components=5, n_intervals=10, mode='forward')
>>> model.fit(X, y)
IntervalPLS()
>>> # See which intervals were selected
>>> print(f"Selected intervals: {model.selected_intervals_}")
>>> print(f"Selected regions: {model.selected_regions_}")
>>> # Predict
>>> predictions = model.predict(X)

Notes

iPLS is particularly effective for NIR spectroscopy because:

1. Different spectral regions contain different chemical information
2. Some regions may be dominated by noise or uninformative signals
3. Selecting optimal intervals can improve both prediction and interpretation

The JAX backend provides acceleration for interval evaluation when using GPU/TPU, which is beneficial when evaluating many intervals.