nirs4all.operators.models.sklearn.ipls module
Interval PLS (iPLS) regressor for nirs4all.
A sklearn-compatible implementation of Interval PLS for wavelength interval selection in spectroscopic data. iPLS evaluates PLS models on contiguous wavelength windows to identify optimal spectral regions.
Supports both NumPy (CPU) and JAX (GPU/TPU) backends.
References
Nørgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.
Leardi, R., & Nørgaard, L. (2004). Sequential application of backward interval partial least squares and genetic algorithms for the selection of relevant spectral regions. Journal of Chemometrics, 18(11), 486-497.
- class nirs4all.operators.models.sklearn.ipls.IntervalPLS(n_components: int = 5, n_intervals: int = 10, interval_width: int | None = None, cv: int = 5, scoring: str = 'r2', mode: Literal['single', 'forward', 'backward'] = 'forward', combination_method: Literal['best', 'union'] = 'union', backend: str = 'numpy')
Bases: BaseEstimator, RegressorMixin
Interval Partial Least Squares (iPLS) regressor.
iPLS evaluates PLS models on contiguous wavelength intervals to identify optimal spectral regions for prediction. This is particularly useful for NIR spectroscopy where not all wavelengths contribute equally to the prediction.
The algorithm divides the spectrum into intervals and evaluates each interval (or combination of intervals) using cross-validation. Different selection modes are available:
- ‘single’: Select only the best-performing interval.
- ‘forward’: Iteratively add intervals that improve performance.
- ‘backward’: Start with all intervals and remove those that don’t help.
- Parameters:
n_components (int, default=5) – Number of PLS components to extract for each interval model.
n_intervals (int, default=10) – Number of equal-width intervals to divide X into.
interval_width (int, optional) – Fixed width for each interval. If specified, overrides n_intervals.
cv (int, default=5) – Number of cross-validation folds for interval evaluation.
scoring (str, default='r2') – Scoring metric for cross-validation. Supports sklearn metrics like ‘r2’, ‘neg_mean_squared_error’, etc.
mode ({'single', 'forward', 'backward'}, default='forward') – Interval selection mode:
  - ‘single’: Use only the best single interval.
  - ‘forward’: Forward selection of intervals.
  - ‘backward’: Backward elimination of intervals.
combination_method ({'best', 'union'}, default='union') – How to combine selected intervals for the final model:
  - ‘best’: Use only the single best interval.
  - ‘union’: Use the union of all selected intervals.
backend (str, default='numpy') – Computational backend:
  - ‘numpy’: NumPy backend (CPU only, default).
  - ‘jax’: JAX backend (supports GPU/TPU acceleration).
  Note: the JAX backend accelerates interval evaluation, but the final model is fitted with sklearn for compatibility.
- n_features_in\_
Number of features seen during fit.
- Type:
int
- n_components\_
Actual number of components used in final model.
- Type:
int
- interval_scores\_
Cross-validation scores for each interval.
- Type:
ndarray of shape (n_intervals,)
- interval_starts\_
Start indices for each interval.
- Type:
ndarray of shape (n_intervals,)
- interval_ends\_
End indices for each interval.
- Type:
ndarray of shape (n_intervals,)
- n_intervals\_
Actual number of intervals.
- Type:
int
- coef\_
Regression coefficients for selected features.
- feature_mask\_
Boolean mask indicating selected features.
- Type:
ndarray of shape (n_features,)
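The boundary attributes and the feature mask can be related as sketched below. This assumes end indices are exclusive (Python slice convention), that `n_intervals` spreads columns like `np.array_split`, and that a fixed `interval_width` yields consecutive windows with a possibly shorter last one; none of this is confirmed by the reference above, and `interval_bounds` and `build_feature_mask` are hypothetical names.

```python
import numpy as np


def interval_bounds(n_features, n_intervals=10, interval_width=None):
    """Hypothetical helper: start and (exclusive) end index of each interval."""
    if interval_width is not None:
        # A fixed width overrides n_intervals; the last window may be shorter.
        starts = np.arange(0, n_features, interval_width)
        ends = np.minimum(starts + interval_width, n_features)
    else:
        # Near-equal widths, distributed like np.array_split.
        base, extra = divmod(n_features, n_intervals)
        widths = np.full(n_intervals, base)
        widths[:extra] += 1  # the first `extra` intervals get one extra column
        ends = np.cumsum(widths)
        starts = ends - widths
    return starts, ends


def build_feature_mask(n_features, starts, ends, selected):
    """Boolean mask that is True inside every selected interval (union)."""
    mask = np.zeros(n_features, dtype=bool)
    for i in selected:
        mask[starts[i]:ends[i]] = True
    return mask


starts, ends = interval_bounds(200, n_intervals=10)
mask = build_feature_mask(200, starts, ends, selected=[2, 3])
print(starts[2], ends[3], mask.sum())  # 40 80 40

# Fixed-width variant: 205 columns in windows of 50.
s2, e2 = interval_bounds(205, interval_width=50)
```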
Examples
```python
import numpy as np
from nirs4all.operators.models.sklearn.ipls import IntervalPLS

# Generate sample spectral data
np.random.seed(42)
X = np.random.randn(100, 200)  # 200 wavelengths
y = X[:, 50:70].sum(axis=1) + 0.1 * np.random.randn(100)  # signal in columns 50-69

# Fit iPLS to find informative regions
model = IntervalPLS(n_components=5, n_intervals=10, mode='forward')
model.fit(X, y)

# See which intervals were selected
print(f"Selected intervals: {model.selected_intervals_}")
print(f"Selected regions: {model.selected_regions_}")

# Predict
predictions = model.predict(X)
```
Notes
iPLS is particularly effective for NIR spectroscopy because:
1. Different spectral regions contain different chemical information.
2. Some regions may be dominated by noise or uninformative signals.
3. Selecting optimal intervals can improve both prediction and interpretation.
The JAX backend provides acceleration for interval evaluation when using GPU/TPU, which is beneficial when evaluating many intervals.
See also
sklearn.cross_decomposition.PLSRegression: Standard PLS regression.
SIMPLS: SIMPLS algorithm implementation.
References
Nørgaard, L., et al. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413-419.
- fit(X: ArrayLike, y: ArrayLike) → IntervalPLS
Fit the IntervalPLS model.
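- Parameters:
X (array-like of shape (n_samples, n_features)) – Training spectra.
y (array-like of shape (n_samples,)) – Target values.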
- Returns:
self – Fitted estimator.
- Return type:
IntervalPLS
- Raises:
ValueError – If backend or mode is invalid.
ImportError – If backend is ‘jax’ and JAX is not installed.
- get_interval_info() → dict
Get detailed information about intervals and selection.
- Returns:
info – Dictionary containing:
  - ‘n_intervals’: Number of intervals
  - ‘interval_scores’: CV scores for each interval
  - ‘interval_ranges’: List of (start, end) for each interval
  - ‘selected_intervals’: Indices of selected intervals
  - ‘selected_regions’: (start, end) pairs for selected regions
  - ‘n_selected_features’: Total number of selected features
- Return type:
dict
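The ranges returned by get_interval_info() are column positions, so interpreting them spectroscopically means mapping them onto the wavelength axis. The sketch below assumes a hypothetical axis of 200 evenly spaced channels from 1000 to 2500 nm, exclusive end indices, and a hand-built stand-in for the returned dict that fills in only two of the documented keys with made-up values; it is not output from IntervalPLS.

```python
import numpy as np

# Hypothetical wavelength axis: 200 evenly spaced channels from 1000 to 2500 nm.
wavelengths = np.linspace(1000.0, 2500.0, 200)

# Hand-built stand-in for get_interval_info() output (illustrative values only).
info = {
    "interval_ranges": [(20 * i, 20 * (i + 1)) for i in range(10)],
    "selected_intervals": [2, 3],
}


def ranges_to_nm(info, wavelengths):
    """Map each selected (start, end) column range to its first and last wavelength.

    The end index is assumed to be exclusive, so the last channel is end - 1.
    """
    out = []
    for i in info["selected_intervals"]:
        start, end = info["interval_ranges"][i]
        out.append((float(wavelengths[start]), float(wavelengths[end - 1])))
    return out


nm_ranges = ranges_to_nm(info, wavelengths)
n_selected = sum(info["interval_ranges"][i][1] - info["interval_ranges"][i][0]
                 for i in info["selected_intervals"])
print(nm_ranges, n_selected)
```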
- predict(X: ArrayLike) → NDArray[floating]
Predict using the fitted IntervalPLS model.
- set_params(**params) → IntervalPLS
Set the parameters of this estimator.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
IntervalPLS
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IntervalPLS
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
- transform(X: ArrayLike) → NDArray[floating]
Transform X to selected feature space.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to transform.
- Returns:
X_selected – Selected features only.
- Return type:
ndarray