nirs4all.operators.models.sklearn.sparsepls module

Sparse PLS (sPLS) regressor with L1 regularization for nirs4all.

See pls.py for full documentation and usage examples.

class nirs4all.operators.models.sklearn.sparsepls.SparsePLS(n_components: int = 5, alpha: float = 1.0, max_iter: int = 500, tol: float = 1e-06, scale: bool = True, backend: str = 'numpy')

Bases: BaseEstimator, RegressorMixin

Sparse PLS (sPLS) regressor with L1 regularization.

Sparse PLS performs joint prediction and variable selection by applying L1 (Lasso) regularization to the PLS loadings. This produces sparse loadings where many wavelengths/features have zero weights, effectively selecting the most relevant variables.
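
The sparsity mechanism can be illustrated with the soft-thresholding operator, the usual way an L1 penalty drives small entries of a weight vector to exactly zero. The following is a minimal NumPy sketch for a single component and a univariate y. It is not necessarily the update nirs4all implements; soft_threshold, sparse_weight, and the threshold lam are hypothetical names chosen for illustration.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink every entry of v toward zero by lam; entries with |v| <= lam become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_weight(X, y, lam):
    """One sparse PLS1-style weight vector: threshold the covariance X^T y, then normalize."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = soft_threshold(Xc.T @ yc, lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Synthetic data: only the first two features carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)

cov = (X - X.mean(axis=0)).T @ (y - y.mean())
lam = 0.5 * np.max(np.abs(cov))  # illustrative choice: half the largest covariance
w = sparse_weight(X, y, lam)
n_nonzero = int(np.count_nonzero(w))  # number of features that survive the threshold
```

A larger lam removes more features. In this sketch lam plays the role that alpha is described as playing for SparsePLS, although the two are not claimed to be on the same scale.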

Parameters:

n_components (int, default=5) – Number of latent variables to extract.
alpha (float, default=1.0) – Regularization strength. Higher values produce more sparsity.
max_iter (int, default=500) – Maximum number of iterations.
tol (float, default=1e-6) – Convergence tolerance.
scale (bool, default=True) – Whether to scale X and y before fitting.
backend (str, default='numpy') – Backend to use for computation. Options are:

- ‘numpy’: Use NumPy backend (CPU only).
- ‘jax’: Use JAX backend (supports GPU/TPU acceleration).

The JAX backend requires JAX: pip install jax. For GPU support: pip install jax[cuda12]

n_features_in_

Number of features seen during fit.

Type: int

n_components_

Actual number of components used.

Type: int

coef_

Regression coefficients (sparse).

Type: ndarray of shape (n_features,) or (n_features, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.sparsepls import SparsePLS
>>> import numpy as np
>>> X = np.random.randn(100, 50)
>>> y = np.random.randn(100)
>>> model = SparsePLS(n_components=5, alpha=0.5)
>>> model.fit(X, y)
SparsePLS(n_components=5, alpha=0.5)
>>> predictions = model.predict(X)
>>> # Check sparsity
>>> n_selected = np.sum(model.coef_ != 0)
>>> # JAX backend for GPU acceleration
>>> model_jax = SparsePLS(n_components=5, alpha=0.5, backend='jax')

Notes

For JAX with GPU support: pip install jax[cuda12]

The alpha parameter controls the trade-off between prediction accuracy and sparsity. Use cross-validation to find the optimal value.
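
Because SparsePLS derives from BaseEstimator and exposes get_params/set_params, scikit-learn's model-selection tools (for example a grid search over alpha) should accept it directly. The sketch below shows the underlying idea with NumPy only so that it runs without nirs4all: K-fold cross-validation over a grid of sparsity levels. The model here is a toy one-component stand-in, not SparsePLS itself; kfold_mse, toy_sparse_fit_predict, and the relative threshold scale are assumptions made for illustration.

```python
import numpy as np

def kfold_mse(fit_predict, X, y, n_splits=5, seed=0):
    """Mean squared error averaged over K shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, n_splits):
        train_idx = np.setdiff1d(idx, test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean((y[test_idx] - y_pred) ** 2))
    return float(np.mean(errors))

def toy_sparse_fit_predict(alpha):
    """Toy stand-in model: one sparse component, alpha in [0, 1] is a relative threshold."""
    def fit_predict(X_train, y_train, X_test):
        mu, y_mu = X_train.mean(axis=0), y_train.mean()
        c = (X_train - mu).T @ (y_train - y_mu)
        # Soft-threshold the covariance vector relative to its largest entry.
        w = np.sign(c) * np.maximum(np.abs(c) - alpha * np.abs(c).max(), 0.0)
        t = (X_train - mu) @ w
        denom = t @ t
        q = (t @ (y_train - y_mu)) / denom if denom > 0 else 0.0
        return y_mu + q * ((X_test - mu) @ w)
    return fit_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)

alphas = [0.0, 0.25, 0.5, 0.75]
scores = {a: kfold_mse(toy_sparse_fit_predict(a), X, y) for a in alphas}
best_alpha = min(scores, key=scores.get)  # sparsity level with the lowest CV error
```

With the real estimator, the callable passed to kfold_mse would instead fit a SparsePLS instance with the candidate alpha on the training fold and call predict on the held-out fold.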

See also

sklearn.cross_decomposition.PLSRegression: Standard non-sparse PLS.

References

Lê Cao, K.-A., Boitard, S., & Besse, P. (2011). Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics, 12, 253.

__repr__()

Return string representation.

fit(X, y)

Fit the Sparse PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

SparsePLS

Raises:

ImportError – If JAX is not available (JAX backend).
ValueError – If backend is not ‘numpy’ or ‘jax’.
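
The two error conditions listed under Raises can be checked before fitting. The helper below is a hypothetical sketch using only the standard library; check_backend and pick_backend are not part of nirs4all, and the error messages are illustrative.

```python
import importlib.util

VALID_BACKENDS = ("numpy", "jax")

def check_backend(backend):
    """Raise ValueError for an unknown backend, ImportError if 'jax' is requested but missing."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
    if backend == "jax" and importlib.util.find_spec("jax") is None:
        raise ImportError("backend='jax' requires JAX: pip install jax")
    return backend

def pick_backend(preferred="jax"):
    """Use the preferred backend when it is importable, otherwise fall back to NumPy."""
    try:
        return check_backend(preferred)
    except ImportError:
        return "numpy"

backend = pick_backend("jax")  # 'jax' if installed, else 'numpy'
```

The returned string could then be passed as the backend argument of the constructor.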

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_selected_features()

Get indices of selected (non-zero) features.

Returns:

indices – Indices of features with non-zero coefficients.

Return type:

ndarray
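
As a rough illustration of what "selected" means here, the indices can be derived from a coefficient array with NumPy. This is a sketch of the idea rather than the method's actual code: the handling of a 2-D coefficient array (a feature counts as selected if it is non-zero for any target) is an assumption, and selected_features and the wavelength axis are hypothetical.

```python
import numpy as np

def selected_features(coef):
    """Indices of features whose coefficient is non-zero for at least one target."""
    coef = np.asarray(coef)
    if coef.ndim == 1:
        return np.flatnonzero(coef)
    return np.flatnonzero(np.any(coef != 0, axis=1))

coef = np.array([0.0, 1.2, 0.0, -0.4, 0.0])              # stand-in for a fitted coef_
wavelengths = np.array([1100, 1102, 1104, 1106, 1108])   # hypothetical wavelength axis (nm)

idx = selected_features(coef)   # array([1, 3])
kept = wavelengths[idx]         # array([1102, 1106])
```

Mapping the indices back onto the wavelength axis, as in the last line, is the typical use in spectroscopy: it shows which spectral bands the model retained.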

predict(X)

Predict using the Sparse PLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)

set_params(**params)

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

SparsePLS

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SparsePLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X)

Transform X to latent space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – Latent variables (scores).

Return type:

ndarray of shape (n_samples, n_components)
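
In a standard PLS model, the scores of new samples are obtained by centering X with the training mean and multiplying by a rotation matrix R = W (PᵀW)⁻¹, where W holds the weight vectors and P the X-loadings. The sketch below demonstrates this with plain, non-sparse NIPALS PLS1 to keep it short. Whether SparsePLS.transform stores and uses the same matrices is an assumption, and pls1_nipals is a hypothetical helper, not a nirs4all function.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Plain (non-sparse) PLS1 via NIPALS; returns scores T, weights W, loadings P."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    T, W, P = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                         # scores of the current component
        p = Xk.T @ t / (t @ t)             # X-loadings
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - t * (t @ yk) / (t @ t)   # deflate y
        T.append(t)
        W.append(w)
        P.append(p)
    return np.column_stack(T), np.column_stack(W), np.column_stack(P)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)

T, W, P = pls1_nipals(X, y, n_components=3)
R = W @ np.linalg.inv(P.T @ W)      # rotation from centered X straight to scores

X_new = X[:10]                      # pretend these are unseen samples
T_new = (X_new - X.mean(axis=0)) @ R
```

Here T_new has shape (10, 3), matching the documented (n_samples, n_components), and it reproduces the first ten rows of the training scores T.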