nirs4all.operators.models.sklearn.lwpls module

Locally-Weighted Partial Least Squares (LWPLS) model operator.

This module provides a sklearn-compatible LWPLS implementation for nirs4all. The core algorithm is adapted from the original implementation by Hiromasa Kaneko (https://github.com/hkaneko1985/lwpls), licensed under the MIT License.

LWPLS builds a just-in-time local PLS model around each query sample, which is useful for process drift, local nonlinearity, or heterogeneous data.

Supports NumPy (CPU), JAX (GPU/TPU), and PyTorch (GPU) backends.

References

Kim, S., Kano, M., Nakagawa, H., & Hasebe, S. (2011). Estimation of active pharmaceutical ingredient content using locally weighted partial least squares and statistical wavelength selection. International Journal of Pharmaceutics, 421(2), 269-274.
https://datachemeng.com/locallyweightedpartialleastsquares/

License

Original lwpls.py by Hiromasa Kaneko is MIT licensed.

class nirs4all.operators.models.sklearn.lwpls.LWPLS(n_components: int = 10, lambda_in_similarity: float = 1.0, scale: bool = True, backend: str = 'numpy', batch_size: int = 64)

Bases: BaseEstimator, RegressorMixin

Locally-Weighted Partial Least Squares (LWPLS) regressor.

LWPLS builds a local PLS model for each query sample, weighting training samples by their similarity (proximity) to the query. This approach is useful for:

Data with local nonlinearity
Drifting processes where the relationship changes over time
Heterogeneous data where a single global model is inadequate

The similarity is computed using a Gaussian kernel based on Euclidean distance, controlled by the lambda_in_similarity parameter.
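To make the role of lambda_in_similarity concrete, the sketch below computes similarity weights for one query and reports how many training samples effectively carry the weight. It assumes the weighting used in Kaneko's original lwpls.py, w_i = exp(-d_i / (σ_d · λ)), where d_i is the Euclidean distance from training sample i to the query and σ_d is the standard deviation of those distances; note that this is exponential in the distance rather than in the squared distance, and the kernel in this class may differ in detail. The helper names and Kish's effective sample size are illustrative choices, not part of nirs4all.

```python
import numpy as np


def similarity_weights(X_train, x_query, lam):
    # Assumed weighting (Kaneko's lwpls.py): w_i = exp(-d_i / (std(d) * lam))
    d = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances to the query
    return np.exp(-d / (d.std(ddof=1) * lam))


def effective_sample_size(w):
    # Kish's effective sample size: roughly how many samples carry the weight
    return w.sum() ** 2 / (w ** 2).sum()


rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))
x_query = X_train[0]  # query taken from the training set for illustration
for lam in (2.0 ** -3, 1.0, 2.0 ** 3):
    w = similarity_weights(X_train, x_query, lam)
    print(f"lambda={lam:g}: effective sample size {effective_sample_size(w):.1f}")
```

With a small λ nearly all weight sits on the query's closest neighbours, while a large λ spreads the weight over most of the 200 samples, which is why large values approach a global PLS model.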

Parameters:

n_components (int, default=10) – Maximum number of PLS components to extract for each local model.
lambda_in_similarity (float, default=1.0) – Kernel width parameter. Smaller values create more localized models (more weight on nearby samples), larger values approach global PLS. Typical values range from 2^-9 to 2^5 depending on the data.
scale (bool, default=True) – Whether to standardize X and y before fitting. Strongly recommended, because the similarity weights are computed from Euclidean distances, which are otherwise dominated by features with large variance.
backend (str, default='numpy') – Computational backend to use. Options are:
- 'numpy': NumPy backend (CPU only, default).
- 'jax': JAX backend (supports GPU/TPU acceleration). Requires JAX: pip install jax. For GPU support: pip install jax[cuda12].
- 'torch': PyTorch backend (supports GPU acceleration). Requires PyTorch: pip install torch. For GPU support, install a CUDA-enabled PyTorch build.
batch_size (int, default=64) – Number of test samples to process per batch (JAX/torch backends). Reduce this if running out of GPU memory on large datasets. Ignored for NumPy backend.
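The batch_size parameter bounds how many query samples are handled at once. The sketch below illustrates the idea for the first step of prediction, the query-to-training distance matrix: each batch is computed in one vectorized broadcast whose intermediate array has shape (batch, n_train, n_features), so memory grows linearly with batch_size. batched_distances is a hypothetical NumPy helper for illustration; it does not reproduce the JAX or PyTorch backend code.

```python
import numpy as np


def batched_distances(X_train, X_test, batch_size=64):
    """Yield (start index, distance block) for successive batches of test samples.

    Each block has shape (batch, n_train); smaller batch_size means smaller
    intermediate arrays, at the cost of more iterations.
    """
    for start in range(0, X_test.shape[0], batch_size):
        batch = X_test[start:start + batch_size]
        diff = batch[:, None, :] - X_train[None, :, :]  # (batch, n_train, n_features)
        yield start, np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 10))
X_test = rng.standard_normal((150, 10))
blocks = list(batched_distances(X_train, X_test, batch_size=64))
print([b.shape for _, b in blocks])  # [(64, 500), (64, 500), (22, 500)]
```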

Attributes:

n_features_in_

Number of features seen during fit.

Type: int

n_components_

Actual number of components used (limited by data dimensions).

Type: int

X_train_

Stored training X data (standardized if scale=True).

Type: ndarray of shape (n_samples, n_features)

y_train_

Stored training y data (standardized if scale=True).

Type: ndarray of shape (n_samples,)

x_scaler_

Fitted scaler for X (if scale=True).

Type: StandardScaler or None

y_scaler_

Fitted scaler for y (if scale=True).

Type: StandardScaler or None

Examples

>>> from nirs4all.operators.models.sklearn.lwpls import LWPLS
>>> import numpy as np
>>> # Nonlinear data
>>> np.random.seed(42)
>>> X = 5 * np.random.rand(100, 2)
>>> y = 3 * X[:, 0]**2 + 10 * np.log(X[:, 1] + 0.1) + np.random.randn(100)
>>> # Split data
>>> X_train, X_test = X[:70], X[70:]
>>> y_train, y_test = y[:70], y[70:]
>>> # Fit LWPLS with NumPy backend (default)
>>> model = LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> model.fit(X_train, y_train)
LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> y_pred = model.predict(X_test)
>>> # Use JAX backend for GPU acceleration (requires JAX)
>>> model_jax = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='jax')
>>> model_jax.fit(X_train, y_train)  # doctest: +SKIP
>>> y_pred_jax = model_jax.predict(X_test)  # doctest: +SKIP
>>> # Use PyTorch backend for GPU acceleration (requires PyTorch)
>>> model_torch = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='torch')
>>> model_torch.fit(X_train, y_train)  # doctest: +SKIP
>>> y_pred_torch = model_torch.predict(X_test)  # doctest: +SKIP

Notes

LWPLS is computationally more expensive than standard PLS because it builds a separate weighted model for each query sample, so prediction cost grows with both the number of query samples and the size of the training set. The training data must be stored for prediction.

The JAX backend provides significant speedups on GPU by:
- Vectorizing the per-sample loop using jax.vmap
- JIT-compiling the prediction function
- Running on GPU/TPU when available

The PyTorch backend provides GPU acceleration by:
- Running tensor operations on CUDA or MPS devices
- Processing test samples in batches (see batch_size) to control memory usage
- Automatically selecting an available CUDA or MPS device

The optimal lambda_in_similarity should be tuned via cross-validation. A typical search grid is 2^k for integer k from -9 to 5, matching the typical range given for lambda_in_similarity.
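The tuning described above can be organized as a grid search in which every λ on the 2^k grid yields cross-validated predictions for all component counts at once, so n_components is chosen in the same pass. In this sketch, predict_all is a hypothetical callable with signature predict_all(X_train, y_train, X_test, n_components, lam) that returns an (n_test, n_components) array whose column a uses a + 1 components; X and y are NumPy arrays, and RMSE is an assumed selection criterion.

```python
import numpy as np


def cv_select(X, y, predict_all, n_components, n_folds=5, seed=0):
    """Grid-search lambda = 2**k (k = -9..5) and n_components by K-fold CV.

    Returns (best cross-validated RMSE, best lambda, best n_components).
    """
    lambdas = 2.0 ** np.arange(-9, 6)  # 2^-9 ... 2^5
    # Random, balanced fold labels 0..n_folds-1
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    best = (np.inf, None, None)
    for lam in lambdas:
        oof = np.zeros((len(y), n_components))  # out-of-fold predictions
        for f in range(n_folds):
            te = folds == f
            oof[te] = predict_all(X[~te], y[~te], X[te], n_components, lam)
        rmse = np.sqrt(((oof - y[:, None]) ** 2).mean(axis=0))  # one RMSE per component count
        k = int(np.argmin(rmse))
        if rmse[k] < best[0]:
            best = (rmse[k], lam, k + 1)
    return best
```

Any LWPLS implementation that exposes predictions for all component counts can be wrapped as predict_all; the cost is one full cross-validation per λ value rather than one per (λ, n_components) pair.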

This implementation is adapted from the original code by Hiromasa Kaneko (https://github.com/hkaneko1985/lwpls), licensed under the MIT License.

See also

sklearn.cross_decomposition.PLSRegression: Standard global PLS.
IKPLS: Fast PLS implementation.

References

Kim, S., Kano, M., Nakagawa, H., & Hasebe, S. (2011). Estimation of active pharmaceutical ingredient content using locally weighted partial least squares and statistical wavelength selection. International Journal of Pharmaceutics, 421(2), 269-274.

__repr__() → str[source]: Return string representation.

fit(X, y) → LWPLS

Fit the LWPLS model.

This stores the training data and fits scalers if requested. Actual model building happens lazily at prediction time.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, 1)) – Target values.

Returns:

self – Fitted estimator.

Return type:

LWPLS

Raises:

ValueError – If backend is not ‘numpy’, ‘jax’, or ‘torch’.
ImportError – If backend is ‘jax’ and JAX is not installed, or if backend is ‘torch’ and PyTorch is not installed.
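Because model building is deferred to prediction time, fit only has to validate, optionally standardize, and keep the data. A minimal sketch of that stage follows, using a hypothetical LazyLocalModel class and plain NumPy statistics in place of the StandardScaler objects stored in x_scaler_ and y_scaler_.

```python
import numpy as np


class LazyLocalModel:
    """Hypothetical sketch of a lazy learner's fit stage (not the nirs4all class)."""

    def __init__(self, scale=True):
        self.scale = scale

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()  # accept y of shape (n,) or (n, 1)
        self.n_features_in_ = X.shape[1]
        if self.scale:
            # Same statistics StandardScaler stores: mean and population std (ddof=0)
            self.x_mean_, self.x_std_ = X.mean(axis=0), X.std(axis=0)
            self.x_std_[self.x_std_ == 0] = 1.0  # constant features: leave unscaled
            self.y_mean_, self.y_std_ = y.mean(), y.std() or 1.0
            X = (X - self.x_mean_) / self.x_std_
            y = (y - self.y_mean_) / self.y_std_
        # No local model is built here; the (scaled) data are kept for predict()
        self.X_train_, self.y_train_ = X, y
        return self

    def to_original_units(self, y_scaled):
        # Undo y standardization, as StandardScaler.inverse_transform would
        return y_scaled * self.y_std_ + self.y_mean_ if self.scale else y_scaled


rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(50, 3))
y = rng.normal(10.0, 3.0, size=(50, 1))
model = LazyLocalModel().fit(X, y)
print(model.X_train_.shape, model.y_train_.shape)  # (50, 3) (50,)
```

Predictions computed in the standardized space are mapped back to the original units with y_pred * y_std_ + y_mean_, which is what inverse_transform computes for a fitted StandardScaler.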

get_params(deep: bool = True) → dict

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

predict(X, n_components=None) → ndarray

Predict using the LWPLS model.

Builds a local weighted PLS model for each test sample.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.
n_components (int, optional) – Number of components to use for prediction. If None, uses n_components_ (all fitted components).

Returns:

y_pred – Predicted target values.

Return type:

ndarray of shape (n_samples,)
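The per-query computation can be sketched in NumPy. The function below assumes the steps of Kaneko's original lwpls.py: compute similarity weights for the query, center X, y and the query on weighted means, then extract components one at a time with weighted PLS1 (NIPALS-style) updates, accumulating the prediction after each component. lwpls_predict_all is a hypothetical name; inputs are assumed already standardized, and n_components must not exceed the rank of the training data, otherwise the weight vector norm becomes zero.

```python
import numpy as np


def lwpls_predict_all(X_train, y_train, X_test, n_components, lam):
    """Return an (n_test, n_components) array; column a uses a + 1 components."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float).ravel()
    X_test = np.asarray(X_test, dtype=float)
    y_pred = np.zeros((X_test.shape[0], n_components))
    for i, x_q in enumerate(X_test):
        d = np.linalg.norm(X_train - x_q, axis=1)
        w = np.exp(-d / (d.std(ddof=1) * lam))  # assumed similarity weights
        # Weighted means; center training data and the query on them
        x_w = w @ X_train / w.sum()
        y_w = w @ y_train / w.sum()
        Xc, yc, xq = X_train - x_w, y_train - y_w, x_q - x_w
        pred = y_w
        for a in range(n_components):
            wa = Xc.T @ (w * yc)  # weighted covariance direction X'Wy
            wa /= np.linalg.norm(wa)
            t = Xc @ wa  # training scores
            tWt = t @ (w * t)
            p = Xc.T @ (w * t) / tWt  # X loadings
            q = yc @ (w * t) / tWt  # y loading
            tq = xq @ wa  # query score
            pred = pred + tq * q
            y_pred[i, a] = pred
            # Deflate training data and the query before the next component
            Xc = Xc - np.outer(t, p)
            yc = yc - t * q
            xq = xq - tq * p
    return y_pred


rng = np.random.default_rng(42)
X = 5 * rng.random((100, 2))
y = 3 * X[:, 0] ** 2 + 10 * np.log(X[:, 1] + 0.1) + rng.standard_normal(100)
# Standardize with training-set statistics only
mx, sx, my, sy = X[:70].mean(0), X[:70].std(0), y[:70].mean(), y[:70].std()
pred_std = lwpls_predict_all((X[:70] - mx) / sx, (y[:70] - my) / sy,
                             (X[70:] - mx) / sx, n_components=2, lam=0.25)
y_pred = pred_std * sy + my  # back to original units
print(y_pred.shape)  # (30, 2)
```

With a very large λ the weights become essentially uniform and the sketch reduces to ordinary global PLS, which gives a quick sanity check for any implementation.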

Predict with all component numbers (for component selection).

Returns predictions for each number of components, which can be used for cross-validation to select the optimal n_components.

Parameters: X (array-like of shape (n_samples, n_features)) – Samples to predict.
Returns: y_pred_all – Predictions where column i contains predictions using i+1 components.
Return type: ndarray of shape (n_samples, n_components)
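Selecting n_components from this output amounts to scoring each column against known targets and taking the best one. The sketch assumes y_pred_all holds out-of-fold (cross-validated) predictions, since scoring predictions on the training samples themselves tends to favour more components; best_n_components is a hypothetical helper that uses RMSE as the criterion.

```python
import numpy as np


def best_n_components(y_true, y_pred_all):
    """Return (best component count, RMSE per column).

    Column i of y_pred_all is assumed to hold predictions made with i + 1 components.
    """
    y_true = np.asarray(y_true, dtype=float).reshape(-1, 1)  # column vector for broadcasting
    rmse = np.sqrt(((np.asarray(y_pred_all, dtype=float) - y_true) ** 2).mean(axis=0))
    return int(np.argmin(rmse)) + 1, rmse  # +1 converts column index to a count


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_all = np.array([[1.5, 1.1, 0.8],
                       [2.5, 2.1, 2.4],
                       [2.5, 2.9, 3.5],
                       [3.5, 3.9, 4.6]])
k, rmse = best_n_components(y_true, y_pred_all)
print(k)  # 2
```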

set_params(**params) → LWPLS

Set the parameters of this estimator.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: LWPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') → LWPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters: n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.
Returns: self – The updated object.
Return type: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LWPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object