nirs4all.operators.models.sklearn.lwpls module
Locally-Weighted Partial Least Squares (LWPLS) model operator.
This module provides a sklearn-compatible LWPLS implementation for nirs4all. The core algorithm is adapted from the original implementation by Hiromasa Kaneko (https://github.com/hkaneko1985/lwpls), licensed under the MIT License.
LWPLS builds just-in-time local PLS models near each query sample, which is useful when dealing with drift, local nonlinearity, or heterogeneous data.
Supports NumPy (CPU), JAX (GPU/TPU), and PyTorch (GPU) backends.
References
Kim, S., Kano, M., Nakagawa, H., & Hasebe, S. (2011). Estimation of active pharmaceutical ingredient content using locally weighted partial least squares and statistical wavelength selection. International Journal of Pharmaceutics, 421(2), 269-274.
License
Original lwpls.py by Hiromasa Kaneko is MIT licensed.
- class nirs4all.operators.models.sklearn.lwpls.LWPLS(n_components: int = 10, lambda_in_similarity: float = 1.0, scale: bool = True, backend: str = 'numpy', batch_size: int = 64)
Bases:
BaseEstimator, RegressorMixin
Locally-Weighted Partial Least Squares (LWPLS) regressor.
LWPLS builds a local PLS model for each query sample, weighting training samples by their similarity (proximity) to the query. This approach is useful for:
Data with local nonlinearity
Drifting processes where the relationship changes over time
Heterogeneous data where a single global model is inadequate
The similarity is computed using a Gaussian kernel based on Euclidean distance, controlled by the lambda_in_similarity parameter.
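The weighting step can be sketched as follows. The exact kernel used inside nirs4all is not spelled out in this page, so the form below, exp(-d / (std(d) * lambda)), is an assumption based on the weighting commonly used for LWPLS; the function name `similarity_weights` is hypothetical.

```python
import numpy as np


def similarity_weights(X_train, x_query, lambda_in_similarity=1.0):
    """Weight each training sample by its closeness to one query sample.

    Assumed form: exp(-d / (std(d) * lambda)), where d is the Euclidean
    distance from each training row to the query.
    """
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Euclidean distance from every training row to the query.
    distance = np.linalg.norm(X_train - x_query, axis=1)
    # Dividing by the spread of the distances makes lambda unit-free.
    return np.exp(-distance / (distance.std(ddof=1) * lambda_in_similarity))


X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
w_narrow = similarity_weights(X_demo, [0.0, 0.0], lambda_in_similarity=0.25)
w_wide = similarity_weights(X_demo, [0.0, 0.0], lambda_in_similarity=4.0)
```

With a small `lambda_in_similarity` the weights fall off quickly with distance, so only the nearest samples matter; with a large value the weights flatten and the local model approaches a global one.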
- Parameters:
n_components (int, default=10) – Maximum number of PLS components to extract for each local model.
lambda_in_similarity (float, default=1.0) – Kernel width parameter. Smaller values create more localized models (more weight on nearby samples), larger values approach global PLS. Typical values range from 2^-9 to 2^5 depending on the data.
scale (bool, default=True) – Whether to standardize X and y before fitting. Strongly recommended as LWPLS uses Euclidean distances.
backend (str, default='numpy') – Computational backend to use. Options are:
- 'numpy': NumPy backend (CPU only, default).
- 'jax': JAX backend (supports GPU/TPU acceleration).
- 'torch': PyTorch backend (supports GPU acceleration).
The JAX backend requires JAX to be installed: pip install jax. For GPU support: pip install jax[cuda12]. The PyTorch backend requires PyTorch: pip install torch. For GPU support: pip install torch with CUDA.
batch_size (int, default=64) – Number of test samples to process per batch (JAX/torch backends). Reduce this if running out of GPU memory on large datasets. Ignored for NumPy backend.
- n_features_in_
Number of features seen during fit.
- Type:
int
- n_components_
Actual number of components used (limited by data dimensions).
- Type:
int
- X_train_
Stored training X data (standardized if scale=True).
- Type:
ndarray of shape (n_samples, n_features)
- y_train_
Stored training y data (standardized if scale=True).
- Type:
ndarray of shape (n_samples,)
- x_scaler_
Fitted scaler for X (if scale=True).
- Type:
StandardScaler or None
- y_scaler_
Fitted scaler for y (if scale=True).
- Type:
StandardScaler or None
Examples
>>> from nirs4all.operators.models.sklearn.lwpls import LWPLS
>>> import numpy as np
>>> # Nonlinear data
>>> np.random.seed(42)
>>> X = 5 * np.random.rand(100, 2)
>>> y = 3 * X[:, 0]**2 + 10 * np.log(X[:, 1] + 0.1) + np.random.randn(100)
>>> # Split data
>>> X_train, X_test = X[:70], X[70:]
>>> y_train, y_test = y[:70], y[70:]
>>> # Fit LWPLS with NumPy backend (default)
>>> model = LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> model.fit(X_train, y_train)
LWPLS(n_components=5, lambda_in_similarity=0.25)
>>> y_pred = model.predict(X_test)
>>> # Use JAX backend for GPU acceleration
>>> model_jax = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='jax')
>>> model_jax.fit(X_train, y_train)
>>> y_pred_jax = model_jax.predict(X_test)
>>> # Use PyTorch backend for GPU acceleration
>>> model_torch = LWPLS(n_components=5, lambda_in_similarity=0.25, backend='torch')
>>> model_torch.fit(X_train, y_train)
>>> y_pred_torch = model_torch.predict(X_test)
Notes
LWPLS is computationally more expensive than standard PLS because it builds a separate weighted model for each prediction. The training data must be stored for prediction.
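To make the per-prediction cost concrete, here is a minimal NumPy sketch of a weighted PLS1 fit for a single query sample. It follows the general shape of Kaneko's reference algorithm but is not the nirs4all code: the function name `lwpls_predict_one`, the similarity formula, and the early-exit threshold are assumptions made for illustration.

```python
import numpy as np


def lwpls_predict_one(X_train, y_train, x_query, n_components, lam):
    """Return predictions for one query using 1..n_components local components."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float).ravel()
    x_query = np.asarray(x_query, dtype=float).ravel()

    # Similarity weights (assumed form: exp(-d / (std(d) * lambda))).
    d = np.linalg.norm(X_train - x_query, axis=1)
    s = np.exp(-d / (d.std(ddof=1) * lam))

    # Weighted means define the centre of the local model.
    x_mean = s @ X_train / s.sum()
    y_mean = s @ y_train / s.sum()
    Xc = X_train - x_mean
    yc = y_train - y_mean
    xq = x_query - x_mean

    preds = np.full(n_components, y_mean)
    for a in range(n_components):
        w = Xc.T @ (s * yc)              # weighted covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w /= norm
        t = Xc @ w                       # training scores
        tst = t @ (s * t)
        p = Xc.T @ (s * t) / tst         # X loading
        q = yc @ (s * t) / tst           # y loading
        tq = xq @ w                      # query score
        preds[a:] += tq * q              # component a feeds all larger models
        # Deflate training data and query before the next component.
        Xc = Xc - np.outer(t, p)
        yc = yc - t * q
        xq = xq - tq * p
    return preds
```

Every query repeats the weighting, centring, and component loop over the full training set, which is why prediction cost grows with both the number of queries and the number of stored training samples.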
The JAX backend provides significant speedups on GPU by:
- Vectorizing the per-sample loop using jax.vmap
- JIT-compiling the prediction function
- Running on GPU/TPU when available
The PyTorch backend provides GPU acceleration by:
- Running tensor operations on CUDA or MPS devices
- Batched processing to control memory usage
- Automatic device selection when device='auto'
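The idea of batched processing can be illustrated without a GPU library. The sketch below computes query-to-training similarities in blocks of `batch_size` rows so that the intermediate difference array stays bounded; it uses plain NumPy, and the function name and similarity formula are assumptions rather than the backend's actual code.

```python
import numpy as np


def batched_similarity(X_train, X_query, lam, batch_size=64):
    """Similarity of every query row to every training row, block by block."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    out = np.empty((X_query.shape[0], X_train.shape[0]))
    for start in range(0, X_query.shape[0], batch_size):
        block = X_query[start:start + batch_size]
        # Broadcasting builds a (batch, n_train, n_features) temporary,
        # so peak memory scales with batch_size, not with n_queries.
        d = np.linalg.norm(block[:, None, :] - X_train[None, :, :], axis=2)
        scale = d.std(axis=1, ddof=1, keepdims=True) * lam
        out[start:start + batch_size] = np.exp(-d / scale)
    return out
```

The result does not depend on `batch_size`; only the peak memory does, which is the trade-off the parameter exposes.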
The optimal lambda_in_similarity should be tuned via cross-validation. A typical search grid is 2^k for integer k from -9 to 5 (that is, k in range(-9, 6)).
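A candidate grid for this search can be built in a couple of lines. The sketch reads the range as half-open, giving exponents -9 through 5, which matches the 2^-9 to 2^5 span quoted for the parameter; the helper name `lambda_grid` is hypothetical.

```python
def lambda_grid(k_min=-9, k_max=5):
    """Powers of two from 2**k_min to 2**k_max, inclusive."""
    return [2.0 ** k for k in range(k_min, k_max + 1)]


grid = lambda_grid()
# Parameter grid keyed by the constructor argument name.
param_grid = {"lambda_in_similarity": grid}
```

Because LWPLS is described as sklearn-compatible, a dictionary like `param_grid` should be usable with `sklearn.model_selection.GridSearchCV`, though that combination is not shown on this page.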
This implementation is adapted from the original code by Hiromasa Kaneko (https://github.com/hkaneko1985/lwpls), licensed under MIT License.
See also
sklearn.cross_decomposition.PLSRegression – Standard global PLS.
IKPLS – Fast PLS implementation.
References
Kim, S., Kano, M., Nakagawa, H., & Hasebe, S. (2011). Estimation of active pharmaceutical ingredient content using locally weighted partial least squares and statistical wavelength selection. International Journal of Pharmaceutics, 421(2), 269-274.
- fit(X: ArrayLike, y: ArrayLike) -> LWPLS
Fit the LWPLS model.
This stores the training data and fits scalers if requested. Actual model building happens lazily at prediction time.
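A lazy fit of this kind amounts to standardizing and storing the data. The class below is a hypothetical stand-in written with plain NumPy, not the nirs4all implementation (which, per the attribute list, keeps StandardScaler objects); it only illustrates what gets stored and how scaled predictions map back to original units.

```python
import numpy as np


class StoredTrainingData:
    """Hypothetical sketch: keep standardized training data for later use."""

    def fit(self, X, y, scale=True):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1)  # accepts (n,) or (n, 1)
        if scale:
            self.x_mean_, self.x_std_ = X.mean(axis=0), X.std(axis=0)
            self.y_mean_, self.y_std_ = y.mean(), y.std()
            # Constant columns keep a unit scale to avoid division by zero.
            self.x_std_ = np.where(self.x_std_ == 0, 1.0, self.x_std_)
            self.y_std_ = self.y_std_ if self.y_std_ != 0 else 1.0
        else:
            self.x_mean_, self.x_std_ = np.zeros(X.shape[1]), np.ones(X.shape[1])
            self.y_mean_, self.y_std_ = 0.0, 1.0
        # No model is built here; only the data needed at predict time is kept.
        self.X_train_ = (X - self.x_mean_) / self.x_std_
        self.y_train_ = (y - self.y_mean_) / self.y_std_
        return self

    def inverse_y(self, y_scaled):
        """Map predictions made in scaled units back to original units."""
        return np.asarray(y_scaled) * self.y_std_ + self.y_mean_
```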
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, 1)) – Target values.
- Returns:
self – Fitted estimator.
- Return type:
LWPLS
- Raises:
ValueError – If backend is not ‘numpy’, ‘jax’, or ‘torch’.
ImportError – If backend is ‘jax’ and JAX is not installed, or if backend is ‘torch’ and PyTorch is not installed.
- predict(X: ArrayLike, n_components: int | None = None) -> NDArray[floating]
Predict using the LWPLS model.
Builds a local weighted PLS model for each test sample.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to predict.
n_components (int, optional) – Number of components to use for prediction. If None, uses n_components_ (all fitted components).
- Returns:
y_pred – Predicted target values.
- Return type:
ndarray of shape (n_samples,)
- predict_all_components(X: ArrayLike) -> NDArray[floating]
Predict with all component numbers (for component selection).
Returns predictions for each number of components, which can be used for cross-validation to select the optimal n_components.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to predict.
- Returns:
y_pred_all – Predictions where column i contains predictions using i+1 components.
- Return type:
ndarray of shape (n_samples, n_components)
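Given a matrix laid out as described above (column i holding predictions with i+1 components), choosing the component count reduces to a per-column error comparison. The helper `select_n_components` and the toy numbers are made up for illustration; in practice `y_pred_all` would come from `predict_all_components` on held-out data.

```python
import numpy as np


def select_n_components(y_true, y_pred_all):
    """Return (best component count, RMSE per column)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred_all = np.asarray(y_pred_all, dtype=float)
    # One RMSE per column, i.e. per number of components.
    rmse = np.sqrt(np.mean((y_pred_all - y_true[:, None]) ** 2, axis=0))
    # Column index i corresponds to i + 1 components.
    return int(np.argmin(rmse)) + 1, rmse


# Toy validation targets and predictions for 1, 2, and 3 components.
y_val = np.array([1.0, 2.0, 3.0])
y_pred_toy = np.array([
    [1.5, 1.1, 1.2],
    [2.5, 2.0, 2.3],
    [3.5, 2.9, 3.4],
])
best, rmse = select_n_components(y_val, y_pred_toy)
```

Since one call yields predictions for every component count, the search over n_components needs only a single pass over the validation set per fold.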
- set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') -> LWPLS
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LWPLS
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.