nirs4all.operators.models.sklearn.nlpls module
Nonlinear PLS (NL-PLS / Kernel PLS) regressor for nirs4all.
A sklearn-compatible implementation of Nonlinear PLS using kernel methods. This approach maps the data into a higher-dimensional feature space using a kernel function (e.g., RBF) and then fits a standard PLS model on the kernel matrix.
Supports both NumPy (CPU) and JAX (GPU/TPU) backends.
Two implementations are provided:
- KernelPLS (KPLS), simple Kernel PLS: maps X into kernel space using a nonlinear kernel (RBF, polynomial, etc.) and fits PLS on the kernel matrix K = kernel(X, X).
- MIRPLS, Monotonic Inner Relation PLS (experimental): implements the MIR-PLS algorithm from Zheng et al. (2024), which uses monotonic cubic spline piecewise regression for the inner model.
References
Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2, 97-123.
Zheng, X., Nie, B., Du, J., et al. (2024). A non-linear partial least squares based on monotonic inner relation. Frontiers in Physiology, 15, 1369165. doi:10.3389/fphys.2024.1369165
Qin, S. J., & McAvoy, T. J. (1992). Nonlinear PLS modeling using neural networks. Computers & Chemical Engineering, 16(4), 379-391.
- class nirs4all.operators.models.sklearn.nlpls.KernelPLS(n_components: int = 10, kernel: Literal['rbf', 'linear', 'poly', 'sigmoid'] = 'rbf', gamma: float | None = None, degree: int = 3, coef0: float = 1.0, center_kernel: bool = True, scale_y: bool = True, backend: str = 'numpy')
Bases: BaseEstimator, RegressorMixin
Nonlinear PLS using Kernel Methods (Kernel PLS / NL-PLS).
Kernel PLS maps the input data X into a higher-dimensional feature space using a kernel function (RBF, polynomial, sigmoid) and then fits a PLS model on the kernel matrix K(X, X). This allows capturing nonlinear relationships between X and Y while retaining the interpretability of PLS.
The algorithm:
1. Compute the kernel matrix K = kernel(X_train, X_train).
2. Center the kernel matrix.
3. Fit PLS on K with target Y.
4. For prediction: compute K_test = kernel(X_test, X_train), center it, and predict.
This is a simple and effective approach for nonlinear regression that combines the power of kernel methods with PLS dimensionality reduction.
- Parameters:
n_components (int, default=10) – Number of PLS components to extract.
kernel ({'rbf', 'linear', 'poly', 'sigmoid'}, default='rbf') – Kernel function to use:
- ‘rbf’: Radial basis function K(x,y) = exp(-gamma ||x-y||^2)
- ‘linear’: Linear kernel K(x,y) = x^T y (equivalent to standard PLS)
- ‘poly’: Polynomial kernel K(x,y) = (gamma * x^T y + coef0)^degree
- ‘sigmoid’: Sigmoid kernel K(x,y) = tanh(gamma * x^T y + coef0)
gamma (float, optional) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. If None, defaults to 1/n_features.
degree (int, default=3) – Degree for polynomial kernel.
coef0 (float, default=1.0) – Independent term in polynomial and sigmoid kernels.
center_kernel (bool, default=True) – Whether to center the kernel matrix. Recommended for most cases.
scale_y (bool, default=True) – Whether to center and scale Y to zero mean and unit variance.
backend (str, default='numpy') – Computational backend to use:
- ‘numpy’: NumPy backend (CPU only).
- ‘jax’: JAX backend (supports GPU/TPU acceleration).
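The kernel formulas listed for the `kernel` parameter can be written out directly in NumPy. The helper below, `kernel_matrix`, is a hypothetical name used for illustration only and is not part of nirs4all; it follows the formulas and defaults documented above and is cross-checked against scikit-learn's pairwise kernels, which use the same definitions.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel


def kernel_matrix(A, B, kernel="rbf", gamma=None, degree=3, coef0=1.0):
    """Hypothetical helper: evaluate K(a_i, b_j) for every row pair of A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if gamma is None:
        gamma = 1.0 / A.shape[1]  # documented default: 1 / n_features
    gram = A @ B.T  # all inner products x^T y
    if kernel == "linear":
        return gram
    if kernel == "poly":
        return (gamma * gram + coef0) ** degree
    if kernel == "sigmoid":
        return np.tanh(gamma * gram + coef0)
    if kernel == "rbf":
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x^T y, clipped at 0 against round-off
        sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * gram
        return np.exp(-gamma * np.maximum(sq_dist, 0.0))
    raise ValueError(f"Unsupported kernel: {kernel!r}")


rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 4))
X_test = rng.normal(size=(3, 4))

K_rbf = kernel_matrix(X_test, X_train, kernel="rbf", gamma=0.5)
K_poly = kernel_matrix(X_test, X_train, kernel="poly")
K_sig = kernel_matrix(X_test, X_train, kernel="sigmoid")

# Cross-check against scikit-learn, whose gamma=None also means 1 / n_features.
matches_sklearn = (
    np.allclose(K_rbf, rbf_kernel(X_test, X_train, gamma=0.5))
    and np.allclose(K_poly, polynomial_kernel(X_test, X_train, degree=3, coef0=1.0))
    and np.allclose(K_sig, sigmoid_kernel(X_test, X_train, coef0=1.0))
)
print(K_rbf.shape, matches_sklearn)
```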
- n_features_in\_
Number of features seen during fit.
- Type:
int
- n_components\_
Actual number of components used.
- Type:
int
- X_train\_
Training data (stored for kernel computation at predict time).
- Type:
ndarray of shape (n_train, n_features)
- K_train\_
Raw (uncentered) training kernel matrix.
- Type:
ndarray of shape (n_train, n_train)
- y_mean\_
Mean of Y (if scale_y=True).
- Type:
ndarray of shape (n_targets,)
- y_std\_
Standard deviation of Y (if scale_y=True).
- Type:
ndarray of shape (n_targets,)
- x_scores\_
X scores in kernel space (T).
- Type:
ndarray of shape (n_train, n_components)
- y_scores\_
Y scores (U).
- Type:
ndarray of shape (n_train, n_components)
- coef\_
Kernel regression coefficients.
- Type:
ndarray of shape (n_train, n_targets)
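Centering a test kernel needs the means of the raw training kernel, which may be one reason the uncentered matrix is kept as an attribute. The sketch below shows standard kernel centering and checks it against scikit-learn's `KernelCenterer`. Whether KernelPLS centers in exactly this way is an assumption; `center_against_train` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer

rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 8))
X_test = rng.normal(size=(5, 8))

K_train = rbf_kernel(X_train, X_train, gamma=0.1)  # raw training kernel
K_test = rbf_kernel(X_test, X_train, gamma=0.1)    # rows: test, columns: train

# Statistics of the raw training kernel, reused for any later kernel.
train_col_means = K_train.mean(axis=0)  # one mean per training sample
train_grand_mean = K_train.mean()


def center_against_train(K):
    """Hypothetical helper: center K whose columns index the training samples."""
    row_means = K.mean(axis=1, keepdims=True)
    return K - train_col_means[None, :] - row_means + train_grand_mean


K_train_c = center_against_train(K_train)
K_test_c = center_against_train(K_test)

# Same result as scikit-learn's KernelCenterer fitted on the raw training kernel.
reference = KernelCenterer().fit(K_train)
agrees = np.allclose(K_train_c, reference.transform(K_train)) and np.allclose(
    K_test_c, reference.transform(K_test)
)
print(agrees)
```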
Examples
>>> from nirs4all.operators.models.sklearn.nlpls import KernelPLS
>>> import numpy as np
>>> # Generate nonlinear data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = np.sin(X[:, :5].sum(axis=1)) + 0.1 * np.random.randn(100)
>>> # Fit Kernel PLS with RBF kernel
>>> model = KernelPLS(n_components=10, kernel='rbf', gamma=0.1)
>>> model.fit(X, y)
KernelPLS(...)
>>> predictions = model.predict(X)
>>> print(f"R^2 score: {model.score(X, y):.4f}")
Notes
Kernel PLS is particularly useful when:
- The relationship between X and Y is nonlinear
- Standard linear PLS gives poor predictions
- You want to use kernel methods but need PLS-style dimensionality reduction
The choice of kernel and gamma parameter significantly affects performance. Cross-validation is recommended for hyperparameter tuning.
For NIRS data, the RBF kernel with small gamma often works well for capturing nonlinear spectral-property relationships.
See also
KOPLS: Kernel OPLS with orthogonal variation filtering.
sklearn.cross_decomposition.PLSRegression: Standard linear PLS.
References
Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2, 97-123.
- fit(X: ArrayLike, y: ArrayLike) -> KernelPLS
Fit the Kernel PLS model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.
- Returns:
self – Fitted estimator.
- Return type:
KernelPLS
- Raises:
ValueError – If backend is not ‘numpy’ or ‘jax’, or if kernel is not one of the supported types.
ImportError – If backend is ‘jax’ and JAX is not installed.
- predict(X: ArrayLike, n_components: int | None = None) -> NDArray[floating]
Predict using the Kernel PLS model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to predict.
n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.
- Returns:
y_pred – Predicted values.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_targets)
- set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') -> KernelPLS
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> KernelPLS
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- transform(X: ArrayLike) -> NDArray[floating]
Transform X to kernel PLS score space.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to transform.
- Returns:
T – X scores in kernel space.
- Return type:
ndarray of shape (n_samples, n_components_)