nirs4all.operators.models.sklearn.nlpls module

Nonlinear PLS (NL-PLS / Kernel PLS) regressor for nirs4all.

A sklearn-compatible implementation of Nonlinear PLS using kernel methods. This approach maps the data into a higher-dimensional feature space using a kernel function (e.g., RBF) and then fits a standard PLS model on the kernel matrix.

Supports both NumPy (CPU) and JAX (GPU/TPU) backends.

Two implementations are provided:

KernelPLS (KPLS): Simple Kernel PLS. Maps X into kernel space using a nonlinear kernel (RBF, polynomial, etc.) and fits PLS on the kernel matrix K = kernel(X, X).
MIRPLS: Monotonic Inner Relation PLS (experimental). Implements the MIR-PLS algorithm from Zheng et al. (2024), which uses monotonic cubic spline piecewise regression for the inner model.

References

Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research, 2, 97-123.
Zheng, X., Nie, B., Du, J., et al. (2024). A non-linear partial least squares based on monotonic inner relation. Frontiers in Physiology, 15, 1369165. doi:10.3389/fphys.2024.1369165
Qin, S. J., & McAvoy, T. J. (1992). Nonlinear PLS modeling using neural networks. Computers & Chemical Engineering, 16(4), 379-391.

nirs4all.operators.models.sklearn.nlpls.KPLS: alias of KernelPLS

class nirs4all.operators.models.sklearn.nlpls.KernelPLS(n_components: int = 10, kernel: Literal['rbf', 'linear', 'poly', 'sigmoid'] = 'rbf', gamma: float | None = None, degree: int = 3, coef0: float = 1.0, center_kernel: bool = True, scale_y: bool = True, backend: str = 'numpy')

Bases: BaseEstimator, RegressorMixin

Nonlinear PLS using Kernel Methods (Kernel PLS / NL-PLS).

Kernel PLS maps the input data X into a higher-dimensional feature space using a kernel function (RBF, polynomial, sigmoid) and then fits a PLS model on the kernel matrix K(X, X). This allows capturing nonlinear relationships between X and Y while retaining the interpretability of PLS.

The algorithm:

1. Compute the kernel matrix K = kernel(X_train, X_train).
2. Center the kernel matrix.
3. Fit PLS on K with target Y.
4. For prediction: compute K_test = kernel(X_test, X_train), center it, and predict.
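The four steps can be sketched in plain NumPy. This is a minimal single-target illustration based on the dual-form Kernel PLS of Rosipal & Trejo (2001), not the nirs4all implementation: the helper names `rbf_kernel`, `fit_kernel_pls_sketch`, and `predict_kernel_pls_sketch` and the dictionary keys are hypothetical, and Y scaling and the JAX backend are left out.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fit_kernel_pls_sketch(X, y, n_components, gamma):
    """Hypothetical helper: single-target Kernel PLS in dual form."""
    n = X.shape[0]
    # Step 1: training kernel matrix.
    K = rbf_kernel(X, X, gamma)
    # Step 2: center the kernel matrix (centering in feature space).
    col_mean = K.mean(axis=0, keepdims=True)
    k_mean = K.mean()
    K_c = K - col_mean - K.mean(axis=1, keepdims=True) + k_mean
    # Step 3: extract PLS components from the centered kernel and target.
    y_mean = y.mean()
    y_res = y - y_mean
    K_res = K_c.copy()
    T = np.zeros((n, n_components))  # X scores in kernel space
    U = np.zeros((n, n_components))  # Y scores
    for a in range(n_components):
        u = y_res / np.linalg.norm(y_res)
        t = K_res @ u
        t /= np.linalg.norm(t)
        T[:, a], U[:, a] = t, u
        # Deflate the kernel and the target by the extracted score.
        P = np.eye(n) - np.outer(t, t)
        K_res = P @ K_res @ P
        y_res = y_res - t * (t @ y_res)
    # Dual regression coefficients: U (T^T K U)^{-1} T^T y.
    coef = U @ np.linalg.solve(T.T @ K_c @ U, T.T @ (y - y_mean))
    return {
        "X_train": X, "gamma": gamma, "col_mean": col_mean,
        "k_mean": k_mean, "y_mean": y_mean, "coef": coef, "x_scores": T,
    }


def predict_kernel_pls_sketch(model, X_new):
    """Step 4: kernel against the training set, center, predict."""
    K_new = rbf_kernel(X_new, model["X_train"], model["gamma"])
    # Center with the training kernel's column means and grand mean.
    K_new_c = (
        K_new - model["col_mean"]
        - K_new.mean(axis=1, keepdims=True) + model["k_mean"]
    )
    return K_new_c @ model["coef"] + model["y_mean"]


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)
model = fit_kernel_pls_sketch(X, y, n_components=3, gamma=0.1)
y_hat = predict_kernel_pls_sketch(model, X)
```

Note that the test kernel is centered with statistics of the training kernel, which is why the sketch keeps the training data and the kernel means in the fitted model.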

The approach combines kernel methods with PLS dimensionality reduction: the kernel captures the nonlinearity, and the PLS step compresses the kernel representation into n_components latent variables.

Parameters:

n_components (int, default=10) – Number of PLS components to extract.
kernel ({'rbf', 'linear', 'poly', 'sigmoid'}, default='rbf') – Kernel function to use:
- 'rbf': Radial basis function, K(x, y) = exp(-gamma * ||x - y||^2)
- 'linear': Linear kernel, K(x, y) = x^T y (equivalent to standard PLS)
- 'poly': Polynomial kernel, K(x, y) = (gamma * x^T y + coef0)^degree
- 'sigmoid': Sigmoid kernel, K(x, y) = tanh(gamma * x^T y + coef0)
gamma (float, optional) – Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’ kernels. If None, defaults to 1/n_features.
degree (int, default=3) – Degree for polynomial kernel.
coef0 (float, default=1.0) – Independent term in polynomial and sigmoid kernels.
center_kernel (bool, default=True) – Whether to center the kernel matrix. Recommended for most cases.
scale_y (bool, default=True) – Whether to center and scale Y to zero mean and unit variance.
backend (str, default='numpy') – Computational backend to use:
- 'numpy': NumPy backend (CPU only).
- 'jax': JAX backend (supports GPU/TPU acceleration).
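The kernel formulas given for the kernel parameter can be written out directly in NumPy. The sketch below is illustrative only: `compute_kernel` is a hypothetical helper, not a nirs4all function, and it assumes the documented gamma default of 1/n_features.

```python
import numpy as np


def compute_kernel(A, B, kernel="rbf", gamma=None, degree=3, coef0=1.0):
    """Evaluate the documented kernel formulas between rows of A and rows of B."""
    if gamma is None:
        gamma = 1.0 / A.shape[1]  # documented default: 1 / n_features
    gram = A @ B.T  # pairwise inner products x^T y
    if kernel == "linear":
        return gram
    if kernel == "poly":
        return (gamma * gram + coef0) ** degree
    if kernel == "sigmoid":
        return np.tanh(gamma * gram + coef0)
    if kernel == "rbf":
        sq_dists = (
            np.sum(A**2, axis=1)[:, None]
            + np.sum(B**2, axis=1)[None, :]
            - 2.0 * gram
        )
        return np.exp(-gamma * np.maximum(sq_dists, 0.0))
    raise ValueError(f"Unknown kernel: {kernel!r}")


# Two orthogonal unit vectors; gamma defaults to 1 / 2.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
K_rbf = compute_kernel(A, A, kernel="rbf")
K_poly = compute_kernel(A, A, kernel="poly")
K_lin = compute_kernel(A, A, kernel="linear")
```

For this input the squared distance between the two rows is 2, so the off-diagonal RBF entries equal exp(-1), and the polynomial kernel gives (0.5 + 1)^3 on the diagonal and 1 off the diagonal.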

n_features_in_

Number of features seen during fit.

Type: int

n_components_

Actual number of components used.

Type: int

X_train_

Training data (stored for kernel computation at predict time).

Type: ndarray of shape (n_train, n_features)

K_train_

Raw (uncentered) training kernel matrix.

Type: ndarray of shape (n_train, n_train)

y_mean_

Mean of Y (if scale_y=True).

Type: ndarray of shape (n_targets,)

y_std_

Standard deviation of Y (if scale_y=True).

Type: ndarray of shape (n_targets,)

x_scores_

X scores in kernel space (T).

Type: ndarray of shape (n_train, n_components)

y_scores_

Y scores (U).

Type: ndarray of shape (n_train, n_components)

coef_

Kernel regression coefficients.

Type: ndarray of shape (n_train, n_targets)

Examples

>>> from nirs4all.operators.models.sklearn.nlpls import KernelPLS
>>> import numpy as np
>>> # Generate nonlinear data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = np.sin(X[:, :5].sum(axis=1)) + 0.1 * np.random.randn(100)
>>> # Fit Kernel PLS with RBF kernel
>>> model = KernelPLS(n_components=10, kernel='rbf', gamma=0.1)
>>> model.fit(X, y)
KernelPLS(...)
>>> predictions = model.predict(X)
>>> print(f"R^2 score: {model.score(X, y):.4f}")

Notes

Kernel PLS is particularly useful when:
- The relationship between X and Y is nonlinear
- Standard linear PLS gives poor predictions
- You want to use kernel methods but need PLS-style dimensionality reduction

The choice of kernel and gamma parameter significantly affects performance. Cross-validation is recommended for hyperparameter tuning.

For NIRS data, the RBF kernel with small gamma often works well for capturing nonlinear spectral-property relationships.
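One way to see why gamma matters for the RBF kernel is to look at the off-diagonal entries of K for a few values. The sketch below uses synthetic Gaussian data as a stand-in for spectra (an assumption; real NIRS data will have different distance scales) and only NumPy, so it does not depend on KernelPLS itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # synthetic stand-in: 100 samples, 50 features

# Squared Euclidean distances between all pairs of rows.
sq_norms = np.sum(X**2, axis=1)
sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)

off_diag = ~np.eye(X.shape[0], dtype=bool)
mean_similarity = {}
for gamma in (1.0, 1.0 / X.shape[1], 0.001):
    K = np.exp(-gamma * sq_dists)
    mean_similarity[gamma] = K[off_diag].mean()
    print(f"gamma={gamma:.3g}: mean off-diagonal K = {mean_similarity[gamma]:.3g}")
```

With a large gamma the off-diagonal entries collapse toward zero and K approaches the identity matrix, so every sample looks dissimilar to every other. With a very small gamma the entries approach one and the samples become nearly indistinguishable. Useful values lie between these extremes and depend on the scale of the data, which is why cross-validating gamma is recommended.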