nirs4all.operators.models.sklearn.kopls module

Kernel Orthogonal PLS (K-OPLS) regressor for nirs4all.

A sklearn-compatible implementation of K-OPLS that combines kernel methods with Orthogonal PLS to handle nonlinear relationships in the data. K-OPLS separates Y-predictive variation from Y-orthogonal variation in kernel space.

This implementation is based on the ConsensusOPLS R package algorithm from https://github.com/sib-swiss/ConsensusOPLS, which itself is based on the original K-OPLS algorithm by Bylesjo, Rantalainen, et al.

Supports both NumPy (CPU) and JAX (GPU/TPU) backends.

References

  • Bylesjo, M., Rantalainen, M., Cloarec, O., Nicholson, J. K., Holmes, E., & Trygg, J. (2006). OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. Journal of Chemometrics, 20(8-10), 341-351.

  • Rantalainen, M., Bylesjo, M., Cloarec, O., Nicholson, J. K., Holmes, E., & Trygg, J. (2007). Kernel-based orthogonal projections to latent structures (K-OPLS). Journal of Chemometrics, 21(7-9), 376-385.

  • ConsensusOPLS R package: https://github.com/sib-swiss/ConsensusOPLS

class nirs4all.operators.models.sklearn.kopls.KOPLS(n_components: int = 5, n_ortho_components: int = 1, kernel: Literal['linear', 'rbf', 'poly'] = 'rbf', gamma: float | None = None, degree: int = 3, coef0: float = 1.0, center: bool = True, scale: bool = True, backend: str = 'numpy')

Bases: BaseEstimator, RegressorMixin

Kernel Orthogonal PLS (K-OPLS) regressor.

K-OPLS combines kernel methods with Orthogonal PLS to handle nonlinear relationships in the data. It first removes Y-orthogonal variation from the kernel matrix, then fits a kernel PLS model on the filtered kernel.
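The filtering step can be pictured as a two-sided projection that removes one score direction from the kernel matrix. The sketch below is illustrative only and is not taken from the KOPLS source: the vector t_o is a random placeholder standing in for an orthogonal score vector (a real fit estimates it from the kernel and Y), and the names K, P, and K_filtered are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
K = X @ X.T  # linear kernel, used here only to get a valid kernel matrix

# Placeholder for a unit-norm Y-orthogonal score vector (assumption:
# a real K-OPLS fit derives this from K and Y; here it is random).
t_o = rng.standard_normal(8)
t_o /= np.linalg.norm(t_o)

# Projector onto the orthogonal complement of t_o.
P = np.eye(8) - np.outer(t_o, t_o)

# Two-sided deflation: the direction t_o no longer contributes to K.
K_filtered = P @ K @ P
```

After this step, K_filtered @ t_o is zero up to rounding, so a kernel PLS model fitted on K_filtered cannot use that direction.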

This implementation follows the algorithm from the ConsensusOPLS R package, which is based on the original K-OPLS algorithm by Rantalainen et al.

Parameters:
  • n_components (int, default=5) – Number of predictive PLS components.

  • n_ortho_components (int, default=1) – Number of orthogonal components to remove. Each one captures Y-orthogonal variation, that is, variation uncorrelated with Y that would otherwise degrade prediction.

  • kernel (str, default='rbf') – Kernel function to use:

      - ‘linear’: Linear kernel K(x,y) = x^T y
      - ‘rbf’: Radial basis function K(x,y) = exp(-gamma ||x-y||^2)
      - ‘poly’: Polynomial kernel K(x,y) = (gamma x^T y + coef0)^degree

  • gamma (float, optional) – Kernel coefficient for ‘rbf’ and ‘poly’ kernels. If None, uses 1/n_features.

  • degree (int, default=3) – Degree for polynomial kernel.

  • coef0 (float, default=1.0) – Independent term in polynomial kernel.

  • center (bool, default=True) – Whether to center the kernel matrix.

  • scale (bool, default=True) – Whether to scale Y to unit variance.

  • backend (str, default='numpy') – Computational backend to use:

      - ‘numpy’: NumPy backend (CPU only).
      - ‘jax’: JAX backend (supports GPU/TPU acceleration).
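The three kernel formulas and the effect of center=True can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the library's internal code: the helper names (linear_kernel, rbf_kernel, poly_kernel, center_kernel) are hypothetical, and the centering shown is the standard double-centering of a kernel matrix.

```python
import numpy as np

def linear_kernel(A, B):
    # K(x, y) = x^T y
    return A @ B.T

def rbf_kernel(A, B, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2), using the expanded squared distance.
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def poly_kernel(A, B, gamma, degree=3, coef0=1.0):
    # K(x, y) = (gamma * x^T y + coef0)^degree
    return (gamma * (A @ B.T) + coef0) ** degree

def center_kernel(K):
    # Double-centering: H K H with H = I - (1/n) * ones.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
gamma = 1.0 / X.shape[1]  # documented default when gamma is None

K_rbf = rbf_kernel(X, X, gamma)
K_centered = center_kernel(K_rbf)
```

Centering the kernel matrix corresponds to mean-centering the samples in the implicit feature space, which is why every row and column of K_centered sums to zero.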

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of predictive components used.

Type:

int

n_ortho_components_

Actual number of orthogonal components used.

Type:

int

X_train_

Training data (stored for kernel computation at predict time).

Type:

ndarray of shape (n_samples, n_features)

y_mean_

Mean of Y.

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)
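The role of the stored Y mean and standard deviation can be illustrated with a short round trip: targets are standardized per column before fitting, and predictions are mapped back to the original units afterwards. This is a sketch, assuming a sample standard deviation (ddof=1); the estimator may use a different convention, and the variable names are illustrative.

```python
import numpy as np

y = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Per-target statistics, each of shape (n_targets,).
y_mean = y.mean(axis=0)
y_std = y.std(axis=0, ddof=1)  # assumption: sample std; ddof=0 is also plausible

# Standardize before fitting ...
y_scaled = (y - y_mean) / y_std

# ... and undo the scaling on the way out of predict.
y_restored = y_scaled * y_std + y_mean
```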

x_scores_

X scores from filtered kernel PLS (T).

Type:

ndarray of shape (n_samples, n_components)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_samples, n_components)

y_loadings_

Y loadings (C).

Type:

ndarray of shape (n_targets, n_components)

ortho_scores_

Orthogonal scores (T_ortho).

Type:

ndarray of shape (n_samples, n_ortho_components)

Examples

>>> from nirs4all.operators.models.sklearn.kopls import KOPLS
>>> import numpy as np
>>> # Generate nonlinear data
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = np.sin(X[:, :5].sum(axis=1)) + 0.1 * np.random.randn(100)
>>> # Fit K-OPLS with RBF kernel
>>> model = KOPLS(n_components=5, n_ortho_components=2, kernel='rbf')
>>> model.fit(X, y)
KOPLS(...)
>>> predictions = model.predict(X)
>>> # Transform to score space
>>> T = model.transform(X)
>>> print(T.shape)
(100, 5)

__repr__() → str

Return string representation.

fit(X: ArrayLike, y: ArrayLike) → KOPLS

Fit the K-OPLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns:

self – Fitted estimator.

Return type:

KOPLS

get_params(deep: bool = True) → dict

Get parameters for this estimator.

predict(X: ArrayLike) → NDArray[floating]

Predict using the K-OPLS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)
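Prediction in a kernel model needs the kernel between the new samples and the stored training data, and if the training kernel was centered, the test kernel has to be centered with the training statistics. The sketch below shows the standard formula for that step; it is an assumption that KOPLS applies it in this form, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def center_test_kernel(K_test, K_train):
    # K_test has shape (n_test, n_train); K_train has shape (n_train, n_train).
    col_means_train = K_train.mean(axis=0, keepdims=True)  # (1, n_train)
    row_means_test = K_test.mean(axis=1, keepdims=True)    # (n_test, 1)
    grand_mean_train = K_train.mean()
    return K_test - col_means_train - row_means_test + grand_mean_train

rng = np.random.default_rng(0)
X_train = rng.standard_normal((10, 4))
X_new = rng.standard_normal((3, 4))
gamma = 1.0 / X_train.shape[1]

K_train = rbf_kernel(X_train, X_train, gamma)
K_test = rbf_kernel(X_new, X_train, gamma)  # one row per new sample
K_test_centered = center_test_kernel(K_test, K_train)
```

Applying center_test_kernel to the training kernel itself reproduces the usual double-centered training kernel, which is a quick consistency check on the formula.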

set_params(**params) → KOPLS

Set the parameters of this estimator.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KOPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) → NDArray[floating]

Transform X to K-OPLS score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores in the filtered kernel PLS space.

Return type:

ndarray of shape (n_samples, n_components_)