nirs4all.operators.models.sklearn.robust_pls module

Robust PLS (RSIMPLS) regressor for nirs4all.

A scikit-learn-compatible implementation of Robust PLS based on iteratively reweighted SIMPLS. The algorithm down-weights suspect samples with a robust weighting scheme (Huber or Tukey), which makes the fit resistant to outliers in both X and Y space.

Supports both NumPy (CPU) and JAX (GPU/TPU) backends.

References

  • Hubert, M., & Vanden Branden, K. (2003). Robust procedures for partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 65(2), 101-121.

  • Gil, J. A., & Romera, R. (1998). On robust partial least squares (PLS) methods. Journal of Chemometrics, 12(6), 365-378.

class nirs4all.operators.models.sklearn.robust_pls.RobustPLS(n_components: int = 10, weighting: Literal['huber', 'tukey'] = 'huber', c: float | None = None, max_iter: int = 100, tol: float = 1e-06, scale: bool = True, center: bool = True, backend: str = 'numpy')[source]

Bases: BaseEstimator, RegressorMixin

Robust Partial Least Squares (Robust PLS) regressor.

Robust PLS uses iteratively reweighted least squares (IRLS) to down-weight outliers during model fitting. This makes the model more resistant to outliers in both X (leverage points) and Y (vertical outliers).

The algorithm iterates between:

  1. Fitting PLS with a weighted covariance matrix.

  2. Computing residuals and updating the weights using robust M-estimation.
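The two alternating steps can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's code: it uses a NIPALS PLS1 fit as a stand-in for SIMPLS (the two coincide for a single target), applies the weights by scaling rows with sqrt(w), standardizes residuals with a MAD scale, and uses Huber weights. The names pls1_coef and irls_pls are hypothetical.

```python
import numpy as np

def pls1_coef(X, y, n_components):
    """NIPALS PLS1 on centered X (n, p) and y (n,); returns coefficients (p,)."""
    Xk = X.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ y
        w /= np.linalg.norm(w)          # unit-norm weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        q.append(y @ t / tt)            # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def irls_pls(X, y, n_components=2, c=1.345, max_iter=100, tol=1e-6):
    """Alternate weighted PLS fits and Huber weight updates (assumed scheme)."""
    w = np.ones(X.shape[0])
    for _ in range(max_iter):
        # Step 1: weighted fit; scaling rows by sqrt(w) gives weighted covariances.
        x_mean = np.average(X, axis=0, weights=w)
        y_mean = np.average(y, weights=w)
        sw = np.sqrt(w)
        coef = pls1_coef((X - x_mean) * sw[:, None], (y - y_mean) * sw, n_components)
        # Step 2: residuals on the unweighted data, standardized by a MAD scale.
        resid = (y - y_mean) - (X - x_mean) @ coef
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        r = np.abs(resid) / max(scale, 1e-12)
        w_new = np.where(r <= c, 1.0, c / np.maximum(r, c))  # Huber weights
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return coef, x_mean, y_mean, w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X.sum(axis=1) + 0.1 * rng.normal(size=100)
y[:5] += 10.0                           # five vertical outliers
coef, x_mean, y_mean, w = irls_pls(X, y, n_components=3)
print(np.round(w[:5], 3))               # weights of the contaminated samples
```

In this sketch, max_iter and tol play the role their descriptions in the parameter list suggest: the loop stops once the largest change in any sample weight falls below tol.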

Two weighting schemes are available:

  • ‘huber’: Huber’s psi function, a smooth transition from L2 to L1.

  • ‘tukey’: Tukey’s bisquare, which completely down-weights extreme outliers.
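For reference, the textbook forms of the two weight functions look as follows. The sketch assumes r is a standardized residual (residual divided by a robust scale estimate). The function names are hypothetical, and the library's exact implementation may differ.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weights: 1 inside [-c, c], c/|r| outside (never exactly zero)."""
    a = np.abs(np.asarray(r, dtype=float))
    return np.where(a <= c, 1.0, c / np.maximum(a, c))

def tukey_weights(r, c=4.685):
    """Tukey bisquare weights: (1 - (r/c)^2)^2 inside [-c, c], 0 outside."""
    u = np.asarray(r, dtype=float) / c
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)

r = np.array([0.0, 1.0, 2.69, 10.0])    # standardized residuals
print(huber_weights(r))
print(tukey_weights(r))
```

The difference matters for gross outliers: a Huber weight shrinks with the residual but stays positive, whereas a Tukey weight is exactly zero beyond the cutoff c.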

Parameters:
  • n_components (int, default=10) – Number of PLS components to extract.

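  • weighting ({'huber', 'tukey'}, default='huber') – Robust weighting scheme:

      - ‘huber’: Huber’s psi function, which is monotone: large residuals are down-weighted smoothly but never rejected outright.

      - ‘tukey’: Tukey’s bisquare, a redescending function with hard rejection of extreme outliers (weight zero).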

  • c (float or None, default=None) – Tuning constant for the weight function. Controls the threshold beyond which observations are down-weighted. If None, a scheme-specific default is used:

      - For ‘huber’: 1.345 (95% asymptotic efficiency under Gaussian errors).

      - For ‘tukey’: 4.685 (95% asymptotic efficiency under Gaussian errors).

  • max_iter (int, default=100) – Maximum number of IRLS iterations.

  • tol (float, default=1e-6) – Convergence tolerance for weight changes.

  • scale (bool, default=True) – Whether to scale X and Y to unit variance.

  • center (bool, default=True) – Whether to center X and Y (subtract mean).

  • backend (str, default='numpy') – Computational backend to use:

      - ‘numpy’: NumPy backend (CPU only).

      - ‘jax’: JAX backend (supports GPU/TPU acceleration).

    Note: IRLS weight computation is always done in NumPy for consistency. The backend affects only the final PLS fit and prediction.

n_features_in_

Number of features seen during fit.

Type:

int

n_components_

Actual number of components used.

Type:

int

x_mean_

Mean of X.

Type:

ndarray of shape (n_features,)

x_std_

Standard deviation of X.

Type:

ndarray of shape (n_features,)

y_mean_

Mean of Y.

Type:

ndarray of shape (n_targets,)

y_std_

Standard deviation of Y.

Type:

ndarray of shape (n_targets,)

x_scores_

X scores (T).

Type:

ndarray of shape (n_samples, n_components_)

y_scores_

Y scores (U).

Type:

ndarray of shape (n_samples, n_components_)

x_weights_

X weights (W).

Type:

ndarray of shape (n_features, n_components_)

x_loadings_

X loadings (P).

Type:

ndarray of shape (n_features, n_components_)

y_loadings_

Y loadings (Q).

Type:

ndarray of shape (n_targets, n_components_)

coef_

Regression coefficients.

Type:

ndarray of shape (n_features, n_targets)

sample_weights_

Final sample weights from IRLS. Low values indicate potential outliers.

Type:

ndarray of shape (n_samples,)

Examples

>>> from nirs4all.operators.models.sklearn.robust_pls import RobustPLS
>>> import numpy as np
>>> # Generate data with outliers
>>> np.random.seed(42)
>>> X = np.random.randn(100, 50)
>>> y = X[:, :5].sum(axis=1) + 0.1 * np.random.randn(100)
>>> # Add outliers
>>> y[0:5] = y[0:5] + 10  # Vertical outliers
>>> # Fit Robust PLS
>>> model = RobustPLS(n_components=10, weighting='huber')
>>> model.fit(X, y)
RobustPLS(n_components=10, weighting='huber')
>>> predictions = model.predict(X)
>>> # Check which samples were down-weighted (potential outliers)
>>> outlier_mask = model.sample_weights_ < 0.5
>>> print(f"Potential outliers: {np.where(outlier_mask)[0]}")

Notes

Robust PLS is particularly useful when:

  • Data contains outliers in X or Y.

  • Standard PLS gives poor predictions due to leverage points.

  • You want to identify potential outliers via sample weights.

The sample_weights_ attribute can be used to identify outliers after fitting. Samples with low weights (e.g., < 0.5) may be outliers worth investigating.

See also

SIMPLS

Standard SIMPLS algorithm without robust weighting.

sklearn.cross_decomposition.PLSRegression

sklearn’s PLS implementation.

__repr__() str[source]

Return string representation.

fit(X: ArrayLike, y: ArrayLike) RobustPLS[source]

Fit the Robust PLS model.

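Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

Returns: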

self – Fitted estimator.

Return type:

RobustPLS

Raises:
  • ValueError – If backend is not ‘numpy’ or ‘jax’, or if weighting is not ‘huber’ or ‘tukey’.

  • ImportError – If backend is ‘jax’ and JAX is not installed.

get_outlier_mask(threshold: float = 0.5) ndarray[tuple[Any, ...], dtype[bool]][source]

Get mask of potential outliers based on sample weights.

Parameters:

threshold (float, default=0.5) – Weight threshold below which samples are considered outliers.

Returns:

outlier_mask – Boolean mask where True indicates potential outlier.

Return type:

ndarray of shape (n_samples,)
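Based on the description above and the class-level example, the mask amounts to a comparison against the threshold. The sketch below reproduces that logic on a hypothetical weight vector rather than on a fitted model. Treating the comparison as strict (`<`) is an assumption taken from the wording "below which".

```python
import numpy as np

# Hypothetical final IRLS weights for six samples (not produced by the library).
sample_weights = np.array([1.0, 0.97, 0.12, 1.0, 0.48, 0.5])

threshold = 0.5
outlier_mask = sample_weights < threshold    # strict: a weight of exactly 0.5 is kept
outlier_idx = np.flatnonzero(outlier_mask)   # indices of flagged samples
print(outlier_idx)
```

Lowering the threshold flags fewer samples. With Tukey weighting, a threshold just above zero would isolate only the samples that were rejected outright.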

get_params(deep: bool = True) dict[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X: ArrayLike, n_components: int | None = None) ndarray[tuple[Any, ...], dtype[floating]][source]

Predict using the Robust PLS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • n_components (int, optional) – Number of components to use for prediction. If None, uses all fitted components.

Returns:

y_pred – Predicted values.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_targets)
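How predict restricts itself to fewer components is not documented here. One common approach for SIMPLS-type models, shown as an assumption rather than as the library's code, builds the coefficients from the first k columns of the weight and Y-loading matrices. The sketch below is unweighted, without scaling, and uses hypothetical names (simpls, coef_for).

```python
import numpy as np

def simpls(X0, Y0, n_components):
    """SIMPLS on centered X0 (n, p) and Y0 (n, m); returns weights R and Y loadings Q."""
    p, m = X0.shape[1], Y0.shape[1]
    S = X0.T @ Y0                                # cross-product matrix
    R = np.zeros((p, n_components))
    Q = np.zeros((m, n_components))
    V = np.zeros((p, n_components))              # orthonormal basis of the X loadings
    for a in range(n_components):
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]  # dominant left singular vector
        t = X0 @ r
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t            # unit-norm score vector
        p_a = X0.T @ t                           # X loading
        v = p_a - V[:, :a] @ (V[:, :a].T @ p_a)  # orthogonalize against earlier loadings
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)               # deflate the cross-product matrix
        R[:, a], Q[:, a], V[:, a] = r, Y0.T @ t, v
    return R, Q

def coef_for(R, Q, k):
    """Coefficients that use only the first k components."""
    return R[:, :k] @ Q[:, :k].T

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = (X @ np.array([1.0, 2.0, 0.0, -1.0]) + 0.05 * rng.normal(size=50))[:, None]
x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
R, Q = simpls(X - x_mean, Y - y_mean, n_components=4)

# Training error for k = 1..4 components; each prediction reuses the same fit.
sse = [float(np.sum((Y - ((X - x_mean) @ coef_for(R, Q, k) + y_mean)) ** 2))
       for k in range(1, 5)]
print(np.round(sse, 4))
```

Because the score vectors are orthonormal in this formulation, truncating to k components needs no refit, which is what makes a per-call n_components argument cheap.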

set_params(**params) RobustPLS[source]

Set the parameters of this estimator.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

RobustPLS

set_predict_request(*, n_components: bool | None | str = '$UNCHANGED$') RobustPLS

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_components parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') RobustPLS

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: ArrayLike) ndarray[tuple[Any, ...], dtype[floating]][source]

Transform X to score space.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to transform.

Returns:

T – X scores.

Return type:

ndarray of shape (n_samples, n_components_)