nirs4all.sklearn.pipeline module

sklearn-compatible pipeline wrapper for nirs4all.

NIRSPipeline wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface, enabling use with sklearn tools and SHAP.

Important Design Decision:

NIRSPipeline is a PREDICTION-ONLY wrapper. Its fit() method does not train anything; it raises NotImplementedError. This is because:

1. nirs4all’s cross-validation trains one model per fold (N models for N folds), so there is no single “fitted” model.
2. Generator syntax expansion happens at configuration time, not at fit time.
3. Branching pipelines have multiple output paths.

Training should be done via nirs4all.run(), then wrapped with from_result().
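The prediction-only design can be sketched in plain Python. PredictionOnlyWrapper below is a hypothetical stand-in, not nirs4all code: an already-trained model is injected from outside, predict() and is_fitted_ work, and fit() raises NotImplementedError, mirroring the documented behavior of NIRSPipeline.fit().

```python
from typing import Any, Callable, List, Sequence


class PredictionOnlyWrapper:
    """Hypothetical illustration of the prediction-only pattern (not nirs4all code)."""

    def __init__(self, predict_fn: Callable[[Sequence[float]], float]) -> None:
        # The injected callable plays the role of a pipeline trained elsewhere.
        self._predict_fn = predict_fn

    @property
    def is_fitted_(self) -> bool:
        # Training happened upstream, so the wrapper is always "fitted".
        return True

    def fit(self, X: Any, y: Any, **fit_params: Any) -> "PredictionOnlyWrapper":
        # Training belongs to the upstream workflow; refuse explicitly.
        raise NotImplementedError(
            "This wrapper is prediction-only; train upstream, then wrap the result."
        )

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # One prediction per sample row.
        return [self._predict_fn(row) for row in X]


# Usage: wrap a "trained" model (here just a fixed linear function).
pipe = PredictionOnlyWrapper(lambda row: 2.0 * row[0] + 1.0)
print(pipe.predict([[0.0], [1.5]]))  # [1.0, 4.0]
try:
    pipe.fit([[0.0]], [1.0])
except NotImplementedError as exc:
    print("fit refused:", exc)
```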

Example

>>> import nirs4all
>>> import shap
>>> from nirs4all.sklearn.pipeline import NIRSPipeline
>>>
>>> # Train with nirs4all
>>> result = nirs4all.run(pipeline, dataset)
>>>
>>> # Wrap for sklearn compatibility
>>> pipe = NIRSPipeline.from_result(result)
>>>
>>> # Use with SHAP
>>> explainer = shap.Explainer(pipe.predict, X_background)
>>> shap_values = explainer(X_test)
>>>
>>> # Or from exported bundle
>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")
class nirs4all.sklearn.pipeline.NIRSPipeline

Bases: object

sklearn-compatible wrapper for trained nirs4all pipelines.

This class wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface. It is designed for PREDICTION and EXPLANATION, not for training (use nirs4all.run() for training).

Construction:

Use the class methods to create instances:
  • NIRSPipeline.from_result(result): from a RunResult.

  • NIRSPipeline.from_bundle(path): from an exported .n4a bundle.

Attributes:

is_fitted_

Always True for wrapped pipelines.

model_

The underlying model of the selected fold (fold 0 by default), for SHAP access.

bundle_loader_

BundleLoader instance (if created from bundle).

preprocessing_chain

String summary of preprocessing steps.

model_step_index

Index of the model step in the pipeline.

fold_weights

Dictionary of fold weights for CV ensemble.

Methods:

predict(X)

Make predictions on new data.

score(X, y)

Compute R² score.

transform(X)

Apply preprocessing steps (without model).

sklearn Compatibility:
  • Implements BaseEstimator interface (get_params, set_params)

  • Implements RegressorMixin (score method)

  • Works with SHAP explainers

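  • Supports prediction-side sklearn utilities only: tools that refit the estimator, such as sklearn.model_selection.cross_val_predict (which calls fit() on each training split), raise NotImplementedError because fit() is unsupported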

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
>>> print(f"R²: {pipe.score(X_test, y_test):.4f}")
__repr__() → str

Return string representation.

__str__() → str

Return user-friendly string representation.

property bundle_loader_: BundleLoader | None

Get the underlying BundleLoader (if created from bundle).

Returns:

BundleLoader instance or None.

fit(X: Any, y: Any, **fit_params: Any) → NIRSPipeline

Fitting is not supported; use nirs4all.run() for training.

NIRSPipeline is a prediction wrapper, not a training estimator. Training should be done with nirs4all.run(), then wrapped.

Parameters:
  • X – Ignored.

  • y – Ignored.

  • **fit_params – Ignored.

Raises:

NotImplementedError – Always, by design.

Example

>>> # Correct workflow:
>>> result = nirs4all.run(pipeline, dataset)  # Training
>>> pipe = NIRSPipeline.from_result(result)   # Wrapping
>>> y_pred = pipe.predict(X_new)              # Prediction
property fold_weights: Dict[int, float]

Get fold weights for CV ensemble.

Returns:

Dictionary mapping fold_id to weight.
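fold_weights implies that per-fold predictions are combined into a CV ensemble. The exact combination rule is not documented here; assuming a normalized weighted mean, the arithmetic looks like the sketch below. weighted_fold_average is a hypothetical helper, not part of nirs4all.

```python
from typing import Dict, List


def weighted_fold_average(
    fold_predictions: Dict[int, List[float]],
    fold_weights: Dict[int, float],
) -> List[float]:
    """Combine per-fold predictions with a normalized weighted mean (assumed scheme)."""
    total = sum(fold_weights[f] for f in fold_predictions)
    if total <= 0:
        raise ValueError("fold weights must sum to a positive value")
    n_samples = len(next(iter(fold_predictions.values())))
    combined = [0.0] * n_samples
    for fold_id, preds in fold_predictions.items():
        w = fold_weights[fold_id] / total  # normalize so the weights sum to 1
        for i, p in enumerate(preds):
            combined[i] += w * p
    return combined


# Usage: three folds, with fold 2 weighted twice as heavily as the others.
preds = {0: [1.0, 2.0], 1: [3.0, 2.0], 2: [2.0, 5.0]}
weights = {0: 1.0, 1: 1.0, 2: 2.0}
print(weighted_fold_average(preds, weights))  # [2.0, 3.5]
```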

classmethod from_bundle(bundle_path: str | Path, fold: int = 0) → NIRSPipeline

Create NIRSPipeline from an exported .n4a bundle.

Parameters:
  • bundle_path – Path to the exported .n4a bundle file.

  • fold – Which fold’s model to expose through the model_ property (default: 0).

Returns:

NIRSPipeline instance ready for prediction.

Raises:

Example

>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")
>>> y_pred = pipe.predict(X_new)
classmethod from_result(result: RunResult, source: Dict[str, Any] | None = None, fold: int = 0) → NIRSPipeline

Create NIRSPipeline from a RunResult.

This exports the best (or specified) model from the RunResult to a temporary bundle, then loads it for prediction. Because both construction paths go through the same bundle-loading code, predictions are consistent whether the wrapper is created from a bundle or from a RunResult.

Parameters:
  • result – RunResult from nirs4all.run().

  • source – Optional prediction entry (a dict) selecting the model to wrap. If None, the best model is used.

  • fold – Which fold’s model to expose through the model_ property (default: 0).

Returns:

NIRSPipeline instance ready for prediction.

Raises:

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
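The “export, then reload” strategy can be illustrated with a minimal sketch in which both constructors end in one shared loader. Here pickle and a temporary directory stand in for the real .n4a format and BundleLoader, and export_bundle, load_bundle, wrap_from_bundle and wrap_from_result are hypothetical names, not nirs4all APIs.

```python
import os
import pickle
import tempfile
from typing import Any


def export_bundle(model: Any, path: str) -> None:
    # Hypothetical exporter: pickle stands in for the real .n4a format.
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_bundle(path: str) -> Any:
    # The single loading code path shared by both constructors.
    with open(path, "rb") as fh:
        return pickle.load(fh)


def wrap_from_bundle(path: str) -> Any:
    return load_bundle(path)


def wrap_from_result(model: Any) -> Any:
    # Export to a temporary bundle, then reuse the same loader.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.bundle")
        export_bundle(model, path)
        return load_bundle(path)


# Usage: a toy "trained model" yields the same object through either path.
model = {"coef": [0.5, -1.0], "intercept": 2.0}
with tempfile.TemporaryDirectory() as tmp:
    saved = os.path.join(tmp, "exported.bundle")
    export_bundle(model, saved)
    print(wrap_from_bundle(saved) == wrap_from_result(model))  # True
```

Routing the result-based path through a real export means any serialization issue surfaces immediately, rather than only when a bundle is loaded later.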
get_params(deep: bool = True) → Dict[str, Any]

Get parameters for this estimator (sklearn interface).

Parameters:

deep – If True, return nested parameters.

Returns:

Parameter dictionary.

get_transformers() → List[Tuple[str, Any]]

Get list of preprocessing transformers.

Returns:

List of (name, transformer) tuples.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> for name, transformer in pipe.get_transformers():
...     print(f"{name}: {type(transformer).__name__}")
property is_fitted_: bool

Whether the pipeline is fitted (always True for wrapped pipelines).

property model_: Any

Get the underlying model for SHAP access.

Returns the model from the specified fold (default: fold 0). For tree-based models, this enables shap.TreeExplainer; for neural networks, it enables shap.DeepExplainer.

Returns:

The fitted model object.

Raises:

RuntimeError – If model cannot be accessed.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> model = pipe.model_
>>> explainer = shap.TreeExplainer(model)  # If tree-based
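Because model_ is the bare model, an explainer built on it works in the preprocessed feature space, so it should be fed pipe.transform(X) rather than raw spectra. The sketch below assumes a single model with no target transformation, so that predict(X) reduces to model_.predict(transform(X)); FirstDifference, LinearModel and ToyPipeline are hypothetical stand-ins, not nirs4all classes.

```python
from typing import Any, List, Sequence

Matrix = List[List[float]]


class FirstDifference:
    """Hypothetical preprocessing step: first difference along the wavelength axis."""

    def transform(self, X: Sequence[Sequence[float]]) -> Matrix:
        return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in X]


class LinearModel:
    """Hypothetical fitted model that expects *preprocessed* features."""

    def __init__(self, coef: List[float], intercept: float) -> None:
        self.coef = coef
        self.intercept = intercept

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [sum(c * x for c, x in zip(self.coef, row)) + self.intercept for row in X]


class ToyPipeline:
    """Hypothetical single-model pipeline: preprocessing, then model."""

    def __init__(self, step: Any, model: Any) -> None:
        self._step = step
        self.model_ = model

    def transform(self, X: Sequence[Sequence[float]]) -> Matrix:
        # Preprocessing only; the model is not called.
        return self._step.transform(X)

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        # Full pipeline: preprocessing followed by the model.
        return self.model_.predict(self.transform(X))


pipe = ToyPipeline(FirstDifference(), LinearModel([1.0, -0.5], 0.1))
X_raw = [[0.2, 0.4, 0.5], [0.1, 0.3, 0.9]]

# Anything built on pipe.model_ (e.g. a SHAP explainer) must receive
# transformed features, not raw spectra.
X_model_space = pipe.transform(X_raw)
assert pipe.model_.predict(X_model_space) == pipe.predict(X_raw)
print(len(X_raw[0]), "raw features ->", len(X_model_space[0]), "model features")
# 3 raw features -> 2 model features
```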
property model_name: str

Get the model name.

Returns:

Model name string.

property model_step_index: int | None

Get the index of the model step in the pipeline.

Returns:

Model step index or None.

property n_folds: int

Get number of CV folds (0 if no CV).

Returns:

Number of folds.

predict(X: ndarray) → ndarray

Make predictions on new data.

Parameters:

X – Feature matrix (n_samples, n_features) as numpy array.

Returns:

Predicted values array (n_samples,).

Raises:

RuntimeError – If pipeline is not properly initialized.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> y_pred = pipe.predict(X_test)
property preprocessing_chain: str

Get string summary of preprocessing steps.

Returns:

Preprocessing chain description.

score(X: ndarray, y: ndarray) → float

Compute R² score on test data.

Parameters:
  • X – Feature matrix (n_samples, n_features).

  • y – True target values (n_samples,).

Returns:

R² score (coefficient of determination).

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> r2 = pipe.score(X_test, y_test)
>>> print(f"R²: {r2:.4f}")
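score() is documented as the coefficient of determination, R² = 1 − SS_res / SS_tot, the same definition sklearn.metrics.r2_score uses. The stdlib sketch below spells out that formula; the handling of constant targets is an assumption of this sketch and differs from sklearn’s behavior.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R² = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired samples")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # Assumption: treat constant targets as an error (sklearn handles this differently).
        raise ValueError("R² is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(f"R²: {r2_score(y_true, y_pred):.4f}")  # R²: 0.9486
```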
set_params(**params: Any) → NIRSPipeline

Set parameters for this estimator (sklearn interface).

Parameters:

**params – Parameters to set. Only ‘fold’ is supported.

Returns:

self
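A get_params/set_params pair that exposes only fold, as documented above, can be sketched as follows. FoldParams is hypothetical, and rejecting unknown keys with ValueError is an assumption of this sketch; the real set_params may react differently to unsupported parameters.

```python
from typing import Any, Dict


class FoldParams:
    """Hypothetical sketch of a get_params/set_params pair exposing only `fold`."""

    def __init__(self, fold: int = 0) -> None:
        self.fold = fold

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # No nested estimators here, so `deep` does not change the result.
        return {"fold": self.fold}

    def set_params(self, **params: Any) -> "FoldParams":
        unknown = set(params) - {"fold"}
        if unknown:
            # Assumption: unsupported keys are rejected loudly.
            raise ValueError(f"Unsupported parameter(s): {sorted(unknown)}")
        if "fold" in params:
            self.fold = int(params["fold"])
        return self  # returning self allows chaining, as sklearn's convention expects


est = FoldParams().set_params(fold=2)
print(est.get_params())  # {'fold': 2}
```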

property shap_model: Any

Alias for model_ for SHAP compatibility.

Returns:

The fitted model object.

transform(X: ndarray) → ndarray

Apply preprocessing steps to data (without model prediction).

This applies all preprocessing transformers but stops before the model step. It is useful in stacking, where the preprocessed features are passed on to other base models, and for debugging preprocessing.

Parameters:

X – Feature matrix (n_samples, n_features).

Returns:

Transformed features (n_samples, n_transformed_features).

Raises:

RuntimeError – If pipeline is not properly initialized.
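Conceptually, transform() chains the fitted preprocessing steps in order, presumably the same (name, transformer) pairs that get_transformers() reports, and never calls the model. The stdlib sketch below works under that assumption; MeanCenter, Scale and apply_preprocessing are hypothetical, not nirs4all classes.

```python
from typing import Any, List, Sequence, Tuple

Matrix = List[List[float]]


class MeanCenter:
    """Hypothetical fitted step: subtract stored per-column means."""

    def __init__(self, means: List[float]) -> None:
        self.means = means

    def transform(self, X: Sequence[Sequence[float]]) -> Matrix:
        return [[x - m for x, m in zip(row, self.means)] for row in X]


class Scale:
    """Hypothetical fitted step: multiply every value by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, X: Sequence[Sequence[float]]) -> Matrix:
        return [[x * self.factor for x in row] for row in X]


def apply_preprocessing(
    transformers: List[Tuple[str, Any]], X: Sequence[Sequence[float]]
) -> Matrix:
    """Run (name, transformer) pairs in order; no model step is involved."""
    out: Matrix = [list(row) for row in X]
    for _name, step in transformers:
        out = step.transform(out)
    return out


steps = [("center", MeanCenter([1.0, 2.0])), ("scale", Scale(0.5))]
print(apply_preprocessing(steps, [[3.0, 2.0], [1.0, 6.0]]))  # [[1.0, 0.0], [0.0, 2.0]]
```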