nirs4all.sklearn.pipeline module

sklearn-compatible pipeline wrapper for nirs4all.

NIRSPipeline wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface, enabling use with sklearn tools and SHAP.

Important Design Decision:

NIRSPipeline is a PREDICTION-ONLY wrapper. It does NOT implement fit() for training, for three reasons:

1. nirs4all’s CV creates multiple models (one per fold), so there is no single “fitted” model.
2. Generator syntax expansion happens at config time, not fit time.
3. Branching pipelines have multiple output paths.

Training should be done via nirs4all.run(), then wrapped with from_result().

Example

>>> import nirs4all
>>> import shap
>>> from nirs4all.sklearn.pipeline import NIRSPipeline
>>>
>>> # Train with nirs4all
>>> result = nirs4all.run(pipeline, dataset)
>>>
>>> # Wrap for sklearn compatibility
>>> pipe = NIRSPipeline.from_result(result)
>>>
>>> # Use with SHAP
>>> explainer = shap.Explainer(pipe.predict, X_background)
>>> shap_values = explainer(X_test)
>>>
>>> # Or from exported bundle
>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")

class nirs4all.sklearn.pipeline.NIRSPipeline

Bases: object

sklearn-compatible wrapper for trained nirs4all pipelines.

This class wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface. It is designed for PREDICTION and EXPLANATION, not for training (use nirs4all.run() for training).

Construction:: Use class methods to create instances:

NIRSPipeline.from_result(result): From a RunResult
NIRSPipeline.from_bundle(path): From an exported .n4a bundle

is_fitted_: Always True for wrapped pipelines.

model_: The underlying model (fold 0) for SHAP access.

bundle_loader_: BundleLoader instance (if created from bundle).

preprocessing_chain: String summary of preprocessing steps.

model_step_index: Index of the model step in the pipeline.

fold_weights: Dictionary of fold weights for CV ensemble.

predict(X): Make predictions on new data.

score(X, y): Compute R² score.

transform(X): Apply preprocessing steps (without model).

sklearn Compatibility:

Implements BaseEstimator interface (get_params, set_params)
Implements RegressorMixin (score method)
Works with SHAP explainers
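Works with sklearn utilities that only call predict(); utilities that refit the estimator, such as sklearn.model_selection.cross_val_predict (which clones the estimator and calls fit() on each split), conflict with the NotImplementedError raised by fit()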
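
The compatibility points above can be illustrated with a minimal sketch of a prediction-only, duck-typed estimator. This is not NIRSPipeline's implementation: PredictOnlyWrapper and its predict_fn argument are hypothetical names, and rejecting parameters other than fold with a ValueError is an assumption made for the sketch.

```python
import numpy as np


class PredictOnlyWrapper:
    """Hypothetical stand-in for a prediction-only, sklearn-style wrapper."""

    def __init__(self, predict_fn, fold=0):
        self.predict_fn = predict_fn  # any callable mapping X -> y_pred
        self.fold = fold

    def get_params(self, deep=True):
        # sklearn tools read constructor parameters through this method.
        return {"predict_fn": self.predict_fn, "fold": self.fold}

    def set_params(self, **params):
        # Assumption: anything other than 'fold' is rejected.
        for name, value in params.items():
            if name != "fold":
                raise ValueError(f"Unsupported parameter: {name!r}")
            self.fold = value
        return self

    def fit(self, X, y=None, **fit_params):
        # Training happens elsewhere; the wrapper only predicts.
        raise NotImplementedError("Prediction-only wrapper: train elsewhere.")

    def predict(self, X):
        return np.asarray(self.predict_fn(np.asarray(X, dtype=float)))


# Toy "model": the prediction is the sum of the features of each sample.
wrapper = PredictOnlyWrapper(lambda X: X.sum(axis=1))
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
y_demo_pred = wrapper.predict(X_demo)
```

Because the class exposes get_params, set_params and predict without inheriting from any sklearn base class, it mirrors the "Bases: object" design documented here while still being usable by tools that only call predict.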

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
>>> print(f"R²: {pipe.score(X_test, y_test):.4f}")

__repr__() → str: Return string representation.

__str__() → str: Return user-friendly string representation.

property bundle_loader_: BundleLoader | None

Get the underlying BundleLoader (if created from bundle).

Returns:: BundleLoader instance or None.

fit(X: Any, y: Any, **fit_params: Any) → NIRSPipeline

Fit is not supported - use nirs4all.run() for training.

NIRSPipeline is a prediction wrapper, not a training estimator. Training should be done with nirs4all.run(), then wrapped.

Parameters:

X – Ignored.
y – Ignored.
**fit_params – Ignored.

Raises:

NotImplementedError – Always, by design.

Example

>>> # Correct workflow:
>>> result = nirs4all.run(pipeline, dataset)  # Training
>>> pipe = NIRSPipeline.from_result(result)   # Wrapping
>>> y_pred = pipe.predict(X_new)              # Prediction

property fold_weights: Dict[int, float]

Get fold weights for CV ensemble.

Returns:: Dictionary mapping fold_id to weight.
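
As an illustration of how a fold_id-to-weight mapping could drive a CV ensemble, the sketch below averages per-fold predictions with normalised weights. The source does not state how nirs4all combines folds, so the weighted average, the helper name weighted_fold_average, and the toy data are all assumptions.

```python
import numpy as np


def weighted_fold_average(fold_predictions, fold_weights):
    """Combine per-fold predictions using a {fold_id: weight} mapping."""
    fold_ids = sorted(fold_weights)
    weights = np.array([fold_weights[f] for f in fold_ids], dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    stacked = np.stack(
        [np.asarray(fold_predictions[f], dtype=float) for f in fold_ids]
    )
    # Weighted sum over the fold axis -> shape (n_samples,)
    return np.tensordot(weights, stacked, axes=1)


# Hypothetical weights and per-fold predictions for two samples.
fold_weights = {0: 0.5, 1: 0.25, 2: 0.25}
fold_predictions = {
    0: np.array([1.0, 2.0]),
    1: np.array([3.0, 2.0]),
    2: np.array([5.0, 2.0]),
}
y_ensemble = weighted_fold_average(fold_predictions, fold_weights)
```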

classmethod from_bundle(bundle_path: str | Path, fold: int = 0) → NIRSPipeline

Create NIRSPipeline from an exported .n4a bundle.

Parameters:

bundle_path – Path to the exported .n4a bundle file.
fold – Which fold’s model to use for the model_ property (default: 0).

Returns:

NIRSPipeline instance ready for prediction.

Raises:

FileNotFoundError – If bundle file doesn’t exist.
ValueError – If bundle is invalid or corrupted.

Example

>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")
>>> y_pred = pipe.predict(X_new)

classmethod from_result(result: RunResult, source: Dict[str, Any] | None = None, fold: int = 0) → NIRSPipeline

Create NIRSPipeline from a RunResult.

This exports the best (or specified) model from the RunResult to a temporary bundle, then loads it for prediction. This ensures consistent prediction behavior between direct bundle loading and result-based creation.

Parameters:

result – RunResult from nirs4all.run().
source – Optional prediction dict to wrap. If None, uses best model.
fold – Which fold’s model to use for the model_ property (default: 0).

Returns:

NIRSPipeline instance ready for prediction.

Raises:

ValueError – If no predictions available in result.
RuntimeError – If export fails.

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
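
The export-then-load approach described for from_result() can be pictured with a generic round trip through a temporary file. This is only a sketch of the pattern: export_bundle, load_bundle and wrap_via_temp_bundle are hypothetical helpers, and pickle stands in for the real .n4a format, whose layout is not described here.

```python
import pickle
import tempfile
from pathlib import Path


def export_bundle(model, path):
    """Hypothetical exporter: pickle stands in for the real bundle format."""
    Path(path).write_bytes(pickle.dumps(model))


def load_bundle(path):
    """Hypothetical loader matching export_bundle."""
    return pickle.loads(Path(path).read_bytes())


def wrap_via_temp_bundle(model):
    """Export to a temporary file, then reload through the same loader."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        bundle_path = Path(tmp_dir) / "model.bundle"
        export_bundle(model, bundle_path)
        # The temporary directory is removed once the object is loaded.
        return load_bundle(bundle_path)


# Toy "trained model" represented as a plain dict.
trained = {"coef": [0.5, 1.5], "intercept": 2.0}
reloaded = wrap_via_temp_bundle(trained)
```

Routing both construction paths through one loader means an in-memory result and a bundle on disk are reconstructed by the same code, which is the consistency benefit the description refers to.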

get_params(deep: bool = True) → Dict[str, Any]

Get parameters for this estimator (sklearn interface).

Parameters:: deep – If True, return nested parameters.
Returns:: Parameter dictionary.

get_transformers() → List[Tuple[str, Any]]

Get list of preprocessing transformers.

Returns:: List of (name, transformer) tuples.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> for name, transformer in pipe.get_transformers():
...     print(f"{name}: {type(transformer).__name__}")

property is_fitted_: bool: Whether the pipeline is fitted (always True for wrapped pipelines).

property model_: Any

Get the underlying model for SHAP access.

Returns the model from the selected fold (default: fold 0). For tree-based models, this enables shap.TreeExplainer; for neural networks, it enables shap.DeepExplainer.

Returns:: The fitted model object.
Raises:: RuntimeError – If model cannot be accessed.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> model = pipe.model_
>>> explainer = shap.TreeExplainer(model)  # If tree-based

property model_name: str

Get the model name.

Returns:: Model name string.

property model_step_index: int | None

Get the index of the model step in the pipeline.

Returns:: Model step index or None.

property n_folds: int

Get number of CV folds (0 if no CV).

Returns:: Number of folds.

predict(X: ndarray) → ndarray

Make predictions on new data.

Parameters:: X – Feature matrix (n_samples, n_features) as numpy array.
Returns:: Predicted values array (n_samples,).
Raises:: RuntimeError – If pipeline is not properly initialized.

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> y_pred = pipe.predict(X_test)

property preprocessing_chain: str

Get string summary of preprocessing steps.

Returns:: Preprocessing chain description.

score(X: ndarray, y: ndarray) → float

Compute R² score on test data.

Parameters:

X – Feature matrix (n_samples, n_features).
y – True target values (n_samples,).

Returns:

R² score (coefficient of determination).

Example

>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> r2 = pipe.score(X_test, y_test)
>>> print(f"R²: {r2:.4f}")
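
For reference, the coefficient of determination can be computed by hand as 1 - SS_res / SS_tot. The sketch below assumes score() follows this standard definition (the one used by sklearn's r2_score); the source only states that it returns R², and r2_score_manual is a hypothetical helper name.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([3.0, 5.0, 7.0, 9.0])
# Perfect predictions give R² = 1.
r2_perfect = r2_score_manual(y_true, y_true)
# Always predicting the mean of y_true gives R² = 0.
r2_mean = r2_score_manual(y_true, np.full_like(y_true, y_true.mean()))
```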

set_params(**params: Any) → NIRSPipeline

Set parameters for this estimator (sklearn interface).

Parameters:: **params – Parameters to set. Only ‘fold’ is supported.
Returns:: self

property shap_model: Any

Alias for model_ for SHAP compatibility.

Returns:: The fitted model object.

transform(X: ndarray) → ndarray

Apply preprocessing steps to data (without model prediction).

This applies all preprocessing transformers but stops before the model step. Useful for obtaining the preprocessed features that the model receives, for example to feed them to other estimators in stacking or to debug preprocessing.

Parameters:: X – Feature matrix (n_samples, n_features).
Returns:: Transformed features (n_samples, n_transformed_features).
Raises:: RuntimeError – If pipeline is not properly initialized.
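
The idea of applying preprocessing without the model can be sketched as a loop over (name, transformer) pairs, the shape returned by get_transformers(). The two transformer classes and the apply_transformers helper are hypothetical and only assume that each step exposes a transform(X) method; they are not nirs4all operators.

```python
import numpy as np


class CenterRows:
    """Hypothetical transformer: subtract each sample's mean."""

    def transform(self, X):
        return X - X.mean(axis=1, keepdims=True)


class FirstDifference:
    """Hypothetical transformer: difference along the feature axis."""

    def transform(self, X):
        return np.diff(X, axis=1)


def apply_transformers(X, transformers):
    """Apply (name, transformer) pairs in order, with no model step."""
    X_out = np.asarray(X, dtype=float)
    for _name, transformer in transformers:
        X_out = transformer.transform(X_out)
    return X_out


steps = [("center", CenterRows()), ("diff", FirstDifference())]
X_raw = np.array([[1.0, 2.0, 4.0], [2.0, 2.0, 5.0]])
# The difference step drops one column: (2, 3) -> (2, 2).
X_t = apply_transformers(X_raw, steps)
```

The output has fewer columns than the input, which is why the return shape is documented as (n_samples, n_transformed_features) rather than (n_samples, n_features).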