nirs4all.sklearn.pipeline module
sklearn-compatible pipeline wrapper for nirs4all.
NIRSPipeline wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface, enabling use with sklearn tools and SHAP.
- Important Design Decision:
NIRSPipeline is a PREDICTION-ONLY wrapper. It does NOT implement fit() for training. This is because:
1. nirs4all’s CV creates N models per fold - no single “fitted” model
2. Generator syntax expansion happens at config time, not fit time
3. Branching pipelines have multiple output paths
Training should be done via nirs4all.run(), then wrapped with from_result().
Example
>>> # Train with nirs4all
>>> result = nirs4all.run(pipeline, dataset)
>>>
>>> # Wrap for sklearn compatibility
>>> pipe = NIRSPipeline.from_result(result)
>>>
>>> # Use with SHAP
>>> explainer = shap.Explainer(pipe.predict, X_background)
>>> shap_values = explainer(X_test)
>>>
>>> # Or from exported bundle
>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")
- class nirs4all.sklearn.pipeline.NIRSPipeline[source]
Bases: object
sklearn-compatible wrapper for trained nirs4all pipelines.
This class wraps a trained nirs4all pipeline to provide sklearn’s BaseEstimator interface. It is designed for PREDICTION and EXPLANATION, not for training (use nirs4all.run() for training).
- Construction:
Use class methods to create instances:
- NIRSPipeline.from_result(result): From a RunResult
- NIRSPipeline.from_bundle(path): From an exported .n4a bundle
- is_fitted_
Always True for wrapped pipelines.
- model_
The underlying model (fold 0) for SHAP access.
- bundle_loader_
BundleLoader instance (if created from bundle).
- preprocessing_chain
String summary of preprocessing steps.
- model_step_index
Index of the model step in the pipeline.
- fold_weights
Dictionary of fold weights for CV ensemble.
- sklearn Compatibility:
Implements BaseEstimator interface (get_params, set_params)
Implements RegressorMixin (score method)
Works with SHAP explainers
Works with sklearn.model_selection.cross_val_predict (predict only)
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
>>> print(f"R²: {pipe.score(X_test, y_test):.4f}")
- property bundle_loader_: BundleLoader | None
Get the underlying BundleLoader (if created from bundle).
- Returns:
BundleLoader instance or None.
- fit(X: Any, y: Any, **fit_params: Any) NIRSPipeline[source]
Fit is not supported - use nirs4all.run() for training.
NIRSPipeline is a prediction wrapper, not a training estimator. Training should be done with nirs4all.run(), then wrapped.
- Parameters:
X – Ignored.
y – Ignored.
**fit_params – Ignored.
- Raises:
NotImplementedError – Always, by design.
Example
>>> # Correct workflow:
>>> result = nirs4all.run(pipeline, dataset)  # Training
>>> pipe = NIRSPipeline.from_result(result)   # Wrapping
>>> y_pred = pipe.predict(X_new)              # Prediction
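The prediction-only contract can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not NIRSPipeline itself: `PredictionOnlyWrapper` and its `predict_fn` argument are invented for illustration, and it only mirrors the documented behaviour that fit() always raises NotImplementedError while predict() delegates to an already trained model.

```python
class PredictionOnlyWrapper:
    """Hypothetical stand-in for a prediction-only estimator (not NIRSPipeline)."""

    def __init__(self, predict_fn):
        # predict_fn: any callable mapping a list of rows to a list of predictions.
        self._predict_fn = predict_fn
        self.is_fitted_ = True  # a wrapped model is already trained

    def fit(self, X, y=None, **fit_params):
        # All arguments are ignored; training has to happen elsewhere.
        raise NotImplementedError(
            "fit() is not supported; train first, then wrap the trained model."
        )

    def predict(self, X):
        return self._predict_fn(X)


# A toy "trained model": the prediction is the sum of each row.
wrapper = PredictionOnlyWrapper(lambda rows: [sum(row) for row in rows])
predictions = wrapper.predict([[1.0, 2.0], [3.0, 4.0]])

try:
    wrapper.fit([[1.0, 2.0]], [3.0])
    fit_raised = False
except NotImplementedError:
    fit_raised = True
```

Code that may receive such a wrapper should call fit() only on estimators that support training, or be prepared to catch NotImplementedError.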
- property fold_weights: Dict[int, float]
Get fold weights for CV ensemble.
- Returns:
Dictionary mapping fold_id to weight.
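As an illustration of how a fold_id-to-weight mapping could be used, the sketch below combines per-fold predictions with a normalized weighted average. This is an assumption about one plausible ensembling scheme, not a statement of how nirs4all combines folds; `weighted_fold_average` and the sample numbers are hypothetical.

```python
def weighted_fold_average(fold_predictions, fold_weights):
    """Combine per-fold predictions into one list using normalized weights.

    Assumption: a plain weighted average; the library may combine folds differently.
    """
    total = sum(fold_weights[fold_id] for fold_id in fold_predictions)
    if total <= 0:
        raise ValueError("fold weights must sum to a positive value")
    n_samples = len(next(iter(fold_predictions.values())))
    combined = [0.0] * n_samples
    for fold_id, preds in fold_predictions.items():
        share = fold_weights[fold_id] / total  # normalized weight of this fold
        for i, value in enumerate(preds):
            combined[i] += share * value
    return combined


# Hypothetical weights and per-fold predictions for two samples.
fold_weights = {0: 0.5, 1: 0.25, 2: 0.25}
fold_predictions = {0: [10.0, 20.0], 1: [12.0, 24.0], 2: [8.0, 16.0]}
ensemble = weighted_fold_average(fold_predictions, fold_weights)
```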
- classmethod from_bundle(bundle_path: str | Path, fold: int = 0) NIRSPipeline[source]
Create NIRSPipeline from an exported .n4a bundle.
- Parameters:
bundle_path – Path to the exported .n4a bundle file.
fold – Which fold’s model to use for model_ property (default: 0).
- Returns:
NIRSPipeline instance ready for prediction.
- Raises:
FileNotFoundError – If bundle file doesn’t exist.
ValueError – If bundle is invalid or corrupted.
Example
>>> pipe = NIRSPipeline.from_bundle("exports/model.n4a")
>>> y_pred = pipe.predict(X_new)
- classmethod from_result(result: RunResult, source: Dict[str, Any] | None = None, fold: int = 0) NIRSPipeline[source]
Create NIRSPipeline from a RunResult.
This exports the best (or specified) model from the RunResult to a temporary bundle, then loads it for prediction. This ensures consistent prediction behavior between direct bundle loading and result-based creation.
- Parameters:
result – RunResult from nirs4all.run().
source – Optional prediction dict to wrap. If None, uses best model.
fold – Which fold’s model to use for model_ property (default: 0).
- Returns:
NIRSPipeline instance ready for prediction.
- Raises:
ValueError – If no predictions available in result.
RuntimeError – If export fails.
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> pipe = NIRSPipeline.from_result(result)
>>> y_pred = pipe.predict(X_new)
- get_params(deep: bool = True) Dict[str, Any][source]
Get parameters for this estimator (sklearn interface).
- Parameters:
deep – If True, return nested parameters.
- Returns:
Parameter dictionary.
- get_transformers() List[Tuple[str, Any]][source]
Get list of preprocessing transformers.
- Returns:
List of (name, transformer) tuples.
Example
>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> for name, transformer in pipe.get_transformers():
...     print(f"{name}: {type(transformer).__name__}")
- property model_: Any
Get the underlying model for SHAP access.
Returns the model from the specified fold (default: fold 0). For tree-based models, this enables shap.TreeExplainer; for neural networks, it enables shap.DeepExplainer.
- Returns:
The fitted model object.
- Raises:
RuntimeError – If model cannot be accessed.
Example
>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> model = pipe.model_
>>> explainer = shap.TreeExplainer(model)  # If tree-based
- property model_step_index: int | None
Get the index of the model step in the pipeline.
- Returns:
Model step index or None.
- predict(X: ndarray) ndarray[source]
Make predictions on new data.
- Parameters:
X – Feature matrix (n_samples, n_features) as numpy array.
- Returns:
Predicted values array (n_samples,).
- Raises:
RuntimeError – If pipeline is not properly initialized.
Example
>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> y_pred = pipe.predict(X_test)
- property preprocessing_chain: str
Get string summary of preprocessing steps.
- Returns:
Preprocessing chain description.
- score(X: ndarray, y: ndarray) float[source]
Compute R² score on test data.
- Parameters:
X – Feature matrix (n_samples, n_features).
y – True target values (n_samples,).
- Returns:
R² score (coefficient of determination).
Example
>>> pipe = NIRSPipeline.from_bundle("model.n4a")
>>> r2 = pipe.score(X_test, y_test)
>>> print(f"R²: {r2:.4f}")
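For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers; `r2_score_manual` is a hypothetical helper, not part of nirs4all, and it does not handle the degenerate case of constant targets.

```python
def r2_score_manual(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot (undefined here when y_true is constant)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the targets from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions; only the last prediction is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r2_score_manual(y_true, y_pred)  # ss_res = 1.0, ss_tot = 5.0
```

A perfect prediction gives 1.0, and a model that always predicts the mean of the targets gives 0.0. Values can be negative when predictions are worse than that baseline.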
- set_params(**params: Any) NIRSPipeline[source]
Set parameters for this estimator (sklearn interface).
- Parameters:
**params – Parameters to set. Only ‘fold’ is supported.
- Returns:
self
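A rough sketch of what a single-parameter get_params/set_params pair looks like follows. `FoldParams` is a hypothetical stand-in rather than the actual implementation. Rejecting unknown names with ValueError is an assumption modelled on sklearn's BaseEstimator; the behaviour of NIRSPipeline for unsupported names is not documented here.

```python
class FoldParams:
    """Hypothetical stand-in exposing only a 'fold' parameter."""

    def __init__(self, fold=0):
        self.fold = fold

    def get_params(self, deep=True):
        # 'deep' is accepted for interface compatibility; nothing is nested here.
        return {"fold": self.fold}

    def set_params(self, **params):
        for name, value in params.items():
            if name != "fold":
                # Assumption: unknown names are rejected, as BaseEstimator does.
                raise ValueError(
                    f"Invalid parameter {name!r}; only 'fold' is supported."
                )
            self.fold = value
        return self  # returning self allows call chaining


est = FoldParams().set_params(fold=2)
params = est.get_params()

try:
    est.set_params(alpha=0.1)
    rejected = False
except ValueError:
    rejected = True
```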
- property shap_model: Any
Alias for model_ for SHAP compatibility.
- Returns:
The fitted model object.
- transform(X: ndarray) ndarray[source]
Apply preprocessing steps to data (without model prediction).
This applies all preprocessing transformers but stops before the model step. Useful for getting base model predictions in stacking or for debugging preprocessing.
- Parameters:
X – Feature matrix (n_samples, n_features).
- Returns:
Transformed features (n_samples, n_transformed_features).
- Raises:
RuntimeError – If pipeline is not properly initialized.
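Conceptually, applying preprocessing without prediction amounts to running each transformer in order and stopping before the model. The sketch below shows this on a list of (name, transformer) tuples shaped like the return value of get_transformers(). The `Center` and `Scale` classes and the `apply_preprocessing` helper are hypothetical toy stand-ins, and the real method may handle branches or multiple sources differently.

```python
class Center:
    """Toy transformer: subtract a fixed offset from every value."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, X):
        return [[v - self.offset for v in row] for row in X]


class Scale:
    """Toy transformer: multiply every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [[v * self.factor for v in row] for row in X]


def apply_preprocessing(transformers, X):
    """Apply each (name, transformer) pair in order; no model step is run."""
    for _name, transformer in transformers:
        X = transformer.transform(X)
    return X


steps = [("center", Center(1.0)), ("scale", Scale(2.0))]
X_transformed = apply_preprocessing(steps, [[1.0, 2.0], [3.0, 4.0]])
```

The order matters: centering then scaling is not the same as scaling then centering, which is why the steps are kept as an ordered list.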