nirs4all.pipeline.predictor module

Pipeline predictor - Handles prediction mode execution.

This module provides the Predictor class for running predictions using trained pipelines on new datasets.

Phase 5 Enhancement:

The Predictor now supports minimal pipeline execution via TraceBasedExtractor. When an execution trace is available (from Phase 2+), the predictor can extract and run only the required steps, significantly improving prediction speed for complex pipelines.
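The core idea of trace-based extraction can be sketched in a few lines. This is not TraceBasedExtractor's actual implementation. The ExecutionTrace class, its step_indices field and extract_minimal_pipeline are hypothetical, and pipeline steps are plain strings here. The sketch assumes the trace records which step indices contributed to one trained model. Extraction then keeps only those steps, in their original order.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionTrace:
    """Hypothetical trace: indices of the steps that actually ran for one model."""
    model_id: str
    step_indices: list[int] = field(default_factory=list)


def extract_minimal_pipeline(steps: list[str], trace: ExecutionTrace) -> list[str]:
    """Keep only the steps recorded in the trace, preserving pipeline order."""
    required = set(trace.step_indices)
    return [step for i, step in enumerate(steps) if i in required]


# Full pipeline: several preprocessing options and two candidate models.
pipeline = ["snv", "savgol", "msc", "pls_10", "rf_200"]
# The target model (pls_10) only depended on steps 0, 1 and 3.
trace = ExecutionTrace(model_id="pls_10", step_indices=[0, 1, 3])
print(extract_minimal_pipeline(pipeline, trace))  # ['snv', 'savgol', 'pls_10']
```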

class nirs4all.pipeline.predictor.Predictor(runner: PipelineRunner, use_minimal_pipeline: bool = True)

Bases: object

Handles prediction using trained pipelines.

This class manages the prediction workflow: loading saved models, replaying pipeline configurations, and generating predictions on new data.
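The load, replay, predict workflow above can be illustrated with a minimal sketch. The names are hypothetical: ARTIFACTS, PIPELINE_CONFIG, load_step and predict are not part of nirs4all. Fitted parameters are stored as JSON strings only so the example stays self-contained. A real pipeline persists its own fitted artifacts through artifact_loader, in whatever format nirs4all uses.

```python
import json

# Hypothetical artifact store: step name -> serialized fitted parameters.
ARTIFACTS = {
    "center": json.dumps({"kind": "center", "means": [1.0, 2.0]}),
    "model": json.dumps({"kind": "linear", "coef": [0.5, -1.0], "intercept": 3.0}),
}
# Hypothetical saved configuration: the order in which steps are replayed.
PIPELINE_CONFIG = ["center", "model"]


def load_step(name):
    """Rebuild a fitted step, as a callable, from its saved parameters."""
    params = json.loads(ARTIFACTS[name])
    if params["kind"] == "center":
        means = params["means"]
        return lambda rows: [[x - m for x, m in zip(row, means)] for row in rows]
    if params["kind"] == "linear":
        coef, b = params["coef"], params["intercept"]
        return lambda rows: [sum(c * x for c, x in zip(coef, row)) + b for row in rows]
    raise ValueError(f"unknown step kind: {params['kind']}")


def predict(rows):
    """Replay every saved step in order; the last step yields predictions."""
    data = rows
    for name in PIPELINE_CONFIG:
        data = load_step(name)(data)
    return data


print(predict([[3.0, 2.0], [1.0, 4.0]]))  # [4.0, 1.0]
```

The new data never refits anything. Each step is reconstructed from what was learned during training and then applied as-is.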

Phase 5 Enhancement:

When use_minimal_pipeline=True (default), the predictor will:

  1. Check whether an execution trace is available for the prediction.

  2. Extract the minimal pipeline (only the required steps) from the trace.

  3. Execute only those steps, significantly reducing prediction time.

This is especially beneficial for complex pipelines with multiple preprocessing options, branches, or steps that aren’t needed for the specific model being predicted.
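The decision described above can be sketched as a small function. The select_steps name and its arguments are hypothetical. The fallback to replaying the full pipeline when no trace exists, or when minimal mode is off, is an assumption based on the replay workflow described for this class. The example pipeline has two preprocessing branches, and the target model used only one of them.

```python
from typing import Optional


def select_steps(
    all_steps: list[str],
    trace_steps: Optional[list[int]],
    use_minimal_pipeline: bool = True,
) -> list[str]:
    """Choose which steps to run at prediction time (hypothetical sketch).

    If minimal mode is on and a trace exists, keep only the traced steps.
    Otherwise fall back to replaying every step (assumed behavior).
    """
    if use_minimal_pipeline and trace_steps is not None:
        keep = set(trace_steps)
        return [s for i, s in enumerate(all_steps) if i in keep]
    return list(all_steps)


# Two preprocessing branches; the target model was trained on branch A only.
steps = ["branch_A:snv", "branch_B:msc", "branch_B:detrend", "pls_on_A"]
print(select_steps(steps, trace_steps=[0, 3]))                  # 2 of 4 steps
print(select_steps(steps, trace_steps=None))                    # no trace: all 4
print(select_steps(steps, [0, 3], use_minimal_pipeline=False))  # opt-out: all 4
```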

Attributes:

  • runner – Parent PipelineRunner instance

  • saver – File saver for managing outputs

  • manifest_manager – Manager for pipeline manifests

  • pipeline_uid – Unique identifier for the pipeline

  • artifact_loader – Loader for trained model artifacts

  • config_path – Path to the pipeline configuration

  • target_model – Metadata for the target model

  • use_minimal_pipeline – Whether to use minimal pipeline execution (Phase 5)

predict(prediction_obj: Dict[str, Any] | str, dataset: DatasetConfigs | SpectroDataset | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], dataset_name: str = 'prediction_dataset', all_predictions: bool = False, verbose: int = 0) → Tuple[ndarray, Predictions] | Tuple[Dict[str, Any], Predictions]

Run predictions on a new dataset using a saved model.

Phase 5 Enhancement:

When use_minimal_pipeline=True and an execution trace is available, this method will use TraceBasedExtractor to extract and execute only the required steps, improving prediction speed.

Parameters:
  • prediction_obj – Model identifier: either a dict containing config_path, or a prediction ID string

  • dataset – New dataset to predict on

  • dataset_name – Name for the dataset

  • all_predictions – If True, return all predictions; if False, return only the single best prediction

  • verbose – Verbosity level
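The two accepted forms of prediction_obj can be illustrated with a small resolver. This is a sketch, not nirs4all code. resolve_config_path and PREDICTION_INDEX are hypothetical. So is the lookup from a prediction ID to a config path; how nirs4all actually resolves an ID is not described here. The config path value reuses the one from the example further down.

```python
from typing import Any, Union

# Hypothetical index mapping prediction IDs to pipeline config paths.
PREDICTION_INDEX = {"pred_42": "0001_abc123"}


def resolve_config_path(prediction_obj: Union[dict[str, Any], str]) -> str:
    """Return the pipeline config path that prediction_obj refers to."""
    if isinstance(prediction_obj, dict):
        # Dict form: the config path is carried directly.
        if "config_path" not in prediction_obj:
            raise KeyError("prediction dict must contain 'config_path'")
        return prediction_obj["config_path"]
    if isinstance(prediction_obj, str):
        # String form: treat it as a prediction ID and look it up.
        try:
            return PREDICTION_INDEX[prediction_obj]
        except KeyError:
            raise KeyError(f"unknown prediction ID: {prediction_obj!r}") from None
    raise TypeError(f"unsupported prediction_obj type: {type(prediction_obj).__name__}")


print(resolve_config_path({"config_path": "0001_abc123"}))  # 0001_abc123
print(resolve_config_path("pred_42"))                       # 0001_abc123
```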

Returns:

  • If all_predictions=False: (y_pred, predictions)

  • If all_predictions=True: (predictions_dict, predictions)

Return type:

Tuple[ndarray, Predictions] | Tuple[Dict[str, Any], Predictions]
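The two return shapes can be illustrated with a self-contained sketch. The select_predictions function is hypothetical. A list of dicts stands in for the Predictions object, and results maps each model ID to (rmse, y_pred). Ranking "best" by lowest RMSE is an assumption, since the criterion nirs4all uses is not stated here.

```python
def select_predictions(results, all_predictions=False):
    """Mimic the two return shapes of predict() (hypothetical sketch).

    results: mapping model_id -> (rmse, y_pred).
    all_predictions=False -> (y_pred of the lowest-RMSE model, records)
    all_predictions=True  -> ({model_id: y_pred, ...}, records)
    """
    # Stand-in for the Predictions object: one record per model.
    records = [
        {"model_id": mid, "rmse": rmse, "y_pred": y}
        for mid, (rmse, y) in results.items()
    ]
    if all_predictions:
        return {r["model_id"]: r["y_pred"] for r in records}, records
    best = min(records, key=lambda r: r["rmse"])  # assumed ranking: lower RMSE wins
    return best["y_pred"], records


results = {"pls_10": (0.42, [1.1, 2.0]), "rf_200": (0.55, [1.3, 1.8])}
y_pred, preds = select_predictions(results)
print(y_pred)  # [1.1, 2.0]
pred_dict, preds = select_predictions(results, all_predictions=True)
print(sorted(pred_dict))  # ['pls_10', 'rf_200']
```

The second element has the same shape in both modes. Only the first element changes, from a single prediction array to a dict keyed by model.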

Example

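>>> from nirs4all.pipeline.predictor import Predictor
>>> # runner: an existing PipelineRunner; X_new: new spectra to predict on
>>> predictor = Predictor(runner)
>>> y_pred, preds = predictor.predict(
...     {"config_path": "0001_abc123"},
...     X_new,
... )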