nirs4all.pipeline.predictor module

Pipeline predictor - Handles prediction mode execution.

This module provides the Predictor class for running predictions using trained pipelines on new datasets.

Phase 5 Enhancement: The Predictor now supports minimal pipeline execution via TraceBasedExtractor. When an execution trace is available (from Phase 2+), the predictor can extract and run only the required steps, significantly improving prediction speed for complex pipelines.

class nirs4all.pipeline.predictor.Predictor(runner: PipelineRunner, use_minimal_pipeline: bool = True)[source]

Bases: object

Handles prediction using trained pipelines.

This class manages the prediction workflow: loading saved models, replaying pipeline configurations, and generating predictions on new data.

Phase 5 Enhancement:

When use_minimal_pipeline=True (default), the predictor will:

1. Check if an execution trace is available for the prediction
2. Extract the minimal pipeline (only required steps) from the trace
3. Execute only those steps, significantly reducing prediction time

This is especially beneficial for complex pipelines with multiple preprocessing options, branches, or steps that the model used for prediction does not need.
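The idea of running only the required steps can be illustrated with a minimal, library-independent sketch. It does not use or reproduce TraceBasedExtractor: the trace format (a dict mapping each step name to the steps it consumed), the function name select_steps, and the step names are all hypothetical. Falling back to the full pipeline when no trace exists is likewise an assumption.

```python
def select_steps(all_steps, trace, target):
    """Return the step names needed to reach `target`, in pipeline order.

    all_steps: ordered list of step names (hypothetical format).
    trace: dict mapping a step name to the upstream steps it consumed,
           or None when no execution trace was recorded (assumed fallback).
    """
    if trace is None:
        # No trace available: run the full pipeline.
        return list(all_steps)

    required = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in required:
            required.add(name)
            # Walk backwards through everything this step depended on.
            stack.extend(trace.get(name, []))

    # Keep the original execution order, dropping unused steps.
    return [name for name in all_steps if name in required]


# Hypothetical pipeline with two preprocessing branches and two models.
steps = ["scaler", "snv", "savgol", "pls", "rf"]
trace = {"snv": ["scaler"], "savgol": ["scaler"], "pls": ["snv"], "rf": ["savgol"]}

minimal = select_steps(steps, trace, "pls")
print(minimal)  # ['scaler', 'snv', 'pls']
```

In this toy case, predicting with "pls" skips the "savgol" branch and the "rf" model entirely, which is the kind of saving the minimal pipeline mode aims for.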

runner: Parent PipelineRunner instance

saver: File saver for managing outputs

manifest_manager: Manager for pipeline manifests

pipeline_uid: Unique identifier for the pipeline

artifact_loader: Loader for trained model artifacts

config_path: Path to the pipeline configuration

target_model: Metadata for the target model

use_minimal_pipeline: Whether to use minimal pipeline execution (Phase 5)

Run prediction using a saved model on a new dataset.

Phase 5 Enhancement: When use_minimal_pipeline=True and an execution trace is available, this method will use TraceBasedExtractor to extract and execute only the required steps, improving prediction speed.

Parameters:

prediction_obj – Model identifier (dict with config_path or prediction ID)
dataset – New dataset to predict on
dataset_name – Name for the dataset
all_predictions – If True, return all predictions; if False, return only the single best prediction
verbose – Verbosity level

Returns:

If all_predictions=False: (y_pred, predictions)
If all_predictions=True: (predictions_dict, predictions)
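Because the first element of the returned tuple changes shape with all_predictions, calling code may want to normalize it. The sketch below assumes nothing about nirs4all internals: stub_predict is a stand-in that only mimics the two documented return shapes, and the model keys, the values, and the helper as_prediction_dict are hypothetical.

```python
def stub_predict(all_predictions=False):
    """Stand-in for a predict call; mimics only the documented return shapes."""
    per_model = {"model_a": [0.1, 0.2], "model_b": [0.3, 0.4]}  # made-up values
    store = object()  # opaque placeholder for the `predictions` element
    if all_predictions:
        return per_model, store
    return per_model["model_a"], store


def as_prediction_dict(first, single_key="best"):
    """Normalize the first tuple element to a dict in both modes."""
    if isinstance(first, dict):
        return dict(first)
    # Single-best mode: wrap the lone result under a caller-chosen key.
    return {single_key: first}


single, _ = stub_predict(all_predictions=False)
many, _ = stub_predict(all_predictions=True)

single_dict = as_prediction_dict(single)
many_dict = as_prediction_dict(many)
```

With this helper, downstream code can iterate over a dict regardless of which mode was requested.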

Example

>>> predictor = Predictor(runner)
>>> y_pred, preds = predictor.predict(
...     {"config_path": "0001_abc123"},
...     X_new
... )