nirs4all.pipeline.minimal_predictor module

Minimal Pipeline Predictor - Execute minimal pipeline for prediction (V3).

This module provides the MinimalPredictor class which executes a minimal pipeline extracted from an execution trace. It reuses existing controllers in predict mode with artifact injection.

The MinimalPredictor is the key component of Phase 5: it ensures that prediction only runs the required steps, not the entire original pipeline.

V3 Features:

Chain-based artifact identification using chain_path
ArtifactRecord metadata for branch/substep info (no ID parsing)
Support for multi-source pipelines via source_index

Design Principles:

Controller-Agnostic: Uses existing controllers without hardcoding types
Minimal Execution: Only runs steps needed for the specific prediction
Artifact Injection: Provides pre-loaded artifacts to controllers
Deterministic: Same minimal pipeline -> same prediction

Usage:

>>> from nirs4all.pipeline.minimal_predictor import MinimalPredictor
>>> from nirs4all.pipeline.trace import TraceBasedExtractor
>>>
>>> # Extract minimal pipeline
>>> extractor = TraceBasedExtractor()
>>> minimal = extractor.extract(trace, full_pipeline_steps)
>>>
>>> # Predict using minimal pipeline
>>> predictor = MinimalPredictor(artifact_loader, run_dir)
>>> y_pred, predictions = predictor.predict(minimal, dataset)

class nirs4all.pipeline.minimal_predictor.MinimalArtifactProvider(minimal_pipeline: MinimalPipeline, artifact_loader: Any, target_sub_index: int | None = None, target_model_name: str | None = None)

Bases: ArtifactProvider

Artifact provider backed by a MinimalPipeline (V3).

Provides artifacts from the minimal pipeline’s artifact map, which contains StepArtifacts extracted from the execution trace.

This provider uses V3 ArtifactRecord metadata (chain_path, branch_path, substep_index) instead of parsing V2-style artifact IDs.

minimal_pipeline: The source MinimalPipeline

artifact_loader: ArtifactLoader for loading actual artifact objects

target_sub_index: Filter artifacts by substep_index

target_model_name: Filter artifacts by custom_name

get_artifact(step_index: int, fold_id: int | None = None) → Any | None

Get a single artifact for a step.

Parameters:

step_index – 1-based step index
fold_id – Optional fold ID for fold-specific artifacts

Returns:

Artifact object or None if not found

get_artifacts_for_step(step_index: int, branch_path: List[int] | None = None, branch_id: int | None = None, source_index: int | None = None, substep_index: int | None = None) → List[Tuple[str, Any]]

Get all artifacts for a step (V3).

Filters artifacts by branch using the branch_path from ArtifactRecord. This is critical for multisource + branching reload, where branch substep artifacts are lumped together in the execution trace but can be distinguished by their artifact records.

Returns tuples of (operator_name, artifact_object) where operator_name is derived from the object class and substep_index (e.g., “MinMaxScaler_1”). This allows transformer controllers to look up artifacts by name.

Parameters:

step_index – 1-based step index
branch_path – Optional branch path filter (e.g., [0] for branch 0)
branch_id – Optional branch ID filter (used when branch_path not available)
source_index – Optional source/dataset index filter for multi-source
substep_index – Optional substep index filter for branch substeps

Returns:

List of (operator_name, artifact_object) tuples
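
The filtering and naming described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `ToyRecord`, `select_artifacts`, and the placeholder `MinMaxScaler` class are hypothetical names, and the exact-match rule for `branch_path` is an assumption (the real provider may match branches differently).

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class ToyRecord:
    """Hypothetical stand-in for an ArtifactRecord; field names are assumptions."""
    obj: Any
    branch_path: List[int]
    source_index: Optional[int] = None
    substep_index: Optional[int] = None


def select_artifacts(
    records: List[ToyRecord],
    branch_path: Optional[List[int]] = None,
    source_index: Optional[int] = None,
    substep_index: Optional[int] = None,
) -> List[Tuple[str, Any]]:
    """Return (operator_name, object) pairs for records matching every given filter."""
    selected = []
    for rec in records:
        # A filter left as None is not applied.
        if branch_path is not None and rec.branch_path != branch_path:
            continue
        if source_index is not None and rec.source_index != source_index:
            continue
        if substep_index is not None and rec.substep_index != substep_index:
            continue
        # Name is the object's class name, suffixed with the substep index if any.
        name = type(rec.obj).__name__
        if rec.substep_index is not None:
            name = f"{name}_{rec.substep_index}"
        selected.append((name, rec.obj))
    return selected


class MinMaxScaler:
    """Placeholder class for the example, not the scikit-learn scaler."""


records = [
    ToyRecord(MinMaxScaler(), branch_path=[0], substep_index=1),
    ToyRecord(MinMaxScaler(), branch_path=[1], substep_index=1),
]
names = [name for name, _ in select_artifacts(records, branch_path=[0])]
print(names)  # ['MinMaxScaler_1']
```

Both toy records would appear under the same step in the trace; only the record metadata tells branch 0 apart from branch 1.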

get_fold_artifacts(step_index: int, branch_path: List[int] | None = None) → List[Tuple[int, Any]]

Get all fold-specific artifacts for a step.

Filters by target_sub_index when it is set (for subpipelines with multiple models). In that case the method searches all artifact_ids rather than fold_artifact_ids, because fold_artifact_ids stores only the last model’s artifacts when a subpipeline contains multiple models.

Parameters:

step_index – 1-based step index
branch_path – Optional branch path filter

Returns:

List of (fold_id, artifact_object) tuples, sorted by fold_id
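
A rough sketch of that selection, assuming each artifact is described by a plain dict with hypothetical keys `fold_id`, `substep_index`, and `obj` (the library's actual record type and field access will differ):

```python
from typing import Any, Dict, List, Optional, Tuple


def collect_fold_artifacts(
    entries: List[Dict[str, Any]],
    target_sub_index: Optional[int] = None,
) -> List[Tuple[int, Any]]:
    """Keep fold-specific entries, optionally for one substep, sorted by fold_id."""
    folds = []
    for entry in entries:
        fold_id = entry.get("fold_id")
        if fold_id is None:
            continue  # not a fold-specific artifact
        if target_sub_index is not None and entry.get("substep_index") != target_sub_index:
            continue  # belongs to another model in the subpipeline
        folds.append((fold_id, entry["obj"]))
    return sorted(folds, key=lambda pair: pair[0])


entries = [
    {"fold_id": 1, "substep_index": 0, "obj": "model_a_fold1"},
    {"fold_id": 0, "substep_index": 1, "obj": "model_b_fold0"},
    {"fold_id": 0, "substep_index": 0, "obj": "model_a_fold0"},
    {"fold_id": None, "substep_index": 0, "obj": "scaler"},
]
result = collect_fold_artifacts(entries, target_sub_index=0)
print(result)  # [(0, 'model_a_fold0'), (1, 'model_a_fold1')]
```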

get_fold_weights() → Dict[int, float]

Get fold weights for CV ensemble averaging.

Returns:

Dictionary mapping fold_id to weight

has_artifacts_for_step(step_index: int) → bool

Check if artifacts exist for a step.

Parameters:

step_index – 1-based step index

Returns:

True if artifacts are available for this step

class nirs4all.pipeline.minimal_predictor.MinimalPredictor(artifact_loader: Any, run_dir: str | Path, saver: Any = None, manifest_manager: Any = None, verbose: int = 0)

Bases: object

Execute minimal pipeline for prediction.

This class takes a MinimalPipeline (extracted from an ExecutionTrace) and executes only the required steps using existing controllers with artifact injection.

The MinimalPredictor achieves the Phase 5 goal of “execute only needed steps” by:

1. Using the minimal pipeline’s step list (not the full original pipeline)
2. Injecting pre-loaded artifacts via ArtifactProvider
3. Running controllers in predict mode

artifact_loader: ArtifactLoader for loading artifacts

run_dir: Path to run directory

saver: Optional SimulationSaver for outputs

manifest_manager: Optional ManifestManager

verbose: Verbosity level

Example

>>> predictor = MinimalPredictor(artifact_loader, run_dir)
>>> y_pred, predictions = predictor.predict(minimal_pipeline, dataset)

predict(minimal_pipeline: MinimalPipeline, dataset: SpectroDataset, target_model: Dict[str, Any] | None = None) → Tuple[ndarray, Predictions]

Execute minimal pipeline and return predictions.

Runs only the steps in the minimal pipeline, using pre-loaded artifacts from the execution trace.

Parameters:

minimal_pipeline – MinimalPipeline to execute
dataset – Dataset to predict on
target_model – Optional target model metadata for filtering

Returns:

Tuple of (y_pred array, Predictions object)

predict_with_fold_ensemble(minimal_pipeline: MinimalPipeline, dataset: SpectroDataset, fold_strategy: str = 'weighted_average') → Tuple[ndarray, Predictions]

Execute minimal pipeline with fold ensemble averaging.

For cross-validation models, runs prediction with each fold model and combines results according to fold_strategy.

Parameters:

minimal_pipeline – MinimalPipeline to execute
dataset – Dataset to predict on
fold_strategy – How to combine folds (“average”, “weighted_average”)

Returns:

Tuple of (y_pred array, Predictions object)
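
The two fold strategies amount to a plain mean and a weight-normalized mean over per-fold predictions. The sketch below illustrates the arithmetic only, using plain Python lists; `combine_folds` is a hypothetical helper, and normalizing by the sum of weights is an assumption about how the weights from get_fold_weights() are applied.

```python
from typing import Dict, List


def combine_folds(
    fold_preds: Dict[int, List[float]],
    fold_weights: Dict[int, float],
    fold_strategy: str = "weighted_average",
) -> List[float]:
    """Combine per-fold predictions sample by sample."""
    if not fold_preds:
        raise ValueError("no fold predictions to combine")
    fold_ids = sorted(fold_preds)
    if fold_strategy == "average":
        weights = {f: 1.0 for f in fold_ids}
    elif fold_strategy == "weighted_average":
        # Folds without a recorded weight contribute nothing.
        weights = {f: fold_weights.get(f, 0.0) for f in fold_ids}
    else:
        raise ValueError(f"unknown fold_strategy: {fold_strategy!r}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("fold weights must sum to a positive value")
    n_samples = len(fold_preds[fold_ids[0]])
    return [
        sum(weights[f] * fold_preds[f][i] for f in fold_ids) / total
        for i in range(n_samples)
    ]


preds = {0: [1.0, 2.0], 1: [3.0, 4.0]}
weights = {0: 0.25, 1: 0.75}
print(combine_folds(preds, weights))             # [2.5, 3.5]
print(combine_folds(preds, weights, "average"))  # [2.0, 3.0]
```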

validate_minimal_pipeline(minimal_pipeline: MinimalPipeline) → Tuple[bool, List[str]]

Validate that minimal pipeline can be executed.

Checks that:

- All step configs are present
- All required artifacts are loadable
- Model step is included

Parameters:

minimal_pipeline – MinimalPipeline to validate

Returns:

Tuple of (is_valid, list of issues)