nirs4all.pipeline.minimal_predictor module
Minimal Pipeline Predictor - Execute minimal pipeline for prediction (V3).
This module provides the MinimalPredictor class which executes a minimal pipeline extracted from an execution trace. It reuses existing controllers in predict mode with artifact injection.
The MinimalPredictor is the key component of Phase 5: it ensures that prediction only runs the required steps, not the entire original pipeline.
- V3 Features:
Chain-based artifact identification using chain_path
ArtifactRecord metadata for branch/substep info (no ID parsing)
Support for multi-source pipelines via source_index
- Design Principles:
Controller-Agnostic: Uses existing controllers without hardcoding types
Minimal Execution: Only runs steps needed for the specific prediction
Artifact Injection: Provides pre-loaded artifacts to controllers
Deterministic: Same minimal pipeline -> same prediction
- Usage:
>>> from nirs4all.pipeline.minimal_predictor import MinimalPredictor
>>> from nirs4all.pipeline.trace import TraceBasedExtractor
>>>
>>> # Extract minimal pipeline
>>> extractor = TraceBasedExtractor()
>>> minimal = extractor.extract(trace, full_pipeline_steps)
>>>
>>> # Predict using minimal pipeline
>>> predictor = MinimalPredictor(artifact_loader, run_dir)
>>> y_pred, predictions = predictor.predict(minimal, dataset)
- class nirs4all.pipeline.minimal_predictor.MinimalArtifactProvider(minimal_pipeline: MinimalPipeline, artifact_loader: Any, target_sub_index: int | None = None, target_model_name: str | None = None)
Bases: ArtifactProvider
Artifact provider backed by a MinimalPipeline (V3).
Provides artifacts from the minimal pipeline’s artifact map, which contains StepArtifacts extracted from the execution trace.
This provider uses V3 ArtifactRecord metadata (chain_path, branch_path, substep_index) instead of parsing V2-style artifact IDs.
- minimal_pipeline
The source MinimalPipeline
- artifact_loader
ArtifactLoader for loading actual artifact objects
- target_sub_index
Filter artifacts by substep_index
- target_model_name
Filter artifacts by custom_name
- get_artifact(step_index: int, fold_id: int | None = None) → Any | None
Get a single artifact for a step.
- Parameters:
step_index – 1-based step index
fold_id – Optional fold ID for fold-specific artifacts
- Returns:
Artifact object or None if not found
- get_artifacts_for_step(step_index: int, branch_path: List[int] | None = None, branch_id: int | None = None, source_index: int | None = None, substep_index: int | None = None) → List[Tuple[str, Any]]
Get all artifacts for a step (V3).
Filters artifacts by branch using the branch_path from each ArtifactRecord. This is critical when reloading pipelines that combine multiple sources with branching: the execution trace lumps branch substep artifacts together, but their artifact records still distinguish them.
Returns tuples of (operator_name, artifact_object) where operator_name is derived from the object class and substep_index (e.g., “MinMaxScaler_1”). This allows transformer controllers to look up artifacts by name.
- Parameters:
step_index – 1-based step index
branch_path – Optional branch path filter (e.g., [0] for branch 0)
branch_id – Optional branch ID filter (used when branch_path not available)
source_index – Optional source/dataset index filter for multi-source
substep_index – Optional substep index filter for branch substeps
- Returns:
List of (operator_name, artifact_object) tuples
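The naming and filtering behaviour described for get_artifacts_for_step can be pictured with a small standalone sketch. It does not use nirs4all: RecordSketch, operator_name and filter_records are hypothetical names, the record fields are assumed from the parameter list, and exact equality on branch_path is an assumption about how the real filter matches.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class RecordSketch:
    """Hypothetical stand-in for an ArtifactRecord; field names are assumptions."""
    obj: Any
    branch_path: List[int]
    substep_index: Optional[int] = None
    source_index: Optional[int] = None


def operator_name(obj: Any, substep_index: Optional[int]) -> str:
    """Build a lookup name such as 'MinMaxScaler_1' from class name and substep index."""
    base = type(obj).__name__
    return base if substep_index is None else f"{base}_{substep_index}"


def filter_records(
    records: List[RecordSketch],
    branch_path: Optional[List[int]] = None,
    source_index: Optional[int] = None,
    substep_index: Optional[int] = None,
) -> List[Tuple[str, Any]]:
    """Keep records that match every filter that is not None."""
    selected = []
    for rec in records:
        if branch_path is not None and rec.branch_path != branch_path:
            continue
        if source_index is not None and rec.source_index != source_index:
            continue
        if substep_index is not None and rec.substep_index != substep_index:
            continue
        selected.append((operator_name(rec.obj, rec.substep_index), rec.obj))
    return selected


class MinMaxScaler:
    """Placeholder class so the sketch runs without scikit-learn."""


records = [
    RecordSketch(MinMaxScaler(), branch_path=[0], substep_index=1),
    RecordSketch(MinMaxScaler(), branch_path=[1], substep_index=1),
]
names = [name for name, _ in filter_records(records, branch_path=[0])]
```

Both records share the same class and substep index, so only the branch_path filter can tell them apart, which is the situation the description above calls out.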
- get_fold_artifacts(step_index: int, branch_path: List[int] | None = None) → List[Tuple[int, Any]]
Get all fold-specific artifacts for a step.
Filters by target_sub_index when set (for subpipelines with multiple models). In that case the method looks through all artifact_ids instead of fold_artifact_ids, because fold_artifact_ids only stores the last model’s artifacts when a subpipeline contains multiple models.
- Parameters:
step_index – 1-based step index
branch_path – Optional branch path filter
- Returns:
List of (fold_id, artifact_object) tuples, sorted by fold_id
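As a rough illustration of the return shape, the sketch below filters a flat list of records by substep index and sorts the result by fold. The tuple layout, the fold_artifacts helper and the string stand-ins for fitted models are all hypothetical; the real method reads its data from the minimal pipeline's artifact map.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical flat records: (fold_id, substep_index, artifact_object).
records = [
    (1, 0, "ridge_fold1"),
    (0, 1, "pls_fold0"),
    (0, 0, "ridge_fold0"),
    (1, 1, "pls_fold1"),
]


def fold_artifacts(
    records: List[Tuple[int, int, Any]],
    target_sub_index: Optional[int] = None,
) -> List[Tuple[int, Any]]:
    """Return (fold_id, artifact) pairs sorted by fold_id, optionally for one substep."""
    selected = [
        (fold_id, obj)
        for fold_id, sub_index, obj in records
        if target_sub_index is None or sub_index == target_sub_index
    ]
    return sorted(selected, key=lambda pair: pair[0])


pls_models = fold_artifacts(records, target_sub_index=1)
```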
- class nirs4all.pipeline.minimal_predictor.MinimalPredictor(artifact_loader: Any, run_dir: str | Path, saver: Any = None, manifest_manager: Any = None, verbose: int = 0)
Bases: object
Execute minimal pipeline for prediction.
This class takes a MinimalPipeline (extracted from an ExecutionTrace) and executes only the required steps using existing controllers with artifact injection.
The MinimalPredictor achieves the Phase 5 goal of “execute only needed steps” by:
1. Using the minimal pipeline’s step list (not the full original pipeline)
2. Injecting pre-loaded artifacts via ArtifactProvider
3. Running controllers in predict mode
- artifact_loader
ArtifactLoader for loading artifacts
- run_dir
Path to run directory
- saver
Optional SimulationSaver for outputs
- manifest_manager
Optional ManifestManager
- verbose
Verbosity level
Example
>>> predictor = MinimalPredictor(artifact_loader, run_dir)
>>> y_pred, predictions = predictor.predict(minimal_pipeline, dataset)
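The idea of running only the required steps with injected artifacts can be pictured with a toy loop. This is a conceptual sketch, not the nirs4all implementation: full_steps, required, artifacts and run_minimal are hypothetical names, and plain functions stand in for fitted transformers and models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical full pipeline (1-based step indices) and the subset a trace marked as required.
full_steps = {1: "scale", 2: "unused_branch", 3: "model"}
required = [1, 3]

# Pre-fitted artifacts keyed by step index; plain functions stand in for loaded objects.
artifacts: Dict[int, Callable[[List[float]], List[float]]] = {
    1: lambda xs: [x / 10.0 for x in xs],       # stand-in for a fitted scaler
    3: lambda xs: [2.0 * x + 1.0 for x in xs],  # stand-in for a fitted model
}


def run_minimal(
    data: List[float],
    required_steps: List[int],
    provider: Dict[int, Callable[[List[float]], List[float]]],
) -> Tuple[List[float], List[int]]:
    """Apply only the required steps, using injected artifacts instead of refitting."""
    executed = []
    for step_index in required_steps:
        artifact = provider[step_index]  # injected, never refitted
        data = artifact(data)
        executed.append(step_index)
    return data, executed


y_pred, executed = run_minimal([10.0, 20.0], required, artifacts)
```

Step 2 is never touched, so its artifacts do not need to be loaded at all.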
- predict(minimal_pipeline: MinimalPipeline, dataset: SpectroDataset, target_model: Dict[str, Any] | None = None) → Tuple[ndarray, Predictions]
Execute minimal pipeline and return predictions.
Runs only the steps in the minimal pipeline, using pre-loaded artifacts from the execution trace.
- Parameters:
minimal_pipeline – MinimalPipeline to execute
dataset – Dataset to predict on
target_model – Optional target model metadata for filtering
- Returns:
Tuple of (y_pred array, Predictions object)
- predict_with_fold_ensemble(minimal_pipeline: MinimalPipeline, dataset: SpectroDataset, fold_strategy: str = 'weighted_average') → Tuple[ndarray, Predictions]
Execute minimal pipeline with fold ensemble averaging.
For cross-validation models, runs prediction with each fold model and combines results according to fold_strategy.
- Parameters:
minimal_pipeline – MinimalPipeline to execute
dataset – Dataset to predict on
fold_strategy – How to combine folds: 'average' or 'weighted_average'
- Returns:
Tuple of (y_pred array, Predictions object)
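The two fold strategies can be sketched in plain Python. This is only an illustration of averaging versus weighted averaging: combine_folds is a hypothetical helper, and the weights are supplied by the caller here because this page does not say how nirs4all derives them.

```python
from typing import List, Optional, Sequence


def combine_folds(
    fold_preds: Sequence[Sequence[float]],
    strategy: str = "weighted_average",
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-fold predictions sample by sample."""
    if not fold_preds:
        raise ValueError("at least one fold prediction is required")
    if strategy == "average":
        weights = [1.0] * len(fold_preds)  # every fold counts equally
    elif strategy == "weighted_average":
        if weights is None:
            raise ValueError("weighted_average needs one weight per fold")
    else:
        raise ValueError(f"unknown fold_strategy: {strategy!r}")
    if len(weights) != len(fold_preds):
        raise ValueError("weights and fold_preds must have the same length")

    total = sum(weights)
    n_samples = len(fold_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, fold_preds)) / total
        for i in range(n_samples)
    ]


per_fold = [[1.0, 2.0], [3.0, 4.0]]  # two folds, two samples each
plain = combine_folds(per_fold, strategy="average")
weighted = combine_folds(per_fold, strategy="weighted_average", weights=[1.0, 3.0])
```

With weights of 1 and 3, the second fold pulls each combined value toward its own prediction.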
- validate_minimal_pipeline(minimal_pipeline: MinimalPipeline) → Tuple[bool, List[str]]
Validate that minimal pipeline can be executed.
Checks that:
- All step configs are present
- All required artifacts are loadable
- Model step is included
- Parameters:
minimal_pipeline – MinimalPipeline to validate
- Returns:
Tuple of (is_valid, list of issues)
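The three checks and the (is_valid, issues) return shape can be sketched as follows. The step dictionaries, their keys ("index", "config", "artifact_ids", "is_model") and the validate_sketch helper are hypothetical; the real method inspects a MinimalPipeline and its artifact loader.

```python
from typing import Any, Dict, List, Set, Tuple


def validate_sketch(
    steps: List[Dict[str, Any]],
    loadable_ids: Set[str],
) -> Tuple[bool, List[str]]:
    """Collect every problem found instead of stopping at the first one."""
    issues: List[str] = []
    for step in steps:
        idx = step.get("index")
        if step.get("config") is None:
            issues.append(f"step {idx}: missing config")
        for artifact_id in step.get("artifact_ids", []):
            if artifact_id not in loadable_ids:
                issues.append(f"step {idx}: artifact {artifact_id!r} not loadable")
    if not any(step.get("is_model") for step in steps):
        issues.append("no model step in pipeline")
    return (not issues, issues)


good_steps = [
    {"index": 1, "config": {"op": "scale"}, "artifact_ids": ["a1"]},
    {"index": 2, "config": {"op": "model"}, "artifact_ids": ["a2"], "is_model": True},
]
bad_steps = [{"index": 1, "config": None, "artifact_ids": ["missing"]}]

ok, ok_issues = validate_sketch(good_steps, {"a1", "a2"})
bad, bad_issues = validate_sketch(bad_steps, {"a1", "a2"})
```

A caller would typically check the boolean first and report the list of issues before attempting predict.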