nirs4all.pipeline.retrainer module
Pipeline retrainer - Handles retraining, transfer learning, and fine-tuning.
This module provides the Retrainer class for retraining pipelines with different modes: full retrain, transfer learning (reuse preprocessing), and fine-tuning.
- Phase 7 Implementation:
This module enables retraining trained pipelines on new data in one of three modes:
- full: Train everything from scratch (same pipeline structure)
- transfer: Use existing preprocessing artifacts, train new model
- finetune: Continue training existing model with new data
- Design Principles:
Controller-Agnostic: Works with any controller type via per-step mode control
Reuses Existing Infrastructure: Leverages resolver, artifact provider, executor
Composable: Same infrastructure for all retrain modes
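The relationship between the overall mode and per-step control can be sketched as below. This is an illustration only, not the library's implementation: `plan_step_modes` and the `"finetune"` action label are hypothetical names, the 1-based step index follows the convention documented for the step-mode methods, and reusing preprocessing artifacts in finetune mode is an assumption.

```python
from typing import Dict


def plan_step_modes(mode: str, n_steps: int, model_step: int) -> Dict[int, str]:
    """Hypothetical planner: map each 1-based step index to an action."""
    if mode not in ("full", "transfer", "finetune"):
        raise ValueError(f"unknown retrain mode: {mode!r}")
    if mode == "full":
        # Full retrain: every step is trained from scratch.
        return {i: "train" for i in range(1, n_steps + 1)}
    # Transfer and finetune: reuse existing artifacts for the other steps
    # (assumed for finetune), then handle the model step separately.
    plan = {i: "predict" for i in range(1, n_steps + 1)}
    plan[model_step] = "train" if mode == "transfer" else "finetune"
    return plan


transfer_plan = plan_step_modes("transfer", n_steps=3, model_step=3)
print(transfer_plan)
```

Expressing each mode as a per-step plan is one way the same execution path could serve all three modes, which is the "Composable" principle listed above.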
- class nirs4all.pipeline.retrainer.ExtractedPipeline(steps: List[Any] = <factory>, trace: ExecutionTrace | None = None, artifact_provider: ArtifactProvider | None = None, model_step_index: int | None = None, preprocessing_chain: str = '', source_pipeline_uid: str = '', metadata: Dict[str, Any] = <factory>)
Bases: object
Extracted pipeline for inspection and modification.
Represents a pipeline extracted from a trained prediction, ready for inspection, modification, or re-execution.
- steps
List of pipeline steps (can be modified)
- Type:
List[Any]
- trace
Original execution trace (read-only)
- artifact_provider
Provider for original artifacts
- Type:
ArtifactProvider | None
- artifact_provider: ArtifactProvider | None = None
- get_step(index: int) -> Any
Get a step by 0-based index.
- Parameters:
index – 0-based step index
- Returns:
Step configuration
- set_model(model: Any) -> None
Replace the model in the model step.
- Parameters:
model – New model to use
- set_step(index: int, step: Any) -> None
Set a step by 0-based index.
- Parameters:
index – 0-based step index
step – New step configuration
- trace: ExecutionTrace | None = None
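As a rough illustration of the accessors above, the following sketch uses a hypothetical stand-in class rather than the real `ExtractedPipeline`. It assumes that the model step is a dict with a `"model"` key (as in the `extract` example) and that `model_step_index` is a 0-based position in `steps`; neither is confirmed by this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExtractedPipelineSketch:
    """Hypothetical stand-in mirroring a few documented fields."""

    steps: List[Any] = field(default_factory=list)
    model_step_index: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def get_step(self, index: int) -> Any:
        # 0-based lookup, as documented for get_step.
        return self.steps[index]

    def set_step(self, index: int, step: Any) -> None:
        # 0-based replacement, as documented for set_step.
        self.steps[index] = step

    def set_model(self, model: Any) -> None:
        # Assumption: the model step is stored as {"model": <estimator>}.
        if self.model_step_index is None:
            raise ValueError("pipeline has no model step")
        self.steps[self.model_step_index] = {"model": model}


pipeline = ExtractedPipelineSketch(
    steps=["scaler", "savgol", {"model": "pls"}],
    model_step_index=2,
)
pipeline.set_step(1, "snv")          # swap a preprocessing step
pipeline.set_model("random_forest")  # swap the model
```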
- class nirs4all.pipeline.retrainer.RetrainArtifactProvider(base_provider: ArtifactProvider, retrain_config: RetrainConfig, trace: ExecutionTrace | None = None)
Bases: ArtifactProvider
Artifact provider for retraining that respects step modes.
Provides artifacts only for steps that should use existing artifacts (i.e., mode='predict'), while returning None for steps that should train.
- base_provider
Underlying artifact provider
- retrain_config
Configuration determining which steps use artifacts
- trace
Execution trace for step type detection
- get_artifact(step_index: int, fold_id: int | None = None) -> Any | None
Get a single artifact for a step if applicable.
- Parameters:
step_index – 1-based step index
fold_id – Optional fold ID for fold-specific artifacts
- Returns:
Artifact object or None if step should train
- get_artifacts_for_step(step_index: int, branch_path: List[int] | None = None) -> List[Tuple[str, Any]]
Get all artifacts for a step if applicable.
- Parameters:
step_index – 1-based step index
branch_path – Optional branch path filter
- Returns:
List of (artifact_id, artifact_object) tuples, or empty if should train
- get_fold_artifacts(step_index: int, branch_path: List[int] | None = None) -> List[Tuple[int, Any]]
Get all fold-specific artifacts for a step if applicable.
- Parameters:
step_index – 1-based step index
branch_path – Optional branch path filter
- Returns:
List of (fold_id, artifact_object) tuples, or empty if should train
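The gating behaviour described for this provider can be sketched with two small stand-in classes. Both `DictProvider` and `ModeAwareProvider` are hypothetical names, and the plain `{step_index: mode}` mapping stands in for `RetrainConfig`; the sketch only shows the idea of returning None for steps that should train.

```python
from typing import Any, Dict, Optional


class DictProvider:
    """Hypothetical base provider: maps a 1-based step index to an artifact."""

    def __init__(self, artifacts: Dict[int, Any]) -> None:
        self._artifacts = artifacts

    def get_artifact(self, step_index: int, fold_id: Optional[int] = None) -> Optional[Any]:
        # fold_id is accepted for signature parity but ignored in this sketch.
        return self._artifacts.get(step_index)


class ModeAwareProvider:
    """Wraps a base provider and hides artifacts for steps that should train."""

    def __init__(self, base_provider: DictProvider, step_modes: Dict[int, str]) -> None:
        self.base_provider = base_provider
        # Assumption: {step_index: "train" | "predict"}; missing means "train".
        self.step_modes = step_modes

    def get_artifact(self, step_index: int, fold_id: Optional[int] = None) -> Optional[Any]:
        # Only steps in 'predict' mode reuse an existing artifact.
        if self.step_modes.get(step_index, "train") != "predict":
            return None
        return self.base_provider.get_artifact(step_index, fold_id)


base = DictProvider({1: "fitted_scaler", 2: "fitted_model"})
provider = ModeAwareProvider(base, {1: "predict", 2: "train"})
```

Here `provider.get_artifact(1)` forwards to the base provider, while `provider.get_artifact(2)` returns None so the step is refit.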
- class nirs4all.pipeline.retrainer.RetrainConfig(mode: RetrainMode = RetrainMode.FULL, step_modes: List[StepMode] = <factory>, new_model: Any | None = None, epochs: int | None = None, learning_rate: float | None = None, freeze_layers: List[str] | None = None, metadata: Dict[str, Any] = <factory>)
Bases: object
Configuration for retraining operation.
- mode
Overall retrain mode (full, transfer, finetune)
- step_modes
Per-step mode overrides (optional, for fine-grained control)
- Type:
List[StepMode]
- new_model
Optional new model to use instead of original (for transfer)
- Type:
Any | None
- get_step_mode(step_index: int) -> StepMode | None
Get mode override for a specific step.
- Parameters:
step_index – 1-based step index
- Returns:
StepMode if override exists, None otherwise
- mode: RetrainMode = 'full'
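A per-step override lookup of the kind `get_step_mode` describes might look like the sketch below. `StepModeSketch` and `RetrainConfigSketch` are hypothetical stand-ins that copy only some of the documented fields, and the first-match rule is an assumption about how duplicate overrides would be resolved.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StepModeSketch:
    step_index: int                      # 1-based, per the documentation
    mode: str = "train"
    artifact_id: Optional[str] = None
    kwargs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RetrainConfigSketch:
    mode: str = "full"
    step_modes: List[StepModeSketch] = field(default_factory=list)

    def get_step_mode(self, step_index: int) -> Optional[StepModeSketch]:
        # Return the first override whose step_index matches, else None.
        for step_mode in self.step_modes:
            if step_mode.step_index == step_index:
                return step_mode
        return None


config = RetrainConfigSketch(
    mode="transfer",
    step_modes=[StepModeSketch(step_index=1, mode="predict")],
)
```

With this configuration, step 1 has an explicit override and every other step falls back to whatever the overall mode implies.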
- class nirs4all.pipeline.retrainer.RetrainMode(value)
Mode of retraining operation.
- FULL
Train everything from scratch (same pipeline structure)
- TRANSFER
Use existing preprocessing artifacts, train new model
- FINETUNE
Continue training existing model with new data
- FINETUNE = 'finetune'
- FULL = 'full'
- TRANSFER = 'transfer'
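Because `retrain` accepts `mode` as either a string or a `RetrainMode`, callers may want to normalize it. The sketch below uses a hypothetical stand-in enum with the three documented values; deriving it from `str` is an assumption, and `normalize_mode` is an illustrative helper, not part of the library.

```python
from enum import Enum
from typing import Union


class RetrainModeSketch(str, Enum):
    """Hypothetical stand-in carrying the three documented values."""

    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


def normalize_mode(mode: Union[str, RetrainModeSketch]) -> RetrainModeSketch:
    # Enum lookup by value; raises ValueError for unknown strings.
    # Passing an existing member returns that same member.
    return RetrainModeSketch(mode)


selected = normalize_mode("transfer")
```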
- class nirs4all.pipeline.retrainer.Retrainer(runner: PipelineRunner)
Bases: object
Handles retraining pipelines with various modes.
This class manages the retrain workflow: loading saved pipelines, determining which steps to retrain vs. reuse, and executing the modified pipeline on new data.
- Phase 7 Implementation:
The Retrainer enables three modes:
- full: Train from scratch with same pipeline structure
- transfer: Use existing preprocessing, train new model
- finetune: Continue training existing model
- runner
Parent PipelineRunner instance
- resolver
Prediction resolver for loading sources
- extract(source: Dict[str, Any] | str | Path | Any, verbose: int = 0) -> ExtractedPipeline
Extract a pipeline for inspection or modification.
Returns an ExtractedPipeline object that can be inspected, modified, and then executed with runner.run().
- Parameters:
source – Prediction source (dict, folder, Run, artifact_id, bundle)
verbose – Verbosity level
- Returns:
ExtractedPipeline for inspection/modification
Example
>>> from sklearn.ensemble import RandomForestRegressor
>>> extracted = retrainer.extract(best_pred)
>>> print(extracted.steps)
>>> extracted.steps[-1] = {"model": RandomForestRegressor()}
>>> preds, _ = runner.run(extracted.steps, new_data)
- retrain(source: Dict[str, Any] | str | Path | Any, dataset: DatasetConfigs | SpectroDataset | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], mode: str | RetrainMode = 'full', dataset_name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, step_modes: List[StepMode] | None = None, verbose: int = 0, **kwargs) -> Tuple[Predictions, Dict[str, Any]]
Retrain a pipeline on new data.
- Parameters:
source – Prediction source (dict, folder, Run, artifact_id, bundle)
dataset – New dataset to train on
mode – Retrain mode ('full', 'transfer', 'finetune')
dataset_name – Name for the dataset
new_model – Optional new model for transfer mode
epochs – Optional epochs for fine-tuning
step_modes – Optional per-step mode overrides
verbose – Verbosity level
**kwargs – Additional parameters (learning_rate, freeze_layers, etc.)
- Returns:
Tuple of (predictions, dataset_predictions_dict)
Example
>>> retrainer = Retrainer(runner)
>>>
>>> # Full retrain
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='full')
>>>
>>> # Transfer: use preprocessing, new model
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='transfer')
>>>
>>> # Finetune: continue training
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='finetune', epochs=10)
- class nirs4all.pipeline.retrainer.StepMode(step_index: int, mode: str = 'train', artifact_id: str | None = None, kwargs: Dict[str, Any] = <factory>)
Bases: object
Mode override for a specific step during retraining.
Enables fine-grained control over which steps train vs. use existing artifacts.