nirs4all.pipeline.retrainer module

Pipeline retrainer - Handles retraining, transfer learning, and fine-tuning.

This module provides the Retrainer class for retraining pipelines with different modes: full retrain, transfer learning (reuse preprocessing), and fine-tuning.

Phase 7 Implementation:

This module enables retraining trained pipelines on new data in one of three modes:

full: Train everything from scratch (same pipeline structure)
transfer: Use existing preprocessing artifacts, train new model
finetune: Continue training existing model with new data

Design Principles:

Controller-Agnostic: Works with any controller type via per-step mode control
Reuses Existing Infrastructure: Leverages resolver, artifact provider, executor
Composable: Same infrastructure for all retrain modes

class nirs4all.pipeline.retrainer.ExtractedPipeline(steps: List[Any] = <factory>, trace: ExecutionTrace | None = None, artifact_provider: ArtifactProvider | None = None, model_step_index: int | None = None, preprocessing_chain: str = '', source_pipeline_uid: str = '', metadata: Dict[str, Any] = <factory>)[source]

Bases: object

Extracted pipeline for inspection and modification.

Represents a pipeline extracted from a trained prediction, ready for inspection, modification, or re-execution.

steps

List of pipeline steps (can be modified)

Type:: List[Any]

trace

Original execution trace (read-only)

Type:: nirs4all.pipeline.trace.execution_trace.ExecutionTrace | None

artifact_provider

Provider for original artifacts

Type:: nirs4all.pipeline.config.context.ArtifactProvider | None

model_step_index

Index of the model step

Type:: int | None

preprocessing_chain

Summary of preprocessing

Type:: str

source_pipeline_uid

UID of the source pipeline

Type:: str

metadata

Additional metadata

Type:: Dict[str, Any]

artifact_provider: ArtifactProvider | None = None

get_model_step() → Any | None[source]

Get the model step.

Returns:: Model step configuration or None

get_step(index: int) → Any[source]

Get a step by 0-based index.

Parameters:: index – 0-based step index
Returns:: Step configuration

metadata: Dict[str, Any]

model_step_index: int | None = None

preprocessing_chain: str = ''

set_model(model: Any) → None[source]

Replace the model in the model step.

Parameters:: model – New model to use

set_step(index: int, step: Any) → None[source]

Set a step by 0-based index.

Parameters:

index – 0-based step index
step – New step configuration

source_pipeline_uid: str = ''

steps: List[Any]

trace: ExecutionTrace | None = None
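The accessors above can be illustrated with a minimal stand-in. This is a sketch, not the library class: `ExtractedPipelineSketch` is a hypothetical name, `model_step_index` is assumed to be 0-based like `get_step()`, and the model step is assumed to be a dict with a `"model"` key (the shape used in the `extract()` example).

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ExtractedPipelineSketch:
    """Hypothetical stand-in mirroring two documented fields; not the library class."""

    steps: List[Any] = field(default_factory=list)
    model_step_index: Optional[int] = None  # assumed 0-based in this sketch

    def get_step(self, index: int) -> Any:
        # 0-based lookup, as documented for get_step()
        return self.steps[index]

    def set_step(self, index: int, step: Any) -> None:
        # 0-based replacement, as documented for set_step()
        self.steps[index] = step

    def get_model_step(self) -> Optional[Any]:
        # None when the pipeline has no known model step
        if self.model_step_index is None:
            return None
        return self.steps[self.model_step_index]

    def set_model(self, model: Any) -> None:
        # Assumption: the model step is stored as {"model": <estimator>}
        if self.model_step_index is None:
            raise ValueError("pipeline has no model step")
        self.steps[self.model_step_index] = {"model": model}


# Placeholder strings stand in for real transformers and estimators.
pipeline = ExtractedPipelineSketch(
    steps=["scaler", {"model": "pls"}],
    model_step_index=1,
)
pipeline.set_step(0, "snv")
pipeline.set_model("random_forest")
```

Because `steps` is a plain list, edits made through `set_step()` or `set_model()` are visible to anything that later reads `pipeline.steps`.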

class nirs4all.pipeline.retrainer.RetrainArtifactProvider(base_provider: ArtifactProvider, retrain_config: RetrainConfig, trace: ExecutionTrace | None = None)[source]

Bases: ArtifactProvider

Artifact provider for retraining that respects step modes.

Provides artifacts only for steps that should use existing artifacts (i.e., mode='predict'), while returning None for steps that should train.

base_provider: Underlying artifact provider

retrain_config: Configuration determining which steps use artifacts

trace: Execution trace for step type detection

get_artifact(step_index: int, fold_id: int | None = None) → Any | None[source]

Get a single artifact for a step if applicable.

Parameters:

step_index – 1-based step index
fold_id – Optional fold ID for fold-specific artifacts

Returns:

Artifact object or None if step should train

get_artifacts_for_step(step_index: int, branch_path: List[int] | None = None) → List[Tuple[str, Any]][source]

Get all artifacts for a step if applicable.

Parameters:

step_index – 1-based step index
branch_path – Optional branch path filter

Returns:

List of (artifact_id, artifact_object) tuples, or empty if should train

get_fold_artifacts(step_index: int, branch_path: List[int] | None = None) → List[Tuple[int, Any]][source]

Get all fold-specific artifacts for a step if applicable.

Parameters:

step_index – 1-based step index
branch_path – Optional branch path filter

Returns:

List of (fold_id, artifact_object) tuples, or empty if should train

has_artifacts_for_step(step_index: int) → bool[source]

Check if artifacts should be used for this step.

Parameters:: step_index – 1-based step index
Returns:: True if artifacts are available and should be used
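The gating behaviour described for this provider can be sketched as a thin wrapper around a base provider. The classes below are hypothetical stand-ins (`DictArtifactProvider`, `GatedArtifactProvider`), and the `should_train` callable replaces the real `RetrainConfig`; the library's internals may differ.

```python
from typing import Any, Callable, Dict, Optional


class DictArtifactProvider:
    """Hypothetical base provider backed by a dict keyed by 1-based step index."""

    def __init__(self, artifacts: Dict[int, Any]) -> None:
        self._artifacts = artifacts

    def get_artifact(self, step_index: int) -> Optional[Any]:
        return self._artifacts.get(step_index)


class GatedArtifactProvider:
    """Sketch of the gating idea: hide artifacts for steps that should train."""

    def __init__(
        self,
        base: DictArtifactProvider,
        should_train: Callable[[int], bool],
    ) -> None:
        self._base = base
        self._should_train = should_train

    def has_artifacts_for_step(self, step_index: int) -> bool:
        # A step that trains never reports artifacts, even if the base has some
        if self._should_train(step_index):
            return False
        return self._base.get_artifact(step_index) is not None

    def get_artifact(self, step_index: int) -> Optional[Any]:
        # None signals "train this step" to the caller
        if self._should_train(step_index):
            return None
        return self._base.get_artifact(step_index)


base = DictArtifactProvider({1: "fitted_scaler", 2: "fitted_model"})
# Transfer-like setup: step 2 (the model) trains, step 1 reuses its artifact.
provider = GatedArtifactProvider(base, should_train=lambda idx: idx == 2)
```

With this setup the wrapper returns the stored artifact for step 1 and `None` for step 2, although the base provider holds artifacts for both.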

class nirs4all.pipeline.retrainer.RetrainConfig(mode: RetrainMode = RetrainMode.FULL, step_modes: List[StepMode] = <factory>, new_model: Any | None = None, epochs: int | None = None, learning_rate: float | None = None, freeze_layers: List[str] | None = None, metadata: Dict[str, Any] = <factory>)[source]

Bases: object

Configuration for retraining operation.

mode

Overall retrain mode (full, transfer, finetune)

Type:: nirs4all.pipeline.retrainer.RetrainMode

step_modes

Per-step mode overrides (optional, for fine-grained control)

Type:: List[nirs4all.pipeline.retrainer.StepMode]

new_model

Optional new model to use instead of original (for transfer)

Type:: Any | None

epochs

Optional epochs for fine-tuning

Type:: int | None

learning_rate

Optional learning rate for fine-tuning

Type:: float | None

freeze_layers

Optional list of layers to freeze during fine-tuning

Type:: List[str] | None

metadata

Additional metadata for the retrain operation

Type:: Dict[str, Any]

epochs: int | None = None

freeze_layers: List[str] | None = None

get_step_mode(step_index: int) → StepMode | None[source]

Get mode override for a specific step.

Parameters:: step_index – 1-based step index
Returns:: StepMode if override exists, None otherwise

learning_rate: float | None = None

metadata: Dict[str, Any]

mode: RetrainMode = 'full'

new_model: Any | None = None

should_train_step(step_index: int, is_model: bool = False) → bool[source]

Determine if a step should train based on mode and overrides.

Parameters:

step_index – 1-based step index
is_model – Whether this is the model step

Returns:

True if step should train
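One plausible reading of how the overall mode and per-step overrides combine is sketched below. The decision rule is an assumption inferred from the mode descriptions (full trains every step; transfer and finetune train only the model step; an override always wins), and `Mode`, `Override`, and `ConfigSketch` are hypothetical stand-ins rather than the library classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(str, Enum):
    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


@dataclass
class Override:
    step_index: int  # 1-based, matching the documented convention
    mode: str = "train"  # 'train', 'predict', or 'skip'


@dataclass
class ConfigSketch:
    mode: Mode = Mode.FULL
    step_modes: List[Override] = field(default_factory=list)

    def get_step_mode(self, step_index: int) -> Optional[Override]:
        # First matching override, or None when the step has none
        for override in self.step_modes:
            if override.step_index == step_index:
                return override
        return None

    def should_train_step(self, step_index: int, is_model: bool = False) -> bool:
        override = self.get_step_mode(step_index)
        if override is not None:
            # Assumed: an explicit override takes precedence over the overall mode
            return override.mode == "train"
        if self.mode is Mode.FULL:
            return True
        # Assumed: transfer and finetune reuse preprocessing and train only the model
        return is_model


config = ConfigSketch(
    mode=Mode.TRANSFER,
    step_modes=[Override(step_index=1, mode="train")],
)
```

Here step 1 trains because of its override, an ordinary preprocessing step such as step 2 reuses its artifacts, and the model step trains when `is_model=True` is passed.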

step_modes: List[StepMode]

class nirs4all.pipeline.retrainer.RetrainMode(value)[source]

Bases: str, Enum

Mode of retraining operation.

FULL: Train everything from scratch (same pipeline structure)

TRANSFER: Use existing preprocessing artifacts, train new model

FINETUNE: Continue training existing model with new data

FINETUNE = 'finetune'

FULL = 'full'

TRANSFER = 'transfer'
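Since the enum derives from both str and Enum, its members compare equal to their string values, which is why a plain string such as 'full' can be passed as a mode. The sketch below reproduces that behaviour with a local copy of the enum; `RetrainModeSketch` and `parse_mode` are hypothetical names used only for illustration.

```python
from enum import Enum


class RetrainModeSketch(str, Enum):
    """Local copy of the documented members, for illustration only."""

    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


def parse_mode(value):
    # Enum lookup by value; raises ValueError for an unknown string
    return RetrainModeSketch(value)


mode = parse_mode("transfer")

try:
    parse_mode("partial")
    rejected = False
except ValueError:
    rejected = True
```

Lookup by value also accepts an existing member, so code that receives either a string or an enum member can normalize both through the same call.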

class nirs4all.pipeline.retrainer.Retrainer(runner: PipelineRunner)[source]

Bases: object

Handles retraining pipelines with various modes.

This class manages the retrain workflow: loading saved pipelines, determining which steps to retrain vs. reuse, and executing the modified pipeline on new data.

Phase 7 Implementation:

The Retrainer enables three modes:

full: Train from scratch with same pipeline structure
transfer: Use existing preprocessing, train new model
finetune: Continue training existing model

runner: Parent PipelineRunner instance

resolver: Prediction resolver for loading sources

extract(source: Dict[str, Any] | str | Path | Any, verbose: int = 0) → ExtractedPipeline[source]

Extract a pipeline for inspection or modification.

Returns an ExtractedPipeline object that can be inspected, modified, and then executed with runner.run().

Parameters:

source – Prediction source (dict, folder, Run, artifact_id, bundle)
verbose – Verbosity level

Returns:

ExtractedPipeline for inspection/modification

Example

>>> from sklearn.ensemble import RandomForestRegressor
>>> extracted = retrainer.extract(best_pred)
>>> print(extracted.steps)
>>> extracted.steps[-1] = {"model": RandomForestRegressor()}
>>> preds, _ = runner.run(extracted.steps, new_data)

retrain(source, dataset, mode, dataset_name, new_model, epochs, step_modes, verbose, **kwargs)

Retrain a pipeline on new data.

Parameters:

source – Prediction source (dict, folder, Run, artifact_id, bundle)
dataset – New dataset to train on
mode – Retrain mode ('full', 'transfer', 'finetune')
dataset_name – Name for the dataset
new_model – Optional new model for transfer mode
epochs – Optional epochs for fine-tuning
step_modes – Optional per-step mode overrides
verbose – Verbosity level
**kwargs – Additional parameters (learning_rate, freeze_layers, etc.)

Returns:

Tuple of (predictions, dataset_predictions_dict)

Example

>>> retrainer = Retrainer(runner)
>>>
>>> # Full retrain
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='full')
>>>
>>> # Transfer: use preprocessing, new model
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='transfer')
>>>
>>> # Finetune: continue training
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='finetune', epochs=10)

class nirs4all.pipeline.retrainer.StepMode(step_index: int, mode: str = 'train', artifact_id: str | None = None, kwargs: Dict[str, Any] = <factory>)[source]

Bases: object

Mode override for a specific step during retraining.

Enables fine-grained control over which steps train vs. use existing artifacts.

step_index

1-based step index to apply this mode to

Type:: int

mode

How to execute this step ('train', 'predict', 'skip')

Type:: str

artifact_id

For 'predict' mode, specific artifact to use

Type:: str | None

kwargs

Additional step-specific parameters (e.g., epochs for finetune)

Type:: Dict[str, Any]

artifact_id: str | None = None

is_predict() → bool[source]

Check if this step should use existing artifacts.

Returns:: True if step should use existing artifacts (predict mode)

is_train() → bool[source]

Check if this step should train.

Returns:: True if step should be trained

kwargs: Dict[str, Any]

mode: str = 'train'

step_index: int
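A per-step override list can be sketched with a small dataclass that mirrors the documented fields. `StepModeSketch` is a hypothetical stand-in, and the predicates assume that `is_train()` is true only for 'train' and `is_predict()` only for 'predict'; the library may treat additional mode strings as training.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class StepModeSketch:
    """Stand-in with the documented fields; not the library class."""

    step_index: int  # 1-based
    mode: str = "train"  # 'train', 'predict', or 'skip'
    artifact_id: Optional[str] = None
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def is_train(self) -> bool:
        # Assumed: only the literal 'train' mode counts as training
        return self.mode == "train"

    def is_predict(self) -> bool:
        # Assumed: only the literal 'predict' mode reuses artifacts
        return self.mode == "predict"


overrides = [
    StepModeSketch(step_index=1, mode="predict"),
    StepModeSketch(step_index=2, mode="skip"),
    StepModeSketch(step_index=3, mode="train", kwargs={"epochs": 5}),
]
trained = [o.step_index for o in overrides if o.is_train()]
reused = [o.step_index for o in overrides if o.is_predict()]
```

A step in 'skip' mode is neither trained nor reused, so it appears in neither list.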