nirs4all.pipeline.retrainer module

Pipeline retrainer - Handles retraining, transfer learning, and fine-tuning.

This module provides the Retrainer class for retraining pipelines with different modes: full retrain, transfer learning (reuse preprocessing), and fine-tuning.

Phase 7 Implementation:

This module enables retraining trained pipelines on new data with various modes:
  • full: Train everything from scratch (same pipeline structure)

  • transfer: Use existing preprocessing artifacts, train new model

  • finetune: Continue training existing model with new data

Design Principles:
  1. Controller-Agnostic: Works with any controller type via per-step mode control

  2. Reuses Existing Infrastructure: Leverages resolver, artifact provider, executor

  3. Composable: Same infrastructure for all retrain modes

class nirs4all.pipeline.retrainer.ExtractedPipeline(steps: List[Any] = <factory>, trace: ExecutionTrace | None = None, artifact_provider: ArtifactProvider | None = None, model_step_index: int | None = None, preprocessing_chain: str = '', source_pipeline_uid: str = '', metadata: Dict[str, Any] = <factory>)

Bases: object

Extracted pipeline for inspection and modification.

Represents a pipeline extracted from a trained prediction, ready for inspection, modification, or re-execution.

steps

List of pipeline steps (can be modified)

Type:

List[Any]

trace

Original execution trace (read-only)

Type:

nirs4all.pipeline.trace.execution_trace.ExecutionTrace | None

artifact_provider

Provider for original artifacts

Type:

nirs4all.pipeline.config.context.ArtifactProvider | None

model_step_index

Index of the model step

Type:

int | None

preprocessing_chain

Summary of preprocessing

Type:

str

source_pipeline_uid

UID of the source pipeline

Type:

str

metadata

Additional metadata

Type:

Dict[str, Any]

artifact_provider: ArtifactProvider | None = None
get_model_step() -> Any | None

Get the model step.

Returns:

Model step configuration or None

get_step(index: int) -> Any

Get a step by 0-based index.

Parameters:

index – 0-based step index

Returns:

Step configuration

metadata: Dict[str, Any]
model_step_index: int | None = None
preprocessing_chain: str = ''
set_model(model: Any) -> None

Replace the model in the model step.

Parameters:

model – New model to use

set_step(index: int, step: Any) -> None

Set a step by 0-based index.

Parameters:
  • index – 0-based step index

  • step – New step configuration

source_pipeline_uid: str = ''
steps: List[Any]
trace: ExecutionTrace | None = None
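
The accessors above can be illustrated with a small stand-in class. This is a minimal sketch, not the nirs4all implementation: `ExtractedPipelineSketch` is a hypothetical name, only a subset of the documented fields is mirrored, and two points are assumptions: that `model_step_index` is a 0-based position in `steps`, and that the model step is a dict with a `"model"` key (as in the `extract()` example).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExtractedPipelineSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    steps: List[Any] = field(default_factory=list)
    model_step_index: Optional[int] = None  # assumed 0-based here
    metadata: Dict[str, Any] = field(default_factory=dict)

    def get_step(self, index: int) -> Any:
        # 0-based access, as documented for get_step().
        return self.steps[index]

    def set_step(self, index: int, step: Any) -> None:
        # 0-based replacement, as documented for set_step().
        self.steps[index] = step

    def get_model_step(self) -> Optional[Any]:
        # Returns None when no model step was identified.
        if self.model_step_index is None:
            return None
        return self.steps[self.model_step_index]

    def set_model(self, model: Any) -> None:
        # Assumption: the model step is a dict with a "model" key.
        if self.model_step_index is None:
            raise ValueError("no model step to replace")
        step = self.steps[self.model_step_index]
        self.steps[self.model_step_index] = {**step, "model": model}


# Plain strings stand in for real transformers and estimators.
pipeline = ExtractedPipelineSketch(
    steps=["scaler", "derivative", {"model": "pls"}],
    model_step_index=2,
)
pipeline.set_step(0, "robust_scaler")
pipeline.set_model("random_forest")
```

Note that these accessors are documented as 0-based, whereas the step indices used by RetrainConfig, StepMode, and RetrainArtifactProvider are documented as 1-based.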
class nirs4all.pipeline.retrainer.RetrainArtifactProvider(base_provider: ArtifactProvider, retrain_config: RetrainConfig, trace: ExecutionTrace | None = None)

Bases: ArtifactProvider

Artifact provider for retraining that respects step modes.

Provides artifacts only for steps that should use existing artifacts (i.e., mode='predict'), while returning None for steps that should train.

base_provider

Underlying artifact provider

retrain_config

Configuration determining which steps use artifacts

trace

Execution trace for step type detection

get_artifact(step_index: int, fold_id: int | None = None) -> Any | None

Get a single artifact for a step if applicable.

Parameters:
  • step_index – 1-based step index

  • fold_id – Optional fold ID for fold-specific artifacts

Returns:

Artifact object or None if step should train

get_artifacts_for_step(step_index: int, branch_path: List[int] | None = None) -> List[Tuple[str, Any]]

Get all artifacts for a step if applicable.

Parameters:
  • step_index – 1-based step index

  • branch_path – Optional branch path filter

Returns:

List of (artifact_id, artifact_object) tuples, or empty if should train

get_fold_artifacts(step_index: int, branch_path: List[int] | None = None) -> List[Tuple[int, Any]]

Get all fold-specific artifacts for a step if applicable.

Parameters:
  • step_index – 1-based step index

  • branch_path – Optional branch path filter

Returns:

List of (fold_id, artifact_object) tuples, or empty if should train

has_artifacts_for_step(step_index: int) -> bool

Check if artifacts should be used for this step.

Parameters:

step_index – 1-based step index

Returns:

True if artifacts are available and should be used
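
The filtering behaviour described for this provider can be sketched as a thin wrapper around a base provider. The classes below are hypothetical stand-ins, not nirs4all types: `DictProvider` replaces the real ArtifactProvider, and a plain set of 1-based step indices replaces the RetrainConfig lookup.

```python
from typing import Any, Dict, Optional, Set


class DictProvider:
    """Hypothetical base provider: maps a 1-based step index to an artifact."""

    def __init__(self, artifacts: Dict[int, Any]) -> None:
        self._artifacts = artifacts

    def get_artifact(self, step_index: int) -> Optional[Any]:
        return self._artifacts.get(step_index)


class ModeAwareProvider:
    """Withholds artifacts for steps that are scheduled to train."""

    def __init__(self, base: DictProvider, train_steps: Set[int]) -> None:
        self.base = base
        self.train_steps = train_steps  # 1-based indices that must retrain

    def has_artifacts_for_step(self, step_index: int) -> bool:
        # A training step never reports artifacts, even if the base has one.
        if step_index in self.train_steps:
            return False
        return self.base.get_artifact(step_index) is not None

    def get_artifact(self, step_index: int) -> Optional[Any]:
        if step_index in self.train_steps:
            return None  # the caller sees "no artifact" and fits the step
        return self.base.get_artifact(step_index)


# Step 1 keeps its fitted scaler; step 2 (the model) is forced to retrain.
base = DictProvider({1: "fitted_scaler", 2: "fitted_model"})
provider = ModeAwareProvider(base, train_steps={2})
```

Returning None (or an empty list in the list-returning methods) lets the same execution path serve both cases: a step that receives no artifact simply trains.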

class nirs4all.pipeline.retrainer.RetrainConfig(mode: RetrainMode = RetrainMode.FULL, step_modes: List[StepMode] = <factory>, new_model: Any | None = None, epochs: int | None = None, learning_rate: float | None = None, freeze_layers: List[str] | None = None, metadata: Dict[str, Any] = <factory>)

Bases: object

Configuration for retraining operation.

mode

Overall retrain mode (full, transfer, finetune)

Type:

nirs4all.pipeline.retrainer.RetrainMode

step_modes

Per-step mode overrides (optional, for fine-grained control)

Type:

List[nirs4all.pipeline.retrainer.StepMode]

new_model

Optional new model to use instead of original (for transfer)

Type:

Any | None

epochs

Optional epochs for fine-tuning

Type:

int | None

learning_rate

Optional learning rate for fine-tuning

Type:

float | None

freeze_layers

Optional list of layers to freeze during fine-tuning

Type:

List[str] | None

metadata

Additional metadata for the retrain operation

Type:

Dict[str, Any]

epochs: int | None = None
freeze_layers: List[str] | None = None
get_step_mode(step_index: int) -> StepMode | None

Get mode override for a specific step.

Parameters:

step_index – 1-based step index

Returns:

StepMode if override exists, None otherwise

learning_rate: float | None = None
metadata: Dict[str, Any]
mode: RetrainMode = 'full'
new_model: Any | None = None
should_train_step(step_index: int, is_model: bool = False) -> bool

Determine if a step should train based on mode and overrides.

Parameters:
  • step_index – 1-based step index

  • is_model – Whether this is the model step

Returns:

True if step should train

step_modes: List[StepMode]
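
One plausible reading of how get_step_mode() and should_train_step() combine is sketched below. The decision rule is an assumption inferred from the mode descriptions (a per-step override wins; 'full' trains every step; 'transfer' and 'finetune' train only the model step); the actual nirs4all logic may differ. `StepModeSketch` and `RetrainConfigSketch` are hypothetical stand-ins that use plain strings for modes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepModeSketch:
    step_index: int      # 1-based, as in the documented StepMode
    mode: str = "train"  # 'train', 'predict', or 'skip'


@dataclass
class RetrainConfigSketch:
    mode: str = "full"   # 'full', 'transfer', or 'finetune'
    step_modes: List[StepModeSketch] = field(default_factory=list)

    def get_step_mode(self, step_index: int) -> Optional[StepModeSketch]:
        # First override whose index matches, else None.
        for step_mode in self.step_modes:
            if step_mode.step_index == step_index:
                return step_mode
        return None

    def should_train_step(self, step_index: int, is_model: bool = False) -> bool:
        override = self.get_step_mode(step_index)
        if override is not None:
            # Assumed: an explicit override takes precedence over the overall mode.
            return override.mode == "train"
        if self.mode == "full":
            return True
        # Assumed: transfer and finetune reuse preprocessing and train only the model.
        return is_model


# Transfer mode, but step 1 is explicitly forced to retrain.
config = RetrainConfigSketch(
    mode="transfer",
    step_modes=[StepModeSketch(step_index=1, mode="train")],
)
```

With this configuration, step 1 trains because of the override, an un-overridden preprocessing step reuses its artifact, and the model step trains.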
class nirs4all.pipeline.retrainer.RetrainMode(value)

Bases: str, Enum

Mode of retraining operation.

FULL

Train everything from scratch (same pipeline structure)

TRANSFER

Use existing preprocessing artifacts, train new model

FINETUNE

Continue training existing model with new data

FINETUNE = 'finetune'
FULL = 'full'
TRANSFER = 'transfer'
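
Because RetrainMode derives from both str and Enum, a plain string can be converted to a member by value, and members compare equal to their string values. The snippet below re-declares the enum locally from the documented values so that it runs without nirs4all installed; it illustrates standard Python enum behaviour rather than anything specific to the library.

```python
from enum import Enum


class RetrainMode(str, Enum):
    # Re-declared locally from the documented values.
    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


mode = RetrainMode("transfer")            # lookup by value
is_same_member = mode is RetrainMode.TRANSFER
equals_plain_string = mode == "transfer"  # the str mixin makes this True

# An unknown value raises ValueError instead of passing through silently.
try:
    RetrainMode("partial")
    rejected = False
except ValueError:
    rejected = True
```

This is presumably why retrain() can accept either a string or a RetrainMode for its mode parameter.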
class nirs4all.pipeline.retrainer.Retrainer(runner: PipelineRunner)

Bases: object

Handles retraining pipelines with various modes.

This class manages the retrain workflow: loading saved pipelines, determining which steps to retrain vs. reuse, and executing the modified pipeline on new data.

Phase 7 Implementation:

The Retrainer enables three modes:
  • full: Train from scratch with same pipeline structure

  • transfer: Use existing preprocessing, train new model

  • finetune: Continue training existing model

runner

Parent PipelineRunner instance

resolver

Prediction resolver for loading sources

extract(source: Dict[str, Any] | str | Path | Any, verbose: int = 0) -> ExtractedPipeline

Extract a pipeline for inspection or modification.

Returns an ExtractedPipeline object that can be inspected, modified, and then executed with runner.run().

Parameters:
  • source – Prediction source (dict, folder, Run, artifact_id, bundle)

  • verbose – Verbosity level

Returns:

ExtractedPipeline for inspection/modification

Example

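>>> from sklearn.ensemble import RandomForestRegressor
>>> extracted = retrainer.extract(best_pred)
>>> print(extracted.steps)
>>> # Swap the final step for a new model, then run on the new data
>>> extracted.steps[-1] = {"model": RandomForestRegressor()}
>>> preds, _ = runner.run(extracted.steps, new_data)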
retrain(source: Dict[str, Any] | str | Path | Any, dataset: DatasetConfigs | SpectroDataset | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], mode: str | RetrainMode = 'full', dataset_name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, step_modes: List[StepMode] | None = None, verbose: int = 0, **kwargs) -> Tuple[Predictions, Dict[str, Any]]

Retrain a pipeline on new data.

Parameters:
  • source – Prediction source (dict, folder, Run, artifact_id, bundle)

  • dataset – New dataset to train on

  • mode – Retrain mode ('full', 'transfer', 'finetune')

  • dataset_name – Name for the dataset

  • new_model – Optional new model for transfer mode

  • epochs – Optional epochs for fine-tuning

  • step_modes – Optional per-step mode overrides

  • verbose – Verbosity level

  • **kwargs – Additional parameters (learning_rate, freeze_layers, etc.)

Returns:

Tuple of (predictions, dataset_predictions_dict)

Example

>>> retrainer = Retrainer(runner)
>>>
>>> # Full retrain
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='full')
>>>
>>> # Transfer: use preprocessing, new model
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='transfer')
>>>
>>> # Finetune: continue training
>>> preds, _ = retrainer.retrain(best_pred, new_data, mode='finetune', epochs=10)
class nirs4all.pipeline.retrainer.StepMode(step_index: int, mode: str = 'train', artifact_id: str | None = None, kwargs: Dict[str, Any] = <factory>)

Bases: object

Mode override for a specific step during retraining.

Enables fine-grained control over which steps train vs. use existing artifacts.

step_index

1-based step index to apply this mode to

Type:

int

mode

How to execute this step ('train', 'predict', 'skip')

Type:

str

artifact_id

For 'predict' mode, specific artifact to use

Type:

str | None

kwargs

Additional step-specific parameters (e.g., epochs for finetune)

Type:

Dict[str, Any]

artifact_id: str | None = None
is_predict() -> bool

Check if this step should use existing artifacts.

Returns:

True if step should use existing artifacts (predict mode)

is_train() -> bool

Check if this step should train.

Returns:

True if step should be trained

kwargs: Dict[str, Any]
mode: str = 'train'
step_index: int
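
A list of per-step overrides might be assembled and inspected as in the sketch below. `StepModeSketch` is a hypothetical stand-in with the documented fields; the assumption is that is_train() and is_predict() are exact string comparisons against the mode, so a 'skip' step satisfies neither. The artifact ID shown is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class StepModeSketch:
    step_index: int                    # 1-based
    mode: str = "train"                # 'train', 'predict', or 'skip'
    artifact_id: Optional[str] = None  # only meaningful for 'predict'
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def is_train(self) -> bool:
        return self.mode == "train"

    def is_predict(self) -> bool:
        return self.mode == "predict"


overrides = [
    # Reuse a specific stored artifact for step 1 (hypothetical ID).
    StepModeSketch(step_index=1, mode="predict", artifact_id="scaler-artifact"),
    # Step 2 is neither trained nor reused.
    StepModeSketch(step_index=2, mode="skip"),
    # Step 3 trains (the default mode) with a step-specific parameter.
    StepModeSketch(step_index=3, kwargs={"epochs": 10}),
]

train_steps = [m.step_index for m in overrides if m.is_train()]
reuse_steps = [m.step_index for m in overrides if m.is_predict()]
```

In the real API, such a list would be passed as the step_modes argument of retrain() or stored on RetrainConfig.step_modes.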