nirs4all.api.retrain module

Module-level retrain() function for nirs4all.

This module provides a simple interface for retraining nirs4all pipelines on new data. It wraps PipelineRunner.retrain() with ergonomic defaults.

Example

>>> import nirs4all
>>> # Full retrain on new data
>>> result = nirs4all.retrain(
...     source="exports/model.n4a",
...     data=new_data,
...     mode="full"
... )
>>> print(f"New RMSE: {result.best_rmse:.4f}")

Retrain a pipeline on new data.

This function retrains a previously trained pipeline in one of three modes: full retraining, transfer learning, or fine-tuning.

Parameters:

source – Pipeline source to retrain from. Can be:
    - Prediction dict from result.best or result.top()
    - Path to exported bundle: "exports/model.n4a"
    - Path to pipeline config directory
data – New dataset to train on. Can be:
    - Path to data folder: "new_data/"
    - Numpy arrays: (X, y)
    - Dict: {"X": X, "y": y}
    - SpectroDataset instance
mode – Retrain mode. Default: "full". Options:
    - "full": Train everything from scratch (same pipeline structure)
    - "transfer": Use existing preprocessing, train new model
    - "finetune": Continue training existing model
name – Name for the retrain dataset (for logging). Default: "retrain_dataset"
new_model – Optional new model for transfer mode. Replaces the original model while keeping preprocessing.
epochs – Optional number of epochs for fine-tuning neural networks.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
save_artifacts – Whether to save retrained artifacts. Default: True
**kwargs – Additional retraining parameters:
    - learning_rate: Learning rate for fine-tuning
    - freeze_layers: List of layers to freeze during fine-tuning
    - step_modes: Per-step mode overrides (advanced)
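
The parameter descriptions above tie new_model to transfer mode and epochs to fine-tuning. A minimal sketch of a pre-flight check that assembles the keyword arguments before calling retrain() is shown below. The helper build_retrain_kwargs and the constant VALID_MODES are hypothetical names, not part of nirs4all, and rejecting mismatched options outright is an assumption of this sketch; the library itself may be more lenient.

```python
# Hypothetical helper, not part of nirs4all: collects retrain() keyword
# arguments and checks them against the documented mode descriptions.
VALID_MODES = ("full", "transfer", "finetune")


def build_retrain_kwargs(mode="full", new_model=None, epochs=None, **extra):
    """Return a dict of keyword arguments suitable for a retrain call."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Assumption: new_model only makes sense in transfer mode.
    if new_model is not None and mode != "transfer":
        raise ValueError("new_model is documented for mode='transfer' only")
    # Assumption: epochs only makes sense when fine-tuning.
    if epochs is not None and mode != "finetune":
        raise ValueError("epochs is documented for mode='finetune' only")

    kwargs = {"mode": mode}
    if new_model is not None:
        kwargs["new_model"] = new_model
    if epochs is not None:
        kwargs["epochs"] = epochs
    # Pass through extras such as learning_rate or freeze_layers unchanged.
    kwargs.update(extra)
    return kwargs
```

The resulting dict could then be unpacked into the call, for example nirs4all.retrain(source=..., data=..., **build_retrain_kwargs("finetune", epochs=10)).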

Returns:

RunResult containing:
    - predictions: Predictions from the retrained pipeline
    - per_dataset: Per-dataset execution details
    - best: Best prediction entry
    - best_score: Best model's primary test score

Return type:

RunResult

Raises:

ValueError – If mode is invalid or source cannot be resolved.
FileNotFoundError – If source references files that don’t exist.
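
The two documented exceptions can be handled separately, since a missing file and an invalid mode usually call for different responses. The sketch below is one way to do that; try_retrain is a hypothetical wrapper, not part of nirs4all, and it takes the retrain callable as an argument so that it can be exercised without the library installed.

```python
def try_retrain(retrain_fn, source, data, **kwargs):
    """Call retrain_fn and return (result, error_message).

    Exactly one element of the pair is None. Only the two exceptions
    listed in the Raises section are caught; anything else propagates.
    """
    try:
        return retrain_fn(source=source, data=data, **kwargs), None
    except FileNotFoundError as exc:
        # The source points at a bundle or config directory that is missing.
        return None, f"source not found: {exc}"
    except ValueError as exc:
        # The mode is invalid or the source could not be resolved.
        return None, f"invalid mode or unresolvable source: {exc}"
```

With the library available, the call would look like result, error = try_retrain(nirs4all.retrain, "exports/model.n4a", new_data, mode="full").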

Examples

Full retrain on new data:

>>> import nirs4all
>>>
>>> # Original training
>>> original = nirs4all.run(pipeline, train_data)
>>>
>>> # Retrain on new data with same pipeline
>>> retrained = nirs4all.retrain(
...     source=original.best,
...     data=new_train_data,
...     mode="full"
... )
>>> print(f"Original: {original.best_rmse:.4f}")
>>> print(f"Retrained: {retrained.best_rmse:.4f}")

Transfer learning with new model:

>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> result = nirs4all.retrain(
...     source="exports/pls_model.n4a",
...     data=new_data,
...     mode="transfer",
...     new_model=RandomForestRegressor(n_estimators=100)
... )

Fine-tune a neural network:

>>> result = nirs4all.retrain(
...     source="exports/nn_model.n4a",
...     data=new_data,
...     mode="finetune",
...     epochs=10,
...     learning_rate=0.0001
... )

Retrain from an exported bundle:

>>> result = nirs4all.retrain(
...     source="exports/wheat_model.n4a",
...     data="new_wheat_data/",
...     mode="full",
...     verbose=2
... )
>>> result.export("exports/retrained_model.n4a")