nirs4all.api.retrain module
Module-level retrain() function for nirs4all.
This module provides a simple interface for retraining nirs4all pipelines on new data. It wraps PipelineRunner.retrain() with ergonomic defaults.
Example
>>> import nirs4all
>>> # Full retrain on new data
>>> result = nirs4all.retrain(
...     source="exports/model.n4a",
...     data=new_data,
...     mode="full"
... )
>>> print(f"New RMSE: {result.best_rmse:.4f}")
- nirs4all.api.retrain.retrain(source: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, mode: str = 'full', name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, **kwargs: Any) → RunResult
Retrain a pipeline on new data.
This function retrains a previously trained pipeline in one of three modes: full retraining, transfer learning, or fine-tuning.
- Parameters:
source – Pipeline source to retrain from. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – New dataset to train on. Can be:
- Path to data folder: "new_data/"
- Numpy arrays: (X, y)
- Dict: {"X": X, "y": y}
- SpectroDataset instance
mode – Retrain mode. Default: “full”. Options:
- “full”: Train everything from scratch (same pipeline structure)
- “transfer”: Use existing preprocessing, train new model
- “finetune”: Continue training existing model
name – Name for the retrain dataset (for logging). Default: “retrain_dataset”
new_model – Optional new model for transfer mode. Replaces the original model while keeping preprocessing.
epochs – Optional number of epochs for fine-tuning neural networks.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
save_artifacts – Whether to save retrained artifacts. Default: True
**kwargs – Additional retraining parameters:
- learning_rate: Learning rate for fine-tuning
- freeze_layers: List of layers to freeze during fine-tuning
- step_modes: Per-step mode overrides (advanced)
- Returns:
RunResult containing:
- predictions: Predictions from the retrained pipeline
- per_dataset: Per-dataset execution details
- best: Best prediction entry
- best_score: Best model’s primary test score
- Return type:
RunResult
- Raises:
ValueError – If mode is invalid or source cannot be resolved.
FileNotFoundError – If source references files that don’t exist.
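The two documented exceptions can be handled separately. The following is a minimal sketch, not part of nirs4all: `try_retrain` is a hypothetical wrapper that takes the retrain function as an argument (in practice `nirs4all.retrain`), so the block runs without the library installed.

```python
from typing import Any, Callable, Optional


def try_retrain(retrain_fn: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call a retrain-style function and report the documented errors.

    Hypothetical helper: `retrain_fn` is expected to be nirs4all.retrain,
    passed in explicitly so this sketch has no dependency on the library.
    Returns the result on success, or None if a documented error occurred.
    """
    try:
        return retrain_fn(**kwargs)
    except FileNotFoundError as exc:
        # Documented case: the source references files that do not exist.
        print(f"Source not found: {exc}")
    except ValueError as exc:
        # Documented case: invalid mode, or the source cannot be resolved.
        print(f"Invalid retrain request: {exc}")
    return None
```

With the library available, a call would look like `try_retrain(nirs4all.retrain, source="exports/model.n4a", data=new_data, mode="full")`. Any other exception type is deliberately left to propagate.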
Examples
Full retrain on new data:
>>> import nirs4all
>>>
>>> # Original training
>>> original = nirs4all.run(pipeline, train_data)
>>>
>>> # Retrain on new data with same pipeline
>>> retrained = nirs4all.retrain(
...     source=original.best,
...     data=new_train_data,
...     mode="full"
... )
>>> print(f"Original: {original.best_rmse:.4f}")
>>> print(f"Retrained: {retrained.best_rmse:.4f}")
Transfer learning with new model:
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> result = nirs4all.retrain(
...     source="exports/pls_model.n4a",
...     data=new_data,
...     mode="transfer",
...     new_model=RandomForestRegressor(n_estimators=100)
... )
Fine-tune a neural network:
>>> result = nirs4all.retrain(
...     source="exports/nn_model.n4a",
...     data=new_data,
...     mode="finetune",
...     epochs=10,
...     learning_rate=0.0001
... )
Retrain from an exported bundle:
>>> result = nirs4all.retrain(
...     source="exports/wheat_model.n4a",
...     data="new_wheat_data/",
...     mode="full",
...     verbose=2
... )
>>> result.export("exports/retrained_model.n4a")
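Checking mode-specific options before calling retrain():

The parameter descriptions tie `new_model` to transfer mode and `epochs` to fine-tuning. The sketch below collects the arguments and rejects mismatched combinations early. `build_retrain_kwargs` and `VALID_MODES` are hypothetical names, not part of nirs4all, and the strict rejection is an assumption of this sketch; the library itself may simply ignore options that do not apply to the chosen mode.

```python
from typing import Any, Dict, Optional

# Mode names as listed in the `mode` parameter description.
VALID_MODES = ("full", "transfer", "finetune")


def build_retrain_kwargs(
    source: Any,
    data: Any,
    mode: str = "full",
    new_model: Optional[Any] = None,
    epochs: Optional[int] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Assemble keyword arguments for nirs4all.retrain (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    kwargs: Dict[str, Any] = {"source": source, "data": data, "mode": mode}

    # Assumption of this sketch: new_model is only accepted with "transfer".
    if new_model is not None:
        if mode != "transfer":
            raise ValueError("new_model is intended for mode='transfer'")
        kwargs["new_model"] = new_model

    # Assumption of this sketch: epochs is only accepted with "finetune".
    if epochs is not None:
        if mode != "finetune":
            raise ValueError("epochs is intended for mode='finetune'")
        kwargs["epochs"] = epochs

    # Pass-through options such as learning_rate or freeze_layers.
    kwargs.update(extra)
    return kwargs
```

The resulting dict can then be unpacked into the call, for example `nirs4all.retrain(**build_retrain_kwargs("exports/nn_model.n4a", new_data, mode="finetune", epochs=10, learning_rate=0.0001))`.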
See also
nirs4all.run(): Train a pipeline from scratch
nirs4all.predict(): Make predictions
nirs4all.pipeline.RetrainMode: Retrain mode enum