nirs4all.api package
Submodules
- nirs4all.api.explain module
- nirs4all.api.generate module
- nirs4all.api.predict module
- nirs4all.api.result module
ExplainResult: shap_values, feature_names, base_value, visualizations, explainer_type, model_name, n_samples, mean_abs_shap, shape, top_features, values, get_feature_importance(), get_sample_explanation(), to_dataframe(), __post_init__(), __repr__(), __str__()
PredictResult: y_pred, metadata, sample_indices, model_name, preprocessing_steps, is_multioutput, shape, values, flatten(), to_dataframe(), to_list(), to_numpy(), __len__(), __post_init__(), __repr__(), __str__()
RunResult: predictions, per_dataset, artifacts_path, best, best_accuracy, best_r2, best_rmse, best_score, num_predictions, export(), export_model(), filter(), get_datasets(), get_models(), summary(), top(), validate(), __repr__(), __str__()
- nirs4all.api.retrain module
- nirs4all.api.run module
- nirs4all.api.session module
Session: name, pipeline, status, is_trained, runner, workspace_path, history, close(), predict(), retrain(), run(), save(), __enter__(), __exit__(), __repr__()
load_session(), session()
Module contents
NIRS4All API Module - High-level functional interface.
This module provides the primary public API for nirs4all, offering simple function-based entry points that wrap the underlying PipelineRunner.
- Public API:
- run(pipeline, dataset, **kwargs) -> RunResult
Execute a training pipeline on a dataset.
- predict(model, data, **kwargs) -> PredictResult
Make predictions with a trained model.
- explain(model, data, **kwargs) -> ExplainResult
Generate SHAP explanations for model predictions.
- retrain(source, data, **kwargs) -> RunResult
Retrain a pipeline on new data.
- session(**kwargs) -> Session
Create an execution session for resource reuse.
- generate(n_samples, **kwargs) -> SpectroDataset | (X, y)
Generate synthetic NIRS data for testing and research.
Example
>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
For more examples, see the examples/Q40_new_api.py file.
- class nirs4all.api.ExplainResult(shap_values: Any, feature_names: list[str] | None = None, base_value: float | ndarray | None = None, visualizations: dict[str, pathlib.Path] = <factory>, explainer_type: str = 'auto', model_name: str = '', n_samples: int = 0)
Bases: object
Result from nirs4all.explain().
Wraps SHAP explanation outputs with visualization helpers and accessors.
- shap_values
SHAP values array or Explanation object.
- Type:
Any
- base_value
Expected value (baseline prediction).
- Type:
float | numpy.ndarray | None
- visualizations
Paths to generated visualization files.
- Type:
dict[str, pathlib.Path]
- Properties:
values: Raw SHAP values array.
shape: Shape of SHAP values array.
mean_abs_shap: Mean absolute SHAP values per feature.
top_features: Feature names sorted by importance.
Example
>>> result = nirs4all.explain(model, X_test)
>>> print(f"Top features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance()
- get_feature_importance(top_n: int | None = None, normalize: bool = False) -> dict[str, float]
Get feature importance ranking.
- Parameters:
top_n – If provided, return only top N features.
normalize – If True, normalize values to sum to 1.
- Returns:
Dictionary mapping feature names to importance values.
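To make the effect of `top_n` and `normalize` concrete, the following standalone sketch ranks features by mean absolute SHAP value using plain Python lists. It is only an approximation of the idea, not the nirs4all implementation: the helper name `feature_importance`, the toy numbers, and the choice to normalize before truncating to `top_n` are all assumptions.

```python
def feature_importance(shap_rows, feature_names, top_n=None, normalize=False):
    """Rank features by mean |SHAP| (hypothetical helper, not nirs4all code)."""
    n_samples = len(shap_rows)
    # Column-wise mean of absolute SHAP values: one number per feature.
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(len(feature_names))
    ]
    if normalize:
        # Assumption: normalization uses all features, before any truncation.
        total = sum(mean_abs)
        if total > 0:
            mean_abs = [v / total for v in mean_abs]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)


# Toy SHAP matrix: 2 samples x 3 features (made-up numbers).
shap_rows = [[0.5, -1.0, 0.0], [-1.5, 2.0, 0.5]]
names = ["1100nm", "1200nm", "1300nm"]
importance = feature_importance(shap_rows, names, top_n=2, normalize=True)
print(importance)
```

With `normalize=True` the values are fractions of the total importance, which makes results from different models easier to compare.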
- get_sample_explanation(idx: int) -> dict[str, float]
Get SHAP explanation for a single sample.
- Parameters:
idx – Sample index.
- Returns:
Dictionary mapping feature names to SHAP values for that sample.
- property mean_abs_shap: ndarray
Get mean absolute SHAP values per feature.
- Returns:
1D array of mean |SHAP| values, one per feature.
- to_dataframe(include_feature_names: bool = True)
Get SHAP values as pandas DataFrame.
- Parameters:
include_feature_names – If True, use feature names as columns.
- Returns:
pandas DataFrame with SHAP values.
- Raises:
ImportError – If pandas is not available.
- property top_features: list[str]
Get feature names sorted by importance (descending).
- Returns:
List of feature names, most important first. Returns indices as strings if feature_names not available.
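The fallback to stringified indices can be pictured with a few lines of plain Python. This is a sketch under assumptions, not the library's code: `top_feature_names` is a hypothetical helper and the scores are invented.

```python
def top_feature_names(mean_abs_shap, feature_names=None):
    """Order features by importance; use index strings when names are missing."""
    # Indices sorted by descending importance.
    order = sorted(range(len(mean_abs_shap)),
                   key=lambda i: mean_abs_shap[i], reverse=True)
    if feature_names is None:
        return [str(i) for i in order]
    return [feature_names[i] for i in order]


scores = [0.2, 0.9, 0.5]
without_names = top_feature_names(scores)
with_names = top_feature_names(scores, ["a", "b", "c"])
print(without_names, with_names)
```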
- class nirs4all.api.PredictResult(y_pred: ndarray, metadata: dict[str, Any] = <factory>, sample_indices: ndarray | None = None, model_name: str = '', preprocessing_steps: list[str] = <factory>)
Bases: object
Result from nirs4all.predict().
Wraps prediction outputs with convenient accessors and conversion methods.
- y_pred
Predicted values array (n_samples,) or (n_samples, n_outputs).
- Type:
numpy.ndarray
- sample_indices
Optional indices of predicted samples.
- Type:
numpy.ndarray | None
- Properties:
values: Alias for y_pred (for consistency).
shape: Shape of prediction array.
is_multioutput: True if predictions have multiple outputs.
Example
>>> result = nirs4all.predict(model, X_new)
>>> print(f"Predictions shape: {result.shape}")
>>> df = result.to_dataframe()
- to_dataframe(include_indices: bool = True)
Get predictions as pandas DataFrame.
- Parameters:
include_indices – If True and sample_indices available, include as column.
- Returns:
pandas DataFrame with predictions.
- Raises:
ImportError – If pandas is not available.
- class nirs4all.api.RunResult(predictions: Predictions, per_dataset: dict[str, Any], _runner: PipelineRunner | None = None)
Bases: object
Result from nirs4all.run().
Provides convenient access to predictions, best model, and artifacts. Wraps the raw (predictions, per_dataset) tuple returned by PipelineRunner.run().
- predictions
Predictions object containing all pipeline results.
- Type:
Predictions
- Properties:
best: Best prediction entry by default ranking.
best_score: Best model’s primary test score.
best_rmse: Best model’s RMSE (regression).
best_r2: Best model’s R² (regression).
best_accuracy: Best model’s accuracy (classification).
artifacts_path: Path to run artifacts directory.
num_predictions: Total number of predictions stored.
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
>>> print(f"Best R²: {result.best_r2:.4f}")
>>> result.export("exports/best_model.n4a")
- property artifacts_path: Path | None
Get path to workspace artifacts directory.
- Returns:
Path to the workspace directory, or None if not available.
- property best: dict[str, Any]
Get best prediction entry by default ranking.
- Returns:
Dictionary containing best model’s metrics, name, and configuration. Empty dict if no predictions available.
- property best_accuracy: float
Get best model’s accuracy score (for classification).
Looks for ‘accuracy’ as a flat key (from display_metrics), then in scores dict.
- Returns:
Accuracy value or NaN if unavailable.
- property best_r2: float
Get best model’s R² score.
Looks for ‘r2’ as a flat key (from display_metrics), then in scores dict.
- Returns:
R² value or NaN if unavailable.
- property best_rmse: float
Get best model’s RMSE score.
Looks for ‘rmse’ as a flat key (from display_metrics), then in scores dict, then falls back to test_score if metric is rmse-like.
- Returns:
RMSE value or NaN if unavailable.
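The three-step lookup described for best_rmse can be sketched as a small function over a prediction dict. The dict layout used here is an assumption made for illustration: the nested `{"test": {...}}` shape of `scores` and the `metric` key are hypothetical, and `lookup_rmse` is not a nirs4all function.

```python
import math


def lookup_rmse(entry):
    """Resolve an RMSE from a prediction dict (assumed layout, illustration only)."""
    # 1. Flat key, as written by the display metrics.
    if "rmse" in entry:
        return float(entry["rmse"])
    # 2. Nested scores dict; the per-partition layout is an assumption.
    scores = entry.get("scores") or {}
    test_scores = scores.get("test") or {}
    if "rmse" in test_scores:
        return float(test_scores["rmse"])
    # 3. Fall back to test_score when the ranking metric itself is RMSE-like.
    metric = str(entry.get("metric", "")).lower()
    if "rmse" in metric and entry.get("test_score") is not None:
        return float(entry["test_score"])
    # Nothing usable: signal "unavailable" with NaN instead of raising.
    return math.nan


print(lookup_rmse({"rmse": 0.5}))
print(lookup_rmse({"metric": "RMSE", "test_score": 0.9}))
print(lookup_rmse({}))
```

Returning NaN rather than raising keeps formatted output such as `f"{value:.4f}"` working even when a metric is missing, at the cost of having to check for NaN explicitly.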
- property best_score: float
Get best model’s primary test score.
- Returns:
The test_score value from best prediction, or NaN if unavailable.
- export(output_path: str | Path, format: str = 'n4a', source: dict[str, Any] | None = None, chain_id: str | None = None) -> Path
Export a model to a bundle.
Two export paths are supported:
Store-based (preferred) – pass chain_id to export directly from the DuckDB workspace:
>>> result.export("model.n4a", chain_id="abc123")
Resolver-based (legacy) – exports via PipelineRunner.export:
>>> result.export("model.n4a")  # uses best prediction
- Parameters:
output_path – Path for the exported bundle file.
format – Export format (‘n4a’ or ‘n4a.py’).
source – Prediction dict to export. If None, exports best model.
chain_id – Chain identifier for store-based export. When provided, source is ignored and the chain is exported directly from the DuckDB store.
- Returns:
Path to the exported bundle file.
- Raises:
RuntimeError – If runner reference is not available.
ValueError – If no predictions available and source not provided.
- export_model(output_path: str | Path, source: dict[str, Any] | None = None, format: str | None = None, fold: int | None = None) -> Path
Export only the model artifact (lightweight).
Unlike export() which creates a full bundle, this exports just the model.
- Parameters:
output_path – Path for the output model file.
source – Prediction dict to export. If None, exports best model.
format – Model format (inferred from extension if None).
fold – Fold index to export (default: fold 0).
- Returns:
Path to the exported model file.
- Raises:
RuntimeError – If runner reference is not available.
- filter(**kwargs) -> list[dict[str, Any]]
Filter predictions by criteria.
- Parameters:
**kwargs – Filter criteria passed to predictions.filter_predictions(). Supported kwargs include:
- dataset_name: Filter by dataset name
- model_name: Filter by model name
- partition: Filter by partition (‘train’, ‘val’, ‘test’)
- fold_id: Filter by fold ID
- step_idx: Filter by pipeline step index
- branch_id: Filter by branch ID
- load_arrays: If True, load actual arrays (default: True)
- Returns:
List of matching prediction dictionaries.
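Keyword-based filtering of this kind can be pictured as matching every criterion against each prediction dict. The sketch below is a standalone stand-in and assumes that criteria are combined with AND and compared by exact equality; `filter_entries` and the sample records are hypothetical, and the real method may behave differently.

```python
def filter_entries(entries, **criteria):
    """Keep entries whose fields equal every given criterion (illustrative)."""
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


# Made-up prediction records using field names from the parameter list.
entries = [
    {"dataset_name": "wheat", "model_name": "PLSRegression", "partition": "test"},
    {"dataset_name": "wheat", "model_name": "PLSRegression", "partition": "val"},
    {"dataset_name": "corn", "model_name": "RandomForest", "partition": "test"},
]
wheat_test = filter_entries(entries, dataset_name="wheat", partition="test")
print(len(wheat_test))
```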
- get_datasets() -> list[str]
Get list of unique dataset names.
- Returns:
List of dataset names in predictions.
- get_models() -> list[str]
Get list of unique model names.
- Returns:
List of model names in predictions.
- property num_predictions: int
Get total number of predictions stored.
- Returns:
Number of prediction entries.
- predictions: Predictions
- summary() -> str
Get a summary string of the run result.
- Returns:
Multi-line summary string with key metrics.
- top(n: int = 5, **kwargs) -> list[dict[str, Any]] | dict[tuple, list[dict[str, Any]]]
Get top N predictions by ranking.
- Parameters:
n – Number of top predictions to return. When group_by is used, this means top N per group (e.g., top 3 per dataset).
**kwargs – Additional arguments passed to predictions.top(). Supported kwargs include:
- rank_metric: Metric to rank by (default: uses record’s metric)
- rank_partition: Partition to rank on (default: “val”)
- display_partition: Partition for display metrics (default: “test”)
- aggregate_partitions: If True, include train/val/test data
- ascending: Sort order (None = infer from metric)
- group_by: Group predictions by column(s). Returns top N per group. Each result includes ‘group_key’ for easy filtering.
- return_grouped: If True with group_by, return dict of group->results instead of flat list. Default: False.
- Returns:
If return_grouped=False (default): List of prediction dicts, ranked by score. With group_by, returns top N per group as flat list.
If return_grouped=True: Dict mapping group keys to lists of predictions.
Examples
>>> # Top 5 overall
>>> result.top(5)
>>>
>>> # Top 3 per dataset (flat list)
>>> top_per_ds = result.top(3, group_by='dataset_name')
>>> ds1 = [r for r in top_per_ds if r['group_key'] == ('my_dataset',)]
>>>
>>> # Top 3 per dataset (grouped dict)
>>> grouped = result.top(3, group_by='dataset_name', return_grouped=True)
>>> for key, results in grouped.items():
...     print(f"{key}: {len(results)} results")
>>>
>>> # Multi-column grouping: top 2 per (dataset, model) combination
>>> top_per_combo = result.top(2, group_by=['dataset_name', 'model_name'])
>>> # Group keys are tuples: ('wheat', 'PLSRegression'), ('corn', 'RandomForest')
>>> for r in top_per_combo:
...     dataset, model = r['group_key']
...     print(f"{dataset}/{model}: {r['test_score']:.4f}")
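The grouping semantics (tuple-valued group_key, flat list versus grouped dict) can be reproduced on plain dicts. The sketch below is not how nirs4all ranks predictions internally: `top_per_group`, the `val_score` field, and the sample records are hypothetical, and ranking is assumed to be ascending (lower is better, as for RMSE).

```python
from collections import defaultdict


def top_per_group(entries, n, group_by, score_key="val_score",
                  ascending=True, return_grouped=False):
    """Return the top-n entries per group (illustrative stand-in)."""
    # A single column name and a list of names are both accepted.
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    groups = defaultdict(list)
    for entry in entries:
        # Group keys are always tuples, even for a single column.
        groups[tuple(entry[k] for k in keys)].append(entry)
    grouped = {}
    for group_key, members in groups.items():
        ranked = sorted(members, key=lambda e: e[score_key],
                        reverse=not ascending)[:n]
        # Attach the group key so a flat result can still be filtered.
        grouped[group_key] = [{**e, "group_key": group_key} for e in ranked]
    if return_grouped:
        return grouped
    return [e for members in grouped.values() for e in members]


entries = [
    {"dataset_name": "wheat", "model_name": "PLS", "val_score": 0.30},
    {"dataset_name": "wheat", "model_name": "RF", "val_score": 0.25},
    {"dataset_name": "corn", "model_name": "PLS", "val_score": 0.40},
]
flat = top_per_group(entries, 1, "dataset_name")
by_combo = top_per_group(entries, 2, ["dataset_name", "model_name"],
                         return_grouped=True)
print([e["group_key"] for e in flat])
print(sorted(by_combo))
```

Because the key is a tuple even for a single column, filtering a flat result compares against `('wheat',)` rather than `'wheat'`.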
- validate(check_nan_metrics: bool = True, check_empty: bool = True, raise_on_failure: bool = True, nan_threshold: float = 0.0) -> dict[str, Any]
Validate the run result for common issues.
Checks for NaN values in metrics, empty predictions, and other issues that might indicate problems with the pipeline execution.
- Parameters:
check_nan_metrics – If True, check for NaN values in metrics.
check_empty – If True, check for empty predictions.
raise_on_failure – If True, raise ValueError on validation failure.
nan_threshold – Maximum allowed ratio of predictions with NaN metrics (0.0 = none allowed).
- Returns:
Dictionary with validation results:
- valid: True if all checks passed.
- issues: List of issue descriptions.
- nan_count: Number of predictions with NaN metrics.
- total_count: Total number of predictions.
- Raises:
ValueError – If raise_on_failure=True and validation fails.
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> result.validate()  # Raises if issues found
>>> # Or check without raising
>>> report = result.validate(raise_on_failure=False)
>>> if not report['valid']:
...     print(f"Issues: {report['issues']}")
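How nan_threshold interacts with raise_on_failure may be easier to see on plain data. This sketch is a stand-in under assumptions: it checks a single `test_score` field for NaN and fails only when the NaN ratio strictly exceeds the threshold; the real method may inspect other metrics or compare differently. `validate_entries` is a hypothetical helper.

```python
import math


def validate_entries(entries, nan_threshold=0.0, raise_on_failure=True,
                     score_key="test_score"):
    """Build a validation report for a list of prediction dicts (illustrative)."""
    issues = []
    total = len(entries)
    if total == 0:
        issues.append("no predictions available")
    nan_count = sum(
        1 for entry in entries
        if isinstance(entry.get(score_key), float) and math.isnan(entry[score_key])
    )
    # Assumption: fail only when the NaN ratio is strictly above the threshold.
    if total and nan_count / total > nan_threshold:
        issues.append(f"{nan_count}/{total} predictions have NaN metrics")
    report = {"valid": not issues, "issues": issues,
              "nan_count": nan_count, "total_count": total}
    if issues and raise_on_failure:
        raise ValueError("; ".join(issues))
    return report


entries = [{"test_score": 0.1}, {"test_score": float("nan")}]
strict = validate_entries(entries, raise_on_failure=False)
lenient = validate_entries(entries, nan_threshold=0.5, raise_on_failure=False)
print(strict["valid"], lenient["valid"])
```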
- class nirs4all.api.Session(pipeline: List[Any] | None = None, name: str = '', **runner_kwargs: Any)
Bases: object
Execution session for resource reuse and stateful pipeline management.
A session can be used in two modes:
Resource sharing mode (no pipeline): Share a PipelineRunner across multiple nirs4all.run() calls.
Stateful pipeline mode (with pipeline): Manage a single pipeline’s lifecycle: train, predict, save, load.
- name
Session/pipeline name for identification.
- pipeline
Pipeline definition (if in stateful mode).
- status
Current session status (‘initialized’, ‘trained’, ‘error’).
- is_trained
Whether the pipeline has been trained.
- runner
The shared PipelineRunner instance.
- workspace_path
Path to the workspace directory.
- Example (resource sharing):
>>> with nirs4all.session(verbose=1) as s:
...     result1 = nirs4all.run(pipeline1, data1, session=s)
...     result2 = nirs4all.run(pipeline2, data2, session=s)
- Example (stateful pipeline):
>>> session = nirs4all.Session(pipeline=pipeline, name="MyModel")
>>> result = session.run("sample_data/regression")
>>> predictions = session.predict(new_data)
>>> session.save("exports/my_model.n4a")
- __exit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None
Exit the session context and clean up resources.
- close() -> None
Clean up session resources.
Called automatically when exiting a context manager block.
- predict(dataset: str | Path | Any, **kwargs: Any) -> PredictResult
Make predictions using the trained pipeline.
- Parameters:
dataset – Data to predict on. Can be:
- Path to data folder
- Numpy array X
- Dict with ‘X’ key
**kwargs – Additional arguments for prediction.
- Returns:
PredictResult with predictions.
- Raises:
ValueError – If session has not been trained.
- retrain(dataset: str | Path | Any, mode: str = 'full', **kwargs: Any) -> RunResult
Retrain the pipeline on new data.
- Parameters:
dataset – New dataset to train on.
mode – Retrain mode (‘full’, ‘transfer’, ‘finetune’).
**kwargs – Additional arguments for retraining.
- Returns:
RunResult from retraining.
- Raises:
ValueError – If session has not been trained.
- run(dataset: str | Path | Any, *, plots_visible: bool = False, **kwargs: Any) -> RunResult
Train the session’s pipeline on a dataset.
- Parameters:
dataset – Dataset to train on. Can be:
- Path to data folder: “sample_data/regression”
- Numpy arrays: (X, y)
- Dict: {“X”: X, “y”: y}
plots_visible – Whether to show plots during training.
**kwargs – Additional arguments passed to runner.run().
- Returns:
RunResult with predictions and metrics.
- Raises:
ValueError – If no pipeline was provided to the session.
- property runner: PipelineRunner
Get or create the shared PipelineRunner instance.
The runner is created lazily on first access.
- Returns:
The shared PipelineRunner instance.
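Lazy creation on first access is a common property pattern. The sketch below shows it with a toy class; `LazySession` and its plain-dict "runner" are placeholders and do not reflect how Session or PipelineRunner are actually implemented.

```python
class LazySession:
    """Toy stand-in for a session that builds its runner on first access."""

    def __init__(self, **runner_kwargs):
        self._runner_kwargs = runner_kwargs
        self._runner = None
        self.created = 0  # counts how many times the runner was built

    @property
    def runner(self):
        if self._runner is None:
            # Placeholder object; a real session would construct its runner here.
            self._runner = dict(self._runner_kwargs)
            self.created += 1
        return self._runner


s = LazySession(verbose=1)
first = s.runner
second = s.runner
print(first is second, s.created)
```

Deferring construction means a session that is created but never used does not pay the setup cost, and every later access shares the same instance.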
- save(path: str | Path) -> Path
Save the trained session to a bundle file.
- Parameters:
path – Output path for the .n4a bundle file.
- Returns:
Path to the saved bundle file.
- Raises:
ValueError – If session has not been trained.
- nirs4all.api.explain(model: Dict[str, Any] | str | Path, data: str | Path | ndarray | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, name: str = 'explain_dataset', session: Session | None = None, verbose: int = 1, plots_visible: bool = True, n_samples: int | None = None, explainer_type: str = 'auto', **shap_params: Any) -> ExplainResult
Generate SHAP explanations for a trained model.
This function provides a simple interface for computing SHAP values to explain model predictions. It supports various SHAP explainer types and generates visualizations.
- Parameters:
model – Trained model specification. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – Data to explain. Can be:
- Path to data folder: "test_data/"
- Numpy array: X_test (n_samples, n_features)
- Dict: {"X": X, "metadata": meta}
- SpectroDataset instance
name – Name for the explanation dataset (for logging). Default: “explain_dataset”
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
plots_visible – Whether to display plots interactively. Default: True
n_samples – Number of background samples for SHAP. If None, uses default (typically 100-200).
explainer_type – SHAP explainer type. Default: “auto”. Options:
- “auto”: Automatically select best explainer
- “tree”: TreeExplainer (for tree-based models)
- “kernel”: KernelExplainer (model-agnostic)
- “deep”: DeepExplainer (for neural networks)
- “linear”: LinearExplainer (for linear models)
**shap_params – Additional SHAP configuration parameters. Common options:
- feature_names: List of feature names
- background_samples: Number of background samples
- max_display: Max features to show in plots
- Returns:
ExplainResult containing:
- shap_values: SHAP values array or Explanation object
- feature_names: Names/labels of features
- base_value: Expected value (baseline prediction)
- visualizations: Paths to generated plots
- mean_abs_shap: Mean absolute SHAP per feature
- top_features: Features sorted by importance
Use result.get_feature_importance() for importance ranking, or result.to_dataframe() for pandas DataFrame output.
- Raises:
ValueError – If model specification is invalid.
FileNotFoundError – If model bundle or data path doesn’t exist.
ImportError – If SHAP is not installed.
Examples
Explain an exported model:
>>> import nirs4all
>>>
>>> result = nirs4all.explain(
...     model="exports/wheat_model.n4a",
...     data=X_test
... )
>>> print(f"Top 5 features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance(top_n=10)
Explain using a result from a previous run:
>>> # Training
>>> train_result = nirs4all.run(pipeline, train_data)
>>>
>>> # Explain best model
>>> explain_result = nirs4all.explain(
...     model=train_result.best,
...     data=X_test,
...     explainer_type="kernel"
... )
Get SHAP values as DataFrame:
>>> result = nirs4all.explain(model, data)
>>> df = result.to_dataframe()
>>> df.to_csv("shap_values.csv")
Get per-sample explanations:
>>> result = nirs4all.explain(model, data)
>>> sample_0_shap = result.get_sample_explanation(0)
>>> for feature, value in list(sample_0_shap.items())[:5]:
...     print(f"{feature}: {value:.4f}")
See also
nirs4all.run(): Train a pipeline
nirs4all.predict(): Make predictions
nirs4all.api.result.ExplainResult: Result class
- nirs4all.api.load_session(path: str | Path) -> Session
Load a session from a saved bundle file.
- Parameters:
path – Path to .n4a bundle file.
- Returns:
Session ready for prediction.
Example
>>> session = nirs4all.load_session("exports/model.n4a")
>>> predictions = session.predict(new_data)
- nirs4all.api.predict(model: Dict[str, Any] | str | Path | None = None, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | None = None, *, chain_id: str | None = None, workspace_path: str | Path | None = None, name: str = 'prediction_dataset', all_predictions: bool = False, session: Session | None = None, verbose: int = 0, **runner_kwargs: Any) -> PredictResult
Make predictions with a trained model on new data.
This function provides a simple interface for running inference with trained nirs4all pipelines.
Two prediction paths are supported:
Store-based (preferred) – pass chain_id together with a raw numpy array for data:
>>> result = nirs4all.predict(chain_id="abc123", data=X_new)
Model-based (legacy) – pass model together with data:
>>> result = nirs4all.predict(model="exports/model.n4a", data=X_new)
- Parameters:
model – Trained model specification. Mutually exclusive with chain_id. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – Data to predict on. Can be:
- Path to data folder: "new_data/"
- Numpy array: X_new (n_samples, n_features)
- Tuple: (X,) or (X, y) for evaluation
- Dict: {"X": X, "metadata": meta}
- SpectroDataset instance
chain_id – Chain identifier in the workspace DuckDB store. When provided, uses the fast store-based replay path. Mutually exclusive with model.
workspace_path – Workspace root directory. Required when using chain_id outside a session. Ignored when a session is provided (the session’s workspace is used instead).
name – Name for the prediction dataset (for logging). Default: “prediction_dataset”
all_predictions – If True, return predictions from all folds. If False (default), return single aggregated prediction.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 0
**runner_kwargs – Additional PipelineRunner parameters. Common options: plots_visible
- Returns:
PredictResult containing:
- y_pred: Predicted values array (n_samples,)
- metadata: Additional prediction metadata
- model_name: Name of the model used
- preprocessing_steps: List of preprocessing steps applied
Use result.to_dataframe() for pandas DataFrame output.
- Raises:
ValueError – If neither model nor chain_id is provided, or if both are provided.
FileNotFoundError – If model bundle or data path doesn’t exist.
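The "exactly one of model or chain_id" rule behind the ValueError can be expressed in a couple of lines. This is a generic argument-validation sketch, not the library's code; `resolve_prediction_source` and its return labels are hypothetical.

```python
def resolve_prediction_source(model=None, chain_id=None):
    """Return which prediction path applies, or raise if the choice is ambiguous."""
    # Both missing or both given are equally invalid.
    if (model is None) == (chain_id is None):
        raise ValueError("Provide exactly one of 'model' or 'chain_id'.")
    if chain_id is not None:
        return ("store", chain_id)
    return ("model", model)


store_path = resolve_prediction_source(chain_id="abc123")
model_path = resolve_prediction_source(model="exports/model.n4a")
print(store_path, model_path)
```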
Examples
Predict from a stored chain (preferred):
>>> import nirs4all
>>> result = nirs4all.predict(chain_id="abc123", data=X_new)
Predict from an exported bundle:
>>> result = nirs4all.predict(
...     model="exports/wheat_model.n4a",
...     data=X_new
... )
Predict using a result from a previous run:
>>> train_result = nirs4all.run(pipeline, train_data)
>>> pred_result = nirs4all.predict(
...     model=train_result.best,
...     data=X_test
... )
See also
nirs4all.run(): Train a pipeline
nirs4all.explain(): Generate SHAP explanations
nirs4all.api.result.PredictResult: Result class
- nirs4all.api.retrain(source: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, mode: str = 'full', name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, **kwargs: Any) -> RunResult
Retrain a pipeline on new data.
This function enables retraining trained pipelines with various modes, allowing for full retraining, transfer learning, or fine-tuning.
- Parameters:
source – Pipeline source to retrain from. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – New dataset to train on. Can be:
- Path to data folder: "new_data/"
- Numpy arrays: (X, y)
- Dict: {"X": X, "y": y}
- SpectroDataset instance
mode – Retrain mode. Default: “full”. Options:
- “full”: Train everything from scratch (same pipeline structure)
- “transfer”: Use existing preprocessing, train new model
- “finetune”: Continue training existing model
name – Name for the retrain dataset (for logging). Default: “retrain_dataset”
new_model – Optional new model for transfer mode. Replaces the original model while keeping preprocessing.
epochs – Optional number of epochs for fine-tuning neural networks.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
save_artifacts – Whether to save retrained artifacts. Default: True
**kwargs – Additional retraining parameters:
- learning_rate: Learning rate for fine-tuning
- freeze_layers: List of layers to freeze during fine-tuning
- step_modes: Per-step mode overrides (advanced)
- Returns:
RunResult containing:
- predictions: Predictions from the retrained pipeline
- per_dataset: Per-dataset execution details
- best: Best prediction entry
- best_score: Best model’s primary test score
- Raises:
ValueError – If mode is invalid or source cannot be resolved.
FileNotFoundError – If source references files that don’t exist.
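Rejecting an unknown mode string with ValueError is commonly done by parsing it into an enum. The sketch below illustrates that pattern with the three documented mode names; `Mode` and `parse_mode` are hypothetical stand-ins and make no claim about how nirs4all.pipeline.RetrainMode is defined.

```python
from enum import Enum


class Mode(Enum):
    """Stand-in enum holding the three documented mode strings."""
    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


def parse_mode(mode):
    """Convert a mode string to the enum, with a readable error on failure."""
    try:
        return Mode(mode)
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(
            f"Invalid retrain mode {mode!r}; expected one of: {valid}"
        ) from None


parsed = parse_mode("transfer")
print(parsed)
```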
Examples
Full retrain on new data:
>>> import nirs4all
>>>
>>> # Original training
>>> original = nirs4all.run(pipeline, train_data)
>>>
>>> # Retrain on new data with same pipeline
>>> retrained = nirs4all.retrain(
...     source=original.best,
...     data=new_train_data,
...     mode="full"
... )
>>> print(f"Original: {original.best_rmse:.4f}")
>>> print(f"Retrained: {retrained.best_rmse:.4f}")
Transfer learning with new model:
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> result = nirs4all.retrain(
...     source="exports/pls_model.n4a",
...     data=new_data,
...     mode="transfer",
...     new_model=RandomForestRegressor(n_estimators=100)
... )
Fine-tune a neural network:
>>> result = nirs4all.retrain(
...     source="exports/nn_model.n4a",
...     data=new_data,
...     mode="finetune",
...     epochs=10,
...     learning_rate=0.0001
... )
Retrain from an exported bundle:
>>> result = nirs4all.retrain(
...     source="exports/wheat_model.n4a",
...     data="new_wheat_data/",
...     mode="full",
...     verbose=2
... )
>>> result.export("exports/retrained_model.n4a")
See also
nirs4all.run(): Train a pipeline from scratch
nirs4all.predict(): Make predictions
nirs4all.pipeline.RetrainMode: Retrain mode enum
- nirs4all.api.run(pipeline: List[Any] | Dict[str, Any] | str | Path | PipelineConfigs | List[List[Any] | Dict[str, Any] | str | Path | PipelineConfigs], dataset: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | List[str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs], *, name: str = '', session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, save_charts: bool = True, plots_visible: bool = False, random_state: int | None = None, **runner_kwargs: Any) -> RunResult
Execute a training pipeline on a dataset.
This is the primary entry point for training ML pipelines on NIRS data. It provides a simpler interface than creating PipelineRunner and config objects directly.
- Parameters:
pipeline – Pipeline definition. Can be:
- List of steps (most common): [MinMaxScaler(), PLSRegression(10)]
- Dict with steps: {"steps": [...], "name": "my_pipeline"}
- Path to YAML/JSON config file: "configs/my_pipeline.yaml"
- PipelineConfigs object (backward compatibility)
- List of pipelines: [pipeline1, pipeline2, ...] - each pipeline is executed independently (cartesian product with datasets)
dataset – Dataset definition. Can be:
- Path to data folder: "sample_data/regression"
- Numpy arrays: (X, y) or X alone
- Dict with arrays: {"X": X, "y": y, "metadata": meta}
- SpectroDataset instance
- List of SpectroDataset instances (multi-dataset)
- DatasetConfigs object (backward compatibility)
- List of datasets: [dataset1, dataset2, ...] - each dataset is used with each pipeline (cartesian product)
name – Optional pipeline name for identification and logging. If not provided, a name will be generated.
session – Optional Session object for resource reuse across multiple runs. When provided, shares workspace and configuration.
verbose – Verbosity level (0=quiet, 1=info, 2=debug, 3=trace). Default: 1
save_artifacts – Whether to save binary artifacts (models, transformers). Default: True
save_charts – Whether to save charts and visual outputs. Default: True
plots_visible – Whether to display plots interactively. Default: False
random_state – Random seed for reproducibility. Default: None (no seeding)
**runner_kwargs – Additional PipelineRunner parameters. See PipelineRunner.__init__ for full list. Common options:
- workspace_path: Workspace root directory
- continue_on_error: Whether to continue on step failures
- show_spinner: Whether to show progress spinners
- log_file: Whether to write logs to disk
- log_format: Output format (“pretty”, “minimal”, “json”)
- show_progress_bar: Whether to show progress bars
- max_generation_count: Max pipeline combinations (for generators)
- Returns:
RunResult containing:
- predictions: Predictions object with all pipeline results
- per_dataset: Dictionary with per-dataset execution details
- best: Best prediction entry (convenience accessor)
- best_score: Best model’s primary test score
- best_rmse, best_r2, best_accuracy: Score shortcuts
Use result.top(n=5) to get top N predictions, or result.export("path.n4a") to export the best model.
- Raises:
ValueError – If pipeline or dataset format is invalid.
FileNotFoundError – If pipeline config or dataset path doesn’t exist.
Examples
Simple usage with list of steps:
>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
With cross-validation and multiple models:
>>> from sklearn.model_selection import ShuffleSplit
>>>
>>> result = nirs4all.run(
...     pipeline=[
...         MinMaxScaler(),
...         ShuffleSplit(n_splits=3),
...         {"model": PLSRegression(10)}
...     ],
...     dataset="sample_data/regression",
...     name="PLS_experiment",
...     verbose=2,
...     save_artifacts=True
... )
Multiple pipelines executed independently:
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> pipeline_pls = [MinMaxScaler(), PLSRegression(10)]
>>> pipeline_rf = [StandardScaler(), RandomForestRegressor()]
>>>
>>> result = nirs4all.run(
...     pipeline=[pipeline_pls, pipeline_rf],  # Two independent pipelines
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Total configs: {result.num_predictions}")
Cartesian product of pipelines × datasets:
>>> pipelines = [pipeline1, pipeline2, pipeline3]
>>> datasets = [dataset_a, dataset_b]
>>>
>>> # Runs 6 combinations: p1×da, p1×db, p2×da, p2×db, p3×da, p3×db
>>> result = nirs4all.run(
...     pipeline=pipelines,
...     dataset=datasets,
...     verbose=1
... )
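The six combinations listed in the comment correspond to a standard cartesian product. The following sketch enumerates them with itertools.product, using placeholder strings in place of real pipelines and datasets; whether nirs4all executes the combinations in exactly this order is an assumption based on that comment.

```python
from itertools import product

# Placeholder names standing in for real pipeline and dataset objects.
pipelines = ["pipeline1", "pipeline2", "pipeline3"]
datasets = ["dataset_a", "dataset_b"]

# Every pipeline is paired with every dataset: 3 x 2 = 6 combinations.
combos = list(product(pipelines, datasets))
for pipeline_name, dataset_name in combos:
    print(f"{pipeline_name} x {dataset_name}")
```

The number of runs grows multiplicatively, so adding one dataset to a list of ten pipelines adds ten runs.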
Using a session for multiple runs:
>>> with nirs4all.session(verbose=1) as s:
...     r1 = nirs4all.run(pipeline1, data, session=s)
...     r2 = nirs4all.run(pipeline2, data, session=s)
...     print(f"Pipeline 1: {r1.best_score:.4f}")
...     print(f"Pipeline 2: {r2.best_score:.4f}")
Export the best model:
>>> result = nirs4all.run(pipeline, dataset)
>>> result.export("exports/best_model.n4a")
See also
nirs4all.predict(): Make predictions with a trained model
nirs4all.explain(): Generate SHAP explanations
nirs4all.session(): Create execution session for resource reuse
nirs4all.PipelineRunner: Direct runner access for advanced use
- nirs4all.api.session(pipeline: List[Any] | None = None, name: str = '', **kwargs: Any) -> Generator[Session, None, None]
Create an execution session context manager.
This is a convenience function that creates a Session and yields it within a context manager block.
- Parameters:
pipeline – Optional pipeline definition for stateful mode.
name – Name for the session/pipeline.
**kwargs – Arguments passed to Session (and ultimately PipelineRunner). Common options:
- verbose (int): Verbosity level (0-3). Default: 1
- save_artifacts (bool): Save model artifacts. Default: True
- workspace_path (str|Path): Workspace directory.
- random_state (int): Random seed for reproducibility.
- Yields:
Session – The active session for use within the block.
- Example (resource sharing):
>>> with nirs4all.session(verbose=2, save_artifacts=True) as s:
...     r1 = nirs4all.run(pipeline1, data1, session=s)
...     r2 = nirs4all.run(pipeline2, data2, session=s)
...     print(f"PLS: {r1.best_score:.4f}, RF: {r2.best_score:.4f}")
- Example (stateful pipeline):
>>> with nirs4all.session(pipeline=my_pipeline, name="Demo") as s:
...     result = s.run("sample_data/regression")
...     print(f"Best score: {result.best_score:.4f}")
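A function that "creates a Session and yields it within a context manager block" matches the generator-based pattern from contextlib. The sketch below shows that pattern with a toy class; `ToySession` and `toy_session` are hypothetical and only illustrate the create, yield, clean-up flow, not the real implementation.

```python
from contextlib import contextmanager


class ToySession:
    """Minimal stand-in that only tracks whether it has been closed."""

    def __init__(self, name=""):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def toy_session(name=""):
    s = ToySession(name=name)
    try:
        yield s          # the body of the `with` block runs here
    finally:
        s.close()        # runs on normal exit and when the body raises


with toy_session(name="Demo") as s:
    inside = s.closed
after = s.closed
print(inside, after)
```

Putting the clean-up in `finally` is what guarantees resources are released even if a run inside the block fails.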