nirs4all.api.run module
Module-level run() function for nirs4all.
This module provides the primary entry point for training ML pipelines on NIRS data. It wraps PipelineRunner.run() with a simpler, more ergonomic interface.
Example
>>> import nirs4all
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
- nirs4all.api.run.run(pipeline: List[Any] | Dict[str, Any] | str | Path | PipelineConfigs | List[List[Any] | Dict[str, Any] | str | Path | PipelineConfigs], dataset: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | List[str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs], *, name: str = '', session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, save_charts: bool = True, plots_visible: bool = False, random_state: int | None = None, **runner_kwargs: Any) -> RunResult
Execute a training pipeline on a dataset.
This is the primary entry point for training ML pipelines on NIRS data. It provides a simpler interface than creating PipelineRunner and config objects directly.
- Parameters:
pipeline –
Pipeline definition. Can be:
- List of steps (most common): [MinMaxScaler(), PLSRegression(10)]
- Dict with steps: {"steps": [...], "name": "my_pipeline"}
- Path to YAML/JSON config file: "configs/my_pipeline.yaml"
- PipelineConfigs object (backward compatibility)
- List of pipelines: [pipeline1, pipeline2, ...]. Each pipeline is executed independently (cartesian product with datasets).
dataset –
Dataset definition. Can be:
- Path to data folder: "sample_data/regression"
- Numpy arrays: (X, y) or X alone
- Dict with arrays: {"X": X, "y": y, "metadata": meta}
- SpectroDataset instance
- List of SpectroDataset instances (multi-dataset)
- DatasetConfigs object (backward compatibility)
- List of datasets: [dataset1, dataset2, ...]. Each dataset is used with each pipeline (cartesian product).
name – Optional pipeline name for identification and logging. If not provided, a name will be generated.
session – Optional Session object for resource reuse across multiple runs. When provided, shares workspace and configuration.
verbose – Verbosity level (0=quiet, 1=info, 2=debug, 3=trace). Default: 1
save_artifacts – Whether to save binary artifacts (models, transformers). Default: True
save_charts – Whether to save charts and visual outputs. Default: True
plots_visible – Whether to display plots interactively. Default: False
random_state – Random seed for reproducibility. Default: None (no seeding)
**runner_kwargs –
Additional PipelineRunner parameters. See PipelineRunner.__init__ for the full list. Common options:
- workspace_path: Workspace root directory
- continue_on_error: Whether to continue on step failures
- show_spinner: Whether to show progress spinners
- log_file: Whether to write logs to disk
- log_format: Output format ("pretty", "minimal", "json")
- show_progress_bar: Whether to show progress bars
- max_generation_count: Max pipeline combinations (for generators)
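The split between the named keyword-only options and `**runner_kwargs` can be pictured with the minimal sketch below. It is not nirs4all's implementation: `FakeRunner` and `run_sketch` are hypothetical stand-ins, and the only thing taken from the parameter list above is that extra keyword arguments are meant for the runner's constructor.

```python
from typing import Any


class FakeRunner:
    """Hypothetical stand-in for a runner; it only records its constructor options."""

    def __init__(self, verbose: int = 1, **options: Any) -> None:
        self.verbose = verbose
        self.options = options


def run_sketch(pipeline: Any, dataset: Any, *, verbose: int = 1, **runner_kwargs: Any) -> FakeRunner:
    """Illustrative only: pass named options explicitly and forward the rest untouched."""
    return FakeRunner(verbose=verbose, **runner_kwargs)


runner = run_sketch(
    ["step_a", "step_b"],          # placeholder pipeline
    "sample_data/regression",
    verbose=2,
    workspace_path="workspace",    # option names taken from the list above
    continue_on_error=True,
)
print(runner.verbose)   # 2
print(runner.options)   # {'workspace_path': 'workspace', 'continue_on_error': True}
```

Because the first two parameters are followed by `*`, everything after `pipeline` and `dataset` has to be passed by keyword, which is what lets unknown keywords fall through to the runner.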
- Returns:
RunResult containing:
- predictions: Predictions object with all pipeline results
- per_dataset: Dictionary with per-dataset execution details
- best: Best prediction entry (convenience accessor)
- best_score: Best model’s primary test score
- best_rmse, best_r2, best_accuracy: Score shortcuts
Use result.top(n=5) to get the top N predictions, or result.export("path.n4a") to export the best model.
- Return type:
RunResult
- Raises:
ValueError – If pipeline or dataset format is invalid.
FileNotFoundError – If pipeline config or dataset path doesn’t exist.
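A caller that wants to handle the two documented exceptions separately might wrap the call roughly as follows. This is a sketch under assumptions: `try_run` is a hypothetical helper, and `fake_run` is a stand-in that only imitates the documented error conditions so the snippet runs without nirs4all installed.

```python
from typing import Any, Callable, Optional


def try_run(run_fn: Callable[..., Any], pipeline: Any, dataset: Any, **kwargs: Any) -> Optional[Any]:
    """Call run_fn and turn the two documented exceptions into a printed message."""
    try:
        return run_fn(pipeline=pipeline, dataset=dataset, **kwargs)
    except FileNotFoundError as exc:
        print(f"Missing pipeline config or dataset path: {exc}")
    except ValueError as exc:
        print(f"Invalid pipeline or dataset format: {exc}")
    return None


def fake_run(pipeline: Any, dataset: Any, **kwargs: Any) -> str:
    """Hypothetical stand-in for nirs4all.run; it only imitates the documented errors."""
    if not isinstance(pipeline, (list, dict, str)):
        raise ValueError("unsupported pipeline type")
    if dataset == "missing/folder":
        raise FileNotFoundError(dataset)
    return "ok"


ok_result = try_run(fake_run, ["step"], "sample_data/regression")
bad_format = try_run(fake_run, 42, "sample_data/regression")
bad_path = try_run(fake_run, ["step"], "missing/folder")
```

In real use, `nirs4all.run` would be passed as `run_fn` in place of `fake_run`.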
Examples
Simple usage with list of steps:
>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
With cross-validation and an explicit model step:
>>> from sklearn.model_selection import ShuffleSplit
>>>
>>> result = nirs4all.run(
...     pipeline=[
...         MinMaxScaler(),
...         ShuffleSplit(n_splits=3),
...         {"model": PLSRegression(10)}
...     ],
...     dataset="sample_data/regression",
...     name="PLS_experiment",
...     verbose=2,
...     save_artifacts=True
... )
Multiple pipelines executed independently:
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.preprocessing import StandardScaler
>>>
>>> pipeline_pls = [MinMaxScaler(), PLSRegression(10)]
>>> pipeline_rf = [StandardScaler(), RandomForestRegressor()]
>>>
>>> result = nirs4all.run(
...     pipeline=[pipeline_pls, pipeline_rf],  # Two independent pipelines
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Total configs: {result.num_predictions}")
Cartesian product of pipelines × datasets:
>>> pipelines = [pipeline1, pipeline2, pipeline3]
>>> datasets = [dataset_a, dataset_b]
>>>
>>> # Runs 6 combinations: p1×da, p1×db, p2×da, p2×db, p3×da, p3×db
>>> result = nirs4all.run(
...     pipeline=pipelines,
...     dataset=datasets,
...     verbose=1
... )
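The pairing described in the comment above can be enumerated with the standard library. The sketch below uses placeholder strings instead of real pipelines and datasets, and it assumes the pairs are formed pipeline-first, as the comment lists them; how nirs4all schedules them internally is not stated here.

```python
from itertools import product

# Placeholder labels standing in for real pipeline and dataset definitions.
pipelines = ["pipeline1", "pipeline2", "pipeline3"]
datasets = ["dataset_a", "dataset_b"]

# Every pipeline is paired with every dataset: 3 x 2 = 6 combinations.
combinations = list(product(pipelines, datasets))
for pipeline_name, dataset_name in combinations:
    print(f"{pipeline_name} x {dataset_name}")

print(len(combinations))  # 6
```

The number of runs grows multiplicatively, so adding one more dataset to this example would raise the total from 6 to 9.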
Using a session for multiple runs:
>>> with nirs4all.session(verbose=1) as s:
...     r1 = nirs4all.run(pipeline1, data, session=s)
...     r2 = nirs4all.run(pipeline2, data, session=s)
...     print(f"Pipeline 1: {r1.best_score:.4f}")
...     print(f"Pipeline 2: {r2.best_score:.4f}")
Export the best model:
>>> result = nirs4all.run(pipeline, dataset)
>>> result.export("exports/best_model.n4a")
See also
- nirs4all.predict(): Make predictions with a trained model
- nirs4all.explain(): Generate SHAP explanations
- nirs4all.session(): Create execution session for resource reuse
- nirs4all.PipelineRunner: Direct runner access for advanced use