nirs4all.api package

Module contents

NIRS4All API Module - High-level functional interface.

This module provides the primary public API for nirs4all, offering simple function-based entry points that wrap the underlying PipelineRunner.

Public API:
run(pipeline, dataset, **kwargs) -> RunResult

Execute a training pipeline on a dataset.

predict(model, data, **kwargs) -> PredictResult

Make predictions with a trained model.

explain(model, data, **kwargs) -> ExplainResult

Generate SHAP explanations for model predictions.

retrain(source, data, **kwargs) -> RunResult

Retrain a pipeline on new data.

session(**kwargs) -> Session

Create an execution session for resource reuse.

generate(n_samples, **kwargs) -> SpectroDataset | (X, y)

Generate synthetic NIRS data for testing and research.

Example

>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")

For more examples, see the examples/Q40_new_api.py file.

class nirs4all.api.ExplainResult(shap_values: Any, feature_names: List[str] | None = None, base_value: float | ndarray | None = None, visualizations: Dict[str, Path] = <factory>, explainer_type: str = 'auto', model_name: str = '', n_samples: int = 0)[source]

Bases: object

Result from nirs4all.explain().

Wraps SHAP explanation outputs with visualization helpers and accessors.

shap_values

SHAP values array or Explanation object.

Type:

Any

feature_names

Names/labels of features explained.

Type:

List[str] | None

base_value

Expected value (baseline prediction).

Type:

float | numpy.ndarray | None

visualizations

Paths to generated visualization files.

Type:

Dict[str, pathlib.Path]

explainer_type

Type of SHAP explainer used.

Type:

str

model_name

Name of the explained model.

Type:

str

n_samples

Number of samples explained.

Type:

int

Properties:

  • values: Raw SHAP values array.

  • shape: Shape of SHAP values array.

  • mean_abs_shap: Mean absolute SHAP values per feature.

  • top_features: Feature names sorted by importance.

get_feature_importance()[source]

Get feature importance ranking.

get_sample_explanation(idx)[source]

Get explanation for a single sample.

to_dataframe()[source]

Get SHAP values as DataFrame.

Example

>>> result = nirs4all.explain(model, X_test)
>>> print(f"Top features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance()
__post_init__()[source]

Extract metadata from shap_values if available.

__repr__() str[source]

String representation.

__str__() str[source]

User-friendly string representation.

base_value: float | ndarray | None = None
explainer_type: str = 'auto'
feature_names: List[str] | None = None
get_feature_importance(top_n: int | None = None, normalize: bool = False) Dict[str, float][source]

Get feature importance ranking.

Parameters:
  • top_n – If provided, return only top N features.

  • normalize – If True, normalize values to sum to 1.

Returns:

Dictionary mapping feature names to importance values.
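
The ranking described here can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes importance is the mean absolute SHAP value per feature (as the mean_abs_shap property suggests) and that normalization is applied over all features before truncating to top_n. The helper name feature_importance and the toy values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def feature_importance(
    shap_rows: Sequence[Sequence[float]],
    feature_names: Optional[List[str]] = None,
    top_n: Optional[int] = None,
    normalize: bool = False,
) -> Dict[str, float]:
    """Hypothetical helper: rank features by mean absolute SHAP value."""
    n_samples = len(shap_rows)
    n_features = len(shap_rows[0])
    # Mean |SHAP| per feature (column-wise average of absolute values).
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(n_features)
    ]
    if normalize:
        total = sum(mean_abs)
        if total > 0:
            mean_abs = [v / total for v in mean_abs]
    # Fall back to stringified indices when no names are given.
    names = feature_names or [str(j) for j in range(n_features)]
    ranked = sorted(zip(names, mean_abs), key=lambda kv: kv[1], reverse=True)
    # A slice with top_n=None keeps every feature.
    return dict(ranked[:top_n])


toy_shap = [[0.1, -0.4, 0.0], [-0.3, 0.2, 0.1]]
importance = feature_importance(
    toy_shap, ["1100nm", "1200nm", "1300nm"], normalize=True
)
```

With normalize=True the returned values sum to 1, so they can be read as each feature's share of the total attribution.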

get_sample_explanation(idx: int) Dict[str, float][source]

Get SHAP explanation for a single sample.

Parameters:

idx – Sample index.

Returns:

Dictionary mapping feature names to SHAP values for that sample.

property mean_abs_shap: ndarray

Get mean absolute SHAP values per feature.

Returns:

1D array of mean |SHAP| values, one per feature.

model_name: str = ''
n_samples: int = 0
shap_values: Any
property shape: tuple

Get shape of SHAP values array.

to_dataframe(include_feature_names: bool = True)[source]

Get SHAP values as pandas DataFrame.

Parameters:

include_feature_names – If True, use feature names as columns.

Returns:

pandas DataFrame with SHAP values.

Raises:

ImportError – If pandas is not available.
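
An optional dependency such as pandas is commonly imported only when the method that needs it is called. The sketch below shows that general pattern; the helper require_optional is hypothetical, and whether nirs4all handles the import this way is an assumption.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency or raise a descriptive ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that names the feature needing the module.
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'."
        ) from exc


# Succeeds for any installed module; the stdlib json module is used here
# only so the example runs without extra packages.
json_module = require_optional("json", "demo feature")
```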

property top_features: List[str]

Get feature names sorted by importance (descending).

Returns:

List of feature names, most important first. Returns indices as strings if feature_names not available.

property values: ndarray

Get raw SHAP values array.

Returns:

Numpy array of SHAP values (n_samples, n_features).

visualizations: Dict[str, Path]
class nirs4all.api.PredictResult(y_pred: ndarray, metadata: Dict[str, Any] = <factory>, sample_indices: ndarray | None = None, model_name: str = '', preprocessing_steps: List[str] = <factory>)[source]

Bases: object

Result from nirs4all.predict().

Wraps prediction outputs with convenient accessors and conversion methods.

y_pred

Predicted values array (n_samples,) or (n_samples, n_outputs).

Type:

numpy.ndarray

metadata

Additional prediction metadata (uncertainty, timing, etc.).

Type:

Dict[str, Any]

sample_indices

Optional indices of predicted samples.

Type:

numpy.ndarray | None

model_name

Name of the model used for prediction.

Type:

str

preprocessing_steps

List of preprocessing steps applied.

Type:

List[str]

Properties:

  • values: Alias for y_pred (for consistency).

  • shape: Shape of prediction array.

  • is_multioutput: True if predictions have multiple outputs.

to_numpy()[source]

Get predictions as numpy array.

to_list()[source]

Get predictions as Python list.

to_dataframe()[source]

Get predictions as pandas DataFrame.

flatten()[source]

Get flattened 1D predictions.

Example

>>> result = nirs4all.predict(model, X_new)
>>> print(f"Predictions shape: {result.shape}")
>>> df = result.to_dataframe()
__len__() int[source]

Return number of predictions.

__post_init__()[source]

Ensure y_pred is a numpy array.

__repr__() str[source]

String representation.

__str__() str[source]

User-friendly string representation.

flatten() ndarray[source]

Get flattened 1D predictions.

Returns:

1D numpy array of predictions.

property is_multioutput: bool

Check if predictions have multiple outputs.

metadata: Dict[str, Any]
model_name: str = ''
preprocessing_steps: List[str]
sample_indices: ndarray | None = None
property shape: tuple

Get shape of prediction array.

to_dataframe(include_indices: bool = True)[source]

Get predictions as pandas DataFrame.

Parameters:

include_indices – If True and sample_indices available, include as column.

Returns:

pandas DataFrame with predictions.

Raises:

ImportError – If pandas is not available.

to_list() List[float][source]

Get predictions as Python list.

Returns:

List of prediction values (flattened if 2D).

to_numpy() ndarray[source]

Get predictions as numpy array.

Returns:

Numpy array of predictions.

property values: ndarray

Get prediction values (alias for y_pred).

y_pred: ndarray
class nirs4all.api.RunResult(predictions: Predictions, per_dataset: Dict[str, Any], _runner: PipelineRunner | None = None)[source]

Bases: object

Result from nirs4all.run().

Provides convenient access to predictions, best model, and artifacts. Wraps the raw (predictions, per_dataset) tuple returned by PipelineRunner.run().

predictions

Predictions object containing all pipeline results.

Type:

Predictions

per_dataset

Dictionary with per-dataset execution details.

Type:

Dict[str, Any]

Properties:

  • best: Best prediction entry by default ranking.

  • best_score: Best model’s primary test score.

  • best_rmse: Best model’s RMSE (regression).

  • best_r2: Best model’s R² (regression).

  • best_accuracy: Best model’s accuracy (classification).

  • artifacts_path: Path to run artifacts directory.

  • num_predictions: Total number of predictions stored.

top(n)[source]

Get top N predictions by ranking.

export(path)[source]

Export best model to .n4a bundle.

filter(**kwargs)[source]

Filter predictions by criteria.

get_datasets()[source]

Get list of unique dataset names.

get_models()[source]

Get list of unique model names.

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
>>> print(f"Best R²: {result.best_r2:.4f}")
>>> result.export("exports/best_model.n4a")
__repr__() str[source]

String representation.

__str__() str[source]

User-friendly string representation.

property artifacts_path: Path | None

Get path to run artifacts directory.

Returns:

Path to the current run directory, or None if not available.

property best: Dict[str, Any]

Get best prediction entry by default ranking.

Returns:

Dictionary containing best model’s metrics, name, and configuration. Empty dict if no predictions available.

property best_accuracy: float

Get best model’s accuracy score (for classification).

Returns:

Accuracy value or NaN if unavailable.

property best_r2: float

Get best model’s R² score.

Looks for ‘r2’ in scores dict.

Returns:

R² value or NaN if unavailable.

property best_rmse: float

Get best model’s RMSE score.

Looks for ‘rmse’ in scores dict, then falls back to computing from y arrays.

Returns:

RMSE value or NaN if unavailable.
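
The lookup order described for best_rmse (stored score first, then recomputation from the y arrays, then NaN) can be sketched as follows. The entry layout with "scores", "y_true" and "y_pred" keys is an assumption made for illustration, and rmse_from_entry is a hypothetical helper, not a nirs4all function.

```python
import math
from typing import Any, Dict


def rmse_from_entry(entry: Dict[str, Any]) -> float:
    """Hypothetical lookup: stored score first, then recompute, else NaN."""
    scores = entry.get("scores") or {}
    if "rmse" in scores:
        return float(scores["rmse"])
    y_true, y_pred = entry.get("y_true"), entry.get("y_pred")
    # Without two equally sized, non-empty sequences there is nothing to compute.
    if not y_true or not y_pred or len(y_true) != len(y_pred):
        return math.nan
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


stored = rmse_from_entry({"scores": {"rmse": 0.5}})
recomputed = rmse_from_entry({"y_true": [1.0, 2.0, 3.0], "y_pred": [1.0, 2.0, 5.0]})
missing = rmse_from_entry({})
```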

property best_score: float

Get best model’s primary test score.

Returns:

The test_score value from best prediction, or NaN if unavailable.

export(output_path: str | Path, format: str = 'n4a', source: Dict[str, Any] | None = None) Path[source]

Export a model to a bundle.

Parameters:
  • output_path – Path for the exported bundle file.

  • format – Export format (‘n4a’ or ‘n4a.py’).

  • source – Prediction dict to export. If None, exports best model.

Returns:

Path to the exported bundle file.

Raises:
  • RuntimeError – If runner reference is not available.

  • ValueError – If no predictions available and source not provided.

export_model(output_path: str | Path, source: Dict[str, Any] | None = None, format: str | None = None, fold: int | None = None) Path[source]

Export only the model artifact (lightweight).

Unlike export() which creates a full bundle, this exports just the model.

Parameters:
  • output_path – Path for the output model file.

  • source – Prediction dict to export. If None, exports best model.

  • format – Model format (inferred from extension if None).

  • fold – Fold index to export (default: fold 0).

Returns:

Path to the exported model file.

Raises:

RuntimeError – If runner reference is not available.

filter(**kwargs) List[Dict[str, Any]][source]

Filter predictions by criteria.

Parameters:

**kwargs – Filter criteria passed to predictions.filter_predictions(). Supported kwargs include:

  • dataset_name: Filter by dataset name

  • model_name: Filter by model name

  • partition: Filter by partition (‘train’, ‘val’, ‘test’)

  • fold_id: Filter by fold ID

  • step_idx: Filter by pipeline step index

  • branch_id: Filter by branch ID

  • load_arrays: If True, load actual arrays (default: True)

Returns:

List of matching prediction dictionaries.

get_datasets() List[str][source]

Get list of unique dataset names.

Returns:

List of dataset names in predictions.

get_models() List[str][source]

Get list of unique model names.

Returns:

List of model names in predictions.

property num_predictions: int

Get total number of predictions stored.

Returns:

Number of prediction entries.

per_dataset: Dict[str, Any]
predictions: Predictions
summary() str[source]

Get a summary string of the run result.

Returns:

Multi-line summary string with key metrics.

top(n: int = 5, **kwargs) List[Dict[str, Any]] | Dict[tuple, List[Dict[str, Any]]][source]

Get top N predictions by ranking.

Parameters:
  • n – Number of top predictions to return. When group_by is used, this means top N per group (e.g., top 3 per dataset).

  • **kwargs – Additional arguments passed to predictions.top(). Supported kwargs include:

    • rank_metric: Metric to rank by (default: uses record’s metric)

    • rank_partition: Partition to rank on (default: “val”)

    • display_partition: Partition for display metrics (default: “test”)

    • aggregate_partitions: If True, include train/val/test data

    • ascending: Sort order (None = infer from metric)

    • group_by: Group predictions by column(s). Returns top N per group. Each result includes ‘group_key’ for easy filtering.

    • return_grouped: If True with group_by, return dict of group->results instead of flat list. Default: False.

Returns:

  • If return_grouped=False (default): List of prediction dicts, ranked by score. With group_by, returns top N per group as a flat list.

  • If return_grouped=True: Dict mapping group keys to lists of predictions.

Examples

>>> # Top 5 overall
>>> result.top(5)
>>>
>>> # Top 3 per dataset (flat list)
>>> top_per_ds = result.top(3, group_by='dataset_name')
>>> ds1 = [r for r in top_per_ds if r['group_key'] == ('my_dataset',)]
>>>
>>> # Top 3 per dataset (grouped dict)
>>> grouped = result.top(3, group_by='dataset_name', return_grouped=True)
>>> for key, results in grouped.items():
...     print(f"{key}: {len(results)} results")
>>>
>>> # Multi-column grouping: top 2 per (dataset, model) combination
>>> top_per_combo = result.top(2, group_by=['dataset_name', 'model_name'])
>>> # Group keys are tuples: ('wheat', 'PLSRegression'), ('corn', 'RandomForest')
>>> for r in top_per_combo:
...     dataset, model = r['group_key']
...     print(f"{dataset}/{model}: {r['test_score']:.4f}")
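
The grouping semantics documented for top() (tuple group keys, top N per group, flat list or grouped dict) can be reproduced on plain dictionaries. This is a sketch under stated assumptions: top_per_group is a hypothetical helper, the "val_score" key and the lower-is-better default are illustrative choices, and the real method delegates to predictions.top().

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple, Union

Entry = Dict[str, Any]


def top_per_group(
    entries: List[Entry],
    n: int,
    group_by: Union[str, Sequence[str]],
    score_key: str = "val_score",
    ascending: bool = True,
    return_grouped: bool = False,
) -> Union[List[Entry], Dict[Tuple, List[Entry]]]:
    """Hypothetical re-implementation of the documented grouping semantics."""
    columns = [group_by] if isinstance(group_by, str) else list(group_by)
    groups: Dict[Tuple, List[Entry]] = defaultdict(list)
    for entry in entries:
        key = tuple(entry[c] for c in columns)  # group keys are always tuples
        groups[key].append({**entry, "group_key": key})
    # Keep the n best rows of every group, best first.
    grouped = {
        key: sorted(rows, key=lambda r: r[score_key], reverse=not ascending)[:n]
        for key, rows in groups.items()
    }
    if return_grouped:
        return grouped
    return [row for rows in grouped.values() for row in rows]


runs = [
    {"dataset_name": "wheat", "model_name": "PLS", "val_score": 0.42},
    {"dataset_name": "wheat", "model_name": "RF", "val_score": 0.35},
    {"dataset_name": "corn", "model_name": "PLS", "val_score": 0.51},
]
best_per_dataset = top_per_group(runs, n=1, group_by="dataset_name")
```

Even with a single grouping column the key is a one-element tuple such as ('wheat',), which matches the filtering idiom in the examples above.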
validate(check_nan_metrics: bool = True, check_empty: bool = True, raise_on_failure: bool = True, nan_threshold: float = 0.0) Dict[str, Any][source]

Validate the run result for common issues.

Checks for NaN values in metrics, empty predictions, and other issues that might indicate problems with the pipeline execution.

Parameters:
  • check_nan_metrics – If True, check for NaN values in metrics.

  • check_empty – If True, check for empty predictions.

  • raise_on_failure – If True, raise ValueError on validation failure.

  • nan_threshold – Maximum allowed ratio of predictions with NaN metrics (0.0 = none allowed).

Returns:

Dictionary with validation results:

  • valid: True if all checks passed.

  • issues: List of issue descriptions.

  • nan_count: Number of predictions with NaN metrics.

  • total_count: Total number of predictions.

Raises:

ValueError – If raise_on_failure=True and validation fails.

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> result.validate()  # Raises if issues found
>>> # Or check without raising
>>> report = result.validate(raise_on_failure=False)
>>> if not report['valid']:
...     print(f"Issues: {report['issues']}")
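
How nan_threshold interacts with the report keys can be illustrated on a list of plain prediction dicts. This is a hedged sketch: validate_entries is a hypothetical helper, it inspects only the test_score field, and it assumes the check fails when the NaN ratio strictly exceeds the threshold.

```python
import math
from typing import Any, Dict, List


def validate_entries(
    entries: List[Dict[str, Any]],
    nan_threshold: float = 0.0,
    raise_on_failure: bool = True,
    score_key: str = "test_score",
) -> Dict[str, Any]:
    """Hypothetical check mirroring the documented report keys."""
    issues: List[str] = []
    total = len(entries)
    nan_count = sum(
        1 for e in entries
        if isinstance(e.get(score_key), float) and math.isnan(e[score_key])
    )
    if total == 0:
        issues.append("No predictions available")
    elif nan_count / total > nan_threshold:
        issues.append(f"{nan_count}/{total} predictions have NaN metrics")
    report = {
        "valid": not issues,
        "issues": issues,
        "nan_count": nan_count,
        "total_count": total,
    }
    if issues and raise_on_failure:
        raise ValueError("; ".join(issues))
    return report


entries = [{"test_score": 0.31}, {"test_score": float("nan")}]
# Half of the entries are NaN, which a threshold of 0.5 still tolerates.
lenient = validate_entries(entries, nan_threshold=0.5, raise_on_failure=False)
# The default threshold of 0.0 allows no NaN metrics at all.
strict = validate_entries(entries, raise_on_failure=False)
```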
class nirs4all.api.Session(pipeline: List[Any] | None = None, name: str = '', **runner_kwargs: Any)[source]

Bases: object

Execution session for resource reuse and stateful pipeline management.

A session can be used in two modes:

  1. Resource sharing mode (no pipeline): Share a PipelineRunner across multiple nirs4all.run() calls.

  2. Stateful pipeline mode (with pipeline): Manage a single pipeline’s lifecycle: train, predict, save, load.

name

Session/pipeline name for identification.

pipeline

Pipeline definition (if in stateful mode).

status

Current session status (‘initialized’, ‘trained’, ‘error’).

is_trained

Whether the pipeline has been trained.

runner

The shared PipelineRunner instance.

workspace_path

Path to the workspace directory.

Example (resource sharing):
>>> with nirs4all.session(verbose=1) as s:
...     result1 = nirs4all.run(pipeline1, data1, session=s)
...     result2 = nirs4all.run(pipeline2, data2, session=s)
Example (stateful pipeline):
>>> session = nirs4all.Session(pipeline=pipeline, name="MyModel")
>>> result = session.run("sample_data/regression")
>>> predictions = session.predict(new_data)
>>> session.save("exports/my_model.n4a")
__enter__() Session[source]

Enter the session context.

__exit__(exc_type: Any, exc_val: Any, exc_tb: Any) None[source]

Exit the session context and clean up resources.

__repr__() str[source]

Return string representation of session.

close() None[source]

Clean up session resources.

Called automatically when exiting a context manager block.

property history: List[Dict[str, Any]]

Get run history for this session.

property is_trained: bool

Check if pipeline has been trained or loaded from a bundle.

property name: str

Get session name.

property pipeline: List[Any] | None

Get pipeline definition.

predict(dataset: str | Path | Any, **kwargs: Any) PredictResult[source]

Make predictions using the trained pipeline.

Parameters:
  • dataset – Data to predict on. Can be:

    • Path to data folder

    • Numpy array X

    • Dict with ‘X’ key

  • **kwargs – Additional arguments for prediction.

Returns:

PredictResult with predictions.

Raises:

ValueError – If session has not been trained.

retrain(dataset: str | Path | Any, mode: str = 'full', **kwargs: Any) RunResult[source]

Retrain the pipeline on new data.

Parameters:
  • dataset – New dataset to train on.

  • mode – Retrain mode (‘full’, ‘transfer’, ‘finetune’).

  • **kwargs – Additional arguments for retraining.

Returns:

RunResult from retraining.

Raises:

ValueError – If session has not been trained.

run(dataset: str | Path | Any, *, plots_visible: bool = False, **kwargs: Any) RunResult[source]

Train the session’s pipeline on a dataset.

Parameters:
  • dataset – Dataset to train on. Can be:

    • Path to data folder: “sample_data/regression”

    • Numpy arrays: (X, y)

    • Dict: {“X”: X, “y”: y}

  • plots_visible – Whether to show plots during training.

  • **kwargs – Additional arguments passed to runner.run().

Returns:

RunResult with predictions and metrics.

Raises:

ValueError – If no pipeline was provided to the session.

property runner: PipelineRunner

Get or create the shared PipelineRunner instance.

The runner is created lazily on first access.

Returns:

The shared PipelineRunner instance.

save(path: str | Path) Path[source]

Save the trained session to a bundle file.

Parameters:

path – Output path for the .n4a bundle file.

Returns:

Path to the saved bundle file.

Raises:

ValueError – If session has not been trained.

property status: str

Get current session status.

Returns:

One of ‘initialized’, ‘trained’, ‘error’.

property workspace_path: Path | None

Get the workspace path from the runner.

Returns:

Path to the workspace directory, or None if runner not created.
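
The lazy runner creation and the None result of workspace_path before the runner exists can be sketched with a cached property. FakeRunner and LazySession are hypothetical stand-ins written for this illustration; the real Session and PipelineRunner may differ in detail.

```python
from typing import Any, Dict, Optional


class FakeRunner:
    """Stand-in for PipelineRunner; only records its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class LazySession:
    """Hypothetical session that builds its runner on first access."""

    def __init__(self, **runner_kwargs: Any) -> None:
        self._runner_kwargs: Dict[str, Any] = runner_kwargs
        self._runner: Optional[FakeRunner] = None

    @property
    def runner(self) -> FakeRunner:
        if self._runner is None:  # first access: create and cache
            self._runner = FakeRunner(**self._runner_kwargs)
        return self._runner

    @property
    def workspace_path(self) -> Optional[str]:
        # Reports None until the runner exists; reading it never creates one.
        if self._runner is None:
            return None
        return self._runner.kwargs.get("workspace_path")


lazy = LazySession(verbose=1, workspace_path="workspace")
path_before = lazy.workspace_path  # runner not created yet
first_runner = lazy.runner         # triggers creation
path_after = lazy.workspace_path
```

Deferring construction this way means a session that is created but never used does not pay for setting up a runner.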

nirs4all.api.explain(model: Dict[str, Any] | str | Path, data: str | Path | ndarray | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, name: str = 'explain_dataset', session: Session | None = None, verbose: int = 1, plots_visible: bool = True, n_samples: int | None = None, explainer_type: str = 'auto', **shap_params: Any) ExplainResult[source]

Generate SHAP explanations for a trained model.

This function provides a simple interface for computing SHAP values to explain model predictions. It supports various SHAP explainer types and generates visualizations.

Parameters:
  • model – Trained model specification. Can be:

    • Prediction dict from result.best or result.top()

    • Path to exported bundle: "exports/model.n4a"

    • Path to pipeline config directory

  • data – Data to explain. Can be:

    • Path to data folder: "test_data/"

    • Numpy array: X_test (n_samples, n_features)

    • Dict: {"X": X, "metadata": meta}

    • SpectroDataset instance

  • name – Name for the explanation dataset (for logging). Default: “explain_dataset”

  • session – Optional Session for resource reuse. If provided, uses the session’s runner.

  • verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1

  • plots_visible – Whether to display plots interactively. Default: True

  • n_samples – Number of background samples for SHAP. If None, uses default (typically 100-200).

  • explainer_type – SHAP explainer type. Default: “auto”. Options:

    • “auto”: Automatically select best explainer

    • “tree”: TreeExplainer (for tree-based models)

    • “kernel”: KernelExplainer (model-agnostic)

    • “deep”: DeepExplainer (for neural networks)

    • “linear”: LinearExplainer (for linear models)

  • **shap_params – Additional SHAP configuration parameters. Common options:

    • feature_names: List of feature names

    • background_samples: Number of background samples

    • max_display: Max features to show in plots

Returns:

ExplainResult containing:

  • shap_values: SHAP values array or Explanation object

  • feature_names: Names/labels of features

  • base_value: Expected value (baseline prediction)

  • visualizations: Paths to generated plots

  • mean_abs_shap: Mean absolute SHAP per feature

  • top_features: Features sorted by importance

Use result.get_feature_importance() for importance ranking, or result.to_dataframe() for pandas DataFrame output.

Examples

Explain an exported model:

>>> import nirs4all
>>>
>>> result = nirs4all.explain(
...     model="exports/wheat_model.n4a",
...     data=X_test
... )
>>> print(f"Top 5 features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance(top_n=10)

Explain using a result from a previous run:

>>> # Training
>>> train_result = nirs4all.run(pipeline, train_data)
>>>
>>> # Explain best model
>>> explain_result = nirs4all.explain(
...     model=train_result.best,
...     data=X_test,
...     explainer_type="kernel"
... )

Get SHAP values as DataFrame:

>>> result = nirs4all.explain(model, data)
>>> df = result.to_dataframe()
>>> df.to_csv("shap_values.csv")

Get per-sample explanations:

>>> result = nirs4all.explain(model, data)
>>> sample_0_shap = result.get_sample_explanation(0)
>>> for feature, value in list(sample_0_shap.items())[:5]:
...     print(f"{feature}: {value:.4f}")

nirs4all.api.load_session(path: str | Path) Session[source]

Load a session from a saved bundle file.

Parameters:

path – Path to .n4a bundle file.

Returns:

Session ready for prediction.

Example

>>> session = nirs4all.load_session("exports/model.n4a")
>>> predictions = session.predict(new_data)
nirs4all.api.predict(model: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, name: str = 'prediction_dataset', all_predictions: bool = False, session: Session | None = None, verbose: int = 0, **runner_kwargs: Any) PredictResult[source]

Make predictions with a trained model on new data.

This function provides a simple interface for running inference with trained nirs4all pipelines. The model can be specified as a prediction dict from a previous run, or as a path to an exported bundle.

Parameters:
  • model – Trained model specification. Can be:

    • Prediction dict from result.best or result.top()

    • Path to exported bundle: "exports/model.n4a"

    • Path to pipeline config directory

  • data – Data to predict on. Can be:

    • Path to data folder: "new_data/"

    • Numpy array: X_new (n_samples, n_features)

    • Tuple: (X,) or (X, y) for evaluation

    • Dict: {"X": X, "metadata": meta}

    • SpectroDataset instance

  • name – Name for the prediction dataset (for logging). Default: “prediction_dataset”

  • all_predictions – If True, return predictions from all folds. If False (default), return single aggregated prediction.

  • session – Optional Session for resource reuse. If provided, uses the session’s runner.

  • verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 0

  • **runner_kwargs – Additional PipelineRunner parameters. Common options: workspace_path, plots_visible

Returns:

PredictResult containing:

  • y_pred: Predicted values array (n_samples,)

  • metadata: Additional prediction metadata

  • model_name: Name of the model used

  • preprocessing_steps: List of preprocessing steps applied

Use result.to_dataframe() for pandas DataFrame output.

Examples

Predict from an exported bundle:

>>> import nirs4all
>>>
>>> result = nirs4all.predict(
...     model="exports/wheat_model.n4a",
...     data=X_new
... )
>>> print(f"Predictions: {result.values[:5]}")

Predict using a result from a previous run:

>>> # Training
>>> train_result = nirs4all.run(pipeline, train_data)
>>>
>>> # Prediction with best model
>>> pred_result = nirs4all.predict(
...     model=train_result.best,
...     data=X_test
... )

Get all fold predictions:

>>> result = nirs4all.predict(
...     model="exports/model.n4a",
...     data=X_new,
...     all_predictions=True
... )
>>> print(f"Shape: {result.shape}")

Convert to DataFrame:

>>> result = nirs4all.predict(model, data)
>>> df = result.to_dataframe()
>>> df.to_csv("predictions.csv")

nirs4all.api.retrain(source: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, mode: str = 'full', name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, **kwargs: Any) RunResult[source]

Retrain a pipeline on new data.

This function retrains a previously trained pipeline in one of several modes: full retraining, transfer learning, or fine-tuning.

Parameters:
  • source – Pipeline source to retrain from. Can be:

    • Prediction dict from result.best or result.top()

    • Path to exported bundle: "exports/model.n4a"

    • Path to pipeline config directory

  • data – New dataset to train on. Can be:

    • Path to data folder: "new_data/"

    • Numpy arrays: (X, y)

    • Dict: {"X": X, "y": y}

    • SpectroDataset instance

  • mode – Retrain mode. Default: “full”. Options:

    • “full”: Train everything from scratch (same pipeline structure)

    • “transfer”: Use existing preprocessing, train new model

    • “finetune”: Continue training existing model

  • name – Name for the retrain dataset (for logging). Default: “retrain_dataset”

  • new_model – Optional new model for transfer mode. Replaces the original model while keeping preprocessing.

  • epochs – Optional number of epochs for fine-tuning neural networks.

  • session – Optional Session for resource reuse. If provided, uses the session’s runner.

  • verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1

  • save_artifacts – Whether to save retrained artifacts. Default: True

  • **kwargs – Additional retraining parameters:

    • learning_rate: Learning rate for fine-tuning

    • freeze_layers: List of layers to freeze during fine-tuning

    • step_modes: Per-step mode overrides (advanced)

Returns:

RunResult containing:

  • predictions: Predictions from the retrained pipeline

  • per_dataset: Per-dataset execution details

  • best: Best prediction entry

  • best_score: Best model’s primary test score

Examples

Full retrain on new data:

>>> import nirs4all
>>>
>>> # Original training
>>> original = nirs4all.run(pipeline, train_data)
>>>
>>> # Retrain on new data with same pipeline
>>> retrained = nirs4all.retrain(
...     source=original.best,
...     data=new_train_data,
...     mode="full"
... )
>>> print(f"Original: {original.best_rmse:.4f}")
>>> print(f"Retrained: {retrained.best_rmse:.4f}")

Transfer learning with new model:

>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> result = nirs4all.retrain(
...     source="exports/pls_model.n4a",
...     data=new_data,
...     mode="transfer",
...     new_model=RandomForestRegressor(n_estimators=100)
... )

Fine-tune a neural network:

>>> result = nirs4all.retrain(
...     source="exports/nn_model.n4a",
...     data=new_data,
...     mode="finetune",
...     epochs=10,
...     learning_rate=0.0001
... )

Retrain from an exported bundle:

>>> result = nirs4all.retrain(
...     source="exports/wheat_model.n4a",
...     data="new_wheat_data/",
...     mode="full",
...     verbose=2
... )
>>> result.export("exports/retrained_model.n4a")

nirs4all.api.run(pipeline: List[Any] | Dict[str, Any] | str | Path | PipelineConfigs | List[List[Any] | Dict[str, Any] | str | Path | PipelineConfigs], dataset: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | List[str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs], *, name: str = '', session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, save_charts: bool = True, plots_visible: bool = False, random_state: int | None = None, **runner_kwargs: Any) RunResult[source]

Execute a training pipeline on a dataset.

This is the primary entry point for training ML pipelines on NIRS data. It provides a simpler interface than creating PipelineRunner and config objects directly.

Parameters:
  • pipeline – Pipeline definition. Can be:

    • List of steps (most common): [MinMaxScaler(), PLSRegression(10)]

    • Dict with steps: {"steps": [...], "name": "my_pipeline"}

    • Path to YAML/JSON config file: "configs/my_pipeline.yaml"

    • PipelineConfigs object (backward compatibility)

    • List of pipelines: [pipeline1, pipeline2, ...]; each pipeline is executed independently (cartesian product with datasets)

  • dataset – Dataset definition. Can be:

    • Path to data folder: "sample_data/regression"

    • Numpy arrays: (X, y) or X alone

    • Dict with arrays: {"X": X, "y": y, "metadata": meta}

    • SpectroDataset instance

    • List of SpectroDataset instances (multi-dataset)

    • DatasetConfigs object (backward compatibility)

    • List of datasets: [dataset1, dataset2, ...]; each dataset is used with each pipeline (cartesian product)

  • name – Optional pipeline name for identification and logging. If not provided, a name will be generated.

  • session – Optional Session object for resource reuse across multiple runs. When provided, shares workspace and configuration.

  • verbose – Verbosity level (0=quiet, 1=info, 2=debug, 3=trace). Default: 1

  • save_artifacts – Whether to save binary artifacts (models, transformers). Default: True

  • save_charts – Whether to save charts and visual outputs. Default: True

  • plots_visible – Whether to display plots interactively. Default: False

  • random_state – Random seed for reproducibility. Default: None (no seeding)

  • **runner_kwargs – Additional PipelineRunner parameters. See PipelineRunner.__init__ for full list. Common options:

    • workspace_path: Workspace root directory

    • continue_on_error: Whether to continue on step failures

    • show_spinner: Whether to show progress spinners

    • log_file: Whether to write logs to disk

    • log_format: Output format (“pretty”, “minimal”, “json”)

    • show_progress_bar: Whether to show progress bars

    • max_generation_count: Max pipeline combinations (for generators)

Returns:

RunResult containing:

  • predictions: Predictions object with all pipeline results

  • per_dataset: Dictionary with per-dataset execution details

  • best: Best prediction entry (convenience accessor)

  • best_score: Best model’s primary test score

  • best_rmse, best_r2, best_accuracy: Score shortcuts

Use result.top(n=5) to get top N predictions, or result.export("path.n4a") to export the best model.

Examples

Simple usage with list of steps:

>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")

With cross-validation and multiple models:

>>> from sklearn.model_selection import ShuffleSplit
>>>
>>> result = nirs4all.run(
...     pipeline=[
...         MinMaxScaler(),
...         ShuffleSplit(n_splits=3),
...         {"model": PLSRegression(10)}
...     ],
...     dataset="sample_data/regression",
...     name="PLS_experiment",
...     verbose=2,
...     save_artifacts=True
... )

Multiple pipelines executed independently:

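>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.preprocessing import StandardScaler
>>>
>>> pipeline_pls = [MinMaxScaler(), PLSRegression(10)]
>>> pipeline_rf = [StandardScaler(), RandomForestRegressor()]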
>>>
>>> result = nirs4all.run(
...     pipeline=[pipeline_pls, pipeline_rf],  # Two independent pipelines
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Total configs: {result.num_predictions}")

Cartesian product of pipelines × datasets:

>>> pipelines = [pipeline1, pipeline2, pipeline3]
>>> datasets = [dataset_a, dataset_b]
>>>
>>> # Runs 6 combinations: p1×da, p1×db, p2×da, p2×db, p3×da, p3×db
>>> result = nirs4all.run(
...     pipeline=pipelines,
...     dataset=datasets,
...     verbose=1
... )
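
The six combinations listed in the comment above can be enumerated with itertools.product. The string placeholders below stand in for real pipeline and dataset objects, and whether nirs4all schedules the runs in exactly this order is an assumption.

```python
from itertools import product

pipelines = ["pipeline1", "pipeline2", "pipeline3"]  # placeholder names
datasets = ["dataset_a", "dataset_b"]                # placeholder names

# itertools.product varies the last iterable fastest, so every pipeline is
# paired with dataset_a and then dataset_b before moving to the next pipeline.
combinations = list(product(pipelines, datasets))

for pipeline_name, dataset_name in combinations:
    print(f"{pipeline_name} x {dataset_name}")
```

The number of runs grows as len(pipelines) * len(datasets), which is worth keeping in mind before passing long lists for both arguments.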

Using a session for multiple runs:

>>> with nirs4all.session(verbose=1) as s:
...     r1 = nirs4all.run(pipeline1, data, session=s)
...     r2 = nirs4all.run(pipeline2, data, session=s)
...     print(f"Pipeline 1: {r1.best_score:.4f}")
...     print(f"Pipeline 2: {r2.best_score:.4f}")

Export the best model:

>>> result = nirs4all.run(pipeline, dataset)
>>> result.export("exports/best_model.n4a")

nirs4all.api.session(pipeline: List[Any] | None = None, name: str = '', **kwargs: Any) Generator[Session, None, None][source]

Create an execution session context manager.

This is a convenience function that creates a Session and yields it within a context manager block.

Parameters:
  • pipeline – Optional pipeline definition for stateful mode.

  • name – Name for the session/pipeline.

  • **kwargs – Arguments passed to Session (and ultimately PipelineRunner). Common options:

    • verbose (int): Verbosity level (0-3). Default: 1

    • save_artifacts (bool): Save model artifacts. Default: True

    • workspace_path (str|Path): Workspace directory.

    • random_state (int): Random seed for reproducibility.

Yields:

Session – The active session for use within the block.

Example (resource sharing):
>>> with nirs4all.session(verbose=2, save_artifacts=True) as s:
...     r1 = nirs4all.run(pipeline1, data1, session=s)
...     r2 = nirs4all.run(pipeline2, data2, session=s)
...     print(f"PLS: {r1.best_score:.4f}, RF: {r2.best_score:.4f}")
Example (stateful pipeline):
>>> with nirs4all.session(pipeline=my_pipeline, name="Demo") as s:
...     result = s.run("sample_data/regression")
...     print(f"Best score: {result.best_score:.4f}")
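
A generator-based context manager of the kind described here (create a session, yield it, clean up on exit) can be sketched with contextlib. DemoSession and demo_session are hypothetical stand-ins for illustration; the actual Session carries a runner and pipeline state that are omitted.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class DemoSession:
    """Hypothetical stand-in for Session that only tracks open/closed state."""

    def __init__(self, name: str = "", **kwargs: Any) -> None:
        self.name = name
        self.kwargs = kwargs
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def demo_session(name: str = "", **kwargs: Any) -> Iterator[DemoSession]:
    """Create a session, hand it to the with-block, and always close it."""
    session_obj = DemoSession(name=name, **kwargs)
    try:
        yield session_obj
    finally:
        session_obj.close()  # runs even if the block raises


with demo_session(name="Demo", verbose=1) as active:
    open_inside = not active.closed
```

The try/finally around the yield is what guarantees cleanup when the body of the with-block raises.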