nirs4all.api.predict module

Module-level predict() function for nirs4all.

This module provides a simple interface for making predictions with trained nirs4all models. It wraps PipelineRunner.predict() with ergonomic defaults.

Two prediction paths are supported:

Store-based (preferred): nirs4all.predict(chain_id="abc", data=X) replays a stored chain directly from the DuckDB workspace.
Model-based (legacy): nirs4all.predict(model="model.n4a", data=X) resolves via PredictionResolver / BundleLoader.

Example

>>> import nirs4all
>>> result = nirs4all.predict(
...     model="exports/best_model.n4a",
...     data=X_new,
...     verbose=1
... )
>>> print(f"Predictions shape: {result.shape}")

Make predictions with a trained model on new data.

This function provides a simple interface for running inference with trained nirs4all pipelines.

Two prediction paths are supported:

Store-based (preferred) – pass chain_id together with a raw numpy array for data:

>>> result = nirs4all.predict(chain_id="abc123", data=X_new)

Model-based (legacy) – pass model together with data:

>>> result = nirs4all.predict(model="exports/model.n4a", data=X_new)

Parameters:

model – Trained model specification. Mutually exclusive with chain_id. Can be:
    - Prediction dict from result.best or result.top()
    - Path to an exported bundle: "exports/model.n4a"
    - Path to a pipeline config directory
data – Data to predict on. Can be:
    - Path to a data folder: "new_data/"
    - Numpy array: X_new (n_samples, n_features)
    - Tuple: (X,) or (X, y) for evaluation
    - Dict: {"X": X, "metadata": meta}
    - SpectroDataset instance
chain_id – Chain identifier in the workspace DuckDB store. When provided, uses the fast store-based replay path. Mutually exclusive with model.
workspace_path – Workspace root directory. Required when using chain_id outside a session. Ignored when a session is provided (the session’s workspace is used instead).
name – Name for the prediction dataset (used for logging). Default: "prediction_dataset"
all_predictions – If True, return predictions from all folds. If False (default), return a single aggregated prediction.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 0
**runner_kwargs – Additional PipelineRunner parameters. Common options: plots_visible
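The tuple and dict forms of data listed above can be pictured with a small sketch. The helper split_data_argument is hypothetical and is not part of nirs4all; it only illustrates how (X,), (X, y) and {"X": X, "metadata": meta} might map onto features, reference values and metadata. Plain lists stand in for numpy arrays so the sketch has no dependencies.

```python
def split_data_argument(data):
    """Split the tuple and dict forms of `data` into (X, y, metadata).

    Hypothetical helper for illustration only; not part of nirs4all.
    Any other form (path, array, dataset object) is passed through as X.
    """
    if isinstance(data, tuple):
        if len(data) == 1:
            # (X,): features only
            return data[0], None, None
        if len(data) == 2:
            # (X, y): features plus reference values for evaluation
            return data[0], data[1], None
        raise ValueError("tuple data must be (X,) or (X, y)")
    if isinstance(data, dict):
        if "X" not in data:
            raise ValueError('dict data must contain an "X" key')
        # Only the documented keys "X" and "metadata" are read here.
        return data["X"], None, data.get("metadata")
    return data, None, None


# Stand-ins for an (n_samples, n_features) array and its reference values.
X_new = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
y_ref = [1.0, 2.0]

X, y, meta = split_data_argument((X_new, y_ref))
```

How nirs4all itself dispatches on these forms is not shown on this page, so the sketch should be read as a mental model rather than as the library's behaviour.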

Returns:

PredictResult containing:
- y_pred: Predicted values array (n_samples,)
- metadata: Additional prediction metadata
- model_name: Name of the model used
- preprocessing_steps: List of preprocessing steps applied

Use result.to_dataframe() for pandas DataFrame output.

Return type:

PredictResult

Raises:

ValueError – If neither model nor chain_id is provided, or if both are provided.
FileNotFoundError – If the model bundle or data path does not exist.
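The ValueError condition above (exactly one of model and chain_id must be given) can be sketched as a small check. The function select_prediction_path and its return values "store" and "model" are hypothetical names chosen for this illustration; the actual validation inside nirs4all.predict may differ.

```python
def select_prediction_path(model=None, chain_id=None):
    """Return "store" or "model" depending on which argument was supplied.

    Hypothetical helper for illustration only; not part of nirs4all.
    Raises ValueError when neither or both arguments are given.
    """
    if model is None and chain_id is None:
        raise ValueError("Provide either `model` or `chain_id`.")
    if model is not None and chain_id is not None:
        raise ValueError("`model` and `chain_id` are mutually exclusive.")
    # Exactly one of the two is set at this point.
    return "store" if chain_id is not None else "model"


store_path = select_prediction_path(chain_id="abc123")
model_path = select_prediction_path(model="exports/model.n4a")

try:
    select_prediction_path(model="exports/model.n4a", chain_id="abc123")
except ValueError as err:
    conflict_message = str(err)
```

Callers who build the arguments dynamically can apply the same rule before calling predict(), so a misconfiguration surfaces before any data is loaded.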

Examples

Predict from a stored chain (preferred):

>>> import nirs4all
>>> result = nirs4all.predict(chain_id="abc123", data=X_new)

Predict from an exported bundle:

>>> result = nirs4all.predict(
...     model="exports/wheat_model.n4a",
...     data=X_new
... )

Predict using a result from a previous run:

>>> train_result = nirs4all.run(pipeline, train_data)
>>> pred_result = nirs4all.predict(
...     model=train_result.best,
...     data=X_test
... )