nirs4all.api.predict module

Module-level predict() function for nirs4all.

This module provides a simple interface for making predictions with trained nirs4all models. It wraps PipelineRunner.predict() with ergonomic defaults.

Two prediction paths are supported:

  1. Store-based (preferred): nirs4all.predict(chain_id="abc", data=X) replays a stored chain directly from the DuckDB workspace.

  2. Model-based (legacy): nirs4all.predict(model="model.n4a", data=X) resolves via PredictionResolver / BundleLoader.

Example

>>> import nirs4all
>>> result = nirs4all.predict(
...     model="exports/best_model.n4a",
...     data=X_new,
...     verbose=1
... )
>>> print(f"Predictions shape: {result.shape}")
nirs4all.api.predict.predict(model: Dict[str, Any] | str | Path | None = None, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | None = None, *, chain_id: str | None = None, workspace_path: str | Path | None = None, name: str = 'prediction_dataset', all_predictions: bool = False, session: Session | None = None, verbose: int = 0, **runner_kwargs: Any) -> PredictResult

Make predictions with a trained model on new data.

This function provides a simple interface for running inference with trained nirs4all pipelines.

Two prediction paths are supported:

Store-based (preferred) – pass chain_id together with a raw numpy array for data:

>>> result = nirs4all.predict(chain_id="abc123", data=X_new)

Model-based (legacy) – pass model together with data:

>>> result = nirs4all.predict(model="exports/model.n4a", data=X_new)
Parameters:
  • model – Trained model specification. Mutually exclusive with chain_id. Can be:

    • Prediction dict from result.best or result.top()

    • Path to exported bundle: "exports/model.n4a"

    • Path to pipeline config directory

  • data – Data to predict on. Can be:

    • Path to data folder: "new_data/"

    • Numpy array: X_new (n_samples, n_features)

    • Tuple: (X,) or (X, y) for evaluation

    • Dict: {"X": X, "metadata": meta}

    • SpectroDataset instance

  • chain_id – Chain identifier in the workspace DuckDB store. When provided, uses the fast store-based replay path. Mutually exclusive with model.

  • workspace_path – Workspace root directory. Required when using chain_id outside a session. Ignored when a session is provided (the session’s workspace is used instead).

  • name – Name for the prediction dataset (used for logging). Default: "prediction_dataset"

  • all_predictions – If True, return predictions from all folds. If False (default), return a single aggregated prediction.

  • session – Optional Session for resource reuse. If provided, uses the session’s runner.

  • verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 0

  • **runner_kwargs – Additional PipelineRunner parameters. Common options: plots_visible
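The mutual-exclusivity rule for model and chain_id can be checked before calling predict(). The following is a minimal sketch of such a check; select_prediction_path is a hypothetical helper, not part of nirs4all, and it only restates the rule documented above rather than reproducing the library's own validation.

```python
from typing import Any, Optional


def select_prediction_path(model: Any = None, chain_id: Optional[str] = None) -> str:
    """Return "store" or "model" depending on which argument is given.

    Hypothetical helper (not a nirs4all function): it mirrors the
    documented rule that exactly one of model / chain_id must be set.
    """
    if model is not None and chain_id is not None:
        raise ValueError("model and chain_id are mutually exclusive")
    if model is None and chain_id is None:
        raise ValueError("provide either model or chain_id")
    # chain_id selects the store-based replay path; model selects the legacy path.
    return "store" if chain_id is not None else "model"


path_a = select_prediction_path(chain_id="abc123")
path_b = select_prediction_path(model="exports/model.n4a")
```

Running a check like this first gives an early, explicit error message in calling code that builds the arguments dynamically.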

Returns:

PredictResult containing:

  • y_pred: Predicted values array (n_samples,)

  • metadata: Additional prediction metadata

  • model_name: Name of the model used

  • preprocessing_steps: List of preprocessing steps applied

Use result.to_dataframe() for pandas DataFrame output.

Return type:

PredictResult
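As an illustration of reading the returned fields, the sketch below collects them into a plain dict. It assumes the fields are exposed as attributes named as listed above and that metadata is dict-like; summarize_result and the stand-in object are hypothetical and exist only so the example runs without a trained model.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Collect the documented result fields into a plain dict.

    Hypothetical helper: relies only on the attribute names listed in
    the Returns section and assumes metadata behaves like a dict.
    """
    return {
        "n_samples": len(result.y_pred),
        "model_name": result.model_name,
        "n_preprocessing_steps": len(result.preprocessing_steps),
        "metadata_keys": sorted(result.metadata),
    }


# Stand-in with made-up values; a real call would pass the object
# returned by nirs4all.predict().
fake_result = SimpleNamespace(
    y_pred=[0.1, 0.2, 0.3],
    metadata={"source": "demo"},
    model_name="demo_model",
    preprocessing_steps=["step_a", "step_b"],
)
summary = summarize_result(fake_result)
```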

Raises:
  • ValueError – If neither model nor chain_id is provided, or if both are provided.

  • FileNotFoundError – If model bundle or data path doesn’t exist.
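A caller may want to handle the two documented exceptions separately. The sketch below shows one way to do that; try_predict is a hypothetical wrapper that takes the prediction function as an argument, so it runs without nirs4all installed.

```python
from typing import Any, Callable, Optional


def try_predict(predict_fn: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call predict_fn(**kwargs); return None if a documented error occurs.

    Hypothetical wrapper, not part of nirs4all.
    """
    try:
        return predict_fn(**kwargs)
    except ValueError as exc:
        # Documented case: neither or both of model / chain_id were given.
        print(f"Invalid arguments: {exc}")
    except FileNotFoundError as exc:
        # Documented case: the model bundle or data path does not exist.
        print(f"Missing file: {exc}")
    return None
```

In real use the first argument would be nirs4all.predict, for example try_predict(nirs4all.predict, model="exports/model.n4a", data=X_new).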

Examples

Predict from a stored chain (preferred):

>>> import nirs4all
>>> result = nirs4all.predict(chain_id="abc123", data=X_new)

Predict from an exported bundle:

>>> result = nirs4all.predict(
...     model="exports/wheat_model.n4a",
...     data=X_new
... )

Predict using a result from a previous run:

>>> train_result = nirs4all.run(pipeline, train_data)
>>> pred_result = nirs4all.predict(
...     model=train_result.best,
...     data=X_test
... )

See also