nirs4all.api.result module

Result classes for nirs4all API.

These dataclasses wrap the outputs from pipeline execution, prediction, and explanation operations, providing convenient accessor methods.

Classes:

  • RunResult: Result from nirs4all.run()

  • PredictResult: Result from nirs4all.predict()

  • ExplainResult: Result from nirs4all.explain()

Phase 1 Implementation (v0.6.0):
  • RunResult: Full implementation with best, best_score, top(), export()

  • PredictResult: Full implementation with values, to_dataframe()

  • ExplainResult: Full implementation with values, feature attributions

class nirs4all.api.result.ExplainResult(shap_values: Any, feature_names: list[str] | None = None, base_value: float | ndarray | None = None, visualizations: dict[str, pathlib.Path] = <factory>, explainer_type: str = 'auto', model_name: str = '', n_samples: int = 0)

Bases: object

Result from nirs4all.explain().

Wraps SHAP explanation outputs with visualization helpers and accessors.

shap_values

SHAP values array or Explanation object.

Type:

Any

feature_names

Names/labels of features explained.

Type:

list[str] | None

base_value

Expected value (baseline prediction).

Type:

float | numpy.ndarray | None

visualizations

Paths to generated visualization files.

Type:

dict[str, pathlib.Path]

explainer_type

Type of SHAP explainer used.

Type:

str

model_name

Name of the explained model.

Type:

str

n_samples

Number of samples explained.

Type:

int

Properties:

  • values: Raw SHAP values array.

  • shape: Shape of SHAP values array.

  • mean_abs_shap: Mean absolute SHAP values per feature.

  • top_features: Feature names sorted by importance.

Methods:

  • get_feature_importance(): Get feature importance ranking.

  • get_sample_explanation(idx): Get explanation for a single sample.

  • to_dataframe(): Get SHAP values as DataFrame.

Example

>>> result = nirs4all.explain(model, X_test)
>>> print(f"Top features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance()

__post_init__()

Extract metadata from shap_values if available.

__repr__() -> str

String representation.

__str__() -> str

User-friendly string representation.

base_value: float | ndarray | None = None
explainer_type: str = 'auto'
feature_names: list[str] | None = None

get_feature_importance(top_n: int | None = None, normalize: bool = False) -> dict[str, float]

Get feature importance ranking.

Parameters:
  • top_n – If provided, return only top N features.

  • normalize – If True, normalize values to sum to 1.

Returns:

Dictionary mapping feature names to importance values.
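The ranking described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: the function name feature_importance, the sample wavelengths, and the choice to normalize before truncating to top_n are all assumptions.

```python
import numpy as np

def feature_importance(shap_values, feature_names=None, top_n=None, normalize=False):
    """Hypothetical stand-in for ExplainResult.get_feature_importance()."""
    values = np.asarray(shap_values, dtype=float)
    # Mean absolute SHAP value per feature (column), as documented for mean_abs_shap.
    importance = np.abs(values).mean(axis=0)
    if normalize:
        total = importance.sum()
        if total > 0:
            # Assumption: normalization uses all features, before top_n is applied.
            importance = importance / total
    if feature_names is None:
        feature_names = [str(i) for i in range(importance.shape[0])]
    # Sort feature indices from most to least important.
    order = np.argsort(importance)[::-1]
    if top_n is not None:
        order = order[:top_n]
    return {feature_names[i]: float(importance[i]) for i in order}

# Two samples, three features (illustrative values only).
shap_values = np.array([[0.5, -0.25, 0.0],
                        [-1.5, 0.25, 0.0]])
ranking = feature_importance(
    shap_values, ["1100nm", "1450nm", "1940nm"], top_n=2, normalize=True
)
print(ranking)
```

With normalize=True the returned values are shares of the total importance, which makes rankings from different models easier to compare.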

get_sample_explanation(idx: int) -> dict[str, float]

Get SHAP explanation for a single sample.

Parameters:

idx – Sample index.

Returns:

Dictionary mapping feature names to SHAP values for that sample.
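As a rough sketch of what such a per-sample dictionary looks like, the snippet below pairs one row of a SHAP matrix with its feature names. The helper name sample_explanation and the fallback to index strings when no names are given are assumptions made for illustration.

```python
import numpy as np

def sample_explanation(shap_values, idx, feature_names=None):
    """Hypothetical stand-in for ExplainResult.get_sample_explanation()."""
    values = np.asarray(shap_values, dtype=float)
    row = values[idx]  # SHAP values for the requested sample
    if feature_names is None:
        # Assumption: unnamed features are labelled by their column index.
        feature_names = [str(i) for i in range(row.shape[0])]
    return {name: float(value) for name, value in zip(feature_names, row)}

shap_values = np.array([[0.5, -0.25],
                        [-1.5, 0.25]])
explanation = sample_explanation(shap_values, idx=1, feature_names=["1100nm", "1450nm"])
print(explanation)
```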

property mean_abs_shap: ndarray

Get mean absolute SHAP values per feature.

Returns:

1D array of mean |SHAP| values, one per feature.

model_name: str = ''
n_samples: int = 0
shap_values: Any

property shape: tuple

Get shape of SHAP values array.

to_dataframe(include_feature_names: bool = True)

Get SHAP values as pandas DataFrame.

Parameters:

include_feature_names – If True, use feature names as columns.

Returns:

pandas DataFrame with SHAP values.

Raises:

ImportError – If pandas is not available.

property top_features: list[str]

Get feature names sorted by importance (descending).

Returns:

List of feature names, most important first. Returns indices as strings if feature_names not available.

property values: ndarray

Get raw SHAP values array.

Returns:

NumPy array of SHAP values with shape (n_samples, n_features).

visualizations: dict[str, Path]

class nirs4all.api.result.PredictResult(y_pred: ndarray, metadata: dict[str, Any] = <factory>, sample_indices: ndarray | None = None, model_name: str = '', preprocessing_steps: list[str] = <factory>)

Bases: object

Result from nirs4all.predict().

Wraps prediction outputs with convenient accessors and conversion methods.

y_pred

Predicted values array (n_samples,) or (n_samples, n_outputs).

Type:

numpy.ndarray

metadata

Additional prediction metadata (uncertainty, timing, etc.).

Type:

dict[str, Any]

sample_indices

Optional indices of predicted samples.

Type:

numpy.ndarray | None

model_name

Name of the model used for prediction.

Type:

str

preprocessing_steps

List of preprocessing steps applied.

Type:

list[str]

Properties:

  • values: Alias for y_pred (for consistency).

  • shape: Shape of prediction array.

  • is_multioutput: True if predictions have multiple outputs.

Methods:

  • to_numpy(): Get predictions as numpy array.

  • to_list(): Get predictions as Python list.

  • to_dataframe(): Get predictions as pandas DataFrame.

  • flatten(): Get flattened 1D predictions.

Example

>>> result = nirs4all.predict(model, X_new)
>>> print(f"Predictions shape: {result.shape}")
>>> df = result.to_dataframe()

__len__() -> int

Return number of predictions.

__post_init__()

Ensure y_pred is a numpy array.

__repr__() -> str

String representation.

__str__() -> str

User-friendly string representation.

flatten() -> ndarray

Get flattened 1D predictions.

Returns:

1D numpy array of predictions.

property is_multioutput: bool

Check if predictions have multiple outputs.
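One plausible reading of this check, given the two documented shapes for y_pred, is sketched below. The rule that a single-column 2D array does not count as multi-output is an assumption; the library may decide differently.

```python
import numpy as np

def is_multioutput(y_pred):
    """Hypothetical stand-in for PredictResult.is_multioutput."""
    y_pred = np.asarray(y_pred)
    # Assumption: only a 2D array with more than one column is multi-output.
    return y_pred.ndim == 2 and y_pred.shape[1] > 1

single = np.array([1.2, 3.4, 5.6])           # shape (3,)
column = np.array([[1.2], [3.4], [5.6]])     # shape (3, 1)
multi = np.array([[1.2, 0.1], [3.4, 0.2]])   # shape (2, 2)

print(is_multioutput(single), is_multioutput(column), is_multioutput(multi))
```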

metadata: dict[str, Any]
model_name: str = ''
preprocessing_steps: list[str]
sample_indices: ndarray | None = None

property shape: tuple

Get shape of prediction array.

to_dataframe(include_indices: bool = True)

Get predictions as pandas DataFrame.

Parameters:

include_indices – If True and sample_indices available, include as column.

Returns:

pandas DataFrame with predictions.

Raises:

ImportError – If pandas is not available.

to_list() -> list[float]

Get predictions as Python list.

Returns:

List of prediction values (flattened if 2D).
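To make "flattened if 2D" concrete, the sketch below flattens a small multi-output array with NumPy. It assumes row-major (C-order) flattening, which is NumPy's default for ravel(); the source does not state which order the library uses.

```python
import numpy as np

# Two samples with two outputs each: shape (n_samples, n_outputs).
y_pred = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

flat = y_pred.ravel()     # 1D view in row-major order (assumed ordering)
as_list = flat.tolist()   # plain Python floats

print(flat.shape, as_list)
```

Note that flattening interleaves the outputs of each sample, so callers who need per-output columns may prefer to_numpy() or to_dataframe().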

to_numpy() -> ndarray

Get predictions as NumPy array.

Returns:

NumPy array of predictions.

property values: ndarray

Get prediction values (alias for y_pred).

y_pred: ndarray

class nirs4all.api.result.RunResult(predictions: Predictions, per_dataset: dict[str, Any], _runner: PipelineRunner | None = None)

Bases: object

Result from nirs4all.run().

Provides convenient access to predictions, best model, and artifacts. Wraps the raw (predictions, per_dataset) tuple returned by PipelineRunner.run().

predictions

Predictions object containing all pipeline results.

Type:

Predictions

per_dataset

Dictionary with per-dataset execution details.

Type:

dict[str, Any]

Properties:

  • best: Best prediction entry by default ranking.

  • best_score: Best model’s primary test score.

  • best_rmse: Best model’s RMSE (regression).

  • best_r2: Best model’s R² (regression).

  • best_accuracy: Best model’s accuracy (classification).

  • artifacts_path: Path to run artifacts directory.

  • num_predictions: Total number of predictions stored.

Methods:

  • top(n): Get top N predictions by ranking.

  • export(path): Export best model to .n4a bundle.

  • filter(**kwargs): Filter predictions by criteria.

  • get_datasets(): Get list of unique dataset names.

  • get_models(): Get list of unique model names.

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
>>> print(f"Best R²: {result.best_r2:.4f}")
>>> result.export("exports/best_model.n4a")

__repr__() -> str

String representation.

__str__() -> str

User-friendly string representation.

property artifacts_path: Path | None

Get path to workspace artifacts directory.

Returns:

Path to the workspace directory, or None if not available.

property best: dict[str, Any]

Get best prediction entry by default ranking.

Returns:

Dictionary containing best model’s metrics, name, and configuration. Empty dict if no predictions available.

property best_accuracy: float

Get best model’s accuracy score (for classification).

Looks for ‘accuracy’ as a flat key (from display_metrics), then in scores dict.

Returns:

Accuracy value or NaN if unavailable.

property best_r2: float

Get best model’s R² score.

Looks for ‘r2’ as a flat key (from display_metrics), then in scores dict.

Returns:

R² value or NaN if unavailable.

property best_rmse: float

Get best model’s RMSE score.

Looks for ‘rmse’ as a flat key (from display_metrics), then in scores dict, then falls back to test_score if metric is rmse-like.

Returns:

RMSE value or NaN if unavailable.
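The three-step lookup described above can be sketched over a plain dictionary. This is an illustration under assumptions: the key names "scores" and "metric" and the flat layout of the scores dict are hypothetical, and only test_score is named in this reference.

```python
import math

def best_rmse_from(best):
    """Hypothetical sketch of the documented RMSE lookup order."""
    # 1. Flat 'rmse' key on the prediction entry.
    if best.get("rmse") is not None:
        return float(best["rmse"])
    # 2. 'rmse' inside the scores dict (key name and flat layout are assumptions).
    scores = best.get("scores") or {}
    if scores.get("rmse") is not None:
        return float(scores["rmse"])
    # 3. Fall back to test_score when the entry's metric is rmse-like.
    metric = str(best.get("metric", "")).lower()
    if "rmse" in metric and best.get("test_score") is not None:
        return float(best["test_score"])
    # Nothing usable: report NaN rather than raising.
    return math.nan

print(best_rmse_from({"rmse": 0.42}))
print(best_rmse_from({"scores": {"rmse": 0.5}}))
print(best_rmse_from({"metric": "rmse", "test_score": 0.61}))
print(best_rmse_from({}))
```

Returning NaN instead of raising lets callers format the value directly, at the cost of having to check for NaN before comparing scores.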

property best_score: float

Get best model’s primary test score.

Returns:

The test_score value from best prediction, or NaN if unavailable.

export(output_path: str | Path, format: str = 'n4a', source: dict[str, Any] | None = None, chain_id: str | None = None) -> Path

Export a model to a bundle.

Two export paths are supported:

Store-based (preferred) – pass chain_id to export directly from the DuckDB workspace:

>>> result.export("model.n4a", chain_id="abc123")

Resolver-based (legacy) – exports via PipelineRunner.export:

>>> result.export("model.n4a")  # uses best prediction

Parameters:
  • output_path – Path for the exported bundle file.

  • format – Export format (‘n4a’ or ‘n4a.py’).

  • source – Prediction dict to export. If None, exports best model.

  • chain_id – Chain identifier for store-based export. When provided, source is ignored and the chain is exported directly from the DuckDB store.

Returns:

Path to the exported bundle file.

Raises:
  • RuntimeError – If runner reference is not available.

  • ValueError – If no predictions available and source not provided.

export_model(output_path: str | Path, source: dict[str, Any] | None = None, format: str | None = None, fold: int | None = None) -> Path

Export only the model artifact (lightweight).

Unlike export(), which creates a full bundle, this exports only the model.

Parameters:
  • output_path – Path for the output model file.

  • source – Prediction dict to export. If None, exports best model.

  • format – Model format (inferred from extension if None).

  • fold – Fold index to export (default: fold 0).

Returns:

Path to the exported model file.

Raises:

RuntimeError – If runner reference is not available.

filter(**kwargs) -> list[dict[str, Any]]

Filter predictions by criteria.

Parameters:

**kwargs – Filter criteria passed to predictions.filter_predictions(). Supported kwargs include:

  • dataset_name: Filter by dataset name

  • model_name: Filter by model name

  • partition: Filter by partition (‘train’, ‘val’, ‘test’)

  • fold_id: Filter by fold ID

  • step_idx: Filter by pipeline step index

  • branch_id: Filter by branch ID

  • load_arrays: If True, load actual arrays (default: True)

Returns:

List of matching prediction dictionaries.
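Conceptually, keyword filtering keeps the entries whose fields equal every supplied criterion. The sketch below shows that idea on a list of plain dicts; it is not the library's filter_predictions() implementation, the sample entries are invented, and load_arrays is left out because it is an option rather than a match criterion.

```python
def filter_entries(entries, **criteria):
    """Hypothetical sketch: keep entries matching every keyword criterion."""
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]

# Invented prediction entries using field names from the list above.
entries = [
    {"dataset_name": "wheat", "model_name": "PLSRegression", "partition": "test", "fold_id": 0},
    {"dataset_name": "wheat", "model_name": "RandomForest", "partition": "test", "fold_id": 0},
    {"dataset_name": "corn", "model_name": "PLSRegression", "partition": "val", "fold_id": 1},
]

wheat_test = filter_entries(entries, dataset_name="wheat", partition="test")
pls_only = filter_entries(entries, model_name="PLSRegression")
print(len(wheat_test), len(pls_only))
```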

get_datasets() -> list[str]

Get list of unique dataset names.

Returns:

List of dataset names in predictions.

get_models() -> list[str]

Get list of unique model names.

Returns:

List of model names in predictions.

property num_predictions: int

Get total number of predictions stored.

Returns:

Number of prediction entries.

per_dataset: dict[str, Any]
predictions: Predictions

summary() -> str

Get a summary string of the run result.

Returns:

Multi-line summary string with key metrics.

top(n: int = 5, **kwargs) -> list[dict[str, Any]] | dict[tuple, list[dict[str, Any]]]

Get top N predictions by ranking.

Parameters:
  • n – Number of top predictions to return. When group_by is used, this means top N per group (e.g., top 3 per dataset).

  • **kwargs – Additional arguments passed to predictions.top(). Supported kwargs include:

    • rank_metric: Metric to rank by (default: uses record’s metric)

    • rank_partition: Partition to rank on (default: “val”)

    • display_partition: Partition for display metrics (default: “test”)

    • aggregate_partitions: If True, include train/val/test data

    • ascending: Sort order (None = infer from metric)

    • group_by: Group predictions by column(s). Returns top N per group. Each result includes ‘group_key’ for easy filtering.

    • return_grouped: If True with group_by, return dict of group->results instead of flat list. Default: False.

Returns:

  • If return_grouped=False (default): List of prediction dicts, ranked by score. With group_by, returns top N per group as a flat list.

  • If return_grouped=True: Dict mapping group keys to lists of predictions.

Examples

>>> # Top 5 overall
>>> result.top(5)
>>>
>>> # Top 3 per dataset (flat list)
>>> top_per_ds = result.top(3, group_by='dataset_name')
>>> ds1 = [r for r in top_per_ds if r['group_key'] == ('my_dataset',)]
>>>
>>> # Top 3 per dataset (grouped dict)
>>> grouped = result.top(3, group_by='dataset_name', return_grouped=True)
>>> for key, results in grouped.items():
...     print(f"{key}: {len(results)} results")
>>>
>>> # Multi-column grouping: top 2 per (dataset, model) combination
>>> top_per_combo = result.top(2, group_by=['dataset_name', 'model_name'])
>>> # Group keys are tuples: ('wheat', 'PLSRegression'), ('corn', 'RandomForest')
>>> for r in top_per_combo:
...     dataset, model = r['group_key']
...     print(f"{dataset}/{model}: {r['test_score']:.4f}")

validate(check_nan_metrics: bool = True, check_empty: bool = True, raise_on_failure: bool = True, nan_threshold: float = 0.0) -> dict[str, Any]

Validate the run result for common issues.

Checks for NaN values in metrics, empty predictions, and other issues that might indicate problems with the pipeline execution.

Parameters:
  • check_nan_metrics – If True, check for NaN values in metrics.

  • check_empty – If True, check for empty predictions.

  • raise_on_failure – If True, raise ValueError on validation failure.

  • nan_threshold – Maximum allowed ratio of predictions with NaN metrics (0.0 = none allowed).

Returns:

Dictionary with validation results:

  • valid: True if all checks passed.

  • issues: List of issue descriptions.

  • nan_count: Number of predictions with NaN metrics.

  • total_count: Total number of predictions.

Raises:

ValueError – If raise_on_failure=True and validation fails.

Example

>>> result = nirs4all.run(pipeline, dataset)
>>> result.validate()  # Raises if issues found
>>> # Or check without raising
>>> report = result.validate(raise_on_failure=False)
>>> if not report['valid']:
...     print(f"Issues: {report['issues']}")
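
The effect of nan_threshold can be illustrated with a small stand-alone sketch. It is not the library's validate(): the function name validate_entries is hypothetical, it assumes an entry counts as having NaN metrics when its test_score is NaN, and it omits the check_nan_metrics and check_empty switches for brevity.

```python
import math

def validate_entries(entries, nan_threshold=0.0, raise_on_failure=True):
    """Hypothetical stand-in for RunResult.validate() over prediction dicts."""
    issues = []
    total_count = len(entries)
    # Assumption: an entry has "NaN metrics" when its test_score is NaN.
    nan_count = sum(
        1 for entry in entries
        if isinstance(entry.get("test_score"), float) and math.isnan(entry["test_score"])
    )
    if total_count == 0:
        issues.append("no predictions available")
    elif nan_count / total_count > nan_threshold:
        # The ratio of NaN entries exceeds the allowed threshold.
        issues.append(f"{nan_count}/{total_count} predictions have NaN metrics")
    if issues and raise_on_failure:
        raise ValueError("; ".join(issues))
    return {"valid": not issues, "issues": issues,
            "nan_count": nan_count, "total_count": total_count}

entries = [{"test_score": 0.41}, {"test_score": 0.47},
           {"test_score": float("nan")}, {"test_score": 0.52}]

strict = validate_entries(entries, raise_on_failure=False)  # ratio 0.25 > 0.0
lenient = validate_entries(entries, nan_threshold=0.5)      # ratio 0.25 <= 0.5
print(strict["valid"], lenient["valid"])  # prints: False True
```

With the default threshold of 0.0 a single NaN entry fails validation; raising the threshold tolerates a bounded share of failed pipeline variants.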