PredictionResultsList Reference

The PredictionResultsList class is a specialized list container that wraps lists of PredictionResult objects returned by the top() method of the Predictions class. It provides additional functionality while maintaining full compatibility with standard Python list operations.

Quick Reference

Get Top Predictions

# Get top predictions (returns PredictionResultsList)
top_models = predictions.top(n=5, rank_metric="mse", aggregate_partitions=True)

Save All Predictions to CSV

top_models.save(path="results", filename="top_5_models.csv")

Get Prediction by ID

prediction = top_models.get("abc123")
if prediction:
    print(f"Found: {prediction.model_name}")

Standard List Operations

len(top_models)           # Length
top_models[0]             # Indexing
top_models[:3]            # Slicing
for model in top_models:  # Iteration
    ...

Key Features

Extended Functionality

  • save(path, filename): Save all predictions to a single structured CSV file

  • get(id): Fast retrieval of predictions by their unique ID

  • Standard list operations: indexing, slicing, iteration, length, etc.

Enhanced PredictionResult

  • summary(): Generate a formatted tab report with metrics for train/val/test partitions

  • save_to_csv(path_or_file, filename): Save individual prediction to CSV

  • eval_score(metrics): Calculate metrics for the prediction

Usage Examples

Basic Usage

from nirs4all.data import Predictions

predictions = Predictions()

# Get top 5 models using top() method
top_models = predictions.top(
    n=5,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True
)

# Type: PredictionResultsList (extends list)
print(type(top_models))  # <class 'PredictionResultsList'>
print(len(top_models))   # 5

Saving to CSV

The save() method creates a structured CSV:

Line 1: dataset_name
Line 2: model_classname + model_id
Line 3: fold_id
Line 4: partition
Line 5: column headers (y_true_partition, y_pred_partition, ...)
Lines 6+: prediction data

Example:

top_models.save(
    path="results",
    filename="top_5_models.csv"
)

For aggregated results, the CSV has columns like:

  • y_true_train_fold0, y_pred_train_fold0

  • y_true_val_fold0, y_pred_val_fold0

  • y_true_test, y_pred_test

Common Workflows

Analyze Top Models

# Get top 10 models
top_10 = predictions.top(
    n=10,
    rank_metric="mse",
    aggregate_partitions=True
)

# Save all to CSV
top_10.save(path="results/analysis")

# Print summaries
for i, model in enumerate(top_10, 1):
    print(f"\n{'='*80}")
    print(f"MODEL {i}: {model.model_name} (ID: {model.id})")
    print(f"{'='*80}")
    print(model.summary())

Export Best Model Details

# Get best model
best = predictions.top(n=1, rank_metric="rmse")[0]

# Print summary
print("BEST MODEL PERFORMANCE:")
print(best.summary())

# Save individual prediction
best.save_to_csv("results/best_model.csv")

# Access details
print(f"Model: {best.model_name}")
print(f"Dataset: {best.dataset_name}")
print(f"Fold: {best.fold_id}")
print(f"Score: {best.get('rank_score')}")

Compare Multiple Models

# Get top 5 models
top_5 = predictions.top(n=5, rank_metric="r2", ascending=False)

# Save all predictions to single file
top_5.save(filename="top_5_comparison.csv")

# Compare metrics
for model in top_5:
    scores = model.eval_score(metrics=["rmse", "mae", "r2"])
    print(f"{model.model_name}: {scores}")

Group By: Top N Per Group

The group_by parameter allows you to get top N results per group instead of N total. This is useful when comparing models across multiple datasets or configurations.

# Get top 3 models PER DATASET (flat list, sorted by global rank)
top_per_dataset = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name"
)

# Each result includes 'group_key' for easy filtering
for pred in top_per_dataset:
    dataset = pred['group_key'][0]  # group_key is a tuple
    print(f"{dataset}: {pred.model_name} - RMSE: {pred.get('rmse', 0):.4f}")

# Filter results for a specific dataset
wheat_results = [r for r in top_per_dataset if r['group_key'] == ('wheat',)]

Grouped dict output with return_grouped=True:

# Get top 3 models per dataset as a dictionary
grouped = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name",
    return_grouped=True
)

# Result: {('dataset1',): [...], ('dataset2',): [...]}
for group_key, results in grouped.items():
    print(f"\n{group_key[0]}: {len(results)} best models")
    for i, pred in enumerate(results, 1):
        print(f"  {i}. {pred.model_name}: RMSE={pred.get('rmse', 0):.4f}")

Multi-column grouping:

# Top 2 per (dataset, model_class) combination
per_combo = predictions.top(
    n=2,
    rank_metric="rmse",
    group_by=["dataset_name", "model_classname"]
)
# Each result has group_key like ('wheat', 'PLSRegression')

Complete Workflow Example

from nirs4all.data import Predictions

# Load existing predictions
predictions = Predictions.load(
    dataset_name="my_dataset",
    path="results"
)

# Get top 10 models ranked by MSE on validation set
top_models = predictions.top(
    n=10,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True,  # Include train/val/test data
    ascending=True  # Lower MSE is better
)

# Save all predictions to CSV
top_models.save(
    path="results/analysis",
    filename="top_10_models.csv"
)

# Print summary for best model
print("=" * 80)
print("BEST MODEL SUMMARY")
print("=" * 80)
print(top_models[0].summary())

# Access specific prediction by ID
best_id = top_models[0].id
best_prediction = top_models.get(best_id)

# Iterate through predictions
for i, prediction in enumerate(top_models, 1):
    print(f"\n{i}. {prediction.model_name} (ID: {prediction.id})")
    print(f"   Fold: {prediction.fold_id}")
    print(f"   Rank Score: {prediction.get('rank_score'):.4f}")

    # Save individual prediction
    prediction.save_to_csv(f"results/individual/model_{i}.csv")

API Reference

PredictionResultsList

class PredictionResultsList(list):
    def save(self, path: str = "results", filename: Optional[str] = None) -> None
    def get(self, prediction_id: str) -> Optional[PredictionResult]

Methods:

  • __init__(predictions=None): Initialize with optional list of predictions

  • save(path="results", filename=None): Save all predictions to structured CSV

  • get(prediction_id): Retrieve prediction by ID (returns PredictionResult or None)

  • All standard list methods: append(), extend(), pop(), remove(), etc.

PredictionResult

class PredictionResult(dict):
    def summary(self) -> str
    def save_to_csv(self, path_or_file: str = "results", filename: Optional[str] = None) -> None
    def eval_score(self, metrics: Optional[List[str]] = None) -> Dict[str, Any]

    @property
    def id(self) -> str
    @property
    def dataset_name(self) -> str
    @property
    def model_name(self) -> str
    @property
    def model_classname(self) -> str
    @property
    def fold_id(self) -> str
    @property
    def config_name(self) -> str
    @property
    def step_idx(self) -> int
    @property
    def op_counter(self) -> int

Notes

Aggregated vs Non-Aggregated Results

Aggregated results (when aggregate_partitions=True):

  • Contains nested dictionaries for train, val, test partitions

  • Each partition has y_true, y_pred, and score fields

  • Summary shows metrics for all partitions

Non-aggregated results (single partition):

  • Contains y_true, y_pred at the root level

  • Summary shows metrics for that partition only

CSV File Structure

With aggregation:

dataset_name
model_classname_id
fold_id
partition
y_true_train_foldX,y_pred_train_foldX,y_true_val_foldX,y_pred_val_foldX,y_true_test,y_pred_test
0.5,0.52,0.6,0.58,0.55,0.54
...

Without aggregation:

dataset_name
model_classname_id
fold_id
partition
y_true,y_pred
0.5,0.52
...

Implementation Details

  • Type: PredictionResultsList extends Python’s built-in list class

  • Compatibility: Fully compatible with all list operations and duck typing

  • Performance: get() method uses linear search (O(n)), suitable for small result sets

  • Dependencies: Uses TabReportManager for summary generation

  • Return Type: top() returns PredictionResultsList instead of plain list

Key Points

  • Backward Compatible: All existing code continues to work

  • List Compatible: Standard list operations work normally

  • Flexible: Works with aggregated and non-aggregated results

  • Type Safe: Properly typed with Union types

See Also