PredictionResultsList Reference

The PredictionResultsList class is a list subclass that wraps the PredictionResult objects returned by the top() method of the Predictions class. It adds save() and get() on top of the standard list interface, so indexing, slicing, iteration, and len() work as they do on any Python list.

Quick Reference

Get Top Predictions

# Get top predictions (returns PredictionResultsList)
top_models = predictions.top(n=5, rank_metric="mse", aggregate_partitions=True)

Save All Predictions to CSV

top_models.save(path="results", filename="top_5_models.csv")

Get Prediction by ID

prediction = top_models.get("abc123")
if prediction is not None:
    print(f"Found: {prediction.model_name}")

Print Summary Report

print(top_models[0].summary())

Output:

|-----------|---------|----------|--------|--------|--------|--------|
|           | Nsample | Nfeature | R²     | RMSE   | MSE    | MAE    |
|-----------|---------|----------|--------|--------|--------|--------|
| Cross Val | 50      | 100      | 0.966  | 0.195  | 0.038  | 0.160  |
| Train     | 50      | 100      | 0.944  | 0.231  | 0.053  | 0.191  |
| Test      | 50      | 100      | 0.962  | 0.176  | 0.031  | 0.141  |
|-----------|---------|----------|--------|--------|--------|--------|

Standard List Operations

len(top_models)           # Length
top_models[0]             # Indexing
top_models[:3]            # Slicing
for model in top_models:  # Iteration
    ...

Key Features

Extended Functionality

save(path, filename): Save all predictions to a single structured CSV file
get(id): Fast retrieval of predictions by their unique ID
Standard list operations: indexing, slicing, iteration, length, etc.
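
The idea of a list that also offers lookup by ID can be sketched in a few lines. This is a minimal illustration, not the library's implementation: Result and ResultsList are hypothetical stand-ins, and the lookup is a plain linear scan.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a PredictionResult (illustration only)."""
    id: str
    model_name: str


class ResultsList(list):
    """Hypothetical list subclass; not the library's implementation."""

    def get(self, result_id):
        # Linear scan; returns None when no element carries that ID.
        return next((r for r in self if r.id == result_id), None)


results = ResultsList([Result("abc123", "PLS"), Result("def456", "SVR")])

print(len(results))            # standard list behaviour still works
print(results[0].model_name)   # indexing
print(results.get("def456"))   # lookup by ID
print(results.get("missing"))  # None when the ID is absent
```

In plain Python, slicing a list subclass returns a built-in list unless __getitem__ is overridden. Whether PredictionResultsList overrides it is not stated here, so check the type of a slice before calling save() or get() on it.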

Enhanced PredictionResult

summary(): Generate a formatted tabular report with metrics for the train, validation, and test partitions
save_to_csv(path_or_file, filename): Save individual prediction to CSV
eval_score(metrics): Calculate metrics for the prediction
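
For reference, the metrics shown in the summary table follow the standard regression definitions. The sketch below computes them in pure Python; eval_metrics is a hypothetical helper written for this example and is not the code behind eval_score().

```python
import math


def eval_metrics(y_true, y_pred):
    """Hypothetical helper: standard regression metrics, not the library code."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    # Total sum of squares; zero (and R² undefined) when y_true is constant.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


scores = eval_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

RMSE is the square root of MSE, which matches the summary table above: for the cross-validation row, 0.195 squared is about 0.038.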

Usage Examples

Basic Usage

from nirs4all.data import Predictions

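# Empty container shown for brevity; in practice, populate it first
# (for example with Predictions.load(), as in the complete workflow)
predictions = Predictions()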

# Get top 5 models using top() method
top_models = predictions.top(
    n=5,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True
)

# Type: PredictionResultsList (extends list)
print(type(top_models))  # <class 'PredictionResultsList'>
print(len(top_models))   # 5
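
One plausible reading of rank_partition and display_partition is that models are ordered by their score on one partition while the score reported to the user comes from another. The sketch below illustrates that reading with plain dictionaries. The record layout, the local top() function, and the display_score key are assumptions made for this example; the real top() may differ in detail.

```python
# Hypothetical records: each model carries one score per partition.
records = [
    {"model_name": "A", "mse": {"val": 0.040, "test": 0.050}},
    {"model_name": "B", "mse": {"val": 0.030, "test": 0.060}},
    {"model_name": "C", "mse": {"val": 0.050, "test": 0.035}},
]


def top(records, n, rank_metric, rank_partition, display_partition, ascending=True):
    """Illustrative ranking only; not the library's top() method."""
    ranked = sorted(
        records,
        key=lambda r: r[rank_metric][rank_partition],
        reverse=not ascending,  # ascending=True puts the lowest score first
    )
    return [
        {
            "model_name": r["model_name"],
            "rank_score": r[rank_metric][rank_partition],
            "display_score": r[rank_metric][display_partition],
        }
        for r in ranked[:n]
    ]


best_two = top(records, n=2, rank_metric="mse",
               rank_partition="val", display_partition="test")
for row in best_two:
    print(row)
```

Here model C has the lowest test error but is not selected, because selection uses only the validation score. Ranking on validation and reporting on test keeps the test partition out of model selection.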

Saving to CSV

The save() method creates a structured CSV:

Line 1: dataset_name
Line 2: model_classname + model_id
Line 3: fold_id
Line 4: partition
Line 5: column headers (y_true_partition, y_pred_partition, ...)
Lines 6+: prediction data
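
A file with this layout cannot be read as an ordinary table, because the first four lines are metadata. The sketch below shows one way to separate the two parts with the standard csv module. The sample content is invented for illustration, and the exact formatting of each metadata line (for instance, how model_classname and model_id are joined) is an assumption, so the metadata lines are kept as raw cells.

```python
import csv
import io

# Hypothetical file content following the layout described above.
sample = """wheat
PLSRegression abc123
0
test
y_true_test,y_pred_test
1.0,1.1
2.0,1.9
"""


def read_structured_csv(handle):
    """Split the four metadata lines from the tabular part (illustrative)."""
    reader = csv.reader(handle)
    keys = ["dataset_name", "model", "fold_id", "partition"]
    # Keep each metadata line as raw cells; their exact format is assumed.
    metadata = {key: next(reader) for key in keys}
    header = next(reader)
    rows = [dict(zip(header, map(float, row))) for row in reader if row]
    return metadata, header, rows


metadata, header, rows = read_structured_csv(io.StringIO(sample))
print(metadata["dataset_name"], header, len(rows))
```

For a file on disk, pass an open file handle (opened with newline="") instead of the StringIO object.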

Example:

top_models.save(
    path="results",
    filename="top_5_models.csv"
)

For aggregated results, the CSV has columns like:

y_true_train_fold0, y_pred_train_fold0
y_true_val_fold0, y_pred_val_fold0
y_true_test, y_pred_test
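
When post-processing an aggregated file, it helps to split each column name into its parts. The sketch below does so with a regular expression inferred from the three column pairs listed above. The pattern is an assumption based only on those examples, and parse_column is a hypothetical helper.

```python
import re

# Pattern inferred from the example column names above (an assumption).
COLUMN_RE = re.compile(r"^(y_true|y_pred)_(train|val|test)(?:_fold(\d+))?$")


def parse_column(name):
    """Return (kind, partition, fold) or None if the name does not match."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    kind, partition, fold = match.groups()
    # fold is None for columns without a _foldN suffix, such as y_true_test.
    return kind, partition, int(fold) if fold is not None else None


for column in ["y_true_train_fold0", "y_pred_val_fold0", "y_pred_test", "sample_id"]:
    print(column, "->", parse_column(column))
```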

Common Workflows

Analyze Top Models

# Get top 10 models
top_10 = predictions.top(
    n=10,
    rank_metric="mse",
    aggregate_partitions=True
)

# Save all to CSV
top_10.save(path="results/analysis")

# Print summaries
for i, model in enumerate(top_10, 1):
    print(f"\n{'='*80}")
    print(f"MODEL {i}: {model.model_name} (ID: {model.id})")
    print(f"{'='*80}")
    print(model.summary())

Export Best Model Details

# Get best model
best = predictions.top(n=1, rank_metric="rmse")[0]

# Print summary
print("BEST MODEL PERFORMANCE:")
print(best.summary())

# Save individual prediction
best.save_to_csv("results/best_model.csv")

# Access details
print(f"Model: {best.model_name}")
print(f"Dataset: {best.dataset_name}")
print(f"Fold: {best.fold_id}")
print(f"Score: {best.get('rank_score')}")

Compare Multiple Models

# Get top 5 models
top_5 = predictions.top(n=5, rank_metric="r2", ascending=False)

# Save all predictions to single file
top_5.save(filename="top_5_comparison.csv")

# Compare metrics
for model in top_5:
    scores = model.eval_score(metrics=["rmse", "mae", "r2"])
    print(f"{model.model_name}: {scores}")

Group By: Top N Per Group

The group_by parameter returns the top N results per group rather than N results overall. This is useful when comparing models across multiple datasets or configurations.

# Get top 3 models PER DATASET (flat list, sorted by global rank)
top_per_dataset = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name"
)

# Each result includes 'group_key' for easy filtering
for pred in top_per_dataset:
    dataset = pred['group_key'][0]  # group_key is a tuple
    print(f"{dataset}: {pred.model_name} - RMSE: {pred.get('rmse', 0):.4f}")

# Filter results for a specific dataset
wheat_results = [r for r in top_per_dataset if r['group_key'] == ('wheat',)]

Grouped dict output with return_grouped=True:

# Get top 3 models per dataset as a dictionary
grouped = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name",
    return_grouped=True
)

# Result: {('dataset1',): [...], ('dataset2',): [...]}
for group_key, results in grouped.items():
    print(f"\n{group_key[0]}: {len(results)} best models")
    for i, pred in enumerate(results, 1):
        print(f"  {i}. {pred.model_name}: RMSE={pred.get('rmse', 0):.4f}")

Multi-column grouping:

# Top 2 per (dataset, model_class) combination
per_combo = predictions.top(
    n=2,
    rank_metric="rmse",
    group_by=["dataset_name", "model_classname"]
)
# Each result has group_key like ('wheat', 'PLSRegression')
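
The grouping behaviour described in this section can be illustrated without the library. The sketch below keeps the best n records per group, keyed by a tuple as in the examples above. The records and the top_per_group function are hypothetical, and lower scores are assumed to rank first.

```python
from collections import defaultdict

# Hypothetical flat records; field names mirror those used in this document.
records = [
    {"dataset_name": "wheat", "model_classname": "PLSRegression", "rmse": 0.20},
    {"dataset_name": "wheat", "model_classname": "SVR", "rmse": 0.25},
    {"dataset_name": "wheat", "model_classname": "Ridge", "rmse": 0.30},
    {"dataset_name": "corn", "model_classname": "PLSRegression", "rmse": 0.15},
    {"dataset_name": "corn", "model_classname": "SVR", "rmse": 0.10},
]


def top_per_group(records, n, rank_metric, group_by):
    """Illustrative top-N-per-group; lower scores rank first."""
    columns = [group_by] if isinstance(group_by, str) else list(group_by)
    groups = defaultdict(list)
    for record in sorted(records, key=lambda r: r[rank_metric]):
        key = tuple(record[c] for c in columns)  # a tuple even for one column
        if len(groups[key]) < n:
            groups[key].append({**record, "group_key": key})
    return dict(groups)


grouped = top_per_group(records, n=2, rank_metric="rmse", group_by="dataset_name")
for key, results in grouped.items():
    print(key, [r["model_classname"] for r in results])

# Flattening and re-sorting gives a single list ordered by global rank.
flat = sorted((r for rs in grouped.values() for r in rs), key=lambda r: r["rmse"])
```

Passing a list such as ["dataset_name", "model_classname"] to this sketch produces two-element keys like ('wheat', 'PLSRegression').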

Complete Workflow Example

from nirs4all.data import Predictions

# Load existing predictions
predictions = Predictions.load(
    dataset_name="my_dataset",
    path="results"
)

# Get top 10 models ranked by MSE on validation set
top_models = predictions.top(
    n=10,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True,  # Include train/val/test data
    ascending=True  # Lower MSE is better
)

# Save all predictions to CSV
top_models.save(
    path="results/analysis",
    filename="top_10_models.csv"
)

# Print summary for best model
print("=" * 80)
print("BEST MODEL SUMMARY")
print("=" * 80)
print(top_models[0].summary())

# Access specific prediction by ID
best_id = top_models[0].id
best_prediction = top_models.get(best_id)

# Iterate through predictions
for i, prediction in enumerate(top_models, 1):
    print(f"\n{i}. {prediction.model_name} (ID: {prediction.id})")
    print(f"   Fold: {prediction.fold_id}")
    print(f"   Rank Score: {prediction.get('rank_score'):.4f}")

    # Save individual prediction
    prediction.save_to_csv(f"results/individual/model_{i}.csv")

API Reference