# PredictionResultsList Reference

The `PredictionResultsList` class is a specialized list container for the `PredictionResult` objects returned by the `top()` method of the `Predictions` class. It adds convenience methods such as `save()` and `get()` while remaining fully compatible with standard Python list operations.

## Quick Reference

### Get Top Predictions

```python
# Get top predictions (returns PredictionResultsList)
top_models = predictions.top(n=5, rank_metric="mse", aggregate_partitions=True)
```

### Save All Predictions to CSV

```python
top_models.save(path="results", filename="top_5_models.csv")
```

### Get Prediction by ID

```python
prediction = top_models.get("abc123")
if prediction:
    print(f"Found: {prediction.model_name}")
```

### Print Summary Report

```python
print(top_models[0].summary())
```

**Output:**
```
|----------|---------|----------|--------|--------|--------|--------|
|          | Nsample | Nfeature | R²     | RMSE   | MSE    | MAE    |
|----------|---------|----------|--------|--------|--------|--------|
| Cros Val | 50      | 100      | 0.966  | 0.195  | 0.038  | 0.160  |
| Train    | 50      | 100      | 0.944  | 0.231  | 0.053  | 0.191  |
| Test     | 50      | 100      | 0.962  | 0.176  | 0.031  | 0.141  |
|----------|---------|----------|--------|--------|--------|--------|
```

### Standard List Operations

```python
len(top_models)           # Length
top_models[0]             # Indexing
top_models[:3]            # Slicing
for model in top_models:  # Iteration
    ...
```
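Because `PredictionResultsList` subclasses `list`, `isinstance(top_models, list)` holds and every list operation is available. One Python detail is worth knowing: the built-in `list.__getitem__` returns a plain `list` for slices, so a subclass keeps its extra methods on slices only if it overrides `__getitem__`. This reference does not say whether `PredictionResultsList` does. The sketch below shows the default behavior with a hypothetical stand-in class, `ResultsList`, which is not part of nirs4all:

```python
class ResultsList(list):
    """Hypothetical stand-in: a list subclass that adds one helper method."""

    def ids(self):
        # Collect the "id" field of every stored result dict.
        return [item["id"] for item in self]


results = ResultsList([{"id": "a"}, {"id": "b"}, {"id": "c"}])

print(isinstance(results, list))   # True
print(len(results), results[0])    # 3 {'id': 'a'}
print(type(results[:2]).__name__)  # list: built-in slicing drops the subclass

# Re-wrap a slice to get the helper methods back.
first_two = ResultsList(results[:2])
print(first_two.ids())             # ['a', 'b']
```

If a slice of `top_models` lacks `save()` in your version, re-wrapping it as shown is a safe workaround.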

## Key Features

### Extended Functionality

- **`save(path, filename)`**: Save all predictions to a single structured CSV file
- **`get(id)`**: Retrieve a prediction by its unique ID
- Standard list operations: indexing, slicing, iteration, length, etc.

### Enhanced PredictionResult

- **`summary()`**: Generate a formatted tabular report with metrics for the train, validation, and test partitions
- **`save_to_csv(path_or_file, filename)`**: Save a single prediction to CSV
- **`eval_score(metrics)`**: Compute metrics (for example `rmse`, `mae`, `r2`) for this prediction

## Usage Examples

### Basic Usage

```python
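from nirs4all.data import Predictions

# Assumes `predictions` already holds results (e.g., from a pipeline run);
# a freshly created, empty Predictions object has nothing to rank.
predictions = Predictions()

# Get the top 5 models, ranked by MSE on the validation partition
top_models = predictions.top(
    n=5,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True
)

# Type: PredictionResultsList (a subclass of list)
print(type(top_models).__name__)     # PredictionResultsList
print(isinstance(top_models, list))  # True
print(len(top_models))               # at most 5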
```

### Saving to CSV

The `save()` method creates a structured CSV:

```text
Line 1: dataset_name
Line 2: model_classname + model_id
Line 3: fold_id
Line 4: partition
Line 5: column headers (y_true_partition, y_pred_partition, ...)
Lines 6+: prediction data
```
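To make the layout concrete, here is a minimal sketch that writes one prediction in that shape using the standard `csv` module. It is an illustration only: `write_prediction_csv` is a hypothetical helper, the underscore joining `model_classname` and `model_id` is an assumption, and how `save()` combines several predictions in one file is not covered.

```python
import csv
import io


def write_prediction_csv(stream, meta, columns):
    """Hypothetical sketch of the layout above for a single prediction.

    meta: dict with dataset_name, model_classname, model_id, fold_id, partition.
    columns: dict mapping a column header to its list of values.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow([meta["dataset_name"]])                             # line 1
    writer.writerow([f"{meta['model_classname']}_{meta['model_id']}"])  # line 2 (separator assumed)
    writer.writerow([meta["fold_id"]])                                  # line 3
    writer.writerow([meta["partition"]])                                # line 4
    writer.writerow(list(columns))                                      # line 5: headers
    for row in zip(*columns.values()):                                  # lines 6+: data
        writer.writerow(row)


buffer = io.StringIO()
write_prediction_csv(
    buffer,
    {"dataset_name": "wheat", "model_classname": "PLSRegression",
     "model_id": "abc123", "fold_id": "0", "partition": "test"},
    {"y_true": [0.5, 0.6], "y_pred": [0.52, 0.58]},
)
print(buffer.getvalue())
```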

**Example:**

```python
top_models.save(
    path="results",
    filename="top_5_models.csv"
)
```

For aggregated results, the CSV has columns like:
- `y_true_train_fold0`, `y_pred_train_fold0`
- `y_true_val_fold0`, `y_pred_val_fold0`
- `y_true_test`, `y_pred_test`
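Train, validation, and test partitions usually contain different numbers of samples, so side-by-side columns have unequal lengths. One way to write them, assumed here and not necessarily what `save()` does, is to pad shorter columns with empty cells. The hypothetical `aggregated_columns` helper below builds names following the pattern listed above (fold suffix on train/val, none on test):

```python
import csv
import io
from itertools import zip_longest


def aggregated_columns(partitions, fold_id):
    """Build column names in the pattern listed above (hypothetical helper).

    partitions: dict like {"train": (y_true, y_pred), "val": ..., "test": ...}
    Train and val columns carry a fold suffix; test columns do not.
    """
    columns = {}
    for name, (y_true, y_pred) in partitions.items():
        suffix = f"_{name}" if name == "test" else f"_{name}_fold{fold_id}"
        columns[f"y_true{suffix}"] = y_true
        columns[f"y_pred{suffix}"] = y_pred
    return columns


columns = aggregated_columns(
    {"train": ([0.5, 0.7, 0.9], [0.52, 0.68, 0.88]),
     "val": ([0.6], [0.58]),
     "test": ([0.55, 0.65], [0.54, 0.66])},
    fold_id=0,
)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(list(columns))  # header row
# Pad shorter partitions with empty cells so every row has the same width.
for row in zip_longest(*columns.values(), fillvalue=""):
    writer.writerow(row)
print(buffer.getvalue())
```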

### Common Workflows

#### Analyze Top Models

```python
# Get top 10 models
top_10 = predictions.top(
    n=10,
    rank_metric="mse",
    aggregate_partitions=True
)

# Save all to CSV
top_10.save(path="results/analysis")

# Print summaries
for i, model in enumerate(top_10, 1):
    print(f"\n{'='*80}")
    print(f"MODEL {i}: {model.model_name} (ID: {model.id})")
    print(f"{'='*80}")
    print(model.summary())
```

#### Export Best Model Details

```python
# Get best model
best = predictions.top(n=1, rank_metric="rmse")[0]

# Print summary
print("BEST MODEL PERFORMANCE:")
print(best.summary())

# Save individual prediction
best.save_to_csv("results/best_model.csv")

# Access details
print(f"Model: {best.model_name}")
print(f"Dataset: {best.dataset_name}")
print(f"Fold: {best.fold_id}")
print(f"Score: {best.get('rank_score')}")
```

#### Compare Multiple Models

```python
# Get top 5 models
top_5 = predictions.top(n=5, rank_metric="r2", ascending=False)

# Save all predictions to single file
top_5.save(filename="top_5_comparison.csv")

# Compare metrics
for model in top_5:
    scores = model.eval_score(metrics=["rmse", "mae", "r2"])
    print(f"{model.model_name}: {scores}")
```
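For reference, these are the standard definitions of the metrics requested above, written in plain Python. This is a sketch, not nirs4all's metric code, and `eval_scores` is a hypothetical name. Lower is better for MSE, RMSE, and MAE, while higher is better for R², which is why the example ranks `r2` with `ascending=False`. RMSE is the square root of MSE, which the summary table in the Quick Reference reflects (0.195² ≈ 0.038).

```python
import math


def eval_scores(y_true, y_pred):
    """Hypothetical stand-in computing standard regression metrics."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)    # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,  # undefined when all y_true values are equal
    }


scores = eval_scores([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 4) for name, value in scores.items()})
```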

#### Group By: Top N Per Group

The `group_by` parameter allows you to get top N results **per group** instead of N total.
This is useful when comparing models across multiple datasets or configurations.

```python
# Get top 3 models PER DATASET (flat list, sorted by global rank)
top_per_dataset = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name"
)

# Each result includes 'group_key' for easy filtering
for pred in top_per_dataset:
    dataset = pred['group_key'][0]  # group_key is a tuple
    print(f"{dataset}: {pred.model_name} - RMSE: {pred.get('rmse', 0):.4f}")

# Filter results for a specific dataset
wheat_results = [r for r in top_per_dataset if r['group_key'] == ('wheat',)]
```
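The selection rule described above (rank everything once, keep at most N per group, return a flat list in global rank order with a tuple `group_key`) can be sketched in plain Python. `top_per_group` and the sample records are hypothetical; the real `top()` may differ in details such as tie handling.

```python
def top_per_group(results, n, rank_metric, group_by, ascending=True):
    """Hypothetical sketch of top-N-per-group selection.

    results: list of dicts holding the rank metric and the group_by fields.
    group_by: a column name or a list of column names.
    """
    columns = [group_by] if isinstance(group_by, str) else list(group_by)
    # Rank everything once; lower is better when ascending=True.
    ranked = sorted(results, key=lambda r: r[rank_metric], reverse=not ascending)
    kept, counts = [], {}
    for result in ranked:
        key = tuple(result[c] for c in columns)  # always a tuple, even for one column
        if counts.get(key, 0) < n:
            counts[key] = counts.get(key, 0) + 1
            kept.append({**result, "group_key": key})
    return kept  # still in global rank order


results = [
    {"dataset_name": "wheat", "model_name": "PLS_5", "rmse": 0.21},
    {"dataset_name": "wheat", "model_name": "PLS_10", "rmse": 0.18},
    {"dataset_name": "wheat", "model_name": "RF", "rmse": 0.30},
    {"dataset_name": "corn", "model_name": "PLS_5", "rmse": 0.25},
    {"dataset_name": "corn", "model_name": "SVR", "rmse": 0.40},
]
top = top_per_group(results, n=2, rank_metric="rmse", group_by="dataset_name")
for r in top:
    print(r["group_key"], r["model_name"], r["rmse"])
```

Here the `RF` run on `wheat` is dropped because `wheat` already has two better-ranked results, while `corn` keeps both of its runs.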

**Grouped dict output** with `return_grouped=True`:

```python
# Get top 3 models per dataset as a dictionary
grouped = predictions.top(
    n=3,
    rank_metric="rmse",
    group_by="dataset_name",
    return_grouped=True
)

# Result: {('dataset1',): [...], ('dataset2',): [...]}
for group_key, results in grouped.items():
    print(f"\n{group_key[0]}: {len(results)} best models")
    for i, pred in enumerate(results, 1):
        print(f"  {i}. {pred.model_name}: RMSE={pred.get('rmse', 0):.4f}")
```
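If you already hold the flat output of a `group_by` call and later want the dict shape that `return_grouped=True` produces, the conversion is a single pass over the list. A minimal sketch, assuming each item carries a tuple `group_key` as described above; `group_results` is a hypothetical helper, and in this sketch groups appear in the order of their best-ranked member.

```python
def group_results(flat_results):
    """Hypothetical sketch: turn a flat, rank-ordered list into
    {group_key: [results...]}, keeping rank order inside each group."""
    grouped = {}
    for result in flat_results:
        grouped.setdefault(result["group_key"], []).append(result)
    return grouped


flat = [
    {"group_key": ("wheat",), "model_name": "PLS_10", "rmse": 0.18},
    {"group_key": ("wheat",), "model_name": "PLS_5", "rmse": 0.21},
    {"group_key": ("corn",), "model_name": "PLS_5", "rmse": 0.25},
]
for key, items in group_results(flat).items():
    print(key[0], [r["model_name"] for r in items])
```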

**Multi-column grouping**:

```python
# Top 2 per (dataset, model_class) combination
per_combo = predictions.top(
    n=2,
    rank_metric="rmse",
    group_by=["dataset_name", "model_classname"]
)
# Each result has group_key like ('wheat', 'PLSRegression')
```

## Complete Workflow Example

```python
from nirs4all.data import Predictions

# Load existing predictions
predictions = Predictions.load(
    dataset_name="my_dataset",
    path="results"
)

# Get top 10 models ranked by MSE on validation set
top_models = predictions.top(
    n=10,
    rank_metric="mse",
    rank_partition="val",
    display_partition="test",
    aggregate_partitions=True,  # Include train/val/test data
    ascending=True  # Lower MSE is better
)

# Save all predictions to CSV
top_models.save(
    path="results/analysis",
    filename="top_10_models.csv"
)

# Print summary for best model
print("=" * 80)
print("BEST MODEL SUMMARY")
print("=" * 80)
print(top_models[0].summary())

# Access specific prediction by ID
best_id = top_models[0].id
best_prediction = top_models.get(best_id)

# Iterate through predictions
for i, prediction in enumerate(top_models, 1):
    print(f"\n{i}. {prediction.model_name} (ID: {prediction.id})")
    print(f"   Fold: {prediction.fold_id}")
    print(f"   Rank Score: {prediction.get('rank_score'):.4f}")

    # Save individual prediction
    prediction.save_to_csv(f"results/individual/model_{i}.csv")
```

## API Reference

### PredictionResultsList

```python
from typing import Optional

class PredictionResultsList(list):
    def save(self, path: str = "results", filename: Optional[str] = None) -> None: ...
    def get(self, prediction_id: str) -> Optional["PredictionResult"]: ...
```

**Methods:**

- `__init__(predictions=None)`: Initialize with an optional list of predictions
- `save(path="results", filename=None)`: Save all predictions to structured CSV
- `get(prediction_id)`: Retrieve prediction by ID (returns `PredictionResult` or `None`)
- All standard list methods: `append()`, `extend()`, `pop()`, `remove()`, etc.
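As noted under Implementation Details, `get()` performs a linear search. The sketch below reproduces that behavior on a hypothetical `ResultsList` stand-in (not the nirs4all class) and shows the usual alternative when you need many lookups: build a dict index once.

```python
from typing import Optional


class ResultsList(list):
    """Hypothetical stand-in illustrating a linear-search get()."""

    def get(self, prediction_id: str) -> Optional[dict]:
        # O(n): compare against each stored result until an ID matches.
        for prediction in self:
            if prediction.get("id") == prediction_id:
                return prediction
        return None


results = ResultsList([{"id": "abc123", "model_name": "PLS_10"},
                       {"id": "def456", "model_name": "RF"}])
print(results.get("def456")["model_name"])  # RF
print(results.get("missing"))               # None

# For many lookups, build an index once (O(n)), then each lookup is O(1).
index = {p["id"]: p for p in results}
print(index["abc123"]["model_name"])        # PLS_10
```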

### PredictionResult

```python
from typing import Any, Dict, List, Optional

class PredictionResult(dict):
    def summary(self) -> str: ...
    def save_to_csv(self, path_or_file: str = "results", filename: Optional[str] = None) -> None: ...
    def eval_score(self, metrics: Optional[List[str]] = None) -> Dict[str, Any]: ...

    @property
    def id(self) -> str: ...
    @property
    def dataset_name(self) -> str: ...
    @property
    def model_name(self) -> str: ...
    @property
    def model_classname(self) -> str: ...
    @property
    def fold_id(self) -> str: ...
    @property
    def config_name(self) -> str: ...
    @property
    def step_idx(self) -> int: ...
    @property
    def op_counter(self) -> int: ...
```
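Because `PredictionResult` subclasses `dict`, the examples in this reference mix attribute access (`pred.model_name`) with key access (`pred['group_key']`, `pred.get('rmse')`). The sketch below shows that pattern with a hypothetical `Result` class; it assumes each property simply reads the key of the same name, which this reference does not state.

```python
class Result(dict):
    """Hypothetical stand-in: a dict whose common keys are also properties."""

    @property
    def id(self) -> str:
        return self["id"]

    @property
    def model_name(self) -> str:
        return self["model_name"]

    @property
    def fold_id(self) -> str:
        return self["fold_id"]


pred = Result(id="abc123", model_name="PLS_10", fold_id="0", rmse=0.18)
print(pred.model_name, pred["model_name"])  # PLS_10 PLS_10
print(pred.get("rank_score"))               # None: dict.get for optional keys
```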

## Notes

### Aggregated vs Non-Aggregated Results

**Aggregated results** (when `aggregate_partitions=True`):
- Contains nested dictionaries for `train`, `val`, `test` partitions
- Each partition has `y_true`, `y_pred`, and score fields
- Summary shows metrics for all partitions

**Non-aggregated results** (single partition):
- Contains `y_true`, `y_pred` at the root level
- Summary shows metrics for that partition only
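Code that consumes both shapes needs to look in the right place for `y_true` and `y_pred`. A minimal sketch, assuming the nested dicts are keyed `"train"`, `"val"`, and `"test"` as listed above; `partition_arrays` is a hypothetical helper, not part of nirs4all.

```python
def partition_arrays(result, partition="test"):
    """Hypothetical helper returning (y_true, y_pred) from either result shape."""
    nested = result.get(partition)
    if isinstance(nested, dict):                # aggregated: one dict per partition
        return nested["y_true"], nested["y_pred"]
    return result["y_true"], result["y_pred"]   # non-aggregated: root level


aggregated = {
    "train": {"y_true": [0.5], "y_pred": [0.52]},
    "val": {"y_true": [0.6], "y_pred": [0.58]},
    "test": {"y_true": [0.55], "y_pred": [0.54]},
}
single = {"partition": "test", "y_true": [0.5], "y_pred": [0.52]}

print(partition_arrays(aggregated, "train"))  # ([0.5], [0.52])
print(partition_arrays(single))               # ([0.5], [0.52])
```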

### CSV File Structure

**With aggregation:**
```text
dataset_name
model_classname_id
fold_id
partition
y_true_train_foldX,y_pred_train_foldX,y_true_val_foldX,y_pred_val_foldX,y_true_test,y_pred_test
0.5,0.52,0.6,0.58,0.55,0.54
...
```

**Without aggregation:**
```text
dataset_name
model_classname_id
fold_id
partition
y_true,y_pred
0.5,0.52
...
```

### Implementation Details

- **Type:** `PredictionResultsList` extends Python's built-in `list` class
- **Compatibility:** Works anywhere a `list` is expected (`isinstance(results, list)` is `True`)
- **Performance:** `get()` uses a linear search (O(n)), which is fine for the small lists `top()` typically returns
- **Dependencies:** Uses `TabReportManager` for summary generation
- **Return Type:** `top()` returns a `PredictionResultsList` instead of a plain list (or a dict keyed by group when `return_grouped=True`)

## Key Points

- ✅ **Backward Compatible**: Code that treats the result of `top()` as a plain list continues to work
- ✅ **List Compatible**: Standard list operations work normally
- ✅ **Flexible**: Works with aggregated and non-aggregated results
- ✅ **Type Hinted**: Public methods carry type hints (for example, `get()` returns `Optional[PredictionResult]`)

## See Also

- {doc}`/reference/pipeline_syntax` - Pipeline syntax reference
- {doc}`/user_guide/visualization/index` - Visualization and charts