# Metrics Reference

This page documents all evaluation metrics available in NIRS4ALL.

## Overview

NIRS4ALL computes a set of metrics suited to the task type and uses one of them as the default ranking metric:

| Task Type | Default Metric | Direction |
|-----------|----------------|-----------|
| Regression | RMSE | Lower is better ↓ |
| Binary Classification | Balanced Accuracy | Higher is better ↑ |
| Multiclass Classification | Balanced Accuracy | Higher is better ↑ |

## Regression Metrics

### Core Metrics

| Metric | Abbreviation | Formula | Range | Direction |
|--------|--------------|---------|-------|-----------|
| **MSE** | MSE | $\frac{1}{n}\sum(y_{true} - y_{pred})^2$ | [0, ∞) | ↓ |
| **RMSE** | RMSE | $\sqrt{MSE}$ | [0, ∞) | ↓ |
| **MAE** | MAE | $\frac{1}{n}\sum\|y_{true} - y_{pred}\|$ | [0, ∞) | ↓ |
| **R²** | R² | $1 - \frac{SS_{res}}{SS_{tot}}$ | (-∞, 1] | ↑ |
| **MAPE** | MAPE | $\frac{100}{n}\sum\|\frac{y_{true} - y_{pred}}{y_{true}}\|$ | [0, ∞) | ↓ |
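
The formulas in the table can be checked by hand on a small dataset. The following is a minimal pure-Python sketch, not NIRS4ALL's implementation; the helper name `core_regression_metrics` and the sample values are illustrative assumptions.

```python
import math


def core_regression_metrics(y_true, y_pred):
    """Compute the core regression metrics from their formulas (illustrative helper)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # MAPE is undefined when any true value is zero
        "mape": 100.0 / n * sum(abs(r / t) for r, t in zip(residuals, y_true)),
    }


y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.0]
metrics = core_regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that MSE and RMSE always rank models in the same order, since the square root is monotonic.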

### NIRS-Specific Metrics

| Metric | Abbreviation | Description | Range | Direction |
|--------|--------------|-------------|-------|-----------|
| **Bias** | Bias | Mean error: $\overline{y_{pred} - y_{true}}$ | (-∞, ∞) | → 0 |
| **SEP** | SEP | Standard Error of Prediction | [0, ∞) | ↓ |
| **RPD** | RPD | Ratio of Performance to Deviation: $\frac{SD(y_{true})}{SEP}$ | [0, ∞) | ↑ |
| **Consistency** | Cons | $1 - \frac{RMSE}{SD(y_{true})}$ | (-∞, 1] | ↑ |
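
As a rough illustration of how these four quantities relate, here is a small pure-Python sketch. It is not NIRS4ALL's code: the helper name `nirs_metrics` is hypothetical, and the use of the sample standard deviation (denominator n - 1) for both SEP and SD is an assumption, since this page does not state which denominator the library uses.

```python
import math
import statistics


def nirs_metrics(y_true, y_pred):
    """Bias, SEP, RPD and Consistency from their definitions (illustrative helper)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus true
    bias = statistics.fmean(errors)
    # Assumption: sample standard deviation (n - 1 denominator)
    sep = statistics.stdev(errors)
    sd_true = statistics.stdev(y_true)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return {
        "bias": bias,
        "sep": sep,
        "rpd": sd_true / sep,
        "consistency": 1.0 - rmse / sd_true,
    }


y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [10.5, 12.5, 13.5, 16.5, 19.0]
nirs = nirs_metrics(y_true, y_pred)
for name, value in nirs.items():
    print(f"{name}: {value:.4f}")
```

Because SEP removes the mean error before measuring spread, a model with a large constant offset can have a small SEP (and a high RPD) while still having a large RMSE.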

### Additional Metrics

| Metric | Abbreviation | Description | Range | Direction |
|--------|--------------|-------------|-------|-----------|
| **Explained Variance** | ExpVar | Proportion of variance explained | (-∞, 1] | ↑ |
| **Max Error** | MaxErr | Maximum absolute error | [0, ∞) | ↓ |
| **Median AE** | MedAE | Median absolute error | [0, ∞) | ↓ |
| **NRMSE** | NRMSE | RMSE / (max - min) | [0, ∞) | ↓ |
| **NMSE** | NMSE | MSE / variance | [0, ∞) | ↓ |
| **NMAE** | NMAE | MAE / (max - min) | [0, ∞) | ↓ |
| **Pearson R** | Pearson | Pearson correlation coefficient | [-1, 1] | ↑ |
| **Spearman R** | Spearman | Spearman rank correlation | [-1, 1] | ↑ |
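
The normalized variants divide an error by a measure of the target's scale, which makes scores comparable across targets with different units. A minimal sketch under stated assumptions: the helper name `scaled_error_metrics` is hypothetical, and using the population variance for NMSE is an assumption (the table only says "variance").

```python
import math
import statistics


def scaled_error_metrics(y_true, y_pred):
    """Max error, median AE and the range/variance-normalized errors (illustrative)."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mse = statistics.fmean([e * e for e in abs_err])
    mae = statistics.fmean(abs_err)
    value_range = max(y_true) - min(y_true)
    return {
        "max_error": max(abs_err),
        "median_ae": statistics.median(abs_err),
        "nrmse": math.sqrt(mse) / value_range,
        "nmae": mae / value_range,
        # Assumption: population variance (1/n denominator) of the true values
        "nmse": mse / statistics.pvariance(y_true),
    }


y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.0]
scaled = scaled_error_metrics(y_true, y_pred)
for name, value in scaled.items():
    print(f"{name}: {value:.4f}")
```

With the population variance, NMSE equals 1 - R², so the two carry the same information on a given dataset.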

### Metric Descriptions

#### MSE (Mean Squared Error)
Measures the average squared difference between predictions and true values. Penalizes large errors more than small ones.

```python
# In NIRS4ALL
metrics = result.top(n=5, display_metrics=['mse'])
```

#### RMSE (Root Mean Squared Error)
Square root of MSE, expressed in the same units as the target variable, which makes it easier to interpret than MSE. It is one of the most widely reported regression metrics.

```python
# Default ranking metric for regression
result.top(n=5)  # Ranks by RMSE
```

#### R² (Coefficient of Determination)
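Proportion of variance in the target explained by the model. R² = 1 is a perfect fit, R² = 0 means the model does no better than always predicting the mean of the true values, and negative values mean it does worse than that baseline.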

```python
result.top(n=5, display_metrics=['r2'])
```

#### RPD (Ratio of Performance to Deviation)
Widely reported in the NIRS literature as an indicator of model quality. Common rule-of-thumb interpretation bands:

| RPD Value | Interpretation |
|-----------|----------------|
| < 1.5 | Not usable |
| 1.5 - 2.0 | Rough screening |
| 2.0 - 2.5 | Good screening |
| 2.5 - 3.0 | Good quantification |
| > 3.0 | Excellent quantification |
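
The bands above can be turned into a small lookup. `interpret_rpd` is a hypothetical helper, not part of NIRS4ALL. Because the band edges in the table overlap (2.0 and 2.5 appear in two rows), the sketch assumes each shared edge belongs to the higher band.

```python
def interpret_rpd(rpd):
    """Map an RPD value to the interpretation bands of the table (hypothetical helper)."""
    # Assumption: a value on a shared edge (2.0, 2.5) falls into the higher band
    if rpd < 1.5:
        return "Not usable"
    if rpd < 2.0:
        return "Rough screening"
    if rpd < 2.5:
        return "Good screening"
    if rpd <= 3.0:
        return "Good quantification"
    return "Excellent quantification"


for value in (1.2, 1.8, 2.2, 2.7, 3.4):
    print(f"RPD {value}: {interpret_rpd(value)}")
```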

#### SEP (Standard Error of Prediction)
Standard deviation of prediction errors. Indicates spread of errors around bias.

#### Bias
Mean error (predicted minus true). A positive bias means the model over-predicts on average; a negative bias means it under-predicts.
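
Bias, SEP, and RMSE are linked: splitting each error $e_i = y_{pred,i} - y_{true,i}$ into its mean $\bar{e}$ and its deviation from that mean gives the decomposition below. This is a short derivation assuming SEP is computed with a $1/n$ denominator; this page does not state which denominator NIRS4ALL uses.

$$
RMSE^2 = \frac{1}{n}\sum_i e_i^2 = \bar{e}^2 + \frac{1}{n}\sum_i (e_i - \bar{e})^2 = Bias^2 + SEP^2
$$

If SEP uses an $n - 1$ denominator instead, the last term becomes $\frac{n-1}{n}SEP^2$. In both cases, RMSE and SEP coincide only when the bias is zero.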

---

## Classification Metrics

### Core Metrics

| Metric | Abbreviation | Description | Range | Direction |
|--------|--------------|-------------|-------|-----------|
| **Accuracy** | Acc | Correct predictions / total | [0, 1] | ↑ |
| **Balanced Accuracy** | BalAcc | Mean recall per class | [0, 1] | ↑ |
| **Precision** | Prec | TP / (TP + FP), weighted | [0, 1] | ↑ |
| **Recall** | Rec | TP / (TP + FN), weighted | [0, 1] | ↑ |
| **F1 Score** | F1 | Harmonic mean of precision & recall | [0, 1] | ↑ |
| **Specificity** | Spec | TN / (TN + FP) | [0, 1] | ↑ |
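
For a binary problem, all of these metrics derive from the four confusion-matrix counts. The sketch below is illustrative only: `binary_metrics` is a hypothetical helper, and it reports precision, recall, and F1 for the positive class rather than the weighted averages listed in the table.

```python
def binary_metrics(tp, fp, fn, tn):
    """Core classification metrics from binary confusion-matrix counts (illustrative)."""
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # For two classes, mean recall per class = mean of recall and specificity
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
    }


scores = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```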

### Advanced Metrics

| Metric | Abbreviation | Description | Range | Direction |
|--------|--------------|-------------|-------|-----------|
| **ROC AUC** | AUC | Area under ROC curve | [0, 1] | ↑ |
| **MCC** | MCC | Matthews correlation coefficient | [-1, 1] | ↑ |
| **Cohen's Kappa** | Kappa | Agreement adjusted for chance | [-1, 1] | ↑ |
| **Log Loss** | LogLoss | Cross-entropy loss | [0, ∞) | ↓ |
| **Jaccard** | Jaccard | Intersection over union | [0, 1] | ↑ |
| **Hamming Loss** | Hamming | Fraction of wrong labels | [0, 1] | ↓ |
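
Several of these metrics are short formulas once the confusion counts or predicted probabilities are available. The following sketch covers the binary, single-label case only; the helper names `cohen_kappa` and `log_loss` are illustrative and are not claimed to match NIRS4ALL's internals.

```python
import math


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix (illustrative helper)."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of marginal frequencies, summed over both classes
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def log_loss(y_true, p_positive, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p) if t == 1 else -math.log(1 - p)
    return total / len(y_true)


tp, fp, fn, tn = 40, 10, 20, 30
kappa = cohen_kappa(tp, fp, fn, tn)
jaccard = tp / (tp + fp + fn)               # intersection over union, positive class
hamming = (fp + fn) / (tp + fp + fn + tn)   # fraction of wrong labels
loss = log_loss([1, 0, 1], [0.9, 0.2, 0.6])
print(f"kappa={kappa:.4f} jaccard={jaccard:.4f} hamming={hamming:.4f} logloss={loss:.4f}")
```

Log Loss is the only metric here that needs predicted probabilities; the others work on hard class labels.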

### Averaging Methods

For multiclass problems, per-class scores are combined into a single value using one of the following averaging methods, selected by the metric name suffix:

| Suffix | Method | Description |
|--------|--------|-------------|
| (none) | Weighted | Weighted by class frequency (default) |
| `_micro` | Micro | Global TP, FP, FN counts |
| `_macro` | Macro | Unweighted mean per class |
| `balanced_*` | Macro | Same as macro average |

```python
# Available multiclass metrics
result.top(n=5, display_metrics=['accuracy', 'balanced_accuracy', 'f1_macro'])
```
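
To make the difference between the averaging methods concrete, the sketch below computes F1 three ways on a small imbalanced example. It is a pure-Python illustration with a hypothetical helper name (`f1_by_average`), not the library's implementation.

```python
from collections import Counter


def f1_by_average(y_true, y_pred):
    """Macro, weighted and micro F1 for single-label multiclass data (illustrative)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    return {
        "macro": sum(per_class.values()) / len(labels),
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
        "micro": 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum),
    }


y_true = ["a"] * 6 + ["b"] * 3 + ["c"]
y_pred = ["a"] * 5 + ["b"] + ["b", "b", "a"] + ["a"]
f1 = f1_by_average(y_true, y_pred)
for name, value in f1.items():
    print(f"f1 ({name}): {value:.4f}")
```

The rare class `c` is never predicted, so its F1 of 0 pulls the macro average down much more than the weighted or micro averages. For single-label problems, micro F1 equals plain accuracy.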

### Metric Descriptions

#### Balanced Accuracy
Mean of recall for each class. Handles imbalanced datasets better than accuracy.

```python
# Default for classification
result.top(n=5)  # Uses balanced_accuracy
```
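
The effect on imbalanced data is easiest to see with a classifier that always predicts the majority class. This is a self-contained sketch with illustrative helper names, independent of NIRS4ALL:

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # always predicts the majority class

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced_accuracy={bal_acc:.2f}")
# accuracy=0.90, balanced_accuracy=0.50
```

Accuracy rewards the trivial classifier, while balanced accuracy drops to the chance level for two classes.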

#### MCC (Matthews Correlation Coefficient)
Correlation between predicted and true classes. Considers all four confusion matrix quadrants. Recommended for imbalanced datasets.

| MCC Value | Interpretation |
|-----------|----------------|
| +1 | Perfect prediction |
| 0 | No better than random prediction |
| -1 | Inverse prediction |
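
For the binary case, the three reference values in the table can be reproduced from confusion counts. This is a minimal sketch; the helper name `mcc` and the convention of returning 0 when the denominator vanishes are assumptions, not a statement about NIRS4ALL's behavior.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts (illustrative)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


perfect = mcc(tp=50, fp=0, fn=0, tn=50)
chance = mcc(tp=25, fp=25, fn=25, tn=25)
inverse = mcc(tp=0, fp=50, fn=50, tn=0)
majority_only = mcc(tp=0, fp=0, fn=10, tn=90)  # never predicts the positive class
print(perfect, chance, inverse, majority_only)
```

The last case has 90% accuracy but an MCC of 0 under the assumed convention, which is why MCC is useful on imbalanced data.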

#### ROC AUC
Area under the Receiver Operating Characteristic curve. Measures discrimination ability across all classification thresholds.
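
An equivalent way to read ROC AUC is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise formulation for the binary case; `roc_auc` is an illustrative helper and is quadratic in the number of samples, so it is meant for understanding rather than production use.

```python
def roc_auc(y_true, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Each (positive, negative) pair counts 1 if ranked correctly, 0.5 if tied
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.2f}")
# AUC = 0.75
```

Because it depends only on the ordering of the scores, AUC is unaffected by the choice of decision threshold.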

---

## Using Metrics in Code

### Accessing Metrics in Results

```python
result = nirs4all.run(pipeline, dataset)

# Get top results with specific metrics
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae']):
    print(f"RMSE: {pred['rmse']:.4f}, R²: {pred['r2']:.4f}, MAE: {pred['mae']:.4f}")
```

### Ranking by Different Metrics

```python
# Rank by RMSE (default for regression)
top_by_rmse = result.top(n=5, rank_metric='rmse')

# Rank by R²
top_by_r2 = result.top(n=5, rank_metric='r2')

# Rank by any other available metric, e.g. MAE
top_by_mae = result.top(n=5, rank_metric='mae')
```

### Metric Abbreviations

NIRS4ALL provides abbreviations for display:

```python
from nirs4all.core.metrics import abbreviate_metric

abbreviate_metric('balanced_accuracy')  # Returns 'BalAcc'
abbreviate_metric('mean_squared_error')  # Returns 'MSE'
abbreviate_metric('r2')                  # Returns 'R²'
```

### Computing Metrics Manually

```python
from nirs4all.core.metrics import eval, eval_multi

# Single metric
rmse = eval(y_true, y_pred, 'rmse')

# All metrics for task type
metrics = eval_multi(y_true, y_pred, 'regression')
# Returns: {'mse': 0.01, 'rmse': 0.1, 'mae': 0.08, 'r2': 0.95, ...}
```

### Getting Available Metrics

```python
from nirs4all.core.metrics import get_available_metrics, get_default_metrics

# All available
all_reg = get_available_metrics('regression')
all_cls = get_available_metrics('binary_classification')

# Commonly used
default_reg = get_default_metrics('regression')
# ['r2', 'rmse', 'mse', 'sep', 'mae', 'rpd', 'bias', ...]
```

---

## Metric Selection Guidelines

### For Regression

| Scenario | Recommended Metrics |
|----------|---------------------|
| General purpose | RMSE, R², MAE |
| NIRS literature | RMSE, R², RPD, SEP |
| Data with outliers (robust metrics) | MAE, Median AE |
| Relative errors | MAPE, NRMSE |
| Correlation focus | Pearson R, Spearman R |
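
The recommendation for data with outliers can be illustrated with a single large error. The sketch below is self-contained and uses made-up error values; `error_summary` is a hypothetical helper.

```python
import math
import statistics


def error_summary(errors):
    """RMSE, MAE and median absolute error of a list of prediction errors."""
    abs_err = [abs(e) for e in errors]
    return {
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
        "mae": statistics.fmean(abs_err),
        "median_ae": statistics.median(abs_err),
    }


clean = error_summary([0.1, -0.1, 0.1, -0.1, 0.1])
with_outlier = error_summary([0.1, -0.1, 0.1, -0.1, 5.0])  # one gross error

for name in ("rmse", "mae", "median_ae"):
    print(f"{name}: clean={clean[name]:.3f}, with outlier={with_outlier[name]:.3f}")
```

A single gross error inflates RMSE the most, MAE less, and leaves the median absolute error unchanged, so the choice depends on whether large errors should dominate the score.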

### For Classification

| Scenario | Recommended Metrics |
|----------|---------------------|
| Balanced classes | Accuracy, F1 |
| Imbalanced classes | Balanced Accuracy, MCC, ROC AUC |
| Cost-sensitive | Precision or Recall (depending on cost) |
| Binary problems | Accuracy, AUC, F1 |
| Multiclass problems | Balanced Accuracy, F1 Macro |

---

## Complete Example

```python
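import nirs4all
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

from nirs4all.core.metrics import eval_multi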

# Run pipeline
result = nirs4all.run(
    pipeline=[
        MinMaxScaler(),
        ShuffleSplit(n_splits=5),
        {"model": PLSRegression(n_components=10)}
    ],
    dataset="sample_data/regression",
    verbose=1
)

# View multiple metrics
print("📊 Top 5 Models by RMSE:")
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae', 'sep', 'rpd']):
    print(f"  {pred['model_name']}:")
    print(f"    RMSE: {pred['rmse']:.4f}")
    print(f"    R²: {pred['r2']:.4f}")
    print(f"    MAE: {pred['mae']:.4f}")
    print(f"    SEP: {pred.get('sep', 'N/A')}")
    print(f"    RPD: {pred.get('rpd', 'N/A')}")

# Compute all metrics for best model
best = result.best
y_true = best['y_true']
y_pred = best['y_pred']
all_metrics = eval_multi(y_true, y_pred, 'regression')

print("\n📈 All Regression Metrics:")
for metric, value in all_metrics.items():
    print(f"  {metric}: {value:.4f}")
```

## See Also

- {doc}`/user_guide/models/training` - Model training basics
- {doc}`/reference/predictions_api` - Working with prediction results
- {doc}`/user_guide/visualization/prediction_charts` - Visualizing metrics