Metrics Reference
This page documents all evaluation metrics available in NIRS4ALL.
Overview
NIRS4ALL automatically computes appropriate metrics based on the task type:
| Task Type | Default Metric | Direction |
|---|---|---|
| Regression | MSE | Lower is better ↓ |
| Binary Classification | Balanced Accuracy | Higher is better ↑ |
| Multiclass Classification | Balanced Accuracy | Higher is better ↑ |
Regression Metrics
Core Metrics
| Metric | Abbreviation | Formula | Range | Direction |
|---|---|---|---|---|
| MSE | MSE | $\frac{1}{n}\sum(y_{true} - y_{pred})^2$ | [0, ∞) | ↓ |
| RMSE | RMSE | $\sqrt{MSE}$ | [0, ∞) | ↓ |
| MAE | MAE | $\frac{1}{n}\sum\lvert y_{true} - y_{pred}\rvert$ | [0, ∞) | ↓ |
| R² | R² | $1 - \frac{SS_{res}}{SS_{tot}}$ | (-∞, 1] | ↑ |
| MAPE | MAPE | $\frac{100}{n}\sum\left\lvert\frac{y_{true} - y_{pred}}{y_{true}}\right\rvert$ | [0, ∞) | ↓ |
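The formulas in the table translate directly into NumPy. The following is a minimal sketch for checking values by hand, not NIRS4ALL's own implementation; the helper name `core_regression_metrics` is hypothetical.

```python
import numpy as np

def core_regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, R² and MAPE from the formulas in the table."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": float(1.0 - ss_res / ss_tot),
        # MAPE divides by y_true, so it is undefined if any true value is zero
        "mape": float(100.0 * np.mean(np.abs(residuals / y_true))),
    }

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
scores = core_regression_metrics(y_true, y_pred)
print(scores)
```

With every prediction off by 0.5, MSE is 0.25, RMSE and MAE are both 0.5, and R² is 0.95.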
NIRS-Specific Metrics
| Metric | Abbreviation | Description | Range | Direction |
|---|---|---|---|---|
| Bias | Bias | Mean error: $\frac{1}{n}\sum(y_{pred} - y_{true})$ | (-∞, ∞) | → 0 |
| SEP | SEP | Standard Error of Prediction | [0, ∞) | ↓ |
| RPD | RPD | Ratio of Performance to Deviation: $\frac{SD(y_{true})}{SEP}$ | [0, ∞) | ↑ |
| Consistency | Cons | $1 - \frac{RMSE}{SD(y_{true})}$ | (-∞, 1] | ↑ |
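The NIRS-specific metrics can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: the helper name `nirs_metrics` is hypothetical, and the table does not say whether the standard deviations use `n` or `n - 1` in the denominator, so the sample convention (`ddof=1`) is assumed here and exposed as a parameter.

```python
import numpy as np

def nirs_metrics(y_true, y_pred, ddof=1):
    """Bias, SEP, RPD and Consistency as described in the table above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true                 # sign convention: prediction minus truth
    bias = errors.mean()
    sep = errors.std(ddof=ddof)              # spread of the errors around the bias
    sd_true = y_true.std(ddof=ddof)          # ddof is an assumption, see lead-in
    rmse = np.sqrt(np.mean(errors ** 2))
    return {
        "bias": float(bias),
        "sep": float(sep),
        "rpd": float(sd_true / sep),
        "consistency": float(1.0 - rmse / sd_true),
    }

y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [10.5, 12.5, 14.0, 16.5, 18.0]
m = nirs_metrics(y_true, y_pred)
print(m)
```

Values reported by NIRS4ALL may differ slightly if it uses a different `ddof` convention.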
Additional Metrics
| Metric | Abbreviation | Description | Range | Direction |
|---|---|---|---|---|
| Explained Variance | ExpVar | Proportion of variance explained | (-∞, 1] | ↑ |
| Max Error | MaxErr | Maximum absolute error | [0, ∞) | ↓ |
| Median AE | MedAE | Median absolute error | [0, ∞) | ↓ |
| NRMSE | NRMSE | RMSE / (max - min) | [0, ∞) | ↓ |
| NMSE | NMSE | MSE / variance | [0, ∞) | ↓ |
| NMAE | NMAE | MAE / (max - min) | [0, ∞) | ↓ |
| Pearson R | Pearson | Pearson correlation coefficient | [-1, 1] | ↑ |
| Spearman R | Spearman | Spearman rank correlation | [-1, 1] | ↑ |
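The normalized errors and the Pearson correlation can be reproduced as follows. This is a sketch only: the table does not state which array the range and variance are taken from, so this example assumes they are computed on `y_true` with the population variance; the helper name `normalized_errors` is hypothetical.

```python
import numpy as np

def normalized_errors(y_true, y_pred):
    """NRMSE, NMSE, NMAE and Pearson R, normalizing by statistics of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    value_range = y_true.max() - y_true.min()   # assumed: range of the true values
    mse = np.mean(residuals ** 2)
    return {
        "nrmse": float(np.sqrt(mse) / value_range),
        "nmse": float(mse / np.var(y_true)),    # assumed: population variance of y_true
        "nmae": float(np.mean(np.abs(residuals)) / value_range),
        "pearson": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
norm = normalized_errors(y_true, y_pred)
print(norm)
```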
Metric Descriptions
MSE (Mean Squared Error)
Measures the average squared difference between predictions and true values. Penalizes large errors more than small ones.
# In NIRS4ALL
metrics = result.top(n=5, display_metrics=['mse'])
RMSE (Root Mean Squared Error)
The square root of MSE, expressed in the same units as the target variable, which makes it easier to interpret than MSE. It is one of the most commonly reported regression metrics.
# Default ranking metric for regression
result.top(n=5) # Ranks by RMSE
R² (Coefficient of Determination)
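Proportion of the variance in the target that is explained by the model. R² = 1 is a perfect fit, R² = 0 means the model is no better than always predicting the mean of the target, and negative values mean it is worse than that baseline.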
result.top(n=5, display_metrics=['r2'])
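The three reference points of R² (perfect fit, mean predictor, worse than the mean) can be checked with a short NumPy sketch. The function name `r_squared` is hypothetical and is not part of NIRS4ALL.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r_squared(y_true, y_true)                        # predictions equal the truth
mean_only = r_squared(y_true, np.full(4, y_true.mean()))   # always predict the mean
worse = r_squared(y_true, y_true[::-1])                    # predictions in reversed order

print(perfect, mean_only, worse)  # 1.0 0.0 -3.0
```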
RPD (Ratio of Performance to Deviation)
Common in NIRS literature. Indicates model quality:
| RPD Value | Interpretation |
|---|---|
| < 1.5 | Not usable |
| 1.5 - 2.0 | Rough screening |
| 2.0 - 2.5 | Good screening |
| 2.5 - 3.0 | Good quantification |
| > 3.0 | Excellent quantification |
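The interpretation table can be turned into a small lookup helper. This is a sketch with a hypothetical function name; the table does not say which band the shared boundaries 2.0 and 2.5 belong to, so this example assigns them to the higher band, while 3.0 itself stays in "Good quantification" because the last row is strictly "> 3.0".

```python
def interpret_rpd(rpd):
    """Map an RPD value to the qualitative label from the table above."""
    if rpd < 1.5:
        return "Not usable"
    if rpd < 2.0:
        return "Rough screening"
    if rpd < 2.5:          # boundary 2.0 assigned to this band (assumption)
        return "Good screening"
    if rpd <= 3.0:         # boundary 2.5 assigned to this band (assumption)
        return "Good quantification"
    return "Excellent quantification"

for value in (1.2, 1.8, 2.2, 2.8, 3.5):
    print(f"RPD {value}: {interpret_rpd(value)}")
```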
SEP (Standard Error of Prediction)
Standard deviation of prediction errors. Indicates spread of errors around bias.
Bias
Mean signed error, computed as prediction minus true value. A positive bias means the model over-predicts on average; a negative bias means it under-predicts.
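Bias and SEP together account for RMSE: when SEP is computed with the population formula (`ddof=0`), the identity RMSE² = Bias² + SEP² holds exactly. The sketch below checks this on synthetic data; the data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(10.0, 2.0, size=200)
# Synthetic predictions: a systematic offset of +0.5 plus random noise
y_pred = y_true + 0.5 + rng.normal(0.0, 0.3, size=200)

errors = y_pred - y_true
bias = errors.mean()                     # systematic part of the error
sep = errors.std(ddof=0)                 # random part: spread around the bias
rmse = np.sqrt(np.mean(errors ** 2))     # total error

print(f"bias={bias:.4f}, sep={sep:.4f}, rmse={rmse:.4f}")
print(np.isclose(rmse ** 2, bias ** 2 + sep ** 2))
```

A model with a large bias but a small SEP is consistently offset and can often be improved by a simple correction, whereas a large SEP indicates scattered errors.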
Classification Metrics
Core Metrics
| Metric | Abbreviation | Description | Range | Direction |
|---|---|---|---|---|
| Accuracy | Acc | Correct predictions / total | [0, 1] | ↑ |
| Balanced Accuracy | BalAcc | Mean recall per class | [0, 1] | ↑ |
| Precision | Prec | TP / (TP + FP), weighted | [0, 1] | ↑ |
| Recall | Rec | TP / (TP + FN), weighted | [0, 1] | ↑ |
| F1 Score | F1 | Harmonic mean of precision & recall | [0, 1] | ↑ |
| Specificity | Spec | TN / (TN + FP) | [0, 1] | ↑ |
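For a binary problem, every metric in the table follows from the four confusion-matrix counts. The sketch below is illustrative: the function name `binary_metrics` is hypothetical, and it reports precision, recall, and F1 for the positive class only, whereas the table describes class-weighted variants.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Core classification metrics for labels in {0, 1}, positive class = 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "balanced_accuracy": (recall + specificity) / 2,  # mean recall of both classes
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
cls = binary_metrics(y_true, y_pred)
print(cls)
```

The sketch does not guard against empty denominators, for example when no sample is predicted positive.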
Advanced Metrics
| Metric | Abbreviation | Description | Range | Direction |
|---|---|---|---|---|
| ROC AUC | AUC | Area under ROC curve | [0, 1] | ↑ |
| MCC | MCC | Matthews correlation coefficient | [-1, 1] | ↑ |
| Cohen’s Kappa | Kappa | Agreement adjusted for chance | [-1, 1] | ↑ |
| Log Loss | LogLoss | Cross-entropy loss | [0, ∞) | ↓ |
| Jaccard | Jaccard | Intersection over union | [0, 1] | ↑ |
| Hamming Loss | Hamming | Fraction of wrong labels | [0, 1] | ↓ |
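Two of these metrics are easy to verify by hand. The sketch below computes Cohen's Kappa from observed and chance agreement, and binary log loss from predicted probabilities of the positive class. Both function names are hypothetical, and the clipping constant `eps` is an assumption used to avoid `log(0)`.

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """Kappa = (p_observed - p_chance) / (1 - p_chance)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    p_observed = np.mean(y_true == y_pred)
    # Agreement expected if labels and predictions were independent
    p_chance = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return float((p_observed - p_chance) / (1.0 - p_chance))

def binary_log_loss(y_true, p_positive, eps=1e-15):
    """Cross-entropy between 0/1 labels and predicted P(class = 1)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_positive, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

y_true = [1, 1, 0, 0]
kappa = cohen_kappa(y_true, [1, 0, 0, 0])
loss = binary_log_loss(y_true, [0.9, 0.4, 0.2, 0.1])
print(f"kappa={kappa:.3f}, log loss={loss:.4f}")
```

Here three of four predictions agree (observed agreement 0.75) while chance agreement is 0.5, giving a Kappa of 0.5.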
Averaging Methods
For multiclass problems, per-class metrics such as precision, recall, and F1 are combined into a single score using one of the following averaging methods:
| Suffix | Method | Description |
|---|---|---|
| (none) | Weighted | Weighted by class frequency (default) |
| | Micro | Global TP, FP, FN counts |
| | Macro | Unweighted mean per class |
| | Macro | Same as macro average |
# Available multiclass metrics
result.top(n=5, display_metrics=['accuracy', 'balanced_accuracy', 'f1_macro'])
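The difference between the averaging methods is easiest to see on a small example. The sketch below computes F1 with weighted, micro, and macro averaging in plain NumPy. The function name and the dictionary keys are hypothetical, and the code assumes that every predicted label also occurs in `y_true`.

```python
import numpy as np

def f1_by_averaging(y_true, y_pred):
    """F1 under the three averaging methods described above."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    tp = np.array([np.sum((y_true == c) & (y_pred == c)) for c in classes])
    fp = np.array([np.sum((y_true != c) & (y_pred == c)) for c in classes])
    fn = np.array([np.sum((y_true == c) & (y_pred != c)) for c in classes])
    per_class = 2 * tp / (2 * tp + fp + fn)   # F1 of each class
    support = tp + fn                          # number of true samples per class
    return {
        "weighted": float(np.sum(per_class * support) / support.sum()),
        "micro": float(2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())),
        "macro": float(per_class.mean()),
    }

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
f1 = f1_by_averaging(y_true, y_pred)
print(f1)
```

For single-label multiclass data, micro-averaged F1 equals plain accuracy (0.75 here), while macro averaging gives the smallest class the same influence as the largest.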
Metric Descriptions
Balanced Accuracy
Mean of recall for each class. Handles imbalanced datasets better than accuracy.
# Default for classification
result.top(n=5) # Uses balanced_accuracy
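To see why balanced accuracy handles imbalance better, consider a classifier that always predicts the majority class. The following sketch uses hypothetical helper names and synthetic labels.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# 95 samples of class 0, 5 samples of class 1
y_true = np.array([0] * 95 + [1] * 5)
# A trivial model that always predicts the majority class
y_pred = np.zeros(100, dtype=int)

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

Accuracy is 0.95 even though the minority class is never detected; balanced accuracy is 0.50, the same as random guessing.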
MCC (Matthews Correlation Coefficient)
Correlation between predicted and true classes. Considers all four confusion matrix quadrants. Recommended for imbalanced datasets.
| MCC Value | Interpretation |
|---|---|
| +1 | Perfect prediction |
| 0 | Random prediction |
| -1 | Inverse prediction |
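The three reference values in the table can be reproduced from the binary confusion matrix. This is a minimal sketch with a hypothetical function name; returning 0 when the denominator is zero (for example, when the model predicts a single class) is a common convention and is assumed here.

```python
import numpy as np

def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    tn = float(np.sum((y_true == 0) & (y_pred == 0)))
    fp = float(np.sum((y_true == 0) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for degenerate confusion matrices
    return float((tp * tn - fp * fn) / denom)

y_true = [1, 1, 0, 0]
mcc_perfect = mcc_binary(y_true, [1, 1, 0, 0])   # every label correct
mcc_inverse = mcc_binary(y_true, [0, 0, 1, 1])   # every label flipped
mcc_constant = mcc_binary(y_true, [1, 1, 1, 1])  # always predicts class 1

print(mcc_perfect, mcc_inverse, mcc_constant)  # 1.0 -1.0 0.0
```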
ROC AUC
Area under the Receiver Operating Characteristic curve. Measures discrimination ability across all classification thresholds.
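ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition directly; the function name is hypothetical, and the approach is meant for illustration because it needs memory proportional to the number of positive-negative pairs.

```python
import numpy as np

def roc_auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive score vs every negative score
    wins = np.sum(diff > 0)
    ties = np.sum(diff == 0)
    return float((wins + 0.5 * ties) / diff.size)

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_pairwise(y_true, scores)
print(auc)  # 0.75
```

Three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. No threshold is chosen anywhere, which is why AUC summarizes performance across all thresholds.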
Using Metrics in Code
Accessing Metrics in Results
result = nirs4all.run(pipeline, dataset)
# Get top results with specific metrics
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae']):
    print(f"RMSE: {pred['rmse']:.4f}, R²: {pred['r2']:.4f}, MAE: {pred['mae']:.4f}")
Ranking by Different Metrics
# Rank by RMSE (default for regression)
top_by_rmse = result.top(n=5, rank_metric='rmse')
# Rank by R²
top_by_r2 = result.top(n=5, rank_metric='r2')
# Rank by custom metric
top_by_mae = result.top(n=5, rank_metric='mae')
Metric Abbreviations
NIRS4ALL provides abbreviations for display:
from nirs4all.core.metrics import abbreviate_metric
abbreviate_metric('balanced_accuracy') # Returns 'BalAcc'
abbreviate_metric('mean_squared_error') # Returns 'MSE'
abbreviate_metric('r2') # Returns 'R²'
Computing Metrics Manually
from nirs4all.core.metrics import eval, eval_multi
# Single metric
rmse = eval(y_true, y_pred, 'rmse')
# All metrics for task type
metrics = eval_multi(y_true, y_pred, 'regression')
# Returns: {'mse': 0.01, 'rmse': 0.1, 'mae': 0.08, 'r2': 0.95, ...}
Getting Available Metrics
from nirs4all.core.metrics import get_available_metrics, get_default_metrics
# All available
all_reg = get_available_metrics('regression')
all_cls = get_available_metrics('binary_classification')
# Commonly used
default_reg = get_default_metrics('regression')
# ['r2', 'rmse', 'mse', 'sep', 'mae', 'rpd', 'bias', ...]
Metric Selection Guidelines
For Regression
| Scenario | Recommended Metrics |
|---|---|
| General purpose | RMSE, R², MAE |
| NIRS literature | RMSE, R², RPD, SEP |
| Data with outliers | MAE, Median AE |
| Relative errors | MAPE, NRMSE |
| Correlation focus | Pearson R, Spearman R |
For Classification
| Scenario | Recommended Metrics |
|---|---|
| Balanced classes | Accuracy, F1 |
| Imbalanced classes | Balanced Accuracy, MCC, ROC AUC |
| Cost-sensitive | Precision or Recall (depending on cost) |
| Binary problems | Accuracy, AUC, F1 |
| Multiclass problems | Balanced Accuracy, F1 Macro |
Complete Example
import nirs4all
from nirs4all.core.metrics import eval_multi
# scikit-learn components used in the pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

# Run pipeline
result = nirs4all.run(
    pipeline=[
        MinMaxScaler(),
        ShuffleSplit(n_splits=5),
        {"model": PLSRegression(n_components=10)},
    ],
    dataset="sample_data/regression",
    verbose=1,
)

# View multiple metrics
print("📊 Top 5 Models by RMSE:")
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae', 'sep', 'rpd']):
    print(f"  {pred['model_name']}:")
    print(f"    RMSE: {pred['rmse']:.4f}")
    print(f"    R²: {pred['r2']:.4f}")
    print(f"    MAE: {pred['mae']:.4f}")
    print(f"    SEP: {pred.get('sep', 'N/A')}")
    print(f"    RPD: {pred.get('rpd', 'N/A')}")

# Compute all metrics for best model
best = result.best
y_true = best['y_true']
y_pred = best['y_pred']
all_metrics = eval_multi(y_true, y_pred, 'regression')
print("\n📈 All Regression Metrics:")
for metric, value in all_metrics.items():
    print(f"  {metric}: {value:.4f}")
See Also
Model Training - Model training basics
PredictionResultsList Reference - Working with prediction results
Analyzer Charts Reference - Visualizing metrics