Metrics Reference

This page documents all evaluation metrics available in NIRS4ALL.

Overview

NIRS4ALL automatically computes appropriate metrics based on the task type:

Task Type	Default Metric	Direction
Regression	MSE	Lower is better ↓
Binary Classification	Balanced Accuracy	Higher is better ↑
Multiclass Classification	Balanced Accuracy	Higher is better ↑

Regression Metrics

Core Metrics

Metric	Abbreviation	Formula	Range	Direction
MSE	MSE	$\frac{1}{n}\sum(y_{true} - y_{pred})^2$	[0, ∞)	↓
RMSE	RMSE	$\sqrt{MSE}$	[0, ∞)	↓
MAE	MAE	$\frac{1}{n}\sum\lvert y_{true} - y_{pred}\rvert$	[0, ∞)	↓
R²	R²	$1 - \frac{SS_{res}}{SS_{tot}}$	(-∞, 1]	↑
MAPE	MAPE	$\frac{100}{n}\sum\left\lvert\frac{y_{true} - y_{pred}}{y_{true}}\right\rvert$	[0, ∞)	↓
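
To make the formulas in the table concrete, here is a minimal pure-Python sketch that evaluates them on a tiny data set. It is an illustration only, not NIRS4ALL's implementation; the function name `core_regression_metrics` and the sample values are hypothetical.

```python
import math

def core_regression_metrics(y_true, y_pred):
    """Evaluate the core regression formulas on two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        # R² is undefined when y_true is constant (ss_tot == 0)
        "r2": 1 - ss_res / ss_tot,
        # MAPE divides by y_true, so it is undefined when any true value is 0
        "mape": 100 / n * sum(abs(r / t) for r, t in zip(residuals, y_true)),
    }

y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
scores = core_regression_metrics(y_true, y_pred)
print(scores)
```

For these values the squared residuals sum to 18 and the total sum of squares is 500, so MSE is 4.5 and R² is 0.964.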

NIRS-Specific Metrics

Metric	Abbreviation	Description	Range	Direction
Bias	Bias	Mean error: $\overline{y_{pred} - y_{true}}$	(-∞, ∞)	→ 0
SEP	SEP	Standard Error of Prediction	[0, ∞)	↓
RPD	RPD	Ratio of Performance to Deviation: $\frac{SD(y_{true})}{SEP}$	[0, ∞)	↑
Consistency	Cons	$1 - \frac{RMSE}{SD(y_{true})}$	(-∞, 1]	↑
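
The NIRS-specific metrics can be sketched in a few lines of plain Python. This is a hedged illustration rather than the library's code: the function name `nirs_metrics` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) for both SEP and SD(y_true) is an assumption, since the table does not state which convention NIRS4ALL applies.

```python
import math

def nirs_metrics(y_true, y_pred):
    """Bias, SEP, RPD and Consistency computed from the table definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]   # prediction minus truth
    bias = sum(errors) / n
    # SEP: standard deviation of the errors around the bias (n - 1 assumed)
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    mean_true = sum(y_true) / n
    # SD of the reference values (n - 1 assumed)
    sd_true = math.sqrt(sum((t - mean_true) ** 2 for t in y_true) / (n - 1))
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    return {
        "bias": bias,
        "sep": sep,
        "rpd": sd_true / sep,
        "consistency": 1 - rmse / sd_true,
    }

y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
nirs = nirs_metrics(y_true, y_pred)
print(nirs)
```

Because SEP is measured around the bias, a model with a constant offset can have a small SEP and a large RMSE at the same time, which is why the two are reported together.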

Additional Metrics

Metric	Abbreviation	Description	Range	Direction
Explained Variance	ExpVar	Proportion of variance explained	(-∞, 1]	↑
Max Error	MaxErr	Maximum absolute error	[0, ∞)	↓
Median AE	MedAE	Median absolute error	[0, ∞)	↓
NRMSE	NRMSE	RMSE / (max - min)	[0, ∞)	↓
NMSE	NMSE	MSE / variance	[0, ∞)	↓
NMAE	NMAE	MAE / (max - min)	[0, ∞)	↓
Pearson R	Pearson	Pearson correlation coefficient	[-1, 1]	↑
Spearman R	Spearman	Spearman rank correlation	[-1, 1]	↑
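
As a rough illustration of the normalized and robust metrics listed above, the sketch below computes them with the standard library only. The function name `additional_metrics` is hypothetical, and two details are assumptions not confirmed by this page: the range and variance are taken over `y_true`, and the variance is the population variance.

```python
import math
import statistics

def additional_metrics(y_true, y_pred):
    """Max error, median AE and range/variance-normalized errors."""
    n = len(y_true)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in abs_err) / n
    mae = sum(abs_err) / n
    value_range = max(y_true) - min(y_true)   # assumed: range of y_true
    variance = statistics.pvariance(y_true)   # assumed: population variance
    return {
        "max_error": max(abs_err),
        "median_ae": statistics.median(abs_err),
        "nrmse": math.sqrt(mse) / value_range,
        "nmse": mse / variance,
        "nmae": mae / value_range,
    }

y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
extra = additional_metrics(y_true, y_pred)
print(extra)
```

Under these assumptions NMSE equals 1 - R², so it carries the same information as R² on an inverted scale.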

Metric Descriptions

MSE (Mean Squared Error)

Measures the average squared difference between predictions and true values. Penalizes large errors more than small ones.

# In NIRS4ALL
metrics = result.top(n=5, display_metrics=['mse'])
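
The claim that MSE penalizes large errors more than small ones is easy to check by hand. The short sketch below (helper names are hypothetical) compares two error vectors with the same total absolute error: MAE cannot tell them apart, while MSE is four times larger when the error is concentrated in one sample.

```python
def mse(errors):
    """Mean of squared errors."""
    return sum(e ** 2 for e in errors) / len(errors)

def mae(errors):
    """Mean of absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

spread_out = [2.0, 2.0, 2.0, 2.0]    # four moderate errors
one_outlier = [0.0, 0.0, 0.0, 8.0]   # same total error, all in one sample

print(mae(spread_out), mae(one_outlier))   # 2.0 2.0
print(mse(spread_out), mse(one_outlier))   # 4.0 16.0
```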

RMSE (Root Mean Squared Error)

Square root of MSE, in the same units as the target variable. Most commonly used regression metric.

# Default ranking metric for regression
result.top(n=5)  # Ranks by RMSE

R² (Coefficient of Determination)

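Proportion of variance in the target explained by the model. R² = 1 is a perfect fit, R² = 0 means the model is no better than always predicting the mean of the target, and negative values mean it is worse than that baseline.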

result.top(n=5, display_metrics=['r2'])
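
The reference points for R² can be verified with a small sketch. The helper `r2_score` below is a hypothetical stand-alone function written from the formula $1 - \frac{SS_{res}}{SS_{tot}}$; it is not the NIRS4ALL implementation.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

r2_perfect = r2_score(y_true, y_true)                        # 1.0
r2_mean = r2_score(y_true, [3.0] * 5)                        # 0.0, always predicts the mean
r2_reversed = r2_score(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])    # -3.0, worse than the mean
print(r2_perfect, r2_mean, r2_reversed)
```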

RPD (Ratio of Performance to Deviation)

Common in NIRS literature. Indicates model quality:

RPD Value	Interpretation
< 1.5	Not usable
1.5 - 2.0	Rough screening
2.0 - 2.5	Good screening
2.5 - 3.0	Good quantification
> 3.0	Excellent quantification
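
The table can be turned into a small lookup helper. This is only a sketch: `interpret_rpd` is a hypothetical function, not part of NIRS4ALL, and the handling of the shared band edges (2.0 and 2.5 are assigned to the higher band) is an assumption, because the table does not say which side they belong to.

```python
def interpret_rpd(rpd):
    """Map an RPD value to the interpretation bands of the table above."""
    if rpd < 1.5:
        return "Not usable"
    if rpd < 2.0:
        return "Rough screening"
    if rpd < 2.5:          # assumed: 2.0 falls in the higher band
        return "Good screening"
    if rpd <= 3.0:         # the table reserves "Excellent" for values > 3.0
        return "Good quantification"
    return "Excellent quantification"

for value in (1.2, 1.8, 2.2, 2.8, 3.4):
    print(value, interpret_rpd(value))
```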

SEP (Standard Error of Prediction)

Standard deviation of prediction errors. Indicates spread of errors around bias.

Bias

Mean error (prediction minus true value). A positive bias means the model over-predicts on average; a negative bias means it under-predicts.

Classification Metrics

Core Metrics

Metric	Abbreviation	Description	Range	Direction
Accuracy	Acc	Correct predictions / total	[0, 1]	↑
Balanced Accuracy	BalAcc	Mean recall per class	[0, 1]	↑
Precision	Prec	TP / (TP + FP), weighted	[0, 1]	↑
Recall	Rec	TP / (TP + FN), weighted	[0, 1]	↑
F1 Score	F1	Harmonic mean of precision & recall	[0, 1]	↑
Specificity	Spec	TN / (TN + FP)	[0, 1]	↑
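
For a binary problem, all of the core metrics follow from the four confusion-matrix counts. The sketch below is a plain-Python illustration with a hypothetical function name, `binary_metrics`. It reports precision, recall and F1 for the positive class only, whereas the table describes class-weighted variants, so the numbers are not expected to match NIRS4ALL's defaults exactly.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Core classification metrics from TP, TN, FP and FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # These ratios are undefined if a denominator is 0
    # (for example, when no sample is predicted positive).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]   # TP=2, FN=1, TN=4, FP=1
cls = binary_metrics(y_true, y_pred)
print(cls)
```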

Advanced Metrics

Metric	Abbreviation	Description	Range	Direction
ROC AUC	AUC	Area under ROC curve	[0, 1]	↑
MCC	MCC	Matthews correlation coefficient	[-1, 1]	↑
Cohen’s Kappa	Kappa	Agreement adjusted for chance	[-1, 1]	↑
Log Loss	LogLoss	Cross-entropy loss	[0, ∞)	↓
Jaccard	Jaccard	Intersection over union	[0, 1]	↑
Hamming Loss	Hamming	Fraction of wrong labels	[0, 1]	↓

Averaging Methods

For multiclass problems, the suffix on the metric name selects how per-class scores are averaged:

Suffix	Method	Description
(none)	Weighted	Weighted by class frequency (default)
`_micro`	Micro	Global TP, FP, FN counts
`_macro`	Macro	Unweighted mean per class
`balanced_*`	Macro	Same as macro average

# Available multiclass metrics
result.top(n=5, display_metrics=['accuracy', 'balanced_accuracy', 'f1_macro'])
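
To show how the three averaging methods differ, the following sketch computes the F1 score of one small multiclass prediction in each mode. It is a stand-alone illustration of the averaging definitions in the table, with a hypothetical function name `f1_by_average`, and is not taken from the NIRS4ALL source.

```python
from collections import Counter

def f1_by_average(y_true, y_pred):
    """F1 under macro, weighted and micro averaging."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)                        # samples per true class
    tp = Counter(t for t, p in pairs if t == p)
    fp = Counter(p for t, p in pairs if t != p)      # counted for the predicted class
    fn = Counter(t for t, p in pairs if t != p)      # counted for the true class

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    return {
        "macro": sum(per_class.values()) / len(labels),
        "weighted": sum(support[c] * per_class[c] for c in labels) / len(y_true),
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
    }

y_true = ["a", "a", "a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "b", "c", "a"]
f1_scores = f1_by_average(y_true, y_pred)
print(f1_scores)
```

Here the per-class F1 values are 0.75, 0.8 and 2/3. Macro averaging treats them equally, weighted averaging gives class `a` twice the weight of the others, and micro averaging pools the counts, which for single-label problems gives the same value as accuracy.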

Metric Descriptions

Balanced Accuracy

Mean of recall for each class. Handles imbalanced datasets better than accuracy.

# Default for classification
result.top(n=5)  # Uses balanced_accuracy
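
The difference from plain accuracy shows up on imbalanced data. In the sketch below (helper name hypothetical), a classifier that always predicts the majority class reaches 90% accuracy but only 50% balanced accuracy, because its recall on the minority class is zero.

```python
def recall_per_class(y_true, y_pred):
    """Recall of each class: correctly predicted samples / samples of that class."""
    recalls = {}
    for label in set(y_true):
        indices = [i for i, t in enumerate(y_true) if t == label]
        recalls[label] = sum(y_pred[i] == label for i in indices) / len(indices)
    return recalls

y_true = [0] * 90 + [1] * 10   # 90% majority class
y_pred = [0] * 100             # always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = recall_per_class(y_true, y_pred)
balanced_accuracy = sum(recalls.values()) / len(recalls)

print(accuracy, balanced_accuracy)   # 0.9 0.5
```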

MCC (Matthews Correlation Coefficient)

Correlation between predicted and true classes. Considers all four confusion matrix quadrants. Recommended for imbalanced datasets.

MCC Value	Interpretation
+1	Perfect prediction
0	Random prediction
-1	Inverse prediction
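
The three reference values in the table follow directly from the standard binary MCC formula, $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The sketch below uses a hypothetical helper `mcc`; returning 0 when the denominator is zero is a common convention and is assumed here rather than taken from NIRS4ALL.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0

mcc_perfect = mcc(tp=5, tn=5, fp=0, fn=0)      # 1.0
mcc_inverse = mcc(tp=0, tn=0, fp=5, fn=5)      # -1.0
mcc_majority = mcc(tp=0, tn=90, fp=0, fn=10)   # 0.0, always predicts the majority class
mcc_typical = mcc(tp=2, tn=4, fp=1, fn=1)      # 7 / 15
print(mcc_perfect, mcc_inverse, mcc_majority, mcc_typical)
```

The majority-class case is the reason MCC is recommended for imbalanced data: accuracy would be 0.9 there, while MCC reports no skill.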

ROC AUC

Area under the Receiver Operating Characteristic curve. Measures discrimination ability across all classification thresholds.
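
One way to build intuition for ROC AUC is its rank interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below implements that pairwise definition with a hypothetical function `roc_auc`. It runs in quadratic time, so it is meant for illustration, not for large data sets.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)   # 0.75: three of the four positive/negative pairs are ordered correctly
```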

Using Metrics in Code

Accessing Metrics in Results

result = nirs4all.run(pipeline, dataset)

# Get top results with specific metrics
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae']):
    print(f"RMSE: {pred['rmse']:.4f}, R²: {pred['r2']:.4f}, MAE: {pred['mae']:.4f}")

Ranking by Different Metrics

# Rank by RMSE (default for regression)
top_by_rmse = result.top(n=5, rank_metric='rmse')

# Rank by R²
top_by_r2 = result.top(n=5, rank_metric='r2')

# Rank by any other available metric, e.g. MAE
top_by_mae = result.top(n=5, rank_metric='mae')

Metric Abbreviations

NIRS4ALL provides abbreviations for display:

from nirs4all.core.metrics import abbreviate_metric

abbreviate_metric('balanced_accuracy')  # Returns 'BalAcc'
abbreviate_metric('mean_squared_error')  # Returns 'MSE'
abbreviate_metric('r2')                  # Returns 'R²'

Computing Metrics Manually

# Note: importing `eval` shadows Python's built-in eval() in this scope
from nirs4all.core.metrics import eval, eval_multi

# Single metric
rmse = eval(y_true, y_pred, 'rmse')

# All metrics for task type
metrics = eval_multi(y_true, y_pred, 'regression')
# Returns: {'mse': 0.01, 'rmse': 0.1, 'mae': 0.08, 'r2': 0.95, ...}

Getting Available Metrics

from nirs4all.core.metrics import get_available_metrics, get_default_metrics

# All available
all_reg = get_available_metrics('regression')
all_cls = get_available_metrics('binary_classification')

# Commonly used
default_reg = get_default_metrics('regression')
# ['r2', 'rmse', 'mse', 'sep', 'mae', 'rpd', 'bias', ...]

Metric Selection Guidelines

For Regression

Scenario	Recommended Metrics
General purpose	RMSE, R², MAE
NIRS literature	RMSE, R², RPD, SEP
Robust to outliers	MAE, Median AE
Relative errors	MAPE, NRMSE
Correlation focus	Pearson R, Spearman R

For Classification

Scenario	Recommended Metrics
Balanced classes	Accuracy, F1
Imbalanced classes	Balanced Accuracy, MCC, ROC AUC
Cost-sensitive	Precision or Recall (depending on cost)
Binary problems	Accuracy, AUC, F1
Multiclass problems	Balanced Accuracy, F1 Macro

Complete Example

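import nirs4all
from nirs4all.core.metrics import eval_multi

# scikit-learn components used in the pipeline below
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler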

# Run pipeline
result = nirs4all.run(
    pipeline=[
        MinMaxScaler(),
        ShuffleSplit(n_splits=5),
        {"model": PLSRegression(n_components=10)}
    ],
    dataset="sample_data/regression",
    verbose=1
)

# View multiple metrics
print("📊 Top 5 Models by RMSE:")
for pred in result.top(n=5, display_metrics=['rmse', 'r2', 'mae', 'sep', 'rpd']):
    print(f"  {pred['model_name']}:")
    print(f"    RMSE: {pred['rmse']:.4f}")
    print(f"    R²: {pred['r2']:.4f}")
    print(f"    MAE: {pred['mae']:.4f}")
    print(f"    SEP: {pred.get('sep', 'N/A')}")
    print(f"    RPD: {pred.get('rpd', 'N/A')}")

# Compute all metrics for best model
best = result.best
y_true = best['y_true']
y_pred = best['y_pred']
all_metrics = eval_multi(y_true, y_pred, 'regression')

print("\n📈 All Regression Metrics:")
for metric, value in all_metrics.items():
    print(f"  {metric}: {value:.4f}")
