Getting Started Examples
This section introduces the fundamentals of NIRS4ALL through a series of progressive examples. Each example builds upon the previous one, guiding you from your first pipeline to advanced visualization techniques.
Overview
| Example | Topic | Difficulty | Duration |
|---|---|---|---|
| U01 | Hello World | ★☆☆☆☆ | ~1 min |
| U02 | Basic Regression | ★★☆☆☆ | ~3 min |
| U03 | Basic Classification | ★★☆☆☆ | ~2 min |
| U04 | Visualization | ★★☆☆☆ | ~3 min |
U01: Hello World
Your first NIRS4ALL pipeline in about 20 lines of code.
What You’ll Learn
Using nirs4all.run() to train a pipeline
The structure of a minimal pipeline
Reading results from the RunResult object
Key Concepts
A pipeline in NIRS4ALL is simply a list of processing steps:
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
import nirs4all
# Generate synthetic data (or use your own dataset path)
dataset = nirs4all.generate.regression(
    n_samples=200,
    target_component=0,
    random_state=42
)
# Define the pipeline as a list of steps
pipeline = [
    MinMaxScaler(),                              # Feature scaling
    {"y_processing": MinMaxScaler()},            # Target scaling
    ShuffleSplit(n_splits=3, test_size=0.25),    # Cross-validation
    {"model": PLSRegression(n_components=10)}    # Model
]
# Run with one simple call
result = nirs4all.run(
    pipeline=pipeline,
    dataset=dataset,
    name="HelloWorld",
    verbose=1
)
# Access results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")
# Explore top predictions
for pred in result.top(n=3, display_metrics=['rmse', 'r2']):
    print(f"{pred['model_name']}: RMSE={pred['rmse']:.4f}, R²={pred['r2']:.4f}")
The RunResult Object
The result object provides convenient accessors:
| Accessor | Description |
|---|---|
| | Best model’s primary score (MSE by default) |
| result.best_rmse | Best model’s RMSE |
| result.best_r2 | Best model’s R² |
| result.best | Best prediction entry as a dictionary |
| result.top(n) | Top N predictions ranked by score |
| result.predictions | Full Predictions object for analysis |
Understanding Prediction Entries
Each prediction returned by result.top() or result.best is a dictionary with rich information:
# Get top predictions
for pred in result.top(n=3, display_metrics=['rmse', 'r2']):
    # Core identification
    print(f"Model: {pred['model_name']}")
    print(f"Dataset: {pred['dataset_name']}")
    print(f"Fold: {pred['fold_id']}")
    print(f"Preprocessing: {pred.get('preprocessings', 'none')}")
    # Metrics (available when using display_metrics)
    print(f"RMSE: {pred['rmse']:.4f}")
    print(f"R²: {pred['r2']:.4f}")
    # Scores by partition (primary metric)
    print(f"Train: {pred['train_score']:.6f}")
    print(f"Val: {pred['val_score']:.6f}")
    print(f"Test: {pred['test_score']:.6f}")
    # Additional metadata
    print(f"Samples: {pred['n_samples']}, Features: {pred['n_features']}")
Key fields in each prediction entry:
| Field | Description |
|---|---|
| model_name | Name of the model (e.g., “PLSRegression”) |
| | Class name of the model |
| dataset_name | Dataset used for training |
| fold_id | Cross-validation fold index |
| preprocessings | Preprocessing steps applied |
| train_score | Training score (primary metric) |
| val_score | Validation score (primary metric) |
| test_score | Test score (primary metric) |
| rmse, r2 | RMSE and R² (when using display_metrics) |
| n_samples, n_features | Data shape information |
| | ‘regression’ or ‘classification’ |
| | Primary metric name (e.g., ‘mse’) |
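Because each entry is a plain dictionary, ordinary Python is enough to rank or filter entries. The sketch below uses hand-written entries with placeholder values (they are not real results) and only the keys `model_name`, `fold_id`, `val_score`, and `test_score` shown in the loop above. It assumes the primary metric is an error such as MSE, where lower is better.

```python
# Hypothetical entries with placeholder values (not real nirs4all output)
entries = [
    {"model_name": "PLSRegression", "fold_id": 0, "val_score": 0.012, "test_score": 0.015},
    {"model_name": "PLSRegression", "fold_id": 1, "val_score": 0.009, "test_score": 0.011},
    {"model_name": "PLSRegression", "fold_id": 2, "val_score": 0.020, "test_score": 0.018},
]

# Rank by validation score; for an error metric, smaller values come first
ranked = sorted(entries, key=lambda entry: entry["val_score"])
best = ranked[0]

# The gap between test and validation scores hints at how well the fold generalizes
gap = best["test_score"] - best["val_score"]

print(f"Best fold: {best['fold_id']} (val={best['val_score']:.3f}, test={best['test_score']:.3f})")
print(f"Test minus validation: {gap:.3f}")
```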
Tips for Beginners
Start simple: Begin with a basic pipeline and add complexity gradually
Use verbose=1: See what’s happening during training
Check top models: Use result.top(n=5, display_metrics=['rmse', 'r2']) to compare performance
Explore predictions: Each prediction entry contains detailed metrics and metadata
U02: Basic Regression
A complete regression pipeline with NIRS-specific preprocessing and visualization.
What You’ll Learn
NIRS-specific preprocessing (SNV, Detrend, Derivatives, Gaussian)
Feature augmentation to explore preprocessing combinations
Using PredictionAnalyzer for result visualization
Comparing models with different n_components
NIRS Preprocessing Options
NIRS4ALL provides specialized transforms for spectral data:
| Transform | Purpose | When to Use |
|---|---|---|
| StandardNormalVariate | Scatter correction | Path length variations |
| MultiplicativeScatterCorrection | Scatter correction | Reference-based correction |
| Detrend | Baseline correction | Polynomial drift removal |
| FirstDerivative | Enhance peaks, remove baseline | Constant baseline issues |
| SavitzkyGolay | Smoothing + derivatives | Noisy data |
| Gaussian | Smoothing | Noise reduction |
| Haar | Wavelet transform | Multi-resolution analysis |
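To give a feel for what several of these transforms do numerically, the sketch below applies rough NumPy/SciPy equivalents to made-up spectra. These are not the nirs4all implementations, and their defaults may differ. The data, window length, polynomial order, and sigma are assumptions chosen for illustration, and the SciPy `detrend` call removes only a linear trend.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import detrend, savgol_filter

# Illustrative spectra: 5 samples x 100 wavelengths (made-up data)
rng = np.random.default_rng(0)
wavelengths = np.linspace(0.0, 1.0, 100)
spectra = (
    np.sin(2 * np.pi * wavelengths)          # shared spectral shape
    + rng.uniform(0.5, 1.5, size=(5, 1))     # per-sample offset (scatter-like effect)
    + 0.01 * rng.normal(size=(5, 100))       # measurement noise
)

# Standard normal variate: center and scale each spectrum (row) individually
snv = (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

# Detrend: remove a fitted linear trend from each spectrum
detrended = detrend(spectra, axis=1, type="linear")

# First derivative along the wavelength axis removes constant offsets
first_derivative = np.gradient(spectra, axis=1)

# Savitzky-Golay: local polynomial smoothing, here combined with a first derivative
savgol = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)

# Gaussian smoothing along the wavelength axis
smoothed = gaussian_filter1d(spectra, sigma=2.0, axis=1)

print("Per-sample mean after SNV:", np.round(snv.mean(axis=1), 6))
```

After SNV every spectrum has zero mean and unit standard deviation, which is why it is commonly used against additive and multiplicative scatter effects.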
Feature Augmentation
Instead of manually defining multiple pipelines, use feature augmentation to explore combinations:
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    # Generate 3 preprocessing combinations from 5 options
    {
        "feature_augmentation": {
            "_or_": [Detrend, FirstDerivative, Gaussian, SavitzkyGolay, Haar],
            "pick": 2,    # Pick 2 at a time
            "count": 3    # Generate 3 combinations
        }
    },
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]
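One way to picture "pick 2, count 3" is as choosing 3 of the 10 possible unordered pairs of the 5 options. The sketch below does this with the standard library. How nirs4all actually selects the combinations (randomly, in order, or otherwise) is not stated here, so the random sampling and the seed are assumptions, and the transforms are represented by their names only.

```python
import random
from itertools import combinations

# Transform names stand in for the classes used in the pipeline
options = ["Detrend", "FirstDerivative", "Gaussian", "SavitzkyGolay", "Haar"]

# "pick": 2 -> every unordered pair of options
all_pairs = list(combinations(options, 2))

# "count": 3 -> keep only 3 of those pairs (random choice is an assumption)
sampler = random.Random(42)
selected = sampler.sample(all_pairs, 3)

print(f"{len(all_pairs)} possible pairs, {len(selected)} kept:")
for pair in selected:
    print("  " + " + ".join(pair))
```

Capping the number of combinations keeps the run time bounded, since each kept combination multiplies the number of models to train.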
Visualization with PredictionAnalyzer
from nirs4all.visualization.predictions import PredictionAnalyzer
analyzer = PredictionAnalyzer(result.predictions)
# Compare top K models
analyzer.plot_top_k(k=3, rank_metric='rmse')
# Heatmap: models vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")
# Performance distribution
analyzer.plot_candlestick(variable="model_name")
U03: Basic Classification
Classification pipeline with Random Forest, XGBoost, and confusion matrix visualization.
What You’ll Learn
Setting up a classification pipeline
Using multiple classifiers (Random Forest, XGBoost)
Confusion matrix visualization
Classification metrics (accuracy, balanced recall)
Classification Pipeline Structure
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import StandardScaler
pipeline = [
    # Feature augmentation with preprocessing options
    {"feature_augmentation": [
        FirstDerivative,
        StandardNormalVariate,
        Haar,
        MultiplicativeScatterCorrection
    ]},
    StandardScaler(),
    ShuffleSplit(n_splits=3, test_size=0.25),
    # Classifier
    {"model": RandomForestClassifier(n_estimators=50, max_depth=8)}
]
Classification Metrics
| Metric | Description | Use Case |
|---|---|---|
| accuracy | Overall correct predictions | Balanced classes |
| | Average recall per class | Imbalanced classes |
| | Average accuracy per class | Class imbalance |
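The difference between overall accuracy and a per-class average matters most when classes are imbalanced. The sketch below uses scikit-learn metric functions on a toy label set (made up for illustration) in which a classifier always predicts the majority class. The metric names are scikit-learn's and may not match the names nirs4all uses.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Toy labels: 8 samples of class 0, 2 samples of class 1 (illustrative only)
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the majority class
y_pred = [0] * 10

accuracy = accuracy_score(y_true, y_pred)
# Macro recall: recall computed per class, then averaged with equal weight
macro_recall = recall_score(y_true, y_pred, average="macro")
# scikit-learn defines balanced accuracy as the average recall per class
balanced = balanced_accuracy_score(y_true, y_pred)

print(f"Accuracy:          {accuracy:.2f}")
print(f"Macro recall:      {macro_recall:.2f}")
print(f"Balanced accuracy: {balanced:.2f}")
```

Accuracy looks acceptable here because the majority class dominates, while the per-class average exposes that the minority class is never detected.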
Confusion Matrix Visualization
# Plot confusion matrices for top 4 classifiers
analyzer.plot_confusion_matrix(
    k=4,
    rank_metric='accuracy',
    rank_partition='val',
    display_partition='test'
)
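If it helps to see the numbers behind such a plot, the sketch below builds a confusion matrix with scikit-learn from toy labels (made up for illustration). It is independent of `PredictionAnalyzer` and only shows what the matrix contains: rows are true classes, columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for three classes (illustrative only)
labels = ["a", "b", "c"]
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]

# Entry [i][j] counts samples of true class i predicted as class j
matrix = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class recall is the diagonal divided by the row totals
per_class_recall = matrix.diagonal() / matrix.sum(axis=1)

print(matrix)
for label, recall in zip(labels, per_class_recall):
    print(f"Recall for class {label}: {recall:.2f}")
```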
U04: Visualization
A comprehensive tour of all visualization options in NIRS4ALL.
What You’ll Learn
All PredictionAnalyzer methods and options
Heatmaps, candlestick charts, histograms
Top-k comparison plots
Ranking vs display partition configuration
Available Visualizations
Top-K Comparison
# Basic top-k plot
analyzer.plot_top_k(k=3, rank_metric='rmse')
# Rank by R² on the test partition
analyzer.plot_top_k(k=3, rank_metric='r2', rank_partition='test')
Heatmaps
Create 2D comparisons between any two variables:
# Model vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")
# Model vs dataset
analyzer.plot_heatmap(x_var="model_name", y_var="dataset_name", display_metric="r2")
# Model vs fold with counts
analyzer.plot_heatmap(x_var="model_name", y_var="fold_id", show_counts=True)
Candlestick Charts
Show performance distribution per category:
analyzer.plot_candlestick(variable="model_name", display_metric='rmse')
analyzer.plot_candlestick(variable="dataset_name", display_metric='r2')
Histograms
analyzer.plot_histogram(display_metric='rmse')
analyzer.plot_histogram(display_metric='r2')
Ranking vs Display: A Key Concept
You can separate ranking from display:
| Parameter | Purpose |
|---|---|
| rank_metric, rank_partition | Determine which models are “best” |
| display_metric, display_partition | What values to show |
# Rank by validation RMSE, but display test R²
analyzer.plot_heatmap(
    x_var="model_name",
    y_var="preprocessings",
    rank_metric='rmse',
    rank_partition='val',
    display_metric='r2',
    display_partition='test'
)
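The idea can be sketched in plain Python: within each heatmap cell, the record with the lowest validation RMSE is selected, and its test R² is what gets shown. The records, their values, and the key names `val_rmse` and `test_r2` are hypothetical and only serve this illustration. The actual selection logic inside `PredictionAnalyzer` may differ.

```python
# Hypothetical records with placeholder values (not real results)
records = [
    {"model_name": "PLS", "preprocessings": "SNV", "val_rmse": 0.30, "test_r2": 0.80},
    {"model_name": "PLS", "preprocessings": "SNV", "val_rmse": 0.25, "test_r2": 0.78},
    {"model_name": "PLS", "preprocessings": "Detrend", "val_rmse": 0.40, "test_r2": 0.70},
    {"model_name": "RF", "preprocessings": "SNV", "val_rmse": 0.35, "test_r2": 0.75},
]

# Ranking step: keep, per cell, the record with the lowest validation RMSE
selected = {}
for record in records:
    cell = (record["model_name"], record["preprocessings"])
    if cell not in selected or record["val_rmse"] < selected[cell]["val_rmse"]:
        selected[cell] = record

# Display step: show the test R² of the selected record
displayed = {cell: record["test_r2"] for cell, record in selected.items()}

for cell, value in displayed.items():
    print(f"{cell[0]} / {cell[1]}: test R² = {value:.2f}")
```

Note that the PLS / SNV cell shows 0.78 rather than 0.80: the record was chosen on validation data, so the displayed test value is not cherry-picked.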
Aggregation Options
| Option | Description |
|---|---|
| | Show best score for each cell |
| | Show mean score |
| | Show median score |
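As a rough illustration of the three aggregation behaviors, the sketch below groups placeholder RMSE values by cell and reduces each group in three ways. The dictionary keys `best`, `mean`, and `median` are labels chosen for this example, not confirmed option names, and "best" is taken as the minimum because RMSE is lower-is-better.

```python
from collections import defaultdict
from statistics import mean, median

# Placeholder (model, preprocessing, rmse) triples, not real results
scores = [
    ("PLS", "SNV", 0.30),
    ("PLS", "SNV", 0.20),
    ("PLS", "SNV", 0.40),
    ("RF", "SNV", 0.50),
    ("RF", "SNV", 0.30),
]

# Collect all scores that fall into the same heatmap cell
grouped = defaultdict(list)
for model_name, preprocessing, rmse in scores:
    grouped[(model_name, preprocessing)].append(rmse)

# One reducer per aggregation behavior
aggregators = {"best": min, "mean": mean, "median": median}
table = {
    name: {cell: reducer(values) for cell, values in grouped.items()}
    for name, reducer in aggregators.items()
}

for name, cells in table.items():
    print(name, {cell: round(value, 3) for cell, value in cells.items()})
```

Showing the best score highlights what a configuration can achieve, while mean and median give a more conservative picture across folds.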
Running These Examples
cd examples
# Run all getting started examples
./run.sh -n "U0*.py" -c user
# Run a specific example
python user/01_getting_started/U01_hello_world.py
# Enable plots
python user/01_getting_started/U02_basic_regression.py --plots --show
Next Steps
After completing these examples:
Data Handling: Learn different input formats and multi-dataset analysis
Preprocessing: Deep dive into NIRS-specific transformations
Models: Compare multiple models and hyperparameter tuning