Getting Started Examples
This section introduces the fundamentals of NIRS4ALL through a series of progressive examples. Each example builds upon the previous one, guiding you from your first pipeline to advanced visualization techniques.
Overview
| Example | Topic | Difficulty | Duration |
|---|---|---|---|
| U01 | Hello World | ★☆☆☆☆ | ~1 min |
| U02 | Basic Regression | ★★☆☆☆ | ~3 min |
| U03 | Basic Classification | ★★☆☆☆ | ~2 min |
| U04 | Visualization | ★★☆☆☆ | ~3 min |
U01: Hello World
Your first NIRS4ALL pipeline in about 20 lines of code.
What You’ll Learn
Using nirs4all.run() to train a pipeline
The structure of a minimal pipeline
Reading results from the RunResult object
Key Concepts
A pipeline in NIRS4ALL is simply a list of processing steps:
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
import nirs4all
# Define the pipeline as a list of steps
pipeline = [
    MinMaxScaler(),                             # Feature scaling
    {"y_processing": MinMaxScaler()},           # Target scaling
    ShuffleSplit(n_splits=3, test_size=0.25),   # Cross-validation
    {"model": PLSRegression(n_components=10)},  # Model
]
# Run with one simple call
result = nirs4all.run(
    pipeline=pipeline,
    dataset="sample_data/regression",
    name="HelloWorld",
    verbose=1,
)
# Access results
print(f"Best Score (MSE): {result.best_score:.4f}")
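To make the cross-validation step more concrete, here is a plain-Python sketch of what a shuffle split with n_splits=3 and test_size=0.25 does conceptually. It is not scikit-learn's implementation: the shuffle_split helper, the seed, and the 20-sample size are illustrative assumptions.

```python
import math
import random


def shuffle_split(n_samples, n_splits=3, test_size=0.25, seed=0):
    """Yield (train, test) index lists, one independent shuffle per split."""
    rng = random.Random(seed)
    n_test = math.ceil(test_size * n_samples)
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices are held out, the rest are used for training
        yield indices[n_test:], indices[:n_test]


splits = list(shuffle_split(n_samples=20))
for train, test in splits:
    print(len(train), len(test))
```

Because each split is shuffled independently, the held-out sets of different splits may overlap, unlike the folds of a k-fold scheme.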
The RunResult Object
The result object provides convenient accessors:
| Accessor | Description |
|---|---|
| result.best_score | Best model’s primary score (MSE by default) |
|  | Best prediction entry as a dictionary |
| result.top(n) | Top N predictions ranked by score |
| result.predictions | Full Predictions object for analysis |
Tips for Beginners
Start simple: Begin with a basic pipeline and add complexity gradually
Use verbose=1: See what’s happening during training
Check top models: Use result.top(n=5) to compare performance
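Ranking "top N" entries by an error score amounts to sorting in ascending order and keeping the first N. The sketch below shows the idea on made-up entries. The dictionary fields and the top helper are hypothetical and do not describe the actual structure returned by NIRS4ALL.

```python
# Made-up entries for illustration only; field names are hypothetical
entries = [
    {"model": "PLS-5", "mse": 0.042},
    {"model": "PLS-10", "mse": 0.031},
    {"model": "PLS-15", "mse": 0.036},
    {"model": "PLS-20", "mse": 0.058},
]


def top(entries, n, key="mse"):
    """Return the n entries with the lowest error (lower MSE is better)."""
    return sorted(entries, key=lambda entry: entry[key])[:n]


best_two = top(entries, n=2)
print([entry["model"] for entry in best_two])
```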
U02: Basic Regression
A complete regression pipeline with NIRS-specific preprocessing and visualization.
What You’ll Learn
NIRS-specific preprocessing (SNV, Detrend, Derivatives, Gaussian)
Feature augmentation to explore preprocessing combinations
Using PredictionAnalyzer for result visualization
Comparing models with different n_components
NIRS Preprocessing Options
NIRS4ALL provides specialized transforms for spectral data:
| Transform | Purpose | When to Use |
|---|---|---|
| StandardNormalVariate | Scatter correction | Path length variations |
| MultiplicativeScatterCorrection | Scatter correction | Reference-based correction |
| Detrend | Baseline correction | Polynomial drift removal |
| FirstDerivative | Enhance peaks, remove baseline | Constant baseline issues |
| SavitzkyGolay | Smoothing + derivatives | Noisy data |
| Gaussian | Smoothing | Noise reduction |
| Haar | Wavelet transform | Multi-resolution analysis |
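Two of these ideas are easy to check by hand. The sketch below implements standard normal variate scaling and a first difference in plain Python on synthetic spectra, to show why the former removes gain and offset differences and the latter removes a constant baseline. It is a teaching sketch under those assumptions, not the NIRS4ALL implementation; real code would operate on NumPy arrays.

```python
import math
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum on its own statistics."""
    m, s = mean(spectrum), pstdev(spectrum)
    return [(value - m) / s for value in spectrum]


def first_derivative(spectrum):
    """First difference between neighbouring wavelengths."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def all_close(xs, ys, tol=1e-9):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(xs, ys))


# Synthetic band shape, then the same shape with a different gain and offset
base = [math.sin(2 * math.pi * i / 49) for i in range(50)]
scaled = [2.5 * value + 1.0 for value in base]
shifted = [value + 3.0 for value in base]

snv_matches = all_close(snv(base), snv(scaled))
derivative_matches = all_close(first_derivative(base), first_derivative(shifted))
print(snv_matches, derivative_matches)
```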
Feature Augmentation
Instead of manually defining multiple pipelines, use feature augmentation to explore combinations:
pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},
    # Generate 3 preprocessing combinations from 5 options
    {
        "feature_augmentation": {
            "_or_": [Detrend, FirstDerivative, Gaussian, SavitzkyGolay, Haar],
            "pick": 2,   # Pick 2 at a time
            "count": 3,  # Generate 3 combinations
        }
    },
    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)},
]
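To get a feel for the numbers involved, the sketch below enumerates every pair that can be formed from the five options and then keeps three of them. How NIRS4ALL actually selects the combinations is not specified here, so the seeded random sampling is an assumption, and strings stand in for the transform classes.

```python
import itertools
import random

# Strings stand in for the transform classes listed under "_or_"
options = ["Detrend", "FirstDerivative", "Gaussian", "SavitzkyGolay", "Haar"]

# "pick": 2 -> every unordered pair of the five options, C(5, 2) = 10
all_pairs = list(itertools.combinations(options, 2))

# "count": 3 -> keep only three of them (random choice is an assumption)
rng = random.Random(0)
chosen = rng.sample(all_pairs, 3)

print(len(all_pairs), len(chosen))
```

Without a limit on the number of combinations, the pipeline would have to evaluate all ten pairs, so capping the count keeps exploration cheap.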
Visualization with PredictionAnalyzer
from nirs4all.visualization.predictions import PredictionAnalyzer
analyzer = PredictionAnalyzer(result.predictions)
# Compare top K models
analyzer.plot_top_k(k=3, rank_metric='rmse')
# Heatmap: models vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")
# Performance distribution
analyzer.plot_candlestick(variable="model_name")
U03: Basic Classification
Classification pipeline with Random Forest, XGBoost, and confusion matrix visualization.
What You’ll Learn
Setting up a classification pipeline
Using multiple classifiers (Random Forest, XGBoost)
Confusion matrix visualization
Classification metrics (accuracy, balanced recall)
Classification Pipeline Structure
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
pipeline = [
    # Feature augmentation with preprocessing options
    {"feature_augmentation": [
        FirstDerivative,
        StandardNormalVariate,
        Haar,
        MultiplicativeScatterCorrection,
    ]},
    StandardScaler(),
    ShuffleSplit(n_splits=3, test_size=0.25),
    # Classifier
    {"model": RandomForestClassifier(n_estimators=50, max_depth=8)},
]
Classification Metrics
| Metric | Description | Use Case |
|---|---|---|
| Accuracy | Overall correct predictions | Balanced classes |
| Balanced recall | Average recall per class | Imbalanced classes |
| Balanced accuracy | Average accuracy per class | Class imbalance |
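The difference between overall accuracy and a per-class average matters most when classes are imbalanced. The sketch below computes both by hand on a toy label set where a classifier always predicts the majority class. The helper functions are illustrative and are not NIRS4ALL's metric implementations.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_recall(y_true, y_pred):
    """Recall computed per class, then averaged with equal class weight."""
    recalls = []
    for label in sorted(set(y_true)):
        positions = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in positions)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


# Toy data: 8 samples of class 0, 2 of class 1, classifier always says 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
bal = balanced_recall(y_true, y_pred)
print(acc, bal)
```

Accuracy looks acceptable because the majority class dominates, while the per-class average exposes that the minority class is never detected.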
Confusion Matrix Visualization
# Plot confusion matrices for top 4 classifiers
analyzer.plot_confusion_matrix(
    k=4,
    rank_metric='accuracy',
    rank_partition='val',
    display_partition='test',
)
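For readers new to confusion matrices, the sketch below builds one by counting on toy labels, with rows for true classes and columns for predicted classes. The confusion_matrix helper and the labels are illustrative, and the row/column orientation is an assumption that the plotted matrices may or may not share.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels for illustration only
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]

matrix = confusion_matrix(y_true, y_pred, labels=["A", "B", "C"])
for row in matrix:
    print(row)
```

Correct predictions sit on the diagonal, so off-diagonal cells show which classes get confused with each other.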
U04: Visualization
A comprehensive tour of all visualization options in NIRS4ALL.
What You’ll Learn
All PredictionAnalyzer methods and options
Heatmaps, candlestick charts, histograms
Top-k comparison plots
Ranking vs display partition configuration
Available Visualizations
Top-K Comparison
# Basic top-k plot
analyzer.plot_top_k(k=3, rank_metric='rmse')
# Rank by R² on the test partition
analyzer.plot_top_k(k=3, rank_metric='r2', rank_partition='test')
Heatmaps
Create 2D comparisons between any two variables:
# Model vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")
# Model vs dataset
analyzer.plot_heatmap(x_var="model_name", y_var="dataset_name", display_metric="r2")
# Model vs fold with counts
analyzer.plot_heatmap(x_var="model_name", y_var="fold_id", show_counts=True)
Candlestick Charts
Show performance distribution per category:
analyzer.plot_candlestick(variable="model_name", display_metric='rmse')
analyzer.plot_candlestick(variable="dataset_name", display_metric='r2')
Histograms
analyzer.plot_histogram(display_metric='rmse')
analyzer.plot_histogram(display_metric='r2')
Ranking vs Display: A Key Concept
You can separate ranking from display:
| Parameter | Purpose |
|---|---|
| rank_metric, rank_partition | Determines which models are “best” |
| display_metric, display_partition | What values to show |
# Rank by validation RMSE, but display test R²
analyzer.plot_heatmap(
    x_var="model_name",
    y_var="preprocessings",
    rank_metric='rmse',
    rank_partition='val',
    display_metric='r2',
    display_partition='test',
)
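The separation can be sketched without any plotting: select the winner with one metric and partition, then report a different metric and partition for it. The records and the lookup helper below are made up for illustration and do not reflect the structure of the Predictions object.

```python
# Made-up scores for illustration; field names are hypothetical
records = [
    {"model": "PLS", "partition": "val", "rmse": 0.30, "r2": 0.88},
    {"model": "PLS", "partition": "test", "rmse": 0.34, "r2": 0.85},
    {"model": "RF", "partition": "val", "rmse": 0.27, "r2": 0.90},
    {"model": "RF", "partition": "test", "rmse": 0.41, "r2": 0.79},
]


def lookup(model, partition, metric):
    """Return one metric value for a given model and partition."""
    for record in records:
        if record["model"] == model and record["partition"] == partition:
            return record[metric]
    raise KeyError((model, partition))


models = sorted({record["model"] for record in records})

# Rank on validation RMSE (lower is better) ...
best = min(models, key=lambda model: lookup(model, "val", "rmse"))
# ... but display the test R² of the selected model
shown = lookup(best, "test", "r2")
print(best, shown)
```

Ranking on the validation partition keeps the test partition out of model selection, so the displayed test value remains an honest estimate even when, as in this toy case, the selected model is not the one with the best test score.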
Aggregation Options
| Option | Description |
|---|---|
| Best | Show best score for each cell |
| Mean | Show mean score |
| Median | Show median score |
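When several scores fall into the same heatmap cell (for example one per fold), the three aggregations can give noticeably different values. The sketch below applies them to made-up RMSE values. The option strings and the aggregate helper are illustrative assumptions, not the NIRS4ALL parameter values.

```python
from statistics import mean, median

# Made-up per-fold RMSE values for a single heatmap cell
cell_scores = [0.25, 0.5, 1.5]


def aggregate(scores, how):
    """Collapse the scores of one cell into a single value."""
    if how == "best":
        return min(scores)  # lower RMSE is better
    if how == "mean":
        return mean(scores)
    if how == "median":
        return median(scores)
    raise ValueError(f"unknown aggregation: {how}")


summary = {how: aggregate(cell_scores, how) for how in ("best", "mean", "median")}
print(summary)
```

The best value is the most optimistic, the mean is pulled up by the one poor fold, and the median is the least sensitive to that outlier.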
Running These Examples
cd examples
# Run all getting started examples
./run.sh -n "U0*.py" -c user
# Run a specific example
python user/01_getting_started/U01_hello_world.py
# Enable plots
python user/01_getting_started/U02_basic_regression.py --plots --show
Next Steps
After completing these examples:
Data Handling: Learn different input formats and multi-dataset analysis
Preprocessing: Deep dive into NIRS-specific transformations
Models: Compare multiple models and hyperparameter tuning