Getting Started Examples

This section introduces the fundamentals of NIRS4ALL through a series of progressive examples. Each example builds upon the previous one, guiding you from your first pipeline to advanced visualization techniques.

Overview

| Example | Topic | Difficulty | Duration |
|---------|-------|------------|----------|
| U01 | Hello World | ★☆☆☆☆ | ~1 min |
| U02 | Basic Regression | ★★☆☆☆ | ~3 min |
| U03 | Basic Classification | ★★☆☆☆ | ~2 min |
| U04 | Visualization | ★★☆☆☆ | ~3 min |


U01: Hello World

Your first NIRS4ALL pipeline in about 20 lines of code.


What You’ll Learn

  • Using nirs4all.run() to train a pipeline

  • The structure of a minimal pipeline

  • Reading results from the RunResult object

Key Concepts

A pipeline in NIRS4ALL is simply a list of processing steps:

from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

import nirs4all

# Define the pipeline as a list of steps
pipeline = [
    MinMaxScaler(),                              # Feature scaling
    {"y_processing": MinMaxScaler()},            # Target scaling
    ShuffleSplit(n_splits=3, test_size=0.25),    # Cross-validation
    {"model": PLSRegression(n_components=10)}    # Model
]

# Run with one simple call
result = nirs4all.run(
    pipeline=pipeline,
    dataset="sample_data/regression",
    name="HelloWorld",
    verbose=1
)

# Access results
print(f"Best Score (MSE): {result.best_score:.4f}")

The RunResult Object

The result object provides convenient accessors:

| Accessor | Description |
|----------|-------------|
| result.best_score | Best model’s primary score (MSE by default) |
| result.best | Best prediction entry as a dictionary |
| result.top(n) | Top N predictions ranked by score |
| result.predictions | Full Predictions object for analysis |

Tips for Beginners

  1. Start simple: Begin with a basic pipeline and add complexity gradually

  2. Use verbose=1: See what’s happening during training

  3. Check top models: Use result.top(n=5) to compare performance


U02: Basic Regression

A complete regression pipeline with NIRS-specific preprocessing and visualization.


What You’ll Learn

  • NIRS-specific preprocessing (SNV, Detrend, Derivatives, Gaussian)

  • Feature augmentation to explore preprocessing combinations

  • Using PredictionAnalyzer for result visualization

  • Comparing models with different n_components

NIRS Preprocessing Options

NIRS4ALL provides specialized transforms for spectral data:

| Transform | Purpose | When to Use |
|-----------|---------|-------------|
| StandardNormalVariate (SNV) | Scatter correction | Path length variations |
| MultiplicativeScatterCorrection (MSC) | Scatter correction | Reference-based correction |
| Detrend | Baseline correction | Polynomial drift removal |
| FirstDerivative | Enhance peaks, remove baseline | Constant baseline issues |
| SavitzkyGolay | Smoothing + derivatives | Noisy data |
| Gaussian | Smoothing | Noise reduction |
| Haar | Wavelet transform | Multi-resolution analysis |
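
To give a feel for what two of these transforms do, the sketch below implements textbook versions of SNV and a first derivative in NumPy. The function names `snv` and `first_derivative` are made up for this illustration and are not the NIRS4ALL classes; the library's own implementations may differ in details such as edge handling.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra: np.ndarray) -> np.ndarray:
    """Finite-difference derivative along the wavelength axis."""
    return np.gradient(spectra, axis=1)


# Two synthetic spectra with the same band shape but a different gain and offset,
# mimicking a scatter / path length effect.
wavelengths = np.linspace(0.0, 1.0, 50)
base = np.exp(-((wavelengths - 0.5) ** 2) / 0.02)
spectra = np.vstack([base, 2.0 * base + 0.3])

corrected = snv(spectra)            # gain and offset are removed by SNV
slopes = first_derivative(spectra)  # the constant offset vanishes, the gain remains

print(np.allclose(corrected[0], corrected[1]))
print(np.allclose(slopes[1], 2.0 * slopes[0]))
```

SNV makes the two spectra identical because it removes both the offset and the gain per spectrum, whereas the derivative only removes the constant offset. This is one reason the two are often tried in combination.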

Feature Augmentation

Instead of manually defining multiple pipelines, use feature augmentation to explore combinations:

pipeline = [
    MinMaxScaler(),
    {"y_processing": MinMaxScaler()},

    # Generate 3 preprocessing combinations from 5 options
    {
        "feature_augmentation": {
            "_or_": [Detrend, FirstDerivative, Gaussian, SavitzkyGolay, Haar],
            "pick": 2,      # Pick 2 at a time
            "count": 3      # Generate 3 combinations
        }
    },

    ShuffleSplit(n_splits=3, test_size=0.25),
    {"model": PLSRegression(n_components=10)}
]
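
The arithmetic behind "pick 2, count 3" can be sketched with the standard library. This only illustrates the combinatorics: 5 options taken 2 at a time give 10 unordered pairs, of which 3 are kept. How NIRS4ALL actually chooses the 3 (randomly or not, ordered or unordered) is not stated here, so the random sampling below is an assumption.

```python
import random
from itertools import combinations

# Names used as plain strings here; in the pipeline they are transform classes
options = ["Detrend", "FirstDerivative", "Gaussian", "SavitzkyGolay", "Haar"]

# "pick": 2 -> every unordered pair of options
all_pairs = list(combinations(options, 2))

# "count": 3 -> keep 3 of those pairs (random choice is an assumption of this sketch)
rng = random.Random(0)
selected = rng.sample(all_pairs, k=3)

print(len(all_pairs), "possible pairs")
for pair in selected:
    print(pair)
```

Capping the count keeps the number of trained pipelines manageable: without it, every one of the 10 pairs would be evaluated on each cross-validation split.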

Visualization with PredictionAnalyzer

from nirs4all.visualization.predictions import PredictionAnalyzer

analyzer = PredictionAnalyzer(result.predictions)

# Compare top K models
analyzer.plot_top_k(k=3, rank_metric='rmse')

# Heatmap: models vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")

# Performance distribution
analyzer.plot_candlestick(variable="model_name")

U03: Basic Classification

A classification pipeline with Random Forest and XGBoost, plus confusion matrix visualization.


What You’ll Learn

  • Setting up a classification pipeline

  • Using multiple classifiers (Random Forest, XGBoost)

  • Confusion matrix visualization

  • Classification metrics (accuracy, balanced recall)

Classification Pipeline Structure

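from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import StandardScaler

# FirstDerivative, StandardNormalVariate, Haar and MultiplicativeScatterCorrection
# are NIRS4ALL transforms; import them as in the example's source file.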

pipeline = [
    # Feature augmentation with preprocessing options
    {"feature_augmentation": [
        FirstDerivative,
        StandardNormalVariate,
        Haar,
        MultiplicativeScatterCorrection
    ]},

    StandardScaler(),
    ShuffleSplit(n_splits=3, test_size=0.25),

    # Classifier
    {"model": RandomForestClassifier(n_estimators=50, max_depth=8)}
]

Classification Metrics

| Metric | Description | Use Case |
|--------|-------------|----------|
| accuracy | Overall correct predictions | Balanced classes |
| balanced_recall | Average recall per class | Imbalanced classes |
| balanced_accuracy | Average accuracy per class | Class imbalance |
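
A small worked example shows why plain accuracy can mislead on imbalanced data. The two helper functions are written for this illustration and follow the descriptions in the table; they are not NIRS4ALL's metric code, and the labels are invented.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_recall(y_true, y_pred):
    """Recall computed per class, then averaged with equal weight per class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Invented labels: 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
bal = balanced_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}  balanced_recall={bal:.2f}")
```

The always-majority classifier scores 0.80 on accuracy but only 0.50 on balanced recall, because the minority class is never recognized.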

Confusion Matrix Visualization

# Plot confusion matrices for top 4 classifiers
analyzer.plot_confusion_matrix(
    k=4,
    rank_metric='accuracy',
    rank_partition='val',
    display_partition='test'
)

U04: Visualization

A comprehensive tour of all visualization options in NIRS4ALL.


What You’ll Learn

  • All PredictionAnalyzer methods and options

  • Heatmaps, candlestick charts, histograms

  • Top-k comparison plots

  • Ranking vs display partition configuration

Available Visualizations

Top-K Comparison

# Basic top-k plot
analyzer.plot_top_k(k=3, rank_metric='rmse')

# Rank by R² on the test partition
analyzer.plot_top_k(k=3, rank_metric='r2', rank_partition='test')

Heatmaps

Create 2D comparisons between any two variables:

# Model vs preprocessing
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")

# Model vs dataset
analyzer.plot_heatmap(x_var="model_name", y_var="dataset_name", display_metric="r2")

# Model vs fold with counts
analyzer.plot_heatmap(x_var="model_name", y_var="fold_id", show_counts=True)

Candlestick Charts

Show performance distribution per category:

analyzer.plot_candlestick(variable="model_name", display_metric='rmse')
analyzer.plot_candlestick(variable="dataset_name", display_metric='r2')

Histograms

analyzer.plot_histogram(display_metric='rmse')
analyzer.plot_histogram(display_metric='r2')

Ranking vs Display: A Key Concept

You can separate ranking from display:

| Parameter | Purpose |
|-----------|---------|
| rank_metric + rank_partition | Determines which models are “best” |
| display_metric + display_partition | What values to show |

# Rank by validation RMSE, but display test R²
analyzer.plot_heatmap(
    x_var="model_name",
    y_var="preprocessings",
    rank_metric='rmse',
    rank_partition='val',
    display_metric='r2',
    display_partition='test'
)
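
The idea of separating ranking from display can be sketched without the library. The record layout and all numbers below are invented for illustration; they do not reflect how NIRS4ALL stores predictions.

```python
# Hypothetical per-model scores (invented values)
records = [
    {"model": "PLS-5", "val_rmse": 0.42, "test_r2": 0.81},
    {"model": "PLS-10", "val_rmse": 0.35, "test_r2": 0.84},
    {"model": "PLS-15", "val_rmse": 0.38, "test_r2": 0.88},
    {"model": "PLS-20", "val_rmse": 0.51, "test_r2": 0.79},
]

# Ranking: choose the top 2 by validation RMSE (lower is better)
top = sorted(records, key=lambda r: r["val_rmse"])[:2]

# Display: report test R² for the models selected above
shown = [(r["model"], r["test_r2"]) for r in top]
print(shown)
```

In this invented data PLS-15 has the highest test R², yet PLS-10 is ranked first because selection only looks at the validation partition. Keeping the test partition out of the ranking is what makes the displayed test score an honest estimate.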

Aggregation Options

| Option | Description |
|--------|-------------|
| 'best' | Show best score for each cell |
| 'mean' | Show mean score |
| 'median' | Show median score |
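
As an illustration of how the three options differ, the sketch below aggregates the scores that fall into one heatmap cell. The values are invented, and treating "best" as the minimum is an assumption that fits error metrics such as RMSE; for a metric like R², "best" would be the maximum.

```python
from statistics import mean, median

# Invented RMSE values for one cell, e.g. one model/preprocessing pair over 3 folds
fold_rmse = [0.30, 0.50, 0.34]

# "best" is taken as the minimum because lower RMSE is better (assumption of this sketch)
aggregators = {"best": min, "mean": mean, "median": median}
cell = {name: fn(fold_rmse) for name, fn in aggregators.items()}

for name, value in cell.items():
    print(f"{name:>6}: {value:.2f}")
```

The mean is pulled up by the one weak fold, while the median and the best score are not, which is worth keeping in mind when comparing heatmaps built with different options.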


Running These Examples

cd examples

# Run all getting started examples
./run.sh -n "U0*.py" -c user

# Run a specific example
python user/01_getting_started/U01_hello_world.py

# Enable plots
python user/01_getting_started/U02_basic_regression.py --plots --show

Next Steps

After completing these examples:

  • Data Handling: Learn different input formats and multi-dataset analysis

  • Preprocessing: Deep dive into NIRS-specific transformations

  • Models: Compare multiple models and tune hyperparameters