Getting Started

Welcome to NIRS4ALL! This section will help you get up and running quickly.

Overview

NIRS4ALL is designed to make Near-Infrared Spectroscopy data analysis accessible to everyone. Whether you’re a spectroscopy expert or new to the field, this guide will help you:

Install the library and its dependencies
Run your first pipeline in minutes
Understand the core concepts
Explore what’s possible

📦 Installation

Install NIRS4ALL and verify your setup.

🚀 Quickstart

Your first pipeline in 5 minutes.

💡 Core Concepts

Understand pipelines, datasets, and results.

Installation

Basic Installation

pip install nirs4all

This installs the core library with scikit-learn support. Deep learning frameworks are optional.

With Additional ML Frameworks

# Quote the extras so that shells such as zsh do not expand the square brackets

# With TensorFlow support (CPU)
pip install "nirs4all[tensorflow]"

# With TensorFlow support (GPU)
pip install "nirs4all[gpu]"

# With PyTorch support
pip install "nirs4all[torch]"

# With all ML frameworks
pip install "nirs4all[all]"

Development Installation

For developers who want to contribute:

git clone https://github.com/gbeurier/nirs4all.git
cd nirs4all
pip install -e ".[dev]"

Verify Installation

nirs4all --test-install
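
A similar check can be made from inside Python, for example in a notebook where the command line is not convenient. The sketch below uses only the standard library's importlib.metadata. The helper name installed_version is illustrative and not part of NIRS4ALL, and the distribution name is assumed to be nirs4all, as in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "nirs4all") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("nirs4all is not installed in this environment")
    else:
        print(f"nirs4all {found} is installed")
```

This only confirms that the package metadata is visible to the current interpreter. It does not exercise the optional deep learning backends.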

Quick Start (5 Minutes)

Here’s a complete example to get you started:

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit

# Define and run a pipeline in one step
result = nirs4all.run(
    pipeline=[
        MinMaxScaler(),                            # Scale features to [0, 1]
        ShuffleSplit(n_splits=3),                  # 3 random train/test splits (not k-fold)
        {"model": PLSRegression(n_components=10)}, # PLS model with 10 components
    ],
    dataset="path/to/your/data",                 # Your spectral data
    verbose=1                                     # Show progress
)

# Check the results
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

# Export the best model for deployment
result.export("exports/my_first_model.n4a")
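
A note on the splitter used in this example: in scikit-learn, ShuffleSplit draws independent random train/test splits, whereas KFold partitions the data into disjoint folds. The sketch below uses plain scikit-learn only, with 12 placeholder sample indices standing in for spectra, to make the difference visible. It illustrates the splitters themselves and assumes nothing about how NIRS4ALL consumes them.

```python
from sklearn.model_selection import KFold, ShuffleSplit

samples = list(range(12))  # stand-in for 12 spectra (assumption for illustration)

# ShuffleSplit: 3 independent random splits; test sets may overlap between splits.
shuffle = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
shuffle_tests = [sorted(test.tolist()) for _, test in shuffle.split(samples)]

# KFold: 3 disjoint folds; every sample lands in a test set exactly once.
kfold = KFold(n_splits=3)
kfold_tests = [sorted(test.tolist()) for _, test in kfold.split(samples)]

for name, tests in (("ShuffleSplit", shuffle_tests), ("KFold", kfold_tests)):
    print(name)
    for i, test in enumerate(tests):
        print(f"  split {i}: test indices {test}")
```

When test_size and train_size are both left unset, as in ShuffleSplit(n_splits=3), scikit-learn holds out 10% of the samples in each split. If you need every sample to be validated exactly once, KFold is the more natural choice.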

Core Concepts

1. Pipelines

A pipeline is a sequence of operations applied to your data:

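from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
# SNV (Standard Normal Variate) is a NIRS4ALL operator; its import is not shown here

pipeline = [
    MinMaxScaler(),                              # Preprocessing
    SNV(),                                       # NIRS-specific preprocessing
    KFold(n_splits=5),                           # 5-fold cross-validation
    {"model": PLSRegression(n_components=10)},   # Model training
]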
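
To give a feel for what a NIRS-specific step such as SNV does, here is a minimal standalone sketch of the standard normal variate transform: each spectrum is centered on its own mean and divided by its own standard deviation, which removes per-sample offset and scale effects. The function snv below is written for illustration with the standard library only. It is not the NIRS4ALL operator, and details such as the standard-deviation convention (sample, n - 1, is assumed here) may differ in the library.

```python
from statistics import fmean, stdev
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Center one spectrum on its own mean and scale it by its own standard deviation."""
    mean = fmean(spectrum)
    spread = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if spread == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(value - mean) / spread for value in spectrum]


# Two toy "spectra" with the same shape but a different offset and scale.
spectra = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 30.0, 50.0, 70.0],
]
corrected = [snv(row) for row in spectra]
for row in corrected:
    print([round(value, 3) for value in row])
```

Both rows come out identical after the transform, because they differ only by an offset and a scale factor, which is exactly the kind of variation SNV is meant to remove.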

2. Datasets

NIRS4ALL automatically loads data from various formats:

# From a folder containing CSV files
result = nirs4all.run(pipeline, dataset="data/wheat/")

# From specific files
result = nirs4all.run(pipeline, dataset="data/spectra.csv")

# With explicit configuration
from nirs4all.data import DatasetConfigs
dataset = DatasetConfigs(
    path="data/wheat/",
    target_column="protein",
    wavelength_start=900,
    wavelength_end=2500
)

3. Results

The result object contains everything about your run:

# Access metrics
print(result.best_rmse)
print(result.best_r2)

# Get predictions
predictions = result.predictions

# Export for deployment
result.export("exports/best_model.n4a")

# Use for new predictions (X_new holds the new spectra)
y_pred = result.predict(X_new)
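
For readers new to these metrics, the sketch below shows the usual definitions of RMSE (root mean squared error, in the units of the target) and R² (coefficient of determination), written with the standard library only. The functions rmse and r2 and the toy values are illustrative. How NIRS4ALL aggregates scores across splits to obtain best_rmse and best_r2 is not covered here.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    return sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy reference values and predictions (made up for illustration).
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 16.0]

print(f"RMSE: {rmse(y_true, y_pred):.4f}")
print(f"R²:   {r2(y_true, y_pred):.4f}")
```

Lower RMSE is better, while R² approaches 1 as predictions explain more of the variance in the reference values.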

Next Steps

📖 User Guide

Learn about preprocessing, stacking, and deployment.

📚 Reference

Complete pipeline syntax and operator catalog (Writing a Pipeline in nirs4all).

📝 Examples

50+ working examples organized by topic.

🔧 Developer Guide

Architecture and extending the library.
