Developer Examples

This section contains advanced examples for users who want to extend NIRS4ALL’s capabilities or use its advanced features.

Overview

Developer examples are organized into seven sections, progressing from advanced pipeline patterns to internal customization:

Section             Topics                        Difficulty
Advanced Pipelines  Branching, merging, stacking  ★★★☆☆
Generators          Dynamic pipeline generation   ★★★☆☆
Synthetic Data      Custom data generation        ★★★☆☆
Deep Learning       PyTorch, JAX, TensorFlow      ★★★★☆
Transfer Learning   Instrument adaptation         ★★★★☆
Advanced Features   Metadata, transforms          ★★★★☆
Internals           Custom controllers, sessions  ★★★★★


Advanced Pipelines

Pipeline branching and merging enable sophisticated model comparison, ensemble methods, and multi-source data handling.

D01: Branching Basics

Introduction to pipeline branching for parallel experiments.

📄 View source code

Pipeline branching enables running multiple parallel sub-pipelines (“branches”), each with its own preprocessing context while sharing common upstream state.

Key Concepts

# List syntax: Simple parallel branches
{"branch": [
    [SNV()],              # Branch 0
    [MSC()],              # Branch 1
    [FirstDerivative()],  # Branch 2
]}

# Dict syntax: Named branches
{"branch": {
    "snv": [SNV()],
    "msc": [MSC()],
    "derivative": [FirstDerivative()],
}}

# Generator syntax: Dynamic branches
{"branch": {"_or_": [SNV(), MSC(), FirstDerivative()]}}

What Branches Share

  • ✓ Data loading (no redundant I/O)

  • ✓ Train/test splits

  • ✓ Upstream preprocessing

What’s Independent

  • ✗ Branch-specific preprocessing

  • ✗ Y processing per branch

  • ✗ Models trained in-branch
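The sharing rules above can be illustrated without NIRS4ALL. The following is a minimal sketch in plain Python, not the library's implementation: `run_branches`, `snv`, `first_difference` and `halve` are hypothetical helpers invented for this illustration. It shows upstream work being done once and each branch then transforming its own copy of the result.

```python
from statistics import mean, pstdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum."""
    mu, sigma = mean(spectrum), pstdev(spectrum)
    return [(v - mu) / sigma for v in spectrum]

def first_difference(spectrum):
    """Crude first derivative: difference between neighbouring points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def run_branches(X, upstream, branches):
    """Apply `upstream` once, then hand each branch its own copy of the result."""
    shared = [upstream(row) for row in X]          # computed a single time
    return {
        name: [step(list(row)) for row in shared]  # copy, so branches stay independent
        for name, step in branches.items()
    }

X = [[1.0, 2.0, 4.0], [2.0, 2.5, 3.5]]

def halve(row):
    """Stand-in for any upstream preprocessing step (hypothetical)."""
    return [v / 2 for v in row]

out = run_branches(X, halve, {"snv": snv, "derivative": first_difference})
print(out["derivative"])  # [[0.5, 1.0], [0.25, 0.5]]
```

Each key of `out` plays the role of one named branch: both start from the same halved spectra, but neither sees the other's transformation.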

D02: Branching Advanced

Statistical comparison and HTML reports.

📄 View source code

Branch Comparison

analyzer = PredictionAnalyzer(result.predictions)

# Statistical summary
summary = analyzer.branch_summary(metrics=['rmse', 'r2'])

# Visualizations
analyzer.plot_branch_comparison(display_metric='rmse', show_ci=True)
analyzer.plot_branch_boxplot(display_metric='rmse')
analyzer.plot_branch_heatmap(y_var='fold_id', display_metric='rmse')
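To make the idea of a per-branch summary concrete, here is a minimal sketch of the kind of aggregation such a summary performs: one RMSE per fold, then a mean and spread per branch. The record layout and field names below are assumptions made for this illustration and are not taken from `PredictionAnalyzer`.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

# Hypothetical prediction records: one entry per (branch, fold).
records = [
    {"branch": "snv", "fold_id": 0, "y_true": [1.0, 2.0], "y_pred": [1.0, 2.0]},
    {"branch": "snv", "fold_id": 1, "y_true": [3.0, 4.0], "y_pred": [3.0, 6.0]},
    {"branch": "msc", "fold_id": 0, "y_true": [1.0, 2.0], "y_pred": [2.0, 3.0]},
    {"branch": "msc", "fold_id": 1, "y_true": [3.0, 4.0], "y_pred": [4.0, 5.0]},
]

# Group the per-fold scores by branch.
per_branch = defaultdict(list)
for rec in records:
    per_branch[rec["branch"]].append(rmse(rec["y_true"], rec["y_pred"]))

# Summarise each branch by the mean and sample standard deviation across folds.
summary = {
    branch: {"rmse_mean": mean(scores), "rmse_std": stdev(scores), "n_folds": len(scores)}
    for branch, scores in per_branch.items()
}

for branch, stats in summary.items():
    print(branch, stats)
```

Comparing the spread across folds, not just the mean, is what makes a confidence interval or boxplot per branch meaningful.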

D03: Merge Basics

Stacking and ensemble methods through prediction merging.

📄 View source code

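from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit

pipeline = [
    ShuffleSplit(n_splits=5),

    # Base models, one per named branch
    {"branch": {
        "pls": [PLSRegression(n_components=10)],
        "rf": [RandomForestRegressor()],
        "ridge": [Ridge(alpha=1.0)],
    }},

    # Merge out-of-fold (OOF) predictions into features for stacking
    {"merge": "predictions"},

    # Meta-learner trained on the merged predictions
    {"model": Ridge(alpha=0.1)}
]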
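The reason stacking uses out-of-fold predictions is that each base model's prediction for a sample must come from a model that never saw that sample. The sketch below shows the idea in plain Python. It is an illustration under simplifying assumptions, not NIRS4ALL's merge logic: it uses a simple interleaved K-fold partition instead of `ShuffleSplit`, two toy base learners, and hypothetical helper names (`k_folds`, `fit_mean`, `fit_line`, `oof_predictions`).

```python
from statistics import mean

def k_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    for k in range(n_splits):
        train = [i for i in range(n_samples) if i % n_splits != k]
        test = [i for i in range(n_samples) if i % n_splits == k]
        yield train, test

def fit_mean(x, y):
    """Baseline learner: always predict the training mean."""
    mu = mean(y)
    return lambda xi: mu

def fit_line(x, y):
    """Univariate least-squares line."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return lambda xi: my + slope * (xi - mx)

def oof_predictions(x, y, fitters, n_splits=3):
    """Build one out-of-fold prediction column per base model."""
    oof = {name: [None] * len(x) for name in fitters}
    for train, test in k_folds(len(x), n_splits):
        for name, fit in fitters.items():
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in test:
                oof[name][i] = model(x[i])  # this model never saw sample i
    return oof

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]  # y = 2x + 1
oof = oof_predictions(x, y, {"mean": fit_mean, "line": fit_line})

# Each row of meta_X is the feature vector a meta-learner would be trained on.
meta_X = list(zip(oof["mean"], oof["line"]))
print(meta_X[0])
```

A meta-learner fitted on `meta_X` learns how much to trust each base model without being misled by predictions the base models made on their own training data.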

D04: Merge Sources

Combine multi-source data with flexible merging.

📄 View source code

pipeline = [
    # Per-source preprocessing
    {"source_branch": {
        "NIR": [SNV(), FirstDerivative()],
        "markers": [StandardScaler()],
    }},

    # Merge strategies
    {"merge_sources": "concat"},  # Horizontal concatenation
    # or: "stack" for 3D stacking

    PLSRegression(n_components=10)
]
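The difference between the two merge strategies is easiest to see on shapes. The following is a minimal sketch with nested lists, assuming that "concat" joins features side by side and that "stack" places the sources along a new axis. The function names and the `(samples, sources, features)` layout are assumptions for this illustration, not the library's documented behaviour.

```python
def merge_concat(sources):
    """Horizontal concatenation: (n, p1) and (n, p2) become (n, p1 + p2)."""
    return [sum((list(row) for row in rows), []) for rows in zip(*sources.values())]

def merge_stack(sources):
    """3D stacking: k sources of shape (n, p) become (n, k, p). Needs equal p."""
    widths = {len(X[0]) for X in sources.values()}
    if len(widths) != 1:
        raise ValueError("stacking requires every source to have the same number of features")
    return [[list(row) for row in rows] for rows in zip(*sources.values())]

sources = {
    "NIR": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],   # 2 samples, 3 features
    "markers": [[1.0, 2.0], [3.0, 4.0]],         # 2 samples, 2 features
}

merged = merge_concat(sources)
print(merged)  # [[0.1, 0.2, 0.3, 1.0, 2.0], [0.4, 0.5, 0.6, 3.0, 4.0]]

same_width = {"a": [[1, 2], [3, 4]], "b": [[5, 6], [7, 8]]}
stacked = merge_stack(same_width)
print(stacked)  # [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
```

In this sketch concatenation works for sources of different widths, while stacking only makes sense when every source has the same number of features.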

D05: Meta-Stacking

Multi-level stacking ensembles.

📄 View source code


Generators

Generators enable dynamic pipeline generation for automated hyperparameter search and experiment design.

D01: Generator Syntax

Dynamic pipeline generation with _or_, _range_, _grid_.

📄 View source code

Generator Keywords

Keyword      Purpose             Example
_or_         Alternatives        {"_or_": [A, B, C]} → 3 variants
_range_      Numeric sweep       {"_range_": [5, 20, 5]} → [5, 10, 15, 20]
_log_range_  Log sweep           {"_log_range_": [0.001, 1, 4]} → [0.001, 0.01, 0.1, 1.0]
_grid_       Cartesian product   All combinations
_zip_        Parallel iteration  Paired values
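The expansions in the table can be reproduced with a few lines of standard-library Python. This is a minimal sketch of the semantics as read from the examples above (inclusive endpoints, log-spaced values), not NIRS4ALL's generator code. The helper names are hypothetical, and `expand_range` handles integers only.

```python
from itertools import product
from math import log10

def expand_range(start, stop, step):
    """Inclusive integer sweep, as in the table: [5, 20, 5] -> [5, 10, 15, 20]."""
    return list(range(start, stop + 1, step))

def expand_log_range(start, stop, num):
    """`num` values spaced evenly on a log10 scale, endpoints included."""
    lo, hi = log10(start), log10(stop)
    return [round(10 ** (lo + i * (hi - lo) / (num - 1)), 12) for i in range(num)]

def expand_grid(params):
    """Cartesian product: every value of every parameter is crossed."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in product(*params.values())]

def expand_zip(params):
    """Parallel iteration: values are paired position by position."""
    keys = list(params)
    return [dict(zip(keys, combo)) for combo in zip(*params.values())]

print(expand_range(5, 20, 5))                      # [5, 10, 15, 20]
print(expand_log_range(0.001, 1, 4))
print(len(expand_grid({"n": [5, 10], "alpha": [0.1, 1.0, 10.0]})))  # 6
print(expand_zip({"n": [5, 10], "alpha": [0.1, 1.0]}))
```

The practical difference between the last two is size: a grid over two parameters with 2 and 3 values yields 6 variants, whereas zipping two lists of length 2 yields only 2.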

Combination Controls

# pick: Select k items (combinations)
{"_or_": [A, B, C, D], "pick": 2}
# → [A,B], [A,C], [A,D], [B,C], [B,D], [C,D]

# arrange: Permutations (order matters)
{"_or_": [A, B, C], "arrange": 2}
# → [A,B], [A,C], [B,A], [B,C], [C,A], [C,B]

# count: Limit variants
{"_or_": [A, B, C, D, E], "count": 3}
# → 3 randomly selected
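The three controls map directly onto standard combinatorics, which the sketch below makes explicit with `itertools`. It is an illustration of the semantics shown in the comments above, not the library's implementation. `expand_or` and its `seed` parameter are hypothetical names introduced here.

```python
import random
from itertools import combinations, permutations

def expand_or(choices, pick=None, arrange=None, count=None, seed=None):
    """Expand a list of alternatives the way the comments above describe."""
    if pick is not None:
        # Unordered selections of size `pick`
        variants = [list(c) for c in combinations(choices, pick)]
    elif arrange is not None:
        # Ordered selections of size `arrange`
        variants = [list(p) for p in permutations(choices, arrange)]
    else:
        variants = list(choices)
    if count is not None and count < len(variants):
        # Random subset; a seed keeps the selection reproducible
        variants = random.Random(seed).sample(variants, count)
    return variants

print(expand_or(["A", "B", "C", "D"], pick=2))
# [['A', 'B'], ['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D']]

print(expand_or(["A", "B", "C"], arrange=2))
# [['A', 'B'], ['A', 'C'], ['B', 'A'], ['B', 'C'], ['C', 'A'], ['C', 'B']]

print(len(expand_or(["A", "B", "C", "D", "E"], count=3, seed=0)))  # 3
```

Picking 2 of 4 gives 6 variants and arranging 2 of 3 also gives 6, but the counts diverge quickly as the list grows, so `count` is a useful cap on the number of pipelines actually run.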

D02: Generator Advanced

Constraints, presets, and patterns.

📄 View source code

D03: Generator Iterators

Iterate over generated configurations.

📄 View source code

D04: Nested Generators

Complex nested generation patterns.

📄 View source code


Synthetic Data

The synthetic data generator creates realistic NIRS spectra for testing, validation, and development. These examples show its advanced customization options.

D05: Custom Components

Create custom spectral components for synthetic data.

📄 View source code

Learn how to define your own chemical components with specific absorption profiles.

D06: Testing Integration

Generate data for testing and benchmarking.

📄 View source code

Create reproducible datasets for unit tests, benchmark different configurations, and compare real vs synthetic data.

D07: Wavenumber & Procedural

Wavenumber utilities and procedural component generation (Phase 1).

📄 View source code

Advanced wavenumber-to-wavelength conversions, overtone calculations, and procedural spectral band generation.
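The underlying arithmetic is short enough to show directly. The sketch below uses the standard relation between wavenumber (cm⁻¹) and wavelength (nm) and the textbook anharmonic-oscillator expression for overtone positions. The function names are hypothetical and are not the utilities shipped with the example.

```python
def wavenumber_to_wavelength(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm1

def wavelength_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm

def transition_wavenumber(harmonic_cm1, v, anharmonicity=0.0):
    """Wavenumber of the 0 -> v vibrational transition.

    v=1 is the fundamental, v=2 the first overtone, v=3 the second overtone.
    With anharmonicity=0 this is the harmonic approximation (v * harmonic);
    a positive anharmonicity constant shifts overtones to lower wavenumbers.
    """
    return v * harmonic_cm1 * (1.0 - (v + 1) * anharmonicity)

# Illustrative values only: a stretch with a harmonic wavenumber of 3000 cm^-1.
first_overtone = transition_wavenumber(3000.0, v=2)
print(first_overtone)                            # 6000.0
print(wavenumber_to_wavelength(first_overtone))  # about 1666.7 nm

shifted = transition_wavenumber(3000.0, v=2, anharmonicity=0.02)
print(wavenumber_to_wavelength(shifted))         # longer wavelength than the harmonic estimate
```

This is why overtones of mid-infrared fundamentals land in the near-infrared range, and why measured overtone bands sit at slightly longer wavelengths than a purely harmonic estimate predicts.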

D08: Application Domains

Domain-specific synthetic data (Phase 1).

📄 View source code

Generate spectra tailored to specific applications: agriculture, food, pharmaceutical, petrochemical, and more.

D09: Instrument Simulation

Simulate instrument-specific characteristics (Phase 2).

📄 View source code

Model detector types, multi-sensor stitching, multi-scan averaging, and measurement mode effects.

Note

For advanced synthetic data features (environmental effects, validation, real data fitting), see the Reference Examples:

  • R05: Environmental and Matrix Effects (Phase 3)

  • R06: Validation and Quality Assessment (Phase 4)

  • R07: Fitting to Real Data (Phase 4)


Deep Learning

NIRS4ALL integrates with PyTorch, JAX, and TensorFlow for deep learning workflows.

D01: PyTorch Models

Integrate PyTorch neural networks.

📄 View source code

from nirs4all.operators.models.pytorch.nicon import nicon

pipeline = [
    MinMaxScaler(),
    SNV(),
    ShuffleSplit(n_splits=3),

    {"model": nicon(input_dim=2151, output_dim=1),
     "train_params": {
         "epochs": 100,
         "batch_size": 32,
         "learning_rate": 0.001,
         "device": "auto"  # Uses GPU if available
     }}
]

Built-in Architectures

Model        Description
nicon        Convolutional network for spectra
decon        Deconvolution architecture
transformer  Attention-based model

Custom PyTorch Models

import torch.nn as nn
from nirs4all.operators.models import framework

@framework("pytorch")
class MyModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, output_dim)
        )

    def forward(self, x):
        return self.layers(x)

D02: JAX Models

JAX/Flax integration.

📄 View source code

D03: TensorFlow Models

TensorFlow/Keras integration.

📄 View source code

D04: Framework Comparison

Compare PyTorch, JAX, and TensorFlow.

📄 View source code


Transfer Learning

Adapt trained models to new instruments or conditions.

D01: Transfer Analysis

Analyze instrument transfer challenges.

📄 View source code

D02: Retrain Modes

Strategies for model adaptation.

📄 View source code

# Direct transfer: Apply model without adaptation
predictions = predictor.predict(model_id, new_instrument_data)

# Adapt the model by retraining on data from the new instrument
predictor.retrain(
    model_id,
    new_data,
    mode="finetune",      # or "head_only", "full"
    epochs=10
)

D03: PCA Geometry

Analyze spectral space differences.

📄 View source code


Advanced Features

Advanced data handling and transformation features.

D01: Metadata Branching

Branch based on sample metadata.

📄 View source code

D02: Concat Transform

Concatenation transforms for multi-source data.

📄 View source code

D03: Repetition Transform

Repetition-based transforms.

📄 View source code

Creating Custom Transforms

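from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cross_decomposition import PLSRegression

class MyTransform(TransformerMixin, BaseEstimator):
    """Multiply every feature by a constant factor."""

    def __init__(self, param=1.0):
        self.param = param

    def fit(self, X, y=None):
        # Stateless transform: nothing to learn from the data
        return self

    def transform(self, X):
        return X * self.param

# Use in pipeline
pipeline = [MyTransform(param=2.0), PLSRegression()]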

Internals

Extend NIRS4ALL at the deepest level with custom controllers and session management.

D01: Session Workflow

Understanding execution flow.

📄 View source code

# Pipeline execution flow:
# 1. PipelineRunner.run() creates PipelineOrchestrator
# 2. Pipeline expands generators
# 3. For each variant:
#    a. Execute preprocessing steps
#    b. Execute splitter (CV)
#    c. For each fold:
#       - Execute model training
#       - Collect predictions
# 4. Aggregate results
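The nested structure of that flow can be mimicked in a few lines. The following is a toy sketch in plain Python, with a one-parameter preprocessing step and a one-coefficient model standing in for real operators. None of the names below (`expand_variants`, `split_folds`, `run`) come from NIRS4ALL; they only mirror the numbered steps above.

```python
from itertools import product
from statistics import mean

def expand_variants(search_space):
    """Step 2: turn {'param': [values]} into one dict per combination."""
    keys = list(search_space)
    return [dict(zip(keys, combo)) for combo in product(*search_space.values())]

def split_folds(n_samples, n_splits):
    """Step 3b: interleaved K-fold indices."""
    for k in range(n_splits):
        train = [i for i in range(n_samples) if i % n_splits != k]
        test = [i for i in range(n_samples) if i % n_splits == k]
        yield train, test

def run(x, y, search_space, n_splits=2):
    results = []
    for variant in expand_variants(search_space):              # step 3
        xp = [v - variant["offset"] for v in x]                # 3a: preprocessing
        abs_errors = []
        for train, test in split_folds(len(xp), n_splits):     # 3b and 3c
            # Training: least-squares slope of a line through the origin
            slope = sum(xp[i] * y[i] for i in train) / sum(xp[i] ** 2 for i in train)
            # Collect absolute errors of the held-out predictions
            abs_errors += [abs(slope * xp[i] - y[i]) for i in test]
        results.append({"variant": variant, "mae": mean(abs_errors)})  # step 4
    return results

x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 2.0, 4.0, 6.0]  # y = 2 * (x - 1)
results = run(x, y, {"offset": [0.0, 1.0]})
best = min(results, key=lambda r: r["mae"])
print(best["variant"])  # {'offset': 1.0}
```

The outer loop runs once per generated variant and the inner loop once per fold, so total training cost scales with the product of the two.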

D02: Custom Controllers

Extend NIRS4ALL with custom step handlers.

📄 View source code

from nirs4all.controllers import register_controller, OperatorController

@register_controller
class MyController(OperatorController):
    priority = 50  # Lower = higher priority

    @classmethod
    def matches(cls, step, operator, keyword) -> bool:
        return keyword == "my_custom_step"

    @classmethod
    def use_multi_source(cls) -> bool:
        return False

    @classmethod
    def supports_prediction_mode(cls) -> bool:
        return True  # Run during prediction

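    def execute(self, step_info, dataset, context, runtime_context, **kwargs):
        # Custom logic goes here. `output` is a placeholder for whatever this
        # step produces; it is assumed to be empty in this skeleton.
        output = []
        return context, output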

Running Developer Examples

cd examples

# Run all developer examples
./run.sh -c developer

# Run examples whose filename matches a pattern
./run.sh -n "D01*.py" -c developer

# Run only generator examples (D01-D04)
./run.sh -n "D0[1-4]*.py" -c developer

# Run synthetic data examples (D05-D09)
./run.sh -n "D0[5-9]*.py" -c developer

# Skip deep learning (faster)
./run.sh -c developer -q

Prerequisites

Developer examples assume familiarity with:

  • All user examples

  • Advanced Python concepts (decorators, metaclasses)

  • Machine learning theory

Next Steps