Migration Guide

This guide helps you migrate from older versions of nirs4all to the current version. It covers API changes, prediction format updates, and dataset configuration migrations.

Table of Contents

API Migration (v0.5 → v0.6+)
Dataset Configuration Migration
Prediction Format Migration
Troubleshooting

API Migration (v0.5 → v0.6+)

nirs4all v0.6 introduces a simplified module-level API that reduces boilerplate while maintaining full functionality. The classic API remains fully supported for backward compatibility.

What Changed

Aspect	Classic API	New API (v0.6+)
Entry point	`PipelineRunner.run()`	`nirs4all.run()`
Configuration	Explicit config objects	Inline parameters
Result access	`predictions.top(n=1)[0]`	`result.best`
Sessions	N/A	`nirs4all.session()`
sklearn integration	Manual	`NIRSPipeline` wrapper

Quick Comparison

Classic API (Still Supported)

from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression

# Create configuration objects
pipeline_config = PipelineConfigs(
    [MinMaxScaler(), PLSRegression(n_components=10)],
    name="MyPipeline"
)
dataset_config = DatasetConfigs("sample_data/regression")

# Create runner and execute
runner = PipelineRunner(
    verbose=1,
    save_artifacts=True,
    save_charts=False
)
predictions, per_dataset = runner.run(pipeline_config, dataset_config)

# Access results
best = predictions.top(n=1)[0]
print(f"Best RMSE: {best.get('rmse', 'N/A')}")

New Module-Level API (Recommended)

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression

# Direct execution with inline configuration
result = nirs4all.run(
    pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
    dataset="sample_data/regression",
    name="MyPipeline",
    verbose=1,
    save_artifacts=True,
    save_charts=False
)

# Convenient result access
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")

Migration Steps

1. Basic Training

Before:

from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs

runner = PipelineRunner(verbose=1, save_artifacts=True)
predictions, _ = runner.run(
    PipelineConfigs(pipeline, "name"),
    DatasetConfigs("path/to/data")
)
best = predictions.top(n=1)[0]

After:

import nirs4all

result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/data",
    name="name",
    verbose=1,
    save_artifacts=True
)
best = result.best

2. Accessing Results

Before:

top_5 = predictions.top(n=5)
best = predictions.top(n=1)[0]
rmse = best.get('rmse', float('nan'))
r2 = best.get('r2', float('nan'))
pls_preds = predictions.filter_predictions(model_name='PLSRegression')

After:

top_5 = result.top(n=5)
rmse = result.best_rmse
r2 = result.best_r2
pls_preds = result.filter(model_name='PLSRegression')
print(result.num_predictions)
print(result.get_models())

3. Prediction

Before:

runner = PipelineRunner(verbose=0)
y_pred, metadata = runner.predict(source=best_prediction, dataset=new_data)

After:

predict_result = nirs4all.predict(
    source=result.best,
    dataset=new_data,
    verbose=0
)
y_pred = predict_result.values
df = predict_result.to_dataframe()

4. Model Export

Before:

runner = PipelineRunner(save_artifacts=True)
predictions, _ = runner.run(pipeline_config, dataset_config)
best = predictions.top(n=1)[0]
runner.export(source=best, output_path="exports/model.n4a")

After:

result = nirs4all.run(pipeline, dataset, save_artifacts=True)
result.export("exports/model.n4a")

New Features in v0.6+

Sessions for Multiple Runs

with nirs4all.session(verbose=1, save_artifacts=True) as s:
    result1 = nirs4all.run(pipeline1, data, name="PLS", session=s)
    result2 = nirs4all.run(pipeline2, data, name="RF", session=s)
    result3 = nirs4all.run(pipeline3, data, name="SVM", session=s)

sklearn Integration

from nirs4all.sklearn import NIRSPipeline

result = nirs4all.run(pipeline, dataset, save_artifacts=True)
pipe = NIRSPipeline.from_result(result)
y_pred = pipe.predict(X_test)
score = pipe.score(X_test, y_test)

Migration Checklist

Replace PipelineRunner(...) with nirs4all.run(...)
Remove explicit PipelineConfigs and DatasetConfigs wrappers
Update result access from predictions.top(n=1)[0] to result.best
Use result.best_rmse, result.best_r2 for quick access
Consider using nirs4all.session() for multiple related runs
Use NIRSPipeline.from_result() for sklearn/SHAP integration
Update exports from runner.export(source=best, ...) to result.export(...)

Dataset Configuration Migration

The new configuration system provides:

Multiple file formats: CSV, NumPy, Parquet, Excel, MATLAB
Flexible column/row selection: Select data by name, index, or pattern
Multiple partition methods: Static, column-based, percentage, or index-based
Multi-source support: Sensor fusion with multiple feature sources
Feature variations: Pre-computed preprocessing variants
Cross-validation folds: Load pre-defined fold assignments

Note

The legacy format continues to work unchanged. You can migrate gradually.

Quick Comparison

Legacy Format (Still Supported)

train_x: data/Xcal.csv
train_y: data/Ycal.csv
test_x: data/Xval.csv
test_y: data/Yval.csv

global_params:
  delimiter: ";"
  has_header: true
  header_unit: cm-1

task_type: regression

New Sources Format

sources:
  - name: "NIR"
    train_x: data/NIR_train.csv
    test_x: data/NIR_test.csv
    params:
      header_unit: nm

  - name: "MIR"
    train_x: data/MIR_train.csv
    test_x: data/MIR_test.csv
    params:
      header_unit: cm-1

targets:
  path: data/targets.csv

task_type: regression

New Variations Format

variations:
  - name: raw
    train_x: data/X_raw_train.csv
    test_x: data/X_raw_test.csv

  - name: snv
    description: "SNV preprocessed"
    train_x: data/X_snv_train.csv
    test_x: data/X_snv_test.csv

variation_mode: compare
targets:
  path: data/Y.csv
task_type: regression

Converting Configurations

Multi-Source (Legacy → Sources Format)

Before:

train_x:
  - data/sensor1_train.csv
  - data/sensor2_train.csv
test_x:
  - data/sensor1_test.csv
  - data/sensor2_test.csv
train_y: data/Y_train.csv
test_y: data/Y_test.csv

After:

sources:
  - name: "sensor1"
    files:
      - path: data/sensor1_train.csv
        partition: train
      - path: data/sensor1_test.csv
        partition: test

  - name: "sensor2"
    files:
      - path: data/sensor2_train.csv
        partition: train
      - path: data/sensor2_test.csv
        partition: test

targets:
  path: data/Y.csv

Validation Commands

# Validate configuration
nirs4all dataset validate path/to/config.yaml

# Inspect configuration details
nirs4all dataset inspect new_config.yaml --detect

# Compare configurations
nirs4all dataset diff old_config.yaml new_config.yaml

Prediction Format Migration

New Fields in Predictions (v0.9+)

Field	Description
`trace_id`	Unique identifier for the execution trace
`model_artifact_id`	Reference to the saved model artifact
`execution_hash`	Hash of the exact execution path
`step_artifacts`	List of artifact IDs for each pipeline step

Impact

Old predictions without the new fields will:

✅ Continue to work for basic operations
✅ Work with predict() if model folder still exists
⚠️ Not support retrain() with mode=’transfer’ or ‘finetune’
⚠️ Not support bundle export with full artifact chain

Migration Methods

Automatic Migration (Recommended)

from nirs4all.database import PredictionsDB
from nirs4all.migration import migrate_predictions

db = PredictionsDB('runs/')
results = migrate_predictions(db, dry_run=False, verbose=1)
print(f"Migrated: {results['migrated']}")

Migration During Retrain

Old predictions are automatically migrated when used:

runner = PipelineRunner(save_artifacts=True, verbose=0)
new_preds, _ = runner.retrain(
    source=old_prediction,  # Will be migrated automatically
    dataset=new_data,
    mode='full'
)

Checking Migration Status

from nirs4all.database import PredictionsDB

db = PredictionsDB('runs/')
old_format = sum(1 for p in db.all() if 'trace_id' not in p)
new_format = sum(1 for p in db.all() if 'trace_id' in p)

print(f"Old format: {old_format}")
print(f"New format: {new_format}")

Troubleshooting

Common API Migration Issues

Wrong Return Type

# ❌ Wrong - will fail
predictions, per_dataset = nirs4all.run(pipeline, data)

# ✅ Correct
result = nirs4all.run(pipeline, data)
predictions = result.predictions
per_dataset = result.per_dataset

NIRSPipeline is for Prediction Only

# ❌ NIRSPipeline doesn't train
pipe = NIRSPipeline(steps=[MinMaxScaler(), PLSRegression(10)])
pipe.fit(X, y)  # Raises NotImplementedError

# ✅ Train with nirs4all, then wrap
result = nirs4all.run(pipeline, dataset)
pipe = NIRSPipeline.from_result(result)
pipe.predict(X_new)  # Works

Common Dataset Issues

“No data source specified”

Your configuration needs at least one of:

train_x or test_x (legacy)
sources (new multi-source)
variations (new variations)
folder (auto-scan)

“Sample count mismatch across sources”

All sources must have the same number of samples. Check that your data files have consistent row counts.

Common Prediction Format Issues

Missing Model Folder

# Error: Model folder not found
# Old predictions without saved folders cannot be fully migrated
from pathlib import Path
folder = Path(pred['folder'])
if not folder.exists():
    print("Original model folder missing - limited functionality")

Hash Mismatch

ValueError: Content hash mismatch for artifact 0001:3:all

Cause: Artifact file was modified after saving. Solution: Delete the corrupted artifact and re-run training.

Migration Guide

Table of Contents

API Migration (v0.5 → v0.6+)

What Changed

Quick Comparison

Classic API (Still Supported)

New Module-Level API (Recommended)

Migration Steps

1. Basic Training

2. Accessing Results

3. Prediction

4. Model Export

New Features in v0.6+

Sessions for Multiple Runs

sklearn Integration

Migration Checklist

Dataset Configuration Migration

Quick Comparison

Legacy Format (Still Supported)

New Sources Format

New Variations Format

Converting Configurations

Multi-Source (Legacy → Sources Format)

Validation Commands

Prediction Format Migration

New Fields in Predictions (v0.9+)

Impact

Migration Methods

Automatic Migration (Recommended)

Migration During Retrain

Checking Migration Status

Troubleshooting

Common API Migration Issues

Wrong Return Type

NIRSPipeline is for Prediction Only

Common Dataset Issues

“No data source specified”

“Sample count mismatch across sources”

Common Prediction Format Issues

Missing Model Folder

Hash Mismatch

See Also