Migration Guide
This guide helps you migrate from older versions of nirs4all to the current version. It covers API changes, prediction format updates, and dataset configuration migrations.
API Migration (v0.5 → v0.6+)
nirs4all v0.6 introduces a simplified module-level API that reduces boilerplate while maintaining full functionality. The classic API remains fully supported for backward compatibility.
What Changed
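| Aspect | Classic API | New API (v0.6+) |
|---|---|---|
| Entry point | PipelineRunner(...).run(...) | nirs4all.run(...) |
| Configuration | Explicit config objects | Inline parameters |
| Result access | predictions.top(n=1)[0] | result.best |
| Sessions | N/A | nirs4all.session() |
| sklearn integration | Manual | NIRSPipeline.from_result() |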
Quick Comparison
Classic API (Still Supported)
from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
# Create configuration objects
pipeline_config = PipelineConfigs(
    [MinMaxScaler(), PLSRegression(n_components=10)],
    name="MyPipeline"
)
dataset_config = DatasetConfigs("sample_data/regression")
# Create runner and execute
runner = PipelineRunner(
    verbose=1,
    save_artifacts=True,
    save_charts=False
)
predictions, per_dataset = runner.run(pipeline_config, dataset_config)
# Access results
best = predictions.top(n=1)[0]
print(f"Best RMSE: {best.get('rmse', 'N/A')}")
New Module-Level API (Recommended)
import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
# Direct execution with inline configuration
result = nirs4all.run(
    pipeline=[MinMaxScaler(), PLSRegression(n_components=10)],
    dataset="sample_data/regression",
    name="MyPipeline",
    verbose=1,
    save_artifacts=True,
    save_charts=False
)
# Convenient result access
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")
Migration Steps
1. Basic Training
Before:
from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
runner = PipelineRunner(verbose=1, save_artifacts=True)
predictions, _ = runner.run(
    PipelineConfigs(pipeline, "name"),
    DatasetConfigs("path/to/data")
)
best = predictions.top(n=1)[0]
After:
import nirs4all
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/data",
    name="name",
    verbose=1,
    save_artifacts=True
)
best = result.best
2. Accessing Results
Before:
top_5 = predictions.top(n=5)
best = predictions.top(n=1)[0]
rmse = best.get('rmse', float('nan'))
r2 = best.get('r2', float('nan'))
pls_preds = predictions.filter_predictions(model_name='PLSRegression')
After:
top_5 = result.top(n=5)
rmse = result.best_rmse
r2 = result.best_r2
pls_preds = result.filter(model_name='PLSRegression')
print(result.num_predictions)
print(result.get_models())
3. Prediction
Before:
runner = PipelineRunner(verbose=0)
y_pred, metadata = runner.predict(source=best_prediction, dataset=new_data)
After:
predict_result = nirs4all.predict(
    source=result.best,
    dataset=new_data,
    verbose=0
)
y_pred = predict_result.values
df = predict_result.to_dataframe()
4. Model Export
Before:
runner = PipelineRunner(save_artifacts=True)
predictions, _ = runner.run(pipeline_config, dataset_config)
best = predictions.top(n=1)[0]
runner.export(source=best, output_path="exports/model.n4a")
After:
result = nirs4all.run(pipeline, dataset, save_artifacts=True)
result.export("exports/model.n4a")
New Features in v0.6+
Sessions for Multiple Runs
with nirs4all.session(verbose=1, save_artifacts=True) as s:
    result1 = nirs4all.run(pipeline1, data, name="PLS", session=s)
    result2 = nirs4all.run(pipeline2, data, name="RF", session=s)
    result3 = nirs4all.run(pipeline3, data, name="SVM", session=s)
sklearn Integration
from nirs4all.sklearn import NIRSPipeline
result = nirs4all.run(pipeline, dataset, save_artifacts=True)
pipe = NIRSPipeline.from_result(result)
y_pred = pipe.predict(X_test)
score = pipe.score(X_test, y_test)
Migration Checklist
Replace PipelineRunner(...) with nirs4all.run(...)
Remove explicit PipelineConfigs and DatasetConfigs wrappers
Update result access from predictions.top(n=1)[0] to result.best
Use result.best_rmse, result.best_r2 for quick access
Consider using nirs4all.session() for multiple related runs
Use NIRSPipeline.from_result() for sklearn/SHAP integration
Update exports from runner.export(source=best, ...) to result.export(...)
Dataset Configuration Migration
The new configuration system provides:
Multiple file formats: CSV, NumPy, Parquet, Excel, MATLAB
Flexible column/row selection: Select data by name, index, or pattern
Multiple partition methods: Static, column-based, percentage, or index-based
Multi-source support: Sensor fusion with multiple feature sources
Feature variations: Pre-computed preprocessing variants
Cross-validation folds: Load pre-defined fold assignments
Note
The legacy format continues to work unchanged. You can migrate gradually.
Quick Comparison
Legacy Format (Still Supported)
train_x: data/Xcal.csv
train_y: data/Ycal.csv
test_x: data/Xval.csv
test_y: data/Yval.csv
global_params:
  delimiter: ";"
  has_header: true
  header_unit: cm-1
task_type: regression
New Sources Format
sources:
  - name: "NIR"
    train_x: data/NIR_train.csv
    test_x: data/NIR_test.csv
    params:
      header_unit: nm
  - name: "MIR"
    train_x: data/MIR_train.csv
    test_x: data/MIR_test.csv
    params:
      header_unit: cm-1
targets:
  path: data/targets.csv
task_type: regression
New Variations Format
variations:
  - name: raw
    train_x: data/X_raw_train.csv
    test_x: data/X_raw_test.csv
  - name: snv
    description: "SNV preprocessed"
    train_x: data/X_snv_train.csv
    test_x: data/X_snv_test.csv
variation_mode: compare
targets:
  path: data/Y.csv
task_type: regression
Converting Configurations
Multi-Source (Legacy → Sources Format)
Before:
train_x:
  - data/sensor1_train.csv
  - data/sensor2_train.csv
test_x:
  - data/sensor1_test.csv
  - data/sensor2_test.csv
train_y: data/Y_train.csv
test_y: data/Y_test.csv
After:
sources:
  - name: "sensor1"
    files:
      - path: data/sensor1_train.csv
        partition: train
      - path: data/sensor1_test.csv
        partition: test
  - name: "sensor2"
    files:
      - path: data/sensor2_train.csv
        partition: train
      - path: data/sensor2_test.csv
        partition: test
targets:
  path: data/Y.csv
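For configurations with many sensors, the feature part of this conversion can be scripted. The following is a minimal sketch, not part of nirs4all: `legacy_to_sources` is a hypothetical helper that works on a configuration already loaded into a plain dict, pairs train and test files by position, and assumes the source name can be derived from the train file name by dropping a trailing `_train`. Targets are left for manual conversion. It requires Python 3.9+ for `str.removesuffix`.

```python
from pathlib import Path


def legacy_to_sources(legacy: dict) -> dict:
    """Build a `sources` list from legacy list-style train_x/test_x entries.

    Hypothetical helper for illustration; targets are not converted.
    """
    def as_list(value):
        # A legacy entry may be a single path string or a list of paths.
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    train_files = as_list(legacy.get("train_x"))
    test_files = as_list(legacy.get("test_x"))

    sources = []
    for index, train_path in enumerate(train_files):
        files = [{"path": train_path, "partition": "train"}]
        # Pair train and test files by position, as in the legacy format.
        if index < len(test_files):
            files.append({"path": test_files[index], "partition": "test"})
        # Assumed naming rule: file stem without a trailing "_train".
        name = Path(train_path).stem.removesuffix("_train")
        sources.append({"name": name, "files": files})
    return {"sources": sources}


legacy = {
    "train_x": ["data/sensor1_train.csv", "data/sensor2_train.csv"],
    "test_x": ["data/sensor1_test.csv", "data/sensor2_test.csv"],
}
converted = legacy_to_sources(legacy)
print([source["name"] for source in converted["sources"]])
```

The resulting dict mirrors the sources layout shown above and can be written back to YAML with whichever YAML library you already use.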
Validation Commands
# Validate configuration
nirs4all dataset validate path/to/config.yaml
# Inspect configuration details
nirs4all dataset inspect new_config.yaml --detect
# Compare configurations
nirs4all dataset diff old_config.yaml new_config.yaml
Prediction Format Migration
New Fields in Predictions (v0.9+)
| Field | Description |
|---|---|
| trace_id | Unique identifier for the execution trace |
| | Reference to the saved model artifact |
| | Hash of the exact execution path |
| | List of artifact IDs for each pipeline step |
Impact
Old predictions without the new fields will:
✅ Continue to work for basic operations
✅ Work with predict() if model folder still exists
⚠️ Not support retrain() with mode='transfer' or 'finetune'
⚠️ Not support bundle export with full artifact chain
Migration Methods
Automatic Migration (Recommended)
from nirs4all.database import PredictionsDB
from nirs4all.migration import migrate_predictions
db = PredictionsDB('runs/')
results = migrate_predictions(db, dry_run=False, verbose=1)
print(f"Migrated: {results['migrated']}")
Migration During Retrain
Old predictions are automatically migrated when used:
from nirs4all.pipeline import PipelineRunner
runner = PipelineRunner(save_artifacts=True, verbose=0)
new_preds, _ = runner.retrain(
    source=old_prediction,  # Will be migrated automatically
    dataset=new_data,
    mode='full'
)
Checking Migration Status
from nirs4all.database import PredictionsDB
db = PredictionsDB('runs/')
old_format = sum(1 for p in db.all() if 'trace_id' not in p)
new_format = sum(1 for p in db.all() if 'trace_id' in p)
print(f"Old format: {old_format}")
print(f"New format: {new_format}")
Troubleshooting
Common API Migration Issues
Wrong Return Type
# ❌ Wrong - will fail
predictions, per_dataset = nirs4all.run(pipeline, data)
# ✅ Correct
result = nirs4all.run(pipeline, data)
predictions = result.predictions
per_dataset = result.per_dataset
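If a large codebase still expects the two-value return, a thin adapter can ease the transition. This is a sketch only: `as_legacy_tuple` is a hypothetical helper, and it relies solely on the `predictions` and `per_dataset` attributes shown above. A stand-in object is used here so the snippet runs without nirs4all installed.

```python
from types import SimpleNamespace


def as_legacy_tuple(result):
    """Return (predictions, per_dataset) from a result-like object."""
    return result.predictions, result.per_dataset


# Stand-in for illustration only; a real result comes from nirs4all.run(...).
fake_result = SimpleNamespace(
    predictions=["p1", "p2"],
    per_dataset={"demo": ["p1"]},
)
predictions, per_dataset = as_legacy_tuple(fake_result)
```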
NIRSPipeline is for Prediction Only
# ❌ NIRSPipeline doesn't train
pipe = NIRSPipeline(steps=[MinMaxScaler(), PLSRegression(10)])
pipe.fit(X, y) # Raises NotImplementedError
# ✅ Train with nirs4all, then wrap
result = nirs4all.run(pipeline, dataset)
pipe = NIRSPipeline.from_result(result)
pipe.predict(X_new) # Works
Common Dataset Issues
“No data source specified”
Your configuration needs at least one of:
train_x or test_x (legacy)
sources (new multi-source)
variations (new variations)
folder (auto-scan)
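A quick pre-check on a loaded configuration can show which of these keys is present before you run the validator. The sketch below is an illustration, not the library's own validation logic: `declared_data_sources` is a hypothetical helper that inspects a plain dict and treats empty values as missing.

```python
# Keys that declare a data source, taken from the list above.
DATA_SOURCE_KEYS = ("train_x", "test_x", "sources", "variations", "folder")


def declared_data_sources(config: dict) -> list:
    """Return the data-source keys that are present and non-empty."""
    return [key for key in DATA_SOURCE_KEYS if config.get(key)]


config = {"task_type": "regression", "global_params": {"delimiter": ";"}}
if not declared_data_sources(config):
    print("No data source key found; add train_x, sources, variations, or folder")
```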
“Sample count mismatch across sources”
All sources must have the same number of samples. Check that your data files have consistent row counts.
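Row counts can be compared with the standard library before loading anything. The sketch below assumes delimited text files with one sample per row and an optional header line; `count_samples` and `find_mismatches` are hypothetical helper names, and the `;` default simply mirrors the delimiter used in the legacy example of this guide.

```python
import csv


def count_samples(path, delimiter=";", has_header=True):
    """Count non-empty data rows in a delimited text file."""
    with open(path, newline="") as handle:
        rows = sum(1 for row in csv.reader(handle, delimiter=delimiter) if row)
    # Do not count the header line as a sample.
    return rows - 1 if has_header and rows else rows


def find_mismatches(paths_by_source, **read_options):
    """Return per-source sample counts and whether they disagree."""
    counts = {
        name: count_samples(path, **read_options)
        for name, path in paths_by_source.items()
    }
    return counts, len(set(counts.values())) > 1
```

Calling `find_mismatches({"NIR": "data/NIR_train.csv", "MIR": "data/MIR_train.csv"})` returns the counts per source, which makes it easy to see which file is short.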
Common Prediction Format Issues
Missing Model Folder
# Error: Model folder not found
# Old predictions without saved folders cannot be fully migrated
from pathlib import Path
folder = Path(pred['folder'])
if not folder.exists():
    print("Original model folder missing - limited functionality")
Hash Mismatch
ValueError: Content hash mismatch for artifact 0001:3:all
Cause: Artifact file was modified after saving.
Solution: Delete the corrupted artifact and re-run training.
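The general idea behind a content-hash check can be illustrated in a few lines. This is a sketch of the technique only: this guide does not state which hash algorithm or artifact layout nirs4all uses, so SHA-256 and the helper names `content_hash` and `is_unmodified` are assumptions chosen for illustration.

```python
import hashlib


def content_hash(path, chunk_size=65536):
    """Compute a SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, expected_hash):
    """Return True if the file's current digest matches the recorded one."""
    return content_hash(path) == expected_hash
```

Any byte-level change to the file after the hash was recorded makes `is_unmodified` return False, which is why editing or re-saving an artifact by hand triggers this kind of error.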
See Also
Dataset Configuration Troubleshooting Guide - Common data loading issues
Getting Started - Installation guide
Workspace CLI Commands - CLI command reference