Deployment Examples

This section covers saving, loading, and deploying trained NIRS4ALL models for production use.

Overview 

Example	Topic	Difficulty	Duration
U01	Save, Load, Predict	★★☆☆☆	~4 min
U02	Export Bundle	★★☆☆☆	~3 min
U03	Workspace Management	★★☆☆☆	~3 min
U04	sklearn Integration	★★☆☆☆	~3 min

U01: Save, Load, and Predict 

Save trained models and use them for prediction on new data.

What You’ll Learn 

Automatic model saving with PipelineRunner
Prediction with prediction entries
Prediction with model IDs
Verifying prediction consistency

Training with Model Saving 

Pass save_artifacts=True to PipelineRunner to persist trained models:

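from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
# Import path for the NIRS4ALL transforms is assumed; adjust it to your installed version
from nirs4all.operators.transforms import SNV, FirstDerivative

# Define pipeline: scaling, spectral preprocessing, CV splitter, then the model
pipeline = [
    MinMaxScaler(),
    SNV(),
    FirstDerivative(),
    ShuffleSplit(n_splits=3, random_state=42),
    {"model": PLSRegression(n_components=10)}
]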

# Run with saving enabled
runner = PipelineRunner(save_artifacts=True, verbose=1)
predictions, _ = runner.run(
    PipelineConfigs(pipeline, "MyPipeline"),
    DatasetConfigs("sample_data/regression")
)

# Get best model info
best = predictions.top(n=1)[0]
model_id = best['id']
print(f"Model ID: {model_id}")

Prediction Methods 

Method 1: Using Prediction Entry

# Create predictor
predictor = PipelineRunner()

# New data for prediction
new_data = DatasetConfigs({'X_test': 'path/to/new_data.csv'})

# Predict using the prediction entry directly
new_predictions, _ = predictor.predict(best, new_data)
print(f"Predictions shape: {new_predictions.shape}")

Method 2: Using Model ID

# Predict using just the model ID string
predictor = PipelineRunner()
new_predictions, _ = predictor.predict(model_id, new_data)

Prediction on NumPy Arrays 

import numpy as np

# Create synthetic new data
X_new = np.random.randn(10, 2151)  # Must match training feature count

# Create dataset from array
new_data = DatasetConfigs({'X_test': X_new})
# Use a separate name so the training predictions are not overwritten
new_predictions, _ = predictor.predict(model_id, new_data)

Model Storage Location 

Models are saved in the workspace:

workspace/runs/<run_id>/
├── model.pkl          # Trained model
├── preprocessor.pkl   # Preprocessing pipeline
├── metadata.json      # Configuration info
└── ...
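
To check which runs actually contain a saved model, the layout above can be scanned with the standard library. This is a minimal sketch that assumes only the directory and file names shown in the tree; find_saved_runs is a hypothetical helper, not part of NIRS4ALL, and the demo runs against a throwaway directory rather than a real workspace.

```python
from pathlib import Path
import tempfile


def find_saved_runs(workspace: Path) -> list[str]:
    """Return names of run directories under workspace/runs that hold a model.pkl."""
    runs_dir = workspace / "runs"
    if not runs_dir.is_dir():
        return []
    return sorted(
        run.name
        for run in runs_dir.iterdir()
        if run.is_dir() and (run / "model.pkl").is_file()
    )


# Demo on a temporary directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    complete = workspace / "runs" / "run_a"
    incomplete = workspace / "runs" / "run_b"  # no model.pkl inside
    complete.mkdir(parents=True)
    incomplete.mkdir(parents=True)
    (complete / "model.pkl").write_bytes(b"")  # placeholder file, not a real model
    found = find_saved_runs(workspace)
    print(found)
```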

U02: Export Bundle 

Export portable model bundles for distribution.

What You’ll Learn 

Creating portable .n4a bundles
Bundle contents and structure
Importing bundles
Version compatibility

Creating a Bundle 

from nirs4all.pipeline.bundle import export_bundle, import_bundle

# After training, export the best model
export_bundle(
    prediction_entry=best,
    output_path="my_model.n4a",
    include_metadata=True
)

Bundle Contents 

A .n4a bundle is a compressed archive containing:

File	Description
`model.pkl`	Trained model
`pipeline.pkl`	Full preprocessing pipeline
`metadata.json`	Training info, metrics
`requirements.txt`	Python dependencies
`manifest.json`	Bundle structure info
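
Before importing a bundle received from someone else, it can help to confirm that the files listed above are present. The sketch below assumes the archive is ZIP-compatible, which this page does not state; if the real .n4a format differs, only the opening call would need to change. missing_bundle_files is a hypothetical helper, and the demo builds a toy archive in memory instead of reading a real bundle.

```python
import io
import zipfile

# File names taken from the table above.
EXPECTED_FILES = {
    "model.pkl",
    "pipeline.pkl",
    "metadata.json",
    "requirements.txt",
    "manifest.json",
}


def missing_bundle_files(archive) -> set[str]:
    """Return expected file names absent from a ZIP-compatible archive.

    ASSUMPTION: the bundle can be opened with zipfile.
    """
    with zipfile.ZipFile(archive) as bundle:
        present = set(bundle.namelist())
    return EXPECTED_FILES - present


# Demo: a toy in-memory archive that lacks manifest.json.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as toy:
    for name in sorted(EXPECTED_FILES - {"manifest.json"}):
        toy.writestr(name, "")
buffer.seek(0)

missing = missing_bundle_files(buffer)
print(missing)
```

An empty result means every expected file was found; it says nothing about whether the contents are valid.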

Importing a Bundle 

# Load bundle
predictor = import_bundle("my_model.n4a")

# Use for prediction
predictions = predictor.predict(X_new)

Bundle Portability 

Bundles are designed to be portable:

✓ Self-contained (all preprocessing included)
✓ Version info for compatibility checks
✓ Can be shared via email, cloud storage, etc.
⚠ Requires compatible Python/library versions
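
The version caveat above can be checked mechanically by comparing the bundle's requirements.txt against the target environment. The following is a rough sketch: it handles only exact name==version pins, the sample requirements text and version numbers are invented for illustration, and parse_pins, installed_version, and find_mismatches are hypothetical helpers rather than NIRS4ALL functions.

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Extract exact 'name==version' pins; other specifier styles are ignored."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (required, found)} for every pin that lookup() does not match."""
    mismatches = {}
    for name, required in pins.items():
        found = lookup(name)
        if found != required:
            mismatches[name] = (required, found)
    return mismatches


# Demo with a stubbed environment so the result does not depend on this machine.
sample = "numpy==1.26.4\nscikit-learn==1.4.0  # pinned at training time\n"
fake_env = {"numpy": "1.26.4", "scikit-learn": "1.5.0"}
mismatches = find_mismatches(parse_pins(sample), lookup=fake_env.get)
print(mismatches)
```

Calling find_mismatches(pins) without the lookup argument checks the packages installed in the current interpreter.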

U03: Workspace Management 

Organize and manage your training artifacts.

What You’ll Learn 

Workspace structure
Artifact cleanup
Library management
Configuration

Workspace Structure 

workspace/
├── runs/              # Individual training runs
│   ├── run_20241231_123456/
│   │   ├── model.pkl
│   │   ├── predictions.json
│   │   └── charts/
│   └── ...
├── library/           # Curated model library
├── exports/           # Exported bundles
├── logs/              # Training logs
└── examples_output/   # Example outputs

Workspace Configuration 

from nirs4all.workspace import WorkspaceConfig

# Configure workspace location
config = WorkspaceConfig(
    root="./my_workspace",
    max_runs=100,           # Keep last 100 runs
    auto_cleanup=True       # Remove old runs
)

Artifact Cleanup 

from nirs4all.workspace import cleanup_workspace

# Remove runs older than 30 days
cleanup_workspace(
    max_age_days=30,
    keep_best_n=10  # Always keep top 10 models
)
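
How max_age_days and keep_best_n interact is not spelled out here. One plausible reading, sketched below, is that a run is deleted only if it is older than the cutoff and is not among the best-scoring runs. The run records, the rmse field, and select_runs_to_delete are all hypothetical and serve only to illustrate that policy; they do not describe how cleanup_workspace is implemented.

```python
from datetime import datetime, timedelta


def select_runs_to_delete(runs, now, max_age_days, keep_best_n):
    """Pick run ids that are older than the cutoff and not among the best runs.

    runs: list of dicts with 'id', 'created' (datetime), and 'rmse' (lower is better).
    """
    ranked = sorted(runs, key=lambda run: run["rmse"])
    protected = {run["id"] for run in ranked[:keep_best_n]}
    cutoff = now - timedelta(days=max_age_days)
    return [
        run["id"]
        for run in runs
        if run["created"] < cutoff and run["id"] not in protected
    ]


# Illustrative records, not real results.
now = datetime(2025, 1, 31)
runs = [
    {"id": "old_good", "created": datetime(2024, 11, 1), "rmse": 0.8},
    {"id": "old_bad", "created": datetime(2024, 11, 2), "rmse": 2.5},
    {"id": "recent", "created": datetime(2025, 1, 20), "rmse": 1.9},
]
to_delete = select_runs_to_delete(runs, now, max_age_days=30, keep_best_n=1)
print(to_delete)
```

Here old_good survives despite its age because it is the best run, and recent survives because it is newer than the cutoff.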

Model Library 

Curate your best models:

from nirs4all.workspace import add_to_library, list_library

# Add model to library
add_to_library(
    model_id=best['id'],
    name="Production_PLS_v1",
    tags=["production", "sugar_content"]
)

# List library
for model in list_library():
    print(f"{model['name']}: {model['metrics']}")

U04: sklearn Integration 

Use NIRS4ALL models with scikit-learn workflows.

What You’ll Learn 

SklearnWrapper for pipeline compatibility
Using with sklearn utilities
Cross-validation with sklearn
Integration with sklearn pipelines

SklearnWrapper 

Wrap trained models for sklearn compatibility:

from nirs4all.sklearn import SklearnWrapper

# Wrap a trained model
wrapper = SklearnWrapper(prediction_entry=best)

# Use like any sklearn estimator
predictions = wrapper.predict(X_new)

# Works with sklearn utilities (X_test and y_test are held-out spectra and targets)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, wrapper.predict(X_test))

sklearn Cross-Validation 

import numpy as np
from sklearn.model_selection import cross_val_score

# Create wrapper
wrapper = SklearnWrapper(prediction_entry=best)

# Use sklearn cross-validation
# (cross_val_score clones the estimator and refits it on each fold)
scores = cross_val_score(wrapper, X, y, cv=5, scoring='neg_mean_squared_error')
print(f"CV RMSE: {np.sqrt(-scores.mean()):.4f}")
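
The last line above takes the square root of the mean MSE across folds. That is not the same number as the mean of the per-fold RMSE values, and the two are easy to confuse when comparing reports. The arithmetic can be sketched with the standard library; the fold scores below are made up for illustration.

```python
import math

# Illustrative fold scores in sklearn's neg_mean_squared_error convention.
neg_mse_scores = [-1.0, -4.0, -9.0]
mse_per_fold = [-score for score in neg_mse_scores]

# Square root of the mean MSE (the form used in the snippet above).
rmse_of_mean = math.sqrt(sum(mse_per_fold) / len(mse_per_fold))

# Mean of the per-fold RMSE values.
mean_of_rmse = sum(math.sqrt(mse) for mse in mse_per_fold) / len(mse_per_fold)

print(f"sqrt(mean MSE) = {rmse_of_mean:.4f}")
print(f"mean(fold RMSE) = {mean_of_rmse:.4f}")
```

Because the square root is concave, the first value is never smaller than the second; they coincide only when every fold has the same MSE.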

sklearn Pipeline Integration 

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Create sklearn pipeline with NIRS4ALL model
sklearn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', SklearnWrapper(prediction_entry=best))
])

# Fit and predict
sklearn_pipeline.fit(X_train, y_train)
predictions = sklearn_pipeline.predict(X_test)

Grid Search with sklearn 

from sklearn.model_selection import GridSearchCV

# Note: Hyperparameter tuning should be done in NIRS4ALL
# Use sklearn GridSearch only for final pipeline tuning

param_grid = {'model__scale_factor': [0.9, 1.0, 1.1]}
grid_search = GridSearchCV(sklearn_pipeline, param_grid, cv=5)
grid_search.fit(X, y)

Deployment Best Practices 

1. Always Validate Before Deployment 

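import numpy as np

# Load the saved model and predict on a held-out dataset
# (test_dataset and reference_preds must be defined by the caller)
predictor = PipelineRunner()
preds, _ = predictor.predict(model_id, test_dataset)

# Verify against the predictions recorded at training time
assert np.allclose(preds[:5], reference_preds[:5])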
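
A bare assertion says only that something differs, not by how much. The check above can be extended into a small report, sketched here with the standard library so it runs without NumPy. predictions_match and max_abs_deviation are hypothetical helpers, the numbers are invented, and math.isclose applies its tolerances slightly differently from np.allclose, so borderline cases may not agree exactly.

```python
import math


def max_abs_deviation(reference, candidate):
    """Largest absolute difference between two equal-length prediction sequences."""
    if len(reference) != len(candidate):
        raise ValueError("prediction sequences differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def predictions_match(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """True if both sequences have equal length and agree element-wise within tolerance."""
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Illustrative values only.
reference = [12.1, 9.8, 15.3]   # recorded at training time
reloaded = [12.1, 9.8, 15.3]    # from the reloaded model
drifted = [12.1, 9.8, 15.4]     # e.g. after a library upgrade

print(predictions_match(reference, reloaded))
print(predictions_match(reference, drifted))
print(f"max deviation: {max_abs_deviation(reference, drifted):.3f}")
```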

2. Document Model Requirements 

# Include in bundle metadata
export_bundle(
    prediction_entry=best,
    output_path="model.n4a",
    metadata={
        "description": "Sugar content prediction for NIR spectra",
        "input_shape": (None, 2151),
        "wavelength_range": "1000-2500 nm",
        "preprocessing": "SNV + FirstDerivative",
        "training_date": "2024-12-31",
        "training_rmse": 1.23
    }
)
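
Recording input_shape in the metadata pays off when the consumer checks incoming data against it before predicting. A minimal sketch of such a check follows, treating None as "any size" in the way the (None, 2151) entry above suggests. shape_matches is a hypothetical helper, and how the metadata is read back from a bundle is left out because it is not documented here.

```python
def shape_matches(actual, expected):
    """Compare a data shape to an expected shape; None in expected matches any size."""
    return len(actual) == len(expected) and all(
        want is None or got == want for got, want in zip(actual, expected)
    )


expected_shape = (None, 2151)  # value from the metadata example above

print(shape_matches((10, 2151), expected_shape))   # any number of samples is fine
print(shape_matches((10, 1050), expected_shape))   # wrong feature count
print(shape_matches((2151,), expected_shape))      # a single 1-D spectrum needs reshaping
```

With NumPy input, the call would be shape_matches(X_new.shape, expected_shape).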

3. Version Your Models 

# Use semantic versioning
add_to_library(
    model_id=best['id'],
    name="SugarModel_v2.1.0",
    changelog="Improved preprocessing, added MSC"
)
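
Semantic versions embedded in names sort incorrectly as plain strings (v2.9.3 would rank above v2.10.0), so picking the latest model needs a numeric comparison. The sketch below assumes the Name_vMAJOR.MINOR.PATCH pattern used in the example above; parse_versioned_name and latest_version are hypothetical helpers, and the list of names stands in for whatever the library listing returns.

```python
import re

NAME_PATTERN = re.compile(
    r"^(?P<base>.+)_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_versioned_name(name):
    """Split 'SugarModel_v2.1.0' into ('SugarModel', (2, 1, 0))."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a versioned model name: {name!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["base"], version


def latest_version(names, base):
    """Return the name with the highest numeric version for the given base name."""
    candidates = []
    for name in names:
        parsed_base, version = parse_versioned_name(name)
        if parsed_base == base:
            candidates.append((version, name))
    if not candidates:
        raise ValueError(f"no versions found for {base!r}")
    return max(candidates)[1]


# Illustrative names only.
names = ["SugarModel_v2.1.0", "SugarModel_v2.10.0", "SugarModel_v2.9.3"]
newest = latest_version(names, "SugarModel")
print(newest)
```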

4. Monitor Prediction Quality 

# Log predictions for monitoring
import logging

logger = logging.getLogger("nirs4all.predictions")
logger.info("Prediction batch: n=%d, mean=%.2f", len(X_new), preds.mean())
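
Logging the mean alone can hide individual bad predictions. A slightly richer batch summary is sketched below using only the standard library. The plausible range passed as low and high is an assumption the caller must supply from domain knowledge, and summarize_batch is a hypothetical helper, not a NIRS4ALL function.

```python
import logging
import statistics

logger = logging.getLogger("nirs4all.predictions")


def summarize_batch(preds, low, high):
    """Log and return summary statistics for one batch of predictions.

    low/high: caller-supplied plausible range for the predicted quantity.
    """
    values = [float(p) for p in preds]
    if not values:
        raise ValueError("empty prediction batch")
    summary = {
        "n": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "out_of_range": sum(1 for v in values if not low <= v <= high),
    }
    logger.info("Prediction batch: %s", summary)
    if summary["out_of_range"]:
        logger.warning(
            "%d of %d predictions fall outside [%s, %s]",
            summary["out_of_range"], summary["n"], low, high,
        )
    return summary


# Illustrative batch: one value is far outside the assumed 0-25 range.
batch_summary = summarize_batch([10.0, 12.0, 14.0, 40.0], low=0.0, high=25.0)
print(batch_summary)
```

A sustained rise in out_of_range, or a drift in the batch mean, is a cue to re-run the validation step on fresh reference samples.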

Running These Examples 

cd examples

# Run all deployment examples
./run.sh -n "U0*.py" -c user

# Run save/load example
python user/06_deployment/U01_save_load_predict.py

Next Steps 

After mastering deployment:

Explainability: Understand model decisions with SHAP
Transfer Learning: Adapt models to new instruments
Advanced Pipelines: Complex branching and merging