Deployment Examples
This section covers saving, loading, and deploying trained NIRS4ALL models for production use.
Overview
| Example | Topic | Difficulty | Duration |
|---|---|---|---|
| U01 | Save, Load, Predict | ★★☆☆☆ | ~4 min |
| U02 | Export Bundle | ★★☆☆☆ | ~3 min |
| U03 | Workspace Management | ★★☆☆☆ | ~3 min |
| U04 | sklearn Integration | ★★☆☆☆ | ~3 min |
U01: Save, Load, and Predict
Save trained models and use them for prediction on new data.
What You’ll Learn
Automatic model saving with PipelineRunner
Prediction with prediction entries
Prediction with model IDs
Verifying prediction consistency
Training with Model Saving
Pass save_artifacts=True to PipelineRunner to persist trained models:
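from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
# Assumed import path for the NIRS transforms; adjust to your NIRS4ALL version
from nirs4all.operators.transforms import SNV, FirstDerivative
# Define pipeline: scaling, spectral preprocessing, CV splitter, then the model
pipeline = [
    MinMaxScaler(),
    SNV(),
    FirstDerivative(),
    ShuffleSplit(n_splits=3, random_state=42),
    {"model": PLSRegression(n_components=10)}
]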
# Run with saving enabled
runner = PipelineRunner(save_artifacts=True, verbose=1)
predictions, _ = runner.run(
PipelineConfigs(pipeline, "MyPipeline"),
DatasetConfigs("sample_data/regression")
)
# Get best model info
best = predictions.top(n=1)[0]
model_id = best['id']
print(f"Model ID: {model_id}")
Prediction Methods
Method 1: Using Prediction Entry
# Create predictor
predictor = PipelineRunner()
# New data for prediction
new_data = DatasetConfigs({'X_test': 'path/to/new_data.csv'})
# Predict using the prediction entry directly
new_predictions, _ = predictor.predict(best, new_data)
print(f"Predictions shape: {new_predictions.shape}")
Method 2: Using Model ID
# Predict using just the model ID string
predictor = PipelineRunner()
new_predictions, _ = predictor.predict(model_id, new_data)
Prediction on NumPy Arrays
import numpy as np
# Create synthetic new data
X_new = np.random.randn(10, 2151) # Must match training feature count
# Create dataset from array
new_data = DatasetConfigs({'X_test': X_new})
predictions, _ = predictor.predict(model_id, new_data)
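The comment above notes that new data must match the training feature count. A minimal sketch of a pre-flight check is shown here; `check_feature_count` is a hypothetical helper, not part of NIRS4ALL, and 2151 is simply the feature count used in this example.

```python
import numpy as np

def check_feature_count(X, expected_features):
    """Return X as a 2-D float array, or raise if the feature count is wrong.

    Hypothetical helper (not a NIRS4ALL API).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Treat a single spectrum as a batch of one sample
        X = X.reshape(1, -1)
    if X.ndim != 2:
        raise ValueError(f"Expected a 2-D array, got {X.ndim} dimensions")
    if X.shape[1] != expected_features:
        raise ValueError(
            f"Expected {expected_features} features, got {X.shape[1]}"
        )
    return X

X_checked = check_feature_count(np.zeros((10, 2151)), expected_features=2151)
print(X_checked.shape)
```

Running a check like this before building the dataset config turns a shape problem into a clear error message rather than a failure deep inside the preprocessing steps.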
Model Storage Location
Models are saved in the workspace:
workspace/runs/<run_id>/
├── model.pkl # Trained model
├── preprocessor.pkl # Preprocessing pipeline
├── metadata.json # Configuration info
└── ...
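To see what a given run actually stored, the directory can be walked with the standard library. This is a sketch only: `list_run_artifacts` is a hypothetical helper, and the exact files present depend on your NIRS4ALL version and pipeline.

```python
from pathlib import Path

def list_run_artifacts(run_dir):
    """Return sorted relative paths of all files under a run directory.

    Hypothetical helper (not a NIRS4ALL API).
    """
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"Run directory not found: {run_dir}")
    return sorted(
        p.relative_to(run_dir).as_posix()
        for p in run_dir.rglob("*")
        if p.is_file()
    )

# Usage, with a real run ID substituted for the placeholder:
# for name in list_run_artifacts("workspace/runs/<run_id>"):
#     print(name)
```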
U02: Export Bundle
Export portable model bundles for distribution.
What You’ll Learn
Creating portable .n4a bundles
Bundle contents and structure
Importing bundles
Version compatibility
Creating a Bundle
from nirs4all.pipeline.bundle import export_bundle, import_bundle
# After training, export the best model
export_bundle(
prediction_entry=best,
output_path="my_model.n4a",
include_metadata=True
)
Bundle Contents
A .n4a bundle is a compressed archive containing:
| File | Description |
|---|---|
| | Trained model |
| | Full preprocessing pipeline |
| | Training info, metrics |
| | Python dependencies |
| | Bundle structure info |
Importing a Bundle
# Load bundle
predictor = import_bundle("my_model.n4a")
# Use for prediction
predictions = predictor.predict(X_new)
Bundle Portability
Bundles are designed to be portable:
✓ Self-contained (all preprocessing included)
✓ Version info for compatibility checks
✓ Can be shared via email, cloud storage, etc.
⚠ Requires compatible Python/library versions
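The version caveat above can be checked before loading a bundle. The sketch below compares versions recorded at export time against the current environment using only the standard library. How a bundle actually stores its dependency list is not specified here, so the `recorded` dictionary and both helper functions are assumptions made for illustration.

```python
from importlib import metadata

def installed_versions(package_names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

def find_mismatches(recorded, installed):
    """Return {package: (recorded, installed)} for versions that differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in recorded.items()
        if installed.get(name) != wanted
    }

# Hypothetical versions recorded at export time
recorded = {"numpy": "1.26.4", "scikit-learn": "1.4.2"}
mismatches = find_mismatches(recorded, installed_versions(recorded))
for name, (wanted, found) in mismatches.items():
    print(f"{name}: bundle expects {wanted}, environment has {found}")
```

Exact string matching is deliberately strict. In practice a looser rule, such as matching only the major and minor components, may be more appropriate.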
U03: Workspace Management
Organize and manage your training artifacts.
What You’ll Learn
Workspace structure
Artifact cleanup
Library management
Configuration
Workspace Structure
workspace/
├── runs/ # Individual training runs
│ ├── run_20241231_123456/
│ │ ├── model.pkl
│ │ ├── predictions.json
│ │ └── charts/
│ └── ...
├── library/ # Curated model library
├── exports/ # Exported bundles
├── logs/ # Training logs
└── examples_output/ # Example outputs
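The run directory shown above appears to encode a timestamp in its name. Assuming the pattern is `run_YYYYMMDD_HHMMSS` (an inference from the single example, not a documented guarantee), a name can be turned into a `datetime` for sorting or age checks:

```python
from datetime import datetime

def parse_run_timestamp(run_name):
    """Parse 'run_YYYYMMDD_HHMMSS' into a datetime, or return None.

    Hypothetical helper; the naming pattern is an assumption.
    """
    try:
        return datetime.strptime(run_name, "run_%Y%m%d_%H%M%S")
    except ValueError:
        return None

print(parse_run_timestamp("run_20241231_123456"))  # 2024-12-31 12:34:56
print(parse_run_timestamp("library"))              # None
```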
Workspace Configuration
from nirs4all.workspace import WorkspaceConfig
# Configure workspace location
config = WorkspaceConfig(
root="./my_workspace",
max_runs=100, # Keep last 100 runs
auto_cleanup=True # Remove old runs
)
Artifact Cleanup
from nirs4all.workspace import cleanup_workspace
# Remove runs older than 30 days
cleanup_workspace(
max_age_days=30,
keep_best_n=10 # Always keep top 10 models
)
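The combination of an age limit and a "keep best" guarantee can be expressed as a small pure function. This is a sketch of one plausible policy, not the library's implementation. The record keys (`id`, `created`, `rmse`) and the use of RMSE as the ranking metric are assumptions.

```python
from datetime import datetime, timedelta

def select_runs_to_delete(runs, now, max_age_days=30, keep_best_n=10):
    """Pick run IDs older than max_age_days, excluding the best-scoring runs.

    runs: list of dicts with hypothetical keys 'id', 'created' (datetime),
    and 'rmse' (lower is better).
    """
    ranked = sorted(runs, key=lambda r: r["rmse"])
    best_ids = {r["id"] for r in ranked[:keep_best_n]}
    cutoff = now - timedelta(days=max_age_days)
    return [
        r["id"] for r in runs
        if r["created"] < cutoff and r["id"] not in best_ids
    ]

# Made-up runs for illustration
now = datetime(2024, 12, 31)
runs = [
    {"id": "old_good", "created": datetime(2024, 1, 1), "rmse": 0.8},
    {"id": "old_bad", "created": datetime(2024, 1, 2), "rmse": 2.5},
    {"id": "recent", "created": datetime(2024, 12, 20), "rmse": 1.9},
]
print(select_runs_to_delete(runs, now, max_age_days=30, keep_best_n=1))  # ['old_bad']
```

Computing the deletion list separately from the deletion itself makes it easy to review what a cleanup would remove before running it.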
Model Library
Curate your best models:
from nirs4all.workspace import add_to_library, list_library
# Add model to library
add_to_library(
model_id=best['id'],
name="Production_PLS_v1",
tags=["production", "sugar_content"]
)
# List library
for model in list_library():
    print(f"{model['name']}: {model['metrics']}")
U04: sklearn Integration
Use NIRS4ALL models with scikit-learn workflows.
What You’ll Learn
SklearnWrapper for pipeline compatibility
Using with sklearn utilities
Cross-validation with sklearn
Integration with sklearn pipelines
SklearnWrapper
Wrap trained models for sklearn compatibility:
from nirs4all.sklearn import SklearnWrapper
# Wrap a trained model
wrapper = SklearnWrapper(prediction_entry=best)
# Use like any sklearn estimator
predictions = wrapper.predict(X_new)
# Works with sklearn utilities
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, wrapper.predict(X_test))
sklearn Cross-Validation
from sklearn.model_selection import cross_val_score
# Create wrapper
wrapper = SklearnWrapper(prediction_entry=best)
# Use sklearn cross-validation
scores = cross_val_score(wrapper, X, y, cv=5, scoring='neg_mean_squared_error')
print(f"CV RMSE: {np.sqrt(-scores.mean()):.4f}")
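The snippet above reports the square root of the mean MSE across folds, which is not the same number as the mean of the per-fold RMSE values. The sketch below shows the difference on made-up fold scores. Either convention can be reasonable as long as it is applied consistently when comparing models.

```python
import math

# Made-up negative-MSE scores, one per fold, in the form returned by
# scoring='neg_mean_squared_error'
neg_mse_scores = [-1.0, -4.0, -9.0]

mse_per_fold = [-s for s in neg_mse_scores]

# Convention used above: square root of the mean MSE
rmse_of_mean = math.sqrt(sum(mse_per_fold) / len(mse_per_fold))

# Alternative: mean of the per-fold RMSE values
mean_of_rmse = sum(math.sqrt(m) for m in mse_per_fold) / len(mse_per_fold)

print(f"sqrt(mean MSE): {rmse_of_mean:.4f}")  # 2.1602
print(f"mean(RMSE):     {mean_of_rmse:.4f}")  # 2.0000
```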
sklearn Pipeline Integration
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Create sklearn pipeline with NIRS4ALL model
sklearn_pipeline = Pipeline([
('scaler', StandardScaler()),
('model', SklearnWrapper(prediction_entry=best))
])
# Fit and predict
sklearn_pipeline.fit(X_train, y_train)
predictions = sklearn_pipeline.predict(X_test)
Grid Search with sklearn
from sklearn.model_selection import GridSearchCV
# Note: Hyperparameter tuning should be done in NIRS4ALL
# Use sklearn GridSearch only for final pipeline tuning
param_grid = {'model__scale_factor': [0.9, 1.0, 1.1]}
grid_search = GridSearchCV(sklearn_pipeline, param_grid, cv=5)
grid_search.fit(X, y)
Deployment Best Practices
1. Always Validate Before Deployment
# Load the saved model
predictor = PipelineRunner()
preds, _ = predictor.predict(model_id, test_dataset)
# Verify against reference predictions recorded at training time
assert np.allclose(preds[:5], reference_preds[:5])
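A slightly more defensive version of this check is sketched below. `np.allclose` broadcasts its inputs, so arrays of different shapes can compare as equal without any error. Checking the shape first and reporting the largest deviation makes failures easier to diagnose. `assert_predictions_match` is a hypothetical helper, and the tolerances are NumPy's defaults rather than values recommended by NIRS4ALL.

```python
import numpy as np

def assert_predictions_match(preds, reference, rtol=1e-5, atol=1e-8):
    """Raise AssertionError if two prediction arrays differ beyond tolerance."""
    preds = np.asarray(preds, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if preds.shape != reference.shape:
        raise AssertionError(
            f"Shape mismatch: {preds.shape} vs {reference.shape}"
        )
    if not np.allclose(preds, reference, rtol=rtol, atol=atol):
        worst = float(np.max(np.abs(preds - reference)))
        raise AssertionError(f"Predictions differ; max abs error = {worst:.3g}")

# Made-up values standing in for reloaded and reference predictions
assert_predictions_match([1.0, 2.0, 3.0], [1.0, 2.0, 3.0 + 1e-9])
```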
2. Document Model Requirements
# Include in bundle metadata
export_bundle(
prediction_entry=best,
output_path="model.n4a",
metadata={
"description": "Sugar content prediction for NIR spectra",
"input_shape": (None, 2151),
"wavelength_range": "1000-2500 nm",
"preprocessing": "SNV + FirstDerivative",
"training_date": "2024-12-31",
"training_rmse": 1.23
}
)
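If metadata like this is serialized as JSON (an assumption; the storage format is not stated here), the `input_shape` tuple comes back as a list, because JSON has no tuple type. The sketch below shows the round trip and a hypothetical `shape_is_compatible` helper that treats `None` as "any size" when validating incoming data against the declared shape.

```python
import json

metadata = {
    "input_shape": (None, 2151),
    "wavelength_range": "1000-2500 nm",
}

# JSON has no tuple type: (None, 2151) is written as [null, 2151]
restored = json.loads(json.dumps(metadata))
print(restored["input_shape"])  # [None, 2151]

def shape_is_compatible(actual_shape, declared_shape):
    """Compare shapes, treating None in declared_shape as 'any size'."""
    if len(actual_shape) != len(declared_shape):
        return False
    return all(
        want is None or got == want
        for got, want in zip(actual_shape, declared_shape)
    )

print(shape_is_compatible((10, 2151), restored["input_shape"]))  # True
print(shape_is_compatible((10, 1050), restored["input_shape"]))  # False
```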
3. Version Your Models
# Use semantic versioning
add_to_library(
model_id=best['id'],
name="SugarModel_v2.1.0",
changelog="Improved preprocessing, added MSC"
)
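When versions are embedded in model names as above, comparing the names as plain strings gives the wrong order once a component reaches two digits. The following sketch parses the `Name_vMAJOR.MINOR.PATCH` pattern into a numeric tuple. `parse_versioned_name` is a hypothetical helper, and the naming pattern is inferred from the single example above.

```python
import re

_VERSIONED_NAME = re.compile(
    r"^(?P<name>.+)_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)

def parse_versioned_name(label):
    """Split 'Name_vMAJOR.MINOR.PATCH' into (name, (major, minor, patch))."""
    match = _VERSIONED_NAME.match(label)
    if match is None:
        raise ValueError(f"Not a versioned model name: {label!r}")
    version = tuple(int(match[part]) for part in ("major", "minor", "patch"))
    return match["name"], version

# Made-up library names
names = ["SugarModel_v2.1.0", "SugarModel_v2.10.0", "SugarModel_v2.9.3"]
latest = max(names, key=lambda n: parse_versioned_name(n)[1])
print(latest)      # SugarModel_v2.10.0
print(max(names))  # SugarModel_v2.9.3 (plain string comparison picks the wrong one)
```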
4. Monitor Prediction Quality
# Log predictions for monitoring
import logging
logger = logging.getLogger("nirs4all.predictions")
logger.info(f"Prediction batch: n={len(X_new)}, mean={preds.mean():.2f}")
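Beyond logging the batch mean, a simple range check can flag predictions that fall outside what is physically plausible for the target. The sketch below uses only the standard library. `summarize_batch` and the range bounds are hypothetical and should be set from domain knowledge.

```python
import logging
import statistics

logger = logging.getLogger("nirs4all.predictions")

def summarize_batch(preds, expected_min, expected_max):
    """Log batch statistics and return how many predictions are out of range."""
    preds = [float(p) for p in preds]
    out_of_range = sum(1 for p in preds if not expected_min <= p <= expected_max)
    logger.info(
        "Prediction batch: n=%d, mean=%.2f, min=%.2f, max=%.2f",
        len(preds), statistics.fmean(preds), min(preds), max(preds),
    )
    if out_of_range:
        logger.warning(
            "%d of %d predictions fall outside [%s, %s]",
            out_of_range, len(preds), expected_min, expected_max,
        )
    return out_of_range

logging.basicConfig(level=logging.INFO)
# Made-up predictions and a made-up plausible range
print(summarize_batch([11.2, 12.5, 48.0], expected_min=0.0, expected_max=30.0))  # 1
```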
Running These Examples
cd examples
# Run all deployment examples
./run.sh -n "U0*.py" -c user
# Run save/load example
python user/06_deployment/U01_save_load_predict.py
Next Steps
After mastering deployment:
Explainability: Understand model decisions with SHAP
Transfer Learning: Adapt models to new instruments
Advanced Pipelines: Complex branching and merging