# Deployment Examples

This section covers saving, loading, and deploying trained NIRS4ALL models for production use.

```{contents} On this page
:local:
:depth: 2
```

## Overview

| Example | Topic | Difficulty | Duration |
|---------|-------|------------|----------|
| [U01](#u01-save-load-and-predict) | Save, Load, Predict | ★★☆☆☆ | ~4 min |
| [U02](#u02-export-bundle) | Export Bundle | ★★☆☆☆ | ~3 min |
| [U03](#u03-workspace-management) | Workspace Management | ★★☆☆☆ | ~3 min |
| [U04](#u04-sklearn-integration) | sklearn Integration | ★★☆☆☆ | ~3 min |

---

## U01: Save, Load, and Predict

**Save trained models and use them for prediction on new data.**

[📄 View source code](https://github.com/GBeurier/nirs4all/blob/main/examples/user/06_deployment/U01_save_load_predict.py)

### What You'll Learn

- Automatic model saving with PipelineRunner
- Prediction with prediction entries
- Prediction with model IDs
- Verifying prediction consistency

### Training with Model Saving

Enable `save_artifacts=True` to persist trained models:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

from nirs4all.pipeline import PipelineRunner, PipelineConfigs
from nirs4all.data import DatasetConfigs
# SNV and FirstDerivative are NIRS4ALL transform operators;
# their import is omitted here (see the linked source file).

# Define pipeline
pipeline = [
    MinMaxScaler(),
    SNV(),
    FirstDerivative(),
    ShuffleSplit(n_splits=3, random_state=42),
    {"model": PLSRegression(n_components=10)}
]

# Run with saving enabled
runner = PipelineRunner(save_artifacts=True, verbose=1)
predictions, _ = runner.run(
    PipelineConfigs(pipeline, "MyPipeline"),
    DatasetConfigs("sample_data/regression")
)

# Get best model info
best = predictions.top(n=1)[0]
model_id = best['id']
print(f"Model ID: {model_id}")
```

### Prediction Methods

#### Method 1: Using Prediction Entry

```python
# Create predictor
predictor = PipelineRunner()

# New data for prediction
new_data = DatasetConfigs({'X_test': 'path/to/new_data.csv'})

# Predict using the prediction entry directly
new_predictions, _ = predictor.predict(best, new_data)
print(f"Predictions shape: {new_predictions.shape}")
```

#### Method 2: Using Model ID

```python
# Predict using just the model ID string
predictor = PipelineRunner()
new_predictions, _ = predictor.predict(model_id, new_data)
```

### Prediction on NumPy Arrays

```python
import numpy as np

# Create synthetic new data
X_new = np.random.randn(10, 2151)  # Must match training feature count

# Create dataset from array
new_data = DatasetConfigs({'X_test': X_new})
predictions, _ = predictor.predict(model_id, new_data)
```

### Model Storage Location

Models are saved in the workspace:

```
workspace/runs//
├── model.pkl          # Trained model
├── preprocessor.pkl   # Preprocessing pipeline
├── metadata.json      # Configuration info
└── ...
```

---

## U02: Export Bundle

**Export portable model bundles for distribution.**

[📄 View source code](https://github.com/GBeurier/nirs4all/blob/main/examples/user/06_deployment/U02_export_bundle.py)

### What You'll Learn

- Creating portable `.n4a` bundles
- Bundle contents and structure
- Importing bundles
- Version compatibility

### Creating a Bundle

```python
from nirs4all.pipeline.bundle import export_bundle, import_bundle

# After training, export the best model
export_bundle(
    prediction_entry=best,
    output_path="my_model.n4a",
    include_metadata=True
)
```

### Bundle Contents

A `.n4a` bundle is a compressed archive containing:

| File | Description |
|------|-------------|
| `model.pkl` | Trained model |
| `pipeline.pkl` | Full preprocessing pipeline |
| `metadata.json` | Training info, metrics |
| `requirements.txt` | Python dependencies |
| `manifest.json` | Bundle structure info |

### Importing a Bundle

```python
# Load bundle
predictor = import_bundle("my_model.n4a")

# Use for prediction
predictions = predictor.predict(X_new)
```

### Bundle Portability

Bundles are designed to be portable:

- ✓ Self-contained (all preprocessing included)
- ✓ Version info for compatibility checks
- ✓ Can be shared via email, cloud storage, etc.
- ⚠ Requires compatible Python/library versions

---

## U03: Workspace Management

**Organize and manage your training artifacts.**

[📄 View source code](https://github.com/GBeurier/nirs4all/blob/main/examples/user/06_deployment/U03_workspace_management.py)

### What You'll Learn

- Workspace structure
- Artifact cleanup
- Library management
- Configuration

### Workspace Structure

```
workspace/
├── runs/                    # Individual training runs
│   ├── run_20241231_123456/
│   │   ├── model.pkl
│   │   ├── predictions.json
│   │   └── charts/
│   └── ...
├── library/                 # Curated model library
├── exports/                 # Exported bundles
├── logs/                    # Training logs
└── examples_output/         # Example outputs
```

### Workspace Configuration

```python
from nirs4all.workspace import WorkspaceConfig

# Configure workspace location
config = WorkspaceConfig(
    root="./my_workspace",
    max_runs=100,        # Keep last 100 runs
    auto_cleanup=True    # Remove old runs
)
```

### Artifact Cleanup

```python
from nirs4all.workspace import cleanup_workspace

# Remove runs older than 30 days
cleanup_workspace(
    max_age_days=30,
    keep_best_n=10  # Always keep top 10 models
)
```

### Model Library

Curate your best models:

```python
from nirs4all.workspace import add_to_library, list_library

# Add model to library
add_to_library(
    model_id=best['id'],
    name="Production_PLS_v1",
    tags=["production", "sugar_content"]
)

# List library
for model in list_library():
    print(f"{model['name']}: {model['metrics']}")
```

---

## U04: sklearn Integration

**Use NIRS4ALL models with scikit-learn workflows.**

[📄 View source code](https://github.com/GBeurier/nirs4all/blob/main/examples/user/06_deployment/U04_sklearn_integration.py)

### What You'll Learn

- SklearnWrapper for pipeline compatibility
- Using with sklearn utilities
- Cross-validation with sklearn
- Integration with sklearn pipelines

### SklearnWrapper

Wrap trained models for sklearn compatibility:

```python
from sklearn.metrics import mean_squared_error

from nirs4all.sklearn import SklearnWrapper

# Wrap a trained model
wrapper = SklearnWrapper(prediction_entry=best)

# Use like any sklearn estimator
predictions = wrapper.predict(X_new)

# Works with sklearn utilities
mse = mean_squared_error(y_true, wrapper.predict(X_test))
```

### sklearn Cross-Validation

```python
import numpy as np
from sklearn.model_selection import cross_val_score

# Create wrapper
wrapper = SklearnWrapper(prediction_entry=best)

# Use sklearn cross-validation
scores = cross_val_score(wrapper, X, y, cv=5, scoring='neg_mean_squared_error')
print(f"CV RMSE: {np.sqrt(-scores.mean()):.4f}")
```

### sklearn Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Create sklearn pipeline with NIRS4ALL model
sklearn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', SklearnWrapper(prediction_entry=best))
])

# Fit and predict
sklearn_pipeline.fit(X_train, y_train)
predictions = sklearn_pipeline.predict(X_test)
```

### Grid Search with sklearn

```python
from sklearn.model_selection import GridSearchCV

# Note: Hyperparameter tuning should be done in NIRS4ALL
# Use sklearn GridSearch only for final pipeline tuning
param_grid = {'model__scale_factor': [0.9, 1.0, 1.1]}

grid_search = GridSearchCV(sklearn_pipeline, param_grid, cv=5)
grid_search.fit(X, y)
```

---

## Deployment Best Practices

### 1. Always Validate Before Deployment

```python
import numpy as np

# Load the saved model
predictor = PipelineRunner()
preds, _ = predictor.predict(model_id, test_dataset)

# Verify against training results
assert np.allclose(preds[:5], reference_preds[:5])
```

### 2. Document Model Requirements

```python
# Include in bundle metadata
export_bundle(
    prediction_entry=best,
    output_path="model.n4a",
    metadata={
        "description": "Sugar content prediction for NIR spectra",
        "input_shape": (None, 2151),
        "wavelength_range": "1000-2500 nm",
        "preprocessing": "SNV + FirstDerivative",
        "training_date": "2024-12-31",
        "training_rmse": 1.23
    }
)
```

### 3. Version Your Models

```python
# Use semantic versioning
add_to_library(
    model_id=best['id'],
    name="SugarModel_v2.1.0",
    changelog="Improved preprocessing, added MSC"
)
```

### 4. Monitor Prediction Quality

```python
# Log predictions for monitoring
import logging

logger = logging.getLogger("nirs4all.predictions")
logger.info(f"Prediction batch: n={len(X_new)}, mean={preds.mean():.2f}")
```

---

## Running These Examples

```bash
cd examples

# Run all deployment examples
./run.sh -n "U0*.py" -c user

# Run save/load example
python user/06_deployment/U01_save_load_predict.py
```

## Next Steps

After mastering deployment:

- **Explainability**: Understand model decisions with SHAP
- **Transfer Learning**: Adapt models to new instruments
- **Advanced Pipelines**: Complex branching and merging
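
The round-trip check and batch logging described under Deployment Best Practices can be sketched without NIRS4ALL. The following is a minimal illustration using only the Python standard library. The toy linear model (a plain dict), the `predict` and `validate_roundtrip` helpers, and the `deployment.check` logger name are all hypothetical and are not part of the NIRS4ALL API; a real deployment would call the saved pipeline instead.

```python
import logging
import math
import pickle
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("deployment.check")  # hypothetical logger name


def predict(model, rows):
    """Apply a toy linear model (a plain dict) to a list of feature rows."""
    return [
        model["intercept"] + sum(w * x for w, x in zip(model["weights"], row))
        for row in rows
    ]


def validate_roundtrip(model, rows, reference, path, rel_tol=1e-9):
    """Save the model, reload it, and compare its predictions to a reference."""
    path.write_bytes(pickle.dumps(model))
    reloaded = pickle.loads(path.read_bytes())
    preds = predict(reloaded, rows)

    # Consistent only if every reloaded prediction matches its reference value.
    ok = len(preds) == len(reference) and all(
        math.isclose(p, r, rel_tol=rel_tol) for p, r in zip(preds, reference)
    )

    # Log a compact batch summary for later monitoring.
    logger.info(
        "Prediction batch: n=%d, mean=%.2f, consistent=%s",
        len(preds), sum(preds) / len(preds), ok,
    )
    return ok, preds


# Stand-in for a trained model and the predictions recorded at training time.
model = {"weights": [0.5, -1.0, 2.0], "intercept": 0.25}
rows = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5], [2.0, 0.0, 1.0]]
reference = predict(model, rows)

with tempfile.TemporaryDirectory() as tmp:
    ok, preds = validate_roundtrip(model, rows, reference, Path(tmp) / "model.pkl")

if not ok:
    raise RuntimeError("Reloaded model does not reproduce reference predictions")
```

Comparing against stored reference predictions with a tolerance, rather than exact equality, leaves room for small floating-point differences between the training and serving environments.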