# In-Pipeline Charts Visualize data at any stage of your pipeline using built-in chart controllers. ## Overview nirs4all provides **in-pipeline chart controllers** that generate visualizations during pipeline execution. These charts help you: - Inspect data quality and distributions - Verify preprocessing effects - Monitor cross-validation fold balance - Track sample exclusions - Understand augmentation effects ## Quick Reference | Category | Keywords | Description | |----------|----------|-------------| | **Spectra** | `chart_2d`, `chart_3d` | 2D/3D spectral visualization | | **Distribution** | `spectral_distribution`, `spectra_envelope` | Train/test envelope comparison | | **Folds** | `fold_chart`, `fold_` | CV fold distribution | | **Targets** | `y_chart`, `chart_y` | Y-value histograms | | **Augmentation** | `augment_chart`, `augment_details_chart` | Augmentation effects | | **Exclusion** | `exclusion_chart` | Excluded sample visualization | --- ## Spectra Charts Visualize spectral data with color-coded target values. ### 2D Spectra (`chart_2d`) Displays all spectra as overlaid lines with color gradient based on y values. ```python pipeline = [ "chart_2d", # Raw spectra StandardNormalVariate(), "chart_2d", # After preprocessing ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/chart_2d.png :align: center :width: 90% :alt: 2D Spectra Chart Raw spectral data with color gradient based on target values (wavenumber cm⁻¹ on x-axis). ``` ### After Multiple Preprocessing Steps Place `chart_2d` after a preprocessing chain to visualize the transformed spectra: ```python pipeline = [ StandardNormalVariate(), SavitzkyGolay(window_length=11, polyorder=2), FirstDerivative(), "chart_2d", # Chart after all preprocessing ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/chart_2d_preprocessed.png :align: center :width: 90% :alt: 2D Preprocessed Spectra Chart Spectra after SNV + Savitzky-Golay + First Derivative preprocessing. ``` ### 3D Spectra (`chart_3d`) Adds target value as Y-axis for 3D visualization. ```python pipeline = [ "chart_3d", # 3D view with target gradient StandardNormalVariate(), "chart_3d", ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/chart_3d.png :align: center :width: 90% :alt: 3D Spectra Chart 3D spectral visualization with target values as the Y-axis. ``` ### Options (Dict Syntax) ```python {"chart_2d": { "include_excluded": True, # Include excluded samples "highlight_excluded": True, # Highlight excluded with red dashed lines }} ``` --- ## Spectral Distribution Charts Show statistical envelopes comparing train vs test distributions. ### Keywords - `spectral_distribution` - `spectra_dist` - `spectra_envelope` ### Usage ```python pipeline = [ "spectral_distribution", # Before preprocessing StandardNormalVariate(), "spectral_distribution", # After preprocessing ShuffleSplit(n_splits=5), PLSRegression(n_components=10), ] ``` **Output:** Shows envelope with: - **Min-max range** (light fill) - **IQR range** (25th-75th percentile, darker fill) - **Mean line** ```{figure} ../../assets/spectral_distribution.png :align: center :width: 90% :alt: Spectral Distribution Chart Spectral distribution showing statistical envelopes with wavenumbers on x-axis. ``` When CV folds exist (> 1), displays a grid with one plot per fold. --- ## Fold Charts Visualize cross-validation fold distributions with color-coded samples. ### Keywords | Keyword | Description | |---------|-------------| | `fold_chart` / `chart_fold` | Color by y values | | `fold_` | Color by metadata column (e.g., `fold_breed`) | ### Usage ```python pipeline = [ ShuffleSplit(n_splits=5, test_size=0.2), "fold_chart", # Visualize fold distribution PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/fold_chart.png :align: center :width: 90% :alt: Fold Chart Cross-validation fold distribution with train (T) and validation (V) sets. ``` - **T**: Train set for fold - **V**: Validation set for fold - **Colors**: Gradient based on y values (continuous) or discrete colors (classification) ### Color by Metadata Column ```python pipeline = [ ShuffleSplit(n_splits=5), "fold_breed", # Color by 'breed' metadata column PLSRegression(n_components=10), ] ``` --- ## Target (Y) Charts Visualize y-value distributions as histograms. ### Keywords - `y_chart` - `chart_y` ### Usage ```python pipeline = [ "y_chart", # Before splitting ShuffleSplit(n_splits=3, test_size=0.2), "y_chart", # After splitting (train vs test) PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/y_chart.png :align: center :width: 90% :alt: Y Distribution Chart Target value (y) distribution histogram. ``` ### Options ```python {"y_chart": { "include_excluded": True, # Include excluded samples "highlight_excluded": True, # Show excluded as separate histogram "layout": "stacked", # 'standard', 'stacked', or 'staggered' }} ``` | Layout | Description | |--------|-------------| | `standard` | Overlapping histograms (default) | | `stacked` | Bars stacked on top of each other | | `staggered` | Side-by-side bars | ### CV Fold Mode When multiple CV folds exist, displays a grid with one histogram per fold showing train vs validation distribution. --- ## Augmentation Charts Visualize the effects of sample augmentation. ### Keywords | Keyword | Description | |---------|-------------| | `augment_chart` / `augmentation_chart` | Overlay of original vs augmented | | `augment_details_chart` / `augmentation_details_chart` | Grid showing each augmentation type | ### Usage ```python from nirs4all.operators.augmentation import GaussianNoise, SpectrumShift pipeline = [ {"sample_augmentation": [GaussianNoise(sigma=0.01), SpectrumShift(max_shift=5)]}, "augment_chart", # Overlay view "augment_details_chart", # Detailed grid view ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` ### Overlay Chart (`augment_chart`) Shows original spectra with augmented versions overlaid: - Original samples in solid lines - Augmented samples in dashed lines with different colors ```{figure} ../../assets/augment_chart.png :align: center :width: 90% :alt: Augmentation Chart Overlay of original and augmented spectra. ``` ### Details Chart (`augment_details_chart`) Shows a grid with one subplot per augmentation type: - First panel: Raw (original) spectra - Subsequent panels: Each augmentation technique separately ```{figure} ../../assets/augment_details_chart.png :align: center :width: 90% :alt: Augmentation Details Chart Grid showing each augmentation type separately. ``` ### Multiple Augmentation Methods When using multiple augmentation transformers, the details chart shows each method: ```python from nirs4all.operators.augmentation import GaussianAdditiveNoise, WavelengthShift pipeline = [ {"sample_augmentation": { "transformers": [ GaussianAdditiveNoise(sigma=0.003), WavelengthShift(shift_range=(-2, 2)), ], "count": 2, }}, "augment_chart", "augment_details_chart", ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` ```{figure} ../../assets/augment_multi_chart.png :align: center :width: 90% :alt: Multiple Augmentation Chart Overlay of original spectra with multiple augmentation methods applied. ``` ```{figure} ../../assets/augment_multi_details_chart.png :align: center :width: 90% :alt: Multiple Augmentation Details Chart Grid showing each augmentation method (GaussianNoise and WavelengthShift) separately. ``` --- ## Exclusion Charts Visualize included vs excluded samples using PCA projection. ### Keywords - `exclusion_chart` - `chart_exclusion` ### Usage ```python from nirs4all.operators.transforms import OutlierExclusion pipeline = [ StandardNormalVariate(), "chart_2d", # Before exclusion {"sample_filter": { "filters": [XOutlierFilter(method='isolation_forest', contamination=0.1)], }}, "exclusion_chart", # Visualize exclusions "chart_2d", # After exclusion ShuffleSplit(n_splits=3), PLSRegression(n_components=10), ] ``` The exclusion chart uses PCA projection to show included vs excluded samples: ```{figure} ../../assets/exclusion_chart.png :align: center :width: 90% :alt: Exclusion Chart PCA projection showing included (green) vs excluded (red) samples. ``` Use `chart_2d` alongside exclusion to see the spectral view of the filtered data: ```{figure} ../../assets/chart_2d_with_exclusion.png :align: center :width: 90% :alt: 2D Chart with Exclusion Spectral view after sample exclusion (outliers removed). ``` ### Options ```python {"exclusion_chart": { "color_by": "status", # 'status', 'y', or 'reason' "n_components": 2, # 2 or 3 for PCA dimensions "show_legend": True, }} ``` | Option | Values | Description | |--------|--------|-------------| | `color_by` | `'status'` | Color by included/excluded (default) | | | `'y'` | Color by target value | | | `'reason'` | Color by exclusion reason | | `n_components` | `2`, `3` | PCA dimensions for visualization | --- ## Controlling Chart Display ### Interactive Display ```python result = nirs4all.run( pipeline=pipeline, dataset="data/", plots_visible=True # Display plots interactively ) ``` ### Save as Artifacts Charts are automatically saved when using artifacts: ```python runner = PipelineRunner( save_artifacts=True, workspace_path="workspace/" ) predictions, _ = runner.run(pipeline, dataset) # Charts saved in: workspace/runs//artifacts/ ``` ### Headless Mode For automated pipelines without display: ```python result = nirs4all.run( pipeline=pipeline, dataset="data/", plots_visible=False # Save only, don't display ) ``` --- ## Complete Example ```python import nirs4all from sklearn.cross_decomposition import PLSRegression from sklearn.model_selection import ShuffleSplit from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative from nirs4all.operators.filters import XOutlierFilter from nirs4all.operators.transforms import GaussianAdditiveNoise pipeline = [ # Visualize raw data "chart_2d", "y_chart", "spectral_distribution", # Preprocessing StandardNormalVariate(), FirstDerivative(), "chart_2d", # After preprocessing # Outlier removal {"sample_filter": { "filters": [XOutlierFilter(method='isolation_forest', contamination=0.05)], }}, "exclusion_chart", # Augmentation {"sample_augmentation": { "transformers": [GaussianAdditiveNoise(sigma=0.01)], "count": 2, }}, "augment_chart", # Cross-validation ShuffleSplit(n_splits=5, test_size=0.2, random_state=42), "fold_chart", # Fold distribution "y_chart", # Y distribution per fold # Model {"model": PLSRegression(n_components=10)}, ] result = nirs4all.run( pipeline=pipeline, dataset="sample_data/regression", plots_visible=True, save_artifacts=True, ) ``` --- ## Best Practices 1. **Place strategically**: Add charts before and after key transformations 2. **Use sparingly in production**: Charts add execution time 3. **Check distributions**: Use `spectral_distribution` and `y_chart` to verify train/test similarity 4. **Monitor folds**: Use `fold_chart` to ensure balanced CV splits 5. **Track exclusions**: Always visualize after outlier removal with `exclusion_chart` 6. **Save artifacts**: Enable `save_artifacts=True` for reproducibility --- ## See Also - {doc}`prediction_charts` - Post-prediction visualization with PredictionAnalyzer - {doc}`shap` - SHAP-based model explanation - {doc}`pipeline_diagram` - Pipeline structure visualization - {doc}`/user_guide/preprocessing/overview` - Preprocessing techniques