nirs4all.visualization.analysis.shap module

SHAP Analyzer - Model explainability using SHAP values

This module provides SHAP-based explanations for NIRS models, including spectral importance visualizations that highlight which wavelengths contribute most to predictions.

class nirs4all.visualization.analysis.shap.ShapAnalyzer[source]

Bases: object

SHAP-based model explainability analyzer for NIRS models.

Provides explanations showing which wavelengths/features are most important for model predictions, with specialized visualizations for spectral data.

explain_model(model: Any, X: ndarray, y: ndarray | None = None, feature_names: List[str] | None = None, sample_indices: List[int] | None = None, task_type: str = 'regression', n_background: int = 100, explainer_type: str = 'auto', output_dir: str | None = None, visualizations: List[str] | None = None, bin_size=20, bin_stride=10, bin_aggregation='sum', plots_visible=True) → Dict[str, Any][source]

Explain model predictions using SHAP values.

Parameters:

model – Trained model to explain
X – Input features (samples x features)
y – Target values (optional, for reference)
feature_names – Names of features/wavelengths
sample_indices – Specific samples to explain (None = all)
task_type – 'regression' or 'classification'
n_background – Number of background samples for KernelExplainer
explainer_type – 'auto', 'tree', 'deep', 'kernel', 'linear'
output_dir – Directory to save visualizations
visualizations – List of viz types to generate
bin_size – Number of wavelengths per bin. Can be:
    - int: same for all visualizations
    - dict: {'spectral': 20, 'waterfall': 30, 'beeswarm': 50}
bin_stride – Step size between bins. Can be:
    - int: same for all visualizations
    - dict: {'spectral': 10, 'waterfall': 15, 'beeswarm': 25}
bin_aggregation – Aggregation method. Can be:
    - str: same for all visualizations ('sum', 'sum_abs', 'mean', 'mean_abs')
    - dict: {'spectral': 'sum', 'waterfall': 'mean', 'beeswarm': 'sum_abs'}

Returns:

Dictionary with SHAP results
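The binning parameters accept either a single value or a per-visualization dict. The sketch below illustrates one way such a value could be resolved for a given visualization. The helper name resolve_setting and the fallback to a default when a key is missing are assumptions made for illustration; they are not taken from the nirs4all source.

```python
from typing import Dict, TypeVar, Union

T = TypeVar("T")


def resolve_setting(value: Union[T, Dict[str, T]], viz: str, default: T) -> T:
    """Return the setting that applies to one visualization.

    Hypothetical helper: a plain value applies to every visualization,
    a dict is looked up by visualization name. Falling back to `default`
    for a missing key is an assumption, not documented behaviour.
    """
    if isinstance(value, dict):
        return value.get(viz, default)
    return value


# Per-visualization bin sizes, as in the parameter description above.
bin_size = {"spectral": 20, "waterfall": 30, "beeswarm": 50}
# A single stride shared by all visualizations.
bin_stride = 10

spectral_size = resolve_setting(bin_size, "spectral", default=20)
waterfall_stride = resolve_setting(bin_stride, "waterfall", default=10)
```

With these inputs, spectral_size comes from the dict entry and waterfall_stride from the shared integer.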

get_feature_importance(top_n: int | None = None) → Dict[str, float][source]

Get feature importance ranking based on mean absolute SHAP values.

Parameters:

top_n – Return only top N features (None = all)

Returns:

Dictionary mapping feature index to importance score
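A ranking by mean absolute SHAP value can be sketched in a few lines. The example uses plain Python lists to stay dependency-free. The function name mean_abs_importance, the sample data, and the choice of wavelength labels as dictionary keys are assumptions for illustration; the actual method may key its result differently.

```python
from typing import Dict, List, Optional


def mean_abs_importance(
    shap_values: List[List[float]],
    feature_labels: List[str],
    top_n: Optional[int] = None,
) -> Dict[str, float]:
    """Rank features by the mean of |SHAP value| across samples.

    `shap_values` is laid out as samples x features. Hypothetical helper,
    not the library implementation.
    """
    n_samples = len(shap_values)
    scores = {
        label: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, label in enumerate(feature_labels)
    }
    # Most important feature first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)


# Two samples, three wavelengths (made-up values).
shap_values = [
    [0.5, -2.0, 0.0],
    [-1.5, 1.0, 0.5],
]
labels = ["1100nm", "1110nm", "1120nm"]

top = mean_abs_importance(shap_values, labels, top_n=2)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a wavelength that pushes predictions strongly in both directions still ranks high.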

static load_results(input_path: str) → Dict[str, Any][source]

Load SHAP results from disk using the new serializer.

plot_beeswarm(feature_names: List[str] | None = None, output_path: str | None = None, max_display: int = 20, plots_visible: bool = True)[source]

Create SHAP beeswarm plot.

plot_beeswarm_binned(output_path: str | None = None, max_display: int = 20, plots_visible: bool = True)[source]

Create SHAP beeswarm plot with binned features.

Bins wavelengths/features according to the bin_size and bin_stride parameters, then displays a beeswarm plot of the aggregated SHAP values.
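The binning step can be pictured as a sliding window over one sample's SHAP values: each window covers bin_size consecutive wavelengths, starts bin_stride positions after the previous one, and is reduced with the chosen aggregation. The sketch below is a minimal illustration under stated assumptions. The function name bin_shap_values, the AGGREGATORS table, and the decision to drop a trailing partial window are all hypothetical and may differ from what the library does.

```python
from typing import Callable, Dict, List, Tuple

# The four aggregation names listed for bin_aggregation.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda v: sum(v),
    "sum_abs": lambda v: sum(abs(x) for x in v),
    "mean": lambda v: sum(v) / len(v),
    "mean_abs": lambda v: sum(abs(x) for x in v) / len(v),
}


def bin_shap_values(
    values: List[float],
    bin_size: int,
    bin_stride: int,
    aggregation: str = "sum",
) -> List[Tuple[int, int, float]]:
    """Return (start, end, aggregated value) for each full window.

    Assumption: only full windows are kept; a trailing partial window
    is dropped.
    """
    aggregate = AGGREGATORS[aggregation]
    bins = []
    for start in range(0, len(values) - bin_size + 1, bin_stride):
        end = start + bin_size
        bins.append((start, end, aggregate(values[start:end])))
    return bins


# SHAP values for one sample over six wavelengths (made-up values).
sample = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0]

# A stride smaller than the bin size gives overlapping windows.
summed = bin_shap_values(sample, bin_size=4, bin_stride=2, aggregation="sum")
mean_abs = bin_shap_values(sample, bin_size=4, bin_stride=2, aggregation="mean_abs")
```

Here both windows sum to the same signed value while their mean absolute values differ, which shows how the aggregation choice changes what a bin reports.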

plot_dependence(feature_idx: int, feature_names: List[str] | None = None, output_path: str | None = None, interaction_index: int | None = None, plots_visible: bool = True)[source]

Create SHAP dependence plot for a specific feature.

plot_force(sample_idx: int = 0, feature_names: List[str] | None = None, output_path: str | None = None, plots_visible: bool = True)[source]

Create SHAP force plot for a single sample.

plot_spectral_importance(feature_names: List[str] | None = None, output_path: str | None = None, figsize: Tuple[int, int] = (16, 10), plots_visible: bool = True)[source]

Create NIRS-specific spectral importance visualization with binned regions.

Shows important spectral regions (not individual wavelengths) by binning wavelengths and aggregating SHAP values. This is more robust and meaningful for NIRS analysis than point-by-point importance.

Uses self.bin_size, self.bin_stride, and self.bin_aggregation configured in explain_model().

plot_summary(feature_names: List[str] | None = None, output_path: str | None = None, max_display: int = 20, plots_visible: bool = True)[source]

Create SHAP summary plot showing feature importance.

plot_waterfall(sample_idx: int = 0, feature_names: List[str] | None = None, output_path: str | None = None, max_display: int = 20, plots_visible: bool = True)[source]

Create SHAP waterfall plot for a single sample.

plot_waterfall_binned(sample_idx: int = 0, output_path: str | None = None, max_display: int = 20, plots_visible: bool = True)[source]

Create SHAP waterfall plot with binned features for a single sample.

Bins wavelengths/features according to the bin_size and bin_stride parameters, then displays a waterfall plot of the aggregated SHAP values.
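When reading a binned waterfall, it helps to remember that SHAP values for one sample add up to the prediction minus the base value. The sketch below checks when that additivity survives binning with 'sum' aggregation. It is a property of the arithmetic only, not a description of how plot_waterfall_binned renders its bars. The helper sum_bins, the made-up values, and the full-window-only policy are assumptions.

```python
from typing import List


def sum_bins(values: List[float], bin_size: int, bin_stride: int) -> List[float]:
    """Sum values over each full sliding window (hypothetical helper).

    Assumption: trailing partial windows are dropped.
    """
    last_start = len(values) - bin_size
    return [
        sum(values[start:start + bin_size])
        for start in range(0, last_start + 1, bin_stride)
    ]


base_value = 10.0
shap_row = [0.5, -0.25, 1.0, 0.25, -0.5, 0.5]  # made-up values
prediction = base_value + sum(shap_row)  # SHAP additivity for one sample

# Stride equal to bin size: the bins tile the spectrum with no overlap.
tiled = sum_bins(shap_row, bin_size=2, bin_stride=2)
# Stride smaller than bin size: middle wavelengths fall into two bins.
overlapping = sum_bins(shap_row, bin_size=4, bin_stride=2)

reconstructed_tiled = base_value + sum(tiled)
reconstructed_overlapping = base_value + sum(overlapping)
```

With tiled bins the binned contributions still reconstruct the prediction exactly. With overlapping bins the shared wavelengths are counted twice, so the bars are better read as regional importance than as an exact decomposition.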

save_results(results: Dict[str, Any], output_path: str)[source]

Save SHAP results to disk using the new serializer.