nirs4all.visualization.analysis.branch module
Branch Analysis - Statistical analysis and comparison for pipeline branches.
This module provides tools for analyzing and comparing performance across different pipeline branches.
Features:
- Branch summary statistics (mean, std, min, max)
- Statistical significance testing between branches
- DataFrame and LaTeX export for publications
- Nested branch analysis support
Example
>>> from nirs4all.visualization.analysis.branch import BranchAnalyzer
>>> analyzer = BranchAnalyzer(predictions)
>>> summary = analyzer.summary(metrics=['rmse', 'r2'])
>>> print(summary.to_markdown())
- class nirs4all.visualization.analysis.branch.BranchAnalyzer(predictions)
Bases: object
Analyze and compare performance across pipeline branches.
Provides statistical analysis, hypothesis testing, and comparison tools for branched pipeline results.
- predictions
Predictions object containing prediction data.
- compare(branch1: str | int, branch2: str | int, metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → Dict[str, Any]
Statistical comparison between two branches.
Performs a hypothesis test to determine whether the two branches differ significantly on the chosen metric.
- Parameters:
branch1 – First branch name or ID.
branch2 – Second branch name or ID.
metric – Metric to compare (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
test – Statistical test: ‘ttest’, ‘wilcoxon’, or ‘mannwhitney’ (default: ‘ttest’).
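- Returns:
Dictionary with the following keys:
statistic: Test statistic.
p_value: P-value of the test.
significant: Whether the difference is significant at alpha = 0.05.
branch1_mean: Mean of the metric for branch1.
branch2_mean: Mean of the metric for branch2.
effect_size: Cohen’s d effect size.
- Return type:
Dict[str, Any]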
- Raises:
ImportError – If scipy is not available.
ValueError – If either branch is not found or there is insufficient data.
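The effect_size returned by compare is Cohen’s d. The following is a minimal, standard-library-only sketch of that quantity together with the equal-variance (Student’s) two-sample t statistic. It assumes the pooled-standard-deviation form of Cohen’s d; which variant compare actually uses, and how it obtains the p-value through scipy, is not documented here. The helper name cohens_d_and_t and the sample scores are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance
from typing import Sequence, Tuple


def cohens_d_and_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (Cohen's d, Student's t) for two independent score samples.

    Hypothetical helper: both quantities use the pooled standard deviation,
    i.e. the equal-variance assumption of Student's two-sample t-test.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each branch needs at least two scores")
    # statistics.variance is the sample variance (divides by n - 1)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    pooled_sd = sqrt(pooled_var)
    diff = fmean(a) - fmean(b)
    d = diff / pooled_sd                                # effect size in pooled-SD units
    t = diff / (pooled_sd * sqrt(1 / n1 + 1 / n2))      # Student's t statistic
    return d, t


# Illustrative RMSE scores for two branches (made-up numbers)
d, t = cohens_d_and_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"d = {d:.3f}, t = {t:.3f}")
```

A negative d here means the first branch has the lower mean score, which for an error metric such as RMSE is the better one.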
- get_branch_names() → List[str]
Get list of unique branch names.
- Returns:
List of branch names.
- pairwise_comparison(metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → DataFrame
Compute pairwise statistical comparisons between all branches.
- Parameters:
metric – Metric to compare (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
test – Statistical test to use.
- Returns:
DataFrame with p-values for all branch pairs.
- Raises:
ImportError – If pandas or scipy is not available.
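A pairwise comparison runs one test per unordered pair of branches, so k branches yield k(k-1)/2 tests, which can be arranged in a symmetric matrix. The sketch below shows that structure with plain dictionaries instead of a pandas DataFrame. Because scipy may not be available, it substitutes an exact two-sided permutation test on the difference in means. This is a different technique from ‘ttest’, ‘wilcoxon’, and ‘mannwhitney’. The helper names, the branch names, and the 1.0 on the diagonal are assumptions, not the library’s behavior.

```python
from itertools import combinations
from statistics import fmean
from typing import Callable, Dict, Sequence

Scores = Sequence[float]


def permutation_p_value(a: Scores, b: Scores) -> float:
    """Exact two-sided permutation test on the difference in means.

    Stand-in test (not one of the library's scipy tests). It enumerates every
    relabeling of the pooled scores, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so float ties with the observed statistic count as hits
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


def pairwise_matrix(
    scores: Dict[str, Scores],
    test: Callable[[Scores, Scores], float] = permutation_p_value,
) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of p-values, one test per unordered branch pair."""
    names = list(scores)
    # Assumption: a branch compared with itself is reported as p = 1.0
    matrix = {row: {col: 1.0 for col in names} for row in names}
    for first, second in combinations(names, 2):
        p = test(scores[first], scores[second])
        matrix[first][second] = matrix[second][first] = p
    return matrix


# Hypothetical branch names and made-up RMSE scores
matrix = pairwise_matrix({
    "branch_a": [1.0, 2.0, 3.0],
    "branch_b": [4.0, 5.0, 6.0],
    "branch_c": [1.0, 2.0, 3.0],
})
for row, cells in matrix.items():
    print(row, {col: round(p, 3) for col, p in cells.items()})
```

Passing a different callable as test swaps the statistical test without changing the matrix layout.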
- rank_branches(metric: str = 'rmse', partition: str = 'test', ascending: bool | None = None) → List[Dict[str, Any]]
Rank branches by mean performance.
- Parameters:
metric – Metric to rank by (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
ascending – Sort order. If None, the sort direction is inferred from the metric so that the best-performing branch ranks first.
- Returns:
List of dictionaries with the keys branch_name, mean, std, and rank.
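Ranking reduces each branch to the mean of its scores and sorts the branches. When ascending is None, the direction depends on whether the metric is error-like (lower is better) or score-like (higher is better). The following is a minimal sketch that assumes a hypothetical set of higher-is-better metric names and 1-based ranks. The library’s actual detection rule is not documented here, and it may compute std differently; this sketch uses the population standard deviation.

```python
from statistics import fmean, pstdev
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical list of metrics where larger values are better
HIGHER_IS_BETTER = {"r2", "accuracy", "balanced_accuracy", "f1"}


def rank_by_mean(
    scores: Dict[str, Sequence[float]],
    metric: str = "rmse",
    ascending: Optional[bool] = None,
) -> List[Dict[str, Any]]:
    if ascending is None:
        # Error metrics (e.g. RMSE) sort ascending; scores such as R2 sort descending
        ascending = metric.lower() not in HIGHER_IS_BETTER
    rows = [
        {"branch_name": name, "mean": fmean(vals), "std": pstdev(vals)}
        for name, vals in scores.items()
    ]
    rows.sort(key=lambda row: row["mean"], reverse=not ascending)
    for position, row in enumerate(rows, start=1):
        row["rank"] = position  # rank 1 is the best branch
    return rows


# Made-up scores for two hypothetical branches
rmse = {"branch_a": [0.50, 0.70], "branch_b": [0.30, 0.40]}
r2 = {"branch_a": [0.90, 0.92], "branch_b": [0.80, 0.85]}
print([row["branch_name"] for row in rank_by_mean(rmse, "rmse")])  # ['branch_b', 'branch_a']
print([row["branch_name"] for row in rank_by_mean(r2, "r2")])      # ['branch_a', 'branch_b']
```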
- summary(metrics: List[str] | None = None, partition: str = 'test', aggregate: str | None = None) → BranchSummary
Generate summary statistics for each branch.
Computes the mean, std, min, and max of each metric for every branch.
- Parameters:
metrics – List of metrics to compute (default: [‘rmse’, ‘r2’]).
partition – Partition to compute metrics from (default: ‘test’).
aggregate – If provided, aggregate predictions by this column before computing statistics.
- Returns:
BranchSummary object with statistics.
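Conceptually, a per-branch summary groups score records by branch and reduces each metric to four statistics. The sketch below does this for flat records. It assumes a record layout with a branch_name key plus one key per metric, and it uses hypothetical <metric>_mean / _std / _min / _max column names. It also uses the population standard deviation (NumPy’s default ddof=0), because the library’s choice is not stated here.

```python
from collections import defaultdict
from statistics import fmean, pstdev
from typing import Any, Dict, List, Sequence


def summarize(
    records: Sequence[Dict[str, Any]],
    metrics: Sequence[str] = ("rmse", "r2"),
) -> List[Dict[str, Any]]:
    # Collect each metric's values per branch; dicts keep first-seen branch order
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for metric in metrics:
            grouped[rec["branch_name"]][metric].append(float(rec[metric]))
    rows = []
    for branch, per_metric in grouped.items():
        row: Dict[str, Any] = {"branch_name": branch}
        for metric in metrics:
            values = per_metric[metric]
            row[f"{metric}_mean"] = fmean(values)
            row[f"{metric}_std"] = pstdev(values)  # population std (ddof=0)
            row[f"{metric}_min"] = min(values)
            row[f"{metric}_max"] = max(values)
        rows.append(row)
    return rows


# Made-up scores for two hypothetical branches
records = [
    {"branch_name": "branch_a", "rmse": 1.0, "r2": 0.90},
    {"branch_name": "branch_a", "rmse": 3.0, "r2": 0.80},
    {"branch_name": "branch_b", "rmse": 2.0, "r2": 0.85},
]
for row in summarize(records):
    print(row)
```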
- class nirs4all.visualization.analysis.branch.BranchSummary(data: List[Dict[str, Any]], metrics: List[str])
Bases: object
Branch summary statistics container with export capabilities.
Provides DataFrame-like access and export to Markdown, LaTeX, and CSV.
- data
List of dictionaries with branch statistics.
- metrics
List of metrics computed.
- columns
Column names in order.
- __getitem__(key: int | str) → Dict[str, Any]
Get branch by index or name.
- Parameters:
key – Integer index or branch name string.
- Returns:
Dictionary with branch statistics.
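Indexing accepts either a position or a branch name. The following is a minimal sketch of that dispatch over a list of row dictionaries. It assumes that each row stores its name under a branch_name key and that an unknown name raises KeyError. The class SummaryRows is a hypothetical stand-in, not BranchSummary itself.

```python
from typing import Any, Dict, List, Union


class SummaryRows:
    """Hypothetical stand-in illustrating index-or-name lookup."""

    def __init__(self, data: List[Dict[str, Any]]) -> None:
        self.data = data

    def __getitem__(self, key: Union[int, str]) -> Dict[str, Any]:
        if isinstance(key, int):
            return self.data[key]  # positional access; negative indices also work
        for row in self.data:
            if row.get("branch_name") == key:
                return row
        raise KeyError(f"unknown branch: {key!r}")


# Made-up statistics for two hypothetical branches
rows = SummaryRows([
    {"branch_name": "branch_a", "rmse_mean": 0.42},
    {"branch_name": "branch_b", "rmse_mean": 0.37},
])
print(rows[1]["branch_name"], rows["branch_a"]["rmse_mean"])
```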
- to_csv(path: str, precision: int = 6) → None
Export to CSV file.
- Parameters:
path – Output file path.
precision – Decimal places for floating point values.
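The following is a minimal sketch of precision-controlled CSV export using the standard csv module. It assumes that the rows are dictionaries sharing one column order and that precision means formatting floats to that many decimal places. The helper write_summary_csv, the file name, and the values are hypothetical.

```python
import csv
import os
import tempfile
from typing import Any, Dict, List, Sequence


def write_summary_csv(
    path: str,
    rows: List[Dict[str, Any]],
    columns: Sequence[str],
    precision: int = 6,
) -> None:
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)  # header row
        for row in rows:
            # Fixed number of decimals for floats; other values written unchanged
            writer.writerow(
                f"{row[col]:.{precision}f}" if isinstance(row[col], float) else row[col]
                for col in columns
            )


path = os.path.join(tempfile.gettempdir(), "branch_summary.csv")
write_summary_csv(
    path,
    [{"branch_name": "branch_a", "rmse_mean": 0.123456789}],
    ["branch_name", "rmse_mean"],
    precision=3,
)
with open(path, encoding="utf-8") as handle:
    print(handle.read())
```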
- to_dataframe() → DataFrame
Convert to pandas DataFrame.
- Returns:
pandas DataFrame with branch statistics.
- Raises:
ImportError – If pandas is not installed.
- to_dict() → Dict[str, Dict[str, Any]]
Convert to dictionary keyed by branch name.
- Returns:
Dictionary mapping branch_name to statistics.
- to_latex(caption: str = 'Branch Performance Comparison', label: str = 'tab:branch_comparison', precision: int = 3, include_std: bool = True, mean_std_combined: bool = True) → str
Export as LaTeX table for publications.
- Parameters:
caption – Table caption.
label – LaTeX label for referencing.
precision – Decimal places for floating point values.
include_std – If True, include std columns.
mean_std_combined – If True, format each metric as a single “mean ± std” entry.
- Returns:
LaTeX-formatted table string.
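With combined formatting, each metric occupies one “mean ± std” cell. The following is a minimal sketch of how such a table could be assembled. It makes several assumptions: $\pm$ renders the ± sign, booktabs rules are used (\toprule, \midrule, \bottomrule, which require \usepackage{booktabs}), and each row carries hypothetical <metric>_mean and <metric>_std keys. The library’s actual LaTeX output may differ. Underscores in branch names are escaped because they are special characters in LaTeX.

```python
from typing import Any, Dict, List, Sequence


def latex_table(
    rows: List[Dict[str, Any]],
    metrics: Sequence[str],
    caption: str = "Branch Performance Comparison",
    label: str = "tab:branch_comparison",
    precision: int = 3,
) -> str:
    header = " & ".join(["Branch", *(m.upper() for m in metrics)]) + r" \\"
    body = []
    for row in rows:
        # Escape underscores so names like "branch_a" compile in LaTeX
        cells = [str(row["branch_name"]).replace("_", r"\_")]
        for m in metrics:
            mean, std = row[m + "_mean"], row[m + "_std"]
            cells.append(f"{mean:.{precision}f} $\\pm$ {std:.{precision}f}")
        body.append(" & ".join(cells) + r" \\")
    lines = [
        r"\begin{table}[htbp]",
        r"\centering",
        f"\\caption{{{caption}}}",
        f"\\label{{{label}}}",
        r"\begin{tabular}{l" + "c" * len(metrics) + "}",
        r"\toprule",
        header,
        r"\midrule",
        *body,
        r"\bottomrule",
        r"\end{tabular}",
        r"\end{table}",
    ]
    return "\n".join(lines)


# Made-up statistics for one hypothetical branch
rows = [{"branch_name": "branch_a", "rmse_mean": 0.1234, "rmse_std": 0.01,
         "r2_mean": 0.9, "r2_std": 0.02}]
tex = latex_table(rows, ["rmse", "r2"])
print(tex)
```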