nirs4all.visualization.analysis.branch module

Branch Analysis - Statistical analysis and comparison for pipeline branches.

This module provides tools for analyzing and comparing performance across different pipeline branches.

Features:

- Branch summary statistics (mean, std, min, max)
- Statistical significance testing between branches
- DataFrame and LaTeX export for publications
- Nested branch analysis support

Example

>>> from nirs4all.visualization.analysis.branch import BranchAnalyzer
>>> analyzer = BranchAnalyzer(predictions)
>>> summary = analyzer.summary(metrics=['rmse', 'r2'])
>>> print(summary.to_markdown())

class nirs4all.visualization.analysis.branch.BranchAnalyzer(predictions)

Bases: object

Analyze and compare performance across pipeline branches.

Provides statistical analysis, hypothesis testing, and comparison tools for branched pipeline results.

predictions: Predictions object containing prediction data.

compare(branch1: str | int, branch2: str | int, metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → Dict[str, Any]

Statistical comparison between two branches.

Performs a hypothesis test to determine whether the two branches' scores on the given metric differ significantly.

Parameters:

branch1 – First branch name or ID.
branch2 – Second branch name or ID.
metric – Metric to compare (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
test – Statistical test: ‘ttest’, ‘wilcoxon’, or ‘mannwhitney’ (default: ‘ttest’).

Returns:

Dictionary with the following keys:

statistic: Test statistic.
p_value: P-value of the test.
significant: Whether the difference is significant at alpha = 0.05.
branch1_mean: Mean score of branch1.
branch2_mean: Mean score of branch2.
effect_size: Cohen’s d effect size.

Return type:

Dict[str, Any]

Raises:

ImportError – If scipy is not available.
ValueError – If either branch is not found or there is insufficient data.
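For reference, Cohen’s d is commonly defined as the difference of the two sample means divided by the pooled standard deviation. The standard-library sketch below uses that pooled-variance definition; whether compare() uses exactly this variant (rather than an unpooled or bias-corrected form) is an assumption, and cohens_d is a hypothetical helper, not part of nirs4all. The p-value, which needs the t distribution (for example from SciPy), is not computed here.

```python
import math
import statistics
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Hypothetical helper: the exact effect-size formula used by nirs4all is assumed.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # Sample variances (n - 1 denominator)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pool the variances, weighting each by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


# Per-fold RMSE scores from two hypothetical branches
rmse_branch_a = [0.42, 0.45, 0.40, 0.44]
rmse_branch_b = [0.50, 0.52, 0.49, 0.53]
d = cohens_d(rmse_branch_a, rmse_branch_b)
```

With this sign convention a positive d means the first sample has the higher mean, so for an error metric such as RMSE a negative d favours the first branch.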

get_branch_ids() → List[int]

Get list of unique branch IDs.

Returns: List of branch IDs.

get_branch_names() → List[str]

Get list of unique branch names.

Returns: List of branch names.

pairwise_comparison(metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → DataFrame

Compute pairwise statistical comparisons between all branches.

Parameters:

metric – Metric to compare (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
test – Statistical test to use, as in compare() (default: ‘ttest’).

Returns:

DataFrame with p-values for all branch pairs.

Raises:

ImportError – If pandas or scipy is not available.
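Conceptually, a pairwise comparison runs the two-sample test once per unordered pair of branches and arranges the p-values in a symmetric matrix. The sketch below shows that structure with the standard library only; pairwise_pvalues and its test callable are hypothetical, a nested dict stands in for the DataFrame the method returns, and setting the diagonal to 1.0 is a convention of this sketch that may differ from nirs4all.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

# A two-sample test that returns a p-value (hypothetical signature)
PValueTest = Callable[[Sequence[float], Sequence[float]], float]


def pairwise_pvalues(
    scores: Dict[str, List[float]], test: PValueTest
) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of p-values for every pair of branches (sketch)."""
    names = list(scores)
    # Diagonal: a branch compared with itself (convention here: p = 1.0)
    matrix = {row: {col: 1.0 for col in names} for row in names}
    # Test each unordered pair once and mirror the result across the diagonal
    for a, b in combinations(names, 2):
        p = test(scores[a], scores[b])
        matrix[a][b] = p
        matrix[b][a] = p
    return matrix
```

With SciPy installed, a real test can be plugged in, for example `lambda x, y: scipy.stats.ttest_ind(x, y).pvalue`.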

rank_branches(metric: str = 'rmse', partition: str = 'test', ascending: bool | None = None) → List[Dict[str, Any]]

Rank branches by mean performance.

Parameters:

metric – Metric to rank by (default: ‘rmse’).
partition – Partition for scores (default: ‘test’).
ascending – Sort order. If None, the sort direction is inferred from the metric.

Returns:

List of dictionaries, one per branch, with keys branch_name, mean, std, and rank.
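Ranking reduces each branch’s scores to a mean and standard deviation and then sorts the branches. The sketch below assumes that when ascending is None, metrics in a hypothetical set of error metrics (LOWER_IS_BETTER) are sorted ascending and all others descending; the metric list nirs4all actually uses for auto-detection is not documented here, and rank_by_mean is a hypothetical helper.

```python
import statistics
from typing import Any, Dict, List, Optional

# Assumption: metrics for which a smaller value is better (hypothetical list)
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


def rank_by_mean(
    scores: Dict[str, List[float]], metric: str, ascending: Optional[bool] = None
) -> List[Dict[str, Any]]:
    """Rank branches by mean score; rank 1 is the best branch."""
    if ascending is None:
        ascending = metric.lower() in LOWER_IS_BETTER
    rows = [
        {
            "branch_name": name,
            "mean": statistics.fmean(values),
            # The sample std needs at least two values; report 0.0 otherwise
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in scores.items()
    ]
    rows.sort(key=lambda row: row["mean"], reverse=not ascending)
    for position, row in enumerate(rows, start=1):
        row["rank"] = position
    return rows


# snv_pls has the lower mean RMSE, so it receives rank 1
ranking = rank_by_mean({"snv_pls": [0.42, 0.45], "msc_pls": [0.50, 0.52]}, metric="rmse")
```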

summary(metrics: List[str] | None = None, partition: str = 'test', aggregate: str | None = None) → BranchSummary

Generate summary statistics for each branch.

For each branch, computes the mean, std, min, and max of every requested metric.

Parameters:

metrics – List of metrics to compute (default: [‘rmse’, ‘r2’]).
partition – Partition to compute metrics from (default: ‘test’).
aggregate – If provided, aggregate predictions by this column before computing statistics.

Returns:

BranchSummary object with statistics.
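Computing per-branch statistics amounts to grouping scores by branch and reducing each group. The sketch below starts from a flat list of hypothetical records (a branch_name plus one score per metric) and returns one row per branch with `<metric>_mean`, `<metric>_std`, `<metric>_min`, and `<metric>_max` keys. These column names and the record layout are assumptions, not the documented layout of BranchSummary.data, and the aggregate option is not modelled.

```python
import statistics
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def summarize(records: Sequence[Dict[str, Any]], metrics: Sequence[str]) -> List[Dict[str, Any]]:
    """One row of mean/std/min/max per branch and metric (hypothetical sketch)."""
    # Group records by branch, keeping first-seen branch order
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[record["branch_name"]].append(record)
    rows = []
    for name, group in grouped.items():
        row: Dict[str, Any] = {"branch_name": name, "n": len(group)}
        for metric in metrics:
            values = [r[metric] for r in group]
            row[f"{metric}_mean"] = statistics.fmean(values)
            # The sample std is undefined for a single value; use 0.0 here
            row[f"{metric}_std"] = statistics.stdev(values) if len(values) > 1 else 0.0
            row[f"{metric}_min"] = min(values)
            row[f"{metric}_max"] = max(values)
        rows.append(row)
    return rows
```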

class nirs4all.visualization.analysis.branch.BranchSummary(data: List[Dict[str, Any]], metrics: List[str])

Bases: object

Branch summary statistics container with export capabilities.

Provides DataFrame-like access and export to markdown, LaTeX, and CSV.

data: List of dictionaries with branch statistics.

metrics: List of metrics computed.

columns: Column names in order.

__getitem__(key: int | str) → Dict[str, Any]

Get branch by index or name.

Parameters: key – Integer index or branch name string.
Returns: Dictionary with branch statistics.
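Access by integer index or by branch name can be dispatched on the type of the key. The sketch below assumes rows are stored as dictionaries with a branch_name key; MiniSummary is a hypothetical stand-in rather than the real BranchSummary, and raising KeyError for an unknown name is an assumption of this sketch.

```python
from typing import Any, Dict, List, Union


class MiniSummary:
    """Hypothetical stand-in illustrating index-or-name lookup."""

    def __init__(self, data: List[Dict[str, Any]]) -> None:
        self.data = data

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, key: Union[int, str]) -> Dict[str, Any]:
        if isinstance(key, int):
            # Positional access; negative indices behave as for lists
            return self.data[key]
        # Otherwise treat the key as a branch name
        for row in self.data:
            if row["branch_name"] == key:
                return row
        raise KeyError(key)
```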

__len__() → int: Number of branches.

__repr__() → str: String representation.

to_csv(path: str, precision: int = 6) → None

Export to CSV file.

Parameters:

path – Output file path.
precision – Decimal places for floating point values.
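Applying precision typically means rounding floating-point cells before they are written. The standard-library sketch below shows that step; it writes to any text stream (an io.StringIO in the example rather than a file path), and write_rows_csv is a hypothetical helper. Whether nirs4all rounds values or formats them with a fixed number of digits is not stated, so rounding is an assumption.

```python
import csv
import io
from typing import Any, Dict, List, Sequence, TextIO


def write_rows_csv(
    rows: List[Dict[str, Any]], columns: Sequence[str], stream: TextIO, precision: int = 6
) -> None:
    """Write rows as CSV, rounding float cells to `precision` decimal places."""
    writer = csv.DictWriter(stream, fieldnames=list(columns))
    writer.writeheader()
    for row in rows:
        cleaned = {}
        for col in columns:
            value = row.get(col, "")
            # Round floats only; names and integer counts are written unchanged
            cleaned[col] = round(value, precision) if isinstance(value, float) else value
        writer.writerow(cleaned)


buffer = io.StringIO()
write_rows_csv(
    [{"branch_name": "snv", "rmse_mean": 0.123456789}],
    ["branch_name", "rmse_mean"],
    buffer,
    precision=3,
)
```

When writing to a real file, open it with `open(path, "w", newline="")` so the csv module controls line endings.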

to_dataframe() → DataFrame

Convert to pandas DataFrame.

Returns: pandas DataFrame with branch statistics.
Raises: ImportError – If pandas is not installed.

to_dict() → Dict[str, Dict[str, Any]]

Convert to dictionary keyed by branch name.

Returns: Dictionary mapping branch_name to statistics.

to_latex(caption: str = 'Branch Performance Comparison', label: str = 'tab:branch_comparison', precision: int = 3, include_std: bool = True, mean_std_combined: bool = True) → str

Export as LaTeX table for publications.

Parameters:

caption – Table caption.
label – LaTeX label for referencing.
precision – Decimal places for floating point values.
include_std – If True, include std columns.
mean_std_combined – If True, combine mean and std into a single “mean ± std” cell.

Returns:

LaTeX-formatted table string.
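With mean_std_combined=True, each metric cell holds a single “mean ± std” value. The sketch below shows one way to build such a table with the standard library: a tabular inside a table float carrying \caption and \label, with `$\pm$` between mean and std. The column layout, the hypothetical `<metric>_mean` and `<metric>_std` row keys, and the helper name latex_table are assumptions, and the real to_latex() output may differ in rules or alignment. Branch names are escaped because underscores and similar characters would otherwise break compilation.

```python
from typing import Any, Dict, List, Sequence

# Characters that must be backslash-escaped in LaTeX text
_LATEX_SPECIALS = {c: "\\" + c for c in "&%$#_{}"}


def _escape(text: str) -> str:
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def latex_table(
    rows: List[Dict[str, Any]],
    metrics: Sequence[str],
    caption: str,
    label: str,
    precision: int = 3,
) -> str:
    """Build a LaTeX table with one "mean ± std" cell per metric (sketch)."""
    header = " & ".join(["Branch"] + [_escape(m) for m in metrics])
    lines = [
        "\\begin{table}[ht]",
        "\\centering",
        f"\\caption{{{caption}}}",
        f"\\label{{{label}}}",
        "\\begin{tabular}{l" + "c" * len(metrics) + "}",
        "\\hline",
        header + " \\\\",
        "\\hline",
    ]
    for row in rows:
        cells = [_escape(str(row["branch_name"]))]
        for m in metrics:
            mean, std = row[f"{m}_mean"], row[f"{m}_std"]
            # Math mode so that \pm renders as the plus-minus sign
            cells.append(f"${mean:.{precision}f} \\pm {std:.{precision}f}$")
        lines.append(" & ".join(cells) + " \\\\")
    lines += ["\\hline", "\\end{tabular}", "\\end{table}"]
    return "\n".join(lines)
```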

to_markdown(precision: int = 3, include_std: bool = True) → str

Export as markdown table.

Parameters:

precision – Decimal places for floating point values.
include_std – If True, include std columns.

Returns:

Markdown-formatted table string.
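A markdown table consists of a header row, a separator row, and one pipe-delimited row per branch. The sketch below assumes each row dictionary holds hypothetical `<metric>_mean` and `<metric>_std` keys; markdown_table is a hypothetical helper, and the exact column names produced by to_markdown() may differ.

```python
from typing import Any, Dict, List, Sequence


def markdown_table(
    rows: List[Dict[str, Any]],
    metrics: Sequence[str],
    precision: int = 3,
    include_std: bool = True,
) -> str:
    """Render branch statistics as a markdown table (sketch)."""
    columns = ["branch_name"]
    for m in metrics:
        columns.append(f"{m}_mean")
        if include_std:
            columns.append(f"{m}_std")
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "|".join("---" for _ in columns) + "|",
    ]
    for row in rows:
        cells = []
        for col in columns:
            value = row[col]
            # Fixed number of decimals for floats; other values as plain text
            cells.append(f"{value:.{precision}f}" if isinstance(value, float) else str(value))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```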