nirs4all.visualization.analysis.branch module

Branch Analysis - Statistical analysis and comparison for pipeline branches.

This module provides tools for analyzing and comparing performance across different pipeline branches.

Features:
  • Branch summary statistics (mean, std, min, max)

  • Statistical significance testing between branches

  • DataFrame and LaTeX export for publications

  • Nested branch analysis support

Example

>>> from nirs4all.visualization.analysis.branch import BranchAnalyzer
>>> analyzer = BranchAnalyzer(predictions)
>>> summary = analyzer.summary(metrics=['rmse', 'r2'])
>>> print(summary.to_markdown())
class nirs4all.visualization.analysis.branch.BranchAnalyzer(predictions)

Bases: object

Analyze and compare performance across pipeline branches.

Provides statistical analysis, hypothesis testing, and comparison tools for branched pipeline results.

predictions

Predictions object containing prediction data.

compare(branch1: str | int, branch2: str | int, metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → Dict[str, Any]

Statistical comparison between two branches.

Runs a hypothesis test to determine whether the two branches differ significantly on the chosen metric.

Parameters:
  • branch1 – First branch name or ID.

  • branch2 – Second branch name or ID.

  • metric – Metric to compare (default: ‘rmse’).

  • partition – Partition for scores (default: ‘test’).

  • test – Statistical test to use: ‘ttest’, ‘wilcoxon’, or ‘mannwhitney’ (default: ‘ttest’).

Returns:

Dictionary with:

  • statistic: Test statistic

  • p_value: P-value

  • significant: Whether the difference is significant at alpha=0.05

  • branch1_mean: Mean of branch1

  • branch2_mean: Mean of branch2

  • effect_size: Cohen’s d effect size

Return type:

Dict[str, Any]
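The effect_size entry is described as Cohen's d. As a rough illustration, the sketch below computes the pooled-standard-deviation form of Cohen's d with the standard library only. Whether BranchAnalyzer uses exactly this variant is an assumption, and the helper name cohens_d and the score lists are hypothetical.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


# Hypothetical per-fold RMSE scores for two branches.
branch1_rmse = [0.50, 0.52, 0.48, 0.51, 0.49]
branch2_rmse = [0.60, 0.62, 0.58, 0.61, 0.59]

d = cohens_d(branch1_rmse, branch2_rmse)
print(f"Cohen's d: {d:.3f}")  # negative: branch1 has the lower mean RMSE
```

The sign follows the order of the arguments, so a negative value here means the first branch has the lower mean.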

Raises:
get_branch_ids() → List[int]

Get list of unique branch IDs.

Returns:

List of branch IDs.

get_branch_names() → List[str]

Get list of unique branch names.

Returns:

List of branch names.

pairwise_comparison(metric: str = 'rmse', partition: str = 'test', test: str = 'ttest') → DataFrame

Compute pairwise statistical comparisons between all branches.

Parameters:
  • metric – Metric to compare (default: ‘rmse’).

  • partition – Partition for scores (default: ‘test’).

  • test – Statistical test to use (default: ‘ttest’).

Returns:

DataFrame with p-values for all branch pairs.

Raises:

ImportError – If pandas or scipy is not available.
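To make the shape of a pairwise result concrete, the sketch below fills a symmetric table of p-values for every branch pair. It is not the library's implementation: an exact permutation test on the difference of means stands in for the t-test so that the example needs only the standard library, a nested dict stands in for the DataFrame, and the branch names, scores, and the 1.0 diagonal are all assumptions.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that ties with the observed split are counted.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-fold RMSE scores for three branches.
scores = {
    "snv": [0.50, 0.52, 0.48, 0.51],
    "msc": [0.60, 0.62, 0.58, 0.61],
    "raw": [0.51, 0.50, 0.53, 0.49],
}

names = list(scores)
# Diagonal set to 1.0 by assumption (a branch never differs from itself).
p_matrix = {row: {col: 1.0 for col in names} for row in names}
for left, right in combinations(names, 2):
    p = permutation_p_value(scores[left], scores[right])
    p_matrix[left][right] = p_matrix[right][left] = p

for row in names:
    print(row, [round(p_matrix[row][col], 3) for col in names])
```

With many branches the number of pairs grows quadratically, so a multiple-comparison correction may be worth applying to the resulting p-values.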

rank_branches(metric: str = 'rmse', partition: str = 'test', ascending: bool | None = None) → List[Dict[str, Any]]

Rank branches by mean performance.

Parameters:
  • metric – Metric to rank by (default: ‘rmse’).

  • partition – Partition for scores (default: ‘test’).

  • ascending – Sort order. If None, the order is chosen automatically based on the metric.

Returns:

List of dicts with the keys branch_name, mean, std, and rank.
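The ranking logic can be sketched as follows. This is an illustration only: the helper rank_by_mean, the LOWER_IS_BETTER set, and the scores are hypothetical, and the rule that error metrics sort ascending while everything else sorts descending is an assumption about how the auto-detection might behave.

```python
from statistics import mean, stdev

# Assumption: error metrics rank ascending (lower is better); any other
# metric ranks descending. The library's actual rule is not documented here.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


def rank_by_mean(scores, metric, ascending=None):
    """Return rows with branch_name, mean, std and rank (1 = best)."""
    if ascending is None:
        ascending = metric in LOWER_IS_BETTER
    rows = [
        {"branch_name": name, "mean": mean(vals), "std": stdev(vals)}
        for name, vals in scores.items()
    ]
    rows.sort(key=lambda row: row["mean"], reverse=not ascending)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


# Hypothetical per-fold scores.
rmse_scores = {"snv": [0.50, 0.52, 0.48], "msc": [0.60, 0.62, 0.58]}
r2_scores = {"snv": [0.91, 0.93, 0.92], "msc": [0.85, 0.86, 0.84]}

rmse_ranking = rank_by_mean(rmse_scores, "rmse")  # lowest mean first
r2_ranking = rank_by_mean(r2_scores, "r2")        # highest mean first
```

Passing ascending explicitly overrides the detection, which is useful for metrics the heuristic does not know.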

summary(metrics: List[str] | None = None, partition: str = 'test', aggregate: str | None = None) → BranchSummary

Generate summary statistics for each branch.

Computes the mean, std, min, and max of each metric for every branch.

Parameters:
  • metrics – List of metrics to compute (default: [‘rmse’, ‘r2’]).

  • partition – Partition to compute metrics from (default: ‘test’).

  • aggregate – If provided, aggregate predictions by this column before computing statistics.

Returns:

BranchSummary object with statistics.
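As a rough picture of what such a summary contains, the sketch below builds one row of statistics per branch in the List[Dict[str, Any]] shape that BranchSummary accepts. The column names (rmse_mean, rmse_std, and so on), the use of the sample standard deviation, and the scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize(scores_by_branch, metrics):
    """Build one dict of mean/std/min/max per branch (hypothetical layout)."""
    rows = []
    for branch, per_metric in scores_by_branch.items():
        row = {"branch_name": branch}
        for metric in metrics:
            values = per_metric[metric]
            row[f"{metric}_mean"] = mean(values)
            row[f"{metric}_std"] = stdev(values)  # sample std, an assumption
            row[f"{metric}_min"] = min(values)
            row[f"{metric}_max"] = max(values)
        rows.append(row)
    return rows


# Hypothetical per-fold scores for two branches.
scores_by_branch = {
    "snv": {"rmse": [0.50, 0.52, 0.48], "r2": [0.91, 0.93, 0.92]},
    "msc": {"rmse": [0.60, 0.62, 0.58], "r2": [0.85, 0.86, 0.84]},
}

rows = summarize(scores_by_branch, ["rmse", "r2"])
for row in rows:
    print(row["branch_name"], round(row["rmse_mean"], 3), round(row["rmse_std"], 3))
```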

class nirs4all.visualization.analysis.branch.BranchSummary(data: List[Dict[str, Any]], metrics: List[str])

Bases: object

Branch summary statistics container with export capabilities.

Provides DataFrame-like access and export to markdown, LaTeX, and CSV.

data

List of dictionaries with branch statistics.

metrics

List of metrics computed.

columns

Column names in order.

__getitem__(key: int | str) → Dict[str, Any]

Get branch by index or name.

Parameters:

key – Integer index or branch name string.

Returns:

Dictionary with branch statistics.
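Lookup by position or by name can be sketched with a small stand-in container. The class SummarySketch is hypothetical, and raising KeyError for an unknown branch name is an assumption rather than documented behaviour.

```python
class SummarySketch:
    """Minimal stand-in for a summary container (not the library class)."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        # Integer keys index by position; string keys match branch_name.
        if isinstance(key, int):
            return self.data[key]
        for row in self.data:
            if row["branch_name"] == key:
                return row
        raise KeyError(key)  # assumed error type for unknown names


# Hypothetical branch statistics.
sketch = SummarySketch([
    {"branch_name": "snv", "rmse_mean": 0.50},
    {"branch_name": "msc", "rmse_mean": 0.60},
])

first = sketch[0]
by_name = sketch["msc"]
```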

__len__() → int

Number of branches.

__repr__() → str

String representation.

to_csv(path: str, precision: int = 6) → None

Export to CSV file.

Parameters:
  • path – Output file path.

  • precision – Decimal places for floating point values.

to_dataframe() → DataFrame

Convert to pandas DataFrame.

Returns:

pandas DataFrame with branch statistics.

Raises:

ImportError – If pandas is not installed.

to_dict() → Dict[str, Dict[str, Any]]

Convert to dictionary keyed by branch name.

Returns:

Dictionary mapping branch_name to statistics.

to_latex(caption: str = 'Branch Performance Comparison', label: str = 'tab:branch_comparison', precision: int = 3, include_std: bool = True, mean_std_combined: bool = True) → str

Export as LaTeX table for publications.

Parameters:
  • caption – Table caption.

  • label – LaTeX label for referencing.

  • precision – Decimal places for floating point values.

  • include_std – If True, include std columns.

  • mean_std_combined – If True, format as “mean ± std”.

Returns:

LaTeX-formatted table string.
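The combined “mean ± std” cell format can be illustrated with a short sketch that renders one table row. The helper latex_row, the rmse_mean and rmse_std keys, and the exact cell layout are assumptions; the real method's output may differ.

```python
def latex_row(name, stats, metrics, precision=3):
    """Render one LaTeX table row with combined 'mean ± std' cells."""
    cells = [name]
    for metric in metrics:
        m = stats[f"{metric}_mean"]
        s = stats[f"{metric}_std"]
        # Math mode with \pm gives the usual plus-minus sign in LaTeX.
        cells.append(f"${m:.{precision}f} \\pm {s:.{precision}f}$")
    # Cells are separated by '&' and the row ends with a double backslash.
    return " & ".join(cells) + " \\\\"


# Hypothetical statistics for one branch.
row = latex_row("snv", {"rmse_mean": 0.5, "rmse_std": 0.02}, ["rmse"])
print(row)
```

Branch names that contain underscores or other LaTeX special characters would need escaping before being placed in a row like this.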

to_markdown(precision: int = 3, include_std: bool = True) → str

Export as markdown table.

Parameters:
  • precision – Decimal places for floating point values.

  • include_std – If True, include std columns.

Returns:

Markdown-formatted table string.
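For orientation, a markdown table of this kind could be produced along the following lines. This is a sketch under assumptions: the helper markdown_table, the column names, and the values are hypothetical, and the actual layout emitted by to_markdown is not shown in this reference.

```python
def markdown_table(rows, columns, precision=3):
    """Render a list of dicts as a pipe-delimited markdown table."""

    def fmt(value):
        # Floats are rounded to the requested precision; other values pass through.
        return f"{value:.{precision}f}" if isinstance(value, float) else str(value)

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(fmt(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])


# Hypothetical branch statistics.
rows = [
    {"branch_name": "snv", "rmse_mean": 0.5, "rmse_std": 0.02},
    {"branch_name": "msc", "rmse_mean": 0.6, "rmse_std": 0.02},
]

table = markdown_table(rows, ["branch_name", "rmse_mean", "rmse_std"])
print(table)
```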