nirs4all.controllers.data.merge module

Merge Controller for branch combination and exit.

This controller is the CORE PRIMITIVE for all branch combination operations. It handles:

  1. Exiting branch mode (always, unconditionally)

  2. Collecting features and/or predictions from branches

  3. Enforcing OOF (out-of-fold) safety when predictions are involved

  4. Creating a unified dataset for subsequent steps

Phase 1 Implementation:
  • Controller registration and matching

  • Configuration parsing for all syntax variants

  • Branch validation utilities

Phase 3 Implementation:
  • Feature collection and concatenation

  • Shape mismatch handling

Phase 4 Implementation:
  • Model discovery from prediction store

  • OOF prediction reconstruction via TrainingSetReconstructor

  • Unsafe mode with prominent warnings

  • Simple prediction merge syntax

Phase 5 Implementation:
  • Per-branch model selection strategies (all, best, top_k, explicit)

  • Per-branch aggregation strategies (separate, mean, weighted_mean, proba_mean)

  • Model ranking by validation metrics

  • Advanced per-branch prediction configuration

Phase 6 Implementation:
  • Mixed merging (features from some branches, predictions from others)

  • Asymmetric branch detection and handling (models in some, not others)

  • Handling of different feature dimensions per branch

  • Handling of different model counts per branch

  • Improved error messages with resolution suggestions (MERGE-E010, MERGE-E011)

Phase 8 Implementation:
  • Prediction mode support for merge steps

  • Bundle export support

  • Full train/predict cycle

Phase 9 Implementation:
  • Source merge (merge_sources keyword) for multi-source datasets

  • Source merge strategies: concat, stack, dict

  • Source incompatibility handling: error, flatten, pad, truncate

  • Prediction merge (merge_predictions keyword) for late fusion

  • Error codes: MERGE-E024, MERGE-E030, MERGE-E031

Example

>>> # Simple feature merge
>>> pipeline = [
...     {"branch": [[SNV()], [MSC()]]},
...     {"merge": "features"},
...     PLSRegression(n_components=10)
... ]
>>>
>>> # Prediction stacking
>>> pipeline = [
...     {"branch": [[SNV(), PLS()], [MSC(), RF()]]},
...     {"merge": "predictions"},
...     {"model": Ridge()}
... ]
>>>
>>> # Source merge for multi-source datasets
>>> pipeline = [
...     SNV(),  # Applied to all sources
...     {"merge_sources": "concat"},  # Combine NIR + markers
...     {"model": PLS()}
... ]
>>>
>>> # Late fusion without branches
>>> pipeline = [
...     SNV(),
...     {"model": PLS()},
...     {"model": RF()},
...     {"merge_predictions": "all"},  # Combine predictions
...     {"model": Ridge()}
... ]

Keywords: "merge", "merge_sources", "merge_predictions"

Priority: 5 (same as BranchController)

class nirs4all.controllers.data.merge.AsymmetricBranchAnalyzer(branch_contexts: List[Dict[str, Any]], prediction_store: Any | None, context: ExecutionContext)[source]

Bases: object

Utility class for analyzing branch asymmetry.

Detects and reports on asymmetry across branches, providing detailed information for error messages and resolution suggestions.

Phase 6 Features:
  • Detect model presence asymmetry (some have models, some don't)

  • Detect model count asymmetry (different numbers of models)

  • Detect feature dimension asymmetry

  • Generate resolution suggestions for mixed merge
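The three kinds of asymmetry listed above can be illustrated with a small sketch that does not depend on nirs4all. The `summarize_asymmetry` helper and its plain-dict inputs are hypothetical; the sketch also assumes that model count asymmetry is judged only among branches that contain at least one model, which the documentation does not state.

```python
from typing import Dict, Optional


def summarize_asymmetry(
    model_counts: Dict[int, int],
    feature_dims: Dict[int, Optional[int]],
) -> Dict[str, object]:
    """Derive asymmetry flags from per-branch model counts and feature dims."""
    with_models = sorted(b for b, n in model_counts.items() if n > 0)
    without_models = sorted(b for b, n in model_counts.items() if n == 0)

    # Model presence asymmetry: some branches have models, others do not.
    has_model_asymmetry = bool(with_models) and bool(without_models)
    # Model count asymmetry (assumption: compared among branches with models).
    has_model_count_asymmetry = len({model_counts[b] for b in with_models}) > 1
    # Feature dimension asymmetry: the known dimensions differ.
    known_dims = {d for d in feature_dims.values() if d is not None}
    has_feature_dim_asymmetry = len(known_dims) > 1

    return {
        "is_asymmetric": (
            has_model_asymmetry
            or has_model_count_asymmetry
            or has_feature_dim_asymmetry
        ),
        "has_model_asymmetry": has_model_asymmetry,
        "has_model_count_asymmetry": has_model_count_asymmetry,
        "has_feature_dim_asymmetry": has_feature_dim_asymmetry,
        "branches_with_models": with_models,
        "branches_without_models": without_models,
    }


# Branch 1 has no model; branch 2 has a different feature dimension.
report = summarize_asymmetry(
    model_counts={0: 2, 1: 0, 2: 1},
    feature_dims={0: 200, 1: 200, 2: 150},
)
```

The returned keys mirror the AsymmetryReport field names so the sketch can be read next to that class.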

analyze_all() AsymmetryReport[source]

Analyze all branches for asymmetry.

Returns:

AsymmetryReport with comprehensive asymmetry analysis.

analyze_branch(branch_idx: int) BranchAnalysisResult[source]

Analyze a single branch for its characteristics.

Parameters:

branch_idx – Branch index to analyze.

Returns:

BranchAnalysisResult with branch characteristics.

suggest_mixed_merge() str | None[source]

Suggest a mixed merge configuration for asymmetric branches.

Returns:

Suggested merge configuration string, or None if not applicable.

class nirs4all.controllers.data.merge.AsymmetryReport(is_asymmetric: bool, has_model_asymmetry: bool, has_model_count_asymmetry: bool, has_feature_dim_asymmetry: bool, branches_with_models: List[int], branches_without_models: List[int], model_counts: Dict[int, int], feature_dims: Dict[int, int | None], summary: str)[source]

Bases: object

Report on asymmetry across branches.

Provides detailed analysis of how branches differ, helping users understand and resolve merge configuration issues.

is_asymmetric

Whether any asymmetry was detected.

Type:

bool

has_model_asymmetry

Some branches have models, others don’t.

Type:

bool

has_model_count_asymmetry

Branches have different model counts.

Type:

bool

has_feature_dim_asymmetry

Branches have different feature dimensions.

Type:

bool

branches_with_models

List of branch IDs that have models.

Type:

List[int]

branches_without_models

List of branch IDs without models.

Type:

List[int]

model_counts

Dict mapping branch_id to model count.

Type:

Dict[int, int]

feature_dims

Dict mapping branch_id to feature dimension.

Type:

Dict[int, int | None]

summary

Human-readable summary of asymmetry.

Type:

str

branches_with_models: List[int]
branches_without_models: List[int]
feature_dims: Dict[int, int | None]
has_feature_dim_asymmetry: bool
has_model_asymmetry: bool
has_model_count_asymmetry: bool
is_asymmetric: bool
model_counts: Dict[int, int]
summary: str
class nirs4all.controllers.data.merge.BranchAnalysisResult(branch_id: int, branch_name: str | None, has_models: bool, model_names: List[str], model_count: int, feature_dim: int | None, has_features: bool)[source]

Bases: object

Result of analyzing branch asymmetry.

branch_id

Numeric identifier of the branch.

Type:

int

branch_name

Name of the branch (if named).

Type:

str | None

has_models

Whether the branch contains trained models.

Type:

bool

model_names

List of model names in this branch.

Type:

List[str]

model_count

Number of models in this branch.

Type:

int

feature_dim

Feature dimension from this branch (or None if not extracted).

Type:

int | None

has_features

Whether the branch has feature snapshots.

Type:

bool

branch_id: int
branch_name: str | None
feature_dim: int | None
has_features: bool
has_models: bool
model_count: int
model_names: List[str]
class nirs4all.controllers.data.merge.DisjointBranchAnalysis(is_disjoint: bool, branch_type: BranchType | None, branch_sample_counts: Dict[int, int], branch_sample_indices: Dict[int, List[int]], total_samples: int, partition_column: str | None = None)[source]

Bases: object

Analysis result for disjoint sample branches.

is_disjoint

Whether branches have disjoint sample sets.

Type:

bool

branch_type

Type of disjoint branching (metadata_partitioner, sample_partitioner).

Type:

nirs4all.operators.data.merge.BranchType | None

branch_sample_counts

Dict mapping branch_id to sample count.

Type:

Dict[int, int]

branch_sample_indices

Dict mapping branch_id to list of sample indices.

Type:

Dict[int, List[int]]

total_samples

Total unique samples across all branches.

Type:

int

partition_column

Metadata column used for partitioning (if metadata_partitioner).

Type:

str | None

branch_sample_counts: Dict[int, int]
branch_sample_indices: Dict[int, List[int]]
branch_type: BranchType | None
is_disjoint: bool
partition_column: str | None = None
total_samples: int
class nirs4all.controllers.data.merge.DisjointBranchInfo(n_samples: int, sample_ids: List[int], n_models_original: int = 0, n_models_selected: int = 0, selected_models: List[Dict[str, Any]] = <factory>, dropped_models: List[Dict[str, Any]] = <factory>)[source]

Bases: object

Information about a single branch in a disjoint merge.

Captures per-branch statistics and model selection details for comprehensive merge metadata.

n_samples

Number of samples in this branch partition.

Type:

int

sample_ids

List of sample indices belonging to this branch.

Type:

List[int]

n_models_original

Original number of models in the branch.

Type:

int

n_models_selected

Number of models selected for merge.

Type:

int

selected_models

List of selected model details with name, score, column.

Type:

List[Dict[str, Any]]

dropped_models

List of dropped model details with name, score.

Type:

List[Dict[str, Any]]

dropped_models: List[Dict[str, Any]]
n_models_original: int = 0
n_models_selected: int = 0
n_samples: int
sample_ids: List[int]
selected_models: List[Dict[str, Any]]
to_dict() Dict[str, Any][source]

Convert to dictionary for serialization.

class nirs4all.controllers.data.merge.DisjointMergeMetadata(merge_type: str = 'disjoint_samples', n_columns: int = 0, select_by: str = 'mse', branches: Dict[str, DisjointBranchInfo] = <factory>, column_mapping: Dict[int, Dict[str, str]] = <factory>, is_heterogeneous: bool = False, feature_dim: int | None = None)[source]

Bases: object

Complete metadata for a disjoint sample branch merge.

This dataclass captures all information about a disjoint merge operation for logging, debugging, and downstream use. Matches the specification in docs/reports/disjoint_sample_branch_merging.md Section 6.

merge_type

Always “disjoint_samples” for disjoint merges.

Type:

str

n_columns

Number of output columns (prediction features).

Type:

int

select_by

Selection criterion used (mse, rmse, mae, r2, order).

Type:

str

branches

Per-branch information as Dict[branch_name, DisjointBranchInfo].

Type:

Dict[str, nirs4all.operators.data.merge.DisjointBranchInfo]

column_mapping

Maps output column index to per-branch model names. Example: {0: {"red": "RF", "blue": "PLS"}, 1: {"red": "PLS", "blue": "RF"}}

Type:

Dict[int, Dict[str, str]]

is_heterogeneous

True if different branches have different models per column.

Type:

bool

feature_dim

Feature dimension (for feature merges).

Type:

int | None

Example

>>> metadata = DisjointMergeMetadata(
...     merge_type="disjoint_samples",
...     n_columns=2,
...     select_by="mse",
...     branches={
...         "red": DisjointBranchInfo(n_samples=50, sample_ids=[...], ...),
...         "blue": DisjointBranchInfo(n_samples=100, sample_ids=[...], ...),
...     },
...     column_mapping={
...         0: {"red": "RF", "blue": "PLS"},
...         1: {"red": "PLS", "blue": "RF"},
...     },
... )
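One plausible reading of is_heterogeneous is that a column is heterogeneous when its branches contribute different models. The sketch below checks that on a plain column_mapping dict; the helper name `mapping_is_heterogeneous` is hypothetical and the rule itself is an assumption, not the library's confirmed logic.

```python
from typing import Dict


def mapping_is_heterogeneous(column_mapping: Dict[int, Dict[str, str]]) -> bool:
    """Return True if any output column mixes different models across branches."""
    return any(
        len(set(per_branch.values())) > 1
        for per_branch in column_mapping.values()
    )


# Column 0 mixes RF (red) and PLS (blue), so the mapping is heterogeneous.
mixed = {
    0: {"red": "RF", "blue": "PLS"},
    1: {"red": "PLS", "blue": "RF"},
}
# Every column uses the same model in both branches.
uniform = {
    0: {"red": "PLS", "blue": "PLS"},
    1: {"red": "RF", "blue": "RF"},
}

mixed_flag = mapping_is_heterogeneous(mixed)
uniform_flag = mapping_is_heterogeneous(uniform)
```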
branches: Dict[str, DisjointBranchInfo]
column_mapping: Dict[int, Dict[str, str]]
feature_dim: int | None = None
classmethod from_dict(data: Dict[str, Any]) DisjointMergeMetadata[source]

Create from dictionary representation.

Parameters:

data – Dictionary with metadata fields.

Returns:

DisjointMergeMetadata instance.

get_branch_summary() str[source]

Get a summary string for logging.

Returns:

Human-readable summary of branch statistics.

get_column_mapping_summary() List[str][source]

Get column mapping summary for logging.

Returns:

List of strings describing each column’s model mapping.

is_heterogeneous: bool = False
log_summary(logger_func) None[source]

Log merge summary using provided logger function.

Parameters:

logger_func – Logger function (e.g., logger.info)

log_warnings(logger_warning_func) None[source]

Log warnings for heterogeneous columns and dropped models.

Parameters:

logger_warning_func – Logger warning function (e.g., logger.warning)

merge_type: str = 'disjoint_samples'
n_columns: int = 0
select_by: str = 'mse'
to_dict() Dict[str, Any][source]

Convert to dictionary for serialization/logging.

Returns:

Dictionary representation suitable for YAML/JSON serialization.

class nirs4all.controllers.data.merge.DisjointMergeResult(merged_array: ndarray, n_columns: int, select_by: str, branch_info: Dict[str, Any], column_mapping: Dict[int, Dict[str, str]])[source]

Bases: object

Result of disjoint sample branch merge.

merged_array

The merged prediction or feature array (n_total_samples, n_columns).

Type:

numpy.ndarray

n_columns

Number of output columns.

Type:

int

select_by

Selection criterion used.

Type:

str

branch_info

Per-branch information about selection and merging.

Type:

Dict[str, Any]

column_mapping

Mapping of output columns to per-branch models.

Type:

Dict[int, Dict[str, str]]

branch_info: Dict[str, Any]
column_mapping: Dict[int, Dict[str, str]]
merged_array: ndarray
n_columns: int
select_by: str
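The (n_total_samples, n_columns) shape of merged_array follows from each branch owning a distinct subset of rows. A minimal NumPy sketch of that row scatter is shown below; `merge_disjoint` is a hypothetical helper, and it assumes every branch already provides the same number of columns.

```python
from typing import Dict, List

import numpy as np


def merge_disjoint(
    branch_sample_indices: Dict[str, List[int]],
    branch_arrays: Dict[str, np.ndarray],
    n_total_samples: int,
) -> np.ndarray:
    """Scatter per-branch rows into one array indexed by global sample position."""
    n_columns = next(iter(branch_arrays.values())).shape[1]
    # NaN marks rows that no branch has filled yet.
    merged = np.full((n_total_samples, n_columns), np.nan)
    for name, indices in branch_sample_indices.items():
        merged[np.asarray(indices)] = branch_arrays[name]
    return merged


indices = {"red": [0, 2], "blue": [1, 3, 4]}
arrays = {
    "red": np.array([[1.0, 1.5], [2.0, 2.5]]),
    "blue": np.array([[3.0, 3.5], [4.0, 4.5], [5.0, 5.5]]),
}
merged = merge_disjoint(indices, arrays, n_total_samples=5)
```

Because the partitions are disjoint and cover all samples, no NaN remains in the result.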
class nirs4all.controllers.data.merge.MergeConfigParser[source]

Bases: object

Parser for merge step configurations.

Handles all syntax variants and normalizes them to MergeConfig.

Supported syntaxes:
  • Simple string: "features", "predictions", "all"

  • Dict with keys: {"features": …, "predictions": …, …}

  • Legacy format: {"predictions": [0, 1]}

  • Per-branch format: {"predictions": [{"branch": 0, …}]}
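The idea of normalizing several input syntaxes into one structure can be sketched without the real parser. The `normalize_merge_config` function below is hypothetical: it returns a plain dict rather than a MergeConfig, and the way it expands "all" and legacy integer lists is an assumption made for illustration.

```python
from typing import Any, Dict


def normalize_merge_config(raw: Any) -> Dict[str, Any]:
    """Normalize the documented syntax variants into one plain dict."""
    normalized: Dict[str, Any] = {"features": None, "predictions": None}

    if isinstance(raw, str):
        if raw not in {"features", "predictions", "all"}:
            raise ValueError(f"Unknown merge mode: {raw!r}")
        if raw in {"features", "all"}:
            normalized["features"] = "all"
        if raw in {"predictions", "all"}:
            normalized["predictions"] = "all"
        return normalized

    if isinstance(raw, dict):
        normalized["features"] = raw.get("features")
        predictions = raw.get("predictions")
        if isinstance(predictions, list):
            # Legacy integers become per-branch dicts; dict entries pass through.
            predictions = [
                p if isinstance(p, dict) else {"branch": p} for p in predictions
            ]
        normalized["predictions"] = predictions
        return normalized

    raise ValueError(f"Unsupported merge configuration: {raw!r}")


simple = normalize_merge_config("all")
legacy = normalize_merge_config({"predictions": [0, 1]})
per_branch = normalize_merge_config({"predictions": [{"branch": 0, "select": "best"}]})
```

The "select" key in the last call is illustrative only; the documentation elides the per-branch options.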

classmethod parse(raw_config: Any) MergeConfig[source]

Parse raw merge configuration into MergeConfig.

Parameters:

raw_config – The value from {"merge": raw_config}

Returns:

Normalized MergeConfig instance.

Raises:

ValueError – If configuration format is invalid.

class nirs4all.controllers.data.merge.MergeController[source]

Bases: OperatorController

Controller for merging branch outputs and exiting branch mode.

This controller is the CORE PRIMITIVE for branch combination. It:

  1. Collects features and/or predictions from specified branches

  2. Performs horizontal concatenation of features

  3. Performs OOF reconstruction for predictions (mandatory unless unsafe=True)

  4. Creates a unified "merged" processing in the dataset

  5. ALWAYS clears branch contexts and exits branch mode
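Horizontal concatenation of branch features amounts to stacking the per-branch feature matrices column-wise once they agree on the number of samples. The sketch below uses NumPy only; `concat_branch_features` is a hypothetical helper, and raising ValueError on a sample-count mismatch is an assumption about how such a mismatch could be reported.

```python
from typing import List

import numpy as np


def concat_branch_features(branch_features: List[np.ndarray]) -> np.ndarray:
    """Concatenate 2D feature matrices from several branches column-wise."""
    sample_counts = {x.shape[0] for x in branch_features}
    if len(sample_counts) != 1:
        raise ValueError(
            f"Branches disagree on sample count: {sorted(sample_counts)}"
        )
    return np.hstack(branch_features)


# Two branches with the same 4 samples but different feature widths.
snv_features = np.ones((4, 3))
msc_features = np.zeros((4, 5))
merged_features = concat_branch_features([snv_features, msc_features])
```

Different feature widths per branch are fine for concatenation; only the sample axis has to match.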

Supported Keywords:
  • “merge”: Branch merging (features/predictions/both)

  • “merge_sources”: Source merging (multi-source datasets) [Phase 9]

  • “merge_predictions”: Prediction-only late fusion [Phase 9]

OOF Safety:

When predictions are merged, OOF reconstruction is MANDATORY by default. This prevents data leakage when the merged output is used for training. Set unsafe=True to disable OOF (generates prominent warnings).
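The leakage that OOF reconstruction prevents is easiest to see in a generic out-of-fold loop: each training sample receives a prediction from a model that never saw it. The sketch below is a plain NumPy illustration with an ordinary least-squares base model; it is not the TrainingSetReconstructor, and `oof_predictions` is a hypothetical name.

```python
import numpy as np


def oof_predictions(X: np.ndarray, y: np.ndarray, n_folds: int = 3) -> np.ndarray:
    """Return out-of-fold predictions from a least-squares linear model."""
    n_samples = X.shape[0]
    oof = np.full(n_samples, np.nan)
    for val_idx in np.array_split(np.arange(n_samples), n_folds):
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[val_idx] = False
        # Fit on the training part of the fold only (intercept as last column).
        design = np.column_stack([X[train_mask], np.ones(train_mask.sum())])
        coef, *_ = np.linalg.lstsq(design, y[train_mask], rcond=None)
        # Predict the held-out samples, which the model has never seen.
        design_val = np.column_stack([X[val_idx], np.ones(len(val_idx))])
        oof[val_idx] = design_val @ coef
    return oof


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + 0.5  # noise-free linear target
oof = oof_predictions(X, y)
```

A meta-learner trained on `oof` sees only held-out predictions, which is the property the unsafe mode gives up.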

Relationship to MetaModel:

MetaModel internally uses MergeController for data preparation, then trains the meta-learner. Users can achieve the same result with:

{"merge": "predictions"}, {"model": Ridge()}

which is equivalent to:

{"model": MetaModel(Ridge())}

priority

Controller priority (5 = same as BranchController).

Type:

int

SUPPORTED_KEYWORDS

Set of keywords this controller handles.

SUPPORTED_KEYWORDS = {'merge', 'merge_predictions', 'merge_sources'}
classmethod build_config_from_meta_model(meta_operator: Any, context: ExecutionContext, branch_contexts: List[Dict[str, Any]] | None = None) MergeConfig[source]

Build MergeConfig from MetaModel operator parameters.

Translates MetaModel configuration to an equivalent MergeConfig for use with merge_branches(). This enables MetaModel to delegate to the centralized merge logic.

This is a helper for Phase 7: MetaModel Refactoring.

Parameters:
  • meta_operator – MetaModel operator instance with configuration.

  • context – Execution context with branch info.

  • branch_contexts – Optional branch contexts for branch resolution.

Returns:

MergeConfig equivalent to the MetaModel’s configuration.

Example

>>> config = MergeController.build_config_from_meta_model(
...     meta_operator=meta_model,
...     context=context,
... )
>>> merged_X, info = MergeController.merge_branches(
...     dataset=dataset,
...     context=context,
...     config=config,
...     prediction_store=prediction_store,
... )
execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) Tuple[ExecutionContext, StepOutput][source]

Execute the merge step with keyword dispatch.

Dispatches to the appropriate handler based on the step keyword:

  • "merge": Branch merging (features/predictions/both)

  • "merge_sources": Source merging (Phase 9, not yet implemented)

  • "merge_predictions": Prediction-only late fusion (Phase 9, not yet implemented)

Phase 2 implementation provides:

  • Configuration parsing

  • Branch validation

  • Branch mode exit

  • Keyword dispatch framework

Subsequent phases will add:

  • Feature collection (Phase 3)

  • Prediction OOF reconstruction (Phase 4)

  • Per-branch selection/aggregation (Phase 5)

  • Source merge implementation (Phase 9)

Parameters:
  • step_info – Parsed step containing merge configuration

  • dataset – Dataset to operate on

  • context – Pipeline execution context

  • runtime_context – Runtime infrastructure context

  • source – Data source index

  • mode – Execution mode (“train” or “predict”)

  • loaded_binaries – Pre-loaded binary objects for prediction mode

  • prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput)

Raises:
  • ValueError – If not in branch mode or configuration is invalid.

  • NotImplementedError – If merge_sources or merge_predictions called (Phase 9).

classmethod matches(step: Any, operator: Any, keyword: str) bool[source]

Check if the step matches the merge controller.

Parameters:
  • step – Original step configuration

  • operator – Deserialized operator

  • keyword – Step keyword

Returns:

True if keyword is one of the supported merge keywords.

classmethod merge_branches(dataset: SpectroDataset, context: ExecutionContext, config: MergeConfig, prediction_store: Any | None = None, mode: str = 'train') Tuple[ndarray, Dict[str, Any]][source]

Class method for programmatic merge (used by MetaModel).

This class method allows MetaModelController to delegate to merge logic without going through the full step execution machinery. It provides the core branch merging functionality without modifying the context or requiring a step_info object.

This is the key integration point for Phase 7: MetaModel Refactoring.

Parameters:
  • dataset – SpectroDataset with sample data.

  • context – Execution context with branch_contexts and state.

  • config – MergeConfig specifying what to merge.

  • prediction_store – Prediction storage for model predictions. Required if config.collect_predictions is True.

  • mode – Execution mode (“train” or “predict”).

Returns:

Tuple of (merged_features, info_dict), where:

  • merged_features: 2D numpy array (n_samples, n_features)

  • info_dict: Dictionary with merge metadata including:
    • "merged_shape": Shape of merged features

    • "feature_branches_used": List of branch indices for features

    • "prediction_branches_used": List of branch indices for predictions

    • "models_used": List of model names (if predictions)

    • "oof_reconstruction": Whether OOF was used (if predictions)

    • "unsafe_merge": True if unsafe mode was used

Raises:
  • ValueError – If not in branch mode or config is invalid.

  • ValueError – If prediction_store is None but predictions requested.

Example

>>> from nirs4all.controllers.data.merge import MergeController
>>> from nirs4all.operators.data.merge import MergeConfig
>>>
>>> # Called from MetaModelController
>>> config = MergeConfig(
...     collect_predictions=True,
...     prediction_branches="all",
... )
>>> merged_X, info = MergeController.merge_branches(
...     dataset=dataset,
...     context=context,
...     config=config,
...     prediction_store=prediction_store,
... )
>>> meta_model.fit(merged_X, y)

Note

Unlike execute(), this method does NOT:

  • Exit branch mode (caller must handle this if needed)

  • Modify the context

  • Add merged features to the dataset

  • Return a StepOutput

It simply performs the merge computation and returns the result.

priority: int = 5
classmethod supports_prediction_mode() bool[source]

Merge controller should execute in prediction mode.

classmethod use_multi_source() bool[source]

Merge controller supports multi-source datasets.

class nirs4all.controllers.data.merge.ModelSelector(prediction_store: Predictions, context: ExecutionContext)[source]

Bases: object

Utility class for selecting models based on validation metrics.

Handles model ranking and selection strategies (all, best, top_k, explicit) for per-branch prediction collection and stacking operations.

This class is shared between MergeController and MetaModelController to avoid code duplication.

prediction_store

Prediction storage instance.

context

Execution context.

LOWER_IS_BETTER_METRICS

Set of metrics where lower values are better.

LOWER_IS_BETTER_METRICS = {'log_loss', 'mae', 'mape', 'mse', 'nmae', 'nmse', 'nrmse', 'rmse'}
get_model_scores(model_names: List[str], metric: str, branch_id: int) Dict[str, float][source]

Get validation scores for multiple models.

Used for weighted aggregation.

Parameters:
  • model_names – List of model names.

  • metric – Metric name.

  • branch_id – Branch identifier.

Returns:

Dictionary mapping model name to score.

select_models(available_models: List[str], config: BranchPredictionConfig, branch_id: int) List[str][source]

Select models from available models based on config.

Parameters:
  • available_models – List of available model names in the branch.

  • config – Per-branch prediction configuration.

  • branch_id – Branch identifier.

Returns:

List of selected model names.

Raises:

ValueError – If explicit model selection references unknown models.

select_models_global(available_models: List[str], selection: Any, metric: str | None = None) List[str][source]

Select models globally (without branch context).

This is used by MetaModelController for pipelines without branches.

Parameters:
  • available_models – List of available model names.

  • selection – Selection configuration:
    • "all": Use all models

    • "best": Use best model

    • {"top_k": N}: Use top N models

    • ["model1", "model2"]: Explicit list

  • metric – Optional metric for ranking.

Returns:

List of selected model names.
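The four selection forms can be illustrated with a small ranking helper that takes validation scores directly. `select_by_score` is hypothetical and does not read from a prediction store; it only assumes, as LOWER_IS_BETTER_METRICS suggests, that metrics in that set rank ascending and all others rank descending.

```python
from typing import Any, Dict, List

LOWER_IS_BETTER = {"log_loss", "mae", "mape", "mse", "nmae", "nmse", "nrmse", "rmse"}


def select_by_score(scores: Dict[str, float], selection: Any, metric: str) -> List[str]:
    """Select model names from a score table using the documented selection forms."""
    # Best model first: ascending for error metrics, descending otherwise.
    ranked = sorted(scores, key=lambda name: scores[name],
                    reverse=metric not in LOWER_IS_BETTER)
    if selection == "all":
        return list(scores)
    if selection == "best":
        return ranked[:1]
    if isinstance(selection, dict) and "top_k" in selection:
        return ranked[: selection["top_k"]]
    if isinstance(selection, list):
        unknown = [name for name in selection if name not in scores]
        if unknown:
            raise ValueError(f"Unknown models: {unknown}")
        return list(selection)
    raise ValueError(f"Unsupported selection: {selection!r}")


scores = {"PLS": 0.42, "RF": 0.35, "Ridge": 0.50}
best_rmse = select_by_score(scores, "best", metric="rmse")
top2_rmse = select_by_score(scores, {"top_k": 2}, metric="rmse")
best_r2 = select_by_score(scores, "best", metric="r2")
```

The same score table yields a different "best" model depending on whether the metric is an error or a goodness-of-fit measure.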

class nirs4all.controllers.data.merge.PredictionAggregator[source]

Bases: object

Utility class for aggregating predictions from multiple models.

Handles aggregation strategies (separate, mean, weighted_mean, proba_mean) for combining predictions within a branch or across models.

This class is shared between MergeController and MetaModelController to avoid code duplication.

All methods are static as no instance state is needed.

LOWER_IS_BETTER_METRICS = {'log_loss', 'mae', 'mape', 'mse', 'nmae', 'nmse', 'nrmse', 'rmse'}
static aggregate(predictions: Dict[str, ndarray], strategy: AggregationStrategy, model_scores: Dict[str, float] | None = None, proba: bool = False, metric: str | None = None) ndarray[source]

Aggregate predictions from multiple models.

Parameters:
  • predictions – Dictionary mapping model names to prediction arrays. Each array has shape (n_samples,) for regression or (n_samples, n_classes) for classification probabilities.

  • strategy – Aggregation strategy to use.

  • model_scores – Optional dictionary of model scores for weighted averaging.

  • proba – Whether predictions are class probabilities.

  • metric – Metric name (for determining weight direction).

Returns:

Aggregated predictions with shape:

  • SEPARATE: (n_samples, n_models)

  • MEAN/WEIGHTED_MEAN: (n_samples, 1)

  • PROBA_MEAN: (n_samples, n_classes)

Raises:

ValueError – If predictions dict is empty.
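The output shapes per strategy can be reproduced with a few lines of NumPy. The sketch below is not the library's implementation: `aggregate_sketch` is a hypothetical function, strategies are passed as plain strings instead of AggregationStrategy members, and weights are supplied directly because the mapping from model scores to weights is not documented here.

```python
from typing import Dict, Optional

import numpy as np


def aggregate_sketch(
    predictions: Dict[str, np.ndarray],
    strategy: str,
    weights: Optional[Dict[str, float]] = None,
) -> np.ndarray:
    """Combine per-model predictions using one of the documented strategies."""
    if not predictions:
        raise ValueError("predictions dict is empty")
    if strategy == "proba_mean":
        # Average (n_samples, n_classes) probability arrays element-wise.
        return np.mean(np.stack(list(predictions.values())), axis=0)

    # Regression case: one column per model, shape (n_samples, n_models).
    stacked = np.column_stack([np.ravel(p) for p in predictions.values()])
    if strategy == "separate":
        return stacked
    if strategy == "mean":
        return stacked.mean(axis=1, keepdims=True)
    if strategy == "weighted_mean":
        if weights is None:
            raise ValueError("weighted_mean requires weights")
        w = np.array([weights[name] for name in predictions], dtype=float)
        return (stacked @ (w / w.sum())).reshape(-1, 1)
    raise ValueError(f"Unknown strategy: {strategy!r}")


preds = {"PLS": np.array([1.0, 2.0, 3.0]), "RF": np.array([3.0, 4.0, 5.0])}
separate = aggregate_sketch(preds, "separate")
mean = aggregate_sketch(preds, "mean")
weighted = aggregate_sketch(preds, "weighted_mean", weights={"PLS": 3.0, "RF": 1.0})

probas = {"A": np.full((3, 2), 0.5), "B": np.array([[0.9, 0.1]] * 3)}
proba_mean = aggregate_sketch(probas, "proba_mean")
```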

static aggregate_folds(fold_predictions: List[ndarray], fold_scores: List[float] | None = None, strategy: str = 'mean', metric: str | None = None) ndarray[source]

Aggregate predictions across CV folds.

Useful for combining test predictions from different folds.

Parameters:
  • fold_predictions – List of prediction arrays, one per fold.

  • fold_scores – Optional list of validation scores per fold.

  • strategy – Aggregation strategy (“mean”, “weighted_mean”, “best”).

  • metric – Metric name for weighted aggregation.

Returns:

Aggregated predictions.
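The three fold strategies can be sketched as follows. `aggregate_folds_sketch` is a hypothetical function; in particular, weighting folds by the inverse of their score for lower-is-better metrics is an assumption (it requires strictly positive scores), and the library may derive weights differently.

```python
from typing import List, Optional

import numpy as np


def aggregate_folds_sketch(
    fold_predictions: List[np.ndarray],
    fold_scores: Optional[List[float]] = None,
    strategy: str = "mean",
    lower_is_better: bool = True,
) -> np.ndarray:
    """Combine per-fold predictions for the same samples."""
    stacked = np.stack(fold_predictions)  # shape (n_folds, n_samples)
    if strategy == "mean":
        return stacked.mean(axis=0)
    if fold_scores is None:
        raise ValueError(f"strategy {strategy!r} requires fold_scores")
    scores = np.asarray(fold_scores, dtype=float)
    if strategy == "best":
        best = scores.argmin() if lower_is_better else scores.argmax()
        return stacked[best]
    if strategy == "weighted_mean":
        # Assumption: inverse-score weights when lower scores are better.
        raw = 1.0 / scores if lower_is_better else scores
        return np.tensordot(raw / raw.sum(), stacked, axes=1)
    raise ValueError(f"Unknown strategy: {strategy!r}")


folds = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
fold_rmse = [0.5, 0.25]

fold_mean = aggregate_folds_sketch(folds)
fold_best = aggregate_folds_sketch(folds, fold_rmse, strategy="best")
fold_weighted = aggregate_folds_sketch(folds, fold_rmse, strategy="weighted_mean")
```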

class nirs4all.controllers.data.merge.SourceMergeConfig(strategy: str = 'concat', sources: str | List[int | str] = 'all', on_incompatible: str = 'error', output_name: str = 'merged', preserve_source_info: bool = True)[source]

Bases: object

Configuration for merging multi-source dataset features.

This dataclass provides configuration for the merge_sources keyword, which combines features from multiple data sources (e.g., NIR, markers, Raman) into a unified feature space.

Unlike branch merging (merge), source merging operates on the data provenance dimension—combining features that originated from different sensors, instruments, or data modalities.

strategy

How to combine source features.

  • "concat" (default): Horizontal concatenation (2D result)

  • "stack": Stack along new axis (3D result, requires uniform shapes)

  • "dict": Keep as structured dictionary (for multi-input models)

Type:

str

sources

Which sources to include.

  • "all" (default): Include all available sources

  • List of source indices: [0, 1] for specific sources

  • List of source names: ["NIR", "markers"] for named sources

Type:

str | List[int | str]

on_incompatible

How to handle incompatible shapes (for stack strategy).

  • "error" (default): Raise error if shapes don't match

  • "flatten": Fall back to 2D concat

  • "pad": Pad shorter with zeros

  • "truncate": Truncate longer to match shortest

Type:

str

output_name

Name for the merged output source (default: “merged”).

Type:

str

preserve_source_info

Whether to store source metadata for debugging.

Type:

bool

Example

>>> # Simple concatenation (default)
>>> {"merge_sources": "concat"}
>>>
>>> # Stack for 3D models (requires same feature count per source)
>>> {"merge_sources": {"strategy": "stack"}}
>>>
>>> # Selective sources with fallback on shape mismatch
>>> {"merge_sources": {
...     "strategy": "stack",
...     "sources": ["NIR", "MIR"],
...     "on_incompatible": "flatten"
... }}
>>>
>>> # Dict output for multi-head models
>>> {"merge_sources": {"strategy": "dict"}}
__post_init__()[source]

Validate configuration after initialization.

classmethod from_dict(data: Dict[str, Any]) SourceMergeConfig[source]

Create config from dictionary.

Parameters:

data – Dictionary representation.

Returns:

SourceMergeConfig instance.

get_incompatible_strategy() SourceIncompatibleStrategy[source]

Get the incompatible handling strategy as an enum.

Returns:

SourceIncompatibleStrategy enum value.

get_source_indices(available_sources: List[str]) List[int][source]

Resolve source specification to indices.

Parameters:

available_sources – List of available source names.

Returns:

List of source indices to include.

Raises:

ValueError – If a specified source is not found.
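Resolving a mixed specification of names and indices could look like the sketch below. `resolve_source_indices` is a hypothetical standalone function that takes the `sources` value as an argument instead of reading it from a config instance; the bounds check on integer entries is an assumption.

```python
from typing import List, Union


def resolve_source_indices(
    sources: Union[str, List[Union[int, str]]],
    available_sources: List[str],
) -> List[int]:
    """Turn "all", indices, or names into a list of source indices."""
    if sources == "all":
        return list(range(len(available_sources)))

    indices: List[int] = []
    for spec in sources:
        if isinstance(spec, int):
            if not 0 <= spec < len(available_sources):
                raise ValueError(f"Source index out of range: {spec}")
            indices.append(spec)
        elif spec in available_sources:
            indices.append(available_sources.index(spec))
        else:
            raise ValueError(f"Source not found: {spec!r}")
    return indices


available = ["NIR", "markers", "Raman"]
all_indices = resolve_source_indices("all", available)
mixed_indices = resolve_source_indices(["Raman", 0], available)
```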

get_strategy() SourceMergeStrategy[source]

Get the merge strategy as an enum.

Returns:

SourceMergeStrategy enum value.

on_incompatible: str = 'error'
output_name: str = 'merged'
preserve_source_info: bool = True
sources: str | List[int | str] = 'all'
strategy: str = 'concat'
to_dict() Dict[str, Any][source]

Serialize configuration to dictionary.

Returns:

Dictionary representation for manifest storage.

nirs4all.controllers.data.merge.detect_disjoint_branches(branch_contexts: List[Dict[str, Any]]) DisjointBranchAnalysis[source]

Detect if branches represent disjoint sample partitions.

Examines branch contexts to determine if they were created by a partitioning controller (metadata_partitioner or sample_partitioner).

Parameters:

branch_contexts – List of branch context dictionaries.

Returns:

DisjointBranchAnalysis with detection results.

nirs4all.controllers.data.merge.is_disjoint_branch(branch_context: Dict[str, Any]) bool[source]

Check if a branch context indicates disjoint sample branching.

A disjoint branch has a ‘sample_partition’ or ‘partition_info’ key that indicates samples were partitioned (not copied) across branches.

Parameters:

branch_context – A single branch context dictionary.

Returns:

True if this branch is part of a disjoint sample partition.
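The key check described above, plus a separate sanity check that partitions really do not overlap, can be sketched as follows. Both helper names are hypothetical, and the contents stored under 'sample_partition' in the example contexts are placeholders, since the value layout is not documented here.

```python
from typing import Any, Dict, List, Set


def looks_disjoint(branch_context: Dict[str, Any]) -> bool:
    """Check for the keys that mark a partitioned (not copied) branch."""
    return "sample_partition" in branch_context or "partition_info" in branch_context


def partitions_are_disjoint(branch_sample_indices: Dict[int, List[int]]) -> bool:
    """Return True if no sample index appears in more than one branch."""
    seen: Set[int] = set()
    for indices in branch_sample_indices.values():
        if seen.intersection(indices):
            return False
        seen.update(indices)
    return True


# Placeholder contexts: only the presence of the key matters in this sketch.
partitioned_context = {"branch_id": 0, "sample_partition": {"sample_indices": [0, 2]}}
copied_context = {"branch_id": 1}

flag_partitioned = looks_disjoint(partitioned_context)
flag_copied = looks_disjoint(copied_context)
no_overlap = partitions_are_disjoint({0: [0, 2], 1: [1, 3, 4]})
overlap = partitions_are_disjoint({0: [0, 2], 1: [2, 3]})
```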