nirs4all.controllers.data.branch module

Branch Controller for pipeline branching.

This controller enables splitting a pipeline into multiple parallel sub-pipelines (“branches”), each with its own preprocessing context (X transformations, Y processing), while sharing common upstream state (splits, initial preprocessing).

Steps declared after a branch block execute on each branch independently.

V3 improvements:

Uses trace_recorder.enter_branch() / exit_branch() for automatic branch path tracking
Records each branch substep individually in the execution trace
Builds proper operator chains for artifact identification

Example

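>>> # Assuming the scikit-learn classes for the splitter, PCA, and PLS model.
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.decomposition import PCA
>>> from sklearn.model_selection import ShuffleSplit
>>> # SNV, MSC, and FirstDerivative are nirs4all transforms (imports omitted here).
>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": [
...         [SNV(), PCA(n_components=10)],
...         [MSC(), FirstDerivative()],
...     ]},
...     PLSRegression(n_components=5),  # Runs on BOTH branches
... ]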

Generator syntax is also supported:

>>> pipeline = [
...     ShuffleSplit(n_splits=3),
...     {"branch": {"_or_": [SNV(), MSC(), FirstDerivative()]}},  # 3 branches
...     PLSRegression(n_components=5),
... ]
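
The two branch syntaxes above can be read as two spellings of the same structure: a list of step lists. The sketch below illustrates that reading with a hypothetical helper, expand_branches, which is not part of nirs4all; the library's real generator expansion may accept more keys and options than the single "_or_" form shown here. Plain strings stand in for operator instances.

```python
from typing import Any, Dict, List, Union

BranchSpec = Union[List[Any], Dict[str, Any]]


def expand_branches(spec: BranchSpec) -> List[List[Any]]:
    """Hypothetical helper: normalize a branch spec into a list of step lists.

    Only the two forms shown in the examples are handled:
    an explicit list of branches, or a {"_or_": [...]} generator.
    """
    if isinstance(spec, dict) and "_or_" in spec:
        alternatives = spec["_or_"]
    elif isinstance(spec, list):
        alternatives = spec
    else:
        raise TypeError(f"Unsupported branch spec: {spec!r}")
    # A bare operator becomes a single-step branch; a list is kept as-is.
    return [alt if isinstance(alt, list) else [alt] for alt in alternatives]


# Strings stand in for operator instances such as SNV() or PCA(...).
explicit = expand_branches([["SNV", "PCA"], ["MSC", "FirstDerivative"]])
generated = expand_branches({"_or_": ["SNV", "MSC", "FirstDerivative"]})

print(len(explicit), len(generated))  # 2 3
```

Under this assumption, each alternative in the generator form becomes its own one-step branch, which matches the "3 branches" comment in the example.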

class nirs4all.controllers.data.branch.BranchController

Bases: OperatorController

Controller for pipeline branching.

Implements the branching mechanism that allows multiple preprocessing chains to be evaluated independently within a single pipeline execution.

Key behaviors:

Creates independent context copies for each branch
Executes branch steps sequentially within each branch
Stores branch contexts in context.custom["branch_contexts"]
Post-branch steps iterate over all branch contexts
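
The key behaviors above can be sketched with a toy model. Everything in this block is a stand-in: ToyContext replaces ExecutionContext, step names are plain strings, and the shape of each entry in custom["branch_contexts"] (a dict with "branch_id" and "context" keys) is an assumption made for illustration, not the layout nirs4all necessarily uses.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyContext:
    """Stand-in for ExecutionContext; holds only what this sketch needs."""
    processing: List[str] = field(default_factory=list)
    custom: Dict[str, Any] = field(default_factory=dict)


def run_branches(context: ToyContext, branches: List[List[str]]) -> ToyContext:
    """Copy the context per branch, run each branch's steps in order,
    then store the resulting contexts on the parent."""
    branch_contexts = []
    for branch_id, steps in enumerate(branches):
        branch_ctx = copy.deepcopy(context)  # independent copy per branch
        for step in steps:                   # sequential within the branch
            branch_ctx.processing.append(step)
        branch_contexts.append({"branch_id": branch_id, "context": branch_ctx})
    context.custom["branch_contexts"] = branch_contexts
    return context


def run_post_branch_step(context: ToyContext, step: str) -> None:
    """Apply one post-branch step to every stored branch context."""
    for entry in context.custom["branch_contexts"]:
        entry["context"].processing.append(step)


parent = ToyContext(processing=["ShuffleSplit"])
run_branches(parent, [["SNV", "PCA"], ["MSC", "FirstDerivative"]])
run_post_branch_step(parent, "PLSRegression")

chains = [e["context"].processing for e in parent.custom["branch_contexts"]]
for chain in chains:
    print(" -> ".join(chain))
```

Both branches share the upstream ShuffleSplit state, diverge for their own preprocessing, and each receives the post-branch model step, while the parent's own processing list is left untouched.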

priority

Controller priority (lower = higher priority). Set to 5 to execute before most other controllers.

Type: int

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, StepOutput]

Execute the branch step with V3 chain tracking.

Creates independent contexts for each branch, executes branch-specific steps, and stores branch contexts for post-branch iteration.

In predict/explain mode, only executes the target branch specified in runtime_context.target_model.branch_id for efficiency.
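
A minimal sketch of that selection rule, assuming branches are identified by their position in the branch list. The select_branches function and its arguments are hypothetical; in the real controller the target comes from runtime_context.target_model.branch_id.

```python
from typing import Any, List, Optional, Tuple


def select_branches(
    branches: List[List[Any]],
    mode: str,
    target_branch_id: Optional[int] = None,
) -> List[Tuple[int, List[Any]]]:
    """Hypothetical helper: return the (branch_id, steps) pairs to execute.

    In train mode every branch runs. In predict or explain mode, only the
    branch that produced the target model runs, when its id is known.
    """
    indexed = list(enumerate(branches))
    if mode in ("predict", "explain") and target_branch_id is not None:
        return [(i, steps) for i, steps in indexed if i == target_branch_id]
    return indexed


branches = [["SNV", "PCA"], ["MSC", "FirstDerivative"]]

train_plan = select_branches(branches, mode="train")
predict_plan = select_branches(branches, mode="predict", target_branch_id=1)

print([i for i, _ in train_plan])    # [0, 1]
print([i for i, _ in predict_plan])  # [1]
```

Skipping the other branches avoids re-running preprocessing chains whose outputs the target model never consumes.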

V3 improvements:

Uses trace_recorder.enter_branch() / exit_branch() for branch path tracking
Records each substep individually for complete trace fidelity
Builds proper operator chains for artifact identification

Parameters:

step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput with collected artifacts)

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Check if the step matches the branch controller.

Parameters:

step – Original step configuration
operator – Deserialized operator (may be list of branch definitions)
keyword – Step keyword

Returns:

True if keyword is “branch”
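
To show how matches and priority could work together, here is a toy dispatch loop. This page does not describe how nirs4all actually routes steps to controllers, so pick_controller and the Toy* classes are assumptions for illustration only; the sketch just applies the stated rules that a lower priority value wins and that the branch controller matches the "branch" keyword.

```python
from typing import Any, List, Optional, Type


class ToyController:
    """Stand-in base class; not nirs4all's OperatorController."""
    priority: int = 100

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return False


class ToyBranchController(ToyController):
    priority = 5  # low value, so it is tried before most controllers

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return keyword == "branch"


class ToyFallbackController(ToyController):
    priority = 50

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return True  # accepts anything not claimed earlier


def pick_controller(
    controllers: List[Type[ToyController]],
    step: Any,
    operator: Any,
    keyword: str,
) -> Optional[Type[ToyController]]:
    """Return the first matching controller, trying lower priority values first."""
    for controller in sorted(controllers, key=lambda c: c.priority):
        if controller.matches(step, operator, keyword):
            return controller
    return None


registry = [ToyFallbackController, ToyBranchController]
branch_choice = pick_controller(registry, {"branch": []}, [], "branch")
model_choice = pick_controller(registry, {"model": None}, None, "model")

print(branch_choice.__name__)  # ToyBranchController
print(model_choice.__name__)   # ToyFallbackController
```

Even though the fallback controller would also accept a branch step, the lower priority value lets the branch controller claim it first.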

priority: int = 5

classmethod supports_prediction_mode() → bool

The branch controller executes in prediction mode so that branches can be reconstructed.

classmethod use_multi_source() → bool

The branch controller supports multi-source datasets.