nirs4all.controllers.data.branch module

Branch Controller for pipeline branching.

This controller enables splitting a pipeline into multiple parallel sub-pipelines (“branches”), each with its own preprocessing context (X transformations, Y processing), while sharing common upstream state (splits, initial preprocessing).

Steps declared after a branch block execute on each branch independently.

V3 improvements:
  • Uses trace_recorder.enter_branch() / exit_branch() for automatic branch path tracking

  • Records each branch substep individually in the execution trace

  • Builds proper operator chains for artifact identification

Example

>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": [
...         [SNV(), PCA(n_components=10)],
...         [MSC(), FirstDerivative()],
...     ]},
...     PLSRegression(n_components=5),  # Runs on BOTH branches
... ]

Generator syntax is also supported:

>>> pipeline = [
...     ShuffleSplit(n_splits=3),
...     {"branch": {"_or_": [SNV(), MSC(), FirstDerivative()]}},  # 3 branches
...     PLSRegression(n_components=5),
... ]
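
The generator form above yields one branch per alternative listed under "_or_". As a rough illustration only, the sketch below normalizes both syntaxes into a list of step lists. The helper name expand_branches is hypothetical, and strings stand in for operators; the actual expansion logic in nirs4all may differ (for example, in how nested alternatives are handled).

```python
from typing import Any, Dict, List, Union


def expand_branches(spec: Union[List[Any], Dict[str, Any]]) -> List[List[Any]]:
    """Hypothetical helper: turn a branch spec into a list of step lists."""
    if isinstance(spec, dict) and "_or_" in spec:
        # Generator form: each alternative becomes its own single-step branch.
        return [[alternative] for alternative in spec["_or_"]]
    # Explicit form: each entry is already a branch; wrap bare steps in a list.
    return [branch if isinstance(branch, list) else [branch] for branch in spec]


# Strings stand in for operator instances such as SNV() or MSC().
generated = expand_branches({"_or_": ["SNV", "MSC", "FirstDerivative"]})
explicit = expand_branches([["SNV", "PCA"], ["MSC", "FirstDerivative"]])
print(len(generated), len(explicit))
```

Under these assumptions, the generator example produces three single-step branches, while the explicit list produces two two-step branches.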
class nirs4all.controllers.data.branch.BranchController

Bases: OperatorController

Controller for pipeline branching.

Implements the branching mechanism that allows multiple preprocessing chains to be evaluated independently within a single pipeline execution.

Key behaviors:
  • Creates independent context copies for each branch

  • Executes branch steps sequentially within each branch

  • Stores branch contexts in context.custom["branch_contexts"]

  • Post-branch steps iterate over all branch contexts
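
The behaviors listed above can be pictured with a minimal, self-contained sketch. ToyContext and run_branches are hypothetical stand-ins, not the nirs4all classes, and the shape of each stored entry (a dict with "branch_id" and "context") is an assumption made for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyContext:
    """Stand-in for an execution context (not the nirs4all class)."""
    processing: List[str] = field(default_factory=list)
    custom: Dict[str, Any] = field(default_factory=dict)


def run_branches(context: ToyContext, branches: List[List[str]]) -> ToyContext:
    branch_contexts = []
    for branch_id, steps in enumerate(branches):
        # Each branch starts from an independent copy of the shared upstream state.
        branch_ctx = copy.deepcopy(context)
        # Steps run sequentially inside the branch; here a step is just recorded.
        for step in steps:
            branch_ctx.processing.append(step)
        branch_contexts.append({"branch_id": branch_id, "context": branch_ctx})
    # Post-branch steps can later iterate over these stored contexts.
    context.custom["branch_contexts"] = branch_contexts
    return context


ctx = run_branches(ToyContext(processing=["split"]), [["SNV", "PCA"], ["MSC"]])
for entry in ctx.custom["branch_contexts"]:
    print(entry["branch_id"], entry["context"].processing)
```

The deep copy is what keeps one branch's preprocessing from leaking into another; the parent context keeps only the shared upstream state plus the list of branch contexts.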

priority

Controller priority (lower = higher priority). Set to 5 to execute before most other controllers.

Type:

int
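
To illustrate the "lower = higher priority" convention, here is a small sketch that orders controllers by ascending priority. ToyController and every priority value other than 5 are invented for the example; how nirs4all actually consults the priority when selecting a controller is not shown in this reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyController:
    """Stand-in controller carrying only a name and a priority."""
    name: str
    priority: int


controllers: List[ToyController] = [
    ToyController("model", 20),      # hypothetical value
    ToyController("branch", 5),      # value documented for BranchController
    ToyController("transform", 10),  # hypothetical value
]

# Lower number = higher priority, so an ascending sort puts "branch" first.
ordered = sorted(controllers, key=lambda c: c.priority)
names = [c.name for c in ordered]
print(names)
```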

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, StepOutput]

Execute the branch step with V3 chain tracking.

Creates independent contexts for each branch, executes branch-specific steps, and stores branch contexts for post-branch iteration.

In predict/explain mode, only executes the target branch specified in runtime_context.target_model.branch_id for efficiency.
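
The mode-dependent branch selection described above might look roughly like the following. The function select_branches and its parameters are hypothetical; in the real controller the target comes from runtime_context.target_model.branch_id, and the fallback behavior when no target is set is an assumption.

```python
from typing import List, Optional


def select_branches(
    n_branches: int, mode: str, target_branch_id: Optional[int] = None
) -> List[int]:
    """Hypothetical helper: return the branch indices to execute for a mode."""
    if mode in ("predict", "explain") and target_branch_id is not None:
        # Only the branch that produced the target model needs to be rebuilt.
        return [target_branch_id]
    # Training (or no known target): run every branch.
    return list(range(n_branches))


print(select_branches(3, "train"))
print(select_branches(3, "predict", target_branch_id=1))
```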

V3 improvements:
  • Uses trace_recorder.enter_branch() / exit_branch() for branch path tracking

  • Records each substep individually for complete trace fidelity

  • Builds proper operator chains for artifact identification

Parameters:
  • step_info – Parsed step containing branch definitions

  • dataset – Dataset to operate on

  • context – Pipeline execution context

  • runtime_context – Runtime infrastructure context

  • source – Data source index

  • mode – Execution mode (“train” or “predict”)

  • loaded_binaries – Pre-loaded binary objects for prediction mode

  • prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput with collected artifacts)

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Check if the step matches the branch controller.

Parameters:
  • step – Original step configuration

  • operator – Deserialized operator (may be list of branch definitions)

  • keyword – Step keyword

Returns:

True if keyword is “branch”
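
Based on the documented return value, the matching rule reduces to a keyword comparison. The class below is a toy stand-in written for illustration, not the nirs4all implementation; it ignores step and operator, which the real method may inspect.

```python
from typing import Any


class ToyBranchController:
    """Stand-in mirroring the documented matching rule."""
    priority: int = 5

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        # Only the step keyword decides whether this controller handles the step.
        return keyword == "branch"


step = {"branch": [["SNV"], ["MSC"]]}
print(ToyBranchController.matches(step, step["branch"], "branch"))
print(ToyBranchController.matches({"model": "PLS"}, "PLS", "model"))
```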

priority: int = 5
classmethod supports_prediction_mode() → bool

Branch controller should execute in prediction mode to reconstruct branches.

classmethod use_multi_source() → bool

Branch controller supports multi-source datasets.