nirs4all.controllers.data.branch module
Branch Controller for pipeline branching.
This controller enables splitting a pipeline into multiple parallel sub-pipelines (“branches”), each with its own preprocessing context (X transformations, Y processing), while sharing common upstream state (splits, initial preprocessing).
Steps declared after a branch block execute on each branch independently.
V3 improvements:
- Uses trace_recorder.enter_branch() / exit_branch() for automatic branch path tracking
- Records each branch substep individually in the execution trace
- Builds proper operator chains for artifact identification
Example
>>> pipeline = [
... ShuffleSplit(n_splits=5),
... {"branch": [
... [SNV(), PCA(n_components=10)],
... [MSC(), FirstDerivative()],
... ]},
... PLSRegression(n_components=5), # Runs on BOTH branches
... ]
- Generator syntax is also supported:
>>> pipeline = [
... ShuffleSplit(n_splits=3),
... {"branch": {"_or_": [SNV(), MSC(), FirstDerivative()]}}, # 3 branches
... PLSRegression(n_components=5),
... ]
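The two syntaxes above can be read as describing the same thing: a list of branches, each of which is a list of steps. The sketch below illustrates that reading only. `expand_branch_spec` is a hypothetical helper, not part of nirs4all, and plain strings stand in for the real operators.

```python
from typing import Any, Dict, List, Union


def expand_branch_spec(spec: Union[List[Any], Dict[str, Any]]) -> List[List[Any]]:
    """Hypothetical helper: normalize a "branch" value into a list of step lists."""
    if isinstance(spec, dict) and "_or_" in spec:
        # Generator syntax: each alternative becomes its own single-step branch.
        return [[alternative] for alternative in spec["_or_"]]
    # Explicit syntax: wrap bare steps so every branch is a list of steps.
    return [branch if isinstance(branch, list) else [branch] for branch in spec]


# Strings stand in for operator instances such as SNV() or MSC().
explicit = expand_branch_spec([["SNV", "PCA"], ["MSC", "FirstDerivative"]])
generated = expand_branch_spec({"_or_": ["SNV", "MSC", "FirstDerivative"]})

print(len(explicit), len(generated))
```

Under this assumption the explicit form yields two branches of two steps each, and the `_or_` form yields three branches of one step each.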
- class nirs4all.controllers.data.branch.BranchController
Bases: OperatorController
Controller for pipeline branching.
Implements the branching mechanism that allows multiple preprocessing chains to be evaluated independently within a single pipeline execution.
- Key behaviors:
Creates independent context copies for each branch
Executes branch steps sequentially within each branch
Stores branch contexts in context.custom["branch_contexts"]
Post-branch steps iterate over all branch contexts
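The four behaviors listed above can be sketched with plain Python objects. This is a minimal illustration, not the controller's implementation: `ToyContext`, `run_branches`, and `run_post_branch_step` are hypothetical names, the shape of each `branch_contexts` entry is an assumption, and "executing" a step is reduced to appending its name to a list.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyContext:
    """Hypothetical stand-in for ExecutionContext."""
    processing: List[str] = field(default_factory=list)
    custom: Dict[str, Any] = field(default_factory=dict)


def run_branches(context: ToyContext, branches: List[List[str]]) -> ToyContext:
    branch_contexts = []
    for branch_id, steps in enumerate(branches):
        # Each branch starts from its own copy of the shared upstream state.
        branch_ctx = copy.deepcopy(context)
        # Steps run in order within the branch.
        for step in steps:
            branch_ctx.processing.append(step)
        # Assumed entry layout: an id plus the branch's context.
        branch_contexts.append({"branch_id": branch_id, "context": branch_ctx})
    context.custom["branch_contexts"] = branch_contexts
    return context


def run_post_branch_step(context: ToyContext, step: str) -> None:
    # A step declared after the branch block is applied to every branch context.
    for entry in context.custom["branch_contexts"]:
        entry["context"].processing.append(step)


shared = ToyContext(processing=["ShuffleSplit"])
shared = run_branches(shared, [["SNV", "PCA"], ["MSC", "FirstDerivative"]])
run_post_branch_step(shared, "PLSRegression")
chains = [entry["context"].processing for entry in shared.custom["branch_contexts"]]
print(chains)
```

Copying the context per branch is what keeps one branch's preprocessing from leaking into another, while the upstream step recorded before the branch block appears in both chains.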
- priority
Controller priority (lower = higher priority). Set to 5 to execute before most other controllers.
- Type: int
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, StepOutput]
Execute the branch step with V3 chain tracking.
Creates independent contexts for each branch, executes branch-specific steps, and stores branch contexts for post-branch iteration.
In predict/explain mode, only executes the target branch specified in runtime_context.target_model.branch_id for efficiency.
V3 improvements:
- Uses trace_recorder.enter_branch() / exit_branch() for branch path tracking
- Records each substep individually for complete trace fidelity
- Builds proper operator chains for artifact identification
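The mode-dependent behavior described above (all branches in training, only the target branch in predict/explain mode) might look roughly like the following. `select_branches` is a hypothetical function written for illustration; the real controller reads the target from runtime_context.target_model.branch_id, which is replaced here by a plain `target_branch_id` argument.

```python
from typing import Any, List, Optional, Tuple


def select_branches(
    branches: List[List[Any]],
    mode: str,
    target_branch_id: Optional[int] = None,
) -> List[Tuple[int, List[Any]]]:
    """Hypothetical sketch: choose which branches to execute for a given mode."""
    indexed = list(enumerate(branches))
    if mode in ("predict", "explain") and target_branch_id is not None:
        # Only the branch that produced the target model needs to be replayed.
        return [(i, steps) for i, steps in indexed if i == target_branch_id]
    # Training (or no known target): run every branch.
    return indexed


branches = [["SNV", "PCA"], ["MSC", "FirstDerivative"]]
train_selection = select_branches(branches, mode="train")
predict_selection = select_branches(branches, mode="predict", target_branch_id=1)
print(train_selection, predict_selection)
```

Keeping the original branch index in the result matters in this sketch, because the selected branch must still be identified by the id it had at training time.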
- Parameters:
step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode ("train", "predict", or "explain")
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions
- Returns:
Tuple of (updated_context, StepOutput with collected artifacts)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the step matches the branch controller.
- Parameters:
step – Original step configuration
operator – Deserialized operator (may be list of branch definitions)
keyword – Step keyword
- Returns:
True if keyword is "branch"
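As a rough illustration of how a keyword check and a numeric priority could combine, the sketch below picks the first matching controller after sorting by priority (lower value first). The toy classes, the fallback's priority of 100, and the `pick_controller` dispatch loop are all assumptions made for this example; only the keyword test and the priority of 5 come from the documentation above.

```python
from typing import Any, List, Type


class ToyBranchController:
    """Hypothetical stand-in mirroring the documented matches() contract."""
    priority = 5  # lower value = higher priority

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return keyword == "branch"


class ToyFallbackController:
    """Hypothetical catch-all controller with an assumed low priority."""
    priority = 100

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return True


def pick_controller(controllers: List[Type], step: Any, operator: Any, keyword: str) -> Type:
    # Assumed dispatch rule: try controllers from highest to lowest priority.
    for controller in sorted(controllers, key=lambda c: c.priority):
        if controller.matches(step, operator, keyword):
            return controller
    raise LookupError(f"no controller matches keyword {keyword!r}")


registry = [ToyFallbackController, ToyBranchController]
branch_step = {"branch": [["SNV"], ["MSC"]]}
chosen = pick_controller(registry, branch_step, branch_step["branch"], "branch")
other = pick_controller(registry, {"model": "PLS"}, "PLS", "model")
print(chosen.__name__, other.__name__)
```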