nirs4all.controllers.splitters.split module

class nirs4all.controllers.splitters.split.CrossValidatorController[source]

Bases: OperatorController

Controller for any sklearn‑compatible splitter (native or custom).

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None)[source]

Run operator.split and store the resulting folds on dataset.

  • Supplies y / groups to the splitter only when the splitter requires them.

  • Extracts groups from metadata if specified.

  • Supports force_group parameter to wrap any splitter with group-awareness.

  • Maps local indices back to the global index space.

  • Stores the list of folds into the dataset for subsequent steps.
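The argument selection and index mapping described above can be sketched as follows. This is a minimal illustration, not the controller's code: the helper names (required_extras, run_split), the toy splitters, and the rule "a parameter is required when split declares it without a default" are all assumptions made for the example. In particular, sklearn's group splitters declare groups=None yet still need groups, so a real implementation needs a richer rule than the one shown here.

```python
import inspect


class HalfSplit:
    """Toy splitter: first half trains, second half validates."""

    def split(self, X):
        n = len(X)
        yield list(range(n // 2)), list(range(n // 2, n))


class LabelAwareSplit:
    """Toy splitter whose split declares y as a required argument."""

    def split(self, X, y):
        # Hold out every sample that shares the label of the first sample.
        held = [i for i, label in enumerate(y) if label == y[0]]
        kept = [i for i, label in enumerate(y) if label != y[0]]
        yield kept, held


def required_extras(splitter):
    """Names among ("y", "groups") that split declares without a default."""
    params = inspect.signature(splitter.split).parameters
    return [
        name for name in ("y", "groups")
        if name in params and params[name].default is inspect.Parameter.empty
    ]


def run_split(splitter, X, global_ids, y=None, groups=None):
    """Call split with only the required extras, then map local to global ids."""
    available = {"y": y, "groups": groups}
    kwargs = {name: available[name] for name in required_extras(splitter)}
    folds = []
    for train_local, val_local in splitter.split(X, **kwargs):
        # Local indices are positions in X; global_ids gives each row's
        # position in the full dataset.
        folds.append((
            [global_ids[i] for i in train_local],
            [global_ids[i] for i in val_local],
        ))
    return folds


X = [[0.1], [0.2], [0.3], [0.4]]
global_ids = [10, 11, 14, 15]  # hypothetical positions in the full dataset
print(run_split(HalfSplit(), X, global_ids))
print(run_split(LabelAwareSplit(), X, global_ids, y=["a", "b", "a", "b"]))
```

The folds returned by run_split are expressed in the global index space, which is the form a later step would need in order to look samples up in the full dataset.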

Parameters:
  • step_info (ParsedStep) – Parsed step containing the operator and original step configuration.

  • dataset (SpectroDataset) – The dataset to split.

  • context (ExecutionContext) – Current execution context.

  • runtime_context (RuntimeContext) – Runtime context with global settings.

  • source (int) – Source index (-1 for combined sources).

  • mode (str) – Execution mode (“train”, “predict”, or “explain”).

  • loaded_binaries (Any) – Pre-loaded binary data (not used).

  • prediction_store (Any) – Store for predictions (not used).

Notes

The force_group parameter enables any sklearn-compatible splitter to work with grouped samples by wrapping it with GroupedSplitterWrapper. This aggregates samples by group, passes “virtual samples” to the splitter, and expands fold indices back to the original dataset.

Example usage:

{"split": KFold(n_splits=5), "force_group": "Sample_ID"}
{"split": ShuffleSplit(test_size=0.2), "force_group": "ID", "aggregation": "median"}
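The aggregate, split, and expand behaviour described in the Notes can be sketched in plain Python. This is only an illustration of the idea, not the GroupedSplitterWrapper implementation: grouped_split and ContiguousKFold are hypothetical names, the toy splitter stands in for any sklearn-compatible one, and the "mean" default for aggregation is an assumption.

```python
from statistics import mean, median


class ContiguousKFold:
    """Toy stand-in for a sklearn-style splitter (unshuffled k folds)."""

    def __init__(self, n_splits):
        self.n_splits = n_splits

    def split(self, X):
        n = len(X)
        bounds = [n * k // self.n_splits for k in range(self.n_splits + 1)]
        for k in range(self.n_splits):
            val = list(range(bounds[k], bounds[k + 1]))
            train = [i for i in range(n) if i not in val]
            yield train, val


def grouped_split(splitter, X, groups, aggregation="mean"):
    """Split on one virtual sample per group, then expand to row indices."""
    agg = {"mean": mean, "median": median}[aggregation]
    members = {}  # group label -> row indices, in first-seen order
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    labels = list(members)
    # One aggregated ("virtual") row per group, computed feature by feature.
    virtual_X = [
        [agg(col) for col in zip(*(X[i] for i in members[g]))]
        for g in labels
    ]
    for train_g, val_g in splitter.split(virtual_X):
        # Expand group-level fold indices back to the original rows.
        train = [i for k in train_g for i in members[labels[k]]]
        val = [i for k in val_g for i in members[labels[k]]]
        yield train, val


X = [[1.0], [1.2], [5.0], [5.4], [9.0], [9.1]]
sample_id = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(grouped_split(ContiguousKFold(3), X, sample_id))
for train, val in folds:
    print(train, val)
```

Because the wrapped splitter only ever sees one row per group, all rows of a group land on the same side of every fold, whichever splitter is wrapped.
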

classmethod matches(step: Any, operator: Any, keyword: str) → bool[source]

Return True if operator behaves like a splitter.

Criteria – the operator must expose a callable split whose first positional argument is named X. A get_n_splits method is optional, so simple user‑defined splitters are still accepted.

Also matches on the ‘split’ keyword for group-aware splitting syntax.
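The criteria above amount to a duck-typing check, which might look like the sketch below. The function name looks_like_splitter is hypothetical, and treating keyword == "split" as an unconditional match is an assumption about how the keyword rule combines with the signature rule; the actual matches implementation may differ.

```python
import inspect


def looks_like_splitter(operator, keyword=""):
    """Duck-typing check: callable split whose first positional argument is X."""
    if keyword == "split":
        return True
    split = getattr(operator, "split", None)
    if not callable(split):
        return False
    try:
        params = list(inspect.signature(split).parameters.values())
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature.
        return False
    if not params:
        return False
    first = params[0]
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return first.kind in positional and first.name == "X"


class SimpleSplitter:
    """User-defined splitter without get_n_splits."""

    def split(self, X):
        yield [0], [1]


class Tokenizer:
    """Has a split method, but not a splitter-style one."""

    def split(self, text):
        return text.split()


print(looks_like_splitter(SimpleSplitter()))  # True
print(looks_like_splitter(Tokenizer()))       # False
```

Checking the name of the first parameter keeps objects such as strings or tokenizers, which also have a split method, from being mistaken for cross-validators.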

priority: int = 10

classmethod supports_prediction_mode() → bool[source]

Cross-validators should not execute during prediction mode.

classmethod use_multi_source() → bool[source]

Cross‑validators themselves are single‑source operators.