nirs4all.controllers.splitters.split module
- class nirs4all.controllers.splitters.split.CrossValidatorController
Bases: OperatorController
Controller for any sklearn‑compatible splitter (native or custom).
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None)
Run operator.split and store the resulting folds on dataset.
Supplies y and groups to the splitter only if they are required.
Extracts groups from metadata if specified.
Supports the force_group parameter to wrap any splitter with group-awareness.
Maps local indices back to the global index space.
Stores the list of folds in the dataset for subsequent steps.
- Parameters:
step_info (ParsedStep) – Parsed step containing the operator and original step configuration.
dataset (SpectroDataset) – The dataset to split.
context (ExecutionContext) – Current execution context.
runtime_context (RuntimeContext) – Runtime context with global settings.
source (int) – Source index (-1 for combined sources).
mode (str) – Execution mode (“train”, “predict”, or “explain”).
loaded_binaries (Any) – Pre-loaded binary data (not used).
prediction_store (Any) – Store for predictions (not used).
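The two behaviours described for execute (passing y/groups only when the splitter accepts them, and mapping local fold indices back to global ones) can be sketched as follows. This is a minimal illustration, not the nirs4all implementation: EveryOtherSplitter, split_with_global_indices and global_ids are hypothetical names, and checking the signature with inspect is only one plausible way to decide what is "required".

```python
import inspect


class EveryOtherSplitter:
    """Hypothetical minimal splitter: even positions train, odd positions test."""

    def split(self, X):
        n = len(X)
        train = [i for i in range(n) if i % 2 == 0]
        test = [i for i in range(n) if i % 2 == 1]
        yield train, test


def split_with_global_indices(splitter, X, global_ids, y=None, groups=None):
    """Call splitter.split with only the arguments it declares, then map the
    local (positional) fold indices back to the caller's global ids."""
    params = inspect.signature(splitter.split).parameters
    kwargs = {}
    if "y" in params and y is not None:
        kwargs["y"] = y
    if "groups" in params and groups is not None:
        kwargs["groups"] = groups

    folds = []
    for train_idx, test_idx in splitter.split(X, **kwargs):
        folds.append((
            [global_ids[i] for i in train_idx],
            [global_ids[i] for i in test_idx],
        ))
    return folds


X = [[0.1], [0.2], [0.3], [0.4]]
global_ids = [10, 11, 42, 43]  # assumed: positions of these rows in the full dataset
# y is offered but silently dropped, because EveryOtherSplitter.split only takes X.
folds = split_with_global_indices(EveryOtherSplitter(), X, global_ids, y=[0, 1, 0, 1])
```

Here the splitter sees only the four rows it was given, while the stored folds refer to the rows' identifiers in the wider index space.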
Notes
The force_group parameter enables any sklearn-compatible splitter to work with grouped samples by wrapping it with GroupedSplitterWrapper. This aggregates samples by group, passes “virtual samples” to the splitter, and expands fold indices back to the original dataset.
Example usage:
{"split": KFold(n_splits=5), "force_group": "Sample_ID"}
{"split": ShuffleSplit(test_size=0.2), "force_group": "ID", "aggregation": "median"}
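The aggregate, split, expand idea behind group-aware wrapping can be sketched in plain Python. This is an illustration under stated assumptions rather than the actual GroupedSplitterWrapper: SimpleKFold is a hypothetical stand-in for a sklearn-style splitter, grouped_split is a hypothetical helper, and the feature-wise mean is assumed as the aggregation.

```python
class SimpleKFold:
    """Hypothetical stand-in for a sklearn-style splitter (no shuffling)."""

    def __init__(self, n_splits=2):
        self.n_splits = n_splits

    def split(self, X, y=None, groups=None):
        n = len(X)
        for k in range(self.n_splits):
            test = [i for i in range(n) if i % self.n_splits == k]
            train = [i for i in range(n) if i % self.n_splits != k]
            yield train, test


def grouped_split(splitter, X, group_labels):
    """Split at group level so that a group never straddles train and test."""
    # 1. Collect the row positions of each group, in order of first appearance.
    members = {}
    for pos, label in enumerate(group_labels):
        members.setdefault(label, []).append(pos)
    group_keys = list(members)

    # 2. Aggregate each group into one "virtual sample" (feature-wise mean here).
    virtual_X = []
    for label in group_keys:
        rows = [X[p] for p in members[label]]
        virtual_X.append([sum(col) / len(rows) for col in zip(*rows)])

    # 3. Split the virtual samples, then expand group indices to row indices.
    for train_g, test_g in splitter.split(virtual_X):
        train = [p for gi in train_g for p in members[group_keys[gi]]]
        test = [p for gi in test_g for p in members[group_keys[gi]]]
        yield train, test


X = [[1.0], [1.2], [2.0], [2.2], [3.0], [3.2]]
sample_ids = ["a", "a", "b", "b", "c", "c"]
folds = list(grouped_split(SimpleKFold(n_splits=3), X, sample_ids))
```

Because the splitter only ever sees one row per group, any splitter that works on ordinary samples yields folds that keep repeated measurements of the same sample together.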
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Return True if operator behaves like a splitter.
Criteria: the operator must expose a callable split whose first positional argument is named X. The presence of get_n_splits is a plus but not mandatory, so simple user‑defined splitters are still accepted.
Also matches on the ‘split’ keyword for group-aware splitting syntax.
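The duck-typing criterion above can be approximated with inspect. The sketch below is an assumption about how such a check might look, not the code of matches itself; looks_like_splitter, HoldOutLast and NotASplitter are hypothetical names, and the keyword-based match is left out.

```python
import inspect


def looks_like_splitter(operator):
    """Return True if operator has a callable split whose first positional
    argument is named X. get_n_splits is deliberately not required."""
    split = getattr(operator, "split", None)
    if not callable(split):
        return False
    try:
        params = list(inspect.signature(split).parameters.values())
    except (TypeError, ValueError):
        # Some builtins or C extensions expose no introspectable signature.
        return False
    if not params:
        return False
    first = params[0]
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return first.kind in positional_kinds and first.name == "X"


class HoldOutLast:
    """User-defined splitter without get_n_splits: last row is the test set."""

    def split(self, X):
        n = len(X)
        yield list(range(n - 1)), [n - 1]


class NotASplitter:
    """Has a split method, but its first argument is not named X."""

    def split(self, text):
        return text.split()


accepted = looks_like_splitter(HoldOutLast())
rejected = looks_like_splitter(NotASplitter())
```

Checking the argument name rather than the class hierarchy is what lets simple custom splitters pass while unrelated objects that happen to define split are rejected.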