nirs4all.controllers.splitters package
Submodules
Module contents
Splitter controllers.
Controllers for data splitting operators.
- class nirs4all.controllers.splitters.FoldFileLoaderController
Bases: OperatorController
Controller for loading pre-computed fold indices from files.
This controller matches pipeline steps where the ‘split’ keyword is used with a file path (string ending in a supported extension) instead of a splitter object.
Examples
>>> # In pipeline
>>> {"split": "path/to/folds.csv"}
>>> {"split": "workspace/runs/my_run/folds_KFold_seed42.csv"}
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) → Tuple[ExecutionContext, Any]
Load folds from file and set them on the dataset.
- Parameters:
step_info – Parsed step containing the file path.
dataset – Dataset to set folds on.
context – Current execution context.
runtime_context – Runtime context with global settings.
source – Source index (unused).
mode – Execution mode (“train” or “predict”).
loaded_binaries – Pre-loaded binaries (unused).
prediction_store – Prediction store (unused).
- Returns:
Tuple of (context, StepOutput).
- classmethod matches(step: Any, operator: Any, keyword: str) → bool
Match steps with ‘split’ keyword and file path value.
Returns True if all of the following hold:
- keyword is ‘split’;
- operator is a string (a file path);
- the path has a supported extension (.csv, .json, .yaml, .yml, .txt).
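As a minimal sketch, the matching rule can be restated as a standalone function. The name `matches_fold_file_step` is hypothetical and this is not the library's implementation; it assumes the extension check is an exact comparison of `Path(...).suffix` against the documented extensions.

```python
from pathlib import Path
from typing import Any

# Extensions documented for FoldFileParser.SUPPORTED_EXTENSIONS.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".txt", ".yaml", ".yml"}


def matches_fold_file_step(keyword: str, operator: Any) -> bool:
    """Hypothetical restatement of FoldFileLoaderController.matches.

    True only when the keyword is 'split', the operator is a string,
    and that string ends in a supported fold-file extension.
    """
    if keyword != "split":
        return False
    if not isinstance(operator, str):
        # Splitter objects (not file paths) are left to other controllers.
        return False
    # Assumption: exact, case-sensitive suffix comparison.
    return Path(operator).suffix in SUPPORTED_EXTENSIONS
```

With this rule, `{"split": "path/to/folds.csv"}` matches, while `{"split": "folds.parquet"}` or a step using any other keyword does not.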
- class nirs4all.controllers.splitters.FoldFileParser
Bases: object
Utility class for parsing fold files in various formats.
Supports multiple fold file formats:
- nirs4all CSV: columns fold_0, fold_1, etc., with sample IDs as rows
- Assignment CSV: columns sample_id and fold, assigning each sample to a fold
- JSON: list of dicts with train and val (or test) keys
- YAML: same structure as JSON
- TXT: simple format with fold indices
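A minimal sketch of how two of these formats could be turned into (train, val) tuples. Both function names are hypothetical. For the assignment CSV, it assumes the usual K-fold convention: a sample's fold number marks the fold in which it is held out for validation, and it is used for training in every other fold. It also assumes sample IDs are integers, matching the List[int] return type of parse.

```python
import csv
import io
import json
from typing import Dict, List, Tuple

Fold = Tuple[List[int], List[int]]


def folds_from_assignment_csv(text: str) -> List[Fold]:
    """Hypothetical reader for the assignment CSV (sample_id, fold) format."""
    assignment: Dict[int, int] = {}
    for row in csv.DictReader(io.StringIO(text)):
        assignment[int(row["sample_id"])] = int(row["fold"])
    folds: List[Fold] = []
    for k in sorted(set(assignment.values())):
        # Samples assigned to fold k validate in fold k; all others train.
        val_ids = sorted(s for s, f in assignment.items() if f == k)
        train_ids = sorted(s for s, f in assignment.items() if f != k)
        folds.append((train_ids, val_ids))
    return folds


def folds_from_json(text: str) -> List[Fold]:
    """Hypothetical reader for a JSON list of {train, val|test} dicts."""
    folds: List[Fold] = []
    for entry in json.loads(text):
        # Fall back to 'test' when a fold has no 'val' key.
        val_key = "val" if "val" in entry else "test"
        folds.append((list(entry["train"]), list(entry[val_key])))
    return folds
```

Sorting the IDs here only makes the output deterministic; the real parser may preserve file order.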
Examples
>>> parser = FoldFileParser()
>>> folds = parser.parse("folds_KFold.csv")
>>> # Returns: [(train_ids, val_ids), (train_ids, val_ids), ...]
- SUPPORTED_EXTENSIONS = {'.csv', '.json', '.txt', '.yaml', '.yml'}
- parse(file_path: str | Path, format: str | None = None) → List[Tuple[List[int], List[int]]]
Parse a fold file and return fold definitions.
- Parameters:
file_path – Path to the fold file.
format – Optional format hint (‘csv’, ‘json’, ‘yaml’, ‘txt’). If None, format is auto-detected from extension.
- Returns:
List of (train_indices, val_indices) tuples.
- Raises:
FileNotFoundError – If file doesn’t exist.
ValueError – If file format is unsupported or content is invalid.
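The documented checks in parse (file existence, auto-detection from the extension, rejection of unsupported formats) can be sketched as a separate step. `resolve_fold_format` is a hypothetical helper, not part of the library; mapping both .yaml and .yml to 'yaml' and using the bare suffix for the other extensions are assumptions consistent with the format hints listed for the format parameter.

```python
from pathlib import Path
from typing import Optional, Union

# Extensions and format hints as documented for FoldFileParser.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".txt", ".yaml", ".yml"}
FORMAT_HINTS = {"csv", "json", "yaml", "txt"}


def resolve_fold_format(
    file_path: Union[str, Path], format: Optional[str] = None
) -> str:
    """Hypothetical helper: validate a fold file path and pick its format.

    The parameter is named `format` only to mirror parse()'s signature.
    """
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"Fold file not found: {path}")
    if format is None:
        # Auto-detect from the extension when no hint is given.
        suffix = path.suffix
        if suffix not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"Unsupported fold file extension: {suffix!r}")
        # .yaml and .yml share one format; the others use the bare suffix.
        format = "yaml" if suffix in {".yaml", ".yml"} else suffix.lstrip(".")
    if format not in FORMAT_HINTS:
        raise ValueError(f"Unsupported fold file format: {format!r}")
    return format
```

Keeping detection separate from parsing lets each format-specific reader assume it receives an existing file in a known format.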