nirs4all.controllers.splitters package

Module contents

Splitter controllers.

Controllers for data splitting operators.

class nirs4all.controllers.splitters.FoldFileLoaderController

Bases: OperatorController

Controller for loading pre-computed fold indices from files.

This controller matches pipeline steps where the ‘split’ keyword is used with a file path (string ending in a supported extension) instead of a splitter object.

Examples

>>> # In pipeline
>>> {"split": "path/to/folds.csv"}
>>> {"split": "workspace/runs/my_run/folds_KFold_seed42.csv"}

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) → Tuple[ExecutionContext, Any]

Load folds from file and set them on the dataset.

Parameters:

step_info – Parsed step containing the file path.
dataset – Dataset to set folds on.
context – Current execution context.
runtime_context – Runtime context with global settings.
source – Source index (unused).
mode – Execution mode (“train” or “predict”).
loaded_binaries – Pre-loaded binaries (unused).
prediction_store – Prediction store (unused).

Returns:

Tuple of (context, StepOutput).

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match steps with ‘split’ keyword and file path value.

Returns True if all of the following hold:

- keyword is ‘split’
- operator is a string (file path)
- path has a supported extension (.csv, .json, .yaml, .yml, .txt)
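The matching rule above can be sketched as a standalone predicate. This is a minimal illustration and not the library's implementation: the function name `looks_like_fold_file` is hypothetical, and the case-insensitive extension comparison is an assumption.

```python
from pathlib import Path
from typing import Any

# Extensions listed in the matching rule above.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".yaml", ".yml", ".txt"}


def looks_like_fold_file(operator: Any, keyword: str) -> bool:
    """Hypothetical predicate mirroring the three documented conditions."""
    if keyword != "split":
        return False
    if not isinstance(operator, str):
        return False
    # Assumption: the extension comparison ignores case.
    return Path(operator).suffix.lower() in SUPPORTED_EXTENSIONS


print(looks_like_fold_file("path/to/folds.csv", "split"))
print(looks_like_fold_file("path/to/folds.parquet", "split"))
```

Under these assumptions, a splitter object or a string without a supported extension does not match, so such a step would not be claimed by this rule.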

priority: int = 9

classmethod supports_prediction_mode() → bool

Fold files should be loaded in prediction mode to set up fold structure.

classmethod use_multi_source() → bool

Fold loading is a single-source operation.

class nirs4all.controllers.splitters.FoldFileParser

Bases: object

Utility class for parsing fold files in various formats.

Supports multiple fold file formats:

- nirs4all CSV: columns fold_0, fold_1, etc. with sample IDs as rows
- Assignment CSV: columns sample_id, fold assigning each sample to a fold
- JSON: List of dicts with train and val (or test) keys
- YAML: Same structure as JSON
- TXT: Simple format with fold indices
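To make two of these layouts concrete, the sketch below converts a JSON fold list and an assignment CSV into `(train_ids, val_ids)` tuples. It uses only the standard library and does not call `FoldFileParser`. The helper names `folds_from_json_text` and `folds_from_assignment_csv_text` are hypothetical, and the reading of the assignment CSV (each sample is held out in its assigned fold and used for training in every other fold) is an assumption, not confirmed behavior.

```python
import csv
import io
import json
from typing import List, Tuple

Fold = Tuple[List[int], List[int]]


def folds_from_json_text(text: str) -> List[Fold]:
    """Read a list of dicts with 'train' and 'val' (or 'test') keys."""
    folds = []
    for entry in json.loads(text):
        # Fall back to 'test' when 'val' is absent, per the format description.
        held_out = entry["val"] if "val" in entry else entry["test"]
        folds.append((list(entry["train"]), list(held_out)))
    return folds


def folds_from_assignment_csv_text(text: str) -> List[Fold]:
    """Read 'sample_id,fold' rows (assumed semantics: assigned fold = held out)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(int(r["sample_id"]), int(r["fold"])) for r in reader]
    fold_ids = sorted({fold for _, fold in rows})
    return [
        ([s for s, f in rows if f != k], [s for s, f in rows if f == k])
        for k in fold_ids
    ]


json_text = '[{"train": [0, 1], "val": [2, 3]}, {"train": [2, 3], "test": [0, 1]}]'
csv_text = "sample_id,fold\n0,0\n1,0\n2,1\n3,1\n"

print(folds_from_json_text(json_text))
# [([0, 1], [2, 3]), ([2, 3], [0, 1])]
print(folds_from_assignment_csv_text(csv_text))
# [([2, 3], [0, 1]), ([0, 1], [2, 3])]
```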

Examples

>>> parser = FoldFileParser()
>>> folds = parser.parse("folds_KFold.csv")
>>> # Returns: [(train_ids, val_ids), (train_ids, val_ids), ...]

SUPPORTED_EXTENSIONS = {'.csv', '.json', '.txt', '.yaml', '.yml'}

parse(file_path: str | Path, format: str | None = None) → List[Tuple[List[int], List[int]]]

Parse a fold file and return fold definitions.

Parameters:

file_path – Path to the fold file.
format – Optional format hint (‘csv’, ‘json’, ‘yaml’, ‘txt’). If None, format is auto-detected from extension.

Returns:

List of (train_indices, val_indices) tuples.

Raises:

FileNotFoundError – If file doesn’t exist.
ValueError – If file format is unsupported or content is invalid.
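The extension-based auto-detection and the two documented exceptions can be illustrated with a small sketch. It is a stand-in written under assumptions, not the parser's code: `resolve_format` and `EXTENSION_TO_FORMAT` are hypothetical names, and the order of the checks (existence first, then format) is a guess.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

# Hypothetical mapping from supported extensions to the documented format hints.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".json": "json",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".txt": "txt",
}


def resolve_format(file_path: Union[str, Path], format: Optional[str] = None) -> str:
    """Return the format to parse with, raising the documented exception types."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"Fold file not found: {path}")
    if format is not None:
        # An explicit hint takes precedence over the file extension.
        if format not in set(EXTENSION_TO_FORMAT.values()):
            raise ValueError(f"Unsupported fold file format: {format!r}")
        return format
    suffix = path.suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Unsupported fold file extension: {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


# Usage on a throwaway file so the example does not depend on real data.
with tempfile.TemporaryDirectory() as tmp:
    fold_file = Path(tmp) / "folds.yml"
    fold_file.write_text("[]")
    detected = resolve_format(fold_file)
    overridden = resolve_format(fold_file, format="json")

print(detected, overridden)
```

Callers of `parse` can guard against both failure modes by catching `FileNotFoundError` and `ValueError` separately, since the first signals a path problem and the second a format or content problem.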