nirs4all.data.partition package
Submodules
- nirs4all.data.partition.partition_assigner module
PartitionAssigner
PartitionError
PartitionResult: train_indices, test_indices, predict_indices, train_data, test_data, predict_data, partition_column, get_data(), get_indices(), has_predict, has_test, has_train
Module contents
Partition module for dataset configuration.
This module provides flexible partition assignment for dataset loading, supporting static, column-based, percentage-based, and index-based partition methods.
- Classes:
PartitionAssigner: Assigns rows to train/test/predict partitions.
PartitionError: Raised when partition assignment fails.
- Supported partition methods:
Static: Assign entire file to a partition
Column-based: Partition based on column values
Percentage-based: Split by percentage with optional shuffle/stratify
Index-based: Explicit index lists or external files
- class nirs4all.data.partition.PartitionAssigner(default_random_state: int | None = None, base_path: Path | None = None)
Bases: object
Flexible partition assigner for DataFrames.
Supports multiple partition methods:
- Static: "train", "test", "predict" (assign entire DataFrame)
- Column-based: {"column": "split", "train_values": […], "test_values": […]}
- Percentage-based: {"train": "80%", "test": "20%", "shuffle": True}
- Index-based: {"train": [0, 1, 2], "test": [3, 4, 5]}
- Index file: {"train_file": "train_idx.txt", "test_file": "test_idx.txt"}
Example
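>>> import pandas as pd
>>> from nirs4all.data.partition import PartitionAssigner
>>> df = pd.DataFrame({"x": range(10)})  # any DataFrame; this one is only a placeholder
>>> assigner = PartitionAssigner()
>>> result = assigner.assign(df, {"train": "80%", "test": "20%"})
>>> print(len(result.train_data), len(result.test_data))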
- DEFAULT_PREDICT_VALUES = ('predict', 'prediction', 'unknown')
- DEFAULT_TEST_VALUES = ('test', 'testing', 'val', 'validation', 'valid')
- DEFAULT_TRAIN_VALUES = ('train', 'training', 'cal', 'calibration')
- PARTITION_NAMES = ('train', 'test', 'predict')
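The default value tuples above suggest how a column-based partition could map cell values to partition names. The following is a minimal sketch of that idea using plain Python lists rather than a DataFrame. The helper `split_by_column_values` is hypothetical, and the case-insensitive matching and the dropping of unmatched values are assumptions, not documented behavior of PartitionAssigner.

```python
from typing import Dict, List, Sequence

# Values copied from the class attributes documented above.
DEFAULT_TRAIN_VALUES = ("train", "training", "cal", "calibration")
DEFAULT_TEST_VALUES = ("test", "testing", "val", "validation", "valid")
DEFAULT_PREDICT_VALUES = ("predict", "prediction", "unknown")


def split_by_column_values(values: Sequence[str]) -> Dict[str, List[int]]:
    """Group row positions by the partition their column value maps to.

    Assumption: matching is case-insensitive and ignores surrounding
    whitespace; values that match nothing are left out of every partition.
    """
    lookup: Dict[str, str] = {}
    for name, accepted in (
        ("train", DEFAULT_TRAIN_VALUES),
        ("test", DEFAULT_TEST_VALUES),
        ("predict", DEFAULT_PREDICT_VALUES),
    ):
        for value in accepted:
            lookup[value] = name

    indices: Dict[str, List[int]] = {"train": [], "test": [], "predict": []}
    for position, raw in enumerate(values):
        partition = lookup.get(str(raw).strip().lower())
        if partition is not None:
            indices[partition].append(position)
    return indices


column = ["cal", "Train", "val", "unknown", "test", "other"]
result = split_by_column_values(column)
```

Here `result["train"]` holds the positions of "cal" and "Train", while "other" is assigned to no partition.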
- assign(df: DataFrame, partition: str | Dict[str, Any] | None) → PartitionResult
Assign rows to partitions.
- Parameters:
df – The DataFrame to partition.
partition – Partition specification. Can be:
- str: Static partition ("train", "test", "predict")
- dict: Complex partition (column-based, percentage, or index)
- None: No partitioning (returns empty result)
- Returns:
PartitionResult with indices and data for each partition.
- Raises:
PartitionError – If partition specification is invalid.
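To illustrate what a percentage-based specification such as {"train": "80%", "test": "20%", "shuffle": True} might involve, here is a sketch that works on row positions only. The functions `parse_percent` and `percentage_split` are hypothetical, and the rounding rule and the use of a seeded shuffle are assumptions; the library may compute the split differently.

```python
import random
from typing import Dict, List, Optional


def parse_percent(text: str) -> float:
    """Turn a string such as "80%" into the fraction 0.8."""
    return float(text.strip().rstrip("%")) / 100.0


def percentage_split(
    n_rows: int,
    spec: Dict[str, object],
    random_state: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Split row positions 0..n_rows-1 according to a percentage spec."""
    positions = list(range(n_rows))
    if spec.get("shuffle", False):
        # A seeded Random instance keeps the shuffle reproducible.
        random.Random(random_state).shuffle(positions)

    # Assumption: partition sizes are rounded to the nearest whole row.
    n_train = round(n_rows * parse_percent(str(spec["train"])))
    n_test = round(n_rows * parse_percent(str(spec["test"])))
    if n_train + n_test > n_rows:
        raise ValueError("train and test sizes exceed the number of rows")

    return {
        "train": sorted(positions[:n_train]),
        "test": sorted(positions[n_train:n_train + n_test]),
    }


spec = {"train": "80%", "test": "20%", "shuffle": True}
split = percentage_split(10, spec, random_state=0)
```

Passing the same `random_state` twice yields the same split, which is presumably the purpose of the `default_random_state` constructor argument.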
- concatenate_partitions(results: Sequence[PartitionResult]) → PartitionResult
Concatenate multiple partition results.
Useful when combining multiple files with the same partition. Indices are adjusted to account for concatenation order.
- Parameters:
results – Sequence of PartitionResult objects.
- Returns:
Combined PartitionResult.
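The index adjustment described above can be pictured as adding a running offset to each file's indices. The sketch below shows one plausible reading, assuming the offset is the total row count of the files that came before. The `FileResult` alias and the `concatenate_indices` helper are hypothetical and operate on plain dictionaries, not on PartitionResult objects.

```python
from typing import Dict, List, Sequence, Tuple

# Each entry pairs a file's row count with its per-file partition indices.
FileResult = Tuple[int, Dict[str, List[int]]]


def concatenate_indices(results: Sequence[FileResult]) -> Dict[str, List[int]]:
    """Merge per-file indices into positions within the stacked data."""
    combined: Dict[str, List[int]] = {"train": [], "test": [], "predict": []}
    offset = 0
    for n_rows, indices in results:
        for name in combined:
            # Shift each index by the number of rows already consumed.
            combined[name].extend(i + offset for i in indices.get(name, []))
        offset += n_rows
    return combined


file_a = (3, {"train": [0, 1], "test": [2]})
file_b = (2, {"train": [0], "predict": [1]})
merged = concatenate_indices([file_a, file_b])
```

Row 0 of the second file becomes position 3 in the merged result because the first file contributes three rows.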
- exception nirs4all.data.partition.PartitionError
Bases: Exception
Raised when partition assignment fails.
- class nirs4all.data.partition.PartitionResult(train_indices: List[int] = <factory>, test_indices: List[int] = <factory>, predict_indices: List[int] = <factory>, train_data: DataFrame | None = None, test_data: DataFrame | None = None, predict_data: DataFrame | None = None, partition_column: str | None = None)
Bases: object
Result of a partition assignment operation.
- train_data
DataFrame subset for training.
- Type:
pandas.core.frame.DataFrame | None
- test_data
DataFrame subset for testing.
- Type:
pandas.core.frame.DataFrame | None
- predict_data
DataFrame subset for prediction.
- Type:
pandas.core.frame.DataFrame | None
- get_data(partition: Literal['train', 'test', 'predict']) → DataFrame | None
Get data for a specific partition.
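As a rough picture of how such a lookup could work, the sketch below defines a stand-in dataclass with the same field names as the documented signature. `PartitionResultSketch` is hypothetical: it stores arbitrary objects in place of DataFrames, and raising `ValueError` for an unknown name is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PartitionResultSketch:
    """Hypothetical stand-in that mirrors the documented field names."""

    train_indices: List[int] = field(default_factory=list)
    test_indices: List[int] = field(default_factory=list)
    predict_indices: List[int] = field(default_factory=list)
    train_data: Optional[Any] = None
    test_data: Optional[Any] = None
    predict_data: Optional[Any] = None
    partition_column: Optional[str] = None

    def get_data(self, partition: str) -> Optional[Any]:
        """Return the data stored for the named partition, or None."""
        if partition not in ("train", "test", "predict"):
            # Assumption: unknown names are rejected outright.
            raise ValueError(f"unknown partition: {partition!r}")
        return getattr(self, f"{partition}_data")


sketch = PartitionResultSketch(train_indices=[0, 1], train_data=["row0", "row1"])
train_rows = sketch.get_data("train")
test_rows = sketch.get_data("test")
```

A partition that was never populated returns None, matching the `DataFrame | None` return type in the signature.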