nirs4all.data.selection.row_selector module
Row selector for dataset configuration.
This module provides flexible row selection for DataFrames, supporting multiple selection syntaxes including indices, ranges, percentages, conditions, and random sampling.
Example
>>> selector = RowSelector()
>>> # By index
>>> rows = selector.select(df, [0, 1, 2])
>>> # By range
>>> rows = selector.select(df, "0:100")
>>> # By percentage
>>> rows = selector.select(df, "0:80%")
>>> # By condition
>>> rows = selector.select(df, {"where": {"column": "quality", "op": ">", "value": 0.5}})
>>> # Random sample
>>> rows = selector.select(df, {"sample": 100, "random_state": 42})
- exception nirs4all.data.selection.row_selector.RowSelectionError
Bases: Exception
Raised when row selection fails.
- class nirs4all.data.selection.row_selector.RowSelectionResult(indices: List[int], mask: Series, data: DataFrame)
Bases: object
Result of a row selection operation.
- indices
Indices of the selected rows.
- mask
Boolean mask for the selection.
- data
The selected DataFrame subset.
- class nirs4all.data.selection.row_selector.RowSelector(default_random_state: int | None = None)
Bases: object
Flexible row selector for DataFrames.
Supports multiple selection methods:
- All rows: None
- By index: [0, 1, 2] or 0
- By range: "0:100" (slice syntax as string)
- By percentage: "0:80%" or "80%:100%"
- By condition: {"where": {"column": "quality", "op": ">", "value": 0.5}}
- Random sample: {"sample": 100, "random_state": 42}
- Stratified sample: {"sample": 100, "stratify": "class", "random_state": 42}
- Head/Tail: {"head": 100} or {"tail": 50}
Example
>>> selector = RowSelector()
>>> result = selector.select(df, "0:80%")
>>> print(len(result.data))  # 80% of rows
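To make the range and percentage strings concrete, here is a minimal standard-library sketch of how a spec such as "0:80%" could be resolved to row positions. The helper `resolve_range` is hypothetical and is not part of nirs4all; the truncation of fractional percentages and the clamping to the row count are assumptions, and the library's actual rounding and bounds handling may differ.

```python
def resolve_range(spec: str, n_rows: int) -> range:
    """Resolve a "start:stop" spec to a range of row positions.

    Hypothetical helper for illustration only. Each bound may be empty,
    a non-negative integer, or a percentage such as "80%".
    """

    def bound(text: str, default: int) -> int:
        text = text.strip()
        if not text:
            return default
        if text.endswith("%"):
            # Assumption: percentages scale with the row count and truncate.
            return int(n_rows * float(text[:-1]) / 100)
        return int(text)

    start_text, sep, stop_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'start:stop', got {spec!r}")

    start = bound(start_text, 0)
    stop = bound(stop_text, n_rows)

    # Assumption: out-of-range bounds are clamped rather than rejected.
    start = max(0, min(start, n_rows))
    stop = max(start, min(stop, n_rows))
    return range(start, stop)


# On a 10-row frame, "0:80%" covers positions 0..7 and "80%:100%" covers 8..9.
train = resolve_range("0:80%", 10)
test = resolve_range("80%:100%", 10)
```

Under these assumptions the two percentage ranges partition the rows without overlap, which is the usual reason for pairing "0:80%" with "80%:100%".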
- OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {'!=': <function RowSelector.<lambda>>, '<': <function RowSelector.<lambda>>, '<=': <function RowSelector.<lambda>>, '==': <function RowSelector.<lambda>>, '>': <function RowSelector.<lambda>>, '>=': <function RowSelector.<lambda>>, 'contains': <function RowSelector.<lambda>>, 'endswith': <function RowSelector.<lambda>>, 'in': <function RowSelector.<lambda>>, 'isna': <function RowSelector.<lambda>>, 'not in': <function RowSelector.<lambda>>, 'notna': <function RowSelector.<lambda>>, 'regex': <function RowSelector.<lambda>>, 'startswith': <function RowSelector.<lambda>>}
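The rendered value of OPERATORS shows only the operator names, not what each lambda does. The sketch below is one plausible reading of those names, written against plain Python values so that it runs without pandas. The table `OPS` and the helper `matching_rows` are hypothetical; the library's own callables presumably work on pandas Series and may treat missing values or string matching differently.

```python
import operator
import re
from typing import Any, Callable, Dict, List, Mapping, Sequence


def _is_missing(cell: Any) -> bool:
    # None or NaN (NaN is the only value that is not equal to itself).
    return cell is None or cell != cell


# Assumed semantics for each operator name listed in OPERATORS.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda cell, values: cell in values,
    "not in": lambda cell, values: cell not in values,
    "contains": lambda cell, text: text in str(cell),
    "startswith": lambda cell, text: str(cell).startswith(text),
    "endswith": lambda cell, text: str(cell).endswith(text),
    "regex": lambda cell, pattern: re.search(pattern, str(cell)) is not None,
    "isna": lambda cell, _unused: _is_missing(cell),
    "notna": lambda cell, _unused: not _is_missing(cell),
}


def matching_rows(rows: Sequence[Mapping[str, Any]], where: Mapping[str, Any]) -> List[int]:
    """Return positions of rows that satisfy a {"column", "op", "value"} condition."""
    op = where["op"]
    if op not in OPS:
        raise ValueError(f"unknown operator: {op!r}")
    test = OPS[op]
    value = where.get("value")  # "isna" / "notna" need no value
    return [i for i, row in enumerate(rows) if test(row[where["column"]], value)]


rows = [
    {"quality": 0.9, "name": "a_1"},
    {"quality": 0.2, "name": "b_2"},
    {"quality": float("nan"), "name": "a_3"},
]
good = matching_rows(rows, {"column": "quality", "op": ">", "value": 0.5})
```

A lookup table like this keeps the condition syntax declarative: adding an operator means adding one entry rather than another branch.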
- select(df: DataFrame, selection: int | str | List[int] | Dict[str, Any] | slice | None) -> RowSelectionResult
Select rows from a DataFrame.
- Parameters:
df – The DataFrame to select rows from.
selection – Row selection specification. Can be:
- None: Select all rows
- int: Single row index
- str: Range string ("0:100") or percentage ("0:80%")
- List[int]: List of row indices
- Dict: Complex selection (see class docstring)
- Returns:
RowSelectionResult with indices, mask, and selected data.
- Raises:
RowSelectionError – If selection is invalid or rows not found.
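For the dict forms that sample rows, the following standard-library sketch illustrates what "sample", "random_state", and "stratify" could mean in terms of row positions. The function `sample_positions` is hypothetical and not the library's implementation. Proportional allocation per stratum is an assumption, and with uneven group sizes the rounding used here can return slightly more or fewer rows than requested.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def sample_positions(
    n_rows: int,
    n: int,
    random_state: Optional[int] = None,
    strata: Optional[Sequence[Hashable]] = None,
) -> List[int]:
    """Pick row positions at random, optionally preserving group proportions."""
    if n > n_rows:
        raise ValueError(f"cannot sample {n} rows from {n_rows}")
    # A dedicated generator makes the draw reproducible for a fixed seed.
    rng = random.Random(random_state)

    if strata is None:
        return sorted(rng.sample(range(n_rows), n))

    if len(strata) != n_rows:
        raise ValueError("strata must have one label per row")

    # Group row positions by their stratum label.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for position, label in enumerate(strata):
        groups[label].append(position)

    picked: List[int] = []
    for members in groups.values():
        # Assumption: each group contributes in proportion to its size.
        share = round(n * len(members) / n_rows)
        picked.extend(rng.sample(members, min(share, len(members))))
    return sorted(picked)


# 100 rows, 60 of class "a" and 40 of class "b"; ask for 10 rows.
labels = ["a"] * 60 + ["b"] * 40
plain = sample_positions(100, 10, random_state=42)
stratified = sample_positions(100, 10, random_state=42, strata=labels)
```

With these labels the stratified draw contains six rows of class "a" and four of class "b", whereas the plain draw gives no such guarantee.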