nirs4all.data.selection.row_selector module

Row selector for dataset configuration.

This module provides flexible row selection for DataFrames, supporting multiple selection syntaxes including indices, ranges, percentages, conditions, and random sampling.

Example

>>> selector = RowSelector()
>>> # By index
>>> result = selector.select(df, [0, 1, 2])
>>> # By range
>>> result = selector.select(df, "0:100")
>>> # By percentage
>>> result = selector.select(df, "0:80%")
>>> # By condition
>>> result = selector.select(df, {"where": {"column": "quality", "op": ">", "value": 0.5}})
>>> # Random sample
>>> result = selector.select(df, {"sample": 100, "random_state": 42})
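This page does not spell out how range and percentage strings map to rows. The following is a minimal sketch, assuming Python slice semantics (the stop bound is exclusive) and that percentages are rounded down to whole rows. `resolve_range` is a hypothetical helper, not part of nirs4all. It returns positional indices for a table with `n_rows` rows.

```python
from typing import List


def resolve_range(spec: str, n_rows: int) -> List[int]:
    """Resolve "start:stop" into positional row indices (hypothetical helper).

    Each bound is either a row number or a percentage such as "80%".
    Bounds follow Python slice rules: stop is exclusive, out-of-range
    bounds are clamped, and an empty bound means "from the start" or
    "to the end". Percentages are rounded down to whole rows.
    """
    start_text, sep, stop_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'start:stop', got {spec!r}")

    def to_position(token: str, default: int) -> int:
        token = token.strip()
        if not token:
            return default
        if token.endswith("%"):
            # Multiply before dividing so that, e.g., 80% of 250 stays exact
            return int(float(token[:-1]) * n_rows / 100)
        return int(token)

    start = to_position(start_text, 0)
    stop = to_position(stop_text, n_rows)
    return list(range(n_rows))[start:stop]
```

Under these assumptions, `"0:80%"` on a 250-row table yields positions 0 to 199, and `"80%:100%"` yields the remaining 50. The two strings therefore split the rows without overlap.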
exception nirs4all.data.selection.row_selector.RowSelectionError

Bases: Exception

Raised when row selection fails.

class nirs4all.data.selection.row_selector.RowSelectionResult(indices: List[int], mask: Series, data: DataFrame)

Bases: object

Result of a row selection operation.

indices

List of selected row indices, taken from the original DataFrame's index.

Type:

List[int]

mask

Boolean mask for the selection.

Type:

pandas.core.series.Series

data

The selected DataFrame subset.

Type:

pandas.core.frame.DataFrame

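The `indices` and `mask` attributes describe the same selection in two forms. The following minimal sketch shows how they relate. It assumes that `mask` is aligned position by position with the original index and that `indices` holds index labels. `indices_from_mask` is hypothetical and uses plain lists instead of pandas objects. With pandas, the equivalent would be `df.index[mask.to_numpy()].tolist()`.

```python
from typing import Hashable, List, Sequence


def indices_from_mask(index: Sequence[Hashable], mask: Sequence[bool]) -> List[Hashable]:
    """Return the index labels whose mask entry is True (hypothetical helper).

    Assumes mask[i] refers to the row labelled index[i].
    """
    if len(index) != len(mask):
        raise ValueError("mask must have exactly one entry per row")
    return [label for label, keep in zip(index, mask) if keep]
```

For an index `[10, 11, 12, 13]` and a mask `[True, False, True, False]`, the sketch returns `[10, 12]`. These are index labels, not the positions 0 and 2.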
class nirs4all.data.selection.row_selector.RowSelector(default_random_state: int | None = None)

Bases: object

Flexible row selector for DataFrames.

Supports multiple selection methods:
  • All rows: None
  • By index: [0, 1, 2] or 0
  • By range: "0:100" (slice syntax as a string)
  • By percentage: "0:80%" or "80%:100%"
  • By condition: {"where": {"column": "quality", "op": ">", "value": 0.5}}
  • Random sample: {"sample": 100, "random_state": 42}
  • Stratified sample: {"sample": 100, "stratify": "class", "random_state": 42}
  • Head/Tail: {"head": 100} or {"tail": 50}
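The stratified form `{"sample": 100, "stratify": "class"}` does not say how the 100 rows are divided among classes. The following is a minimal sketch, assuming proportional allocation with largest-remainder rounding so that the per-class counts sum exactly to the requested size. This allocation rule is an assumption, not confirmed by this page. `stratified_sample` is hypothetical and works on a plain list of class labels rather than a DataFrame column.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def stratified_sample(labels: Sequence[Hashable], n: int,
                      random_state: Optional[int] = None) -> List[int]:
    """Draw n row positions while preserving class proportions (hypothetical helper)."""
    if not 0 <= n <= len(labels):
        raise ValueError("sample size must be between 0 and the number of rows")
    rng = random.Random(random_state)

    # Group row positions by their class label
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for position, label in enumerate(labels):
        groups[label].append(position)

    # Proportional quota per class, rounded down with integer arithmetic
    total = len(labels)
    quotas = {k: n * len(rows) // total for k, rows in groups.items()}
    remainders = {k: n * len(rows) % total for k, rows in groups.items()}

    # Largest-remainder rounding: leftover slots go to the classes that lost
    # the most to rounding down, so the quotas sum to exactly n
    leftover = n - sum(quotas.values())
    for label in sorted(groups, key=lambda c: remainders[c], reverse=True)[:leftover]:
        quotas[label] += 1

    # Sample without replacement inside each class, then restore row order
    picked: List[int] = []
    for label, rows in groups.items():
        picked.extend(rng.sample(rows, quotas[label]))
    return sorted(picked)
```

With 60, 30 and 10 rows in three classes, a sample of 20 gets 12, 6 and 2 rows per class. Passing the same `random_state` reproduces the same positions.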

Example

>>> selector = RowSelector()
>>> result = selector.select(df, "0:80%")
>>> print(len(result.data))  # 80% of rows
OPERATORS: Dict[str, Callable[[Any, Any], bool]]

Mapping from the operator names accepted in the "op" field of a "where" condition to their comparison functions. Supported operators: ==, !=, <, <=, >, >=, in, not in, contains, startswith, endswith, regex, isna, notna.
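The following minimal sketch shows how an operator table of this shape can drive a `where` condition. It assumes that each function takes `(cell value, condition value)` and that `isna`/`notna` ignore the second argument. The real table may work on pandas Series rather than on single values. `OPERATORS` here is a hypothetical stand-in, and `select_where` is hypothetical and evaluates a list of dict rows element by element. Comparison operators are not guarded against missing values in this sketch.

```python
import re
from typing import Any, Callable, Dict, List, Mapping


def _isna(x: Any) -> bool:
    # None, or float NaN (the only value that is not equal to itself)
    return x is None or (isinstance(x, float) and x != x)


# Hypothetical stand-in: (cell value, condition value) -> bool
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
    "contains": lambda a, b: b in str(a),
    "startswith": lambda a, b: str(a).startswith(b),
    "endswith": lambda a, b: str(a).endswith(b),
    "regex": lambda a, b: re.search(b, str(a)) is not None,
    "isna": lambda a, _b: _isna(a),
    "notna": lambda a, _b: not _isna(a),
}


def select_where(rows: List[Mapping[str, Any]], condition: Mapping[str, Any]) -> List[int]:
    """Return positions of rows matching {"column": ..., "op": ..., "value": ...}."""
    op = condition["op"]
    if op not in OPERATORS:
        raise ValueError(f"unknown operator {op!r}")
    test = OPERATORS[op]
    value = condition.get("value")  # ignored by isna / notna
    column = condition["column"]
    return [i for i, row in enumerate(rows) if test(row[column], value)]
```

A dictionary lookup keeps the evaluation loop independent of the set of operators: supporting a new operator only needs a new entry.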
select(df: DataFrame, selection: int | str | List[int] | Dict[str, Any] | slice | None) -> RowSelectionResult

Select rows from a DataFrame.

Parameters:
  • df – The DataFrame to select rows from.

  • selection – Row selection specification. Can be:
      - None: select all rows
      - int: a single row index
      - str: a range string ("0:100") or a percentage range ("0:80%")
      - List[int]: a list of row indices
      - slice: a Python slice object
      - Dict: a complex selection (see the class docstring)

Returns:

RowSelectionResult with indices, mask, and selected data.

Raises:

RowSelectionError – If the selection is invalid or the requested rows are not found.
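The following minimal sketch shows how `select` might dispatch on the type of `selection`, returning positional indices for a table of `n_rows` rows. Everything here is an assumption:

  • `resolve_positions` and `SelectionError` are hypothetical; the latter stands in for `RowSelectionError`.
  • String ranges and `where`/`stratify` dictionaries are left out.
  • Integer indices must satisfy `0 <= i < n_rows`.
  • `default_random_state` is taken to be the fallback seed when a sample spec omits `random_state`.

```python
import random
from typing import Any, Dict, List, Optional, Union


class SelectionError(Exception):
    """Hypothetical stand-in for RowSelectionError."""


Selection = Union[None, int, List[int], slice, Dict[str, Any]]


def resolve_positions(selection: Selection, n_rows: int,
                      default_random_state: Optional[int] = None) -> List[int]:
    """Map a selection spec to positional row indices (strings not handled)."""
    all_rows = list(range(n_rows))
    if selection is None:
        return all_rows  # no selection means every row
    if isinstance(selection, bool):
        # bool is a subclass of int, so reject it explicitly
        raise SelectionError("booleans are not valid row selections")
    if isinstance(selection, int):
        selection = [selection]  # a single index behaves like a one-item list
    if isinstance(selection, list):
        bad = [i for i in selection if not 0 <= i < n_rows]
        if bad:
            raise SelectionError(f"row indices out of range: {bad}")
        return list(selection)
    if isinstance(selection, slice):
        return all_rows[selection]
    if isinstance(selection, dict):
        if "head" in selection:
            return all_rows[: selection["head"]]
        if "tail" in selection:
            return all_rows[max(n_rows - selection["tail"], 0):]
        if "sample" in selection:
            k = selection["sample"]
            if not 0 <= k <= n_rows:
                raise SelectionError(f"cannot sample {k} rows from {n_rows}")
            # Fall back to the selector-wide seed when none is given
            seed = selection.get("random_state", default_random_state)
            return sorted(random.Random(seed).sample(all_rows, k))
    raise SelectionError(f"unsupported selection: {selection!r}")
```

Raising one dedicated exception type for every invalid spec lets callers handle all selection failures with a single `except` clause.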