nirs4all.data.selection.column_selector module

Column selector for dataset configuration.

This module provides flexible column selection for DataFrames, supporting multiple selection syntaxes including indices, names, ranges, regex patterns, and exclusion.

Example

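>>> import pandas as pd
>>> from nirs4all.data.selection.column_selector import ColumnSelector
>>> # Illustrative frame (assumed); any DataFrame with these columns works
>>> df = pd.DataFrame(columns=["id", "date", "col1", "col2", "feature_a"])
>>> selector = ColumnSelector()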
>>> # By name
>>> cols = selector.select(df, ["col1", "col2"])
>>> # By index
>>> cols = selector.select(df, [0, 1, 2])
>>> # By range (slice syntax)
>>> cols = selector.select(df, "2:-1")
>>> # By regex pattern
>>> cols = selector.select(df, {"regex": "^feature_.*"})
>>> # By exclusion
>>> cols = selector.select(df, {"exclude": ["id", "date"]})
exception nirs4all.data.selection.column_selector.ColumnSelectionError

Bases: Exception

Raised when column selection fails.

class nirs4all.data.selection.column_selector.ColumnSelector(case_sensitive: bool = True)

Bases: object

Flexible column selector for DataFrames.

Supports multiple selection methods:

  • By name: ["col1", "col2"] or "col_name"
  • By index: [0, 1, 2] or 0
  • By range: "2:-1" (slice syntax as string)
  • By regex pattern: {"regex": "^feature_.*"}
  • By exclusion: {"exclude": ["id", "date"]}
  • Combined: {"include": [0, 1], "exclude": ["id"]}
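
To make the range form concrete, here is a minimal sketch of how a slice-style string such as "2:-1" could map onto column positions. The helper name resolve_range is hypothetical and is not part of nirs4all; the sketch assumes range strings follow Python's own slice semantics (exclusive stop, negative values counted from the end), which the description above suggests but does not spell out.

```python
from typing import List


def resolve_range(spec: str, available_columns: List[str]) -> List[int]:
    """Hypothetical helper: turn a slice-like string such as "2:-1" into indices."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"not a range string: {spec!r}")
    # An empty field means "no bound", as in Python's slice syntax.
    bounds = [int(p) if p.strip() else None for p in parts]
    # Slicing a range object yields the positions the slice would select.
    return list(range(len(available_columns))[slice(*bounds)])


columns = ["id", "date", "feature_a", "feature_b", "target"]
indices = resolve_range("2:-1", columns)
print(indices)                        # [2, 3]
print([columns[i] for i in indices])  # ['feature_a', 'feature_b']
```

Under these assumptions, "2:-1" skips the first two columns and the last one, a common way to drop identifier columns at the front and a target column at the end.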

Example

>>> selector = ColumnSelector()
>>> result = selector.select(df, "2:-1")
>>> print(result.names)  # Column names in range
>>> print(result.data)   # Selected columns as DataFrame
parse_selection(selection: Any, available_columns: List[str]) → List[int]

Parse a selection specification and return column indices.

This is a convenience method for when you don’t have a DataFrame but want to validate and resolve a selection.

Parameters:
  • selection – Column selection specification.

  • available_columns – List of available column names.

Returns:

List of column indices.

Raises:

ColumnSelectionError – If selection is invalid.
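
The idea of resolving a selection against a plain list of column names, with no DataFrame involved, can be sketched as follows. This is not the library's implementation: resolve_columns and SelectionError are hypothetical stand-ins so that the example runs without nirs4all, and wrapping negative indices from the end is an assumption.

```python
from typing import List, Sequence, Union


class SelectionError(Exception):
    """Stand-in for ColumnSelectionError so the sketch runs without nirs4all."""


def resolve_columns(
    selection: Sequence[Union[int, str]], available_columns: List[str]
) -> List[int]:
    """Hypothetical helper: map names and positions to 0-based column indices."""
    n = len(available_columns)
    indices: List[int] = []
    for item in selection:
        if isinstance(item, int):
            # Assumed behaviour: negative positions count from the end.
            if not -n <= item < n:
                raise SelectionError(f"index {item} out of range for {n} columns")
            indices.append(item % n)
        else:
            if item not in available_columns:
                raise SelectionError(f"column {item!r} not found")
            indices.append(available_columns.index(item))
    return indices


columns = ["id", "date", "feature_a", "feature_b", "target"]
print(resolve_columns(["date", "target"], columns))  # [1, 4]
print(resolve_columns([0, -1], columns))             # [0, 4]

try:
    resolve_columns(["missing"], columns)
except SelectionError as exc:
    print(f"invalid selection: {exc}")
```

Validating against names alone lets a configuration be checked before any data is loaded.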

select(df: DataFrame, selection: int | str | List[int] | List[str] | Dict[str, Any] | slice | None) → SelectionResult

Select columns from a DataFrame.

Parameters:
  • df – The DataFrame to select columns from.

  • selection – Column selection specification. Can be:

      - None: Select all columns
      - int: Single column index
      - str: Single column name or range string ("2:-1")
      - List[int]: List of column indices
      - List[str]: List of column names
      - Dict: Complex selection (see class docstring)

Returns:

SelectionResult with indices, names, and selected data.

Raises:

ColumnSelectionError – If selection is invalid or columns not found.
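
The dictionary forms are the least self-explanatory, so here is a rough sketch of how "regex" and "exclude" keys might be resolved against column names. The function resolve_dict is hypothetical. Applying exclusions after the regex match, and letting a case_sensitive flag control regex matching, are both assumptions: this page does not state how the real ColumnSelector orders these steps or what its case_sensitive argument governs.

```python
import re
from typing import Any, Dict, List


def resolve_dict(
    spec: Dict[str, Any], available_columns: List[str], case_sensitive: bool = True
) -> List[int]:
    """Hypothetical helper: resolve {"regex": ...} and {"exclude": [...]} specs."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if "regex" in spec:
        pattern = re.compile(spec["regex"], flags)
        selected = [
            i for i, name in enumerate(available_columns) if pattern.search(name)
        ]
    else:
        # Without a regex, start from every column.
        selected = list(range(len(available_columns)))
    # Assumed order: exclusions are applied last, by exact name.
    excluded = set(spec.get("exclude", []))
    return [i for i in selected if available_columns[i] not in excluded]


columns = ["id", "date", "feature_a", "feature_b", "target"]
print(resolve_dict({"regex": "^feature_.*"}, columns))       # [2, 3]
print(resolve_dict({"exclude": ["id", "date"]}, columns))    # [2, 3, 4]
print(resolve_dict({"regex": "^FEATURE_"}, columns, False))  # [2, 3]
```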

class nirs4all.data.selection.column_selector.SelectionResult(indices: List[int], names: List[str], data: DataFrame)

Bases: object

Result of a column selection operation.

indices

List of selected column indices (0-based).

Type:

List[int]

names

List of selected column names.

Type:

List[str]

data

The selected DataFrame subset.

Type:

pandas.core.frame.DataFrame
