nirs4all.data.selection.column_selector module
Column selector for dataset configuration.
This module provides flexible column selection for DataFrames, supporting multiple selection syntaxes including indices, names, ranges, regex patterns, and exclusion.
Example
>>> selector = ColumnSelector()
>>> # By name
>>> cols = selector.select(df, ["col1", "col2"])
>>> # By index
>>> cols = selector.select(df, [0, 1, 2])
>>> # By range (slice syntax)
>>> cols = selector.select(df, "2:-1")
>>> # By regex pattern
>>> cols = selector.select(df, {"regex": "^feature_.*"})
>>> # By exclusion
>>> cols = selector.select(df, {"exclude": ["id", "date"]})
- exception nirs4all.data.selection.column_selector.ColumnSelectionError
Bases: Exception
Raised when column selection fails.
- class nirs4all.data.selection.column_selector.ColumnSelector(case_sensitive: bool = True)
Bases: object
Flexible column selector for DataFrames.
Supports multiple selection methods:
- By name: ["col1", "col2"] or "col_name"
- By index: [0, 1, 2] or 0
- By range: "2:-1" (slice syntax as string)
- By regex pattern: {"regex": "^feature_.*"}
- By exclusion: {"exclude": ["id", "date"]}
- Combined: {"include": [0, 1], "exclude": ["id"]}
Example
>>> selector = ColumnSelector()
>>> result = selector.select(df, "2:-1")
>>> print(result.names)  # Column names in range
>>> print(result.data)   # Selected columns as DataFrame
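To make the dict-based forms listed above concrete, here is a minimal standard-library sketch of how such a specification could be resolved against a list of column names. The helper `resolve_dict_selection` is hypothetical and is not part of nirs4all; in particular, applying "exclude" after "include" or "regex" is an assumption, since the order is not documented here.

```python
import re
from typing import Any, Dict, List


def resolve_dict_selection(spec: Dict[str, Any], columns: List[str]) -> List[int]:
    """Hypothetical helper: map a dict-style selection to column indices."""
    if "regex" in spec:
        # Keep every column whose name matches the pattern.
        pattern = re.compile(spec["regex"])
        chosen = [i for i, name in enumerate(columns) if pattern.search(name)]
    elif "include" in spec:
        # Entries may be names or integer positions.
        chosen = [columns.index(c) if isinstance(c, str) else c for c in spec["include"]]
    else:
        # No positive criterion: start from all columns.
        chosen = list(range(len(columns)))
    # Assumption: exclusions (given by name) are applied last.
    excluded = set(spec.get("exclude", []))
    return [i for i in chosen if columns[i] not in excluded]


columns = ["id", "date", "feature_a", "feature_b", "target"]
by_regex = resolve_dict_selection({"regex": "^feature_.*"}, columns)
by_exclusion = resolve_dict_selection({"exclude": ["id", "date"]}, columns)
combined = resolve_dict_selection({"include": [0, 1], "exclude": ["id"]}, columns)
```

With these sample column names, the regex form keeps the two `feature_` columns, the exclusion form keeps everything except `id` and `date`, and the combined form keeps only `date`. The sketch ignores the `case_sensitive` option and integer entries in "exclude".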
- parse_selection(selection: Any, available_columns: List[str]) → List[int]
Parse a selection specification and return column indices.
This is a convenience method for when you don’t have a DataFrame but want to validate and resolve a selection.
- Parameters:
selection – Column selection specification.
available_columns – List of available column names.
- Returns:
List of column indices.
- Raises:
ColumnSelectionError – If selection is invalid.
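As an illustration of resolving a selection to indices without a DataFrame, the following sketch turns a range string such as "2:-1" into column indices using only the list of available names. The helper `resolve_range` is hypothetical, not the library's implementation, and it raises `ValueError` where the real method is documented to raise ColumnSelectionError.

```python
from typing import List


def resolve_range(selection: str, columns: List[str]) -> List[int]:
    """Hypothetical helper: turn a slice string such as "2:-1" into indices."""
    parts = selection.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"not a slice string: {selection!r}")
    # Empty fields mean "no bound", as in ordinary Python slices.
    bounds = [int(p) if p.strip() else None for p in parts]
    # Slicing a range resolves negative bounds against the column count.
    return list(range(len(columns))[slice(*bounds)])


columns = ["id", "date", "feature_a", "feature_b", "target"]
indices = resolve_range("2:-1", columns)
names = [columns[i] for i in indices]
```

Here "2:-1" resolves to the positions of `feature_a` and `feature_b`: it starts at the third column and stops before the last one, as a Python slice would.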
- select(df: DataFrame, selection: int | str | List[int] | List[str] | Dict[str, Any] | slice | None) → SelectionResult
Select columns from a DataFrame.
- Parameters:
df – The DataFrame to select columns from.
selection – Column selection specification. Can be:
- None: Select all columns
- int: Single column index
- str: Single column name or range string ("2:-1")
- List[int]: List of column indices
- List[str]: List of column names
- Dict: Complex selection (see class docstring)
- Returns:
SelectionResult with indices, names, and selected data.
- Raises:
ColumnSelectionError – If selection is invalid or columns not found.