nirs4all.data.selection.role_assigner module

Role assigner for dataset configuration.

This module assigns DataFrame columns to feature (X), target (Y), or metadata roles, and validates the assignment so that no column ends up in more than one role.

Example

>>> assigner = RoleAssigner()
>>> result = assigner.assign(df, {
...     "features": "2:-1",
...     "targets": -1,
...     "metadata": [0, 1]
... })
>>> print(result.features)  # Features DataFrame
>>> print(result.targets)   # Targets DataFrame
>>> print(result.metadata)  # Metadata DataFrame
class nirs4all.data.selection.role_assigner.RoleAssigner(case_sensitive: bool = True, allow_overlap: bool = False)[source]

Bases: object

Assign columns to data roles (features, targets, metadata).

Validates that:

  • No column is assigned to multiple roles.

  • At least the features role is assigned.

  • All indices are valid.

Supports the same column selection syntax as ColumnSelector.

Example

>>> assigner = RoleAssigner()
>>> result = assigner.assign(df, {
...     "features": "2:-1",       # All columns except first 2 and last
...     "targets": -1,            # Last column
...     "metadata": [0, 1]        # First 2 columns
... })
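
The example above can be traced by hand. The following is a minimal, pandas-free sketch of how the three selections might resolve to positional indices and how an overlap check could work. It assumes that a "start:stop" string follows Python slice semantics, as the comments in the example suggest. `resolve_selection`, `find_overlaps`, and the column names are hypothetical and are not part of nirs4all; the library's own resolution logic may differ.

```python
from typing import Dict, List, Sequence, Union

Selection = Union[int, str, List[int], List[str], None]


def resolve_selection(columns: Sequence[str], selection: Selection) -> List[int]:
    """Resolve one selection into positional indices (hypothetical helper)."""
    positions = list(range(len(columns)))
    if selection is None:
        return []
    if isinstance(selection, int):
        # Negative integers count from the end, as in Python indexing.
        return [positions[selection]]
    if isinstance(selection, str):
        if ":" in selection:
            # Assumption: a "start:stop" string behaves like a Python slice.
            start, stop = (
                int(part) if part.strip() else None
                for part in selection.split(":", 1)
            )
            return positions[start:stop]
        # Otherwise treat the string as a column name.
        return [list(columns).index(selection)]
    # A list of integers or names: resolve each item and merge the results.
    resolved: List[int] = []
    for item in selection:
        resolved.extend(resolve_selection(columns, item))
    return sorted(set(resolved))


def find_overlaps(resolved: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map each index claimed by more than one role to the roles claiming it."""
    owners: Dict[int, List[str]] = {}
    for role, indices in resolved.items():
        for idx in indices:
            owners.setdefault(idx, []).append(role)
    return {idx: roles for idx, roles in owners.items() if len(roles) > 1}


# Hypothetical column layout: two metadata columns, three spectra, one target.
columns = ["sample_id", "batch", "wl_1100", "wl_1102", "wl_1104", "protein"]
roles = {"features": "2:-1", "targets": -1, "metadata": [0, 1]}
resolved = {role: resolve_selection(columns, sel) for role, sel in roles.items()}
overlaps = find_overlaps(resolved)
```

With this layout the three roles partition the six columns, so `overlaps` is empty; a specification such as `{"features": "1:-1", "metadata": [0, 1]}` would report index 1 as claimed by both roles.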
assign(df: DataFrame, roles: Dict[str, int | str | List[int] | List[str] | Dict[str, Any] | slice | None]) RoleAssignmentResult[source]

Assign columns to roles.

Parameters:
  • df – The DataFrame to assign roles from.

  • roles – Dictionary mapping role names to column selections. Supported roles: “features”, “targets”, and “metadata”. Also accepts “x” (alias for features) and “y” (alias for targets).

Returns:

RoleAssignmentResult with separated DataFrames.

Raises:

RoleAssignmentError – If the assignment is invalid (for example, overlapping roles or missing features).
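
The alias handling described for the roles parameter can be pictured as a small normalization step. The sketch below is illustrative only: `normalize_roles` is a hypothetical helper, and raising `ValueError` for unknown or duplicated role names is an assumption, since this page does not say how the library treats those cases.

```python
from typing import Any, Dict

# Alias table taken from the parameter description above.
ROLE_ALIASES = {"x": "features", "y": "targets"}
CANONICAL_ROLES = ("features", "targets", "metadata")


def normalize_roles(roles: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite alias keys to canonical role names (hypothetical helper)."""
    normalized: Dict[str, Any] = {}
    for name, selection in roles.items():
        canonical = ROLE_ALIASES.get(name, name)
        if canonical not in CANONICAL_ROLES:
            # Assumption: unknown role names are rejected.
            raise ValueError(f"Unknown role: {name!r}")
        if canonical in normalized:
            # Assumption: giving both "x" and "features" is treated as an error.
            raise ValueError(f"Role {canonical!r} specified more than once")
        normalized[canonical] = selection
    return normalized


normalized = normalize_roles({"x": "2:-1", "y": -1, "metadata": [0, 1]})
```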

assign_auto(df: DataFrame, target_columns: int | str | List[int] | List[str] | Dict[str, Any] | slice | None = None, metadata_columns: int | str | List[int] | List[str] | Dict[str, Any] | slice | None = None) RoleAssignmentResult[source]

Auto-assign roles with specified targets and metadata.

Features are automatically set to all remaining columns.

Parameters:
  • df – The DataFrame to assign roles from.

  • target_columns – Column selection for targets (Y).

  • metadata_columns – Column selection for metadata.

Returns:

RoleAssignmentResult with separated DataFrames.
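
The "all remaining columns" rule amounts to a set difference over column positions. A minimal sketch, assuming the target and metadata selections have already been resolved to non-negative indices; `remaining_feature_indices` is a hypothetical name, not a nirs4all function.

```python
from typing import Iterable, List


def remaining_feature_indices(
    n_columns: int,
    target_indices: Iterable[int],
    metadata_indices: Iterable[int],
) -> List[int]:
    """Return the positions not claimed by targets or metadata, in column order."""
    taken = set(target_indices) | set(metadata_indices)
    return [idx for idx in range(n_columns) if idx not in taken]


# Six columns: the last is the target, the first two are metadata.
feature_idx = remaining_feature_indices(6, target_indices=[5], metadata_indices=[0, 1])
```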

extract_y_from_x(df: DataFrame, y_columns: int | str | List[int] | List[str] | Dict[str, Any] | slice | None) RoleAssignmentResult[source]

Extract target columns from a features DataFrame.

This is useful when Y columns are embedded in the X data.

Parameters:
  • df – DataFrame containing both features and targets.

  • y_columns – Column selection for targets to extract.

Returns:

RoleAssignmentResult with features (remaining) and targets (extracted).
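
As an illustration of the split itself, the sketch below separates target columns from a small column-oriented table (a plain dict of lists standing in for a DataFrame). `split_targets`, the table contents, and the choice of `KeyError` for missing columns are assumptions made for this example and do not describe the library's behavior.

```python
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, List[float]]


def split_targets(table: Table, y_columns: Sequence[str]) -> Tuple[Table, Table]:
    """Split a column-oriented table into (features, targets) by column name."""
    missing = [name for name in y_columns if name not in table]
    if missing:
        raise KeyError(f"Target columns not found: {missing}")
    wanted = set(y_columns)
    # Features keep the original column order, minus the extracted targets.
    features = {name: values for name, values in table.items() if name not in wanted}
    # Targets follow the order given in y_columns.
    targets = {name: table[name] for name in y_columns}
    return features, targets


# Hypothetical data: two spectral columns and one embedded target column.
table = {"wl_1100": [0.12, 0.15], "wl_1102": [0.13, 0.16], "protein": [11.2, 12.9]}
features, targets = split_targets(table, ["protein"])
```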

validate_roles(df: DataFrame, roles: Dict[str, int | str | List[int] | List[str] | Dict[str, Any] | slice | None]) List[str][source]

Validate a role specification without performing assignment.

Parameters:
  • df – The DataFrame to validate against.

  • roles – Role specification to validate.

Returns:

List of warning messages (empty if no warnings).

Raises:

RoleAssignmentError – If role specification is invalid.
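
The distinction between returned warnings and raised errors can be sketched as follows. The two error conditions (no features, invalid indices) mirror the checks listed for RoleAssigner; the "unassigned columns" warning is purely an assumption, because this page does not list which warnings are produced. `SketchRoleError` and `validate_resolved` are hypothetical stand-ins operating on already-resolved indices.

```python
from typing import Dict, List


class SketchRoleError(Exception):
    """Stand-in for RoleAssignmentError (hypothetical, for this sketch only)."""


def validate_resolved(n_columns: int, resolved: Dict[str, List[int]]) -> List[str]:
    """Raise on hard errors; return a list of non-fatal warnings otherwise."""
    if not resolved.get("features"):
        raise SketchRoleError("No feature columns assigned")
    for role, indices in resolved.items():
        bad = [idx for idx in indices if not 0 <= idx < n_columns]
        if bad:
            raise SketchRoleError(f"Role {role!r} has out-of-range indices: {bad}")
    warnings: List[str] = []
    assigned = {idx for indices in resolved.values() for idx in indices}
    unused = [idx for idx in range(n_columns) if idx not in assigned]
    if unused:
        # Assumption: leaving columns unassigned is worth a warning, not an error.
        warnings.append(f"Columns not assigned to any role: {unused}")
    return warnings


messages = validate_resolved(4, {"features": [1, 2], "targets": [3]})
```

A caller would typically run such a check before the real assignment, log the returned messages, and let the exception propagate if the specification is unusable.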

exception nirs4all.data.selection.role_assigner.RoleAssignmentError[source]

Bases: Exception

Raised when role assignment fails.

class nirs4all.data.selection.role_assigner.RoleAssignmentResult(features: DataFrame | None, targets: DataFrame | None, metadata: DataFrame | None, feature_indices: List[int], target_indices: List[int], metadata_indices: List[int])[source]

Bases: object

Result of role assignment.

features

DataFrame containing feature columns (X).

Type:

pandas.core.frame.DataFrame | None

targets

DataFrame containing target columns (Y).

Type:

pandas.core.frame.DataFrame | None

metadata

DataFrame containing metadata columns.

Type:

pandas.core.frame.DataFrame | None

feature_indices

Indices of the feature columns in the original DataFrame.

Type:

List[int]

target_indices

Indices of the target columns in the original DataFrame.

Type:

List[int]

metadata_indices

Indices of the metadata columns in the original DataFrame.

Type:

List[int]

property X: DataFrame | None

Alias for features.

property y: DataFrame | None

Alias for targets.
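
The shape of this result object, six fields plus two read-only aliases, can be mirrored with a small dataclass. This is a sketch under assumptions: `ResultSketch` is a hypothetical class, it is generic so that it runs without pandas, and whether the real RoleAssignmentResult is implemented as a dataclass is not stated on this page.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")  # Stands in for pandas.DataFrame in this sketch.


@dataclass
class ResultSketch(Generic[T]):
    """Hypothetical mirror of the documented fields and aliases."""

    features: Optional[T]
    targets: Optional[T]
    metadata: Optional[T]
    feature_indices: List[int] = field(default_factory=list)
    target_indices: List[int] = field(default_factory=list)
    metadata_indices: List[int] = field(default_factory=list)

    @property
    def X(self) -> Optional[T]:
        # Alias for features.
        return self.features

    @property
    def y(self) -> Optional[T]:
        # Alias for targets.
        return self.targets


result = ResultSketch(
    features=[[0.12, 0.13]],
    targets=[[11.2]],
    metadata=None,
    feature_indices=[2, 3],
    target_indices=[5],
)
```

Because the aliases are properties, `result.X` and `result.features` refer to the same object, so code written in the scikit-learn `X`/`y` convention can use the result without copying.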