nirs4all.data.aggregation package

Submodules

nirs4all.data.aggregation.aggregator module

Module contents

Aggregation module for dataset configuration.

This module provides aggregation functionality for sample data, including custom aggregation functions and column-based aggregation.

class nirs4all.data.aggregation.AggregationConfig(column: str | bool | None = None, method: AggregationMethod | str = AggregationMethod.MEAN, exclude_outliers: bool = False, outlier_threshold: float = 3.0, min_samples: int = 1, custom_function: Callable | None = None, feature_method: AggregationMethod | str | None = None, target_method: AggregationMethod | str | None = None)

Bases: object

Configuration for sample aggregation.

column

Column name to group by. If True, group by the target (y) values. If None, aggregation is disabled.

Type:: str | bool | None

method

Aggregation method or custom function.

Type:: nirs4all.data.aggregation.aggregator.AggregationMethod | str

exclude_outliers

Whether to exclude outliers before aggregation.

Type:: bool

outlier_threshold

Z-score threshold for outlier detection.

Type:: float
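outlier_threshold is a z-score cutoff. The sketch below shows one way such a filter could work on the replicate spectra of a single group. Scoring each row by its mean absolute z-score across features is an assumption of this sketch, and `drop_zscore_outliers` is a hypothetical helper, not part of nirs4all.

```python
import numpy as np


def drop_zscore_outliers(group: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return the rows of `group` whose mean |z-score| is within `threshold`.

    `group` holds the replicate spectra of one sample, shape
    (n_replicates, n_features). Scoring a row by its mean |z| over
    features is an assumption made for this sketch.
    """
    mean = group.mean(axis=0)
    std = group.std(axis=0)  # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)  # constant features: avoid dividing by 0
    scores = np.abs((group - mean) / std).mean(axis=1)
    return group[scores <= threshold]


replicates = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [10.0, 10.0]])
kept = drop_zscore_outliers(replicates, threshold=1.5)
print(kept.shape)  # (3, 2)
```

With the population standard deviation, no row of an n-row group can exceed |z| = sqrt(n - 1), so with only a few replicates a cutoff of 3.0 never triggers. Here the outlying fourth replicate reaches exactly sqrt(3) ≈ 1.73, which is why the example uses 1.5.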

min_samples

Minimum number of samples per group (groups with fewer are dropped).

Type:: int
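A minimal sketch of the min_samples rule, assuming groups are identified by a 1-D array of labels. `groups_with_min_samples` is a hypothetical helper used only for illustration.

```python
import numpy as np


def groups_with_min_samples(groups: np.ndarray, min_samples: int = 1) -> np.ndarray:
    """Return the (sorted) group labels that have at least `min_samples` rows."""
    labels, counts = np.unique(groups, return_counts=True)
    return labels[counts >= min_samples]


sample_ids = np.array(["a", "a", "a", "b", "c", "c"])
print(groups_with_min_samples(sample_ids, min_samples=2))  # ['a' 'c']
```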

custom_function

Optional custom aggregation function.

Type:: Callable | None

feature_method

Aggregation method for features (X), if different from targets.

Type:: nirs4all.data.aggregation.aggregator.AggregationMethod | str | None

target_method

Aggregation method for targets (Y), if different from features.

Type:: nirs4all.data.aggregation.aggregator.AggregationMethod | str | None
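When feature_method and target_method differ, the features and targets of one group are reduced independently. The sketch below assumes a mean for X and a majority vote for class labels in y (matching the MEAN and VOTE members of AggregationMethod). `aggregate_group` and its string-keyed dispatch are illustrative only, not nirs4all's code.

```python
from collections import Counter

import numpy as np


def aggregate_group(X_group, y_group, feature_method="mean", target_method="vote"):
    """Aggregate one group's rows, using separate methods for X and y."""
    reducers = {
        "mean": lambda a: np.asarray(a).mean(axis=0),
        "median": lambda a: np.median(np.asarray(a), axis=0),
    }
    x_agg = reducers[feature_method](X_group)
    if target_method == "vote":
        # Majority vote over class labels; ties go to the label seen first.
        y_agg = Counter(np.asarray(y_group).tolist()).most_common(1)[0][0]
    else:
        y_agg = reducers[target_method](y_group)
    return x_agg, y_agg


X_group = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_group = np.array(["A", "B", "A"])
x_agg, y_agg = aggregate_group(X_group, y_group)
print(x_agg, y_agg)  # [3. 4.] A
```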

__post_init__(): Normalize method values to enum.

column: str | bool | None = None

custom_function: Callable | None = None

exclude_outliers: bool = False

feature_method: AggregationMethod | str | None = None

classmethod from_config(config: Dict[str, Any]) → AggregationConfig

Create an AggregationConfig from a configuration dictionary.

Parameters:: config – Configuration dictionary with aggregation settings.
Returns:: AggregationConfig instance.
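The following sketch illustrates, under assumptions, how a dataclass like AggregationConfig can combine __post_init__ normalization with a from_config constructor. `ConfigSketch` and `Method` are hypothetical stand-ins carrying only a few of the documented fields. That dictionary keys match field names, that unknown keys are ignored, and that is_enabled() treats None and False as "disabled" are all assumptions, not documented behavior.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Union


class Method(str, Enum):
    MEAN = "mean"
    MEDIAN = "median"


@dataclass
class ConfigSketch:
    column: Union[str, bool, None] = None
    method: Union[Method, str] = Method.MEAN
    min_samples: int = 1

    def __post_init__(self) -> None:
        # Method("median") -> Method.MEDIAN; an existing member passes through.
        self.method = Method(self.method)

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ConfigSketch":
        # Assumption: keys mirror the field names; other keys are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in known})

    def is_enabled(self) -> bool:
        # Assumption: column=None (or False) means "no aggregation".
        return self.column is not None and self.column is not False


cfg = ConfigSketch.from_config({"column": "sample_id", "method": "median", "note": "ignored"})
print(cfg.method is Method.MEDIAN, cfg.is_enabled())  # True True
```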

is_enabled() → bool: Check if aggregation is enabled.

method: AggregationMethod | str = 'mean'

min_samples: int = 1

outlier_threshold: float = 3.0

target_method: AggregationMethod | str | None = None

class nirs4all.data.aggregation.AggregationMethod(value)

Bases: str, Enum

Aggregation method for combining samples.

COUNT = 'count'

FIRST = 'first'

LAST = 'last'

MAX = 'max'

MEAN = 'mean'

MEDIAN = 'median'

MIN = 'min'

STD = 'std'

SUM = 'sum'

VOTE = 'vote'
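Because AggregationMethod mixes in str, its members compare equal to their string values, and calling the enum on a string returns the matching member. The sketch below re-declares the enum as `AggregationMethodSketch` and pairs each numeric method with a hypothetical column-wise NumPy reducer. VOTE is left out because it applies to categorical targets. The mapping is illustrative, not nirs4all's dispatch table.

```python
from enum import Enum

import numpy as np


class AggregationMethodSketch(str, Enum):
    # Same member names and values as the documented enum.
    COUNT = "count"
    FIRST = "first"
    LAST = "last"
    MAX = "max"
    MEAN = "mean"
    MEDIAN = "median"
    MIN = "min"
    STD = "std"
    SUM = "sum"
    VOTE = "vote"


# Hypothetical mapping from method to a reducer over axis 0 (the group's rows).
REDUCERS = {
    AggregationMethodSketch.COUNT: lambda a: np.full(a.shape[1:], a.shape[0]),
    AggregationMethodSketch.FIRST: lambda a: a[0],
    AggregationMethodSketch.LAST: lambda a: a[-1],
    AggregationMethodSketch.MAX: lambda a: a.max(axis=0),
    AggregationMethodSketch.MEAN: lambda a: a.mean(axis=0),
    AggregationMethodSketch.MEDIAN: lambda a: np.median(a, axis=0),
    AggregationMethodSketch.MIN: lambda a: a.min(axis=0),
    AggregationMethodSketch.STD: lambda a: a.std(axis=0),
    AggregationMethodSketch.SUM: lambda a: a.sum(axis=0),
}

group = np.array([[1.0, 2.0], [3.0, 6.0]])
method = AggregationMethodSketch("mean")  # normalize the string to a member
print(REDUCERS[method](group))  # [2. 4.]
```

Normalizing with AggregationMethodSketch(...) before the lookup means either spelling, "mean" or AggregationMethodSketch.MEAN, selects the same reducer.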

class nirs4all.data.aggregation.Aggregator(config: AggregationConfig)

Bases: object

Aggregates sample data during loading.

Supports grouping by metadata columns, target values, or sample IDs, with configurable aggregation methods for features and targets.

Example

```python
import numpy as np
from nirs4all.data.aggregation import AggregationConfig, Aggregator

# X (n_samples, n_features), y and metadata (DataFrame with "sample_id") are assumed loaded.

# Aggregate by sample_id column using mean
config = AggregationConfig(column="sample_id", method="mean")
aggregator = Aggregator(config)
X_agg, y_agg, meta_agg = aggregator.aggregate(X, y, metadata)

# Aggregate with outlier exclusion
config = AggregationConfig(
    column="sample_id", method="mean", exclude_outliers=True, outlier_threshold=2.5
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)

# Custom aggregation function
config = AggregationConfig(
    column="sample_id", custom_function=lambda x: np.percentile(x, 75, axis=0)
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)
```

aggregate(X: ndarray, y: ndarray | None = None, metadata: DataFrame | None = None, group_column: str | None = None) → tuple

Aggregate data by groups.

Parameters:

X – Feature array of shape (n_samples, n_features).
y – Optional target array of shape (n_samples,) or (n_samples, n_targets).
metadata – Optional metadata DataFrame.
group_column – Override column to group by.

Returns:

Tuple of (X_aggregated, y_aggregated, metadata_aggregated). y_aggregated and metadata_aggregated are None when y or metadata, respectively, was not provided.

Raises:

AggregationError – If aggregation fails.
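A simplified, hypothetical re-implementation of the grouping that aggregate performs, using the mean for both X and y. It groups either by a metadata column or, with column=True, by the target values, and it returns None in place of a missing y or metadata. Outlier exclusion, min_samples and custom functions are omitted, keeping the first metadata row of each group is an assumption, and `aggregate_sketch` is not part of nirs4all.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(X, y=None, metadata=None, column=None):
    """Mean-aggregate rows that share a group key.

    column=True groups by the target values; a string groups by that
    metadata column. Returns (X_agg, y_agg, meta_agg).
    """
    if column is True:
        keys = np.asarray(y).ravel()  # assumes a 1-D target when grouping by y
    else:
        keys = metadata[column].to_numpy()
    labels = pd.unique(keys)  # group order follows first appearance
    masks = [keys == label for label in labels]
    X_agg = np.vstack([X[m].mean(axis=0) for m in masks])
    y_agg = None if y is None else np.array([np.asarray(y)[m].mean(axis=0) for m in masks])
    if metadata is None:
        meta_agg = None
    else:
        first_rows = [int(np.flatnonzero(m)[0]) for m in masks]
        meta_agg = metadata.iloc[first_rows].reset_index(drop=True)
    return X_agg, y_agg, meta_agg


X = np.array([[1.0, 1.0], [3.0, 3.0], [10.0, 20.0]])
y = np.array([1.0, 3.0, 5.0])
metadata = pd.DataFrame({"sample_id": ["s1", "s1", "s2"], "rep": [1, 2, 1]})
X_agg, y_agg, meta_agg = aggregate_sketch(X, y, metadata, column="sample_id")
print(X_agg)  # [[ 2.  2.]
              #  [10. 20.]]
```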

register_function(name: str, func: Callable) → None

Register a custom aggregation function.

Parameters:

name – Name to reference the function.
func – Aggregation function that takes an array and returns the aggregated value.
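A minimal sketch of a name-to-function registry in the spirit of register_function, assuming registered functions reduce a group's (n_rows, n_features) array along axis 0. `RegistrySketch` and its `apply` method are hypothetical, and the callable check is this sketch's own choice.

```python
from typing import Callable, Dict

import numpy as np

Reducer = Callable[[np.ndarray], np.ndarray]


class RegistrySketch:
    def __init__(self) -> None:
        self._functions: Dict[str, Reducer] = {}

    def register_function(self, name: str, func: Reducer) -> None:
        # Reject non-callables early instead of failing at aggregation time.
        if not callable(func):
            raise TypeError(f"{name!r} is not callable")
        self._functions[name] = func

    def apply(self, name: str, group: np.ndarray) -> np.ndarray:
        # Look up the function by name and reduce the group's rows.
        return self._functions[name](group)


registry = RegistrySketch()
registry.register_function("p75", lambda a: np.percentile(a, 75, axis=0))
print(registry.apply("p75", np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])))  # [3.]
```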

Convenience function to aggregate data.

Parameters:

X – Feature array.
y – Optional target array.
metadata – Optional metadata DataFrame.
column – Column to group by (str), or True for y-based grouping.
method – Aggregation method.
exclude_outliers – Whether to exclude outliers.
**kwargs – Additional aggregation config options.

Returns:

Tuple of (X_aggregated, y_aggregated, metadata_aggregated).
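The convenience function's explicit parameters correspond to AggregationConfig fields, and **kwargs carries the remaining options. The sketch assumes those extra options are forwarded to a dataclass constructor, which as a side effect rejects misspelled option names with a TypeError. `SettingsSketch` and `make_settings` are hypothetical names, not nirs4all's.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class SettingsSketch:
    # A subset of the AggregationConfig fields, with the documented defaults.
    column: Union[str, bool, None] = None
    method: str = "mean"
    exclude_outliers: bool = False
    outlier_threshold: float = 3.0
    min_samples: int = 1


def make_settings(column: Union[str, bool, None] = None, method: str = "mean",
                  exclude_outliers: bool = False, **kwargs: Any) -> SettingsSketch:
    # Extra options pass straight through; an unknown key raises TypeError.
    return SettingsSketch(column=column, method=method,
                          exclude_outliers=exclude_outliers, **kwargs)


settings = make_settings("sample_id", exclude_outliers=True, outlier_threshold=2.5)
print(settings.outlier_threshold)  # 2.5
```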