nirs4all.data.aggregation package

Submodules

Module contents

Aggregation module for dataset configuration.

This module provides aggregation functionality for sample data, including custom aggregation functions and column-based aggregation.

class nirs4all.data.aggregation.AggregationConfig(column: str | bool | None = None, method: AggregationMethod | str = AggregationMethod.MEAN, exclude_outliers: bool = False, outlier_threshold: float = 3.0, min_samples: int = 1, custom_function: Callable | None = None, feature_method: AggregationMethod | str | None = None, target_method: AggregationMethod | str | None = None)

Bases: object

Configuration for sample aggregation.

column

Column name to group by for aggregation. If True, samples are grouped by their y values. If None, no aggregation is performed.

Type:

str | bool | None

method

Aggregation method or custom function.

Type:

nirs4all.data.aggregation.aggregator.AggregationMethod | str

exclude_outliers

Whether to exclude outliers before aggregation.

Type:

bool

outlier_threshold

Z-score threshold for outlier detection.

Type:

float
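
As a rough illustration of how a Z-score threshold can screen outliers before aggregation, the sketch below keeps only the values whose absolute Z-score is at or below the threshold. The helper name `zscore_keep_mask` is hypothetical, and the source does not state how nirs4all computes the score (for example per feature or per sample), so this is a sketch under that assumption rather than the library's implementation.

```python
import numpy as np


def zscore_keep_mask(values, threshold=3.0):
    """Return a boolean mask that is True for values within the Z-score threshold.

    Hypothetical helper for illustration; not part of nirs4all.
    """
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be flagged as an outlier.
        return np.ones(values.shape, dtype=bool)
    z_scores = np.abs(values - values.mean()) / std
    return z_scores <= threshold


replicates = np.array([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0])
mask = zscore_keep_mask(replicates, threshold=2.5)
kept = replicates[mask]  # the replicate at 25.0 is filtered out
```

In this sketch the score uses the population standard deviation, for which a single value among n observations can reach a Z-score of at most sqrt(n - 1). A threshold of 3.0 therefore cannot flag anything in a group of 10 or fewer values, which is why the example uses 2.5.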

min_samples

Minimum number of samples per group (groups with fewer are dropped).

Type:

int
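
One way to picture the effect of `min_samples` is as a row filter applied before aggregation: rows belonging to a group smaller than the minimum are removed. The helper `min_samples_mask` below is a hypothetical illustration, not the nirs4all implementation.

```python
import numpy as np


def min_samples_mask(group_ids, min_samples=1):
    """Return a mask that is True for rows whose group has at least min_samples rows.

    Hypothetical helper for illustration; not part of nirs4all.
    """
    group_ids = np.asarray(group_ids)
    _, inverse, counts = np.unique(group_ids, return_inverse=True, return_counts=True)
    # counts[inverse] gives, for every row, the size of the group it belongs to.
    return counts[inverse] >= min_samples


sample_ids = np.array(["a", "a", "a", "b", "c", "c"])
mask = min_samples_mask(sample_ids, min_samples=2)
remaining_ids = sample_ids[mask]  # group "b" has a single row and is dropped
```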

custom_function

Optional custom aggregation function.

Type:

Callable | None

feature_method

Aggregation method for features (X), if different from targets.

Type:

nirs4all.data.aggregation.aggregator.AggregationMethod | str | None

target_method

Aggregation method for targets (Y), if different from features.

Type:

nirs4all.data.aggregation.aggregator.AggregationMethod | str | None

__post_init__()

Normalize method values to enum.
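
The normalization step can be sketched with a dataclass whose `__post_init__` converts a plain string into the matching enum member. `Method` and `Config` are hypothetical stand-ins for `AggregationMethod` and `AggregationConfig`; how nirs4all treats unknown strings or letter case is not stated in the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Method(str, Enum):
    """Stand-in for AggregationMethod with two members, for illustration only."""

    MEAN = "mean"
    MEDIAN = "median"


@dataclass
class Config:
    """Hypothetical stand-in for AggregationConfig; not the nirs4all class."""

    method: Union[Method, str] = Method.MEAN

    def __post_init__(self):
        # Method("median") looks the member up by value;
        # an unknown string raises ValueError.
        if not isinstance(self.method, Method):
            self.method = Method(self.method)


config = Config(method="median")  # config.method is now Method.MEDIAN
```

Because the enum also derives from `str`, a normalized member still compares equal to its string value, so `config.method == "median"` holds after normalization.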

column: str | bool | None = None
custom_function: Callable | None = None
exclude_outliers: bool = False
feature_method: AggregationMethod | str | None = None
classmethod from_config(config: Dict[str, Any]) → AggregationConfig

Create from configuration dictionary.

Parameters:

config – Configuration dictionary with aggregation settings.

Returns:

AggregationConfig instance.

is_enabled() → bool

Check if aggregation is enabled.

method: AggregationMethod | str = 'mean'
min_samples: int = 1
outlier_threshold: float = 3.0
target_method: AggregationMethod | str | None = None
class nirs4all.data.aggregation.AggregationMethod(value)

Bases: str, Enum

Aggregation method for combining samples.

COUNT = 'count'
FIRST = 'first'
LAST = 'last'
MAX = 'max'
MEAN = 'mean'
MEDIAN = 'median'
MIN = 'min'
STD = 'std'
SUM = 'sum'
VOTE = 'vote'
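
The member names suggest how each method reduces a group of replicate rows to a single row. The sketch below maps a few of the string values to NumPy reducers and treats `vote` as a majority vote over labels. That reading of `vote`, the tie-breaking rule, and the `REDUCERS` table are assumptions made for illustration, not documented nirs4all behavior.

```python
import numpy as np


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label (np.unique sorts)."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return values[np.argmax(counts)]


# Hypothetical lookup table keyed by the enum's string values.
REDUCERS = {
    "mean": lambda block: np.mean(block, axis=0),
    "median": lambda block: np.median(block, axis=0),
    "min": lambda block: np.min(block, axis=0),
    "max": lambda block: np.max(block, axis=0),
    "vote": majority_vote,
}

# Three replicate rows with two features each.
spectra = np.array([[1.0, 4.0], [3.0, 8.0], [5.0, 6.0]])
mean_row = REDUCERS["mean"](spectra)
median_row = REDUCERS["median"](spectra)
winner = REDUCERS["vote"](["A", "B", "A"])
```
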
class nirs4all.data.aggregation.Aggregator(config: AggregationConfig)

Bases: object

Aggregates sample data during loading.

Supports grouping by metadata columns, target values, or sample IDs, with configurable aggregation methods for features and targets.

Example

```python
import numpy as np
from nirs4all.data.aggregation import AggregationConfig, Aggregator

# X, y and metadata are assumed to be already loaded.

# Aggregate by sample_id column using mean
config = AggregationConfig(column="sample_id", method="mean")
aggregator = Aggregator(config)
X_agg, y_agg, meta_agg = aggregator.aggregate(X, y, metadata)

# Aggregate with outlier exclusion
config = AggregationConfig(
    column="sample_id",
    method="mean",
    exclude_outliers=True,
    outlier_threshold=2.5,
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)

# Custom aggregation function
config = AggregationConfig(
    column="sample_id",
    custom_function=lambda x: np.percentile(x, 75, axis=0),
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)
```

aggregate(X: ndarray, y: ndarray | None = None, metadata: DataFrame | None = None, group_column: str | None = None) → tuple

Aggregate data by groups.

Parameters:
  • X – Feature array of shape (n_samples, n_features).

  • y – Optional target array of shape (n_samples,) or (n_samples, n_targets).

  • metadata – Optional metadata DataFrame.

  • group_column – Override column to group by.

Returns:

Tuple of (X_aggregated, y_aggregated, metadata_aggregated). Elements are None if not provided in input.

Raises:

AggregationError – If aggregation fails.
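
To make the shape change concrete, the sketch below averages the rows of X and the entries of y that share a group id, reducing (n_samples, n_features) to (n_groups, n_features). The function `aggregate_by_group` is a hypothetical stand-in that handles only the mean and ignores metadata, outlier exclusion and error handling; it is not the nirs4all implementation.

```python
import numpy as np


def aggregate_by_group(X, y, group_ids):
    """Average rows of X and entries of y that share a group id.

    Hypothetical helper for illustration; not the nirs4all implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    unique_ids, inverse = np.unique(np.asarray(group_ids), return_inverse=True)
    n_groups = len(unique_ids)
    # inverse[i] is the index of the group that row i belongs to.
    X_agg = np.vstack([X[inverse == g].mean(axis=0) for g in range(n_groups)])
    y_agg = np.array([y[inverse == g].mean(axis=0) for g in range(n_groups)])
    return X_agg, y_agg, unique_ids


X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
y = np.array([0.5, 1.5, 7.0])
sample_ids = ["s1", "s1", "s2"]
X_agg, y_agg, ids = aggregate_by_group(X, y, sample_ids)
```

Here the two rows tagged `s1` collapse into one averaged row, so three input samples become two aggregated samples.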

register_function(name: str, func: Callable) → None

Register a custom aggregation function.

Parameters:
  • name – Name to reference the function.

  • func – Aggregation function that takes an array and returns the aggregated value.
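
Registering a function by name amounts to keeping a name-to-callable mapping that can be consulted later. The class `FunctionRegistry` and its `apply` method below are hypothetical and only mirror the documented `register_function(name, func)` signature; how `Aggregator` stores or looks up registered functions is not described in the source.

```python
from typing import Callable, Dict, Sequence


class FunctionRegistry:
    """Hypothetical name-to-function registry; not the nirs4all Aggregator."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def register_function(self, name: str, func: Callable) -> None:
        if not callable(func):
            raise TypeError(f"{name!r} must be callable")
        self._functions[name] = func

    def apply(self, name: str, values: Sequence[float]) -> float:
        # Look the function up by name and reduce the group to a single value.
        return self._functions[name](values)


registry = FunctionRegistry()
registry.register_function("range", lambda values: max(values) - min(values))
spread = registry.apply("range", [2.0, 5.0, 3.5])
```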

nirs4all.data.aggregation.aggregate_data(X: ndarray, y: ndarray | None = None, metadata: DataFrame | None = None, column: str | bool | None = None, method: str | AggregationMethod = 'mean', exclude_outliers: bool = False, **kwargs) → tuple

Convenience function to aggregate data.

Parameters:
  • X – Feature array.

  • y – Optional target array.

  • metadata – Optional metadata DataFrame.

  • column – Column to group by (str), or True for y-based grouping.

  • method – Aggregation method.

  • exclude_outliers – Whether to exclude outliers.

  • **kwargs – Additional aggregation config options.

Returns:

Tuple of (X_aggregated, y_aggregated, metadata_aggregated).