nirs4all.data.aggregation package
Submodules
- nirs4all.data.aggregation.aggregator module
  - AggregationConfig: column, method, exclude_outliers, outlier_threshold, min_samples, custom_function, feature_method, target_method, __post_init__(), from_config(), is_enabled()
  - AggregationError
  - AggregationMethod
  - Aggregator
  - aggregate_data()
Module contents
Aggregation module for dataset configuration.
This module provides aggregation functionality for sample data, including custom aggregation functions and column-based aggregation.
- class nirs4all.data.aggregation.AggregationConfig(column: str | bool | None = None, method: AggregationMethod | str = AggregationMethod.MEAN, exclude_outliers: bool = False, outlier_threshold: float = 3.0, min_samples: int = 1, custom_function: Callable | None = None, feature_method: AggregationMethod | str | None = None, target_method: AggregationMethod | str | None = None)
Bases: object
Configuration for sample aggregation.
- column
Column name to group by for aggregation. If True, aggregate by y values. If None, no aggregation.
- method
Aggregation method or custom function.
- custom_function
Optional custom aggregation function.
- Type:
Callable | None
- feature_method
Aggregation method for features (X), if different from targets.
- Type:
nirs4all.data.aggregation.aggregator.AggregationMethod | str | None
- target_method
Aggregation method for targets (Y), if different from features.
- Type:
nirs4all.data.aggregation.aggregator.AggregationMethod | str | None
- feature_method: AggregationMethod | str | None = None
- classmethod from_config(config: Dict[str, Any]) -> AggregationConfig
Create from configuration dictionary.
- Parameters:
config – Configuration dictionary with aggregation settings.
- Returns:
AggregationConfig instance.
- method: AggregationMethod | str = 'mean'
- target_method: AggregationMethod | str | None = None
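The descriptions of feature_method and target_method ("if different from ...") suggest that each falls back to method when left as None. The sketch below illustrates that reading with a hypothetical stand-in dataclass, SketchConfig, and a hypothetical helper, resolved_methods(). Neither is part of nirs4all, and the fallback rule itself is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchConfig:
    """Hypothetical stand-in for AggregationConfig (not the library class)."""

    method: str = "mean"
    feature_method: Optional[str] = None
    target_method: Optional[str] = None

    def resolved_methods(self) -> Tuple[str, str]:
        # Assumption: an unset per-side method falls back to `method`.
        x_method = self.feature_method or self.method
        y_method = self.target_method or self.method
        return x_method, y_method


# Same method for X and y when nothing is overridden.
default_pair = SketchConfig().resolved_methods()

# Mean for features, median for targets.
split_pair = SketchConfig(method="mean", target_method="median").resolved_methods()
```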
- class nirs4all.data.aggregation.AggregationMethod(value)
Aggregation method for combining samples.
- COUNT = 'count'
- FIRST = 'first'
- LAST = 'last'
- MAX = 'max'
- MEAN = 'mean'
- MEDIAN = 'median'
- MIN = 'min'
- STD = 'std'
- SUM = 'sum'
- VOTE = 'vote'
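To make the method names above concrete, here is a minimal sketch that maps each string value to a plain-Python reducer over one group of scalar values. The REDUCERS table, majority_vote(), and reduce_values() are hypothetical names for illustration. Treating 'vote' as "most frequent value" and 'std' as the population standard deviation are assumptions, since the reference does not define them.

```python
import statistics
from collections import Counter


def majority_vote(values):
    # Most frequent value; ties resolve to the value encountered first.
    return Counter(values).most_common(1)[0][0]


# Hypothetical mapping from method name to a reducer over one group.
REDUCERS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "sum": sum,
    "min": min,
    "max": max,
    "std": statistics.pstdev,  # assumption: population standard deviation
    "count": len,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "vote": majority_vote,
}


def reduce_values(values, method="mean"):
    """Collapse one group of values into a single value."""
    return REDUCERS[method](list(values))
```

For example, reduce_values(["a", "b", "b"], "vote") picks the label that occurs most often, which is the kind of reduction a classification target would typically need.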
- class nirs4all.data.aggregation.Aggregator(config: AggregationConfig)
Bases: object
Aggregates sample data during loading.
Supports grouping by metadata columns, target values, or sample IDs, with configurable aggregation methods for features and targets.
Example
```python
import numpy as np
from nirs4all.data.aggregation import AggregationConfig, Aggregator

# X, y, and metadata are assumed to be loaded already.

# Aggregate by sample_id column using mean
config = AggregationConfig(column="sample_id", method="mean")
aggregator = Aggregator(config)
X_agg, y_agg, meta_agg = aggregator.aggregate(X, y, metadata)

# Aggregate with outlier exclusion
config = AggregationConfig(
    column="sample_id", method="mean", exclude_outliers=True, outlier_threshold=2.5
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)

# Custom aggregation function
config = AggregationConfig(
    column="sample_id", custom_function=lambda x: np.percentile(x, 75, axis=0)
)
aggregator = Aggregator(config)
result = aggregator.aggregate(X, y, metadata)
```
- aggregate(X: ndarray, y: ndarray | None = None, metadata: DataFrame | None = None, group_column: str | None = None) -> tuple
Aggregate data by groups.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Optional target array of shape (n_samples,) or (n_samples, n_targets).
metadata – Optional metadata DataFrame.
group_column – Override column to group by.
- Returns:
Tuple of (X_aggregated, y_aggregated, metadata_aggregated). Elements are None if not provided in input.
- Raises:
AggregationError – If aggregation fails.
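The grouping step behind aggregate() can be pictured with a small standard-library sketch: rows that share a group key are collapsed into one row by a column-wise mean, and their targets are averaged the same way. The function group_mean() and its list-based inputs are hypothetical. This illustrates the idea only and is not the library's implementation, which operates on NumPy arrays and DataFrames.

```python
from collections import defaultdict
from statistics import fmean


def group_mean(X, y, groups):
    """Average the feature rows (and targets) that share a group key."""
    rows_by_group = defaultdict(list)
    for index, key in enumerate(groups):
        rows_by_group[key].append(index)

    keys, X_agg, y_agg = [], [], []
    # Dict insertion order keeps groups in order of first appearance.
    for key, indices in rows_by_group.items():
        keys.append(key)
        # Transpose the group's rows so each tuple is one feature column.
        columns = zip(*(X[i] for i in indices))
        X_agg.append([fmean(column) for column in columns])
        if y is not None:
            y_agg.append(fmean(y[i] for i in indices))
    return X_agg, (y_agg if y is not None else None), keys


X = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
y = [0.5, 1.5, 3.0]
sample_ids = ["a", "a", "b"]
X_agg, y_agg, keys = group_mean(X, y, sample_ids)
```

In this sketch, passing the target values themselves as groups would be the analogue of grouping by y.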
- nirs4all.data.aggregation.aggregate_data(X: ndarray, y: ndarray | None = None, metadata: DataFrame | None = None, column: str | bool | None = None, method: str | AggregationMethod = 'mean', exclude_outliers: bool = False, **kwargs) -> tuple
Convenience function to aggregate data.
- Parameters:
X – Feature array.
y – Optional target array.
metadata – Optional metadata DataFrame.
column – Column to group by (str), or True for y-based grouping.
method – Aggregation method.
exclude_outliers – Whether to exclude outliers.
**kwargs – Additional aggregation config options.
- Returns:
Tuple of (X_aggregated, y_aggregated, metadata_aggregated).
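The reference does not say how exclude_outliers decides which samples to drop. One common approach, shown here purely as an assumption, is a z-score filter: values whose distance from the group mean exceeds a threshold number of standard deviations are discarded before averaging. The function mean_without_outliers() is a hypothetical helper, and its threshold argument only mirrors the role that outlier_threshold might play.

```python
from statistics import fmean, pstdev


def mean_without_outliers(values, threshold=3.0):
    """Mean of the values whose z-score magnitude is within `threshold`."""
    values = list(values)
    center = fmean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so there is nothing to exclude.
        return center
    kept = [v for v in values if abs(v - center) / spread <= threshold]
    # Fall back to every value if the filter would remove them all.
    return fmean(kept) if kept else center


replicates = [1.0, 1.1, 0.9, 1.0, 50.0]
strict = mean_without_outliers(replicates, threshold=1.5)   # drops 50.0
lenient = mean_without_outliers(replicates, threshold=3.0)  # keeps all values
```

With only five replicates no value can reach a z-score of 3, so the lenient call keeps everything. A stricter threshold is needed before the stray reading is excluded.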