nirs4all.data.features module
- class nirs4all.data.features.Features(cache: bool = False)
Bases: object
Manages N aligned NumPy sources + a Polars index.
This class coordinates multiple FeatureSource objects, ensuring they remain aligned in terms of sample count while allowing different feature dimensions and processing pipelines per source.
- sources
List of FeatureSource objects managing individual feature arrays.
- cache
Whether to enable caching for operations.
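The alignment rule described above can be sketched with plain NumPy arrays. This is a minimal illustration, not the library's implementation: `check_aligned` and the array names `nir` and `raman` are hypothetical, and each source is assumed to be a 2D array with samples on the first axis.

```python
import numpy as np


def check_aligned(sources):
    """Return the shared sample count, or raise if the sources disagree.

    Hypothetical helper: only the first axis (samples) has to match;
    the feature axis is free to differ between sources.
    """
    counts = {arr.shape[0] for arr in sources}
    if len(counts) > 1:
        raise ValueError(f"sources disagree on sample count: {sorted(counts)}")
    return counts.pop() if counts else 0


# Two sources with the same 5 samples but different feature widths.
nir = np.zeros((5, 700))
raman = np.zeros((5, 1200))
n = check_aligned([nir, raman])
```

Here both arrays have five rows, so the check passes even though their feature counts differ.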
- add_samples(data: ndarray | list[ndarray], headers: List[str] | List[List[str]] | None = None, header_unit: str | List[str] | None = None) → None
Add samples to all sources, ensuring alignment.
- Parameters:
data – Single 2D array or list of 2D arrays, one per source.
headers – Optional feature headers. Single list applies to all sources, or list of lists for per-source headers.
header_unit – Optional unit type for headers (“cm-1”, “nm”, “none”, “text”, “index”). Single string applies to all sources, or list for per-source units.
- Raises:
ValueError – If number of data arrays doesn’t match existing sources, or if headers/units lists don’t match number of sources.
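The "single value applies to all sources, or one value per source" convention for headers and header_unit can be sketched as follows. Both helpers are hypothetical and written only from the parameter descriptions above; the library's actual validation may differ.

```python
def broadcast_headers(headers, n_sources):
    """Return one header list per source (hypothetical helper)."""
    if headers is None:
        return [None] * n_sources
    # A flat list of strings is shared by every source.
    if all(isinstance(h, str) for h in headers):
        return [list(headers) for _ in range(n_sources)]
    # Otherwise one list per source is expected.
    if len(headers) != n_sources:
        raise ValueError(
            f"got {len(headers)} header lists for {n_sources} sources"
        )
    return [list(h) for h in headers]


def broadcast_units(header_unit, n_sources):
    """Return one unit string (or None) per source (hypothetical helper)."""
    if header_unit is None or isinstance(header_unit, str):
        return [header_unit] * n_sources
    if len(header_unit) != n_sources:
        raise ValueError(
            f"got {len(header_unit)} units for {n_sources} sources"
        )
    return list(header_unit)


shared = broadcast_headers(["1000", "1002"], 2)
units = broadcast_units(["cm-1", "nm"], 2)
```

A mismatched per-source list raises ValueError, mirroring the documented failure mode.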
- add_samples_batch_3d(data: ndarray | List[ndarray]) → None
Add multiple samples with 3D data in a single operation, at O(N) cost instead of O(N²).
This method is optimized for bulk insertion of augmented samples where each sample may have multiple processings. It is much faster than calling add_samples() in a loop.
- Parameters:
data – Single 3D array of shape (n_samples, n_processings, n_features) or list of 3D arrays for multi-source datasets.
- Raises:
ValueError – If number of data arrays doesn’t match existing sources, or if data dimensions don’t match.
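The cost difference between a loop and a batch insert can be illustrated with plain NumPy. This sketch assumes the store is a single 3D array of shape (n_samples, n_processings, n_features); how FeatureSource actually stores its data is not stated here, and both function names are hypothetical.

```python
import numpy as np


def append_one_by_one(store, batch):
    """Append samples in a loop.

    Every np.concatenate call copies the whole store, so the total
    number of copied rows grows quadratically with the batch size.
    """
    for sample in batch:
        store = np.concatenate([store, sample[np.newaxis]], axis=0)
    return store


def append_batch(store, batch):
    """Append all samples with a single copy of the store."""
    if batch.ndim != 3:
        raise ValueError(f"expected a 3D array, got {batch.ndim}D")
    if batch.shape[1:] != store.shape[1:]:
        raise ValueError(
            f"trailing dimensions {batch.shape[1:]} do not match "
            f"{store.shape[1:]}"
        )
    return np.concatenate([store, batch], axis=0)


store = np.zeros((4, 2, 6))   # 4 samples, 2 processings, 6 features
batch = np.ones((3, 2, 6))    # 3 new samples with the same layout
slow = append_one_by_one(store, batch)
fast = append_batch(store, batch)
```

Both paths produce the same array; only the amount of copying differs.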
- augment_samples(sample_indices: List[int], data: ndarray | list[ndarray], processings: list[str], count: int | List[int]) → None
Create augmented samples from existing ones.
- Parameters:
sample_indices – List of sample indices to augment.
data – Augmented feature data (single array or list of arrays for multi-source).
processings – Processing names for the augmented data.
count – Number of augmentations per sample: a single int applied to every sample, or a list with one count per sample.
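The two accepted forms of count can be sketched as an expansion into one origin index per augmented row. `expand_counts` is a hypothetical helper, and the assumption that augmented rows are ordered sample by sample is not confirmed by this page.

```python
def expand_counts(sample_indices, count):
    """Map each augmented row back to the sample it was derived from."""
    if isinstance(count, int):
        # A single int applies to every sample.
        counts = [count] * len(sample_indices)
    else:
        counts = list(count)
        if len(counts) != len(sample_indices):
            raise ValueError(
                f"got {len(counts)} counts for {len(sample_indices)} samples"
            )
    origins = []
    for idx, n in zip(sample_indices, counts):
        origins.extend([idx] * n)
    return origins


uniform = expand_counts([3, 7], 2)       # [3, 3, 7, 7]
uneven = expand_counts([3, 7], [1, 3])   # [3, 7, 7, 7]
```

Under this assumption, the length of the returned list is the number of rows the data argument would need to supply.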
- headers(src: int) → List[str]
Get the list of feature headers for a specific source.
- Parameters:
src – Source index.
- Returns:
List of header strings for the specified source.
- property headers_list: List[List[str]] | List[str]
Get the list of feature headers per source.
- Returns:
List of header lists, one per source.
- keep_sources(source_indices: int | List[int]) → None
Keep only specified sources, removing all others.
Used after merge operations with output_as="features" to consolidate to a single source.
- Parameters:
source_indices – Single source index or list of source indices to keep.
- Raises:
ValueError – If no sources exist or source indices are invalid.
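The index handling described above can be sketched on a plain list. `select_sources` is a hypothetical stand-in, and treating negative indices as invalid is an assumption; the page does not say how they are handled.

```python
def select_sources(sources, source_indices):
    """Return only the requested sources, in the order requested."""
    if not sources:
        raise ValueError("no sources to keep")
    # Accept a single index as shorthand for a one-element list.
    if isinstance(source_indices, int):
        source_indices = [source_indices]
    for i in source_indices:
        if not 0 <= i < len(sources):
            raise ValueError(
                f"source index {i} out of range for {len(sources)} sources"
            )
    return [sources[i] for i in source_indices]


single = select_sources(["nir", "raman", "meta"], 1)
pair = select_sources(["nir", "raman", "meta"], [0, 2])
```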
- property num_features: List[int] | int
Get the number of features per source.
- Returns:
Single int if only one source, otherwise list of ints (one per source).
- property num_processings: List[int] | int
Get the number of unique processing IDs per source.
- Returns:
Single int if only one source, otherwise list of ints (one per source).
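Because num_features and num_processings return either an int or a list depending on the number of sources, calling code often has to normalize the result. The sketch below shows both directions; `collapse_single` and `as_list` are hypothetical helpers, not part of the module.

```python
def collapse_single(values):
    """Mimic the documented return convention: int for one source, list otherwise."""
    return values[0] if len(values) == 1 else list(values)


def as_list(value):
    """Caller-side normalization: always get one entry per source."""
    return [value] if isinstance(value, int) else list(value)


one_source = collapse_single([700])          # 700
two_sources = collapse_single([700, 1200])   # [700, 1200]
normalized = as_list(one_source)             # [700]
```

Normalizing once at the call site avoids branching on the return type everywhere else.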
- property num_samples: int
Get the number of samples (rows) across all sources.
- Returns:
Number of samples in the first source (all sources have the same count).
- property preprocessing_str: List[List[str]] | List[str]
Get the list of processing IDs per source.
- Returns:
List of processing ID lists, one per source.
- update_features(source_processings: list[str], features: list[ndarray] | list[list[ndarray]], processings: list[str], source: int = -1) → None
Update or add new feature processings to a specific source.
- Parameters:
source_processings – List of existing processing names to replace. Empty string “” means add new.
features – Feature arrays to add or replace (single array or list of arrays).
processings – Target processing names for the features.
source – Source index to update. A negative value, including the default -1, is treated as source 0.
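The replace-or-add rule for source_processings can be sketched on processing names alone. `plan_update` is hypothetical, the names "raw", "snv", "raw_msc" and "deriv" are invented for illustration, and the assumption that a replacement keeps its position while additions are appended is not confirmed by this page.

```python
def plan_update(existing, source_processings, processings):
    """Return the processing names that would result from an update."""
    if len(source_processings) != len(processings):
        raise ValueError("source_processings and processings differ in length")
    result = list(existing)
    for old, new in zip(source_processings, processings):
        if old == "":
            # Empty string: add a new processing.
            result.append(new)
        else:
            # Named processing: replace it in place.
            # list.index raises ValueError if the name is unknown.
            result[result.index(old)] = new
    return result


plan = plan_update(["raw", "snv"], ["raw", ""], ["raw_msc", "deriv"])
# plan == ["raw_msc", "snv", "deriv"]
```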
- x(indices: list[int] | ndarray, layout: str = '2d', concat_source: bool = True) → ndarray | list[ndarray]
Retrieve feature data for specified samples.
- Parameters:
indices – Sample indices to retrieve.
layout – Data layout format (“2d”, “2d_interleaved”, “3d”, “3d_transpose”).
concat_source – If True and multiple sources exist, concatenate along feature dimension.
- Returns:
Feature array(s) in the requested layout. Single array if concat_source=True or only one source, otherwise list of arrays.
- Raises:
ValueError – If no features are available.
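The effect of concat_source on the return type can be sketched with plain NumPy for the simplest case. `gather` is a hypothetical function that assumes each source is a 2D array of shape (n_samples, n_features); the other layouts and the handling of multiple processings are not covered.

```python
import numpy as np


def gather(sources, indices, concat_source=True):
    """Select rows from every source and optionally join them feature-wise."""
    if not sources:
        raise ValueError("no features available")
    picked = [src[indices] for src in sources]
    if len(picked) == 1:
        return picked[0]
    if concat_source:
        # Join along the feature axis; the sample axis is shared.
        return np.concatenate(picked, axis=-1)
    return picked


nir = np.arange(12.0).reshape(4, 3)     # 4 samples, 3 features
raman = np.arange(20.0).reshape(4, 5)   # 4 samples, 5 features

joined = gather([nir, raman], [0, 2])
separate = gather([nir, raman], [0, 2], concat_source=False)
```

With concatenation the two selected rows have 3 + 5 = 8 features; without it, the caller receives one array per source.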