nirs4all.data.parsers.normalizer module
Configuration normalizer for dataset configuration.
This module provides the ConfigNormalizer class that combines all parsers and produces a canonical representation of dataset configurations.
- class nirs4all.data.parsers.normalizer.ConfigNormalizer(parsers: List[BaseParser] | None = None)
Bases: object
Normalizes dataset configurations from various input formats.
This class combines multiple parsers to handle:
- Folder paths (auto-scanning)
- JSON/YAML config files
- Dictionary configurations (legacy format)
- Sources configurations (multi-source format)
- Variations configurations (preprocessed data / feature variations)
- In-memory numpy arrays
All inputs are normalized to a canonical dictionary format that can be validated and processed by the loader.
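To make the idea of combining parsers concrete, here is a minimal, self-contained sketch of one way such a dispatch could work. It is not the library's implementation: `SketchParser`, `DictParser`, `FolderParser`, `SketchNormalizer`, the `can_parse`/`parse` methods, the `"folder"` key, and the default name `"dataset"` are all hypothetical names chosen for illustration. The only thing taken from the description above is the overall shape: a list of parsers is tried in turn, and the result is a `(config, name)` pair.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple


class SketchParser:
    """Hypothetical parser interface (not the real BaseParser API)."""

    def can_parse(self, raw: Any) -> bool:
        raise NotImplementedError

    def parse(self, raw: Any) -> Tuple[Dict[str, Any], str]:
        raise NotImplementedError


class DictParser(SketchParser):
    """Accepts plain dictionaries and copies them into the output config."""

    def can_parse(self, raw: Any) -> bool:
        return isinstance(raw, dict)

    def parse(self, raw: Any) -> Tuple[Dict[str, Any], str]:
        config = dict(raw)
        # Assumption: an optional "name" key supplies the dataset name.
        name = str(config.pop("name", "dataset"))
        return config, name


class FolderParser(SketchParser):
    """Accepts path-like inputs without a file extension."""

    def can_parse(self, raw: Any) -> bool:
        return isinstance(raw, (str, Path)) and Path(raw).suffix == ""

    def parse(self, raw: Any) -> Tuple[Dict[str, Any], str]:
        path = Path(raw)
        # Assumption: the last path component serves as the dataset name.
        return {"folder": str(path)}, path.name


class SketchNormalizer:
    """Tries each parser in order and returns the first successful result."""

    def __init__(self, parsers: Optional[List[SketchParser]] = None):
        self.parsers = parsers if parsers is not None else [DictParser(), FolderParser()]

    def normalize(self, raw: Any) -> Tuple[Dict[str, Any], str]:
        for parser in self.parsers:
            if parser.can_parse(raw):
                return parser.parse(raw)
        raise ValueError(f"No parser can handle input of type {type(raw).__name__}")
```

In this sketch the order of the parser list decides which parser wins when more than one could accept an input, so more specific parsers would normally be placed first.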
Example
```python
from nirs4all.data.parsers.normalizer import ConfigNormalizer

normalizer = ConfigNormalizer()

# From folder path
config, name = normalizer.normalize("/path/to/data/")

# From config file
config, name = normalizer.normalize("config.yaml")

# From dictionary
config, name = normalizer.normalize({"train_x": "data/X.csv"})

# From sources format
config, name = normalizer.normalize({
    "sources": [
        {"name": "NIR", "train_x": "NIR_train.csv"},
        {"name": "MIR", "train_x": "MIR_train.csv"},
    ]
})

# From variations format
config, name = normalizer.normalize({
    "variations": [
        {"name": "raw", "train_x": "X_raw.csv"},
        {"name": "snv", "train_x": "X_snv.csv"},
    ],
    "variation_mode": "separate",
})