nirs4all.data.parsers.normalizer module

Normalizer for dataset configurations.

This module provides the ConfigNormalizer class that combines all parsers and produces a canonical representation of dataset configurations.

class nirs4all.data.parsers.normalizer.ConfigNormalizer(parsers: List[BaseParser] | None = None)

Bases: object

Normalizes dataset configurations from various input formats.

This class combines multiple parsers to handle:

- Folder paths (auto-scanning)
- JSON/YAML config files
- Dictionary configurations (legacy format)
- Sources configurations (multi-source format)
- Variations configurations (preprocessed data / feature variations)
- In-memory numpy arrays

All inputs are normalized to a canonical dictionary format that can be validated and processed by the loader.
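The idea of combining several parsers behind one entry point can be sketched as a first-match dispatch loop. This is only an illustration under stated assumptions: `ToyPathParser`, `ToyDictParser`, `toy_normalize`, and the `can_parse` / `parse` method names are hypothetical and are not taken from nirs4all; the actual `BaseParser` interface and the real canonical dictionary layout may differ. Only the failure value `(None, "Unknown_dataset")` comes from this page.

```python
from typing import Any, Dict, List, Optional, Tuple

Config = Dict[str, Any]


class ToyDictParser:
    """Hypothetical parser (not part of nirs4all): accepts plain dictionaries."""

    def can_parse(self, data: Any) -> bool:
        return isinstance(data, dict)

    def parse(self, data: Any) -> Tuple[Config, str]:
        # Copy so the caller's dictionary is not mutated.
        return dict(data), str(data.get("name", "dict_dataset"))


class ToyPathParser:
    """Hypothetical parser (not part of nirs4all): treats strings as folder paths."""

    def can_parse(self, data: Any) -> bool:
        return isinstance(data, str)

    def parse(self, data: Any) -> Tuple[Config, str]:
        # Use the last path component as the dataset name.
        name = data.rstrip("/").split("/")[-1] or "path_dataset"
        return {"folder": data}, name


def toy_normalize(data: Any, parsers: List[Any]) -> Tuple[Optional[Config], str]:
    """Return the result of the first parser that accepts the input."""
    for parser in parsers:
        if parser.can_parse(data):
            return parser.parse(data)
    # Mirrors the failure value documented for ConfigNormalizer.normalize.
    return None, "Unknown_dataset"


parsers = [ToyPathParser(), ToyDictParser()]
path_result = toy_normalize("/path/to/data/", parsers)
dict_result = toy_normalize({"train_x": "data/X.csv"}, parsers)
failed_result = toy_normalize(42, parsers)
```

In this sketch the order of the `parsers` list decides which parser wins when more than one could accept an input; whether the real class resolves such overlaps the same way is not stated here.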

Example

```python
from nirs4all.data.parsers.normalizer import ConfigNormalizer

normalizer = ConfigNormalizer()

# From folder path
config, name = normalizer.normalize("/path/to/data/")

# From config file
config, name = normalizer.normalize("config.yaml")

# From dictionary
config, name = normalizer.normalize({"train_x": "data/X.csv"})

# From sources format
config, name = normalizer.normalize({
    "sources": [
        {"name": "NIR", "train_x": "NIR_train.csv"},
        {"name": "MIR", "train_x": "MIR_train.csv"},
    ]
})

# From variations format
config, name = normalizer.normalize({
    "variations": [
        {"name": "raw", "train_x": "X_raw.csv"},
        {"name": "snv", "train_x": "X_snv.csv"},
    ],
    "variation_mode": "separate",
})
```

normalize(input_data: Any) → Tuple[Dict[str, Any] | None, str]

Normalize a configuration to canonical format.

Parameters:

input_data – Configuration in any supported format.

Returns:

Tuple of (normalized_config, dataset_name). Returns (None, 'Unknown_dataset') if parsing fails.
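Because a failed parse is reported through the return value rather than an exception, callers may want to check for `None` before using the config. The sketch below shows one way to do that. `normalize_or_raise` and `stand_in_normalize` are hypothetical helpers introduced for illustration; the stand-in exists only so the snippet runs without nirs4all installed, and in real use one would presumably pass `ConfigNormalizer().normalize` instead.

```python
from typing import Any, Callable, Dict, Optional, Tuple

NormalizeFn = Callable[[Any], Tuple[Optional[Dict[str, Any]], str]]


def normalize_or_raise(
    normalize: NormalizeFn, input_data: Any
) -> Tuple[Dict[str, Any], str]:
    """Call `normalize` and turn the documented failure value into an exception."""
    config, name = normalize(input_data)
    if config is None:
        raise ValueError(f"Could not parse dataset configuration: {input_data!r}")
    return config, name


def stand_in_normalize(input_data: Any) -> Tuple[Optional[Dict[str, Any]], str]:
    """Hypothetical stand-in with the same return shape as normalize()."""
    if isinstance(input_data, dict):
        return dict(input_data), "demo_dataset"
    return None, "Unknown_dataset"


# Successful case: the config and name are returned unchanged.
config, name = normalize_or_raise(stand_in_normalize, {"train_x": "data/X.csv"})

# Failing case: the (None, "Unknown_dataset") result becomes a ValueError.
try:
    normalize_or_raise(stand_in_normalize, 3.14)
    failure_message = ""
except ValueError as exc:
    failure_message = str(exc)
```

Checking `config is None` rather than comparing the name to `'Unknown_dataset'` avoids misreading a dataset that happens to carry that name.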

nirs4all.data.parsers.normalizer.normalize_config(input_data: Any) → Tuple[Dict[str, Any] | None, str]

Convenience function to normalize a configuration.

Parameters:

input_data – Configuration in any supported format.

Returns:

Tuple of (normalized_config, dataset_name).