nirs4all.data.config_parser module
Dataset configuration parser.
This module provides functions for parsing dataset configurations from various formats. It serves as the main entry point for configuration parsing, delegating to specialized parsers in the nirs4all.data.parsers module.
The parser supports:
- Folder paths with auto-scanning for data files
- JSON/YAML configuration files
- Dictionary configurations (legacy train_x/test_x format)
- In-memory numpy arrays
For the new schema-based validation, see nirs4all.data.schema. For specialized parsers, see nirs4all.data.parsers.
- nirs4all.data.config_parser.browse_folder(folder_path, global_params=None)[source]
Scan a folder for data files matching standard naming conventions.
This function delegates to FolderParser for the actual scanning.
- Parameters:
folder_path – Path to folder to scan.
global_params – Optional global loading parameters.
- Returns:
Configuration dictionary with detected file paths.
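The scanning step can be illustrated with a minimal, self-contained sketch. It is not the FolderParser implementation: the function name browse_folder_sketch, the filename fragments in _PATTERNS, the "global_params" key in the result, and the "delimiter" parameter in the demo are all assumptions made for illustration, since the naming conventions nirs4all actually recognizes are not listed on this page.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical filename fragments; the real conventions are defined by FolderParser.
_PATTERNS = {
    "train_x": ("xtrain", "x_train"),
    "train_y": ("ytrain", "y_train"),
    "test_x": ("xtest", "x_test"),
    "test_y": ("ytest", "y_test"),
}


def browse_folder_sketch(
    folder_path: str, global_params: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a config dict mapping each data key to the first matching file."""
    config: Dict[str, Any] = {key: None for key in _PATTERNS}
    # Assumption: global parameters are carried along in the returned dict.
    config["global_params"] = global_params
    for path in sorted(Path(folder_path).iterdir()):
        if not path.is_file():
            continue
        stem = path.stem.lower()
        for key, fragments in _PATTERNS.items():
            # Keep the first match for each key and ignore later candidates.
            if config[key] is None and any(frag in stem for frag in fragments):
                config[key] = str(path)
    return config


# Demo on a temporary folder containing two data files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Xtrain.csv", "Ytrain.csv", "notes.txt"):
        (Path(tmp) / name).write_text("")
    demo_config = browse_folder_sketch(tmp, {"delimiter": ";"})
```

In this sketch, keys with no matching file stay None, so a caller can tell which parts of the dataset were not found.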
- nirs4all.data.config_parser.folder_to_name(folder_path)[source]
Extract a dataset name from a folder path.
- Parameters:
folder_path – Path to folder.
- Returns:
Cleaned dataset name.
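What "cleaned" means is not specified here, so the following is only a sketch under stated assumptions: the name is taken from the last path component, runs of non-alphanumeric characters are replaced by underscores, and an empty result falls back to 'Unknown_dataset'. The function name folder_to_name_sketch is hypothetical, and the real function may apply different rules.

```python
import re
from pathlib import Path


def folder_to_name_sketch(folder_path: str) -> str:
    """Derive a dataset name from a folder path (illustrative rules only)."""
    # Assumption: the dataset name comes from the last path component.
    base = Path(folder_path).name
    # Assumption: cleaning replaces runs of non-alphanumeric characters with "_".
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", base).strip("_")
    # Assumption: fall back to a placeholder when nothing usable remains.
    return cleaned or "Unknown_dataset"


example_name = folder_to_name_sketch("data/my dataset-v2/")
```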
- nirs4all.data.config_parser.normalize_config_keys(config: Dict[str, Any]) → Dict[str, Any][source]
Normalize dataset configuration keys to standard format.
Maps variations like ‘x_train’, ‘X_train’, ‘Xtrain’ to ‘train_x’, and metadata variations like ‘metadata_train’, ‘train_metadata’, ‘m_train’ to ‘train_group’.
- Parameters:
config – Original configuration dictionary
- Returns:
Normalized configuration with standardized keys
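The key mapping described for this function can be sketched with a small alias table. This is a hedged illustration rather than the library's code: normalize_keys_sketch and _ALIASES are hypothetical names, the table contains only the spellings mentioned in this entry, and the choice to leave unrecognized keys untouched is an assumption.

```python
from typing import Any, Dict

# Hypothetical alias table limited to the spellings named in this entry.
# Keys are lowercase because lookups are case-insensitive in this sketch.
_ALIASES = {
    "x_train": "train_x",
    "xtrain": "train_x",
    "metadata_train": "train_group",
    "train_metadata": "train_group",
    "m_train": "train_group",
}


def normalize_keys_sketch(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with known key variants renamed."""
    normalized: Dict[str, Any] = {}
    for key, value in config.items():
        # Assumption: unknown keys pass through unchanged.
        canonical = _ALIASES.get(key.lower(), key)
        normalized[canonical] = value
    return normalized


example = normalize_keys_sketch({"X_train": "a.csv", "m_train": "g.csv", "other": 1})
```

Lowercasing the key before the lookup lets a single table entry cover both ‘x_train’ and ‘X_train’.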
- nirs4all.data.config_parser.parse_config(data_config)[source]
Parse a dataset configuration.
Handles multiple input formats:
- String path to a folder: auto-browse for data files
- String path to JSON/YAML file (.json, .yaml, .yml): load config from file
- Dict with ‘folder’ key: browse folder with optional params
- Dict with data keys (train_x, test_x, etc.): use directly
- Parameters:
data_config – Dataset configuration in any supported format.
- Returns:
Tuple of (parsed_config_dict, dataset_name). Returns (None, ‘Unknown_dataset’) if parsing fails.
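The dispatch over input formats listed for parse_config can be sketched as a small classifier. It only illustrates the branching and does not load or return any configuration; classify_config_sketch and the returned labels are hypothetical, and treating every dict without a ‘folder’ key as a direct data configuration is an assumption.

```python
from pathlib import Path
from typing import Any

# File extensions named in the parse_config description.
CONFIG_FILE_SUFFIXES = {".json", ".yaml", ".yml"}


def classify_config_sketch(data_config: Any) -> str:
    """Label which parsing branch an input would take (illustrative only)."""
    if isinstance(data_config, str):
        # A string ending in a config-file extension is loaded as a file.
        if Path(data_config).suffix.lower() in CONFIG_FILE_SUFFIXES:
            return "config_file"
        # Any other string is treated as a folder to browse.
        return "folder"
    if isinstance(data_config, dict):
        if "folder" in data_config:
            return "folder_dict"
        # Assumption: remaining dicts carry data keys such as train_x or test_x.
        return "data_dict"
    # Inputs outside the listed formats correspond to the parsing-failure case.
    return "unsupported"
```

The "unsupported" label corresponds to the case where the real function would return (None, ‘Unknown_dataset’).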