nirs4all.data.loaders.loader module
- nirs4all.data.loaders.loader.create_synthetic_dataset(config: Dict) → SpectroDataset
Create a synthetic SpectroDataset for testing purposes.
- Parameters:
config – Dictionary with the following keys:
  - X: Feature matrix of shape (n_samples, n_features)
  - y: Target values of shape (n_samples,)
  - folds: Number of cross-validation folds
  - train / val / test: Split ratios
  - random_state: Random seed
- Returns:
Synthetic dataset ready for pipeline use
- Return type:
SpectroDataset
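A minimal sketch of building a `config` for `create_synthetic_dataset`. The key names come from the parameter list above. Several things are assumptions: that `X` and `y` are NumPy arrays, that `train`, `val` and `test` are separate keys, and that the three split ratios should sum to 1. The helpers `make_synthetic_config` and `check_synthetic_config` are hypothetical and not part of nirs4all. The real call is left as a comment so the snippet runs without the library installed.

```python
import numpy as np


def make_synthetic_config(n_samples=20, n_features=5, seed=0):
    """Hypothetical helper: build a config dict with the documented keys.

    Assumes X/y are NumPy arrays and train/val/test are separate ratio keys.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, n_features))  # feature matrix (n_samples, n_features)
    y = X.sum(axis=1)                        # toy target: row sum, shape (n_samples,)
    return {
        "X": X,
        "y": y,
        "folds": 3,
        "train": 0.6,
        "val": 0.2,
        "test": 0.2,
        "random_state": seed,
    }


def check_synthetic_config(config):
    """Hypothetical sanity check: X and y agree in length, ratios sum to 1 (assumed)."""
    X = np.asarray(config["X"])
    y = np.asarray(config["y"])
    if X.ndim != 2 or y.shape != (X.shape[0],):
        raise ValueError("expected X of shape (n_samples, n_features) and y of shape (n_samples,)")
    ratio_sum = config["train"] + config["val"] + config["test"]
    if abs(ratio_sum - 1.0) > 1e-9:
        raise ValueError(f"split ratios sum to {ratio_sum}, expected 1.0")
    return X.shape[0]


config = make_synthetic_config()
check_synthetic_config(config)
# With nirs4all installed, the documented function would then be called as:
# dataset = create_synthetic_dataset(config)
```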
- nirs4all.data.loaders.loader.handle_data(config, t_set)
Handle data loading for a given dataset type (train, test). Supports both single-source and multi-source datasets.
- Parameters:
  - config (dict): Data configuration dictionary.
  - t_set (str): The dataset type (‘train’ or ‘test’).
- Returns:
tuple (x, y, m, x_headers, m_headers, x_header_unit, x_signal_type), where:
  - x is a numpy array, or a list of arrays for multi-source
  - y is a numpy array
  - m is a DataFrame of metadata, or None
  - x_headers is a list of column names, or a list of lists for multi-source
  - m_headers is a list of metadata column names
  - x_header_unit is a string, or a list of strings for multi-source (“cm-1”, “nm”, “none”, “text”, “index”)
  - x_signal_type is a SignalType, or a list of SignalType for multi-source (None for auto-detect)
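A minimal sketch of consuming the 7-tuple returned by `handle_data`, relying only on the documented element order and on `x` being either a single array (single-source) or a list of arrays (multi-source). The helper `summarize_handle_data_result` is hypothetical. So is the choice to broadcast a non-list `x_header_unit` or `x_signal_type` (for example a single None for auto-detect) across all sources, since the description above does not say how those behave in the multi-source case.

```python
import numpy as np


def summarize_handle_data_result(result):
    """Hypothetical helper: describe each X source in a handle_data result tuple."""
    x, y, m, x_headers, m_headers, x_header_unit, x_signal_type = result

    if isinstance(x, list):
        # Multi-source: one array and one header list per source.
        arrays, header_lists = x, x_headers
    else:
        # Single-source: wrap everything so both cases share one loop.
        arrays, header_lists = [x], [x_headers]

    n_sources = len(arrays)
    # Assumption: a non-list unit/signal type applies to every source.
    units = x_header_unit if isinstance(x_header_unit, list) else [x_header_unit] * n_sources
    signals = x_signal_type if isinstance(x_signal_type, list) else [x_signal_type] * n_sources

    sources = []
    for arr, headers, unit, signal in zip(arrays, header_lists, units, signals):
        if arr.shape[0] != y.shape[0]:
            raise ValueError("each X source must have one row per target value")
        sources.append({
            "n_features": arr.shape[1],
            "n_headers": len(headers),
            "unit": unit,
            "signal_type": signal,
        })
    return {"n_samples": y.shape[0], "has_metadata": m is not None, "sources": sources}
```

Normalizing the single-source case into one-element lists lets the same loop handle both shapes, which mirrors how the return description pairs each scalar form with a per-source list form.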
- nirs4all.data.loaders.loader.load_XY(x_path, x_filter, x_params, y_path, y_filter, y_params, m_path=None, m_filter=None, m_params=None)
Load X, Y, and metadata from single file paths. For multi-source datasets, this function is called multiple times.
- Parameters:
  - x_path (str): Single path to the X data file.
  - x_filter: Filter to apply to X data (not implemented yet).
  - x_params (dict): Parameters for loading X data, including:
    - header_unit: Unit for headers (“cm-1”, “nm”, “none”, “text”, “index”)
    - signal_type: Signal type (“absorbance”, “reflectance”, “reflectance%”, etc.)
    - delimiter, decimal_separator, has_header, na_policy, etc.
  - y_path (str): Path to the Y data file (can be None).
  - y_filter: Filter to apply to Y data (or indices if y_path is None).
  - y_params (dict): Parameters for loading Y data.
  - m_path (str): Path to the metadata file (can be None).
  - m_filter: Filter to apply to metadata (not implemented yet).
  - m_params (dict): Parameters for loading metadata.
- Returns:
tuple (x, y, m, x_headers, m_headers, x_header_unit, x_signal_type), where:
  - x, y, m are numpy arrays or DataFrames
  - x_headers and m_headers are lists of column names
  - x_header_unit is the unit string for the X headers (“cm-1”, “nm”, “none”, “text”, “index”)
  - x_signal_type is the signal type (a SignalType enum, or None for auto-detect)
- Raises:
  - ValueError: If the data is invalid or inconsistent.
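Because `load_XY` handles one X path per call, a multi-source dataset needs one call per source, with the results gathered into lists. The sketch below illustrates that under stated assumptions. The helpers `make_x_params` and `load_sources` are hypothetical. The `x_params` key names come from the parameter list above, but the values (";" as delimiter, "." as decimal separator, `has_header=True`) are illustrative. Requiring identical Y across sources is also an assumption. The loader is passed in as an argument, so a stand-in can replace the real `load_XY`. The call uses the first six positional parameters of the documented signature and leaves the filters as None.

```python
import numpy as np

# Header units documented for x_params["header_unit"].
VALID_HEADER_UNITS = {"cm-1", "nm", "none", "text", "index"}


def make_x_params(header_unit="nm", signal_type="absorbance", delimiter=";"):
    """Hypothetical helper: build an x_params dict with documented key names."""
    if header_unit not in VALID_HEADER_UNITS:
        raise ValueError(f"unknown header_unit: {header_unit!r}")
    return {
        "header_unit": header_unit,
        "signal_type": signal_type,
        "delimiter": delimiter,      # illustrative value
        "decimal_separator": ".",    # illustrative value
        "has_header": True,          # illustrative value
    }


def load_sources(load_xy, sources, y_path, y_params):
    """Hypothetical helper: call a load_XY-like function once per X source.

    `sources` is a list of (x_path, x_params) pairs. Per-source outputs are
    collected into lists; Y is assumed to be identical for every source.
    """
    xs, headers, units, signals = [], [], [], []
    y_ref = None
    for x_path, x_params in sources:
        # Positional order follows the documented signature:
        # (x_path, x_filter, x_params, y_path, y_filter, y_params)
        x, y, _m, x_hdr, _m_hdr, unit, signal = load_xy(
            x_path, None, x_params, y_path, None, y_params
        )
        if y_ref is not None and not np.array_equal(y, y_ref):
            raise ValueError(f"targets loaded with {x_path} differ from the first source")
        y_ref = y
        xs.append(x)
        headers.append(x_hdr)
        units.append(unit)
        signals.append(signal)
    return xs, y_ref, headers, units, signals
```

With nirs4all installed, `load_sources(load_XY, ...)` would exercise the real loader. In a test, a small stand-in function with the same positional parameters can replace it.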