nirs4all.data.loaders.loader module

nirs4all.data.loaders.loader.create_synthetic_dataset(config: Dict) → SpectroDataset

Create a synthetic SpectroDataset for testing purposes.

Parameters:

config – Dictionary with keys:

  • X: Feature matrix (n_samples, n_features)

  • y: Target values (n_samples,)

  • folds: Number of CV folds

  • train/val/test: Split ratios

  • random_state: Random seed

Returns:

Synthetic dataset ready for pipeline use

Return type:

SpectroDataset
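
As an illustration, a config for this function might be assembled as sketched below. This is not documented usage: the split-ratio key names (`train`, `val`, `test`), the example values, and the helpers `make_config` and `check_split_ratios` are assumptions made for this sketch. X and y are plain nested lists here to keep the example dependency-free; the real function may require NumPy arrays.

```python
import random

def make_config(n_samples=20, n_features=5, seed=0):
    """Build an illustrative config dict with the keys listed above."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [sum(row) / n_features for row in X]  # toy target: mean of each row
    return {
        "X": X,
        "y": y,
        "folds": 3,
        "train": 0.6,  # assumed key names for the split ratios
        "val": 0.2,
        "test": 0.2,
        "random_state": seed,
    }

def check_split_ratios(config, tol=1e-9):
    """Hypothetical sanity check: ratios sum to 1 and X, y have equal length."""
    total = config["train"] + config["val"] + config["test"]
    return abs(total - 1.0) <= tol and len(config["X"]) == len(config["y"])

config = make_config()

# With nirs4all installed, the call would presumably look like this:
# from nirs4all.data.loaders.loader import create_synthetic_dataset
# dataset = create_synthetic_dataset(config)
```

Checking the config before the call keeps shape and ratio mistakes out of the dataset construction step, where they may be harder to diagnose.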

nirs4all.data.loaders.loader.handle_data(config, t_set)

Handle data loading for a given dataset type (train, test). Supports both single-source and multi-source datasets.

Parameters:

  • config (dict): Data configuration dictionary.

  • t_set (str): The dataset type (‘train’ or ‘test’).

Returns:

tuple: (x, y, m, x_headers, m_headers, x_header_unit, x_signal_type) where:

  • x is numpy array or list of arrays

  • y is numpy array

  • m is DataFrame or None (metadata)

  • x_headers is list of column names or list of lists for multi-source

  • m_headers is list of metadata column names

  • x_header_unit is string or list of strings for multi-source (“cm-1”, “nm”, “none”, “text”, “index”)

  • x_signal_type is SignalType or list of SignalType for multi-source (None for auto-detect)
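
Because several elements of the returned tuple change shape between single-source and multi-source datasets, calling code may want to normalise them first. The sketch below shows one way to do that. The helper `as_sources` is hypothetical, the tuples are stand-in values rather than real loader output, and it assumes that a multi-source result is signalled by x being a Python list.

```python
def as_sources(x, x_headers, x_header_unit):
    """Return a list of (x, headers, unit) triples, one per source.

    Assumes a multi-source result is signalled by x being a Python list,
    with x_headers and x_header_unit as parallel lists of the same length.
    """
    if isinstance(x, list):
        return list(zip(x, x_headers, x_header_unit))
    return [(x, x_headers, x_header_unit)]

# Stand-in values shaped like the documented 7-tuple.
# Plain tuples take the place of NumPy arrays here.
single = (("row0", "row1"), [1.0, 2.0], None, ["1100", "1102"], [], "nm", None)
multi = (
    [("a0",), ("b0",)],       # x: one array per source
    [1.0],                    # y
    None,                     # m: no metadata
    [["1100"], ["4000"]],     # x_headers: one list per source
    [],                       # m_headers
    ["nm", "cm-1"],           # x_header_unit: one unit per source
    [None, None],             # x_signal_type: auto-detect for each source
)

for result in (single, multi):
    x, y, m, x_headers, m_headers, x_header_unit, x_signal_type = result
    for src_x, headers, unit in as_sources(x, x_headers, x_header_unit):
        print(len(headers), "columns, header unit:", unit)
```

After normalisation, downstream code can loop over sources without branching on the dataset kind.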

nirs4all.data.loaders.loader.load_XY(x_path, x_filter, x_params, y_path, y_filter, y_params, m_path=None, m_filter=None, m_params=None)

Load X, Y, and metadata from single paths. For multi-source, this will be called multiple times.

Parameters:

  • x_path (str): Single path to X data file.

  • x_filter: Filter to apply to X data (not implemented yet).

  • x_params (dict): Parameters for loading X data, including:

      • header_unit: Unit for headers (“cm-1”, “nm”, “none”, “text”, “index”)

      • signal_type: Signal type (“absorbance”, “reflectance”, “reflectance%”, etc.)

      • delimiter, decimal_separator, has_header, na_policy, etc.

  • y_path (str): Path to the Y data file (can be None).

  • y_filter: Filter to apply to Y data (or indices if y_path is None).

  • y_params (dict): Parameters for loading Y data.

  • m_path (str): Path to metadata file (can be None).

  • m_filter: Filter to apply to metadata (not implemented yet).

  • m_params (dict): Parameters for loading metadata.

Returns:

tuple: (x, y, m, x_headers, m_headers, x_header_unit, x_signal_type) where:

  • x, y, m are numpy arrays/DataFrames

  • x_headers, m_headers are lists of column names

  • x_header_unit is the unit string for X headers (“cm-1”, “nm”, “none”, “text”, “index”)

  • x_signal_type is the signal type (SignalType enum or None for auto-detect)

Raises:

ValueError: If data is invalid or if there are inconsistencies.
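
The parameter dictionaries and the error handling can be sketched as follows. The helper `check_header_unit` is hypothetical and not part of nirs4all; it treats a missing `header_unit` as an error, which is an assumption. The file names and the delimiter, decimal separator and header values are placeholders, and reusing the same keys in `y_params` is also an assumption.

```python
# Header units listed in the parameter description above.
ALLOWED_HEADER_UNITS = ("cm-1", "nm", "none", "text", "index")

def check_header_unit(params):
    """Hypothetical pre-flight check that header_unit is a documented value."""
    unit = params.get("header_unit")
    if unit not in ALLOWED_HEADER_UNITS:
        raise ValueError(
            f"header_unit must be one of {ALLOWED_HEADER_UNITS}, got {unit!r}"
        )
    return unit

x_params = {
    "header_unit": "nm",
    "signal_type": "absorbance",
    "delimiter": ";",          # placeholder value
    "decimal_separator": ".",  # placeholder value
    "has_header": True,        # placeholder value
}
# Assumed to accept the same file-format keys as x_params.
y_params = {"delimiter": ";", "decimal_separator": ".", "has_header": True}

check_header_unit(x_params)

# With nirs4all installed and real files in place, the call would presumably be:
# from nirs4all.data.loaders.loader import load_XY
# try:
#     x, y, m, x_headers, m_headers, x_header_unit, x_signal_type = load_XY(
#         "X_train.csv", None, x_params, "Y_train.csv", None, y_params
#     )
# except ValueError as err:
#     print(f"Could not load dataset: {err}")
```

Since the two filter arguments for X and metadata are described as not implemented yet, the sketch passes None for the X filter and omits the metadata arguments.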