nirs4all.data.loaders.csv_loader module

nirs4all.data.loaders.csv_loader.load_csv(path, na_policy='auto', data_type='x', categorical_mode='auto', header_unit='cm-1', **user_params)

Loads a CSV file using specified or default parameters, cleans data, handles NA values, and performs type conversions.

Parameters:
  • path (str or Path) – Path to the CSV file (.csv, .gz, .zip).

  • na_policy (str) – ‘remove’ or ‘abort’ (or ‘auto’, which behaves like ‘remove’). The policy governs row removal when NAs are found.

  • data_type (str) – ‘x’ or ‘y’. Influences type conversion.

  • categorical_mode (str) – How to handle string columns in ‘y’ data:

      - ‘auto’: Convert string columns to numerical categories.
      - ‘preserve’: Keep string columns (will become NaN if not convertible by final astype).
      - ‘none’: Treat all columns as potentially numeric.

  • header_unit (str) – Unit type of the headers: “cm-1” (wavenumber), “nm” (wavelength), “none” (no headers), “text” (string headers), or “index” (feature indices). Default: “cm-1”.

  • **user_params – CSV parsing parameters (delimiter, decimal_separator, has_header) and other pandas.read_csv arguments.
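
The parsing options and the ‘auto’ categorical mode described above can be illustrated with plain pandas. This is a minimal sketch and not the library's implementation: the helpers read_with_options and encode_string_columns are hypothetical, the mapping of delimiter, decimal_separator and has_header onto pandas.read_csv arguments is an assumption, and pandas.factorize is only one plausible way to turn strings into numerical categories.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype


def read_with_options(source, delimiter=";", decimal_separator=".",
                      has_header=True, **pandas_kwargs):
    """Hypothetical helper: map the documented option names onto
    pandas.read_csv arguments (assumed mapping, not the library's code)."""
    return pd.read_csv(
        source,
        sep=delimiter,
        decimal=decimal_separator,
        header=0 if has_header else None,
        **pandas_kwargs,
    )


def encode_string_columns(df):
    """Hypothetical stand-in for categorical_mode='auto': replace every
    non-numeric column with integer category codes."""
    out = df.copy()
    for col in out.columns:
        if not is_numeric_dtype(out[col]):
            codes, _ = pd.factorize(out[col])
            out[col] = codes
    return out


# A small 'y' file with a string label and comma decimal separators.
raw = "label;value\nwheat;1,5\nbarley;2,0\nwheat;3,25\n"
y = read_with_options(io.StringIO(raw), delimiter=";", decimal_separator=",")
encoded = encode_string_columns(y)
print(encoded)
```

With pandas.factorize, codes are assigned in order of first appearance, so the two ‘wheat’ rows share one code and ‘barley’ gets another. The codes that load_csv itself produces may differ.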

Returns:

  • DataFrame with processed data (before NA row removal).

  • Report dictionary.

  • Boolean Series indicating rows with NAs (aligned with the returned DataFrame).

  • List of column headers (or None if no headers).

  • Header unit string.

Per the return type below, the DataFrame and the NA Series are None if an error occurs before this stage.

Return type:

(Union[pandas.DataFrame, None], dict, Union[pandas.Series, None], Union[List[str], None], str)
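
Because the returned DataFrame is the data before NA row removal, a caller who wants a cleaned frame can apply the boolean Series to it. The sketch below assumes that workflow. The helper apply_na_policy is hypothetical and not part of nirs4all, and the DataFrame and mask are hand-built stand-ins for the first and third items of the tuple that load_csv returns.

```python
import pandas as pd


def apply_na_policy(df, na_mask, na_policy="auto"):
    """Hypothetical helper: act on the NA mask according to the policy names
    documented above ('auto' is treated like 'remove')."""
    if df is None or na_mask is None:
        raise ValueError("loading failed; inspect the report dictionary")
    if na_policy not in ("auto", "remove", "abort"):
        raise ValueError(f"unknown na_policy: {na_policy!r}")
    if na_policy == "abort" and na_mask.any():
        raise ValueError(f"{int(na_mask.sum())} row(s) contain NA values")
    # 'remove' and 'auto': drop the flagged rows and renumber the index.
    return df.loc[~na_mask].reset_index(drop=True)


# Stand-ins for the DataFrame and the aligned boolean Series.
df = pd.DataFrame({"1000.0": [0.1, None, 0.3], "1002.0": [0.2, 0.5, 0.4]})
na_mask = df.isna().any(axis=1)

clean = apply_na_policy(df, na_mask, na_policy="remove")
print(clean)
```

In real use the five items would come from unpacking the result of load_csv. Checking the DataFrame for None first covers the error case noted in the return type.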