nirs4all.data.loaders.csv_loader module

nirs4all.data.loaders.csv_loader.load_csv(path, na_policy='auto', data_type='x', categorical_mode='auto', header_unit='cm-1', **user_params)

Loads a CSV file using specified or default parameters, cleans data, handles NA values, and performs type conversions.

Parameters:

path (str or Path) – Path to the CSV file (.csv, .gz, .zip).
na_policy (str) – ‘remove’ or ‘abort’ (or ‘auto’ which acts like ‘remove’). This policy applies to row removal if NAs are found.
data_type (str) – ‘x’ or ‘y’. Influences type conversion.
categorical_mode (str) – How to handle string columns in ‘y’ data:
- ‘auto’: Convert string columns to numerical categories.
- ‘preserve’: Keep string columns (will become NaN if not convertible by final astype).
- ‘none’: Treat all columns as potentially numeric.
header_unit (str) – Unit type of headers - “cm-1” (wavenumber), “nm” (wavelength), “none” (no headers), “text” (string headers), “index” (feature indices). Default: “cm-1”
**user_params – CSV parsing parameters (delimiter, decimal_separator, has_header) and other pandas.read_csv arguments.
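
The two spectral values of header_unit are related by wavelength (nm) = 10^7 / wavenumber (cm-1). The sketch below illustrates that relationship for a list of column headers. The helper names wavenumber_to_wavelength_nm and headers_to_nm are hypothetical and are not part of nirs4all; the sketch only assumes that headers arrive as strings or numbers and that header_unit takes one of the values listed above.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm1


def headers_to_nm(headers, header_unit):
    """Hypothetical helper: express numeric headers as wavelengths in nm.

    Returns None when the headers carry no spectral axis
    ("none", "text", "index") or when there are no headers at all.
    """
    if headers is None or header_unit in ("none", "text", "index"):
        return None
    values = [float(h) for h in headers]
    if header_unit == "nm":
        return values
    if header_unit == "cm-1":
        return [wavenumber_to_wavelength_nm(v) for v in values]
    raise ValueError(f"unknown header_unit: {header_unit!r}")


# Three wavenumber headers expressed as wavelengths in nm.
print(headers_to_nm(["10000", "5000", "4000"], "cm-1"))
```
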

Returns:

DataFrame with processed data (before NA row removal).
Report dictionary.
Boolean Series indicating rows with NAs (aligned with the returned DataFrame).
List of column headers (or None if no headers).
Header unit string.

Per the return type, the DataFrame, the boolean Series, and the header list may each be None if an error occurs before this stage.

Return type:

(Union[pandas.DataFrame, None], dict, Union[pandas.Series, None], Union[List[str], None], str)
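
Because three elements of the returned tuple may be None, callers probably want to check them before use. The following is a minimal sketch of one way to unpack and summarize the result. summarize_load_result and load_and_summarize are hypothetical helpers, not part of nirs4all, and the keys of the summary dictionary are chosen here for illustration only.

```python
def summarize_load_result(result):
    """Summarize the 5-tuple described above without assuming every element is present."""
    df, report, na_mask, headers, header_unit = result
    if df is None:
        # Per the return type, the DataFrame may be None after an early failure.
        return {"ok": False, "report": report, "header_unit": header_unit}
    # Count flagged rows by iteration, so this works for a boolean Series or a plain list.
    n_na = 0 if na_mask is None else sum(bool(flag) for flag in na_mask)
    return {
        "ok": True,
        "n_rows": len(df),
        "n_rows_with_na": n_na,
        "n_headers": None if headers is None else len(headers),
        "header_unit": header_unit,
    }


def load_and_summarize(path, **kwargs):
    """Call load_csv and summarize its result (requires nirs4all to be installed)."""
    # Imported lazily so the helper above stays usable without nirs4all.
    from nirs4all.data.loaders.csv_loader import load_csv
    return summarize_load_result(load_csv(path, **kwargs))
```

Since the DataFrame is returned before NA row removal and the boolean Series is aligned with it, flagged rows can be dropped afterwards with ordinary pandas boolean indexing, for example df[~na_mask], assuming both values are not None.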