nirs4all.data.loaders package

Module contents

File loaders module for nirs4all.

This module provides a pluggable file loading system supporting multiple file formats with automatic format detection and configurable loading parameters.

Supported Formats:

CSV (.csv, .csv.gz, .csv.zip) - via CSVLoader
NumPy (.npy, .npz) - via NumpyLoader
Parquet (.parquet, .pq) - via ParquetLoader (requires pyarrow or fastparquet)
Excel (.xlsx, .xls) - via ExcelLoader (requires openpyxl/xlrd)
MATLAB (.mat) - via MatlabLoader (requires scipy, optionally h5py)
Archives (.tar, .tar.gz, .tgz, .tar.bz2, .tar.xz, .zip) - via TarLoader, EnhancedZipLoader

Usage:

>>> from nirs4all.data.loaders import LoaderRegistry, load_file
>>>
>>> # Using the registry
>>> registry = LoaderRegistry.get_instance()
>>> result = registry.load("data.csv", delimiter=",")
>>>
>>> # Or using the convenience function
>>> data, report, na_mask, headers, unit = load_file("data.csv")
>>>
>>> # Direct loader usage
>>> from pathlib import Path
>>> from nirs4all.data.loaders import CSVLoader
>>> loader = CSVLoader()
>>> result = loader.load(Path("data.csv"))

Adding Custom Loaders:

>>> from nirs4all.data.loaders import FileLoader, register_loader
>>>
>>> @register_loader
... class MyLoader(FileLoader):
...     supported_extensions = (".myext",)
...     name = "My Loader"
...
...     @classmethod
...     def supports(cls, path):
...         return path.suffix.lower() == ".myext"
...
...     def load(self, path, **params):
...         # Load implementation
...         pass

Backward Compatibility:

The legacy load_csv function is still available for existing code:

>>> from nirs4all.data.loaders.csv_loader import load_csv

class nirs4all.data.loaders.ArchiveHandler[source]

Bases: object

Utility class for handling compressed files and archives.

Supports:

Gzip compressed files (.gz)
Zip archives (.zip) with member selection
Tar archives (.tar, .tar.gz, .tgz, .tar.bz2) with member selection

static decompress_gzip(path: Path, encoding: str = 'utf-8') → str[source]

Decompress a gzip file and return content as string.

Parameters:

path – Path to the gzip file.
encoding – Text encoding to use.

Returns:

Decompressed file content as string.

static decompress_gzip_bytes(path: Path) → bytes[source]

Decompress a gzip file and return content as bytes.

Parameters:: path – Path to the gzip file.
Returns:: Decompressed file content as bytes.
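
As a rough picture of what these two helpers do, the sketch below performs the same operations with the standard library's gzip module. The names read_gzip_bytes and read_gzip_text are hypothetical stand-ins chosen for illustration; this is not the nirs4all implementation.

```python
import gzip
import tempfile
from pathlib import Path


def read_gzip_bytes(path: Path) -> bytes:
    """Return the decompressed content of a gzip file as raw bytes."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


def read_gzip_text(path: Path, encoding: str = "utf-8") -> str:
    """Return the decompressed content of a gzip file as decoded text."""
    return read_gzip_bytes(path).decode(encoding)


# Round-trip demo on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "data.csv.gz"
    with gzip.open(gz_path, "wb") as fh:
        fh.write("a;b\n1;2\n".encode("utf-8"))
    raw = read_gzip_bytes(gz_path)
    text = read_gzip_text(gz_path)
```

The bytes variant is the one to reach for with binary payloads (for example a gzipped .npy file), since decoding those as text would fail or corrupt them.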

static extract_bytes_from_tar(path: Path, member: str | None = None) → bytes[source]

Extract a file from a tar archive as bytes.

Parameters:

path – Path to the tar file.
member – Name of the member to extract. If None, auto-detect.

Returns:

Content of the extracted file as bytes.

static extract_bytes_from_zip(path: Path, member: str | None = None) → bytes[source]

Extract a file from a zip archive as bytes.

Parameters:

path – Path to the zip file.
member – Name of the member to extract. If None, auto-detect.

Returns:

Content of the extracted file as bytes.

static extract_from_tar(path: Path, member: str | None = None, encoding: str = 'utf-8') → str[source]

Extract a file from a tar archive.

Parameters:

path – Path to the tar file.
member – Name of the member to extract. If None, auto-detect.
encoding – Text encoding to use.

Returns:

Content of the extracted file as string.

Raises:

FileLoadError – If no suitable member is found.
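
The member auto-detection described above can be sketched with the standard tarfile module. This is only an illustration under stated assumptions: the helper names pick_tar_member and read_tar_member are hypothetical, the "prefer the first CSV, else the first regular file" rule is a guess at what "auto-detect" means, and ValueError stands in for FileLoadError.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import List, Optional


def pick_tar_member(names: List[str]) -> Optional[str]:
    """Pick a member name: the first .csv if any, else the first entry."""
    csv_names = [n for n in names if n.lower().endswith(".csv")]
    if csv_names:
        return csv_names[0]
    return names[0] if names else None


def read_tar_member(path: Path, member: Optional[str] = None,
                    encoding: str = "utf-8") -> str:
    """Extract one regular file from a tar archive and decode it as text."""
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(path, "r:*") as tar:
        files = [m.name for m in tar.getmembers() if m.isfile()]
        chosen = member or pick_tar_member(files)
        if chosen is None:
            raise ValueError(f"no suitable member in {path}")  # stand-in error
        handle = tar.extractfile(chosen)
        if handle is None:
            raise ValueError(f"{chosen!r} is not a regular file")
        return handle.read().decode(encoding)


# Demo: build a small .tar.gz in a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for name, body in [("readme.txt", b"notes"),
                           ("data/train.csv", b"x;y\n1;2\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    auto_text = read_tar_member(archive)
    explicit_text = read_tar_member(archive, member="readme.txt")
```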

static extract_from_zip(path: Path, member: str | None = None, encoding: str = 'utf-8') → str[source]

Extract a file from a zip archive.

Parameters:

path – Path to the zip file.
member – Name of the member to extract. If None, auto-detect.
encoding – Text encoding to use.

Returns:

Content of the extracted file as string.

Raises:

FileLoadError – If no suitable member is found.
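
For zip archives, the text and bytes variants differ only in the final decoding step. The sketch below shows that relationship with the standard zipfile module; read_zip_member_bytes and read_zip_member are hypothetical names, the CSV-first selection rule is an assumption, and ValueError stands in for FileLoadError.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Optional


def read_zip_member_bytes(path: Path, member: Optional[str] = None) -> bytes:
    """Return the raw bytes of one member of a zip archive."""
    with zipfile.ZipFile(path) as zf:
        if member is None:
            # Directory entries end with "/", so skip them.
            files = [n for n in zf.namelist() if not n.endswith("/")]
            csv_files = [n for n in files if n.lower().endswith(".csv")]
            candidates = csv_files or files
            if not candidates:
                raise ValueError(f"no suitable member in {path}")
            member = candidates[0]
        return zf.read(member)


def read_zip_member(path: Path, member: Optional[str] = None,
                    encoding: str = "utf-8") -> str:
    """Return one member of a zip archive decoded as text."""
    return read_zip_member_bytes(path, member).decode(encoding)


# Demo: write a two-member archive, then read with and without a member name.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("notes.txt", "readme")
        zf.writestr("train/features.csv", "x;y\n1;2\n")
    auto_text = read_zip_member(archive)
    picked_bytes = read_zip_member_bytes(archive, member="notes.txt")
```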

static is_archive(path: Path) → bool[source]: Check if a file is an archive (contains multiple files).

static is_compressed(path: Path) → bool[source]: Check if a file is compressed.

static list_tar_members(path: Path) → List[str][source]

List members in a tar archive.

Parameters:: path – Path to the tar file.
Returns:: List of member names in the archive.

static list_zip_members(path: Path) → List[str][source]

List members in a zip archive.

Parameters:: path – Path to the zip file.
Returns:: List of member names in the archive.

class nirs4all.data.loaders.CSVLoader[source]

Bases: FileLoader

Loader for CSV files.

Supports:

Plain CSV files (.csv)
Gzip-compressed CSV files (.csv.gz)
Zip-compressed CSV files (.csv.zip)

Parameters:

delimiter – Field delimiter (default: ‘;’)
decimal_separator – Decimal separator (default: ‘.’)
has_header – Whether first row is header (default: True)
header_unit – Unit for headers (‘cm-1’, ‘nm’, etc.)
na_policy – How to handle NA values (‘remove’, ‘abort’, or ‘auto’; default: ‘auto’)
categorical_mode – How to handle categorical data (‘auto’, ‘preserve’, ‘none’)
data_type – Type of data being loaded (‘x’, ‘y’, or ‘metadata’)
encoding – File encoding (default: ‘utf-8’)
member – For zip files, specific member to extract
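
To make the roles of delimiter, decimal_separator, has_header and the NA mask concrete, here is a small standard-library sketch. It is not how CSVLoader is implemented (the loader returns a pandas DataFrame); parse_csv_text is a hypothetical helper, and plain lists stand in for the DataFrame and the boolean Series.

```python
import csv
import io
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def parse_csv_text(text: str, delimiter: str = ";",
                   decimal_separator: str = ".",
                   has_header: bool = True
                   ) -> Tuple[Optional[List[str]], List[Row], List[bool]]:
    """Parse numeric CSV text into (headers, rows, per-row NA mask)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    headers = rows.pop(0) if has_header and rows else None
    data: List[Row] = []
    na_mask: List[bool] = []
    for row in rows:
        values: Row = []
        for cell in row:
            # Normalise the decimal separator before converting to float.
            cell = cell.strip().replace(decimal_separator, ".")
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)  # empty or non-numeric cell counts as NA
        data.append(values)
        na_mask.append(any(v is None for v in values))
    return headers, data, na_mask


headers, data, na_mask = parse_csv_text(
    "1000;1004\n0,5;0,7\n;0,9\n", decimal_separator=","
)
# A 'remove'-style policy keeps only the rows whose mask entry is False.
kept = [row for row, has_na in zip(data, na_mask) if not has_na]
```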

load(path: Path, na_policy: str = 'auto', data_type: str = 'x', categorical_mode: str = 'auto', header_unit: str = 'cm-1', encoding: str = 'utf-8', member: str | None = None, **user_params: Any) → LoaderResult[source]

Load data from a CSV file.

Parameters:

path – Path to the CSV file.
na_policy – How to handle NA values (‘remove’, ‘abort’, or ‘auto’).
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
categorical_mode – How to handle categorical columns.
header_unit – Unit type for headers.
encoding – File encoding.
member – For zip files, specific member to extract.
**user_params – Additional CSV parsing parameters.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'CSV Loader'

priority: ClassVar[int] = 50

supported_extensions: ClassVar[Tuple[str, ...]] = ('.csv',)

classmethod supports(path: Path) → bool[source]

Check if this loader supports the given file.

Supports .csv, .csv.gz, and .csv.zip files.
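
A check like this cannot rely on Path.suffix alone, because the suffix of "data.csv.gz" is ".gz". A minimal sketch of a compound-extension test follows; supports_csv and CSV_SUFFIXES are hypothetical names, and matching on the lowercased file name is an assumption about how the check might be written.

```python
from pathlib import Path

# The three forms the CSV loader is documented to accept.
CSV_SUFFIXES = (".csv", ".csv.gz", ".csv.zip")


def supports_csv(path: Path) -> bool:
    """Return True if the file name ends with a known CSV extension."""
    # str.endswith accepts a tuple, so one call covers every variant.
    return path.name.lower().endswith(CSV_SUFFIXES)


last_suffix = Path("data.csv.gz").suffix      # only the final extension
all_suffixes = Path("data.csv.gz").suffixes   # every extension, in order
```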

class nirs4all.data.loaders.EnhancedZipLoader[source]

Bases: FileLoader

Enhanced loader for zip archive files.

This loader provides additional features over the basic zip support in the CSV loader, including:

Member listing and selection
Support for non-CSV files in archives
Binary file extraction (for NumPy, Parquet, etc.)

Parameters:

member – Name of the member file to extract.
password – Password for encrypted archives.
encoding – Text encoding for text files.

Example

>>> from pathlib import Path
>>> from nirs4all.data.loaders import EnhancedZipLoader
>>> loader = EnhancedZipLoader()
>>> result = loader.load(
...     Path("data.zip"),
...     member="train/features.csv",
... )

load(path: Path, member: str | None = None, password: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) → LoaderResult[source]

Load data from a zip archive.

Parameters:

path – Path to the zip archive.
member – Name of the member to extract.
password – Password for encrypted archives.
encoding – Text encoding for text files.
header_unit – Unit type for headers.
data_type – Type of data.
**params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Enhanced Zip Loader'

priority: ClassVar[int] = 65

supported_extensions: ClassVar[Tuple[str, ...]] = ('.zip',)

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

class nirs4all.data.loaders.ExcelLoader[source]

Bases: FileLoader

Loader for Excel spreadsheet files.

Supports:

Modern Excel files (.xlsx) via openpyxl
Legacy Excel files (.xls) via xlrd

Parameters:

sheet_name – Sheet name or index to load (default: 0, first sheet). Can be a string (sheet name), integer (0-indexed), or None (all sheets).
header – Row number to use as header (default: 0). Use None for no header.
skip_rows – Number of rows to skip at the beginning.
skip_footer – Number of rows to skip at the end.
usecols – Columns to load (can be list of names, indices, or Excel-style range).
engine – Excel engine to use (‘auto’, ‘openpyxl’, or ‘xlrd’).
header_unit – Unit for headers (‘cm-1’, ‘nm’, ‘text’, etc.)

Example

>>> from pathlib import Path
>>> from nirs4all.data.loaders import ExcelLoader
>>> loader = ExcelLoader()
>>> result = loader.load(
...     Path("data.xlsx"),
...     sheet_name="Sheet1",
...     skip_rows=2,
... )

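load(path, sheet_name, header, skip_rows, skip_footer, usecols, engine, header_unit, data_type, **params) → LoaderResult

Load data from an Excel file.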

Parameters:

path – Path to the Excel file.
sheet_name – Sheet to load (name, index, or None for all).
header – Row number for header (0-indexed), or None.
skip_rows – Number of rows to skip at start.
skip_footer – Number of rows to skip at end.
usecols – Columns to load.
engine – Excel engine to use.
header_unit – Unit type for headers.
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
**params – Additional parameters passed to read_excel.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Excel Loader'

priority: ClassVar[int] = 45

supported_extensions: ClassVar[Tuple[str, ...]] = ('.xlsx', '.xls')

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

exception nirs4all.data.loaders.FileLoadError[source]

Bases: LoaderError

Raised when a file cannot be loaded.

class nirs4all.data.loaders.FileLoader[source]

Bases: ABC

Abstract base class for file loaders.

All file format loaders should inherit from this class and implement the required methods for loading and format detection.

Class Attributes:

supported_extensions – Tuple of file extensions this loader handles.
name – Human-readable name for the loader.
priority – Loading priority (lower = higher priority) when multiple loaders match. Default: 50.

Example

>>> class CSVLoader(FileLoader):
...     supported_extensions = (".csv",)
...     name = "CSV Loader"
...
...     @classmethod
...     def supports(cls, path: Path) -> bool:
...         return path.suffix.lower() in cls.supported_extensions
...
...     def load(self, path: Path, **params) -> LoaderResult:
...         # Load CSV file
...         pass

classmethod detect_format(path: Path) → str | None[source]

Detect the file format from the path.

Parameters:: path – Path to analyze.
Returns:: Format name if detected, None otherwise.

classmethod get_base_path(path: Path) → Path[source]

Get the base path without compression extensions.

For example, ‘data.csv.gz’ -> ‘data.csv’

Parameters:: path – Path to process.
Returns:: Path without compression extension(s).
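
The ‘data.csv.gz’ -> ‘data.csv’ behaviour can be sketched with pathlib as below. The function name strip_compression and the exact set of suffixes treated as compression extensions are assumptions made for illustration, not taken from the library.

```python
from pathlib import Path

# Assumed set of compression extensions; the real list may differ.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def strip_compression(path: Path) -> Path:
    """Drop trailing compression extensions, one at a time."""
    while path.suffix.lower() in COMPRESSION_SUFFIXES:
        path = path.with_suffix("")  # removes only the last extension
    return path


base = strip_compression(Path("data.csv.gz"))
unchanged = strip_compression(Path("data.csv"))
```

Stripping in a loop is what makes the "(s)" in "compression extension(s)" work: each pass removes one trailing extension until a non-compression suffix is reached.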

abstractmethod load(path: Path, **params: Any) → LoaderResult[source]

Load data from a file.

Parameters:

path – Path to the file to load.
**params – Loader-specific parameters.

Returns:

LoaderResult containing the loaded data and metadata.

Raises:

FileLoadError – If the file cannot be loaded.

name: ClassVar[str] = 'Base Loader'

priority: ClassVar[int] = 50

supported_extensions: ClassVar[Tuple[str, ...]] = ()

abstractmethod classmethod supports(path: Path) → bool[source]

Check if this loader can handle the given file.

Parameters:: path – Path to the file to check.
Returns:: True if this loader can handle the file, False otherwise.

exception nirs4all.data.loaders.FormatNotSupportedError[source]

Bases: LoaderError

Raised when a file format is not supported.

exception nirs4all.data.loaders.LoaderError[source]

Bases: Exception

Base exception for loader errors.

class nirs4all.data.loaders.LoaderRegistry[source]

Bases: object

Registry for file loaders.

The registry maintains a list of available loaders and provides methods for finding the appropriate loader for a given file.

Example

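>>> from pathlib import Path
>>> from nirs4all.data.loaders import CSVLoader, LoaderRegistry, ParquetLoader
>>> registry = LoaderRegistry()
>>> registry.register(CSVLoader)
>>> registry.register(ParquetLoader)
>>> loader = registry.get_loader(Path("data.csv"))
>>> result = loader.load(Path("data.csv"))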

static __new__(cls) → LoaderRegistry[source]: Implement singleton pattern.

clear() → None[source]: Clear all registered loaders (mainly for testing).

classmethod get_instance() → LoaderRegistry[source]: Get the singleton registry instance.
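
Because __new__ implements a singleton, calling the class directly and calling get_instance() should both yield the same object. The following is a generic sketch of that pattern, not the registry's actual code; RegistrySketch and its loaders attribute are hypothetical.

```python
class RegistrySketch:
    """Minimal singleton: every construction returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            instance = super().__new__(cls)
            instance.loaders = []  # shared state lives on the single instance
            cls._instance = instance
        return cls._instance

    @classmethod
    def get_instance(cls):
        return cls()

    def register(self, loader_class):
        if loader_class not in self.loaders:
            self.loaders.append(loader_class)

    def clear(self):
        self.loaders.clear()


first = RegistrySketch()
second = RegistrySketch.get_instance()
first.register(dict)  # any class works as a placeholder here
```

One consequence of this design is that state leaks between tests unless it is reset, which is presumably why a clear() method exists.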

get_loader(path: str | Path) → FileLoader[source]

Get the appropriate loader for a file.

Parameters:: path – Path to the file to load.
Returns:: An instance of the appropriate loader.
Raises:: FormatNotSupportedError – If no loader supports the file format.
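
How a lookup might combine supports() with the priority attribute can be sketched as follows. The classes and the find_loader function are hypothetical, and the sketch returns a class rather than an instance. The priorities 50 and 65 mirror the values documented for the CSV and Enhanced Zip loaders, and the "lowest number wins" rule follows the description of priority on FileLoader.

```python
from pathlib import Path
from typing import List, Tuple, Type


class BaseLoaderSketch:
    name = "Base"
    priority = 50
    suffixes: Tuple[str, ...] = ()

    @classmethod
    def supports(cls, path: Path) -> bool:
        # endswith() with an empty tuple is False, so the base matches nothing.
        return path.name.lower().endswith(cls.suffixes)


class CsvSketch(BaseLoaderSketch):
    name = "CSV"
    priority = 50
    suffixes = (".csv", ".csv.gz", ".csv.zip")


class ZipSketch(BaseLoaderSketch):
    name = "Zip"
    priority = 65
    suffixes = (".zip",)


def find_loader(loaders: List[Type[BaseLoaderSketch]],
                path: Path) -> Type[BaseLoaderSketch]:
    """Return the matching loader class with the lowest priority number."""
    matches = [cls for cls in loaders if cls.supports(path)]
    if not matches:
        # Stand-in for FormatNotSupportedError.
        raise LookupError(f"no loader supports {path.name}")
    return min(matches, key=lambda cls: cls.priority)


registered = [ZipSketch, CsvSketch]
csv_zip_choice = find_loader(registered, Path("data.csv.zip"))
plain_zip_choice = find_loader(registered, Path("bundle.zip"))
```

Under these assumptions, "data.csv.zip" matches both loaders and resolves to the CSV one, while a plain "bundle.zip" falls through to the zip loader.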

get_registered_loaders() → List[Type[FileLoader]][source]

Get all registered loader classes.

Returns:: List of registered loader classes.

get_supported_extensions() → List[str][source]

Get all supported file extensions.

Returns:: List of supported extensions across all registered loaders.

load(path: str | Path, **params: Any) → LoaderResult[source]

Load a file using the appropriate loader.

This is a convenience method that finds the right loader and loads the file.

Parameters:

path – Path to the file to load.
**params – Loading parameters to pass to the loader.

Returns:

LoaderResult containing the loaded data.

Raises:

FormatNotSupportedError – If no loader supports the file format.
FileLoadError – If the file cannot be loaded.

register(loader_class: Type[FileLoader]) → None[source]

Register a file loader.

Parameters:: loader_class – The loader class to register.

unregister(loader_class: Type[FileLoader]) → None[source]

Unregister a file loader.

Parameters:: loader_class – The loader class to unregister.

class nirs4all.data.loaders.LoaderResult(data: DataFrame | None = None, report: Dict[str, Any] | None = None, na_mask: Series | None = None, headers: List[str] | None = None, header_unit: str = 'cm-1')[source]

Bases: object

Result container for file loading operations.

data: The loaded data as a pandas DataFrame.

report: Dictionary containing loading metadata and diagnostics.

na_mask: Boolean Series indicating rows with NA values.

headers: List of column headers.

header_unit: The unit type for headers (e.g., ‘cm-1’, ‘nm’).

property error: str | None: Get error message if loading failed.

property success: bool: Check if loading was successful.
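
The relationship between the five fields and the two properties might look like the sketch below. How success and error are actually computed is not stated here, so deriving them from an "error" key in the report is an assumption (it matches the report.get("error") check in the load_file example). ResultSketch and as_tuple are hypothetical names, and plain Python objects stand in for the DataFrame and Series.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ResultSketch:
    data: Optional[Any] = None
    report: Dict[str, Any] = field(default_factory=dict)
    na_mask: Optional[Any] = None
    headers: List[str] = field(default_factory=list)
    header_unit: str = "cm-1"

    @property
    def error(self) -> Optional[str]:
        # Assumed convention: failures are recorded under report["error"].
        return self.report.get("error")

    @property
    def success(self) -> bool:
        return self.data is not None and self.error is None

    def as_tuple(self) -> Tuple[Any, Dict[str, Any], Any, List[str], str]:
        """Flatten to the 5-tuple shape returned by the load_* functions."""
        return (self.data, self.report, self.na_mask,
                self.headers, self.header_unit)


ok = ResultSketch(data=[[0.5, 0.7]], headers=["1000", "1004"])
failed = ResultSketch(report={"error": "file not found"})
```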

class nirs4all.data.loaders.MatlabLoader[source]

Bases: FileLoader

Loader for MATLAB .mat files.

Supports:

MATLAB v4, v6, v7 files via scipy.io
MATLAB v7.3 (HDF5) files via h5py (if available)

Parameters:

variable – Name of the variable to load. If None, auto-detects.
squeeze_me – Squeeze unit matrix dimensions (default: True).
struct_as_record – Load MATLAB structs as numpy record arrays (default: False).
header_unit – Unit for generated headers (‘index’, ‘cm-1’, ‘nm’, etc.)

Example

>>> from pathlib import Path
>>> from nirs4all.data.loaders import MatlabLoader
>>> loader = MatlabLoader()
>>> result = loader.load(
...     Path("data.mat"),
...     variable="X",
... )

load(path: Path, variable: str | None = None, squeeze_me: bool = True, struct_as_record: bool = False, header_unit: str = 'index', data_type: str = 'x', **params: Any) → LoaderResult[source]

Load data from a MATLAB .mat file.

Parameters:

path – Path to the MATLAB file.
variable – Name of the variable to load. If None, auto-detects.
squeeze_me – Squeeze unit matrix dimensions.
struct_as_record – Load structs as record arrays.
header_unit – Unit type for generated headers.
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
**params – Additional parameters.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'MATLAB Loader'

priority: ClassVar[int] = 45

supported_extensions: ClassVar[Tuple[str, ...]] = ('.mat',)

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

class nirs4all.data.loaders.NumpyLoader[source]

Bases: FileLoader

Loader for NumPy array files.

Supports:

Single array files (.npy)
Multi-array archives (.npz)

Parameters:

allow_pickle – Whether to allow loading pickled objects (default: False). Setting this to True may pose a security risk with untrusted files.
key – For .npz files, the key of the array to load. If not specified, uses the first array.
header_unit – Unit for generated headers (‘cm-1’, ‘nm’, ‘index’, etc.)

Security Note:: NumPy’s allow_pickle=True can execute arbitrary code when loading untrusted files. Only enable this for files you trust completely.

load(path: Path, allow_pickle: bool = False, key: str | None = None, header_unit: str = 'index', data_type: str = 'x', **params: Any) → LoaderResult[source]

Load data from a NumPy file.

Parameters:

path – Path to the NumPy file.
allow_pickle – Whether to allow loading pickled objects.
key – For .npz files, the key of the array to load.
header_unit – Unit type for generated headers.
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
**params – Additional parameters (ignored).

Returns:

LoaderResult with the loaded data as a DataFrame.

name: ClassVar[str] = 'NumPy Loader'

priority: ClassVar[int] = 40

supported_extensions: ClassVar[Tuple[str, ...]] = ('.npy', '.npz')

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

class nirs4all.data.loaders.ParquetLoader[source]

Bases: FileLoader

Loader for Apache Parquet files.

Requires pyarrow or fastparquet to be installed.

Supports:

Single Parquet files (.parquet, .pq)
Partitioned datasets (directory of parquet files)
Column selection for efficient loading

Parameters:

columns – List of column names to load (default: all columns).
engine – Parquet engine to use (‘auto’, ‘pyarrow’, or ‘fastparquet’).
filters – Row group filters for predicate pushdown (pyarrow only).
header_unit – Unit for headers (‘cm-1’, ‘nm’, ‘text’, etc.)

Example

>>> from pathlib import Path
>>> from nirs4all.data.loaders import ParquetLoader
>>> loader = ParquetLoader()
>>> result = loader.load(
...     Path("data.parquet"),
...     columns=["feature_1", "feature_2"],
... )

load(path: Path, columns: List[str] | None = None, engine: str = 'auto', filters: List | None = None, header_unit: str = 'text', data_type: str = 'x', **params: Any) → LoaderResult[source]

Load data from a Parquet file.

Parameters:

path – Path to the Parquet file or directory.
columns – List of column names to load. If None, loads all columns.
engine – Parquet engine (‘auto’, ‘pyarrow’, or ‘fastparquet’).
filters – Row group filters for predicate pushdown (pyarrow only).
header_unit – Unit type for headers.
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
**params – Additional parameters passed to read_parquet.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Parquet Loader'

priority: ClassVar[int] = 35

supported_extensions: ClassVar[Tuple[str, ...]] = ('.parquet', '.pq')

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

class nirs4all.data.loaders.TarLoader[source]

Bases: FileLoader

Loader for tar archive files.

Supports:

Plain tar files (.tar)
Gzip-compressed tar files (.tar.gz, .tgz)
Bzip2-compressed tar files (.tar.bz2)
XZ-compressed tar files (.tar.xz)

Parameters:

member – Name of the member file to extract. If None, auto-detects the first suitable file (prefers CSV).
encoding – Text encoding for the extracted file (default: ‘utf-8’).
inner_loader_params – Parameters to pass to the inner file loader.

Example

>>> from pathlib import Path
>>> from nirs4all.data.loaders import TarLoader
>>> loader = TarLoader()
>>> result = loader.load(
...     Path("data.tar.gz"),
...     member="data/train.csv",
... )

load(path: Path, member: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) → LoaderResult[source]

Load data from a tar archive.

Parameters:

path – Path to the tar archive.
member – Name of the member to extract. If None, auto-detects.
encoding – Text encoding for extracted files.
header_unit – Unit type for headers.
data_type – Type of data (‘x’, ‘y’, or ‘metadata’).
**params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Tar Archive Loader'

priority: ClassVar[int] = 60

supported_extensions: ClassVar[Tuple[str, ...]] = ('.tar',)

classmethod supports(path: Path) → bool[source]: Check if this loader supports the given file.

nirs4all.data.loaders.get_loader_for_file(path: str | Path) → FileLoader[source]

Get the appropriate loader for a file.

Parameters:: path – Path to the file.
Returns:: Instance of the appropriate FileLoader subclass.
Raises:: FormatNotSupportedError – If no loader supports the file format.

nirs4all.data.loaders.get_supported_formats() → Dict[str, List[str]][source]

Get all supported file formats and their extensions.

Returns:: Dictionary mapping loader names to their supported extensions.

Example

>>> from nirs4all.data.loaders import get_supported_formats
>>> formats = get_supported_formats()
>>> for name, exts in formats.items():
...     print(f"{name}: {', '.join(exts)}")

nirs4all.data.loaders.list_archive_members(path) → List[str][source]

List members in an archive file.

Parameters:: path – Path to the archive.
Returns:: List of member names.
Raises:: FileLoadError – If the archive cannot be read.
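
One way such a function could dispatch between archive types is to probe the file content rather than trust the extension. The sketch below does this with the standard library; list_members is a hypothetical name, content-based detection is an assumption about the approach, and ValueError stands in for FileLoadError.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path
from typing import List


def list_members(path: Path) -> List[str]:
    """List member names of a zip or tar archive, detected by content."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    if tarfile.is_tarfile(path):
        # "r:*" handles plain, gzip, bz2 and xz tar files alike.
        with tarfile.open(path, "r:*") as tar:
            return tar.getnames()
    raise ValueError(f"{path} is not a readable zip or tar archive")


# Demo: create one archive of each kind and list both.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("a.csv", "1;2\n")
        zf.writestr("b.csv", "3;4\n")

    tar_path = Path(tmp) / "bundle.tar"
    payload = b"5;6\n"
    with tarfile.open(tar_path, "w") as tar:
        info = tarfile.TarInfo("x.csv")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    zip_names = list_members(zip_path)
    tar_names = list_members(tar_path)
```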

nirs4all.data.loaders.load_csv(path, na_policy='auto', data_type='x', categorical_mode='auto', header_unit='cm-1', **user_params)[source]

Loads a CSV file using specified or default parameters, cleans data, handles NA values, and performs type conversions.

Parameters:

path (str or Path) – Path to the CSV file (.csv, .gz, .zip).
na_policy (str) – ‘remove’ or ‘abort’ (or ‘auto’ which acts like ‘remove’). This policy applies to row removal if NAs are found.
data_type (str) – ‘x’ or ‘y’. Influences type conversion.
categorical_mode (str) – How to handle string columns in ‘y’ data: ‘auto’ converts string columns to numerical categories; ‘preserve’ keeps string columns (they become NaN if not convertible by the final astype); ‘none’ treats all columns as potentially numeric.
header_unit (str) – Unit type of headers: “cm-1” (wavenumber), “nm” (wavelength), “none” (no headers), “text” (string headers), or “index” (feature indices). Default: “cm-1”.
**user_params – CSV parsing parameters (delimiter, decimal_separator, has_header) and other pandas.read_csv arguments.

Returns:

DataFrame with processed data (before NA row removal).
Report dictionary.
Boolean Series indicating rows with NAs (aligned with the returned DataFrame).
List of column headers (or None if no headers).
Header unit string.

The DataFrame is None if an error occurs before this stage.

Return type:

(Union[pandas.DataFrame, None], dict, Union[pandas.Series, None], Union[List[str], None], str)

nirs4all.data.loaders.load_csv_new(path, na_policy: str = 'auto', data_type: str = 'x', categorical_mode: str = 'auto', header_unit: str = 'cm-1', **user_params)

Load a CSV file using the CSVLoader.

This function maintains backward compatibility with the original load_csv API.

Parameters:

path – Path to the CSV file.
na_policy – How to handle NA values.
data_type – Type of data being loaded.
categorical_mode – How to handle categorical columns.
header_unit – Unit type for headers.
**user_params – Additional CSV parsing parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

Load an Excel file.

Convenience function for direct use.

Parameters:

path – Path to the Excel file.
sheet_name – Sheet to load.
header – Row number for header.
skip_rows – Rows to skip at start.
skip_footer – Rows to skip at end.
usecols – Columns to load.
engine – Excel engine to use.
header_unit – Unit type for headers.
**params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_file(path: str | Path, **params: Any) → Tuple[DataFrame | None, Dict[str, Any], Series | None, List[str], str][source]

Load a data file with automatic format detection.

This is the main entry point for loading files. It automatically detects the file format and uses the appropriate loader.

Parameters:

path – Path to the file to load.
**params – Format-specific loading parameters. Common parameters include header_unit (unit for headers: ‘cm-1’, ‘nm’, ‘text’, etc.), data_type (type of data: ‘x’, ‘y’, or ‘metadata’), delimiter (CSV delimiter), sheet_name (Excel sheet to load), variable (MATLAB variable name), and member (archive member to extract).

Returns:

Tuple of:

DataFrame with loaded data (or None on error)
Report dictionary with loading metadata
NA mask Series (rows with missing values)
List of column headers
Header unit string

Raises:

FormatNotSupportedError – If no loader supports the file format.

Example

>>> from nirs4all.data.loaders import load_file
>>> data, report, na_mask, headers, unit = load_file("data.csv")
>>> if report.get("error"):
...     print(f"Error: {report['error']}")
... else:
...     print(f"Loaded {data.shape[0]} samples with {data.shape[1]} features")

nirs4all.data.loaders.load_matlab(path, variable: str | None = None, squeeze_me: bool = True, header_unit: str = 'index', **params)[source]

Load a MATLAB .mat file.

Convenience function for direct use.

Parameters:

path – Path to the MATLAB file.
variable – Name of the variable to load.
squeeze_me – Squeeze unit dimensions.
header_unit – Unit type for headers.
**params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_numpy(path, allow_pickle: bool = False, key: str | None = None, header_unit: str = 'index', **params)[source]

Load a NumPy file.

Convenience function for backward compatibility.

Parameters:

path – Path to the NumPy file.
allow_pickle – Whether to allow pickled objects.
key – For .npz files, the array key to load.
header_unit – Unit type for generated headers.
**params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_parquet(path, columns: List[str] | None = None, engine: str = 'auto', header_unit: str = 'text', **params)[source]

Load a Parquet file.

Convenience function for direct use.

Parameters:

path – Path to the Parquet file.
columns – Column names to load.
engine – Parquet engine to use.
header_unit – Unit type for headers.
**params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.register_loader(cls: Type[FileLoader]) → Type[FileLoader][source]

Decorator to register a loader with the global registry.

Example

>>> @register_loader
... class MyLoader(FileLoader):
...     ...