nirs4all.data.loaders package

Module contents

File loaders module for nirs4all.

This module provides a pluggable file loading system supporting multiple file formats with automatic format detection and configurable loading parameters.

Supported Formats:
  • CSV (.csv, .csv.gz, .csv.zip) - via CSVLoader

  • NumPy (.npy, .npz) - via NumpyLoader

  • Parquet (.parquet, .pq) - via ParquetLoader (requires pyarrow or fastparquet)

  • Excel (.xlsx, .xls) - via ExcelLoader (requires openpyxl/xlrd)

  • MATLAB (.mat) - via MatlabLoader (requires scipy, optionally h5py)

  • Archives (.tar, .tar.gz, .tgz, .zip) - via TarLoader, EnhancedZipLoader

Usage:
>>> from nirs4all.data.loaders import LoaderRegistry, load_file
>>>
>>> # Using the registry
>>> registry = LoaderRegistry.get_instance()
>>> result = registry.load("data.csv", delimiter=",")
>>>
>>> # Or using the convenience function
>>> data, report, na_mask, headers, unit = load_file("data.csv")
>>>
>>> # Direct loader usage
>>> from pathlib import Path
>>> from nirs4all.data.loaders import CSVLoader
>>> loader = CSVLoader()
>>> result = loader.load(Path("data.csv"))
Adding Custom Loaders:
>>> from nirs4all.data.loaders import FileLoader, register_loader
>>>
>>> @register_loader
... class MyLoader(FileLoader):
...     supported_extensions = (".myext",)
...     name = "My Loader"
...
...     @classmethod
...     def supports(cls, path):
...         return path.suffix.lower() == ".myext"
...
...     def load(self, path, **params):
...         # Load implementation
...         pass
Backward Compatibility:

The legacy load_csv function is still available for existing code:
>>> from nirs4all.data.loaders.csv_loader import load_csv

class nirs4all.data.loaders.ArchiveHandler

Bases: object

Utility class for handling compressed files and archives.

Supports:
  • Gzip compressed files (.gz)

  • Zip archives (.zip) with member selection

  • Tar archives (.tar, .tar.gz, .tgz, .tar.bz2) with member selection

static decompress_gzip(path: Path, encoding: str = 'utf-8') -> str

Decompress a gzip file and return content as string.

Parameters:
  • path – Path to the gzip file.

  • encoding – Text encoding to use.

Returns:

Decompressed file content as string.

static decompress_gzip_bytes(path: Path) -> bytes

Decompress a gzip file and return content as bytes.

Parameters:

path – Path to the gzip file.

Returns:

Decompressed file content as bytes.
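
The behaviour described for the two gzip helpers can be illustrated with the standard library alone. This is a minimal sketch, not the ArchiveHandler implementation: the function names read_gzip_text and read_gzip_bytes and the sample file are hypothetical, and only the path and encoding parameters mirror the documented signatures.

```python
import gzip
import tempfile
from pathlib import Path


def read_gzip_text(path: Path, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decompress a gzip file and decode it to a string."""
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.read()


def read_gzip_bytes(path: Path) -> bytes:
    """Hypothetical helper: decompress a gzip file and return the raw bytes."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


# Build a small gzip-compressed CSV so the sketch runs without external files.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "spectra.csv.gz"
    with gzip.open(sample, "wb") as handle:
        handle.write(b"1100;1102\n0.51;0.52\n")

    text = read_gzip_text(sample)
    raw = read_gzip_bytes(sample)
```

The text variant is convenient for CSV-like content, while the bytes variant suits binary payloads that another parser consumes.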

static extract_bytes_from_tar(path: Path, member: str | None = None) -> bytes

Extract a file from a tar archive as bytes.

Parameters:
  • path – Path to the tar file.

  • member – Name of the member to extract. If None, auto-detect.

Returns:

Content of the extracted file as bytes.

static extract_bytes_from_zip(path: Path, member: str | None = None) -> bytes

Extract a file from a zip archive as bytes.

Parameters:
  • path – Path to the zip file.

  • member – Name of the member to extract. If None, auto-detect.

Returns:

Content of the extracted file as bytes.

static extract_from_tar(path: Path, member: str | None = None, encoding: str = 'utf-8') -> str

Extract a file from a tar archive.

Parameters:
  • path – Path to the tar file.

  • member – Name of the member to extract. If None, auto-detect.

  • encoding – Text encoding to use.

Returns:

Content of the extracted file as string.

Raises:

FileLoadError – If no suitable member is found.

static extract_from_zip(path: Path, member: str | None = None, encoding: str = 'utf-8') -> str

Extract a file from a zip archive.

Parameters:
  • path – Path to the zip file.

  • member – Name of the member to extract. If None, auto-detect.

  • encoding – Text encoding to use.

Returns:

Content of the extracted file as string.

Raises:

FileLoadError – If no suitable member is found.
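
Member selection with auto-detection, as described for the zip and tar extraction methods, might look roughly like the sketch below. It uses only zipfile and tarfile from the standard library. The helper names (pick_member, read_zip_text, read_tar_text) are hypothetical, the "prefer the first .csv, else the first file" rule is an assumption, and FileNotFoundError stands in for the package's FileLoadError.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path
from typing import List, Optional


def pick_member(names: List[str], member: Optional[str] = None) -> str:
    """Return the requested member, else the first .csv, else the first file."""
    files = [n for n in names if not n.endswith("/")]  # skip directory entries
    if member is not None:
        if member not in files:
            raise FileNotFoundError(f"{member!r} not found in archive")
        return member
    if not files:
        raise FileNotFoundError("archive contains no files")
    csv_files = [n for n in files if n.lower().endswith(".csv")]
    return (csv_files or files)[0]


def read_zip_text(path: Path, member: Optional[str] = None, encoding: str = "utf-8") -> str:
    with zipfile.ZipFile(path) as archive:
        name = pick_member(archive.namelist(), member)
        return archive.read(name).decode(encoding)


def read_tar_text(path: Path, member: Optional[str] = None, encoding: str = "utf-8") -> str:
    # "r:*" lets tarfile detect gzip, bzip2, or xz compression transparently.
    with tarfile.open(path, "r:*") as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
        name = pick_member(names, member)
        return archive.extractfile(name).read().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "data.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("README.txt", "notes")
        archive.writestr("train/features.csv", "a;b\n1;2\n")

    tar_path = Path(tmp) / "data.tar.gz"
    payload = b"a;b\n3;4\n"
    with tarfile.open(tar_path, "w:gz") as archive:
        info = tarfile.TarInfo("data/train.csv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    zip_text = read_zip_text(zip_path)                       # auto-detects the CSV
    zip_readme = read_zip_text(zip_path, member="README.txt")  # explicit member
    tar_text = read_tar_text(tar_path)
```

Passing member explicitly is the safer choice when an archive holds several candidate files, since any auto-detection rule depends on member order.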

static is_archive(path: Path) -> bool

Check if a file is an archive (contains multiple files).

static is_compressed(path: Path) -> bool

Check if a file is compressed.

static list_tar_members(path: Path) -> List[str]

List members in a tar archive.

Parameters:

path – Path to the tar file.

Returns:

List of member names in the archive.

static list_zip_members(path: Path) -> List[str]

List members in a zip archive.

Parameters:

path – Path to the zip file.

Returns:

List of member names in the archive.

class nirs4all.data.loaders.CSVLoader

Bases: FileLoader

Loader for CSV files.

Supports:
  • Plain CSV files (.csv)

  • Gzip-compressed CSV files (.csv.gz)

  • Zip-compressed CSV files (.csv.zip)

Parameters:
  • delimiter – Field delimiter (default: ‘;’)

  • decimal_separator – Decimal separator (default: ‘.’)

  • has_header – Whether first row is header (default: True)

  • header_unit – Unit for headers (‘cm-1’, ‘nm’, etc.)

  • na_policy – How to handle NA values (‘remove’, ‘abort’, or ‘auto’; default: ‘auto’)

  • categorical_mode – How to handle categorical data (‘auto’, ‘preserve’, ‘none’)

  • data_type – Type of data being loaded (‘x’, ‘y’, or ‘metadata’)

  • encoding – File encoding (default: ‘utf-8’)

  • member – For zip files, specific member to extract

load(path: Path, na_policy: str = 'auto', data_type: str = 'x', categorical_mode: str = 'auto', header_unit: str = 'cm-1', encoding: str = 'utf-8', member: str | None = None, **user_params: Any) -> LoaderResult

Load data from a CSV file.

Parameters:
  • path – Path to the CSV file.

  • na_policy – How to handle NA values (‘remove’, ‘abort’, or ‘auto’).

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • categorical_mode – How to handle categorical columns.

  • header_unit – Unit type for headers.

  • encoding – File encoding.

  • member – For zip files, specific member to extract.

  • **user_params – Additional CSV parsing parameters.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'CSV Loader'
priority: ClassVar[int] = 50
supported_extensions: ClassVar[Tuple[str, ...]] = ('.csv',)
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

Supports .csv, .csv.gz, and .csv.zip files.
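
Because Path("data.csv.gz").suffix is ".gz", a check on the last suffix alone would miss the compressed variants listed above. The sketch below shows one way such a check could be written; supports_csv and CSV_SUFFIXES are hypothetical names, not the CSVLoader internals.

```python
from pathlib import Path

# Assumed list of name endings, taken from the formats documented above.
CSV_SUFFIXES = (".csv", ".csv.gz", ".csv.zip")


def supports_csv(path: Path) -> bool:
    """Hypothetical check: match on the full file name, case-insensitively."""
    return path.name.lower().endswith(CSV_SUFFIXES)


checks = {
    name: supports_csv(Path(name))
    for name in ("data.csv", "DATA.CSV.GZ", "bundle.csv.zip", "bundle.zip", "data.parquet")
}
```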

class nirs4all.data.loaders.EnhancedZipLoader

Bases: FileLoader

Enhanced loader for zip archive files.

This loader provides additional features over the basic zip support in the CSV loader, including:
  • Member listing and selection

  • Support for non-CSV files in archives

  • Binary file extraction (for NumPy, Parquet, etc.)

Parameters:
  • member – Name of the member file to extract.

  • password – Password for encrypted archives.

  • encoding – Text encoding for text files.

Example

>>> loader = EnhancedZipLoader()
>>> result = loader.load(
...     Path("data.zip"),
...     member="train/features.csv",
... )
load(path: Path, member: str | None = None, password: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from a zip archive.

Parameters:
  • path – Path to the zip archive.

  • member – Name of the member to extract.

  • password – Password for encrypted archives.

  • encoding – Text encoding for text files.

  • header_unit – Unit type for headers.

  • data_type – Type of data.

  • **params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Enhanced Zip Loader'
priority: ClassVar[int] = 65
supported_extensions: ClassVar[Tuple[str, ...]] = ('.zip',)
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

class nirs4all.data.loaders.ExcelLoader

Bases: FileLoader

Loader for Excel spreadsheet files.

Supports:
  • Modern Excel files (.xlsx) via openpyxl

  • Legacy Excel files (.xls) via xlrd

Parameters:
  • sheet_name – Sheet name or index to load (default: 0, first sheet). Can be a string (sheet name), integer (0-indexed), or None (all sheets).

  • header – Row number to use as header (default: 0). Use None for no header.

  • skip_rows – Number of rows to skip at the beginning.

  • skip_footer – Number of rows to skip at the end.

  • usecols – Columns to load (can be list of names, indices, or Excel-style range).

  • engine – Excel engine to use (‘auto’, ‘openpyxl’, or ‘xlrd’).

  • header_unit – Unit for headers (‘cm-1’, ‘nm’, ‘text’, etc.)

Example

>>> loader = ExcelLoader()
>>> result = loader.load(
...     Path("data.xlsx"),
...     sheet_name="Sheet1",
...     skip_rows=2,
... )
load(path: Path, sheet_name: str | int | None = 0, header: int | None = 0, skip_rows: int | None = None, skip_footer: int = 0, usecols: List[str] | List[int] | str | None = None, engine: str = 'auto', header_unit: str = 'text', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from an Excel file.

Parameters:
  • path – Path to the Excel file.

  • sheet_name – Sheet to load (name, index, or None for all).

  • header – Row number for header (0-indexed), or None.

  • skip_rows – Number of rows to skip at start.

  • skip_footer – Number of rows to skip at end.

  • usecols – Columns to load.

  • engine – Excel engine to use.

  • header_unit – Unit type for headers.

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • **params – Additional parameters passed to read_excel.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Excel Loader'
priority: ClassVar[int] = 45
supported_extensions: ClassVar[Tuple[str, ...]] = ('.xlsx', '.xls')
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

exception nirs4all.data.loaders.FileLoadError

Bases: LoaderError

Raised when a file cannot be loaded.

class nirs4all.data.loaders.FileLoader

Bases: ABC

Abstract base class for file loaders.

All file format loaders should inherit from this class and implement the required methods for loading and format detection.

Class Attributes:

  • supported_extensions – Tuple of file extensions this loader handles.

  • name – Human-readable name for the loader.

  • priority – Loading priority (lower = higher priority) when multiple loaders match. Default: 50.

Example

>>> class CSVLoader(FileLoader):
...     supported_extensions = (".csv",)
...     name = "CSV Loader"
...
...     @classmethod
...     def supports(cls, path: Path) -> bool:
...         return path.suffix.lower() in cls.supported_extensions
...
...     def load(self, path: Path, **params) -> LoaderResult:
...         # Load CSV file
...         pass
classmethod detect_format(path: Path) -> str | None

Detect the file format from the path.

Parameters:

path – Path to analyze.

Returns:

Format name if detected, None otherwise.

classmethod get_base_path(path: Path) -> Path

Get the base path without compression extensions.

For example, ‘data.csv.gz’ -> ‘data.csv’

Parameters:

path – Path to process.

Returns:

Path without compression extension(s).
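
Stripping compression extensions can be sketched with pathlib. The function name strip_compression and the set of suffixes treated as compression are assumptions made for illustration; the actual get_base_path may recognise a different set.

```python
from pathlib import Path

# Assumed set of compression suffixes; the real method may differ.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def strip_compression(path: Path) -> Path:
    """Hypothetical helper: drop trailing compression suffixes one at a time."""
    while path.suffix.lower() in COMPRESSION_SUFFIXES:
        path = path.with_suffix("")
    return path


base_gz = strip_compression(Path("data.csv.gz"))
base_plain = strip_compression(Path("data.csv"))
base_tar = strip_compression(Path("runs/data.tar.gz"))
```

The loop form handles stacked suffixes and leaves uncompressed paths untouched.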

abstractmethod load(path: Path, **params: Any) -> LoaderResult

Load data from a file.

Parameters:
  • path – Path to the file to load.

  • **params – Loader-specific parameters.

Returns:

LoaderResult containing the loaded data and metadata.

Raises:

FileLoadError – If the file cannot be loaded.

name: ClassVar[str] = 'Base Loader'
priority: ClassVar[int] = 50
supported_extensions: ClassVar[Tuple[str, ...]] = ()
abstractmethod classmethod supports(path: Path) -> bool

Check if this loader can handle the given file.

Parameters:

path – Path to the file to check.

Returns:

True if this loader can handle the file, False otherwise.

exception nirs4all.data.loaders.FormatNotSupportedError

Bases: LoaderError

Raised when a file format is not supported.

exception nirs4all.data.loaders.LoaderError

Bases: Exception

Base exception for loader errors.

class nirs4all.data.loaders.LoaderRegistry

Bases: object

Registry for file loaders.

The registry maintains a list of available loaders and provides methods for finding the appropriate loader for a given file.

Example

>>> registry = LoaderRegistry()
>>> registry.register(CSVLoader)
>>> registry.register(ParquetLoader)
>>> loader = registry.get_loader(Path("data.csv"))
>>> result = loader.load(Path("data.csv"))
static __new__(cls) -> LoaderRegistry

Implement singleton pattern.

clear() -> None

Clear all registered loaders (mainly for testing).

classmethod get_instance() -> LoaderRegistry

Get the singleton registry instance.

get_loader(path: str | Path) -> FileLoader

Get the appropriate loader for a file.

Parameters:

path – Path to the file to load.

Returns:

An instance of the appropriate loader.

Raises:

FormatNotSupportedError – If no loader supports the file format.
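
The lookup described here, combined with the "lower = higher priority" rule from FileLoader, can be illustrated with a small stand-alone model. Every class below (SketchLoader, PlainCSV, PreferredCSV, SketchRegistry, UnsupportedFormat) is hypothetical and only mimics the documented attributes; how the real registry orders and instantiates loaders is an assumption.

```python
from pathlib import Path
from typing import ClassVar, List, Tuple, Type, Union


class UnsupportedFormat(Exception):
    """Stand-in for FormatNotSupportedError."""


class SketchLoader:
    supported_extensions: ClassVar[Tuple[str, ...]] = ()
    name: ClassVar[str] = "Base Loader"
    priority: ClassVar[int] = 50

    @classmethod
    def supports(cls, path: Path) -> bool:
        return path.suffix.lower() in cls.supported_extensions


class PlainCSV(SketchLoader):
    supported_extensions = (".csv",)
    name = "Plain CSV"
    priority = 50


class PreferredCSV(SketchLoader):
    supported_extensions = (".csv",)
    name = "Preferred CSV"
    priority = 10  # lower number, so it is tried first


class SketchRegistry:
    def __init__(self) -> None:
        self._loaders: List[Type[SketchLoader]] = []

    def register(self, loader_class: Type[SketchLoader]) -> None:
        if loader_class not in self._loaders:
            self._loaders.append(loader_class)

    def get_loader(self, path: Union[str, Path]) -> SketchLoader:
        path = Path(path)
        # Try candidates in ascending priority order; first match wins.
        for loader_class in sorted(self._loaders, key=lambda c: c.priority):
            if loader_class.supports(path):
                return loader_class()
        raise UnsupportedFormat(f"no loader supports {path.name!r}")


registry = SketchRegistry()
registry.register(PlainCSV)
registry.register(PreferredCSV)

chosen = registry.get_loader("data.csv")

try:
    registry.get_loader("data.xyz")
    unsupported_raised = False
except UnsupportedFormat:
    unsupported_raised = True
```

In this model registration order does not matter when priorities differ, which is what makes a numeric priority useful for overriding a built-in loader.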

get_registered_loaders() -> List[Type[FileLoader]]

Get all registered loader classes.

Returns:

List of registered loader classes.

get_supported_extensions() -> List[str]

Get all supported file extensions.

Returns:

List of supported extensions across all registered loaders.

load(path: str | Path, **params: Any) -> LoaderResult

Load a file using the appropriate loader.

This is a convenience method that finds the right loader and loads the file.

Parameters:
  • path – Path to the file to load.

  • **params – Loading parameters to pass to the loader.

Returns:

LoaderResult containing the loaded data.

Raises:
register(loader_class: Type[FileLoader]) -> None

Register a file loader.

Parameters:

loader_class – The loader class to register.

unregister(loader_class: Type[FileLoader]) -> None

Unregister a file loader.

Parameters:

loader_class – The loader class to unregister.

class nirs4all.data.loaders.LoaderResult(data: DataFrame | None = None, report: Dict[str, Any] | None = None, na_mask: Series | None = None, headers: List[str] | None = None, header_unit: str = 'cm-1')

Bases: object

Result container for file loading operations.

data

The loaded data as a pandas DataFrame.

report

Dictionary containing loading metadata and diagnostics.

na_mask

Boolean Series indicating rows with NA values.

headers

List of column headers.

header_unit

The unit type for headers (e.g., ‘cm-1’, ‘nm’).

property error: str | None

Get error message if loading failed.

property success: bool

Check if loading was successful.
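
A container with these five fields and two properties could be modelled as a dataclass. The sketch below is illustrative only: ResultSketch and as_tuple are hypothetical names, and the rules that error is read from report["error"] and that success requires data to be present are assumptions suggested by the load_file example, not confirmed behaviour of LoaderResult.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ResultSketch:
    data: Optional[Any] = None              # a DataFrame in the real class
    report: Dict[str, Any] = field(default_factory=dict)
    na_mask: Optional[Any] = None           # a boolean Series in the real class
    headers: Optional[List[str]] = None
    header_unit: str = "cm-1"

    @property
    def error(self) -> Optional[str]:
        # Assumption: failures are recorded under the "error" key of the report.
        return self.report.get("error")

    @property
    def success(self) -> bool:
        # Assumption: success means data is present and no error was recorded.
        return self.data is not None and self.error is None

    def as_tuple(self) -> Tuple[Any, Dict[str, Any], Any, Optional[List[str]], str]:
        """Hypothetical helper mirroring the 5-tuple returned by load_file."""
        return (self.data, self.report, self.na_mask, self.headers, self.header_unit)


ok = ResultSketch(data=[[0.51, 0.52]], headers=["1100", "1102"])
failed = ResultSketch(report={"error": "file not found"})
```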

class nirs4all.data.loaders.MatlabLoader

Bases: FileLoader

Loader for MATLAB .mat files.

Supports:
  • MATLAB v4, v6, v7 files via scipy.io

  • MATLAB v7.3 (HDF5) files via h5py (if available)

Parameters:
  • variable – Name of the variable to load. If None, auto-detects.

  • squeeze_me – Squeeze unit matrix dimensions (default: True).

  • struct_as_record – Load MATLAB structs as numpy record arrays (default: False).

  • header_unit – Unit for generated headers (‘index’, ‘cm-1’, ‘nm’, etc.)

Example

>>> loader = MatlabLoader()
>>> result = loader.load(
...     Path("data.mat"),
...     variable="X",
... )
load(path: Path, variable: str | None = None, squeeze_me: bool = True, struct_as_record: bool = False, header_unit: str = 'index', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from a MATLAB .mat file.

Parameters:
  • path – Path to the MATLAB file.

  • variable – Name of the variable to load. If None, auto-detects.

  • squeeze_me – Squeeze unit matrix dimensions.

  • struct_as_record – Load structs as record arrays.

  • header_unit – Unit type for generated headers.

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • **params – Additional parameters.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'MATLAB Loader'
priority: ClassVar[int] = 45
supported_extensions: ClassVar[Tuple[str, ...]] = ('.mat',)
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

class nirs4all.data.loaders.NumpyLoader

Bases: FileLoader

Loader for NumPy array files.

Supports:
  • Single array files (.npy)

  • Multi-array archives (.npz)

Parameters:
  • allow_pickle – Whether to allow loading pickled objects (default: False). Setting this to True may pose a security risk with untrusted files.

  • key – For .npz files, the key of the array to load. If not specified, uses the first array.

  • header_unit – Unit for generated headers (‘cm-1’, ‘nm’, ‘index’, etc.)

Security Note:

NumPy’s allow_pickle=True can execute arbitrary code when loading untrusted files. Only enable this for files you trust completely.

load(path: Path, allow_pickle: bool = False, key: str | None = None, header_unit: str = 'index', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from a NumPy file.

Parameters:
  • path – Path to the NumPy file.

  • allow_pickle – Whether to allow loading pickled objects.

  • key – For .npz files, the key of the array to load.

  • header_unit – Unit type for generated headers.

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • **params – Additional parameters (ignored).

Returns:

LoaderResult with the loaded data as a DataFrame.

name: ClassVar[str] = 'NumPy Loader'
priority: ClassVar[int] = 40
supported_extensions: ClassVar[Tuple[str, ...]] = ('.npy', '.npz')
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

class nirs4all.data.loaders.ParquetLoader

Bases: FileLoader

Loader for Apache Parquet files.

Requires pyarrow or fastparquet to be installed.

Supports:
  • Single Parquet files (.parquet, .pq)

  • Partitioned datasets (directory of parquet files)

  • Column selection for efficient loading

Parameters:
  • columns – List of column names to load (default: all columns).

  • engine – Parquet engine to use (‘auto’, ‘pyarrow’, or ‘fastparquet’).

  • filters – Row group filters for predicate pushdown (pyarrow only).

  • header_unit – Unit for headers (‘cm-1’, ‘nm’, ‘text’, etc.)

Example

>>> loader = ParquetLoader()
>>> result = loader.load(
...     Path("data.parquet"),
...     columns=["feature_1", "feature_2"],
... )
load(path: Path, columns: List[str] | None = None, engine: str = 'auto', filters: List | None = None, header_unit: str = 'text', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from a Parquet file.

Parameters:
  • path – Path to the Parquet file or directory.

  • columns – List of column names to load. If None, loads all columns.

  • engine – Parquet engine (‘auto’, ‘pyarrow’, or ‘fastparquet’).

  • filters – Row group filters for predicate pushdown (pyarrow only).

  • header_unit – Unit type for headers.

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • **params – Additional parameters passed to read_parquet.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Parquet Loader'
priority: ClassVar[int] = 35
supported_extensions: ClassVar[Tuple[str, ...]] = ('.parquet', '.pq')
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

class nirs4all.data.loaders.TarLoader

Bases: FileLoader

Loader for tar archive files.

Supports:
  • Plain tar files (.tar)

  • Gzip-compressed tar files (.tar.gz, .tgz)

  • Bzip2-compressed tar files (.tar.bz2)

  • XZ-compressed tar files (.tar.xz)

Parameters:
  • member – Name of the member file to extract. If None, auto-detects the first suitable file (prefers CSV).

  • encoding – Text encoding for the extracted file (default: ‘utf-8’).

  • inner_loader_params – Parameters to pass to the inner file loader.

Example

>>> loader = TarLoader()
>>> result = loader.load(
...     Path("data.tar.gz"),
...     member="data/train.csv",
... )
load(path: Path, member: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) -> LoaderResult

Load data from a tar archive.

Parameters:
  • path – Path to the tar archive.

  • member – Name of the member to extract. If None, auto-detects.

  • encoding – Text encoding for extracted files.

  • header_unit – Unit type for headers.

  • data_type – Type of data (‘x’, ‘y’, or ‘metadata’).

  • **params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Tar Archive Loader'
priority: ClassVar[int] = 60
supported_extensions: ClassVar[Tuple[str, ...]] = ('.tar',)
classmethod supports(path: Path) -> bool

Check if this loader supports the given file.

nirs4all.data.loaders.get_loader_for_file(path: str | Path) -> FileLoader

Get the appropriate loader for a file.

Parameters:

path – Path to the file.

Returns:

Instance of the appropriate FileLoader subclass.

Raises:

FormatNotSupportedError – If no loader supports the file format.

nirs4all.data.loaders.get_supported_formats() -> Dict[str, List[str]]

Get all supported file formats and their extensions.

Returns:

Dictionary mapping loader names to their supported extensions.

Example

>>> formats = get_supported_formats()
>>> for name, exts in formats.items():
...     print(f"{name}: {', '.join(exts)}")
nirs4all.data.loaders.list_archive_members(path) -> List[str]

List members in an archive file.

Parameters:

path – Path to the archive.

Returns:

List of member names.

Raises:

FileLoadError – If the archive cannot be read.

nirs4all.data.loaders.load_csv(path, na_policy='auto', data_type='x', categorical_mode='auto', header_unit='cm-1', **user_params)

Loads a CSV file using specified or default parameters, cleans data, handles NA values, and performs type conversions.

Parameters:
  • path (str or Path) – Path to the CSV file (.csv, .gz, .zip).

  • na_policy (str) – ‘remove’ or ‘abort’ (or ‘auto’ which acts like ‘remove’). This policy applies to row removal if NAs are found.

  • data_type (str) – ‘x’ or ‘y’. Influences type conversion.

  • categorical_mode (str) – How to handle string columns in ‘y’ data:
      - ‘auto’: Convert string columns to numerical categories.
      - ‘preserve’: Keep string columns (will become NaN if not convertible by final astype).
      - ‘none’: Treat all columns as potentially numeric.

  • header_unit (str) – Unit type of headers - “cm-1” (wavenumber), “nm” (wavelength), “none” (no headers), “text” (string headers), “index” (feature indices). Default: “cm-1”

  • **user_params – CSV parsing parameters (delimiter, decimal_separator, has_header) and other pandas.read_csv arguments.

Returns:

  • DataFrame with processed data (before NA row removal).

  • Report dictionary.

  • Boolean Series indicating rows with NAs (aligned with the returned DataFrame).

  • List of column headers (or None if no headers).

  • Header unit string.

None if an error occurs before this stage.

Return type:

(Union[pandas.DataFrame, None], dict, Union[pandas.Series, None], Union[List[str], None], str)
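
The interplay of delimiter, decimal_separator, has_header, and the NA mask can be illustrated without pandas. The sketch below is a simplified stand-in, not the load_csv implementation: parse_table is a hypothetical function, it returns plain lists instead of a DataFrame and Series, and treating every unparseable cell as NA is an assumption.

```python
import csv
import io
import math
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def parse_table(
    text: str,
    delimiter: str = ";",
    decimal_separator: str = ".",
    has_header: bool = True,
) -> Tuple[List[Row], Optional[List[str]], List[bool]]:
    """Hypothetical parser: returns (table, headers, per-row NA mask)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    headers = rows[0] if has_header and rows else None
    body = rows[1:] if has_header else rows

    table: List[Row] = []
    na_mask: List[bool] = []
    for row in body:
        values: Row = []
        for cell in row:
            try:
                number = float(cell.strip().replace(decimal_separator, "."))
            except ValueError:
                number = math.nan  # empty or non-numeric cell counts as NA
            values.append(None if math.isnan(number) else number)
        table.append(values)
        na_mask.append(any(v is None for v in values))
    return table, headers, na_mask


# Semicolon-delimited spectra with comma decimals and one missing value.
sample = "1100;1102;1104\n0,51;0,52;0,53\n0,61;;0,63\n"
table, headers, na_mask = parse_table(sample, decimal_separator=",")

# A 'remove' policy would then keep only rows whose mask entry is False.
kept = [row for row, has_na in zip(table, na_mask) if not has_na]
```

Returning the mask alongside the unfiltered table, as the documented return value does, lets the caller drop the same rows from a matching y or metadata file.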

nirs4all.data.loaders.load_csv_new(path, na_policy: str = 'auto', data_type: str = 'x', categorical_mode: str = 'auto', header_unit: str = 'cm-1', **user_params)

Load a CSV file using the CSVLoader.

This function maintains backward compatibility with the original load_csv API.

Parameters:
  • path – Path to the CSV file.

  • na_policy – How to handle NA values.

  • data_type – Type of data being loaded.

  • categorical_mode – How to handle categorical columns.

  • header_unit – Unit type for headers.

  • **user_params – Additional CSV parsing parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_excel(path, sheet_name: str | int | None = 0, header: int | None = 0, skip_rows: int | None = None, skip_footer: int = 0, usecols: List[str] | List[int] | str | None = None, engine: str = 'auto', header_unit: str = 'text', **params)

Load an Excel file.

Convenience function for direct use.

Parameters:
  • path – Path to the Excel file.

  • sheet_name – Sheet to load.

  • header – Row number for header.

  • skip_rows – Rows to skip at start.

  • skip_footer – Rows to skip at end.

  • usecols – Columns to load.

  • engine – Excel engine to use.

  • header_unit – Unit type for headers.

  • **params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_file(path: str | Path, **params: Any) -> Tuple[DataFrame | None, Dict[str, Any], Series | None, List[str], str]

Load a data file with automatic format detection.

This is the main entry point for loading files. It automatically detects the file format and uses the appropriate loader.

Parameters:
  • path – Path to the file to load.

  • **params – Format-specific loading parameters. Common parameters include:
      - header_unit: Unit for headers (‘cm-1’, ‘nm’, ‘text’, etc.)
      - data_type: Type of data (‘x’, ‘y’, or ‘metadata’)
      - delimiter: CSV delimiter
      - sheet_name: Excel sheet to load
      - variable: MATLAB variable name
      - member: Archive member to extract

Returns:

Tuple of:
  • DataFrame with loaded data (or None on error)

  • Report dictionary with loading metadata

  • NA mask Series (rows with missing values)

  • List of column headers

  • Header unit string

Raises:

FormatNotSupportedError – If no loader supports the file format.

Example

>>> data, report, na_mask, headers, unit = load_file("data.csv")
>>> if report.get("error"):
...     print(f"Error: {report['error']}")
... else:
...     print(f"Loaded {data.shape[0]} samples with {data.shape[1]} features")
nirs4all.data.loaders.load_matlab(path, variable: str | None = None, squeeze_me: bool = True, header_unit: str = 'index', **params)

Load a MATLAB .mat file.

Convenience function for direct use.

Parameters:
  • path – Path to the MATLAB file.

  • variable – Name of the variable to load.

  • squeeze_me – Squeeze unit dimensions.

  • header_unit – Unit type for headers.

  • **params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_numpy(path, allow_pickle: bool = False, key: str | None = None, header_unit: str = 'index', **params)

Load a NumPy file.

Convenience function for backward compatibility.

Parameters:
  • path – Path to the NumPy file.

  • allow_pickle – Whether to allow pickled objects.

  • key – For .npz files, the array key to load.

  • header_unit – Unit type for generated headers.

  • **params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.load_parquet(path, columns: List[str] | None = None, engine: str = 'auto', header_unit: str = 'text', **params)

Load a Parquet file.

Convenience function for direct use.

Parameters:
  • path – Path to the Parquet file.

  • columns – Column names to load.

  • engine – Parquet engine to use.

  • header_unit – Unit type for headers.

  • **params – Additional parameters.

Returns:

Tuple of (DataFrame, report, na_mask, headers, header_unit).

nirs4all.data.loaders.register_loader(cls: Type[FileLoader]) -> Type[FileLoader]

Decorator to register a loader with the global registry.

Example

>>> @register_loader
... class MyLoader(FileLoader):
...     ...