nirs4all.data.loaders.archive_loader module

Archive file loader implementation.

This module provides the ArchiveLoader class for loading data from archive files, including tar archives (.tar, .tar.gz, .tgz, .tar.bz2, .tar.xz) and zip archives with enhanced support.

The ArchiveLoader acts as a wrapper that extracts files from archives and delegates to the appropriate format-specific loader.
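As a rough illustration of the delegation step, the sketch below picks a parser from the extension of a member that has already been extracted. The names `INNER_PARSERS` and `parse_extracted` are hypothetical and rely only on the Python standard library; the actual dispatch inside nirs4all may differ.

```python
import csv
import io
import json
from pathlib import PurePosixPath


def _parse_csv(raw: bytes, encoding: str):
    # Decode the extracted bytes, then parse rows with the stdlib CSV reader.
    return list(csv.reader(io.StringIO(raw.decode(encoding))))


def _parse_json(raw: bytes, encoding: str):
    # Decode the extracted bytes and parse them as a JSON document.
    return json.loads(raw.decode(encoding))


# Hypothetical registry: member suffix -> parser (assumption, not nirs4all's).
INNER_PARSERS = {".csv": _parse_csv, ".json": _parse_json}


def parse_extracted(member: str, raw: bytes, encoding: str = "utf-8"):
    """Delegate the extracted bytes to a parser chosen by the member's suffix."""
    # Archive member names use forward slashes, hence PurePosixPath.
    suffix = PurePosixPath(member).suffix.lower()
    parser = INNER_PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"no parser registered for {suffix!r}")
    return parser(raw, encoding)
```

Keeping extraction and parsing separate means the same dispatch can serve zip and tar archives alike.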

class nirs4all.data.loaders.archive_loader.EnhancedZipLoader[source]

Bases: FileLoader

Enhanced loader for zip archive files.

This loader provides additional features over the basic zip support in the CSV loader, including:

  • Member listing and selection

  • Support for non-CSV files in archives

  • Binary file extraction (for NumPy, Parquet, etc.)

Parameters:
  • member – Name of the member file to extract.

  • password – Password for encrypted archives.

  • encoding – Text encoding for text files.
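To show how these three parameters might interact, here is a minimal sketch built on the standard-library `zipfile` module. The helper `read_zip_member` and the `TEXT_SUFFIXES` set are assumptions made for illustration; they are not part of nirs4all.

```python
import zipfile
from pathlib import PurePosixPath

# Hypothetical set of suffixes treated as text; everything else stays binary.
TEXT_SUFFIXES = {".csv", ".txt", ".json"}


def read_zip_member(archive, member, password=None, encoding="utf-8"):
    """Return a member as str (text suffixes) or bytes (anything else)."""
    # zipfile expects the password as bytes, not str.
    pwd = password.encode("utf-8") if password else None
    with zipfile.ZipFile(archive) as zf:
        # Raises KeyError if the member is not present in the archive.
        raw = zf.read(member, pwd=pwd)
    if PurePosixPath(member).suffix.lower() in TEXT_SUFFIXES:
        return raw.decode(encoding)
    return raw
```

Note that the standard-library `zipfile` module can only decrypt the legacy zip encryption scheme; whether EnhancedZipLoader handles other schemes is not stated here.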

Example

>>> loader = EnhancedZipLoader()
>>> result = loader.load(
...     Path("data.zip"),
...     member="train/features.csv",
... )
load(path: Path, member: str | None = None, password: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) LoaderResult[source]

Load data from a zip archive.

Parameters:
  • path – Path to the zip archive.

  • member – Name of the member to extract.

  • password – Password for encrypted archives.

  • encoding – Text encoding for text files.

  • header_unit – Unit of the header values (default: 'cm-1').

  • data_type – Type of data.

  • **params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Enhanced Zip Loader'
priority: ClassVar[int] = 65
supported_extensions: ClassVar[Tuple[str, ...]] = ('.zip',)
classmethod supports(path: Path) bool[source]

Check if this loader supports the given file.

class nirs4all.data.loaders.archive_loader.TarLoader[source]

Bases: FileLoader

Loader for tar archive files.

Supports:

  • Plain tar files (.tar)

  • Gzip-compressed tar files (.tar.gz, .tgz)

  • Bzip2-compressed tar files (.tar.bz2)

  • XZ-compressed tar files (.tar.xz)
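One way to cover all of these variants with a single code path is the standard-library `tarfile` mode `"r:*"`, which detects the compression transparently. The helper `read_tar_member` below is a hypothetical sketch, not TarLoader's implementation.

```python
import tarfile


def read_tar_member(fileobj, member: str) -> bytes:
    """Read one member from a plain or compressed tar stream."""
    # "r:*" lets tarfile detect the compression (none, gzip, bzip2, xz).
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        # extractfile raises KeyError for unknown names and returns None
        # for members that are not regular files (e.g. directories).
        extracted = tf.extractfile(member)
        if extracted is None:
            raise ValueError(f"{member!r} is not a regular file")
        return extracted.read()
```

The file object must be seekable for compression detection to work; an on-disk path can be passed to `tarfile.open` as `name` instead.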

Parameters:
  • member – Name of the member file to extract. If None, auto-detects the first suitable file (prefers CSV).

  • encoding – Text encoding for the extracted file (default: 'utf-8').

  • inner_loader_params – Parameters to pass to the inner file loader.
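The auto-detection rule for `member` (first suitable file, CSV preferred) could look roughly like the sketch below. The function `pick_member` and the `SUITABLE_SUFFIXES` tuple are assumptions; the set of formats TarLoader actually considers suitable is not listed here.

```python
from pathlib import PurePosixPath

# Hypothetical set of "suitable" member suffixes (assumption for this sketch).
SUITABLE_SUFFIXES = (".csv", ".npy", ".parquet")


def pick_member(names):
    """Return the first CSV member, else the first suitable member."""
    candidates = [
        name for name in names
        if PurePosixPath(name).suffix.lower() in SUITABLE_SUFFIXES
    ]
    # Prefer CSV files, preserving archive order among them.
    for name in candidates:
        if name.lower().endswith(".csv"):
            return name
    if candidates:
        return candidates[0]
    raise ValueError("no suitable member found in archive")
```

For example, `pick_member(["README.md", "x.npy", "data/train.csv"])` returns `"data/train.csv"` even though the NumPy file comes first in the archive.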

Example

>>> loader = TarLoader()
>>> result = loader.load(
...     Path("data.tar.gz"),
...     member="data/train.csv",
... )
load(path: Path, member: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) LoaderResult[source]

Load data from a tar archive.

Parameters:
  • path – Path to the tar archive.

  • member – Name of the member to extract. If None, auto-detects.

  • encoding – Text encoding for extracted files.

  • header_unit – Unit of the header values (default: 'cm-1').

  • data_type – Type of data ('x', 'y', or 'metadata').

  • **params – Additional parameters for the inner loader.

Returns:

LoaderResult with the loaded data.

name: ClassVar[str] = 'Tar Archive Loader'
priority: ClassVar[int] = 60
supported_extensions: ClassVar[Tuple[str, ...]] = ('.tar',)
classmethod supports(path: Path) bool[source]

Check if this loader supports the given file.
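`supported_extensions` lists only '.tar', while the class description also names compressed variants, and how `supports` reconciles the two is not shown here. One possibility, sketched below with a hypothetical helper `looks_like_tar`, is to match against the full file name, because `Path.suffix` returns only the last extension.

```python
from pathlib import Path

# Suffixes taken from the TarLoader description above.
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def looks_like_tar(path: Path) -> bool:
    """Check compound suffixes by matching the end of the file name."""
    # Path("data.tar.gz").suffix is ".gz", so compare the whole name instead.
    return path.name.lower().endswith(TAR_SUFFIXES)
```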

nirs4all.data.loaders.archive_loader.list_archive_members(path) List[str][source]

List members in an archive file.

Parameters:

path – Path to the archive.

Returns:

List of member names.

Raises:

FileLoadError – If the archive cannot be read.
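For readers without nirs4all at hand, comparable behaviour can be approximated with the standard library alone. In the sketch below, `list_members` and `ArchiveReadError` are hypothetical stand-ins for `list_archive_members` and `FileLoadError`; the real function may detect formats and report errors differently.

```python
import tarfile
import zipfile


class ArchiveReadError(Exception):
    """Local stand-in for nirs4all's FileLoadError (assumption)."""


def list_members(path):
    """List member names of a zip or tar archive, wrapping read failures."""
    try:
        # is_zipfile checks the file's contents rather than its extension.
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                return zf.namelist()
        # Otherwise try tar, letting tarfile detect the compression.
        with tarfile.open(path, mode="r:*") as tf:
            return tf.getnames()
    except (OSError, zipfile.BadZipFile, tarfile.TarError) as exc:
        raise ArchiveReadError(f"cannot read archive {path}: {exc}") from exc
```

Wrapping the low-level exceptions in a single error type lets callers handle missing, truncated, and non-archive files the same way.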