nirs4all.data.loaders.archive_loader module
Archive file loader implementation.
This module provides the ArchiveLoader class for loading data from archive files, including tar (.tar, .tar.gz, .tgz, .tar.bz2, .tar.xz) and enhanced zip support.
The ArchiveLoader acts as a wrapper that extracts files from archives and delegates to the appropriate format-specific loader.
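The extract-then-delegate pattern described above can be illustrated with the standard library alone. This is a minimal sketch, not the nirs4all implementation: `INNER_PARSERS`, `parse_csv`, `parse_json`, and `load_member` are hypothetical names invented for the example, and the parsers return plain Python objects rather than a LoaderResult.

```python
import csv
import io
import json
import zipfile
from pathlib import PurePosixPath
from typing import Any, Callable, Dict


def parse_csv(raw: bytes, encoding: str = "utf-8") -> list:
    """Decode CSV bytes and return the rows as lists of strings."""
    return list(csv.reader(io.StringIO(raw.decode(encoding))))


def parse_json(raw: bytes, encoding: str = "utf-8") -> Any:
    """Decode JSON bytes into Python objects."""
    return json.loads(raw.decode(encoding))


# Hypothetical registry (an assumption of this sketch): member suffix -> parser.
INNER_PARSERS: Dict[str, Callable[..., Any]] = {
    ".csv": parse_csv,
    ".json": parse_json,
}


def load_member(archive: zipfile.ZipFile, member: str, encoding: str = "utf-8") -> Any:
    """Extract one member and hand its bytes to the parser chosen by suffix."""
    suffix = PurePosixPath(member).suffix.lower()
    if suffix not in INNER_PARSERS:
        raise ValueError(f"No parser registered for suffix {suffix!r}")
    return INNER_PARSERS[suffix](archive.read(member), encoding)


# Build a small in-memory archive so the sketch runs without external files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("train/features.csv", "a,b\n1,2\n")
    zf.writestr("meta.json", '{"unit": "cm-1"}')

with zipfile.ZipFile(buffer) as zf:
    rows = load_member(zf, "train/features.csv")
    meta = load_member(zf, "meta.json")
```

Keeping the suffix lookup in one table means a new inner format only needs one extra registry entry, which is one plausible reason for a wrapper design of this kind.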
- class nirs4all.data.loaders.archive_loader.EnhancedZipLoader
Bases: FileLoader
Enhanced loader for zip archive files.
This loader provides additional features over the basic zip support in the CSV loader, including:
- Member listing and selection
- Support for non-CSV files in archives
- Binary file extraction (for NumPy, Parquet, etc.)
- Parameters:
member – Name of the member file to extract.
password – Password for encrypted archives.
encoding – Text encoding for text files.
Example
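>>> from pathlib import Path
>>> from nirs4all.data.loaders.archive_loader import EnhancedZipLoader
>>> loader = EnhancedZipLoader()
>>> result = loader.load(
...     Path("data.zip"),
...     member="train/features.csv",
... )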
- load(path: Path, member: str | None = None, password: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) → LoaderResult
Load data from a zip archive.
- Parameters:
path – Path to the zip archive.
member – Name of the member to extract.
password – Password for encrypted archives.
encoding – Text encoding for text files.
header_unit – Unit type for headers.
data_type – Type of data ('x', 'y', or 'metadata').
**params – Additional parameters for the inner loader.
- Returns:
LoaderResult with the loaded data.
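To make the roles of member, password, and encoding concrete, here is a minimal sketch built on the standard-library zipfile module. It is an illustration under stated assumptions, not the code behind EnhancedZipLoader: `read_zip_member` is a hypothetical helper, falling back to the first file when member is None is an assumption of the sketch, and it returns text or raw bytes instead of a LoaderResult.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Union


def read_zip_member(
    source: Union[str, BinaryIO],
    member: Optional[str] = None,
    password: Optional[str] = None,
    encoding: Optional[str] = "utf-8",
) -> Union[str, bytes]:
    """Read one member of a zip archive as text, or as bytes if encoding is None."""
    with zipfile.ZipFile(source) as zf:
        # Skip directory entries so only real files can be selected.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
        if not names:
            raise ValueError("archive contains no files")
        # Assumption of this sketch: default to the first file in the archive.
        name = names[0] if member is None else member
        if name not in names:
            raise KeyError(f"{name!r} not found; available members: {names}")
        # zipfile expects the password as bytes.
        pwd = password.encode("utf-8") if password is not None else None
        raw = zf.read(name, pwd=pwd)
    return raw if encoding is None else raw.decode(encoding)


# In-memory archive with one text member and one binary member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("train/features.csv", "a,b\n1,2\n")
    zf.writestr("train/raw.bin", b"\x00\x01\x02")

text = read_zip_member(buffer, member="train/features.csv")
blob = read_zip_member(buffer, member="train/raw.bin", encoding=None)
```

Passing `encoding=None` is how this sketch distinguishes binary payloads (such as NumPy or Parquet files) from text; the real loader may make that decision differently.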
- class nirs4all.data.loaders.archive_loader.TarLoader
Bases: FileLoader
Loader for tar archive files.
Supports:
- Plain tar files (.tar)
- Gzip-compressed tar files (.tar.gz, .tgz)
- Bzip2-compressed tar files (.tar.bz2)
- XZ-compressed tar files (.tar.xz)
- Parameters:
member – Name of the member file to extract. If None, auto-detects the first suitable file (prefers CSV).
encoding – Text encoding for the extracted file (default: 'utf-8').
inner_loader_params – Parameters to pass to the inner file loader.
Example
>>> from pathlib import Path
>>> from nirs4all.data.loaders.archive_loader import TarLoader
>>> loader = TarLoader()
>>> result = loader.load(
...     Path("data.tar.gz"),
...     member="data/train.csv",
... )
- load(path: Path, member: str | None = None, encoding: str = 'utf-8', header_unit: str = 'cm-1', data_type: str = 'x', **params: Any) → LoaderResult
Load data from a tar archive.
- Parameters:
path – Path to the tar archive.
member – Name of the member to extract. If None, auto-detects.
encoding – Text encoding for extracted files.
header_unit – Unit type for headers.
data_type – Type of data ('x', 'y', or 'metadata').
**params – Additional parameters for the inner loader.
- Returns:
LoaderResult with the loaded data.
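The auto-detection behaviour (first suitable file, CSV preferred) can be sketched with the standard-library tarfile module. The exact selection rule used by TarLoader is not documented here, so `pick_member` and `read_tar_member` are hypothetical helpers that show one plausible reading: return the first CSV if there is one, otherwise the first regular file.

```python
import io
import tarfile
from typing import BinaryIO, List, Optional, Tuple


def pick_member(names: List[str], preferred_suffix: str = ".csv") -> Optional[str]:
    """Return the first name with the preferred suffix, else the first name, else None."""
    for name in names:
        if name.lower().endswith(preferred_suffix):
            return name
    return names[0] if names else None


def read_tar_member(
    fileobj: BinaryIO,
    member: Optional[str] = None,
    encoding: str = "utf-8",
) -> Tuple[str, str]:
    """Return (member name, decoded text) for the chosen member of a tar archive."""
    # Mode "r:*" lets tarfile detect plain, gzip, bzip2, or xz compression itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
        names = [info.name for info in tf.getmembers() if info.isfile()]
        name = member if member is not None else pick_member(names)
        if name is None:
            raise ValueError("archive contains no regular files")
        extracted = tf.extractfile(name)  # raises KeyError for unknown names
        if extracted is None:
            raise ValueError(f"{name!r} is not a regular file")
        return name, extracted.read().decode(encoding)


# Build a gzip-compressed tar in memory: a text note first, then a CSV.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tf:
    for name, text in [("data/README.txt", "notes"), ("data/train.csv", "x,y\n1,2\n")]:
        payload = text.encode("utf-8")
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

buffer.seek(0)
chosen, content = read_tar_member(buffer)
```

In this sketch the CSV is selected even though it is not the first member, which matches the "prefers CSV" wording in the parameter description.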
- nirs4all.data.loaders.archive_loader.list_archive_members(path) → List[str]
List members in an archive file.
- Parameters:
path – Path to the archive.
- Returns:
List of member names.
- Raises:
FileLoadError – If the archive cannot be read.
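For comparison, a member-listing function with the same contract (names out, a single error type when the archive is unreadable) can be sketched with the standard library. This is not the source of list_archive_members: `list_members_sketch` is a hypothetical name, and `ArchiveReadError` is a stand-in defined here because the import path of FileLoadError is not given in this section.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path
from typing import List, Union


class ArchiveReadError(Exception):
    """Hypothetical stand-in for FileLoadError in this sketch."""


def list_members_sketch(path: Union[str, Path]) -> List[str]:
    """List member names of a zip or tar archive, in archive order."""
    path = Path(path)
    try:
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                return zf.namelist()
        if tarfile.is_tarfile(path):
            # "r:*" covers plain, gzip, bzip2, and xz tar files.
            with tarfile.open(path, "r:*") as tf:
                return tf.getnames()
    except (OSError, zipfile.BadZipFile, tarfile.TarError) as exc:
        raise ArchiveReadError(f"cannot read archive {path}: {exc}") from exc
    raise ArchiveReadError(f"{path} is not a readable zip or tar archive")


# Exercise the sketch on temporary files: one zip, one tar.gz, one non-archive.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "data.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("train/features.csv", "a,b\n")

    csv_path = Path(tmp) / "train.csv"
    csv_path.write_text("x,y\n")
    tar_path = Path(tmp) / "data.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tf:
        tf.add(csv_path, arcname="data/train.csv")

    zip_names = list_members_sketch(zip_path)
    tar_names = list_members_sketch(tar_path)

    bogus_path = Path(tmp) / "notes.txt"
    bogus_path.write_text("plain text")
    try:
        list_members_sketch(bogus_path)
        rejected = False
    except ArchiveReadError:
        rejected = True
```

Wrapping the low-level exceptions in one error type mirrors the documented Raises entry: callers only need to catch a single exception regardless of the archive format.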