nirs4all.data.performance.lazy_loader module

Lazy loading support for datasets.

This module provides lazy loading capabilities to defer data loading until it is actually needed, improving startup time and memory usage.

Phase 8 Implementation - Dataset Configuration Roadmap Section 8.5: Performance Optimization - Lazy Loading

class nirs4all.data.performance.lazy_loader.LazyArray(loader: Callable[[], ndarray], shape: Tuple[int, ...] | None = None, dtype: dtype | None = None, source_path: str | None = None)

Bases: object

A lazy-loading array wrapper.

Defers loading until the array data is actually accessed. Supports the NumPy array interface for compatibility.

Example

```python
import numpy as np

from nirs4all.data.performance.lazy_loader import LazyArray

# Create lazy array
lazy = LazyArray(
    loader=lambda: np.load("large_file.npy"),
    shape=(10000, 500),
    dtype=np.float32,
)

# Array not loaded yet
print(lazy.shape)  # (10000, 500)

# Triggers loading on first access
data = lazy[0:100]  # Now loads the data

# Explicit loading
lazy.load()
full_data = lazy.data
```
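The deferral described above amounts to memoizing a zero-argument loader: nothing runs until data is requested, and the first result is cached. The following is a minimal, self-contained sketch of that idea, not the nirs4all implementation; the class name `SketchLazyArray` is hypothetical, and only the pieces needed to show on-demand loading, caching and NumPy conversion are included.

```python
from typing import Callable, Optional

import numpy as np


class SketchLazyArray:
    """Hypothetical minimal lazy wrapper (illustration only, not nirs4all's LazyArray)."""

    def __init__(self, loader: Callable[[], np.ndarray]) -> None:
        self._loader = loader
        self._data: Optional[np.ndarray] = None

    @property
    def is_loaded(self) -> bool:
        return self._data is not None

    def load(self) -> np.ndarray:
        # Call the loader only once; later calls reuse the cached array
        if self._data is None:
            self._data = self._loader()
        return self._data

    def __getitem__(self, key):
        # Indexing forces a load, then delegates to the real array
        return self.load()[key]

    def __len__(self) -> int:
        return len(self.load())

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(lazy) work; converts dtype only if one is requested
        data = self.load()
        return data if dtype is None else data.astype(dtype)


calls = []


def loader() -> np.ndarray:
    calls.append(1)  # record each real load
    return np.arange(12, dtype=np.float32).reshape(4, 3)


lazy = SketchLazyArray(loader)
print(lazy.is_loaded)     # False
first_rows = lazy[0:2]    # first access triggers the loader
again = np.asarray(lazy)  # served from the cache
print(len(calls))         # 1
```

The `copy=None` keyword on `__array__` keeps the sketch compatible with NumPy 2, which may pass it during conversion.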

__array__(dtype=None)

Support numpy array conversion.

__getitem__(key)

Get item from array (triggers load).

__len__() → int

Get length (first dimension).

property data: ndarray

Get the loaded data (triggers load if needed).

property dtype: dtype | None

Get array dtype (may trigger load if unknown).

property is_loaded: bool

Check if data has been loaded.

load() → ndarray

Load the data if not already loaded.

Returns:

The loaded numpy array.

property ndim: int

Get number of dimensions.

property shape: Tuple[int, ...] | None

Get array shape (may trigger load if unknown).

unload() → None

Unload data to free memory.
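The "may trigger load if unknown" behaviour of `shape` and `dtype`, together with `unload()`, can be illustrated with a similar sketch. It assumes, as the constructor signature suggests, that a caller-supplied `shape` is returned without loading, and that unloading simply drops the cached array so that the next access calls the loader again. `SketchArrayMeta` is a hypothetical name, not part of nirs4all.

```python
from typing import Callable, Optional, Tuple

import numpy as np


class SketchArrayMeta:
    """Hypothetical illustration of shape fallback and unload/reload."""

    def __init__(self, loader: Callable[[], np.ndarray],
                 shape: Optional[Tuple[int, ...]] = None) -> None:
        self._loader = loader
        self._shape = shape          # metadata supplied up front, if any
        self._data: Optional[np.ndarray] = None
        self.load_count = 0          # how many times the loader actually ran

    def load(self) -> np.ndarray:
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data

    @property
    def shape(self) -> Tuple[int, ...]:
        # Known shape: answer without touching the data
        if self._shape is not None:
            return self._shape
        # Unknown shape: load once and remember the result
        self._shape = self.load().shape
        return self._shape

    def unload(self) -> None:
        # Drop the reference so the array can be garbage-collected
        self._data = None


known = SketchArrayMeta(lambda: np.zeros((5, 2)), shape=(5, 2))
unknown = SketchArrayMeta(lambda: np.zeros((5, 2)))

print(known.shape, known.load_count)      # (5, 2) 0
print(unknown.shape, unknown.load_count)  # (5, 2) 1

unknown.unload()
unknown.load()                            # reloads from the loader
print(unknown.load_count)                 # 2
```

Passing `shape` (and `dtype`) up front is what lets cheap metadata queries avoid a full read.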

class nirs4all.data.performance.lazy_loader.LazyDataset(x_loader: Callable[[], ndarray] | None = None, y_loader: Callable[[], ndarray] | None = None, metadata_loader: Callable[[], Any] | None = None, x_shape: Tuple[int, ...] | None = None, y_shape: Tuple[int, ...] | None = None, name: str = 'dataset')

Bases: object

A lazy-loading dataset wrapper.

Wraps multiple data components (X, y, metadata) as lazy arrays that load on demand.

Example

```python
from nirs4all.data.performance.lazy_loader import LazyDataset

# Create from loader functions (load_features, load_targets and
# load_metadata are placeholders for your own loading functions)
dataset = LazyDataset(
    x_loader=lambda: load_features("X.csv"),
    y_loader=lambda: load_targets("Y.csv"),
    metadata_loader=lambda: load_metadata("M.csv"),
)

# Nothing loaded yet
print(dataset.x_shape)  # Returns cached shape if known

# Triggers X loading only
X_data = dataset.X

# Load everything
dataset.load_all()
```
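Per-component deferral can be sketched by giving each of X, y and metadata its own cached slot. This is a hypothetical, simplified stand-in (`SketchLazyDataset`, with keyword names `x`, `y` and `metadata` chosen for illustration), not the nirs4all class; it only shows why reading `X` leaves `y` and metadata untouched.

```python
from typing import Any, Callable, Dict, Optional

import numpy as np


class SketchLazyDataset:
    """Hypothetical per-component lazy container (illustration only)."""

    def __init__(self, **loaders: Optional[Callable[[], Any]]) -> None:
        self._loaders = loaders         # e.g. x=..., y=..., metadata=...
        self._cache: Dict[str, Any] = {}

    def _get(self, name: str) -> Any:
        loader = self._loaders.get(name)
        if loader is None:
            return None                 # component was never provided
        if name not in self._cache:     # run this component's loader only
            self._cache[name] = loader()
        return self._cache[name]

    @property
    def X(self) -> Any:
        return self._get("x")

    @property
    def y(self) -> Any:
        return self._get("y")

    @property
    def metadata(self) -> Any:
        return self._get("metadata")

    def is_loaded(self, name: str) -> bool:
        return name in self._cache

    def load_all(self) -> None:
        for name in self._loaders:
            self._get(name)

    def unload_all(self) -> None:
        self._cache.clear()             # drop references to free memory


ds = SketchLazyDataset(
    x=lambda: np.array([[0.1, 0.2], [0.3, 0.4]]),
    y=lambda: np.array([1.0, 2.0]),
    metadata=lambda: {"source": "demo"},
)
X_data = ds.X                                # loads X only
print(ds.is_loaded("x"), ds.is_loaded("y"))  # True False
ds.load_all()
print(ds.is_loaded("metadata"))              # True
```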

property X: ndarray | None

Get features (triggers load if needed).

property is_metadata_loaded: bool

Check if metadata is loaded.

property is_x_loaded: bool

Check if X is loaded.

property is_y_loaded: bool

Check if y is loaded.

load_all() → None

Load all data components.

property metadata: Any | None

Get metadata (triggers load if needed).

property n_features: int

Get number of features.

property n_samples: int

Get number of samples.

unload_all() → None

Unload all data to free memory.

property x_shape: Tuple[int, ...] | None

Get X shape without loading.

property y: ndarray | None

Get targets (triggers load if needed).

property y_shape: Tuple[int, ...] | None

Get y shape without loading.

nirs4all.data.performance.lazy_loader.create_lazy_dataset(train_x_path: str | None = None, train_y_path: str | None = None, train_group_path: str | None = None, load_params: Dict[str, Any] | None = None) → LazyDataset

Create a lazy dataset from file paths.

Parameters:
  • train_x_path – Path to training features.

  • train_y_path – Path to training targets.

  • train_group_path – Path to training metadata.

  • load_params – Loading parameters.

Returns:

LazyDataset instance.
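Building lazy components from file paths amounts to wrapping each path in a zero-argument closure that reads the file only when called. The sketch below assumes delimited text files read with `numpy.loadtxt` and treats `load_params` as a dict whose `delimiter` and `skiprows` keys are forwarded to it. The helper name `path_loader` and that interpretation of `load_params` are assumptions, not the documented behaviour of `create_lazy_dataset`.

```python
import os
import tempfile
from typing import Any, Callable, Dict, Optional

import numpy as np


def path_loader(path: Optional[str],
                load_params: Optional[Dict[str, Any]] = None
                ) -> Optional[Callable[[], np.ndarray]]:
    """Hypothetical helper: return a closure that reads `path` on demand."""
    if path is None:
        return None  # a missing component stays absent
    params = load_params or {}

    def _load() -> np.ndarray:
        # Nothing is read until the closure is called
        return np.loadtxt(path,
                          delimiter=params.get("delimiter", ","),
                          skiprows=params.get("skiprows", 0))

    return _load


# Usage: the file is written now but only parsed when the loader runs
with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "X.csv")
    with open(x_path, "w") as f:
        f.write("a,b\n1,2\n3,4\n")

    x_loader = path_loader(x_path, {"delimiter": ",", "skiprows": 1})
    y_loader = path_loader(None)
    X = x_loader()
    print(X.shape)  # (2, 2)
```

Closures of this kind match the `Callable[[], ndarray]` type that the `x_loader` and `y_loader` parameters of `LazyDataset` expect.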