nirs4all.data.performance package
Submodules
- nirs4all.data.performance.cache module
CacheEntry: data, key, timestamp, size_bytes, source_path, source_mtime, hit_count, is_stale()
DataCache
cache_manager()
make_cache_key()
- nirs4all.data.performance.lazy_loader module
Module contents
Performance optimization module for dataset loading.
This module provides lazy loading, caching, and memory-mapped file support.
- class nirs4all.data.performance.CacheEntry(data: Any, key: str, timestamp: float = <factory>, size_bytes: int = 0, source_path: str | None = None, source_mtime: float | None = None, hit_count: int = 0)
Bases: object
A cached data entry.
- data
The cached data.
- Type:
Any
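The `source_path` and `source_mtime` fields suggest that an entry can tell when its backing file has changed. The following is a minimal sketch of how such a check might work, not the nirs4all implementation: `SketchEntry` is a hypothetical stand-in, and the rule that "stale" means "the file's current modification time differs from the recorded one, or the file is gone" is an assumption.

```python
import os
import tempfile
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchEntry:
    """Hypothetical stand-in for CacheEntry; field names mirror the signature above."""

    data: Any
    key: str
    timestamp: float = field(default_factory=time.time)
    source_path: Optional[str] = None
    source_mtime: Optional[float] = None

    def is_stale(self) -> bool:
        # Assumption: an entry with no recorded source file is never stale.
        if self.source_path is None or self.source_mtime is None:
            return False
        try:
            return os.path.getmtime(self.source_path) != self.source_mtime
        except OSError:
            # Assumption: a missing or unreadable source counts as stale.
            return True


# Create a small source file and cache an entry that remembers its mtime.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as fh:
    fh.write(b"a,b\n1,2\n")
    path = fh.name

entry = SketchEntry(data=[1, 2], key="demo", source_path=path,
                    source_mtime=os.path.getmtime(path))
fresh_before = entry.is_stale()  # file untouched since it was cached

# Simulate an edit by moving the file's modification time forward.
new_time = entry.source_mtime + 10
os.utime(path, (new_time, new_time))
stale_after = entry.is_stale()

os.remove(path)
```

Comparing modification times is cheap but coarse: it misses edits that preserve the timestamp, so a content hash would be a stricter (and slower) alternative.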
- class nirs4all.data.performance.DataCache(max_size_mb: float = 500, max_entries: int = 100, ttl_seconds: float | None = None)
Bases: object
LRU cache for loaded data.
Provides in-memory caching with:
- Configurable size limits
- LRU eviction policy
- File modification detection
- Thread-safe access
- Cache statistics
Example
```python
from nirs4all.data.performance import DataCache

cache = DataCache(max_size_mb=500)

# Store data
cache.set("my_data", numpy_array, source_path="/path/to/file.csv")

# Retrieve data
data = cache.get("my_data")

# With automatic loading
data = cache.get_or_load("key", lambda: load_expensive_data())

# Check stats
print(cache.stats())
```
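To make the size limits and LRU eviction policy concrete, here is a minimal sketch of one common way to build such a cache on top of `collections.OrderedDict`. `TinyLRU` and its explicit `size_bytes` argument are hypothetical and for illustration only; this is not how `DataCache` is necessarily implemented, and it leaves out locking, TTL expiry, and statistics.

```python
from collections import OrderedDict
from typing import Any, Optional


class TinyLRU:
    """Hypothetical LRU cache bounded by entry count and total bytes."""

    def __init__(self, max_bytes: int, max_entries: int) -> None:
        self.max_bytes = max_bytes
        self.max_entries = max_entries
        self._items: "OrderedDict[str, tuple]" = OrderedDict()
        self._total = 0

    def set(self, key: str, value: Any, size_bytes: int) -> None:
        # Replacing a key first releases the bytes of the old value.
        if key in self._items:
            self._total -= self._items.pop(key)[1]
        self._items[key] = (value, size_bytes)
        self._total += size_bytes
        # Evict from the least recently used end until both limits hold.
        # A single item larger than max_bytes ends up evicting itself.
        while self._items and (
            len(self._items) > self.max_entries or self._total > self.max_bytes
        ):
            _, (_, evicted_size) = self._items.popitem(last=False)
            self._total -= evicted_size

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]


cache = TinyLRU(max_bytes=100, max_entries=2)
cache.set("a", "A", size_bytes=40)
cache.set("b", "B", size_bytes=40)
cache.get("a")                      # "a" becomes the most recently used key
cache.set("c", "C", size_bytes=40)  # over both limits, so "b" is evicted
```

Reading `"a"` before inserting `"c"` is what saves it: eviction always removes the entry that has gone longest without being read or written.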
- get(key: str) → Any | None
Get data from cache.
- Parameters:
key – Cache key.
- Returns:
Cached data or None if not found.
- get_or_load(key: str, loader: Callable[[], T], source_path: str | None = None) → T
Get from cache or load and cache.
- Parameters:
key – Cache key.
loader – Function to call if not cached.
source_path – Optional source file path.
- Returns:
Cached or newly loaded data.
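The get-or-load pattern can be sketched in a few lines. The module-level `_store` dictionary and the `expensive_load` function below are hypothetical illustrations, not part of nirs4all; the sketch only shows that the loader runs on a miss and is skipped on a hit.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

_store: Dict[str, object] = {}   # hypothetical backing store
calls = {"count": 0}             # counts how often the loader really runs


def get_or_load(key: str, loader: Callable[[], T]) -> T:
    """Return the cached value for key, calling loader only on a miss."""
    if key in _store:
        return _store[key]  # type: ignore[return-value]
    value = loader()
    _store[key] = value
    return value


def expensive_load() -> list:
    calls["count"] += 1
    return [1, 2, 3]


first = get_or_load("spectra", expensive_load)   # miss: loader is called
second = get_or_load("spectra", expensive_load)  # hit: loader is skipped
```

Testing `key in _store` rather than comparing a lookup result with `None` lets a loader legitimately return `None` and still have that result cached.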
- invalidate(key: str) → bool
Remove entry from cache.
- Parameters:
key – Cache key.
- Returns:
True if entry was removed.
- class nirs4all.data.performance.LazyArray(loader: Callable[[], ndarray], shape: Tuple[int, ...] | None = None, dtype: dtype | None = None, source_path: str | None = None)
Bases: object
A lazy-loading array wrapper.
Defers loading until array data is actually accessed. Supports numpy array interface for compatibility.
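The deferral mechanism itself can be sketched with the standard library alone. `SketchLazy` is a hypothetical illustration that wraps nested lists instead of NumPy arrays and omits the NumPy array interface; it only demonstrates that a declared shape can be reported without loading, and that the loader runs once on first data access.

```python
from typing import Callable, List, Optional, Tuple


class SketchLazy:
    """Hypothetical lazy wrapper: the loader runs on first data access."""

    def __init__(
        self,
        loader: Callable[[], List[List[float]]],
        shape: Optional[Tuple[int, ...]] = None,
    ) -> None:
        self._loader = loader
        self._declared_shape = shape
        self._data: Optional[List[List[float]]] = None
        self.load_count = 0  # exposed only to make the behaviour observable

    def _materialize(self) -> List[List[float]]:
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data

    @property
    def shape(self) -> Tuple[int, ...]:
        # A declared shape is returned as-is, without touching the loader.
        if self._declared_shape is not None:
            return self._declared_shape
        data = self._materialize()
        return (len(data), len(data[0]) if data else 0)

    def __getitem__(self, index):
        return self._materialize()[index]


lazy = SketchLazy(lambda: [[float(i)] * 3 for i in range(5)], shape=(5, 3))
shape_before = lazy.shape      # answered from the declared shape
loads_before = lazy.load_count
rows = lazy[0:2]               # first access triggers the loader
last = lazy[4]                 # later accesses reuse the loaded data
loads_after = lazy.load_count
```

One consequence of trusting a declared shape is that it is never checked against the loaded data, so a wrong declaration would go unnoticed in this sketch.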
Example
```python
import numpy as np
from nirs4all.data.performance import LazyArray

# Create lazy array
lazy = LazyArray(
    loader=lambda: np.load("large_file.npy"),
    shape=(10000, 500),
    dtype=np.float32,
)

# Array not loaded yet
print(lazy.shape)  # (10000, 500)

# Triggers loading on first access
data = lazy[0:100]  # Now loads the data
```
- class nirs4all.data.performance.LazyDataset(x_loader: Callable[[], ndarray] | None = None, y_loader: Callable[[], ndarray] | None = None, metadata_loader: Callable[[], Any] | None = None, x_shape: Tuple[int, ...] | None = None, y_shape: Tuple[int, ...] | None = None, name: str = 'dataset')
Bases: object
A lazy-loading dataset wrapper.
Wraps multiple data components (X, y, metadata) as lazy arrays that load on demand.
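Per-component loading on demand can be illustrated with `functools.cached_property`, which runs a method on first attribute access and stores the result on the instance. `SketchLazyDataset` and its loaders are hypothetical; this is one possible way to get the behaviour described above, not the library's actual code.

```python
from functools import cached_property
from typing import Callable, List


class SketchLazyDataset:
    """Hypothetical dataset whose components load independently on demand."""

    def __init__(
        self,
        x_loader: Callable[[], List[List[float]]],
        y_loader: Callable[[], List[float]],
    ) -> None:
        self._x_loader = x_loader
        self._y_loader = y_loader

    @cached_property
    def X(self) -> List[List[float]]:
        return self._x_loader()  # runs once, then cached on the instance

    @cached_property
    def y(self) -> List[float]:
        return self._y_loader()


loaded: List[str] = []  # records which loaders have actually run


def load_x() -> List[List[float]]:
    loaded.append("X")
    return [[0.1, 0.2], [0.3, 0.4]]


def load_y() -> List[float]:
    loaded.append("y")
    return [1.0, 2.0]


ds = SketchLazyDataset(load_x, load_y)
after_init = list(loaded)  # nothing has been loaded yet
features = ds.X            # loads X only
features_again = ds.X      # served from the cached attribute
after_x = list(loaded)
```

Accessing `ds.y` later would trigger `load_y` in the same way, independently of whether `X` has been loaded.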
Example
```python
from nirs4all.data.performance import LazyDataset

# Create from loader functions
dataset = LazyDataset(
    x_loader=lambda: load_features("X.csv"),
    y_loader=lambda: load_targets("Y.csv"),
    metadata_loader=lambda: load_metadata("M.csv"),
)

# Nothing loaded yet
print(dataset.x_shape)  # Returns cached shape if known

# Triggers X loading only
X_data = dataset.X
```