nirs4all.data.performance.cache module

Data caching for dataset loading.

This module provides caching functionality to avoid redundant file loading and improve performance for repeated data access.

Phase 8 Implementation - Dataset Configuration Roadmap Section 8.5: Performance Optimization - Caching

class nirs4all.data.performance.cache.CacheEntry(data: Any, key: str, timestamp: float = <factory>, size_bytes: int = 0, source_path: str | None = None, source_mtime: float | None = None, hit_count: int = 0)[source]

Bases: object

A cached data entry.

data

The cached data.

Type:

Any

key

Cache key.

Type:

str

timestamp

When the data was cached.

Type:

float

size_bytes

Estimated size in bytes.

Type:

int

source_path

Original file path (if applicable).

Type:

str | None

source_mtime

Modification time of source file.

Type:

float | None

hit_count

Number of times this entry was accessed.

Type:

int

data: Any
hit_count: int = 0
is_stale() bool[source]

Check if entry is stale (source file modified).

key: str
size_bytes: int = 0
source_mtime: float | None = None
source_path: str | None = None
timestamp: float
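The staleness check described for `is_stale()` can be illustrated with a minimal sketch. It assumes that staleness is decided by comparing the stored `source_mtime` with the file's current modification time. `SketchEntry` is a hypothetical stand-in, not the library's `CacheEntry`, and treating a missing source file as stale is an assumption of this sketch.

```python
import os
import tempfile
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchEntry:
    """Hypothetical stand-in for CacheEntry; not the library class."""
    data: Any
    key: str
    timestamp: float = field(default_factory=time.time)
    source_path: Optional[str] = None
    source_mtime: Optional[float] = None

    def is_stale(self) -> bool:
        # Without a recorded source file there is nothing to compare against.
        if self.source_path is None or self.source_mtime is None:
            return False
        try:
            current = os.path.getmtime(self.source_path)
        except OSError:
            # Assumption: a missing or unreadable source counts as stale.
            return True
        return current != self.source_mtime


# Demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    path = fh.name

entry = SketchEntry(data=[1, 2, 3], key="k", source_path=path,
                    source_mtime=os.path.getmtime(path))
fresh = entry.is_stale()

# Simulate a later modification by bumping the file's mtime.
later = entry.source_mtime + 10
os.utime(path, (later, later))
modified = entry.is_stale()

os.remove(path)
deleted = entry.is_stale()
```

In this sketch an entry without a `source_path` is never reported as stale, which matches the "if applicable" wording of the attribute but remains an assumption about the real behavior.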
class nirs4all.data.performance.cache.DataCache(max_size_mb: float = 500, max_entries: int = 100, ttl_seconds: float | None = None)[source]

Bases: object

LRU cache for loaded data.

Provides in-memory caching with:
  • Configurable size limits

  • LRU eviction policy

  • File modification detection

  • Thread-safe access

  • Cache statistics

Example

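```python
import numpy as np
from nirs4all.data.performance.cache import DataCache

cache = DataCache(max_size_mb=500)

# Store data (numpy_array is placeholder data)
numpy_array = np.arange(10.0)
cache.set("my_data", numpy_array, source_path="/path/to/file.csv")

# Retrieve data
data = cache.get("my_data")

# With automatic loading (load_expensive_data is a placeholder loader)
def load_expensive_data():
    return np.arange(1000.0)

data = cache.get_or_load("key", lambda: load_expensive_data())

# Check stats
print(cache.stats())
```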
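To make the LRU eviction and size limits listed above concrete, here is a minimal sketch built on `collections.OrderedDict` with a lock for thread safety. `SketchLRU` and its explicit `size_bytes` argument are hypothetical. This is one common way to build such a cache, not necessarily how `DataCache` is implemented.

```python
import threading
from collections import OrderedDict
from typing import Any, Optional


class SketchLRU:
    """Hypothetical LRU cache bounded by entry count and total bytes."""

    def __init__(self, max_bytes: int, max_entries: int) -> None:
        self.max_bytes = max_bytes
        self.max_entries = max_entries
        self._items: "OrderedDict[str, tuple]" = OrderedDict()
        self._total = 0
        self._lock = threading.RLock()

    def set(self, key: str, data: Any, size_bytes: int) -> None:
        with self._lock:
            # Replacing a key first releases the bytes of the old value.
            if key in self._items:
                self._total -= self._items.pop(key)[1]
            self._items[key] = (data, size_bytes)
            self._total += size_bytes
            # Evict from the least recently used end until within limits.
            while self._items and (len(self._items) > self.max_entries
                                   or self._total > self.max_bytes):
                _, (_, freed) = self._items.popitem(last=False)
                self._total -= freed

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            if key not in self._items:
                return None
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key][0]


cache = SketchLRU(max_bytes=100, max_entries=2)
cache.set("a", "A", 10)
cache.set("b", "B", 10)
cache.get("a")           # "a" becomes most recently used
cache.set("c", "C", 10)  # exceeds max_entries, so "b" is evicted
```

Note that in this sketch a single item larger than `max_bytes` is evicted immediately after insertion. Whether the real cache rejects or stores such items is not stated in this reference.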

clear() None[source]

Clear all cached data.

get(key: str) Any | None[source]

Get data from cache.

Parameters:

key – Cache key.

Returns:

Cached data or None if not found.

get_or_load(key: str, loader: Callable[[], T], source_path: str | None = None) T[source]

Return the cached data for key if present; otherwise call loader, cache its result, and return it.

Parameters:
  • key – Cache key.

  • loader – Function to call if not cached.

  • source_path – Optional source file path.

Returns:

Cached or newly loaded data.
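The get-or-load pattern can be sketched in a few lines. The module-level dictionary `_store`, the function `sketch_get_or_load`, and `slow_loader` are hypothetical names used only for illustration. The sketch omits the `source_path` handling and locking that the real method may perform.

```python
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")
_store: Dict[str, Any] = {}   # hypothetical backing store
calls = []                    # records how often the loader runs


def sketch_get_or_load(key: str, loader: Callable[[], T]) -> T:
    # Membership test (rather than comparing with None) lets a cached
    # None value count as a hit.
    if key in _store:
        return _store[key]
    value = loader()
    _store[key] = value
    return value


def slow_loader() -> list:
    calls.append(1)
    return [1.0, 2.0, 3.0]


first = sketch_get_or_load("spectra", slow_loader)
second = sketch_get_or_load("spectra", slow_loader)  # served from the store
```

The loader runs only on the first call; the second call returns the same object from the store.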

invalidate(key: str) bool[source]

Remove entry from cache.

Parameters:

key – Cache key.

Returns:

True if entry was removed.

set(key: str, data: Any, source_path: str | None = None) None[source]

Store data in cache.

Parameters:
  • key – Cache key.

  • data – Data to cache.

  • source_path – Optional source file path for staleness detection.

stats() Dict[str, Any][source]

Get cache statistics.

Returns:

Dictionary with cache statistics.

nirs4all.data.performance.cache.cache_manager(max_size_mb: float = 500) DataCache[source]

Get or create the global cache instance.

Parameters:

max_size_mb – Maximum cache size in MB (only used when the global instance is first created).

Returns:

DataCache instance.
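The "get or create" behavior, including the fact that `max_size_mb` only matters on first creation, can be sketched as a lazily initialized singleton. `SketchCache`, `_state`, and `sketch_cache_manager` are hypothetical stand-ins. The lock is an assumption made to keep creation safe across threads.

```python
import threading
from typing import Dict, Optional


class SketchCache:
    """Hypothetical stand-in for DataCache."""

    def __init__(self, max_size_mb: float = 500) -> None:
        self.max_size_mb = max_size_mb


# Holder for the single shared instance (None until first use).
_state: Dict[str, Optional[SketchCache]] = {"cache": None}
_state_lock = threading.Lock()


def sketch_cache_manager(max_size_mb: float = 500) -> SketchCache:
    with _state_lock:
        if _state["cache"] is None:
            # The size argument is consulted only on this first call.
            _state["cache"] = SketchCache(max_size_mb=max_size_mb)
        return _state["cache"]


first = sketch_cache_manager(max_size_mb=200)
second = sketch_cache_manager(max_size_mb=999)  # ignored: instance exists
```

A consequence of this design is that callers who need a different size limit should construct their own cache instance instead of relying on the shared one.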

nirs4all.data.performance.cache.make_cache_key(path: str | Path, params: Dict[str, Any] | None = None) str[source]

Create a cache key from path and parameters.

Parameters:
  • path – File path.

  • params – Loading parameters.

Returns:

Hash-based cache key.
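One plausible way to derive a hash-based key from a path and loading parameters is sketched below. The use of JSON serialization and SHA-256 is an assumption; the reference does not state which hash or encoding `make_cache_key` uses, and `sketch_cache_key` is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union


def sketch_cache_key(path: Union[str, Path],
                     params: Optional[Dict[str, Any]] = None) -> str:
    payload = {
        "path": str(Path(path)),
        "params": params or {},
    }
    # sort_keys makes the key independent of dict insertion order;
    # default=str is a fallback for values JSON cannot encode.
    text = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


k1 = sketch_cache_key("data/x.csv", {"sep": ";", "header": 0})
k2 = sketch_cache_key(Path("data/x.csv"), {"header": 0, "sep": ";"})
k3 = sketch_cache_key("data/x.csv", {"sep": ","})
```

Here `k1` and `k2` are equal because the path is normalized through `Path` and the parameters are serialized in sorted order, while `k3` differs because the parameters differ. Including the parameters in the key keeps the same file loaded with different options from colliding in the cache.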