nirs4all.pipeline.storage.artifacts.artifact_persistence module
Central Artifact Serializer - Framework-aware object persistence
Provides content-addressed storage for ML artifacts with automatic framework detection. Supports sklearn, TensorFlow, PyTorch, XGBoost, CatBoost, and generic objects.
- Architecture:
Content-addressed storage using SHA256 hashing
Framework-specific serialization formats
Deduplication via hash-based storage
Git-style sharded directories (hash[:2]/hash.ext)
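The layout above can be illustrated with a minimal sketch using only the standard library. The helper name `sharded_relpath` and the `pkl` extension are hypothetical and not part of this module; the sketch assumes the hash is a lowercase hexadecimal SHA256 digest.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_relpath(data: bytes, ext: str) -> PurePosixPath:
    """Build a git-style relative path: <hash[:2]>/<hash>.<ext> (hypothetical helper)."""
    # Assumption: the content address is the hex SHA256 digest of the serialized bytes.
    digest = hashlib.sha256(data).hexdigest()
    # The first two hex characters select the shard directory.
    return PurePosixPath(digest[:2]) / f"{digest}.{ext}"


# Identical bytes always map to the same path, which is what makes deduplication possible.
first = sharded_relpath(b"abc", "pkl")
second = sharded_relpath(b"abc", "pkl")
print(first == second)
print(first.parent)
```

Because the path depends only on the content, persisting the same serialized object twice would target the same file rather than creating a second copy.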
- class nirs4all.pipeline.storage.artifacts.artifact_persistence.ArtifactMeta
Bases: TypedDict
Metadata for a persisted artifact.
- nirs4all.pipeline.storage.artifacts.artifact_persistence.compute_hash(data: bytes) → str
Compute SHA256 hash of data.
- nirs4all.pipeline.storage.artifacts.artifact_persistence.from_bytes(data: bytes, format: str) → Any
Deserialize object from bytes based on format.
- Parameters:
data – Serialized bytes
format – Format string from artifact metadata
- Returns:
Deserialized object
- nirs4all.pipeline.storage.artifacts.artifact_persistence.get_artifact_size(artifact_meta: ArtifactMeta, results_dir: str | Path) → int
Get the actual size of an artifact file on disk.
- nirs4all.pipeline.storage.artifacts.artifact_persistence.is_serializable(obj: Any) → bool
Check if an object can be serialized.
- Parameters:
obj – Object to check
- Returns:
True if serializable, False otherwise
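One common way to implement such a check is to attempt serialization and report whether it succeeded. The sketch below is an assumption about the general technique, not this module's actual logic: it uses `pickle` only, whereas the real function may consult framework-specific serializers. The name `is_serializable_sketch` is hypothetical.

```python
import pickle
import threading
from typing import Any


def is_serializable_sketch(obj: Any) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise (illustrative only)."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises different exception types depending on the object,
        # so any failure is treated as "not serializable".
        return False
    return True


plain_ok = is_serializable_sketch({"coef": [0.1, 0.2]})
lock_ok = is_serializable_sketch(threading.Lock())  # thread locks cannot be pickled
print(plain_ok, lock_ok)
```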
- nirs4all.pipeline.storage.artifacts.artifact_persistence.load(artifact_meta: ArtifactMeta, results_dir: str | Path, binaries_dir: str | Path | None = None) → Any
Load object from artifact metadata.
- Parameters:
artifact_meta – Artifact metadata dictionary
results_dir – Path to run directory
binaries_dir – Optional path to centralized binaries directory
- Returns:
Deserialized object
- Raises:
FileNotFoundError – If artifact file doesn’t exist
ValueError – If artifact cannot be deserialized
- nirs4all.pipeline.storage.artifacts.artifact_persistence.persist(obj: Any, artifacts_dir: str | Path, name: str, format_hint: str | None = None, branch_id: int | None = None, branch_name: str | None = None) → ArtifactMeta
Persist object to _binaries storage with meaningful names.
- Parameters:
obj – Object to persist
artifacts_dir – Path to the run's _binaries/ directory
name – Artifact name (e.g., “scaler”, “model”)
format_hint – Optional format hint (‘sklearn’, ‘tensorflow’, etc.)
branch_id – Optional branch ID for pipeline branching
branch_name – Optional human-readable branch name
- Returns:
ArtifactMeta with hash, path, format, size, and branch info
- Raises:
ValueError – If object cannot be serialized
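The persist and load contract can be sketched as a round trip through a content-addressed directory. Everything below is a stand-in built on the standard library: `SketchMeta`, `persist_sketch`, and `load_sketch` are hypothetical names, the metadata keys are assumed from the "hash, path, format, size" description above, and `pickle` replaces the framework-specific formats. The real file naming (which incorporates meaningful names) and branch fields are not modeled.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, TypedDict


class SketchMeta(TypedDict):
    """Hypothetical stand-in for ArtifactMeta; the real keys may differ."""
    hash: str
    name: str
    path: str
    format: str
    size: int


def persist_sketch(obj: Any, artifacts_dir: Path, name: str) -> SketchMeta:
    """Serialize obj, store it under its SHA256 hash, and return metadata."""
    data = pickle.dumps(obj)
    digest = hashlib.sha256(data).hexdigest()
    rel = Path(digest[:2]) / f"{digest}.pkl"
    target = artifacts_dir / rel
    if not target.exists():  # identical content is written only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return SketchMeta(hash=digest, name=name, path=rel.as_posix(),
                      format="pickle", size=len(data))


def load_sketch(meta: SketchMeta, artifacts_dir: Path) -> Any:
    """Read the file named in the metadata and deserialize it."""
    target = artifacts_dir / meta["path"]
    if not target.exists():
        raise FileNotFoundError(f"Artifact not found: {target}")
    return pickle.loads(target.read_bytes())
```

In this sketch, persisting two equal objects under different names yields two metadata records that point to one file on disk, which mirrors the deduplication described in the module overview.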
- nirs4all.pipeline.storage.artifacts.artifact_persistence.to_bytes(obj: Any, format_hint: str | None = None) → Tuple[bytes, str]
Serialize object to bytes using appropriate format.
- Parameters:
obj – Object to serialize
format_hint – Optional format override (‘sklearn’, ‘tensorflow’, etc.)
- Returns:
(bytes, format_string) tuple
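The pairing of to_bytes and from_bytes relies on the returned format string: whatever format was used to serialize must be passed back to deserialize. A minimal sketch of that dispatch follows, assuming two hypothetical format names ("json" and "pickle") in place of the framework formats listed above; the function names and the auto-detection rule are illustrative and not taken from this module.

```python
import json
import pickle
from typing import Any, Optional, Tuple


def to_bytes_sketch(obj: Any, format_hint: Optional[str] = None) -> Tuple[bytes, str]:
    """Serialize obj and return (bytes, format_string)."""
    # Assumption: without a hint, pick a format from the object's type.
    fmt = format_hint or ("json" if isinstance(obj, (dict, list)) else "pickle")
    if fmt == "json":
        return json.dumps(obj, sort_keys=True).encode("utf-8"), fmt
    if fmt == "pickle":
        return pickle.dumps(obj), fmt
    raise ValueError(f"Unknown format: {fmt}")


def from_bytes_sketch(data: bytes, format: str) -> Any:
    """Deserialize data using the format string recorded at serialization time."""
    if format == "json":
        return json.loads(data.decode("utf-8"))
    if format == "pickle":
        return pickle.loads(data)
    raise ValueError(f"Unknown format: {format}")


payload, fmt = to_bytes_sketch({"n_components": 10})
restored = from_bytes_sketch(payload, fmt)
print(fmt, restored)
```

Storing the format string alongside the bytes, as the artifact metadata does, means the loader never has to guess how a file was written.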