nirs4all.pipeline.storage.artifacts.artifact_persistence module

Central Artifact Serializer - Framework-aware object persistence

Provides content-addressed storage for ML artifacts with automatic framework detection. Supports sklearn, TensorFlow, PyTorch, XGBoost, CatBoost, and generic objects.

Architecture:
  • Content-addressed storage using SHA256 hashing

  • Framework-specific serialization formats

  • Deduplication via hash-based storage

  • Git-style sharded directories (hash[:2]/hash.ext)
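The storage layout described above can be illustrated with a short standard-library sketch. It is not the nirs4all implementation: the helper name `store_blob` and the `bin` extension are hypothetical, and the only assumption taken from the list is that the hex SHA256 digest names the file and its first two characters name the shard directory. The real module may also embed the artifact name in the file name.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(data: bytes, root: Path, ext: str = "bin") -> Path:
    """Hypothetical helper: write data to root/<hash[:2]>/<hash>.<ext>."""
    digest = hashlib.sha256(data).hexdigest()
    target = root / digest[:2] / f"{digest}.{ext}"
    # Identical content always maps to the same path, so a second
    # write of the same bytes is skipped (deduplication).
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store_blob(b"fitted scaler bytes", root)
    second = store_blob(b"fitted scaler bytes", root)
    same_path = first == second
    file_count = sum(1 for p in root.rglob("*") if p.is_file())
    shard_name = first.parent.name
    stem = first.stem
```

Storing the same bytes twice yields one file on disk, which is the practical benefit of addressing artifacts by content rather than by name.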

class nirs4all.pipeline.storage.artifacts.artifact_persistence.ArtifactMeta

Bases: TypedDict

Metadata for a persisted artifact.

branch_id: int | None
branch_name: str | None
format: str
format_version: str
hash: str
name: str
nirs4all_version: str
path: str
saved_at: str
size: int
step: int
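To make the shape of this dictionary concrete, the sketch below re-declares the listed fields in a stand-in `TypedDict` and fills in one example record. The class name `ArtifactMetaSketch` and every value in `example` are hypothetical; in particular, the format string `"pickle"`, the version strings, and the path layout are assumptions, since the valid values are not documented here.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional, TypedDict


class ArtifactMetaSketch(TypedDict):
    """Illustrative stand-in mirroring the fields of ArtifactMeta."""

    branch_id: Optional[int]
    branch_name: Optional[str]
    format: str
    format_version: str
    hash: str
    name: str
    nirs4all_version: str
    path: str
    saved_at: str
    size: int
    step: int


payload = b"example serialized artifact"  # placeholder bytes
digest = hashlib.sha256(payload).hexdigest()

example: ArtifactMetaSketch = {
    "branch_id": None,                 # no pipeline branching in this example
    "branch_name": None,
    "format": "pickle",                # assumed format string
    "format_version": "0",             # placeholder
    "hash": digest,
    "name": "scaler",
    "nirs4all_version": "0.0.0",       # placeholder
    "path": f"{digest[:2]}/{digest}.pkl",  # assumed sharded layout
    "saved_at": datetime.now(timezone.utc).isoformat(),
    "size": len(payload),
    "step": 1,
}
```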

nirs4all.pipeline.storage.artifacts.artifact_persistence.compute_hash(data: bytes) → str

Compute SHA256 hash of data.

nirs4all.pipeline.storage.artifacts.artifact_persistence.from_bytes(data: bytes, format: str) → Any

Deserialize object from bytes based on format.

Parameters:
  • data – Serialized bytes

  • format – Format string from artifact metadata

Returns:

Deserialized object

nirs4all.pipeline.storage.artifacts.artifact_persistence.get_artifact_size(artifact_meta: ArtifactMeta, results_dir: str | Path) → int

Get the actual size of an artifact file on disk.
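A rough sketch of what such a lookup might involve, assuming the `path` field of the metadata is relative to some base directory. How the real function resolves `path` against `results_dir` is not stated here, so the helper `size_on_disk` and the directory layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Union


def size_on_disk(meta: dict, base_dir: Union[str, Path]) -> int:
    """Hypothetical helper: stat the file that meta["path"] points to."""
    return (Path(base_dir) / meta["path"]).stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    blob = Path(tmp) / "ab" / "abcdef.bin"
    blob.parent.mkdir()
    blob.write_bytes(b"x" * 128)

    meta = {"path": "ab/abcdef.bin", "size": 128}
    actual = size_on_disk(meta, tmp)
    # Comparing against the recorded size can reveal truncated or replaced files.
    matches_recorded = actual == meta["size"]
```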

nirs4all.pipeline.storage.artifacts.artifact_persistence.is_serializable(obj: Any) → bool

Check if an object can be serialized.

Parameters:

obj – Object to check

Returns:

True if serializable, False otherwise
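One common way to implement a check like this is to attempt serialization and report whether it succeeded. The sketch below does that with `pickle` only; the helper name `can_pickle` is hypothetical, and the real function may instead apply framework-specific rules.

```python
import pickle
import threading
from typing import Any


def can_pickle(obj: Any) -> bool:
    """Hypothetical check: True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


plain_ok = can_pickle({"mean": 0.5, "scale": 2.0})
lock_ok = can_pickle(threading.Lock())  # thread locks cannot be pickled
```

A trial serialization costs as much as a real one, so for large models it may be cheaper to call the persisting function directly and handle the error.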

nirs4all.pipeline.storage.artifacts.artifact_persistence.load(artifact_meta: ArtifactMeta, results_dir: str | Path, binaries_dir: str | Path | None = None) → Any

Load object from artifact metadata.

Parameters:
  • artifact_meta – Artifact metadata dictionary

  • results_dir – Path to run directory

  • binaries_dir – Optional path to centralized binaries directory

Returns:

Deserialized object

Raises:

nirs4all.pipeline.storage.artifacts.artifact_persistence.persist(obj: Any, artifacts_dir: str | Path, name: str, format_hint: str | None = None, branch_id: int | None = None, branch_name: str | None = None) → ArtifactMeta

Persist object to _binaries storage with meaningful names.

Parameters:
  • obj – Object to persist

  • artifacts_dir – Path to run _binaries/ directory

  • name – Artifact name (e.g., “scaler”, “model”)

  • format_hint – Optional format hint (‘sklearn’, ‘tensorflow’, etc.)

  • branch_id – Optional branch ID for pipeline branching

  • branch_name – Optional human-readable branch name

Returns:

ArtifactMeta with hash, path, format, size, and branch info

Raises:

ValueError – If object cannot be serialized

nirs4all.pipeline.storage.artifacts.artifact_persistence.to_bytes(obj: Any, format_hint: str | None = None) → Tuple[bytes, str]

Serialize object to bytes using appropriate format.

Parameters:
  • obj – Object to serialize

  • format_hint – Optional format override (‘sklearn’, ‘tensorflow’, etc.)

Returns:

(bytes, format_string) tuple
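The pairing of a serializer that returns a format string with a deserializer that dispatches on it can be sketched as follows. This is a simplified stand-in, not the nirs4all code: the function names carry a `_sketch` suffix, the format strings `"pickle"` and `"json"` are assumptions, and the automatic framework detection of the real module is replaced by a fixed default.

```python
import json
import pickle
from typing import Any, Optional, Tuple


def to_bytes_sketch(obj: Any, format_hint: Optional[str] = None) -> Tuple[bytes, str]:
    """Serialize obj and return (bytes, format_string)."""
    fmt = format_hint or "pickle"  # the real module detects the framework here
    if fmt == "json":
        return json.dumps(obj, sort_keys=True).encode("utf-8"), fmt
    if fmt == "pickle":
        return pickle.dumps(obj), fmt
    raise ValueError(f"Unsupported format: {fmt}")


def from_bytes_sketch(data: bytes, format: str) -> Any:
    """Deserialize data according to the format string stored with it."""
    if format == "json":
        return json.loads(data.decode("utf-8"))
    if format == "pickle":
        return pickle.loads(data)
    raise ValueError(f"Unsupported format: {format}")


params = {"n_components": 10, "scale": True}

raw, fmt = to_bytes_sketch(params)
restored = from_bytes_sketch(raw, fmt)

raw_json, fmt_json = to_bytes_sketch(params, format_hint="json")
restored_json = from_bytes_sketch(raw_json, fmt_json)
```

Returning the format string alongside the bytes lets it be stored in the artifact metadata, so the reader never has to guess which deserializer to use.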