nirs4all.pipeline.storage.artifacts.types module
Artifact type definitions for the V3 artifacts system.
This module defines the core data structures for artifact management:
- ArtifactType: Enum for artifact classification
- ArtifactRecord: Complete artifact metadata for manifest storage
The V3 artifacts system uses operator chains for complete execution path tracking, enabling deterministic artifact IDs that work correctly with branching, multi-source, stacking, and cross-validation.
Key V3 improvements:
- OperatorChain tracking for the full execution path
- Source index tracking for multi-source pipelines
- Chain-hash-based artifact IDs for deterministic identification
- Unified handling of all edge cases (branching, stacking, bundles)
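As a rough illustration of why chain-hash IDs are deterministic, the sketch below hashes an operator chain path string with SHA-256 and assembles an ID in the documented "{pipeline_id}${chain_hash}:{fold_id}" shape. The hash function, the 12-character truncation, the "all" fold placeholder for shared artifacts, and the chain path strings are all assumptions; chain_hash and make_artifact_id are hypothetical helpers, not nirs4all functions.

```python
import hashlib
from typing import Optional


def chain_hash(chain_path: str, length: int = 12) -> str:
    """Hash an operator chain path string into a short, deterministic hex digest."""
    # SHA-256 is deterministic: the same chain path always yields the same digest.
    digest = hashlib.sha256(chain_path.encode("utf-8")).hexdigest()
    return digest[:length]


def make_artifact_id(pipeline_id: str, chain_path: str, fold_id: Optional[int]) -> str:
    """Build an ID in the "{pipeline_id}${chain_hash}:{fold_id}" shape (hypothetical helper)."""
    # Assumption: shared (non-fold) artifacts use "all" in the fold slot.
    fold = "all" if fold_id is None else str(fold_id)
    return f"{pipeline_id}${chain_hash(chain_path)}:{fold}"


# Hypothetical chain path strings; the real chain_path syntax may differ.
a = make_artifact_id("0001_pls", "s1.MinMaxScaler>s3.PLSRegression", fold_id=0)
b = make_artifact_id("0001_pls", "s1.MinMaxScaler>s3.PLSRegression", fold_id=0)
c = make_artifact_id("0001_pls", "s1.StandardScaler>s3.PLSRegression", fold_id=0)
```

Re-running the same chain gives the same ID (a == b), while changing any operator in the chain changes the hash (a != c), which is the property that lets branched, stacked, and per-fold artifacts be looked up again without a registry of random IDs.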
- class nirs4all.pipeline.storage.artifacts.types.ArtifactRecord(artifact_id: str, content_hash: str, path: str, chain_path: str = '', source_index: int | None = None, pipeline_id: str = '', branch_path: List[int] = <factory>, step_index: int = 0, substep_index: int | None = None, fold_id: int | None = None, artifact_type: ArtifactType = ArtifactType.MODEL, class_name: str = '', custom_name: str = '', depends_on: List[str] = <factory>, format: str = 'joblib', format_version: str = '', nirs4all_version: str = '', size_bytes: int = 0, created_at: str = <factory>, params: Dict[str, Any] = <factory>, meta_config: MetaModelConfig | None = None, version: int = 3)
Bases: object
Complete artifact metadata for manifest storage (V3).
This record contains all metadata needed to:
- Uniquely identify an artifact via its operator chain
- Load the artifact from centralized storage
- Resolve dependencies for stacking/transfer
- Track serialization format and library versions
- V3 format:
  - artifact_id: "{pipeline_id}${chain_hash}:{fold_id}"
  - chain_path: Full operator chain path string
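- Attributes:
  - artifact_id (str): Unique, deterministic ID based on the chain hash. Format: "{pipeline_id}${chain_hash}:{fold_id}"
  - Chain tracking: chain_path, source_index
  - Context: pipeline_id, branch_path, step_index, substep_index, fold_id
  - Classification: artifact_type (type classification: model, transformer, etc.), class_name, custom_name
  - Dependencies: depends_on
  - Serialization: format, format_version, nirs4all_version, size_bytes
  - Metadata: created_at, params, meta_config (configuration for meta-models), version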
- artifact_type: ArtifactType = 'model'
- property chain_hash: str
Get chain hash from artifact ID (V3 format).
- Returns:
Chain hash portion of the artifact ID, or an empty string if the ID is not in V3 format
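The chain_hash property can be understood as splitting the ID on its separators. A minimal sketch, assuming "$" and the final ":" from the V3 format are the only delimiters that matter and that pipeline IDs never contain "$"; chain_hash_from_id is a hypothetical standalone function, not the library's implementation.

```python
def chain_hash_from_id(artifact_id: str) -> str:
    """Extract the chain hash from "{pipeline_id}${chain_hash}:{fold_id}"."""
    _pipeline_id, dollar, rest = artifact_id.partition("$")
    if not dollar:
        return ""  # no "$" separator: not a V3 ID
    # rpartition splits on the last ":", so the fold suffix is peeled off the end.
    chain_hash, colon, _fold = rest.rpartition(":")
    if not colon:
        return ""  # no ":{fold_id}" suffix
    return chain_hash
```

Returning an empty string instead of raising keeps the property safe to call on IDs written by older manifest formats.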
- classmethod from_dict(data: Dict[str, Any]) → ArtifactRecord
Create ArtifactRecord from dictionary.
- Parameters:
data – Dictionary from YAML manifest
- Returns:
ArtifactRecord instance
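One common way to build a dataclass from a YAML-loaded dictionary is to keep only the keys that match declared fields. The sketch shows that pattern on a cut-down, hypothetical RecordSketch class. Whether ArtifactRecord.from_dict itself ignores unknown keys, converts artifact_type strings to the enum, or migrates older manifest versions is not stated here, so treat all of that as an assumption.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class RecordSketch:
    """Hypothetical, cut-down stand-in for ArtifactRecord."""

    artifact_id: str
    content_hash: str
    path: str
    branch_path: List[int] = field(default_factory=list)
    fold_id: Optional[int] = None
    version: int = 3

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RecordSketch":
        # Keep only keys that correspond to declared dataclass fields;
        # missing optional keys fall back to their defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})


# Hypothetical manifest entry, e.g. as returned by a YAML loader.
entry = {
    "artifact_id": "0001_pls$abc123def456:0",
    "content_hash": "sha256:abc123",
    "path": "artifacts/abc123.joblib",
    "fold_id": 0,
    "unknown_key": "ignored",
}
record = RecordSketch.from_dict(entry)
round_tripped = RecordSketch.from_dict(asdict(record))
```

Filtering on declared fields lets newer manifests carry extra keys without breaking older readers, at the cost of silently dropping data the reader does not know about.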
- get_branch_path_str() → str
Get branch path as string.
- Returns:
Colon-separated branch indices or empty string
- get_fold_str() → str
Get the fold ID as a string.
- Returns:
Fold ID as a string, or "all" for shared artifacts
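A plausible reading of the two string helpers, assuming branch_path is a list of ints and fold_id is None for artifacts shared across folds. The standalone functions are hypothetical sketches of the documented return values, not the methods themselves.

```python
from typing import List, Optional


def branch_path_str(branch_path: List[int]) -> str:
    # [0, 2] -> "0:2"; an empty path (no branching) -> ""
    return ":".join(str(index) for index in branch_path)


def fold_str(fold_id: Optional[int]) -> str:
    # None marks an artifact shared across folds; fold 0 is still a real fold.
    return "all" if fold_id is None else str(fold_id)
```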
- property is_branch_specific: bool
Check if artifact is branch-specific.
- Returns:
True if artifact belongs to a specific branch path
- property is_fold_specific: bool
Check if artifact is fold-specific.
- Returns:
True if artifact belongs to a specific CV fold
- property is_meta_model: bool
Check if artifact is a meta-model.
- Returns:
True if artifact is a stacking meta-model
- property is_source_specific: bool
Check if artifact is source-specific.
- Returns:
True if artifact belongs to a specific source in a multi-source pipeline
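The four is_* properties can be read as simple checks on the context fields. This sketch assumes that an empty branch_path and a None source_index or fold_id mean "not specific", and that meta-models are recognized by their artifact_type value; ContextSketch is a hypothetical stand-in for ArtifactRecord.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextSketch:
    """Hypothetical stand-in holding only the fields the is_* checks need."""

    branch_path: List[int] = field(default_factory=list)
    source_index: Optional[int] = None
    fold_id: Optional[int] = None
    artifact_type: str = "model"

    @property
    def is_branch_specific(self) -> bool:
        # A non-empty branch path pins the artifact to one branch.
        return bool(self.branch_path)

    @property
    def is_fold_specific(self) -> bool:
        # Compare against None, not truthiness: fold 0 is a real fold.
        return self.fold_id is not None

    @property
    def is_source_specific(self) -> bool:
        # Same reasoning: source 0 is a real source.
        return self.source_index is not None

    @property
    def is_meta_model(self) -> bool:
        return self.artifact_type == "meta_model"
```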
- matches_context(step_index: int | None = None, branch_path: List[int] | None = None, source_index: int | None = None, fold_id: int | None = None) → bool
Check if artifact matches a given context.
- Parameters:
step_index – Step to match (None = any)
branch_path – Branch path to match (None = any)
source_index – Source index to match (None = any)
fold_id – Fold ID to match (None = any)
- Returns:
True if artifact matches all specified filters
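The "None = any" semantics of matches_context amount to a filter that skips unset criteria. A minimal sketch over a plain dictionary; using a dict instead of an ArtifactRecord is an assumption made to keep the example self-contained, and the function is hypothetical.

```python
from typing import Any, Dict, List, Optional


def matches_context(
    record: Dict[str, Any],
    step_index: Optional[int] = None,
    branch_path: Optional[List[int]] = None,
    source_index: Optional[int] = None,
    fold_id: Optional[int] = None,
) -> bool:
    """Return True if every filter that is not None equals the record's value."""
    filters = {
        "step_index": step_index,
        "branch_path": branch_path,
        "source_index": source_index,
        "fold_id": fold_id,
    }
    # A None filter means "any value", so it is skipped entirely.
    return all(
        record.get(key) == wanted
        for key, wanted in filters.items()
        if wanted is not None
    )


record = {"step_index": 3, "branch_path": [0, 1], "source_index": None, "fold_id": 2}
```

Under these semantics the filter cannot select only shared artifacts (fold_id is None), because passing None disables the fold check; that case needs a separate test such as not is_fold_specific.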
- meta_config: MetaModelConfig | None = None
- property short_hash: str
Get a short version of the content hash for use in filenames.
- Returns:
First 12 characters of the hash (after the sha256 prefix, if present)
- Return type:
str
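A sketch of the truncation described for short_hash, assuming the optional prefix is spelled literally as "sha256:" (the exact prefix string is an assumption; short_hash here is a hypothetical free function).

```python
SHA256_PREFIX = "sha256:"  # assumed prefix spelling


def short_hash(content_hash: str, length: int = 12) -> str:
    # Drop the algorithm prefix if present, then keep the first `length` characters.
    if content_hash.startswith(SHA256_PREFIX):
        content_hash = content_hash[len(SHA256_PREFIX):]
    return content_hash[:length]
```

Stripping the prefix first means prefixed and bare hashes of the same content produce the same filename stem.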
- class nirs4all.pipeline.storage.artifacts.types.ArtifactType(value)
Classification of artifact types.
Each type has specific handling:
- model: Trained ML models (sklearn, tensorflow, pytorch, etc.)
- transformer: Fitted preprocessors (scalers, feature extractors)
- splitter: Train/test split configuration (for reproducibility)
- encoder: Label encoders, y-scalers
- meta_model: Stacking meta-models with source model dependencies
- ENCODER = 'encoder'
- META_MODEL = 'meta_model'
- MODEL = 'model'
- SPLITTER = 'splitter'
- TRANSFORMER = 'transformer'
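To reproduce the classification locally, the enum below mirrors the five documented values. It assumes the real enum mixes in str, which would explain why the artifact_type default renders as 'model'; that mixin is an inference, and ArtifactTypeSketch is a hypothetical name.

```python
from enum import Enum


class ArtifactTypeSketch(str, Enum):
    """Local mirror of the documented ArtifactType values (str mixin assumed)."""

    MODEL = "model"
    TRANSFORMER = "transformer"
    SPLITTER = "splitter"
    ENCODER = "encoder"
    META_MODEL = "meta_model"


# Manifest strings map back to members by value.
kind = ArtifactTypeSketch("transformer")
```

With a str mixin, members compare equal to their plain string values, so a value read back from a YAML manifest can be checked without an explicit conversion.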
- class nirs4all.pipeline.storage.artifacts.types.MetaModelConfig(source_models: List[Dict[str, Any]] = <factory>, feature_columns: List[str] = <factory>)
Bases: object
Configuration for meta-model source tracking.
Stores the ordered source models that feed into a stacking meta-model, along with their feature column mapping.
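A minimal sketch of the documented shape, with the kind of dictionary round trip a YAML manifest would need. The List[Dict[str, Any]] type for source_models, the entries inside it (here dicts with an artifact_id key), and the from_dict helper are hypothetical; only the two field names and the ordering come from the description above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class MetaModelConfigSketch:
    """Hypothetical mirror of MetaModelConfig's two documented fields."""

    # Ordered: list position preserves the order in which sources feed the meta-model.
    source_models: List[Dict[str, Any]] = field(default_factory=list)
    feature_columns: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MetaModelConfigSketch":
        # Copy the lists so the config does not alias the loaded manifest data.
        return cls(
            source_models=[dict(m) for m in data.get("source_models", [])],
            feature_columns=list(data.get("feature_columns", [])),
        )


config = MetaModelConfigSketch(
    source_models=[{"artifact_id": "0001$aaa:all"}, {"artifact_id": "0001$bbb:all"}],
    feature_columns=["pred_0", "pred_1"],
)
restored = MetaModelConfigSketch.from_dict(asdict(config))
```

Keeping source_models as a list rather than a dict makes the ordering explicit, which matters if a stacking meta-model expects its input columns in a fixed order.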