nirs4all.pipeline.storage.artifacts.utils module

Utility functions for artifact identification and path handling (V3).

This module provides helper functions for the V3 artifact system, covering file path handling, content hashing, and artifact ID parsing and validation.

For the core V3 artifact ID functions (compute_chain_hash, generate_artifact_id_v3, parse_artifact_id_v3, is_v3_artifact_id), use the operator_chain module directly.

V3 Artifact ID Format:

“{pipeline_id}${chain_hash}:{fold_id}”

Examples

  • “0001_pls$a1b2c3d4e5f6:all” - Shared artifact

  • “0001_pls$7f8e9d0c1b2a:0” - Fold 0 artifact

  • “0001_pls$3c4d5e6f7a8b:1” - Fold 1 artifact
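To make the format concrete, here is a minimal sketch that splits an ID of this shape into its three parts using only the standard library. It is not the nirs4all implementation: `ParsedIdSketch` and `split_v3_id_sketch` are hypothetical names, and the sketch assumes the fold component is either “all” or a base-10 integer, as in the examples above.

```python
from typing import NamedTuple, Optional


class ParsedIdSketch(NamedTuple):
    """Hypothetical container for illustration; not part of nirs4all."""
    pipeline_id: str
    chain_hash: str
    fold_id: Optional[int]  # None stands for the shared "all" artifact


def split_v3_id_sketch(artifact_id: str) -> ParsedIdSketch:
    # "$" separates the pipeline ID from the rest of the string.
    pipeline_id, dollar, rest = artifact_id.partition("$")
    # The last ":" separates the chain hash from the fold component.
    chain_hash, colon, fold = rest.rpartition(":")
    if not (dollar and colon and pipeline_id and chain_hash and fold):
        raise ValueError(f"not a V3-style artifact ID: {artifact_id!r}")
    # Assumption: any fold component other than "all" is an integer.
    return ParsedIdSketch(pipeline_id, chain_hash, None if fold == "all" else int(fold))


print(split_v3_id_sketch("0001_pls$a1b2c3d4e5f6:all"))
print(split_v3_id_sketch("0001_pls$7f8e9d0c1b2a:0"))
```

For real use, the parsing functions documented on this page and in the operator_chain module are the ones to call; the sketch only illustrates where the separators fall.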

class nirs4all.pipeline.storage.artifacts.utils.ExecutionPath(pipeline_id: str, chain_path: str = '', branch_path: List[int] = None, step_index: int = 0, source_index: int | None = None, fold_id: int | None = None, substep_index: int | None = None)[source]

Bases: object

Represents the execution context for an artifact (V3).

Captures all context needed to uniquely identify an artifact within a pipeline execution.

pipeline_id

Pipeline identifier (e.g., “0001_pls_abc123”)

Type:

str

chain_path

Full operator chain path string

Type:

str

branch_path

List of branch indices for nested branching

Type:

List[int]

step_index

Logical step number within current branch

Type:

int

source_index

Multi-source index (None for single source)

Type:

int | None

fold_id

CV fold identifier (None for shared artifacts)

Type:

int | None

substep_index

Substep index, used when a step holds several operators (e.g., [model1, model2])

Type:

int | None

branch_path: List[int] = None
chain_path: str = ''
fold_id: int | None = None
classmethod from_artifact_id_v3(artifact_id: str, chain_path: str = '') ExecutionPath[source]

Create ExecutionPath from V3 artifact ID string.

Parameters:
  • artifact_id – V3 artifact ID to parse

  • chain_path – Full chain path (required for complete reconstruction)

Returns:

ExecutionPath instance

pipeline_id: str
source_index: int | None = None
step_index: int = 0
substep_index: int | None = None
to_artifact_id() str[source]

Convert execution path to V3 artifact ID string.

Returns:

V3 artifact ID in format “{pipeline_id}${chain_hash}:{fold_id}”

Return type:

str

nirs4all.pipeline.storage.artifacts.utils.artifact_id_matches_context(artifact_id: str, pipeline_id: str | None = None, branch_path: List[int] | None = None, step_index: int | None = None, fold_id: int | None = None) bool[source]

Check if a V3 artifact ID matches a given context.

Partial matching is supported: only the parameters that are specified are checked. Note that branch_path and step_index cannot be matched from a V3 ID alone; matching on them requires ArtifactRecord access.

Parameters:
  • artifact_id – V3 artifact ID to check

  • pipeline_id – Expected pipeline ID (None = don’t check)

  • branch_path – Expected branch path (ignored for V3 - use ArtifactRecord)

  • step_index – Expected step index (ignored for V3 - use ArtifactRecord)

  • fold_id – Expected fold ID (None = don’t check)

Returns:

True if artifact matches specified criteria, False otherwise
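The partial-matching rule can be illustrated with a small standalone sketch. It is an assumption about the behaviour described above, not the library code: `matches_context_sketch` is a hypothetical name, and only the two criteria that the ID itself carries (pipeline ID and fold ID) are compared.

```python
from typing import Optional


def matches_context_sketch(
    artifact_id: str,
    pipeline_id: Optional[str] = None,
    fold_id: Optional[int] = None,
) -> bool:
    # Split "{pipeline_id}${chain_hash}:{fold_id}" into the parts we compare.
    head, _, rest = artifact_id.partition("$")
    _, _, fold_text = rest.rpartition(":")
    # A criterion left as None is skipped, which gives partial matching.
    if pipeline_id is not None and head != pipeline_id:
        return False
    if fold_id is not None and fold_text != str(fold_id):
        return False
    return True


print(matches_context_sketch("0001_pls$7f8e9d0c1b2a:0", pipeline_id="0001_pls"))
print(matches_context_sketch("0001_pls$7f8e9d0c1b2a:0", fold_id=1))
```

With this design, calling the function with no criteria at all returns True, since there is nothing to contradict.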

nirs4all.pipeline.storage.artifacts.utils.compute_content_hash(content: bytes) str[source]

Compute SHA256 hash of binary content.

Parameters:

content – Binary content to hash

Returns:

Full SHA256 hash with “sha256:” prefix

Return type:

str
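A prefixed SHA256 digest of this kind can be produced with the standard hashlib module. The sketch below assumes the digest is rendered as lowercase hexadecimal after the “sha256:” prefix; `content_hash_sketch` is a hypothetical name and not the library function.

```python
import hashlib


def content_hash_sketch(content: bytes) -> str:
    # Assumption: the hash is the hex digest with a literal "sha256:" prefix.
    return "sha256:" + hashlib.sha256(content).hexdigest()


digest = content_hash_sketch(b"serialized model bytes")
print(digest[:7], len(digest))
```

Because the hash depends only on the bytes, two artifacts with identical serialized content map to the same value, which is what makes content-addressed filenames possible.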

nirs4all.pipeline.storage.artifacts.utils.extract_fold_id_from_artifact_id(artifact_id: str) int | None[source]

Extract fold ID from artifact ID (V2 or V3).

Parameters:

artifact_id – Full artifact ID

Returns:

Fold ID as an integer, or None if the fold component is “all”

nirs4all.pipeline.storage.artifacts.utils.extract_pipeline_id_from_artifact_id(artifact_id: str) str[source]

Extract pipeline ID from artifact ID (V2 or V3).

Parameters:

artifact_id – Full artifact ID

Returns:

Pipeline ID component

nirs4all.pipeline.storage.artifacts.utils.generate_filename(artifact_type: str, class_name: str, content_hash: str, extension: str = 'joblib') str[source]

Generate artifact filename from components.

New format: <type>_<class>_<short_hash>.<ext>

Parameters:
  • artifact_type – Artifact type (model, transformer, etc.)

  • class_name – Python class name

  • content_hash – Full SHA256 hash (will be truncated)

  • extension – File extension (default: joblib)

Returns:

Filename string

Examples

>>> generate_filename("model", "PLSRegression", "abc123def456")
"model_PLSRegression_abc123def456.joblib"
nirs4all.pipeline.storage.artifacts.utils.get_binaries_path(workspace: Path, dataset: str) Path[source]

Get the centralized binaries directory for a dataset.

New architecture stores artifacts at workspace/binaries/<dataset>/

Parameters:
  • workspace – Workspace root path

  • dataset – Dataset name

Returns:

Path to binaries directory
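The layout described above can be expressed with pathlib. This is a sketch of the path arithmetic only, under the assumption that the directory is exactly workspace/binaries/<dataset>; `binaries_path_sketch` is a hypothetical name, and whether the real function also creates the directory is not stated here.

```python
from pathlib import Path


def binaries_path_sketch(workspace: Path, dataset: str) -> Path:
    # Pure path construction; nothing is created on disk in this sketch.
    return Path(workspace) / "binaries" / dataset


print(binaries_path_sketch(Path("my_workspace"), "corn"))
```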

nirs4all.pipeline.storage.artifacts.utils.get_short_hash(content_hash: str, length: int = 12) str[source]

Extract short hash from full content hash.

Parameters:
  • content_hash – Full hash (with or without sha256: prefix)

  • length – Number of characters to return (default: 12)

Returns:

Short hash string
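As an illustration of the truncation, the sketch below strips an optional “sha256:” prefix and keeps the first `length` characters. It is an assumption about the behaviour, written from the parameter descriptions above; `short_hash_sketch` is a hypothetical name.

```python
def short_hash_sketch(content_hash: str, length: int = 12) -> str:
    prefix = "sha256:"
    # Accept hashes with or without the prefix.
    if content_hash.startswith(prefix):
        content_hash = content_hash[len(prefix):]
    return content_hash[:length]


print(short_hash_sketch("sha256:" + "ab" * 32))
print(short_hash_sketch("ab" * 32, length=8))
```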

nirs4all.pipeline.storage.artifacts.utils.parse_artifact_id(artifact_id: str) Tuple[str, List[int], int, int | None, int | None][source]

Parse an artifact ID into its components (V3 only).

V3 format: {pipeline_id}${chain_hash}:{fold_id}

Parameters:

artifact_id – V3 artifact ID to parse

Returns:

Tuple of (pipeline_id, branch_path, step_index, fold_id, sub_index). For V3, step_index will be 0 and branch_path will be empty (use ArtifactRecord for full info).

Raises:

ValueError – If artifact ID format is not V3

nirs4all.pipeline.storage.artifacts.utils.parse_filename(filename: str) Tuple[str, str, str] | None[source]

Parse artifact filename into components.

Handles the new format, <type>_<class>_<short_hash>.<ext>, and also the legacy format, <class>_<short_hash>.<ext>.

Parameters:

filename – Filename to parse

Returns:

Tuple of (artifact_type, class_name, short_hash) or None if invalid
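One way to handle both filename formats is to split the stem on underscores and treat the last piece as the short hash. The sketch below is hypothetical and not the library's parser: `parse_filename_sketch` is an invented name, the empty string used as the type for legacy names is this sketch's own placeholder, and a legacy name whose class contains an underscore would be read as the new format.

```python
from typing import Optional, Tuple


def parse_filename_sketch(filename: str) -> Optional[Tuple[str, str, str]]:
    stem, dot, ext = filename.rpartition(".")
    if not (dot and stem and ext):
        return None  # no extension, or nothing before it
    parts = stem.split("_")
    if any(part == "" for part in parts):
        return None  # stray or doubled underscores
    if len(parts) >= 3:
        # New format: first piece is the type, last is the hash,
        # everything in between is kept as the class name.
        return parts[0], "_".join(parts[1:-1]), parts[-1]
    if len(parts) == 2:
        # Legacy format carries no type; "" is a placeholder in this sketch.
        return "", parts[0], parts[1]
    return None


print(parse_filename_sketch("model_PLSRegression_abc123def456.joblib"))
print(parse_filename_sketch("PLSRegression_abc123def456.joblib"))
```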

nirs4all.pipeline.storage.artifacts.utils.validate_artifact_id(artifact_id: str) bool[source]

Validate artifact ID format (V3 only).

Parameters:

artifact_id – Artifact ID to validate

Returns:

True if valid V3 format, False otherwise