nirs4all.pipeline.storage.store_protocol module
Protocol definition for workspace storage backends.
Provides a structural subtyping contract (typing.Protocol) that any
storage backend must satisfy. This enables testing with in-memory stubs
and, in the future, swapping DuckDB for another backend without changing
consumer code.
The protocol captures the minimal set of methods that all consumers
depend on. WorkspaceStore itself implements a much richer interface
(export, replay, logging, cleanup); those additional methods are not part
of the protocol because backend-swappability is only required for the
core storage operations.
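The substitution described above can be sketched as follows. This is a minimal illustration, not the library's code: `RunStore` is a hypothetical, reduced protocol that copies only three of the documented signatures, and `InMemoryStore` and `record_pipeline` are made-up names for a test stub and a consumer function.

```python
from __future__ import annotations

import uuid
from typing import Any, Protocol


class RunStore(Protocol):
    """Hypothetical reduced protocol: only the calls used by the consumer below."""

    def begin_run(self, name: str, config: Any, datasets: list[dict]) -> str: ...

    def begin_pipeline(
        self, run_id: str, name: str, expanded_config: Any,
        generator_choices: list, dataset_name: str, dataset_hash: str,
    ) -> str: ...

    def complete_pipeline(
        self, pipeline_id: str, best_val: float, best_test: float,
        metric: str, duration_ms: int,
    ) -> None: ...


class InMemoryStore:
    """Test stub. It does not inherit from RunStore; it only matches its shape."""

    def __init__(self) -> None:
        self.runs: dict[str, dict] = {}
        self.pipelines: dict[str, dict] = {}

    def begin_run(self, name, config, datasets):
        run_id = uuid.uuid4().hex
        self.runs[run_id] = {"name": name, "config": config, "datasets": datasets}
        return run_id

    def begin_pipeline(self, run_id, name, expanded_config,
                       generator_choices, dataset_name, dataset_hash):
        pipeline_id = uuid.uuid4().hex
        self.pipelines[pipeline_id] = {"run_id": run_id, "name": name,
                                       "status": "running"}
        return pipeline_id

    def complete_pipeline(self, pipeline_id, best_val, best_test,
                          metric, duration_ms):
        self.pipelines[pipeline_id].update(
            status="completed", best_val=best_val, best_test=best_test,
            metric=metric, duration_ms=duration_ms,
        )


def record_pipeline(store: RunStore) -> str:
    """Consumer code: typed against the protocol, not a concrete backend."""
    run_id = store.begin_run("demo", {"seed": 0}, [{"name": "toy"}])
    pipeline_id = store.begin_pipeline(run_id, "pls", {}, [], "toy", "abc123")
    store.complete_pipeline(pipeline_id, 0.12, 0.15, "rmse", 42)
    return pipeline_id


store = InMemoryStore()
pid = record_pipeline(store)
```

Because `record_pipeline` only names the protocol in its annotation, a static type checker accepts any object with matching methods, which is what makes an in-memory stub usable in tests.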
- class nirs4all.pipeline.storage.store_protocol.WorkspaceStoreProtocol(*args, **kwargs)
Bases: Protocol
Minimal protocol for workspace storage backends.
A backend is any object that satisfies this structural interface. Use
isinstance(obj, WorkspaceStoreProtocol) at runtime to verify compliance (enabled by @runtime_checkable).
The protocol covers:
Run lifecycle – creating, completing, and failing runs.
Pipeline lifecycle – creating, completing, and failing pipeline executions.
Chain storage – persisting the preprocessing-to-model chain.
Prediction storage – saving scalar prediction records and their associated dense arrays.
Artifact storage – content-addressed persistence of fitted Python objects.
Core queries – ranking predictions and retrieving single records.
Export – producing standalone bundles from stored chains.
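The runtime check mentioned above has a known limit in Python: for a @runtime_checkable protocol, isinstance only verifies that the required methods are present, not that their signatures match. The sketch below shows this with a hypothetical two-method protocol, `TinyStoreProtocol`, standing in for the real one; all class names are illustrative.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TinyStoreProtocol(Protocol):
    """Hypothetical stand-in with two of the documented method names."""

    def begin_run(self, name: str, config: Any, datasets: list[dict]) -> str: ...

    def save_artifact(self, obj: Any, operator_class: str,
                      artifact_type: str, format: str) -> str: ...


class Complete:
    def begin_run(self, name, config, datasets):
        return "run-1"

    def save_artifact(self, obj, operator_class, artifact_type, format):
        return "artifact-1"


class MissingMethod:
    # save_artifact is absent, so the isinstance check fails.
    def begin_run(self, name, config, datasets):
        return "run-1"


class WrongSignature:
    # Both names exist, so the isinstance check passes even though
    # neither method accepts the documented arguments.
    def begin_run(self):
        return "run-1"

    def save_artifact(self):
        return "artifact-1"


checks = {
    "complete": isinstance(Complete(), TinyStoreProtocol),
    "missing_method": isinstance(MissingMethod(), TinyStoreProtocol),
    "wrong_signature": isinstance(WrongSignature(), TinyStoreProtocol),
}
```

A passing isinstance check is therefore a quick sanity test for a backend; signature mismatches are better caught by a static type checker or by exercising the backend in tests.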
- begin_pipeline(run_id: str, name: str, expanded_config: Any, generator_choices: list, dataset_name: str, dataset_hash: str) -> str
Register a new pipeline execution under a run.
- begin_run(name: str, config: Any, datasets: list[dict]) -> str
Create a new run and return its identifier.
- complete_pipeline(pipeline_id: str, best_val: float, best_test: float, metric: str, duration_ms: int) -> None
Mark a pipeline as completed.
- export_chain(chain_id: str, output_path: Path, format: str = 'n4a') -> Path
Export a chain as a standalone bundle.
- get_prediction(prediction_id: str, load_arrays: bool = False) -> dict | None
Retrieve a single prediction record.
- save_artifact(obj: Any, operator_class: str, artifact_type: str, format: str) -> str
Persist a binary artifact and return its identifier.
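One way to read "content-addressed" for this method is that the identifier is derived from the serialized bytes, so saving the same content twice yields the same identifier. The sketch below illustrates that idea only. Serializing with pickle, hashing with SHA-256, and the argument values passed in the example are all assumptions; the real backend's serialization and hashing scheme is not described here.

```python
from __future__ import annotations

import hashlib
import pickle
from typing import Any


class InMemoryArtifacts:
    """Hypothetical stub keeping serialized artifacts in a dict keyed by hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def save_artifact(self, obj: Any, operator_class: str,
                      artifact_type: str, format: str) -> str:
        # Assumption: pickle bytes hashed with SHA-256 form the identifier.
        payload = pickle.dumps(obj)
        artifact_id = hashlib.sha256(payload).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(artifact_id, payload)
        return artifact_id


artifacts = InMemoryArtifacts()
# The string arguments below are illustrative values, not documented ones.
first = artifacts.save_artifact([1.0, 2.0, 3.0], "StandardScaler", "transformer", "pickle")
second = artifacts.save_artifact([1.0, 2.0, 3.0], "StandardScaler", "transformer", "pickle")
other = artifacts.save_artifact([9.0], "StandardScaler", "transformer", "pickle")
```

In this sketch the metadata arguments do not influence the identifier; whether they do in an actual backend is an open design choice.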
- save_chain(pipeline_id: str, steps: list[dict], model_step_idx: int, model_class: str, preprocessings: str, fold_strategy: str, fold_artifacts: dict, shared_artifacts: dict, branch_path: list[int] | None = None, source_index: int | None = None) -> str
Store a chain and return its identifier.
- save_prediction(pipeline_id: str, chain_id: str, dataset_name: str, model_name: str, model_class: str, fold_id: str, partition: str, val_score: float, test_score: float, train_score: float, metric: str, task_type: str, n_samples: int, n_features: int, scores: dict, best_params: dict, branch_id: int | None, branch_name: str | None, exclusion_count: int, exclusion_rate: float, preprocessings: str = '', prediction_id: str | None = None) -> str
Store a prediction record and return its identifier.
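A round trip through save_prediction and get_prediction might look like the sketch below. `InMemoryPredictions` is a hypothetical stub that follows the documented signatures. Generating an identifier when prediction_id is None and returning None for an unknown identifier are assumptions suggested by the type hints, not confirmed behavior. No arrays are stored, so load_arrays is accepted but ignored.

```python
from __future__ import annotations

import uuid


class InMemoryPredictions:
    """Hypothetical stub storing scalar prediction records in a dict."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def save_prediction(self, pipeline_id, chain_id, dataset_name, model_name,
                        model_class, fold_id, partition, val_score, test_score,
                        train_score, metric, task_type, n_samples, n_features,
                        scores, best_params, branch_id, branch_name,
                        exclusion_count, exclusion_rate, preprocessings="",
                        prediction_id=None) -> str:
        # Snapshot every argument as the stored record, minus `self`.
        record = dict(locals())
        del record["self"]
        # Assumption: an identifier is generated when the caller gives none.
        if prediction_id is None:
            prediction_id = uuid.uuid4().hex
        record["prediction_id"] = prediction_id
        self.records[prediction_id] = record
        return prediction_id

    def get_prediction(self, prediction_id: str,
                       load_arrays: bool = False) -> dict | None:
        # This sketch stores no arrays, so load_arrays has no effect here.
        record = self.records.get(prediction_id)
        return None if record is None else dict(record)


store = InMemoryPredictions()
pred_id = store.save_prediction(
    pipeline_id="pipe-1", chain_id="chain-1", dataset_name="toy",
    model_name="PLS", model_class="PLSRegression", fold_id="0",
    partition="val", val_score=0.12, test_score=0.15, train_score=0.10,
    metric="rmse", task_type="regression", n_samples=100, n_features=200,
    scores={"rmse": 0.12}, best_params={"n_components": 5},
    branch_id=None, branch_name=None, exclusion_count=0, exclusion_rate=0.0,
)
fetched = store.get_prediction(pred_id)
missing = store.get_prediction("no-such-id")
```

All argument values in the call are made up for illustration.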