nirs4all.pipeline.storage.io module

Simulation IO Manager - Save and manage simulation outputs

Provides organized storage for pipeline simulation results with dataset and pipeline-based folder structure management.

REFACTORED: Now uses content-addressed artifact storage via serializer. Delegates to focused classes: PipelineWriter, WorkspaceExporter, PredictionResolver.
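To illustrate what "content-addressed" means here, the following is a minimal sketch using only the standard library. It is not the nirs4all serializer: `store_content_addressed`, the SHA-256 digest, and the `.bin` suffix are all assumptions chosen for illustration. The idea is that the file name is derived from the bytes themselves, so identical content maps to a single stored file.

```python
import hashlib
import tempfile
from pathlib import Path


def store_content_addressed(root: Path, payload: bytes, suffix: str = ".bin") -> Path:
    """Hypothetical helper: write payload under a name derived from its SHA-256 digest."""
    digest = hashlib.sha256(payload).hexdigest()
    target = root / f"{digest}{suffix}"
    if not target.exists():
        # Identical content hashes to the same name, so it is written only once.
        root.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return target


# Storing the same bytes twice yields the same path and a single file on disk.
with tempfile.TemporaryDirectory() as tmp:
    first = store_content_addressed(Path(tmp), b"model-bytes")
    second = store_content_addressed(Path(tmp), b"model-bytes")
    same_path = first == second
    file_count = len(list(Path(tmp).iterdir()))
```

A scheme like this deduplicates repeated artifacts automatically, which is one common reason for choosing it.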

class nirs4all.pipeline.storage.io.SimulationSaver(base_path: str | Path | None = None, save_artifacts: bool = True, save_charts: bool = True)[source]

Bases: object

Manages saving simulation results with flat pipeline structure.

Acts as a facade coordinating:
  • PipelineWriter: File I/O within pipeline directories

  • WorkspaceExporter: Exporting best results

  • PredictionResolver: Resolving prediction targets

Works with ManifestManager to create: base_path/NNNN_hash/files
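The facade role described above, combined with the "get or create" wording used for the writer, exporter, and resolver properties, can be pictured with a small sketch. The classes `Writer` and `Facade` below are hypothetical stand-ins, not the nirs4all classes; they only show a collaborator being created on first access and reused afterwards.

```python
from typing import Optional


class Writer:
    """Hypothetical stand-in for a focused collaborator class."""

    def describe(self) -> str:
        return "writer"


class Facade:
    """Hypothetical facade that creates its collaborator lazily."""

    def __init__(self) -> None:
        self._writer: Optional[Writer] = None

    @property
    def writer(self) -> Writer:
        # Get or create: build the collaborator on first access only.
        if self._writer is None:
            self._writer = Writer()
        return self._writer

    def describe(self) -> str:
        # Public methods simply delegate to the collaborator.
        return self.writer.describe()
```

Callers interact only with the facade, so the collaborators can be reorganized without changing the public methods.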

cleanup(confirm: bool = False) None[source]

Remove the current simulation directory and all its contents.

Parameters:

confirm – Must be True to actually delete files

Raises:

RuntimeError – If no pipeline is registered or confirm is False
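The confirmation guard documented above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `remove_directory` is a hypothetical function, and representing "not registered" as a `None` path is an assumption.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def remove_directory(current: Optional[Path], confirm: bool = False) -> None:
    """Hypothetical guard: refuse to delete unless explicitly confirmed."""
    if current is None:
        # Assumption: "not registered" is modelled as having no current path.
        raise RuntimeError("no pipeline registered")
    if not confirm:
        raise RuntimeError("pass confirm=True to delete files")
    shutil.rmtree(current)


# Without confirm=True the call is refused and the directory survives.
scratch = Path(tempfile.mkdtemp())
try:
    remove_directory(scratch)
except RuntimeError:
    refused = True
else:
    refused = False
still_there = scratch.exists()

# With confirm=True the directory and its contents are removed.
remove_directory(scratch, confirm=True)
```

Defaulting `confirm` to False means an accidental call fails loudly instead of silently deleting results.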

export_best_for_dataset(dataset_name: str, workspace_path: Path, runs_dir: Path, mode: str = 'predictions') Path | None[source]

Export best results for a dataset to exports/ folder.

Delegates to WorkspaceExporter.

Creates exports/{dataset_name}/ with best predictions, pipeline config, and charts. Files are renamed to include run date for tracking.

Parameters:
  • dataset_name – Dataset name (matches global prediction JSON filename)

  • workspace_path – Workspace root path (unused, maintained for compatibility)

  • runs_dir – Runs directory path

  • mode – Export mode - “predictions”, “template”, “trained”, or “full”

Returns:

Path to export directory, or None if no predictions found

export_best_prediction(predictions_file: Path, exports_dir: Path, dataset_name: str, run_date: str, pipeline_id: str, custom_name: str = None) Path[source]

Export predictions CSV to best_predictions/ folder with optional custom name.

Delegates to WorkspaceExporter.

Parameters:
  • predictions_file – Path to predictions.csv

  • exports_dir – Workspace exports directory (unused, maintained for compatibility)

  • dataset_name – Metadata for naming

  • run_date – Metadata for naming

  • pipeline_id – Metadata for naming

  • custom_name – Optional custom name for export

Returns:

Path to exported CSV

export_pipeline_full(pipeline_dir: Path, exports_dir: Path, dataset_name: str, run_date: str, custom_name: str = None) Path[source]

Export full pipeline results to flat structure with optional custom name.

Delegates to WorkspaceExporter.

Parameters:
  • pipeline_dir – Path to pipeline (NNNN_hash/ or NNNN_pipelinename_hash/)

  • exports_dir – Workspace exports directory (unused, maintained for compatibility)

  • dataset_name – Dataset name

  • run_date – Run date (YYYYMMDD)

  • custom_name – Optional custom name for export

Returns:

Path to exported directory

property exporter: WorkspaceExporter

Get or create WorkspaceExporter instance.

get_metadata() Dict[str, Any][source]

Get the current metadata.

get_path() Path[source]

Get the current pipeline path.

Delegates to PipelineWriter.

get_predict_targets(prediction_obj: Dict[str, Any] | str)[source]

Get target variable names for prediction from a prediction object.

Delegates to PredictionResolver.

list_files() Dict[str, List[str]][source]

List all saved files in the current pipeline.

Returns:

Dictionary with file lists

persist_artifact(step_number: int, name: str, obj: Any, format_hint: str | None = None, branch_id: int | None = None, branch_name: str | None = None) Dict[str, Any][source]

Persist artifact using the serializer with content-addressed storage.

NOTE: This is for internal binary artifacts (models, transformers, etc.). For human-readable outputs (charts, reports), use save_output() instead.

Parameters:
  • step_number – Pipeline step number

  • name – Artifact name (for reference)

  • obj – Object to persist

  • format_hint – Optional format hint for serializer

  • branch_id – Optional branch ID for pipeline branching

  • branch_name – Optional human-readable branch name

Returns:

Artifact metadata dictionary (empty if save_artifacts=False)

register(pipeline_id: str) Path[source]

Register a pipeline ID and set current directory.

Parameters:

pipeline_id – Pipeline ID from ManifestManager (e.g., “0001_abc123”)

Returns:

Path to the pipeline directory

register_workspace(workspace_root: Path, dataset_name: str, pipeline_hash: str, run_name: str = None, pipeline_name: str = None) Path[source]

Register pipeline in workspace structure with optional custom names.

Creates:
  • Without custom names: workspace_root/runs/{dataset}/NNNN_{hash}/

  • With run_name: workspace_root/runs/{dataset}_{runname}/NNNN_{hash}/

  • With pipeline_name: workspace_root/runs/{dataset}/NNNN_{pipelinename}_{hash}/

  • With both: workspace_root/runs/{dataset}_{runname}/NNNN_{pipelinename}_{hash}/

All pipelines for a dataset are stored in the same folder regardless of date.

Returns:

Full path to pipeline directory
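The four directory layouts listed for register_workspace can be reproduced with a short path-building sketch. `build_pipeline_dir` and its `index` argument are hypothetical; the sketch assumes NNNN is a zero-padded four-digit counter (consistent with the "0001_abc123" example given for register) and does not show how the library chooses that number.

```python
from pathlib import Path
from typing import Optional


def build_pipeline_dir(
    workspace_root: Path,
    dataset: str,
    pipeline_hash: str,
    index: int,
    run_name: Optional[str] = None,
    pipeline_name: Optional[str] = None,
) -> Path:
    """Hypothetical helper mirroring the documented folder naming."""
    # Run folder: {dataset} or {dataset}_{runname}
    run_folder = f"{dataset}_{run_name}" if run_name else dataset
    # Pipeline folder: NNNN_{hash} or NNNN_{pipelinename}_{hash}
    parts = [f"{index:04d}"]
    if pipeline_name:
        parts.append(pipeline_name)
    parts.append(pipeline_hash)
    return workspace_root / "runs" / run_folder / "_".join(parts)


plain = build_pipeline_dir(Path("ws"), "corn", "abc123", 1)
named = build_pipeline_dir(
    Path("ws"), "corn", "abc123", 1, run_name="trial", pipeline_name="pls"
)
```

Here `plain` corresponds to the layout without custom names and `named` to the layout with both; the dataset, run, and pipeline names are made-up values.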

property resolver: TargetResolver

Get or create PredictionResolver instance.

save_file(filename: str, content: str, overwrite: bool = True, encoding: str = 'utf-8', warn_on_overwrite: bool = True) Path[source]

Save a text file to the pipeline directory.

Delegates to PipelineWriter.

save_json(filename: str, data: Any, overwrite: bool = True, indent: int | None = 2) Path[source]

Save data as JSON file.

Delegates to PipelineWriter.

save_output(step_number: int, name: str, data: bytes | str, extension: str = '.png') Path | None[source]

Save a human-readable output file (chart, report, etc.).

Delegates to PipelineWriter.

Parameters:
  • step_number – Pipeline step number (unused, kept for compatibility)

  • name – Output name (e.g., “2D_Chart”)

  • data – Binary or text data to save

  • extension – File extension (e.g., “.png”, “.csv”, “.txt”)

Returns:

Path to saved file, or None if save_charts=False
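The behaviour documented for save_output (binary or text data, a configurable extension, and None when chart saving is disabled) can be sketched as below. `write_output`, its `enabled` flag, and the `{name}{extension}` file naming are assumptions for illustration; the actual naming used by PipelineWriter is not stated here.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def write_output(
    directory: Path,
    name: str,
    data: Union[bytes, str],
    extension: str = ".png",
    enabled: bool = True,
) -> Optional[Path]:
    """Hypothetical helper: save a human-readable output, or skip it when disabled."""
    if not enabled:
        # Mirrors the documented "None if save_charts=False" return value.
        return None
    target = directory / f"{name}{extension}"  # assumed naming scheme
    if isinstance(data, bytes):
        target.write_bytes(data)
    else:
        target.write_text(data, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    chart = write_output(Path(tmp), "2D_Chart", b"\x89PNG")
    chart_name = chart.name if chart is not None else None
    skipped = write_output(Path(tmp), "report", "summary", ".txt", enabled=False)
```

Returning None instead of raising lets callers disable chart output globally without wrapping every call in a condition.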

property writer: PipelineWriter

Get or create PipelineWriter instance.