nirs4all.pipeline.run module
Run entity for nirs4all pipeline execution.
A Run represents a complete experiment session that combines:
- One or more Pipeline Templates (or concrete Pipelines)
- One or more Datasets
The Run generates Results for every combination of expanded pipeline configurations and datasets.
Formula:
Run = [Pipeline Templates] × [Datasets]
    = [Σ Expanded Pipelines from all Templates] × [All Datasets]
    = Results
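The formula can be checked with a little arithmetic. The sketch below is not part of nirs4all: `TemplateStub` and `expected_result_count` are hypothetical stand-ins that mirror only the `expansion_count` field of `TemplateInfo`, and it assumes every expanded pipeline runs once per dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateStub:
    """Hypothetical stand-in for TemplateInfo; only the fields used here."""
    name: str
    expansion_count: int = 1


def expected_result_count(templates: List[TemplateStub], n_datasets: int) -> int:
    """Sum the expanded pipelines over all templates, then multiply by datasets."""
    expanded_pipelines = sum(t.expansion_count for t in templates)
    return expanded_pipelines * n_datasets


# One template expanding to 6 pipelines plus one concrete pipeline, on 3 datasets.
templates = [TemplateStub("pls_sweep", expansion_count=6), TemplateStub("baseline")]
total = expected_result_count(templates, n_datasets=3)
print(total)  # (6 + 1) * 3 = 21
```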
- class nirs4all.pipeline.run.DatasetInfo(name: str, path: str, hash: str | None = None, file_size: int | None = None, n_samples: int | None = None, n_features: int | None = None, task_type: str | None = None, y_columns: List[str] | None = None, y_stats: Dict[str, Dict[str, float]] | None = None, wavelength_range: List[float] | None = None, wavelength_unit: str | None = None, metadata: Dict[str, Any] | None = None, version: str | None = None)
Bases: object
Information about a dataset used in a run.
- class nirs4all.pipeline.run.Run(id: str = <factory>, name: str = '', templates: List[TemplateInfo] = <factory>, datasets: List[DatasetInfo] = <factory>, status: RunStatus = RunStatus.QUEUED, config: RunConfig = <factory>, created_at: str = <factory>, started_at: str | None = None, completed_at: str | None = None, summary: RunSummary = <factory>, checkpoints: Dict[str, ~typing.Any]]=<factory>)
Bases: object
Represents a complete experiment session.
A Run combines pipeline templates with datasets and generates results for every combination of expanded pipeline configurations and datasets.
- templates
List of pipeline templates
- Type: List[TemplateInfo]
- datasets
List of datasets
- Type: List[DatasetInfo]
- status
Current execution status
- config
Run configuration
- summary
Post-execution summary
- add_checkpoint(result_id: str, metadata: Dict[str, Any] | None = None) → None
Record a completed result as a checkpoint.
- datasets: List[DatasetInfo]
- summary: RunSummary
- templates: List[TemplateInfo]
- transition_to(new_status: RunStatus) → None
Transition to a new status.
- Raises:
ValueError – If transition is not valid
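The reference does not list which status transitions are valid, so the following is only a sketch of how such a guard could work. The `ALLOWED` table, `Status` and `RunStub` are illustrative assumptions, not the library's actual rules or classes; only the status values and the `ValueError` on an invalid transition come from the documentation above.

```python
from enum import Enum


class Status(Enum):
    """Local copy of the documented RunStatus values."""
    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# ASSUMPTION: an illustrative transition table, not taken from nirs4all.
ALLOWED = {
    Status.QUEUED: {Status.RUNNING, Status.CANCELLED},
    Status.RUNNING: {Status.PAUSED, Status.COMPLETED, Status.FAILED, Status.CANCELLED},
    Status.PAUSED: {Status.RUNNING, Status.CANCELLED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
    Status.CANCELLED: set(),
}


class RunStub:
    """Hypothetical stand-in for Run that tracks only the status."""

    def __init__(self) -> None:
        self.status = Status.QUEUED

    def transition_to(self, new_status: Status) -> None:
        # Reject any move that the table does not list for the current status.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"Invalid transition: {self.status.value} -> {new_status.value}"
            )
        self.status = new_status


run = RunStub()
run.transition_to(Status.RUNNING)
run.transition_to(Status.COMPLETED)

rejected = None
try:
    run.transition_to(Status.RUNNING)  # terminal state in this sketch
except ValueError as exc:
    rejected = str(exc)
```

Keeping the rules in a single table makes the set of terminal states explicit: any status that maps to an empty set cannot be left.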
- class nirs4all.pipeline.run.RunConfig(cv_folds: int = 5, cv_strategy: str = 'kfold', random_state: int | None = 42, metric: str = 'r2', save_predictions: bool = True, save_models: bool = True)
Bases: object
Configuration for a run.
- class nirs4all.pipeline.run.RunStatus(value)
Bases: Enum
Run execution status.
- CANCELLED = 'cancelled'
- COMPLETED = 'completed'
- FAILED = 'failed'
- PAUSED = 'paused'
- QUEUED = 'queued'
- RUNNING = 'running'
- class nirs4all.pipeline.run.RunSummary(total_results: int = 0, completed_results: int = 0, failed_results: int = 0, best_result: Dict[str, Any] | None = None)
Bases: object
Summary of run results.
- class nirs4all.pipeline.run.TemplateInfo(id: str, name: str, file_path: str | None = None, expansion_count: int = 1, description: str | None = None)
Bases: object
Information about a pipeline template in a run.
- nirs4all.pipeline.run.generate_run_id(name: str = '') → str
Generate a unique run ID.
Format: YYYY-MM-DD_<Name>_<hash>
- Parameters:
name – Optional descriptive name
- Returns:
Unique run ID string
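To make the `YYYY-MM-DD_<Name>_<hash>` format concrete, here is a minimal sketch of a generator that produces IDs of that shape. `make_run_id` is a hypothetical helper, not the library function: the name sanitisation, the 8-character suffix length, the use of a random UUID for the hash part, and the omission of the name segment when no name is given are all assumptions.

```python
import re
import uuid
from datetime import date
from typing import Optional


def make_run_id(name: str = "", today: Optional[date] = None) -> str:
    """Build an ID shaped like YYYY-MM-DD_<Name>_<hash> (illustrative only)."""
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    # ASSUMPTION: collapse anything non-alphanumeric to '-' so that '_' stays
    # reserved as the separator between the three parts.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")
    # ASSUMPTION: 8 hex characters from a random UUID stand in for <hash>.
    suffix = uuid.uuid4().hex[:8]
    parts = [day, slug, suffix] if slug else [day, suffix]
    return "_".join(parts)


run_id = make_run_id("PLS sweep", today=date(2024, 1, 15))
print(run_id)  # starts with 2024-01-15_PLS-sweep_ followed by a random suffix
```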
- nirs4all.pipeline.run.get_metric_info(metric_name: str) → Dict[str, Any]
Get metadata for a metric.
- Parameters:
metric_name – Name of the metric (e.g., ‘r2’, ‘rmse’, ‘accuracy’)
- Returns:
Dict with ‘higher_is_better’, ‘optimal’, and ‘range’ keys
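As an illustration of the returned shape, the sketch below builds a small lookup table with the three documented keys. `METRICS` and `lookup_metric` are hypothetical; the tuple representation of 'range', the case-insensitive lookup, and the `KeyError` for unknown names are assumptions, since the reference does not specify them. The bounds themselves follow from the standard definitions of each metric.

```python
import math
from typing import Any, Dict

# ASSUMPTION: 'range' is shown as a (low, high) tuple; the real format may differ.
METRICS: Dict[str, Dict[str, Any]] = {
    "r2": {"higher_is_better": True, "optimal": 1.0, "range": (-math.inf, 1.0)},
    "rmse": {"higher_is_better": False, "optimal": 0.0, "range": (0.0, math.inf)},
    "accuracy": {"higher_is_better": True, "optimal": 1.0, "range": (0.0, 1.0)},
}


def lookup_metric(metric_name: str) -> Dict[str, Any]:
    """Return the metadata dict for a metric name (illustrative only)."""
    key = metric_name.strip().lower()
    if key not in METRICS:
        # ASSUMPTION: unknown metrics raise; the library may fall back to a default.
        raise KeyError(f"Unknown metric: {metric_name!r}")
    return METRICS[key]


info = lookup_metric("RMSE")
print(info["higher_is_better"])  # False: lower RMSE is better
```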
- nirs4all.pipeline.run.is_better_score(score: float, best_score: float, metric: str) → bool
Compare two scores and determine if the new score is better.
- Parameters:
score – New score to compare
best_score – Current best score
metric – Metric name to determine comparison direction
- Returns:
True if score is better than best_score
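The comparison direction matters when tracking the best result across a run. The following sketch shows the idea with a hypothetical `better` helper; the `LOWER_IS_BETTER` set and the strict comparison on ties are assumptions, not the library's documented behaviour.

```python
# ASSUMPTION: error metrics where a smaller value wins; everything else is
# treated as higher-is-better in this sketch.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


def better(score: float, best_score: float, metric: str) -> bool:
    """Return True if score beats best_score for the given metric."""
    if metric.lower() in LOWER_IS_BETTER:
        return score < best_score
    return score > best_score  # ties do not count as an improvement


# Track the best RMSE over a sequence of results.
scores = [0.42, 0.35, 0.51]
best = scores[0]
for s in scores[1:]:
    if better(s, best, "rmse"):
        best = s

print(best)  # 0.35
```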