nirs4all.pipeline.run module

Run entity for nirs4all pipeline execution.

A Run represents a complete experiment session that combines:

  • One or more Pipeline Templates (or concrete Pipelines)

  • One or more Datasets

The Run generates Results for every combination of expanded pipeline configurations and datasets.

Formula:
Run = [Pipeline Templates] × [Datasets]
    = [Σ Expanded Pipelines from all Templates] × [All Datasets]
    = Results
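The formula can be checked with plain arithmetic. The sketch below does not use nirs4all; the template names, expansion counts, and dataset names are made-up illustration values. It assumes the number of expanded pipelines per template corresponds to the expansion_count field of TemplateInfo.

```python
# Hypothetical expansion counts: how many concrete pipelines each template expands to.
expansion_counts = {"template_a": 4, "template_b": 6}

# Hypothetical dataset names.
datasets = ["wheat", "corn", "soil"]

# Sum of expanded pipelines over all templates.
total_pipeline_configs = sum(expansion_counts.values())

# Every expanded pipeline is run on every dataset.
total_results_expected = total_pipeline_configs * len(datasets)

print(total_pipeline_configs)   # 10
print(total_results_expected)   # 30
```

These two quantities mirror the total_pipeline_configs and total_results_expected properties of Run.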

class nirs4all.pipeline.run.DatasetInfo(name: str, path: str, hash: str | None = None, file_size: int | None = None, n_samples: int | None = None, n_features: int | None = None, task_type: str | None = None, y_columns: List[str] | None = None, y_stats: Dict[str, Dict[str, float]] | None = None, wavelength_range: List[float] | None = None, wavelength_unit: str | None = None, metadata: Dict[str, Any] | None = None, version: str | None = None)[source]

Bases: object

Information about a dataset used in a run.

file_size: int | None = None
hash: str | None = None
metadata: Dict[str, Any] | None = None
n_features: int | None = None
n_samples: int | None = None
name: str
path: str
task_type: str | None = None
version: str | None = None
wavelength_range: List[float] | None = None
wavelength_unit: str | None = None
y_columns: List[str] | None = None
y_stats: Dict[str, Dict[str, float]] | None = None
class nirs4all.pipeline.run.Run(id: str = <factory>, name: str = '', templates: List[TemplateInfo] = <factory>, datasets: List[DatasetInfo] = <factory>, status: RunStatus = RunStatus.QUEUED, config: RunConfig = <factory>, created_at: str = <factory>, started_at: str | None = None, completed_at: str | None = None, summary: RunSummary = <factory>, checkpoints: List[Dict[str, Any]] = <factory>)[source]

Bases: object

Represents a complete experiment session.

A Run combines pipeline templates with datasets and generates results for every combination of expanded pipeline configurations and datasets.

id

Unique identifier for the run

Type:

str

name

Human-readable name

Type:

str

templates

List of pipeline templates

Type:

List[nirs4all.pipeline.run.TemplateInfo]

datasets

List of datasets

Type:

List[nirs4all.pipeline.run.DatasetInfo]

status

Current execution status

Type:

nirs4all.pipeline.run.RunStatus

config

Run configuration

Type:

nirs4all.pipeline.run.RunConfig

created_at

Creation timestamp

Type:

str

started_at

Execution start timestamp

Type:

str | None

completed_at

Completion timestamp

Type:

str | None

summary

Post-execution summary

Type:

nirs4all.pipeline.run.RunSummary

add_checkpoint(result_id: str, metadata: Dict[str, Any] | None = None) None[source]

Record a completed result as a checkpoint.
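One plausible use of checkpoints is resuming an interrupted run by skipping results already recorded. The sketch below uses a stand-in class, not the real Run; the entry layout (result_id, timestamp, metadata keys) and the completed_ids helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set


@dataclass
class RunSketch:
    """Stand-in for Run; only the checkpoint list is modelled."""

    checkpoints: List[Dict[str, Any]] = field(default_factory=list)

    def add_checkpoint(self, result_id: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        # Assumed entry layout: these keys are illustrative, not the library's schema.
        self.checkpoints.append(
            {
                "result_id": result_id,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "metadata": metadata or {},
            }
        )

    def completed_ids(self) -> Set[str]:
        # Hypothetical helper: IDs of results that can be skipped on resume.
        return {entry["result_id"] for entry in self.checkpoints}


run = RunSketch()
run.add_checkpoint("res_001", {"score": 0.91})
run.add_checkpoint("res_002")

all_ids = ["res_001", "res_002", "res_003"]
pending = [rid for rid in all_ids if rid not in run.completed_ids()]
print(pending)  # ['res_003']
```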

can_transition_to(new_status: RunStatus) bool[source]

Check if transition to new status is valid.

checkpoints: List[Dict[str, Any]]
completed_at: str | None = None
config: RunConfig
created_at: str
datasets: List[DatasetInfo]
classmethod from_dict(data: Dict[str, Any]) Run[source]

Create run from dictionary.

id: str
name: str = ''
started_at: str | None = None
status: RunStatus = RunStatus.QUEUED
summary: RunSummary
templates: List[TemplateInfo]
to_dict() Dict[str, Any][source]

Convert run to dictionary for serialization.
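A to_dict / from_dict pair is typically used for a JSON round trip. The sketch below shows that pattern with stand-in dataclasses (RunSketch and TemplateSketch are hypothetical and carry only a few of the documented fields); how the real Run serializes nested objects and enums is not stated here and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TemplateSketch:
    id: str
    name: str
    expansion_count: int = 1


@dataclass
class RunSketch:
    id: str
    name: str = ""
    status: str = "queued"  # plain string here; the real class uses RunStatus
    templates: List[TemplateSketch] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into nested dataclasses, yielding JSON-friendly data.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RunSketch":
        data = dict(data)  # shallow copy so the caller's dict is not mutated
        templates = [TemplateSketch(**t) for t in data.pop("templates", [])]
        return cls(templates=templates, **data)


original = RunSketch(
    id="run_1",
    name="demo",
    templates=[TemplateSketch("t1", "PLS sweep", 4)],
)

# Serialize to JSON text and back, then rebuild the object.
restored = RunSketch.from_dict(json.loads(json.dumps(original.to_dict())))
print(restored == original)  # True
```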

property total_pipeline_configs: int

Total number of expanded pipeline configurations.

property total_results_expected: int

Expected number of results (configs × datasets).

transition_to(new_status: RunStatus) None[source]

Transition to a new status.

Raises:

ValueError – If transition is not valid
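The pairing of can_transition_to and transition_to suggests a small state machine. The sketch below illustrates that pattern with stand-in classes. The transition table is an assumption chosen for illustration; this page does not list which transitions the library actually allows.

```python
from enum import Enum


class Status(Enum):
    """Stand-in mirroring the documented RunStatus values."""

    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# ASSUMED transition table: terminal states have no outgoing transitions.
ALLOWED = {
    Status.QUEUED: {Status.RUNNING, Status.CANCELLED},
    Status.RUNNING: {Status.PAUSED, Status.COMPLETED, Status.FAILED, Status.CANCELLED},
    Status.PAUSED: {Status.RUNNING, Status.CANCELLED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
    Status.CANCELLED: set(),
}


class RunSketch:
    def __init__(self) -> None:
        self.status = Status.QUEUED

    def can_transition_to(self, new_status: Status) -> bool:
        return new_status in ALLOWED[self.status]

    def transition_to(self, new_status: Status) -> None:
        if not self.can_transition_to(new_status):
            raise ValueError(
                f"Invalid transition: {self.status.value} -> {new_status.value}"
            )
        self.status = new_status


run = RunSketch()
run.transition_to(Status.RUNNING)
try:
    run.transition_to(Status.QUEUED)
except ValueError as exc:
    print(exc)  # Invalid transition: running -> queued
```

Checking with can_transition_to first lets a caller avoid the exception when a rejected transition is an expected case.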

class nirs4all.pipeline.run.RunConfig(cv_folds: int = 5, cv_strategy: str = 'kfold', random_state: int | None = 42, metric: str = 'r2', save_predictions: bool = True, save_models: bool = True)[source]

Bases: object

Configuration for a run.

cv_folds: int = 5
cv_strategy: str = 'kfold'
metric: str = 'r2'
random_state: int | None = 42
save_models: bool = True
save_predictions: bool = True
class nirs4all.pipeline.run.RunStatus(value)[source]

Bases: Enum

Run execution status.

CANCELLED = 'cancelled'
COMPLETED = 'completed'
FAILED = 'failed'
PAUSED = 'paused'
QUEUED = 'queued'
RUNNING = 'running'
class nirs4all.pipeline.run.RunSummary(total_results: int = 0, completed_results: int = 0, failed_results: int = 0, best_result: Dict[str, Any] | None = None)[source]

Bases: object

Summary of run results.

best_result: Dict[str, Any] | None = None
completed_results: int = 0
failed_results: int = 0
total_results: int = 0
class nirs4all.pipeline.run.TemplateInfo(id: str, name: str, file_path: str | None = None, expansion_count: int = 1, description: str | None = None)[source]

Bases: object

Information about a pipeline template in a run.

description: str | None = None
expansion_count: int = 1
file_path: str | None = None
id: str
name: str
nirs4all.pipeline.run.generate_run_id(name: str = '') str[source]

Generate a unique run ID.

Format: YYYY-MM-DD_<Name>_<hash>

Parameters:

name – Optional descriptive name

Returns:

Unique run ID string
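An ID in the documented format could be built as follows. This is a sketch only: the function name, the name sanitization, the use of a random UUID for the hash part, and its 8-character length are all assumptions, not the library's implementation.

```python
import re
import uuid
from datetime import date


def generate_run_id_sketch(name: str = "") -> str:
    """Build an ID shaped like YYYY-MM-DD_<Name>_<hash> (hypothetical helper)."""
    # Assumption: collapse anything non-alphanumeric to '-' so that '_'
    # stays unambiguous as the field separator.
    safe_name = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")

    # Assumption: a random 8-hex-digit suffix stands in for the hash part.
    short_hash = uuid.uuid4().hex[:8]

    parts = [date.today().isoformat()]
    if safe_name:
        parts.append(safe_name)
    parts.append(short_hash)
    return "_".join(parts)


run_id = generate_run_id_sketch("Wheat protein")
print(run_id)
```

With an empty name, this sketch drops the middle field and returns only the date and the hash suffix.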

nirs4all.pipeline.run.get_metric_info(metric_name: str) Dict[str, Any][source]

Get metadata for a metric.

Parameters:

metric_name – Name of the metric (e.g., ‘r2’, ‘rmse’, ‘accuracy’)

Returns:

Dict with ‘higher_is_better’, ‘optimal’, and ‘range’ keys
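The returned dictionary might look like the entries below. The three keys come from the description above; the values follow the usual mathematical definitions of each metric. The lookup function, the case-insensitive matching, and the KeyError for unknown names are assumptions about behavior this page does not specify.

```python
import math
from typing import Any, Dict

# Illustrative metadata table; not the library's actual registry.
_METRICS: Dict[str, Dict[str, Any]] = {
    "r2": {"higher_is_better": True, "optimal": 1.0, "range": (-math.inf, 1.0)},
    "rmse": {"higher_is_better": False, "optimal": 0.0, "range": (0.0, math.inf)},
    "accuracy": {"higher_is_better": True, "optimal": 1.0, "range": (0.0, 1.0)},
}


def get_metric_info_sketch(metric_name: str) -> Dict[str, Any]:
    key = metric_name.lower()
    if key not in _METRICS:
        # Assumed behavior: the real function may return a default instead.
        raise KeyError(f"Unknown metric: {metric_name}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(_METRICS[key])


info = get_metric_info_sketch("rmse")
print(info["higher_is_better"])  # False
print(info["optimal"])           # 0.0
```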

nirs4all.pipeline.run.is_better_score(score: float, best_score: float, metric: str) bool[source]

Compare two scores and determine if the new score is better.

Parameters:
  • score – New score to compare

  • best_score – Current best score

  • metric – Metric name to determine comparison direction

Returns:

True if score is better than best_score
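The comparison direction depends on whether the metric is an error (lower is better) or a score (higher is better). The sketch below shows that logic with a stand-in function; the set of lower-is-better metric names and the treatment of ties as "not better" are assumptions.

```python
# Assumed set of error-style metrics where a smaller value is better.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


def is_better_score_sketch(score: float, best_score: float, metric: str) -> bool:
    if metric.lower() in LOWER_IS_BETTER:
        return score < best_score
    # Everything else (e.g. r2, accuracy) is treated as higher-is-better.
    return score > best_score


# Track the best RMSE across a few hypothetical results.
best = float("inf")
for candidate in [0.42, 0.37, 0.40]:
    if is_better_score_sketch(candidate, best, "rmse"):
        best = candidate

print(best)  # 0.37
```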