nirs4all.synthesis.reconstruction.pipeline module

Complete reconstruction pipeline for end-to-end workflow.

Provides a unified interface for:

  1. Dataset configuration and preprocessing detection
  2. Global calibration
  3. Batch inversion
  4. Parameter distribution learning
  5. Synthetic generation
  6. Validation

class nirs4all.synthesis.reconstruction.pipeline.DatasetConfig(wavelengths: ndarray, signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance', preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none', domain: str = 'unknown', sg_window: int = 15, sg_polyorder: int = 2, name: str = 'dataset')

Bases: object

Configuration for a dataset to be reconstructed.

Captures all dataset-specific information needed for reconstruction:

  • Wavelength grid

  • Signal type (absorbance, reflectance)

  • Preprocessing applied

  • Application domain (for component selection)

wavelengths

Wavelength grid in nm.

Type:

numpy.ndarray

signal_type

Signal type (‘absorbance’, ‘reflectance’, or ‘unknown’).

Type:

Literal[‘absorbance’, ‘reflectance’, ‘unknown’]

preprocessing

Detected or specified preprocessing type.

Type:

Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘unknown’]

domain

Application domain for component selection.

Type:

str

sg_window

Savitzky-Golay window (for derivatives).

Type:

int

sg_polyorder

Savitzky-Golay polynomial order.

Type:

int

name

Dataset name (defaults to ‘dataset’).

Type:

str

domain: str = 'unknown'
classmethod from_data(X: ndarray, wavelengths: ndarray, name: str = 'dataset') → DatasetConfig

Create configuration by auto-detecting properties from data.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • wavelengths – Wavelength grid.

  • name – Dataset name.

Returns:

DatasetConfig with detected properties.
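The page does not describe how the detection works. As a rough illustration of the kind of check such auto-detection could rely on, the sketch below recognises SNV-treated spectra from their row statistics. The function guess_preprocessing and its thresholds are hypothetical and are not taken from nirs4all; derivative and MSC detection are deliberately left out.

```python
import numpy as np


def guess_preprocessing(X, atol=1e-6):
    """Hypothetical heuristic; NOT the detection logic used by from_data()."""
    X = np.asarray(X, dtype=float)
    row_mean = X.mean(axis=1)
    row_std = X.std(axis=1)  # population standard deviation (ddof=0)

    # SNV centres and scales every spectrum, so each row has mean 0 and std 1.
    # (SNV computed with ddof=1 would need a looser tolerance here.)
    if np.allclose(row_mean, 0.0, atol=atol) and np.allclose(row_std, 1.0, atol=atol):
        return "snv"
    # Negative values hint at some transformation, but not at which one.
    if X.min() < 0:
        return "unknown"
    return "none"


# Toy data: strictly positive "raw" spectra and their SNV-transformed version.
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 1.5, size=(8, 200))
snv = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, keepdims=True)
print(guess_preprocessing(raw), guess_preprocessing(snv))
```

When the preprocessing is already known, passing it explicitly to the DatasetConfig constructor avoids depending on any heuristic.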

name: str = 'dataset'
preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none'
sg_polyorder: int = 2
sg_window: int = 15
signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance'
wavelengths: ndarray
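To make the roles of sg_window and sg_polyorder concrete, here is a minimal NumPy sketch of a Savitzky-Golay derivative using the documented defaults (window 15, polynomial order 2). It is a textbook least-squares construction, not the library's implementation; the helper names savgol_kernel and savgol_derivative are made up for this example, and edge samples are simply dropped.

```python
from math import factorial

import numpy as np


def savgol_kernel(window=15, polyorder=2, deriv=1, delta=1.0):
    """Least-squares Savitzky-Golay kernel for the centre of an odd window."""
    if window % 2 == 0 or polyorder >= window or deriv > polyorder:
        raise ValueError("need an odd window, polyorder < window, deriv <= polyorder")
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    # Columns are z**0, z**1, ..., z**polyorder.
    A = np.vander(z, polyorder + 1, increasing=True)
    # Row `deriv` of pinv(A) maps the window samples to the fitted coefficient
    # a_deriv; the deriv-th derivative at z = 0 is deriv! * a_deriv.
    return factorial(deriv) * np.linalg.pinv(A)[deriv] / delta**deriv


def savgol_derivative(y, window=15, polyorder=2, deriv=1, delta=1.0):
    """Apply the kernel; edges are dropped, so the output is shorter than y."""
    kernel = savgol_kernel(window, polyorder, deriv, delta)
    return np.correlate(np.asarray(y, dtype=float), kernel, mode="valid")


# A quadratic is fitted exactly by an order-2 polynomial, so the first
# derivative at the window centres should equal 2 * x.
x = np.arange(40, dtype=float)
d1 = savgol_derivative(x**2)
```

The delta argument stands for the wavelength step in nm, so that derivatives are expressed per nm rather than per sample.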
class nirs4all.synthesis.reconstruction.pipeline.PipelineResult(config: DatasetConfig, calibration: 'CalibrationResult' | None = None, inversion_results: List['InversionResult'] | None = None, distribution: 'DistributionResult' | None = None, X_synthetic: np.ndarray | None = None, validation: 'ValidationResult' | None = None, forward_chain: 'ForwardChain' | None = None)

Bases: object

Result of reconstruction pipeline.

Contains all outputs from the reconstruction workflow:

  • Calibration results

  • Inversion results

  • Learned distributions

  • Generated synthetic data

  • Validation metrics

config

Dataset configuration used.

Type:

DatasetConfig

calibration

Global calibration result.

Type:

Optional[‘CalibrationResult’]

inversion_results

Per-sample inversion results.

Type:

Optional[List[‘InversionResult’]]

distribution

Learned parameter distributions.

Type:

Optional[‘DistributionResult’]

X_synthetic

Generated synthetic spectra.

Type:

Optional[np.ndarray]

validation

Validation result.

Type:

Optional[‘ValidationResult’]

forward_chain

Calibrated forward chain.

Type:

Optional[‘ForwardChain’]

X_synthetic: np.ndarray | None = None
calibration: 'CalibrationResult' | None = None
config: DatasetConfig
distribution: 'DistributionResult' | None = None
forward_chain: 'ForwardChain' | None = None
inversion_results: List['InversionResult'] | None = None
summary() → str

Generate pipeline summary.

validation: 'ValidationResult' | None = None
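Every field except config defaults to None, so code that consumes a PipelineResult may want to check which stages actually produced output. The helper below is a small sketch of such a check. completed_stages is a hypothetical name, and the SimpleNamespace object only stands in for a real result so that the snippet runs without nirs4all installed.

```python
from types import SimpleNamespace

# Optional fields of PipelineResult, in workflow order (names from the signature above).
STAGE_FIELDS = (
    "calibration",
    "inversion_results",
    "distribution",
    "X_synthetic",
    "validation",
    "forward_chain",
)


def completed_stages(result):
    """Return the names of the optional fields that are not None."""
    return [name for name in STAGE_FIELDS if getattr(result, name, None) is not None]


# Stand-in for a partially filled result (not a real PipelineResult).
partial = SimpleNamespace(
    config=None,
    calibration=object(),
    inversion_results=[],
    distribution=None,
    X_synthetic=None,
    validation=None,
    forward_chain=None,
)
print(completed_stages(partial))
```

An explicit "is not None" test is used rather than truthiness, because X_synthetic is a NumPy array and an empty inversion list is still a produced output.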
class nirs4all.synthesis.reconstruction.pipeline.ReconstructionPipeline(config: DatasetConfig, component_names: List[str] | None = None, canonical_resolution: float = 0.5, baseline_order: int = 5, continuum_order: int = 3, n_prototypes: int = 5, fit_environmental: bool = False, verbose: bool = True)

Bases: object

Complete reconstruction pipeline.

Orchestrates the full workflow:

  1. Configuration and component selection
  2. Prototype selection and global calibration
  3. Per-sample inversion (optionally with environmental parameters)
  4. Parameter distribution learning
  5. Synthetic generation
  6. Validation

config

Dataset configuration.

Type:

nirs4all.synthesis.reconstruction.pipeline.DatasetConfig

component_names

Components to use (auto-selected if None).

Type:

List[str] | None

canonical_resolution

Resolution of canonical grid (nm).

Type:

float

baseline_order

Baseline polynomial order.

Type:

int

n_prototypes

Number of prototypes for calibration.

Type:

int

fit_environmental

Whether to fit environmental parameters.

Type:

bool

verbose

Whether to print progress messages.

Type:

bool

__post_init__()

Initialize components if not provided.

baseline_order: int = 5
canonical_resolution: float = 0.5
component_names: List[str] | None = None
config: DatasetConfig
continuum_order: int = 3
fit(X: ndarray, max_samples: int | None = None) → PipelineResult

Run full reconstruction pipeline.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • max_samples – Maximum number of samples to invert; limiting this reduces run time.

Returns:

PipelineResult with all outputs.

fit_environmental: bool = False
generate(n_samples: int, result: PipelineResult, random_state: int | None = None) → ndarray

Generate additional synthetic samples using fitted pipeline.

Parameters:
  • n_samples – Number of samples to generate.

  • result – PipelineResult from fit().

  • random_state – Random seed.

Returns:

Synthetic spectra matrix.

n_prototypes: int = 5
verbose: bool = True
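As a usage sketch, the documented methods can be chained as follows. It relies only on the signatures listed on this page and assumes that nirs4all is installed and behaves as those signatures suggest. The wrapper fit_and_generate and its default values are hypothetical; the import is placed inside the function so that the helper can be defined without the package present.

```python
def fit_and_generate(X, wavelengths, n_extra=100, max_samples=50, seed=0):
    """Fit the pipeline on X, then draw n_extra additional synthetic spectra."""
    # Lazy import: defining this helper does not require nirs4all to be installed.
    from nirs4all.synthesis.reconstruction.pipeline import (
        DatasetConfig,
        ReconstructionPipeline,
    )

    # Auto-detect signal type and preprocessing from the data.
    config = DatasetConfig.from_data(X, wavelengths, name="demo")
    pipeline = ReconstructionPipeline(
        config=config,
        n_prototypes=5,           # documented default
        fit_environmental=False,  # documented default
        verbose=False,
    )
    # Cap the number of inverted samples to keep the run short.
    result = pipeline.fit(X, max_samples=max_samples)
    # Reuse the fitted result to draw further samples reproducibly.
    X_extra = pipeline.generate(n_extra, result, random_state=seed)
    return result, X_extra
```

Here X is expected to be a spectra matrix of shape (n_samples, n_wavelengths) and wavelengths the matching grid in nm.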
nirs4all.synthesis.reconstruction.pipeline.reconstruct_and_generate(X: ndarray, wavelengths: ndarray, n_synthetic: int | None = None, domain: str = 'unknown', component_names: List[str] | None = None, fit_environmental: bool = False, verbose: bool = True) → Tuple[ndarray, PipelineResult]

Convenience function for end-to-end reconstruction and generation.

Parameters:
  • X – Real spectra matrix.

  • wavelengths – Wavelength grid.

  • n_synthetic – Number of synthetic samples to generate (default: the number of samples in X).

  • domain – Application domain.

  • component_names – Components to use.

  • fit_environmental – Whether to fit environmental parameters (temperature, water activity, scattering).

  • verbose – Whether to print progress messages.

Returns:

Tuple of (X_synthetic, PipelineResult).
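A minimal end-to-end sketch follows. The toy data generator make_toy_spectra is invented for this example (two Gaussian bands near 1450 nm and 1940 nm with random amplitudes) and is not part of nirs4all. run_demo calls the function with the documented arguments and assumes the package is installed; whether the pipeline fits such toy data well is not something this page establishes.

```python
import numpy as np


def make_toy_spectra(n_samples=20, seed=0):
    """Toy absorbance-like spectra: two Gaussian bands with random amplitudes."""
    rng = np.random.default_rng(seed)
    wavelengths = np.arange(1000.0, 2500.0, 2.0)  # nm, 750 points
    centers, widths = (1450.0, 1940.0), (40.0, 50.0)
    bands = np.stack(
        [np.exp(-0.5 * ((wavelengths - c) / w) ** 2) for c, w in zip(centers, widths)]
    )
    amplitudes = rng.uniform(0.2, 1.0, size=(n_samples, 2))
    X = amplitudes @ bands + 0.05  # small constant offset keeps values positive
    return X, wavelengths


def run_demo():
    """Requires nirs4all; imported lazily so the toy generator works without it."""
    from nirs4all.synthesis.reconstruction.pipeline import reconstruct_and_generate

    X, wavelengths = make_toy_spectra()
    X_synth, result = reconstruct_and_generate(
        X, wavelengths, n_synthetic=50, domain="unknown", verbose=False
    )
    print(result.summary())
    return X_synth, result


X_toy, wl_toy = make_toy_spectra()
```

The returned PipelineResult can then be passed to ReconstructionPipeline.generate() only if the pipeline object itself is available, which is one reason to prefer the class interface when more samples will be needed later.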