nirs4all.synthesis.reconstruction.pipeline module
Complete reconstruction pipeline for end-to-end workflow.
Provides a unified interface for:
1. Dataset configuration and preprocessing detection
2. Global calibration
3. Batch inversion
4. Parameter distribution learning
5. Synthetic generation
6. Validation
- class nirs4all.synthesis.reconstruction.pipeline.DatasetConfig(wavelengths: ndarray, signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance', preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none', domain: str = 'unknown', sg_window: int = 15, sg_polyorder: int = 2, name: str = 'dataset')
Bases: object
Configuration for a dataset to be reconstructed.
Captures all dataset-specific information needed for reconstruction:
- Wavelength grid
- Signal type (absorbance, reflectance)
- Preprocessing applied
- Application domain (for component selection)
- wavelengths
Wavelength grid in nm.
- Type:
ndarray
- signal_type
Signal type (‘absorbance’, ‘reflectance’, or ‘unknown’).
- Type:
Literal[‘absorbance’, ‘reflectance’, ‘unknown’]
- preprocessing
Detected or specified preprocessing type.
- Type:
Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘unknown’]
- classmethod from_data(X: ndarray, wavelengths: ndarray, name: str = 'dataset') -> DatasetConfig
Create configuration by auto-detecting properties from data.
- Parameters:
X – Spectra matrix (n_samples, n_wavelengths).
wavelengths – Wavelength grid.
name – Dataset name.
- Returns:
DatasetConfig with detected properties.
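The two routes to a configuration described above (explicit construction and `from_data`) can be sketched as follows. This is a minimal illustration based only on the signatures documented here. The 1000–2500 nm grid and the helper names `demo_wavelengths` and `build_configs` are assumptions for the example, not part of nirs4all. The library imports sit inside the function so the helper can be loaded without nirs4all installed.

```python
from typing import List


def demo_wavelengths(start: float = 1000.0, stop: float = 2500.0,
                     step: float = 2.0) -> List[float]:
    """Hypothetical helper: an evenly spaced wavelength grid in nm."""
    n_points = int((stop - start) / step) + 1
    return [start + i * step for i in range(n_points)]


def build_configs(X, wavelengths):
    """Build one explicit and one auto-detected DatasetConfig.

    X is expected to have shape (n_samples, n_wavelengths).
    """
    # Imported lazily: both packages are needed only when this runs.
    import numpy as np
    from nirs4all.synthesis.reconstruction.pipeline import DatasetConfig

    wl = np.asarray(wavelengths, dtype=float)

    # Explicit route: the caller states what is known about the data.
    # Parameters not given here keep their documented defaults.
    explicit = DatasetConfig(
        wavelengths=wl,
        signal_type="absorbance",
        preprocessing="first_derivative",
        name="demo",
    )

    # Detection route: properties are inferred from the spectra.
    detected = DatasetConfig.from_data(np.asarray(X, dtype=float), wl, name="demo")
    return explicit, detected
```

Comparing `explicit.preprocessing` with `detected.preprocessing` is one way to check the detection against what is known about a dataset.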
- class nirs4all.synthesis.reconstruction.pipeline.PipelineResult(config: DatasetConfig, calibration: 'CalibrationResult' | None = None, inversion_results: List['InversionResult'] | None = None, distribution: 'DistributionResult' | None = None, X_synthetic: np.ndarray | None = None, validation: 'ValidationResult' | None = None, forward_chain: 'ForwardChain' | None = None)
Bases: object
Result of reconstruction pipeline.
Contains all outputs from the reconstruction workflow:
- Calibration results
- Inversion results
- Learned distributions
- Generated synthetic data
- Validation metrics
- config
Dataset configuration used.
- Type:
DatasetConfig
- calibration
Global calibration result.
- Type:
Optional[‘CalibrationResult’]
- inversion_results
Per-sample inversion results.
- Type:
Optional[List[‘InversionResult’]]
- distribution
Learned parameter distributions.
- Type:
Optional[‘DistributionResult’]
- X_synthetic
Generated synthetic spectra.
- Type:
Optional[np.ndarray]
- validation
Validation result.
- Type:
Optional[‘ValidationResult’]
- forward_chain
Calibrated forward chain.
- Type:
Optional[‘ForwardChain’]
- config: DatasetConfig
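Every field of `PipelineResult` except `config` is optional and defaults to `None`, so calling code may want to check which outputs are present before using them. The sketch below shows one way to do that. `completed_stages` and `STAGE_FIELDS` are hypothetical names for this example, not part of nirs4all, and the helper relies only on the attribute names documented above.

```python
from typing import Any, List

# Optional PipelineResult fields, in the order given in the signature.
STAGE_FIELDS = (
    "calibration",
    "inversion_results",
    "distribution",
    "X_synthetic",
    "validation",
    "forward_chain",
)


def completed_stages(result: Any) -> List[str]:
    """Return the names of the optional fields that are not None."""
    return [name for name in STAGE_FIELDS
            if getattr(result, name, None) is not None]
```

The check is `is not None` rather than truthiness, because testing a NumPy array such as `X_synthetic` for truth raises an error and an empty list of inversion results is still a populated field.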
- class nirs4all.synthesis.reconstruction.pipeline.ReconstructionPipeline(config: DatasetConfig, component_names: List[str] | None = None, canonical_resolution: float = 0.5, baseline_order: int = 5, continuum_order: int = 3, n_prototypes: int = 5, fit_environmental: bool = False, verbose: bool = True)
Bases: object
Complete reconstruction pipeline.
Orchestrates the full workflow:
1. Configuration and component selection
2. Prototype selection and global calibration
3. Per-sample inversion (optionally with environmental parameters)
4. Parameter distribution learning
5. Synthetic generation
6. Validation
- config
Dataset configuration.
- config: DatasetConfig
- fit(X: ndarray, max_samples: int | None = None) -> PipelineResult
Run full reconstruction pipeline.
- Parameters:
X – Spectra matrix (n_samples, n_wavelengths).
max_samples – Maximum number of samples to invert (caps run time).
- Returns:
PipelineResult with all outputs.
- generate(n_samples: int, result: PipelineResult, random_state: int | None = None) -> ndarray
Generate additional synthetic samples using fitted pipeline.
- Parameters:
n_samples – Number of samples to generate.
result – PipelineResult from fit().
random_state – Random seed.
- Returns:
Synthetic spectra matrix.
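A minimal sketch of the `fit()` then `generate()` sequence, using only the signatures documented above. The demo spectra are fabricated single Gaussian bands. `make_demo_spectra`, `fit_and_extend`, the 1450 nm band position and the `max_samples=20` cap are assumptions chosen for illustration, not values from nirs4all.

```python
import math
import random
from typing import List


def make_demo_spectra(n_samples: int, wavelengths: List[float],
                      seed: int = 0) -> List[List[float]]:
    """Fabricated absorbance-like spectra: offset plus one Gaussian band."""
    rng = random.Random(seed)
    spectra = []
    for _ in range(n_samples):
        height = rng.uniform(0.5, 1.5)  # random band height per sample
        spectra.append([
            0.1 + height * math.exp(-((w - 1450.0) / 60.0) ** 2)
            for w in wavelengths
        ])
    return spectra


def fit_and_extend(X, wavelengths, n_extra: int = 50, seed: int = 0):
    """Fit the pipeline once, then draw extra synthetic spectra from it."""
    # Imported lazily: both packages are needed only when this runs.
    import numpy as np
    from nirs4all.synthesis.reconstruction.pipeline import (
        DatasetConfig,
        ReconstructionPipeline,
    )

    X = np.asarray(X, dtype=float)
    config = DatasetConfig.from_data(X, np.asarray(wavelengths, dtype=float))
    pipeline = ReconstructionPipeline(config, verbose=False)

    # max_samples limits how many spectra are inverted.
    result = pipeline.fit(X, max_samples=20)

    # generate() takes the PipelineResult returned by fit().
    extra = pipeline.generate(n_extra, result, random_state=seed)
    return result, extra
```

Passing a fixed `random_state` is the documented way to seed generation. Whether repeated calls with the same seed return identical spectra is not stated in this reference and should be verified.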
- nirs4all.synthesis.reconstruction.pipeline.reconstruct_and_generate(X: ndarray, wavelengths: ndarray, n_synthetic: int | None = None, domain: str = 'unknown', component_names: List[str] | None = None, fit_environmental: bool = False, verbose: bool = True) -> Tuple[ndarray, PipelineResult]
Convenience function for end-to-end reconstruction and generation.
- Parameters:
X – Real spectra matrix.
wavelengths – Wavelength grid.
n_synthetic – Number of synthetic samples (default: same as X).
domain – Application domain.
component_names – Components to use.
fit_environmental – Whether to fit environmental parameters (temperature, water activity, scattering).
verbose – Whether to print progress.
- Returns:
Tuple of (X_synthetic, PipelineResult).
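The convenience function can be called as sketched below. The call follows the signature documented above. `resolve_n_synthetic` and `run_end_to_end` are hypothetical helpers for this example: the first only restates the documented default for `n_synthetic` (same number of samples as X) so that a caller can compare it with the returned array.

```python
from typing import Optional


def resolve_n_synthetic(n_synthetic: Optional[int], n_real: int) -> int:
    """Documented default: None means as many synthetic samples as X has."""
    return n_real if n_synthetic is None else n_synthetic


def run_end_to_end(X, wavelengths, n_synthetic: Optional[int] = None,
                   domain: str = "unknown"):
    """Reconstruct and generate in one call; also return the expected row count."""
    # Imported lazily: both packages are needed only when this runs.
    import numpy as np
    from nirs4all.synthesis.reconstruction.pipeline import reconstruct_and_generate

    X = np.asarray(X, dtype=float)
    X_synthetic, result = reconstruct_and_generate(
        X,
        np.asarray(wavelengths, dtype=float),
        n_synthetic=n_synthetic,
        domain=domain,
        fit_environmental=False,  # documented default, shown for clarity
        verbose=False,
    )
    expected_rows = resolve_n_synthetic(n_synthetic, X.shape[0])
    return X_synthetic, result, expected_rows
```

Comparing `X_synthetic.shape[0]` with `expected_rows` gives a quick sanity check on the output size.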