nirs4all.synthesis.reconstruction package

Submodules

Module contents

Physical signal-chain reconstruction and variance modeling for NIR spectra.

This module implements a physically realistic “full signal-chain” reconstruction workflow that:

1. Reconstructs spectra using a physical forward model (Beer-Lambert + instrument chain)
2. Learns distributions of physical parameters for variance modeling
3. Generates realistic synthetic datasets by sampling from the learned distributions

Key Components:

CanonicalForwardModel: Physical model on canonical grid
InstrumentModel: Wavelength warp, ILS convolution, gain/offset
EnvironmentalEffectsModel: Temperature, moisture, and scattering effects
DomainTransform: Absorbance/reflectance transformation
PreprocessingOperator: Match dataset preprocessing (SG derivatives, SNV, etc.)
VariableProjectionSolver: NNLS inner solve + nonlinear outer optimization
GlobalCalibrator: Prototype-based instrument parameter estimation
ParameterDistributionFitter: Learn distributions in parameter space
ReconstructionGenerator: Generate synthetic data from learned distributions

Example

>>> from nirs4all.synthesis.reconstruction import (
...     ReconstructionPipeline,
...     DatasetConfig,
... )
>>>
>>> # Configure for a dataset (wavelengths in nm and X_real of shape (n_samples, n_wavelengths) are user-supplied)
>>> config = DatasetConfig(
...     wavelengths=wavelengths,
...     signal_type="absorbance",
...     preprocessing="first_derivative",
...     domain="food_dairy",
... )
>>>
>>> # Run full reconstruction pipeline
>>> pipeline = ReconstructionPipeline(config)
>>> result = pipeline.fit(X_real)
>>>
>>> # Generate synthetic data
>>> X_synth = pipeline.generate(n_samples=1000)

References

Burns, D. A., & Ciurczak, E. W. (2007). Handbook of Near-Infrared Analysis.
Workman Jr, J., & Weyer, L. (2012). Practical Guide and Spectral Atlas for Interpretive Near-Infrared Spectroscopy.

class nirs4all.synthesis.reconstruction.CalibrationResult(wl_shift: float = 0.0, wl_stretch: float = 1.0, ils_sigma: float = 4.0, stray_light: float = 0.0, gain: float = 1.0, offset: float = 0.0, prototype_residuals: ndarray | None = None, prototype_r2: ndarray | None = None, total_loss: float = inf)

Bases: object

Result of global calibration.

wl_shift

Calibrated wavelength shift.

Type:: float

wl_stretch

Calibrated wavelength stretch.

Type:: float

ils_sigma

Calibrated ILS width.

Type:: float

stray_light

Calibrated stray light fraction.

Type:: float

gain

Calibrated photometric gain.

Type:: float

offset

Calibrated photometric offset.

Type:: float

prototype_residuals

Residuals for each prototype.

Type:: numpy.ndarray | None

prototype_r2

R² for each prototype.

Type:: numpy.ndarray | None

total_loss

Total calibration loss.

Type:: float

classmethod from_array(params: ndarray) → CalibrationResult: Create from parameter array [wl_shift, wl_stretch, ils_sigma].

gain: float = 1.0

ils_sigma: float = 4.0

offset: float = 0.0

prototype_r2: ndarray | None = None

prototype_residuals: ndarray | None = None

stray_light: float = 0.0

to_dict() → Dict[str, float]: Convert to parameter dictionary.

total_loss: float = inf

wl_shift: float = 0.0

wl_stretch: float = 1.0

class nirs4all.synthesis.reconstruction.CanonicalForwardModel(canonical_grid: ndarray, component_names: List[str] = <factory>, baseline_order: int = 5, continuum_order: int = 3, _component_spectra: ndarray | None = None, _baseline_basis: ndarray | None = None, _continuum_basis: ndarray | None = None)

Bases: object

Physical model on canonical high-resolution wavelength grid.

Computes absorption coefficient K(λ) from chemical components:

K(λ) = Σ c_k * ε_k(λ) + K0(λ)

where:

c_k: concentration of component k
ε_k(λ): molar absorptivity (from component library)
K0(λ): continuum/background absorption (low-frequency)
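
The decomposition above can be illustrated with a short NumPy sketch. It is not the library's implementation: it assumes the absorptivities are stacked as rows of a matrix, that the continuum K0(λ) is a Chebyshev series evaluated on the grid mapped to [-1, 1] (the attributes below only state that the baseline is Chebyshev), and it leaves out the path length. The helper name compute_k is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev


def compute_k(eps_matrix, concentrations, continuum_coeffs, grid):
    """Hypothetical helper: K(λ) = Σ c_k ε_k(λ) + K0(λ).

    eps_matrix:       (n_components, n_wavelengths) absorptivity spectra ε_k(λ)
    concentrations:   (n_components,) concentrations c_k
    continuum_coeffs: Chebyshev coefficients of the continuum K0(λ) (assumed basis)
    grid:             (n_wavelengths,) wavelength grid in nm
    """
    # Map wavelengths onto [-1, 1], the natural domain of Chebyshev polynomials.
    x = 2.0 * (grid - grid.min()) / (grid.max() - grid.min()) - 1.0
    k0 = chebyshev.chebval(x, continuum_coeffs)
    # Linear mixture of component absorptivities plus the low-frequency continuum.
    return concentrations @ eps_matrix + k0


# Two synthetic Gaussian bands standing in for library component spectra.
grid = np.linspace(1000.0, 2500.0, 751)
eps = np.vstack([np.exp(-0.5 * ((grid - c) / 30.0) ** 2) for c in (1450.0, 1940.0)])
K = compute_k(eps, np.array([0.8, 0.3]), np.array([0.05, 0.01]), grid)
```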

canonical_grid

High-resolution wavelength grid (nm).

Type:: numpy.ndarray

component_names

Names of components to include.

Type:: List[str]

component_spectra: Pre-computed component spectra on canonical grid.

baseline_order

Order of Chebyshev baseline polynomial.

Type:: int

continuum_order

Order of continuum absorption polynomial.

Type:: int

__post_init__(): Initialize component spectra and basis matrices.

baseline_order: int = 5

canonical_grid: ndarray

component_names: List[str]

compute_absorption(concentrations: ndarray, path_length: float = 1.0, baseline_coeffs: ndarray | None = None, continuum_coeffs: ndarray | None = None) → ndarray

Compute absorption coefficient on canonical grid.

Parameters:

concentrations – Component concentrations, shape (n_components,).
path_length – Optical path length factor.
baseline_coeffs – Baseline polynomial coefficients.
continuum_coeffs – Continuum absorption coefficients.

Returns:

Absorbance spectrum on canonical grid.

continuum_order: int = 3

get_design_matrix(path_length: float = 1.0) → ndarray

Get full design matrix for linear fitting.

Returns:: Design matrix of shape (n_wavelengths, n_components + n_baseline + n_continuum).

property n_baseline: int: Number of baseline coefficients.

property n_components: int: Number of chemical components.

property n_continuum: int: Number of continuum coefficients.

property n_linear_params: int: Total number of linear parameters.

class nirs4all.synthesis.reconstruction.DatasetConfig(wavelengths: ndarray, signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance', preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none', domain: str = 'unknown', sg_window: int = 15, sg_polyorder: int = 2, name: str = 'dataset')

Bases: object

Configuration for a dataset to be reconstructed.

Captures all dataset-specific information needed for reconstruction:

Wavelength grid
Signal type (absorbance, reflectance)
Preprocessing applied
Application domain (for component selection)

wavelengths

Wavelength grid in nm.

Type:: numpy.ndarray

signal_type

Signal type (‘absorbance’, ‘reflectance’, or ‘unknown’).

Type:: Literal[‘absorbance’, ‘reflectance’, ‘unknown’]

preprocessing

Detected or specified preprocessing type.

Type:: Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘unknown’]

domain

Application domain for component selection.

Type:: str

sg_window

Savitzky-Golay window (for derivatives).

Type:: int

sg_polyorder

Savitzky-Golay polynomial order.

Type:: int

name

Optional dataset name.

Type:: str

domain: str = 'unknown'

classmethod from_data(X: ndarray, wavelengths: ndarray, name: str = 'dataset') → DatasetConfig

Create configuration by auto-detecting properties from data.

Parameters:

X – Spectra matrix (n_samples, n_wavelengths).
wavelengths – Wavelength grid.
name – Dataset name.

Returns:

DatasetConfig with detected properties.

name: str = 'dataset'

preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none'

sg_polyorder: int = 2

sg_window: int = 15

signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance'

wavelengths: ndarray

class nirs4all.synthesis.reconstruction.DistributionResult(param_names: List[str], distributions: Dict[str, Dict[str, Any]], correlations: ndarray | None = None, factor_loadings: ndarray | None = None, transform_params: Dict[str, Dict[str, Any]] = <factory>, n_samples_fitted: int = 0)

Bases: object

Result of parameter distribution fitting.

param_names

Names of parameters.

Type:: List[str]

distributions

Dict of distribution parameters for each param.

Type:: Dict[str, Dict[str, Any]]

correlations

Correlation matrix of transformed parameters.

Type:: numpy.ndarray | None

factor_loadings

Low-rank factor model loadings (optional).

Type:: numpy.ndarray | None

transform_params

Parameters for transformations (log, etc.).

Type:: Dict[str, Dict[str, Any]]

n_samples_fitted

Number of samples used for fitting.

Type:: int

correlations: ndarray | None = None

distributions: Dict[str, Dict[str, Any]]

factor_loadings: ndarray | None = None

n_samples_fitted: int = 0

param_names: List[str]

summary() → str[source]: Generate human-readable summary.

transform_params: Dict[str, Dict[str, Any]]

class nirs4all.synthesis.reconstruction.DomainTransform(domain: Literal['absorbance', 'reflectance', 'transmittance', 'km'] = 'absorbance', scatter_coeffs: ndarray | None = None, scatter_wavelength_exp: float = 0.0)

Bases: object

Transform between physical domains (absorbance, reflectance, etc.).

For absorbance datasets: A(λ) is the absorption coefficient, used directly.
For reflectance datasets: R(λ) is computed via Kubelka-Munk or an approximation.
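
A minimal sketch of the conversions named above, assuming decadic Beer-Lambert transmittance and the Kubelka-Munk remission function for an optically thick sample. The helper names are hypothetical, and the library's reflectance path may use a different approximation.

```python
import numpy as np


def absorbance_to_transmittance(A):
    # Beer-Lambert in decadic units: A = -log10(T), so T = 10**(-A).
    return 10.0 ** (-np.asarray(A, dtype=float))


def km_reflectance(K, S):
    # Kubelka-Munk: K/S = (1 - R)**2 / (2 R). Taking the root that lies in (0, 1]
    # gives R = 1 + K/S - sqrt((K/S)**2 + 2 K/S).
    ratio = np.asarray(K, dtype=float) / S
    return 1.0 + ratio - np.sqrt(ratio ** 2 + 2.0 * ratio)


def km_inverse(R):
    # Recover K/S from diffuse reflectance (the remission function).
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)


ratios = np.array([0.0, 0.1, 1.0, 5.0])
R = km_reflectance(ratios, 1.0)
```

Because the remission function is nonlinear in K, this is also why a reflectance-domain chain cannot be folded into a single linear design matrix.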

domain

Domain type (‘absorbance’, ‘reflectance’, ‘transmittance’, ‘km’).

Type:: Literal[‘absorbance’, ‘reflectance’, ‘transmittance’, ‘km’]

scatter_coeffs

Scattering coefficients for KM model (reflectance).

Type:: numpy.ndarray | None

scatter_wavelength_dep: Wavelength-dependent scatter (λ^-n).

domain: Literal['absorbance', 'reflectance', 'transmittance', 'km'] = 'absorbance'

inverse_transform(spectrum: ndarray, wavelengths: ndarray, scatter: ndarray | None = None) → ndarray

Inverse transform from domain to absorption.

Parameters:

spectrum – Spectrum in domain representation.
wavelengths – Wavelength grid.
scatter – Scattering coefficient for reflectance.

Returns:

Absorption coefficient.

scatter_coeffs: ndarray | None = None

scatter_wavelength_exp: float = 0.0

transform(absorption: ndarray, wavelengths: ndarray, scatter: ndarray | None = None) → ndarray

Transform absorption to target domain.

Parameters:

absorption – Absorption coefficient K(λ).
wavelengths – Wavelength grid.
scatter – Scattering coefficient S(λ) for reflectance.

Returns:

Spectrum in target domain representation.

class nirs4all.synthesis.reconstruction.EnvironmentalEffectsModel(temperature_delta: float = 0.0, water_activity: float = 0.5, scattering_power: float = 1.5, scattering_amplitude: float = 0.0, enabled: bool = True, reference_wavelength: float = 1500.0, _region_masks: Dict[str, ndarray] | None = None, _cached_wavelengths: ndarray | None = None)

Bases: object

Environmental effects on the canonical absorption spectrum.

Applied to absorption in canonical space before domain transform and instrument effects. Implements region-specific temperature and moisture effects based on literature parameters.

temperature_delta

Temperature deviation from reference (25°C).

Type:: float

water_activity

Effective water activity (0-1 scale).

Type:: float

scattering_power

Wavelength-dependent scattering exponent (λ^-n).

Type:: float

scattering_amplitude

Amplitude of scattering baseline.

Type:: float

enabled

Whether to apply environmental effects.

Type:: bool

reference_wavelength

Reference wavelength for scattering normalization (nm).

Type:: float

apply(absorption: ndarray, wavelengths: ndarray) → ndarray

Apply environmental effects to absorption spectrum.

Effects are applied in order:

1. Temperature effects (region-specific shifts, intensity changes)
2. Moisture effects (water band shifts based on water activity)
3. Scattering baseline (wavelength-dependent λ^-n)

Parameters:

absorption – Absorption coefficient on canonical grid.
wavelengths – Wavelength grid (nm).

Returns:

Modified absorption spectrum with environmental effects.
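
The scattering step can be sketched as a power law normalized at reference_wavelength, together with a central-difference derivative of the kind the get_jacobian_wrt_* methods return. The exact normalization and finite-difference scheme the library uses are assumptions here, and the function names are hypothetical.

```python
import numpy as np


def scattering_baseline(wavelengths, amplitude, power, reference_wavelength=1500.0):
    # Power-law scatter term equal to `amplitude` at the reference wavelength:
    # amplitude * (λ / λ_ref) ** (-n)
    return amplitude * (wavelengths / reference_wavelength) ** (-power)


def jacobian_wrt_power(wavelengths, amplitude, power, eps=0.05, reference_wavelength=1500.0):
    # Central finite difference of the baseline with respect to the exponent n.
    up = scattering_baseline(wavelengths, amplitude, power + eps, reference_wavelength)
    down = scattering_baseline(wavelengths, amplitude, power - eps, reference_wavelength)
    return (up - down) / (2.0 * eps)


wl = np.linspace(1000.0, 2500.0, 301)
absorption = np.zeros_like(wl)
with_scatter = absorption + scattering_baseline(wl, amplitude=0.05, power=1.5)
jac = jacobian_wrt_power(wl, amplitude=0.05, power=1.5)
# Analytic derivative for comparison: -a * (λ/λ_ref)**(-n) * ln(λ/λ_ref)
analytic = -scattering_baseline(wl, 0.05, 1.5) * np.log(wl / 1500.0)
```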

copy() → EnvironmentalEffectsModel: Create a copy of this model.

enabled: bool = True

classmethod from_dict(d: Dict[str, Any]) → EnvironmentalEffectsModel: Create from dictionary.

get_jacobian_wrt_scattering_amplitude(absorption: ndarray, wavelengths: ndarray, eps: float = 0.001) → ndarray: Numerical Jacobian w.r.t. scattering_amplitude.

get_jacobian_wrt_scattering_power(absorption: ndarray, wavelengths: ndarray, eps: float = 0.05) → ndarray: Numerical Jacobian w.r.t. scattering_power.

get_jacobian_wrt_temperature(absorption: ndarray, wavelengths: ndarray, eps: float = 0.1) → ndarray: Numerical Jacobian w.r.t. temperature_delta.

get_jacobian_wrt_water_activity(absorption: ndarray, wavelengths: ndarray, eps: float = 0.01) → ndarray: Numerical Jacobian w.r.t. water_activity.

reference_wavelength: float = 1500.0

scattering_amplitude: float = 0.0

scattering_power: float = 1.5

temperature_delta: float = 0.0

to_dict() → Dict[str, Any][source]: Convert to dictionary.

water_activity: float = 0.5

class nirs4all.synthesis.reconstruction.EnvironmentalParameterConfig(temperature_bounds: Tuple[float, float] = (-15.0, 15.0), temperature_prior_mean: float = 0.0, temperature_prior_std: float = 5.0, water_activity_bounds: Tuple[float, float] = (0.1, 0.9), water_activity_prior_alpha: float = 2.0, water_activity_prior_beta: float = 2.0, scattering_power_bounds: Tuple[float, float] = (0.5, 3.0), scattering_power_prior_mean: float = 1.5, scattering_power_prior_std: float = 0.5, scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2), scattering_amplitude_prior_scale: float = 0.02)

Bases: object

Configuration for environmental parameter fitting.

Defines bounds and prior distributions for each parameter.

compute_prior_penalty(temperature_delta: float, water_activity: float, scattering_power: float, scattering_amplitude: float) → float

Compute prior penalty for regularization.

Returns negative log-prior (to be added to objective function).
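
A sketch of what such a negative log-prior could look like with the defaults from the signature above: Gaussian priors on temperature_delta and scattering_power, a Beta(α, β) prior on water_activity and, as an assumption since only a scale is documented, an exponential prior on scattering_amplitude. Additive constants are dropped, and prior_penalty is a hypothetical stand-in, not the method's actual code.

```python
import math


def prior_penalty(temperature_delta, water_activity, scattering_power, scattering_amplitude,
                  t_mean=0.0, t_std=5.0, wa_alpha=2.0, wa_beta=2.0,
                  sp_mean=1.5, sp_std=0.5, sa_scale=0.02):
    """Negative log-prior (up to additive constants) for the four environmental parameters."""
    penalty = 0.5 * ((temperature_delta - t_mean) / t_std) ** 2
    # Beta prior on water activity, which must lie strictly inside (0, 1).
    penalty -= (wa_alpha - 1.0) * math.log(water_activity)
    penalty -= (wa_beta - 1.0) * math.log(1.0 - water_activity)
    penalty += 0.5 * ((scattering_power - sp_mean) / sp_std) ** 2
    # Assumed exponential prior with scale `sa_scale` on the non-negative amplitude.
    penalty += scattering_amplitude / sa_scale
    return penalty
```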

get_bounds_list() → List[Tuple[float, float]]: Get list of bounds for all 4 environmental parameters.

scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2)

scattering_amplitude_prior_scale: float = 0.02

scattering_power_bounds: Tuple[float, float] = (0.5, 3.0)

scattering_power_prior_mean: float = 1.5

scattering_power_prior_std: float = 0.5

temperature_bounds: Tuple[float, float] = (-15.0, 15.0)

temperature_prior_mean: float = 0.0

temperature_prior_std: float = 5.0

water_activity_bounds: Tuple[float, float] = (0.1, 0.9)

water_activity_prior_alpha: float = 2.0

water_activity_prior_beta: float = 2.0

class nirs4all.synthesis.reconstruction.ForwardChain(canonical_model: CanonicalForwardModel, instrument_model: InstrumentModel, domain_transform: DomainTransform, preprocessing: PreprocessingOperator, environmental_model: 'EnvironmentalEffectsModel' | None = None)

Bases: object

Complete forward measurement chain combining all components.

Chain: CanonicalForwardModel → [EnvironmentalEffects] → DomainTransform → InstrumentModel → PreprocessingOperator

canonical_model

Physical model on canonical grid.

Type:: CanonicalForwardModel

environmental_model

Optional environmental effects (temperature, moisture, scattering).

Type:: Optional[‘EnvironmentalEffectsModel’]

instrument_model

Instrument effects.

Type:: InstrumentModel

domain_transform

Domain conversion.

Type:: DomainTransform

preprocessing

Dataset preprocessing.

Type:: PreprocessingOperator

canonical_model: CanonicalForwardModel

classmethod create(canonical_grid: ndarray, target_grid: ndarray, component_names: List[str], domain: str = 'absorbance', preprocessing_type: str = 'none', instrument_params: Dict[str, float] | None = None, baseline_order: int = 5, continuum_order: int = 3, sg_window: int = 15, sg_polyorder: int = 2, include_environmental: bool = False) → ForwardChain

Convenience factory method to create ForwardChain.

Parameters:

canonical_grid – High-resolution canonical wavelength grid.
target_grid – Target dataset wavelength grid.
component_names – Names of components to include.
domain – Domain type (‘absorbance’, ‘reflectance’).
preprocessing_type – Preprocessing type.
instrument_params – Instrument parameters dict.
baseline_order – Baseline polynomial order.
continuum_order – Continuum polynomial order.
sg_window – Savitzky-Golay window.
sg_polyorder – Savitzky-Golay polynomial order.
include_environmental – Whether to include environmental effects model.

Returns:

Configured ForwardChain instance.

domain_transform: DomainTransform

environmental_model: 'EnvironmentalEffectsModel' | None = None

forward(concentrations: ndarray, path_length: float = 1.0, baseline_coeffs: ndarray | None = None, continuum_coeffs: ndarray | None = None, scatter: ndarray | None = None) → ndarray

Run full forward chain.

Parameters:

concentrations – Component concentrations.
path_length – Optical path length factor.
baseline_coeffs – Baseline polynomial coefficients.
continuum_coeffs – Continuum absorption coefficients.
scatter – Scattering coefficients for reflectance.

Returns:

Spectrum on target grid with preprocessing applied.

forward_design_matrix(path_length: float = 1.0) → ndarray

Get transformed design matrix for linear fitting.

Returns the design matrix after applying the instrument and preprocessing transforms. Note: the domain transform is not applied here because it may be nonlinear (Kubelka-Munk).

instrument_model: InstrumentModel

preprocessing: PreprocessingOperator

class nirs4all.synthesis.reconstruction.GenerationResult(X: ndarray, concentrations: ndarray, path_lengths: ndarray, baseline_coeffs: ndarray, wavelengths: ndarray, noise_level: float = 0.0, wl_shifts: ndarray | None = None, temperature_deltas: ndarray | None = None, water_activities: ndarray | None = None, scattering_powers: ndarray | None = None, scattering_amplitudes: ndarray | None = None)

Bases: object

Result of synthetic generation.

X

Generated spectra (n_samples, n_wavelengths).

Type:: numpy.ndarray

concentrations

Sampled concentrations (n_samples, n_components).

Type:: numpy.ndarray

path_lengths

Sampled path lengths (n_samples,).

Type:: numpy.ndarray

baseline_coeffs

Sampled baseline coefficients.

Type:: numpy.ndarray

wavelengths

Wavelength grid.

Type:: numpy.ndarray

noise_level

Applied noise level.

Type:: float

wl_shifts

Per-sample wavelength shifts.

Type:: numpy.ndarray | None

temperature_deltas

Per-sample temperature deviations (°C).

Type:: numpy.ndarray | None

water_activities

Per-sample water activity values.

Type:: numpy.ndarray | None

scattering_powers

Per-sample scattering exponents.

Type:: numpy.ndarray | None

scattering_amplitudes

Per-sample scattering amplitudes.

Type:: numpy.ndarray | None

X: ndarray

baseline_coeffs: ndarray

concentrations: ndarray

property n_samples: int: Number of generated samples.

property n_wavelengths: int: Number of wavelengths.

noise_level: float = 0.0

path_lengths: ndarray

scattering_amplitudes: ndarray | None = None

scattering_powers: ndarray | None = None

temperature_deltas: ndarray | None = None

water_activities: ndarray | None = None

wavelengths: ndarray

wl_shifts: ndarray | None = None

class nirs4all.synthesis.reconstruction.GlobalCalibrator(wl_shift_bounds: Tuple[float, float] = (-10.0, 10.0), wl_stretch_bounds: Tuple[float, float] = (0.98, 1.02), ils_sigma_bounds: Tuple[float, float] = (2.0, 20.0), regularization: float = 1e-06, use_global_search: bool = False)

Bases: object

Calibrate global instrument parameters using prototype spectra.

Optimizes θ_global = {wl_shift, wl_stretch, ils_sigma} to minimize total fitting loss across all prototypes, with per-prototype linear parameters solved via NNLS.
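
The structure of this search (outer global parameters, inner linear solve, loss summed over prototypes) can be sketched in NumPy for the wavelength shift alone. Two substitutions are made and should be kept in mind: the inner solve uses unconstrained least squares instead of NNLS, and the outer optimizer is a plain grid search rather than the bounded or differential-evolution search the class describes. All names are hypothetical.

```python
import numpy as np


def shifted_components(eps_matrix, grid, shift):
    # Evaluate each component spectrum at λ - shift (pure wavelength-shift warp).
    return np.vstack([np.interp(grid - shift, grid, e) for e in eps_matrix])


def inner_solve(design, y):
    # Inner linear step. NNLS in the documented calibrator; unconstrained least
    # squares here so the sketch needs only NumPy.
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coeffs
    return coeffs, float(resid @ resid)


def calibrate_shift(prototypes, eps_matrix, grid, shifts):
    # Outer step: for each candidate global shift, solve every prototype's linear
    # parameters and sum the squared residuals into one total loss.
    losses = []
    for s in shifts:
        design = shifted_components(eps_matrix, grid, s).T        # (n_wl, n_comp)
        design = np.column_stack([design, np.ones(grid.size)])    # + constant offset
        losses.append(sum(inner_solve(design, y)[1] for y in prototypes))
    best = int(np.argmin(losses))
    return float(shifts[best]), losses[best]


grid = np.linspace(1000.0, 2500.0, 1501)
eps = np.vstack([np.exp(-0.5 * ((grid - c) / 25.0) ** 2) for c in (1450.0, 1940.0)])
true_components = shifted_components(eps, grid, 3.0)
prototypes = [c @ true_components + 0.01 for c in (np.array([0.8, 0.2]), np.array([0.4, 0.6]))]
best_shift, best_loss = calibrate_shift(prototypes, eps, grid, np.arange(-10.0, 10.5, 0.5))
```

Separating the linear and nonlinear parameters this way keeps the outer search low-dimensional (three global parameters) regardless of how many components and baseline terms each prototype has.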

forward_chain: ForwardChain for computing model predictions.

wl_shift_bounds

Bounds for wavelength shift.

Type:: Tuple[float, float]

wl_stretch_bounds

Bounds for wavelength stretch.

Type:: Tuple[float, float]

ils_sigma_bounds

Bounds for ILS sigma.

Type:: Tuple[float, float]

regularization

L2 regularization strength.

Type:: float

use_global_search

Use differential evolution for global search.

Type:: bool

calibrate(prototypes: np.ndarray, forward_chain: ForwardChain, initial_guess: np.ndarray | None = None) → CalibrationResult

Calibrate global parameters on prototype spectra.

Parameters:

prototypes – Prototype spectra (n_prototypes, n_wavelengths).
forward_chain – Forward chain for model evaluation.
initial_guess – Initial [wl_shift, wl_stretch, ils_sigma].

Returns:

CalibrationResult with optimized parameters.

ils_sigma_bounds: Tuple[float, float] = (2.0, 20.0)

refine(current_result: CalibrationResult, prototypes: np.ndarray, forward_chain: ForwardChain) → CalibrationResult

Refine calibration with tighter bounds around current estimate.

Parameters:

current_result – Current calibration result.
prototypes – Prototype spectra.
forward_chain – Forward chain.

Returns:

Refined CalibrationResult.

regularization: float = 1e-06

use_global_search: bool = False

wl_shift_bounds: Tuple[float, float] = (-10.0, 10.0)

wl_stretch_bounds: Tuple[float, float] = (0.98, 1.02)

class nirs4all.synthesis.reconstruction.InstrumentModel(target_grid: ndarray, wl_shift: float = 0.0, wl_stretch: float = 1.0, wl_poly_coeffs: ndarray | None = None, ils_sigma: float = 4.0, stray_light: float = 0.0, gain: float = 1.0, offset: float = 0.0)

Bases: object

Instrument effects: warp, ILS convolution, gain/offset, resampling.

Transforms spectrum from canonical grid to target instrument grid:

Wavelength warp: λ* → λ′ (shift + stretch + optional higher order)
ILS convolution: Gaussian or Voigt line shape
Stray light / gain / offset
Resample to target grid
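
The four steps above can be sketched in NumPy on a uniform canonical grid. Assumptions not stated in this reference: the stretch is taken about the centre of the canonical grid, the ILS is a sampled Gaussian with edge renormalization, and stray light and the higher-order wl_poly_coeffs terms are omitted. The function names are hypothetical.

```python
import numpy as np


def gaussian_ils(spectrum, step_nm, sigma_nm):
    # Normalized Gaussian kernel sampled on the grid step. Dividing by the blurred
    # ones-vector corrects the edges, so a constant spectrum stays constant.
    half = int(np.ceil(4.0 * sigma_nm / step_nm))
    offsets = np.arange(-half, half + 1) * step_nm
    kernel = np.exp(-0.5 * (offsets / sigma_nm) ** 2)
    kernel /= kernel.sum()
    num = np.convolve(spectrum, kernel, mode="same")
    den = np.convolve(np.ones_like(spectrum), kernel, mode="same")
    return num / den


def apply_instrument(spectrum, canonical_grid, target_grid, wl_shift=0.0,
                     wl_stretch=1.0, ils_sigma=4.0, gain=1.0, offset=0.0):
    # 1. Warp: relabel the canonical axis as the instrument reports it,
    #    stretching about the grid centre (assumption) and adding wl_shift.
    centre = canonical_grid.mean()
    warped_axis = centre + wl_stretch * (canonical_grid - centre) + wl_shift
    # 2. ILS: Gaussian blur; the warped axis is still uniform with this step.
    blurred = gaussian_ils(spectrum, warped_axis[1] - warped_axis[0], ils_sigma)
    # 3. Photometric gain and offset (stray light is omitted in this sketch).
    scaled = gain * blurred + offset
    # 4. Resample onto the dataset grid by linear interpolation.
    return np.interp(target_grid, warped_axis, scaled)


canonical = np.arange(1000.0, 2500.5, 0.5)   # uniform high-resolution grid
target = np.arange(1100.0, 2401.0, 1.0)      # coarser dataset grid
band = np.exp(-0.5 * ((canonical - 1500.0) / 20.0) ** 2)
observed = apply_instrument(band, canonical, target, wl_shift=5.0)
```

With a 5 nm shift, the band that sits at 1500 nm on the canonical grid peaks at 1505 nm on the target grid.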

target_grid

Target wavelength grid (dataset grid).

Type:: numpy.ndarray

wl_shift

Wavelength shift in nm (default 0).

Type:: float

wl_stretch

Wavelength scale factor (default 1).

Type:: float

wl_poly_coeffs

Higher-order polynomial warp coefficients.

Type:: numpy.ndarray | None

ils_sigma

Instrument line shape Gaussian sigma in nm.

Type:: float

stray_light

Stray light fraction (default 0).

Type:: float

gain

Photometric gain (default 1).

Type:: float

offset

Photometric offset (default 0).

Type:: float

apply(spectrum: ndarray, canonical_grid: ndarray) → ndarray

Apply instrument chain to transform spectrum.

Parameters:

spectrum – Input spectrum on canonical grid.
canonical_grid – Canonical wavelength grid.

Returns:

Transformed spectrum on target grid.

classmethod from_params(target_grid: ndarray, params: Dict[str, float]) → InstrumentModel: Create InstrumentModel from parameter dictionary.

gain: float = 1.0

get_jacobian_wrt_ils_sigma(spectrum: ndarray, canonical_grid: ndarray, eps: float = 0.1) → ndarray: Numerical Jacobian w.r.t. ILS sigma.

get_jacobian_wrt_wl_shift(spectrum: ndarray, canonical_grid: ndarray, eps: float = 0.1) → ndarray: Numerical Jacobian w.r.t. wavelength shift.

ils_sigma: float = 4.0

offset: float = 0.0

stray_light: float = 0.0

target_grid: ndarray

wl_poly_coeffs: ndarray | None = None

wl_shift: float = 0.0

wl_stretch: float = 1.0

class nirs4all.synthesis.reconstruction.InversionResult(concentrations: ndarray, baseline_coeffs: ndarray, continuum_coeffs: ndarray | None = None, path_length: float = 1.0, wl_shift_residual: float = 0.0, scatter_coeffs: ndarray | None = None, fitted_spectrum: ndarray | None = None, residuals: ndarray | None = None, r_squared: float = 0.0, rmse: float = inf, converged: bool = False, temperature_delta: float = 0.0, water_activity: float = 0.5, scattering_power: float = 1.5, scattering_amplitude: float = 0.0)

Bases: object

Result of per-sample inversion.

concentrations

Fitted component concentrations.

Type:: numpy.ndarray

baseline_coeffs

Fitted baseline coefficients.

Type:: numpy.ndarray

continuum_coeffs

Fitted continuum coefficients.

Type:: numpy.ndarray | None

path_length

Fitted path length factor.

Type:: float

wl_shift_residual

Per-sample wavelength shift correction.

Type:: float

scatter_coeffs

Fitted scatter coefficients (reflectance).

Type:: numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum.

Type:: numpy.ndarray | None

residuals

Fitting residuals.

Type:: numpy.ndarray | None

r_squared

Coefficient of determination.

Type:: float

rmse

Root mean squared error.

Type:: float

converged

Whether optimization converged.

Type:: bool

temperature_delta

Fitted temperature deviation (°C from reference).

Type:: float

water_activity

Fitted water activity (0-1 scale).

Type:: float

scattering_power

Fitted scattering wavelength exponent.

Type:: float

scattering_amplitude

Fitted scattering baseline amplitude.

Type:: float

baseline_coeffs: ndarray

concentrations: ndarray

continuum_coeffs: ndarray | None = None

converged: bool = False

fitted_spectrum: ndarray | None = None

property linear_params: ndarray: Get all linear parameters as single array.

path_length: float = 1.0

r_squared: float = 0.0

residuals: ndarray | None = None

rmse: float = inf

scatter_coeffs: ndarray | None = None

scattering_amplitude: float = 0.0

scattering_power: float = 1.5

temperature_delta: float = 0.0

to_dict() → Dict[str, Any][source]: Convert to dictionary.

water_activity: float = 0.5

wl_shift_residual: float = 0.0

class nirs4all.synthesis.reconstruction.MultiscaleSchedule(smooth_sigmas: List[float] = <factory>, derivative_weights: List[float] = <factory>, baseline_regularization: List[float] = <factory>, max_iterations: List[int] = <factory>)

Bases: object

Configuration for multiscale fitting curriculum.

Fits coarse features first, then progressively adds detail:

1. Smooth target + no derivatives + strong baseline prior
2. Less smooth + partial derivative weight
3. Full resolution + full preprocessing

smooth_sigmas

Gaussian sigma values for each stage (0 = no smoothing).

Type:: List[float]

derivative_weights

Weight on derivative space at each stage.

Type:: List[float]

baseline_regularization

Baseline regularization at each stage.

Type:: List[float]

max_iterations

Max iterations at each stage.

Type:: List[int]

baseline_regularization: List[float]

derivative_weights: List[float]

max_iterations: List[int]

property n_stages: int: Number of stages in schedule.

classmethod quick() → MultiscaleSchedule: Quick schedule for fast fitting.

smooth_sigmas: List[float]

classmethod thorough() → MultiscaleSchedule: Thorough schedule for best accuracy.

class nirs4all.synthesis.reconstruction.ParameterDistributionFitter(positive_params: List[str] = <factory>, bounded_params: Dict[str, Tuple[float, float]] = <factory>, use_factor_model: bool = False, n_factors: int = 3, min_std: float = 1e-06)

Bases: object

Fit distributions to parameter samples.

For positive parameters (concentrations, path_length):

Use log-normal or gamma distributions
Transform to log space for correlation modeling

For shift parameters (wl_shift):

Use Gaussian distributions

For bounded parameters:

Use truncated normal or beta distributions
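
A minimal NumPy sketch of the three families listed above: a log-normal fitted in log space, a Gaussian for shift parameters, and a Beta distribution fitted by the method of moments after rescaling to the bounds (the truncated-normal alternative is not shown). The returned dictionary layout is an assumption, not the schema of DistributionResult.distributions, and the function names are hypothetical.

```python
import numpy as np


def fit_positive(x, min_std=1e-6):
    # Log-normal: mean and standard deviation of log(x).
    logx = np.log(x)
    return {"kind": "lognormal", "mu": float(logx.mean()),
            "sigma": max(float(logx.std(ddof=1)), min_std)}


def fit_shift(x, min_std=1e-6):
    # Plain Gaussian for signed parameters such as wl_shift.
    return {"kind": "normal", "mean": float(x.mean()),
            "std": max(float(x.std(ddof=1)), min_std)}


def fit_bounded(x, lower, upper, min_std=1e-6):
    # Beta by the method of moments on values rescaled to [0, 1].
    u = (x - lower) / (upper - lower)
    m = float(u.mean())
    v = max(float(u.var(ddof=1)), min_std ** 2)
    common = m * (1.0 - m) / v - 1.0
    return {"kind": "beta", "alpha": m * common, "beta": (1.0 - m) * common,
            "lower": lower, "upper": upper}


rng = np.random.default_rng(0)
path = fit_positive(rng.lognormal(mean=0.5, sigma=0.2, size=5000))
shift = fit_shift(rng.normal(loc=-1.0, scale=0.3, size=5000))
water = fit_bounded(0.1 + 0.8 * rng.beta(2.0, 5.0, size=20000), 0.1, 0.9)
```

The min_std floor mirrors the attribute of the same name: it keeps a near-constant parameter from producing a zero-width distribution.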

positive_params

Names of parameters that must be positive.

Type:: List[str]

bounded_params

Dict of param_name -> (lower, upper) bounds.

Type:: Dict[str, Tuple[float, float]]

use_factor_model

Use low-rank factor model for correlations.

Type:: bool

n_factors

Number of factors for factor model.

Type:: int

min_std

Minimum standard deviation to avoid degenerate distributions.

Type:: float

bounded_params: Dict[str, Tuple[float, float]]

fit(params: Dict[str, ndarray], param_names: List[str] | None = None) → DistributionResult

Fit distributions to parameter samples.

Parameters:

params – Dict of parameter arrays. Each array has shape (n_samples,) or (n_samples, n_features) for multi-dimensional params.
param_names – Optional list of parameter names to fit.

Returns:

DistributionResult with fitted distributions.

min_std: float = 1e-06

n_factors: int = 3

positive_params: List[str]

use_factor_model: bool = False

class nirs4all.synthesis.reconstruction.ParameterSampler(distribution_result: DistributionResult, use_correlations: bool = True)

Bases: object

Sample parameters from fitted distributions.

Uses Gaussian copula to maintain correlations between parameters while respecting marginal distributions.
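
When every marginal is normal after a fixed transform (log for positive parameters, identity for shifts), a Gaussian copula reduces to drawing correlated standard normals and mapping them back through the marginals. The sketch below assumes that case only; general marginals would additionally need the normal CDF and each marginal's inverse CDF. The function name is hypothetical.

```python
import numpy as np


def sample_gaussian_copula(mus, sigmas, corr, positive, n_samples, rng):
    # Correlated standard normals z ~ N(0, corr) via the Cholesky factor of corr.
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_samples, len(mus))) @ chol.T
    # Map through the marginals: Gaussian in transformed space, then exp() for
    # parameters modeled as log-normal (positive), identity otherwise.
    x = mus + sigmas * z
    return np.where(positive, np.exp(x), x)


rng = np.random.default_rng(42)
corr = np.array([[1.0, 0.8], [0.8, 1.0]])
samples = sample_gaussian_copula(
    mus=np.array([0.0, -1.0]), sigmas=np.array([0.3, 0.5]), corr=corr,
    positive=np.array([True, False]), n_samples=20000, rng=rng)
```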

distribution_result

Fitted DistributionResult.

Type:: nirs4all.synthesis.reconstruction.distributions.DistributionResult

use_correlations

Whether to model parameter correlations.

Type:: bool

distribution_result: DistributionResult

sample(n_samples: int, random_state: int | None = None) → Dict[str, ndarray]

Sample parameters from fitted distributions.

Parameters:

n_samples – Number of samples to generate.
random_state – Random seed.

Returns:

Dict of parameter arrays with same structure as fit input.

sample_single(random_state: int | None = None) → Dict[str, ndarray]: Sample a single parameter set.

use_correlations: bool = True

class nirs4all.synthesis.reconstruction.PipelineResult(config: DatasetConfig, calibration: 'CalibrationResult' | None = None, inversion_results: List['InversionResult'] | None = None, distribution: 'DistributionResult' | None = None, X_synthetic: np.ndarray | None = None, validation: 'ValidationResult' | None = None, forward_chain: 'ForwardChain' | None = None)

Bases: object

Result of reconstruction pipeline.

Contains all outputs from the reconstruction workflow:

Calibration results
Inversion results
Learned distributions
Generated synthetic data
Validation metrics

config

Dataset configuration used.

Type:: DatasetConfig

calibration

Global calibration result.

Type:: Optional[‘CalibrationResult’]

inversion_results

Per-sample inversion results.

Type:: Optional[List[‘InversionResult’]]

distribution

Learned parameter distributions.

Type:: Optional[‘DistributionResult’]

X_synthetic

Generated synthetic spectra.

Type:: Optional[np.ndarray]

validation

Validation result.

Type:: Optional[‘ValidationResult’]

forward_chain

Calibrated forward chain.

Type:: Optional[‘ForwardChain’]

X_synthetic: np.ndarray | None = None

calibration: 'CalibrationResult' | None = None

config: DatasetConfig

distribution: 'DistributionResult' | None = None

forward_chain: 'ForwardChain' | None = None

inversion_results: List['InversionResult'] | None = None

summary() → str[source]: Generate pipeline summary.

validation: 'ValidationResult' | None = None

class nirs4all.synthesis.reconstruction.PreprocessingOperator(preprocessing_type: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'detrend', 'mean_centered'] = 'none', sg_window: int = 15, sg_polyorder: int = 2, sg_deriv: int = 0, reference_spectrum: ndarray | None = None)

Bases: object

Apply dataset preprocessing to match stored representation.

Implements exact preprocessing steps:

Savitzky-Golay derivatives (1st, 2nd order)
SNV (Standard Normal Variate)
MSC (Multiplicative Scatter Correction)
Detrend
Mean centering
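
Three of the operators above (SNV, MSC and mean centering) are short enough to sketch in NumPy. Savitzky-Golay derivatives are commonly computed with scipy.signal.savgol_filter(x, window_length, polyorder, deriv=...) and are not reproduced here. This is an illustrative sketch with hypothetical function names, not the operator's code.

```python
import numpy as np


def snv(X):
    # Standard Normal Variate: centre and scale each spectrum (row) on its own.
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def msc(X, reference=None):
    # Multiplicative Scatter Correction: fit x ≈ a + b * ref for each spectrum,
    # then return (x - a) / b. The reference defaults to the mean spectrum.
    X = np.atleast_2d(np.asarray(X, dtype=float))
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        out[i] = (x - a) / b
    return out


def mean_center(X):
    # Subtract the column-wise (per-wavelength) mean over all samples.
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return X - X.mean(axis=0, keepdims=True)


rng = np.random.default_rng(1)
ref = 1.0 + rng.random(50)
X = np.vstack([0.2 + 1.5 * ref, -0.1 + 0.7 * ref])
```

Note that SNV and MSC depend on the spectrum itself (per-row statistics or a per-row regression), so unlike a Savitzky-Golay filter they are not linear operators on the spectrum.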

preprocessing_type

Type of preprocessing.

Type:: Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘detrend’, ‘mean_centered’]

sg_window

Savitzky-Golay window length.

Type:: int

sg_polyorder

Savitzky-Golay polynomial order.

Type:: int

sg_deriv

Derivative order (0, 1, 2).

Type:: int

reference_spectrum

Reference for MSC (mean of calibration set).

Type:: numpy.ndarray | None

apply(spectrum: ndarray) → ndarray

Apply preprocessing to spectrum.

Parameters:: spectrum – Input spectrum, shape (n_wavelengths,) or (n_samples, n_wavelengths).
Returns:: Preprocessed spectrum(a).

apply_to_matrix(X: ndarray) → ndarray: Apply preprocessing to design matrix columns.

classmethod from_detection(preprocessing_type: str, sg_window: int = 15, sg_polyorder: int = 2) → PreprocessingOperator: Create PreprocessingOperator from detected preprocessing type.

preprocessing_type: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'detrend', 'mean_centered'] = 'none'

reference_spectrum: ndarray | None = None

sg_deriv: int = 0

sg_polyorder: int = 2

sg_window: int = 15

class nirs4all.synthesis.reconstruction.PrototypeSelector(n_prototypes: int = 5, include_median: bool = True, include_quantiles: bool = True, pca_components: int = 5)

Bases: object

Select representative prototype spectra from a dataset.

Uses multiple strategies to ensure robust global calibration:

1. Median spectrum (robust central tendency)
2. Quantile spectra (25th and 75th percentiles along PC1)
3. K-medoids in PCA space (to capture diversity)
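The three strategies can be sketched in NumPy as follows. This is a hypothetical select_prototypes helper, not the PrototypeSelector implementation: it assumes the "median spectrum" is represented by the real sample nearest the per-wavelength median (so that indices can be returned), computes PCA via an SVD, and uses a plain alternating k-medoids over the samples not already chosen. It mirrors the documented return value of (prototype_spectra, prototype_indices).

```python
import numpy as np


def select_prototypes(X, n_prototypes=5, pca_components=5, n_iter=20, seed=0):
    """Sketch: median sample, PC1 quantile samples, then k-medoids in PCA space."""
    X = np.asarray(X, dtype=float)
    chosen = []

    # 1. Median: the real sample closest to the per-wavelength median spectrum
    median = np.median(X, axis=0)
    chosen.append(int(np.argmin(np.linalg.norm(X - median, axis=1))))

    # PCA scores from an SVD of the mean-centred data
    U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    k = min(pca_components, S.size)
    scores = U[:, :k] * S[:k]

    # 2. Quantiles: samples whose PC1 score is nearest the 25th / 75th percentiles
    for q in (25, 75):
        idx = int(np.argmin(np.abs(scores[:, 0] - np.percentile(scores[:, 0], q))))
        if idx not in chosen:
            chosen.append(idx)

    # 3. Fill the remaining slots with k-medoids over the not-yet-chosen samples
    n_extra = n_prototypes - len(chosen)
    if n_extra > 0:
        rest = np.setdiff1d(np.arange(X.shape[0]), chosen)
        Z = scores[rest]
        pairwise = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
        rng = np.random.default_rng(seed)
        medoids = rng.choice(rest.size, size=n_extra, replace=False)  # positions in `rest`
        for _ in range(n_iter):
            labels = np.argmin(pairwise[:, medoids], axis=1)  # nearest medoid per sample
            new = medoids.copy()
            for j in range(n_extra):
                members = np.flatnonzero(labels == j)
                if members.size:
                    # New medoid: the member with the smallest summed distance to its cluster
                    within = pairwise[np.ix_(members, members)].sum(axis=1)
                    new[j] = members[np.argmin(within)]
            if np.array_equal(new, medoids):
                break
            medoids = new
        chosen.extend(int(i) for i in rest[medoids])

    idx = np.array(chosen[:n_prototypes])
    return X[idx], idx
```

The full pairwise distance matrix keeps the sketch short but costs O(n_samples²) memory; it also assumes no exactly duplicated spectra.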

n_prototypes

Number of prototypes to select.

Type:: int

include_median

Always include median spectrum.

Type:: bool

include_quantiles

Include quantile spectra.

Type:: bool

pca_components

Number of PCA components for clustering.

Type:: int

include_median: bool = True

include_quantiles: bool = True

n_prototypes: int = 5

pca_components: int = 5

select(X: ndarray) → Tuple[ndarray, ndarray]

Select prototype spectra.

Parameters:: X – Spectra matrix (n_samples, n_wavelengths).
Returns:: Tuple of (prototype_spectra, prototype_indices).

class nirs4all.synthesis.reconstruction.ReconstructionGenerator(noise_level: float = 0.001, multiplicative_noise: float = 0.01, add_noise: bool = True, noise_type: str = 'both')

Bases: object

Generate synthetic spectra from learned parameter distributions.

Uses the calibrated forward chain and learned parameter distributions to generate realistic synthetic data that matches the statistical properties of the original dataset.

forward_chain: Calibrated forward chain.

sampler: Parameter sampler with learned distributions.

noise_estimator: Estimated noise level from inversion residuals.

add_noise

Whether to add noise to generated spectra.

Type:: bool

noise_type

Type of noise ('additive', 'multiplicative', or 'both').

Type:: str
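The docs list noise_level, multiplicative_noise and noise_type but not how they combine. A minimal sketch under stated assumptions: additive noise is i.i.d. Gaussian with standard deviation noise_level, and multiplicative noise is a per-sample Gaussian gain with standard deviation multiplicative_noise (the library could equally apply it per wavelength). apply_noise_sketch is a hypothetical helper, not a library function.

```python
import numpy as np


def apply_noise_sketch(X, noise_level=0.001, multiplicative_noise=0.01,
                       noise_type="both", rng=None):
    """Hypothetical noise model: X * gain + additive, with gain ~ 1 + N(0, multiplicative_noise)."""
    if noise_type not in ("additive", "multiplicative", "both"):
        raise ValueError(f"unknown noise_type: {noise_type!r}")
    rng = np.random.default_rng(rng)
    out = np.asarray(X, dtype=float).copy()
    if noise_type in ("multiplicative", "both"):
        # One gain factor per sample (row), broadcast across all wavelengths
        gain = 1.0 + multiplicative_noise * rng.standard_normal((out.shape[0], 1))
        out = out * gain
    if noise_type in ("additive", "both"):
        # Independent Gaussian noise at every wavelength of every sample
        out = out + noise_level * rng.standard_normal(out.shape)
    return out
```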

add_noise: bool = True

generate(n_samples: int, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) → GenerationResult

Generate synthetic spectra.

Parameters:

n_samples – Number of samples to generate.
forward_chain – Calibrated forward chain.
sampler – Parameter sampler.
random_state – Random seed.

Returns:

GenerationResult with generated spectra and parameters.
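Conceptually, generate() draws parameters from the sampler and pushes them through the forward chain. The internals of ParameterSampler and ForwardChain are not shown here, so this toy sketch substitutes a clipped multivariate normal for the learned concentration distribution and a bare Beer-Lambert model (no instrument, baseline or noise stages) for the forward chain. The returned dict only loosely mimics a GenerationResult holding spectra and parameters; generate_spectra is a hypothetical name.

```python
import numpy as np


def gaussian_band(wl, center, width):
    """Idealized pure-component absorption band."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


def generate_spectra(n_samples, pure, conc_mean, conc_cov,
                     path_mean=1.0, path_sd=0.05, random_state=None):
    """Toy stand-in for generate(): sample parameters, then run a Beer-Lambert forward model."""
    rng = np.random.default_rng(random_state)
    # Sampler stand-in: multivariate normal over concentrations, clipped at zero
    C = np.clip(rng.multivariate_normal(conc_mean, conc_cov, size=n_samples), 0.0, None)
    # Path length per sample, kept strictly positive
    L = np.clip(rng.normal(path_mean, path_sd, size=(n_samples, 1)), 1e-3, None)
    # Beer-Lambert: absorbance = path length * sum_k c_k * epsilon_k(lambda)
    X = L * (C @ pure)
    return X, {"concentrations": C, "path_length": L[:, 0]}
```

Passing the same random_state reproduces the same draw, which is the behavior one would expect from the documented random_state argument.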

generate_matched(X_real: np.ndarray, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) → GenerationResult

Generate synthetic data matched to real data statistics.

Generates the same number of samples as the real data and optionally adjusts the noise level based on estimated residuals.

Parameters:

X_real – Real data matrix for reference.
forward_chain – Calibrated forward chain.
sampler – Parameter sampler.
random_state – Random seed.

Returns:

GenerationResult.
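How the residual-based noise estimate is computed is not documented. One common robust choice, shown here as an assumption, is the median absolute deviation (MAD) of the inversion residuals, scaled by 1.4826 so that it estimates a Gaussian standard deviation; unlike a plain standard deviation, it is barely affected by a few badly fitted points. estimate_noise_level is a hypothetical helper.

```python
import numpy as np


def estimate_noise_level(residuals) -> float:
    """Robust estimate of the additive noise standard deviation from reconstruction residuals."""
    r = np.asarray(residuals, dtype=float).ravel()
    mad = np.median(np.abs(r - np.median(r)))
    # 1.4826 converts the MAD of a Gaussian distribution into its standard deviation
    return float(1.4826 * mad)
```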

multiplicative_noise: float = 0.01

noise_level: float = 0.001

noise_type: str = 'both'

class nirs4all.synthesis.reconstruction.ReconstructionPipeline(config: DatasetConfig, component_names: List[str] | None = None, canonical_resolution: float = 0.5, baseline_order: int = 5, continuum_order: int = 3, n_prototypes: int = 5, fit_environmental: bool = False, verbose: bool = True)

Bases: object

Complete reconstruction pipeline.

Orchestrates the full workflow:

1. Configuration and component selection
2. Prototype selection and global calibration
3. Per-sample inversion (optionally with environmental parameters)
4. Parameter distribution learning
5. Synthetic generation
6. Validation
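Steps 3 and 4 can be illustrated with a purely linear miniature: invert each spectrum against known pure-component spectra plus a constant offset, then fit a distribution to the recovered concentrations. The log-normal model (a multivariate normal on log-concentrations) and the helper names invert_linear and fit_lognormal are our assumptions; the pipeline's own inversion also handles nonlinear parameters, and its distribution fitter is not documented here.

```python
import numpy as np


def invert_linear(X, pure):
    """Per-sample least-squares fit of X ~= C @ pure + offset; returns (C, offsets)."""
    n_wl = pure.shape[1]
    design = np.column_stack([pure.T, np.ones(n_wl)])     # (n_wavelengths, n_components + 1)
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)  # one column of coefficients per sample
    return coef[:-1].T, coef[-1]


def fit_lognormal(C, floor=1e-9):
    """Fit a multivariate normal to log-concentrations (an assumed log-normal model)."""
    logC = np.log(np.clip(C, floor, None))
    return logC.mean(axis=0), np.cov(logC, rowvar=False)
```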

config

Dataset configuration.

Type:: nirs4all.synthesis.reconstruction.pipeline.DatasetConfig

component_names

Components to use (auto-selected if None).

Type:: List[str] | None

canonical_resolution

Resolution of canonical grid (nm).

Type:: float

baseline_order

Baseline polynomial order.

Type:: int

n_prototypes

Number of prototypes for calibration.

Type:: int

fit_environmental

Whether to fit environmental parameters.

Type:: bool

verbose

Print progress.

Type:: bool

__post_init__()[source]: Initialize components if not provided.

baseline_order: int = 5

canonical_resolution: float = 0.5

component_names: List[str] | None = None

config: DatasetConfig

continuum_order: int = 3

fit(X: ndarray, max_samples: int | None = None) → PipelineResult

Run full reconstruction pipeline.

Parameters:

X – Spectra matrix (n_samples, n_wavelengths).
max_samples – Maximum number of samples to invert (limits runtime on large datasets).

Returns:

PipelineResult with all outputs.

fit_environmental: bool = False

generate(n_samples: int, result: PipelineResult, random_state: int | None = None) → ndarray

Generate additional synthetic samples from a fitted pipeline. Pass the PipelineResult returned by fit() as result.

Parameters:

n_samples – Number of samples to generate.
result – PipelineResult from fit().
random_state – Random seed.

Returns:

Synthetic spectra matrix.

n_prototypes: int = 5

verbose: bool = True

class nirs4all.synthesis.reconstruction.ReconstructionValidator(r2_threshold: float = 0.9, residual_autocorr_threshold: float = 0.3, pca_distance_threshold: float = 3.0, concentration_max: float = 10.0, path_length_bounds: Tuple[float, float] = (0.3, 3.0))

Bases: object

Validate reconstruction quality and synthetic realism.

Checks:

1. Residuals should be structureless (no systematic patterns).
2. Synthetic data should match real data in PCA space.
3. Per-wavelength statistics should be similar.
4. Parameters should be physically plausible.
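The first two checks map onto standard statistics that can be compared against r2_threshold, residual_autocorr_threshold and pca_distance_threshold. The sketch below assumes lag-1 autocorrelation as the "structureless residuals" measure and a Mahalanobis distance in the real data's PCA score space; the validator may compute these differently, and the function names are hypothetical.

```python
import numpy as np


def r2_score(y, y_hat) -> float:
    """Coefficient of determination of a reconstruction."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def lag1_autocorr(residual) -> float:
    """Lag-1 autocorrelation along wavelength; near 0 for white (structureless) residuals."""
    r = np.asarray(residual, dtype=float) - np.mean(residual)
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))


def pca_mahalanobis(X_real, X_synth, n_components=5):
    """Mahalanobis distance of each synthetic sample to the real data in PCA score space."""
    mean = X_real.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_real - mean, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    P = Vt[:k].T                                # loadings, shape (n_wavelengths, k)
    var = S[:k] ** 2 / (X_real.shape[0] - 1)    # variance of each PC score
    T = (X_synth - mean) @ P                    # synthetic samples projected onto real PCs
    # PC scores are uncorrelated, so the Mahalanobis distance reduces to a scaled norm
    return np.sqrt(np.sum(T ** 2 / var, axis=1))
```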

r2_threshold

Minimum acceptable R² for reconstruction.

Type:: float

residual_autocorr_threshold

Max autocorrelation in residuals.

Type:: float

pca_distance_threshold

Max Mahalanobis distance in PCA space.

Type:: float

concentration_max

Max plausible concentration value.

Type:: float

concentration_max: float = 10.0

path_length_bounds: Tuple[float, float] = (0.3, 3.0)

pca_distance_threshold: float = 3.0

r2_threshold: float = 0.9

residual_autocorr_threshold: float = 0.3

validate(inversion_results: List['InversionResult'], X_real: np.ndarray, X_synth: np.ndarray) → ValidationResult

Run full validation.

Parameters:

inversion_results – Inversion results.
X_real – Real data.
X_synth – Synthetic data.

Returns:

ValidationResult.

validate_parameters(inversion_results: List['InversionResult']) → Dict[str, Any]

Validate parameter plausibility.

Parameters:: inversion_results – List of inversion results.
Returns:: Dict of parameter metrics.

validate_reconstruction(inversion_results: List['InversionResult']) → Dict[str, Any]

Validate reconstruction quality.

Parameters:: inversion_results – List of inversion results.
Returns:: Dict of reconstruction metrics.

validate_synthetic(X_real: ndarray, X_synth: ndarray) → Dict[str, Any]

Validate synthetic vs real data.

Parameters:

X_real – Real data matrix.
X_synth – Synthetic data matrix.

Returns:

Dict of comparison metrics.

class nirs4all.synthesis.reconstruction.ValidationResult(reconstruction_metrics: Dict[str, Any] = <factory>, synthetic_metrics: Dict[str, Any] = <factory>, parameter_metrics: Dict[str, Any] = <factory>, overall_score: float = 0.0, passed: bool = False, warnings: List[str] = <factory>)

Bases: object

Result of reconstruction validation.

reconstruction_metrics

Per-sample reconstruction quality.

Type:: Dict[str, Any]

synthetic_metrics

Synthetic vs real comparison metrics.

Type:: Dict[str, Any]

parameter_metrics

Parameter plausibility metrics.

Type:: Dict[str, Any]

overall_score

Combined quality score (0-100).

Type:: float

passed

Whether all quality checks passed.

Type:: bool

warnings

List of warning messages.

Type:: List[str]

overall_score: float = 0.0

parameter_metrics: Dict[str, Any]

passed: bool = False

reconstruction_metrics: Dict[str, Any]

summary() → str[source]: Generate human-readable summary.

synthetic_metrics: Dict[str, Any]

warnings: List[str]

class nirs4all.synthesis.reconstruction.VariableProjectionSolver(path_length_bounds: Tuple[float, float] = (0.5, 2.0), wl_shift_bounds: Tuple[float, float] = (-2.0, 2.0), concentration_regularization: float = 1e-06, baseline_smoothness_penalty: float = 0.0001, use_derivatives: bool = False, verbose: bool = False, fit_environmental: bool = False, temperature_bounds: Tuple[float, float] = (-15.0, 15.0), water_activity_bounds: Tuple[float, float] = (0.1, 0.9), scattering_power_bounds: Tuple[float, float] = (0.5, 3.0), scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2), environmental_prior_weight: float = 0.1)

Bases: object

Variable projection solver for spectral inversion.

Separates optimization into:

- Nonlinear params: path_length, per-sample wl_shift and, optionally, environmental parameters (outer loop)
- Linear params: concentrations, baseline, continuum (inner NNLS/QP)
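The outer/inner split can be sketched with SciPy on a toy model. For each candidate value of the nonlinear parameter, the linear parameters are solved exactly, so the outer search only sees a one-dimensional profile of the residual. Assumptions: the only nonlinear parameter is a wavelength shift of two Gaussian bands (in this toy model, path length would only rescale the concentrations, so it is left out), the baseline is constant plus linear, and scipy.optimize.lsq_linear (bounded-variable least squares) replaces a pure NNLS so that baseline coefficients may be negative. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear, minimize_scalar

CENTERS = np.array([1450.0, 1940.0])  # toy band centres (nm)
WIDTHS = np.array([40.0, 60.0])       # toy band widths (nm)


def design_matrix(wl, shift):
    """Component bands shifted by `shift` nm, plus constant and linear baseline columns."""
    comps = np.exp(-0.5 * ((wl[:, None] - (CENTERS[None, :] + shift)) / WIDTHS[None, :]) ** 2)
    t = (wl - wl.mean()) / (wl.max() - wl.min())
    return np.column_stack([comps, np.ones_like(wl), t])


def inner_solve(A, y, n_comp):
    """Linear step: concentrations >= 0, baseline coefficients unconstrained."""
    lb = np.r_[np.zeros(n_comp), np.full(A.shape[1] - n_comp, -np.inf)]
    ub = np.full(A.shape[1], np.inf)
    res = lsq_linear(A, y, bounds=(lb, ub))
    return res.x, float(np.sum((A @ res.x - y) ** 2))


def varpro_fit(wl, y, shift_bounds=(-2.0, 2.0)):
    """Outer step: bounded 1-D search over the shift, re-solving the linear part each time."""
    n_comp = CENTERS.size
    outer = minimize_scalar(lambda s: inner_solve(design_matrix(wl, s), y, n_comp)[1],
                            bounds=shift_bounds, method="bounded")
    coef, sse = inner_solve(design_matrix(wl, outer.x), y, n_comp)
    return float(outer.x), coef[:n_comp], coef[n_comp:], sse
```

Eliminating the linear parameters this way keeps the nonlinear search low-dimensional, which is the main appeal of variable projection.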

path_length_bounds

Bounds for path length.

Type:: Tuple[float, float]

wl_shift_bounds

Per-sample wavelength shift bounds.

Type:: Tuple[float, float]

concentration_regularization

L2 regularization weight on concentrations.

Type:: float

baseline_smoothness_penalty

Penalty on baseline curvature.

Type:: float

use_derivatives

Fit in derivative space (for derivative-preprocessed data).

Type:: bool

fit_environmental

Whether to fit environmental parameters.

Type:: bool

temperature_bounds

Bounds for temperature deviation (°C).

Type:: Tuple[float, float]

water_activity_bounds

Bounds for water activity.

Type:: Tuple[float, float]

scattering_power_bounds

Bounds for scattering exponent.

Type:: Tuple[float, float]

scattering_amplitude_bounds

Bounds for scattering amplitude.

Type:: Tuple[float, float]

environmental_prior_weight

Weight for environmental parameter priors.

Type:: float
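The scattering bounds suggest a power-law scattering term, and environmental_prior_weight suggests a penalty pulling environmental parameters toward prior values. Both functional forms below are our assumptions, not the solver's documented model: an amplitude·(λ/λ_ref)^(-power) baseline with a hypothetical λ_ref = 1000 nm, and a Gaussian-style quadratic prior added to the residual sum of squares.

```python
import numpy as np


def scattering_baseline(wl, amplitude, power, wl_ref=1000.0):
    """Hypothetical power-law scattering term: amplitude * (wl / wl_ref) ** (-power)."""
    return amplitude * (np.asarray(wl, dtype=float) / wl_ref) ** (-power)


def penalized_objective(residual, params, prior_mean, prior_scale, prior_weight=0.1):
    """Residual sum of squares plus a weighted quadratic prior on environmental parameters."""
    z = (np.asarray(params, dtype=float) - np.asarray(prior_mean, dtype=float)) / np.asarray(prior_scale, dtype=float)
    return float(np.sum(np.asarray(residual, dtype=float) ** 2) + prior_weight * np.sum(z ** 2))
```

With a positive power the term decreases with wavelength; a larger prior_weight keeps poorly constrained parameters such as temperature closer to their prior means.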

baseline_smoothness_penalty: float = 0.0001

concentration_regularization: float = 1e-06

environmental_prior_weight: float = 0.1

fit(target: np.ndarray, forward_chain: ForwardChain, schedule: MultiscaleSchedule | None = None, initial_params: Dict[str, float] | None = None) → InversionResult

Fit forward model to target spectrum.

Parameters:

target – Target spectrum to fit.
forward_chain – Forward chain with calibrated global params.
schedule – Multiscale fitting schedule.
initial_params – Initial nonlinear parameters.

Returns:

InversionResult with fitted parameters.

fit_batch(X: np.ndarray, forward_chain: ForwardChain, schedule: MultiscaleSchedule | None = None, n_jobs: int = 1) → List[InversionResult]

Fit multiple spectra.

Parameters:

X – Spectra matrix (n_samples, n_wavelengths).
forward_chain – Forward chain with calibrated global params.
schedule – Multiscale fitting schedule.
n_jobs – Number of parallel jobs (1 = sequential).

Returns:

List of InversionResult for each sample.

fit_environmental: bool = False

path_length_bounds: Tuple[float, float] = (0.5, 2.0)

scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2)

scattering_power_bounds: Tuple[float, float] = (0.5, 3.0)

temperature_bounds: Tuple[float, float] = (-15.0, 15.0)

use_derivatives: bool = False

verbose: bool = False

water_activity_bounds: Tuple[float, float] = (0.1, 0.9)

wl_shift_bounds: Tuple[float, float] = (-2.0, 2.0)

nirs4all.synthesis.reconstruction.reconstruct_and_generate(X: ndarray, wavelengths: ndarray, n_synthetic: int | None = None, domain: str = 'unknown', component_names: List[str] | None = None, fit_environmental: bool = False, verbose: bool = True) → Tuple[ndarray, PipelineResult]

Convenience function for end-to-end reconstruction and generation.

Parameters:

X – Real spectra matrix.
wavelengths – Wavelength grid.
n_synthetic – Number of synthetic samples (default: None, which generates as many samples as X has rows).
domain – Application domain.
component_names – Components to use.
fit_environmental – Whether to fit environmental parameters (temperature, water activity, scattering).
verbose – Print progress.

Returns:

Tuple of (X_synthetic, PipelineResult).