nirs4all.synthesis.reconstruction package

Module contents

Physical signal-chain reconstruction and variance modeling for NIR spectra.

This module implements a physically realistic “full signal-chain” reconstruction workflow that:
  1. Reconstructs spectra using a physical forward model (Beer-Lambert + instrument chain)

  2. Learns distributions of physical parameters for variance modeling

  3. Generates realistic synthetic datasets by sampling from learned distributions

Key Components:
  • CanonicalForwardModel: Physical model on canonical grid

  • InstrumentModel: Wavelength warp, ILS convolution, gain/offset

  • EnvironmentalEffectsModel: Temperature, moisture, and scattering effects

  • DomainTransform: Absorbance/reflectance transformation

  • PreprocessingOperator: Match dataset preprocessing (SG derivatives, SNV, etc.)

  • VariableProjectionSolver: NNLS inner solve + nonlinear outer optimization

  • GlobalCalibrator: Prototype-based instrument parameter estimation

  • ParameterDistributionFitter: Learn distributions in parameter space

  • ReconstructionGenerator: Generate synthetic data from learned distributions

Example

>>> from nirs4all.synthesis.reconstruction import (
...     ReconstructionPipeline,
...     DatasetConfig,
... )
>>>
>>> # Assumed inputs: wavelengths is the 1-D wavelength grid in nm;
>>> # X_real holds the measured spectra, shape (n_samples, n_wavelengths)
>>>
>>> # Configure for a dataset
>>> config = DatasetConfig(
...     wavelengths=wavelengths,
...     signal_type="absorbance",
...     preprocessing="first_derivative",
...     domain="food_dairy",
... )
>>>
>>> # Run full reconstruction pipeline
>>> pipeline = ReconstructionPipeline(config)
>>> result = pipeline.fit(X_real)
>>>
>>> # Generate synthetic data
>>> X_synth = pipeline.generate(n_samples=1000)

References

  • Burns, D. A., & Ciurczak, E. W. (2007). Handbook of Near-Infrared Analysis.

  • Workman Jr, J., & Weyer, L. (2012). Practical Guide and Spectral Atlas for Interpretive Near-Infrared Spectroscopy.

class nirs4all.synthesis.reconstruction.CalibrationResult(wl_shift: float = 0.0, wl_stretch: float = 1.0, ils_sigma: float = 4.0, stray_light: float = 0.0, gain: float = 1.0, offset: float = 0.0, prototype_residuals: ndarray | None = None, prototype_r2: ndarray | None = None, total_loss: float = inf)[source]

Bases: object

Result of global calibration.

wl_shift

Calibrated wavelength shift.

Type:

float

wl_stretch

Calibrated wavelength stretch.

Type:

float

ils_sigma

Calibrated ILS width.

Type:

float

stray_light

Calibrated stray light fraction.

Type:

float

gain

Calibrated photometric gain.

Type:

float

offset

Calibrated photometric offset.

Type:

float

prototype_residuals

Residuals for each prototype.

Type:

numpy.ndarray | None

prototype_r2

R² for each prototype.

Type:

numpy.ndarray | None

total_loss

Total calibration loss.

Type:

float

classmethod from_array(params: ndarray) CalibrationResult[source]

Create from parameter array [wl_shift, wl_stretch, ils_sigma].

gain: float = 1.0
ils_sigma: float = 4.0
offset: float = 0.0
prototype_r2: ndarray | None = None
prototype_residuals: ndarray | None = None
stray_light: float = 0.0
to_dict() Dict[str, float][source]

Convert to parameter dictionary.

total_loss: float = inf
wl_shift: float = 0.0
wl_stretch: float = 1.0
class nirs4all.synthesis.reconstruction.CanonicalForwardModel(canonical_grid: ndarray, component_names: List[str] = <factory>, baseline_order: int = 5, continuum_order: int = 3, _component_spectra: ndarray | None = None, _baseline_basis: ndarray | None = None, _continuum_basis: ndarray | None = None)[source]

Bases: object

Physical model on canonical high-resolution wavelength grid.

Computes absorption coefficient K(λ) from chemical components:

K(λ) = Σ c_k * ε_k(λ) + K0(λ)

where:
  • c_k: concentration of component k

  • ε_k(λ): molar absorptivity (from component library)

  • K0(λ): continuum/background absorption (low-frequency)
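The formula above can be illustrated with a minimal NumPy sketch. It does not use the library's component spectra: the two Gaussian bands, the grid, and the helper names (gaussian_band, absorption) are hypothetical stand-ins, and the continuum K0(λ) is assumed to be a low-order Chebyshev expansion on a grid rescaled to [-1, 1].

```python
import numpy as np
from numpy.polynomial import chebyshev

# Hypothetical canonical grid (nm) and two Gaussian bands standing in for
# library component spectra eps_k(lambda).
grid = np.linspace(1000.0, 2500.0, 301)

def gaussian_band(center, width):
    return np.exp(-0.5 * ((grid - center) / width) ** 2)

epsilon = np.stack([gaussian_band(1450.0, 40.0), gaussian_band(1940.0, 50.0)])

# Assumed form of K0: Chebyshev basis of order 3 on the grid mapped to [-1, 1].
continuum_order = 3
x = 2.0 * (grid - grid.min()) / (grid.max() - grid.min()) - 1.0
continuum_basis = chebyshev.chebvander(x, continuum_order)  # (n_wl, order + 1)

def absorption(concentrations, continuum_coeffs):
    """K(lambda) = sum_k c_k * eps_k(lambda) + K0(lambda)."""
    return concentrations @ epsilon + continuum_basis @ continuum_coeffs

c = np.array([0.8, 0.3])
k0 = np.array([0.05, 0.01, 0.0, 0.0])
K = absorption(c, k0)
```

Because K is linear in both the concentrations and the continuum coefficients, the bands and basis columns can be stacked into one design matrix and fitted jointly.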

canonical_grid

High-resolution wavelength grid (nm).

Type:

numpy.ndarray

component_names

Names of components to include.

Type:

List[str]

component_spectra

Pre-computed component spectra on canonical grid.

baseline_order

Order of Chebyshev baseline polynomial.

Type:

int

continuum_order

Order of continuum absorption polynomial.

Type:

int

__post_init__()[source]

Initialize component spectra and basis matrices.

baseline_order: int = 5
canonical_grid: ndarray
component_names: List[str]
compute_absorption(concentrations: ndarray, path_length: float = 1.0, baseline_coeffs: ndarray | None = None, continuum_coeffs: ndarray | None = None) ndarray[source]

Compute absorption coefficient on canonical grid.

Parameters:
  • concentrations – Component concentrations, shape (n_components,).

  • path_length – Optical path length factor.

  • baseline_coeffs – Baseline polynomial coefficients.

  • continuum_coeffs – Continuum absorption coefficients.

Returns:

Absorbance spectrum on canonical grid.

continuum_order: int = 3
get_design_matrix(path_length: float = 1.0) ndarray[source]

Get full design matrix for linear fitting.

Returns:

Design matrix of shape (n_wavelengths, n_components + n_baseline + n_continuum).

property n_baseline: int

Number of baseline coefficients.

property n_components: int

Number of chemical components.

property n_continuum: int

Number of continuum coefficients.

property n_linear_params: int

Total number of linear parameters.

class nirs4all.synthesis.reconstruction.DatasetConfig(wavelengths: ndarray, signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance', preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none', domain: str = 'unknown', sg_window: int = 15, sg_polyorder: int = 2, name: str = 'dataset')[source]

Bases: object

Configuration for a dataset to be reconstructed.

Captures all dataset-specific information needed for reconstruction:
  • Wavelength grid

  • Signal type (absorbance, reflectance)

  • Preprocessing applied

  • Application domain (for component selection)

wavelengths

Wavelength grid in nm.

Type:

numpy.ndarray

signal_type

Signal type (‘absorbance’, ‘reflectance’, or ‘unknown’).

Type:

Literal[‘absorbance’, ‘reflectance’, ‘unknown’]

preprocessing

Detected or specified preprocessing type.

Type:

Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘unknown’]

domain

Application domain for component selection.

Type:

str

sg_window

Savitzky-Golay window (for derivatives).

Type:

int

sg_polyorder

Savitzky-Golay polynomial order.

Type:

int

name

Optional dataset name.

Type:

str

domain: str = 'unknown'
classmethod from_data(X: ndarray, wavelengths: ndarray, name: str = 'dataset') DatasetConfig[source]

Create configuration by auto-detecting properties from data.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • wavelengths – Wavelength grid.

  • name – Dataset name.

Returns:

DatasetConfig with detected properties.

name: str = 'dataset'
preprocessing: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'unknown'] = 'none'
sg_polyorder: int = 2
sg_window: int = 15
signal_type: Literal['absorbance', 'reflectance', 'unknown'] = 'absorbance'
wavelengths: ndarray
class nirs4all.synthesis.reconstruction.DistributionResult(param_names: List[str], distributions: Dict[str, Dict[str, Any]], correlations: ndarray | None = None, factor_loadings: ndarray | None = None, transform_params: Dict[str, Dict[str, Any]] = <factory>, n_samples_fitted: int = 0)[source]

Bases: object

Result of parameter distribution fitting.

param_names

Names of parameters.

Type:

List[str]

distributions

Dict of distribution parameters for each param.

Type:

Dict[str, Dict[str, Any]]

correlations

Correlation matrix of transformed parameters.

Type:

numpy.ndarray | None

factor_loadings

Low-rank factor model loadings (optional).

Type:

numpy.ndarray | None

transform_params

Parameters for transformations (log, etc.).

Type:

Dict[str, Dict[str, Any]]

n_samples_fitted

Number of samples used for fitting.

Type:

int

correlations: ndarray | None = None
distributions: Dict[str, Dict[str, Any]]
factor_loadings: ndarray | None = None
n_samples_fitted: int = 0
param_names: List[str]
summary() str[source]

Generate human-readable summary.

transform_params: Dict[str, Dict[str, Any]]
class nirs4all.synthesis.reconstruction.DomainTransform(domain: Literal['absorbance', 'reflectance', 'transmittance', 'km'] = 'absorbance', scatter_coeffs: ndarray | None = None, scatter_wavelength_exp: float = 0.0)[source]

Bases: object

Transform between physical domains (absorbance, reflectance, etc.).

  • For absorbance datasets: A(λ) = absorption coefficient (direct)

  • For reflectance datasets: R(λ) computed via Kubelka-Munk or approximation

domain

Domain type (‘absorbance’, ‘reflectance’, ‘transmittance’, ‘km’).

Type:

Literal[‘absorbance’, ‘reflectance’, ‘transmittance’, ‘km’]

scatter_coeffs

Scattering coefficients for KM model (reflectance).

Type:

numpy.ndarray | None

scatter_wavelength_exp

Exponent n of the wavelength-dependent scatter term (λ^-n).

Type:

float

domain: Literal['absorbance', 'reflectance', 'transmittance', 'km'] = 'absorbance'
inverse_transform(spectrum: ndarray, wavelengths: ndarray, scatter: ndarray | None = None) ndarray[source]

Inverse transform from domain to absorption.

Parameters:
  • spectrum – Spectrum in domain representation.

  • wavelengths – Wavelength grid.

  • scatter – Scattering coefficient for reflectance.

Returns:

Absorption coefficient.

scatter_coeffs: ndarray | None = None
scatter_wavelength_exp: float = 0.0
transform(absorption: ndarray, wavelengths: ndarray, scatter: ndarray | None = None) ndarray[source]

Transform absorption to target domain.

Parameters:
  • absorption – Absorption coefficient K(λ).

  • wavelengths – Wavelength grid.

  • scatter – Scattering coefficient S(λ) for reflectance.

Returns:

Spectrum in target domain representation.
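For the reflectance case, the standard Kubelka-Munk relation for an optically thick layer gives a closed-form pair of transforms. The sketch below shows that relation only; whether the library uses exactly this form or an approximation is not stated here, and the function names km_reflectance and km_absorption are hypothetical.

```python
import numpy as np

def km_reflectance(K, S):
    """Diffuse reflectance of an optically thick layer from K/S (Kubelka-Munk)."""
    ratio = K / S
    return 1.0 + ratio - np.sqrt(ratio ** 2 + 2.0 * ratio)

def km_absorption(R, S):
    """Inverse relation: K = S * (1 - R)^2 / (2 R)."""
    return S * (1.0 - R) ** 2 / (2.0 * R)

K = np.array([0.05, 0.2, 1.0])   # absorption coefficients (illustrative values)
S = np.full_like(K, 2.0)         # scattering coefficients (illustrative values)
R = km_reflectance(K, S)
K_back = km_absorption(R, S)     # round trip recovers K
```

The transform is nonlinear in K, which is consistent with the note on forward_design_matrix that the domain transform is left out of the linear design matrix.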

class nirs4all.synthesis.reconstruction.EnvironmentalEffectsModel(temperature_delta: float = 0.0, water_activity: float = 0.5, scattering_power: float = 1.5, scattering_amplitude: float = 0.0, enabled: bool = True, reference_wavelength: float = 1500.0, _region_masks: Dict[str, ndarray] | None = None, _cached_wavelengths: ndarray | None = None)[source]

Bases: object

Environmental effects on the canonical absorption spectrum.

Applied to absorption in canonical space before domain transform and instrument effects. Implements region-specific temperature and moisture effects based on literature parameters.

temperature_delta

Temperature deviation from reference (25°C).

Type:

float

water_activity

Effective water activity (0-1 scale).

Type:

float

scattering_power

Wavelength-dependent scattering exponent (λ^-n).

Type:

float

scattering_amplitude

Amplitude of scattering baseline.

Type:

float

enabled

Whether to apply environmental effects.

Type:

bool

reference_wavelength

Reference wavelength for scattering normalization (nm).

Type:

float

apply(absorption: ndarray, wavelengths: ndarray) ndarray[source]

Apply environmental effects to absorption spectrum.

Effects are applied in order:
  1. Temperature effects (region-specific shifts, intensity changes)

  2. Moisture effects (water band shifts based on water activity)

  3. Scattering baseline (wavelength-dependent λ^-n)

Parameters:
  • absorption – Absorption coefficient on canonical grid.

  • wavelengths – Wavelength grid (nm).

Returns:

Modified absorption spectrum with environmental effects.
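The scattering step can be sketched on its own. The example assumes the baseline is additive and has the form amplitude · (λ / λ_ref)^-n, so that it equals the amplitude at the reference wavelength; the exact expression used by apply() is not documented here, and scattering_baseline is a hypothetical helper. The temperature and moisture steps are not reproduced because their region-specific parameters are not given.

```python
import numpy as np

def scattering_baseline(wavelengths, amplitude=0.05, power=1.5,
                        reference_wavelength=1500.0):
    # Assumed form: normalised so the baseline equals `amplitude` at the reference.
    return amplitude * (wavelengths / reference_wavelength) ** (-power)

wl = np.linspace(1000.0, 2500.0, 151)   # 10 nm steps, wl[50] == 1500 nm
absorption = np.zeros_like(wl)          # flat spectrum to isolate the baseline
with_scatter = absorption + scattering_baseline(wl)
```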

copy() EnvironmentalEffectsModel[source]

Create a copy of this model.

enabled: bool = True
classmethod from_dict(d: Dict[str, Any]) EnvironmentalEffectsModel[source]

Create from dictionary.

get_jacobian_wrt_scattering_amplitude(absorption: ndarray, wavelengths: ndarray, eps: float = 0.001) ndarray[source]

Numerical Jacobian w.r.t. scattering_amplitude.

get_jacobian_wrt_scattering_power(absorption: ndarray, wavelengths: ndarray, eps: float = 0.05) ndarray[source]

Numerical Jacobian w.r.t. scattering_power.

get_jacobian_wrt_temperature(absorption: ndarray, wavelengths: ndarray, eps: float = 0.1) ndarray[source]

Numerical Jacobian w.r.t. temperature_delta.

get_jacobian_wrt_water_activity(absorption: ndarray, wavelengths: ndarray, eps: float = 0.01) ndarray[source]

Numerical Jacobian w.r.t. water_activity.
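A numerical Jacobian with respect to one scalar parameter can be obtained by finite differences. The sketch below uses a central difference with step eps; whether the library uses central or forward differences is an assumption, and numerical_jacobian and baseline are hypothetical names. The λ^-n baseline is used as the test function because its derivative is known analytically.

```python
import numpy as np

def numerical_jacobian(func, value, eps):
    """Central finite difference of a vector-valued func w.r.t. a scalar."""
    return (func(value + eps) - func(value - eps)) / (2.0 * eps)

wl = np.linspace(1000.0, 2500.0, 151)

def baseline(power):
    # Illustrative model output as a function of scattering_power only.
    return 0.05 * (wl / 1500.0) ** (-power)

jac = numerical_jacobian(baseline, 1.5, eps=0.05)
analytic = -np.log(wl / 1500.0) * baseline(1.5)   # exact derivative for comparison
```

The step sizes in the method signatures differ per parameter (0.001 to 0.1), which fits the usual practice of scaling eps to the natural range of each parameter.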

reference_wavelength: float = 1500.0
scattering_amplitude: float = 0.0
scattering_power: float = 1.5
temperature_delta: float = 0.0
to_dict() Dict[str, Any][source]

Convert to dictionary.

water_activity: float = 0.5
class nirs4all.synthesis.reconstruction.EnvironmentalParameterConfig(temperature_bounds: Tuple[float, float] = (-15.0, 15.0), temperature_prior_mean: float = 0.0, temperature_prior_std: float = 5.0, water_activity_bounds: Tuple[float, float] = (0.1, 0.9), water_activity_prior_alpha: float = 2.0, water_activity_prior_beta: float = 2.0, scattering_power_bounds: Tuple[float, float] = (0.5, 3.0), scattering_power_prior_mean: float = 1.5, scattering_power_prior_std: float = 0.5, scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2), scattering_amplitude_prior_scale: float = 0.02)[source]

Bases: object

Configuration for environmental parameter fitting.

Defines bounds and prior distributions for each parameter.

compute_prior_penalty(temperature_delta: float, water_activity: float, scattering_power: float, scattering_amplitude: float) float[source]

Compute prior penalty for regularization.

Returns negative log-prior (to be added to objective function).
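One plausible reading of the configuration fields is a Gaussian prior on temperature and scattering power, a Beta prior on water activity, and an exponential prior on scattering amplitude. These prior families are inferred from the parameter names (mean/std, alpha/beta, scale) and are an assumption, as is the helper name prior_penalty. The sketch returns the negative log-prior up to additive constants.

```python
import numpy as np

def prior_penalty(temperature_delta, water_activity, scattering_power,
                  scattering_amplitude, t_mean=0.0, t_std=5.0,
                  aw_alpha=2.0, aw_beta=2.0, sp_mean=1.5, sp_std=0.5,
                  sa_scale=0.02):
    """Negative log-prior up to constants, under the assumed prior families."""
    penalty = 0.5 * ((temperature_delta - t_mean) / t_std) ** 2      # Gaussian
    penalty += -(aw_alpha - 1.0) * np.log(water_activity)            # Beta
    penalty += -(aw_beta - 1.0) * np.log1p(-water_activity)
    penalty += 0.5 * ((scattering_power - sp_mean) / sp_std) ** 2    # Gaussian
    penalty += scattering_amplitude / sa_scale                       # exponential
    return float(penalty)

at_mode = prior_penalty(0.0, 0.5, 1.5, 0.0)     # every parameter at its prior mode
off_mode = prior_penalty(10.0, 0.2, 2.5, 0.1)   # every parameter away from it
```

Adding this value to the data-fit objective pulls poorly constrained parameters toward their prior modes without forbidding other values inside the bounds.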

get_bounds_list() List[Tuple[float, float]][source]

Get list of bounds for all 4 environmental parameters.

scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2)
scattering_amplitude_prior_scale: float = 0.02
scattering_power_bounds: Tuple[float, float] = (0.5, 3.0)
scattering_power_prior_mean: float = 1.5
scattering_power_prior_std: float = 0.5
temperature_bounds: Tuple[float, float] = (-15.0, 15.0)
temperature_prior_mean: float = 0.0
temperature_prior_std: float = 5.0
water_activity_bounds: Tuple[float, float] = (0.1, 0.9)
water_activity_prior_alpha: float = 2.0
water_activity_prior_beta: float = 2.0
class nirs4all.synthesis.reconstruction.ForwardChain(canonical_model: CanonicalForwardModel, instrument_model: InstrumentModel, domain_transform: DomainTransform, preprocessing: PreprocessingOperator, environmental_model: 'EnvironmentalEffectsModel' | None = None)[source]

Bases: object

Complete forward measurement chain combining all components.

Chain: CanonicalForwardModel → [EnvironmentalEffects] → DomainTransform → InstrumentModel → PreprocessingOperator

canonical_model

Physical model on canonical grid.

Type:

CanonicalForwardModel

environmental_model

Optional environmental effects (temperature, moisture, scattering).

Type:

Optional[‘EnvironmentalEffectsModel’]

instrument_model

Instrument effects.

Type:

InstrumentModel

domain_transform

Domain conversion.

Type:

DomainTransform

preprocessing

Dataset preprocessing.

Type:

PreprocessingOperator

canonical_model: CanonicalForwardModel
classmethod create(canonical_grid: ndarray, target_grid: ndarray, component_names: List[str], domain: str = 'absorbance', preprocessing_type: str = 'none', instrument_params: Dict[str, float] | None = None, baseline_order: int = 5, continuum_order: int = 3, sg_window: int = 15, sg_polyorder: int = 2, include_environmental: bool = False) ForwardChain[source]

Convenience factory method to create ForwardChain.

Parameters:
  • canonical_grid – High-resolution canonical wavelength grid.

  • target_grid – Target dataset wavelength grid.

  • component_names – Names of components to include.

  • domain – Domain type (‘absorbance’, ‘reflectance’).

  • preprocessing_type – Preprocessing type.

  • instrument_params – Instrument parameters dict.

  • baseline_order – Baseline polynomial order.

  • continuum_order – Continuum polynomial order.

  • sg_window – Savitzky-Golay window.

  • sg_polyorder – Savitzky-Golay polynomial order.

  • include_environmental – Whether to include environmental effects model.

Returns:

Configured ForwardChain instance.

domain_transform: DomainTransform
environmental_model: 'EnvironmentalEffectsModel' | None = None
forward(concentrations: ndarray, path_length: float = 1.0, baseline_coeffs: ndarray | None = None, continuum_coeffs: ndarray | None = None, scatter: ndarray | None = None) ndarray[source]

Run full forward chain.

Parameters:
  • concentrations – Component concentrations.

  • path_length – Optical path length factor.

  • baseline_coeffs – Baseline polynomial coefficients.

  • continuum_coeffs – Continuum absorption coefficients.

  • scatter – Scattering coefficients for reflectance.

Returns:

Spectrum on target grid with preprocessing applied.

forward_design_matrix(path_length: float = 1.0) ndarray[source]

Get transformed design matrix for linear fitting.

Returns the design matrix after applying instrument and preprocessing transforms. Note: Domain transform is not applied here as it may be nonlinear (KM).

instrument_model: InstrumentModel
preprocessing: PreprocessingOperator
class nirs4all.synthesis.reconstruction.GenerationResult(X: ndarray, concentrations: ndarray, path_lengths: ndarray, baseline_coeffs: ndarray, wavelengths: ndarray, noise_level: float = 0.0, wl_shifts: ndarray | None = None, temperature_deltas: ndarray | None = None, water_activities: ndarray | None = None, scattering_powers: ndarray | None = None, scattering_amplitudes: ndarray | None = None)[source]

Bases: object

Result of synthetic generation.

X

Generated spectra (n_samples, n_wavelengths).

Type:

numpy.ndarray

concentrations

Sampled concentrations (n_samples, n_components).

Type:

numpy.ndarray

path_lengths

Sampled path lengths (n_samples,).

Type:

numpy.ndarray

baseline_coeffs

Sampled baseline coefficients.

Type:

numpy.ndarray

wavelengths

Wavelength grid.

Type:

numpy.ndarray

noise_level

Applied noise level.

Type:

float

wl_shifts

Per-sample wavelength shifts.

Type:

numpy.ndarray | None

temperature_deltas

Per-sample temperature deviations (°C).

Type:

numpy.ndarray | None

water_activities

Per-sample water activity values.

Type:

numpy.ndarray | None

scattering_powers

Per-sample scattering exponents.

Type:

numpy.ndarray | None

scattering_amplitudes

Per-sample scattering amplitudes.

Type:

numpy.ndarray | None

X: ndarray
baseline_coeffs: ndarray
concentrations: ndarray
property n_samples: int

Number of generated samples.

property n_wavelengths: int

Number of wavelengths.

noise_level: float = 0.0
path_lengths: ndarray
scattering_amplitudes: ndarray | None = None
scattering_powers: ndarray | None = None
temperature_deltas: ndarray | None = None
water_activities: ndarray | None = None
wavelengths: ndarray
wl_shifts: ndarray | None = None
class nirs4all.synthesis.reconstruction.GlobalCalibrator(wl_shift_bounds: Tuple[float, float] = (-10.0, 10.0), wl_stretch_bounds: Tuple[float, float] = (0.98, 1.02), ils_sigma_bounds: Tuple[float, float] = (2.0, 20.0), regularization: float = 1e-06, use_global_search: bool = False)[source]

Bases: object

Calibrate global instrument parameters using prototype spectra.

Optimizes θ_global = {wl_shift, wl_stretch, ils_sigma} to minimize total fitting loss across all prototypes, with per-prototype linear parameters solved via NNLS.
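The nested structure described above (nonlinear outer search over global parameters, NNLS inner solve per prototype) can be sketched with SciPy. This is a toy version, not the library's implementation: it optimizes a single global parameter (a wavelength shift), uses two hypothetical Gaussian bands plus a constant column as the design matrix, and relies on scipy.optimize.nnls and scipy.optimize.minimize_scalar.

```python
import numpy as np
from scipy.optimize import minimize_scalar, nnls

grid = np.linspace(1000.0, 2500.0, 301)

def design_matrix(shift):
    """Two Gaussian bands displaced by `shift`, plus a constant column."""
    bands = [np.exp(-0.5 * ((grid - (c + shift)) / 40.0) ** 2)
             for c in (1450.0, 1940.0)]
    return np.column_stack(bands + [np.ones_like(grid)])

# Synthetic prototypes: one shared shift, different non-negative coefficients.
true_shift = 3.0
coeffs = np.array([[0.8, 0.3, 0.1], [0.4, 0.6, 0.05]])
prototypes = coeffs @ design_matrix(true_shift).T

def total_loss(shift):
    A = design_matrix(shift)
    # Inner solve: nnls returns (solution, residual norm) for each prototype.
    return sum(nnls(A, y)[1] ** 2 for y in prototypes)

# Outer solve: bounded 1-D search over the shared nonlinear parameter.
result = minimize_scalar(total_loss, bounds=(-10.0, 10.0), method="bounded")
estimated_shift = result.x
```

Solving the linear parameters inside the loss keeps the outer problem low-dimensional, which is why only the global parameters need bounds.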

forward_chain

ForwardChain for computing model predictions.

wl_shift_bounds

Bounds for wavelength shift.

Type:

Tuple[float, float]

wl_stretch_bounds

Bounds for wavelength stretch.

Type:

Tuple[float, float]

ils_sigma_bounds

Bounds for ILS sigma.

Type:

Tuple[float, float]

regularization

L2 regularization strength.

Type:

float

use_global_search

Use differential evolution for global search.

Type:

bool

calibrate(prototypes: np.ndarray, forward_chain: ForwardChain, initial_guess: np.ndarray | None = None) CalibrationResult[source]

Calibrate global parameters on prototype spectra.

Parameters:
  • prototypes – Prototype spectra (n_prototypes, n_wavelengths).

  • forward_chain – Forward chain for model evaluation.

  • initial_guess – Initial [wl_shift, wl_stretch, ils_sigma].

Returns:

CalibrationResult with optimized parameters.

ils_sigma_bounds: Tuple[float, float] = (2.0, 20.0)
refine(current_result: CalibrationResult, prototypes: np.ndarray, forward_chain: ForwardChain) CalibrationResult[source]

Refine calibration with tighter bounds around current estimate.

Parameters:
  • current_result – Current calibration result.

  • prototypes – Prototype spectra.

  • forward_chain – Forward chain.

Returns:

Refined CalibrationResult.

regularization: float = 1e-06
use_global_search: bool = False
wl_shift_bounds: Tuple[float, float] = (-10.0, 10.0)
wl_stretch_bounds: Tuple[float, float] = (0.98, 1.02)
class nirs4all.synthesis.reconstruction.InstrumentModel(target_grid: ndarray, wl_shift: float = 0.0, wl_stretch: float = 1.0, wl_poly_coeffs: ndarray | None = None, ils_sigma: float = 4.0, stray_light: float = 0.0, gain: float = 1.0, offset: float = 0.0)[source]

Bases: object

Instrument effects: warp, ILS convolution, gain/offset, resampling.

Transforms spectrum from canonical grid to target instrument grid:
  1. Wavelength warp: λ* → λ’ (shift + stretch + optional higher order)

  2. ILS convolution: Gaussian or Voigt line shape

  3. Stray light / gain / offset

  4. Resample to target grid
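The four steps can be sketched in NumPy as follows. This is an illustration under several assumptions: the warp is taken as λ’ = wl_shift + wl_stretch · λ*, the ILS is Gaussian, the canonical grid is uniform, and stray light and higher-order warp terms are omitted. apply_instrument is a hypothetical function, not the library's apply() method.

```python
import numpy as np

def apply_instrument(spectrum, canonical_grid, target_grid, wl_shift=0.0,
                     wl_stretch=1.0, ils_sigma=4.0, gain=1.0, offset=0.0):
    # 1. Wavelength warp (assumed convention: reported = shift + stretch * true).
    warped_grid = wl_shift + wl_stretch * canonical_grid

    # 2. Gaussian ILS convolution; edge padding avoids darkening at the borders.
    step = canonical_grid[1] - canonical_grid[0]
    half = int(np.ceil(4.0 * ils_sigma / step))
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / ils_sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(spectrum, half, mode="edge")
    blurred = np.convolve(padded, kernel, mode="valid")

    # 3. Photometric gain and offset (stray light omitted in this sketch).
    scaled = gain * blurred + offset

    # 4. Resample onto the instrument's wavelength grid.
    return np.interp(target_grid, warped_grid, scaled)

canonical_grid = np.linspace(1000.0, 2500.0, 1501)   # 1 nm steps
target_grid = np.arange(1100.0, 2400.0, 1.0)
band = np.exp(-0.5 * ((canonical_grid - 1450.0) / 30.0) ** 2)

measured = apply_instrument(band, canonical_grid, target_grid, wl_shift=5.0)
flat = apply_instrument(np.ones_like(canonical_grid), canonical_grid,
                        target_grid, gain=1.02, offset=0.01)
```

With a 5 nm shift the band maximum appears at 1455 nm on the target grid, and a flat input stays flat at gain + offset.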

target_grid

Target wavelength grid (dataset grid).

Type:

numpy.ndarray

wl_shift

Wavelength shift in nm (default 0).

Type:

float

wl_stretch

Wavelength scale factor (default 1).

Type:

float

wl_poly_coeffs

Higher-order polynomial warp coefficients.

Type:

numpy.ndarray | None

ils_sigma

Instrument line shape Gaussian sigma in nm.

Type:

float

stray_light

Stray light fraction (default 0).

Type:

float

gain

Photometric gain (default 1).

Type:

float

offset

Photometric offset (default 0).

Type:

float

apply(spectrum: ndarray, canonical_grid: ndarray) ndarray[source]

Apply instrument chain to transform spectrum.

Parameters:
  • spectrum – Input spectrum on canonical grid.

  • canonical_grid – Canonical wavelength grid.

Returns:

Transformed spectrum on target grid.

classmethod from_params(target_grid: ndarray, params: Dict[str, float]) InstrumentModel[source]

Create InstrumentModel from parameter dictionary.

gain: float = 1.0
get_jacobian_wrt_ils_sigma(spectrum: ndarray, canonical_grid: ndarray, eps: float = 0.1) ndarray[source]

Numerical Jacobian w.r.t. ILS sigma.

get_jacobian_wrt_wl_shift(spectrum: ndarray, canonical_grid: ndarray, eps: float = 0.1) ndarray[source]

Numerical Jacobian w.r.t. wavelength shift.

ils_sigma: float = 4.0
offset: float = 0.0
stray_light: float = 0.0
target_grid: ndarray
wl_poly_coeffs: ndarray | None = None
wl_shift: float = 0.0
wl_stretch: float = 1.0
class nirs4all.synthesis.reconstruction.InversionResult(concentrations: ndarray, baseline_coeffs: ndarray, continuum_coeffs: ndarray | None = None, path_length: float = 1.0, wl_shift_residual: float = 0.0, scatter_coeffs: ndarray | None = None, fitted_spectrum: ndarray | None = None, residuals: ndarray | None = None, r_squared: float = 0.0, rmse: float = inf, converged: bool = False, temperature_delta: float = 0.0, water_activity: float = 0.5, scattering_power: float = 1.5, scattering_amplitude: float = 0.0)[source]

Bases: object

Result of per-sample inversion.

concentrations

Fitted component concentrations.

Type:

numpy.ndarray

baseline_coeffs

Fitted baseline coefficients.

Type:

numpy.ndarray

continuum_coeffs

Fitted continuum coefficients.

Type:

numpy.ndarray | None

path_length

Fitted path length factor.

Type:

float

wl_shift_residual

Per-sample wavelength shift correction.

Type:

float

scatter_coeffs

Fitted scatter coefficients (reflectance).

Type:

numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum.

Type:

numpy.ndarray | None

residuals

Fitting residuals.

Type:

numpy.ndarray | None

r_squared

Coefficient of determination.

Type:

float

rmse

Root mean squared error.

Type:

float

converged

Whether optimization converged.

Type:

bool

temperature_delta

Fitted temperature deviation (°C from reference).

Type:

float

water_activity

Fitted water activity (0-1 scale).

Type:

float

scattering_power

Fitted scattering wavelength exponent.

Type:

float

scattering_amplitude

Fitted scattering baseline amplitude.

Type:

float

baseline_coeffs: ndarray
concentrations: ndarray
continuum_coeffs: ndarray | None = None
converged: bool = False
fitted_spectrum: ndarray | None = None
property linear_params: ndarray

Get all linear parameters as single array.

path_length: float = 1.0
r_squared: float = 0.0
residuals: ndarray | None = None
rmse: float = inf
scatter_coeffs: ndarray | None = None
scattering_amplitude: float = 0.0
scattering_power: float = 1.5
temperature_delta: float = 0.0
to_dict() Dict[str, Any][source]

Convert to dictionary.

water_activity: float = 0.5
wl_shift_residual: float = 0.0
class nirs4all.synthesis.reconstruction.MultiscaleSchedule(smooth_sigmas: List[float] = <factory>, derivative_weights: List[float] = <factory>, baseline_regularization: List[float] = <factory>, max_iterations: List[int] = <factory>)[source]

Bases: object

Configuration for multiscale fitting curriculum.

Fits coarse features first, then progressively adds detail:
  1. Smooth target + no derivatives + strong baseline prior

  2. Less smooth + partial derivative weight

  3. Full resolution + full preprocessing

smooth_sigmas

Gaussian sigma values for each stage (0 = no smoothing).

Type:

List[float]

derivative_weights

Weight on derivative space at each stage.

Type:

List[float]

baseline_regularization

Baseline regularization at each stage.

Type:

List[float]

max_iterations

Max iterations at each stage.

Type:

List[int]

baseline_regularization: List[float]
derivative_weights: List[float]
max_iterations: List[int]
property n_stages: int

Number of stages in schedule.

classmethod quick() MultiscaleSchedule[source]

Quick schedule for fast fitting.

smooth_sigmas: List[float]
classmethod thorough() MultiscaleSchedule[source]

Thorough schedule for best accuracy.
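The coarse-to-fine idea behind the schedule can be illustrated by smoothing a target with a decreasing Gaussian sigma per stage. The stage values below are hypothetical, not the defaults of quick() or thorough(), sigma is assumed to be expressed in samples, and gaussian_smooth is a helper written for this sketch.

```python
import numpy as np

def gaussian_smooth(y, sigma):
    """Gaussian smoothing with sigma in samples; sigma <= 0 returns a copy."""
    if sigma <= 0:
        return y.copy()
    half = int(np.ceil(4 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(np.pad(y, half, mode="edge"), kernel, mode="valid")

# Hypothetical three-stage schedule, coarse to fine.
smooth_sigmas = [8.0, 3.0, 0.0]
derivative_weights = [0.0, 0.5, 1.0]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
target = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2) + 0.02 * rng.standard_normal(200)

# Each stage would be fitted against its own smoothed target.
stage_targets = [gaussian_smooth(target, s) for s in smooth_sigmas]
roughness = [float(np.abs(np.diff(t, n=2)).sum()) for t in stage_targets]
```

Early stages see only broad features, so baseline and band amplitudes settle before fine structure and noise enter the objective.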

class nirs4all.synthesis.reconstruction.ParameterDistributionFitter(positive_params: List[str] = <factory>, bounded_params: Dict[str, Tuple[float, float]] = <factory>, use_factor_model: bool = False, n_factors: int = 3, min_std: float = 1e-06)[source]

Bases: object

Fit distributions to parameter samples.

For positive parameters (concentrations, path_length):
  • Use log-normal or gamma distributions

  • Transform to log space for correlation modeling

For shift parameters (wl_shift):
  • Use Gaussian distributions

For bounded parameters:
  • Use truncated normal or beta distributions
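For the positive-parameter case, a log-normal fit amounts to estimating the mean and standard deviation in log space, and correlations can then be computed on the log-transformed values. The sketch below shows this with NumPy on simulated samples; the parameter names, the dictionary layout, and fit_lognormal are hypothetical and do not reflect the structure of DistributionResult.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated inversion outputs: positive, right-skewed, and correlated.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
samples = {
    "conc_water": np.exp(-0.5 + 0.3 * z[:, 0]),
    "path_length": np.exp(0.1 + 0.2 * z[:, 1]),
}

def fit_lognormal(values, min_std=1e-6):
    """Moments in log space; min_std guards against degenerate fits."""
    logs = np.log(values)
    return {"type": "lognormal",
            "mu": float(logs.mean()),
            "sigma": float(max(logs.std(ddof=1), min_std))}

distributions = {name: fit_lognormal(v) for name, v in samples.items()}

# Correlation is modelled on the transformed (log) parameters.
log_matrix = np.column_stack([np.log(v) for v in samples.values()])
correlations = np.corrcoef(log_matrix, rowvar=False)
```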

positive_params

Names of parameters that must be positive.

Type:

List[str]

bounded_params

Dict of param_name -> (lower, upper) bounds.

Type:

Dict[str, Tuple[float, float]]

use_factor_model

Use low-rank factor model for correlations.

Type:

bool

n_factors

Number of factors for factor model.

Type:

int

min_std

Minimum standard deviation to avoid degenerate distributions.

Type:

float

bounded_params: Dict[str, Tuple[float, float]]
fit(params: Dict[str, ndarray], param_names: List[str] | None = None) DistributionResult[source]

Fit distributions to parameter samples.

Parameters:
  • params – Dict of parameter arrays. Each array has shape (n_samples,) or (n_samples, n_features) for multi-dimensional params.

  • param_names – Optional list of parameter names to fit.

Returns:

DistributionResult with fitted distributions.

min_std: float = 1e-06
n_factors: int = 3
positive_params: List[str]
use_factor_model: bool = False
class nirs4all.synthesis.reconstruction.ParameterSampler(distribution_result: DistributionResult, use_correlations: bool = True)[source]

Bases: object

Sample parameters from fitted distributions.

Uses Gaussian copula to maintain correlations between parameters while respecting marginal distributions.
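A Gaussian copula draws correlated standard normals, maps them to uniforms with the normal CDF, and then applies each marginal's inverse CDF. The sketch below is a generic NumPy illustration with two made-up marginals (a uniform on a bounded interval and an exponential); sample_copula and the marginal choices are assumptions, not the sampler's actual code.

```python
import math
import numpy as np

def sample_copula(n_samples, correlation, marginals, seed=None):
    """marginals: list of inverse CDFs mapping uniforms in (0, 1) to values."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(correlation)
    z = rng.standard_normal((n_samples, len(marginals))) @ chol.T  # correlated N(0, 1)
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))   # normal CDF
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                             # keep inverse CDFs finite
    return np.column_stack([inv(u[:, j]) for j, inv in enumerate(marginals)])

corr = np.array([[1.0, 0.9], [0.9, 1.0]])
marginals = [
    lambda u: 0.1 + 0.8 * u,          # uniform on (0.1, 0.9)
    lambda u: -0.02 * np.log1p(-u),   # exponential with scale 0.02
]
params = sample_copula(5000, corr, marginals, seed=0)
```

Each column keeps its own marginal shape and support, while the dependence between columns comes only from the correlation matrix of the underlying normals.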

distribution_result

Fitted DistributionResult.

Type:

nirs4all.synthesis.reconstruction.distributions.DistributionResult

use_correlations

Whether to model parameter correlations.

Type:

bool

distribution_result: DistributionResult
sample(n_samples: int, random_state: int | None = None) Dict[str, ndarray][source]

Sample parameters from fitted distributions.

Parameters:
  • n_samples – Number of samples to generate.

  • random_state – Random seed.

Returns:

Dict of parameter arrays with same structure as fit input.

sample_single(random_state: int | None = None) Dict[str, ndarray][source]

Sample a single parameter set.

use_correlations: bool = True
class nirs4all.synthesis.reconstruction.PipelineResult(config: DatasetConfig, calibration: 'CalibrationResult' | None = None, inversion_results: List['InversionResult'] | None = None, distribution: 'DistributionResult' | None = None, X_synthetic: np.ndarray | None = None, validation: 'ValidationResult' | None = None, forward_chain: 'ForwardChain' | None = None)[source]

Bases: object

Result of reconstruction pipeline.

Contains all outputs from the reconstruction workflow:
  • Calibration results

  • Inversion results

  • Learned distributions

  • Generated synthetic data

  • Validation metrics

config

Dataset configuration used.

Type:

DatasetConfig

calibration

Global calibration result.

Type:

Optional[‘CalibrationResult’]

inversion_results

Per-sample inversion results.

Type:

Optional[List[‘InversionResult’]]

distribution

Learned parameter distributions.

Type:

Optional[‘DistributionResult’]

X_synthetic

Generated synthetic spectra.

Type:

Optional[np.ndarray]

validation

Validation result.

Type:

Optional[‘ValidationResult’]

forward_chain

Calibrated forward chain.

Type:

Optional[‘ForwardChain’]

X_synthetic: np.ndarray | None = None
calibration: 'CalibrationResult' | None = None
config: DatasetConfig
distribution: 'DistributionResult' | None = None
forward_chain: 'ForwardChain' | None = None
inversion_results: List['InversionResult'] | None = None
summary() str[source]

Generate pipeline summary.

validation: 'ValidationResult' | None = None
class nirs4all.synthesis.reconstruction.PreprocessingOperator(preprocessing_type: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'detrend', 'mean_centered'] = 'none', sg_window: int = 15, sg_polyorder: int = 2, sg_deriv: int = 0, reference_spectrum: ndarray | None = None)[source]

Bases: object

Apply dataset preprocessing to match stored representation.

Implements exact preprocessing steps:
  • Savitzky-Golay derivatives (1st, 2nd order)

  • SNV (Standard Normal Variate)

  • MSC (Multiplicative Scatter Correction)

  • Detrend

  • Mean centering
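Two of the listed operations, SNV and MSC, are simple enough to sketch in NumPy. The functions below follow the textbook definitions (row-wise standardization for SNV, regression on a reference spectrum for MSC); details such as the use of the population standard deviation are assumptions and may differ from the operator's implementation.

```python
import numpy as np

def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row)."""
    X = np.atleast_2d(X)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference):
    """Multiplicative Scatter Correction against a reference spectrum."""
    X = np.atleast_2d(X)
    corrected = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        # Fit row = slope * reference + intercept, then undo both terms.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected

wl = np.linspace(1000.0, 2500.0, 151)
reference = np.exp(-0.5 * ((wl - 1450.0) / 60.0) ** 2)
X = np.stack([1.2 * reference + 0.1, 0.7 * reference - 0.05])

X_snv = snv(X)
X_msc = msc(X, reference)
```

Both spectra here are affine copies of the reference, so SNV maps them to the same curve and MSC recovers the reference itself.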

preprocessing_type

Type of preprocessing.

Type:

Literal[‘none’, ‘first_derivative’, ‘second_derivative’, ‘snv’, ‘msc’, ‘detrend’, ‘mean_centered’]

sg_window

Savitzky-Golay window length.

Type:

int

sg_polyorder

Savitzky-Golay polynomial order.

Type:

int

sg_deriv

Derivative order (0, 1, 2).

Type:

int

reference_spectrum

Reference for MSC (mean of calibration set).

Type:

numpy.ndarray | None

apply(spectrum: ndarray) ndarray[source]

Apply preprocessing to spectrum.

Parameters:

spectrum – Input spectrum, shape (n_wavelengths,) or (n_samples, n_wavelengths).

Returns:

Preprocessed spectrum or spectra.

apply_to_matrix(X: ndarray) ndarray[source]

Apply preprocessing to design matrix columns.

classmethod from_detection(preprocessing_type: str, sg_window: int = 15, sg_polyorder: int = 2) PreprocessingOperator[source]

Create PreprocessingOperator from detected preprocessing type.

preprocessing_type: Literal['none', 'first_derivative', 'second_derivative', 'snv', 'msc', 'detrend', 'mean_centered'] = 'none'
reference_spectrum: ndarray | None = None
sg_deriv: int = 0
sg_polyorder: int = 2
sg_window: int = 15
class nirs4all.synthesis.reconstruction.PrototypeSelector(n_prototypes: int = 5, include_median: bool = True, include_quantiles: bool = True, pca_components: int = 5)[source]

Bases: object

Select representative prototype spectra from a dataset.

Uses multiple strategies to ensure robust global calibration:
  1. Median spectrum (robust central tendency)

  2. Quantile spectra (25%, 75% in PC1)

  3. K-medoids in PCA space (capture diversity)

n_prototypes

Number of prototypes to select.

Type:

int

include_median

Always include median spectrum.

Type:

bool

include_quantiles

Include quantile spectra.

Type:

bool

pca_components

Number of PCA components for clustering.

Type:

int

include_median: bool = True
include_quantiles: bool = True
n_prototypes: int = 5
pca_components: int = 5
select(X: ndarray) Tuple[ndarray, ndarray][source]

Select prototype spectra.

Parameters:

X – Spectra matrix (n_samples, n_wavelengths).

Returns:

Tuple of (prototype_spectra, prototype_indices).
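The first two selection strategies can be sketched with plain NumPy. This is an illustration only: the function name `select_prototypes_sketch` is hypothetical, the median prototype is assumed to be the real sample nearest to the element-wise median spectrum, and the k-medoids step is left out for brevity.

```python
import numpy as np


def select_prototypes_sketch(X):
    """Pick a median-like sample plus the samples nearest the 25%/75% PC1 quantiles."""
    X = np.asarray(X, dtype=float)
    # Real sample closest to the element-wise median spectrum.
    median_spectrum = np.median(X, axis=0)
    median_idx = int(np.argmin(np.linalg.norm(X - median_spectrum, axis=1)))
    # PC1 scores from an SVD of the mean-centered data.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]
    quantile_idx = [
        int(np.argmin(np.abs(scores - np.quantile(scores, q))))
        for q in (0.25, 0.75)
    ]
    # Drop duplicates while keeping the selection order.
    indices = np.array(list(dict.fromkeys([median_idx, *quantile_idx])))
    return X[indices], indices
```

The return value follows the same (prototype_spectra, prototype_indices) layout as `select()`.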

class nirs4all.synthesis.reconstruction.ReconstructionGenerator(noise_level: float = 0.001, multiplicative_noise: float = 0.01, add_noise: bool = True, noise_type: str = 'both')[source]

Bases: object

Generate synthetic spectra from learned parameter distributions.

Uses the calibrated forward chain and learned parameter distributions to generate realistic synthetic data that matches the statistical properties of the original dataset.

forward_chain

Calibrated forward chain.

sampler

Parameter sampler with learned distributions.

noise_estimator

Estimated noise level from inversion residuals.

add_noise

Whether to add noise to generated spectra.

Type:

bool

noise_type

Type of noise (‘additive’, ‘multiplicative’, ‘both’).

Type:

str
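One plausible reading of the three `noise_type` values is sketched below. The exact noise model used by the class is not documented here, so two choices are assumptions: the multiplicative term is drawn once per sample (a gain-like perturbation), and the additive term is drawn independently per wavelength. The function name `add_noise_sketch` is hypothetical. Its defaults copy the documented `noise_level` and `multiplicative_noise` values.

```python
import numpy as np


def add_noise_sketch(X, noise_level=0.001, multiplicative_noise=0.01,
                     noise_type="both", random_state=None):
    """Add Gaussian noise to a (n_samples, n_wavelengths) matrix (illustrative only)."""
    if noise_type not in ("additive", "multiplicative", "both"):
        raise ValueError(f"unknown noise_type: {noise_type!r}")
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    noisy = X.copy()
    if noise_type in ("multiplicative", "both"):
        # Assumption: one gain factor per sample, shared across wavelengths.
        gain = 1.0 + rng.normal(0.0, multiplicative_noise, size=(X.shape[0], 1))
        noisy = noisy * gain
    if noise_type in ("additive", "both"):
        # Assumption: independent Gaussian noise at every wavelength.
        noisy = noisy + rng.normal(0.0, noise_level, size=X.shape)
    return noisy
```

Passing the same `random_state` twice reproduces the same noisy matrix, which mirrors the seeding behaviour implied by the `random_state` arguments of `generate()`.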

add_noise: bool = True
generate(n_samples: int, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) GenerationResult[source]

Generate synthetic spectra.

Parameters:
  • n_samples – Number of samples to generate.

  • forward_chain – Calibrated forward chain.

  • sampler – Parameter sampler.

  • random_state – Random seed.

Returns:

GenerationResult with generated spectra and parameters.

generate_matched(X_real: np.ndarray, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) GenerationResult[source]

Generate synthetic data matched to real data statistics.

Generates the same number of samples as the real data and optionally adjusts the noise level based on the estimated residuals.

Parameters:
  • X_real – Real data matrix for reference.

  • forward_chain – Calibrated forward chain.

  • sampler – Parameter sampler.

  • random_state – Random seed.

Returns:

GenerationResult.

multiplicative_noise: float = 0.01
noise_level: float = 0.001
noise_type: str = 'both'
class nirs4all.synthesis.reconstruction.ReconstructionPipeline(config: DatasetConfig, component_names: List[str] | None = None, canonical_resolution: float = 0.5, baseline_order: int = 5, continuum_order: int = 3, n_prototypes: int = 5, fit_environmental: bool = False, verbose: bool = True)[source]

Bases: object

Complete reconstruction pipeline.

Orchestrates the full workflow:
  1. Configuration and component selection

  2. Prototype selection and global calibration

  3. Per-sample inversion (optionally with environmental parameters)

  4. Parameter distribution learning

  5. Synthetic generation

  6. Validation

config

Dataset configuration.

Type:

nirs4all.synthesis.reconstruction.pipeline.DatasetConfig

component_names

Components to use (auto-selected if None).

Type:

List[str] | None

canonical_resolution

Resolution of canonical grid (nm).

Type:

float

baseline_order

Baseline polynomial order.

Type:

int

n_prototypes

Number of prototypes for calibration.

Type:

int

fit_environmental

Whether to fit environmental parameters.

Type:

bool

verbose

Print progress.

Type:

bool

__post_init__()[source]

Initialize components if not provided.

baseline_order: int = 5
canonical_resolution: float = 0.5
component_names: List[str] | None = None
config: DatasetConfig
continuum_order: int = 3
fit(X: ndarray, max_samples: int | None = None) PipelineResult[source]

Run full reconstruction pipeline.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • max_samples – Maximum number of samples to invert (limits runtime on large datasets).

Returns:

PipelineResult with all outputs.

fit_environmental: bool = False
generate(n_samples: int, result: PipelineResult, random_state: int | None = None) ndarray[source]

Generate additional synthetic samples using fitted pipeline.

Parameters:
  • n_samples – Number of samples to generate.

  • result – PipelineResult from fit().

  • random_state – Random seed.

Returns:

Synthetic spectra matrix.
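According to the signatures documented here, `generate()` takes the `PipelineResult` returned by `fit()` as its second argument. The small helper below sketches that call order. The name `fit_then_generate` is hypothetical, and `pipeline` is assumed to be an already constructed `ReconstructionPipeline` (or any object exposing the same two methods).

```python
def fit_then_generate(pipeline, X, n_samples, max_samples=None, random_state=0):
    """Fit once, then reuse the returned result object to draw synthetic spectra."""
    # fit() returns the PipelineResult that generate() expects as `result`.
    result = pipeline.fit(X, max_samples=max_samples)
    X_synth = pipeline.generate(n_samples, result, random_state=random_state)
    return X_synth, result
```

Keeping the result object around also allows further calls to `generate()` with different seeds without refitting.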

n_prototypes: int = 5
verbose: bool = True
class nirs4all.synthesis.reconstruction.ReconstructionValidator(r2_threshold: float = 0.9, residual_autocorr_threshold: float = 0.3, pca_distance_threshold: float = 3.0, concentration_max: float = 10.0, path_length_bounds: Tuple[float, float] = (0.3, 3.0))[source]

Bases: object

Validate reconstruction quality and synthetic realism.

Checks:
  1. Residuals should be structureless (no systematic patterns)

  2. Synthetic should match real in PCA space

  3. Per-wavelength statistics should be similar

  4. Parameters should be physically plausible

r2_threshold

Minimum acceptable R² for reconstruction.

Type:

float

residual_autocorr_threshold

Max autocorrelation in residuals.

Type:

float

pca_distance_threshold

Max Mahalanobis distance in PCA space.

Type:

float

concentration_max

Max plausible concentration value.

Type:

float

concentration_max: float = 10.0
path_length_bounds: Tuple[float, float] = (0.3, 3.0)
pca_distance_threshold: float = 3.0
r2_threshold: float = 0.9
residual_autocorr_threshold: float = 0.3
validate(inversion_results: List['InversionResult'], X_real: np.ndarray, X_synth: np.ndarray) ValidationResult[source]

Run full validation.

Parameters:
  • inversion_results – Inversion results.

  • X_real – Real data.

  • X_synth – Synthetic data.

Returns:

ValidationResult.

validate_parameters(inversion_results: List['InversionResult']) Dict[str, Any][source]

Validate parameter plausibility.

Parameters:

inversion_results – List of inversion results.

Returns:

Dict of parameter metrics.
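A plausibility check against `concentration_max` and `path_length_bounds` could look roughly like the sketch below. The function name `plausibility_warnings` and the warning texts are hypothetical, and the actual method returns a metrics dict rather than a list. Only the two documented thresholds are used.

```python
def plausibility_warnings(concentrations, path_length,
                          concentration_max=10.0,
                          path_length_bounds=(0.3, 3.0)):
    """Return human-readable warnings; an empty list means the values look plausible."""
    warnings = []
    for i, c in enumerate(concentrations):
        if c > concentration_max:
            warnings.append(f"concentration[{i}]={c:g} exceeds {concentration_max:g}")
    low, high = path_length_bounds
    if not (low <= path_length <= high):
        warnings.append(f"path_length={path_length:g} outside [{low:g}, {high:g}]")
    return warnings
```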

validate_reconstruction(inversion_results: List['InversionResult']) Dict[str, Any][source]

Validate reconstruction quality.

Parameters:

inversion_results – List of inversion results.

Returns:

Dict of reconstruction metrics.
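One simple way to quantify whether residuals are structureless is their lag-1 autocorrelation along the wavelength axis, which can then be compared with `residual_autocorr_threshold`. The sketch below assumes that this is the statistic of interest, since the lag used internally is not documented here. The function name `lag1_autocorrelation` is hypothetical.

```python
import numpy as np


def lag1_autocorrelation(residual):
    """Lag-1 autocorrelation of a 1-D residual vector (0.0 for a constant residual)."""
    r = np.asarray(residual, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    if denom == 0.0:
        return 0.0
    return float(np.dot(r[:-1], r[1:]) / denom)
```

Smooth, systematic misfit (for example an unmodeled band) gives values close to 1, while white noise gives values close to 0.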

validate_synthetic(X_real: ndarray, X_synth: ndarray) Dict[str, Any][source]

Validate synthetic vs real data.

Parameters:
  • X_real – Real data matrix.

  • X_synth – Synthetic data matrix.

Returns:

Dict of comparison metrics.
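The PCA-space comparison can be illustrated as follows: fit PCA on the real data, project the synthetic spectra onto the same components, and measure each synthetic sample's Mahalanobis distance from the real-data centroid. This is a sketch under assumptions, not the library's metric. The function name `pca_mahalanobis` and the choice of three components are hypothetical.

```python
import numpy as np


def pca_mahalanobis(X_real, X_synth, n_components=3):
    """Mahalanobis distance of synthetic samples from the real centroid in PCA space."""
    X_real = np.asarray(X_real, dtype=float)
    X_synth = np.asarray(X_synth, dtype=float)
    mean = X_real.mean(axis=0)
    # Principal axes of the real data via SVD of the centered matrix.
    _, _, Vt = np.linalg.svd(X_real - mean, full_matrices=False)
    components = Vt[:n_components]
    real_scores = (X_real - mean) @ components.T
    synth_scores = (X_synth - mean) @ components.T
    # PCA scores are uncorrelated, so the covariance is diagonal and the
    # Mahalanobis distance reduces to a per-component standardized norm.
    std = real_scores.std(axis=0, ddof=1)
    return np.sqrt(((synth_scores / std) ** 2).sum(axis=1))
```

Distances can then be compared against a cutoff such as `pca_distance_threshold`.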

class nirs4all.synthesis.reconstruction.ValidationResult(reconstruction_metrics: Dict[str, Any] = <factory>, synthetic_metrics: Dict[str, Any] = <factory>, parameter_metrics: Dict[str, Any] = <factory>, overall_score: float = 0.0, passed: bool = False, warnings: List[str] = <factory>)[source]

Bases: object

Result of reconstruction validation.

reconstruction_metrics

Per-sample reconstruction quality.

Type:

Dict[str, Any]

synthetic_metrics

Synthetic vs real comparison metrics.

Type:

Dict[str, Any]

parameter_metrics

Parameter plausibility metrics.

Type:

Dict[str, Any]

overall_score

Combined quality score (0-100).

Type:

float

passed

Whether all quality checks passed.

Type:

bool

warnings

List of warning messages.

Type:

List[str]

overall_score: float = 0.0
parameter_metrics: Dict[str, Any]
passed: bool = False
reconstruction_metrics: Dict[str, Any]
summary() str[source]

Generate human-readable summary.

synthetic_metrics: Dict[str, Any]
warnings: List[str]
class nirs4all.synthesis.reconstruction.VariableProjectionSolver(path_length_bounds: Tuple[float, float] = (0.5, 2.0), wl_shift_bounds: Tuple[float, float] = (-2.0, 2.0), concentration_regularization: float = 1e-06, baseline_smoothness_penalty: float = 0.0001, use_derivatives: bool = False, verbose: bool = False, fit_environmental: bool = False, temperature_bounds: Tuple[float, float] = (-15.0, 15.0), water_activity_bounds: Tuple[float, float] = (0.1, 0.9), scattering_power_bounds: Tuple[float, float] = (0.5, 3.0), scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2), environmental_prior_weight: float = 0.1)[source]

Bases: object

Variable projection solver for spectral inversion.

Separates optimization into:
  • Nonlinear params: path_length, per-sample wl_shift, [environmental] (outer loop)

  • Linear params: concentrations, baseline, continuum (inner NNLS/QP)
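The outer/inner split described above can be shown on a toy problem. In this sketch the only nonlinear parameter is a wavelength shift, and the linear parameters are two band amplitudes plus a constant baseline, solved with SciPy's NNLS. Everything specific is an assumption made for illustration: the Gaussian band positions and widths, the helper names, and the use of `minimize_scalar` for the outer loop. The real solver also handles path length, continuum terms, and regularization, none of which appear here.

```python
import numpy as np
from scipy.optimize import minimize_scalar, nnls


def gaussian_band(wl, center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


def design_matrix(wl, shift):
    """Two hypothetical component bands plus a constant baseline column."""
    shifted = wl - shift
    return np.column_stack([
        gaussian_band(shifted, 1450.0, 30.0),
        gaussian_band(shifted, 1720.0, 25.0),
        np.ones_like(wl),
    ])


def varpro_fit(wl, target, shift_bounds=(-2.0, 2.0)):
    """Outer bounded search over the shift; inner NNLS for the linear coefficients."""
    def projected_loss(shift):
        # For a fixed shift the best non-negative coefficients are found by NNLS;
        # only the resulting squared residual is exposed to the outer optimizer.
        _, rnorm = nnls(design_matrix(wl, shift), target)
        return rnorm ** 2

    outer = minimize_scalar(projected_loss, bounds=shift_bounds, method="bounded")
    coeffs, rnorm = nnls(design_matrix(wl, outer.x), target)
    return float(outer.x), coeffs, float(rnorm)
```

The outer optimizer only ever sees one variable, however many linear coefficients the inner solve carries, which is the main appeal of variable projection. Note that NNLS also forces the baseline coefficient to be non-negative in this toy version.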

path_length_bounds

Bounds for path length.

Type:

Tuple[float, float]

wl_shift_bounds

Per-sample wavelength shift bounds.

Type:

Tuple[float, float]

concentration_regularization

L2 regularization weight on concentrations.

Type:

float

baseline_smoothness_penalty

Penalty on baseline curvature.

Type:

float

use_derivatives

Fit in derivative space (for derivative data).

Type:

bool

fit_environmental

Whether to fit environmental parameters.

Type:

bool

temperature_bounds

Bounds for temperature deviation (°C).

Type:

Tuple[float, float]

water_activity_bounds

Bounds for water activity.

Type:

Tuple[float, float]

scattering_power_bounds

Bounds for scattering exponent.

Type:

Tuple[float, float]

scattering_amplitude_bounds

Bounds for scattering amplitude.

Type:

Tuple[float, float]

environmental_prior_weight

Weight for environmental parameter priors.

Type:

float

baseline_smoothness_penalty: float = 0.0001
concentration_regularization: float = 1e-06
environmental_prior_weight: float = 0.1
fit(target: np.ndarray, forward_chain: ForwardChain, schedule: MultiscaleSchedule | None = None, initial_params: Dict[str, float] | None = None) InversionResult[source]

Fit forward model to target spectrum.

Parameters:
  • target – Target spectrum to fit.

  • forward_chain – Forward chain with calibrated global params.

  • schedule – Multiscale fitting schedule.

  • initial_params – Initial nonlinear parameters.

Returns:

InversionResult with fitted parameters.

fit_batch(X: np.ndarray, forward_chain: ForwardChain, schedule: MultiscaleSchedule | None = None, n_jobs: int = 1) List[InversionResult][source]

Fit multiple spectra.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • forward_chain – Forward chain with calibrated global params.

  • schedule – Multiscale fitting schedule.

  • n_jobs – Number of parallel jobs (1 = sequential).

Returns:

List of InversionResult for each sample.

fit_environmental: bool = False
path_length_bounds: Tuple[float, float] = (0.5, 2.0)
scattering_amplitude_bounds: Tuple[float, float] = (0.0, 0.2)
scattering_power_bounds: Tuple[float, float] = (0.5, 3.0)
temperature_bounds: Tuple[float, float] = (-15.0, 15.0)
use_derivatives: bool = False
verbose: bool = False
water_activity_bounds: Tuple[float, float] = (0.1, 0.9)
wl_shift_bounds: Tuple[float, float] = (-2.0, 2.0)
nirs4all.synthesis.reconstruction.reconstruct_and_generate(X: ndarray, wavelengths: ndarray, n_synthetic: int | None = None, domain: str = 'unknown', component_names: List[str] | None = None, fit_environmental: bool = False, verbose: bool = True) Tuple[ndarray, PipelineResult][source]

Convenience function for end-to-end reconstruction and generation.

Parameters:
  • X – Real spectra matrix.

  • wavelengths – Wavelength grid.

  • n_synthetic – Number of synthetic samples (default: same as X).

  • domain – Application domain.

  • component_names – Components to use.

  • fit_environmental – Whether to fit environmental parameters (temperature, water activity, scattering).

  • verbose – Print progress.

Returns:

Tuple of (X_synthetic, PipelineResult).