nirs4all.synthesis.reconstruction.generator module

Synthetic data generation from learned parameter distributions.

Generates realistic synthetic spectra by:

1. Sampling physical parameters from learned distributions
2. Running the forward model chain
3. Adding appropriate noise
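The three steps can be illustrated with a small, self-contained NumPy sketch. It does not use the nirs4all API. The function name `sketch_generate`, the log-normal parameter distributions, the Beer-Lambert style mixing model, and the Gaussian band shapes are all assumptions chosen for illustration, not the module's actual forward chain.

```python
import numpy as np


def sketch_generate(n_samples, component_spectra, rng, noise_std=0.001):
    """Toy version of: sample parameters, run a forward model, add noise."""
    n_components = component_spectra.shape[0]

    # 1. Sample physical parameters (assumed log-normal, so values stay positive).
    concentrations = rng.lognormal(mean=0.0, sigma=0.3, size=(n_samples, n_components))
    path_lengths = rng.lognormal(mean=0.0, sigma=0.1, size=n_samples)

    # 2. Forward model (assumed linear mixing scaled by path length).
    clean = path_lengths[:, None] * (concentrations @ component_spectra)

    # 3. Add additive Gaussian noise.
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return noisy, concentrations, path_lengths


rng = np.random.default_rng(0)
wavelengths = np.linspace(1000.0, 2500.0, 200)

# Two hypothetical Gaussian absorption bands standing in for component spectra.
component_spectra = np.stack([
    np.exp(-0.5 * ((wavelengths - 1450.0) / 40.0) ** 2),
    np.exp(-0.5 * ((wavelengths - 1940.0) / 60.0) ** 2),
])

X, C, L = sketch_generate(5, component_spectra, rng)
print(X.shape, C.shape, L.shape)
```

In the real module, steps 1 and 2 are delegated to the `ParameterSampler` and `ForwardChain` objects passed to `generate`.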

class nirs4all.synthesis.reconstruction.generator.GenerationResult(X: ndarray, concentrations: ndarray, path_lengths: ndarray, baseline_coeffs: ndarray, wavelengths: ndarray, noise_level: float = 0.0, wl_shifts: ndarray | None = None, temperature_deltas: ndarray | None = None, water_activities: ndarray | None = None, scattering_powers: ndarray | None = None, scattering_amplitudes: ndarray | None = None)[source]

Bases: object

Result of synthetic generation.

X

Generated spectra (n_samples, n_wavelengths).

Type:

numpy.ndarray

concentrations

Sampled concentrations (n_samples, n_components).

Type:

numpy.ndarray

path_lengths

Sampled path lengths (n_samples,).

Type:

numpy.ndarray

baseline_coeffs

Sampled baseline coefficients.

Type:

numpy.ndarray

wavelengths

Wavelength grid.

Type:

numpy.ndarray

noise_level

Applied noise level.

Type:

float

wl_shifts

Per-sample wavelength shifts.

Type:

numpy.ndarray | None

temperature_deltas

Per-sample temperature deviations (°C).

Type:

numpy.ndarray | None

water_activities

Per-sample water activity values.

Type:

numpy.ndarray | None

scattering_powers

Per-sample scattering exponents.

Type:

numpy.ndarray | None

scattering_amplitudes

Per-sample scattering amplitudes.

Type:

numpy.ndarray | None

X: ndarray
baseline_coeffs: ndarray
concentrations: ndarray
property n_samples: int

Number of generated samples.

property n_wavelengths: int

Number of wavelengths.

noise_level: float = 0.0
path_lengths: ndarray
scattering_amplitudes: ndarray | None = None
scattering_powers: ndarray | None = None
temperature_deltas: ndarray | None = None
water_activities: ndarray | None = None
wavelengths: ndarray
wl_shifts: ndarray | None = None
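As a rough illustration of how the array shapes relate to the two properties, the following stand-in dataclass mirrors a subset of the documented fields. `GenerationResultSketch` is a hypothetical name, and deriving `n_samples` and `n_wavelengths` from `X.shape` is an assumption based on the documented shape `(n_samples, n_wavelengths)`, not a copy of the library source.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class GenerationResultSketch:
    """Hypothetical stand-in mirroring some documented GenerationResult fields."""

    X: np.ndarray                      # (n_samples, n_wavelengths)
    concentrations: np.ndarray         # (n_samples, n_components)
    path_lengths: np.ndarray           # (n_samples,)
    baseline_coeffs: np.ndarray        # shape not documented; (n_samples, 3) assumed below
    wavelengths: np.ndarray            # (n_wavelengths,)
    noise_level: float = 0.0
    wl_shifts: Optional[np.ndarray] = None

    @property
    def n_samples(self) -> int:
        # Assumed to be the number of rows of X.
        return self.X.shape[0]

    @property
    def n_wavelengths(self) -> int:
        # Assumed to be the number of columns of X.
        return self.X.shape[1]


result = GenerationResultSketch(
    X=np.zeros((4, 10)),
    concentrations=np.zeros((4, 2)),
    path_lengths=np.ones(4),
    baseline_coeffs=np.zeros((4, 3)),
    wavelengths=np.linspace(1000.0, 2500.0, 10),
)
print(result.n_samples, result.n_wavelengths)  # prints: 4 10
```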
class nirs4all.synthesis.reconstruction.generator.ReconstructionGenerator(noise_level: float = 0.001, multiplicative_noise: float = 0.01, add_noise: bool = True, noise_type: str = 'both')[source]

Bases: object

Generate synthetic spectra from learned parameter distributions.

Uses the calibrated forward chain and learned parameter distributions to generate realistic synthetic data that matches the statistical properties of the original dataset.

forward_chain

Calibrated forward chain.

sampler

Parameter sampler with learned distributions.

noise_estimator

Estimated noise level from inversion residuals.

add_noise

Whether to add noise to generated spectra.

Type:

bool

noise_type

Type of noise: 'additive', 'multiplicative', or 'both'.

Type:

str

add_noise: bool = True
generate(n_samples: int, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) GenerationResult[source]

Generate synthetic spectra.

Parameters:
  • n_samples – Number of samples to generate.

  • forward_chain – Calibrated forward chain.

  • sampler – Parameter sampler.

  • random_state – Random seed.

Returns:

GenerationResult with generated spectra and parameters.

generate_matched(X_real: np.ndarray, forward_chain: ForwardChain, sampler: ParameterSampler, random_state: int | None = None) GenerationResult[source]

Generate synthetic data matched to real data statistics.

Generates the same number of samples as the real data and optionally adjusts the noise level based on estimated residuals.

Parameters:
  • X_real – Real data matrix for reference.

  • forward_chain – Calibrated forward chain.

  • sampler – Parameter sampler.

  • random_state – Random seed.

Returns:

GenerationResult.

multiplicative_noise: float = 0.01
noise_level: float = 0.001
noise_type: str = 'both'
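The three `noise_type` options could be applied along the following lines. This is a sketch under stated assumptions, not the class's implementation: `apply_noise` is a hypothetical helper, additive noise is assumed to be independent Gaussian noise per point, and multiplicative noise is assumed to be one Gaussian gain factor per sample.

```python
import numpy as np


def apply_noise(X, noise_level=0.001, multiplicative_noise=0.01,
                noise_type="both", rng=None):
    """Hypothetical helper: add additive and/or multiplicative Gaussian noise."""
    if noise_type not in {"additive", "multiplicative", "both"}:
        raise ValueError(f"Unknown noise_type: {noise_type!r}")
    rng = np.random.default_rng() if rng is None else rng

    out = np.array(X, dtype=float)  # work on a copy
    if noise_type in {"multiplicative", "both"}:
        # Assumed: one random gain factor per sample (row), centred on 1.
        gain = 1.0 + rng.normal(0.0, multiplicative_noise, size=(out.shape[0], 1))
        out = out * gain
    if noise_type in {"additive", "both"}:
        # Assumed: independent Gaussian noise on every point.
        out = out + rng.normal(0.0, noise_level, size=out.shape)
    return out


X_clean = np.ones((3, 5))
X_mult = apply_noise(X_clean, noise_type="multiplicative", rng=np.random.default_rng(0))
X_both = apply_noise(X_clean, noise_type="both", rng=np.random.default_rng(0))
```

With these assumptions, the defaults `noise_level=0.001` and `multiplicative_noise=0.01` act as standard deviations of the additive term and of the per-sample gain, respectively.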
nirs4all.synthesis.reconstruction.generator.estimate_noise_from_residuals(inversion_results: List['InversionResult']) Tuple[float, float][source]

Estimate noise parameters from inversion residuals.

Parameters:

inversion_results – List of inversion results with residuals.

Returns:

Tuple of (additive_noise_std, multiplicative_noise_std).
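One plausible way to obtain such a pair from residuals is sketched below. It works on plain arrays rather than `InversionResult` objects, whose attributes are not documented here. The function name `estimate_noise_sketch`, its inputs, and the estimator itself (standard deviation of raw residuals and of residuals relative to the fitted signal) are assumptions, not the library's method.

```python
import numpy as np


def estimate_noise_sketch(residuals, fitted, eps=1e-8):
    """Hypothetical estimator returning (additive_std, multiplicative_std).

    residuals, fitted: lists of 1-D arrays, one pair per inverted spectrum.
    """
    r = np.concatenate([np.asarray(a, dtype=float) for a in residuals])
    f = np.concatenate([np.asarray(a, dtype=float) for a in fitted])

    # Additive estimate: spread of the raw residuals.
    additive_std = float(np.std(r))

    # Multiplicative estimate: spread of residuals relative to the fitted
    # signal, skipping points where the signal is close to zero.
    mask = np.abs(f) > eps
    multiplicative_std = float(np.std(r[mask] / f[mask])) if mask.any() else 0.0

    return additive_std, multiplicative_std


residuals = [np.array([0.1, -0.1]), np.array([0.1, -0.1])]
fitted = [np.array([2.0, 2.0]), np.array([2.0, 2.0])]
add_std, mult_std = estimate_noise_sketch(residuals, fitted)
```

The two returned values would map onto the `noise_level` and `multiplicative_noise` arguments of `ReconstructionGenerator`.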

nirs4all.synthesis.reconstruction.generator.generate_synthetic_dataset(forward_chain: ForwardChain, distribution_result: DistributionResult, n_samples: int, noise_level: float = 0.001, multiplicative_noise: float = 0.01, random_state: int | None = None) GenerationResult[source]

Complete pipeline to generate a synthetic dataset.

Parameters:
  • forward_chain – Calibrated forward chain.

  • distribution_result – Fitted parameter distributions.

  • n_samples – Number of samples to generate.

  • noise_level – Additive noise level.

  • multiplicative_noise – Multiplicative noise level.

  • random_state – Random seed.

Returns:

GenerationResult with synthetic spectra.