nirs4all.synthesis.prior module

Conditional prior sampling for synthetic NIRS data generation.

This module provides structured prior sampling where configuration parameters are sampled conditionally based on domain, instrument type, and other hierarchical dependencies.

Phase 4 Features:
  • Domain-weighted sampling

  • Conditional instrument selection given domain

  • Conditional measurement mode given instrument

  • Matrix type conditioning on domain

  • Component set selection based on domain

  • Full configuration sampling from prior

Generative DAG:
Domain → Instrument Category → Wavelength Range, Resolution, Mode, Noise
Domain → Matrix Type → Particle Size, Scattering, Water Activity → Component Set → Concentration Distributions → Target Type
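
The dependency chain above amounts to ancestral sampling: draw the root (the domain) first, then draw each child from a distribution selected by its parent's value. The following is a minimal standard-library sketch of that idea. The tables, domain names, and category names are made up for illustration; they are not the module's defaults, and the helper names `draw` and `ancestral_sample` are hypothetical.

```python
import random

# Hypothetical prior tables (illustrative values, not nirs4all defaults).
DOMAIN_WEIGHTS = {"food": 0.6, "pharma": 0.4}
CATEGORY_GIVEN_DOMAIN = {
    "food": {"benchtop": 0.5, "handheld": 0.5},
    "pharma": {"benchtop": 0.9, "handheld": 0.1},
}
MATRIX_GIVEN_DOMAIN = {
    "food": {"powder": 0.4, "liquid": 0.6},
    "pharma": {"powder": 0.7, "solid": 0.3},
}


def draw(rng, table):
    """Draw one key from a {label: weight} table."""
    labels = list(table)
    return rng.choices(labels, weights=[table[k] for k in labels], k=1)[0]


def ancestral_sample(rng):
    """Sample parents before children, following the DAG order."""
    domain = draw(rng, DOMAIN_WEIGHTS)
    category = draw(rng, CATEGORY_GIVEN_DOMAIN[domain])
    matrix = draw(rng, MATRIX_GIVEN_DOMAIN[domain])
    return {
        "domain": domain,
        "instrument_category": category,
        "matrix_type": matrix,
    }


example = ancestral_sample(random.Random(0))
```

Because every child is looked up through its parent's sampled value, incompatible combinations (a matrix type with zero weight for the chosen domain, say) cannot appear in the result.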

References

  • Workman Jr, J., & Weyer, L. (2012). Practical Guide and Spectral Atlas for Interpretive Near-Infrared Spectroscopy. CRC Press.

class nirs4all.synthesis.prior.MatrixType(value)

Bases: str, Enum

Physical matrix types that affect spectral properties.

EMULSION = 'emulsion'
FILM = 'film'
GEL = 'gel'
GRANULAR = 'granular'
LIQUID = 'liquid'
PASTE = 'paste'
POWDER = 'powder'
SLURRY = 'slurry'
SOLID = 'solid'
TISSUE = 'tissue'
class nirs4all.synthesis.prior.NIRSPriorConfig(domain_weights: Dict[str, float] = <factory>, instrument_given_domain: Dict[str, Dict[str, float]] = <factory>, mode_given_category: Dict[str, Dict[str, float]] = <factory>, matrix_given_domain: Dict[str, Dict[str, float]] = <factory>, temperature_range: Tuple[float, float] = (15.0, 40.0), particle_size_range: Tuple[float, float] = (5.0, 200.0), noise_level_range: Tuple[float, float] = (0.5, 2.0), n_samples_range: Tuple[int, int] = (100, 2000), target_type_weights: Dict[str, float] = <factory>, n_targets_range: Tuple[int, int] = (1, 5), n_classes_range: Tuple[int, int] = (2, 5))

Bases: object

Configuration for NIRS data generation with conditional sampling.

This class defines the prior distributions and conditional dependencies for sampling complete generation configurations.

domain_weights

Prior weights for each domain.

Type:

Dict[str, float]

instrument_given_domain

P(instrument_category | domain).

Type:

Dict[str, Dict[str, float]]

mode_given_category

P(measurement_mode | instrument_category).

Type:

Dict[str, Dict[str, float]]

matrix_given_domain

P(matrix_type | domain).

Type:

Dict[str, Dict[str, float]]

temperature_range

(min, max) temperature in Celsius.

Type:

Tuple[float, float]

particle_size_range

(min, max) particle size in micrometers (µm).

Type:

Tuple[float, float]

noise_level_range

(min, max) noise level multiplier.

Type:

Tuple[float, float]
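
The three `*_given_*` attributes are nested dictionaries: the outer key is the parent value and the inner dictionary holds the weights of each child value. The sketch below shows that shape and a small sanity check on it. The table contents and the helper `invalid_rows` are hypothetical, and whether the library itself requires each row to sum to 1 is not stated here (the presence of `normalize_weights` suggests unnormalized weights may be accepted); the check is simply a way to catch typos in a hand-written table.

```python
import math

# Hypothetical P(measurement_mode | instrument_category) table.
mode_given_category = {
    "benchtop": {"reflectance": 0.5, "transmittance": 0.5},
    "handheld": {"reflectance": 1.0},
}


def invalid_rows(table, tol=1e-9):
    """Return parent keys whose weights are negative or do not sum to 1."""
    bad = []
    for parent, row in table.items():
        total = sum(row.values())
        has_negative = any(weight < 0 for weight in row.values())
        if has_negative or not math.isclose(total, 1.0, abs_tol=tol):
            bad.append(parent)
    return bad


problems = invalid_rows(mode_given_category)
```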

Example

>>> config = NIRSPriorConfig()
>>> sampler = PriorSampler(config, random_state=42)
>>> sample = sampler.sample()
>>> print(sample["domain"], sample["instrument"])
domain_weights: Dict[str, float]
get_domain_weight(domain: str) → float

Get prior weight for a domain.

instrument_given_domain: Dict[str, Dict[str, float]]
matrix_given_domain: Dict[str, Dict[str, float]]
mode_given_category: Dict[str, Dict[str, float]]
n_classes_range: Tuple[int, int] = (2, 5)
n_samples_range: Tuple[int, int] = (100, 2000)
n_targets_range: Tuple[int, int] = (1, 5)
noise_level_range: Tuple[float, float] = (0.5, 2.0)
normalize_weights(weights: Dict[str, float]) → Dict[str, float]

Normalize weights to sum to 1.
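
As an illustration of what normalizing a weight dictionary means, here is a standalone sketch. It is not the method's source: the function name `normalize` is hypothetical, and raising on a non-positive total is an assumption, since the documentation does not say how that case is handled.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        # Assumed behavior; the documented method may handle this differently.
        raise ValueError("weights must have a positive sum")
    return {key: value / total for key, value in weights.items()}


normalized = normalize({"food": 2.0, "pharma": 1.0, "agriculture": 1.0})
```

With these inputs the relative proportions 2 : 1 : 1 become 0.5, 0.25, and 0.25.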

particle_size_range: Tuple[float, float] = (5.0, 200.0)
target_type_weights: Dict[str, float]
temperature_range: Tuple[float, float] = (15.0, 40.0)
class nirs4all.synthesis.prior.PriorSampler(config: NIRSPriorConfig | None = None, random_state: int | None = None)

Bases: object

Sample complete generation configurations from prior distributions.

This class implements hierarchical sampling where lower-level configurations are conditioned on higher-level choices.

Parameters:
  • config – Prior configuration.

  • random_state – Random state for reproducibility.

Example

>>> config = NIRSPriorConfig()
>>> sampler = PriorSampler(config, random_state=42)
>>>
>>> # Sample a single configuration
>>> sample = sampler.sample()
>>> print(sample)
>>>
>>> # Sample multiple configurations
>>> samples = sampler.sample_batch(10)
sample() → Dict[str, Any]

Sample a complete dataset configuration from the prior.

Returns:

Dictionary with all configuration parameters.

Example

>>> sampler = PriorSampler(random_state=42)
>>> config = sampler.sample()
>>> print(config["domain"])
>>> print(config["instrument"])
sample_batch(n: int) → List[Dict[str, Any]]

Sample multiple configurations from the prior.

Parameters:

n – Number of configurations to sample.

Returns:

List of configuration dictionaries.
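
One common way to make a batch reproducible is to create a single seeded generator and reuse it for every draw, so that the same seed yields the same list. The sketch below shows that pattern with the standard library only. `sample_one` is a hypothetical stand-in for a single configuration draw, and nothing here is claimed to match how `sample_batch` is implemented.

```python
import random


def sample_one(rng):
    """Stand-in for a single configuration draw (hypothetical fields)."""
    return {
        "domain": rng.choice(["food", "pharma"]),
        "temperature": rng.uniform(15.0, 40.0),
    }


def sample_many(n, seed=None):
    """Draw n configurations from one generator so the batch is reproducible."""
    rng = random.Random(seed)
    return [sample_one(rng) for _ in range(n)]


batch_a = sample_many(5, seed=42)
batch_b = sample_many(5, seed=42)  # same seed, so identical to batch_a
```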

sample_components(domain: str, n_components: int | None = None) → List[str]

Sample component set based on domain.
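
A domain-dependent component set can be pictured as a draw without replacement from a per-domain pool. The sketch below assumes exactly that; the pools, the component names, the helper `pick_components`, and the rule for choosing a size when `n_components` is None are all hypothetical and may differ from the library's behavior.

```python
import random

# Hypothetical per-domain component pools (not the library's data).
COMPONENT_POOLS = {
    "food": ["water", "protein", "starch", "lipid"],
    "pharma": ["api", "lactose", "cellulose"],
}


def pick_components(rng, domain, n_components=None):
    """Draw distinct components from the pool of the given domain."""
    pool = COMPONENT_POOLS[domain]
    if n_components is None:
        # Assumed default: any size from 1 up to the whole pool.
        n_components = rng.randint(1, len(pool))
    n_components = min(n_components, len(pool))
    return rng.sample(pool, n_components)


picked = pick_components(random.Random(5), "food", n_components=3)
```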

sample_domain() → str

Sample a domain from the prior.

sample_for_domain(domain: str, n_samples: int | None = None) → Dict[str, Any]

Sample a configuration constrained to a specific domain.

Parameters:
  • domain – Domain to sample for.

  • n_samples – Number of samples to generate; drawn from the prior when None.

Returns:

Configuration dictionary for the specified domain.
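
Constraining a sample to one domain can be thought of as clamping the root of the hierarchy: the domain draw is skipped and everything downstream is still sampled conditionally on the fixed value. A minimal sketch of that idea follows, with hypothetical tables and a hypothetical `sample_config` helper; it does not reproduce the method's actual code.

```python
import random

# Hypothetical tables, for illustration only.
DOMAIN_WEIGHTS = {"food": 0.7, "pharma": 0.3}
MATRIX_GIVEN_DOMAIN = {
    "food": {"liquid": 0.5, "powder": 0.5},
    "pharma": {"powder": 0.8, "solid": 0.2},
}


def sample_config(rng, domain=None):
    """Sample a configuration; if domain is given, skip the domain draw."""
    if domain is None:
        names = list(DOMAIN_WEIGHTS)
        domain = rng.choices(names, weights=[DOMAIN_WEIGHTS[d] for d in names])[0]
    elif domain not in MATRIX_GIVEN_DOMAIN:
        raise KeyError(f"unknown domain: {domain!r}")
    row = MATRIX_GIVEN_DOMAIN[domain]
    matrix = rng.choices(list(row), weights=list(row.values()))[0]
    return {"domain": domain, "matrix_type": matrix}


rng = random.Random(1)
pharma_configs = [sample_config(rng, domain="pharma") for _ in range(20)]
```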

sample_for_instrument(instrument: str, n_samples: int | None = None) → Dict[str, Any]

Sample a configuration constrained to a specific instrument.

Parameters:
  • instrument – Instrument name to use.

  • n_samples – Optional number of samples.

Returns:

Configuration dictionary for the specified instrument.

sample_instrument(category: str) → str

Sample a specific instrument given the category.

sample_instrument_category(domain: str) → str

Sample an instrument category given the domain.

sample_matrix_type(domain: str) → str

Sample a matrix type given the domain.

sample_measurement_mode(instrument_category: str) → str

Sample a measurement mode given the instrument category.

sample_n_samples() → int

Sample number of samples to generate.

sample_noise_level(instrument_category: str) → float

Sample noise level multiplier based on instrument category.

sample_particle_size(matrix_type: str) → float

Sample particle size based on matrix type.

sample_target_config() → Dict[str, Any]

Sample target generation configuration.
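
The configuration fields `target_type_weights`, `n_targets_range`, and `n_classes_range` suggest a two-step draw: pick a target type, then pick the size parameter that applies to it. The sketch below illustrates that reading under stated assumptions. The type labels "regression" and "classification", the returned keys, inclusive integer bounds, and the helper `sample_target` are all hypothetical; only the default ranges (1, 5) and (2, 5) come from the documented signature.

```python
import random


def sample_target(rng, type_weights, n_targets_range=(1, 5), n_classes_range=(2, 5)):
    """Pick a target type, then draw the size parameter that applies to it."""
    kinds = list(type_weights)
    kind = rng.choices(kinds, weights=[type_weights[k] for k in kinds])[0]
    if kind == "classification":
        low, high = n_classes_range
        return {"type": kind, "n_classes": rng.randint(low, high)}
    low, high = n_targets_range
    return {"type": kind, "n_targets": rng.randint(low, high)}


target = sample_target(
    random.Random(3),
    {"regression": 0.7, "classification": 0.3},
)
```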

sample_temperature() → float

Sample a temperature from the prior range.
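
For the `(min, max)` range attributes, the simplest reading is a uniform draw between the two bounds. The documentation does not state which distribution is used, so treat the sketch below as one plausible interpretation rather than the method's behavior; `sample_in_range` is a hypothetical helper, and (15.0, 40.0) is the documented default of `temperature_range`.

```python
import random


def sample_in_range(rng, bounds):
    """Draw a float uniformly from a (min, max) pair."""
    low, high = bounds
    if low > high:
        raise ValueError(f"invalid range: {bounds}")
    return rng.uniform(low, high)


rng = random.Random(7)
temperatures = [sample_in_range(rng, (15.0, 40.0)) for _ in range(100)]
```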

nirs4all.synthesis.prior.get_domain_compatible_instruments(domain: str) → List[str]

Get list of instruments commonly used with a domain.

Parameters:

domain – Domain name.

Returns:

List of instrument names.

Example

>>> instruments = get_domain_compatible_instruments("tablets")
>>> print(instruments)
nirs4all.synthesis.prior.get_instrument_typical_modes(instrument: str) → List[str]

Get typical measurement modes for an instrument.

Parameters:

instrument – Instrument name.

Returns:

List of measurement mode names.

Example

>>> modes = get_instrument_typical_modes("viavi_micronir")
>>> print(modes)
nirs4all.synthesis.prior.sample_prior(domain: str | None = None, instrument: str | None = None, random_state: int | None = None) → Dict[str, Any]

Convenience function that samples a single configuration from the default prior.

Parameters:
  • domain – Optional domain constraint.

  • instrument – Optional instrument constraint.

  • random_state – Random state for reproducibility.

Returns:

Configuration dictionary.

Example

>>> config = sample_prior(domain="food", random_state=42)
>>> print(config["domain"], config["instrument"])
nirs4all.synthesis.prior.sample_prior_batch(n: int, random_state: int | None = None) → List[Dict[str, Any]]

Convenience function that samples multiple configurations from the default prior.

Parameters:
  • n – Number of configurations to sample.

  • random_state – Random state for reproducibility.

Returns:

List of configuration dictionaries.

Example

>>> configs = sample_prior_batch(10, random_state=42)
>>> for c in configs:
...     print(c["domain"], c["instrument"])