nirs4all.data.synthetic.domains module

Application domain configurations for synthetic NIRS data generation.

This module provides domain-specific priors and configurations for generating realistic synthetic NIRS data tailored to application areas such as agriculture, food processing, pharmaceuticals, and petrochemicals (see DomainCategory for the full list of categories).

Each domain configuration includes:
  • Typical spectral components (chemical compounds)

  • Concentration distributions specific to the domain

  • Commonly used wavelength ranges

  • Typical number of components per sample

  • Domain-specific noise and artifact characteristics

Key Features:
  • 15+ predefined application domains

  • Domain-aware component selection

  • Realistic concentration priors

  • Easy integration with generators

Example

>>> from nirs4all.data.synthetic.domains import (
...     get_domain_config,
...     APPLICATION_DOMAINS,
...     DomainConfig
... )
>>>
>>> # Get configuration for agricultural samples
>>> config = get_domain_config("agriculture_grain")
>>> print(config.typical_components)
['starch', 'protein', 'moisture', 'lipid', 'cellulose']

References

  • Burns, D. A., & Ciurczak, E. W. (2007). Handbook of Near-Infrared Analysis (3rd ed.). CRC Press.

  • Williams, P. C., & Norris, K. H. (2001). Near-Infrared Technology in the Agricultural and Food Industries (2nd ed.). AACC International.

  • Reich, G. (2005). Near-Infrared Spectroscopy and Imaging: Basic Principles and Pharmaceutical Applications. Advanced Drug Delivery Reviews.

class nirs4all.data.synthetic.domains.ConcentrationPrior(distribution: str = 'uniform', params: Dict[str, float] = <factory>, min_value: float = 0.0, max_value: float = 1.0)

Bases: object

Prior distribution for component concentrations.

distribution

Distribution type (‘uniform’, ‘normal’, ‘lognormal’, ‘beta’).

Type:

str

params

Parameters for the distribution (distribution-specific).

Type:

Dict[str, float]

min_value

Minimum allowed concentration.

Type:

float

max_value

Maximum allowed concentration.

Type:

float

distribution: str = 'uniform'
max_value: float = 1.0
min_value: float = 0.0
params: Dict[str, float]
sample(rng: Generator, n_samples: int = 1) → ndarray

Draw n_samples concentration values from this prior.
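A minimal sketch of how a prior like this could be sampled with NumPy's Generator API. The parameter keys (low/high, mean/std, mean/sigma, a/b), their fallback values, and the clipping into [min_value, max_value] are assumptions made for illustration; PriorSketch is a hypothetical stand-in, not the documented implementation of ConcentrationPrior.sample.

```python
from dataclasses import dataclass, field
from typing import Dict

import numpy as np


@dataclass
class PriorSketch:
    """Hypothetical stand-in for ConcentrationPrior (parameter keys are assumed)."""
    distribution: str = "uniform"
    params: Dict[str, float] = field(default_factory=dict)
    min_value: float = 0.0
    max_value: float = 1.0

    def sample(self, rng: np.random.Generator, n_samples: int = 1) -> np.ndarray:
        p = self.params
        if self.distribution == "uniform":
            values = rng.uniform(p.get("low", self.min_value), p.get("high", self.max_value), n_samples)
        elif self.distribution == "normal":
            values = rng.normal(p.get("mean", 0.5), p.get("std", 0.1), n_samples)
        elif self.distribution == "lognormal":
            values = rng.lognormal(p.get("mean", 0.0), p.get("sigma", 1.0), n_samples)
        elif self.distribution == "beta":
            values = rng.beta(p.get("a", 2.0), p.get("b", 2.0), n_samples)
        else:
            raise ValueError(f"Unknown distribution: {self.distribution!r}")
        # Assumed behavior: clip every draw into the allowed concentration range
        return np.clip(values, self.min_value, self.max_value)


rng = np.random.default_rng(0)
prior = PriorSketch("normal", {"mean": 0.12, "std": 0.03}, min_value=0.0, max_value=0.3)
draws = prior.sample(rng, n_samples=5)
print(draws.shape)  # (5,)
```

Clipping keeps unbounded distributions such as the normal from producing negative concentrations; a real implementation might instead resample or truncate.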

class nirs4all.data.synthetic.domains.DomainCategory(value)

Bases: str, Enum

Top-level domain categories.

AGRICULTURE = 'agriculture'
BEVERAGE = 'beverage'
BIOMEDICAL = 'biomedical'
ENVIRONMENTAL = 'environmental'
FOOD = 'food'
PETROCHEMICAL = 'petrochemical'
PHARMACEUTICAL = 'pharmaceutical'
POLYMER = 'polymer'
TEXTILE = 'textile'
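Because DomainCategory derives from both str and Enum, members can be looked up by value and compared directly with plain strings. The snippet below re-declares a subset of the members locally so it runs without the package; it demonstrates standard str/Enum mixin behavior rather than anything specific to nirs4all.

```python
from enum import Enum


class DomainCategory(str, Enum):
    """Local mirror of a few members, for illustration only."""
    AGRICULTURE = "agriculture"
    FOOD = "food"
    PHARMACEUTICAL = "pharmaceutical"


# Lookup by value returns the existing member
cat = DomainCategory("food")
print(cat is DomainCategory.FOOD)  # True
# The str mixin makes members compare equal to their plain-string values
print(DomainCategory.AGRICULTURE == "agriculture")  # True
print(DomainCategory.AGRICULTURE.value)  # agriculture
```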

class nirs4all.data.synthetic.domains.DomainConfig(name: str, category: DomainCategory, description: str = '', typical_components: List[str] = <factory>, component_weights: Dict[str, float] | None = None, concentration_priors: Dict[str, ConcentrationPrior] = <factory>, wavelength_range: Tuple[float, float] = (1000, 2500), n_components_range: Tuple[int, int] = (3, 8), noise_level: str = 'medium', measurement_mode: str = 'reflectance', typical_sample_types: List[str] = <factory>, complexity: str = 'realistic', additional_params: Dict[str, Any] = <factory>)

Bases: object

Configuration for a specific application domain.

Encapsulates all domain-specific parameters needed for generating realistic synthetic NIRS data.

name

Human-readable domain name.

Type:

str

category

Domain category (agriculture, pharmaceutical, etc.).

Type:

nirs4all.data.synthetic.domains.DomainCategory

description

Brief description of the domain.

Type:

str

typical_components

Names of predefined components commonly found in this domain.

Type:

List[str]

component_weights

Relative weight of each component when selecting components for a sample.

Type:

Dict[str, float] | None

concentration_priors

Per-component concentration distributions.

Type:

Dict[str, nirs4all.data.synthetic.domains.ConcentrationPrior]

wavelength_range

Typical measurement wavelength range in nm, as (min, max).

Type:

Tuple[float, float]

n_components_range

Range (min, max) of the number of components per sample.

Type:

Tuple[int, int]

noise_level

Typical noise level (‘low’, ‘medium’, ‘high’).

Type:

str

measurement_mode

Typical measurement mode (e.g. 'reflectance').

Type:

str

typical_sample_types

Examples of sample types in this domain.

Type:

List[str]

complexity

Overall complexity level for generation.

Type:

str

additional_params

Domain-specific additional parameters.

Type:

Dict[str, Any]

additional_params: Dict[str, Any]
category: DomainCategory
complexity: str = 'realistic'
component_weights: Dict[str, float] | None = None
concentration_priors: Dict[str, ConcentrationPrior]
description: str = ''
get_component_weights() → Dict[str, float]

Get normalized component weights for selection.
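A minimal sketch of what normalized selection weights could look like. It assumes that components without an explicit weight (or all components, when component_weights is None) get a weight of 1.0 and that the result is rescaled to sum to 1; normalized_weights is a hypothetical helper, not part of the module.

```python
from typing import Dict, List, Optional


def normalized_weights(
    typical_components: List[str],
    component_weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Hypothetical helper: weights over typical_components that sum to 1."""
    weights = component_weights or {}
    # Assumption: components without an explicit weight get weight 1.0
    raw = {name: float(weights.get(name, 1.0)) for name in typical_components}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("Component weights must sum to a positive value")
    return {name: w / total for name, w in raw.items()}


print(normalized_weights(["starch", "protein", "moisture"], {"starch": 2.0}))
# {'starch': 0.5, 'protein': 0.25, 'moisture': 0.25}
```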

measurement_mode: str = 'reflectance'
n_components_range: Tuple[int, int] = (3, 8)
name: str
noise_level: str = 'medium'
sample_components(rng: Generator, n_components: int | None = None) → List[str]

Select the components present in a sample according to the domain priors.

Parameters:
  • rng – Random number generator.

  • n_components – Number of components to select. If None, the number is drawn from n_components_range.

Returns:

List of component names.
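Weighted selection of distinct components maps naturally onto Generator.choice with replace=False and a probability vector. The sketch below is a hypothetical helper (pick_components) that assumes n_components_range is inclusive at both ends and caps the count at the number of available components; the real method may differ on both points.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


def pick_components(
    rng: np.random.Generator,
    weights: Dict[str, float],
    n_components_range: Tuple[int, int] = (3, 8),
    n_components: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: weighted selection of distinct component names."""
    names = list(weights)
    p = np.array([weights[n] for n in names], dtype=float)
    p /= p.sum()  # choice() requires probabilities that sum to 1
    if n_components is None:
        lo, hi = n_components_range
        # Assumption: the range is inclusive on both ends
        n_components = int(rng.integers(lo, hi, endpoint=True))
    # Cannot pick more distinct components than the domain defines
    n_components = min(n_components, len(names))
    chosen = rng.choice(len(names), size=n_components, replace=False, p=p)
    return [names[i] for i in chosen]


rng = np.random.default_rng(42)
grain = {"starch": 0.4, "protein": 0.2, "moisture": 0.2, "lipid": 0.1, "cellulose": 0.1}
print(pick_components(rng, grain, n_components_range=(3, 5)))
```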

sample_concentrations(rng: Generator, components: List[str], n_samples: int = 1) → ndarray

Sample concentrations for selected components.

Parameters:
  • rng – Random number generator.

  • components – List of component names.

  • n_samples – Number of samples.

Returns:

Concentration matrix of shape (n_samples, len(components)).
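The returned matrix has one column per selected component. A sketch of that layout, assuming each component's prior can be called as prior(rng, n) and that components without a dedicated prior fall back to Uniform(0, 1); concentration_matrix and the Prior alias are hypothetical names used only here.

```python
from typing import Callable, Dict, List

import numpy as np

# Hypothetical prior type: maps (rng, n) -> array of n concentrations
Prior = Callable[[np.random.Generator, int], np.ndarray]


def concentration_matrix(
    rng: np.random.Generator,
    components: List[str],
    priors: Dict[str, Prior],
    n_samples: int = 1,
) -> np.ndarray:
    """Hypothetical helper: one column per component, shape (n_samples, len(components))."""
    columns = []
    for name in components:
        # Assumption: components without a dedicated prior use Uniform(0, 1)
        draw = priors.get(name, lambda g, n: g.uniform(0.0, 1.0, n))
        columns.append(draw(rng, n_samples))
    return np.column_stack(columns)


priors = {"moisture": lambda g, n: np.clip(g.normal(0.12, 0.02, n), 0.0, 1.0)}
rng = np.random.default_rng(0)
C = concentration_matrix(rng, ["starch", "protein", "moisture"], priors, n_samples=10)
print(C.shape)  # (10, 3)
```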

typical_components: List[str]
typical_sample_types: List[str]
wavelength_range: Tuple[float, float] = (1000, 2500)
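The <factory> placeholders in the DomainConfig and ConcentrationPrior signatures are how dataclass fields declared with field(default_factory=...) appear in an introspected signature. The cut-down, hypothetical MiniDomainConfig below shows why that matters: each instance gets its own list or dict instead of sharing one mutable default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MiniDomainConfig:
    """Cut-down, hypothetical mirror of a few DomainConfig fields."""
    name: str
    typical_components: List[str] = field(default_factory=list)
    additional_params: Dict[str, float] = field(default_factory=dict)
    wavelength_range: Tuple[float, float] = (1000, 2500)


a = MiniDomainConfig("grain")
b = MiniDomainConfig("dairy")
a.typical_components.append("starch")
# default_factory builds a fresh list per instance, so b is unaffected
print(b.typical_components)  # []
```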

nirs4all.data.synthetic.domains.create_domain_aware_library(domain_name: str, n_samples: int = 100, random_state: int | None = None) → Tuple[List[str], ndarray]

Select components and sample their concentrations according to the domain priors.

This function samples components and their concentrations according to domain-specific distributions.

Parameters:
  • domain_name – Name of the domain (key in APPLICATION_DOMAINS).

  • n_samples – Number of samples to generate concentrations for.

  • random_state – Random seed for reproducibility.

Returns:

Tuple of (component_names, concentration_matrix), where the matrix has shape (n_samples, len(component_names)).

Example

>>> components, concentrations = create_domain_aware_library(
...     "food_dairy",
...     n_samples=50,
...     random_state=42
... )
>>> print(components)
['water', 'lactose', 'casein', 'lipid']
>>> print(concentrations.shape)
(50, 4)
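The random_state argument is what makes the example above reproducible. The sketch below is a hypothetical stand-in (domain_library_sketch) that assumes a single np.random.default_rng seeded once drives both the component selection and the concentration draws; the 2-to-all component count and the Uniform(0, 1) concentrations are simplifications, not the module's priors.

```python
from typing import List, Optional, Tuple

import numpy as np


def domain_library_sketch(
    typical_components: List[str],
    n_samples: int = 100,
    random_state: Optional[int] = None,
) -> Tuple[List[str], np.ndarray]:
    """Hypothetical stand-in for create_domain_aware_library."""
    # One generator, seeded once, drives every random step
    rng = np.random.default_rng(random_state)
    n = int(rng.integers(2, len(typical_components), endpoint=True))
    idx = rng.choice(len(typical_components), size=n, replace=False)
    components = [typical_components[i] for i in idx]
    concentrations = rng.uniform(0.0, 1.0, size=(n_samples, n))
    return components, concentrations


dairy = ["water", "lactose", "casein", "lipid", "moisture", "protein"]
a = domain_library_sketch(dairy, n_samples=50, random_state=42)
b = domain_library_sketch(dairy, n_samples=50, random_state=42)
print(a[0] == b[0], np.array_equal(a[1], b[1]))  # True True
print(a[1].shape[0])  # 50
```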

nirs4all.data.synthetic.domains.get_domain_components(domain_name: str) → List[str]

Get typical components for a domain.

Parameters:

domain_name – Name of the domain.

Returns:

List of component names.

Example

>>> get_domain_components("food_dairy")
['water', 'lactose', 'casein', 'lipid', 'moisture', 'protein']

nirs4all.data.synthetic.domains.get_domain_config(domain_name: str) → DomainConfig

Get configuration for a specific domain.

Parameters:

domain_name – Name of the domain (key in APPLICATION_DOMAINS).

Returns:

DomainConfig for the specified domain.

Raises:

ValueError – If domain is not found.

Example

>>> config = get_domain_config("agriculture_grain")
>>> print(config.name)
Grain and Cereals
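Since an unknown key raises ValueError, callers that accept user-supplied domain names can catch it. The registry and lookup_domain below are hypothetical and only mimic the documented contract (return the entry for a known key, raise ValueError otherwise); the error message format is an assumption.

```python
from typing import Dict

# Hypothetical registry mapping domain keys to display names
REGISTRY: Dict[str, str] = {"agriculture_grain": "Grain and Cereals"}


def lookup_domain(domain_name: str) -> str:
    """Hypothetical lookup that mirrors the documented ValueError contract."""
    try:
        return REGISTRY[domain_name]
    except KeyError:
        available = ", ".join(sorted(REGISTRY))
        # Re-raise as ValueError, hiding the internal KeyError
        raise ValueError(
            f"Unknown domain {domain_name!r}. Available: {available}"
        ) from None


try:
    lookup_domain("agriculture_grains")  # note the typo
except ValueError as exc:
    print(exc)  # Unknown domain 'agriculture_grains'. Available: agriculture_grain
```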

nirs4all.data.synthetic.domains.get_domains_for_component(component_name: str) → List[str]

Find domains that typically contain a specific component.

Parameters:

component_name – Name of the component.

Returns:

List of domain names containing this component.

Example

>>> get_domains_for_component("protein")
['agriculture_grain', 'food_meat', 'biomedical_tissue', ...]
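Conceptually this is a reverse lookup over each domain's typical components. A sketch over a hypothetical two-domain excerpt, built from the agriculture_grain and food_dairy component lists shown in the examples above; domains_containing is an illustrative name, not a module function.

```python
from typing import Dict, List

# Hypothetical excerpt: domain key -> typical components
DOMAIN_COMPONENTS: Dict[str, List[str]] = {
    "agriculture_grain": ["starch", "protein", "moisture", "lipid", "cellulose"],
    "food_dairy": ["water", "lactose", "casein", "lipid", "moisture", "protein"],
}


def domains_containing(component_name: str) -> List[str]:
    """Return domain keys whose typical components include component_name."""
    return [d for d, comps in DOMAIN_COMPONENTS.items() if component_name in comps]


print(domains_containing("protein"))  # ['agriculture_grain', 'food_dairy']
print(domains_containing("lactose"))  # ['food_dairy']
```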

nirs4all.data.synthetic.domains.list_domains(category: DomainCategory | None = None) → List[str]

List available domain names.

Parameters:

category – Optional DomainCategory. If given, only domains in that category are listed; if None, all domains are listed.

Returns:

List of domain names.

Example

>>> list_domains(DomainCategory.AGRICULTURE)
['agriculture_grain', 'agriculture_forage', ...]
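The category filter amounts to a comprehension over the domain registry. The registry excerpt, Category enum, and list_domain_keys below are hypothetical; the sketch returns keys in insertion order, which may not match the ordering used by list_domains.

```python
from enum import Enum
from typing import Dict, List, Optional


class Category(str, Enum):
    """Hypothetical mirror of two DomainCategory members."""
    AGRICULTURE = "agriculture"
    FOOD = "food"


# Hypothetical registry excerpt: domain key -> category
DOMAIN_CATEGORIES: Dict[str, Category] = {
    "agriculture_grain": Category.AGRICULTURE,
    "agriculture_forage": Category.AGRICULTURE,
    "food_dairy": Category.FOOD,
}


def list_domain_keys(category: Optional[Category] = None) -> List[str]:
    """Return all domain keys, or only those in the given category."""
    return [k for k, c in DOMAIN_CATEGORIES.items() if category is None or c == category]


print(list_domain_keys(Category.AGRICULTURE))  # ['agriculture_grain', 'agriculture_forage']
print(len(list_domain_keys()))  # 3
```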