nirs4all.synthesis.domains module
Application domain configurations for synthetic NIRS data generation.
This module provides domain-specific priors and configurations for generating realistic synthetic NIRS data tailored to specific application areas such as agriculture, pharmaceutical, food processing, petrochemical, and others.
Each domain configuration includes:
- Typical spectral components (chemical compounds)
- Concentration distributions specific to the domain
- Wavelength ranges commonly used
- Typical number of components in samples
- Domain-specific noise and artifact characteristics
- Key Features:
15+ predefined application domains
Domain-aware component selection
Realistic concentration priors
Easy integration with generators
Example
>>> from nirs4all.synthesis.domains import (
...     get_domain_config,
...     APPLICATION_DOMAINS,
...     DomainConfig
... )
>>>
>>> # Get configuration for agricultural samples
>>> config = get_domain_config("agriculture_grain")
>>> print(config.typical_components)
['starch', 'protein', 'moisture', 'lipid', 'cellulose']
References
Burns, D. A., & Ciurczak, E. W. (2007). Handbook of Near-Infrared Analysis (3rd ed.). CRC Press.
Williams, P. C., & Norris, K. H. (2001). Near-Infrared Technology in the Agricultural and Food Industries (2nd ed.). AACC International.
Reich, G. (2005). Near-Infrared Spectroscopy and Imaging: Basic Principles and Pharmaceutical Applications. Advanced Drug Delivery Reviews.
- class nirs4all.synthesis.domains.ConcentrationPrior(distribution: str = 'uniform', params: Dict[str, float] = <factory>, min_value: float = 0.0, max_value: float = 1.0)
Bases: object
Prior distribution for component concentrations.
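The fields above suggest a prior that names a distribution, carries its parameters, and bounds the result. The following is a minimal sketch of that idea, not the library's implementation: `PriorSketch` is a hypothetical stand-in, the `"mean"` and `"std"` parameter keys are assumptions, and the standard-library `random.Random` replaces the NumPy `Generator` the real module accepts.

```python
import random
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PriorSketch:
    """Hypothetical stand-in mirroring the ConcentrationPrior fields."""

    distribution: str = "uniform"
    params: Dict[str, float] = field(default_factory=dict)
    min_value: float = 0.0
    max_value: float = 1.0

    def sample(self, rng: random.Random) -> float:
        if self.distribution == "uniform":
            # Assumed behaviour: draw between the declared bounds.
            value = rng.uniform(self.min_value, self.max_value)
        elif self.distribution == "normal":
            # "mean" and "std" are assumed parameter keys.
            value = rng.gauss(self.params["mean"], self.params["std"])
        else:
            raise ValueError(f"unsupported distribution: {self.distribution}")
        # Clip so the draw always respects min_value and max_value.
        return min(max(value, self.min_value), self.max_value)


rng = random.Random(0)
prior = PriorSketch("normal", {"mean": 0.12, "std": 0.03},
                    min_value=0.05, max_value=0.20)
draws = [prior.sample(rng) for _ in range(1000)]
```

Clipping is only one way to enforce the bounds; rejection sampling would avoid piling probability mass at the limits, at the cost of extra draws.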
- class nirs4all.synthesis.domains.DomainCategory(value)
Top-level domain categories.
- AGRICULTURE = 'agriculture'
- BEVERAGE = 'beverage'
- BIOMEDICAL = 'biomedical'
- ENVIRONMENTAL = 'environmental'
- FOOD = 'food'
- PETROCHEMICAL = 'petrochemical'
- PHARMACEUTICAL = 'pharmaceutical'
- POLYMER = 'polymer'
- TEXTILE = 'textile'
- class nirs4all.synthesis.domains.DomainConfig(name: str, category: DomainCategory, description: str = '', typical_components: List[str] = <factory>, component_weights: Dict[str, float] | None = None, concentration_priors: Dict[str, ConcentrationPrior] = <factory>, wavelength_range: Tuple[float, float] = (1000, 2500), n_components_range: Tuple[int, int] = (3, 8), noise_level: str = 'medium', measurement_mode: str = 'reflectance', typical_sample_types: List[str] = <factory>, complexity: str = 'realistic', additional_params: Dict[str, Any] = <factory>)
Bases: object
Configuration for a specific application domain.
Encapsulates all domain-specific parameters needed for generating realistic synthetic NIRS data.
- category
Domain category (agriculture, pharmaceutical, etc.).
- component_weights
Relative importance of each component (for selection).
- concentration_priors
Per-component concentration distributions.
- category: DomainCategory
- concentration_priors: Dict[str, ConcentrationPrior]
- sample_components(rng: Generator, n_components: int | None = None) -> List[str]
Sample components for a sample based on domain priors.
- Parameters:
rng – Random number generator.
n_components – Number of components. If None, samples from range.
- Returns:
List of component names.
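One plausible reading of "based on domain priors" is a weighted draw without replacement from `typical_components`, using `component_weights` when present. The sketch below illustrates that reading only. `sample_components_sketch` is a hypothetical function, the inclusive treatment of `n_components_range` is an assumption, and `random.Random` stands in for the NumPy `Generator`.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_components_sketch(
    rng: random.Random,
    typical_components: List[str],
    component_weights: Optional[Dict[str, float]] = None,
    n_components_range: Tuple[int, int] = (3, 8),
    n_components: Optional[int] = None,
) -> List[str]:
    if n_components is None:
        # Assumed: the count is drawn from the inclusive range.
        n_components = rng.randint(*n_components_range)
    # Never request more components than the domain lists.
    n_components = min(n_components, len(typical_components))

    pool = list(typical_components)
    # Components without an explicit weight default to 1.0 (assumption).
    weights = [(component_weights or {}).get(name, 1.0) for name in pool]
    chosen: List[str] = []
    for _ in range(n_components):
        # Weighted draw without replacement: pick one index, then remove it.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return chosen


grain = ["starch", "protein", "moisture", "lipid", "cellulose"]
picked = sample_components_sketch(
    random.Random(42), grain, {"starch": 3.0}, n_components=3
)
```

With a NumPy `Generator`, the same draw could be written as a single `rng.choice(..., replace=False, p=...)` call on normalized weights.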
- sample_concentrations(rng: Generator, components: List[str], n_samples: int = 1) -> ndarray
Sample concentrations for selected components.
- Parameters:
rng – Random number generator.
components – List of component names.
n_samples – Number of samples.
- Returns:
Concentration matrix (n_samples, n_components).
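To make the `(n_samples, n_components)` layout concrete, here is a small sketch that draws one value per component per sample. It is illustrative only: `sample_concentrations_sketch` is hypothetical, priors are reduced to assumed `(low, high)` uniform bounds, the numeric ranges are made up for the example, and a nested list stands in for the `ndarray` the real method returns.

```python
import random
from typing import Dict, List, Tuple


def sample_concentrations_sketch(
    rng: random.Random,
    components: List[str],
    priors: Dict[str, Tuple[float, float]],
    n_samples: int = 1,
) -> List[List[float]]:
    # Components without a prior fall back to uniform on [0, 1] (assumption).
    default = (0.0, 1.0)
    # One row per sample, one column per component, in the order given.
    return [
        [rng.uniform(*priors.get(name, default)) for name in components]
        for _ in range(n_samples)
    ]


# Illustrative bounds only; these are not values taken from the library.
priors = {"water": (0.80, 0.90), "lactose": (0.04, 0.06)}
matrix = sample_concentrations_sketch(
    random.Random(0), ["water", "lactose", "casein"], priors, n_samples=4
)
```

Column order follows the `components` argument, so column `j` of the result always corresponds to `components[j]`.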
- nirs4all.synthesis.domains.create_domain_aware_library(domain_name: str, n_samples: int = 100, random_state: int | None = None) -> Tuple[List[str], ndarray]
Create component selection and concentrations based on domain priors.
This function samples components and their concentrations according to domain-specific distributions.
- Parameters:
domain_name – Name of the domain.
n_samples – Number of samples to generate concentrations for.
random_state – Random seed for reproducibility.
- Returns:
Tuple of (component_names, concentration_matrix).
Example
>>> components, concentrations = create_domain_aware_library(
...     "food_dairy",
...     n_samples=50,
...     random_state=42
... )
>>> print(components)
['water', 'lactose', 'casein', 'lipid']
>>> print(concentrations.shape)
(50, 4)
- nirs4all.synthesis.domains.get_domain_components(domain_name: str) -> List[str]
Get typical components for a domain.
- Parameters:
domain_name – Name of the domain.
- Returns:
List of component names.
Example
>>> get_domain_components("food_dairy")
['water', 'lactose', 'casein', 'lipid', 'moisture', 'protein']
- nirs4all.synthesis.domains.get_domain_config(domain_name: str) -> DomainConfig
Get configuration for a specific domain.
- Parameters:
domain_name – Name of the domain (key in APPLICATION_DOMAINS).
- Returns:
DomainConfig for the specified domain.
- Raises:
ValueError – If domain is not found.
Example
>>> config = get_domain_config("agriculture_grain")
>>> print(config.name)
Grain and Cereals
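Because an unknown name raises `ValueError`, callers may want to catch it and show the valid keys. The sketch below imitates that lookup pattern with a plain dictionary. `REGISTRY_SKETCH` and `get_config_sketch` are hypothetical stand-ins for `APPLICATION_DOMAINS` and `get_domain_config`, and the error message wording is an assumption.

```python
from typing import Dict, List

# Hypothetical stand-in for APPLICATION_DOMAINS, using values shown in the examples.
REGISTRY_SKETCH: Dict[str, Dict[str, object]] = {
    "agriculture_grain": {
        "name": "Grain and Cereals",
        "typical_components": ["starch", "protein", "moisture", "lipid", "cellulose"],
    },
}


def get_config_sketch(domain_name: str) -> Dict[str, object]:
    try:
        return REGISTRY_SKETCH[domain_name]
    except KeyError:
        # Translate the KeyError into the documented ValueError and list valid keys.
        available: List[str] = sorted(REGISTRY_SKETCH)
        raise ValueError(
            f"Unknown domain {domain_name!r}. Available: {', '.join(available)}"
        ) from None


config = get_config_sketch("agriculture_grain")

try:
    get_config_sketch("agriculture_grains")  # deliberate typo
except ValueError as exc:
    message = str(exc)
```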
- nirs4all.synthesis.domains.get_domains_for_component(component_name: str) -> List[str]
Find domains that typically contain a specific component.
- Parameters:
component_name – Name of the component.
- Returns:
List of domain names containing this component.
Example
>>> get_domains_for_component("protein")
['agriculture_grain', 'food_meat', 'biomedical_tissue', ...]
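A reverse lookup like this can be sketched as a scan over each domain's component list. The mapping below is a hypothetical two-entry stand-in built from component lists shown elsewhere on this page. It is not the full registry, and `domains_for_component_sketch` is not the library function.

```python
from typing import Dict, List

# Hypothetical subset: component lists as they appear in this page's examples.
DOMAIN_COMPONENTS_SKETCH: Dict[str, List[str]] = {
    "agriculture_grain": ["starch", "protein", "moisture", "lipid", "cellulose"],
    "food_dairy": ["water", "lactose", "casein", "lipid", "moisture", "protein"],
}


def domains_for_component_sketch(
    component_name: str, registry: Dict[str, List[str]]
) -> List[str]:
    # Keep every domain whose component list mentions the requested name.
    return [domain for domain, comps in registry.items() if component_name in comps]


with_protein = domains_for_component_sketch("protein", DOMAIN_COMPONENTS_SKETCH)
with_starch = domains_for_component_sketch("starch", DOMAIN_COMPONENTS_SKETCH)
with_unknown = domains_for_component_sketch("ethanol", DOMAIN_COMPONENTS_SKETCH)
```

In this sketch an unmatched component yields an empty list rather than an error; whether the real function behaves the same way is not stated in the documentation above.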
- nirs4all.synthesis.domains.list_domains(category: DomainCategory | None = None) -> List[str]
List available domain names.
- Parameters:
category – Optional category filter.
- Returns:
List of domain names.
Example
>>> list_domains(DomainCategory.AGRICULTURE)
['agriculture_grain', 'agriculture_forage', ...]
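The optional `category` argument behaves like a filter over the registry. The sketch below shows one way such a filter could work, including converting a raw string to an enum member by value. `CategorySketch`, `CATEGORY_OF`, and `list_domains_sketch` are hypothetical, and the category assigned to each domain is inferred from its name prefix rather than taken from the library.

```python
from enum import Enum
from typing import Dict, List, Optional


class CategorySketch(Enum):
    """Two-member stand-in for DomainCategory."""

    AGRICULTURE = "agriculture"
    FOOD = "food"


# Assumed mapping: categories inferred from the domain-name prefixes.
CATEGORY_OF: Dict[str, CategorySketch] = {
    "agriculture_grain": CategorySketch.AGRICULTURE,
    "agriculture_forage": CategorySketch.AGRICULTURE,
    "food_dairy": CategorySketch.FOOD,
}


def list_domains_sketch(category: Optional[CategorySketch] = None) -> List[str]:
    # With no category, return every domain; otherwise keep only matching ones.
    return [
        name
        for name, cat in CATEGORY_OF.items()
        if category is None or cat is category
    ]


agri = list_domains_sketch(CategorySketch.AGRICULTURE)
everything = list_domains_sketch()
# Enum members can be recovered from their string values.
food = list_domains_sketch(CategorySketch("food"))
```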