nirs4all.data.synthetic.components module
Spectral components for synthetic NIRS spectra generation.
This module provides the core building blocks for defining NIR absorption bands and spectral components based on physical spectroscopy principles.
- Classes:
NIRBand: Represents a single NIR absorption band with Voigt profile.
SpectralComponent: A chemical compound or functional group with multiple bands.
ComponentLibrary: Collection of spectral components for generation.
- class nirs4all.data.synthetic.components.ComponentLibrary(random_state: int | None = None)[source]
Bases: object
Library of spectral components for synthetic NIRS generation.
Supports both predefined components (based on known NIR band assignments) and programmatically generated random components for research purposes.
- rng
NumPy random generator for reproducibility.
Example
>>> import numpy as np
>>> # Create from predefined components
>>> library = ComponentLibrary.from_predefined(
...     ["water", "protein", "lipid"],
...     random_state=42
... )
>>>
>>> # Or generate random components
>>> library = ComponentLibrary(random_state=42)
>>> library.generate_random_library(n_components=5)
>>>
>>> # Compute all component spectra
>>> wavelengths = np.arange(1000, 2500, 2)
>>> E = library.compute_all(wavelengths)  # shape: (n_components, n_wavelengths)
- __getitem__(name: str) SpectralComponent[source]
Get component by name.
- add_boundary_component(name: str, measurement_range: Tuple[float, float] = (1000, 2500), edge: str = 'both', n_bands: int = 1, amplitude_range: Tuple[float, float] = (0.3, 1.0), width_range: Tuple[float, float] = (50, 200), offset_range: Tuple[float, float] = (0.3, 1.5)) SpectralComponent[source]
Generate a component with bands outside the measurement range.
This creates “boundary” or “truncated” peaks: absorption bands whose centers lie outside the measured wavelength range, so only part of each peak is visible at the spectral edges. This is common in real NIR spectra, where absorption bands extend beyond the instrument’s wavelength range.
Common causes include:
  - Strong water absorption bands at ~2500 nm affecting the NIR edge
  - UV/visible absorption tails at the low-wavelength end
  - Mid-IR fundamental bands tailing into the NIR at the high end
- Parameters:
name – Component name.
measurement_range – (min, max) wavelength range of the “measurement” (nm). Bands will be placed outside this range.
edge – Which edge(s) to add boundary bands to:
  - “left”: Only below the min wavelength
  - “right”: Only above the max wavelength
  - “both”: Either edge (randomly selected)
n_bands – Number of boundary bands to generate.
amplitude_range – Range for peak amplitudes (0-1 scale).
width_range – Range for band widths (nm). Controls how much of the peak is visible in the measurement range.
offset_range – Range for how far outside the measurement range to place the band center, as a fraction of the band width. For example, 0.5 places the center 0.5 × width beyond the range edge.
- Returns:
The generated SpectralComponent with boundary bands.
Example
>>> library = ComponentLibrary(random_state=42)
>>> # Add water band tail at long wavelength edge
>>> boundary = library.add_boundary_component(
...     "water_tail",
...     measurement_range=(1000, 2400),
...     edge="right",
...     amplitude_range=(0.5, 1.0),
...     width_range=(100, 300)
... )
References
Burns & Ciurczak (2007). Handbook of Near-Infrared Analysis. Discussion of wavelength range selection and edge effects.
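The relationship between width_range, offset_range and the resulting band center can be sketched as follows. This is a minimal illustration of the placement rule described in the parameter list, not the library's implementation: the helper name boundary_center is hypothetical, and it uses Python's standard random module instead of the library's NumPy generator.

```python
import random


def boundary_center(measurement_range, edge, width, offset, rng):
    """Return a band center lying offset * width outside the chosen edge."""
    lo, hi = measurement_range
    if edge == "both":
        edge = rng.choice(["left", "right"])  # pick one edge at random
    if edge == "left":
        return lo - offset * width
    if edge == "right":
        return hi + offset * width
    raise ValueError(f"edge must be 'left', 'right' or 'both', got {edge!r}")


rng = random.Random(42)
width = rng.uniform(100, 300)    # drawn from width_range
offset = rng.uniform(0.3, 1.5)   # drawn from offset_range (fraction of width)
center = boundary_center((1000, 2400), "right", width, offset, rng)
print(center > 2400)  # the center lies beyond the right edge
```

With a small offset and a large width, most of the peak's flank remains inside the measured range; a large offset leaves only a faint tail.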
- add_boundary_components_from_known(measurement_range: Tuple[float, float] = (1000, 2500)) ComponentLibrary[source]
Add known boundary components that affect common NIR measurement ranges.
Based on literature, certain absorption bands commonly appear as truncated peaks at measurement boundaries:
Left edge (short wavelengths): Electronic transitions, UV tails
Right edge (long wavelengths): Strong water O-H bands, C-H fundamentals
- Parameters:
measurement_range – (min, max) wavelength range of measurement (nm).
- Returns:
Self for method chaining.
Example
>>> library = ComponentLibrary(random_state=42)
>>> library.add_boundary_components_from_known((1000, 2400))
- add_component(component: SpectralComponent) ComponentLibrary[source]
Add a spectral component to the library.
- Parameters:
component – SpectralComponent to add.
- Returns:
Self for method chaining.
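Several methods on this class return the library itself so that calls can be chained. The pattern can be sketched with a small stand-in class. MiniLibrary and MiniComponent are hypothetical names used only for illustration, and the band centers are placeholder values, not the library's predefined data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniComponent:
    """Hypothetical stand-in for SpectralComponent (name plus band centers)."""
    name: str
    band_centers: List[float] = field(default_factory=list)


class MiniLibrary:
    """Hypothetical stand-in showing chaining, name lookup and a components view."""

    def __init__(self) -> None:
        self._components: Dict[str, MiniComponent] = {}

    def add_component(self, component: MiniComponent) -> "MiniLibrary":
        self._components[component.name] = component
        return self  # returning self is what enables chaining

    def __getitem__(self, name: str) -> MiniComponent:
        return self._components[name]

    @property
    def components(self) -> Dict[str, MiniComponent]:
        return self._components


library = (
    MiniLibrary()
    .add_component(MiniComponent("water", [1450, 1940]))
    .add_component(MiniComponent("lipid", [1210, 1725]))  # placeholder centers
)
print(library["water"].band_centers)
```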
- add_random_component(name: str, n_bands: int = 3, wavelength_range: Tuple[float, float] = (1000, 2500), zones: List[Tuple[float, float]] | None = None) SpectralComponent[source]
Generate and add a random spectral component.
Creates a component with randomly placed absorption bands within the specified wavelength range or zones.
- Parameters:
name – Component name.
n_bands – Number of absorption bands to generate.
wavelength_range – Overall wavelength range for band placement.
zones – Optional list of (min, max) wavelength zones for band centers. If None, uses default NIR-relevant zones.
- Returns:
The generated SpectralComponent.
Example
>>> library = ComponentLibrary(random_state=42)
>>> component = library.add_random_component(
...     "random_compound",
...     n_bands=4,
...     wavelength_range=(1000, 2500)
... )
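How band centers might be restricted to zones can be sketched as below. This is an assumption-laden illustration, not the library's algorithm: the function random_band_params is hypothetical, the sigma and amplitude ranges are invented for the example, and because the default NIR-relevant zones are not listed on this page, the sketch falls back to the whole wavelength range when zones is None.

```python
import random


def random_band_params(n_bands, wavelength_range, zones=None, seed=None):
    """Draw band parameters with centers confined to the given zones."""
    rng = random.Random(seed)
    lo, hi = wavelength_range
    if zones is None:
        zones = [wavelength_range]  # stand-in for the undocumented default zones
    # Clip each zone to the overall range and drop zones that fall outside it.
    usable = [(max(a, lo), min(b, hi)) for a, b in zones if max(a, lo) < min(b, hi)]
    if not usable:
        raise ValueError("no zone overlaps the wavelength range")
    bands = []
    for _ in range(n_bands):
        zone_lo, zone_hi = rng.choice(usable)
        bands.append({
            "center": rng.uniform(zone_lo, zone_hi),
            "sigma": rng.uniform(10, 40),        # hypothetical width range (nm)
            "amplitude": rng.uniform(0.2, 1.0),  # hypothetical amplitude range
        })
    return bands


bands = random_band_params(4, (1000, 2500), zones=[(1400, 1500), (1900, 2000)], seed=42)
for band in bands:
    print(round(band["center"], 1))
```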
- property components: Dict[str, SpectralComponent]
Get all components in the library.
- compute_all(wavelengths: ndarray) ndarray[source]
Compute spectra for all components at given wavelengths.
- Parameters:
wavelengths – Array of wavelengths in nm.
- Returns:
Array of shape (n_components, n_wavelengths) containing the spectrum of each component.
Example
>>> import numpy as np
>>> library = ComponentLibrary.from_predefined(["water", "protein"])
>>> wavelengths = np.arange(1000, 2500, 2)
>>> E = library.compute_all(wavelengths)
>>> print(E.shape)
(2, 750)
- classmethod from_predefined(component_names: List[str] | None = None, random_state: int | None = None) ComponentLibrary[source]
Create a library from predefined spectral components.
- Parameters:
component_names – List of component names to include. If None, includes all predefined components.
random_state – Random seed for reproducibility.
- Returns:
ComponentLibrary instance populated with predefined components.
- Raises:
ValueError – If an unknown component name is specified.
Example
>>> library = ComponentLibrary.from_predefined(
...     ["water", "protein", "lipid"]
... )
- generate_random_library(n_components: int = 5, n_bands_range: Tuple[int, int] = (2, 6)) ComponentLibrary[source]
Generate a library of random spectral components.
- Parameters:
n_components – Number of components to generate.
n_bands_range – Range (min, max) for number of bands per component.
- Returns:
Self for method chaining.
Example
>>> library = ComponentLibrary(random_state=42)
>>> library.generate_random_library(n_components=5, n_bands_range=(2, 5))
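The role of the seed and of n_bands_range can be illustrated with a short sketch. It only plans how many bands each random component would receive. The function name, the "random_i" naming scheme, and the inclusive upper bound are assumptions, and the standard random module stands in for the library's NumPy generator.

```python
import random


def random_library_plan(n_components=5, n_bands_range=(2, 6), seed=None):
    """Map hypothetical component names to a randomly drawn band count."""
    rng = random.Random(seed)
    lo, hi = n_bands_range
    # randint includes both endpoints; the real library may treat the bound differently.
    return {f"random_{i}": rng.randint(lo, hi) for i in range(n_components)}


plan_a = random_library_plan(n_components=5, n_bands_range=(2, 5), seed=42)
plan_b = random_library_plan(n_components=5, n_bands_range=(2, 5), seed=42)
print(plan_a == plan_b)  # identical seeds give identical plans
```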
- class nirs4all.data.synthetic.components.NIRBand(center: float, sigma: float, gamma: float = 0.0, amplitude: float = 1.0, name: str = '')[source]
Bases: object
Represents a single NIR absorption band.
This class models an absorption band using a Voigt profile, which is the convolution of Gaussian (thermal broadening) and Lorentzian (pressure broadening) line shapes.
Example
>>> import numpy as np
>>> band = NIRBand(center=1450, sigma=25, gamma=3, amplitude=0.8)
>>> wavelengths = np.arange(1400, 1500, 1)
>>> spectrum = band.compute(wavelengths)
- compute(wavelengths: ndarray) ndarray[source]
Compute the band profile at given wavelengths using Voigt profile.
- Parameters:
wavelengths – Array of wavelengths in nm at which to evaluate the band.
- Returns:
Array of absorbance values at each wavelength.
Note
When gamma=0, a pure Gaussian profile is used for efficiency. Otherwise, the full Voigt profile (Gaussian ⊗ Lorentzian) is computed.
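The Gaussian shortcut for gamma=0 and the general Gaussian-plus-Lorentzian case can be sketched in pure Python. Note the substitution: this sketch uses the pseudo-Voigt approximation (a weighted sum of a Gaussian and a Lorentzian sharing one effective width) rather than the exact convolution the library computes. It also assumes that sigma is the Gaussian standard deviation, gamma the Lorentzian half-width at half maximum, and amplitude the peak height; none of these conventions is confirmed by this page.

```python
import math


def voigt_like_band(wavelength, center, sigma, gamma=0.0, amplitude=1.0):
    """Peak-normalized pseudo-Voigt band evaluated at one wavelength (nm)."""
    x = wavelength - center
    if gamma == 0.0:
        # Pure Gaussian shortcut, mirroring the note above.
        return amplitude * math.exp(-x * x / (2.0 * sigma * sigma))
    f_g = 2.0 * sigma * math.sqrt(2.0 * math.log(2.0))  # Gaussian FWHM
    f_l = 2.0 * gamma                                    # Lorentzian FWHM
    # Effective FWHM and mixing weight of the pseudo-Voigt approximation.
    f = (f_g**5 + 2.69269 * f_g**4 * f_l + 2.42843 * f_g**3 * f_l**2
         + 4.47163 * f_g**2 * f_l**3 + 0.07842 * f_g * f_l**4 + f_l**5) ** 0.2
    r = f_l / f
    eta = 1.36603 * r - 0.47719 * r**2 + 0.11116 * r**3
    half = f / 2.0
    gauss = math.exp(-math.log(2.0) * (x / half) ** 2)  # unit-height Gaussian
    lorentz = 1.0 / (1.0 + (x / half) ** 2)             # unit-height Lorentzian
    # Relative peak heights of the area-normalized profiles (common 1/f dropped).
    w_l = eta * 2.0 / math.pi
    w_g = (1.0 - eta) * 2.0 * math.sqrt(math.log(2.0) / math.pi)
    return amplitude * (w_l * lorentz + w_g * gauss) / (w_l + w_g)


spectrum = [voigt_like_band(wl, 1450, 25, gamma=3, amplitude=0.8) for wl in range(1400, 1500)]
print(max(spectrum))
```

The Lorentzian part mainly affects the tails: far from the center, a band with gamma > 0 decays much more slowly than the pure Gaussian with the same sigma.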
- class nirs4all.data.synthetic.components.SpectralComponent(name: str, bands: List[NIRBand] = <factory>, correlation_group: int | None = None, category: str = '', subcategory: str = '', synonyms: List[str] = <factory>, formula: str = '', cas_number: str = '', references: List[str] = <factory>, tags: List[str] = <factory>)[source]
Bases: object
A spectral component representing a chemical compound or functional group.
Each component consists of multiple absorption bands that together define the characteristic NIR signature of the compound.
- bands
List of NIRBand objects defining the spectral signature.
- Type:
List[NIRBand]
- correlation_group
Optional group ID for components that should have correlated concentrations (e.g., protein and nitrogen compounds).
- Type:
int | None
Example
>>> import numpy as np
>>> water = SpectralComponent(
...     name="water",
...     bands=[
...         NIRBand(center=1450, sigma=25, gamma=3, amplitude=0.8),
...         NIRBand(center=1940, sigma=30, gamma=4, amplitude=1.0),
...     ],
...     correlation_group=1,
...     category="water_related",
...     formula="H2O",
... )
>>> wavelengths = np.arange(1000, 2500, 2)
>>> spectrum = water.compute(wavelengths)
- compute(wavelengths: ndarray) ndarray[source]
Compute the full component spectrum by summing all bands.
- Parameters:
wavelengths – Array of wavelengths in nm at which to evaluate.
- Returns:
Array of absorbance values representing the combined spectrum.
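Summing bands into a component spectrum can be sketched as follows. For brevity the sketch uses pure Gaussian bands (the gamma=0 case) and plain Python lists; the helper names and the (center, sigma, amplitude) tuple layout are assumptions made for the example.

```python
import math


def gaussian_band(wavelength, center, sigma, amplitude):
    """Gaussian band evaluated at a single wavelength (nm)."""
    return amplitude * math.exp(-((wavelength - center) ** 2) / (2.0 * sigma**2))


def component_spectrum(wavelengths, bands):
    """Sum every band's contribution at each wavelength."""
    return [sum(gaussian_band(wl, *band) for band in bands) for wl in wavelengths]


# (center, sigma, amplitude) for the two water bands used in the example above
water_bands = [(1450, 25, 0.8), (1940, 30, 1.0)]
wavelengths = list(range(1000, 2500, 2))  # 750 points, 2 nm apart
spectrum = component_spectrum(wavelengths, water_bands)
print(len(spectrum))
```

Because the two bands are far apart relative to their widths, the summed spectrum near each center is essentially that band alone.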
- has_bands_in_range(wavelength_range: Tuple[float, float]) bool[source]
Check if component has any bands with centers in the given wavelength range.
- Parameters:
wavelength_range – (min, max) wavelength in nm.
- Returns:
True if at least one band center is within the range.
- info() str[source]
Return formatted information about the component.
- Returns:
Human-readable string with component details.
- is_normalized(tolerance: float = 0.01) bool[source]
Check if the component’s band amplitudes are max-normalized (max amplitude = 1.0).
- Parameters:
tolerance – Acceptable deviation from 1.0 for max amplitude.
- Returns:
True if max amplitude is within tolerance of 1.0.
- normalized(method: str = 'max') SpectralComponent[source]
Return a new SpectralComponent with normalized band amplitudes.
- Parameters:
method – Normalization method:
  - “max”: Scale so max amplitude = 1.0 (default)
  - “sum”: Scale so sum of amplitudes = 1.0
- Returns:
New SpectralComponent with normalized amplitudes.
Example
>>> component = SpectralComponent(name="test", bands=[
...     NIRBand(center=1450, sigma=25, amplitude=0.8),
...     NIRBand(center=1940, sigma=30, amplitude=2.0),
... ])
>>> normalized = component.normalized()
>>> print(max(b.amplitude for b in normalized.bands))  # 1.0
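The two normalization methods and the tolerance check of is_normalized reduce to simple arithmetic on the amplitudes. The sketch below works on a bare list of amplitudes; the function names are hypothetical and the error handling is an assumption.

```python
def normalize_amplitudes(amplitudes, method="max"):
    """Return rescaled amplitudes; 'max' sets the largest to 1, 'sum' makes them total 1."""
    if method == "max":
        scale = max(amplitudes)
    elif method == "sum":
        scale = sum(amplitudes)
    else:
        raise ValueError(f"unknown method {method!r}; expected 'max' or 'sum'")
    if scale == 0:
        raise ValueError("cannot normalize amplitudes that are all zero")
    return [a / scale for a in amplitudes]


def is_max_normalized(amplitudes, tolerance=0.01):
    """True when the largest amplitude is within tolerance of 1.0."""
    return abs(max(amplitudes) - 1.0) <= tolerance


amps = [0.8, 2.0]
print(normalize_amplitudes(amps))         # [0.4, 1.0]
print(normalize_amplitudes(amps, "sum"))  # values add up to 1.0
print(is_max_normalized(amps))            # False: max amplitude is 2.0
```

Both methods preserve the ratio between bands, so the relative shape of the component is unchanged.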
- nirs4all.data.synthetic.components.available_components() List[str][source]
Return list of all available predefined component names.
- Returns:
Sorted list of component names.
Example
>>> names = available_components()
>>> print(f"Available: {len(names)} components")
>>> print(names[:5])
- nirs4all.data.synthetic.components.component_info(name: str) str[source]
Return formatted information about a component.
- Parameters:
name – Component name.
- Returns:
Human-readable string with component details.
Example
>>> print(component_info("water"))
- nirs4all.data.synthetic.components.get_component(name: str) SpectralComponent[source]
Get a single predefined component by name or synonym.
- Parameters:
name – Component name (e.g., “water”, “protein”, “lipid”) or synonym (e.g., “amylose” for “starch”).
- Returns:
SpectralComponent object.
- Raises:
ValueError – If component name is not found.
Example
>>> water = get_component("water")
>>> print(water.category)
>>> print(len(water.bands))
>>>
>>> # Using synonyms
>>> starch = get_component("amylose")  # Returns starch component
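Lookup by name or synonym can be sketched with a small registry. The registry contents, the resolve_name helper, and the case-insensitive matching are assumptions for illustration; only the "amylose" to "starch" mapping and the ValueError on unknown names come from the description above.

```python
# Hypothetical miniature registry; the real predefined data is much larger.
REGISTRY = {
    "water": {"synonyms": []},
    "starch": {"synonyms": ["amylose"]},
}


def resolve_name(name):
    """Return the canonical component name for a name or synonym."""
    key = name.strip().lower()
    if key in REGISTRY:
        return key
    for canonical, entry in REGISTRY.items():
        if key in (s.lower() for s in entry["synonyms"]):
            return canonical
    raise ValueError(f"Unknown component {name!r}; available: {sorted(REGISTRY)}")


print(resolve_name("amylose"))  # starch
```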
- nirs4all.data.synthetic.components.list_categories() Dict[str, List[str]][source]
Return dictionary of categories to component names.
- Returns:
Dictionary mapping category names to lists of component names.
Example
>>> categories = list_categories()
>>> for cat, components in categories.items():
...     print(f"{cat}: {len(components)} components")
- nirs4all.data.synthetic.components.normalize_component_amplitudes(component: SpectralComponent, method: str = 'max') SpectralComponent[source]
Normalize band amplitudes for a component.
This is a convenience wrapper around SpectralComponent.normalized().
- Parameters:
component – SpectralComponent to normalize.
method – Normalization method (“max” or “sum”).
- Returns:
New SpectralComponent with normalized amplitudes.
Example
>>> comp = get_component("water")
>>> normalized = normalize_component_amplitudes(comp)
- nirs4all.data.synthetic.components.search_components(query: str | None = None, category: str | None = None, subcategory: str | None = None, tags: List[str] | None = None, wavelength_range: Tuple[float, float] | None = None) List[str][source]
Search components by various criteria.
- Parameters:
query – Fuzzy match on name or synonyms.
category – Filter by category (e.g., “proteins”, “carbohydrates”).
subcategory – Filter by subcategory (e.g., “monosaccharides”).
tags – Filter by tags (any match).
wavelength_range – Filter by components with bands in range (min, max).
- Returns:
List of matching component names.
Example
>>> # Find all protein-related components
>>> proteins = search_components(category="proteins")
>>>
>>> # Find components with bands in visible-NIR region
>>> vis_nir = search_components(wavelength_range=(400, 1000))
>>>
>>> # Find components tagged for pharmaceutical use
>>> pharma = search_components(tags=["pharma"])
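How the filters might combine can be sketched over a toy registry. Several choices here are assumptions rather than documented behavior: filters are combined with AND, the fuzzy query is approximated by a case-insensitive substring match, and the registry entries (categories, tags, band centers) are invented placeholders.

```python
# Hypothetical toy registry; values are placeholders, not the library's data.
REGISTRY = {
    "water": {"category": "water_related", "synonyms": [], "tags": [],
              "band_centers": [1450, 1940]},
    "starch": {"category": "carbohydrates", "synonyms": ["amylose"], "tags": ["food"],
               "band_centers": [1460, 2100]},
    "protein": {"category": "proteins", "synonyms": [], "tags": ["food", "pharma"],
                "band_centers": [1510, 2050, 2180]},
}


def search(registry, query=None, category=None, tags=None, wavelength_range=None):
    """Return sorted names of entries that pass every supplied filter."""
    matches = []
    for name, entry in registry.items():
        if query is not None:
            candidates = [name, *entry["synonyms"]]
            if not any(query.lower() in c.lower() for c in candidates):
                continue
        if category is not None and entry["category"] != category:
            continue
        if tags is not None and not set(tags) & set(entry["tags"]):
            continue  # "any match": at least one requested tag must be present
        if wavelength_range is not None:
            lo, hi = wavelength_range
            if not any(lo <= c <= hi for c in entry["band_centers"]):
                continue
        matches.append(name)
    return sorted(matches)


print(search(REGISTRY, category="proteins"))  # ['protein']
print(search(REGISTRY, query="amyl"))         # ['starch']
```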
- nirs4all.data.synthetic.components.validate_component_coverage(wavelength_range: Tuple[float, float] = (350, 2500)) Dict[str, List[str]][source]
Check which components have bands in the given wavelength range.
- Parameters:
wavelength_range – (min, max) wavelength in nm.
- Returns:
Dictionary with ‘covered’ and ‘not_covered’ component lists.
Example
>>> coverage = validate_component_coverage((1000, 2500))
>>> print(f"Covered: {len(coverage['covered'])}")
>>> print(f"Not covered: {coverage['not_covered']}")
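The coverage check can be sketched as a partition of components by whether any band center falls inside the range. The function name, the inclusive bounds, and the sample data (including the "pigment" and "water_tail" entries) are hypothetical.

```python
def coverage_report(band_centers_by_name, wavelength_range=(350, 2500)):
    """Split component names into 'covered' and 'not_covered' lists."""
    lo, hi = wavelength_range
    report = {"covered": [], "not_covered": []}
    for name in sorted(band_centers_by_name):
        centers = band_centers_by_name[name]
        # A component counts as covered if at least one center lies in the range.
        key = "covered" if any(lo <= c <= hi for c in centers) else "not_covered"
        report[key].append(name)
    return report


sample = {
    "water": [1450, 1940],
    "pigment": [450, 670],   # placeholder visible-region bands
    "water_tail": [2650],    # placeholder boundary band beyond 2500 nm
}
report = coverage_report(sample, (1000, 2500))
print(report["covered"])      # ['water']
print(report["not_covered"])  # ['pigment', 'water_tail']
```

Under a center-based check like this one, a band centered just outside the range is reported as not covered even if its tail extends into the range.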
- nirs4all.data.synthetic.components.validate_predefined_components() List[str][source]
Validate all predefined components.
- Returns:
List of validation warnings/errors (empty if all valid).
Example
>>> issues = validate_predefined_components()
>>> if issues:
...     for issue in issues:
...         print(issue)
... else:
...     print("All components valid!")