nirs4all.data.synthetic.components module

Spectral components for synthetic NIRS spectra generation.

This module provides the core building blocks for defining NIR absorption bands and spectral components based on physical spectroscopy principles.

Classes:

  • NIRBand: Represents a single NIR absorption band with a Voigt profile.

  • SpectralComponent: A chemical compound or functional group with multiple bands.

  • ComponentLibrary: Collection of spectral components for generation.

class nirs4all.data.synthetic.components.ComponentLibrary(random_state: int | None = None)

Bases: object

Library of spectral components for synthetic NIRS generation.

Supports both predefined components (based on known NIR band assignments) and programmatically generated random components for research purposes.

rng

NumPy random generator for reproducibility.

Example

>>> import numpy as np
>>> from nirs4all.data.synthetic.components import ComponentLibrary
>>>
>>> # Create from predefined components
>>> library = ComponentLibrary.from_predefined(
...     ["water", "protein", "lipid"],
...     random_state=42
... )
>>>
>>> # Or generate random components
>>> library = ComponentLibrary(random_state=42)
>>> library.generate_random_library(n_components=5)
>>>
>>> # Compute all component spectra
>>> wavelengths = np.arange(1000, 2500, 2)
>>> E = library.compute_all(wavelengths)  # shape: (n_components, n_wavelengths)

__contains__(name: str) → bool

Check if component exists by name.

__getitem__(name: str) → SpectralComponent

Get component by name.

__iter__()

Iterate over components.

__len__() → int

Return number of components.

add_boundary_component(name: str, measurement_range: Tuple[float, float] = (1000, 2500), edge: str = 'both', n_bands: int = 1, amplitude_range: Tuple[float, float] = (0.3, 1.0), width_range: Tuple[float, float] = (50, 200), offset_range: Tuple[float, float] = (0.3, 1.5)) → SpectralComponent

Generate a component with bands outside the measurement range.

This creates “boundary” or “truncated” peaks: absorption bands whose centers lie outside the measured wavelength range, so that only part of each peak is visible at the spectral edges. This is common in real NIR spectra, where absorption bands extend beyond the instrument’s wavelength range.

Common causes include:

  • Strong water absorption bands near 2500 nm affecting the long-wavelength NIR edge

  • UV/visible absorption tails at the low-wavelength end

  • Mid-IR fundamental bands tailing into the NIR at the high-wavelength end

Parameters:
  • name – Component name.

  • measurement_range – (min, max) wavelength range of the “measurement” (nm). Bands will be placed outside this range.

  • edge – Which edge(s) receive boundary bands:
      - “left”: Only below the minimum wavelength
      - “right”: Only above the maximum wavelength
      - “both”: Either edge (randomly selected)

  • n_bands – Number of boundary bands to generate.

  • amplitude_range – Range for peak amplitudes (0-1 scale).

  • width_range – Range for band widths (nm). Controls how much of the peak is visible in the measurement range.

  • offset_range – Range for how far outside the measurement range to place the band center, expressed as a fraction of the band width. For example, 0.5 places the center 0.5 × width beyond the range edge.

Returns:

The generated SpectralComponent with boundary bands.
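
The relationship between width, offset, and band center described above can be sketched as follows. This is a minimal illustration using only the standard library, not the library's implementation; the helper name boundary_center is hypothetical, and uniform sampling of width and offset is an assumption.

```python
import random


def boundary_center(measurement_range, width, offset, edge):
    """Return a band center lying offset * width nm outside the measured range."""
    lo, hi = measurement_range
    if edge == "left":
        return lo - offset * width
    if edge == "right":
        return hi + offset * width
    raise ValueError(f"edge must be 'left' or 'right', got {edge!r}")


# Assumption: width and offset are drawn uniformly from their ranges.
rng = random.Random(42)
width = rng.uniform(100, 300)   # width_range=(100, 300)
offset = rng.uniform(0.3, 1.5)  # offset_range=(0.3, 1.5)
center = boundary_center((1000, 2400), width, offset, "right")
```

A small offset relative to the width leaves a large part of the peak inside the measured range; a large offset leaves only a shallow tail.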

Example

>>> library = ComponentLibrary(random_state=42)
>>> # Add water band tail at long wavelength edge
>>> boundary = library.add_boundary_component(
...     "water_tail",
...     measurement_range=(1000, 2400),
...     edge="right",
...     amplitude_range=(0.5, 1.0),
...     width_range=(100, 300)
... )

References

  • Burns & Ciurczak (2007). Handbook of Near-Infrared Analysis. Discussion of wavelength range selection and edge effects.

add_boundary_components_from_known(measurement_range: Tuple[float, float] = (1000, 2500)) → ComponentLibrary

Add known boundary components that affect common NIR measurement ranges.

According to the literature, certain absorption bands commonly appear as truncated peaks at measurement boundaries:

  • Left edge (short wavelengths): Electronic transitions, UV tails

  • Right edge (long wavelengths): Strong water O-H bands, C-H fundamentals

Parameters:

measurement_range – (min, max) wavelength range of measurement (nm).

Returns:

Self for method chaining.

Example

>>> library = ComponentLibrary(random_state=42)
>>> library.add_boundary_components_from_known((1000, 2400))

add_component(component: SpectralComponent) → ComponentLibrary

Add a spectral component to the library.

Parameters:

component – SpectralComponent to add.

Returns:

Self for method chaining.
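
The container methods and the "Self for method chaining" convention can be illustrated with a small stand-in class. MiniLibrary is hypothetical and is not the nirs4all class; unlike add_component above, its method takes a name and an arbitrary payload so that the sketch stays self-contained.

```python
class MiniLibrary:
    """Hypothetical stand-in that mimics the container protocol and chaining."""

    def __init__(self):
        self._components = {}  # dicts preserve insertion order

    def add_component(self, name, component):
        self._components[name] = component
        return self  # returning self is what enables chaining

    def __contains__(self, name):
        return name in self._components

    def __getitem__(self, name):
        return self._components[name]

    def __iter__(self):
        return iter(self._components.values())

    def __len__(self):
        return len(self._components)


# Each call returns the library itself, so calls can be chained.
lib = (
    MiniLibrary()
    .add_component("water", [1450, 1940])
    .add_component("lipid", [1210, 1725])
)
```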

add_random_component(name: str, n_bands: int = 3, wavelength_range: Tuple[float, float] = (1000, 2500), zones: List[Tuple[float, float]] | None = None) → SpectralComponent

Generate and add a random spectral component.

Creates a component with randomly placed absorption bands within the specified wavelength range or zones.

Parameters:
  • name – Component name.

  • n_bands – Number of absorption bands to generate.

  • wavelength_range – Overall wavelength range for band placement.

  • zones – Optional list of (min, max) wavelength zones for band centers. If None, uses default NIR-relevant zones.

Returns:

The generated SpectralComponent.
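
One plausible way to place band centers inside zones is sketched below. The function random_band_centers, the zone values, and the uniform sampling are all assumptions made for illustration; they are not the library's default zones or its sampling scheme.

```python
import random


def random_band_centers(n_bands, wavelength_range, zones, seed=None):
    """Draw band centers uniformly from zones clipped to the overall range."""
    rng = random.Random(seed)
    lo, hi = wavelength_range
    # Keep only the part of each zone that overlaps the overall range.
    clipped = [
        (max(a, lo), min(b, hi)) for a, b in zones if max(a, lo) < min(b, hi)
    ]
    if not clipped:
        raise ValueError("no zone overlaps wavelength_range")
    centers = []
    for _ in range(n_bands):
        a, b = rng.choice(clipped)  # pick a zone, then a center inside it
        centers.append(rng.uniform(a, b))
    return sorted(centers)


zones = [(1400, 1500), (1900, 2000), (2200, 2400)]  # illustrative zones only
centers = random_band_centers(4, (1000, 2300), zones, seed=42)
```

Seeding the generator makes the draw reproducible, which mirrors the role of random_state in the library constructor.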

Example

>>> library = ComponentLibrary(random_state=42)
>>> component = library.add_random_component(
...     "random_compound",
...     n_bands=4,
...     wavelength_range=(1000, 2500)
... )
property component_names: List[str]

Get list of component names in order.

property components: Dict[str, SpectralComponent]

Get all components in the library.

compute_all(wavelengths: ndarray) → ndarray

Compute spectra for all components at given wavelengths.

Parameters:

wavelengths – Array of wavelengths in nm.

Returns:

Array of shape (n_components, n_wavelengths) containing the spectrum of each component.
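
The layout of the returned array can be sketched without the library. The example below uses plain Gaussian bands and made-up band parameters (both assumptions) to build one row per component; it also shows that a half-open grid from 1000 to 2500 nm in steps of 2 contains 750 points, because the end point is excluded.

```python
import math


def gaussian(x, center, sigma, amplitude=1.0):
    """Gaussian band shape with the given peak amplitude."""
    return amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Same grid as np.arange(1000, 2500, 2): 1000, 1002, ..., 2498.
wavelengths = list(range(1000, 2500, 2))

# Hypothetical components, each a list of (center, sigma, amplitude) bands.
components = {
    "a": [(1450, 25, 0.8), (1940, 30, 1.0)],
    "b": [(1510, 20, 0.5)],
}

# One row per component; each row is the sum of that component's bands.
E = [
    [sum(gaussian(w, c, s, a) for c, s, a in bands) for w in wavelengths]
    for bands in components.values()
]
shape = (len(E), len(E[0]))
```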

Example

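>>> import numpy as np
>>> from nirs4all.data.synthetic.components import ComponentLibrary
>>> library = ComponentLibrary.from_predefined(["water", "protein"])
>>> wavelengths = np.arange(1000, 2500, 2)  # 1000, 1002, ..., 2498
>>> E = library.compute_all(wavelengths)
>>> print(E.shape)
(2, 750)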

classmethod from_predefined(component_names: List[str] | None = None, random_state: int | None = None) → ComponentLibrary

Create a library from predefined spectral components.

Parameters:
  • component_names – List of component names to include. If None, includes all predefined components.

  • random_state – Random seed for reproducibility.

Returns:

ComponentLibrary instance populated with predefined components.

Raises:

ValueError – If an unknown component name is specified.

Example

>>> library = ComponentLibrary.from_predefined(
...     ["water", "protein", "lipid"]
... )

generate_random_library(n_components: int = 5, n_bands_range: Tuple[int, int] = (2, 6)) → ComponentLibrary

Generate a library of random spectral components.

Parameters:
  • n_components – Number of components to generate.

  • n_bands_range – Range (min, max) for number of bands per component.

Returns:

Self for method chaining.

Example

>>> library = ComponentLibrary(random_state=42)
>>> library.generate_random_library(n_components=5, n_bands_range=(2, 5))
property n_components: int

Number of components in the library.

class nirs4all.data.synthetic.components.NIRBand(center: float, sigma: float, gamma: float = 0.0, amplitude: float = 1.0, name: str = '')

Bases: object

Represents a single NIR absorption band.

This class models an absorption band using a Voigt profile, which is the convolution of Gaussian (thermal broadening) and Lorentzian (pressure broadening) line shapes.

center

Central wavelength in nm.

Type:

float

sigma

Gaussian width (standard deviation) in nm.

Type:

float

gamma

Lorentzian width (HWHM) in nm. Use 0 for pure Gaussian.

Type:

float

amplitude

Peak amplitude in absorbance units.

Type:

float

name

Descriptive name of the band (e.g., “O-H 1st overtone”).

Type:

str

Example

>>> import numpy as np
>>> from nirs4all.data.synthetic.components import NIRBand
>>> band = NIRBand(center=1450, sigma=25, gamma=3, amplitude=0.8)
>>> wavelengths = np.arange(1400, 1500, 1)
>>> spectrum = band.compute(wavelengths)
amplitude: float = 1.0
center: float
compute(wavelengths: ndarray) → ndarray

Compute the band profile at given wavelengths using Voigt profile.

Parameters:

wavelengths – Array of wavelengths in nm at which to evaluate the band.

Returns:

Array of absorbance values at each wavelength.

Note

When gamma=0, a pure Gaussian profile is used for efficiency. Otherwise, the full Voigt profile (Gaussian ⊗ Lorentzian) is computed.
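
The two branches described in the note can be sketched with the standard library alone. This is not the library's code: in place of an exact Voigt evaluation it uses the widely used pseudo-Voigt approximation (a weighted sum of a Lorentzian and a Gaussian sharing one effective width), and it assumes the profile is scaled so that its peak equals amplitude. An exact profile is available in SciPy as scipy.special.voigt_profile.

```python
import math


def band_profile(x, center, sigma, gamma=0.0, amplitude=1.0):
    """Peak-normalized band shape: Gaussian if gamma == 0, else pseudo-Voigt."""
    d = x - center
    if gamma == 0:
        # Pure Gaussian shortcut.
        return amplitude * math.exp(-0.5 * (d / sigma) ** 2)
    f_g = 2.0 * sigma * math.sqrt(2.0 * math.log(2.0))  # Gaussian FWHM
    f_l = 2.0 * gamma                                   # Lorentzian FWHM
    # Effective FWHM of the combined profile (empirical polynomial fit).
    f = (
        f_g**5
        + 2.69269 * f_g**4 * f_l
        + 2.42843 * f_g**3 * f_l**2
        + 4.47163 * f_g**2 * f_l**3
        + 0.07842 * f_g * f_l**4
        + f_l**5
    ) ** 0.2
    r = f_l / f
    eta = 1.36603 * r - 0.47719 * r**2 + 0.11116 * r**3  # Lorentzian weight
    lorentz = 1.0 / (1.0 + (2.0 * d / f) ** 2)
    gauss = math.exp(-4.0 * math.log(2.0) * (d / f) ** 2)
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)


spectrum = [
    band_profile(w, 1450, 25, gamma=3, amplitude=0.8) for w in range(1400, 1500)
]
```

A non-zero gamma mainly changes the wings: far from the center, the Lorentzian term decays much more slowly than the Gaussian term.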

gamma: float = 0.0
name: str = ''
sigma: float
class nirs4all.data.synthetic.components.SpectralComponent(name: str, bands: List[NIRBand] = <factory>, correlation_group: int | None = None, category: str = '', subcategory: str = '', synonyms: List[str] = <factory>, formula: str = '', cas_number: str = '', references: List[str] = <factory>, tags: List[str] = <factory>)

Bases: object

A spectral component representing a chemical compound or functional group.

Each component consists of multiple absorption bands that together define the characteristic NIR signature of the compound.

name

Component name (e.g., “water”, “protein”, “lipid”).

Type:

str

bands

List of NIRBand objects defining the spectral signature.

Type:

List[nirs4all.data.synthetic.components.NIRBand]

correlation_group

Optional group ID for components that should have correlated concentrations (e.g., protein and nitrogen compounds).

Type:

int | None

category

Primary category (e.g., “carbohydrates”, “proteins”, “lipids”).

Type:

str

subcategory

More specific classification (e.g., “monosaccharides”, “amino_acids”).

Type:

str

synonyms

Alternative names (e.g., [“vitamin C”] for ascorbic_acid).

Type:

List[str]

formula

Chemical formula (e.g., “C6H12O6” for glucose).

Type:

str

cas_number

CAS registry number for chemical identification.

Type:

str

references

Literature citations for band assignments.

Type:

List[str]

tags

Classification tags (e.g., [“food”, “pharma”, “agriculture”]).

Type:

List[str]

Example

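>>> import numpy as np
>>> from nirs4all.data.synthetic.components import NIRBand, SpectralComponent
>>> water = SpectralComponent(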
...     name="water",
...     bands=[
...         NIRBand(center=1450, sigma=25, gamma=3, amplitude=0.8),
...         NIRBand(center=1940, sigma=30, gamma=4, amplitude=1.0),
...     ],
...     correlation_group=1,
...     category="water_related",
...     formula="H2O",
... )
>>> wavelengths = np.arange(1000, 2500, 2)
>>> spectrum = water.compute(wavelengths)
bands: List[NIRBand]
cas_number: str = ''
category: str = ''
compute(wavelengths: ndarray) → ndarray

Compute the full component spectrum by summing all bands.

Parameters:

wavelengths – Array of wavelengths in nm at which to evaluate.

Returns:

Array of absorbance values representing the combined spectrum.

correlation_group: int | None = None
formula: str = ''
has_bands_in_range(wavelength_range: Tuple[float, float]) → bool

Check if component has any bands with centers in the given wavelength range.

Parameters:

wavelength_range – (min, max) wavelength in nm.

Returns:

True if at least one band center is within the range.

info() → str

Return formatted information about the component.

Returns:

Human-readable string with component details.

is_normalized(tolerance: float = 0.01) → bool

Check if the component’s band amplitudes are max-normalized (max amplitude = 1.0).

Parameters:

tolerance – Acceptable deviation from 1.0 for max amplitude.

Returns:

True if max amplitude is within tolerance of 1.0.

name: str
normalized(method: str = 'max') → SpectralComponent

Return a new SpectralComponent with normalized band amplitudes.

Parameters:

method – Normalization method:

  • “max”: Scale so max amplitude = 1.0 (default)

  • “sum”: Scale so sum of amplitudes = 1.0

Returns:

New SpectralComponent with normalized amplitudes.
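
The two normalization methods, together with the tolerance check used by is_normalized, reduce to simple arithmetic on the band amplitudes. The sketch below works on a plain list of amplitudes; the function names are hypothetical, and the handling of empty or all-zero input is an assumption.

```python
def normalize_amplitudes(amplitudes, method="max"):
    """Return a new list scaled so that the max (or the sum) equals 1.0."""
    if not amplitudes:
        return []
    if method == "max":
        scale = max(amplitudes)
    elif method == "sum":
        scale = sum(amplitudes)
    else:
        raise ValueError(f"unknown method: {method!r}")
    if scale == 0:
        raise ValueError("cannot normalize all-zero amplitudes")
    return [a / scale for a in amplitudes]


def is_max_normalized(amplitudes, tolerance=0.01):
    """True if the largest amplitude is within tolerance of 1.0."""
    return bool(amplitudes) and abs(max(amplitudes) - 1.0) <= tolerance


by_max = normalize_amplitudes([0.8, 2.0])         # [0.4, 1.0]
by_sum = normalize_amplitudes([0.8, 2.0], "sum")  # amplitudes now sum to 1
```

Both methods preserve the ratios between band amplitudes, so the shape of the component spectrum is unchanged; only its overall scale differs.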

Example

>>> component = SpectralComponent(name="test", bands=[
...     NIRBand(center=1450, sigma=25, amplitude=0.8),
...     NIRBand(center=1940, sigma=30, amplitude=2.0),
... ])
>>> normalized = component.normalized()
>>> print(max(b.amplitude for b in normalized.bands))  # 1.0
references: List[str]
subcategory: str = ''
synonyms: List[str]
tags: List[str]
validate() → List[str]

Validate component parameters.

Returns:

List of validation issues (empty if all valid).

Example

>>> component = SpectralComponent(name="test", bands=[])
>>> issues = component.validate()
>>> if issues:
...     print("Issues found:", issues)

nirs4all.data.synthetic.components.available_components() → List[str]

Return list of all available predefined component names.

Returns:

Sorted list of component names.

Example

>>> names = available_components()
>>> print(f"Available: {len(names)} components")
>>> print(names[:5])

nirs4all.data.synthetic.components.component_info(name: str) → str

Return formatted information about a component.

Parameters:

name – Component name.

Returns:

Human-readable string with component details.

Example

>>> print(component_info("water"))

nirs4all.data.synthetic.components.get_component(name: str) → SpectralComponent

Get a single predefined component by name or synonym.

Parameters:

name – Component name (e.g., “water”, “protein”, “lipid”) or synonym (e.g., “amylose” for “starch”).

Returns:

SpectralComponent object.

Raises:

ValueError – If component name is not found.
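
Name-or-synonym lookup of this kind can be sketched with a small dictionary. The registry contents and the resolve_name helper are hypothetical; only the "amylose" synonym for "starch" comes from the description above. Whether the real lookup is case-sensitive is not stated, so this sketch matches names exactly.

```python
# Hypothetical registry: canonical name -> metadata.
REGISTRY = {
    "water": {"category": "water_related", "synonyms": []},
    "starch": {"category": "carbohydrates", "synonyms": ["amylose"]},
}


def resolve_name(name, registry):
    """Map a canonical name or synonym to its canonical name."""
    if name in registry:
        return name
    for canonical, meta in registry.items():
        if name in meta.get("synonyms", []):
            return canonical
    raise ValueError(f"Unknown component: {name!r}")


canonical = resolve_name("amylose", REGISTRY)  # "starch"
```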

Example

>>> water = get_component("water")
>>> print(water.category)
>>> print(len(water.bands))
>>>
>>> # Using synonyms
>>> starch = get_component("amylose")  # Returns starch component

nirs4all.data.synthetic.components.list_categories() → Dict[str, List[str]]

Return dictionary of categories to component names.

Returns:

Dictionary mapping category names to lists of component names.

Example

>>> categories = list_categories()
>>> for cat, components in categories.items():
...     print(f"{cat}: {len(components)} components")

nirs4all.data.synthetic.components.normalize_component_amplitudes(component: SpectralComponent, method: str = 'max') → SpectralComponent

Normalize band amplitudes for a component.

This is a convenience wrapper around SpectralComponent.normalized().

Parameters:
  • component – SpectralComponent to normalize.

  • method – Normalization method (“max” or “sum”).

Returns:

New SpectralComponent with normalized amplitudes.

Example

>>> comp = get_component("water")
>>> normalized = normalize_component_amplitudes(comp)

nirs4all.data.synthetic.components.search_components(query: str | None = None, category: str | None = None, subcategory: str | None = None, tags: List[str] | None = None, wavelength_range: Tuple[float, float] | None = None) → List[str]

Search components by various criteria.

Parameters:
  • query – Fuzzy match on name or synonyms.

  • category – Filter by category (e.g., “proteins”, “carbohydrates”).

  • subcategory – Filter by subcategory (e.g., “monosaccharides”).

  • tags – Filter by tags (any match).

  • wavelength_range – Keep only components that have bands in the (min, max) range.

Returns:

List of matching component names.
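
How such filters might combine is sketched below. The matching rule is not specified above, so this example assumes that all supplied criteria must hold, that tags match when any tag is shared, and that "fuzzy" means a case-insensitive substring or a close match as judged by difflib. The registry entries and the search function are hypothetical.

```python
import difflib

# Hypothetical registry entries for illustration.
REGISTRY = {
    "protein": {"category": "proteins", "tags": ["food", "agriculture"], "synonyms": []},
    "glucose": {"category": "carbohydrates", "tags": ["food", "pharma"], "synonyms": ["dextrose"]},
    "ascorbic_acid": {"category": "vitamins", "tags": ["pharma"], "synonyms": ["vitamin C"]},
}


def search(registry, query=None, category=None, tags=None):
    """Return names of entries that satisfy every supplied criterion."""
    names = []
    for name, meta in registry.items():
        if category is not None and meta["category"] != category:
            continue
        if tags is not None and not set(tags) & set(meta["tags"]):
            continue  # "any match": at least one shared tag is required
        if query is not None:
            q = query.lower()
            candidates = [c.lower() for c in [name, *meta["synonyms"]]]
            substring = any(q in c for c in candidates)
            close = difflib.get_close_matches(q, candidates, n=1, cutoff=0.8)
            if not (substring or close):
                continue
        names.append(name)
    return names


pharma = search(REGISTRY, tags=["pharma"])
by_synonym = search(REGISTRY, query="vitamin")
with_typo = search(REGISTRY, query="protien")
```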

Example

>>> # Find all protein-related components
>>> proteins = search_components(category="proteins")
>>>
>>> # Find components with bands in visible-NIR region
>>> vis_nir = search_components(wavelength_range=(400, 1000))
>>>
>>> # Find components tagged for pharmaceutical use
>>> pharma = search_components(tags=["pharma"])

nirs4all.data.synthetic.components.validate_component_coverage(wavelength_range: Tuple[float, float] = (350, 2500)) → Dict[str, List[str]]

Check which components have bands in the given wavelength range.

Parameters:

wavelength_range – (min, max) wavelength in nm.

Returns:

Dictionary with ‘covered’ and ‘not_covered’ component lists.
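
The covered / not-covered split can be sketched as a partition on band centers. The coverage function and the band data are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
def coverage(band_centers_by_component, wavelength_range):
    """Partition component names by whether any band center is in range."""
    lo, hi = wavelength_range
    result = {"covered": [], "not_covered": []}
    for name, centers in band_centers_by_component.items():
        in_range = any(lo <= c <= hi for c in centers)  # inclusive bounds
        result["covered" if in_range else "not_covered"].append(name)
    return result


# Hypothetical band centers in nm.
bands = {"water": [1450, 1940], "pigment": [450, 670]}
report = coverage(bands, (1000, 2500))
```

A component with an empty band list ends up in "not_covered", since no center can fall inside the range.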

Example

>>> coverage = validate_component_coverage((1000, 2500))
>>> print(f"Covered: {len(coverage['covered'])}")
>>> print(f"Not covered: {coverage['not_covered']}")

nirs4all.data.synthetic.components.validate_predefined_components() → List[str]

Validate all predefined components.

Returns:

List of validation warnings/errors (empty if all valid).

Example

>>> issues = validate_predefined_components()
>>> if issues:
...     for issue in issues:
...         print(issue)
... else:
...     print("All components valid!")