nirs4all.data.synthetic.scattering module
Scattering effects simulation for synthetic NIRS data generation.
This module provides simulation of light scattering effects in NIR spectra, including particle size effects and scattering coefficient generation.
- Key Features:
EMSC-style (Extended Multiplicative Scatter Correction) transformations
Particle size-dependent scattering simulation
Scattering coefficient generation for Kubelka-Munk
Sample-to-sample scatter variation
Wavelength-dependent scattering (Rayleigh-like)
- Physics Background:
Light scattering in particulate samples is complex and depends on:
- Particle size relative to wavelength (Mie vs. Rayleigh regimes)
- Particle shape and surface roughness
- Refractive index differences
- Packing density
Rather than implementing full Mie theory (computationally expensive and may not match real data), this module uses empirical EMSC-style models that approximate the distortions that chemometric preprocessing corrects.
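Why the Rayleigh regime rarely applies to NIR powders can be checked with the standard Mie size parameter x = πd/λ: Rayleigh behaviour requires x ≪ 1. The sketch below is illustrative and not part of the module; `size_parameter` is a hypothetical helper, and the 1000-2500 nm range and the 5-200 μm sizes (the `ParticleSizeDistribution` bounds) are assumptions chosen for the example.

```python
import numpy as np

def size_parameter(diameter_um, wavelength_nm):
    """Mie size parameter x = pi * d / lambda, with d converted from um to nm."""
    d_nm = np.asarray(diameter_um, dtype=float) * 1000.0  # um -> nm
    return np.pi * d_nm / np.asarray(wavelength_nm, dtype=float)

# Range of x across an assumed NIR window for small, typical, and coarse particles
for d in (5.0, 50.0, 200.0):
    x = size_parameter(d, np.array([1000.0, 2500.0]))
    print(f"d = {d:5.1f} um -> x from {x[1]:.1f} to {x[0]:.1f}")
```

Even the smallest particles give x well above 1, which is consistent with using an empirical wavelength exponent below the Rayleigh value of 4.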
References
Martens, H., Nielsen, J. P., & Engelsen, S. B. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures. Analytical Chemistry, 75(3), 394-404.
Kubelka, P. (1948). New contributions to the optics of intensely light-scattering materials. Part I. JOSA, 38(5), 448-457.
Dahm, D. J., & Dahm, K. D. (2007). Interpreting Diffuse Reflectance and Transmittance. NIR Publications.
Burger, J., & Geladi, P. (2005). Hyperspectral NIR image regression part I: calibration and correction. Journal of Chemometrics, 19(5‐7), 355-363.
- class nirs4all.data.synthetic.scattering.EMSCConfig(polynomial_order: int = 2, multiplicative_scatter_std: float = 0.15, additive_scatter_std: float = 0.05, include_wavelength_terms: bool = True, wavelength_coef_std: float = 0.02, reference_spectrum: ndarray | None = None)
Bases: object
Configuration for EMSC-style scattering transformation.
EMSC models the scatter-distorted spectrum as: x = a + b*x_ref + d*λ + e*λ² + …
where a is the additive offset, b is the multiplicative scatter factor, and the higher-order terms (d, e, …) model baseline curvature caused by scattering.
- reference_spectrum
Optional reference spectrum for EMSC.
- Type:
numpy.ndarray | None
- class nirs4all.data.synthetic.scattering.EMSCTransformSimulator(config: EMSCConfig | None = None, random_state: int | None = None)
Bases: object
Simulate EMSC-style scattering distortions.
Applies the inverse of Extended Multiplicative Scatter Correction, generating realistic scatter distortions that EMSC would correct.
EMSC models spectra as: x = a + b*m + d*λ + e*λ² + … where m is a reference spectrum.
This simulator generates a, b, d, e, … to create scatter distortions.
- config
EMSC configuration.
- rng
Random number generator.
Example
>>> config = EMSCConfig(polynomial_order=2)
>>> simulator = EMSCTransformSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)
- apply(spectra: ndarray, wavelengths: ndarray, reference_spectrum: ndarray | None = None) → ndarray
Apply EMSC-style scattering distortions.
- Parameters:
spectra – Input spectra array (n_samples, n_wavelengths).
wavelengths – Wavelength array in nm.
reference_spectrum – Optional reference spectrum. If None, uses mean of input spectra or config reference.
- Returns:
Modified spectra with scatter distortions applied.
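The forward model and its correction can be illustrated with plain NumPy. This is a sketch, not the module's implementation: the toy spectrum, the scaling of λ to [-1, 1], and the normal distributions for a, b, d, e (spreads borrowed from the `EMSCConfig` defaults) are assumptions. Fitting the distorted spectrum against [1, m, λ, λ²] recovers the parameters, which is what "distortions that EMSC would correct" means here.

```python
import numpy as np

rng = np.random.default_rng(42)

wavelengths = np.linspace(1000.0, 2500.0, 301)  # nm
# Toy "true" spectrum m: one Gaussian absorption band on a flat level (illustrative)
true = 0.5 + 0.3 * np.exp(-((wavelengths - 1700.0) / 60.0) ** 2)

# Assumption: scale wavelengths to [-1, 1] so polynomial coefficients are comparable
lam = 2.0 * (wavelengths - wavelengths.min()) / (wavelengths.max() - wavelengths.min()) - 1.0

# Draw EMSC parameters: additive a, multiplicative b, linear d and quadratic e terms
a = rng.normal(0.0, 0.05)
b = rng.normal(1.0, 0.15)
d, e = rng.normal(0.0, 0.02, size=2)

# Forward EMSC model: x = a + b*m + d*lam + e*lam^2
distorted = a + b * true + d * lam + e * lam**2

# EMSC correction: least-squares fit of x on the basis [1, m, lam, lam^2]
design = np.column_stack([np.ones_like(lam), true, lam, lam**2])
coef, *_ = np.linalg.lstsq(design, distorted, rcond=None)
a_hat, b_hat, d_hat, e_hat = coef

# Subtract the fitted baseline terms and divide by the fitted multiplicative factor
corrected = (distorted - a_hat - d_hat * lam - e_hat * lam**2) / b_hat
```

In the noise-free case the fit is exact, so `corrected` equals the original spectrum.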
- class nirs4all.data.synthetic.scattering.ParticleSizeConfig(distribution: ParticleSizeDistribution = <factory>, reference_size_um: float = 50.0, size_effect_strength: float = 1.0, wavelength_exponent: float = 1.5, include_path_length_effect: bool = True, path_length_sensitivity: float = 0.5)
Bases: object
Configuration for particle size effects.
- distribution
Particle size distribution parameters.
- Type:
ParticleSizeDistribution
- wavelength_exponent
Exponent n of the wavelength dependence of scattering (scattering ∝ λ^-n):
  - 4.0 = Rayleigh (particles ≪ wavelength)
  - 0.0 = no wavelength dependence
  - 1.0-2.0 = typical for NIR powder samples
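To see how strongly the exponent matters, assume the power law S(λ) ∝ (λ_ref/λ)^n with λ_ref = 1500 nm (the default `wavelength_reference_nm` of `ScatteringCoefficientConfig`; whether the particle size simulator uses exactly this form is an assumption). Between 1000 and 2000 nm the scattering ratio is 2^n. `relative_scattering` is a hypothetical helper.

```python
import numpy as np

def relative_scattering(wavelengths_nm, exponent, reference_nm=1500.0):
    """Power-law scattering relative to a reference wavelength: (ref / lambda) ** n."""
    return (reference_nm / np.asarray(wavelengths_nm, dtype=float)) ** exponent

wl = np.array([1000.0, 2000.0])
for n in (0.0, 1.5, 4.0):
    s = relative_scattering(wl, n)
    # Ratio of scattering at 1000 nm to scattering at 2000 nm equals 2 ** n
    print(f"n = {n}: S(1000)/S(2000) = {s[0] / s[1]:.2f}")
```

The ratio is 1 with no wavelength dependence, about 2.83 for the default n = 1.5, and 16 in the Rayleigh limit.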
- class nirs4all.data.synthetic.scattering.ParticleSizeDistribution(mean_size_um: float = 50.0, std_size_um: float = 15.0, min_size_um: float = 5.0, max_size_um: float = 200.0, distribution: str = 'lognormal')
Bases: object
Particle size distribution parameters.
By default (distribution='lognormal'), particle size follows a log-normal distribution, which is typical of ground or milled samples in NIR analysis.
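A minimal sketch of sampling from such a distribution, assuming `mean_size_um` and `std_size_um` are the arithmetic mean and standard deviation of the sizes and that out-of-range draws are clipped to [`min_size_um`, `max_size_um`]; the module may interpret or enforce these differently. `sample_particle_sizes` is a hypothetical helper.

```python
import numpy as np

def sample_particle_sizes(n, mean_um=50.0, std_um=15.0, min_um=5.0, max_um=200.0, seed=None):
    """Draw log-normal sizes whose arithmetic mean/std match mean_um/std_um, then clip."""
    rng = np.random.default_rng(seed)
    # Convert the arithmetic mean/std into the parameters of the underlying normal
    sigma2 = np.log1p((std_um / mean_um) ** 2)
    mu = np.log(mean_um) - sigma2 / 2.0
    sizes = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n)
    # Assumption: out-of-range sizes are clipped rather than resampled
    return np.clip(sizes, min_um, max_um)

sizes = sample_particle_sizes(100_000, seed=0)
print(sizes.mean(), sizes.std())  # approximately 50 and 15
```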
- class nirs4all.data.synthetic.scattering.ParticleSizeSimulator(config: ParticleSizeConfig | None = None, random_state: int | None = None)
Bases: object
Simulate particle size effects on NIR spectra.
Particle size affects NIR spectra through:
- the scattering baseline (smaller particles scatter more)
- the optical path length through the sample (which affects Beer-Lambert absorbance)
- the wavelength dependence of scattering
The simulator uses an EMSC-style approach: it applies distortions of the kind that chemometric preprocessing (SNV, MSC) is designed to correct.
- config
Particle size configuration.
- rng
Random number generator.
Example
>>> config = ParticleSizeConfig(
...     distribution=ParticleSizeDistribution(mean_size_um=30.0)
... )
>>> simulator = ParticleSizeSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)
- apply(spectra: ndarray, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray
Apply particle size effects to spectra.
- Parameters:
spectra – Input spectra array (n_samples, n_wavelengths).
wavelengths – Wavelength array in nm.
particle_sizes – Optional per-sample particle sizes (μm). If None, samples from configured distribution.
- Returns:
Modified spectra with particle size effects applied.
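The three mechanisms listed above can be combined in a simple hypothetical model, sketched here to make them concrete; it is not the nirs4all formula. The function `particle_size_distortion` and its `offset_scale` parameter are invented for illustration, while the other parameters loosely mirror `ParticleSizeConfig` fields. Assumptions: the scatter baseline scales with reference/size, follows (λ_ref/λ)^n with λ_ref taken as the median wavelength, and the path length factor scales absorbance by (size/reference)^sensitivity.

```python
import numpy as np

def particle_size_distortion(spectra, wavelengths, sizes_um, reference_size_um=50.0,
                             strength=1.0, wavelength_exponent=1.5,
                             path_sensitivity=0.5, offset_scale=0.05):
    """Hypothetical particle size distortion (illustrative only).

    - Scatter baseline grows as particles get smaller (ratio reference / size).
    - Baseline follows a power law in wavelength, (lambda_ref / lambda) ** n.
    - Absorbance is scaled by an effective path length (size / reference) ** sensitivity.
    """
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    sizes = np.asarray(sizes_um, dtype=float)[:, None]           # (n_samples, 1)
    lam_term = (np.median(wavelengths) / wavelengths) ** wavelength_exponent
    baseline = offset_scale * strength * (reference_size_um / sizes) * lam_term
    path_factor = (sizes / reference_size_um) ** path_sensitivity
    return spectra * path_factor + baseline

wavelengths = np.linspace(1000.0, 2500.0, 151)
spectra = np.full((2, wavelengths.size), 0.4)
out = particle_size_distortion(spectra, wavelengths, sizes_um=[20.0, 80.0])
```

With this choice, a sample at the reference size and zero strength passes through unchanged, which makes the effect of each term easy to isolate.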
- class nirs4all.data.synthetic.scattering.ScatteringCoefficientConfig(baseline_scattering: float = 1.0, wavelength_exponent: float = 1.0, particle_size_factor: float = 0.5, sample_variation: float = 0.15, wavelength_reference_nm: float = 1500.0)
Bases: object
Configuration for scattering coefficient (S) generation.
Kubelka-Munk reflectance requires both an absorption coefficient (K) and a scattering coefficient (S). This config controls how S(λ) is generated.
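One plausible reading of these fields is sketched below: a per-sample scale factor (1 + ε_i) with ε_i ~ N(0, `sample_variation`) multiplying a power law referenced to `wavelength_reference_nm`. This is an assumption, not the generator's documented formula; `particle_size_factor` is deliberately left out, and `scattering_coefficients` is a hypothetical helper.

```python
import numpy as np

def scattering_coefficients(n_samples, wavelengths, baseline=1.0, exponent=1.0,
                            sample_variation=0.15, reference_nm=1500.0, seed=None):
    """Sketch: S_i(lambda) = baseline * (1 + eps_i) * (reference_nm / lambda) ** exponent."""
    rng = np.random.default_rng(seed)
    wl = np.asarray(wavelengths, dtype=float)
    shape = (reference_nm / wl) ** exponent                         # (n_wavelengths,)
    # One random scale factor per sample, shared across all wavelengths
    scale = baseline * (1.0 + rng.normal(0.0, sample_variation, size=(n_samples, 1)))
    return np.clip(scale, 1e-6, None) * shape                       # keep S strictly positive

S = scattering_coefficients(100, np.linspace(1000.0, 2500.0, 301), seed=42)
```

At the reference wavelength, and with zero sample variation, S equals `baseline`.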
- class nirs4all.data.synthetic.scattering.ScatteringCoefficientGenerator(config: ScatteringCoefficientConfig | None = None, random_state: int | None = None)
Bases: object
Generate scattering coefficients S(λ) for Kubelka-Munk simulation.
The Kubelka-Munk equation relates the diffuse reflectance R of an optically thick sample to the absorption coefficient K and the scattering coefficient S: f(R) = (1-R)²/(2R) = K/S
This generator produces realistic S(λ) values for different sample types.
- config
Scattering coefficient configuration.
- rng
Random number generator.
Example
>>> config = ScatteringCoefficientConfig(
...     baseline_scattering=1.5,
...     wavelength_exponent=1.2
... )
>>> generator = ScatteringCoefficientGenerator(config, random_state=42)
>>> S = generator.generate(n_samples=100, wavelengths=wavelengths)
- generate(n_samples: int, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray
Generate scattering coefficients for samples.
- Parameters:
n_samples – Number of samples.
wavelengths – Wavelength array in nm.
particle_sizes – Optional per-sample particle sizes (μm).
- Returns:
Scattering coefficient array (n_samples, n_wavelengths).
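Given S from the generator and an absorption coefficient K, reflectance follows by inverting f(R) = K/S. Solving (1-R)² = 2R·K/S for the root in (0, 1] gives R = 1 + K/S - sqrt((K/S)² + 2K/S). The helper names below are hypothetical; the relation itself is the standard Kubelka-Munk result for an optically thick sample.

```python
import numpy as np

def km_function(R):
    """Kubelka-Munk remission function f(R) = (1 - R)^2 / (2R)."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)

def km_reflectance(K, S):
    """Invert f(R) = K/S for R in (0, 1]: R = 1 + K/S - sqrt((K/S)^2 + 2 K/S)."""
    ratio = np.asarray(K, dtype=float) / np.asarray(S, dtype=float)
    return 1.0 + ratio - np.sqrt(ratio**2 + 2.0 * ratio)

K = np.array([0.0, 0.1, 0.5, 2.0])  # absorption coefficients (arbitrary units)
S = np.array([1.0, 1.0, 1.5, 1.2])  # scattering coefficients, same units
R = km_reflectance(K, S)
```

Because R decreases as K/S grows, stronger scattering at fixed absorption raises reflectance, which is how S(λ) variation shows up as baseline differences.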
- class nirs4all.data.synthetic.scattering.ScatteringEffectsConfig(model: ScatteringModel = ScatteringModel.EMSC, particle_size: ParticleSizeConfig = <factory>, emsc: EMSCConfig = <factory>, scattering_coefficient: ScatteringCoefficientConfig = <factory>, enable_particle_size: bool = True, enable_emsc: bool = True)
Bases: object
Combined configuration for all scattering effects.
- model
Which scattering model to use.
- particle_size
Particle size effect configuration.
- emsc
EMSC-style transformation configuration.
- scattering_coefficient
Scattering coefficient generation config.
- emsc: EMSCConfig
- model: ScatteringModel = ScatteringModel.EMSC
- particle_size: ParticleSizeConfig
- scattering_coefficient: ScatteringCoefficientConfig
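The `<factory>` defaults in the signature presumably come from dataclass fields declared with `field(default_factory=...)`, so each configuration instance gets its own nested config objects. The sketch below uses hypothetical stand-in classes (not the real nirs4all ones) to show why that matters: changing a nested setting on one instance leaves other instances untouched.

```python
from dataclasses import dataclass, field

@dataclass
class InnerConfig:  # hypothetical stand-in for e.g. ParticleSizeConfig
    reference_size_um: float = 50.0

@dataclass
class CombinedConfig:  # hypothetical stand-in for ScatteringEffectsConfig
    # default_factory builds a fresh InnerConfig for every CombinedConfig instance
    particle_size: InnerConfig = field(default_factory=InnerConfig)
    enable_particle_size: bool = True

c1, c2 = CombinedConfig(), CombinedConfig()
c1.particle_size.reference_size_um = 30.0  # mutating one instance...
print(c2.particle_size.reference_size_um)  # ...leaves the other at 50.0
```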
- class nirs4all.data.synthetic.scattering.ScatteringEffectsSimulator(config: ScatteringEffectsConfig | None = None, random_state: int | None = None)
Bases: object
Combined simulator for all scattering effects.
Applies particle size effects and EMSC-style transformations in the correct order.
- config
Scattering effects configuration.
- particle_sim
Particle size simulator.
- emsc_sim
EMSC transformation simulator.
- scatter_gen
Scattering coefficient generator.
- rng
Random number generator.
Example
>>> config = ScatteringEffectsConfig(
...     model=ScatteringModel.EMSC,
...     particle_size=ParticleSizeConfig(
...         distribution=ParticleSizeDistribution(mean_size_um=30.0)
...     )
... )
>>> simulator = ScatteringEffectsSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)
- apply(spectra: ndarray, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray
Apply all scattering effects to spectra.
- Parameters:
spectra – Input spectra array (n_samples, n_wavelengths).
wavelengths – Wavelength array in nm.
particle_sizes – Optional per-sample particle sizes (μm).
- Returns:
Modified spectra with scattering effects applied.
- generate_scattering_coefficients(n_samples: int, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray
Generate scattering coefficients for Kubelka-Munk.
- Parameters:
n_samples – Number of samples.
wavelengths – Wavelength array in nm.
particle_sizes – Optional per-sample particle sizes (μm).
- Returns:
Scattering coefficient array (n_samples, n_wavelengths).
- class nirs4all.data.synthetic.scattering.ScatteringModel(value)
Available scattering models.
- EMSC = 'emsc'
- KUBELKA_MUNK = 'kubelka_munk'
- MIE_APPROX = 'mie_approx'
- POLYNOMIAL = 'polynomial'
- RAYLEIGH = 'rayleigh'
- nirs4all.data.synthetic.scattering.apply_emsc_distortion(spectra: ndarray, wavelengths: ndarray, multiplicative_std: float = 0.15, additive_std: float = 0.05, random_state: int | None = None) → ndarray
Apply EMSC-style scatter distortions with simple API.
- Parameters:
spectra – Input spectra (n_samples, n_wavelengths).
wavelengths – Wavelength array (nm).
multiplicative_std – Std dev of multiplicative scatter.
additive_std – Std dev of additive scatter.
random_state – Random seed.
- Returns:
Spectra with EMSC-style distortions applied.
Example
>>> # Add realistic scatter distortions
>>> spectra_scattered = apply_emsc_distortion(spectra, wavelengths)
- nirs4all.data.synthetic.scattering.apply_particle_size_effects(spectra: ndarray, wavelengths: ndarray, mean_particle_size_um: float = 50.0, size_variation: float = 15.0, random_state: int | None = None) → ndarray
Apply particle size effects to spectra with simple API.
- Parameters:
spectra – Input spectra (n_samples, n_wavelengths).
wavelengths – Wavelength array (nm).
mean_particle_size_um – Mean particle size in micrometers.
size_variation – Standard deviation of particle size in micrometers.
random_state – Random seed.
- Returns:
Spectra with particle size effects applied.
Example
>>> # Simulate fine powder sample
>>> spectra_fine = apply_particle_size_effects(
...     spectra, wavelengths,
...     mean_particle_size_um=20.0
... )
- nirs4all.data.synthetic.scattering.generate_scattering_coefficients(n_samples: int, wavelengths: ndarray, baseline_scattering: float = 1.0, wavelength_exponent: float = 1.0, particle_sizes: ndarray | None = None, random_state: int | None = None) → ndarray
Generate scattering coefficients with simple API.
- Parameters:
n_samples – Number of samples.
wavelengths – Wavelength array (nm).
baseline_scattering – Base scattering coefficient.
wavelength_exponent – Wavelength dependence exponent.
particle_sizes – Optional particle sizes (μm).
random_state – Random seed.
- Returns:
Scattering coefficient array (n_samples, n_wavelengths).
Example
>>> S = generate_scattering_coefficients(100, wavelengths)
- nirs4all.data.synthetic.scattering.simulate_msc_correctable_scatter(spectra: ndarray, reference: ndarray | None = None, intensity: float = 1.0, random_state: int | None = None) → ndarray
Apply scatter effects that MSC (Multiplicative Scatter Correction) would correct.
MSC regresses each spectrum against a reference to remove multiplicative and baseline scatter. This function applies such effects.
- Parameters:
spectra – Input spectra.
reference – Reference spectrum. If None, the mean of the input spectra is used.
intensity – Intensity of scatter effects.
random_state – Random seed.
- Returns:
Spectra with MSC-correctable scatter.
Example
>>> # Add scatter that MSC will correct
>>> scattered = simulate_msc_correctable_scatter(spectra)
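What "MSC-correctable" means can be checked directly: MSC fits each spectrum as x_i ≈ a_i + b_i·m and returns (x_i - a_i)/b_i, so distortions of the form a + b·m applied to copies of the reference are undone exactly. The `msc` helper, the toy reference, and the parameter spreads are assumptions for illustration; `simulate_msc_correctable_scatter` may draw its parameters differently.

```python
import numpy as np

def msc(spectra, reference):
    """Multiplicative Scatter Correction: fit x_i = a_i + b_i * reference, return (x_i - a_i) / b_i."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(reference, x, deg=1)  # slope b, intercept a
        corrected[i] = (x - a) / b
    return corrected

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 200)) + 1.0  # toy reference spectrum
a = rng.normal(0.0, 0.05, size=(5, 1))           # additive scatter per sample
b = rng.normal(1.0, 0.15, size=(5, 1))           # multiplicative scatter per sample
scattered = a + b * base                         # MSC-correctable distortion
recovered = msc(scattered, reference=base)
```

For real spectra that differ chemically from the reference, MSC removes only the fitted offset and slope, so recovery is approximate rather than exact.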
- nirs4all.data.synthetic.scattering.simulate_snv_correctable_scatter(spectra: ndarray, intensity: float = 1.0, random_state: int | None = None) → ndarray
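Apply scatter effects that SNV (Standard Normal Variate) would correct.
SNV removes per-sample additive offsets and multiplicative scaling. This function applies such effects, so applying SNV to the distorted spectra should give essentially the same result as applying SNV to the original spectra. SNV does not restore the original, unnormalized intensities.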
- Parameters:
spectra – Input spectra.
intensity – Intensity of scatter effects (0-2, default 1).
random_state – Random seed.
- Returns:
Spectra with SNV-correctable scatter.
Example
>>> # Add scatter that SNV will correct
>>> scattered = simulate_snv_correctable_scatter(spectra, intensity=1.5)
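The property SNV relies on is invariance to a per-sample offset and a positive scale: SNV(a + b·x) = SNV(x) for b > 0. The sketch below demonstrates this with a hypothetical `snv` helper and toy data; it does not reproduce how `simulate_snv_correctable_scatter` chooses a and b. A negative b would flip the sign of the SNV output, so the scale factors are drawn from a positive range.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre each spectrum and scale it to unit standard deviation."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

rng = np.random.default_rng(1)
spectra = rng.random((4, 100)) + 0.5    # toy spectra
a = rng.normal(0.0, 0.05, size=(4, 1))  # additive offset per sample
b = rng.uniform(0.8, 1.2, size=(4, 1))  # positive multiplicative factor per sample
scattered = a + b * spectra

# SNV output is identical for the original and the distorted spectra (for b > 0)
same = np.allclose(snv(scattered), snv(spectra))
```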