nirs4all.data.synthetic.scattering module

Scattering effects simulation for synthetic NIRS data generation.

This module provides simulation of light scattering effects in NIR spectra, including particle size effects and scattering coefficient generation.

Key Features:
  • EMSC-style (Extended Multiplicative Scatter Correction) transformations

  • Particle size-dependent scattering simulation

  • Scattering coefficient generation for Kubelka-Munk

  • Sample-to-sample scatter variation

  • Wavelength-dependent scattering (Rayleigh-like)

Physics Background:

Light scattering in particulate samples is complex and depends on:
  • Particle size relative to wavelength (Mie vs Rayleigh regimes)

  • Particle shape and surface roughness

  • Refractive index differences

  • Packing density
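
To make the Mie vs Rayleigh distinction concrete, the sketch below computes the standard dimensionless size parameter x = πd/λ for two particle diameters. The function names and the regime thresholds are illustrative assumptions (the boundaries between regimes are gradual, not sharp) and are not part of this module.

```python
import math

def size_parameter(diameter_um: float, wavelength_nm: float) -> float:
    """Dimensionless size parameter x = pi * d / lambda (same length units)."""
    wavelength_um = wavelength_nm / 1000.0
    return math.pi * diameter_um / wavelength_um

def regime(x: float) -> str:
    # Rough, illustrative cut-offs only; real transitions are gradual.
    if x < 0.1:
        return "rayleigh"
    if x < 50.0:
        return "mie"
    return "geometric"

x_powder = size_parameter(50.0, 1500.0)  # 50 um particle at 1500 nm
x_nano = size_parameter(0.02, 1500.0)    # 20 nm particle at 1500 nm
print(regime(x_powder), regime(x_nano))
```

A 50 μm powder particle is far larger than NIR wavelengths, which is one reason a pure Rayleigh λ⁻⁴ law is a poor fit for typical milled samples.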

Rather than implementing full Mie theory, which is computationally expensive and may not match real data, this module uses empirical EMSC-style models that approximate the distortions that chemometric preprocessing corrects.

References

  • Martens, H., Nielsen, J. P., & Engelsen, S. B. (2003). Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures. Analytical Chemistry, 75(3), 394-404.

  • Kubelka, P. (1948). New contributions to the optics of intensely light-scattering materials. Part I. JOSA, 38(5), 448-457.

  • Dahm, D. J., & Dahm, K. D. (2007). Interpreting Diffuse Reflectance and Transmittance. NIR Publications.

  • Burger, J., & Geladi, P. (2005). Hyperspectral NIR image regression part I: calibration and correction. Journal of Chemometrics, 19(5‐7), 355-363.

class nirs4all.data.synthetic.scattering.EMSCConfig(polynomial_order: int = 2, multiplicative_scatter_std: float = 0.15, additive_scatter_std: float = 0.05, include_wavelength_terms: bool = True, wavelength_coef_std: float = 0.02, reference_spectrum: ndarray | None = None)

Bases: object

Configuration for EMSC-style scattering transformation.

EMSC models scattering distortion as: x = a + b*x_ref + d*λ + e*λ² + …

where a is the additive scatter offset, b is the multiplicative scatter factor applied to the reference spectrum x_ref, and the higher-order terms (d, e, …) model baseline curvature due to scattering.
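
The model above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the toy reference band, the rescaling of wavelengths to [-1, 1], and drawing b around 1 are choices made here; only the standard deviations mirror the defaults in the signature.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(1000.0, 2500.0, 200)                 # nm
x_ref = np.exp(-0.5 * ((wavelengths - 1900.0) / 60.0) ** 2)    # toy reference band

# Assumption: rescale wavelengths to [-1, 1] so polynomial terms stay comparable.
span = wavelengths.max() - wavelengths.min()
lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0

n_samples = 5
a = rng.normal(0.0, 0.05, size=(n_samples, 1))   # additive offset
b = rng.normal(1.0, 0.15, size=(n_samples, 1))   # multiplicative factor (assumed centred on 1)
d = rng.normal(0.0, 0.02, size=(n_samples, 1))   # linear wavelength coefficient
e = rng.normal(0.0, 0.02, size=(n_samples, 1))   # quadratic wavelength coefficient

# x = a + b*x_ref + d*lam + e*lam^2, broadcast to (n_samples, n_wavelengths)
distorted = a + b * x_ref + d * lam + e * lam**2
```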

polynomial_order

Order of polynomial for wavelength-dependent scatter.

Type:

int

multiplicative_scatter_std

Std dev of multiplicative scatter factor b.

Type:

float

additive_scatter_std

Std dev of additive scatter offset a.

Type:

float

include_wavelength_terms

Whether to include λ, λ² terms.

Type:

bool

wavelength_coef_std

Std dev of wavelength coefficient.

Type:

float

reference_spectrum

Optional reference spectrum for EMSC.

Type:

numpy.ndarray | None

additive_scatter_std: float = 0.05
include_wavelength_terms: bool = True
multiplicative_scatter_std: float = 0.15
polynomial_order: int = 2
reference_spectrum: ndarray | None = None
wavelength_coef_std: float = 0.02
class nirs4all.data.synthetic.scattering.EMSCTransformSimulator(config: EMSCConfig | None = None, random_state: int | None = None)

Bases: object

Simulate EMSC-style scattering distortions.

Applies the inverse of Extended Multiplicative Scatter Correction, generating realistic scatter distortions that EMSC would correct.

EMSC models spectra as: x = a + b*m + d*λ + e*λ² + … where m is a reference spectrum.

This simulator generates a, b, d, e, … to create scatter distortions.

config

EMSC configuration.

rng

Random number generator.

Example

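>>> from nirs4all.data.synthetic.scattering import EMSCConfig, EMSCTransformSimulator
>>> # `spectra` is an (n_samples, n_wavelengths) array; `wavelengths` is in nm.
>>> config = EMSCConfig(polynomial_order=2)
>>> simulator = EMSCTransformSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)

apply(spectra: ndarray, wavelengths: ndarray, reference_spectrum: ndarray | None = None) → ndarray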

Apply EMSC-style scattering distortions.

Parameters:
  • spectra – Input spectra array (n_samples, n_wavelengths).

  • wavelengths – Wavelength array in nm.

  • reference_spectrum – Optional reference spectrum. If None, uses mean of input spectra or config reference.

Returns:

Modified spectra with scatter distortions applied.

get_emsc_basis(wavelengths: ndarray) → ndarray

Get EMSC polynomial basis functions.

Parameters:

wavelengths – Wavelength array.

Returns:

Basis matrix (n_wavelengths, n_terms).
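
A polynomial basis of this shape can be sketched with `numpy.vander`. The helper name `polynomial_basis`, the column ordering, and the rescaling to [-1, 1] are assumptions for illustration; the actual `get_emsc_basis` may normalise wavelengths differently or include further columns.

```python
import numpy as np

def polynomial_basis(wavelengths: np.ndarray, order: int = 2) -> np.ndarray:
    """Return columns [1, lam, lam**2, ...] on wavelengths rescaled to [-1, 1]."""
    wl = np.asarray(wavelengths, dtype=float)
    lam = 2.0 * (wl - wl.min()) / (wl.max() - wl.min()) - 1.0
    # increasing=True orders the columns from the constant term upwards.
    return np.vander(lam, N=order + 1, increasing=True)

basis = polynomial_basis(np.linspace(1000.0, 2500.0, 4), order=2)
print(basis.shape)  # (n_wavelengths, n_terms)
```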

class nirs4all.data.synthetic.scattering.ParticleSizeConfig(distribution: ParticleSizeDistribution = <factory>, reference_size_um: float = 50.0, size_effect_strength: float = 1.0, wavelength_exponent: float = 1.5, include_path_length_effect: bool = True, path_length_sensitivity: float = 0.5)

Bases: object

Configuration for particle size effects.

distribution

Particle size distribution parameters.

Type:

nirs4all.data.synthetic.scattering.ParticleSizeDistribution

reference_size_um

Reference particle size for baseline scattering.

Type:

float

size_effect_strength

How strongly size affects scattering (0-1).

Type:

float

wavelength_exponent

Exponent for wavelength dependence of scattering:
  • 4.0 = Rayleigh (particles << wavelength)

  • 0.0 = No wavelength dependence

  • 1.0-2.0 = Typical for NIR powder samples

Type:

float
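
To see what the exponent values listed for wavelength_exponent imply, the sketch below evaluates a power law (λ/λ_ref)^(-n) for three exponents. The 1500 nm normalisation point and the helper name are assumptions made for this illustration.

```python
import numpy as np

wavelengths = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # nm
reference_nm = 1500.0  # assumed normalisation point for this sketch

def relative_scatter(wl: np.ndarray, exponent: float) -> np.ndarray:
    """Scattering relative to the reference wavelength: (wl / ref) ** -exponent."""
    return (wl / reference_nm) ** (-exponent)

flat = relative_scatter(wavelengths, 0.0)      # no wavelength dependence
powder = relative_scatter(wavelengths, 1.5)    # the default exponent
rayleigh = relative_scatter(wavelengths, 4.0)  # particles much smaller than wavelength
```

With exponent 4.0 the scatter at 1000 nm is about five times that at 1500 nm, whereas with 1.5 it is less than twice as large.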

include_path_length_effect

Whether particle size affects optical path.

Type:

bool

path_length_sensitivity

How strongly size affects path length.

Type:

float

distribution: ParticleSizeDistribution
include_path_length_effect: bool = True
path_length_sensitivity: float = 0.5
reference_size_um: float = 50.0
size_effect_strength: float = 1.0
wavelength_exponent: float = 1.5
class nirs4all.data.synthetic.scattering.ParticleSizeDistribution(mean_size_um: float = 50.0, std_size_um: float = 15.0, min_size_um: float = 5.0, max_size_um: float = 200.0, distribution: str = 'lognormal')

Bases: object

Particle size distribution parameters.

Models particle size as a log-normal distribution by default, which is common for ground/milled samples in NIR analysis. Normal and uniform distributions can be selected through the distribution attribute.

mean_size_um

Mean particle size in micrometers.

Type:

float

std_size_um

Standard deviation of particle size in micrometers.

Type:

float

min_size_um

Minimum particle size (lower truncation).

Type:

float

max_size_um

Maximum particle size (upper truncation).

Type:

float

distribution

Type of distribution ('lognormal', 'normal', 'uniform').

Type:

str

distribution: str = 'lognormal'
max_size_um: float = 200.0
mean_size_um: float = 50.0
min_size_um: float = 5.0
sample(n_samples: int, rng: Generator) → ndarray

Sample particle sizes from the distribution.
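
One way to draw truncated log-normal sizes from a given mean and standard deviation is sketched below. The moment-matching conversion is the standard log-normal identity; truncating by clipping, and the helper name, are assumptions here, and the module's `sample` method may instead resample or parameterise the distribution differently.

```python
import numpy as np

def sample_lognormal_sizes(n_samples, mean_um=50.0, std_um=15.0,
                           min_um=5.0, max_um=200.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Convert the arithmetic mean/std into parameters of the underlying normal.
    sigma2 = np.log1p((std_um / mean_um) ** 2)
    mu = np.log(mean_um) - 0.5 * sigma2
    sizes = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_samples)
    # Clipping is the simplest truncation; it piles probability mass at the bounds.
    return np.clip(sizes, min_um, max_um)

sizes = sample_lognormal_sizes(10_000, rng=np.random.default_rng(0))
```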

std_size_um: float = 15.0
class nirs4all.data.synthetic.scattering.ParticleSizeSimulator(config: ParticleSizeConfig | None = None, random_state: int | None = None)

Bases: object

Simulate particle size effects on NIR spectra.

Particle size affects NIR spectra through:
  • Scattering baseline (smaller particles = more scattering)

  • Path length through sample (affects Beer-Lambert)

  • Wavelength dependence of scattering

Uses an EMSC-style approach, applying distortions that chemometric preprocessing (SNV, MSC) would correct.

config

Particle size configuration.

rng

Random number generator.

Example

>>> from nirs4all.data.synthetic.scattering import ParticleSizeConfig, ParticleSizeDistribution, ParticleSizeSimulator
>>> config = ParticleSizeConfig(
...     distribution=ParticleSizeDistribution(mean_size_um=30.0)
... )
>>> simulator = ParticleSizeSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)

apply(spectra: ndarray, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray

Apply particle size effects to spectra.

Parameters:
  • spectra – Input spectra array (n_samples, n_wavelengths).

  • wavelengths – Wavelength array in nm.

  • particle_sizes – Optional per-sample particle sizes (μm). If None, samples from configured distribution.

Returns:

Modified spectra with particle size effects applied.
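
The qualitative behaviour (finer particles raise a wavelength-dependent baseline) can be sketched as below. Everything in this block is a hypothetical toy model written for illustration: the functional form, the `baseline_scale` parameter, and the normalisation at the mean wavelength are assumptions and do not describe how `apply` is implemented.

```python
import numpy as np

def sketch_particle_size_effect(spectra, wavelengths, sizes_um,
                                reference_size_um=50.0, strength=1.0,
                                exponent=1.5, baseline_scale=0.05):
    spectra = np.asarray(spectra, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    sizes = np.asarray(sizes_um, dtype=float)[:, None]   # (n_samples, 1)
    # Particles smaller than the reference give a ratio > 1, i.e. more scatter.
    size_ratio = (reference_size_um / sizes) ** strength
    # Power-law wavelength shape, equal to 1 at the mean wavelength.
    shape = (wl / wl.mean()) ** (-exponent)              # (n_wavelengths,)
    return spectra + baseline_scale * size_ratio * shape

wl = np.linspace(1000.0, 2500.0, 100)
flat = np.zeros((2, wl.size))
out = sketch_particle_size_effect(flat, wl, sizes_um=[25.0, 100.0])
```

Here the 25 μm sample receives a larger baseline than the 100 μm sample at every wavelength, and both baselines fall off towards longer wavelengths.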

generate_particle_sizes(n_samples: int) → ndarray

Generate particle sizes for a set of samples.

Parameters:

n_samples – Number of samples.

Returns:

Array of particle sizes in μm.

class nirs4all.data.synthetic.scattering.ScatteringCoefficientConfig(baseline_scattering: float = 1.0, wavelength_exponent: float = 1.0, particle_size_factor: float = 0.5, sample_variation: float = 0.15, wavelength_reference_nm: float = 1500.0)

Bases: object

Configuration for scattering coefficient (S) generation.

For Kubelka-Munk reflectance, we need both absorption (K) and scattering (S) coefficients. This config controls S(λ) generation.

baseline_scattering

Base scattering coefficient value.

Type:

float

wavelength_exponent

Exponent for wavelength dependence. S(λ) ∝ λ^(-exponent)

Type:

float

particle_size_factor

How strongly particle size affects S.

Type:

float

sample_variation

Sample-to-sample variation in S.

Type:

float

wavelength_reference_nm

Reference wavelength for normalization.

Type:

float

baseline_scattering: float = 1.0
particle_size_factor: float = 0.5
sample_variation: float = 0.15
wavelength_exponent: float = 1.0
wavelength_reference_nm: float = 1500.0
class nirs4all.data.synthetic.scattering.ScatteringCoefficientGenerator(config: ScatteringCoefficientConfig | None = None, random_state: int | None = None)

Bases: object

Generate scattering coefficients S(λ) for Kubelka-Munk simulation.

The Kubelka-Munk equation relates reflectance R to absorption K and scattering S: f(R) = (1-R)²/(2R) = K/S

This generator produces realistic S(λ) values for different sample types.
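
The Kubelka-Munk relation and its inverse can be written out directly. The two helper names below are illustrative and not part of this module; the inverse takes the physical root of the quadratic R² − 2(1 + K/S)R + 1 = 0, which lies in (0, 1].

```python
import numpy as np

def kubelka_munk(R):
    """f(R) = (1 - R)**2 / (2 R), valid for 0 < R <= 1."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)

def reflectance_from_ks(K, S):
    """Invert f(R) = K/S for the root with 0 < R <= 1."""
    ks = np.asarray(K, dtype=float) / np.asarray(S, dtype=float)
    return 1.0 + ks - np.sqrt(ks**2 + 2.0 * ks)

K = np.array([0.0, 0.1, 0.5, 2.0])   # absorption coefficients
S = np.ones_like(K)                  # scattering coefficients
R = reflectance_from_ks(K, S)        # reflectance falls as K/S grows
```

Because only the ratio K/S enters, doubling S at fixed K has the same effect on reflectance as halving K.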

config

Scattering coefficient configuration.

rng

Random number generator.

Example

>>> from nirs4all.data.synthetic.scattering import ScatteringCoefficientConfig, ScatteringCoefficientGenerator
>>> config = ScatteringCoefficientConfig(
...     baseline_scattering=1.5,
...     wavelength_exponent=1.2
... )
>>> generator = ScatteringCoefficientGenerator(config, random_state=42)
>>> S = generator.generate(n_samples=100, wavelengths=wavelengths)

generate(n_samples: int, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray

Generate scattering coefficients for samples.

Parameters:
  • n_samples – Number of samples.

  • wavelengths – Wavelength array in nm.

  • particle_sizes – Optional per-sample particle sizes (μm).

Returns:

Scattering coefficient array (n_samples, n_wavelengths).
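
A generator with this output shape can be sketched from the documented ingredients: a baseline value, a λ^(-exponent) dependence normalised at a reference wavelength, and per-sample variation. The helper name and the use of a log-normal per-sample factor (chosen to keep S positive) are assumptions; the module's `generate` may combine these differently.

```python
import numpy as np

def sketch_scattering_coefficients(n_samples, wavelengths, baseline=1.0,
                                   exponent=1.0, sample_variation=0.15,
                                   reference_nm=1500.0, seed=None):
    rng = np.random.default_rng(seed)
    wl = np.asarray(wavelengths, dtype=float)
    # Power law equal to `baseline` at the reference wavelength.
    s_lambda = baseline * (wl / reference_nm) ** (-exponent)
    # Assumption: one multiplicative log-normal factor per sample keeps S > 0.
    factors = rng.lognormal(mean=0.0, sigma=sample_variation, size=(n_samples, 1))
    return factors * s_lambda            # (n_samples, n_wavelengths)

S = sketch_scattering_coefficients(100, np.linspace(1000.0, 2500.0, 50), seed=42)
```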

generate_for_particle_sizes(particle_sizes: ndarray, wavelengths: ndarray) → ndarray

Generate scattering coefficients based on particle sizes.

Parameters:
  • particle_sizes – Array of particle sizes in μm.

  • wavelengths – Wavelength array in nm.

Returns:

Scattering coefficient array.

class nirs4all.data.synthetic.scattering.ScatteringEffectsConfig(model: ScatteringModel = ScatteringModel.EMSC, particle_size: ParticleSizeConfig = <factory>, emsc: EMSCConfig = <factory>, scattering_coefficient: ScatteringCoefficientConfig = <factory>, enable_particle_size: bool = True, enable_emsc: bool = True)

Bases: object

Combined configuration for all scattering effects.

model

Which scattering model to use.

Type:

nirs4all.data.synthetic.scattering.ScatteringModel

particle_size

Particle size effect configuration.

Type:

nirs4all.data.synthetic.scattering.ParticleSizeConfig

emsc

EMSC-style transformation configuration.

Type:

nirs4all.data.synthetic.scattering.EMSCConfig

scattering_coefficient

Scattering coefficient generation config.

Type:

nirs4all.data.synthetic.scattering.ScatteringCoefficientConfig

enable_particle_size

Whether to apply particle size effects.

Type:

bool

enable_emsc

Whether to apply EMSC-style transformation.

Type:

bool

emsc: EMSCConfig
enable_emsc: bool = True
enable_particle_size: bool = True
model: ScatteringModel = 'emsc'
particle_size: ParticleSizeConfig
scattering_coefficient: ScatteringCoefficientConfig
class nirs4all.data.synthetic.scattering.ScatteringEffectsSimulator(config: ScatteringEffectsConfig | None = None, random_state: int | None = None)

Bases: object

Combined simulator for all scattering effects.

Applies particle size effects and EMSC-style transformations in the correct order.

config

Scattering effects configuration.

particle_sim

Particle size simulator.

emsc_sim

EMSC transformation simulator.

scatter_gen

Scattering coefficient generator.

rng

Random number generator.

Example

>>> from nirs4all.data.synthetic.scattering import (
...     ParticleSizeConfig, ParticleSizeDistribution, ScatteringEffectsConfig,
...     ScatteringEffectsSimulator, ScatteringModel,
... )
>>> config = ScatteringEffectsConfig(
...     model=ScatteringModel.EMSC,
...     particle_size=ParticleSizeConfig(
...         distribution=ParticleSizeDistribution(mean_size_um=30.0)
...     )
... )
>>> simulator = ScatteringEffectsSimulator(config, random_state=42)
>>> spectra_out = simulator.apply(spectra, wavelengths)

apply(spectra: ndarray, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray

Apply all scattering effects to spectra.

Parameters:
  • spectra – Input spectra array (n_samples, n_wavelengths).

  • wavelengths – Wavelength array in nm.

  • particle_sizes – Optional per-sample particle sizes.

Returns:

Modified spectra with scattering effects applied.

generate_scattering_coefficients(n_samples: int, wavelengths: ndarray, particle_sizes: ndarray | None = None) → ndarray

Generate scattering coefficients for Kubelka-Munk.

Parameters:
  • n_samples – Number of samples.

  • wavelengths – Wavelength array.

  • particle_sizes – Optional particle sizes.

Returns:

Scattering coefficient array (n_samples, n_wavelengths).

class nirs4all.data.synthetic.scattering.ScatteringModel(value)

Bases: str, Enum

Available scattering models.

EMSC = 'emsc'
KUBELKA_MUNK = 'kubelka_munk'
MIE_APPROX = 'mie_approx'
POLYNOMIAL = 'polynomial'
RAYLEIGH = 'rayleigh'
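
Because the enum derives from both str and Enum, its members compare equal to their string values and can be looked up from those strings. The stand-in class below is hypothetical and only reproduces two of the members listed above to demonstrate that standard-library behaviour.

```python
from enum import Enum

class Model(str, Enum):
    """Hypothetical stand-in with two of the members listed above."""
    EMSC = "emsc"
    KUBELKA_MUNK = "kubelka_munk"

from_string = Model("emsc")               # look up a member by its value
is_same = from_string is Model.EMSC       # lookup returns the singleton member
equals_plain_str = Model.EMSC == "emsc"   # the str mixin makes this True
```

An unknown value such as `Model("unknown")` raises ValueError, which is useful for validating configuration strings early.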
nirs4all.data.synthetic.scattering.apply_emsc_distortion(spectra: ndarray, wavelengths: ndarray, multiplicative_std: float = 0.15, additive_std: float = 0.05, random_state: int | None = None) → ndarray

Apply EMSC-style scatter distortions with simple API.

Parameters:
  • spectra – Input spectra (n_samples, n_wavelengths).

  • wavelengths – Wavelength array (nm).

  • multiplicative_std – Std dev of multiplicative scatter.

  • additive_std – Std dev of additive scatter.

  • random_state – Random seed.

Returns:

Spectra with EMSC-style distortions applied.

Example

>>> from nirs4all.data.synthetic.scattering import apply_emsc_distortion
>>> # Add realistic scatter distortions
>>> spectra_scattered = apply_emsc_distortion(spectra, wavelengths)

nirs4all.data.synthetic.scattering.apply_particle_size_effects(spectra: ndarray, wavelengths: ndarray, mean_particle_size_um: float = 50.0, size_variation: float = 15.0, random_state: int | None = None) → ndarray

Apply particle size effects to spectra with simple API.

Parameters:
  • spectra – Input spectra (n_samples, n_wavelengths).

  • wavelengths – Wavelength array (nm).

  • mean_particle_size_um – Mean particle size in micrometers.

  • size_variation – Standard deviation of particle size.

  • random_state – Random seed.

Returns:

Spectra with particle size effects applied.

Example

>>> from nirs4all.data.synthetic.scattering import apply_particle_size_effects
>>> # Simulate fine powder sample
>>> spectra_fine = apply_particle_size_effects(
...     spectra, wavelengths,
...     mean_particle_size_um=20.0
... )

nirs4all.data.synthetic.scattering.generate_scattering_coefficients(n_samples: int, wavelengths: ndarray, baseline_scattering: float = 1.0, wavelength_exponent: float = 1.0, particle_sizes: ndarray | None = None, random_state: int | None = None) → ndarray

Generate scattering coefficients with simple API.

Parameters:
  • n_samples – Number of samples.

  • wavelengths – Wavelength array (nm).

  • baseline_scattering – Base scattering coefficient.

  • wavelength_exponent – Wavelength dependence exponent.

  • particle_sizes – Optional particle sizes (μm).

  • random_state – Random seed.

Returns:

Scattering coefficient array (n_samples, n_wavelengths).

Example

>>> from nirs4all.data.synthetic.scattering import generate_scattering_coefficients
>>> S = generate_scattering_coefficients(100, wavelengths)

nirs4all.data.synthetic.scattering.simulate_msc_correctable_scatter(spectra: ndarray, reference: ndarray | None = None, intensity: float = 1.0, random_state: int | None = None) → ndarray

Apply scatter effects that MSC (Multiplicative Scatter Correction) would correct.

MSC regresses each spectrum against a reference to remove multiplicative and baseline scatter. This function applies such effects.

Parameters:
  • spectra – Input spectra.

  • reference – Reference spectrum (mean if None).

  • intensity – Intensity of scatter effects.

  • random_state – Random seed.

Returns:

Spectra with MSC-correctable scatter.

Example

>>> from nirs4all.data.synthetic.scattering import simulate_msc_correctable_scatter
>>> # Add scatter that MSC will correct
>>> scattered = simulate_msc_correctable_scatter(spectra)
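
For context, the MSC regression step can be sketched in NumPy: each spectrum is fitted as intercept + slope × reference, then the fitted offset and slope are removed. The `msc` helper is a textbook-style illustration written for this note, not a function of this module, and the distortion below is a plain a + b·reference model chosen so the correction is exact.

```python
import numpy as np

def msc(spectra, reference=None):
    """Regress each spectrum on the reference; remove fitted offset and slope."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, deg=1)   # x ~ intercept + slope * ref
        corrected[i] = (x - intercept) / slope
    return corrected

rng = np.random.default_rng(0)
reference = np.sin(np.linspace(0.0, 3.0, 50)) + 1.0
a = rng.normal(0.0, 0.05, size=(4, 1))   # additive offsets
b = rng.normal(1.0, 0.15, size=(4, 1))   # multiplicative factors
scattered = a + b * reference            # the kind of distortion MSC targets
recovered = msc(scattered, reference=reference)
```
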
nirs4all.data.synthetic.scattering.simulate_snv_correctable_scatter(spectra: ndarray, intensity: float = 1.0, random_state: int | None = None) → ndarray

Apply scatter effects that SNV (Standard Normal Variate) would correct.

SNV centers and scales each spectrum individually, which removes a per-sample additive offset and multiplicative gain. This function applies such effects, so SNV preprocessing of the output should match SNV preprocessing of the original spectra.

Parameters:
  • spectra – Input spectra.

  • intensity – Intensity of scatter effects (0-2, default 1).

  • random_state – Random seed.

Returns:

Spectra with SNV-correctable scatter.

Example

>>> from nirs4all.data.synthetic.scattering import simulate_snv_correctable_scatter
>>> # Add scatter that SNV will correct
>>> scattered = simulate_snv_correctable_scatter(spectra, intensity=1.5)
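
The sketch below shows why a per-sample offset and positive gain are "SNV-correctable": SNV of the distorted spectra equals SNV of the clean spectra, although neither equals the raw clean spectra. The `snv` helper and the random toy data are assumptions for illustration and do not call this module.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

rng = np.random.default_rng(1)
clean = rng.random((3, 40))
offset = rng.normal(0.0, 0.05, size=(3, 1))   # per-sample additive offset
gain = rng.uniform(0.7, 1.3, size=(3, 1))     # per-sample positive gain
scattered = offset + gain * clean

same_after_snv = np.allclose(snv(scattered), snv(clean))
restores_raw = np.allclose(snv(scattered), clean)
```

A negative gain would flip the sign of the SNV output, so the invariance holds only for positive multiplicative factors.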