nirs4all.synthesis.fitter module

Real data fitting utilities for synthetic NIRS spectra generation.

This module provides tools to analyze real NIRS datasets and fit generator parameters to match their statistical and spectral properties.

Key Features:
  • Statistical property analysis (mean, std, skewness, kurtosis)

  • Spectral shape analysis (slope, curvature, noise)

  • PCA structure analysis

  • Parameter estimation for SyntheticNIRSGenerator

  • Comparison between synthetic and real data

  • Phase 1-4 Enhanced Features:
    • Instrument archetype inference (InGaAs, PbS, MEMS, etc.)

    • Measurement mode detection (transmittance, reflectance, ATR)

    • Application domain suggestion (agriculture, pharmaceutical, etc.)

    • Environmental effects estimation (temperature, moisture)

    • Scattering parameter estimation (particle size, EMSC)

    • Wavenumber-based peak analysis for component identification
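As a rough illustration of the statistical and spectral-shape descriptors listed above, the sketch below computes them with NumPy alone. The function name `describe_spectra` and the exact definitions (slope and curvature as polynomial-fit coefficients, noise as the spread of the first difference) are assumptions made for this example and may differ from what the module actually computes.

```python
import numpy as np


def describe_spectra(X, wavelengths):
    """Compute simple descriptors for spectra X of shape (n_samples, n_wavelengths)."""
    flat = X.ravel()
    mean, std = flat.mean(), flat.std()
    z = (flat - mean) / std
    # Normalise the wavelength axis to a unit range so coefficients are comparable.
    span = wavelengths.max() - wavelengths.min()
    x = (wavelengths - wavelengths.mean()) / span
    # np.polyfit accepts a 2-D y with one column per spectrum.
    slope = np.polyfit(x, X.T, 1)[0]
    curvature = np.polyfit(x, X.T, 2)[0]
    # Crude noise estimate: spread of the first difference along wavelength.
    noise = np.diff(X, axis=1).std(axis=1) / np.sqrt(2)
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z**3)),
        "kurtosis": float(np.mean(z**4) - 3.0),  # excess kurtosis
        "slope_mean": float(slope.mean()),
        "curvature_mean": float(curvature.mean()),
        "noise_mean": float(noise.mean()),
    }


rng = np.random.default_rng(0)
wl = np.arange(1000.0, 2500.0, 2.0)
# Toy data: a linear trend plus small Gaussian noise.
trend = 0.5 + 0.2 * (wl - wl.mean()) / (wl.max() - wl.min())
X_demo = trend + rng.normal(0.0, 0.001, (20, wl.size))
stats = describe_spectra(X_demo, wl)
```

On this toy data the recovered slope should sit close to the 0.2 used to build the trend, and the noise estimate close to the 0.001 that was injected.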

Example

>>> from nirs4all.synthesis import RealDataFitter, SyntheticNIRSGenerator
>>>
>>> # Analyze real data
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Create generator with fitted parameters (includes all Phase 1-4 features)
>>> generator = fitter.create_matched_generator()
>>> X_synthetic, _, _ = generator.generate(n_samples=1000)
>>>
>>> # Or get all inferred characteristics
>>> print(f"Inferred instrument: {params.inferred_instrument}")
>>> print(f"Inferred domain: {params.inferred_domain}")
>>> print(f"Measurement mode: {params.measurement_mode}")

References

  • Based on comparator.py from bench/synthetic/

  • Enhanced with Phase 1-4 synthetic generator features

class nirs4all.synthesis.fitter.ComponentFitResult(component_names: List[str], concentrations: ndarray, baseline_coefficients: ndarray | None, fitted_spectrum: ndarray, residuals: ndarray, r_squared: float, rmse: float, wavelengths: ndarray | None = None)[source]

Bases: object

Result of fitting spectral components to an observed spectrum.

component_names

Names of components used in fitting.

Type:

List[str]

concentrations

Estimated concentration for each component.

Type:

numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients (if fit_baseline=True).

Type:

numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum from fit.

Type:

numpy.ndarray

residuals

Difference between observed and fitted spectra.

Type:

numpy.ndarray

r_squared

R² goodness-of-fit metric.

Type:

float

rmse

Root mean squared error of fit.

Type:

float

wavelengths

Wavelength grid used for fitting.

Type:

numpy.ndarray | None

baseline_coefficients: ndarray | None
component_names: List[str]
concentrations: ndarray
fitted_spectrum: ndarray
r_squared: float
residuals: ndarray
rmse: float
summary() str[source]

Return human-readable summary of fit results.

to_dict() Dict[str, float][source]

Return concentrations as a dictionary.

top_components(n: int = 5, threshold: float = 0.0) List[Tuple[str, float]][source]

Get top N components by concentration.

Parameters:
  • n – Maximum number of components to return.

  • threshold – Minimum concentration threshold.

Returns:

List of (component_name, concentration) tuples, sorted descending.
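The selection described above can be sketched in a few lines of plain Python. This is a hypothetical standalone function, not the method itself; in particular, whether the threshold comparison is strict or inclusive is an assumption.

```python
from typing import List, Sequence, Tuple


def top_components(
    names: Sequence[str],
    concentrations: Sequence[float],
    n: int = 5,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to n (name, concentration) pairs, largest first."""
    # Assumption: a component is kept only if it is strictly above the threshold.
    kept = [(name, float(c)) for name, c in zip(names, concentrations) if c > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


ranked = top_components(
    ["water", "protein", "lipid", "starch"], [0.60, 0.25, 0.0, 0.15], n=2
)
```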

wavelengths: ndarray | None = None
class nirs4all.synthesis.fitter.ComponentFitter(component_names: List[str] | None = None, wavelengths: ndarray | None = None, fit_baseline: bool = True, baseline_order: int = 2, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, sg_window_length: int = 15, sg_polyorder: int = 2)[source]

Bases: object

Fit linear combinations of spectral components to observed spectra.

Solves: spectrum ≈ Σ(c_i * component_i(λ)) + baseline

By default, uses non-negative least squares (NNLS) to keep concentrations non-negative, which is physically meaningful for spectroscopic analysis.
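To make the mixture model above concrete, here is a minimal sketch that builds a design matrix from hypothetical Gaussian bands and solves the non-negative problem. The baseline term is left out for brevity, and a simple projected-gradient loop stands in for a dedicated NNLS routine; neither the band shapes nor the solver are taken from the library.

```python
import numpy as np


def gaussian_band(wl, center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


def nnls_projected_gradient(A, b, n_iter=500):
    """Minimise ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto the non-negative orthant.
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - b)))
    return x


wl = np.arange(1000.0, 2500.0, 2.0)
# Hypothetical component spectra: one Gaussian band each.
A = np.column_stack([
    gaussian_band(wl, 1450.0, 40.0),
    gaussian_band(wl, 1940.0, 50.0),
    gaussian_band(wl, 2100.0, 60.0),
])
true_c = np.array([0.8, 0.0, 0.3])
spectrum = A @ true_c + np.random.default_rng(0).normal(0.0, 0.002, wl.size)

c_nnls = nnls_projected_gradient(A, spectrum)          # constrained, all >= 0
c_lsq, *_ = np.linalg.lstsq(A, spectrum, rcond=None)   # unconstrained, may dip below 0
```

The unconstrained solution can assign a small negative value to the absent component, whereas the constrained one clamps it at zero.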

Preprocessing Support: If your observed spectra are preprocessed (e.g., second derivative, SNV), use the preprocessing parameter to apply the same transformation to component spectra before fitting.
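The reason this works is that derivative preprocessing is linear: the derivative of a mixture equals the same mixture of the component derivatives, so the concentrations keep their meaning. The sketch below checks this numerically, using two passes of `np.gradient` as a stand-in for a Savitzky-Golay second derivative; the band positions are illustrative assumptions.

```python
import numpy as np

wl = np.arange(1000.0, 2500.0, 2.0)


def second_derivative(y):
    # Stand-in for a Savitzky-Golay derivative: two passes of central differences.
    return np.gradient(np.gradient(y, wl, axis=-1), wl, axis=-1)


# Hypothetical component spectra, shape (n_components, n_wavelengths).
components = np.stack([
    np.exp(-0.5 * ((wl - 1450.0) / 40.0) ** 2),
    np.exp(-0.5 * ((wl - 1940.0) / 50.0) ** 2),
])
conc = np.array([0.7, 0.2])

raw_mixture = conc @ components
# Preprocess the mixture, or mix the preprocessed components: same result.
deriv_of_mixture = second_derivative(raw_mixture)
mixture_of_derivs = conc @ second_derivative(components)
```

Non-linear transforms such as SNV do not commute with mixing in this way, so this check applies to derivative-type preprocessing only.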

Auto-detection: Set auto_detect_preprocessing=True to automatically detect the preprocessing type from the data (recommended for derivative data).

Example

>>> import numpy as np
>>> from nirs4all.synthesis import ComponentFitter
>>>
>>> # Fit with all available components
>>> fitter = ComponentFitter(wavelengths=np.arange(1000, 2500, 2))
>>> result = fitter.fit(observed_spectrum)
>>> print(result.summary())
>>>
>>> # Fit preprocessed data (e.g., second derivative)
>>> fitter = ComponentFitter(
...     component_names=["water", "protein", "lipid"],
...     wavelengths=wavelengths,
...     preprocessing="second_derivative",  # Components will be transformed
... )
>>> result = fitter.fit(derivative_spectrum)
>>>
>>> # Auto-detect preprocessing (recommended for unknown data)
>>> fitter = ComponentFitter(
...     wavelengths=wavelengths,
...     auto_detect_preprocessing=True,  # Will detect derivative, SNV, etc.
... )
>>> result = fitter.fit(unknown_spectrum)
component_names

List of component names to fit.

wavelengths

Wavelength grid for fitting.

fit_baseline

Whether to include polynomial baseline.

baseline_order

Polynomial order for baseline (default 2).

preprocessing

Preprocessing to apply to components before fitting.

auto_detect_preprocessing

If True, detect preprocessing from data.

detected_preprocessing

The detected preprocessing type (after first fit).

Type:

nirs4all.synthesis.fitter.PreprocessingType | None

detected_preprocessing: PreprocessingType | None
fit(spectrum: ndarray, method: str = 'nnls') ComponentFitResult[source]

Fit components to a single spectrum.

Parameters:
  • spectrum – Observed spectrum, shape (n_wavelengths,).

  • method – Fitting method:
    • “nnls”: Non-negative least squares (default, physically meaningful).

    • “lsq”: Unconstrained least squares (allows negative concentrations).

Returns:

ComponentFitResult with concentrations, residuals, and fit quality metrics.

Example

>>> result = fitter.fit(observed_spectrum)
>>> print(f"R² = {result.r_squared:.4f}")
>>> print(f"Top components: {result.top_components(3)}")
fit_batch(spectra: ndarray, method: str = 'nnls', n_jobs: int = -1) List[ComponentFitResult][source]

Fit components to multiple spectra in parallel.

Parameters:
  • spectra – Observed spectra, shape (n_samples, n_wavelengths).

  • method – Fitting method (“nnls” or “lsq”).

  • n_jobs – Number of parallel jobs (-1 = all cores, 1 = sequential).

Returns:

List of ComponentFitResult objects.
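The `n_jobs` convention described above can be sketched with the standard library. This example uses a thread pool and plain least squares purely for illustration; the parallel backend the library uses is not stated here, so both `fit_one` and this `fit_batch` are hypothetical stand-ins.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_one(A, spectrum):
    """Least-squares coefficients for a single spectrum."""
    coef, *_ = np.linalg.lstsq(A, spectrum, rcond=None)
    return coef


def fit_batch(A, spectra, n_jobs=-1):
    """Fit each row of `spectra`; n_jobs=-1 uses all cores, 1 runs sequentially."""
    if n_jobs == 1:
        return [fit_one(A, s) for s in spectra]
    workers = (os.cpu_count() or 1) if n_jobs == -1 else n_jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the rows.
        return list(pool.map(lambda s: fit_one(A, s), spectra))


wl = np.arange(1000.0, 2500.0, 2.0)
A = np.column_stack([np.exp(-0.5 * ((wl - c) / 50.0) ** 2) for c in (1450.0, 1940.0)])
C_true = np.random.default_rng(0).uniform(0.1, 1.0, (8, 2))
spectra = C_true @ A.T

C_parallel = np.array(fit_batch(A, spectra, n_jobs=-1))
C_serial = np.array(fit_batch(A, spectra, n_jobs=1))
```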

Example

>>> results = fitter.fit_batch(X_observed, n_jobs=4)
>>> mean_r2 = np.mean([r.r_squared for r in results])
>>> print(f"Mean R² = {mean_r2:.4f}")
get_concentration_matrix(spectra: ndarray, method: str = 'nnls', n_jobs: int = -1) Tuple[ndarray, List[str]][source]

Get concentration matrix for batch of spectra.

Convenience method that extracts just the concentrations.

Parameters:
  • spectra – Observed spectra, shape (n_samples, n_wavelengths).

  • method – Fitting method (“nnls” or “lsq”).

  • n_jobs – Number of parallel jobs.

Returns:

Tuple of (concentrations, component_names):
  • concentrations: Array of shape (n_samples, n_components)

  • component_names: List of component names

Example

>>> C, names = fitter.get_concentration_matrix(X_observed)
>>> water_idx = names.index("water")
>>> water_concentrations = C[:, water_idx]
suggest_components(spectrum: ndarray, top_n: int = 5, threshold: float = 0.01, method: str = 'nnls') List[Tuple[str, float]][source]

Suggest which components are likely present in a spectrum.

Performs a fit and returns the top components by concentration.

Parameters:
  • spectrum – Observed spectrum, shape (n_wavelengths,).

  • top_n – Maximum number of components to return.

  • threshold – Minimum concentration threshold.

  • method – Fitting method (“nnls” or “lsq”).

Returns:

List of (component_name, estimated_concentration) tuples, sorted by concentration descending.

Example

>>> suggestions = fitter.suggest_components(unknown_spectrum)
>>> print("Likely components:")
>>> for name, conc in suggestions:
...     print(f"  {name}: {conc:.3f}")
class nirs4all.synthesis.fitter.DerivativeAwareForwardModelFitter(components: List['SpectralComponent'], canonical_grid: np.ndarray, target_grid: np.ndarray, derivative_order: int = 1, sg_window: int = 15, sg_polyorder: int = 2, baseline_order: int = 6, wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0), ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0), path_length_bounds: Tuple[float, float] = (0.5, 2.0))[source]

Bases: object

Forward model fitter for derivative-preprocessed datasets.

Key principle: Never fit derivative spectra by adding narrow bands. Instead:

  1. Fit latent physical model (raw absorbance)

  2. Apply derivative preprocessing to model output

  3. Compare in derivative space

This ensures concentrations remain physically interpretable without oscillatory artifacts from narrow compensating peaks.
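The three steps above can be sketched as follows. The example uses `np.gradient` in place of Savitzky-Golay filtering, ordinary least squares in place of the full fitter, and hypothetical Gaussian bands as components; it also shows that a constant offset in the raw spectrum disappears once everything is compared in first-derivative space.

```python
import numpy as np

wl = np.arange(1000.0, 2500.0, 2.0)


def first_derivative(y):
    # Stand-in for Savitzky-Golay preprocessing; any linear derivative filter works.
    return np.gradient(y, wl, axis=-1)


# Step 1: latent physical model in raw absorbance (hypothetical Gaussian bands).
components = np.stack([
    np.exp(-0.5 * ((wl - 1450.0) / 40.0) ** 2),
    np.exp(-0.5 * ((wl - 1940.0) / 50.0) ** 2),
])
true_c = np.array([0.6, 0.3])
# The dataset only provides the derivative; the raw offset of 0.15 is unknown.
observed_deriv = first_derivative(true_c @ components + 0.15)

# Step 2: apply the same derivative preprocessing to the model's building blocks.
design = first_derivative(components).T  # shape (n_wavelengths, n_components)

# Step 3: compare in derivative space (here via ordinary least squares).
c_hat, *_ = np.linalg.lstsq(design, observed_deriv, rcond=None)
fitted_raw = c_hat @ components
fitted_deriv = first_derivative(fitted_raw)
```

Because the unknowns are still the raw-absorbance concentrations, the fit cannot invent narrow compensating bands that exist only in derivative space.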

components

List of SpectralComponent objects.

Type:

List[‘SpectralComponent’]

canonical_grid

High-resolution canonical wavelength grid.

Type:

np.ndarray

target_grid

Target wavelength grid (dataset grid).

Type:

np.ndarray

derivative_order

1 for first derivative, 2 for second.

Type:

int

sg_window

Savitzky-Golay window length.

Type:

int

sg_polyorder

Savitzky-Golay polynomial order.

Type:

int

baseline_order

Number of Chebyshev baseline terms.

Type:

int

Example

>>> fitter = DerivativeAwareForwardModelFitter(
...     components=components,
...     canonical_grid=canonical_wl,
...     target_grid=dataset_wl,
...     derivative_order=1,  # First derivative
... )
>>> result = fitter.fit(derivative_spectrum)
>>> print(f"R² = {result['r_squared']:.4f}")
__post_init__()[source]

Pre-compute component spectra on canonical grid.

baseline_order: int = 6
canonical_grid: np.ndarray
components: List['SpectralComponent']
derivative_order: int = 1
fit(y_deriv: ndarray, initial_guess: ndarray | None = None) Dict[str, Any][source]

Fit forward model to derivative spectrum.

Parameters:
  • y_deriv – Target spectrum (already derivative-preprocessed).

  • initial_guess – Initial [wl_shift, ils_sigma, path_length].

Returns:

Dict with fitted parameters:
  • r_squared: Coefficient of determination

  • fitted_deriv: Fitted derivative spectrum

  • fitted_raw: Reconstructed raw spectrum

  • residuals_deriv: Fitting residuals

  • concentrations: Fitted component concentrations

  • baseline_coeffs: Fitted baseline coefficients

  • wl_shift, ils_sigma, path_length: Instrument params

ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0)
path_length_bounds: Tuple[float, float] = (0.5, 2.0)
sg_polyorder: int = 2
sg_window: int = 15
target_grid: np.ndarray
wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0)
class nirs4all.synthesis.fitter.DomainInference(domain_name: str = 'unknown', category: str = 'unknown', confidence: float = 0.0, detected_components: List[str] = <factory>, alternative_domains: Dict[str, float]=<factory>)[source]

Bases: object

Results of application domain inference.

domain_name

Best matching domain name.

Type:

str

category

Domain category.

Type:

str

confidence

Confidence score (0-1).

Type:

float

detected_components

Components detected from peak analysis.

Type:

List[str]

alternative_domains

Other possible domains with scores.

Type:

Dict[str, float]

alternative_domains: Dict[str, float]
category: str = 'unknown'
confidence: float = 0.0
detected_components: List[str]
domain_name: str = 'unknown'
class nirs4all.synthesis.fitter.EdgeArtifactInference(has_edge_artifacts: bool = False, has_detector_rolloff: bool = False, has_stray_light: bool = False, has_truncated_peaks: bool = False, has_edge_curvature: bool = False, left_edge_intensity: float = 0.0, right_edge_intensity: float = 0.0, edge_noise_ratio: float = 1.0, detector_model: str = 'generic_nir', stray_light_fraction: float = 0.0, curvature_type: str = 'none', boundary_peak_amplitudes: Tuple[float, float] = (0.0, 0.0))[source]

Bases: object

Results of edge artifact inference.

Detects edge deformation effects in NIR spectra caused by:
  • Detector sensitivity roll-off at wavelength boundaries

  • Stray light effects (more pronounced at edges)

  • Truncated absorption bands outside measurement range

  • Baseline curvature concentrated at edges

has_edge_artifacts

Whether significant edge artifacts are detected.

Type:

bool

has_detector_rolloff

Whether detector roll-off effects are present.

Type:

bool

has_stray_light

Whether stray light effects are detected.

Type:

bool

has_truncated_peaks

Whether truncated peaks at boundaries are present.

Type:

bool

has_edge_curvature

Whether edge curvature/bending is detected.

Type:

bool

left_edge_intensity

Relative intensity change at left edge.

Type:

float

right_edge_intensity

Relative intensity change at right edge.

Type:

float

edge_noise_ratio

Ratio of edge noise to center noise.

Type:

float

detector_model

Suggested detector model based on characteristics.

Type:

str

stray_light_fraction

Estimated stray light fraction.

Type:

float

curvature_type

Detected curvature type (“smile”, “frown”, “asymmetric”); defaults to “none”.

Type:

str

boundary_peak_amplitudes

Estimated truncated peak amplitudes at edges.

Type:

Tuple[float, float]

References

  • JASCO (2020). Advantages of high-sensitivity InGaAs detector.

  • Applied Optics (1975). Resolution and stray light in NIR spectroscopy.

  • Burns & Ciurczak (2007). Handbook of Near-Infrared Analysis.

boundary_peak_amplitudes: Tuple[float, float] = (0.0, 0.0)
curvature_type: str = 'none'
detector_model: str = 'generic_nir'
edge_noise_ratio: float = 1.0
has_detector_rolloff: bool = False
has_edge_artifacts: bool = False
has_edge_curvature: bool = False
has_stray_light: bool = False
has_truncated_peaks: bool = False
left_edge_intensity: float = 0.0
right_edge_intensity: float = 0.0
stray_light_fraction: float = 0.0
class nirs4all.synthesis.fitter.EnvironmentalInference(estimated_temperature_variation: float = 0.0, has_temperature_effects: bool = False, estimated_moisture_variation: float = 0.0, has_moisture_effects: bool = False, water_band_shift: float = 0.0)[source]

Bases: object

Results of environmental effects inference.

estimated_temperature_variation

Estimated temperature variation (°C).

Type:

float

has_temperature_effects

Whether temperature effects are detectable.

Type:

bool

estimated_moisture_variation

Estimated moisture variation.

Type:

float

has_moisture_effects

Whether moisture effects are detectable.

Type:

bool

water_band_shift

Detected shift in water bands (nm).

Type:

float

estimated_moisture_variation: float = 0.0
estimated_temperature_variation: float = 0.0
has_moisture_effects: bool = False
has_temperature_effects: bool = False
water_band_shift: float = 0.0
class nirs4all.synthesis.fitter.FittedParameters(wavelength_start: float = 1000.0, wavelength_end: float = 2500.0, wavelength_step: float = 2.0, global_slope_mean: float = 0.0, global_slope_std: float = 0.02, noise_base: float = 0.001, noise_signal_dep: float = 0.005, path_length_std: float = 0.05, baseline_amplitude: float = 0.02, scatter_alpha_std: float = 0.05, scatter_beta_std: float = 0.01, tilt_std: float = 0.01, complexity: str = 'realistic', source_name: str = '', source_properties: SpectralProperties | None = None, inferred_instrument: str = 'unknown', instrument_inference: InstrumentInference | None = None, measurement_mode: str = 'transmittance', measurement_mode_confidence: float = 0.0, inferred_domain: str = 'unknown', domain_inference: DomainInference | None = None, environmental_inference: EnvironmentalInference | None = None, temperature_config: Dict[str, Any] = <factory>, moisture_config: Dict[str, Any] = <factory>, scattering_inference: ScatteringInference | None = None, particle_size_config: Dict[str, Any] = <factory>, emsc_config: Dict[str, Any] = <factory>, edge_artifact_inference: EdgeArtifactInference | None = None, edge_artifacts_config: Dict[str, Any] = <factory>, boundary_components_config: Dict[str, Any] = <factory>, preprocessing_inference: PreprocessingInference | None = None, preprocessing_type: str = 'raw_absorbance', is_preprocessed: bool = False, detected_components: List[str] = <factory>, suggested_n_components: int = 5)[source]

Bases: object

Parameters fitted from real data for synthetic generation.

This dataclass contains all parameters needed to configure a SyntheticNIRSGenerator to produce spectra similar to a real dataset, including Phase 1-4 enhanced features.

# Basic wavelength grid
wavelength_start

Start wavelength (nm).

Type:

float

wavelength_end

End wavelength (nm).

Type:

float

wavelength_step

Wavelength step (nm).

Type:

float

# Slope and baseline parameters
global_slope_mean

Mean global slope.

Type:

float

global_slope_std

Slope standard deviation.

Type:

float

baseline_amplitude

Baseline drift amplitude.

Type:

float

# Noise parameters
noise_base

Base noise level.

Type:

float

noise_signal_dep

Signal-dependent noise factor.

Type:

float

# Scatter parameters
path_length_std

Path length variation.

Type:

float

scatter_alpha_std

Multiplicative scatter std.

Type:

float

scatter_beta_std

Additive scatter std.

Type:

float

tilt_std

Spectral tilt standard deviation.

Type:

float

# Complexity
complexity

Suggested complexity level.

Type:

str

# Source metadata
source_name

Name of source dataset.

Type:

str

source_properties

Full SpectralProperties of source.

Type:

nirs4all.synthesis.fitter.SpectralProperties | None

# Phase 1-4 Enhanced Parameters
# Instrument inference
inferred_instrument

Inferred instrument archetype.

Type:

str

instrument_inference

Full instrument inference result.

Type:

nirs4all.synthesis.fitter.InstrumentInference | None

# Measurement mode
measurement_mode

Inferred measurement mode.

Type:

str

measurement_mode_confidence

Confidence of inference.

Type:

float

# Domain inference
inferred_domain

Inferred application domain.

Type:

str

domain_inference

Full domain inference result.

Type:

nirs4all.synthesis.fitter.DomainInference | None

# Environmental effects
environmental_inference

Environmental effects inference.

Type:

nirs4all.synthesis.fitter.EnvironmentalInference | None

temperature_config

Suggested temperature config parameters.

Type:

Dict[str, Any]

moisture_config

Suggested moisture config parameters.

Type:

Dict[str, Any]

# Scattering effects
scattering_inference

Scattering effects inference.

Type:

nirs4all.synthesis.fitter.ScatteringInference | None

particle_size_config

Suggested particle size config parameters.

Type:

Dict[str, Any]

emsc_config

Suggested EMSC config parameters.

Type:

Dict[str, Any]

# Detected components for procedural generation
detected_components

List of detected/inferred component names.

Type:

List[str]

suggested_n_components

Suggested number of components.

Type:

int

baseline_amplitude: float = 0.02
boundary_components_config: Dict[str, Any]
complexity: str = 'realistic'
detected_components: List[str]
domain_inference: DomainInference | None = None
edge_artifact_inference: EdgeArtifactInference | None = None
edge_artifacts_config: Dict[str, Any]
emsc_config: Dict[str, Any]
environmental_inference: EnvironmentalInference | None = None
classmethod from_dict(data: Dict[str, Any]) FittedParameters[source]

Create FittedParameters from a dictionary.

Parameters:

data – Dictionary with parameter values.

Returns:

FittedParameters instance.

global_slope_mean: float = 0.0
global_slope_std: float = 0.02
inferred_domain: str = 'unknown'
inferred_instrument: str = 'unknown'
instrument_inference: InstrumentInference | None = None
is_preprocessed: bool = False
classmethod load(path: str) FittedParameters[source]

Load parameters from JSON file.

Parameters:

path – Input file path.

Returns:

FittedParameters instance.

measurement_mode: str = 'transmittance'
measurement_mode_confidence: float = 0.0
moisture_config: Dict[str, Any]
noise_base: float = 0.001
noise_signal_dep: float = 0.005
particle_size_config: Dict[str, Any]
path_length_std: float = 0.05
preprocessing_inference: PreprocessingInference | None = None
preprocessing_type: str = 'raw_absorbance'
save(path: str) None[source]

Save parameters to JSON file.

Parameters:

path – Output file path.
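A JSON round trip of this kind can be sketched with the standard library alone. `ToyParameters` below is a hypothetical stand-in carrying only a few of the documented fields; the real class also holds nested inference objects, which would need their own conversion to and from plain dictionaries.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ToyParameters:
    """Hypothetical stand-in with a few of the documented fields."""

    wavelength_start: float = 1000.0
    wavelength_end: float = 2500.0
    complexity: str = "realistic"
    temperature_config: Dict[str, Any] = field(default_factory=dict)

    def save(self, path: str) -> None:
        # asdict() recursively converts the dataclass to JSON-friendly types.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2)

    @classmethod
    def load(cls, path: str) -> "ToyParameters":
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))


original = ToyParameters(wavelength_start=1100.0, temperature_config={"variation": 2.5})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    original.save(path)
    restored = ToyParameters.load(path)
```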

scatter_alpha_std: float = 0.05
scatter_beta_std: float = 0.01
scattering_inference: ScatteringInference | None = None
source_name: str = ''
source_properties: SpectralProperties | None = None
suggested_n_components: int = 5
summary() str[source]

Generate a human-readable summary of fitted parameters.

Returns:

Multi-line summary string.

temperature_config: Dict[str, Any]
tilt_std: float = 0.01
to_dict() Dict[str, Any][source]

Convert all parameters to a dictionary.

Returns:

Dictionary with all parameter values.

to_full_config() Dict[str, Any][source]

Convert all fitted parameters to a comprehensive configuration.

This includes all Phase 1-4 parameters for complete synthetic data generation matching the source dataset.

Returns:

Dictionary with all configuration parameters.

Example

>>> params = fitter.fit(X_real)
>>> config = params.to_full_config()
>>> # Use with builder pattern or advanced configuration
to_generator_kwargs() Dict[str, Any][source]

Convert fitted parameters to kwargs for SyntheticNIRSGenerator.

Returns:

Dictionary of keyword arguments.

Example

>>> params = fitter.fit(X_real)
>>> generator = SyntheticNIRSGenerator(**params.to_generator_kwargs())
wavelength_end: float = 2500.0
wavelength_start: float = 1000.0
wavelength_step: float = 2.0
class nirs4all.synthesis.fitter.ForwardModelFitter(components: List['SpectralComponent'], canonical_grid: np.ndarray, target_grid: np.ndarray, baseline_order: int = 4, wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0), ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0), path_length_bounds: Tuple[float, float] = (0.5, 2.0))[source]

Bases: object

Variable projection fitter for physical forward model.

Fits a physical mixture model to observed spectra by separating:
  • Linear params: concentrations, baseline coefficients (solved via NNLS/lsq)

  • Nonlinear params: wl_shift, ils_sigma, path_length (solved via optimization)

This approach is numerically stable and physically interpretable.
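The split between linear and nonlinear parameters can be illustrated with a deliberately small variable projection sketch. Only a wavelength shift is treated as nonlinear, the inner solve uses unconstrained least squares, and the outer search is a plain grid scan; the band shapes and all function names are assumptions for this example.

```python
import numpy as np

wl = np.arange(1000.0, 2500.0, 2.0)


def design_matrix(shift):
    """Two hypothetical bands shifted by `shift` nm, plus a constant baseline."""
    return np.column_stack([
        np.exp(-0.5 * ((wl - 1450.0 - shift) / 40.0) ** 2),
        np.exp(-0.5 * ((wl - 1940.0 - shift) / 50.0) ** 2),
        np.ones_like(wl),
    ])


def projected_residual(shift, y):
    """Solve the linear parameters for a fixed shift; return (sse, coefficients)."""
    A = design_matrix(shift)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y) ** 2)), coef


# Synthetic target generated with a 2 nm shift.
y = design_matrix(2.0) @ np.array([0.7, 0.4, 0.05])

# Outer problem: scan the nonlinear parameter within its bounds.
shifts = np.linspace(-5.0, 5.0, 101)
sse = [projected_residual(s, y)[0] for s in shifts]
best_shift = float(shifts[int(np.argmin(sse))])
_, best_coef = projected_residual(best_shift, y)
```

The outer search only ever sees one to three parameters, while the many linear coefficients are solved exactly at each step, which is where the numerical stability comes from.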

components

List of SpectralComponent objects.

Type:

List[‘SpectralComponent’]

canonical_grid

High-resolution canonical wavelength grid.

Type:

np.ndarray

target_grid

Target wavelength grid (dataset grid).

Type:

np.ndarray

baseline_order

Number of Chebyshev baseline terms.

Type:

int

wl_shift_bounds

Bounds for wavelength shift parameter.

Type:

Tuple[float, float]

ils_sigma_bounds

Bounds for ILS sigma parameter.

Type:

Tuple[float, float]

path_length_bounds

Bounds for path length parameter.

Type:

Tuple[float, float]

Example

>>> import numpy as np
>>> from nirs4all.synthesis._constants import get_predefined_components
>>> components = [get_predefined_components()[n] for n in ['water', 'protein']]
>>> fitter = ForwardModelFitter(
...     components=components,
...     canonical_grid=np.linspace(400, 2500, 4200),
...     target_grid=dataset_wavelengths,
... )
>>> result = fitter.fit(spectrum)
>>> print(f"R² = {result['r_squared']:.4f}")
__post_init__()[source]

Pre-compute component spectra on canonical grid.

baseline_order: int = 4
canonical_grid: np.ndarray
components: List['SpectralComponent']
fit(y: ndarray, initial_guess: ndarray | None = None) Dict[str, Any][source]

Fit forward model to target spectrum.

Parameters:
  • y – Target spectrum.

  • initial_guess – Initial [wl_shift, ils_sigma, path_length].

Returns:

Dict with fitted parameters:
  • r_squared: Coefficient of determination

  • fitted: Fitted spectrum

  • residuals: Fitting residuals

  • concentrations: Fitted component concentrations

  • baseline_coeffs: Fitted baseline coefficients

  • wl_shift, ils_sigma, path_length: Instrument params

ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0)
path_length_bounds: Tuple[float, float] = (0.5, 2.0)
target_grid: np.ndarray
wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0)
class nirs4all.synthesis.fitter.InstrumentChain(wl_shift: float = 0.0, wl_stretch: float = 1.0, ils_sigma: float = 4.0, stray_light: float = 0.001, gain: float = 1.0, offset: float = 0.0)[source]

Bases: object

Forward instrument chain: canonical grid → dataset grid.

Applies the complete measurement chain to transform a high-resolution physical spectrum to the observed instrument grid.

Chain:
  1. Wavelength warp (shift + stretch)

  2. ILS convolution (Gaussian smoothing)

  3. Stray light / gain / offset

  4. Resample to target grid
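The four steps above can be sketched as a single function. Every formula here is an assumption chosen for illustration, in particular the stretch about the grid centre and the way stray light is mixed into transmittance; the class itself may implement these steps differently.

```python
import numpy as np


def apply_chain(spectrum, canonical_wl, target_wl, wl_shift=0.0, wl_stretch=1.0,
                ils_sigma=4.0, stray_light=0.001, gain=1.0, offset=0.0):
    """Toy forward chain on an absorbance spectrum (uniform canonical grid assumed)."""
    # 1. Wavelength warp: features move to stretched and shifted positions.
    center = canonical_wl.mean()
    warped_wl = center + wl_stretch * (canonical_wl - center) + wl_shift
    warped = np.interp(canonical_wl, warped_wl, spectrum)

    # 2. ILS convolution with a normalised Gaussian kernel (sigma in nm).
    step = canonical_wl[1] - canonical_wl[0]
    half = int(np.ceil(4 * ils_sigma / step))
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / ils_sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(warped, half, mode="edge")  # avoid droop at the edges
    smoothed = np.convolve(padded, kernel, mode="valid")

    # 3. Stray light mixes into transmittance, then gain and offset are applied.
    transmittance = 10.0 ** (-smoothed)
    transmittance = (transmittance + stray_light) / (1.0 + stray_light)
    observed = gain * (-np.log10(transmittance)) + offset

    # 4. Resample to the instrument grid.
    return np.interp(target_wl, canonical_wl, observed)


canonical_wl = np.arange(900.0, 2600.0, 0.5)
target_wl = np.arange(1000.0, 2500.0, 2.0)
phys = 0.8 * np.exp(-0.5 * ((canonical_wl - 1450.0) / 20.0) ** 2)
obs = apply_chain(phys, canonical_wl, target_wl, wl_shift=2.0, ils_sigma=5.0)
```

In this toy run the band moves by the requested shift and its peak is lowered by the ILS smoothing.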

wl_shift

Wavelength shift in nm.

Type:

float

wl_stretch

Wavelength scale factor.

Type:

float

ils_sigma

Instrument line shape Gaussian sigma in nm.

Type:

float

stray_light

Stray light fraction.

Type:

float

gain

Photometric gain.

Type:

float

offset

Photometric offset.

Type:

float

Example

>>> chain = InstrumentChain(wl_shift=2.0, ils_sigma=5.0)
>>> spectrum_obs = chain.apply(spectrum_phys, canonical_wl, target_wl)
apply(spectrum: ndarray, canonical_wl: ndarray, target_wl: ndarray) ndarray[source]

Apply full instrument chain.

Parameters:
  • spectrum – Input spectrum on canonical grid.

  • canonical_wl – Canonical wavelength grid (nm).

  • target_wl – Target wavelength grid (nm).

Returns:

Transformed spectrum on target grid.

gain: float = 1.0
ils_sigma: float = 4.0
offset: float = 0.0
stray_light: float = 0.001
wl_shift: float = 0.0
wl_stretch: float = 1.0
class nirs4all.synthesis.fitter.InstrumentInference(archetype_name: str = 'unknown', detector_type: str = 'unknown', wavelength_range: Tuple[float, float]=(1000.0, 2500.0), estimated_resolution: float = 8.0, confidence: float = 0.0, alternative_archetypes: Dict[str, float]=<factory>)[source]

Bases: object

Results of instrument archetype inference.

archetype_name

Best matching instrument archetype name.

Type:

str

detector_type

Inferred detector type.

Type:

str

wavelength_range

Detected wavelength range.

Type:

Tuple[float, float]

estimated_resolution

Estimated spectral resolution (nm).

Type:

float

confidence

Confidence score (0-1).

Type:

float

alternative_archetypes

Other possible archetypes with scores.

Type:

Dict[str, float]

alternative_archetypes: Dict[str, float]
archetype_name: str = 'unknown'
confidence: float = 0.0
detector_type: str = 'unknown'
estimated_resolution: float = 8.0
wavelength_range: Tuple[float, float] = (1000.0, 2500.0)
class nirs4all.synthesis.fitter.MeasurementModeInference(value)[source]

Bases: str, Enum

Inferred measurement mode from spectral analysis.

ATR = 'atr'
REFLECTANCE = 'reflectance'
TRANSFLECTANCE = 'transflectance'
TRANSMITTANCE = 'transmittance'
UNKNOWN = 'unknown'
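Because the enum derives from both `str` and `Enum`, its members compare equal to plain strings, which is convenient when a mode is read from configuration. The sketch below mirrors the documented members in a hypothetical `Mode` class; `parse_mode` is an illustrative helper, not part of the module.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical mirror of the documented members."""

    ATR = "atr"
    REFLECTANCE = "reflectance"
    TRANSFLECTANCE = "transflectance"
    TRANSMITTANCE = "transmittance"
    UNKNOWN = "unknown"


def parse_mode(text: str) -> Mode:
    """Map free text to a member, falling back to UNKNOWN."""
    try:
        return Mode(text.strip().lower())
    except ValueError:
        return Mode.UNKNOWN


mode = parse_mode("Reflectance")
is_plain_string_match = mode == "reflectance"  # True thanks to the str mixin
fallback = parse_mode("raman")
```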
class nirs4all.synthesis.fitter.OperatorVarianceParams(noise_std: float = 0.001, offset_std: float = 0.01, slope_std: float = 0.001, curvature_std: float = 0.0001, mult_scatter_std: float = 0.05)[source]

Bases: object

Parameters for operator-based variance modeling.

Models spectral variation as independent physical sources:
  • High-frequency noise (detector noise)

  • Baseline offset/slope/curvature (instrumental drift, scattering)

  • Multiplicative scatter (sample thickness, optical path variation)
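One way to read the independent sources above is as separate random operators applied to a mean spectrum. The sketch below draws each source from a zero-mean Gaussian using the documented default standard deviations; the functional forms (for example, curvature as a quadratic in wavelength) and the function name are assumptions.

```python
import numpy as np


def sample_operator_variation(mean_spectrum, wl, n_samples, rng, noise_std=0.001,
                              offset_std=0.01, slope_std=0.001, curvature_std=0.0001,
                              mult_scatter_std=0.05):
    """Draw spectra by perturbing a mean spectrum with independent operators."""
    x = (wl - wl.mean()) / 1000.0  # slope expressed per 1000 nm
    shape = (n_samples, 1)
    scatter = 1.0 + rng.normal(0.0, mult_scatter_std, shape)   # multiplicative scatter
    offset = rng.normal(0.0, offset_std, shape)                # baseline offset
    slope = rng.normal(0.0, slope_std, shape) * x              # baseline slope
    curvature = rng.normal(0.0, curvature_std, shape) * x**2   # baseline curvature
    noise = rng.normal(0.0, noise_std, (n_samples, wl.size))   # detector noise
    return scatter * mean_spectrum + offset + slope + curvature + noise


wl = np.arange(1000.0, 2500.0, 2.0)
mean_spectrum = 0.5 + 0.3 * np.exp(-0.5 * ((wl - 1940.0) / 50.0) ** 2)
X_sim = sample_operator_variation(mean_spectrum, wl, 200, np.random.default_rng(0))
```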

noise_std

Standard deviation of high-frequency noise.

Type:

float

offset_std

Standard deviation of baseline offset.

Type:

float

slope_std

Standard deviation of baseline slope (per 1000 nm).

Type:

float

curvature_std

Standard deviation of baseline curvature.

Type:

float

mult_scatter_std

Standard deviation of multiplicative scatter.

Type:

float

curvature_std: float = 0.0001
mult_scatter_std: float = 0.05
noise_std: float = 0.001
offset_std: float = 0.01
slope_std: float = 0.001
to_dict() Dict[str, float][source]

Convert to dictionary.

class nirs4all.synthesis.fitter.OptimizedComponentFitter(wavelengths: ndarray | None = None, priority_categories: List[str] | None = None, max_components: int = 10, baseline_order: int = 4, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, sg_window_length: int = 15, sg_polyorder: int = 3, regularization: float = 1e-06, smooth_sigma_nm: float = 30.0, use_nnls: bool = False)[source]

Bases: object

Optimize component selection using greedy search with category prioritization.

Unlike ComponentFitter, which fits all components simultaneously with NNLS, this class uses a greedy forward selection approach that:

  1. Starts with baseline-only fit

  2. Greedily adds components from priority categories (low threshold)

  3. Fills remaining slots from other categories (higher threshold)

  4. Applies swap refinement to escape local optima

This approach is intended to give better fits for real-world data by:
  • Avoiding overfitting to spurious components

  • Respecting domain knowledge (e.g., protein for dairy, starch for grains)

  • Allowing both positive and negative coefficients (OLS rather than NNLS by default; see the use_nnls parameter)
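The greedy selection described above can be sketched as follows. This is a simplified illustration under several assumptions: the acceptance criterion is taken to be the gain in R², components are hypothetical Gaussian bands, priority is given by name rather than by category, and the swap refinement step is omitted.

```python
import numpy as np


def r_squared(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    return 1.0 - ss_res / np.sum((y - y.mean()) ** 2)


def greedy_select(library, y, baseline, priority, max_components=3,
                  priority_gain=1e-4, other_gain=5e-3):
    """Forward selection: priority names first (low bar), then the rest (higher bar)."""
    selected, best = [], r_squared(baseline, y)  # start from the baseline-only fit
    others = [name for name in library if name not in priority]
    for pool, min_gain in ((priority, priority_gain), (others, other_gain)):
        while len(selected) < max_components:
            scores = {}
            for name in pool:
                if name in selected:
                    continue
                cols = [library[s] for s in selected + [name]]
                scores[name] = r_squared(np.column_stack([baseline] + cols), y)
            if not scores:
                break
            name = max(scores, key=scores.get)
            if scores[name] - best < min_gain:
                break  # best remaining candidate does not improve the fit enough
            selected.append(name)
            best = scores[name]
    return selected, best


wl = np.arange(1000.0, 2500.0, 2.0)


def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


library = {"water": band(1940.0, 50.0), "protein": band(2180.0, 40.0),
           "starch": band(2100.0, 45.0), "lipid": band(1725.0, 30.0)}
x = (wl - wl.mean()) / (wl.max() - wl.min())
baseline = np.column_stack([np.ones_like(wl), x, x**2])
y = 0.1 + 0.05 * x + 0.8 * library["water"] + 0.3 * library["protein"]
selected, r2 = greedy_select(library, y, baseline, priority=["water", "protein"])
```

Once the two true components are in the model, neither remaining candidate clears the higher threshold, so selection stops before `max_components` is reached.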

Example

>>> from nirs4all.synthesis import OptimizedComponentFitter
>>>
>>> # Create fitter for grain analysis
>>> fitter = OptimizedComponentFitter(
...     wavelengths=wavelengths,
...     priority_categories=['carbohydrates', 'proteins', 'water_related'],
...     max_components=10,
... )
>>> result = fitter.fit(spectrum)
>>> print(result.summary())
wavelengths

Wavelength grid for fitting.

priority_categories

Categories to prioritize in component selection.

max_components

Maximum number of components to select.

baseline_order

Polynomial order for baseline (default 4).

preprocessing

Preprocessing to apply to components.

auto_detect_preprocessing

Auto-detect preprocessing from data.

detected_preprocessing: PreprocessingType | None
fit(spectrum: ndarray) OptimizedFitResult[source]

Fit components to a spectrum using greedy category-prioritized selection.

The algorithm:
  1. Starts with baseline-only fit

  2. Greedily adds components from priority categories (very low threshold: 0.0001)

  3. Fills remaining slots from other categories (higher threshold: 0.005)

  4. Applies swap refinement (prefers swapping in priority components)

Parameters:

spectrum – Observed spectrum, shape (n_wavelengths,).

Returns:

OptimizedFitResult with fit results.

class nirs4all.synthesis.fitter.OptimizedFitResult(component_names: List[str], concentrations: ndarray, baseline_coefficients: ndarray | None, fitted_spectrum: ndarray, residuals: ndarray, r_squared: float, rmse: float, n_components: int, n_priority_components: int, baseline_r_squared: float, wavelengths: ndarray)[source]

Bases: object

Result from optimized greedy component fitting.

component_names

Names of selected components (in order of selection).

Type:

List[str]

concentrations

Fitted concentrations for each component.

Type:

numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients.

Type:

numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum from fit.

Type:

numpy.ndarray

residuals

Fit residuals.

Type:

numpy.ndarray

r_squared

Coefficient of determination.

Type:

float

rmse

Root mean squared error.

Type:

float

n_components

Number of components selected.

Type:

int

n_priority_components

Number of components from priority categories.

Type:

int

baseline_r_squared

R² from baseline-only fit (for comparison).

Type:

float

wavelengths

Wavelength grid used for fitting.

Type:

numpy.ndarray

baseline_coefficients: ndarray | None
baseline_r_squared: float
component_names: List[str]
concentrations: ndarray
fitted_spectrum: ndarray
n_components: int
n_priority_components: int
r_squared: float
residuals: ndarray
rmse: float
summary() str[source]

Return human-readable summary.

top_components(n: int = 5, threshold: float = 0.001) List[Tuple[str, float]][source]

Get top components by concentration.

wavelengths: ndarray
class nirs4all.synthesis.fitter.PCAVarianceParams(n_components: int = 5, explained_variance_ratio: ndarray | None = None, score_means: ndarray | None = None, score_stds: ndarray | None = None, components: ndarray | None = None, mean_spectrum: ndarray | None = None)[source]

Bases: object

Parameters for PCA-based variance modeling.

Models spectral variation using principal component score distributions.

n_components

Number of PCA components.

Type:

int

explained_variance_ratio

Explained variance per component.

Type:

numpy.ndarray | None

score_means

Mean of PC scores.

Type:

numpy.ndarray | None

score_stds

Std of PC scores.

Type:

numpy.ndarray | None

components

PCA loading vectors (n_components, n_wavelengths).

Type:

numpy.ndarray | None

mean_spectrum

Mean spectrum from PCA.

Type:

numpy.ndarray | None

components: ndarray | None = None
explained_variance_ratio: ndarray | None = None
mean_spectrum: ndarray | None = None
n_components: int = 5
score_means: ndarray | None = None
score_stds: ndarray | None = None
class nirs4all.synthesis.fitter.PreprocessingInference(preprocessing_type: PreprocessingType = PreprocessingType.RAW_ABSORBANCE, confidence: float = 0.0, is_preprocessed: bool = False, global_mean: float = 0.0, global_range: Tuple[float, float] = (0.0, 1.0), zero_crossing_ratio: float = 0.0, per_sample_std_variation: float = 0.0, oscillation_frequency: float = 0.0, suggested_inverse: str | None = None)[source]

Bases: object

Results of preprocessing type inference.

Detects whether spectral data has been preprocessed (derivatives, normalization, centering, etc.) before being provided to the fitter.

This is crucial for generating synthetic data that matches the real data distribution: synthetic spectra should be generated as raw absorbance, and the same preprocessing should then be applied to them.

preprocessing_type

Detected preprocessing type.

Type:

nirs4all.synthesis.fitter.PreprocessingType

confidence

Confidence score (0-1).

Type:

float

is_preprocessed

Whether data appears to be preprocessed.

Type:

bool

global_mean

Mean value (0 suggests centering/derivatives).

Type:

float

global_range

(min, max) value range.

Type:

Tuple[float, float]

zero_crossing_ratio

Ratio of zero crossings (high for derivatives).

Type:

float

per_sample_std_variation

Variation in per-sample std (low for SNV).

Type:

float

oscillation_frequency

Spectral oscillation frequency (high for second derivatives).

Type:

float

suggested_inverse

Suggested inverse operation to recover raw data.

Type:

str | None

confidence: float = 0.0
global_mean: float = 0.0
global_range: Tuple[float, float] = (0.0, 1.0)
is_preprocessed: bool = False
oscillation_frequency: float = 0.0
per_sample_std_variation: float = 0.0
preprocessing_type: PreprocessingType = 'raw_absorbance'
suggested_inverse: str | None = None
zero_crossing_ratio: float = 0.0
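
As a rough illustration of the diagnostics listed above, the sketch below computes a zero-crossing ratio and the relative variation of per-sample standard deviations with NumPy. The function name `preprocessing_diagnostics` and the exact formulas are assumptions; the library's own definitions may differ.

```python
import numpy as np


def preprocessing_diagnostics(X):
    """Hypothetical diagnostics for a spectra matrix (n_samples, n_wavelengths)."""
    X = np.asarray(X, dtype=float)
    signs = np.sign(X)
    # A crossing is a sign change between neighbouring points.
    crossings = np.sum(signs[:, 1:] * signs[:, :-1] < 0, axis=1)
    per_sample_std = X.std(axis=1)
    return {
        "global_mean": float(X.mean()),
        "global_range": (float(X.min()), float(X.max())),
        "zero_crossing_ratio": float(np.mean(crossings / (X.shape[1] - 1))),
        # Coefficient of variation of the per-sample std (near 0 after SNV).
        "per_sample_std_variation": float(
            per_sample_std.std() / (per_sample_std.mean() + 1e-12)
        ),
    }


rng = np.random.default_rng(0)
wl = np.linspace(1000, 2500, 200)
base = 0.5 + 0.3 * np.exp(-0.5 * ((wl - 1940) / 60) ** 2)
raw = rng.uniform(0.5, 1.5, size=(20, 1)) * base + rng.normal(0, 1e-3, (20, 200))
# SNV: centre and scale each spectrum individually.
snv = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, keepdims=True)

raw_diag = preprocessing_diagnostics(raw)
snv_diag = preprocessing_diagnostics(snv)
print(raw_diag["zero_crossing_ratio"], snv_diag["per_sample_std_variation"])
```

Strictly positive raw absorbance has no zero crossings, while SNV-corrected spectra share a unit standard deviation, which is why the two indicators separate these cases.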
class nirs4all.synthesis.fitter.PreprocessingType(value)[source]

Bases: str, Enum

Detected preprocessing type of spectral data.

FIRST_DERIVATIVE = 'first_derivative'
MEAN_CENTERED = 'mean_centered'
MSC_CORRECTED = 'msc_corrected'
NORMALIZED = 'normalized'
RAW_ABSORBANCE = 'raw_absorbance'
RAW_REFLECTANCE = 'raw_reflectance'
SECOND_DERIVATIVE = 'second_derivative'
SNV_CORRECTED = 'snv_corrected'
UNKNOWN = 'unknown'
class nirs4all.synthesis.fitter.RealBandFitResult(band_names: List[str], band_centers: ndarray, amplitudes: ndarray, sigmas: ndarray, baseline_coefficients: ndarray, fitted_spectrum: ndarray, residuals: ndarray, r_squared: float, rmse: float, n_bands: int, wavelengths: ndarray, band_assignments: List[Any] = <factory>)[source]

Bases: object

Result from real band fitting using known NIR band assignments.

band_names

Names of fitted bands (e.g., “O-H/1st”, “C-H/combination”).

Type:

List[str]

band_centers

Fixed center wavelengths from NIR_BANDS.

Type:

numpy.ndarray

amplitudes

Fitted amplitudes for each band.

Type:

numpy.ndarray

sigmas

Sigma values (within constrained ranges).

Type:

numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients.

Type:

numpy.ndarray

fitted_spectrum

Reconstructed spectrum from fit.

Type:

numpy.ndarray

residuals

Fit residuals.

Type:

numpy.ndarray

r_squared

Coefficient of determination.

Type:

float

rmse

Root mean squared error.

Type:

float

n_bands

Number of bands used.

Type:

int

wavelengths

Wavelength grid used for fitting.

Type:

numpy.ndarray

band_assignments

Original BandAssignment objects.

Type:

List[Any]

amplitudes: ndarray
band_assignments: List[Any]
band_centers: ndarray
band_names: List[str]
baseline_coefficients: ndarray
fitted_spectrum: ndarray
n_bands: int
r_squared: float
residuals: ndarray
rmse: float
sigmas: ndarray
summary() str[source]

Return human-readable summary.

top_bands(n: int = 10, threshold: float = 0.001) List[Tuple[str, float, float]][source]

Get top bands by amplitude. Returns a list of (name, center, amplitude) tuples.

wavelengths: ndarray
class nirs4all.synthesis.fitter.RealBandFitter(baseline_order: int = 4, max_bands: int = 50, target_r2: float = 0.98, allow_sigma_variation: bool = True, sigma_margin: float = 0.3, n_iterations: int = 3)[source]

Bases: object

Fit spectra using REAL NIR band assignments from the _bands.py dictionary.

Unlike pure Gaussian band fitting, which optimizes band centers freely, this class uses:

  • Fixed band centers from known spectroscopic literature assignments

  • Constrained sigma values based on typical ranges for each band type

  • Only amplitude optimization (more physically interpretable)

This provides spectroscopically meaningful decomposition that can be linked back to functional groups (O-H, C-H, N-H, etc.) and overtone levels.

Example

>>> from nirs4all.synthesis import RealBandFitter
>>>
>>> fitter = RealBandFitter(baseline_order=4, max_bands=40)
>>> result = fitter.fit(spectrum, wavelengths)
>>> print(result.summary())
>>>
>>> # See which functional groups contribute
>>> for name, center, amp in result.top_bands(10):
...     print(f"{center:.0f} nm: {name} (amplitude={amp:.4f})")
baseline_order

Polynomial baseline order.

max_bands

Maximum number of bands to use.

target_r2

Target R² for iterative refinement.

allow_sigma_variation

Allow sigma to vary within literature ranges.

sigma_margin

Fraction by which sigma may deviate from the midpoint of its range (0.3 = ±30%).

fit(spectrum: ndarray, wavelengths: ndarray) RealBandFitResult[source]

Fit spectrum using real NIR band positions.

Parameters:
  • spectrum – Target spectrum to fit, shape (n_wavelengths,).

  • wavelengths – Wavelengths in nm, shape (n_wavelengths,).

Returns:

RealBandFitResult with fit results and band assignments.
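
The idea of fixing band centers and widths and solving only for amplitudes can be sketched as a linear least-squares problem. The code below is an illustrative assumption, not the class's actual algorithm: `fit_fixed_bands` is a hypothetical helper, the three band centers are approximate textbook positions rather than entries read from NIR_BANDS, and sigma variation and iterative refinement are left out.

```python
import numpy as np


def fit_fixed_bands(spectrum, wavelengths, centers, sigmas, baseline_order=2):
    """Least-squares amplitudes for Gaussian bands with fixed centers and sigmas."""
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Scale the wavelength axis so the polynomial baseline is well conditioned.
    x = (wavelengths - wavelengths.mean()) / (wavelengths.max() - wavelengths.min())
    bands = np.exp(-0.5 * ((wavelengths[:, None] - centers[None, :]) / sigmas) ** 2)
    baseline = np.vander(x, baseline_order + 1, increasing=True)
    A = np.hstack([bands, baseline])
    coef, *_ = np.linalg.lstsq(A, spectrum, rcond=None)
    fitted = A @ coef
    residuals = spectrum - fitted
    ss_tot = np.sum((spectrum - spectrum.mean()) ** 2)
    return {
        "amplitudes": coef[: len(centers)],
        "baseline_coefficients": coef[len(centers):],
        "fitted_spectrum": fitted,
        "r_squared": float(1.0 - np.sum(residuals ** 2) / ss_tot),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
    }


wl = np.linspace(1100, 2500, 300)
centers = [1450.0, 1725.0, 1940.0]   # illustrative O-H and C-H band positions (nm)
sigmas = [35.0, 25.0, 40.0]          # illustrative widths (nm)
true_amps = np.array([0.5, 0.2, 0.9])
x = (wl - wl.mean()) / (wl.max() - wl.min())
spectrum = 0.1 + 0.05 * x + sum(
    a * np.exp(-0.5 * ((wl - c) / s) ** 2)
    for a, c, s in zip(true_amps, centers, sigmas)
)

result = fit_fixed_bands(spectrum, wl, centers, sigmas)
print(np.round(result["amplitudes"], 3), round(result["r_squared"], 6))
```

Because only the amplitudes and baseline coefficients are unknown, the problem stays linear, which is what makes each fitted amplitude directly attributable to one band.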

class nirs4all.synthesis.fitter.RealDataFitter[source]

Bases: object

Fit generator parameters to match real dataset properties.

This class analyzes real NIRS data and estimates parameters for the SyntheticNIRSGenerator to produce similar spectra. Includes Phase 1-4 enhanced inference for instruments, domains, and effects.

source_properties

SpectralProperties of the analyzed data.

Type:

nirs4all.synthesis.fitter.SpectralProperties | None

fitted_params

FittedParameters after fitting.

Type:

nirs4all.synthesis.fitter.FittedParameters | None

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Access inferred characteristics
>>> print(f"Instrument: {params.inferred_instrument}")
>>> print(f"Domain: {params.inferred_domain}")
>>>
>>> # Create matched generator
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)
apply_matching_preprocessing(X: ndarray, *, window_length: int = 15, polyorder: int = 2) ndarray[source]

Apply preprocessing to match the detected preprocessing of real data.

If the real data was detected as preprocessed (e.g., second derivative), this method applies the same preprocessing to synthetic raw absorbance spectra so they match the real data distribution.

Parameters:
  • X – Raw absorbance spectra from generator (n_samples, n_wavelengths).

  • window_length – Savitzky-Golay window length for derivatives.

  • polyorder – Polynomial order for Savitzky-Golay filter.

Returns:

Preprocessed spectra matching the real data type.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wl)
>>> generator = fitter.create_matched_generator()
>>> X_raw, _, _ = generator.generate(1000)
>>> X_matched = fitter.apply_matching_preprocessing(X_raw)
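
For readers unfamiliar with the `window_length` and `polyorder` parameters, the following NumPy-only sketch shows how a Savitzky-Golay second derivative can be computed. It is an illustration under assumptions, not the method's implementation: the helper names are hypothetical, the derivative is taken per index step rather than per nm, and edge samples are affected by zero padding.

```python
from math import factorial

import numpy as np


def savgol_kernel(window_length=15, polyorder=2, deriv=2):
    """Savitzky-Golay weights for the deriv-th derivative at the window centre."""
    half = window_length // 2
    z = np.arange(-half, half + 1, dtype=float)
    A = np.vander(z, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the fitted polynomial coefficient.
    return np.linalg.pinv(A)[deriv] * factorial(deriv)


def savgol_derivative(X, window_length=15, polyorder=2, deriv=2):
    """Apply the filter to each row of X (n_samples, n_wavelengths)."""
    kernel = savgol_kernel(window_length, polyorder, deriv)
    # np.convolve flips its second argument, so pass the reversed kernel.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel[::-1], mode="same"), 1, X
    )


# A quadratic has a constant second derivative of 2 (per index step).
X_demo = (np.arange(50.0) ** 2)[None, :]
second_deriv = savgol_derivative(X_demo, window_length=15, polyorder=2, deriv=2)
interior = second_deriv[:, 7:-7]  # drop edge samples affected by padding
print(interior[0, :3])
```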
create_matched_generator(random_state: int | None = None) SyntheticNIRSGenerator[source]

Create a SyntheticNIRSGenerator configured to match the fitted data.

This method creates a generator with all fitted parameters including Phase 1-4 enhanced features (instrument, domain, effects).

Parameters:

random_state – Random seed for reproducibility.

Returns:

Configured SyntheticNIRSGenerator instance.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator(random_state=42)
>>> X_synth, _, _ = generator.generate(1000)
evaluate_similarity(X_synthetic: ndarray, wavelengths: ndarray | None = None) Dict[str, Any][source]

Evaluate similarity between synthetic and source data.

Computes various metrics comparing synthetic spectra to the original real data.

Parameters:
  • X_synthetic – Synthetic spectra matrix.

  • wavelengths – Optional wavelength grid.

Returns:

Dictionary with similarity metrics.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

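>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)
>>> metrics = fitter.evaluate_similarity(X_synth)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")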
fit(X: np.ndarray | 'SpectroDataset', *, wavelengths: np.ndarray | None = None, name: str = 'source', infer_instrument: bool = True, infer_domain: bool = True, infer_measurement_mode: bool = True, infer_environmental: bool = True, infer_scattering: bool = True, infer_edge_artifacts: bool = True, infer_preprocessing: bool = True) FittedParameters[source]

Fit generator parameters to real data.

Analyzes the input data and estimates optimal parameters for generating synthetic spectra with similar properties. Includes Phase 1-6 enhanced inference.

Parameters:
  • X – Real spectra matrix (n_samples, n_wavelengths) or SpectroDataset.

  • wavelengths – Wavelength grid (required if X is ndarray).

  • name – Dataset name for reference.

  • infer_instrument – Whether to infer instrument archetype.

  • infer_domain – Whether to infer application domain.

  • infer_measurement_mode – Whether to infer measurement mode.

  • infer_environmental – Whether to infer environmental effects.

  • infer_scattering – Whether to infer scattering parameters.

  • infer_edge_artifacts – Whether to infer edge artifact effects.

  • infer_preprocessing – Whether to detect preprocessing type.

Returns:

FittedParameters object with estimated parameters.

Raises:

ValueError – If X is empty or has wrong shape.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wl, name="wheat")
>>> print(params.summary())
fit_from_path(path: str, *, name: str | None = None) FittedParameters[source]

Fit parameters from a dataset path.

Loads data using DatasetConfigs and fits parameters.

Parameters:
  • path – Path to dataset folder.

  • name – Optional name override.

Returns:

FittedParameters object.

Example

>>> params = fitter.fit_from_path("sample_data/regression")
fitted_params: FittedParameters | None
get_tuning_recommendations() List[str][source]

Get recommendations for tuning generation parameters.

Based on the fitted parameters and source data, provides suggestions for manual tuning.

Returns:

List of recommendation strings.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> for rec in fitter.get_tuning_recommendations():
...     print(f"- {rec}")
source_properties: SpectralProperties | None
class nirs4all.synthesis.fitter.ScatteringInference(has_scatter_effects: bool = False, estimated_particle_size_um: float = 50.0, multiplicative_scatter_std: float = 0.0, additive_scatter_std: float = 0.0, baseline_curvature: float = 0.0, snv_correctable: bool = False, msc_correctable: bool = False)[source]

Bases: object

Results of scattering effects inference.

has_scatter_effects

Whether significant scatter is detected.

Type:

bool

estimated_particle_size_um

Estimated mean particle size (μm).

Type:

float

multiplicative_scatter_std

Estimated MSC-style multiplicative scatter.

Type:

float

additive_scatter_std

Estimated SNV-style additive scatter.

Type:

float

baseline_curvature

Detected baseline curvature intensity.

Type:

float

snv_correctable

Whether SNV would improve spectra.

Type:

bool

msc_correctable

Whether MSC would improve spectra.

Type:

bool

additive_scatter_std: float = 0.0
baseline_curvature: float = 0.0
estimated_particle_size_um: float = 50.0
has_scatter_effects: bool = False
msc_correctable: bool = False
multiplicative_scatter_std: float = 0.0
snv_correctable: bool = False
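
One common way to quantify multiplicative and additive scatter is to regress each spectrum on the mean spectrum, as in MSC, and look at the spread of the fitted gains and offsets. The sketch below follows that idea; `estimate_scatter` is a hypothetical helper and the library may estimate these fields differently.

```python
import numpy as np


def estimate_scatter(X):
    """MSC-style regression of every spectrum on the mean spectrum."""
    reference = X.mean(axis=0)
    A = np.column_stack([np.ones_like(reference), reference])
    # Solve all samples at once: coef has shape (2, n_samples).
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)
    offsets, gains = coef
    return {
        "additive_scatter_std": float(offsets.std()),
        "multiplicative_scatter_std": float(gains.std()),
    }


rng = np.random.default_rng(1)
wl = np.linspace(1000, 2500, 150)
base = 0.4 + 0.5 * np.exp(-0.5 * ((wl - 1940) / 50) ** 2)
true_gains = rng.normal(1.0, 0.10, size=(200, 1))
true_offsets = rng.normal(0.0, 0.05, size=(200, 1))
X_scatter = true_gains * base + true_offsets

scatter = estimate_scatter(X_scatter)
print(scatter)
```

The recovered spreads are relative to the mean spectrum, so they match the simulated values only approximately.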
class nirs4all.synthesis.fitter.SpectralProperties(name: str = 'dataset', n_samples: int = 0, n_wavelengths: int = 0, wavelengths: ndarray | None = None, mean_spectrum: ndarray | None = None, std_spectrum: ndarray | None = None, global_mean: float = 0.0, global_std: float = 0.0, global_range: Tuple[float, float] = (0.0, 0.0), mean_slope: float = 0.0, slope_std: float = 0.0, slopes: ndarray | None = None, mean_curvature: float = 0.0, curvature_std: float = 0.0, skewness: float = 0.0, kurtosis: float = 0.0, noise_estimate: float = 0.0, snr_estimate: float = 0.0, pca_explained_variance: ndarray | None = None, pca_n_components_95: int = 0, n_peaks_mean: float = 0.0, peak_positions: ndarray | None = None, peak_wavenumbers: ndarray | None = None, effective_resolution: float = 8.0, noise_correlation_length: float = 1.0, wavelength_range: Tuple[float, float] = (1000.0, 2500.0), baseline_offset: float = 0.0, kubelka_munk_linearity: float = 0.0, baseline_convexity: float = 0.0, water_band_variation: float = 0.0, oh_band_positions: ndarray | None = None, temperature_sensitivity_score: float = 0.0, scatter_baseline_slope: float = 0.0, scatter_baseline_curvature: float = 0.0, sample_to_sample_offset_std: float = 0.0, sample_to_sample_slope_std: float = 0.0, protein_band_intensity: float = 0.0, carbohydrate_band_intensity: float = 0.0, lipid_band_intensity: float = 0.0, water_band_intensity: float = 0.0, left_edge_noise_std: float = 0.0, right_edge_noise_std: float = 0.0, center_noise_std: float = 0.0, left_edge_slope: float = 0.0, right_edge_slope: float = 0.0, edge_curvature_intensity: float = 0.0, edge_curvature_asymmetry: float = 0.0, has_boundary_rise_left: bool = False, has_boundary_rise_right: bool = False)[source]

Bases: object

Container for computed spectral properties of a dataset.

This dataclass holds various statistical and spectral properties computed from a NIRS dataset for comparison and fitting purposes.

name

Dataset identifier.

Type:

str

n_samples

Number of samples.

Type:

int

n_wavelengths

Number of wavelengths.

Type:

int

wavelengths

Wavelength grid.

Type:

numpy.ndarray | None

# Basic statistics
mean_spectrum

Mean spectrum across samples.

Type:

numpy.ndarray | None

std_spectrum

Standard deviation spectrum.

Type:

numpy.ndarray | None

global_mean

Overall mean absorbance.

Type:

float

global_std

Overall standard deviation.

Type:

float

global_range

(min, max) absorbance range.

Type:

Tuple[float, float]

# Shape properties
mean_slope

Average spectral slope (per 1000nm).

Type:

float

slope_std

Standard deviation of slopes.

Type:

float

mean_curvature

Average curvature (second derivative).

Type:

float

# Distribution statistics
skewness

Skewness of absorbance distribution.

Type:

float

kurtosis

Kurtosis of absorbance distribution.

Type:

float

# Noise characteristics
noise_estimate

Estimated noise level.

Type:

float

snr_estimate

Signal-to-noise ratio estimate.

Type:

float

# PCA properties
pca_explained_variance

Explained variance ratios.

Type:

numpy.ndarray | None

pca_n_components_95

Number of components needed to explain 95% of the variance.

Type:

int

# Peak analysis
n_peaks_mean

Mean number of peaks.

Type:

float

peak_positions

Wavelengths of detected peaks.

Type:

numpy.ndarray | None

peak_wavenumbers

Wavenumber positions of peaks.

Type:

numpy.ndarray | None

# Phase 1-4 Enhanced properties
# Instrument indicators
effective_resolution

Estimated spectral resolution from peak widths.

Type:

float

noise_correlation_length

Correlation length of noise (detector indicator).

Type:

float

wavelength_range

Actual wavelength range of data.

Type:

Tuple[float, float]

# Measurement mode indicators
baseline_offset

Mean baseline offset (transmittance indicator).

Type:

float

kubelka_munk_linearity

K-M linearity score (reflectance indicator).

Type:

float

baseline_convexity

Convexity of baseline (ATR indicator).

Type:

float

# Environmental indicators
water_band_variation

Variation in water band region.

Type:

float

oh_band_positions

Detected O-H band positions.

Type:

numpy.ndarray | None

temperature_sensitivity_score

Score for temperature effect detection.

Type:

float

# Scattering indicators
scatter_baseline_slope

Wavelength-dependent scatter slope.

Type:

float

scatter_baseline_curvature

Curvature from scattering.

Type:

float

sample_to_sample_offset_std

Sample-to-sample offset variation.

Type:

float

sample_to_sample_slope_std

Sample-to-sample slope variation.

Type:

float

# Domain indicators
protein_band_intensity

Intensity in protein band regions.

Type:

float

carbohydrate_band_intensity

Intensity in carbohydrate regions.

Type:

float

lipid_band_intensity

Intensity in lipid band regions.

Type:

float

water_band_intensity

Intensity in water band regions.

Type:

float

baseline_convexity: float = 0.0
baseline_offset: float = 0.0
carbohydrate_band_intensity: float = 0.0
center_noise_std: float = 0.0
curvature_std: float = 0.0
edge_curvature_asymmetry: float = 0.0
edge_curvature_intensity: float = 0.0
effective_resolution: float = 8.0
global_mean: float = 0.0
global_range: Tuple[float, float] = (0.0, 0.0)
global_std: float = 0.0
has_boundary_rise_left: bool = False
has_boundary_rise_right: bool = False
kubelka_munk_linearity: float = 0.0
kurtosis: float = 0.0
left_edge_noise_std: float = 0.0
left_edge_slope: float = 0.0
lipid_band_intensity: float = 0.0
mean_curvature: float = 0.0
mean_slope: float = 0.0
mean_spectrum: ndarray | None = None
n_peaks_mean: float = 0.0
n_samples: int = 0
n_wavelengths: int = 0
name: str = 'dataset'
noise_correlation_length: float = 1.0
noise_estimate: float = 0.0
oh_band_positions: ndarray | None = None
pca_explained_variance: ndarray | None = None
pca_n_components_95: int = 0
peak_positions: ndarray | None = None
peak_wavenumbers: ndarray | None = None
protein_band_intensity: float = 0.0
right_edge_noise_std: float = 0.0
right_edge_slope: float = 0.0
sample_to_sample_offset_std: float = 0.0
sample_to_sample_slope_std: float = 0.0
scatter_baseline_curvature: float = 0.0
scatter_baseline_slope: float = 0.0
skewness: float = 0.0
slope_std: float = 0.0
slopes: ndarray | None = None
snr_estimate: float = 0.0
std_spectrum: ndarray | None = None
temperature_sensitivity_score: float = 0.0
water_band_intensity: float = 0.0
water_band_variation: float = 0.0
wavelength_range: Tuple[float, float] = (1000.0, 2500.0)
wavelengths: ndarray | None = None
class nirs4all.synthesis.fitter.VarianceFitResult(operator_params: OperatorVarianceParams, pca_params: PCAVarianceParams, n_samples: int = 0, wavelengths: ndarray | None = None)[source]

Bases: object

Combined result from variance fitting.

operator_params

Operator-based variance parameters.

Type:

nirs4all.synthesis.fitter.OperatorVarianceParams

pca_params

PCA-based variance parameters.

Type:

nirs4all.synthesis.fitter.PCAVarianceParams

n_samples

Number of samples used for fitting.

Type:

int

wavelengths

Wavelength grid.

Type:

numpy.ndarray | None

n_samples: int = 0
operator_params: OperatorVarianceParams
pca_params: PCAVarianceParams
summary() str[source]

Return human-readable summary.

wavelengths: ndarray | None = None
class nirs4all.synthesis.fitter.VarianceFitter(n_pca_components: int = 10)[source]

Bases: object

Fit variance parameters from real spectra.

Provides two complementary methods for modeling spectral variation:

  • Operator-based: Independent physical sources (noise, scatter, baseline)

  • PCA-based: Correlated variations capturing the covariance structure

Example

>>> from nirs4all.synthesis import VarianceFitter
>>>
>>> fitter = VarianceFitter()
>>> result = fitter.fit(X_real, wavelengths)
>>>
>>> # Use operator-based params for generation
>>> print(f"Noise level: {result.operator_params.noise_std:.6f}")
>>>
>>> # Generate synthetic variance using PCA
>>> X_variance = fitter.generate_pca_variance(n_samples=100, random_state=42)
fit(X: ndarray, wavelengths: ndarray | None = None) VarianceFitResult[source]

Fit variance parameters from real spectra.

Parameters:
  • X – Real spectra matrix (n_samples, n_wavelengths).

  • wavelengths – Wavelength array (nm).

Returns:

VarianceFitResult with both operator and PCA parameters.

generate_operator_variance(base_spectrum: ndarray, wavelengths: ndarray, n_samples: int = 100, random_state: int | None = None) ndarray[source]

Generate synthetic spectra using operator-based variance.

Parameters:
  • base_spectrum – Mean/fitted spectrum to add variance to.

  • wavelengths – Wavelength array.

  • n_samples – Number of samples to generate.

  • random_state – Random seed.

Returns:

Array of synthetic spectra (n_samples, n_wavelengths).
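
To make "operator-based variance" concrete, the sketch below perturbs a base spectrum with independent gain, offset, slope, and noise terms. It is only an assumed model: apart from `noise_std`, which appears in the examples above, the parameter names (`gain_std`, `offset_std`, `slope_std`) and the function itself are hypothetical and not taken from OperatorVarianceParams.

```python
import numpy as np


def operator_variance_sketch(base_spectrum, wavelengths, n_samples=100,
                             noise_std=1e-3, gain_std=0.05, offset_std=0.01,
                             slope_std=0.01, random_state=None):
    """Add independent scatter, baseline, and noise terms to a base spectrum."""
    rng = np.random.default_rng(random_state)
    x = (wavelengths - wavelengths.mean()) / (wavelengths.max() - wavelengths.min())
    gains = rng.normal(1.0, gain_std, size=(n_samples, 1))      # multiplicative scatter
    offsets = rng.normal(0.0, offset_std, size=(n_samples, 1))  # additive baseline shift
    slopes = rng.normal(0.0, slope_std, size=(n_samples, 1))    # baseline tilt
    noise = rng.normal(0.0, noise_std, size=(n_samples, wavelengths.size))
    return gains * base_spectrum + offsets + slopes * x + noise


wl = np.linspace(1000, 2500, 120)
base = 0.4 + 0.5 * np.exp(-0.5 * ((wl - 1940) / 50) ** 2)
X_op = operator_variance_sketch(base, wl, n_samples=30, random_state=0)
# With every spread set to zero, each row reproduces the base spectrum.
X_flat = operator_variance_sketch(base, wl, n_samples=5, noise_std=0.0,
                                  gain_std=0.0, offset_std=0.0, slope_std=0.0)
print(X_op.shape, np.allclose(X_flat, base))
```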

generate_pca_variance(n_samples: int = 100, n_components: int | None = None, random_state: int | None = None) ndarray[source]

Generate synthetic spectra using PCA-based variance.

Parameters:
  • n_samples – Number of samples to generate.

  • n_components – Number of PCA components to use (None = all).

  • random_state – Random seed.

Returns:

Array of synthetic spectra (n_samples, n_wavelengths).
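
The PCA-based route can be sketched as: decompose the centered spectra, summarize each score by its mean and standard deviation, then draw new scores and project them back. The code below assumes independent Gaussian scores, which is a simplification chosen for illustration; the helper names and the dictionary layout (mirroring the PCAVarianceParams field names) are hypothetical.

```python
import numpy as np


def fit_pca_scores(X, n_components=5):
    """PCA via SVD; returns score statistics keyed like PCAVarianceParams."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_components, len(s))
    scores = U[:, :k] * s[:k]
    return {
        "n_components": k,
        "explained_variance_ratio": s[:k] ** 2 / np.sum(s ** 2),
        "score_means": scores.mean(axis=0),
        "score_stds": scores.std(axis=0, ddof=1),
        "components": Vt[:k],
        "mean_spectrum": mean,
    }


def sample_pca_variance(params, n_samples=100, random_state=None):
    """Draw independent Gaussian scores and map them back to spectra."""
    rng = np.random.default_rng(random_state)
    scores = rng.normal(params["score_means"], params["score_stds"],
                        size=(n_samples, params["n_components"]))
    return params["mean_spectrum"] + scores @ params["components"]


rng = np.random.default_rng(2)
wl = np.linspace(1000, 2500, 120)
base = 0.4 + 0.5 * np.exp(-0.5 * ((wl - 1940) / 50) ** 2)
loadings = np.vstack([np.sin(wl / 200.0), np.cos(wl / 300.0)])
X_real_demo = (base + rng.normal(0, 0.05, (40, 2)) @ loadings
               + rng.normal(0, 1e-3, (40, 120)))

pca_params = fit_pca_scores(X_real_demo, n_components=3)
X_new = sample_pca_variance(pca_params, n_samples=50, random_state=0)
print(X_new.shape, np.round(pca_params["explained_variance_ratio"], 3))
```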

nirs4all.synthesis.fitter.compare_datasets(X_synthetic: ndarray, X_real: ndarray, wavelengths: ndarray | None = None) Dict[str, Any][source]

Quick comparison between synthetic and real datasets.

Parameters:
  • X_synthetic – Synthetic spectra.

  • X_real – Real spectra.

  • wavelengths – Wavelength grid.

Returns:

Dictionary with comparison metrics.

Example

>>> metrics = compare_datasets(X_synth, X_real)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")
nirs4all.synthesis.fitter.compute_spectral_properties(X: ndarray, wavelengths: ndarray | None = None, name: str = 'dataset', n_pca_components: int = 20) SpectralProperties[source]

Compute comprehensive spectral properties of a dataset.

Analyzes a matrix of spectra to extract statistical and spectral properties useful for fitting and comparison. Includes Phase 1-4 enhanced properties for instrument, mode, domain, and effect inference.

Parameters:
  • X – Spectra matrix (n_samples, n_wavelengths).

  • wavelengths – Optional wavelength grid.

  • name – Dataset identifier.

  • n_pca_components – Maximum PCA components to compute.

Returns:

SpectralProperties with computed metrics.

Example

>>> props = compute_spectral_properties(X_real, wavelengths)
>>> print(f"Mean slope: {props.mean_slope:.4f}")
>>> print(f"Inferred resolution: {props.effective_resolution:.1f} nm")
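
A few of these properties are simple enough to sketch directly. The snippet below shows one plausible way to compute a slope per 1000 nm, the number of PCA components needed for 95% of the variance, and the nm to cm⁻¹ conversion behind `peak_wavenumbers`. The helper names and exact definitions are assumptions and may not match the function's internals.

```python
import numpy as np


def nm_to_wavenumber(wavelength_nm):
    """Convert wavelength in nm to wavenumber in cm^-1."""
    return 1e7 / np.asarray(wavelength_nm, dtype=float)


def basic_properties(X, wavelengths):
    """Hypothetical subset of the properties, computed with NumPy only."""
    # Linear fit per spectrum; the first row of coefficients is the slope per nm.
    slopes = np.polyfit(wavelengths, X.T, 1)[0] * 1000.0
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    ratio = s ** 2 / np.sum(s ** 2)
    n95 = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
    return {
        "mean_slope": float(slopes.mean()),
        "slope_std": float(slopes.std()),
        "pca_explained_variance": ratio,
        "pca_n_components_95": n95,
    }


rng = np.random.default_rng(3)
wl = np.linspace(1000, 2500, 100)
offsets = rng.normal(0.5, 0.1, size=(30, 1))
# Every spectrum rises by 0.2 absorbance units per 1000 nm; only the offset varies.
X_demo = offsets + 2e-4 * wl[None, :]

props = basic_properties(X_demo, wl)
print(props["mean_slope"], props["pca_n_components_95"], nm_to_wavenumber(2000.0))
```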
nirs4all.synthesis.fitter.fit_components(spectrum: ndarray, wavelengths: ndarray, component_names: List[str] | None = None, fit_baseline: bool = True, baseline_order: int = 2, method: str = 'nnls', preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False) ComponentFitResult[source]

Convenience function to fit components to a spectrum.

Parameters:
  • spectrum – Observed spectrum.

  • wavelengths – Wavelength grid.

  • component_names – Components to fit (None = all available).

  • fit_baseline – Include polynomial baseline.

  • baseline_order – Polynomial order for baseline.

  • method – Fitting method (“nnls” or “lsq”).

  • preprocessing – Preprocessing to apply to components (e.g., “second_derivative”). Use this when fitting preprocessed data.

  • auto_detect_preprocessing – If True, automatically detect preprocessing type from the data. This is useful for derivative data where the preprocessing type is unknown. Takes precedence over preprocessing if set.

Returns:

ComponentFitResult with fit results.

Example

>>> # Fit raw absorbance data
>>> result = fit_components(spectrum, wavelengths, ["water", "protein", "lipid"])
>>>
>>> # Fit second derivative data
>>> result = fit_components(
...     deriv_spectrum, wavelengths, ["water", "protein"],
...     preprocessing="second_derivative"
... )
>>>
>>> # Auto-detect preprocessing (recommended for unknown data)
>>> result = fit_components(
...     unknown_spectrum, wavelengths,
...     auto_detect_preprocessing=True
... )
nirs4all.synthesis.fitter.fit_components_optimized(spectrum: ndarray, wavelengths: ndarray, priority_categories: List[str] | None = None, max_components: int = 10, baseline_order: int = 4, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, smooth_sigma_nm: float = 30.0, use_nnls: bool = False) OptimizedFitResult[source]

Convenience function for optimized component fitting.

Uses greedy category-prioritized selection, which is intended to give better fits than a plain NNLS fit.

Parameters:
  • spectrum – Observed spectrum.

  • wavelengths – Wavelength grid.

  • priority_categories – Categories to prioritize (e.g., [‘carbohydrates’, ‘proteins’]).

  • max_components – Maximum components to select.

  • baseline_order – Polynomial baseline order.

  • preprocessing – Preprocessing type (‘first_derivative’, ‘second_derivative’, etc.).

  • auto_detect_preprocessing – Auto-detect preprocessing from data.

  • smooth_sigma_nm – Gaussian smoothing sigma in nm to broaden component spectra.

  • use_nnls – Use non-negative least squares instead of OLS.

Returns:

OptimizedFitResult with fit results.

Example

>>> result = fit_components_optimized(
...     spectrum, wavelengths,
...     priority_categories=['carbohydrates', 'proteins'],
...     auto_detect_preprocessing=True,
... )
>>> print(f"R² = {result.r_squared:.4f}")
nirs4all.synthesis.fitter.fit_real_bands(spectrum: ndarray, wavelengths: ndarray, baseline_order: int = 4, max_bands: int = 50, target_r2: float = 0.98, allow_sigma_variation: bool = True) RealBandFitResult[source]

Convenience function for fitting spectrum using real NIR band assignments.

Uses known band positions from the NIR_BANDS dictionary for physically meaningful spectral decomposition.

Parameters:
  • spectrum – Observed spectrum.

  • wavelengths – Wavelength grid in nm.

  • baseline_order – Polynomial baseline order.

  • max_bands – Maximum number of bands to use.

  • target_r2 – Target R² for early stopping.

  • allow_sigma_variation – Allow sigma to vary within constrained ranges.

Returns:

RealBandFitResult with fit results.

Example

>>> result = fit_real_bands(spectrum, wavelengths)
>>> print(f"R² = {result.r_squared:.4f}")
>>> for name, center, amp in result.top_bands(5):
...     print(f"{center:.0f} nm: {name}")
nirs4all.synthesis.fitter.fit_to_real_data(X: np.ndarray | 'SpectroDataset', wavelengths: np.ndarray | None = None, name: str = 'source') FittedParameters[source]

Quick function to fit parameters to real data.

Convenience function for simple fitting use cases.

Parameters:
  • X – Real spectra or SpectroDataset.

  • wavelengths – Wavelength grid.

  • name – Dataset name.

Returns:

FittedParameters object.

Example

>>> params = fit_to_real_data(X_real, wavelengths)
>>> generator = SyntheticNIRSGenerator(**params.to_generator_kwargs())
nirs4all.synthesis.fitter.fit_variance(X: ndarray, wavelengths: ndarray | None = None, n_pca_components: int = 10) VarianceFitResult[source]

Convenience function to fit variance parameters from real spectra.

Parameters:
  • X – Real spectra matrix (n_samples, n_wavelengths).

  • wavelengths – Wavelength array (nm).

  • n_pca_components – Number of PCA components to fit.

Returns:

VarianceFitResult with fitted parameters.

Example

>>> result = fit_variance(X_real, wavelengths)
>>> print(f"Noise level: {result.operator_params.noise_std:.6f}")
nirs4all.synthesis.fitter.multiscale_derivative_fit(fitter: DerivativeAwareForwardModelFitter, y_deriv: ndarray, scales: List[float] | None = None) Dict[str, Any][source]

Multiscale fitting curriculum for derivative spectra.

Fits coarse features first by smoothing the derivative target, then progressively reduces smoothing. Particularly important for derivative data which can have high-frequency noise.

Parameters:
  • fitter – DerivativeAwareForwardModelFitter instance.

  • y_deriv – Target derivative spectrum.

  • scales – List of Gaussian sigma values. Default: [15, 8, 4, 0].

Returns:

Final fit result dict.

Example

>>> result = multiscale_derivative_fit(fitter, deriv_spectrum)
nirs4all.synthesis.fitter.multiscale_fit(fitter: ForwardModelFitter, y: ndarray, scales: List[float] | None = None) Dict[str, Any][source]

Multiscale fitting curriculum for raw spectra.

Fits coarse features first by smoothing the target, then progressively reduces smoothing to capture finer details. This improves optimization stability and avoids local minima.

Parameters:
  • fitter – ForwardModelFitter instance.

  • y – Target spectrum.

  • scales – List of Gaussian sigma values for progressive smoothing. Default: [20, 10, 5, 0].

Returns:

Final fit result dict.

Example

>>> result = multiscale_fit(fitter, spectrum, scales=[20, 10, 5, 0])
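
The coarse-to-fine curriculum can be sketched independently of any particular fitter: smooth the target with a decreasing Gaussian sigma and warm-start each stage from the previous result. Everything below is an illustrative assumption. `fit_step` stands in for a fitter and is not the ForwardModelFitter API, the toy model has a single peak center as its only parameter, and sigma is taken to be in samples.

```python
import numpy as np


def gaussian_smooth(y, sigma):
    """Gaussian smoothing with edge padding; sigma is in samples, 0 means no smoothing."""
    if sigma <= 0:
        return y.copy()
    radius = int(np.ceil(4 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(np.pad(y, radius, mode="edge"), kernel, mode="valid")


def multiscale_sketch(fit_step, y, init, scales=(20, 10, 5, 0)):
    """Fit progressively less smoothed targets, warm-starting each stage."""
    params = init
    for sigma in scales:
        params = fit_step(gaussian_smooth(y, sigma), params)
    return params


def make_center_fitter(x, width=15.0, search=40):
    """Toy fitter: local search for the center of a single Gaussian peak."""
    def fit_step(target, center):
        best_c, best_sse = center, np.inf
        lo, hi = max(center - search, x[0]), min(center + search, x[-1])
        for c in np.arange(lo, hi + 1):
            g = np.exp(-0.5 * ((x - c) / width) ** 2)
            amp = (g @ target) / (g @ g)  # closed-form amplitude for this center
            sse = float(np.sum((target - amp * g) ** 2))
            if sse < best_sse:
                best_c, best_sse = c, sse
        return best_c
    return fit_step


rng = np.random.default_rng(0)
x = np.arange(400.0)
y_noisy = np.exp(-0.5 * ((x - 230.0) / 15.0) ** 2) + rng.normal(0, 0.05, x.size)

center = multiscale_sketch(make_center_fitter(x), y_noisy, init=200.0)
print(center)
```

Early stages see a smooth target with few local minima, and later stages only need to refine an estimate that is already close.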