nirs4all.synthesis.fitter module

Real data fitting utilities for synthetic NIRS spectra generation.

This module provides tools to analyze real NIRS datasets and fit generator parameters to match their statistical and spectral properties.

Key Features:

- Statistical property analysis (mean, std, skewness, kurtosis)
- Spectral shape analysis (slope, curvature, noise)
- PCA structure analysis
- Parameter estimation for SyntheticNIRSGenerator
- Comparison between synthetic and real data

Phase 1-4 Enhanced Features:

- Instrument archetype inference (InGaAs, PbS, MEMS, etc.)
- Measurement mode detection (transmittance, reflectance, ATR)
- Application domain suggestion (agriculture, pharmaceutical, etc.)
- Environmental effects estimation (temperature, moisture)
- Scattering parameter estimation (particle size, EMSC)
- Wavenumber-based peak analysis for component identification

Example

>>> from nirs4all.synthesis import RealDataFitter, SyntheticNIRSGenerator
>>>
>>> # Analyze real data
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Create generator with fitted parameters (includes all Phase 1-4 features)
>>> generator = fitter.create_matched_generator()
>>> X_synthetic, _, _ = generator.generate(n_samples=1000)
>>>
>>> # Or get all inferred characteristics
>>> print(f"Inferred instrument: {params.inferred_instrument}")
>>> print(f"Inferred domain: {params.inferred_domain}")
>>> print(f"Measurement mode: {params.measurement_mode}")

References

Based on comparator.py from bench/synthetic/
Enhanced with Phase 1-4 synthetic generator features

class nirs4all.synthesis.fitter.ComponentFitResult(component_names: List[str], concentrations: ndarray, baseline_coefficients: ndarray | None, fitted_spectrum: ndarray, residuals: ndarray, r_squared: float, rmse: float, wavelengths: ndarray | None = None)[source]

Bases: object

Result of fitting spectral components to an observed spectrum.

component_names

Names of components used in fitting.

Type:: List[str]

concentrations

Estimated concentration for each component.

Type:: numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients (if fit_baseline=True).

Type:: numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum from fit.

Type:: numpy.ndarray

residuals

Difference between observed and fitted spectra.

Type:: numpy.ndarray

r_squared

R² goodness-of-fit metric.

Type:: float

rmse

Root mean squared error of fit.

Type:: float

wavelengths

Wavelength grid used for fitting.

Type:: numpy.ndarray | None

baseline_coefficients: ndarray | None

component_names: List[str]

concentrations: ndarray

fitted_spectrum: ndarray

r_squared: float

residuals: ndarray

rmse: float

summary() → str[source]: Return human-readable summary of fit results.

to_dict() → Dict[str, float][source]: Return concentrations as a dictionary.

top_components(n: int = 5, threshold: float = 0.0) → List[Tuple[str, float]][source]

Get top N components by concentration.

Parameters:

n – Maximum number of components to return.
threshold – Minimum concentration threshold.

Returns:

List of (component_name, concentration) tuples, sorted descending.
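The ranking described above can be sketched in a few lines. This is a hypothetical stand-alone helper, not the library's implementation; in particular, whether the threshold comparison is strict (`>`) or inclusive (`>=`) is an assumption.

```python
from typing import List, Sequence, Tuple


def top_components(
    names: Sequence[str],
    concentrations: Sequence[float],
    n: int = 5,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to n (name, concentration) pairs above threshold, largest first."""
    # Assumption: components must be strictly above the threshold to be kept.
    pairs = [
        (name, float(c))
        for name, c in zip(names, concentrations)
        if c > threshold
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


ranked = top_components(
    ["water", "protein", "lipid", "starch"],
    [0.42, 0.0, 0.07, 0.31],
    n=2,
)
```

With the default threshold of 0.0, components whose fitted concentration is exactly zero (common with NNLS) drop out of the ranking under this assumption.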

wavelengths: ndarray | None = None

class nirs4all.synthesis.fitter.ComponentFitter(component_names: List[str] | None = None, wavelengths: ndarray | None = None, fit_baseline: bool = True, baseline_order: int = 2, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, sg_window_length: int = 15, sg_polyorder: int = 2)[source]

Bases: object

Fit linear combinations of spectral components to observed spectra.

Solves: spectrum ≈ Σ(c_i * component_i(λ)) + baseline

Uses non-negative least squares (NNLS) to ensure positive concentrations, which is physically meaningful for spectroscopic analysis.
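As a rough illustration of the NNLS step, the sketch below fits a mixture of component spectra with `scipy.optimize.nnls`. The Gaussian bands standing in for component spectra, the band positions, and the omission of the baseline term are all assumptions made for brevity; the actual component library and baseline handling of ComponentFitter are not shown here.

```python
import numpy as np
from scipy.optimize import nnls


def gaussian_band(wl, center, width):
    """Hypothetical stand-in for a component spectrum: a single Gaussian band."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


wl = np.arange(1000, 2500, 2.0)

# Design matrix: one column per (hypothetical) component spectrum.
A = np.column_stack([
    gaussian_band(wl, 1450, 40),  # "water"-like band
    gaussian_band(wl, 2050, 60),  # "protein"-like band
    gaussian_band(wl, 1730, 30),  # "lipid"-like band
])

# Synthetic observation with known concentrations and a little noise.
true_c = np.array([0.8, 0.3, 0.0])
rng = np.random.default_rng(0)
spectrum = A @ true_c + rng.normal(0.0, 1e-3, wl.size)

# NNLS solves min ||A c - spectrum||_2 subject to c >= 0.
c_hat, residual_norm = nnls(A, spectrum)

# Goodness of fit, computed the usual way.
fitted = A @ c_hat
ss_res = np.sum((spectrum - fitted) ** 2)
ss_tot = np.sum((spectrum - spectrum.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

The non-negativity constraint is what keeps an absent component (the third column here) at or near zero instead of letting it take a small negative value to absorb noise.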

Preprocessing Support: If your observed spectra are preprocessed (e.g., second derivative, SNV), use the preprocessing parameter to apply the same transformation to component spectra before fitting.

Auto-detection: Set auto_detect_preprocessing=True to automatically detect the preprocessing type from the data (recommended for derivative data).

Example

>>> from nirs4all.synthesis import ComponentFitter
>>>
>>> # Fit with all available components
>>> fitter = ComponentFitter(wavelengths=np.arange(1000, 2500, 2))
>>> result = fitter.fit(observed_spectrum)
>>> print(result.summary())
>>>
>>> # Fit preprocessed data (e.g., second derivative)
>>> fitter = ComponentFitter(
...     component_names=["water", "protein", "lipid"],
...     wavelengths=wavelengths,
...     preprocessing="second_derivative",  # Components will be transformed
... )
>>> result = fitter.fit(derivative_spectrum)
>>>
>>> # Auto-detect preprocessing (recommended for unknown data)
>>> fitter = ComponentFitter(
...     wavelengths=wavelengths,
...     auto_detect_preprocessing=True,  # Will detect derivative, SNV, etc.
... )
>>> result = fitter.fit(unknown_spectrum)

component_names: List of component names to fit.

wavelengths: Wavelength grid for fitting.

fit_baseline: Whether to include polynomial baseline.

baseline_order: Polynomial order for baseline (default 2).

preprocessing: Preprocessing to apply to components before fitting.

auto_detect_preprocessing: If True, detect preprocessing from data.

detected_preprocessing

The detected preprocessing type (after first fit).

Type:: nirs4all.synthesis.fitter.PreprocessingType | None

detected_preprocessing: PreprocessingType | None

fit(spectrum: ndarray, method: str = 'nnls') → ComponentFitResult[source]

Fit components to a single spectrum.

Parameters:

spectrum – Observed spectrum, shape (n_wavelengths,).
method – Fitting method:
- “nnls”: Non-negative least squares (default, physically meaningful).
- “lsq”: Unconstrained least squares (allows negative concentrations).

Returns:

ComponentFitResult with concentrations, residuals, and fit quality metrics.

Example

>>> result = fitter.fit(observed_spectrum)
>>> print(f"R² = {result.r_squared:.4f}")
>>> print(f"Top components: {result.top_components(3)}")

fit_batch(spectra: ndarray, method: str = 'nnls', n_jobs: int = -1) → List[ComponentFitResult][source]

Fit components to multiple spectra in parallel.

Parameters:

spectra – Observed spectra, shape (n_samples, n_wavelengths).
method – Fitting method (“nnls” or “lsq”).
n_jobs – Number of parallel jobs (-1 = all cores, 1 = sequential).

Returns:

List of ComponentFitResult objects.

Example

>>> results = fitter.fit_batch(X_observed, n_jobs=4)
>>> mean_r2 = np.mean([r.r_squared for r in results])
>>> print(f"Mean R² = {mean_r2:.4f}")

get_concentration_matrix(spectra: ndarray, method: str = 'nnls', n_jobs: int = -1) → Tuple[ndarray, List[str]][source]

Get concentration matrix for batch of spectra.

Convenience method that extracts just the concentrations.

Parameters:

spectra – Observed spectra, shape (n_samples, n_wavelengths).
method – Fitting method (“nnls” or “lsq”).
n_jobs – Number of parallel jobs.

Returns:

Tuple of (concentrations, component_names):
- concentrations: Array of shape (n_samples, n_components)
- component_names: List of component names

Example

>>> C, names = fitter.get_concentration_matrix(X_observed)
>>> water_idx = names.index("water")
>>> water_concentrations = C[:, water_idx]

suggest_components(spectrum: ndarray, top_n: int = 5, threshold: float = 0.01, method: str = 'nnls') → List[Tuple[str, float]][source]

Suggest which components are likely present in a spectrum.

Performs a fit and returns the top components by concentration.

Parameters:

spectrum – Observed spectrum, shape (n_wavelengths,).
top_n – Maximum number of components to return.
threshold – Minimum concentration threshold.
method – Fitting method (“nnls” or “lsq”).

Returns:

List of (component_name, estimated_concentration) tuples, sorted by concentration descending.

Example

>>> suggestions = fitter.suggest_components(unknown_spectrum)
>>> print("Likely components:")
>>> for name, conc in suggestions:
...     print(f"  {name}: {conc:.3f}")

class nirs4all.synthesis.fitter.DerivativeAwareForwardModelFitter(components: List['SpectralComponent'], canonical_grid: np.ndarray, target_grid: np.ndarray, derivative_order: int = 1, sg_window: int = 15, sg_polyorder: int = 2, baseline_order: int = 6, wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0), ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0), path_length_bounds: Tuple[float, float] = (0.5, 2.0))[source]

Bases: object

Forward model fitter for derivative-preprocessed datasets.

Key principle: Never fit derivative spectra by adding narrow bands. Instead:

1. Fit latent physical model (raw absorbance)
2. Apply derivative preprocessing to model output
3. Compare in derivative space

This ensures concentrations remain physically interpretable without oscillatory artifacts from narrow compensating peaks.
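The three steps can be sketched as follows. Because a Savitzky-Golay derivative is a linear operator, applying it to each column of the raw-absorbance design matrix is equivalent to applying it to the model output. The Gaussian bands, the single constant baseline column, and the use of `scipy.optimize.nnls` are assumptions of this sketch; the class itself also optimizes instrument parameters, which is omitted here.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.signal import savgol_filter

wl = np.arange(1000, 2500, 2.0)


def band(center, width):
    """Hypothetical component spectrum in raw absorbance."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


def sg_derivative(y, order=1, window=15, polyorder=2):
    """Savitzky-Golay derivative along the wavelength axis (axis 0)."""
    return savgol_filter(y, window_length=window, polyorder=polyorder,
                         deriv=order, delta=2.0, axis=0)


# 1. Latent physical model in raw absorbance space.
A_raw = np.column_stack([band(1450, 40), band(1940, 50), band(2100, 60)])
true_c = np.array([0.6, 0.9, 0.2])
raw = A_raw @ true_c + 0.3 + 1e-4 * (wl - wl[0])  # offset + slope baseline

# The dataset is only available as a first derivative.
y_deriv = sg_derivative(raw)

# 2. Apply the same derivative preprocessing to the model (column by column).
A_deriv = sg_derivative(A_raw)
# A raw-space slope becomes a constant after a first derivative.
A_deriv = np.column_stack([A_deriv, np.ones_like(wl)])

# 3. Compare (fit) in derivative space.
coef, _ = nnls(A_deriv, y_deriv)
concentrations = coef[:3]
```

The recovered coefficients still refer to the raw-absorbance bands, which is why they stay interpretable as concentrations.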

components

List of SpectralComponent objects.

Type:: List['SpectralComponent']

canonical_grid

High-resolution canonical wavelength grid.

Type:: np.ndarray

target_grid

Target wavelength grid (dataset grid).

Type:: np.ndarray

derivative_order

1 for first derivative, 2 for second.

Type:: int

sg_window

Savitzky-Golay window length.

Type:: int

sg_polyorder

Savitzky-Golay polynomial order.

Type:: int

baseline_order

Number of Chebyshev baseline terms.

Type:: int

Example

>>> fitter = DerivativeAwareForwardModelFitter(
...     components=components,
...     canonical_grid=canonical_wl,
...     target_grid=dataset_wl,
...     derivative_order=1,  # First derivative
... )
>>> result = fitter.fit(derivative_spectrum)
>>> print(f"R² = {result['r_squared']:.4f}")

__post_init__()[source]: Pre-compute component spectra on canonical grid.

baseline_order: int = 6

canonical_grid: np.ndarray

components: List['SpectralComponent']

derivative_order: int = 1

fit(y_deriv: ndarray, initial_guess: ndarray | None = None) → Dict[str, Any][source]

Fit forward model to derivative spectrum.

Parameters:

y_deriv – Target spectrum (already derivative-preprocessed).
initial_guess – Initial [wl_shift, ils_sigma, path_length].

Returns:

Dict with fitted parameters:
- r_squared: Coefficient of determination
- fitted_deriv: Fitted derivative spectrum
- fitted_raw: Reconstructed raw spectrum
- residuals_deriv: Fitting residuals
- concentrations: Fitted component concentrations
- baseline_coeffs: Fitted baseline coefficients
- wl_shift, ils_sigma, path_length: Instrument params

ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0)

path_length_bounds: Tuple[float, float] = (0.5, 2.0)

sg_polyorder: int = 2

sg_window: int = 15

target_grid: np.ndarray

wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0)

class nirs4all.synthesis.fitter.DomainInference(domain_name: str = 'unknown', category: str = 'unknown', confidence: float = 0.0, detected_components: List[str] = <factory>, alternative_domains: Dict[str, float]=<factory>)[source]

Bases: object

Results of application domain inference.

domain_name

Best matching domain name.

Type:: str

category

Domain category.

Type:: str

confidence

Confidence score (0-1).

Type:: float

detected_components

Components detected from peak analysis.

Type:: List[str]

alternative_domains

Other possible domains with scores.

Type:: Dict[str, float]

alternative_domains: Dict[str, float]

category: str = 'unknown'

confidence: float = 0.0

detected_components: List[str]

domain_name: str = 'unknown'

class nirs4all.synthesis.fitter.EdgeArtifactInference(has_edge_artifacts: bool = False, has_detector_rolloff: bool = False, has_stray_light: bool = False, has_truncated_peaks: bool = False, has_edge_curvature: bool = False, left_edge_intensity: float = 0.0, right_edge_intensity: float = 0.0, edge_noise_ratio: float = 1.0, detector_model: str = 'generic_nir', stray_light_fraction: float = 0.0, curvature_type: str = 'none', boundary_peak_amplitudes: Tuple[float, float] = (0.0, 0.0))[source]

Bases: object

Results of edge artifact inference.

Detects edge deformation effects in NIR spectra caused by:
- Detector sensitivity roll-off at wavelength boundaries
- Stray light effects (more pronounced at edges)
- Truncated absorption bands outside measurement range
- Baseline curvature concentrated at edges

has_edge_artifacts

Whether significant edge artifacts are detected.

Type:: bool

has_detector_rolloff

Whether detector roll-off effects are present.

Type:: bool

has_stray_light

Whether stray light effects are detected.

Type:: bool

has_truncated_peaks

Whether truncated peaks at boundaries are present.

Type:: bool

has_edge_curvature

Whether edge curvature/bending is detected.

Type:: bool

left_edge_intensity

Relative intensity change at left edge.

Type:: float

right_edge_intensity

Relative intensity change at right edge.

Type:: float

edge_noise_ratio

Ratio of edge noise to center noise.

Type:: float

detector_model

Suggested detector model based on characteristics.

Type:: str

stray_light_fraction

Estimated stray light fraction.

Type:: float

curvature_type

Detected curvature type (“smile”, “frown”, “asymmetric”, or “none” when no curvature is detected).

Type:: str

boundary_peak_amplitudes

Estimated truncated peak amplitudes at edges.

Type:: Tuple[float, float]

References

JASCO (2020). Advantages of high-sensitivity InGaAs detector.
Applied Optics (1975). Resolution and stray light in NIR spectroscopy.
Burns & Ciurczak (2007). Handbook of Near-Infrared Analysis.

boundary_peak_amplitudes: Tuple[float, float] = (0.0, 0.0)

curvature_type: str = 'none'

detector_model: str = 'generic_nir'

edge_noise_ratio: float = 1.0

has_detector_rolloff: bool = False

has_edge_artifacts: bool = False

has_edge_curvature: bool = False

has_stray_light: bool = False

has_truncated_peaks: bool = False

left_edge_intensity: float = 0.0

right_edge_intensity: float = 0.0

stray_light_fraction: float = 0.0

class nirs4all.synthesis.fitter.EnvironmentalInference(estimated_temperature_variation: float = 0.0, has_temperature_effects: bool = False, estimated_moisture_variation: float = 0.0, has_moisture_effects: bool = False, water_band_shift: float = 0.0)[source]

Bases: object

Results of environmental effects inference.

estimated_temperature_variation

Estimated temperature variation (°C).

Type:: float

has_temperature_effects

Whether temperature effects are detectable.

Type:: bool

estimated_moisture_variation

Estimated moisture variation.

Type:: float

has_moisture_effects

Whether moisture effects are detectable.

Type:: bool

water_band_shift

Detected shift in water bands (nm).

Type:: float

estimated_moisture_variation: float = 0.0

estimated_temperature_variation: float = 0.0

has_moisture_effects: bool = False

has_temperature_effects: bool = False

water_band_shift: float = 0.0

class nirs4all.synthesis.fitter.FittedParameters(wavelength_start: float = 1000.0, wavelength_end: float = 2500.0, wavelength_step: float = 2.0, global_slope_mean: float = 0.0, global_slope_std: float = 0.02, noise_base: float = 0.001, noise_signal_dep: float = 0.005, path_length_std: float = 0.05, baseline_amplitude: float = 0.02, scatter_alpha_std: float = 0.05, scatter_beta_std: float = 0.01, tilt_std: float = 0.01, complexity: str = 'realistic', source_name: str = '', source_properties: SpectralProperties | None = None, inferred_instrument: str = 'unknown', instrument_inference: InstrumentInference | None = None, measurement_mode: str = 'transmittance', measurement_mode_confidence: float = 0.0, inferred_domain: str = 'unknown', domain_inference: DomainInference | None = None, environmental_inference: EnvironmentalInference | None = None, temperature_config: Dict[str, ~typing.Any]=<factory>, moisture_config: Dict[str, ~typing.Any]=<factory>, scattering_inference: ScatteringInference | None = None, particle_size_config: Dict[str, ~typing.Any]=<factory>, emsc_config: Dict[str, ~typing.Any]=<factory>, edge_artifact_inference: EdgeArtifactInference | None = None, edge_artifacts_config: Dict[str, ~typing.Any]=<factory>, boundary_components_config: Dict[str, ~typing.Any]=<factory>, preprocessing_inference: PreprocessingInference | None = None, preprocessing_type: str = 'raw_absorbance', is_preprocessed: bool = False, detected_components: List[str] = <factory>, suggested_n_components: int = 5)[source]

Bases: object

Parameters fitted from real data for synthetic generation.

This dataclass contains all parameters needed to configure a SyntheticNIRSGenerator to produce spectra similar to a real dataset, including Phase 1-4 enhanced features.

# Basic wavelength grid

wavelength_start

Start wavelength (nm).

Type:: float

wavelength_end

End wavelength (nm).

Type:: float

wavelength_step

Wavelength step (nm).

Type:: float

# Slope and baseline parameters

global_slope_mean

Mean global slope.

Type:: float

global_slope_std

Slope standard deviation.

Type:: float

baseline_amplitude

Baseline drift amplitude.

Type:: float

# Noise parameters

noise_base

Base noise level.

Type:: float

noise_signal_dep

Signal-dependent noise factor.

Type:: float

# Scatter parameters

path_length_std

Path length variation.

Type:: float

scatter_alpha_std

Multiplicative scatter std.

Type:: float

scatter_beta_std

Additive scatter std.

Type:: float

tilt_std

Spectral tilt standard deviation.

Type:: float

# Complexity

complexity

Suggested complexity level.

Type:: str

# Source metadata

source_name

Name of source dataset.

Type:: str

source_properties

Full SpectralProperties of source.

Type:: nirs4all.synthesis.fitter.SpectralProperties | None

# Phase 1-4 Enhanced Parameters

# Instrument inference

inferred_instrument

Inferred instrument archetype.

Type:: str

instrument_inference

Full instrument inference result.

Type:: nirs4all.synthesis.fitter.InstrumentInference | None

# Measurement mode

measurement_mode

Inferred measurement mode.

Type:: str

measurement_mode_confidence

Confidence of inference.

Type:: float

# Domain inference

inferred_domain

Inferred application domain.

Type:: str

domain_inference

Full domain inference result.

Type:: nirs4all.synthesis.fitter.DomainInference | None

# Environmental effects

environmental_inference

Environmental effects inference.

Type:: nirs4all.synthesis.fitter.EnvironmentalInference | None

temperature_config

Suggested temperature config parameters.

Type:: Dict[str, Any]

moisture_config

Suggested moisture config parameters.

Type:: Dict[str, Any]

# Scattering effects

scattering_inference

Scattering effects inference.

Type:: nirs4all.synthesis.fitter.ScatteringInference | None

particle_size_config

Suggested particle size config parameters.

Type:: Dict[str, Any]

emsc_config

Suggested EMSC config parameters.

Type:: Dict[str, Any]

# Detected components for procedural generation

detected_components

List of detected/inferred component names.

Type:: List[str]

suggested_n_components

Suggested number of components.

Type:: int

baseline_amplitude: float = 0.02

boundary_components_config: Dict[str, Any]

complexity: str = 'realistic'

detected_components: List[str]

domain_inference: DomainInference | None = None

edge_artifact_inference: EdgeArtifactInference | None = None

edge_artifacts_config: Dict[str, Any]

emsc_config: Dict[str, Any]

environmental_inference: EnvironmentalInference | None = None

classmethod from_dict(data: Dict[str, Any]) → FittedParameters[source]

Create FittedParameters from a dictionary.

Parameters:: data – Dictionary with parameter values.
Returns:: FittedParameters instance.

global_slope_mean: float = 0.0

global_slope_std: float = 0.02

inferred_domain: str = 'unknown'

inferred_instrument: str = 'unknown'

instrument_inference: InstrumentInference | None = None

is_preprocessed: bool = False

classmethod load(path: str) → FittedParameters[source]

Load parameters from JSON file.

Parameters:: path – Input file path.
Returns:: FittedParameters instance.

measurement_mode: str = 'transmittance'

measurement_mode_confidence: float = 0.0

moisture_config: Dict[str, Any]

noise_base: float = 0.001

noise_signal_dep: float = 0.005

particle_size_config: Dict[str, Any]

path_length_std: float = 0.05

preprocessing_inference: PreprocessingInference | None = None

preprocessing_type: str = 'raw_absorbance'

save(path: str) → None[source]

Save parameters to JSON file.

Parameters:: path – Output file path.

scatter_alpha_std: float = 0.05

scatter_beta_std: float = 0.01

scattering_inference: ScatteringInference | None = None

source_name: str = ''

source_properties: SpectralProperties | None = None

suggested_n_components: int = 5

summary() → str[source]

Generate a human-readable summary of fitted parameters.

Returns:: Multi-line summary string.

temperature_config: Dict[str, Any]

tilt_std: float = 0.01

to_dict() → Dict[str, Any][source]

Convert all parameters to a dictionary.

Returns:: Dictionary with all parameter values.

to_full_config() → Dict[str, Any][source]

Convert all fitted parameters to a comprehensive configuration.

This includes all Phase 1-4 parameters for complete synthetic data generation matching the source dataset.

Returns:: Dictionary with all configuration parameters.

Example

>>> params = fitter.fit(X_real)
>>> config = params.to_full_config()
>>> # Use with builder pattern or advanced configuration

to_generator_kwargs() → Dict[str, Any][source]

Convert fitted parameters to kwargs for SyntheticNIRSGenerator.

Returns:: Dictionary of keyword arguments.

Example

>>> params = fitter.fit(X_real)
>>> generator = SyntheticNIRSGenerator(**params.to_generator_kwargs())

wavelength_end: float = 2500.0

wavelength_start: float = 1000.0

wavelength_step: float = 2.0

class nirs4all.synthesis.fitter.ForwardModelFitter(components: List['SpectralComponent'], canonical_grid: np.ndarray, target_grid: np.ndarray, baseline_order: int = 4, wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0), ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0), path_length_bounds: Tuple[float, float] = (0.5, 2.0))[source]

Bases: object

Variable projection fitter for physical forward model.

Fits a physical mixture model to observed spectra by separating:
- Linear params: concentrations, baseline coefficients (solved via NNLS/lsq)
- Nonlinear params: wl_shift, ils_sigma, path_length (solved via optimization)

This approach is numerically stable and physically interpretable.
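A minimal sketch of the variable projection idea: for each trial value of a nonlinear parameter, the linear parameters are solved exactly by least squares, and only the resulting residual is minimized over the nonlinear parameter. To keep it short, the sketch uses a single nonlinear parameter (a wavelength shift), a grid search instead of a numerical optimizer, and unconstrained least squares; the Gaussian bands and all helper names are hypothetical.

```python
import numpy as np

wl = np.arange(1000.0, 2500.0, 2.0)
x = 2 * (wl - wl[0]) / (wl[-1] - wl[0]) - 1  # map to [-1, 1] for Chebyshev terms


def design(shift):
    """Linear design matrix for a given wavelength shift (the nonlinear parameter)."""
    comps = np.column_stack([
        np.exp(-0.5 * ((wl - shift - 1450) / 40) ** 2),
        np.exp(-0.5 * ((wl - shift - 1940) / 50) ** 2),
    ])
    baseline = np.polynomial.chebyshev.chebvander(x, 2)  # 3 Chebyshev terms
    return np.hstack([comps, baseline])


def projected_residual(shift, y):
    """Solve the linear part exactly; return coefficients and residual sum of squares."""
    A = design(shift)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.sum((y - A @ coef) ** 2))


# Synthetic target with a known shift of 2 nm.
true_coef = np.array([0.7, 0.4, 0.2, 0.05, 0.0])
y = design(2.0) @ true_coef

# Outer search over the nonlinear parameter, within bounds like wl_shift_bounds.
shifts = np.linspace(-5.0, 5.0, 101)
rss = [projected_residual(s, y)[1] for s in shifts]
best_shift = shifts[int(np.argmin(rss))]
best_coef, best_rss = projected_residual(best_shift, y)
```

Eliminating the linear parameters this way leaves the outer search with only a handful of dimensions, which is the main source of the numerical stability mentioned above.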

components

List of SpectralComponent objects.

Type:: List['SpectralComponent']

canonical_grid

High-resolution canonical wavelength grid.

Type:: np.ndarray

target_grid

Target wavelength grid (dataset grid).

Type:: np.ndarray

baseline_order

Number of Chebyshev baseline terms.

Type:: int

wl_shift_bounds

Bounds for wavelength shift parameter.

Type:: Tuple[float, float]

ils_sigma_bounds

Bounds for ILS sigma parameter.

Type:: Tuple[float, float]

path_length_bounds

Bounds for path length parameter.

Type:: Tuple[float, float]

Example

>>> from nirs4all.synthesis._constants import get_predefined_components
>>> components = [get_predefined_components()[n] for n in ['water', 'protein']]
>>> fitter = ForwardModelFitter(
...     components=components,
...     canonical_grid=np.linspace(400, 2500, 4200),
...     target_grid=dataset_wavelengths,
... )
>>> result = fitter.fit(spectrum)
>>> print(f"R² = {result['r_squared']:.4f}")

__post_init__()[source]: Pre-compute component spectra on canonical grid.

baseline_order: int = 4

canonical_grid: np.ndarray

components: List['SpectralComponent']

fit(y: ndarray, initial_guess: ndarray | None = None) → Dict[str, Any][source]

Fit forward model to target spectrum.

Parameters:

y – Target spectrum.
initial_guess – Initial [wl_shift, ils_sigma, path_length].

Returns:

Dict with fitted parameters:
- r_squared: Coefficient of determination
- fitted: Fitted spectrum
- residuals: Fitting residuals
- concentrations: Fitted component concentrations
- baseline_coeffs: Fitted baseline coefficients
- wl_shift, ils_sigma, path_length: Instrument params

ils_sigma_bounds: Tuple[float, float] = (2.0, 15.0)

path_length_bounds: Tuple[float, float] = (0.5, 2.0)

target_grid: np.ndarray

wl_shift_bounds: Tuple[float, float] = (-5.0, 5.0)

class nirs4all.synthesis.fitter.InstrumentChain(wl_shift: float = 0.0, wl_stretch: float = 1.0, ils_sigma: float = 4.0, stray_light: float = 0.001, gain: float = 1.0, offset: float = 0.0)[source]

Bases: object

Forward instrument chain: canonical grid → dataset grid.

Applies the complete measurement chain to transform a high-resolution physical spectrum to the observed instrument grid.

Chain:

1. Wavelength warp (shift + stretch)
2. ILS convolution (Gaussian smoothing)
3. Stray light / gain / offset
4. Resample to target grid
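The chain can be sketched with NumPy alone. The function below is a hypothetical stand-in for `apply`, not its implementation: stretching about the grid center, edge padding before the convolution, and modelling stray light as a constant added in transmittance space are all assumptions.

```python
import numpy as np


def apply_chain(spectrum, canonical_wl, target_wl, wl_shift=0.0, wl_stretch=1.0,
                ils_sigma=4.0, stray_light=0.001, gain=1.0, offset=0.0):
    """Hypothetical forward chain: absorbance on canonical grid -> target grid."""
    # 1. Wavelength warp: stretch about the grid center, then shift (assumed form).
    center = canonical_wl.mean()
    warped_wl = center + wl_stretch * (canonical_wl - center) + wl_shift
    warped = np.interp(canonical_wl, warped_wl, spectrum)

    # 2. ILS convolution with a normalized Gaussian kernel (sigma in nm).
    step = canonical_wl[1] - canonical_wl[0]
    half = int(np.ceil(4 * ils_sigma / step))
    offsets_nm = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets_nm / ils_sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(warped, half, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")

    # 3. Stray light in transmittance space (assumed model), then gain and offset.
    transmittance = 10.0 ** (-smoothed)
    transmittance = (transmittance + stray_light) / (1.0 + stray_light)
    absorbance = gain * (-np.log10(transmittance)) + offset

    # 4. Resample to the target (dataset) grid.
    return np.interp(target_wl, canonical_wl, absorbance)


canonical_wl = np.arange(1000.0, 2500.0, 0.5)
target_wl = np.arange(1100.0, 2400.0, 2.0)
spectrum_phys = np.exp(-0.5 * ((canonical_wl - 1500.0) / 20.0) ** 2)
spectrum_obs = apply_chain(spectrum_phys, canonical_wl, target_wl,
                           wl_shift=2.0, ils_sigma=5.0)
```

In this sketch the band moves by the shift and is broadened and slightly flattened by the ILS convolution and the stray light term.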

wl_shift

Wavelength shift in nm.

Type:: float

wl_stretch

Wavelength scale factor.

Type:: float

ils_sigma

Instrument line shape Gaussian sigma in nm.

Type:: float

stray_light

Stray light fraction.

Type:: float

gain

Photometric gain.

Type:: float

offset

Photometric offset.

Type:: float

Example

>>> chain = InstrumentChain(wl_shift=2.0, ils_sigma=5.0)
>>> spectrum_obs = chain.apply(spectrum_phys, canonical_wl, target_wl)

apply(spectrum: ndarray, canonical_wl: ndarray, target_wl: ndarray) → ndarray[source]

Apply full instrument chain.

Parameters:

spectrum – Input spectrum on canonical grid.
canonical_wl – Canonical wavelength grid (nm).
target_wl – Target wavelength grid (nm).

Returns:

Transformed spectrum on target grid.

gain: float = 1.0

ils_sigma: float = 4.0

offset: float = 0.0

stray_light: float = 0.001

wl_shift: float = 0.0

wl_stretch: float = 1.0

class nirs4all.synthesis.fitter.InstrumentInference(archetype_name: str = 'unknown', detector_type: str = 'unknown', wavelength_range: Tuple[float, float]=(1000.0, 2500.0), estimated_resolution: float = 8.0, confidence: float = 0.0, alternative_archetypes: Dict[str, float]=<factory>)[source]

Bases: object

Results of instrument archetype inference.

archetype_name

Best matching instrument archetype name.

Type:: str

detector_type

Inferred detector type.

Type:: str

wavelength_range

Detected wavelength range.

Type:: Tuple[float, float]

estimated_resolution

Estimated spectral resolution (nm).

Type:: float

confidence

Confidence score (0-1).

Type:: float

alternative_archetypes

Other possible archetypes with scores.

Type:: Dict[str, float]

alternative_archetypes: Dict[str, float]

archetype_name: str = 'unknown'

confidence: float = 0.0

detector_type: str = 'unknown'

estimated_resolution: float = 8.0

wavelength_range: Tuple[float, float] = (1000.0, 2500.0)

class nirs4all.synthesis.fitter.MeasurementModeInference(value)[source]

Bases: str, Enum

Inferred measurement mode from spectral analysis.

ATR = 'atr'

REFLECTANCE = 'reflectance'

TRANSFLECTANCE = 'transflectance'

TRANSMITTANCE = 'transmittance'

UNKNOWN = 'unknown'

class nirs4all.synthesis.fitter.OperatorVarianceParams(noise_std: float = 0.001, offset_std: float = 0.01, slope_std: float = 0.001, curvature_std: float = 0.0001, mult_scatter_std: float = 0.05)[source]

Bases: object

Parameters for operator-based variance modeling.

Models spectral variation as independent physical sources:
- High-frequency noise (detector noise)
- Baseline offset/slope/curvature (instrumental drift, scattering)
- Multiplicative scatter (sample thickness, optical path variation)
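To make the decomposition concrete, the sketch below draws spectra around a mean spectrum by sampling each source independently. The function name, the Gaussian distributions, the centering of the wavelength axis, and the way the terms are combined are assumptions; only the parameter names and defaults come from the signature above.

```python
import numpy as np


def sample_operator_variance(mean_spectrum, wl, n_samples, rng,
                             noise_std=0.001, offset_std=0.01, slope_std=0.001,
                             curvature_std=0.0001, mult_scatter_std=0.05):
    """Hypothetical sampler: one independent random draw per variance source."""
    x = (wl - wl.mean()) / 1000.0  # slope expressed per 1000 nm
    n = n_samples

    scale = 1.0 + rng.normal(0.0, mult_scatter_std, (n, 1))  # multiplicative scatter
    offset = rng.normal(0.0, offset_std, (n, 1))             # baseline offset
    slope = rng.normal(0.0, slope_std, (n, 1))               # baseline slope
    curvature = rng.normal(0.0, curvature_std, (n, 1))       # baseline curvature
    noise = rng.normal(0.0, noise_std, (n, wl.size))         # high-frequency noise

    baseline = offset + slope * x + curvature * x ** 2
    return scale * mean_spectrum + baseline + noise


wl = np.arange(1000.0, 2500.0, 2.0)
mean_spectrum = 0.5 + 0.3 * np.exp(-0.5 * ((wl - 1450.0) / 40.0) ** 2)
rng = np.random.default_rng(0)
X = sample_operator_variance(mean_spectrum, wl, n_samples=500, rng=rng)
```

Each row of `X` is one synthetic spectrum; the per-sample terms have shape `(n, 1)` so they broadcast across the wavelength axis.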

noise_std

Standard deviation of high-frequency noise.

Type:: float

offset_std

Standard deviation of baseline offset.

Type:: float

slope_std

Standard deviation of baseline slope (per 1000nm).

Type:: float

curvature_std

Standard deviation of baseline curvature.

Type:: float

mult_scatter_std

Standard deviation of multiplicative scatter.

Type:: float

curvature_std: float = 0.0001

mult_scatter_std: float = 0.05

noise_std: float = 0.001

offset_std: float = 0.01

slope_std: float = 0.001

to_dict() → Dict[str, float][source]: Convert to dictionary.

class nirs4all.synthesis.fitter.OptimizedComponentFitter(wavelengths: ndarray | None = None, priority_categories: List[str] | None = None, max_components: int = 10, baseline_order: int = 4, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, sg_window_length: int = 15, sg_polyorder: int = 3, regularization: float = 1e-06, smooth_sigma_nm: float = 30.0, use_nnls: bool = False)[source]

Bases: object

Optimize component selection using greedy search with category prioritization.

Unlike ComponentFitter, which fits all components simultaneously with NNLS, this class uses a greedy forward selection approach that:

1. Starts with a baseline-only fit
2. Greedily adds components from priority categories (low threshold)
3. Fills remaining slots from other categories (higher threshold)
4. Applies swap refinement to escape local optima

This approach produces much better fits for real-world data by:

- Avoiding overfitting to spurious components
- Respecting domain knowledge (e.g., protein for dairy, starch for grains)
- Allowing both positive and negative coefficients (OLS, not NNLS)

Example

>>> from nirs4all.synthesis import OptimizedComponentFitter
>>>
>>> # Create fitter for grain analysis
>>> fitter = OptimizedComponentFitter(
...     wavelengths=wavelengths,
...     priority_categories=['carbohydrates', 'proteins', 'water_related'],
...     max_components=10,
... )
>>> result = fitter.fit(spectrum)
>>> print(result.summary())

wavelengths: Wavelength grid for fitting.

priority_categories: Categories to prioritize in component selection.

max_components: Maximum number of components to select.

baseline_order: Polynomial order for baseline (default 4).

preprocessing: Preprocessing to apply to components.

auto_detect_preprocessing: Auto-detect preprocessing from data.

detected_preprocessing: PreprocessingType | None

fit(spectrum: ndarray) → OptimizedFitResult[source]

Fit components to a spectrum using greedy category-prioritized selection.

The algorithm:

1. Starts with baseline-only fit
2. Greedily adds components from priority categories (very low threshold: 0.0001)
3. Fills remaining slots from other categories (higher threshold: 0.005)
4. Applies swap refinement (prefers swapping in priority components)

Parameters:: spectrum – Observed spectrum, shape (n_wavelengths,).
Returns:: OptimizedFitResult with fit results.
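The first three steps of the algorithm can be sketched as a greedy loop over two candidate pools. Interpreting the thresholds as a minimum gain in R² is an assumption, as are the helper names and the toy component library; swap refinement (step 4) is omitted.

```python
import numpy as np


def r_squared(y, A):
    """R² of an ordinary least-squares fit of y on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)


def greedy_select(y, library, baseline, priority, max_components=10,
                  priority_gain=1e-4, other_gain=5e-3):
    """Hypothetical greedy forward selection; library maps name -> spectrum."""
    selected = []
    best = r_squared(y, baseline)  # step 1: baseline-only fit
    others = [name for name in library if name not in priority]
    # Step 2 uses the priority pool (low threshold), step 3 the remaining pool.
    for pool, min_gain in ((priority, priority_gain), (others, other_gain)):
        while len(selected) < max_components:
            scores = {}
            for name in pool:
                if name in selected:
                    continue
                cols = [library[m] for m in selected + [name]]
                scores[name] = r_squared(y, np.column_stack([baseline] + cols))
            if not scores:
                break
            name = max(scores, key=scores.get)
            if scores[name] - best < min_gain:  # gain too small: stop this pool
                break
            selected.append(name)
            best = scores[name]
    return selected, best


wl = np.arange(1000.0, 2500.0, 2.0)
x = (wl - wl.mean()) / (wl[-1] - wl[0])
baseline = np.vander(x, 3)  # quadratic polynomial baseline columns


def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


library = {"water": band(1450, 40), "protein": band(2050, 60),
           "lipid": band(1730, 30), "starch": band(2280, 50)}
y = 0.8 * library["water"] + 0.3 * library["protein"] + 0.1 + 0.05 * x
selected, best_r2 = greedy_select(y, library, baseline,
                                  priority=["water", "protein"])
```

In this toy case the two priority components already explain the spectrum, so the higher threshold for the remaining pool prevents lipid and starch from being added.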

class nirs4all.synthesis.fitter.OptimizedFitResult(component_names: List[str], concentrations: ndarray, baseline_coefficients: ndarray | None, fitted_spectrum: ndarray, residuals: ndarray, r_squared: float, rmse: float, n_components: int, n_priority_components: int, baseline_r_squared: float, wavelengths: ndarray)[source]

Bases: object

Result from optimized greedy component fitting.

component_names

Names of selected components (in order of selection).

Type:: List[str]

concentrations

Fitted concentrations for each component.

Type:: numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients.

Type:: numpy.ndarray | None

fitted_spectrum

Reconstructed spectrum from fit.

Type:: numpy.ndarray

residuals

Fit residuals.

Type:: numpy.ndarray

r_squared

Coefficient of determination.

Type:: float

rmse

Root mean squared error.

Type:: float

n_components

Number of components selected.

Type:: int

n_priority_components

Number of components from priority categories.

Type:: int

baseline_r_squared

R² from baseline-only fit (for comparison).

Type:: float

wavelengths

Wavelength grid used for fitting.

Type:: numpy.ndarray

baseline_coefficients: ndarray | None

baseline_r_squared: float

component_names: List[str]

concentrations: ndarray

fitted_spectrum: ndarray

n_components: int

n_priority_components: int

r_squared: float

residuals: ndarray

rmse: float

summary() → str[source]: Return human-readable summary.

top_components(n: int = 5, threshold: float = 0.001) → List[Tuple[str, float]][source]: Get top components by concentration.
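As a rough illustration of what a ranking like `top_components` could look like, the sketch below filters by a concentration threshold and sorts in descending order. The standalone function and the strict `>` comparison are assumptions; the actual method may treat the threshold or negative concentrations differently.

```python
def top_components(names, concentrations, n=5, threshold=0.001):
    """Hypothetical stand-in: keep concentrations above threshold, largest first."""
    pairs = [
        (name, float(c))
        for name, c in zip(names, concentrations)
        if c > threshold
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]

ranked = top_components(
    ["water", "starch", "protein", "lipid"],
    [0.62, 0.0004, 0.21, 0.05],
    n=2,
)
print(ranked)
```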

wavelengths: ndarray

class nirs4all.synthesis.fitter.PCAVarianceParams(n_components: int = 5, explained_variance_ratio: ndarray | None = None, score_means: ndarray | None = None, score_stds: ndarray | None = None, components: ndarray | None = None, mean_spectrum: ndarray | None = None)[source]

Bases: object

Parameters for PCA-based variance modeling.

Models spectral variation using principal component score distributions.

n_components

Number of PCA components.

Type:: int

explained_variance_ratio

Explained variance per component.

Type:: numpy.ndarray | None

score_means

Mean of PC scores.

Type:: numpy.ndarray | None

score_stds

Standard deviation of PC scores.

Type:: numpy.ndarray | None

components

PCA loading vectors (n_components, n_wavelengths).

Type:: numpy.ndarray | None

mean_spectrum

Mean spectrum from PCA.

Type:: numpy.ndarray | None

components: ndarray | None = None

explained_variance_ratio: ndarray | None = None

mean_spectrum: ndarray | None = None

n_components: int = 5

score_means: ndarray | None = None

score_stds: ndarray | None = None
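The fields above map naturally onto a PCA computed by SVD. The following sketch shows one way such parameters could be derived and then used to draw new spectra; it assumes independent Gaussian score distributions, which is a simplification chosen here and not something the class documentation states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" spectra: 50 samples, 80 points, two independent sources of variation.
x = np.linspace(0.0, 1.0, 80)
peak = np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)
X = (1.0 + 0.1 * rng.standard_normal((50, 1))) * peak \
    + 0.05 * rng.standard_normal((50, 1)) * x

n_components = 2
mean_spectrum = X.mean(axis=0)
centered = X - mean_spectrum

# PCA via SVD: rows of Vt are the loading vectors.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:n_components]                 # (n_components, n_wavelengths)
explained_variance_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]

# Score distribution of the real data.
scores = centered @ components.T
score_means = scores.mean(axis=0)
score_stds = scores.std(axis=0, ddof=1)

# Assumption: draw new scores from independent normal distributions.
new_scores = score_means + score_stds * rng.standard_normal((200, n_components))
X_new = mean_spectrum + new_scores @ components
print(X_new.shape, explained_variance_ratio.sum())
```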

class nirs4all.synthesis.fitter.PreprocessingInference(preprocessing_type: PreprocessingType = PreprocessingType.RAW_ABSORBANCE, confidence: float = 0.0, is_preprocessed: bool = False, global_mean: float = 0.0, global_range: Tuple[float, float] = (0.0, 1.0), zero_crossing_ratio: float = 0.0, per_sample_std_variation: float = 0.0, oscillation_frequency: float = 0.0, suggested_inverse: str | None = None)[source]

Bases: object

Results of preprocessing type inference.

Detects whether spectral data has been preprocessed (derivatives, normalization, centering, etc.) before being provided to the fitter.

This is crucial for generating synthetic data that matches the real data distribution: synthetic spectra should be generated as raw absorbance, and the same preprocessing should then be applied to them.

preprocessing_type

Detected preprocessing type.

Type:: nirs4all.synthesis.fitter.PreprocessingType

confidence

Confidence score (0-1).

Type:: float

is_preprocessed

Whether data appears to be preprocessed.

Type:: bool

global_mean

Mean value (0 suggests centering/derivatives).

Type:: float

global_range

(min, max) value range.

Type:: Tuple[float, float]

zero_crossing_ratio

Ratio of zero crossings (high for derivatives).

Type:: float

per_sample_std_variation

Variation in per-sample std (low for SNV).

Type:: float

oscillation_frequency

Spectral oscillation frequency (high for second derivatives).

Type:: float

suggested_inverse

Suggested inverse operation to recover raw data.

Type:: str | None

confidence: float = 0.0

global_mean: float = 0.0

global_range: Tuple[float, float] = (0.0, 1.0)

is_preprocessed: bool = False

oscillation_frequency: float = 0.0

per_sample_std_variation: float = 0.0

preprocessing_type: PreprocessingType = 'raw_absorbance'

suggested_inverse: str | None = None

zero_crossing_ratio: float = 0.0
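To make the indicators above concrete, here is a small sketch of how two of them might be computed. The formulas are assumptions chosen to match the attribute descriptions (sign flips between neighbouring points, and the coefficient of variation of the per-sample standard deviation); the library may define them differently.

```python
import numpy as np

def zero_crossing_ratio(X):
    """Fraction of neighbouring point pairs whose signs differ (assumed definition)."""
    signs = np.sign(X)
    flips = signs[:, 1:] * signs[:, :-1] < 0
    return float(flips.mean())

def per_sample_std_variation(X):
    """Coefficient of variation of the per-sample standard deviation (assumed definition)."""
    stds = X.std(axis=1)
    return float(stds.std() / stds.mean())

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 300)
scale = rng.uniform(0.5, 1.5, size=(20, 1))

# Strictly positive, absorbance-like spectra with varying intensity.
raw = 0.5 + scale * np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)

# The same data after two common preprocessing steps.
second_deriv = np.gradient(np.gradient(raw, axis=1), axis=1)
snv = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, keepdims=True)

print("zero crossings raw / 2nd deriv:",
      zero_crossing_ratio(raw), zero_crossing_ratio(second_deriv))
print("std variation raw / SNV:",
      per_sample_std_variation(raw), per_sample_std_variation(snv))
```

Raw spectra never change sign, derivative spectra do, and SNV forces every sample to unit standard deviation, which is why these statistics can separate the cases.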

class nirs4all.synthesis.fitter.PreprocessingType(value)[source]

Bases: str, Enum

Detected preprocessing type of spectral data.

FIRST_DERIVATIVE = 'first_derivative'

MEAN_CENTERED = 'mean_centered'

MSC_CORRECTED = 'msc_corrected'

NORMALIZED = 'normalized'

RAW_ABSORBANCE = 'raw_absorbance'

RAW_REFLECTANCE = 'raw_reflectance'

SECOND_DERIVATIVE = 'second_derivative'

SNV_CORRECTED = 'snv_corrected'

UNKNOWN = 'unknown'

class nirs4all.synthesis.fitter.RealBandFitResult(band_names: ~typing.List[str], band_centers: ~numpy.ndarray, amplitudes: ~numpy.ndarray, sigmas: ~numpy.ndarray, baseline_coefficients: ~numpy.ndarray, fitted_spectrum: ~numpy.ndarray, residuals: ~numpy.ndarray, r_squared: float, rmse: float, n_bands: int, wavelengths: ~numpy.ndarray, band_assignments: ~typing.List[~typing.Any] = <factory>)[source]

Bases: object

Result from real band fitting using known NIR band assignments.

band_names

Names of fitted bands (e.g., “O-H/1st”, “C-H/combination”).

Type:: List[str]

band_centers

Fixed center wavelengths from NIR_BANDS.

Type:: numpy.ndarray

amplitudes

Fitted amplitudes for each band.

Type:: numpy.ndarray

sigmas

Sigma values (within constrained ranges).

Type:: numpy.ndarray

baseline_coefficients

Polynomial baseline coefficients.

Type:: numpy.ndarray

fitted_spectrum

Reconstructed spectrum from fit.

Type:: numpy.ndarray

residuals

Fit residuals.

Type:: numpy.ndarray

r_squared

Coefficient of determination.

Type:: float

rmse

Root mean squared error.

Type:: float

n_bands

Number of bands used.

Type:: int

wavelengths

Wavelength grid used for fitting.

Type:: numpy.ndarray

band_assignments

Original BandAssignment objects.

Type:: List[Any]

amplitudes: ndarray

band_assignments: List[Any]

band_centers: ndarray

band_names: List[str]

baseline_coefficients: ndarray

fitted_spectrum: ndarray

n_bands: int

r_squared: float

residuals: ndarray

rmse: float

sigmas: ndarray

summary() → str[source]: Return human-readable summary.

top_bands(n: int = 10, threshold: float = 0.001) → List[Tuple[str, float, float]][source]: Get top bands by amplitude. Returns (name, center, amplitude).

wavelengths: ndarray

class nirs4all.synthesis.fitter.RealBandFitter(baseline_order: int = 4, max_bands: int = 50, target_r2: float = 0.98, allow_sigma_variation: bool = True, sigma_margin: float = 0.3, n_iterations: int = 3)[source]

Bases: object

Fit spectra using REAL NIR band assignments from the _bands.py dictionary.

Unlike pure Gaussian band fitting which optimizes band centers freely, this class uses:

- Fixed band centers from known spectroscopic literature assignments
- Constrained sigma values based on typical ranges for each band type
- Only amplitude optimization (more physically interpretable)

This provides spectroscopically meaningful decomposition that can be linked back to functional groups (O-H, C-H, N-H, etc.) and overtone levels.

Example

>>> from nirs4all.synthesis import RealBandFitter
>>>
>>> fitter = RealBandFitter(baseline_order=4, max_bands=40)
>>> result = fitter.fit(spectrum, wavelengths)
>>> print(result.summary())
>>>
>>> # See which functional groups contribute
>>> for name, center, amp in result.top_bands(10):
...     print(f"{center:.0f} nm: {name} (amplitude={amp:.4f})")

baseline_order: Polynomial baseline order.

max_bands: Maximum number of bands to use.

target_r2: Target R² for iterative refinement.

allow_sigma_variation: Allow sigma to vary within literature ranges.

sigma_margin: How much sigma can vary from midpoint (0.3 = ±30%).

fit(spectrum: ndarray, wavelengths: ndarray) → RealBandFitResult[source]

Fit spectrum using real NIR band positions.

Parameters:

spectrum – Target spectrum to fit, shape (n_wavelengths,).
wavelengths – Wavelengths in nm, shape (n_wavelengths,).

Returns:

RealBandFitResult with fit results and band assignments.
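With band centers and widths held fixed, fitting amplitudes reduces to a linear least-squares problem, which the sketch below illustrates. The three band centers and sigmas are approximate, illustrative values and are not read from NIR_BANDS; the sketch also uses unconstrained ordinary least squares, whereas the class may constrain amplitudes or vary sigma.

```python
import numpy as np

def gaussian_band(wl, center, sigma):
    """Unit-height Gaussian band profile."""
    return np.exp(-0.5 * ((wl - center) / sigma) ** 2)

wavelengths = np.linspace(1100.0, 2100.0, 501)

# Illustrative (center nm, sigma nm) pairs; NOT taken from the library's band table.
bands = {
    "O-H/1st": (1450.0, 40.0),
    "C-H/1st": (1725.0, 30.0),
    "O-H/combination": (1940.0, 45.0),
}

# Polynomial baseline on a rescaled axis to keep the design matrix well conditioned.
baseline_order = 2
t = (wavelengths - wavelengths.mean()) / np.ptp(wavelengths)
baseline = np.column_stack([t ** k for k in range(baseline_order + 1)])
band_matrix = np.column_stack(
    [gaussian_band(wavelengths, c, s) for c, s in bands.values()]
)

# Synthetic target spectrum with known amplitudes and a sloped baseline.
true_amplitudes = np.array([0.8, 0.3, 1.2])
spectrum = band_matrix @ true_amplitudes + 0.2 + 0.1 * t

# Centers and sigmas are fixed, so only linear coefficients are solved for.
A = np.hstack([band_matrix, baseline])
coef, *_ = np.linalg.lstsq(A, spectrum, rcond=None)
amplitudes = coef[: len(bands)]
baseline_coefficients = coef[len(bands):]

residuals = spectrum - A @ coef
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((spectrum - spectrum.mean()) ** 2)
rmse = float(np.sqrt(np.mean(residuals ** 2)))

for name, amp in zip(bands, amplitudes):
    print(f"{name}: amplitude={amp:.3f}")
```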

class nirs4all.synthesis.fitter.RealDataFitter[source]

Bases: object

Fit generator parameters to match real dataset properties.

This class analyzes real NIRS data and estimates parameters for the SyntheticNIRSGenerator to produce similar spectra. Includes Phase 1-4 enhanced inference for instruments, domains, and effects.

source_properties

SpectralProperties of the analyzed data.

Type:: nirs4all.synthesis.fitter.SpectralProperties | None

fitted_params

FittedParameters after fitting.

Type:: nirs4all.synthesis.fitter.FittedParameters | None

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Access inferred characteristics
>>> print(f"Instrument: {params.inferred_instrument}")
>>> print(f"Domain: {params.inferred_domain}")
>>>
>>> # Create matched generator
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)

apply_matching_preprocessing(X: ndarray, *, window_length: int = 15, polyorder: int = 2) → ndarray[source]

Apply preprocessing to match the detected preprocessing of real data.

If the real data was detected as preprocessed (e.g., second derivative), this method applies the same preprocessing to synthetic raw absorbance spectra so they match the real data distribution.

Parameters:

X – Raw absorbance spectra from generator (n_samples, n_wavelengths).
window_length – Savitzky-Golay window length for derivatives.
polyorder – Polynomial order for Savitzky-Golay filter.

Returns:

Preprocessed spectra matching the real data type.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wl)
>>> generator = fitter.create_matched_generator()
>>> X_raw, _, _ = generator.generate(1000)
>>> X_matched = fitter.apply_matching_preprocessing(X_raw)
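For the derivative case, the effect of `window_length` and `polyorder` can be seen in a NumPy-only Savitzky-Golay sketch. This is an illustration under assumptions, not the method's code: it assumes unit sample spacing and trims the edges (`mode="valid"`), while in practice `scipy.signal.savgol_filter` is the usual tool and keeps the original length.

```python
import numpy as np
from math import factorial

def savgol_kernel(window_length=15, polyorder=2, deriv=2):
    """Savitzky-Golay weights for the derivative at the window center."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, k, k^2, ...
    # Row `deriv` of the pseudo-inverse is the least-squares estimate of the
    # polynomial coefficient of that order; times deriv! gives the derivative.
    return factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(X, window_length=15, polyorder=2, deriv=2):
    """Apply the filter row by row; output is shorter by window_length - 1."""
    kernel = savgol_kernel(window_length, polyorder, deriv)
    # np.convolve flips its second argument, so reverse it to get a correlation.
    return np.array([np.convolve(row, kernel[::-1], mode="valid") for row in X])

x = np.arange(100, dtype=float)
X_raw = np.vstack([0.5 * x ** 2, 3.0 * x + 1.0])  # second derivatives: 1 and 0
X_d2 = savgol_derivative(X_raw)
print(X_d2.shape, X_d2[0, :3], X_d2[1, :3])
```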

create_matched_generator(random_state: int | None = None) → SyntheticNIRSGenerator[source]

Create a SyntheticNIRSGenerator configured to match the fitted data.

This method creates a generator with all fitted parameters including Phase 1-4 enhanced features (instrument, domain, effects).

Parameters:

random_state – Random seed for reproducibility.

Returns:

Configured SyntheticNIRSGenerator instance.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator(random_state=42)
>>> X_synth, _, _ = generator.generate(1000)

evaluate_similarity(X_synthetic: ndarray, wavelengths: ndarray | None = None) → Dict[str, Any][source]

Evaluate similarity between synthetic and source data.

Computes various metrics comparing synthetic spectra to the original real data.

Parameters:

X_synthetic – Synthetic spectra matrix.
wavelengths – Optional wavelength grid.

Returns:

Dictionary with similarity metrics.

Raises:

RuntimeError – If fit() hasn’t been called.

Example

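>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)
>>> metrics = fitter.evaluate_similarity(X_synth)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")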

fit(X: np.ndarray | 'SpectroDataset', *, wavelengths: np.ndarray | None = None, name: str = 'source', infer_instrument: bool = True, infer_domain: bool = True, infer_measurement_mode: bool = True, infer_environmental: bool = True, infer_scattering: bool = True, infer_edge_artifacts: bool = True, infer_preprocessing: bool = True) → FittedParameters[source]

Fit generator parameters to real data.

Analyzes the input data and estimates optimal parameters for generating synthetic spectra with similar properties. Includes Phase 1-6 enhanced inference.

Parameters:

X – Real spectra matrix (n_samples, n_wavelengths) or SpectroDataset.
wavelengths – Wavelength grid (required if X is ndarray).
name – Dataset name for reference.
infer_instrument – Whether to infer instrument archetype.
infer_domain – Whether to infer application domain.
infer_measurement_mode – Whether to infer measurement mode.
infer_environmental – Whether to infer environmental effects.
infer_scattering – Whether to infer scattering parameters.
infer_edge_artifacts – Whether to infer edge artifact effects.
infer_preprocessing – Whether to detect preprocessing type.

Returns:

FittedParameters object with estimated parameters.

Raises:

ValueError – If X is empty or has wrong shape.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wl, name="wheat")
>>> print(params.summary())

fit_from_path(path: str, *, name: str | None = None) → FittedParameters[source]

Fit parameters from a dataset path.

Loads data using DatasetConfigs and fits parameters.

Parameters:

path – Path to dataset folder.
name – Optional name override.

Returns:

FittedParameters object.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit_from_path("sample_data/regression")

fitted_params: FittedParameters | None

get_tuning_recommendations() → List[str][source]

Get recommendations for tuning generation parameters.

Based on the fitted parameters and source data, provides suggestions for manual tuning.

Returns:

List of recommendation strings.

Example

>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> for rec in fitter.get_tuning_recommendations():
...     print(f"- {rec}")

source_properties: SpectralProperties | None

class nirs4all.synthesis.fitter.ScatteringInference(has_scatter_effects: bool = False, estimated_particle_size_um: float = 50.0, multiplicative_scatter_std: float = 0.0, additive_scatter_std: float = 0.0, baseline_curvature: float = 0.0, snv_correctable: bool = False, msc_correctable: bool = False)[source]

Bases: object

Results of scattering effects inference.

has_scatter_effects

Whether significant scatter is detected.

Type:: bool

estimated_particle_size_um

Estimated mean particle size (μm).

Type:: float

multiplicative_scatter_std

Estimated MSC-style multiplicative scatter.

Type:: float

additive_scatter_std

Estimated SNV-style additive scatter.

Type:: float

baseline_curvature

Detected baseline curvature intensity.

Type:: float

snv_correctable

Whether SNV would improve spectra.

Type:: bool

msc_correctable

Whether MSC would improve spectra.

Type:: bool

additive_scatter_std: float = 0.0

baseline_curvature: float = 0.0

estimated_particle_size_um: float = 50.0

has_scatter_effects: bool = False

msc_correctable: bool = False

multiplicative_scatter_std: float = 0.0

snv_correctable: bool = False
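One common way to quantify multiplicative and additive scatter is the regression used in multiplicative scatter correction (MSC): each spectrum is regressed on the mean spectrum, and the spread of the resulting gains and offsets is reported. The sketch below follows that idea; whether the library estimates `multiplicative_scatter_std` and `additive_scatter_std` this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 150)
reference_shape = 1.0 + np.exp(-0.5 * ((x - 0.4) / 0.08) ** 2)

# Simulated scatter: per-sample additive offset and multiplicative gain.
a = rng.normal(0.0, 0.05, size=(40, 1))
b = rng.normal(1.0, 0.10, size=(40, 1))
X = a + b * reference_shape

# MSC-style regression: x_i ≈ offset_i + gain_i * mean_spectrum.
mean_spectrum = X.mean(axis=0)
design = np.column_stack([np.ones_like(mean_spectrum), mean_spectrum])
coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)  # shape (2, n_samples)
offsets, gains = coef

additive_scatter_std = float(offsets.std(ddof=1))
multiplicative_scatter_std = float(gains.std(ddof=1))

# Applying the correction collapses pure scatter onto the mean spectrum.
X_msc = (X - offsets[:, None]) / gains[:, None]
print(additive_scatter_std, multiplicative_scatter_std)
```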

class nirs4all.synthesis.fitter.SpectralProperties(name: str = 'dataset', n_samples: int = 0, n_wavelengths: int = 0, wavelengths: ndarray | None = None, mean_spectrum: ndarray | None = None, std_spectrum: ndarray | None = None, global_mean: float = 0.0, global_std: float = 0.0, global_range: Tuple[float, float] = (0.0, 0.0), mean_slope: float = 0.0, slope_std: float = 0.0, slopes: ndarray | None = None, mean_curvature: float = 0.0, curvature_std: float = 0.0, skewness: float = 0.0, kurtosis: float = 0.0, noise_estimate: float = 0.0, snr_estimate: float = 0.0, pca_explained_variance: ndarray | None = None, pca_n_components_95: int = 0, n_peaks_mean: float = 0.0, peak_positions: ndarray | None = None, peak_wavenumbers: ndarray | None = None, effective_resolution: float = 8.0, noise_correlation_length: float = 1.0, wavelength_range: Tuple[float, float] = (1000.0, 2500.0), baseline_offset: float = 0.0, kubelka_munk_linearity: float = 0.0, baseline_convexity: float = 0.0, water_band_variation: float = 0.0, oh_band_positions: ndarray | None = None, temperature_sensitivity_score: float = 0.0, scatter_baseline_slope: float = 0.0, scatter_baseline_curvature: float = 0.0, sample_to_sample_offset_std: float = 0.0, sample_to_sample_slope_std: float = 0.0, protein_band_intensity: float = 0.0, carbohydrate_band_intensity: float = 0.0, lipid_band_intensity: float = 0.0, water_band_intensity: float = 0.0, left_edge_noise_std: float = 0.0, right_edge_noise_std: float = 0.0, center_noise_std: float = 0.0, left_edge_slope: float = 0.0, right_edge_slope: float = 0.0, edge_curvature_intensity: float = 0.0, edge_curvature_asymmetry: float = 0.0, has_boundary_rise_left: bool = False, has_boundary_rise_right: bool = False)[source]

Bases: object

Container for computed spectral properties of a dataset.

This dataclass holds various statistical and spectral properties computed from a NIRS dataset for comparison and fitting purposes.

name

Dataset identifier.

Type:: str

n_samples

Number of samples.

Type:: int

n_wavelengths

Number of wavelengths.

Type:: int

wavelengths

Wavelength grid.

Type:: numpy.ndarray | None

# Basic statistics

mean_spectrum

Mean spectrum across samples.

Type:: numpy.ndarray | None

std_spectrum

Standard deviation spectrum.

Type:: numpy.ndarray | None

global_mean

Overall mean absorbance.

Type:: float

global_std

Overall standard deviation.

Type:: float

global_range

(min, max) absorbance range.

Type:: Tuple[float, float]

# Shape properties

mean_slope

Average spectral slope (per 1000 nm).

Type:: float

slope_std

Standard deviation of slopes.

Type:: float

mean_curvature

Average curvature (second derivative).

Type:: float

# Distribution statistics

skewness

Skewness of absorbance distribution.

Type:: float

kurtosis

Kurtosis of absorbance distribution.

Type:: float

# Noise characteristics

noise_estimate

Estimated noise level.

Type:: float

snr_estimate

Signal-to-noise ratio estimate.

Type:: float

# PCA properties

pca_explained_variance

Explained variance ratios.

Type:: numpy.ndarray | None

pca_n_components_95

Components for 95% variance.

Type:: int

# Peak analysis

n_peaks_mean

Mean number of peaks.

Type:: float

peak_positions

Wavelengths of detected peaks.

Type:: numpy.ndarray | None

peak_wavenumbers

Wavenumber positions of peaks.

Type:: numpy.ndarray | None

# Phase 1-4 Enhanced properties

# Instrument indicators

effective_resolution

Estimated spectral resolution from peak widths.

Type:: float

noise_correlation_length

Correlation length of noise (detector indicator).

Type:: float

wavelength_range

Actual wavelength range of data.

Type:: Tuple[float, float]

# Measurement mode indicators

baseline_offset

Mean baseline offset (transmittance indicator).

Type:: float

kubelka_munk_linearity

K-M linearity score (reflectance indicator).

Type:: float

baseline_convexity

Convexity of baseline (ATR indicator).

Type:: float

# Environmental indicators

water_band_variation

Variation in water band region.

Type:: float

oh_band_positions

Detected O-H band positions.

Type:: numpy.ndarray | None

temperature_sensitivity_score

Score for temperature effect detection.

Type:: float

# Scattering indicators

scatter_baseline_slope

Wavelength-dependent scatter slope.

Type:: float

scatter_baseline_curvature

Curvature from scattering.

Type:: float

sample_to_sample_offset_std

Sample-to-sample offset variation.

Type:: float

sample_to_sample_slope_std

Sample-to-sample slope variation.

Type:: float

# Domain indicators

protein_band_intensity

Intensity in protein band regions.

Type:: float

carbohydrate_band_intensity

Intensity in carbohydrate regions.

Type:: float

lipid_band_intensity

Intensity in lipid band regions.

Type:: float

water_band_intensity

Intensity in water band regions.

Type:: float

baseline_convexity: float = 0.0

baseline_offset: float = 0.0

carbohydrate_band_intensity: float = 0.0

center_noise_std: float = 0.0

curvature_std: float = 0.0

edge_curvature_asymmetry: float = 0.0

edge_curvature_intensity: float = 0.0

effective_resolution: float = 8.0

global_mean: float = 0.0

global_range: Tuple[float, float] = (0.0, 0.0)

global_std: float = 0.0

has_boundary_rise_left: bool = False

has_boundary_rise_right: bool = False

kubelka_munk_linearity: float = 0.0

kurtosis: float = 0.0

left_edge_noise_std: float = 0.0

left_edge_slope: float = 0.0

lipid_band_intensity: float = 0.0

mean_curvature: float = 0.0

mean_slope: float = 0.0

mean_spectrum: ndarray | None = None

n_peaks_mean: float = 0.0

n_samples: int = 0

n_wavelengths: int = 0

name: str = 'dataset'

noise_correlation_length: float = 1.0

noise_estimate: float = 0.0

oh_band_positions: ndarray | None = None

pca_explained_variance: ndarray | None = None

pca_n_components_95: int = 0

peak_positions: ndarray | None = None

peak_wavenumbers: ndarray | None = None

protein_band_intensity: float = 0.0

right_edge_noise_std: float = 0.0

right_edge_slope: float = 0.0

sample_to_sample_offset_std: float = 0.0

sample_to_sample_slope_std: float = 0.0

scatter_baseline_curvature: float = 0.0

scatter_baseline_slope: float = 0.0

skewness: float = 0.0

slope_std: float = 0.0

slopes: ndarray | None = None

snr_estimate: float = 0.0

std_spectrum: ndarray | None = None

temperature_sensitivity_score: float = 0.0

water_band_intensity: float = 0.0

water_band_variation: float = 0.0

wavelength_range: Tuple[float, float] = (1000.0, 2500.0)

wavelengths: ndarray | None = None

class nirs4all.synthesis.fitter.VarianceFitResult(operator_params: OperatorVarianceParams, pca_params: PCAVarianceParams, n_samples: int = 0, wavelengths: ndarray | None = None)[source]

Bases: object

Combined result from variance fitting.

operator_params

Operator-based variance parameters.

Type:: nirs4all.synthesis.fitter.OperatorVarianceParams

pca_params

PCA-based variance parameters.

Type:: nirs4all.synthesis.fitter.PCAVarianceParams

n_samples

Number of samples used for fitting.

Type:: int

wavelengths

Wavelength grid.

Type:: numpy.ndarray | None

n_samples: int = 0

operator_params: OperatorVarianceParams

pca_params: PCAVarianceParams

summary() → str[source]: Return human-readable summary.

wavelengths: ndarray | None = None

class nirs4all.synthesis.fitter.VarianceFitter(n_pca_components: int = 10)[source]

Bases: object

Fit variance parameters from real spectra.

Provides two complementary methods for modeling spectral variation:

- Operator-based: Independent physical sources (noise, scatter, baseline)
- PCA-based: Correlated variations capturing the covariance structure

Example

>>> from nirs4all.synthesis import VarianceFitter
>>>
>>> fitter = VarianceFitter()
>>> result = fitter.fit(X_real, wavelengths)
>>>
>>> # Use operator-based params for generation
>>> print(f"Noise level: {result.operator_params.noise_std:.6f}")
>>>
>>> # Generate synthetic variance using PCA
>>> X_variance = fitter.generate_pca_variance(n_samples=100, random_state=42)

fit(X: ndarray, wavelengths: ndarray | None = None) → VarianceFitResult[source]

Fit variance parameters from real spectra.

Parameters:

X – Real spectra matrix (n_samples, n_wavelengths).
wavelengths – Wavelength array (nm).

Returns:

VarianceFitResult with both operator and PCA parameters.

generate_operator_variance(base_spectrum: ndarray, wavelengths: ndarray, n_samples: int = 100, random_state: int | None = None) → ndarray[source]

Generate synthetic spectra using operator-based variance.

Parameters:

base_spectrum – Mean/fitted spectrum to add variance to.
wavelengths – Wavelength array.
n_samples – Number of samples to generate.
random_state – Random seed.

Returns:

Array of synthetic spectra (n_samples, n_wavelengths).
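The operator-based idea can be sketched as independent gain, offset, and noise terms applied to a base spectrum. Only `noise_std` appears in the examples on this page; `offset_std`, `gain_std`, and the standalone function below are hypothetical names used for illustration, and the real method may apply additional operators that depend on the wavelength axis.

```python
import numpy as np

def generate_operator_variance(base_spectrum, n_samples=100, noise_std=0.001,
                               offset_std=0.02, gain_std=0.05, random_state=None):
    """Hypothetical sketch: base spectrum + independent gain, offset, and noise."""
    rng = np.random.default_rng(random_state)
    gains = 1.0 + gain_std * rng.standard_normal((n_samples, 1))    # multiplicative scatter
    offsets = offset_std * rng.standard_normal((n_samples, 1))      # baseline shift
    noise = noise_std * rng.standard_normal((n_samples, base_spectrum.size))
    return gains * base_spectrum + offsets + noise

x = np.linspace(0.0, 1.0, 120)
base = 0.5 + np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)

# The same seed reproduces the same synthetic set.
X_a = generate_operator_variance(base, n_samples=50, random_state=42)
X_b = generate_operator_variance(base, n_samples=50, random_state=42)
print(X_a.shape, np.array_equal(X_a, X_b))
```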

generate_pca_variance(n_samples: int = 100, n_components: int | None = None, random_state: int | None = None) → ndarray[source]

Generate synthetic spectra using PCA-based variance.

Parameters:

n_samples – Number of samples to generate.
n_components – Number of PCA components to use (None = all).
random_state – Random seed.

Returns:

Array of synthetic spectra (n_samples, n_wavelengths).

nirs4all.synthesis.fitter.compare_datasets(X_synthetic: ndarray, X_real: ndarray, wavelengths: ndarray | None = None) → Dict[str, Any][source]

Quick comparison between synthetic and real datasets.

Parameters:

X_synthetic – Synthetic spectra.
X_real – Real spectra.
wavelengths – Wavelength grid.

Returns:

Dictionary with comparison metrics.

Example

>>> metrics = compare_datasets(X_synth, X_real)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")

nirs4all.synthesis.fitter.compute_spectral_properties(X: ndarray, wavelengths: ndarray | None = None, name: str = 'dataset', n_pca_components: int = 20) → SpectralProperties[source]

Compute comprehensive spectral properties of a dataset.

Analyzes a matrix of spectra to extract statistical and spectral properties useful for fitting and comparison. Includes Phase 1-4 enhanced properties for instrument, mode, domain, and effect inference.

Parameters:

X – Spectra matrix (n_samples, n_wavelengths).
wavelengths – Optional wavelength grid.
name – Dataset identifier.
n_pca_components – Maximum PCA components to compute.

Returns:

SpectralProperties with computed metrics.

Example

>>> props = compute_spectral_properties(X_real, wavelengths)
>>> print(f"Mean slope: {props.mean_slope:.4f}")
>>> print(f"Inferred resolution: {props.effective_resolution:.1f} nm")
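A few of the listed properties are simple enough to sketch directly. The formulas below (straight-line slope per 1000 nm, a first-difference noise estimate, the number of principal components reaching 95% variance, and the nm to cm⁻¹ conversion) are plausible definitions assumed for illustration and may differ from what the function computes.

```python
import numpy as np

rng = np.random.default_rng(3)
wavelengths = np.linspace(1000.0, 2500.0, 301)

# Toy spectra: linear trend (0.2 per 1000 nm), per-sample offset, white noise.
offsets = rng.normal(0.0, 0.05, size=(30, 1))
X = (0.4 + 0.0002 * (wavelengths - 1000.0) + offsets
     + 0.001 * rng.standard_normal((30, 301)))

mean_spectrum = X.mean(axis=0)
std_spectrum = X.std(axis=0)

# Slope of a straight-line fit per spectrum, expressed per 1000 nm.
slopes = np.polyfit(wavelengths, X.T, deg=1)[0] * 1000.0
mean_slope = float(slopes.mean())

# Noise estimate from first differences (white noise: std(diff) = sqrt(2) * sigma).
noise_estimate = float(np.std(np.diff(X, axis=1)) / np.sqrt(2.0))

# Number of principal components needed to reach 95% explained variance.
S = np.linalg.svd(X - mean_spectrum, compute_uv=False)
ratio = S ** 2 / np.sum(S ** 2)
pca_n_components_95 = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)

# Wavelength (nm) to wavenumber (cm^-1): 1e7 / nm.
peak_wavenumbers = 1e7 / np.array([1450.0, 1940.0])

print(mean_slope, noise_estimate, pca_n_components_95, peak_wavenumbers)
```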

nirs4all.synthesis.fitter.fit_components(spectrum: ndarray, wavelengths: ndarray, component_names: List[str] | None = None, fit_baseline: bool = True, baseline_order: int = 2, method: str = 'nnls', preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False) → ComponentFitResult[source]

Convenience function to fit components to a spectrum.

Parameters:

spectrum – Observed spectrum.
wavelengths – Wavelength grid.
component_names – Components to fit (None = all available).
fit_baseline – Include polynomial baseline.
baseline_order – Polynomial order for baseline.
method – Fitting method (“nnls” or “lsq”).
preprocessing – Preprocessing to apply to components (e.g., “second_derivative”). Use this when fitting preprocessed data.
auto_detect_preprocessing – If True, automatically detect preprocessing type from the data. This is useful for derivative data where the preprocessing type is unknown. Takes precedence over preprocessing if set.

Returns:

ComponentFitResult with fit results.

Example

>>> # Fit raw absorbance data
>>> result = fit_components(spectrum, wavelengths, ["water", "protein", "lipid"])
>>>
>>> # Fit second derivative data
>>> result = fit_components(
...     deriv_spectrum, wavelengths, ["water", "protein"],
...     preprocessing="second_derivative"
... )
>>>
>>> # Auto-detect preprocessing (recommended for unknown data)
>>> result = fit_components(
...     unknown_spectrum, wavelengths,
...     auto_detect_preprocessing=True
... )
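The reason the `preprocessing` argument is applied to the components is that derivatives are linear: the derivative of a mixture equals the same mixture of the component derivatives. The sketch below checks this with NumPy, using `np.gradient` as a stand-in derivative and plain least squares in place of the "nnls"/"lsq" options; the component shapes are synthetic.

```python
import numpy as np

def second_derivative(y):
    """Numerical second derivative along the last axis (stand-in for a real filter)."""
    return np.gradient(np.gradient(y, axis=-1), axis=-1)

x = np.linspace(0.0, 1.0, 400)
components = np.vstack([
    np.exp(-0.5 * ((x - 0.35) / 0.05) ** 2),
    np.exp(-0.5 * ((x - 0.65) / 0.07) ** 2),
])
true_conc = np.array([0.7, 0.3])

# Raw spectrum = mixture + linear baseline; the baseline vanishes after two derivatives.
raw = true_conc @ components + 0.2 + 0.5 * x
deriv_spectrum = second_derivative(raw)

# Apply the SAME operator to the components before fitting.
deriv_components = second_derivative(components)
conc, *_ = np.linalg.lstsq(deriv_components.T, deriv_spectrum, rcond=None)
print(conc)
```

Fitting derivative data against raw component spectra would break this equality, which is why the two sides must share the same preprocessing.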

nirs4all.synthesis.fitter.fit_components_optimized(spectrum: ndarray, wavelengths: ndarray, priority_categories: List[str] | None = None, max_components: int = 10, baseline_order: int = 4, preprocessing: str | PreprocessingType | None = None, auto_detect_preprocessing: bool = False, smooth_sigma_nm: float = 30.0, use_nnls: bool = False) → OptimizedFitResult[source]

Convenience function for optimized component fitting.

Uses greedy category-prioritized selection, which is intended to give better fits than a plain NNLS fit over all components.

Parameters:

spectrum – Observed spectrum.
wavelengths – Wavelength grid.
priority_categories – Categories to prioritize (e.g., [‘carbohydrates’, ‘proteins’]).
max_components – Maximum components to select.
baseline_order – Polynomial baseline order.
preprocessing – Preprocessing type (‘first_derivative’, ‘second_derivative’, etc.).
auto_detect_preprocessing – Auto-detect preprocessing from data.
smooth_sigma_nm – Gaussian smoothing sigma in nm to broaden component spectra.
use_nnls – Use non-negative least squares instead of OLS.

Returns:

OptimizedFitResult with fit results.

Example

>>> result = fit_components_optimized(
...     spectrum, wavelengths,
...     priority_categories=['carbohydrates', 'proteins'],
...     auto_detect_preprocessing=True,
... )
>>> print(f"R² = {result.r_squared:.4f}")

nirs4all.synthesis.fitter.fit_real_bands(spectrum: ndarray, wavelengths: ndarray, baseline_order: int = 4, max_bands: int = 50, target_r2: float = 0.98, allow_sigma_variation: bool = True) → RealBandFitResult[source]

Convenience function for fitting spectrum using real NIR band assignments.

Uses known band positions from the NIR_BANDS dictionary for physically meaningful spectral decomposition.

Parameters:

spectrum – Observed spectrum.
wavelengths – Wavelength grid in nm.
baseline_order – Polynomial baseline order.
max_bands – Maximum number of bands to use.
target_r2 – Target R² for early stopping.
allow_sigma_variation – Allow sigma to vary within constrained ranges.

Returns:

RealBandFitResult with fit results.

Example

>>> result = fit_real_bands(spectrum, wavelengths)
>>> print(f"R² = {result.r_squared:.4f}")
>>> for name, center, amp in result.top_bands(5):
...     print(f"{center:.0f} nm: {name}")

nirs4all.synthesis.fitter.fit_to_real_data(X: np.ndarray | 'SpectroDataset', wavelengths: np.ndarray | None = None, name: str = 'source') → FittedParameters[source]

Quick function to fit parameters to real data.

Convenience function for simple fitting use cases.

Parameters:

X – Real spectra or SpectroDataset.
wavelengths – Wavelength grid.
name – Dataset name.

Returns:

FittedParameters object.

Example

>>> params = fit_to_real_data(X_real, wavelengths)
>>> generator = SyntheticNIRSGenerator(**params.to_generator_kwargs())

nirs4all.synthesis.fitter.fit_variance(X: ndarray, wavelengths: ndarray | None = None, n_pca_components: int = 10) → VarianceFitResult[source]

Convenience function to fit variance parameters from real spectra.

Parameters:

X – Real spectra matrix (n_samples, n_wavelengths).
wavelengths – Wavelength array (nm).
n_pca_components – Number of PCA components to fit.

Returns:

VarianceFitResult with fitted parameters.

Example

>>> result = fit_variance(X_real, wavelengths)
>>> print(f"Noise level: {result.operator_params.noise_std:.6f}")

nirs4all.synthesis.fitter.multiscale_derivative_fit(fitter: DerivativeAwareForwardModelFitter, y_deriv: ndarray, scales: List[float] | None = None) → Dict[str, Any][source]

Multiscale fitting curriculum for derivative spectra.

Fits coarse features first by smoothing the derivative target, then progressively reduces smoothing. Particularly important for derivative data which can have high-frequency noise.

Parameters:

fitter – DerivativeAwareForwardModelFitter instance.
y_deriv – Target derivative spectrum.
scales – List of Gaussian sigma values. Default: [15, 8, 4, 0].

Returns:

Final fit result dict.

Example

>>> result = multiscale_derivative_fit(fitter, deriv_spectrum)

nirs4all.synthesis.fitter.multiscale_fit(fitter: ForwardModelFitter, y: ndarray, scales: List[float] | None = None) → Dict[str, Any][source]

Multiscale fitting curriculum for raw spectra.

Fits coarse features first by smoothing the target, then progressively reduces smoothing to capture finer details. This improves optimization stability and avoids local minima.

Parameters:

fitter – ForwardModelFitter instance.
y – Target spectrum.
scales – List of Gaussian sigma values for progressive smoothing. Default: [20, 10, 5, 0].

Returns:

Final fit result dict.

Example

>>> result = multiscale_fit(fitter, spectrum, scales=[20, 10, 5, 0])
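The coarse-to-fine curriculum can be illustrated without the fitter classes. In the sketch below, a single peak center is fitted by a simple local search, warm-started from one smoothing scale to the next. The local search, the one-parameter model, and the interpretation of the scales as sigmas in sample points are assumptions for illustration; the real function works with a ForwardModelFitter.

```python
import numpy as np

def gaussian_smooth(y, sigma):
    """Smooth with a normalized Gaussian kernel; sigma in sample points, 0 = none."""
    if sigma <= 0:
        return y.copy()
    radius = int(4 * sigma)
    k = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(y, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def model(x, center, width=8.0):
    """Toy forward model: one Gaussian peak with a free center."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def fit_center(x, target, sigma, start, step=1.0, n_iter=200):
    """Local search on the center; the model is smoothed like the target."""
    def loss(c):
        return np.sum((gaussian_smooth(model(x, c), sigma) - target) ** 2)
    center = start
    for _ in range(n_iter):
        best = min((center - step, center + step), key=loss)
        if loss(best) >= loss(center):
            break  # no neighbouring center improves the fit
        center = best
    return center

x = np.arange(400, dtype=float)
y = model(x, 230.0)  # target spectrum with the true center at 230

center = 170.0  # deliberately poor starting guess
for sigma in [20, 10, 5, 0]:
    # Each stage starts from the previous stage's solution.
    center = fit_center(x, gaussian_smooth(y, sigma), sigma, start=center)
    print(f"sigma={sigma}: center={center}")
```

Smoothing broadens the peak, so the coarse stages see a loss surface with a wider basin, and the final unsmoothed stage only has to refine an already close estimate.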