nirs4all.data.synthetic.fitter module
Real data fitting utilities for synthetic NIRS spectra generation.
This module provides tools to analyze real NIRS datasets and fit generator parameters to match their statistical and spectral properties.
Key Features:
- Statistical property analysis (mean, std, skewness, kurtosis)
- Spectral shape analysis (slope, curvature, noise)
- PCA structure analysis
- Parameter estimation for SyntheticNIRSGenerator
- Comparison between synthetic and real data
Phase 1-4 Enhanced Features:
- Instrument archetype inference (InGaAs, PbS, MEMS, etc.)
- Measurement mode detection (transmittance, reflectance, ATR)
- Application domain suggestion (agriculture, pharmaceutical, etc.)
- Environmental effects estimation (temperature, moisture)
- Scattering parameter estimation (particle size, EMSC)
- Wavenumber-based peak analysis for component identification
Example
>>> from nirs4all.data.synthetic import RealDataFitter, SyntheticNIRSGenerator
>>>
>>> # Analyze real data
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Create generator with fitted parameters (includes all Phase 1-4 features)
>>> generator = fitter.create_matched_generator()
>>> X_synthetic, _, _ = generator.generate(n_samples=1000)
>>>
>>> # Or get all inferred characteristics
>>> print(f"Inferred instrument: {params.inferred_instrument}")
>>> print(f"Inferred domain: {params.inferred_domain}")
>>> print(f"Measurement mode: {params.measurement_mode}")
References
Based on comparator.py from bench/synthetic/
Enhanced with Phase 1-4 synthetic generator features
- class nirs4all.data.synthetic.fitter.DomainInference(domain_name: str = 'unknown', category: str = 'unknown', confidence: float = 0.0, detected_components: ~typing.List[str] = <factory>, alternative_domains: ~typing.Dict[str, float] = <factory>)[source]
Bases: object
Results of application domain inference.
- class nirs4all.data.synthetic.fitter.EnvironmentalInference(estimated_temperature_variation: float = 0.0, has_temperature_effects: bool = False, estimated_moisture_variation: float = 0.0, has_moisture_effects: bool = False, water_band_shift: float = 0.0)[source]
Bases: object
Results of environmental effects inference.
- class nirs4all.data.synthetic.fitter.FittedParameters(wavelength_start: float = 1000.0, wavelength_end: float = 2500.0, wavelength_step: float = 2.0, global_slope_mean: float = 0.0, global_slope_std: float = 0.02, noise_base: float = 0.001, noise_signal_dep: float = 0.005, path_length_std: float = 0.05, baseline_amplitude: float = 0.02, scatter_alpha_std: float = 0.05, scatter_beta_std: float = 0.01, tilt_std: float = 0.01, complexity: str = 'realistic', source_name: str = '', source_properties: ~nirs4all.data.synthetic.fitter.SpectralProperties | None = None, inferred_instrument: str = 'unknown', instrument_inference: ~nirs4all.data.synthetic.fitter.InstrumentInference | None = None, measurement_mode: str = 'transmittance', measurement_mode_confidence: float = 0.0, inferred_domain: str = 'unknown', domain_inference: ~nirs4all.data.synthetic.fitter.DomainInference | None = None, environmental_inference: ~nirs4all.data.synthetic.fitter.EnvironmentalInference | None = None, temperature_config: ~typing.Dict[str, ~typing.Any] = <factory>, moisture_config: ~typing.Dict[str, ~typing.Any] = <factory>, scattering_inference: ~nirs4all.data.synthetic.fitter.ScatteringInference | None = None, particle_size_config: ~typing.Dict[str, ~typing.Any] = <factory>, emsc_config: ~typing.Dict[str, ~typing.Any] = <factory>, detected_components: ~typing.List[str] = <factory>, suggested_n_components: int = 5)[source]
Bases: object
Parameters fitted from real data for synthetic generation.
This dataclass contains all parameters needed to configure a SyntheticNIRSGenerator to produce spectra similar to a real dataset, including Phase 1-4 enhanced features.
- # Basic wavelength grid
- # Slope and baseline parameters
- # Noise parameters
- # Scatter parameters
- # Complexity
- # Source metadata
- source_properties
Full SpectralProperties of source.
- Type:
SpectralProperties | None
- # Phase 1-4 Enhanced Parameters
- # Instrument inference
- instrument_inference
Full instrument inference result.
- Type:
InstrumentInference | None
- # Measurement mode
- # Domain inference
- domain_inference
Full domain inference result.
- Type:
DomainInference | None
- # Environmental effects
- environmental_inference
Environmental effects inference.
- Type:
EnvironmentalInference | None
- # Scattering effects
- scattering_inference
Scattering effects inference.
- Type:
ScatteringInference | None
- # Detected components for procedural generation
- domain_inference: DomainInference | None = None
- environmental_inference: EnvironmentalInference | None = None
- classmethod from_dict(data: Dict[str, Any]) FittedParameters[source]
Create FittedParameters from a dictionary.
- Parameters:
data – Dictionary with parameter values.
- Returns:
FittedParameters instance.
- instrument_inference: InstrumentInference | None = None
- classmethod load(path: str) FittedParameters[source]
Load parameters from JSON file.
- Parameters:
path – Input file path.
- Returns:
FittedParameters instance.
- scattering_inference: ScatteringInference | None = None
- source_properties: SpectralProperties | None = None
- summary() str[source]
Generate a human-readable summary of fitted parameters.
- Returns:
Multi-line summary string.
- to_dict() Dict[str, Any][source]
Convert all parameters to a dictionary.
- Returns:
Dictionary with all parameter values.
- to_full_config() Dict[str, Any][source]
Convert all fitted parameters to a comprehensive configuration.
This includes all Phase 1-4 parameters for complete synthetic data generation matching the source dataset.
- Returns:
Dictionary with all configuration parameters.
Example
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> config = params.to_full_config()
>>> # Use with builder pattern or advanced configuration
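The to_dict(), from_dict() and load() methods above describe a dictionary and JSON round trip. The following is a minimal sketch of that pattern only, not the library's implementation: MiniParams is a hypothetical stand-in holding a few of the documented fields, and the file is written with json.dump directly because no save method is documented on this page.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class MiniParams:
    """Hypothetical stand-in for a small subset of FittedParameters."""

    wavelength_start: float = 1000.0
    wavelength_end: float = 2500.0
    wavelength_step: float = 2.0
    complexity: str = "realistic"
    detected_components: List[str] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recursively converts the dataclass to plain Python types.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MiniParams":
        # Assumption: unknown keys are ignored rather than raising.
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)


params = MiniParams(wavelength_step=4.0, detected_components=["water", "protein"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    # Write the dictionary form to JSON.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(params.to_dict(), fh, indent=2)
    # Read it back and rebuild the dataclass.
    with open(path, encoding="utf-8") as fh:
        restored = MiniParams.from_dict(json.load(fh))
```

Nested inference objects and NumPy arrays would need explicit conversion before JSON serialization; how FittedParameters handles them is not stated here.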
- class nirs4all.data.synthetic.fitter.InstrumentInference(archetype_name: str = 'unknown', detector_type: str = 'unknown', wavelength_range: ~typing.Tuple[float, float] = (1000.0, 2500.0), estimated_resolution: float = 8.0, confidence: float = 0.0, alternative_archetypes: ~typing.Dict[str, float] = <factory>)[source]
Bases: object
Results of instrument archetype inference.
- class nirs4all.data.synthetic.fitter.MeasurementModeInference(value)[source]
Inferred measurement mode from spectral analysis.
- ATR = 'atr'
- REFLECTANCE = 'reflectance'
- TRANSFLECTANCE = 'transflectance'
- TRANSMITTANCE = 'transmittance'
- UNKNOWN = 'unknown'
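Because FittedParameters.measurement_mode is a plain string while the members above carry string values, a value lookup is one way to move between the two. This is a sketch under assumptions: Mode is a hypothetical local mirror of the enumeration, and parse_mode is an illustrative helper, not part of nirs4all.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical local mirror of MeasurementModeInference."""

    ATR = "atr"
    REFLECTANCE = "reflectance"
    TRANSFLECTANCE = "transflectance"
    TRANSMITTANCE = "transmittance"
    UNKNOWN = "unknown"


def parse_mode(text: str) -> Mode:
    """Map a free-form string to a Mode, falling back to UNKNOWN."""
    try:
        # Enum lookup by value: Mode("atr") returns Mode.ATR.
        return Mode(text.strip().lower())
    except ValueError:
        return Mode.UNKNOWN


mode = parse_mode("Reflectance")
```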
- class nirs4all.data.synthetic.fitter.RealDataFitter[source]
Bases: object
Fit generator parameters to match real dataset properties.
This class analyzes real NIRS data and estimates parameters for the SyntheticNIRSGenerator to produce similar spectra. Includes Phase 1-4 enhanced inference for instruments, domains, and effects.
- source_properties
SpectralProperties of the analyzed data.
- fitted_params
FittedParameters after fitting.
Example
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>>
>>> # Access inferred characteristics
>>> print(f"Instrument: {params.inferred_instrument}")
>>> print(f"Domain: {params.inferred_domain}")
>>>
>>> # Create matched generator
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)
- create_matched_generator(random_state: int | None = None) SyntheticNIRSGenerator[source]
Create a SyntheticNIRSGenerator configured to match the fitted data.
This method creates a generator with all fitted parameters including Phase 1-4 enhanced features (instrument, domain, effects).
- Parameters:
random_state – Random seed for reproducibility.
- Returns:
Configured SyntheticNIRSGenerator instance.
- Raises:
RuntimeError – If fit() hasn’t been called.
Example
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator(random_state=42)
>>> X_synth, _, _ = generator.generate(1000)
- evaluate_similarity(X_synthetic: ndarray, wavelengths: ndarray | None = None) Dict[str, Any][source]
Evaluate similarity between synthetic and source data.
Computes various metrics comparing synthetic spectra to the original real data.
- Parameters:
X_synthetic – Synthetic spectra matrix.
wavelengths – Optional wavelength grid.
- Returns:
Dictionary with similarity metrics.
- Raises:
RuntimeError – If fit() hasn’t been called.
Example
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> generator = fitter.create_matched_generator()
>>> X_synth, _, _ = generator.generate(1000)
>>> metrics = fitter.evaluate_similarity(X_synth)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")
- fit(X: np.ndarray | 'SpectroDataset', *, wavelengths: np.ndarray | None = None, name: str = 'source', infer_instrument: bool = True, infer_domain: bool = True, infer_measurement_mode: bool = True, infer_environmental: bool = True, infer_scattering: bool = True) FittedParameters[source]
Fit generator parameters to real data.
Analyzes the input data and estimates optimal parameters for generating synthetic spectra with similar properties. Includes Phase 1-4 enhanced inference.
- Parameters:
X – Real spectra matrix (n_samples, n_wavelengths) or SpectroDataset.
wavelengths – Wavelength grid (required if X is ndarray).
name – Dataset name for reference.
infer_instrument – Whether to infer instrument archetype.
infer_domain – Whether to infer application domain.
infer_measurement_mode – Whether to infer measurement mode.
infer_environmental – Whether to infer environmental effects.
infer_scattering – Whether to infer scattering parameters.
- Returns:
FittedParameters object with estimated parameters.
- Raises:
ValueError – If X is empty or has wrong shape.
Example
>>> fitter = RealDataFitter()
>>> params = fitter.fit(X_real, wavelengths=wl, name="wheat")
>>> print(params.summary())
- fit_from_path(path: str, *, name: str | None = None) FittedParameters[source]
Fit parameters from a dataset path.
Loads data using DatasetConfigs and fits parameters.
- Parameters:
path – Path to dataset folder.
name – Optional name override.
- Returns:
FittedParameters object.
Example
>>> params = fitter.fit_from_path("sample_data/regression")
- get_tuning_recommendations() List[str][source]
Get recommendations for tuning generation parameters.
Based on the fitted parameters and source data, provides suggestions for manual tuning.
- Returns:
List of recommendation strings.
Example
>>> params = fitter.fit(X_real, wavelengths=wavelengths)
>>> for rec in fitter.get_tuning_recommendations():
...     print(f"- {rec}")
- class nirs4all.data.synthetic.fitter.ScatteringInference(has_scatter_effects: bool = False, estimated_particle_size_um: float = 50.0, multiplicative_scatter_std: float = 0.0, additive_scatter_std: float = 0.0, baseline_curvature: float = 0.0, snv_correctable: bool = False, msc_correctable: bool = False)[source]
Bases: object
Results of scattering effects inference.
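For context on the snv_correctable field, the sketch below shows the standard normal variate (SNV) transform on toy data with per-sample multiplicative and additive scatter. It illustrates the transform only; how the library decides that a dataset is SNV-correctable is not documented here, and the toy band and scatter levels are assumptions.

```python
import numpy as np


def snv(X: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for flat spectra
    return (X - mean) / std


rng = np.random.default_rng(0)
wl = np.linspace(1000.0, 2500.0, 200)
band = np.exp(-((wl - 1450.0) / 60.0) ** 2)  # toy absorption band

# Toy scatter model: each sample is a * band + b.
a = rng.normal(1.0, 0.1, size=(20, 1))   # multiplicative scatter
b = rng.normal(0.0, 0.05, size=(20, 1))  # additive offset
X = a * band + b

X_snv = snv(X)
# Spread across samples before and after correction.
spread_before = float(X.std(axis=0).max())
spread_after = float(X_snv.std(axis=0).max())
```

In this idealized case the corrected spectra coincide, because SNV removes a purely multiplicative and additive distortion exactly; real data would retain chemical variation after correction.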
- class nirs4all.data.synthetic.fitter.SpectralProperties(name: str = 'dataset', n_samples: int = 0, n_wavelengths: int = 0, wavelengths: ndarray | None = None, mean_spectrum: ndarray | None = None, std_spectrum: ndarray | None = None, global_mean: float = 0.0, global_std: float = 0.0, global_range: Tuple[float, float] = (0.0, 0.0), mean_slope: float = 0.0, slope_std: float = 0.0, slopes: ndarray | None = None, mean_curvature: float = 0.0, curvature_std: float = 0.0, skewness: float = 0.0, kurtosis: float = 0.0, noise_estimate: float = 0.0, snr_estimate: float = 0.0, pca_explained_variance: ndarray | None = None, pca_n_components_95: int = 0, n_peaks_mean: float = 0.0, peak_positions: ndarray | None = None, peak_wavenumbers: ndarray | None = None, effective_resolution: float = 8.0, noise_correlation_length: float = 1.0, wavelength_range: Tuple[float, float] = (1000.0, 2500.0), baseline_offset: float = 0.0, kubelka_munk_linearity: float = 0.0, baseline_convexity: float = 0.0, water_band_variation: float = 0.0, oh_band_positions: ndarray | None = None, temperature_sensitivity_score: float = 0.0, scatter_baseline_slope: float = 0.0, scatter_baseline_curvature: float = 0.0, sample_to_sample_offset_std: float = 0.0, sample_to_sample_slope_std: float = 0.0, protein_band_intensity: float = 0.0, carbohydrate_band_intensity: float = 0.0, lipid_band_intensity: float = 0.0, water_band_intensity: float = 0.0)[source]
Bases: object
Container for computed spectral properties of a dataset.
This dataclass holds various statistical and spectral properties computed from a NIRS dataset for comparison and fitting purposes.
- wavelengths
Wavelength grid.
- Type:
numpy.ndarray | None
- # Basic statistics
- mean_spectrum
Mean spectrum across samples.
- Type:
numpy.ndarray | None
- std_spectrum
Standard deviation spectrum.
- Type:
numpy.ndarray | None
- # Shape properties
- # Distribution statistics
- # Noise characteristics
- # PCA properties
- pca_explained_variance
Explained variance ratios.
- Type:
numpy.ndarray | None
- # Peak analysis
- peak_positions
Wavelengths of detected peaks.
- Type:
numpy.ndarray | None
- peak_wavenumbers
Wavenumber positions of peaks.
- Type:
numpy.ndarray | None
- # Phase 1-4 Enhanced properties
- # Instrument indicators
- # Measurement mode indicators
- # Environmental indicators
- oh_band_positions
Detected O-H band positions.
- Type:
numpy.ndarray | None
- # Scattering indicators
- # Domain indicators
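The peak_positions and peak_wavenumbers attributes describe the same peaks in two units. Assuming wavelengths are expressed in nanometres (the module reports resolution in nm), the usual conversion to wavenumbers in cm⁻¹ is 10⁷ divided by the wavelength. The helper name below is illustrative and not part of nirs4all.

```python
from typing import List


def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    # 1 cm = 1e7 nm, so wavenumber (cm^-1) = 1e7 / wavelength (nm).
    return 1e7 / wavelength_nm


# Illustrative peak positions in nm (not taken from any dataset).
peaks_nm: List[float] = [1000.0, 1450.0, 1940.0, 2500.0]
peaks_cm1 = [nm_to_wavenumber(w) for w in peaks_nm]
```

The default 1000 to 2500 nm grid therefore spans 10000 down to 4000 cm⁻¹; the ordering reverses because the relation is reciprocal.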
- nirs4all.data.synthetic.fitter.compare_datasets(X_synthetic: ndarray, X_real: ndarray, wavelengths: ndarray | None = None) Dict[str, Any][source]
Quick comparison between synthetic and real datasets.
- Parameters:
X_synthetic – Synthetic spectra.
X_real – Real spectra.
wavelengths – Wavelength grid.
- Returns:
Dictionary with comparison metrics.
Example
>>> metrics = compare_datasets(X_synth, X_real)
>>> print(f"Similarity: {metrics['overall_score']:.1f}/100")
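How overall_score is computed is not documented on this page. As a rough illustration of the kind of comparison involved, the sketch below computes three simple statistics between two spectra matrices that share a wavelength grid. The function name, the metric keys, and the toy data are all hypothetical and are not the metrics returned by compare_datasets.

```python
import numpy as np


def quick_compare(X_a: np.ndarray, X_b: np.ndarray) -> dict:
    """Hypothetical comparison of two spectra matrices on the same grid."""
    mean_a, mean_b = X_a.mean(axis=0), X_b.mean(axis=0)
    return {
        # Pearson correlation between the two mean spectra.
        "mean_spectrum_corr": float(np.corrcoef(mean_a, mean_b)[0, 1]),
        # Root-mean-square difference between the two mean spectra.
        "mean_spectrum_rmse": float(np.sqrt(np.mean((mean_a - mean_b) ** 2))),
        # Ratio of overall standard deviations (1.0 means equal spread).
        "global_std_ratio": float(X_a.std() / X_b.std()),
    }


rng = np.random.default_rng(0)
wl = np.linspace(1000.0, 2500.0, 150)
band = np.exp(-((wl - 1940.0) / 80.0) ** 2)  # toy absorption band

# Two toy datasets drawn from the same underlying shape plus noise.
X_real = band + rng.normal(0.0, 0.01, size=(50, wl.size))
X_synth = band + rng.normal(0.0, 0.01, size=(80, wl.size))

metrics = quick_compare(X_synth, X_real)
```

The two matrices may differ in sample count but must have the same number of wavelengths for the mean spectra to be comparable.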
- nirs4all.data.synthetic.fitter.compute_spectral_properties(X: ndarray, wavelengths: ndarray | None = None, name: str = 'dataset', n_pca_components: int = 20) SpectralProperties[source]
Compute comprehensive spectral properties of a dataset.
Analyzes a matrix of spectra to extract statistical and spectral properties useful for fitting and comparison. Includes Phase 1-4 enhanced properties for instrument, mode, domain, and effect inference.
- Parameters:
X – Spectra matrix (n_samples, n_wavelengths).
wavelengths – Optional wavelength grid.
name – Dataset identifier.
n_pca_components – Maximum PCA components to compute.
- Returns:
SpectralProperties with computed metrics.
Example
>>> props = compute_spectral_properties(X_real, wavelengths)
>>> print(f"Mean slope: {props.mean_slope:.4f}")
>>> print(f"Inferred resolution: {props.effective_resolution:.1f} nm")
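To make a few of the SpectralProperties fields concrete, here is a minimal NumPy sketch of how quantities with these names are commonly computed. It is not the library's implementation: the per-sample linear fit for slopes, the excess (Fisher) definition of kurtosis, and the SVD-based count of components reaching 95% explained variance are all assumptions, and basic_properties is a hypothetical helper.

```python
import numpy as np


def basic_properties(X: np.ndarray, wavelengths: np.ndarray) -> dict:
    """Hypothetical sketch of a few dataset-level spectral statistics."""
    X = np.asarray(X, dtype=float)
    mean_spectrum = X.mean(axis=0)
    std_spectrum = X.std(axis=0)

    # Per-sample linear trend: fit y = slope * wavelength + intercept.
    # polyfit accepts a 2-D y with one column per sample.
    slopes = np.polyfit(wavelengths, X.T, deg=1)[0]

    # Distribution statistics over all absorbance values.
    flat = X.ravel()
    z = (flat - flat.mean()) / flat.std()
    skewness = float(np.mean(z ** 3))
    kurtosis = float(np.mean(z ** 4) - 3.0)  # excess kurtosis

    # PCA via SVD of the mean-centered matrix.
    s = np.linalg.svd(X - mean_spectrum, compute_uv=False)
    ratio = s ** 2 / np.sum(s ** 2)
    n95 = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)

    return {
        "mean_spectrum": mean_spectrum,
        "std_spectrum": std_spectrum,
        "slopes": slopes,
        "mean_slope": float(slopes.mean()),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "pca_n_components_95": n95,
    }


# Toy data: three spectra that are pure linear ramps with different slopes.
wl = np.linspace(1000.0, 2500.0, 100)
k = np.array([0.1, 0.2, 0.3])
X = k[:, None] * (wl - 1000.0) / 1500.0 + 0.5
props = basic_properties(X, wl)
```

With this toy input the recovered slopes equal k / 1500 per nm, and a single principal component explains all of the variance because the samples differ only in ramp steepness.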
- nirs4all.data.synthetic.fitter.fit_to_real_data(X: np.ndarray | 'SpectroDataset', wavelengths: np.ndarray | None = None, name: str = 'source') FittedParameters[source]
Quick function to fit parameters to real data.
Convenience function for simple fitting use cases.
- Parameters:
X – Real spectra or SpectroDataset.
wavelengths – Wavelength grid.
name – Dataset name.
- Returns:
FittedParameters object.
Example
>>> params = fit_to_real_data(X_real, wavelengths)
>>> generator = SyntheticNIRSGenerator(**params.to_generator_kwargs())