nirs4all.synthesis.config module
Configuration dataclasses for synthetic NIRS data generation.
This module provides structured configuration objects for controlling various aspects of synthetic spectra generation.
- class nirs4all.synthesis.config.BatchEffectConfig(enabled: bool = False, n_batches: int = 3, offset_std: float = 0.02, gain_std: float = 0.03)
Bases: object
Configuration for batch/session effects simulation.
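The docstring does not say how offset_std and gain_std are applied. One plausible reading, given here as an assumption and not as the library's implementation, is that each batch receives an additive offset drawn from N(0, offset_std) and a multiplicative gain drawn from N(1, gain_std). The sketch below uses only the standard library; apply_batch_effects and its round-robin batch assignment are hypothetical.

```python
import random

def apply_batch_effects(spectra, n_batches=3, offset_std=0.02, gain_std=0.03, seed=0):
    """Hypothetical helper: apply one gain/offset pair per batch to each spectrum."""
    rng = random.Random(seed)
    # One (gain, offset) pair per batch. Gain centred on 1 and offset on 0
    # are assumptions about what the two std parameters describe.
    params = [(rng.gauss(1.0, gain_std), rng.gauss(0.0, offset_std))
              for _ in range(n_batches)]
    shifted, batch_ids = [], []
    for i, spectrum in enumerate(spectra):
        batch = i % n_batches  # round-robin assignment, chosen for simplicity
        gain, offset = params[batch]
        shifted.append([gain * value + offset for value in spectrum])
        batch_ids.append(batch)
    return shifted, batch_ids

spectra = [[0.5, 0.6, 0.7] for _ in range(6)]
shifted, batch_ids = apply_batch_effects(spectra)
```

Spectra that share a batch receive the same distortion, which is the property a batch-aware validation scheme would need to account for.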
- class nirs4all.synthesis.config.ConfounderConfig(signal_to_confound_ratio: float = 1.0, n_confounders: int = 0, spectral_masking: float = 0.0, temporal_drift: bool = False)
Bases: object
Configuration for spectral-target decoupling and confounding effects.
Introduces factors that make the target only partially predictable from spectral features, simulating real-world irreducible error.
- signal_to_confound_ratio
Proportion of target variance explainable from spectra. 1.0 = fully predictable, 0.5 = 50% unexplainable.
- Type:
float
- n_confounders
Number of confounding variables that affect both spectra and target in different ways.
- Type:
int
- spectral_masking
Fraction of predictive signal hidden in high-noise wavelength regions (0.0-0.5).
- Type:
float
- temporal_drift
If True, the target-spectra relationship gradually changes across samples.
- Type:
bool
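To make signal_to_confound_ratio concrete, the sketch below treats it as the fraction r of target variance carried by a spectrally predictable signal. Mixing a unit-variance signal s with independent unit-variance noise e as sqrt(r) * s + sqrt(1 - r) * e keeps the total variance at 1 and gives a squared correlation with s of about r. This mixing rule is an assumption chosen for illustration; mix_target and r_squared are hypothetical helpers, not part of nirs4all.

```python
import math
import random

def mix_target(signal, ratio, seed=0):
    """Hypothetical helper: blend a unit-variance signal with independent noise."""
    rng = random.Random(seed)
    w_signal = math.sqrt(ratio)        # weight of the predictable part
    w_noise = math.sqrt(1.0 - ratio)   # weight of the unexplainable part
    return [w_signal * s + w_noise * rng.gauss(0.0, 1.0) for s in signal]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

signal_rng = random.Random(1)
signal = [signal_rng.gauss(0.0, 1.0) for _ in range(5000)]
fully = mix_target(signal, 1.0)  # ratio 1.0: target equals the signal
half = mix_target(signal, 0.5)   # ratio 0.5: about half the variance is noise
```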
- class nirs4all.synthesis.config.FeatureConfig(wavelength_start: float = 1000.0, wavelength_end: float = 2500.0, wavelength_step: float = 2.0, complexity: Literal['simple', 'realistic', 'complex'] = 'simple', n_components: int | None = None, component_names: List[str] | None = None)
Bases: object
Configuration for spectral feature generation.
- complexity
Complexity level affecting noise, scatter, etc. Options: ‘simple’, ‘realistic’, ‘complex’.
- Type:
Literal['simple', 'realistic', 'complex']
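The default wavelength parameters in the signature (1000.0 to 2500.0 in steps of 2.0) imply a grid of 751 points if the end value is included. Whether the library includes the end point is not stated here, so the sketch below marks that as an assumption; wavelength_grid is a hypothetical helper.

```python
def wavelength_grid(start=1000.0, end=2500.0, step=2.0):
    """Hypothetical helper: evenly spaced wavelengths, end point included (assumption)."""
    n_points = int(round((end - start) / step)) + 1
    # Multiply instead of accumulating to avoid floating-point drift.
    return [start + i * step for i in range(n_points)]

grid = wavelength_grid()
```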
- class nirs4all.synthesis.config.MetadataConfig(generate_sample_ids: bool = True, sample_id_prefix: str = 'sample', n_groups: int | None = None, n_repetitions: int | Tuple[int, int] = 1, group_names: List[str] | None = None, additional_columns: Dict[str, Any] | None = None)
Bases: object
Configuration for sample metadata generation.
- n_repetitions
Repetitions per sample, either fixed int or (min, max) range.
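The two accepted forms of n_repetitions can be resolved into per-sample counts as sketched below. The sketch assumes the (min, max) range is inclusive at both ends and sampled uniformly, which the docstring does not confirm; resolve_repetitions is a hypothetical helper.

```python
import random
from typing import List, Tuple, Union

def resolve_repetitions(n_repetitions: Union[int, Tuple[int, int]],
                        n_samples: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: turn an int or (min, max) pair into one count per sample."""
    if isinstance(n_repetitions, int):
        return [n_repetitions] * n_samples
    low, high = n_repetitions
    rng = random.Random(seed)
    # randint is inclusive on both ends (assumed to match the documented range).
    return [rng.randint(low, high) for _ in range(n_samples)]

fixed = resolve_repetitions(3, n_samples=4)
ranged = resolve_repetitions((2, 5), n_samples=100)
```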
- class nirs4all.synthesis.config.MultiRegimeConfig(n_regimes: int = 1, regime_method: Literal['concentration', 'spectral', 'random'] = 'concentration', regime_overlap: float = 0.2, noise_heteroscedasticity: float = 0.0)
Bases: object
Configuration for multi-regime target landscapes.
Creates regions in feature space where the target-spectra relationship differs, simulating subpopulations.
- regime_method
How to partition samples into regimes: ‘concentration’, ‘spectral’, or ‘random’.
- Type:
Literal['concentration', 'spectral', 'random']
- regime_overlap
Overlap between regimes creating transition zones. 0 = hard boundaries, 0.5 = smooth transitions.
- Type:
float
- noise_heteroscedasticity
How much prediction noise varies by regime. 0 = same noise everywhere, 1 = very different noise levels.
- Type:
float
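As an illustration of the 'concentration' regime method with hard boundaries (regime_overlap = 0), the sketch below bins samples into equal-count regimes by the rank of a single concentration value. How the library actually draws regime boundaries is not described here, so the binning rule and the assign_regimes_by_concentration helper are assumptions.

```python
def assign_regimes_by_concentration(concentrations, n_regimes):
    """Hypothetical helper: equal-count regimes ordered by concentration rank."""
    n = len(concentrations)
    # Indices of samples sorted from lowest to highest concentration.
    order = sorted(range(n), key=lambda i: concentrations[i])
    regimes = [0] * n
    for rank, index in enumerate(order):
        regimes[index] = rank * n_regimes // n
    return regimes

concentrations = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
regimes = assign_regimes_by_concentration(concentrations, n_regimes=3)
```

With a non-zero regime_overlap, samples near a boundary would presumably be blended between neighbouring regimes; that step is left out of this sketch.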
- class nirs4all.synthesis.config.NonLinearConfig(interactions: Literal['none', 'polynomial', 'synergistic', 'antagonistic'] = 'none', interaction_strength: float = 0.5, hidden_factors: int = 0, polynomial_degree: int = 2)
Bases: object
Configuration for non-linear target relationships.
Enables polynomial, synergistic, or antagonistic interactions between component concentrations and targets, making prediction harder.
- interactions
Type of non-linear interaction. Options: ‘none’, ‘polynomial’, ‘synergistic’, ‘antagonistic’.
- Type:
Literal['none', 'polynomial', 'synergistic', 'antagonistic']
- hidden_factors
Number of latent variables affecting the target but not the spectra.
- Type:
int
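The interaction types are named but not defined mathematically in this reference. The sketch below shows one possible interpretation for two component concentrations: 'polynomial' adds powers of each concentration, 'synergistic' adds their product, and 'antagonistic' subtracts it, each scaled by interaction_strength. These formulas and the nonlinear_target helper are assumptions made for illustration only.

```python
def nonlinear_target(c1, c2, interactions="none",
                     interaction_strength=0.5, polynomial_degree=2):
    """Hypothetical helper: a linear target plus an assumed interaction term."""
    linear = c1 + c2
    if interactions == "none":
        return linear
    if interactions == "polynomial":
        powers = c1 ** polynomial_degree + c2 ** polynomial_degree
        return linear + interaction_strength * powers
    if interactions == "synergistic":
        # Components reinforce each other.
        return linear + interaction_strength * c1 * c2
    if interactions == "antagonistic":
        # Components partially cancel each other.
        return linear - interaction_strength * c1 * c2
    raise ValueError(f"unknown interaction type: {interactions!r}")

baseline = nonlinear_target(0.5, 0.5)
boosted = nonlinear_target(0.5, 0.5, interactions="synergistic")
```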
- class nirs4all.synthesis.config.OutputConfig(as_dataset: bool = True, include_metadata: bool = False, include_wavelengths: bool = True)
Bases: object
Configuration for output format.
- class nirs4all.synthesis.config.PartitionConfig(train_ratio: float = 0.8, stratify: bool = False, shuffle: bool = True, group_aware: bool = True)
Bases: object
Configuration for data partitioning (train/test split).
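A group-aware split is usually understood as keeping all samples of a group on the same side of the train/test boundary. The sketch below shows that idea using whole groups as the unit of assignment; it is an assumption about what group_aware means here, and group_aware_split is a hypothetical helper.

```python
import random

def group_aware_split(groups, train_ratio=0.8, shuffle=True, seed=0):
    """Hypothetical helper: split sample indices so no group spans both sets."""
    unique_groups = sorted(set(groups))
    if shuffle:
        random.Random(seed).shuffle(unique_groups)
    # The ratio is applied to the number of groups, not the number of samples.
    n_train_groups = max(1, round(train_ratio * len(unique_groups)))
    train_groups = set(unique_groups[:n_train_groups])
    train_idx = [i for i, g in enumerate(groups) if g in train_groups]
    test_idx = [i for i, g in enumerate(groups) if g not in train_groups]
    return train_idx, test_idx

# Ten groups with three repeated measurements each.
groups = [g for g in range(10) for _ in range(3)]
train_idx, test_idx = group_aware_split(groups)
```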
- class nirs4all.synthesis.config.SyntheticDatasetConfig(n_samples: int = 1000, random_state: int | None = None, features: FeatureConfig = <factory>, targets: TargetConfig = <factory>, metadata: MetadataConfig = <factory>, partitions: PartitionConfig = <factory>, batch_effects: BatchEffectConfig = <factory>, nonlinear: NonLinearConfig = <factory>, confounders: ConfounderConfig = <factory>, multi_regime: MultiRegimeConfig = <factory>, output: OutputConfig = <factory>, name: str = 'synthetic_nirs')
Bases: object
Complete configuration for synthetic dataset generation.
This is the main configuration object that combines all sub-configurations for generating synthetic NIRS datasets.
- features
Feature generation configuration.
- targets
Target variable configuration.
- metadata
Sample metadata configuration.
- partitions
Train/test split configuration.
- batch_effects
Batch effect configuration.
- output
Output format configuration.
Example
>>> config = SyntheticDatasetConfig(
...     n_samples=1000,
...     random_state=42,
...     features=FeatureConfig(complexity="realistic"),
...     targets=TargetConfig(distribution="lognormal", range=(0, 100)),
... )
- batch_effects: BatchEffectConfig
- confounders: ConfounderConfig
- features: FeatureConfig
- metadata: MetadataConfig
- multi_regime: MultiRegimeConfig
- nonlinear: NonLinearConfig
- output: OutputConfig
- partitions: PartitionConfig
- targets: TargetConfig
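The <factory> defaults in the signature indicate dataclass fields declared with a default factory, so each SyntheticDatasetConfig instance gets its own sub-configuration objects. The sketch below reproduces that pattern with two stand-in classes (FeatureConfigSketch and DatasetConfigSketch are hypothetical names, trimmed to a few fields) so it runs without nirs4all installed.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FeatureConfigSketch:
    """Stand-in for FeatureConfig with a single field."""
    complexity: str = "simple"

@dataclass
class DatasetConfigSketch:
    """Stand-in for SyntheticDatasetConfig with a subset of its fields."""
    n_samples: int = 1000
    random_state: Optional[int] = None
    # default_factory builds a fresh sub-config for every instance.
    features: FeatureConfigSketch = field(default_factory=FeatureConfigSketch)
    name: str = "synthetic_nirs"

first = DatasetConfigSketch()
second = DatasetConfigSketch(features=FeatureConfigSketch(complexity="realistic"))

# Mutating one instance's sub-config leaves later defaults untouched.
first.features.complexity = "complex"
fresh = DatasetConfigSketch()
```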
- class nirs4all.synthesis.config.TargetConfig(distribution: Literal['dirichlet', 'uniform', 'lognormal', 'correlated'] = 'dirichlet', range: Tuple[float, float] | None = None, n_targets: int | None = None, component_indices: List[int] | None = None, transform: Literal['log', 'sqrt'] | None = None)
Bases: object
Configuration for target variable generation.
- distribution
Target value distribution method. Options: ‘dirichlet’, ‘uniform’, ‘lognormal’, ‘correlated’.
- Type:
Literal['dirichlet', 'uniform', 'lognormal', 'correlated']
- transform
Optional transformation to apply (‘log’, ‘sqrt’, None).
- Type:
Literal['log', 'sqrt'] | None
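The transform option can be pictured as an element-wise function applied to the generated target values. The sketch below assumes 'log' means the natural logarithm and that inputs are strictly positive; the library may handle zeros or use a different base. apply_transform is a hypothetical helper.

```python
import math

def apply_transform(values, transform=None):
    """Hypothetical helper: apply an optional element-wise transform to targets."""
    if transform is None:
        return list(values)
    if transform == "log":
        # Natural logarithm (assumption); requires strictly positive values.
        return [math.log(v) for v in values]
    if transform == "sqrt":
        return [math.sqrt(v) for v in values]
    raise ValueError(f"unknown transform: {transform!r}")

raw = [1.0, 4.0, 9.0]
rooted = apply_transform(raw, "sqrt")
```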