nirs4all.synthesis.targets module
Target generation for synthetic NIRS datasets.
This module provides tools for generating target variables for regression and classification tasks, with configurable distributions and class separation.
Example
>>> from nirs4all.synthesis.targets import TargetGenerator
>>>
>>> generator = TargetGenerator(random_state=42)
>>>
>>> # Regression targets
>>> y = generator.regression(
...     n_samples=100,
...     concentrations=C,  # From spectra generation
...     distribution="lognormal",
...     range=(0, 100)
... )
>>>
>>> # Classification with separable classes
>>> y = generator.classification(
...     n_samples=100,
...     concentrations=C,
...     n_classes=3,
...     separation=2.0
... )
- class nirs4all.synthesis.targets.ClassSeparationConfig(separation: float = 1.5, method: Literal['component', 'shift', 'intensity'] = 'component', noise: float = 0.1)
Bases: object
Configuration for class separation in classification tasks.
- separation
Separation factor (higher = more separable). Values around 0.5-1.0 create overlapping classes. Values around 2.0-3.0 create well-separated classes.
- Type:
float
- method
How to create class differences:
  - "component": Different component concentration profiles per class.
  - "shift": Systematic spectral shifts between classes.
  - "intensity": Different overall intensity levels.
- Type:
Literal['component', 'shift', 'intensity']
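To build intuition for the separation factor, the sketch below simulates a single feature for two classes whose means differ by `separation` standard deviations, then scores a midpoint threshold rule. This is an illustrative assumption about what "separation" means, not the nirs4all implementation; the function names `simulate_two_classes` and `midpoint_accuracy` are hypothetical.

```python
import random


def simulate_two_classes(separation, n_per_class=2000, seed=0):
    """Draw one feature per sample; class means differ by `separation` std devs."""
    rng = random.Random(seed)
    class0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    class1 = [rng.gauss(separation, 1.0) for _ in range(n_per_class)]
    return class0, class1


def midpoint_accuracy(class0, class1, separation):
    """Accuracy of the rule 'predict class 1 when x > separation / 2'."""
    threshold = separation / 2.0
    correct = sum(x <= threshold for x in class0)
    correct += sum(x > threshold for x in class1)
    return correct / (len(class0) + len(class1))


# Larger separation should make the two classes easier to tell apart.
for sep in (0.5, 1.5, 3.0):
    c0, c1 = simulate_two_classes(sep)
    print(f"separation={sep}: accuracy={midpoint_accuracy(c0, c1, sep):.3f}")
```

Under this toy model, accuracy rises monotonically with the separation factor, which matches the qualitative description above (overlapping classes at low values, well-separated classes at high values).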
- class nirs4all.synthesis.targets.NonLinearTargetConfig(nonlinear_interactions: Literal['none', 'polynomial', 'synergistic', 'antagonistic'] = 'none', interaction_strength: float = 0.5, hidden_factors: int = 0, polynomial_degree: int = 2, signal_to_confound_ratio: float = 1.0, n_confounders: int = 0, spectral_masking: float = 0.0, temporal_drift: bool = False, n_regimes: int = 1, regime_method: Literal['concentration', 'spectral', 'random'] = 'concentration', regime_overlap: float = 0.2, noise_heteroscedasticity: float = 0.0)
Bases: object
Configuration for non-linear target complexity.
- nonlinear_interactions
Type of non-linear interaction.
- Type:
Literal['none', 'polynomial', 'synergistic', 'antagonistic']
- hidden_factors
Latent variables not in spectra.
- Type:
int
- regime_method
How to partition into regimes.
- Type:
Literal['concentration', 'spectral', 'random']
- class nirs4all.synthesis.targets.NonLinearTargetProcessor(config: NonLinearTargetConfig, random_state: int | None = None)
Bases: object
Process targets with non-linear relationships, confounders, and multi-regime landscapes.
This class implements three propositions for making synthetic targets harder to predict:
1. Non-linear interactions: Polynomial, synergistic, or antagonistic effects.
2. Spectral-target decoupling: Confounders and partial predictability.
3. Multi-regime landscapes: Different relationships in different regions.
- Parameters:
config – NonLinearTargetConfig with all settings.
random_state – Random seed for reproducibility.
Example
>>> config = NonLinearTargetConfig(
...     nonlinear_interactions="polynomial",
...     interaction_strength=0.7,
...     n_regimes=3
... )
>>> processor = NonLinearTargetProcessor(config, random_state=42)
>>> y_complex = processor.process(C, y_base)
- process(concentrations: ndarray, y_base: ndarray, spectra: ndarray | None = None) → ndarray
Apply all configured complexity to base targets.
- Parameters:
concentrations – Component concentration matrix (n_samples, n_components).
y_base – Base target values (n_samples,) or (n_samples, n_targets).
spectra – Optional spectra matrix for spectral-based regimes.
- Returns:
Transformed target values with added complexity.
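The kinds of complexity described for this class can be pictured with a small standalone sketch. Everything in it is an assumption made for illustration: the library's actual formulas are not documented here, so the interaction terms (a squared term for "polynomial", a positive or negative product term for "synergistic" and "antagonistic") and the equal-width regime binning are hypothetical stand-ins, as are the names `add_interaction` and `assign_regimes`.

```python
import random


def add_interaction(concentrations, y_base, kind="none", strength=0.5):
    """Add an assumed interaction term built from the first two components."""
    transformed = []
    for row, y in zip(concentrations, y_base):
        c0, c1 = row[0], row[1]
        if kind == "none":
            term = 0.0
        elif kind == "polynomial":
            term = strength * c0 ** 2      # assumed: squared effect
        elif kind == "synergistic":
            term = strength * c0 * c1      # assumed: components reinforce
        elif kind == "antagonistic":
            term = -strength * c0 * c1     # assumed: components cancel
        else:
            raise ValueError(f"unknown interaction kind: {kind!r}")
        transformed.append(y + term)
    return transformed


def assign_regimes(concentrations, n_regimes):
    """Bin samples into regimes by the first component (values in [0, 1))."""
    return [min(int(row[0] * n_regimes), n_regimes - 1) for row in concentrations]


rng = random.Random(42)
C = [[rng.random() for _ in range(3)] for _ in range(200)]
y_base = [sum(row) for row in C]  # simple linear base target

y_syn = add_interaction(C, y_base, kind="synergistic", strength=0.7)
y_ant = add_interaction(C, y_base, kind="antagonistic", strength=0.7)
regimes = assign_regimes(C, n_regimes=3)
print(sorted(set(regimes)))
```

A multi-regime target would then apply a different relationship to each regime label, which is what makes a single global model harder to fit.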
- class nirs4all.synthesis.targets.TargetGenerator(random_state: int | None = None)
Bases: object
Generate target variables for synthetic NIRS datasets.
This class creates both regression targets (continuous values correlated with component concentrations) and classification targets (discrete labels with controllable class separation).
- rng
NumPy random generator for reproducibility.
- Parameters:
random_state – Random seed for reproducibility.
Example
>>> import numpy as np
>>>
>>> generator = TargetGenerator(random_state=42)
>>>
>>> # Generate concentrations first (from SyntheticNIRSGenerator)
>>> C = np.random.rand(100, 5)  # 5 components
>>>
>>> # Regression targets scaled to percentage
>>> y = generator.regression(
...     n_samples=100,
...     concentrations=C,
...     component=0,  # Use first component
...     range=(0, 100)
... )
>>>
>>> # Multi-class classification
>>> y = generator.classification(
...     n_samples=100,
...     concentrations=C,
...     n_classes=4,
...     separation=2.0
... )
- classification(n_samples: int, concentrations: ndarray | None = None, *, n_classes: int = 2, class_weights: List[float] | None = None, separation: float = 1.5, separation_method: Literal['component', 'threshold', 'cluster'] = 'component', class_names: List[str] | None = None, return_proba: bool = False) → ndarray | Tuple[ndarray, ndarray]
Generate classification target labels with controllable class separation.
The separation parameter controls how distinguishable classes are in feature space. Higher values create more separable classes.
- Parameters:
n_samples – Number of samples.
concentrations – Component concentration matrix.
n_classes – Number of classes to generate.
class_weights – Class proportions (should sum to 1.0). If None, uses balanced classes.
separation – Class separation factor:
  - 0.5-1.0: Overlapping classes (challenging)
  - 1.5-2.0: Moderate separation (realistic)
  - 2.5+: Well-separated classes (easy)
separation_method – How to create class differences:
  - "component": Each class has distinct component profiles
  - "threshold": Classes based on concentration thresholds
  - "cluster": K-means-like cluster assignment
class_names – Optional string labels for classes.
return_proba – If True, also return class probabilities.
- Returns:
If return_proba=False: integer class labels (n_samples,). If return_proba=True: tuple of (labels, probabilities).
Example
>>> # Binary classification with balanced classes
>>> y = generator.classification(100, C, n_classes=2)
>>>
>>> # 3-class with imbalanced weights
>>> y = generator.classification(
...     100, C,
...     n_classes=3,
...     class_weights=[0.5, 0.3, 0.2],
...     separation=2.0
... )
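As a rough picture of how class proportions and a threshold-style split could interact, the sketch below cuts a sorted list of concentration values at the cumulative class weights. It is a minimal standalone illustration, not the library's "threshold" method; the helper name `threshold_labels` and the rounding of boundaries are assumptions.

```python
import random


def threshold_labels(values, class_weights):
    """Label samples by cutting the sorted values at cumulative class weights."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])

    # Rank boundaries between consecutive classes, e.g. [50, 80] for
    # weights [0.5, 0.3, 0.2] and n = 100.
    boundaries = []
    cumulative = 0.0
    for weight in class_weights[:-1]:
        cumulative += weight
        boundaries.append(round(cumulative * n))

    labels = [0] * n
    current_class = 0
    for rank, index in enumerate(order):
        while current_class < len(boundaries) and rank >= boundaries[current_class]:
            current_class += 1
        labels[index] = current_class
    return labels


rng = random.Random(0)
concentration = [rng.random() for _ in range(100)]
labels = threshold_labels(concentration, [0.5, 0.3, 0.2])
counts = [labels.count(k) for k in range(3)]
print(counts)
```

With this scheme the lowest half of the concentration values lands in class 0, the next 30% in class 1, and the top 20% in class 2, so the class sizes follow the requested weights.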
- regression(n_samples: int, concentrations: ndarray | None = None, *, distribution: Literal['uniform', 'normal', 'lognormal', 'bimodal'] = 'uniform', range: Tuple[float, float] | None = None, component: int | str | List[int] | None = None, component_names: List[str] | None = None, correlation: float = 0.9, noise: float = 0.1, transform: Literal['log', 'sqrt'] | None = None) → ndarray
Generate regression target values.
- Parameters:
n_samples – Number of samples.
concentrations – Component concentration matrix (n_samples, n_components). If None, generates random base values.
distribution – Target value distribution.
range – (min, max) for scaling targets.
component – Which component(s) to use as target:
  - None: Weighted combination of all components
  - int: Use component at that index
  - str: Use component with that name (requires component_names)
  - List[int]: Multi-output using specified component indices
component_names – Names of components (for string component selection).
correlation – Correlation between concentrations and targets (0-1).
noise – Noise level to add.
transform – Optional transformation ('log', 'sqrt').
- Returns:
Target values array. Shape (n_samples,) for single target, or (n_samples, n_targets) for multi-output.
Example
>>> y = generator.regression(
...     100, C,
...     distribution="lognormal",
...     range=(5, 50),
...     component="protein",
...     component_names=["water", "protein", "lipid"]
... )
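One common way to obtain a target with a chosen correlation to a concentration signal, and then rescale it to a value range, is sketched below. It is a standalone illustration and may differ from how `regression` combines `correlation`, `noise`, and `range` internally; the names `make_targets` and `pearson` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def make_targets(signal, correlation=0.9, value_range=(0.0, 100.0), seed=0):
    """Mix a standardized signal with Gaussian noise, then min-max scale."""
    rng = random.Random(seed)
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in signal) / n)
    z = [(s - mean) / std for s in signal]

    # Weights chosen so the mixture has unit variance and the requested
    # expected correlation with the signal.
    noise_weight = math.sqrt(1.0 - correlation ** 2)
    mixed = [correlation * zi + noise_weight * rng.gauss(0.0, 1.0) for zi in z]

    # Min-max scaling is linear, so it leaves the correlation unchanged.
    lo, hi = min(mixed), max(mixed)
    a, b = value_range
    return [a + (m - lo) / (hi - lo) * (b - a) for m in mixed]


rng = random.Random(1)
protein = [rng.random() for _ in range(500)]
y = make_targets(protein, correlation=0.9, value_range=(5.0, 50.0))
print(f"min={min(y):.1f}, max={max(y):.1f}, r={pearson(protein, y):.2f}")
```

The achieved sample correlation fluctuates around the requested value for finite samples, so it should be read as a target rather than an exact guarantee.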
- nirs4all.synthesis.targets.generate_classification_targets(n_samples: int, concentrations: ndarray | None = None, *, random_state: int | None = None, n_classes: int = 2, class_weights: List[float] | None = None, separation: float = 1.5) → ndarray
Convenience function for generating classification targets.
- Parameters:
n_samples – Number of samples.
concentrations – Component concentrations (optional).
random_state – Random seed.
n_classes – Number of classes.
class_weights – Class proportions.
separation – Class separation factor.
- Returns:
Integer class labels array.
- nirs4all.synthesis.targets.generate_regression_targets(n_samples: int, concentrations: ndarray | None = None, *, random_state: int | None = None, distribution: str = 'uniform', range: Tuple[float, float] | None = None) → ndarray
Convenience function for generating regression targets.
- Parameters:
n_samples – Number of samples.
concentrations – Component concentrations (optional).
random_state – Random seed.
distribution – Target distribution type.
range – Value range (min, max).
- Returns:
Target values array.