nirs4all.synthesis.targets module

Target generation for synthetic NIRS datasets.

This module provides tools for generating target variables for regression and classification tasks, with configurable distributions and class separation.

Example

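>>> import numpy as np
>>> from nirs4all.synthesis.targets import TargetGenerator
>>>
>>> generator = TargetGenerator(random_state=42)
>>> C = np.random.rand(100, 5)  # Placeholder concentrations (n_samples, n_components)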
>>>
>>> # Regression targets
>>> y = generator.regression(
...     n_samples=100,
...     concentrations=C,  # From spectra generation
...     distribution="lognormal",
...     range=(0, 100)
... )
>>>
>>> # Classification with separable classes
>>> y = generator.classification(
...     n_samples=100,
...     concentrations=C,
...     n_classes=3,
...     separation=2.0
... )

class nirs4all.synthesis.targets.ClassSeparationConfig(separation: float = 1.5, method: Literal['component', 'shift', 'intensity'] = 'component', noise: float = 0.1)

Bases: object

Configuration for class separation in classification tasks.

separation

Separation factor (higher = more separable). Values around 0.5-1.0 create overlapping classes. Values around 2.0-3.0 create well-separated classes.

Type:

float

method

How to create class differences:

  • "component": Different component concentration profiles per class.

  • "shift": Systematic spectral shifts between classes.

  • "intensity": Different overall intensity levels.

Type:

Literal['component', 'shift', 'intensity']

noise

Noise level to add to class boundaries.

Type:

float

method: Literal['component', 'shift', 'intensity'] = 'component'
noise: float = 0.1
separation: float = 1.5
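
To get a feel for the separation factor, the sketch below treats it as the gap between two class means, measured in within-class standard deviations, and checks how often a midpoint threshold recovers the true class. That interpretation is an assumption made for illustration; the documentation above does not state how nirs4all maps `separation` onto spectra. `midpoint_accuracy` is a hypothetical helper, not part of the library.

```python
import random


def midpoint_accuracy(separation: float, n_per_class: int = 2000, seed: int = 0) -> float:
    """Accuracy of a midpoint threshold on two unit-variance Gaussian classes.

    Assumption: `separation` is the distance between class means in
    standard deviations. nirs4all may define it differently.
    """
    rng = random.Random(seed)
    threshold = separation / 2.0
    correct = 0
    for _ in range(n_per_class):
        # Class 0 is centred at 0, class 1 at `separation`.
        if rng.gauss(0.0, 1.0) < threshold:
            correct += 1
        if rng.gauss(separation, 1.0) >= threshold:
            correct += 1
    return correct / (2 * n_per_class)


overlapping = midpoint_accuracy(0.5)     # classes overlap heavily
well_separated = midpoint_accuracy(3.0)  # classes barely overlap
```

Under this reading, a small factor leaves the threshold only slightly better than chance, while a factor near 3.0 classifies most samples correctly.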

class nirs4all.synthesis.targets.NonLinearTargetConfig(nonlinear_interactions: Literal['none', 'polynomial', 'synergistic', 'antagonistic'] = 'none', interaction_strength: float = 0.5, hidden_factors: int = 0, polynomial_degree: int = 2, signal_to_confound_ratio: float = 1.0, n_confounders: int = 0, spectral_masking: float = 0.0, temporal_drift: bool = False, n_regimes: int = 1, regime_method: Literal['concentration', 'spectral', 'random'] = 'concentration', regime_overlap: float = 0.2, noise_heteroscedasticity: float = 0.0)

Bases: object

Configuration for non-linear target complexity.

nonlinear_interactions

Type of non-linear interaction.

Type:

Literal['none', 'polynomial', 'synergistic', 'antagonistic']

interaction_strength

Blend factor (0=linear, 1=fully non-linear).

Type:

float

hidden_factors

Number of latent variables that influence the target but are not present in the spectra.

Type:

int

polynomial_degree

Degree for polynomial interactions.

Type:

int

signal_to_confound_ratio

Ratio of signal to confounder influence; controls how predictable the target is from spectra.

Type:

float

n_confounders

Number of confounding variables.

Type:

int

spectral_masking

Signal in noisy regions.

Type:

float

temporal_drift

Whether the relationship changes over samples.

Type:

bool

n_regimes

Number of relationship regimes.

Type:

int

regime_method

How to partition into regimes.

Type:

Literal['concentration', 'spectral', 'random']

regime_overlap

Smoothness of the transition zones between regimes.

Type:

float

noise_heteroscedasticity

Per-regime noise variation.

Type:

float

hidden_factors: int = 0
interaction_strength: float = 0.5
n_confounders: int = 0
n_regimes: int = 1
noise_heteroscedasticity: float = 0.0
nonlinear_interactions: Literal['none', 'polynomial', 'synergistic', 'antagonistic'] = 'none'
polynomial_degree: int = 2
regime_method: Literal['concentration', 'spectral', 'random'] = 'concentration'
regime_overlap: float = 0.2
signal_to_confound_ratio: float = 1.0
spectral_masking: float = 0.0
temporal_drift: bool = False
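
The "blend factor" wording for interaction_strength can be read as a convex combination of a linear and a non-linear target. The sketch below illustrates that reading with a degree-2 polynomial term. Both the blending rule and the helper `blend_targets` are assumptions for illustration, not the library's confirmed implementation.

```python
def blend_targets(y_linear, y_nonlinear, interaction_strength):
    """Convex blend: 0 keeps the linear target, 1 keeps the non-linear one."""
    if not 0.0 <= interaction_strength <= 1.0:
        raise ValueError("interaction_strength must lie in [0, 1]")
    s = interaction_strength
    return [(1.0 - s) * a + s * b for a, b in zip(y_linear, y_nonlinear)]


concentrations = [0.0, 0.5, 1.0, 2.0]
y_linear = [2.0 * c for c in concentrations]              # purely linear response
y_quadratic = [2.0 * c + c ** 2 for c in concentrations]  # adds a degree-2 term

y_half = blend_targets(y_linear, y_quadratic, 0.5)  # [0.0, 1.125, 2.5, 6.0]
```

With the default of 0.5, each target sits halfway between the linear and the fully non-linear value.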

class nirs4all.synthesis.targets.NonLinearTargetProcessor(config: NonLinearTargetConfig, random_state: int | None = None)

Bases: object

Process targets with non-linear relationships, confounders, and multi-regime landscapes.

This class implements three strategies for making synthetic targets harder to predict:

  1. Non-linear interactions: Polynomial, synergistic, or antagonistic effects.

  2. Spectral-target decoupling: Confounders and partial predictability.

  3. Multi-regime landscapes: Different relationships in different regions.
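
As a rough illustration of spectral-target decoupling, the sketch below adds a hidden confounder to a signal that is visible in the spectra, then measures how strongly the final target still correlates with that signal. It assumes signal_to_confound_ratio compares variances, which the documentation does not confirm. `add_confounder` and `pearson` are hypothetical helpers.

```python
import random
import statistics


def add_confounder(signal, signal_to_confound_ratio, seed=0):
    """Add Gaussian confounder noise with variance var(signal) / ratio.

    Assumption: the ratio compares variances; nirs4all may use another scale.
    """
    rng = random.Random(seed)
    confound_sd = (statistics.pvariance(signal) / signal_to_confound_ratio) ** 0.5
    return [s + rng.gauss(0.0, confound_sd) for s in signal]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
signal = [rng.random() for _ in range(1000)]  # part of the target visible in spectra

r_mostly_signal = pearson(signal, add_confounder(signal, 4.0))
r_mostly_confound = pearson(signal, add_confounder(signal, 0.25))
```

A lower ratio leaves less of the target explainable from the spectral signal, which is what makes the resulting dataset harder for a model.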

Parameters:
  • config – NonLinearTargetConfig with all settings.

  • random_state – Random seed for reproducibility.

Example

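>>> from nirs4all.synthesis.targets import NonLinearTargetConfig, NonLinearTargetProcessor
>>> config = NonLinearTargetConfig(
...     nonlinear_interactions="polynomial",
...     interaction_strength=0.7,
...     n_regimes=3
... )
>>> processor = NonLinearTargetProcessor(config, random_state=42)
>>> # C: concentrations (n_samples, n_components); y_base: base targets (n_samples,)
>>> y_complex = processor.process(C, y_base)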

process(concentrations: ndarray, y_base: ndarray, spectra: ndarray | None = None) -> ndarray

Apply all configured complexity to base targets.

Parameters:
  • concentrations – Component concentration matrix (n_samples, n_components).

  • y_base – Base target values (n_samples,) or (n_samples, n_targets).

  • spectra – Optional spectra matrix for spectral-based regimes.

Returns:

Transformed target values with added complexity.
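
To make the multi-regime idea concrete, the sketch below splits samples into equal-sized regimes by concentration rank and applies a different slope in each regime. It uses hard boundaries and does not model regime_overlap or noise. The partitioning rule and both helper functions are assumptions for illustration, not the behaviour of process().

```python
def assign_regimes(values, n_regimes):
    """Label each sample with a regime index based on its rank (equal-sized bins)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    regimes = [0] * len(values)
    for rank, idx in enumerate(order):
        regimes[idx] = min(rank * n_regimes // len(values), n_regimes - 1)
    return regimes


def apply_regimes(values, y_base, slopes):
    """Scale the base target by a regime-specific slope (hypothetical rule)."""
    regimes = assign_regimes(values, len(slopes))
    return [slopes[r] * y for r, y in zip(regimes, y_base)]


concentration = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
y_base = [9.0, 1.0, 5.0, 3.0, 7.0, 2.0]

regimes = assign_regimes(concentration, 3)  # [2, 0, 1, 1, 2, 0]
y_regime = apply_regimes(concentration, y_base, slopes=[1.0, 2.0, -1.0])
```

Because the slope changes sign in the highest regime, a single global linear model can no longer fit all samples, which is the difficulty this option is meant to introduce.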

class nirs4all.synthesis.targets.TargetGenerator(random_state: int | None = None)

Bases: object

Generate target variables for synthetic NIRS datasets.

This class creates both regression targets (continuous values correlated with component concentrations) and classification targets (discrete labels with controllable class separation).

rng

NumPy random generator for reproducibility.

Parameters:

random_state – Random seed for reproducibility.

Example

>>> import numpy as np
>>> generator = TargetGenerator(random_state=42)
>>>
>>> # Concentrations normally come from SyntheticNIRSGenerator; random values stand in here
>>> C = np.random.rand(100, 5)  # 5 components
>>>
>>> # Regression targets scaled to percentage
>>> y = generator.regression(
...     n_samples=100,
...     concentrations=C,
...     component=0,  # Use first component
...     range=(0, 100)
... )
>>>
>>> # Multi-class classification
>>> y = generator.classification(
...     n_samples=100,
...     concentrations=C,
...     n_classes=4,
...     separation=2.0
... )

classification(n_samples: int, concentrations: ndarray | None = None, *, n_classes: int = 2, class_weights: List[float] | None = None, separation: float = 1.5, separation_method: Literal['component', 'threshold', 'cluster'] = 'component', class_names: List[str] | None = None, return_proba: bool = False) -> ndarray | Tuple[ndarray, ndarray]

Generate classification target labels with controllable class separation.

The separation parameter controls how distinguishable classes are in feature space. Higher values create more separable classes.

Parameters:
  • n_samples – Number of samples.

  • concentrations – Component concentration matrix.

  • n_classes – Number of classes to generate.

  • class_weights – Class proportions (should sum to 1.0). If None, uses balanced classes.

  • separation – Class separation factor:
      - 0.5-1.0: Overlapping classes (challenging)
      - 1.5-2.0: Moderate separation (realistic)
      - 2.5+: Well-separated classes (easy)

  • separation_method – How to create class differences:
      - "component": Each class has distinct component profiles
      - "threshold": Classes based on concentration thresholds
      - "cluster": K-means-like cluster assignment

  • class_names – Optional string labels for classes.

  • return_proba – If True, also return class probabilities.

Returns:

If return_proba=False: integer class labels (n_samples,). If return_proba=True: tuple of (labels, probabilities).

Return type:

ndarray | Tuple[ndarray, ndarray]

Example

>>> # Binary classification with balanced classes
>>> y = generator.classification(100, C, n_classes=2)
>>>
>>> # 3-class with imbalanced weights
>>> y = generator.classification(
...     100, C,
...     n_classes=3,
...     class_weights=[0.5, 0.3, 0.2],
...     separation=2.0
... )
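
One way to picture how class_weights could interact with a threshold-style assignment is to rank the samples by a concentration value and cut the ranking at the cumulative class proportions. The sketch below does this in plain Python. The rule and the helper `threshold_labels` are assumptions for illustration and may not match how classification() works internally.

```python
def threshold_labels(values, class_weights):
    """Assign classes by rank so class k receives about class_weights[k] of the samples."""
    if abs(sum(class_weights) - 1.0) > 1e-9:
        raise ValueError("class_weights should sum to 1.0")
    n = len(values)
    # Cumulative upper rank bound per class, e.g. [50, 80, 100] for n=100.
    bounds, total = [], 0.0
    for w in class_weights:
        total += w
        bounds.append(round(total * n))
    bounds[-1] = n  # guard against rounding leaving the last samples unassigned
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = next(k for k, b in enumerate(bounds) if rank < b)
    return labels


values = [i / 100 for i in range(100)]  # stand-in for one concentration column
labels = threshold_labels(values, [0.5, 0.3, 0.2])
counts = [labels.count(k) for k in range(3)]  # [50, 30, 20]
```

Here the lowest half of the values becomes class 0, the next 30% class 1, and the top 20% class 2, mirroring the imbalanced weights in the example above.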

regression(n_samples: int, concentrations: ndarray | None = None, *, distribution: Literal['uniform', 'normal', 'lognormal', 'bimodal'] = 'uniform', range: Tuple[float, float] | None = None, component: int | str | List[int] | None = None, component_names: List[str] | None = None, correlation: float = 0.9, noise: float = 0.1, transform: Literal['log', 'sqrt'] | None = None) -> ndarray

Generate regression target values.

Parameters:
  • n_samples – Number of samples.

  • concentrations – Component concentration matrix (n_samples, n_components). If None, generates random base values.

  • distribution – Target value distribution.

  • range – (min, max) for scaling targets.

  • component – Which component(s) to use as target:
      - None: Weighted combination of all components
      - int: Use component at that index
      - str: Use component with that name (requires component_names)
      - List[int]: Multi-output using specified component indices

  • component_names – Names of components (for string component selection).

  • correlation – Correlation between concentrations and targets (0-1).

  • noise – Noise level to add.

  • transform – Optional transformation ('log', 'sqrt').

Returns:

Target values array. Shape (n_samples,) for single target, or (n_samples, n_targets) for multi-output.

Example

>>> y = generator.regression(
...     100, C,
...     distribution="lognormal",
...     range=(5, 50),
...     component="protein",
...     component_names=["water", "protein", "lipid"]
... )
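
The component and range parameters can be pictured as two steps: pick a concentration column by index or name, then rescale it to the requested interval. The sketch below shows that idea with min-max scaling. Whether regression() rescales this way is an assumption, and `select_component` and `rescale` are hypothetical helpers that ignore distribution, correlation, and noise.

```python
def select_component(concentrations, component, component_names=None):
    """Return one column, addressed by index or by name."""
    if isinstance(component, str):
        if component_names is None:
            raise ValueError("component_names is required for string selection")
        component = component_names.index(component)
    return [row[component] for row in concentrations]


def rescale(values, value_range):
    """Min-max rescale so the smallest value maps to lo and the largest to hi."""
    lo, hi = value_range
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [float(lo) for _ in values]  # degenerate column: no spread to rescale
    return [lo + (v - v_min) * (hi - lo) / (v_max - v_min) for v in values]


# Three samples, three components: water, protein, lipid.
C = [
    [0.70, 0.10, 0.20],
    [0.60, 0.25, 0.15],
    [0.50, 0.40, 0.10],
]
protein = select_component(C, "protein", ["water", "protein", "lipid"])
y = rescale(protein, (5, 50))  # approximately [5.0, 27.5, 50.0]
```

Selecting by name only works when component_names is supplied, which matches the requirement stated for the str form of component.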

nirs4all.synthesis.targets.generate_classification_targets(n_samples: int, concentrations: ndarray | None = None, *, random_state: int | None = None, n_classes: int = 2, class_weights: List[float] | None = None, separation: float = 1.5) -> ndarray

Convenience function for generating classification targets.

Parameters:
  • n_samples – Number of samples.

  • concentrations – Component concentrations (optional).

  • random_state – Random seed.

  • n_classes – Number of classes.

  • class_weights – Class proportions.

  • separation – Class separation factor.

Returns:

Integer class labels array.

nirs4all.synthesis.targets.generate_regression_targets(n_samples: int, concentrations: ndarray | None = None, *, random_state: int | None = None, distribution: str = 'uniform', range: Tuple[float, float] | None = None) -> ndarray

Convenience function for generating regression targets.

Parameters:
  • n_samples – Number of samples.

  • concentrations – Component concentrations (optional).

  • random_state – Random seed.

  • distribution – Target distribution type.

  • range – Value range (min, max).

Returns:

Target values array.