nirs4all.operators.filters.spectral_quality module

Spectral quality filter for sample filtering.

This module provides the SpectralQualityFilter class for detecting and excluding samples with poor spectral quality based on various quality metrics.

class nirs4all.operators.filters.spectral_quality.SpectralQualityFilter(max_nan_ratio: float = 0.1, max_zero_ratio: float = 0.5, min_variance: float = 1e-08, max_value: float | None = None, min_value: float | None = None, check_inf: bool = True, reason: str | None = None)

Bases: SampleFilter

Filter samples with poor spectral quality.

This filter identifies samples whose spectra exhibit quality issues such as:
  • A high proportion of NaN (missing) values

  • Infinite values (when check_inf is True)

  • A high proportion of zero values (potentially corrupted)

  • Very low variance (flat or constant spectra)

  • Values outside the expected range (e.g. saturation)
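The per-spectrum metrics behind these checks can be sketched with plain NumPy. This is a minimal, hypothetical sketch rather than the library's implementation: the helper name spectral_quality_metrics is illustrative, and skipping NaNs when computing the variance (via np.nanvar) is an assumption.

```python
import numpy as np


def spectral_quality_metrics(X: np.ndarray) -> dict:
    """Compute per-spectrum quality metrics (hypothetical helper).

    X is assumed to have shape (n_samples, n_features), one spectrum per row.
    """
    X = np.asarray(X, dtype=float)
    return {
        # Fraction of NaN entries in each spectrum.
        "nan_ratio": np.isnan(X).mean(axis=1),
        # Fraction of exact zeros (NaN never compares equal to 0).
        "zero_ratio": (X == 0).mean(axis=1),
        # True where a spectrum contains +inf or -inf.
        "has_inf": np.isinf(X).any(axis=1),
        # Variance over the non-NaN values of each spectrum
        # (assumption: NaNs are skipped rather than propagated).
        "variance": np.nanvar(X, axis=1),
    }


X = np.array([
    [0.1, 0.2, 0.3, 0.4],     # clean spectrum
    [0.0, 0.0, 0.0, 0.5],     # mostly zeros
    [np.nan, 0.2, 0.2, 0.2],  # one NaN, otherwise flat
])
metrics = spectral_quality_metrics(X)
```

Each metric is a 1-D array with one entry per spectrum, so it can be compared element-wise against a scalar threshold such as max_nan_ratio.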

max_nan_ratio

Maximum allowed NaN ratio per spectrum

Type:

float

max_zero_ratio

Maximum allowed ratio of zero values per spectrum

Type:

float

min_variance

Minimum variance required per spectrum (lower variance indicates a flat or constant spectrum)

Type:

float

max_value

Maximum allowed value (saturation detection)

Type:

float | None

min_value

Minimum allowed value

Type:

float | None

Example

>>> from nirs4all.operators.filters import SpectralQualityFilter
>>>
>>> # Default quality checks
>>> filter_obj = SpectralQualityFilter()
>>>
>>> # Strict quality requirements
>>> filter_strict = SpectralQualityFilter(
...     max_nan_ratio=0.01,
...     max_zero_ratio=0.1,
...     min_variance=1e-4
... )
>>>
>>> # Check for saturated spectra
>>> filter_sat = SpectralQualityFilter(max_value=4.0, min_value=-0.5)
>>>
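>>> # Get mask (X_train: placeholder spectra of shape (n_samples, n_features))
>>> import numpy as np
>>> X_train = np.random.default_rng(0).normal(size=(50, 200))
>>> mask = filter_obj.get_mask(X_train)  # True = keep

In Pipeline: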
>>> pipeline = [
...     {
...         "sample_filter": {
...             "filters": [SpectralQualityFilter(max_nan_ratio=0.05)],
...         }
...     },
...     "snv",
...     "model:PLSRegression",
... ]
__repr__() → str

Return string representation.

property exclusion_reason: str

Get a descriptive exclusion reason.

fit(X: ndarray, y: ndarray | None = None) → SpectralQualityFilter

Fit the filter (a no-op for this filter, because its thresholds are fixed).

The SpectralQualityFilter uses fixed thresholds set at initialization, so no fitting is required. This method is provided for API consistency.

Parameters:
  • X – Feature array of shape (n_samples, n_features).

  • y – Target array (not used).

Returns:

The filter instance (unchanged).

Return type:

SpectralQualityFilter
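Returning the instance from a no-op fit keeps scikit-learn style chaining such as filter.fit(X).get_mask(X) working. A minimal sketch of that pattern; the class FixedThresholdFilter and its single NaN check are hypothetical simplifications, not the nirs4all class, and the non-strict comparison is an assumption.

```python
from typing import Optional

import numpy as np


class FixedThresholdFilter:
    """Hypothetical filter whose threshold is fixed at construction."""

    def __init__(self, max_nan_ratio: float = 0.1):
        self.max_nan_ratio = max_nan_ratio

    def fit(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> "FixedThresholdFilter":
        # Nothing is learned from the data; return self for method chaining.
        return self

    def get_mask(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> np.ndarray:
        # Keep a spectrum if its NaN ratio does not exceed the threshold
        # (assumption: "<=" rather than "<").
        return np.isnan(X).mean(axis=1) <= self.max_nan_ratio


X = np.array([[1.0, 2.0, np.nan, 4.0],
              [1.0, 2.0, 3.0, 4.0]])
mask = FixedThresholdFilter(max_nan_ratio=0.1).fit(X).get_mask(X)
```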

get_filter_stats(X: ndarray, y: ndarray | None = None) → Dict[str, Any]

Get statistics about the filter's application, including a quality breakdown.

Parameters:
  • X – Feature array.

  • y – Target array (unused).

Returns:

Dict containing:
  • Base stats (n_samples, n_excluded, n_kept, exclusion_rate)

  • Quality thresholds

  • Per-check failure counts

  • Quality metric distributions

Return type:

Dict[str, Any]
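The base statistics can be derived directly from a keep mask. A minimal sketch, assuming exclusion_rate is the fraction of excluded samples; the helper base_filter_stats is hypothetical and covers only the base stats, not the quality-specific entries.

```python
from typing import Any, Dict

import numpy as np


def base_filter_stats(mask: np.ndarray) -> Dict[str, Any]:
    """Summarize a keep mask (True = keep) into base filter statistics."""
    mask = np.asarray(mask, dtype=bool)
    n_samples = int(mask.size)
    n_kept = int(mask.sum())
    n_excluded = n_samples - n_kept
    # Guard against division by zero for an empty mask.
    exclusion_rate = n_excluded / n_samples if n_samples else 0.0
    return {
        "n_samples": n_samples,
        "n_excluded": n_excluded,
        "n_kept": n_kept,
        "exclusion_rate": exclusion_rate,
    }


stats = base_filter_stats(np.array([True, False, True, True]))
```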

get_mask(X: ndarray, y: ndarray | None = None) → ndarray

Compute a boolean mask indicating which samples to KEEP based on quality.

Parameters:
  • X – Feature array of shape (n_samples, n_features).

  • y – Target array (not used for X-based quality checks).

Returns:

Boolean array of shape (n_samples,) where:
  • True means KEEP the sample (passes quality checks)

  • False means EXCLUDE the sample (fails quality checks)

Return type:

np.ndarray
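Because True marks samples to keep, the mask can be used directly for NumPy boolean indexing of both X and y, which keeps features and targets aligned. In this sketch a hand-written mask stands in for the output of get_mask.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)  # 4 spectra, 3 wavelengths
y = np.array([10.0, 20.0, 30.0, 40.0])
# Stand-in for filter_obj.get_mask(X): exclude the second sample.
mask = np.array([True, False, True, True])

X_kept = X[mask]        # rows where mask is True
y_kept = y[mask]        # same rows, so X and y stay aligned
X_excluded = X[~mask]   # samples that failed the quality checks
```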

get_quality_breakdown(X: ndarray, y: ndarray | None = None) → Dict[str, ndarray]

Get a detailed breakdown of which quality checks each sample fails.

This method provides per-check masks to understand why specific samples were excluded.

Parameters:
  • X – Feature array of shape (n_samples, n_features).

  • y – Target array (not used).

Returns:

Dict with a boolean array for each quality check:
  • "passes_nan": True if the NaN ratio is acceptable

  • "passes_inf": True if there are no Inf values

  • "passes_zero": True if the zero ratio is acceptable

  • "passes_variance": True if the variance is sufficient

  • "passes_max_value": True if the maximum value is within the limit

  • "passes_min_value": True if the minimum value is within the limit

  • "passes_all": True if the sample passes all checks

Return type:

Dict[str, ndarray]
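A breakdown of this shape can be sketched as a dictionary of per-check masks combined with a logical AND. This is a hypothetical sketch, not the library's code: the helper quality_breakdown, the non-strict comparisons, ignoring Inf values in the variance and range checks, and treating a None bound as a disabled check are all assumptions.

```python
from typing import Dict, Optional

import numpy as np


def quality_breakdown(
    X: np.ndarray,
    max_nan_ratio: float = 0.1,
    max_zero_ratio: float = 0.5,
    min_variance: float = 1e-8,
    max_value: Optional[float] = None,
    min_value: Optional[float] = None,
    check_inf: bool = True,
) -> Dict[str, np.ndarray]:
    """Per-check boolean masks (True = passes), one entry per spectrum."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Replace +/-inf with NaN so the variance and range checks ignore them.
    finite = np.where(np.isinf(X), np.nan, X)

    checks = {
        "passes_nan": np.isnan(X).mean(axis=1) <= max_nan_ratio,
        "passes_zero": (X == 0).mean(axis=1) <= max_zero_ratio,
        "passes_variance": np.nanvar(finite, axis=1) >= min_variance,
    }
    if check_inf:
        checks["passes_inf"] = ~np.isinf(X).any(axis=1)
    else:
        checks["passes_inf"] = np.ones(n, dtype=bool)
    # Assumption: a bound left as None disables that check.
    if max_value is not None:
        checks["passes_max_value"] = np.nanmax(finite, axis=1) <= max_value
    else:
        checks["passes_max_value"] = np.ones(n, dtype=bool)
    if min_value is not None:
        checks["passes_min_value"] = np.nanmin(finite, axis=1) >= min_value
    else:
        checks["passes_min_value"] = np.ones(n, dtype=bool)

    # A spectrum is kept only if it passes every individual check.
    checks["passes_all"] = np.logical_and.reduce(list(checks.values()))
    return checks


X = np.array([
    [0.1, 0.2, 0.3, 0.4],     # clean
    [0.0, 0.0, 0.0, 0.5],     # 75% zeros
    [1.0, 1.0, 1.0, 1.0],     # flat (zero variance)
    [0.1, np.inf, 0.3, 0.4],  # contains Inf
    [0.1, 0.2, 5.0, 0.4],     # above max_value
])
breakdown = quality_breakdown(X, max_value=4.0)
# Names of the checks that sample 3 fails.
failed = [name for name, ok in breakdown.items() if not ok[3]]
```

Looking up the failing keys for a single sample, as in the last line, is the kind of question the per-check masks are meant to answer.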