nirs4all.operators.filters.spectral_quality module
Spectral quality filter for sample filtering.
This module provides the SpectralQualityFilter class for detecting and excluding samples with poor spectral quality based on various quality metrics.
- class nirs4all.operators.filters.spectral_quality.SpectralQualityFilter(max_nan_ratio: float = 0.1, max_zero_ratio: float = 0.5, min_variance: float = 1e-08, max_value: float | None = None, min_value: float | None = None, check_inf: bool = True, reason: str | None = None)
Bases: SampleFilter
Filter samples with poor spectral quality.
This filter identifies samples whose spectra exhibit quality issues such as:
- A high proportion of NaN or missing values
- Infinite values (when check_inf=True)
- A high proportion of zero values (potentially corrupted)
- Very low variance (flat or constant spectra)
- Values outside the expected range (saturation)
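Each of these issues corresponds to a simple per-sample statistic. The sketch below shows one way to compute them with NumPy. The helper name `spectral_quality_metrics` is hypothetical. Some details are assumptions, not necessarily what SpectralQualityFilter does internally: for example, non-finite values are ignored when computing variance and extremes.

```python
import numpy as np


def spectral_quality_metrics(X):
    """Per-sample quality metrics (hypothetical helper, not part of nirs4all).

    X has shape (n_samples, n_features): one spectrum per row.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    finite = np.isfinite(X)

    # Fraction of NaN values in each spectrum
    nan_ratio = np.isnan(X).sum(axis=1) / n_features
    # True where a spectrum contains at least one +Inf or -Inf
    has_inf = np.isinf(X).any(axis=1)
    # Fraction of exact zeros in each spectrum
    zero_ratio = (X == 0).sum(axis=1) / n_features
    # Variance over finite values only (assumption); no finite values -> 0.0
    variance = np.array([row[m].var() if m.any() else 0.0
                         for row, m in zip(X, finite)])
    # Extremes over finite values only, for saturation checks
    max_val = np.where(finite, X, -np.inf).max(axis=1)
    min_val = np.where(finite, X, np.inf).min(axis=1)

    return {"nan_ratio": nan_ratio, "has_inf": has_inf, "zero_ratio": zero_ratio,
            "variance": variance, "max": max_val, "min": min_val}
```

Thresholds such as max_nan_ratio or min_variance would then be compared against these arrays, one value per sample.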
Example
>>> import numpy as np
>>> from nirs4all.operators.filters import SpectralQualityFilter
>>>
>>> # Placeholder spectra: 50 samples x 200 wavelengths
>>> X_train = np.random.rand(50, 200)
>>>
>>> # Default quality checks
>>> filter_obj = SpectralQualityFilter()
>>>
>>> # Strict quality requirements
>>> filter_strict = SpectralQualityFilter(
...     max_nan_ratio=0.01,
...     max_zero_ratio=0.1,
...     min_variance=1e-4
... )
>>>
>>> # Check for saturated spectra
>>> filter_sat = SpectralQualityFilter(max_value=4.0, min_value=-0.5)
>>>
>>> # Get mask
>>> mask = filter_obj.get_mask(X_train)  # True = keep
- In Pipeline:
>>> pipeline = [
...     {
...         "sample_filter": {
...             "filters": [SpectralQualityFilter(max_nan_ratio=0.05)],
...         }
...     },
...     "snv",
...     "model:PLSRegression",
... ]
- fit(X: ndarray, y: ndarray | None = None) → SpectralQualityFilter
Fit the filter (no-op for quality filter as thresholds are fixed).
The SpectralQualityFilter uses fixed thresholds set at initialization, so no fitting is required. This method is provided for API consistency.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array (not used).
- Returns:
The filter instance (self), unchanged.
- Return type:
SpectralQualityFilter
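Because the thresholds are fixed at construction, `fit` can simply return the instance. The hypothetical class below sketches that pattern. It is not the nirs4all implementation, it checks only a NaN ratio, and its inclusive `<=` comparison against the threshold is an assumption.

```python
import numpy as np


class FixedThresholdFilter:
    """Hypothetical stateless filter illustrating a no-op ``fit``.

    Not the nirs4all class: it only checks a NaN-ratio threshold.
    """

    def __init__(self, max_nan_ratio=0.1):
        # Threshold fixed at construction; nothing is learned from data
        self.max_nan_ratio = max_nan_ratio

    def fit(self, X, y=None):
        # No-op: returning self keeps the fit-then-get_mask call pattern uniform
        return self

    def get_mask(self, X, y=None):
        X = np.asarray(X, dtype=float)
        nan_ratio = np.isnan(X).mean(axis=1)
        # True = keep; a ratio equal to the threshold is accepted (assumption)
        return nan_ratio <= self.max_nan_ratio
```

Returning `self` allows chaining such as `FixedThresholdFilter(0.2).fit(X).get_mask(X)`. This matches the call order used by filters that do learn thresholds from data.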
- get_filter_stats(X: ndarray, y: ndarray | None = None) → Dict[str, Any]
Get statistics about filter application including quality breakdown.
- Parameters:
X – Feature array.
y – Target array (unused).
- Returns:
Dict containing:
Base stats (n_samples, n_excluded, n_kept, exclusion_rate)
Quality thresholds
Per-check failure counts
Quality metric distributions
- Return type:
Dict[str, Any]
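The base statistics follow directly from a keep-mask. The following is a minimal sketch. It assumes `exclusion_rate` is the fraction of excluded samples. The helper `base_filter_stats` is hypothetical and covers only the base stats, not the threshold, per-check, or distribution entries.

```python
import numpy as np


def base_filter_stats(mask):
    """Summarise a keep-mask (True = keep) into base statistics.

    Hypothetical helper: key names follow the get_filter_stats docstring,
    the computation itself is an assumption.
    """
    mask = np.asarray(mask, dtype=bool)
    n_samples = int(mask.size)
    n_kept = int(mask.sum())
    n_excluded = n_samples - n_kept
    # Guard against division by zero for an empty mask
    exclusion_rate = n_excluded / n_samples if n_samples else 0.0
    return {"n_samples": n_samples, "n_excluded": n_excluded,
            "n_kept": n_kept, "exclusion_rate": exclusion_rate}
```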
- get_mask(X: ndarray, y: ndarray | None = None) → ndarray
Compute boolean mask indicating which samples to KEEP based on quality.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array (not used for X-based quality checks).
- Returns:
- Boolean array of shape (n_samples,) where:
True means KEEP the sample (passes quality checks)
False means EXCLUDE the sample (fails quality checks)
- Return type:
np.ndarray
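A keep-mask such as the one `get_mask` returns is applied with NumPy boolean indexing, which keeps `X` and `y` aligned. In the sketch below, the mask is hard-coded in place of a real `get_mask` result so the example runs without nirs4all. The toy data is illustrative only.

```python
import numpy as np

# Toy data: 4 spectra x 3 wavelengths and matching targets (illustrative only)
X = np.array([[0.1, 0.2, 0.3],
              [np.nan, np.nan, 0.3],
              [0.5, 0.4, 0.3],
              [0.0, 0.0, 0.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Stand-in for a get_mask result: True = keep, False = exclude
mask = np.array([True, False, True, False])

# Boolean indexing keeps rows of X and entries of y in step
X_kept, y_kept = X[mask], y[mask]
# ~mask selects the excluded samples, e.g. for inspection or logging
excluded_idx = np.flatnonzero(~mask)
```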
- get_quality_breakdown(X: ndarray, y: ndarray | None = None) → Dict[str, ndarray]
Get detailed breakdown of which quality checks each sample fails.
This method provides per-check masks to understand why specific samples were excluded.
- Parameters:
X – Feature array of shape (n_samples, n_features).
y – Target array (not used).
- Returns:
Dict with a boolean array for each quality check:
"passes_nan": True if NaN ratio is acceptable
"passes_inf": True if no Inf values
"passes_zero": True if zero ratio is acceptable
"passes_variance": True if variance is sufficient
"passes_max_value": True if max value is within limit
"passes_min_value": True if min value is within limit
"passes_all": True if passes all checks
- Return type:
Dict[str, ndarray]
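The breakdown can be reproduced with one boolean array per check, combined with a logical AND for `passes_all`. The sketch below is a hypothetical re-implementation that uses the same key names and default thresholds as the constructor. Several details are assumptions:
- the inclusive comparisons
- ignoring non-finite values in the variance and range checks
- treating an unset `max_value`/`min_value` as always passing

```python
import numpy as np


def quality_breakdown(X, max_nan_ratio=0.1, max_zero_ratio=0.5, min_variance=1e-8,
                      max_value=None, min_value=None, check_inf=True):
    """Per-check masks (True = passes); hypothetical, not the nirs4all code."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    finite = np.isfinite(X)
    # Mask out NaN/Inf so they do not distort the variance (assumption)
    Xf = np.where(finite, X, np.nan)
    has_finite = finite.any(axis=1)
    variance = np.zeros(n)
    variance[has_finite] = np.nanvar(Xf[has_finite], axis=1)

    all_true = np.ones(n, dtype=bool)
    checks = {
        "passes_nan": np.isnan(X).mean(axis=1) <= max_nan_ratio,
        "passes_inf": ~np.isinf(X).any(axis=1) if check_inf else all_true.copy(),
        "passes_zero": (X == 0).mean(axis=1) <= max_zero_ratio,
        "passes_variance": variance >= min_variance,
        "passes_max_value": all_true.copy(),
        "passes_min_value": all_true.copy(),
    }
    if max_value is not None:
        # Any finite value above the limit is treated as saturation
        checks["passes_max_value"] = ~(finite & (X > max_value)).any(axis=1)
    if min_value is not None:
        checks["passes_min_value"] = ~(finite & (X < min_value)).any(axis=1)
    # A sample is kept only if it passes every individual check
    checks["passes_all"] = np.logical_and.reduce(list(checks.values()))
    return checks
```

Counting `~checks[key]` for each key gives failure counts per check, similar in kind to the per-check failure counts that `get_filter_stats` reports.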