Resampler - Wavelength Grid Resampling
The Resampler operator allows you to resample spectral data to a different wavelength grid using various interpolation methods from scipy. This is particularly useful for:
Standardizing wavelength grids across different instruments or datasets
Reducing dimensionality by downsampling to fewer wavelengths
Increasing resolution by upsampling with interpolation
Focusing on specific regions by cropping and resampling
Preparing data for transfer learning across different spectral resolutions
Basic Usage
from sklearn.preprocessing import MinMaxScaler
# StandardNormalVariate is assumed to be importable from the same module as Resampler
from nirs4all.operators.transforms import Resampler, StandardNormalVariate
import numpy as np

# Define target wavelength grid
target_wavelengths = np.linspace(1000, 2500, 100)

# Create resampler
resampler = Resampler(
    target_wavelengths=target_wavelengths,
    method='linear'
)

# Use in pipeline
pipeline = [
    MinMaxScaler(),
    resampler,
    StandardNormalVariate(),
    # ... rest of pipeline
]
Parameters
target_wavelengths (required)
Array-like or list of target wavelength values. Can be:
Single array: same grid for all sources
List of arrays: different grid per source (for multi-source datasets)
# Same grid for all sources
target_wl = np.linspace(1000, 2500, 100)
resampler = Resampler(target_wavelengths=target_wl)

# Different grid per source
target_wl_list = [
    np.linspace(1000, 2500, 100),  # Source 0
    np.linspace(1100, 2400, 120),  # Source 1
]
resampler = Resampler(target_wavelengths=target_wl_list)
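One way to picture the two accepted forms: a single array is reused for every source, while a list of arrays supplies one grid per source. The helper below, `grid_for_source`, is hypothetical and only illustrates that dispatch; it is not part of nirs4all, and the library's own detection logic may differ.

```python
import numpy as np

def grid_for_source(target_wavelengths, source_idx):
    """Return the target grid for one source (hypothetical helper, not nirs4all API)."""
    # A list/tuple whose first element is itself a 1-D array-like is read as per-source grids.
    is_per_source = (
        isinstance(target_wavelengths, (list, tuple))
        and len(target_wavelengths) > 0
        and np.ndim(target_wavelengths[0]) == 1
    )
    if is_per_source:
        return np.asarray(target_wavelengths[source_idx], dtype=float)
    # Otherwise the same grid is shared by all sources.
    return np.asarray(target_wavelengths, dtype=float)

shared = np.linspace(1000, 2500, 100)
per_source = [np.linspace(1000, 2500, 100), np.linspace(1100, 2400, 120)]

print(grid_for_source(shared, 1).shape)      # (100,)
print(grid_for_source(per_source, 1).shape)  # (120,)
```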
method (default: 'linear')
Interpolation method from scipy.interpolate.interp1d. Options include:
'linear': Linear interpolation (fast, good for most cases)
'cubic': Cubic spline interpolation (smoother)
'quadratic': Quadratic interpolation
'nearest': Nearest neighbor (no interpolation)
'slinear', 'zero', 'previous', 'next': Other scipy methods
resampler = Resampler(
    target_wavelengths=target_wl,
    method='cubic'  # Smooth interpolation
)
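To get a feel for how the methods differ, the sketch below calls scipy.interpolate.interp1d directly on a synthetic, coarsely sampled band and compares each result against the known curve. The Gaussian band and grid sizes are illustrative assumptions; real spectra may rank the methods differently, especially when noise is present.

```python
import numpy as np
from scipy.interpolate import interp1d

# Synthetic spectrum: one smooth absorption band sampled on a coarse grid.
wl = np.linspace(1000, 2500, 16)
spectrum = np.exp(-((wl - 1700) / 250) ** 2)

# Dense target grid and the exact curve on it, for comparison.
target = np.linspace(1000, 2500, 151)
truth = np.exp(-((target - 1700) / 250) ** 2)

errors = {}
for kind in ("nearest", "linear", "quadratic", "cubic"):
    resampled = interp1d(wl, spectrum, kind=kind)(target)
    errors[kind] = float(np.max(np.abs(resampled - truth)))
    print(f"{kind:>9}: max abs error = {errors[kind]:.4f}")
```

On this smooth, noise-free curve the higher-order methods track the band more closely; on noisy data they can also follow the noise, which is one reason 'linear' is a reasonable default.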
crop_range (optional)
Tuple (min_wavelength, max_wavelength) to crop the original data before resampling.
Useful for focusing on specific spectral regions.
# Focus on a sub-region of the near-infrared range
resampler = Resampler(
    target_wavelengths=np.linspace(1200, 2200, 100),
    crop_range=(1100, 2300)  # Crop before resampling
)
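The example above crops slightly wider than the target grid. A minimal NumPy sketch of why that buffer matters follows: if the cropped data no longer brackets the target wavelengths, the edge targets fall back to the fill value. The inclusive mask and the use of np.interp here are assumptions for illustration, not a description of the library's internals.

```python
import numpy as np

wl = np.linspace(1000, 2500, 301)        # original grid, 5-unit step
spectrum = np.sin(wl / 200.0)            # stand-in for one spectrum
target = np.linspace(1200, 2200, 100)

def crop_then_resample(crop_range):
    lo, hi = crop_range
    mask = (wl >= lo) & (wl <= hi)       # assumed inclusive bounds
    # left/right = 0.0 mimics fill_value=0 for targets outside the cropped data
    return np.interp(target, wl[mask], spectrum[mask], left=0.0, right=0.0)

with_buffer = crop_then_resample((1100, 2300))  # crop brackets the target grid
too_tight = crop_then_resample((1300, 2100))    # crop is narrower than the target grid

n_filled = int(np.sum((target < 1300) | (target > 2100)))
print("targets outside the tight crop:", n_filled)
```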
fill_value (default: 0)
Value to use for wavelengths outside the original range. Can be:
A number (e.g., 0, np.nan)
'extrapolate': Use scipy’s extrapolation
resampler = Resampler(
    target_wavelengths=target_wl,
    fill_value='extrapolate'  # Extrapolate beyond original range
)
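The three kinds of fill value can be compared directly with scipy.interpolate.interp1d, which the Resampler is described as wrapping. This is a sketch on a three-point toy spectrum, assuming the operator forwards fill_value to interp1d unchanged.

```python
import numpy as np
from scipy.interpolate import interp1d

wl = np.array([1000.0, 1500.0, 2000.0])
spectrum = np.array([0.2, 0.4, 0.6])         # linear in wavelength
target = np.array([900.0, 1250.0, 2100.0])   # first and last lie outside the range

def resample(fill_value):
    f = interp1d(wl, spectrum, kind="linear",
                 bounds_error=False, fill_value=fill_value)
    return f(target)

zero_filled = resample(0)               # out-of-range targets become 0
nan_filled = resample(np.nan)           # out-of-range targets become NaN
extrapolated = resample("extrapolate")  # edge segments are continued outward

print(zero_filled)
print(nan_filled)
print(extrapolated)
```

Filling with np.nan makes out-of-range columns easy to detect later, whereas 0 can be mistaken for a real measurement.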
bounds_error (default: False)
If True, raise error when target wavelengths are outside original range.
If False, use fill_value instead.
resampler = Resampler(
    target_wavelengths=target_wl,
    bounds_error=True  # Strict: raise error if out of bounds
)
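The difference between strict and lenient handling can be seen with interp1d itself. The sketch below assumes the Resampler passes bounds_error through to scipy; the exception type the operator surfaces to the user is not stated here, so only scipy's own ValueError is shown.

```python
import numpy as np
from scipy.interpolate import interp1d

wl = np.linspace(1000, 2500, 50)
spectrum = np.ones_like(wl)
target = np.linspace(900, 2600, 20)   # extends beyond both ends

# Strict: evaluating outside the original range raises ValueError.
strict = interp1d(wl, spectrum, bounds_error=True)
try:
    strict(target)
    raised = False
except ValueError as exc:
    raised = True
    print("strict mode rejected the grid:", exc)

# Lenient: out-of-range targets receive fill_value instead.
lenient = interp1d(wl, spectrum, bounds_error=False, fill_value=0)
filled = lenient(target)
print(filled[0], filled[10], filled[-1])
```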
copy (default: True)
Whether to copy the input data before transformation.
Examples
Example 1: Downsampling for Faster Processing
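from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
# StandardNormalVariate is assumed to be importable from the same module as Resampler
from nirs4all.operators.transforms import Resampler, StandardNormalVariate
import numpy as np

# Reduce from 200 to 50 wavelengths
target_wl = np.linspace(1000, 2500, 50)
pipeline = [
    MinMaxScaler(),
    Resampler(target_wavelengths=target_wl, method='linear'),
    StandardNormalVariate(),
    ShuffleSplit(n_splits=5),
    {"y_processing": MinMaxScaler()},
    {"model": PLSRegression(n_components=10)},
]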
Example 2: Upsampling for Higher Resolution
# Increase from 200 to 500 wavelengths using cubic interpolation
target_wl = np.linspace(1000, 2500, 500)
pipeline = [
    Resampler(target_wavelengths=target_wl, method='cubic'),
    # ... rest of pipeline
]
Example 3: Focusing on Specific Wavelength Range
# Focus on a region of interest (1300-1800 nm)
target_wl = np.linspace(1300, 1800, 100)
pipeline = [
    Resampler(
        target_wavelengths=target_wl,
        crop_range=(1250, 1850),  # Crop with buffer
        method='linear'
    ),
    # ... rest of pipeline
]
Example 4: Multi-Source with Different Target Grids
# Different sampling for each source
target_wl_list = [
    np.linspace(1000, 2500, 100),  # Source 0: standard resolution
    np.linspace(1100, 2300, 150),  # Source 1: higher resolution in narrower range
]
pipeline = [
    Resampler(target_wavelengths=target_wl_list),
    # ... rest of pipeline
]
Example 5: Comparing Interpolation Methods
# Test different interpolation methods
target_wl = np.linspace(1000, 2500, 100)
pipeline = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3),
    {"y_processing": MinMaxScaler()},
]

# Add models with different resampling methods
for method in ['linear', 'cubic', 'quadratic']:
    pipeline.extend([
        {"model": Resampler(target_wavelengths=target_wl, method=method),
         "name": f"Resample_{method}"},
        {"model": PLSRegression(n_components=15),
         "name": f"PLS_with_{method}_resampling"}
    ])
How It Works
Wavelength Extraction: The controller automatically extracts wavelength information from dataset.headers(source_idx) and converts to float.
Validation: Ensures headers are numeric wavelengths (raises error if not convertible to float).
Cropping (optional): If crop_range is specified, crops to that range first.
Interpolation: Uses scipy’s interp1d to interpolate each spectrum to the target wavelength grid.
Header Update: Updates dataset headers with the new wavelength values.
Preprocessing Name: Adds a new preprocessing name like "raw_Resampler_1" following the standard naming convention.
Controller Integration
The ResamplerController integrates the Resampler into the pipeline:
Multi-source support: ✓ Yes
Prediction mode support: ✓ Yes (stores interpolation parameters)
Priority: 15 (runs before most transformers)
Matching: Detects Resampler instances in pipeline steps
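The steps listed under How It Works (header extraction, validation, cropping, interpolation, header update) can be sketched for a single source as follows. The function `resample_source` and its error messages are hypothetical stand-ins written for illustration; they are not the ResamplerController or Resampler code, and the inclusive crop mask is an assumption.

```python
import numpy as np
from scipy.interpolate import interp1d

def resample_source(X, headers, target_wavelengths, crop_range=None,
                    method="linear", fill_value=0):
    """Hypothetical single-source flow; not nirs4all code."""
    # Extraction and validation: headers must parse as float wavelengths.
    try:
        wl = np.array([float(h) for h in headers])
    except ValueError as exc:
        raise ValueError("headers are not numeric wavelengths") from exc

    # Optional cropping with a boolean mask (assumed inclusive).
    if crop_range is not None:
        mask = (wl >= crop_range[0]) & (wl <= crop_range[1])
        if not mask.any():
            raise ValueError("crop_range does not overlap the original wavelengths")
        wl, X = wl[mask], X[:, mask]

    # Interpolation: every spectrum (row) is mapped onto the target grid.
    target = np.asarray(target_wavelengths, dtype=float)
    f = interp1d(wl, X, kind=method, axis=1,
                 bounds_error=False, fill_value=fill_value)
    X_new = f(target)

    # Header update: the new headers are the target wavelengths.
    new_headers = [str(w) for w in target]
    return X_new, new_headers

headers = [str(w) for w in np.linspace(1000, 2500, 200)]
X = np.random.default_rng(0).random((5, 200))
X_new, new_headers = resample_source(
    X, headers, np.linspace(1200, 2200, 100), crop_range=(1100, 2300)
)
print(X_new.shape, len(new_headers))  # (5, 100) 100
```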
Warnings and Errors
⚠️ Warnings
Extrapolation Warning: If target wavelengths extend beyond the original range, a warning is issued (unless fill_value='extrapolate').
UserWarning: Target wavelengths extend below 1000.0 and above 2500.0 original range.
Using fill_value=0 for extrapolation.
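A range check of this kind can be reproduced before building a pipeline. The function `warn_if_out_of_range` below is a hypothetical sketch that mirrors the message shown above; the library's actual check and wording may differ.

```python
import warnings
import numpy as np

def warn_if_out_of_range(original_wl, target_wl, fill_value=0):
    """Hypothetical pre-check mirroring the warning text; not nirs4all code."""
    # With 'extrapolate' the out-of-range values are computed, so no warning.
    if isinstance(fill_value, str) and fill_value == "extrapolate":
        return False
    lo, hi = float(np.min(original_wl)), float(np.max(original_wl))
    parts = []
    if np.min(target_wl) < lo:
        parts.append(f"below {lo}")
    if np.max(target_wl) > hi:
        parts.append(f"above {hi}")
    if not parts:
        return False
    warnings.warn(
        f"Target wavelengths extend {' and '.join(parts)} original range. "
        f"Using fill_value={fill_value} for extrapolation.",
        UserWarning,
    )
    return True

original = np.linspace(1000, 2500, 200)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_out_of_range(original, np.linspace(900, 2600, 100))
print(caught[0].message)
```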
❌ Errors
No wavelength overlap: If crop_range or target_wavelengths have no overlap with original wavelengths.
Invalid wavelengths: If dataset headers cannot be converted to float.
Shape mismatch: If the data passed to transform does not have the same number of wavelengths as the data used for fitting.
Best Practices
Choose appropriate method:
Use 'linear' for most cases (fast, reliable)
Use 'cubic' for smoother spectra (slower)
Avoid 'nearest' unless you specifically want no interpolation
Consider computational cost:
Downsampling reduces computation time in subsequent steps
Upsampling increases computation time; it yields a denser grid but adds no information beyond the original measurements
Validate wavelength ranges:
Ensure target wavelengths are within or close to original range
Use crop_range to focus on regions of interest
Multi-source datasets:
Provide different target grids if sources have different spectral characteristics
Or use same grid to standardize across sources
Save resampled data:
The resampler parameters are automatically saved when using save_artifacts=True
This ensures consistent resampling in prediction mode
Technical Details
Serialization
The resampler stores these parameters for prediction mode:
target_wavelengths: Target grid
method: Interpolation method
fill_value: Fill value for out-of-bounds
bounds_error: Bounds error flag
crop_mask_: Boolean mask for cropping (if used)
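To illustrate why these parameters are sufficient, the sketch below round-trips a plain state dictionary through pickle and rebuilds the interpolator at transform time. The dictionary layout, the use of pickle, and the `transform` function are assumptions for illustration; the actual artifact format used by nirs4all is not described here.

```python
import pickle
import numpy as np
from scipy.interpolate import interp1d

# Fitted state: plain parameters plus the crop mask (names follow the list above).
wl = np.linspace(1000, 2500, 200)
state = {
    "target_wavelengths": np.linspace(1200, 2200, 100),
    "method": "linear",
    "fill_value": 0,
    "bounds_error": False,
    "crop_mask_": (wl >= 1100) & (wl <= 2300),
}

# Stand-in for saving and reloading artifacts.
restored = pickle.loads(pickle.dumps(state))

def transform(state, wl, X):
    """Rebuild the interpolator from stored parameters and apply it (hypothetical)."""
    mask = state["crop_mask_"]
    f = interp1d(wl[mask], X[:, mask], kind=state["method"], axis=1,
                 bounds_error=state["bounds_error"],
                 fill_value=state["fill_value"])
    return f(state["target_wavelengths"])

X_batch = np.random.default_rng(1).random((3, 200))
before = transform(state, wl, X_batch)
after = transform(restored, wl, X_batch)
print(before.shape, np.array_equal(before, after))
```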
Performance
Time complexity: O(n_samples × n_features × n_target) for interpolation
Memory: Stores only the interpolation parameters, not full interpolator objects
Parallelization: Processes each sample independently (future: parallel processing)
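Because each sample is interpolated independently, a per-sample loop and a single batched call along the wavelength axis give the same result. The comparison below is a sketch using scipy directly; it does not describe how the Resampler is implemented internally.

```python
import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)
wl = np.linspace(1000, 2500, 200)
X = rng.random((50, 200))
target = np.linspace(1000, 2500, 80)

# Per-sample loop: one interpolator per spectrum.
looped = np.vstack([interp1d(wl, row, kind="linear")(target) for row in X])

# Single call: interpolate all rows at once along the wavelength axis.
batched = interp1d(wl, X, kind="linear", axis=1)(target)

print(looped.shape, batched.shape)
print(np.allclose(looped, batched))
```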
See Also
Standard Normal Variate (SNV) Transformation - Spectral normalization
Preprocessing Cheatsheet - Preprocessing cheatsheet
Preprocessing - All preprocessing options
References
scipy.interpolate.interp1d documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html