Resampler - Wavelength Grid Resampling

The Resampler operator allows you to resample spectral data to a different wavelength grid using various interpolation methods from scipy. This is particularly useful for:

  • Standardizing wavelength grids across different instruments or datasets

  • Reducing dimensionality by downsampling to fewer wavelengths

  • Increasing resolution by upsampling with interpolation

  • Focusing on specific regions by cropping and resampling

  • Preparing data for transfer learning across different spectral resolutions

Basic Usage

from nirs4all.operators.transforms import Resampler
import numpy as np

# Define target wavelength grid
target_wavelengths = np.linspace(1000, 2500, 100)

# Create resampler
resampler = Resampler(
    target_wavelengths=target_wavelengths,
    method='linear'
)

# Use in pipeline
pipeline = [
    MinMaxScaler(),
    resampler,
    StandardNormalVariate(),
    # ... rest of pipeline
]

Parameters

target_wavelengths (required)

Array-like or list of target wavelength values. Can be:

  • Single array: same grid for all sources

  • List of arrays: different grid per source (for multi-source datasets)

# Same grid for all sources
target_wl = np.linspace(1000, 2500, 100)
resampler = Resampler(target_wavelengths=target_wl)

# Different grid per source
target_wl_list = [
    np.linspace(1000, 2500, 100),  # Source 0
    np.linspace(1100, 2400, 120),  # Source 1
]
resampler = Resampler(target_wavelengths=target_wl_list)

method (default: ‘linear’)

Interpolation method from scipy.interpolate.interp1d. Options include:

  • 'linear': Linear interpolation (fast, good for most cases)

  • 'cubic': Cubic spline interpolation (smoother)

  • 'quadratic': Quadratic interpolation

  • 'nearest': Nearest neighbor (no interpolation)

  • 'slinear', 'zero', 'previous', 'next': Other scipy methods

resampler = Resampler(
    target_wavelengths=target_wl,
    method='cubic'  # Smooth interpolation
)

crop_range (optional)

Tuple (min_wavelength, max_wavelength) to crop the original data before resampling. Useful for focusing on specific spectral regions.

# Focus on mid-infrared region
resampler = Resampler(
    target_wavelengths=np.linspace(1200, 2200, 100),
    crop_range=(1100, 2300)  # Crop before resampling
)

fill_value (default: 0)

Value to use for wavelengths outside the original range. Can be:

  • A number (e.g., 0, np.nan)

  • 'extrapolate': Use scipy’s extrapolation

resampler = Resampler(
    target_wavelengths=target_wl,
    fill_value='extrapolate'  # Extrapolate beyond original range
)

bounds_error (default: False)

If True, raise error when target wavelengths are outside original range. If False, use fill_value instead.

resampler = Resampler(
    target_wavelengths=target_wl,
    bounds_error=True  # Strict: raise error if out of bounds
)

copy (default: True)

Whether to copy the input data before transformation.

Examples

Example 1: Downsampling for Faster Processing

from nirs4all.operators.transforms import Resampler
import numpy as np

# Reduce from 200 to 50 wavelengths
target_wl = np.linspace(1000, 2500, 50)

pipeline = [
    MinMaxScaler(),
    Resampler(target_wavelengths=target_wl, method='linear'),
    StandardNormalVariate(),
    ShuffleSplit(n_splits=5),
    {"y_processing": MinMaxScaler()},
    {"model": PLSRegression(n_components=10)},
]

Example 2: Upsampling for Higher Resolution

# Increase from 200 to 500 wavelengths using cubic interpolation
target_wl = np.linspace(1000, 2500, 500)

pipeline = [
    Resampler(target_wavelengths=target_wl, method='cubic'),
    # ... rest of pipeline
]

Example 3: Focusing on Specific Wavelength Range

# Focus on fingerprint region (1300-1800 nm)
target_wl = np.linspace(1300, 1800, 100)

pipeline = [
    Resampler(
        target_wavelengths=target_wl,
        crop_range=(1250, 1850),  # Crop with buffer
        method='linear'
    ),
    # ... rest of pipeline
]

Example 4: Multi-Source with Different Target Grids

# Different sampling for each source
target_wl_list = [
    np.linspace(1000, 2500, 100),  # Source 0: standard resolution
    np.linspace(1100, 2300, 150),  # Source 1: higher resolution in narrower range
]

pipeline = [
    Resampler(target_wavelengths=target_wl_list),
    # ... rest of pipeline
]

Example 5: Comparing Interpolation Methods

# Test different interpolation methods
target_wl = np.linspace(1000, 2500, 100)

pipeline = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3),
    {"y_processing": MinMaxScaler()},
]

# Add models with different resampling methods
for method in ['linear', 'cubic', 'quadratic']:
    pipeline.extend([
        {"model": Resampler(target_wavelengths=target_wl, method=method),
         "name": f"Resample_{method}"},
        {"model": PLSRegression(n_components=15),
         "name": f"PLS_with_{method}_resampling"}
    ])

How It Works

  1. Wavelength Extraction: The controller automatically extracts wavelength information from dataset.headers(source_idx) and converts to float.

  2. Validation: Ensures headers are numeric wavelengths (raises error if not convertible to float).

  3. Cropping (optional): If crop_range is specified, crops to that range first.

  4. Interpolation: Uses scipy’s interp1d to interpolate each spectrum to the target wavelength grid.

  5. Header Update: Updates dataset headers with the new wavelength values.

  6. Preprocessing Name: Adds a new preprocessing name like "raw_Resampler_1" following the standard naming convention.

Controller Integration

The ResamplerController integrates the Resampler into the pipeline:

  • Multi-source support: ✓ Yes

  • Prediction mode support: ✓ Yes (stores interpolation parameters)

  • Priority: 15 (runs before most transformers)

  • Matching: Detects Resampler instances in pipeline steps

Warnings and Errors

⚠️ Warnings

  • Extrapolation Warning: If target wavelengths extend beyond the original range, a warning is issued (unless fill_value='extrapolate').

UserWarning: Target wavelengths extend below 1000.0 and above 2500.0 original range.
Using fill_value=0 for extrapolation.

❌ Errors

  • No wavelength overlap: If crop_range or target_wavelengths have no overlap with original wavelengths.

  • Invalid wavelengths: If dataset headers cannot be converted to float.

  • Shape mismatch: If transform data doesn’t match fitted dimensions.

Best Practices

  1. Choose appropriate method:

    • Use 'linear' for most cases (fast, reliable)

    • Use 'cubic' for smoother spectra (slower)

    • Avoid 'nearest' unless you specifically want no interpolation

  2. Consider computational cost:

    • Downsampling reduces computation time in subsequent steps

    • Upsampling increases computation time but may preserve more information

  3. Validate wavelength ranges:

    • Ensure target wavelengths are within or close to original range

    • Use crop_range to focus on regions of interest

  4. Multi-source datasets:

    • Provide different target grids if sources have different spectral characteristics

    • Or use same grid to standardize across sources

  5. Save resampled data:

    • The resampler parameters are automatically saved when using save_artifacts=True

    • This ensures consistent resampling in prediction mode

Technical Details

Serialization

The resampler stores these parameters for prediction mode:

  • target_wavelengths: Target grid

  • method: Interpolation method

  • fill_value: Fill value for out-of-bounds

  • bounds_error: Bounds error flag

  • crop_mask_: Boolean mask for cropping (if used)

Performance

  • Time complexity: O(n_samples × n_features × n_target) for interpolation

  • Memory: Stores only the interpolation parameters, not full interpolator objects

  • Parallelization: Processes each sample independently (future: parallel processing)

See Also

References

  • scipy.interpolate.interp1d documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html