Frequently Asked Questions

Common questions, errors, and solutions for nirs4all.

Installation

How do I install nirs4all?

pip install nirs4all

For GPU support with TensorFlow:

pip install nirs4all "tensorflow[and-cuda]"

How do I verify my installation?

nirs4all --test-install

This checks all dependencies and reports available frameworks.

Which Python versions are supported?

Python 3.11+ is required.
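
A script can guard against older interpreters before importing nirs4all. The sketch below is illustrative only; meets_minimum is a hypothetical helper and not part of nirs4all.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated above


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) tuple satisfies the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum():
    found = ".".join(map(str, sys.version_info[:3]))
    print(f"Python 3.11+ is required, found {found}")
```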

Do I need TensorFlow, PyTorch, or JAX?

No. nirs4all runs with scikit-learn alone. Deep learning frameworks are optional and are only needed for neural network models.
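
To see which optional backends are importable in the current environment without actually importing them, something like the following standard-library check may help. The helper name available_frameworks is hypothetical, and the module names assume the usual import names for TensorFlow, PyTorch, and JAX.

```python
from importlib.util import find_spec

# Usual top-level import names of the optional deep learning backends.
OPTIONAL_FRAMEWORKS = ("tensorflow", "torch", "jax")


def available_frameworks(names=OPTIONAL_FRAMEWORKS):
    """Map each top-level module name to True if it can be found on the import path."""
    return {name: find_spec(name) is not None for name in names}


for name, installed in available_frameworks().items():
    print(f"{name}: {'installed' if installed else 'not installed'}")
```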

Data Loading

What file formats are supported?

CSV (.csv)
Excel (.xlsx, .xls)
MATLAB (.mat)
NumPy (.npy, .npz)
Parquet (.parquet)

See Loading Data for details.

How do I specify which column is the target variable?

from nirs4all.data import DatasetConfigs

dataset = DatasetConfigs(
    "data.csv",
    y_column="concentration",  # Target column name
)

How do I handle multiple data sources?

dataset = DatasetConfigs([
    {"path": "nir.csv", "source_name": "NIR"},
    {"path": "raman.csv", "source_name": "Raman"},
])

Error: “Could not infer target column”

Your dataset doesn’t have a clear target column. Specify it explicitly:

dataset = DatasetConfigs("data.csv", y_column="my_target")

Error: “Sample count mismatch between X and y”

Your feature matrix and target array have different numbers of samples. Check your data file for:

Missing values
Misaligned rows
Header issues
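
The three causes above can often be located with a quick scan of the raw file. The following is a minimal sketch using only the standard library; find_problem_rows is a hypothetical helper, and it assumes a comma-delimited file whose first row is the header.

```python
import csv
import io


def find_problem_rows(csv_text, delimiter=","):
    """Return (row_number, reason) pairs for rows likely to break X/y alignment.

    Row numbers are 1-based and count the header as row 1.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((row_number, f"expected {len(header)} fields, got {len(row)}"))
        elif any(cell.strip() == "" for cell in row):
            problems.append((row_number, "empty cell"))
    return problems


sample = "w1,w2,concentration\n0.1,0.2,1.5\n0.3,,2.0\n0.4,0.5\n"
for row_number, reason in find_problem_rows(sample):
    print(row_number, reason)
```

For a file on disk, pass the result of open(path, newline="").read() as csv_text.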

Pipeline Execution

How do I run a basic pipeline?

import nirs4all
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit

pipeline = [
    ShuffleSplit(n_splits=5, test_size=0.2, random_state=42),
    PLSRegression(n_components=10),
]

result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/data.csv"
)

Why do I need cross-validation in my pipeline?

Cross-validation (e.g., ShuffleSplit, KFold) is required to:

Split data into train/test sets
Evaluate model generalization
Generate out-of-fold predictions

How do I save my results?

PipelineRunner saves results automatically when save_artifacts=True:

from nirs4all.pipeline import PipelineRunner

runner = PipelineRunner(
    save_artifacts=True,
    workspace_path="workspace/"
)
predictions, _ = runner.run(pipeline, dataset)

Error: “No splitter found in pipeline”

Add a cross-validation splitter before your model:

pipeline = [
    ShuffleSplit(n_splits=5, random_state=42),  # Add this
    PLSRegression(n_components=10),
]

Error: “Pipeline must contain at least one model step”

Add a model to your pipeline:

pipeline = [
    SNV(),
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),  # Model step
]

Preprocessing

Which preprocessing should I use?

Data Issue	Recommended Preprocessing
Baseline drift	`Detrend`, `BaselineCorrection`
Scatter effects	`SNV`, `MSC`
Noise	`SavitzkyGolay`, `Gaussian`
Scale differences	`StandardScaler`, `MinMaxScaler`
Derivatives	`FirstDerivative`, `SecondDerivative`

See Preprocessing Cheatsheet for model-specific recommendations.
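
To make the scatter-correction row of the table concrete, here is a minimal NumPy sketch of the textbook standard normal variate (SNV) transform. It is not nirs4all's SNV class; the use of the population standard deviation and the handling of flat spectra are assumptions made for this example.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) independently."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # flat spectra would otherwise divide by zero
    return (spectra - mean) / std


# Two spectra with the same shape but a different offset and scale
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [12.0, 14.0, 16.0, 18.0]])
corrected = snv(raw)
print(np.allclose(corrected[0], corrected[1]))  # True: the offset and scale difference is removed
```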

Can I combine multiple preprocessings?

Yes, chain them in your pipeline:

pipeline = [
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),
    FirstDerivative(),
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),
]

How do I compare different preprocessings?

Use feature_augmentation:

pipeline = [
    {"feature_augmentation": [SNV, Detrend, MSC], "action": "extend"},
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),
]

# Result will contain predictions for each preprocessing

Models

What models can I use?

Any scikit-learn compatible model:

Regression: PLSRegression, RandomForestRegressor, SVR, etc.
Classification: LogisticRegression, RandomForestClassifier, SVC, etc.
Deep Learning: nicon, decon (with TensorFlow/PyTorch/JAX)

How do I know if my task is regression or classification?

nirs4all auto-detects based on your target variable:

Continuous values → Regression
Discrete categories → Classification

Override with:

dataset = DatasetConfigs("data.csv", task="classification")
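
The continuous-versus-discrete distinction can be explored with scikit-learn's type_of_target helper. This is only an analogy: nirs4all's own detection logic may differ. The last case shows why an explicit override is sometimes needed, since targets recorded as whole numbers can look like class labels.

```python
from sklearn.utils.multiclass import type_of_target

print(type_of_target([0.52, 1.31, 2.07, 0.98]))  # continuous
print(type_of_target([0, 1, 1, 0]))              # binary
print(type_of_target(["low", "mid", "high"]))    # multiclass
print(type_of_target([1.0, 2.0, 3.0]))           # multiclass (integer-valued floats)
```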

How do I tune hyperparameters?

Use finetune_params:

{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 20,
        "sample": "tpe",
        "model_params": {
            "n_components": ('int', 1, 20),
        }
    }
}

See Hyperparameter Tuning for details.

Error: “Model does not support classification”

Some models are regression-only. For classification, use:

nicon_classification instead of nicon
RandomForestClassifier instead of RandomForestRegressor

Deep Learning

How do I use neural networks?

from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler

from nirs4all.operators.models.tensorflow.nicon import nicon

pipeline = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3, random_state=42),
    {
        'model': nicon,
        'train_params': {'epochs': 50, 'verbose': 1}
    }
]

Error: “TensorFlow not installed”

Install TensorFlow:

pip install tensorflow

Error: “CUDA out of memory”

Reduce batch size or use a smaller model:

{
    'model': thin_nicon,  # Smaller architecture
    'train_params': {'batch_size': 8}  # Smaller batches
}

Neural network training is slow

Enable GPU: Install tensorflow[and-cuda] or torch with CUDA
Reduce epochs for quick tests
Use hyperband for efficient hyperparameter search

Results and Visualization

How do I access prediction results?

result = nirs4all.run(pipeline, dataset)

# Best score
print(result.best_score)

# All predictions
for pred in result.predictions:
    print(pred.get('rmse'))

# Top 5 configurations
for pred in result.top(5):
    print(pred)

How do I visualize results?

from nirs4all.visualization.predictions import PredictionAnalyzer

analyzer = PredictionAnalyzer(result.predictions)
analyzer.plot_scatter()
analyzer.plot_top_k(k=10)
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")

How do I export my model for production?

from nirs4all.pipeline.bundle import BundleManager

manager = BundleManager()
manager.export(
    predictions=result.predictions,
    export_path="my_model.n4a"
)

Performance

Pipeline is slow. How do I speed it up?

Reduce cross-validation folds: n_splits=3 instead of n_splits=10
Use fewer trials: Lower n_trials in finetune_params
Enable parallelization: n_jobs=-1 for sklearn models that accept it (e.g., RandomForestRegressor)
Use GPU: For neural networks
Reduce preprocessing combinations: Fewer items in feature_augmentation

How much memory does nirs4all use?

Memory scales with:

Dataset size (samples × features)
Number of preprocessing variants
Model complexity
Cross-validation folds

For large datasets, process in batches or reduce n_splits.
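
A back-of-the-envelope estimate of the first two factors can be sketched as follows. The function is hypothetical and assumes dense float64 arrays with one full copy of the feature matrix per preprocessing variant; actual usage depends on nirs4all internals and on the model.

```python
def estimate_feature_memory_mib(n_samples, n_features, n_variants=1, bytes_per_value=8):
    """Rough size in MiB of the feature matrices alone, assuming dense float64 arrays."""
    total_bytes = n_samples * n_features * n_variants * bytes_per_value
    return total_bytes / (1024 ** 2)


# 10,000 spectra x 2,000 wavelengths, kept in 5 preprocessing variants
print(f"{estimate_feature_memory_mib(10_000, 2_000, n_variants=5):.0f} MiB")  # 763 MiB
```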

Can I run pipelines in parallel?

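Many scikit-learn estimators (for example RandomForestRegressor) accept n_jobs=-1 for internal parallelization; others, such as PLSRegression, have no n_jobs parameter. Pipeline-level parallelism is planned for future releases.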

Troubleshooting

Error: “No module named ‘nirs4all’”

Install nirs4all:

pip install nirs4all

Error: “AttributeError: module ‘nirs4all’ has no attribute…”

You may have an outdated version. Update:

pip install --upgrade nirs4all

Plots don’t display

In scripts: Add plt.show() at the end
In Jupyter: Use %matplotlib inline
Set plots_visible=True in nirs4all.run()

Results are NaN or infinite

Check your data for:

Missing values
Infinite values
Division by zero in preprocessing
Incompatible target scale

import numpy as np

# X is your feature matrix as a NumPy array (samples × features)
print(np.isnan(X).any())  # True if any value is NaN
print(np.isinf(X).any())  # True if any value is infinite
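
Once a check like this reports a problem, the next step is usually to find the offending rows. The sketch below is one way to do that; diagnose is a hypothetical helper. It also flags constant spectra, which cause division by zero in row-wise scaling such as SNV.

```python
import numpy as np


def diagnose(X):
    """Return (rows with NaN or inf values, rows whose values are all identical)."""
    X = np.asarray(X, dtype=float)
    non_finite_rows = np.flatnonzero(~np.isfinite(X).all(axis=1))
    # A constant spectrum has zero standard deviation.
    constant_rows = np.flatnonzero(X.max(axis=1) == X.min(axis=1))
    return non_finite_rows.tolist(), constant_rows.tolist()


X = np.array([
    [0.1, 0.2, 0.3],
    [0.4, np.nan, 0.6],  # missing value
    [1.0, 1.0, 1.0],     # constant spectrum
    [0.7, np.inf, 0.9],  # infinite value
])
print(diagnose(X))  # ([1, 3], [2])
```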

Memory error

Reduce memory usage:

# Smaller cross-validation
ShuffleSplit(n_splits=3, test_size=0.2)  # Instead of 10 folds

# Process fewer variants
{"feature_augmentation": [SNV, Detrend]}  # Instead of 10 preprocessings

Best Practices

Preprocessing

Always scale for neural networks: Use MinMaxScaler or StandardScaler
SNV before derivatives: Apply scatter correction first
Don’t over-preprocess: More isn’t always better
Match preprocessing to model: See Preprocessing Cheatsheet

Cross-Validation

Use enough folds: Minimum 3, recommended 5-10
Set random_state: For reproducibility
Use stratification for classification: StratifiedKFold
Consider group structure: Use GroupKFold for grouped samples
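
The point about group structure can be checked directly with scikit-learn. In this sketch the layout (four physical samples, each scanned three times) is invented for illustration; GroupKFold keeps all scans of a sample on the same side of each split, so no group leaks from train to test.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(24, dtype=float).reshape(12, 2)
y = np.arange(12, dtype=float)
# Hypothetical layout: 4 physical samples, each scanned 3 times
groups = np.repeat(["s1", "s2", "s3", "s4"], 3)

leaks = []  # number of groups shared between train and test, per fold
for fold, (train_idx, test_idx) in enumerate(GroupKFold(n_splits=4).split(X, y, groups)):
    shared = set(groups[train_idx]) & set(groups[test_idx])
    leaks.append(len(shared))
    print(f"fold {fold}: test groups {sorted(set(groups[test_idx]))}, shared with train: {len(shared)}")
```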

Model Selection

Start with PLS: Reliable baseline for NIRS
Compare multiple models: Use branching
Don’t overtune: More trials ≠ better results
Validate on held-out data: Don’t trust only CV scores

Reproducibility

Set random seeds: random_state=42 everywhere
Save artifacts: save_artifacts=True
Version your data: Track dataset versions
Export configurations: Save pipeline YAML

Getting Help

Where can I find more examples?

examples/user/ - User tutorials
examples/developer/ - Advanced examples
examples/reference/ - Reference implementations

How do I report a bug?

Open an issue on GitHub with:

nirs4all version (nirs4all --version)
Python version
Error message and traceback
Minimal reproducible example
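
The version details can also be collected from Python, which is convenient for pasting into an issue. This is a standard-library sketch; environment_report and package_version are hypothetical helpers, and the package list is only a suggestion.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def environment_report(packages=("nirs4all", "numpy", "scikit-learn")):
    """Build a short plain-text summary of the Python and package versions."""
    lines = [f"Python: {platform.python_version()}"]
    for name in packages:
        lines.append(f"{name}: {package_version(name) or 'not installed'}")
    return "\n".join(lines)


print(environment_report())
```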

Where can I ask questions?

GitHub Discussions
GitHub Issues (for bugs)
