Frequently Asked Questions
Common questions, errors, and solutions for nirs4all.
Installation
How do I install nirs4all?
pip install nirs4all
For GPU support with TensorFlow:
pip install nirs4all "tensorflow[and-cuda]"
How do I verify my installation?
nirs4all --test-install
This checks all dependencies and reports available frameworks.
Which Python versions are supported?
Python 3.11+ is required.
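One way to confirm that an interpreter meets this requirement before installing is to compare `sys.version_info` against `(3, 11)`. The sketch below is illustrative only; `python_is_supported` is a hypothetical helper name and is not part of nirs4all.

```python
import sys

MINIMUM_PYTHON = (3, 11)  # minimum version stated in this FAQ


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "too old for nirs4all"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```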
Do I need TensorFlow, PyTorch, or JAX?
No. nirs4all works with scikit-learn only. Deep learning frameworks are optional and only needed if you want to use neural network models.
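To see which of these optional frameworks are present in the current environment without importing them, the standard library's `importlib.util.find_spec` can be used. This is a minimal sketch that is independent of `nirs4all --test-install`; the helper name `available_frameworks` is an assumption made for illustration.

```python
import importlib.util

# Import names of scikit-learn and the optional backends mentioned above.
FRAMEWORK_MODULES = {
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "JAX": "jax",
}


def available_frameworks():
    """Map each framework name to True if its module can be located."""
    found = {}
    for name, module in FRAMEWORK_MODULES.items():
        try:
            # find_spec locates the module without importing it.
            found[name] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            found[name] = False
    return found


if __name__ == "__main__":
    for name, present in available_frameworks().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```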
Data Loading
What file formats are supported?
CSV (.csv)
Excel (.xlsx, .xls)
MATLAB (.mat)
NumPy (.npy, .npz)
Parquet (.parquet)
See Loading Data for details.
How do I specify which column is the target variable?
from nirs4all.data import DatasetConfigs
dataset = DatasetConfigs(
    "data.csv",
    y_column="concentration",  # Target column name
)
How do I handle multiple data sources?
dataset = DatasetConfigs([
    {"path": "nir.csv", "source_name": "NIR"},
    {"path": "raman.csv", "source_name": "Raman"},
])
Error: “Could not infer target column”
Your dataset doesn’t have a clear target column. Specify it explicitly:
dataset = DatasetConfigs("data.csv", y_column="my_target")
Error: “Sample count mismatch between X and y”
Your feature matrix and target array have different numbers of samples. Check your data file for:
Missing values
Misaligned rows
Header issues
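The three checks above can be roughly automated for a CSV file using only the standard library. The following is a sketch under simple assumptions (comma-separated values, one record per line, first row is the header); `find_row_problems` is a hypothetical helper and not a nirs4all function.

```python
import csv
import io


def _is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_row_problems(csv_text):
    """Return (line_number, reason) pairs for rows that look malformed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [(0, "file is empty")]
    header, data = rows[0], rows[1:]
    problems = []
    # A fully numeric first row usually means the header line is missing.
    if header and all(_is_number(cell) for cell in header):
        problems.append((1, "first row is numeric, header may be missing"))
    for line_number, row in enumerate(data, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append((line_number, "column count differs from header"))
        elif any(cell.strip() == "" for cell in row):
            problems.append((line_number, "empty cell"))
    return problems


sample = "w1,w2,concentration\n0.1,0.2,5.0\n0.3,,4.1\n0.5,0.6\n"
print(find_row_problems(sample))
```

Rows reported here are good candidates for the mismatch: an empty cell in the target column or a truncated row changes how many usable samples end up in X versus y.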
Pipeline Execution
How do I run a basic pipeline?
import nirs4all
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
pipeline = [
    ShuffleSplit(n_splits=5, test_size=0.2, random_state=42),
    PLSRegression(n_components=10),
]
result = nirs4all.run(
    pipeline=pipeline,
    dataset="path/to/data.csv"
)
Why do I need cross-validation in my pipeline?
Cross-validation (e.g., ShuffleSplit, KFold) is required to:
Split data into train/test sets
Evaluate model generalization
Generate out-of-fold predictions
How do I save my results?
Results are saved automatically when you run the pipeline through PipelineRunner with save_artifacts=True:
from nirs4all.pipeline import PipelineRunner
runner = PipelineRunner(
    save_artifacts=True,
    workspace_path="workspace/"
)
predictions, _ = runner.run(pipeline, dataset)
Error: “No splitter found in pipeline”
Add a cross-validation splitter before your model:
pipeline = [
    ShuffleSplit(n_splits=5, random_state=42),  # Add this
    PLSRegression(n_components=10),
]
Error: “Pipeline must contain at least one model step”
Add a model to your pipeline:
pipeline = [
    SNV(),
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),  # Model step
]
Preprocessing
Which preprocessing should I use?
| Data Issue | Recommended Preprocessing |
|---|---|
| Baseline drift | |
| Scatter effects | |
| Noise | |
| Scale differences | |
| Derivatives | |
See Preprocessing Cheatsheet for model-specific recommendations.
Can I combine multiple preprocessings?
Yes, chain them in your pipeline:
pipeline = [
    SNV(),
    SavitzkyGolay(window_length=11, polyorder=2),
    FirstDerivative(),
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),
]
How do I compare different preprocessings?
Use feature_augmentation:
pipeline = [
    {"feature_augmentation": [SNV, Detrend, MSC], "action": "extend"},
    ShuffleSplit(n_splits=5, random_state=42),
    PLSRegression(n_components=10),
]
# Result will contain predictions for each preprocessing
Models
What models can I use?
Any scikit-learn compatible model:
Regression: PLSRegression, RandomForestRegressor, SVR, etc.
Classification: LogisticRegression, RandomForestClassifier, SVC, etc.
Deep Learning: nicon, decon (with TensorFlow/PyTorch/JAX)
How do I know if my task is regression or classification?
nirs4all auto-detects based on your target variable:
Continuous values → Regression
Discrete categories → Classification
Override with:
dataset = DatasetConfigs("data.csv", task="classification")
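The distinction between continuous and discrete targets can be illustrated with a small NumPy heuristic. This is not nirs4all's actual detection logic; the function `guess_task` and its `max_classes` threshold are assumptions made for this sketch. It also shows why the explicit override matters: integer-valued targets are ambiguous.

```python
import numpy as np


def guess_task(y, max_classes=20):
    """Heuristically label a target array as regression or classification."""
    y = np.asarray(y)
    # Strings, booleans, and other non-numeric labels are categories.
    if y.dtype.kind not in "iuf":
        return "classification"
    values = y[~np.isnan(y)] if y.dtype.kind == "f" else y
    # Any fractional value means the target is continuous.
    if not np.all(values == np.round(values)):
        return "regression"
    # Integer-valued targets: few distinct values suggests class labels.
    if np.unique(values).size <= max_classes:
        return "classification"
    return "regression"


print(guess_task([0.12, 3.4, 5.6]))       # fractional values
print(guess_task([0, 1, 1, 0]))           # two integer labels
print(guess_task(["low", "high", "low"])) # string labels
```

A target such as an integer count with few distinct values would be labelled as classification by this heuristic even when regression is intended, which is the situation the `task` argument is for.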
How do I tune hyperparameters?
Use finetune_params:
{
    "model": PLSRegression(),
    "finetune_params": {
        "n_trials": 20,
        "sample": "tpe",
        "model_params": {
            "n_components": ('int', 1, 20),
        }
    }
}
See Hyperparameter Tuning for details.
Error: “Model does not support classification”
Some models are regression-only. For classification, use:
nicon_classification instead of nicon
RandomForestClassifier instead of RandomForestRegressor
Deep Learning
How do I use neural networks?
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import MinMaxScaler
from nirs4all.operators.models.tensorflow.nicon import nicon
pipeline = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3, random_state=42),
    {
        'model': nicon,
        'train_params': {'epochs': 50, 'verbose': 1}
    }
]
Error: “TensorFlow not installed”
Install TensorFlow:
pip install tensorflow
Error: “CUDA out of memory”
Reduce batch size or use a smaller model:
{
    'model': thin_nicon,  # Smaller architecture
    'train_params': {'batch_size': 8}  # Smaller batches
}
Neural network training is slow
Enable GPU: Install tensorflow[and-cuda] or torch with CUDA
Reduce epochs for quick tests
Use hyperband for efficient hyperparameter search
Results and Visualization
How do I access prediction results?
result = nirs4all.run(pipeline, dataset)
# Best score
print(result.best_score)
# All predictions
for pred in result.predictions:
    print(pred.get('rmse'))
# Top 5 configurations
for pred in result.top(5):
    print(pred)
How do I visualize results?
from nirs4all.visualization.predictions import PredictionAnalyzer
analyzer = PredictionAnalyzer(result.predictions)
analyzer.plot_scatter()
analyzer.plot_top_k(k=10)
analyzer.plot_heatmap(x_var="model_name", y_var="preprocessings")
How do I export my model for production?
from nirs4all.pipeline.bundle import BundleManager
manager = BundleManager()
manager.export(
    predictions=result.predictions,
    export_path="my_model.n4a"
)
Performance
Pipeline is slow. How do I speed it up?
Reduce cross-validation folds: n_splits=3 instead of n_splits=10
Use fewer trials: Lower n_trials in finetune_params
Enable parallelization: n_jobs=-1 for sklearn models
Use GPU: For neural networks
Reduce preprocessing combinations: Fewer items in feature_augmentation
How much memory does nirs4all use?
Memory scales with:
Dataset size (samples × features)
Number of preprocessing variants
Model complexity
Cross-validation folds
For large datasets, process in batches or reduce n_splits.
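As a rough back-of-the-envelope illustration of the first two factors, the size of the feature arrays alone can be estimated as samples × features × variants × bytes per value. The helper below is a hypothetical sketch; it ignores model parameters, per-fold copies, and any internal overhead, so actual usage will be higher.

```python
def estimate_feature_memory_mb(n_samples, n_features, n_variants=1, bytes_per_value=8):
    """Rough size in MB of the feature arrays alone (float64 by default)."""
    total_bytes = n_samples * n_features * n_variants * bytes_per_value
    return total_bytes / 1_000_000


# 1,000 spectra x 2,000 wavelengths, stored once and with 3 preprocessing variants.
print(estimate_feature_memory_mb(1000, 2000))                # 16.0
print(estimate_feature_memory_mb(1000, 2000, n_variants=3))  # 48.0
```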
Can I run pipelines in parallel?
Sklearn models support n_jobs=-1 for internal parallelization. Pipeline-level parallelism is planned for future releases.
Troubleshooting
Error: “No module named ‘nirs4all’”
Install nirs4all:
pip install nirs4all
Error: “AttributeError: module ‘nirs4all’ has no attribute…”
You may have an outdated version. Update:
pip install --upgrade nirs4all
Plots don’t display
In scripts: Add plt.show() at the end
In Jupyter: Use %matplotlib inline
Set plots_visible=True in nirs4all.run()
Results are NaN or infinite
Check your data for:
Missing values
Infinite values
Division by zero in preprocessing
Incompatible target scale
import numpy as np
# X is your feature matrix as a NumPy array
print(np.isnan(X).any())  # NaN check
print(np.isinf(X).any())  # Inf check
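A slightly more detailed version of these checks reports which rows are affected, and also flags constant spectra, since a per-spectrum normalization that divides by the standard deviation (as SNV does) divides by zero on them. This is a sketch on a toy matrix; `summarize_data_problems` is a hypothetical helper name.

```python
import numpy as np


def summarize_data_problems(X):
    """List row indices with NaN, infinite, or constant values in a 2D matrix."""
    X = np.asarray(X, dtype=float)
    nan_rows = np.isnan(X).any(axis=1)
    inf_rows = np.isinf(X).any(axis=1)
    # A constant spectrum has zero spread, so dividing by its standard
    # deviation yields NaN or inf. Only fully finite rows are tested.
    finite_rows = ~(nan_rows | inf_rows)
    constant_rows = np.zeros(X.shape[0], dtype=bool)
    finite = X[finite_rows]
    constant_rows[finite_rows] = finite.max(axis=1) == finite.min(axis=1)
    return {
        "rows_with_nan": np.flatnonzero(nan_rows).tolist(),
        "rows_with_inf": np.flatnonzero(inf_rows).tolist(),
        "constant_rows": np.flatnonzero(constant_rows).tolist(),
    }


X_demo = np.array([
    [0.1, 0.2, 0.3],
    [0.4, np.nan, 0.6],
    [0.7, np.inf, 0.9],
    [0.5, 0.5, 0.5],
])
print(summarize_data_problems(X_demo))
```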
Memory error
Reduce memory usage:
# Smaller cross-validation
ShuffleSplit(n_splits=3, test_size=0.2) # Instead of 10 folds
# Process fewer variants
{"feature_augmentation": [SNV, Detrend]} # Instead of 10 preprocessings
Best Practices
Preprocessing
Always scale for neural networks: Use MinMaxScaler or StandardScaler
SNV before derivatives: Apply scatter correction first
Don’t over-preprocess: More isn’t always better
Match preprocessing to model: See Preprocessing Cheatsheet
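To illustrate what scatter correction followed by a derivative does, here is a NumPy-only sketch. The functions `snv` and `first_derivative` are simplified stand-ins written for this example, not the nirs4all operators of similar name, and the spectra are synthetic. A derivative alone removes an additive offset but keeps a multiplicative scaling; SNV followed by a derivative removes both.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra):
    """Finite-difference derivative along the wavelength axis."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)


# One synthetic band observed twice with different scaling and offset.
wavelengths = np.linspace(0.0, 1.0, 50)
base = np.exp(-((wavelengths - 0.5) ** 2) / 0.02)
spectra = np.vstack([1.0 * base + 0.1, 2.5 * base + 0.4])

raw_derivative = first_derivative(spectra)
corrected = snv(spectra)
derivative = first_derivative(corrected)

print(np.allclose(raw_derivative[0], raw_derivative[1]))  # scaling still present
print(np.allclose(derivative[0], derivative[1]))          # scaling and offset removed
```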
Cross-Validation
Use enough folds: Minimum 3, recommended 5-10
Set random_state: For reproducibility
Use stratification for classification: StratifiedKFold
Consider group structure: Use GroupKFold for grouped samples
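The effect of the last two recommendations can be seen directly with scikit-learn's splitters on a toy dataset. This sketch runs outside any nirs4all pipeline; the labels and group ids are invented for illustration (for example, groups could be replicate scans of one physical sample).

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])       # imbalanced classes
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])  # group id per sample

# StratifiedKFold keeps the class ratio the same in every test fold.
stratified_positive_counts = []
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for _, test_idx in splitter.split(X, y):
    stratified_positive_counts.append(int(y[test_idx].sum()))

# GroupKFold never places samples of one group on both sides of a split.
shared_groups = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups):
    overlap = set(groups[train_idx].tolist()) & set(groups[test_idx].tolist())
    shared_groups.append(overlap)

print(stratified_positive_counts)  # positives per test fold
print(shared_groups)               # groups shared between train and test
```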
Model Selection
Start with PLS: Reliable baseline for NIRS
Compare multiple models: Use branching
Don’t overtune: More trials ≠ better results
Validate on held-out data: Don’t trust only CV scores
Reproducibility
Set random seeds: random_state=42 everywhere
Save artifacts: save_artifacts=True
Version your data: Track dataset versions
Export configurations: Save pipeline YAML
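The effect of fixing `random_state` can be checked with scikit-learn's `ShuffleSplit` alone: two splitters built with the same seed yield identical splits. The helper name `split_test_indices` is an assumption made for this sketch.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.zeros((20, 3))  # placeholder data; only the number of rows matters


def split_test_indices(random_state):
    """Return the test indices of each split as plain lists."""
    splitter = ShuffleSplit(n_splits=3, test_size=0.2, random_state=random_state)
    return [test.tolist() for _, test in splitter.split(X)]


# Same seed, same splits, even across separate splitter instances.
print(split_test_indices(42) == split_test_indices(42))
```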
Getting Help
Where can I find more examples?
examples/user/ - User tutorials
examples/developer/ - Advanced examples
examples/reference/ - Reference implementations
How do I report a bug?
Open an issue on GitHub with:
nirs4all version (nirs4all --version)
Python version
Error message and traceback
Minimal reproducible example
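The version details for a report can also be gathered from Python with the standard library. This is a minimal sketch; `collect_environment` is a hypothetical helper, and the default package list is only a suggestion.

```python
import platform
from importlib import metadata


def collect_environment(packages=("nirs4all", "numpy", "scikit-learn")):
    """Gather interpreter and package versions for a bug report."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```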
Where can I ask questions?
GitHub Discussions
GitHub Issues (for bugs)
See Also
Installation - Installation guide
Quickstart - Quick start tutorial
Migration Guide - Migration guides
Dataset Configuration Troubleshooting Guide - Dataset troubleshooting