# Quickstart

Get up and running with NIRS4ALL in 5 minutes. This guide walks you through your first complete pipeline.

## Prerequisites

- NIRS4ALL installed (see {doc}`installation`)
- Python 3.9+

## Your First Pipeline

### Step 1: Import Libraries

```python
import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit
```

### Step 2: Define a Pipeline

A pipeline is a list of processing steps:

```python
pipeline = [
    MinMaxScaler(),                              # Scale features to [0, 1]
    {"y_processing": MinMaxScaler()},            # Scale targets
    ShuffleSplit(n_splits=3, test_size=0.25),    # 3 random train/validation splits
    {"model": PLSRegression(n_components=10)}    # PLS model
]
```
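
`ShuffleSplit` draws each split independently instead of partitioning the data into disjoint folds, so the validation sets of different splits may overlap. The plain-Python sketch below illustrates the idea only; `shuffle_split` is a hypothetical helper written for this guide, not scikit-learn's implementation.

```python
import math
import random


def shuffle_split(n_samples, n_splits=3, test_size=0.25, seed=0):
    """Yield (train_idx, val_idx) pairs from independent random shuffles."""
    rng = random.Random(seed)
    n_val = math.ceil(test_size * n_samples)
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh permutation for every split
        yield sorted(indices[n_val:]), sorted(indices[:n_val])


splits = list(shuffle_split(n_samples=20))
for i, (train_idx, val_idx) in enumerate(splits):
    print(f"split {i}: {len(train_idx)} train / {len(val_idx)} validation -> {val_idx}")
```

With 20 samples and `test_size=0.25`, every split holds out 5 samples and trains on the remaining 15.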

### Step 3: Run the Pipeline

Use `nirs4all.run()` to train with one function call:

```python
result = nirs4all.run(
    pipeline=pipeline,
    dataset="sample_data/regression",   # Path to your data
    name="MyFirstPipeline",
    verbose=1
)
```

### Step 4: View Results

```python
# Check overall performance
print(f"Best RMSE: {result.best_rmse:.4f}")
print(f"Best R²: {result.best_r2:.4f}")
print(f"Number of predictions: {result.num_predictions}")

# Get top 3 models
for pred in result.top(n=3, display_metrics=['rmse', 'r2']):
    print(f"  {pred['model_name']}: RMSE={pred['rmse']:.4f}, R²={pred['r2']:.4f}")
```
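
If you want to sanity-check a reported score by hand, the sketch below computes RMSE and R² from their usual textbook definitions. It assumes NIRS4ALL follows these standard definitions; the `rmse` and `r2` helpers and the toy values are written for this guide and are not part of the library.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]   # toy reference values
y_pred = [1.1, 1.9, 3.2, 3.8]   # toy predictions
print(f"RMSE={rmse(y_true, y_pred):.4f}, R²={r2(y_true, y_pred):.4f}")
```

Lower RMSE is better, while R² approaches 1 as predictions improve.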

### Step 4b: Understand Prediction Entries

Each prediction returned by `top()`, like `result.best`, is a dictionary with detailed information:

```python
# Get the best prediction
best = result.best

# Core identification
print(f"Model: {best['model_name']}")
print(f"Dataset: {best['dataset_name']}")
print(f"Fold: {best['fold_id']}")
print(f"Preprocessing: {best.get('preprocessings', 'none')}")

# Scores by partition (primary metric, always available)
print(f"Primary metric: {best['metric']}")
print(f"Train: {best['train_score']:.6f}")
print(f"Val: {best['val_score']:.6f}")
print(f"Test: {best['test_score']:.6f}")

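# Additional metrics (present only when requested via display_metrics)
# Default to NaN so a missing metric is not mistaken for a perfect score of 0
print(f"RMSE: {best.get('rmse', float('nan')):.4f}")
print(f"R²: {best.get('r2', float('nan')):.4f}")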
```

**Key fields in each prediction entry:**

| Field | Description |
|-------|-------------|
| `model_name` | Name of the model (e.g., "PLSRegression") |
| `model_classname` | Class name of the model |
| `dataset_name` | Dataset name |
| `fold_id` | Cross-validation fold index |
| `preprocessings` | Preprocessing steps applied |
| `metric` | Primary metric name (e.g., 'mse') |
| `train_score`, `val_score`, `test_score` | Scores by partition (primary metric) |
| `rmse`, `r2`, `mse`, `mae` | Metrics (when using `display_metrics`) |
| `n_samples`, `n_features` | Data shape info |
| `task_type` | 'regression' or 'classification' |
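
Because entries are plain dictionaries, ordinary Python is enough to rank or aggregate them. The sketch below uses hypothetical entries with made-up scores that carry only a few of the fields from the table; real entries come from `result.top()`.

```python
from collections import defaultdict

# Hypothetical entries with made-up scores, shaped like the fields listed above.
entries = [
    {"model_name": "PLSRegression", "fold_id": 0, "metric": "mse", "val_score": 0.021, "test_score": 0.025},
    {"model_name": "PLSRegression", "fold_id": 1, "metric": "mse", "val_score": 0.018, "test_score": 0.027},
    {"model_name": "Ridge", "fold_id": 0, "metric": "mse", "val_score": 0.030, "test_score": 0.029},
    {"model_name": "Ridge", "fold_id": 1, "metric": "mse", "val_score": 0.034, "test_score": 0.031},
]

# Rank by validation score (lower is better for an error metric such as MSE).
ranked = sorted(entries, key=lambda e: e["val_score"])
best_entry = ranked[0]

# Average the test score per model across folds.
by_model = defaultdict(list)
for entry in entries:
    by_model[entry["model_name"]].append(entry["test_score"])
mean_test = {name: sum(scores) / len(scores) for name, scores in by_model.items()}

print(best_entry["model_name"], best_entry["fold_id"])
print(mean_test)
```

Selecting on `val_score` and reporting `test_score` keeps the test partition out of the model-selection step.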

### Step 5: Export for Production

```python
# Export the best model for later use
result.export("exports/my_model.n4a")
```
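
This guide does not state whether `result.export()` creates missing parent folders. As a harmless precaution, you can create the target folder first with the standard library; the path below simply reuses the one from the example above.

```python
from pathlib import Path

export_path = Path("exports") / "my_model.n4a"
# Create the parent folder; no error is raised if it already exists.
export_path.parent.mkdir(parents=True, exist_ok=True)
print(export_path.parent.is_dir())
```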

## Complete Example

Here's the complete code you can copy and run:

```python
"""My first NIRS4ALL pipeline."""

import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit

# Generate synthetic NIRS data (or use your own dataset path)
dataset = nirs4all.generate.regression(
    n_samples=200,
    target_component=0,
    random_state=42
)

# Define pipeline
pipeline = [
    MinMaxScaler(),                              # Scale features
    {"y_processing": MinMaxScaler()},            # Scale targets
    ShuffleSplit(n_splits=3, test_size=0.25),    # Cross-validation
    {"model": PLSRegression(n_components=10)}    # Model
]

# Run pipeline
result = nirs4all.run(
    pipeline=pipeline,
    dataset=dataset,
    name="MyFirstPipeline",
    verbose=1
)

# View results
print("\n📊 Results:")
print(f"   Best RMSE: {result.best_rmse:.4f}")
print(f"   Best R²: {result.best_r2:.4f}")
print(f"   Total predictions: {result.num_predictions}")

# Top models with detailed metrics
print("\n🏆 Top 3 Models:")
for i, pred in enumerate(result.top(n=3, display_metrics=['rmse', 'r2']), 1):
    print(f"   {i}. {pred['model_name']}: RMSE={pred['rmse']:.4f}, R²={pred['r2']:.4f}")

# Explore the best prediction entry
print("\n📦 Best prediction details:")
best = result.best
print(f"   Model: {best['model_name']}")
print(f"   Dataset: {best['dataset_name']}")
print(f"   Fold: {best['fold_id']}")
print(f"   Metric: {best['metric']}")

# Access partition-specific scores (primary metric)
print(f"   Train: {best['train_score']:.6f}")
print(f"   Val: {best['val_score']:.6f}")
print(f"   Test: {best['test_score']:.6f}")

# Export best model
result.export("exports/my_model.n4a")
print("\n✅ Model exported to exports/my_model.n4a")
```

## Add NIRS-Specific Preprocessing

NIRS data benefits from specialized preprocessing. Try this enhanced pipeline:

```python
from nirs4all.operators.transforms import (
    StandardNormalVariate,
    FirstDerivative
)

pipeline = [
    MinMaxScaler(),                              # Feature scaling
    StandardNormalVariate(),                     # SNV: scatter correction
    FirstDerivative(),                           # Enhance spectral features
    {"y_processing": MinMaxScaler()},            # Target scaling
    ShuffleSplit(n_splits=3),                    # Cross-validation
    {"model": PLSRegression(n_components=10)}    # Model
]

result = nirs4all.run(
    pipeline=pipeline,
    dataset="sample_data/regression",
    name="NIRSPipeline",
    verbose=1
)
```
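
To see what these two transforms do to a single spectrum, here is a plain-Python sketch. It is an illustration under simplifying assumptions, not the NIRS4ALL implementation: `snv` uses the population standard deviation and `first_difference` is a simple adjacent-point difference, whereas the library's `StandardNormalVariate` and `FirstDerivative` may use other conventions.

```python
import statistics


def snv(spectrum):
    """Center one spectrum on its own mean and scale by its own standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.pstdev(spectrum)  # population std; some implementations use ddof=1
    return [(value - mean) / std for value in spectrum]


def first_difference(spectrum):
    """Approximate the first derivative by differences between adjacent points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


spectrum = [0.42, 0.45, 0.51, 0.60, 0.58, 0.50]  # toy absorbance values
corrected = snv(spectrum)
derivative = first_difference(corrected)
print([round(v, 3) for v in corrected])
print([round(v, 3) for v in derivative])
```

Because SNV works per spectrum, applying a positive gain or a constant offset to a spectrum leaves the corrected result unchanged, which is why it is used for scatter correction.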

## Using Your Own Data

Replace the sample data with your own:

```python
# From a CSV file
result = nirs4all.run(pipeline, dataset="path/to/your/data.csv")

# From a folder
result = nirs4all.run(pipeline, dataset="path/to/data_folder/")

# With explicit configuration
from nirs4all.data import DatasetConfigs

dataset = DatasetConfigs({
    "train_x": "spectra.csv",
    "train_y": "targets.csv",
})
result = nirs4all.run(pipeline, dataset=dataset)
```

## No Data? Generate Synthetic NIRS Spectra

Get started immediately with realistic synthetic data:

```python
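import nirs4all
from sklearn.preprocessing import MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import ShuffleSplit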

# Generate synthetic NIRS data with known ground truth
dataset = nirs4all.generate.regression(
    n_samples=500,
    components=["water", "protein", "lipid"],
    complexity="realistic",
    random_state=42
)

# Use directly in pipelines
result = nirs4all.run(
    pipeline=[
        MinMaxScaler(),
        ShuffleSplit(n_splits=3),
        {"model": PLSRegression(n_components=10)}
    ],
    dataset=dataset
)

print(f"RMSE: {result.best_rmse:.4f}")
```

Synthetic data is perfect for:
- Learning and experimentation
- Testing preprocessing pipelines
- Prototyping before real data arrives
- Reproducible unit tests

See {doc}`/user_guide/data/synthetic_data` for full documentation.
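
To build intuition for "known ground truth", the sketch below assembles one toy spectrum as a linear mixture of Gaussian bands plus noise. This is a conceptual illustration only, not how `nirs4all.generate` works internally; the helper functions, component names, and band positions are all hypothetical.

```python
import math
import random


def gaussian_band(wavelengths, center, width):
    """One absorption band modeled as a Gaussian peak."""
    return [math.exp(-0.5 * ((w - center) / width) ** 2) for w in wavelengths]


def make_spectrum(wavelengths, bands, concentrations, noise_sd, rng):
    """Linear mixture of component bands plus Gaussian noise."""
    spectrum = [0.0] * len(wavelengths)
    for name, concentration in concentrations.items():
        for i, value in enumerate(bands[name]):
            spectrum[i] += concentration * value
    return [value + rng.gauss(0.0, noise_sd) for value in spectrum]


rng = random.Random(42)
wavelengths = list(range(1000, 2500, 10))  # nm, illustrative grid
# Hypothetical band positions chosen for illustration only.
bands = {
    "component_a": gaussian_band(wavelengths, center=1450, width=40),
    "component_b": gaussian_band(wavelengths, center=2100, width=60),
}
concentrations = {"component_a": 0.7, "component_b": 0.3}  # the known ground truth
spectrum = make_spectrum(wavelengths, bands, concentrations, noise_sd=0.005, rng=rng)
print(len(spectrum), round(max(spectrum), 3))
```

Since the concentrations are chosen up front, you can check whether a model recovers them, which is not possible with real samples of unknown composition.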

## Compare Multiple Models

Run and compare different models in one pipeline:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

pipeline = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3),

    # Multiple models - each is evaluated
    {"model": PLSRegression(n_components=5)},
    {"model": PLSRegression(n_components=10)},
    {"model": PLSRegression(n_components=15)},
    {"model": Ridge(alpha=1.0)},
    {"model": RandomForestRegressor(n_estimators=100)},
]

result = nirs4all.run(
    pipeline=pipeline,
    dataset="sample_data/regression",
    name="MultiModel",
    verbose=1
)

# See which model performed best
for pred in result.top(n=5, display_metrics=['rmse', 'r2']):
    print(f"{pred['model_name']}: RMSE={pred['rmse']:.4f}")
```
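
When several steps differ in a single hyperparameter, you can build them in a loop rather than typing each one. The sketch below is one possible way to do it: `sweep` is a hypothetical helper, and `StandInPLS` is a stand-in class so the snippet runs without scikit-learn. In a real pipeline you would pass `PLSRegression` instead.

```python
from dataclasses import dataclass


@dataclass
class StandInPLS:
    """Stand-in so the snippet runs without scikit-learn; use PLSRegression in practice."""
    n_components: int = 2


def sweep(estimator_cls, param, values, **fixed):
    """Build one {"model": ...} step per value of a single hyperparameter."""
    return [{"model": estimator_cls(**{param: value}, **fixed)} for value in values]


model_steps = sweep(StandInPLS, "n_components", [5, 10, 15])
for step in model_steps:
    print(step)
```

The resulting list can be unpacked into a pipeline with `*model_steps` after the scaling and splitting steps.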

## Run Multiple Pipelines at Once

Pass a list of pipelines to execute them all independently:

```python
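from sklearn.preprocessing import StandardScaler

# Define different pipeline strategies
# (RandomForestRegressor, Ridge, and FirstDerivative are imported in earlier sections)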
pipeline_pls = [
    MinMaxScaler(),
    ShuffleSplit(n_splits=3),
    {"model": PLSRegression(n_components=10)}
]

pipeline_rf = [
    StandardScaler(),
    ShuffleSplit(n_splits=3),
    {"model": RandomForestRegressor(n_estimators=100)}
]

pipeline_ridge = [
    MinMaxScaler(),
    FirstDerivative(),
    ShuffleSplit(n_splits=3),
    {"model": Ridge(alpha=1.0)}
]

# Run all three pipelines with one call
result = nirs4all.run(
    pipeline=[pipeline_pls, pipeline_rf, pipeline_ridge],  # List of pipelines
    dataset="sample_data/regression",
    verbose=1
)

print(f"Total configurations tested: {result.num_predictions}")
print(f"Best RMSE: {result.best_rmse:.4f}")
```

## Run on Multiple Datasets

Test the same pipeline(s) across different datasets:

```python
# Cartesian product: each pipeline × each dataset
result = nirs4all.run(
    pipeline=[pipeline_pls, pipeline_rf],   # 2 pipelines
    dataset=["data/wheat", "data/corn"],    # 2 datasets
    verbose=1
)
# Runs 4 combinations: PLS×wheat, PLS×corn, RF×wheat, RF×corn

print(f"Tested {result.num_predictions} configurations")
```
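
The pairing of pipelines and datasets is an ordinary Cartesian product, which you can enumerate with `itertools.product` to preview what will run. The labels below are placeholders for the pipeline lists and dataset paths from the example above; the sketch does not call NIRS4ALL and makes no claim about its execution order.

```python
from itertools import product

pipeline_labels = ["PLS", "RF"]            # placeholders for pipeline_pls, pipeline_rf
datasets = ["data/wheat", "data/corn"]

combinations = list(product(pipeline_labels, datasets))
for label, dataset_path in combinations:
    print(f"{label} x {dataset_path}")
print(f"{len(combinations)} combinations")
```

The count grows multiplicatively, so three pipelines on four datasets would already mean twelve runs.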

## Visualize Results

Create publication-quality visualizations:

```python
from nirs4all.visualization.predictions import PredictionAnalyzer

analyzer = PredictionAnalyzer(result.predictions)

# Predicted vs actual plot for top models
fig1 = analyzer.plot_top_k(k=3, rank_metric='rmse')

# Compare models with candlestick chart
fig2 = analyzer.plot_candlestick(variable="model_name")

# Show all plots
import matplotlib.pyplot as plt
plt.show()
```

## What's Next?

::::{grid} 2
:gutter: 3

:::{grid-item-card} 📚 Core Concepts
:link: concepts
:link-type: doc

Understand pipelines, datasets, and execution flow.
:::

:::{grid-item-card} 📖 User Guide
:link: /user_guide/index
:link-type: doc

Learn preprocessing, stacking, and deployment.
:::

:::{grid-item-card} 📝 Examples
:link: /examples/index
:link-type: doc

50+ working examples organized by topic.
:::

:::{grid-item-card} 📋 Pipeline Syntax
:link: /reference/pipeline_syntax
:link-type: doc

Complete pipeline syntax reference.
:::

::::

## Key Takeaways

1. **Pipelines are lists** of processing steps
2. **One function** (`nirs4all.run()`) handles everything
3. **Results are accessible** via `result.best_rmse`, `result.best_r2`, `result.top()`, etc.
4. **Prediction entries are dicts** with model_name, dataset_name, fold_id, scores, and more
5. **Detailed scores** are available via `pred['scores']['train'/'val'/'test']['rmse'/'r2'/...]`
6. **Export models** with `result.export()` for deployment
7. **NIRS preprocessing** (SNV, derivatives) can improve spectral analysis by correcting scatter and enhancing spectral features

## See Also

- {doc}`concepts` - Understanding SpectroDataset and pipelines
- {doc}`/user_guide/preprocessing/overview` - NIRS preprocessing techniques
- {doc}`/reference/pipeline_syntax` - Complete syntax reference