# Logging System User Guide

This guide explains how to use the nirs4all logging system for structured, configurable output.

## Overview

The nirs4all logging system provides:

- **Human-readable console output** optimized for researchers
- **Machine-parseable file logging** for automation and analysis
- **Progress bars** with TTY-aware display
- **Context tracking** for runs, branches, and sources
- **ASCII-safe output** for HPC/cluster environments

## Quick Start

```python
from nirs4all.pipeline import PipelineRunner

# Basic usage: logging is configured automatically
# (`pipeline` and `dataset` are assumed to be defined beforehand)
runner = PipelineRunner(verbose=1)
predictions, _ = runner.run(pipeline, dataset)

# With logging options
runner = PipelineRunner(
    verbose=2,              # Detailed output
    log_file=True,          # Write to workspace/logs/
    log_format="pretty",    # Human-readable format
    use_unicode=True,       # Use Unicode symbols
    use_colors=True,        # ANSI colors
)
```

## Verbosity Levels

| Level | Name | Use Case |
|-------|------|----------|
| 0 | Quiet | Warnings and errors only. Best for production/notebooks |
| 1 | Standard | Key milestones and results. **Recommended for research** |
| 2 | Debug | Detailed operation, troubleshooting |
| 3 | Trace | Full trace with per-fold/per-step details |
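
A minimal sketch of how these levels could map onto Python's standard `logging` levels. The mapping itself, including a custom `TRACE` level (numeric value 5) for `verbose=3`, is an assumption for illustration and is not taken from the nirs4all source. Level 0 maps to `WARNING` to match the verbose=0 description below.

```python
import logging

# Hypothetical extra level for verbose=3; the stdlib has no TRACE level.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Assumed verbose -> logging level mapping (illustrative, not nirs4all's code).
VERBOSITY_TO_LEVEL = {
    0: logging.WARNING,  # Quiet: warnings and errors only
    1: logging.INFO,     # Standard: milestones and results
    2: logging.DEBUG,    # Debug: detailed operation
    3: TRACE,            # Trace: per-fold/per-step details
}


def level_for_verbosity(verbose: int) -> int:
    """Clamp `verbose` into 0-3 and return the matching logging level."""
    return VERBOSITY_TO_LEVEL[max(0, min(3, verbose))]


logger = logging.getLogger("verbosity_demo")
logger.setLevel(level_for_verbosity(1))
print(logger.isEnabledFor(logging.INFO))   # True
print(logger.isEnabledFor(logging.DEBUG))  # False
```

Clamping keeps out-of-range values usable instead of raising a `KeyError`.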

### What you see at each level

**verbose=0 (Quiet)**

Only warnings and errors; no progress information.

**verbose=1 (Standard)**
```
> Loading data...
  [OK] Loaded dataset: 3,482 samples x 2,150 features

> Evaluating pipelines...
  * Progress: 21/42 (50%) -- best RMSE: 0.389
  [OK] Evaluation complete

> Training best model...
  [OK] Model trained: CV_RMSE=0.381
```

**verbose=2 (Debug)**

Everything from verbose=1, plus:

- Configuration details (seeds, versions)
- Pipeline generation/pruning statistics
- Cache hits/misses
- Per-pipeline evaluation summaries
- Memory/GPU usage warnings

## Configuration Options

### PipelineRunner Parameters

```python
runner = PipelineRunner(
    # Verbosity
    verbose=1,              # 0-3, controls log level

    # File logging
    log_file=True,          # Write logs to files
    log_format="pretty",    # "pretty", "minimal", or "json"
    json_output=False,      # Also write JSON Lines file

    # Display settings
    use_unicode=True,       # Unicode symbols (False for ASCII)
    use_colors=True,        # ANSI colors (auto-detect TTY)
    show_progress_bar=True, # Show progress bars
)
```

### Environment Variables

Override settings via environment variables:

```bash
# Override log level
export NIRS4ALL_LOG_LEVEL=DEBUG

# Force ASCII-only output (for clusters)
export NIRS4ALL_ASCII_ONLY=1

# Disable colors
export NIRS4ALL_NO_COLOR=1
```
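
The precedence between these variables and the constructor arguments is not spelled out above. The sketch below assumes that environment variables win over explicit arguments, that any value other than an empty string or `"0"` turns a flag on, and that colors are also dropped when output is not a terminal (as the `use_colors` comment suggests). `resolve_display_settings` is a hypothetical helper, not a nirs4all function.

```python
import os
from typing import Mapping, Optional


def resolve_display_settings(
    use_unicode: bool = True,
    use_colors: bool = True,
    log_level: str = "INFO",
    env: Optional[Mapping[str, str]] = None,
    is_tty: bool = True,
) -> dict:
    """Apply NIRS4ALL_* environment overrides on top of explicit settings."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        # Assumed convention: unset, empty, or "0" means off.
        return env.get(name, "") not in ("", "0")

    if flag("NIRS4ALL_ASCII_ONLY"):
        use_unicode = False
    # Colors are dropped when disabled explicitly or when output is not a terminal.
    if flag("NIRS4ALL_NO_COLOR") or not is_tty:
        use_colors = False
    log_level = env.get("NIRS4ALL_LOG_LEVEL", log_level).upper()
    return {"use_unicode": use_unicode, "use_colors": use_colors, "log_level": log_level}


settings = resolve_display_settings(
    env={"NIRS4ALL_ASCII_ONLY": "1", "NIRS4ALL_LOG_LEVEL": "debug"}
)
print(settings)  # {'use_unicode': False, 'use_colors': True, 'log_level': 'DEBUG'}
```

Passing `env` explicitly keeps the function testable; omitting it reads the real process environment.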

## Progress Bars

The logging system includes TTY-aware progress bars that adapt to terminal capabilities: in non-interactive environments such as notebooks or CI, they fall back to line-based updates.

### Basic Usage

```python
from nirs4all.core.logging import ProgressBar

# Simple progress bar
with ProgressBar(total=100, description="Processing") as pbar:
    for i in range(100):
        # do work
        pbar.update(1)

# With iterator
for item in ProgressBar.wrap(items, description="Processing"):
    process(item)
```
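
To illustrate what "TTY-aware" means in practice, here is a minimal sketch (not the `ProgressBar` implementation) of a renderer that redraws a bar in place with a carriage return when stdout is a terminal, and prints one plain line per update otherwise. `render_progress` is a hypothetical name.

```python
import sys


def render_progress(done: int, total: int, description: str, is_tty: bool, width: int = 20) -> str:
    """Return one progress update: a redrawable bar for terminals, a plain line otherwise."""
    fraction = done / total if total else 1.0
    percent = int(fraction * 100)
    if is_tty:
        filled = int(fraction * width)
        bar = "#" * filled + "-" * (width - filled)
        # Leading carriage return lets the terminal overwrite the same line.
        return f"\r{description}: [{bar}] {done}/{total} ({percent}%)"
    # Non-TTY targets (CI logs, files) get one self-contained line per update.
    return f"{description}: {done}/{total} ({percent}%)\n"


is_tty = sys.stdout.isatty()
for step in (25, 50, 100):
    sys.stdout.write(render_progress(step, 100, "Processing", is_tty))
    sys.stdout.flush()
if is_tty:
    sys.stdout.write("\n")  # finish the redrawn line
```

Writing `\r` without a newline only makes sense on a terminal; in a CI log or file it would pile every update onto one long line, which is why the non-TTY branch emits complete lines.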

### ML-Specific Evaluation Progress

```python
from nirs4all.core.logging import EvaluationProgress

# Track pipeline evaluation with best score
with EvaluationProgress(
    total_pipelines=42,
    metric_name="RMSE",
    higher_is_better=False
) as progress:
    for pipeline in pipelines:
        score = evaluate(pipeline)
        is_new_best = progress.update(score=score, pipeline_name=pipeline.name)
        if is_new_best:
            print(f"New best: {score}")
```
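
The `is_new_best` return value depends on the direction of the metric. A minimal sketch of that comparison logic, assuming the first score always counts as a new best. `BestScoreTracker` is hypothetical, and its status line only mimics the `Progress: 21/42 (50%) -- best RMSE: 0.389` format shown in the verbose=1 output.

```python
from typing import Optional


class BestScoreTracker:
    """Hypothetical stand-in for the best-score bookkeeping of EvaluationProgress."""

    def __init__(self, total: int, metric_name: str, higher_is_better: bool = False):
        self.total = total
        self.metric_name = metric_name
        self.higher_is_better = higher_is_better
        self.completed = 0
        self.best_score: Optional[float] = None
        self.best_name: Optional[str] = None

    def update(self, score: float, pipeline_name: str = "") -> bool:
        """Record one finished pipeline; return True if it set a new best score."""
        self.completed += 1
        if self.best_score is None:
            improved = True
        elif self.higher_is_better:
            improved = score > self.best_score  # e.g. R2, accuracy
        else:
            improved = score < self.best_score  # e.g. RMSE, MAE
        if improved:
            self.best_score, self.best_name = score, pipeline_name
        return improved

    def status_line(self) -> str:
        percent = int(100 * self.completed / self.total) if self.total else 100
        best = "n/a" if self.best_score is None else f"{self.best_score:.3f}"
        return f"Progress: {self.completed}/{self.total} ({percent}%) -- best {self.metric_name}: {best}"


tracker = BestScoreTracker(total=3, metric_name="RMSE", higher_is_better=False)
flags = [tracker.update(s, n) for s, n in [(0.41, "snv"), (0.39, "msc"), (0.40, "raw")]]
print(flags)                  # [True, True, False]
print(tracker.status_line())  # Progress: 3/3 (100%) -- best RMSE: 0.390
```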

### Multi-Level Progress

For nested operations (datasets → pipelines → folds):

```python
from nirs4all.core.logging import MultiLevelProgress

progress = MultiLevelProgress(run_total=5, run_description="Datasets")

with progress.run_level() as run_pbar:
    for dataset in datasets:
        with progress.pipeline_level(total=10) as pipe_pbar:
            for pipeline in pipelines:
                with progress.fold_level(total=5) as fold_pbar:
                    for fold in folds:
                        # evaluate
                        fold_pbar.update(1)
                pipe_pbar.update(1)
        run_pbar.update(1)
```

### Spinner for Unknown Duration

```python
from nirs4all.core.logging import spinner

with spinner("Loading large dataset") as s:
    data = load_dataset()
    s.update("Parsing...")
    parsed = parse(data)
```

## File Logging

### Log File Location

When `log_file=True`, logs are written to:
```
{workspace}/logs/{run_id}.log      # Human-readable
{workspace}/logs/{run_id}.jsonl    # JSON Lines (if json_output=True)
```

### Log Rotation

Logs are automatically rotated based on:

- **Count**: Keep last N runs (default: 100)
- **Age**: Remove logs older than N days (default: 30)
- **Size**: Rotate when file exceeds N bytes (optional)

When `compress_logs=True`, old logs are compressed with gzip to save space.

```python
from nirs4all.core.logging import configure_logging

configure_logging(
    log_file=True,
    log_dir="./workspace/logs",
    max_log_runs=50,        # Keep last 50 runs
    max_log_age_days=14,    # Remove after 14 days
    max_log_bytes=10_000_000,  # Rotate at 10MB
    compress_logs=True,     # Gzip old logs
)
```
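
The count and age rules can be pictured as a simple selection over the files in the log directory. This is a sketch under stated assumptions, not the nirs4all rotation code: a file is pruned if it is older than `max_log_age_days` or falls outside the newest `max_log_runs` files by modification time, compression is shown as a separate helper, and size-based rotation is left out. `select_logs_to_prune` and `compress_log` are hypothetical names.

```python
import gzip
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def select_logs_to_prune(
    paths: List[Path], max_runs: int, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Return the files that fall outside the count limit or the age limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    # Newest first: the first `max_runs` files are the ones the count rule keeps.
    newest_first = sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)
    return [
        path
        for index, path in enumerate(newest_first)
        if index >= max_runs or path.stat().st_mtime < cutoff
    ]


def compress_log(path: Path) -> Path:
    """Gzip `path` to `path.gz` and delete the uncompressed original."""
    target = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


# Demo: three fake run logs aged 1, 2 and 40 days.
log_dir = Path(tempfile.mkdtemp())
now = time.time()
logs = []
for i, age_days in enumerate([1, 2, 40]):
    path = log_dir / f"run{i}.log"
    path.write_bytes(f"log {i}\n".encode())
    mtime = now - age_days * 86400
    os.utime(path, (mtime, mtime))
    logs.append(path)

pruned_by_age = select_logs_to_prune(logs, max_runs=50, max_age_days=14, now=now)
pruned_by_count = select_logs_to_prune(logs, max_runs=1, max_age_days=14, now=now)
print([p.name for p in pruned_by_age])    # ['run2.log']
print([p.name for p in pruned_by_count])  # ['run1.log', 'run2.log']
archive = compress_log(logs[0])
print(archive.name)                       # run0.log.gz
```

How the two steps are combined (for example, compressing kept logs and deleting pruned ones) is a design choice this sketch leaves open.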

### JSON Lines Format

For integration with log aggregation systems (ELK, Loki, etc.):

```python
runner = PipelineRunner(
    log_file=True,
    json_output=True  # Write .jsonl file
)
```

JSON log entries look like:
```json
{"ts": "2025-12-16T19:12:03.041+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Loading data...", "phase": "data"}
{"ts": "2025-12-16T19:12:05.882+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Data loaded", "samples": 3482, "features": 2150}
```
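
Because each line is a standalone JSON object, a `.jsonl` file can also be analyzed with nothing but the standard library. A minimal sketch that parses the two example entries above, picks out the record carrying dataset dimensions, and measures the time between the two entries; `read_jsonl` is a hypothetical helper.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator

SAMPLE = """\
{"ts": "2025-12-16T19:12:03.041+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Loading data...", "phase": "data"}
{"ts": "2025-12-16T19:12:05.882+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Data loaded", "samples": 3482, "features": 2150}
"""


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(read_jsonl(SAMPLE.splitlines()))
# Extra keys such as "samples" only appear on some records, so test for them.
loaded = next(r for r in records if "samples" in r)
# fromisoformat (Python 3.7+) accepts millisecond timestamps with a UTC offset.
start = datetime.fromisoformat(records[0]["ts"])
end = datetime.fromisoformat(records[1]["ts"])
print(loaded["samples"], loaded["features"])    # 3482 2150
print(round((end - start).total_seconds(), 3))  # 2.841
```

For a real file, pass the open file object instead: `with open(path, encoding="utf-8") as f: records = list(read_jsonl(f))`.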

## Context Tracking

### Run Context

Track entire runs for reproducibility:

```python
from nirs4all.core.logging import LogContext, get_logger

logger = get_logger(__name__)

with LogContext(run_id="experiment-001", project="protein-analysis"):
    logger.info("Starting analysis")
    # All logs include run_id
```

### Branch Context

Track pipeline branches:

```python
with LogContext.branch("snv", index=0, total=4):
    logger.info("Processing SNV preprocessing")
    # Output: [branch:snv] Processing SNV preprocessing
```

### Source Context

Track multi-source pipelines:

```python
with LogContext.source("NIR", index=0, total=3):
    logger.info("Processing NIR spectra")
    # Output: [source:0/NIR] Processing NIR spectra
```

## Module-Level Logging

For library code, use module-level loggers:

```python
from nirs4all.core.logging import get_logger

logger = get_logger(__name__)

def my_function():
    logger.info("Starting processing")
    logger.debug("Detailed info for debugging")
    logger.warning("Something unexpected happened")
    logger.success("Operation completed")  # [OK] prefix
```

### Available Methods

| Method | Level | Symbol | Use |
|--------|-------|--------|-----|
| `logger.info()` | INFO | (none) | General information |
| `logger.debug()` | DEBUG | (none) | Detailed debugging |
| `logger.warning()` | WARNING | `[!]` | Non-fatal issues |
| `logger.error()` | ERROR | `[X]` | Fatal errors |
| `logger.success()` | INFO | `[OK]` | Successful completion |
| `logger.starting()` | INFO | `>` | Starting an operation |
| `logger.progress()` | INFO | `*` | Progress updates (throttled) |
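
The extra methods in this table are not part of Python's standard `logging.Logger`. A minimal sketch of how they could be layered on top of it, with a `LoggerAdapter` for `success`, `starting`, and `progress`, and a `Formatter` for the `[!]` and `[X]` prefixes. The class names and the one-second default throttle interval are assumptions, not nirs4all code.

```python
import io
import logging
import time


class SymbolFormatter(logging.Formatter):
    """Prefix warnings and errors with the ASCII symbols from the table above."""

    SYMBOLS = {logging.WARNING: "[!] ", logging.ERROR: "[X] "}

    def format(self, record: logging.LogRecord) -> str:
        return self.SYMBOLS.get(record.levelno, "") + super().format(record)


class SymbolLogger(logging.LoggerAdapter):
    """Hypothetical adapter adding success/starting/progress helpers."""

    def __init__(self, logger: logging.Logger, progress_interval: float = 1.0):
        super().__init__(logger, {})
        self.progress_interval = progress_interval
        self._last_progress = float("-inf")

    def success(self, msg: str) -> None:
        self.info(f"[OK] {msg}")

    def starting(self, msg: str) -> None:
        self.info(f"> {msg}")

    def progress(self, msg: str, force: bool = False) -> bool:
        """Emit at most one progress line per interval; return True if emitted."""
        now = time.monotonic()
        if not force and now - self._last_progress < self.progress_interval:
            return False
        self._last_progress = now
        self.info(f"* {msg}")
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SymbolFormatter("%(message)s"))
base = logging.getLogger("symbol_demo")
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False

log = SymbolLogger(base, progress_interval=60.0)
log.starting("Loading data...")
log.success("Loaded dataset")
log.warning("Cache miss")
first = log.progress("Progress: 21/42")
second = log.progress("Progress: 22/42")  # dropped: still inside the 60 s window
print(stream.getvalue(), end="")
# > Loading data...
# [OK] Loaded dataset
# [!] Cache miss
# * Progress: 21/42
```

Throttling progress lines keeps long evaluations from flooding log files while still reporting regularly.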

## HPC/Cluster Environments

For HPC systems without Unicode support:

```python
runner = PipelineRunner(
    use_unicode=False,  # ASCII-only symbols
    use_colors=False,   # No ANSI escape codes
)
```

Or set environment variables:
```bash
export NIRS4ALL_ASCII_ONLY=1
export NIRS4ALL_NO_COLOR=1
```
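
Choosing between Unicode and ASCII symbols can also be automated with a best-effort check of the output stream's encoding. In this sketch the ASCII forms are the ones used throughout this guide, while the Unicode forms and the helper names `symbol` and `stream_supports_unicode` are hypothetical; the explicit settings above remain the reliable option on clusters.

```python
import io
import sys

# (unicode, ascii) pairs; ASCII forms follow this guide, Unicode forms are illustrative.
SYMBOLS = {
    "success": ("\u2713", "[OK]"),  # check mark
    "warning": ("\u26a0", "[!]"),   # warning sign
    "error": ("\u2717", "[X]"),     # ballot x
    "arrow": ("\u2192", "->"),      # rightwards arrow
}


def symbol(name: str, use_unicode: bool) -> str:
    unicode_form, ascii_form = SYMBOLS[name]
    return unicode_form if use_unicode else ascii_form


def stream_supports_unicode(stream=None) -> bool:
    """Best-effort check: can the stream's encoding represent every Unicode symbol?"""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        "".join(pair[0] for pair in SYMBOLS.values()).encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
print(stream_supports_unicode(ascii_stream))  # False
use_unicode = stream_supports_unicode()
print(f"{symbol('success', use_unicode)} Run completed")
```

Streams without an `encoding` attribute are treated as ASCII, so the check errs toward the safe output.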

## Example Output

### Standard Run (verbose=1)

```
================================================================================
  nirs4all run: wheat_protein_analysis
  Started: 2025-12-16 19:12:03
================================================================================

> Loading data...
  [OK] Loaded wheat_nir: 3,482 samples x 2,150 features

> Building cross-validation splits...
  [OK] 5-fold GroupKFold ready

> Evaluating pipelines...
  * Progress: 21/42 (50%) -- best RMSE: 0.389
  [OK] Evaluation complete

> Training best model...
  [OK] Model trained: CV_RMSE=0.381

================================================================================
  [OK] Run completed in 2m 5.9s

  Best pipeline: SavGol(w=11) -> PCA(n=150) -> TabPFN
  Metrics: RMSE=0.381  R2=0.82
================================================================================
```

### With Branching (verbose=2)

```
> Entering branch block (4 branches)...
  |
  |-- [branch:snv] SNV preprocessing
  |   * fold 1/5: RMSE=0.412
  |   * fold 2/5: RMSE=0.398
  |   [OK] CV_RMSE=0.405
  |
  |-- [branch:msc] MSC preprocessing
  |   [OK] CV_RMSE=0.392
  |
  |-- [branch:savgol] Savitzky-Golay
  |   [OK] CV_RMSE=0.381  <- best
  |

> Branch comparison:
  +------------+----------+-------+
  | Branch     | CV_RMSE  | Rank  |
  +------------+----------+-------+
  | savgol     | 0.381    | 1     |
  | msc        | 0.392    | 2     |
  | snv        | 0.405    | 3     |
  +------------+----------+-------+
```

## Troubleshooting

### Logs not appearing

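Check the verbosity level. At `verbose=0` only warnings and errors are shown, so use `verbose=1` or higher to see progress and results: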
```python
runner = PipelineRunner(verbose=1)  # INFO level
```

### Progress bars not working

Progress bars require a TTY. In non-interactive environments (notebooks, CI), they fall back to line-based updates.

### Unicode errors on cluster

```python
runner = PipelineRunner(use_unicode=False)
```

### Finding log files

```python
from nirs4all.core.logging import get_config

config = get_config()
if config._file_handler:
    print(f"Log file: {config._file_handler.get_log_file_path()}")
```
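
If you only know the workspace, the most recent log can also be located from the documented `{workspace}/logs/{run_id}.log` layout, without going through the private `_file_handler` attribute. A minimal sketch using `pathlib`; `latest_log` is a hypothetical helper, and it assumes the newest run is the most recently modified file.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_log(workspace: Path, suffix: str = ".log") -> Optional[Path]:
    """Return the most recently modified `*{suffix}` file in {workspace}/logs, or None."""
    log_dir = Path(workspace) / "logs"
    if not log_dir.is_dir():
        return None
    candidates = [p for p in log_dir.iterdir() if p.is_file() and p.name.endswith(suffix)]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Demo with a throwaway workspace; in practice pass your real workspace path.
workspace = Path(tempfile.mkdtemp())
(workspace / "logs").mkdir()
for name, mtime in [("R-old.log", 1_000_000), ("R-new.log", 2_000_000), ("R-new.jsonl", 2_000_000)]:
    path = workspace / "logs" / name
    path.write_bytes(b"")
    os.utime(path, (mtime, mtime))

print(latest_log(workspace).name)                   # R-new.log
print(latest_log(workspace, suffix=".jsonl").name)  # R-new.jsonl
```

Matching on the exact suffix means compressed archives such as `*.log.gz` are ignored.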