Logging System User Guide

This guide explains how to use the nirs4all logging system for structured, configurable output.

Overview

The nirs4all logging system provides:

Human-readable console output optimized for researchers
Machine-parseable file logging for automation and analysis
Progress bars with TTY-aware display
Context tracking for runs, branches, and sources
ASCII-safe output for HPC/cluster environments

Quick Start

from nirs4all.pipeline import PipelineRunner

# Basic usage - logging is configured automatically
runner = PipelineRunner(verbose=1)
predictions, _ = runner.run(pipeline, dataset)

# With logging options
runner = PipelineRunner(
    verbose=2,              # Detailed output
    log_file=True,          # Write to workspace/logs/
    log_format="pretty",    # Human-readable format
    use_unicode=True,       # Use Unicode symbols
    use_colors=True,        # ANSI colors
)

Verbosity Levels

Level	Name	Use Case
0	Quiet	Silent operation, warnings and errors only. Best for production/notebooks
1	Standard	Key milestones and results. Recommended for research
2	Debug	Detailed operation, troubleshooting
3	Trace	Full trace with per-fold/per-step details
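
One way to picture the table is as a mapping from the verbose argument to a standard Python logging level. The sketch below is illustrative only: verbosity_to_level is a hypothetical helper, and both the mapping and the numeric value 5 for a TRACE level are assumptions, not nirs4all's documented internals.

```python
import logging

# Hypothetical TRACE level below DEBUG (10); the value 5 is an assumption.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Assumed mapping from the verbose argument to a stdlib logging level.
_VERBOSITY_TO_LEVEL = {
    0: logging.WARNING,  # Quiet: warnings and errors only
    1: logging.INFO,     # Standard: key milestones and results
    2: logging.DEBUG,    # Debug: detailed operation
    3: TRACE,            # Trace: per-fold/per-step details
}


def verbosity_to_level(verbose: int) -> int:
    """Clamp verbose to the 0-3 range and return the matching logging level."""
    clamped = max(0, min(3, verbose))
    return _VERBOSITY_TO_LEVEL[clamped]
```

Clamping out-of-range values keeps the helper total; a real implementation might prefer to raise an error instead.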

What you see at each level

verbose=0 (Quiet)

Only warnings and errors - no progress information.

verbose=1 (Standard)

> Loading data...
  [OK] Loaded dataset: 3,482 samples x 2,150 features

> Evaluating pipelines...
  * Progress: 21/42 (50%) -- best RMSE: 0.389
  [OK] Evaluation complete

> Training best model...
  [OK] Model trained: CV_RMSE=0.381

verbose=2 (Debug)

Everything from verbose=1, plus:
- Configuration details (seeds, versions)
- Pipeline generation/pruning statistics
- Cache hits/misses
- Per-pipeline evaluation summaries
- Memory/GPU usage warnings

Configuration Options

PipelineRunner Parameters

runner = PipelineRunner(
    # Verbosity
    verbose=1,              # 0-3, controls log level

    # File logging
    log_file=True,          # Write logs to files
    log_format="pretty",    # "pretty", "minimal", or "json"
    json_output=False,      # Also write JSON Lines file

    # Display settings
    use_unicode=True,       # Unicode symbols (False for ASCII)
    use_colors=True,        # ANSI colors (auto-detect TTY)
    show_progress_bar=True, # Show progress bars
)

Environment Variables

Override settings via environment variables:

# Override log level
export NIRS4ALL_LOG_LEVEL=DEBUG

# Force ASCII-only output (for clusters)
export NIRS4ALL_ASCII_ONLY=1

# Disable colors
export NIRS4ALL_NO_COLOR=1
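
To make the override order concrete, here is a minimal sketch of how environment variables could be layered on top of values passed in code. The variable names come from this guide; resolve_settings and _flag are hypothetical helpers, and treating an empty value or "0" as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Assumption: a flag counts as set when it is non-empty and not "0"."""
    return env.get(name, "").strip() not in ("", "0")


def resolve_settings(
    use_unicode: bool = True,
    use_colors: bool = True,
    log_level: str = "INFO",
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Apply environment overrides on top of the values passed in code."""
    env = os.environ if env is None else env
    return {
        # NIRS4ALL_ASCII_ONLY forces ASCII symbols regardless of use_unicode.
        "use_unicode": use_unicode and not _flag(env, "NIRS4ALL_ASCII_ONLY"),
        # NIRS4ALL_NO_COLOR disables ANSI colors regardless of use_colors.
        "use_colors": use_colors and not _flag(env, "NIRS4ALL_NO_COLOR"),
        # NIRS4ALL_LOG_LEVEL replaces the level chosen in code, if present.
        "log_level": env.get("NIRS4ALL_LOG_LEVEL", log_level).upper(),
    }
```

Passing env explicitly (instead of always reading os.environ) keeps the function easy to test.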

Progress Bars

The logging system includes TTY-aware progress bars that automatically adapt to terminal capabilities.

Basic Usage

from nirs4all.core.logging import ProgressBar

# Simple progress bar
with ProgressBar(total=100, description="Processing") as pbar:
    for i in range(100):
        # do work
        pbar.update(1)

# With iterator
for item in ProgressBar.wrap(items, description="Processing"):
    process(item)

ML-Specific Evaluation Progress

from nirs4all.core.logging import EvaluationProgress

# Track pipeline evaluation with best score
with EvaluationProgress(
    total_pipelines=42,
    metric_name="RMSE",
    higher_is_better=False
) as progress:
    for pipeline in pipelines:
        score = evaluate(pipeline)
        is_new_best = progress.update(score=score, pipeline_name=pipeline.name)
        if is_new_best:
            print(f"New best: {score}")
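
The "new best" return value depends on the direction of the metric. The sketch below shows that comparison logic in isolation; BestScoreTracker is a hypothetical stand-in written for illustration, not the EvaluationProgress implementation.

```python
from typing import Optional


class BestScoreTracker:
    """Remember the best score seen so far for a metric (illustrative only)."""

    def __init__(self, higher_is_better: bool = False) -> None:
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None

    def update(self, score: float) -> bool:
        """Record a score and return True when it is a new best."""
        if self.best is None:
            improved = True  # the first score is always the best so far
        elif self.higher_is_better:
            improved = score > self.best  # e.g. R2, accuracy
        else:
            improved = score < self.best  # e.g. RMSE, MAE
        if improved:
            self.best = score
        return improved
```

With higher_is_better=False, as for RMSE, a lower score replaces the current best; ties do not count as an improvement in this sketch.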

Multi-Level Progress

For nested operations (datasets → pipelines → folds):

from nirs4all.core.logging import MultiLevelProgress

progress = MultiLevelProgress(run_total=5, run_description="Datasets")

with progress.run_level() as run_pbar:
    for dataset in datasets:
        with progress.pipeline_level(total=10) as pipe_pbar:
            for pipeline in pipelines:
                with progress.fold_level(total=5) as fold_pbar:
                    for fold in folds:
                        # evaluate
                        fold_pbar.update(1)
                pipe_pbar.update(1)
        run_pbar.update(1)

Spinner for Unknown Duration

from nirs4all.core.logging import spinner

with spinner("Loading large dataset") as s:
    data = load_dataset()
    s.update("Parsing...")
    parsed = parse(data)

File Logging

Log File Location

When log_file=True, logs are written to:

{workspace}/logs/{run_id}.log      # Human-readable
{workspace}/logs/{run_id}.jsonl    # JSON Lines (if json_output=True)
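
If you need these paths in a script, the pattern above can be rebuilt with pathlib. This is a small sketch: expected_log_paths is a hypothetical helper, and it assumes you already know the workspace directory and the run ID.

```python
from pathlib import Path
from typing import List


def expected_log_paths(workspace: str, run_id: str, json_output: bool = False) -> List[Path]:
    """Build the log paths following the {workspace}/logs/{run_id} pattern."""
    log_dir = Path(workspace) / "logs"
    paths = [log_dir / f"{run_id}.log"]  # human-readable log
    if json_output:
        paths.append(log_dir / f"{run_id}.jsonl")  # JSON Lines log
    return paths
```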

Log Rotation

Logs are automatically rotated based on:

Count: Keep last N runs (default: 100)
Age: Remove logs older than N days (default: 30)
Size: Rotate when file exceeds N bytes (optional)

Old logs are compressed with gzip to save space. These limits can be set through configure_logging:

from nirs4all.core.logging import configure_logging

configure_logging(
    log_file=True,
    log_dir="./workspace/logs",
    max_log_runs=50,        # Keep last 50 runs
    max_log_age_days=14,    # Remove after 14 days
    max_log_bytes=10_000_000,  # Rotate at 10MB
    compress_logs=True,     # Gzip old logs
)
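
The count and age rules can be thought of as a filter over the existing log files. The sketch below shows one plausible reading of those two rules on plain (name, timestamp) pairs; select_expired is a hypothetical function, and how nirs4all combines the rules internally is an assumption here.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def select_expired(
    logs: Sequence[Tuple[str, datetime]],
    now: datetime,
    max_runs: int = 100,
    max_age_days: int = 30,
) -> List[str]:
    """Return names of logs that exceed the count limit or the age limit."""
    # Newest first, so positions beyond max_runs are the oldest runs.
    newest_first = sorted(logs, key=lambda item: item[1], reverse=True)
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for position, (name, modified_at) in enumerate(newest_first):
        if position >= max_runs or modified_at < cutoff:
            expired.append(name)
    return expired
```

Taking now as a parameter instead of calling datetime.now() inside the function makes the behavior reproducible.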

JSON Lines Format

For integration with log aggregation systems (ELK, Loki, etc.):

runner = PipelineRunner(
    log_file=True,
    json_output=True  # Write .jsonl file
)

JSON log entries look like:

{"ts": "2025-12-16T19:12:03.041+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Loading data...", "phase": "data"}
{"ts": "2025-12-16T19:12:05.882+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Data loaded", "samples": 3482, "features": 2150}
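
Because each line is a standalone JSON object, the file can be processed with the standard library alone. The sketch below parses the two sample entries shown above and computes the time between them; parse_jsonl and elapsed_seconds are hypothetical helper names, and the field names are taken from the sample.

```python
import json
from datetime import datetime
from typing import Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def elapsed_seconds(first: Dict, last: Dict) -> float:
    """Seconds between the "ts" fields of two entries (ISO 8601 with offset)."""
    start = datetime.fromisoformat(first["ts"])
    end = datetime.fromisoformat(last["ts"])
    return (end - start).total_seconds()


# The two sample entries from this guide.
SAMPLE = [
    '{"ts": "2025-12-16T19:12:03.041+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Loading data...", "phase": "data"}',
    '{"ts": "2025-12-16T19:12:05.882+01:00", "level": "INFO", "run_id": "R-20251216-191203", "message": "Data loaded", "samples": 3482, "features": 2150}',
]

entries = parse_jsonl(SAMPLE)
```

To read a real file, pass the open file object to parse_jsonl, since iterating a text file yields one line at a time.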

Context Tracking

Run Context

Track entire runs for reproducibility:

from nirs4all.core.logging import LogContext, get_logger

logger = get_logger(__name__)

with LogContext(run_id="experiment-001", project="protein-analysis"):
    logger.info("Starting analysis")
    # All logs include run_id

Branch Context

Track pipeline branches:

with LogContext.branch("snv", index=0, total=4):
    logger.info("Processing SNV preprocessing")
    # Output: [branch:snv] Processing SNV preprocessing

Source Context

Track multi-source pipelines:

with LogContext.source("NIR", index=0, total=3):
    logger.info("Processing NIR spectra")
    # Output: [source:0/NIR] Processing NIR spectra
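
For readers curious how nested contexts like these can work, here is a minimal sketch built on the standard contextvars module. It is not the LogContext implementation; log_context, current_context, and format_message are hypothetical names, and the prefix format simply mirrors the branch example above.

```python
import contextvars
from contextlib import contextmanager

# Holds the fields of the currently active context (empty by default).
# The default dict is never mutated; each update builds a new dict.
_context = contextvars.ContextVar("log_context", default={})


@contextmanager
def log_context(**fields):
    """Add fields for the duration of the with block, then restore the previous ones."""
    token = _context.set({**_context.get(), **fields})
    try:
        yield
    finally:
        _context.reset(token)


def current_context() -> dict:
    """Return a copy of the fields active at this point."""
    return dict(_context.get())


def format_message(message: str) -> str:
    """Prefix a message with the branch name, if one is active."""
    branch = current_context().get("branch")
    return f"[branch:{branch}] {message}" if branch else message
```

Using a ContextVar rather than a module-level global keeps the fields isolated per thread and per asyncio task, and reset(token) restores the outer context even when the block raises.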

Module-Level Logging

For library code, use module-level loggers:

from nirs4all.core.logging import get_logger

logger = get_logger(__name__)

def my_function():
    logger.info("Starting processing")
    logger.debug("Detailed info for debugging")
    logger.warning("Something unexpected happened")
    logger.success("Operation completed")  # [OK] prefix

Available Methods

Method	Level	Symbol	Use
`logger.info()`	INFO	(none)	General information
`logger.debug()`	DEBUG	(none)	Detailed debugging
`logger.warning()`	WARNING	`[!]`	Non-fatal issues
`logger.error()`	ERROR	`[X]`	Fatal errors
`logger.success()`	INFO	`[OK]`	Successful completion
`logger.starting()`	INFO	`>`	Starting an operation
`logger.progress()`	INFO	`*`	Progress updates (throttled)
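
The "(throttled)" note on logger.progress() means that frequent updates are rate-limited. The sketch below shows one common way to do that; the Throttle class and the 0.5 second default are assumptions for illustration, not the library's actual mechanism.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Allow an action at most once per min_interval seconds (illustrative only)."""

    def __init__(self, min_interval: float = 0.5, clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested without sleeping
        self._last: Optional[float] = None

    def ready(self) -> bool:
        """Return True if enough time has passed since the last allowed call."""
        now = self._clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```

A progress method built this way would emit its message only when ready() returns True and silently drop the updates in between.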

HPC/Cluster Environments

For HPC systems without Unicode support:

runner = PipelineRunner(
    use_unicode=False,  # ASCII-only symbols
    use_colors=False,   # No ANSI escape codes
)

Or set environment variables:

export NIRS4ALL_ASCII_ONLY=1
export NIRS4ALL_NO_COLOR=1
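
If you are unsure whether a cluster terminal can display Unicode, you can check the output stream's encoding before choosing a setting. This is a sketch using only the standard library; supports_unicode is a hypothetical helper, and using a check mark as the probe character is an arbitrary choice.

```python
import sys


def supports_unicode(stream=None, probe: str = "\u2713") -> bool:
    """Return True if the stream's encoding can represent the probe character."""
    stream = sys.stdout if stream is None else stream
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        probe.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Either the character is not representable or the encoding is unknown.
        return False
    return True
```

The result could then be passed as use_unicode=supports_unicode() when creating the runner.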

Example Output

Standard Run (verbose=1)

================================================================================
  nirs4all run: wheat_protein_analysis
  Started: 2025-12-16 19:12:03
================================================================================

> Loading data...
  [OK] Loaded wheat_nir: 3,482 samples x 2,150 features

> Building cross-validation splits...
  [OK] 5-fold GroupKFold ready

> Evaluating pipelines...
  * Progress: 21/42 (50%) -- best RMSE: 0.389
  [OK] Evaluation complete

> Training best model...
  [OK] Model trained: CV_RMSE=0.381

================================================================================
  [OK] Run completed in 2m 5.9s

  Best pipeline: SavGol(w=11) -> PCA(n=150) -> TabPFN
  Metrics: RMSE=0.381  R2=0.82
================================================================================

With Branching (verbose=2)

> Entering branch block (3 branches)...
  |
  |-- [branch:snv] SNV preprocessing
  |   * fold 1/5: RMSE=0.412
  |   * fold 2/5: RMSE=0.398
  |   [OK] CV_RMSE=0.405
  |
  |-- [branch:msc] MSC preprocessing
  |   [OK] CV_RMSE=0.392
  |
  |-- [branch:savgol] Savitzky-Golay
  |   [OK] CV_RMSE=0.381  <- best
  |

> Branch comparison:
  +------------+----------+-------+
  | Branch     | CV_RMSE  | Rank  |
  +------------+----------+-------+
  | savgol     | 0.381    | 1     |
  | msc        | 0.392    | 2     |
  | snv        | 0.405    | 3     |
  +------------+----------+-------+

Troubleshooting

Logs not appearing

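Check the verbosity level. With verbose=0, only warnings and errors are shown; set verbose=1 or higher to see INFO messages: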

runner = PipelineRunner(verbose=1)  # INFO level

Progress bars not working

Progress bars require a TTY. In non-interactive environments (notebooks, CI), they fall back to line-based updates.
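
The TTY check itself is a one-liner in Python. The sketch below shows the kind of decision described above; progress_mode is a hypothetical helper, and the choice of stderr as the default stream is an assumption.

```python
import sys


def progress_mode(stream=None) -> str:
    """Return "bar" for interactive terminals and "lines" otherwise."""
    stream = sys.stderr if stream is None else stream
    # File-like objects without isatty() are treated as non-interactive.
    isatty = getattr(stream, "isatty", None)
    return "bar" if callable(isatty) and isatty() else "lines"
```

Running progress_mode() in your environment tells you which display style to expect: output redirected to a file or captured by CI is not a TTY and yields "lines".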

Unicode errors on cluster

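If the console encoding cannot represent Unicode symbols, switch to ASCII-only output (or set NIRS4ALL_ASCII_ONLY=1):

runner = PipelineRunner(use_unicode=False)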

Finding log files

from nirs4all.core.logging import get_config

config = get_config()
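# _file_handler is a private attribute (leading underscore), so this access may change between versions
if config._file_handler: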
    print(f"Log file: {config._file_handler.get_log_file_path()}")