nirs4all.data.detection package

Module contents

Auto-detection module for dataset configuration.

This module auto-detects file formats, delimiters, header presence and units, signal types, and other file-loading parameters.

class nirs4all.data.detection.AutoDetector(sample_lines: int = 50, min_confidence: float = 0.6)

Bases: object

Auto-detect file parameters.

Provides methods to detect CSV delimiters, decimal separators, header presence, header units, and signal types from file content.

Example

```python
from nirs4all.data.detection import AutoDetector

# Detect loading parameters from a file on disk
detector = AutoDetector()
result = detector.detect("path/to/file.csv")
print(f"Delimiter: {result.delimiter}")
print(f"Has header: {result.has_header}")
print(f"Signal type: {result.signal_type}")
```

DELIMITERS = [',', ';', '\t', '|', ' ']
HEADER_PATTERNS = {'cm-1': ['^\\d{4,5}(?:\\.\\d+)?$', '^\\d{4,5}(?:\\.\\d+)?cm-1$', '^\\d{4,5}(?:\\.\\d+)?wavenumber$'], 'index': ['^\\d{1,3}$'], 'nm': ['^\\d{3,4}(?:\\.\\d+)?$', '^\\d{3,4}(?:\\.\\d+)?nm$'], 'text': ['^[a-zA-Z]', '^feature_\\d+$', '^[xX]_?\\d+$']}
SIGNAL_TYPE_PATTERNS = {'absorbance': ['abs(orbance)?', 'log\\s*\\(?1/[RT]\\)?', 'A\\s*='], 'reflectance': ['reflect(ance)?', '^R$', 'R\\s*%'], 'transmittance': ['transmit(tance)?', '^T$', 'T\\s*%']}
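
To illustrate how the `HEADER_PATTERNS` table could be applied, here is a minimal sketch. The patterns are copied from the listing above; the helper names `matching_units` and `score_header_units` and the scoring rule (share of header cells matched per unit) are assumptions made for illustration and may not reflect how `AutoDetector` works internally.

```python
import re

# Patterns copied from AutoDetector.HEADER_PATTERNS as documented above.
HEADER_PATTERNS = {
    "cm-1": [r"^\d{4,5}(?:\.\d+)?$", r"^\d{4,5}(?:\.\d+)?cm-1$",
             r"^\d{4,5}(?:\.\d+)?wavenumber$"],
    "index": [r"^\d{1,3}$"],
    "nm": [r"^\d{3,4}(?:\.\d+)?$", r"^\d{3,4}(?:\.\d+)?nm$"],
    "text": [r"^[a-zA-Z]", r"^feature_\d+$", r"^[xX]_?\d+$"],
}


def matching_units(cell: str) -> list[str]:
    """Return every unit whose patterns match one header cell (hypothetical helper)."""
    cell = cell.strip()
    return [
        unit
        for unit, patterns in HEADER_PATTERNS.items()
        if any(re.match(p, cell) for p in patterns)
    ]


def score_header_units(header: list[str]) -> dict[str, float]:
    """Share of header cells matched by each unit (assumed scoring rule)."""
    counts = {unit: 0 for unit in HEADER_PATTERNS}
    for cell in header:
        for unit in matching_units(cell):
            counts[unit] += 1
    total = max(len(header), 1)  # avoid division by zero on an empty header
    return {unit: n / total for unit, n in counts.items()}


print(matching_units("12000.5"))  # only the cm-1 patterns accept five digits
print(matching_units("4000"))     # four digits satisfy both cm-1 and nm
```

The patterns overlap: a bare four-digit value matches both `cm-1` and `nm`, and a bare three-digit value matches both `index` and `nm`. Any detector built on them therefore needs a tie-breaking rule, which this sketch leaves out.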
detect(source: str | Path | bytes | StringIO, known_params: Dict[str, Any] | None = None) -> DetectionResult

Detect file parameters.

Parameters:
  • source – Path to file, file content as bytes, or StringIO.

  • known_params – Optional dictionary of parameters that are already known; detection is skipped for these.

Returns:

DetectionResult with detected parameters.
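
As an illustration of the kind of detection described here, the sketch below guesses a delimiter from sampled lines. It reuses the candidate list from `DELIMITERS`; the function name `guess_delimiter` and the consistency heuristic (share of sampled lines agreeing on the same non-zero count) are assumptions, not the library's confirmed algorithm.

```python
from collections import Counter

# Same candidates as AutoDetector.DELIMITERS.
DELIMITERS = [",", ";", "\t", "|", " "]


def guess_delimiter(text: str, sample_lines: int = 50) -> tuple[str, float]:
    """Pick the candidate whose per-line count is most consistent (hypothetical heuristic)."""
    # Sample the first non-blank lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_lines]
    best, best_score = DELIMITERS[0], 0.0
    for delim in DELIMITERS:
        counts = [ln.count(delim) for ln in lines]
        nonzero = [c for c in counts if c > 0]
        if not nonzero:
            continue  # candidate never appears in the sample
        # Share of sampled lines that agree on the most common non-zero count.
        _, freq = Counter(nonzero).most_common(1)[0]
        score = freq / len(lines)
        if score > best_score:
            best, best_score = delim, score
    return best, best_score


sample = "a;b;c\n1,5;2,5;3,5\n4,0;5,0;6,0\n"
print(guess_delimiter(sample))
```

In the sample, commas appear only as decimal separators in the data rows, so the semicolon wins because it occurs the same number of times on every line, header included. A caller could compare the returned score against a threshold such as `min_confidence`, although how the library applies that threshold is not stated here.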

class nirs4all.data.detection.DetectionResult(delimiter: str = ';', decimal_separator: str = '.', has_header: bool = True, header_unit: str = 'cm-1', signal_type: str | None = None, encoding: str = 'utf-8', n_columns: int = 0, n_rows: int = 0, confidence: Dict[str, float] = <factory>, warnings: List[str] = <factory>)

Bases: object

Result of auto-detection.

delimiter

Detected field delimiter.

Type:

str

decimal_separator

Detected decimal separator.

Type:

str

has_header

Whether the file has a header row.

Type:

bool

header_unit

Detected unit type for headers.

Type:

str

signal_type

Detected signal type.

Type:

str | None

encoding

Detected file encoding.

Type:

str

n_columns

Detected number of columns.

Type:

int

n_rows

Estimated number of rows.

Type:

int

confidence

Confidence scores for each detected parameter.

Type:

Dict[str, float]

warnings

List of detection warnings.

Type:

List[str]

to_params() -> Dict[str, Any]

Convert to loading parameters dictionary.
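
The sketch below mirrors the documented fields in a stand-in dataclass to show how such a result object could be turned into loading parameters. `DetectionResultSketch` is a hypothetical name, and which keys `to_params()` actually returns is not documented here, so the selection below is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DetectionResultSketch:
    """Hypothetical stand-in mirroring the documented DetectionResult fields."""

    delimiter: str = ";"
    decimal_separator: str = "."
    has_header: bool = True
    header_unit: str = "cm-1"
    signal_type: Optional[str] = None
    encoding: str = "utf-8"
    n_columns: int = 0
    n_rows: int = 0
    # "<factory>" in the signature corresponds to a default_factory,
    # so each instance gets its own dict and list.
    confidence: Dict[str, float] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)

    def to_params(self) -> Dict[str, Any]:
        """Assumed behaviour: keep only the fields a loader would need."""
        params: Dict[str, Any] = {
            "delimiter": self.delimiter,
            "decimal_separator": self.decimal_separator,
            "has_header": self.has_header,
            "header_unit": self.header_unit,
            "encoding": self.encoding,
        }
        if self.signal_type is not None:
            params["signal_type"] = self.signal_type
        return params


result = DetectionResultSketch(delimiter=",", header_unit="nm")
print(result.to_params())
```

Diagnostics such as `confidence`, `warnings`, `n_columns`, and `n_rows` are left out of the dictionary in this sketch on the assumption that they describe the detection rather than configure loading.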

nirs4all.data.detection.detect_file_parameters(source: str | Path | bytes, known_params: Dict[str, Any] | None = None, sample_lines: int = 50) -> DetectionResult

Convenience function to detect file parameters.

Parameters:
  • source – Path to file, or file content as bytes.

  • known_params – Optional dictionary of parameters that are already known.

  • sample_lines – Number of lines to sample.

Returns:

DetectionResult with detected parameters.

nirs4all.data.detection.detect_signal_type(header: List[str] | None = None, data: ndarray | None = None) -> Tuple[str | None, float]

Detect signal type from header and/or data.

Parameters:
  • header – Optional list of header values.

  • data – Optional data array.

Returns:

Tuple of (signal_type or None, confidence).
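
A header-only version of this detection can be sketched with the documented `SIGNAL_TYPE_PATTERNS`. The function name `signal_type_from_header`, the case-insensitive matching, and the confidence rule (share of header cells matched) are assumptions for illustration; the data-based branch of the real function is not covered.

```python
import re
from typing import List, Optional, Tuple

# Patterns copied from AutoDetector.SIGNAL_TYPE_PATTERNS as documented above.
SIGNAL_TYPE_PATTERNS = {
    "absorbance": [r"abs(orbance)?", r"log\s*\(?1/[RT]\)?", r"A\s*="],
    "reflectance": [r"reflect(ance)?", r"^R$", r"R\s*%"],
    "transmittance": [r"transmit(tance)?", r"^T$", r"T\s*%"],
}


def signal_type_from_header(
    header: Optional[List[str]],
) -> Tuple[Optional[str], float]:
    """Return (signal_type, share of cells matched), or (None, 0.0) if nothing matches."""
    if not header:
        return None, 0.0
    hits = {name: 0 for name in SIGNAL_TYPE_PATTERNS}
    for cell in header:
        for name, patterns in SIGNAL_TYPE_PATTERNS.items():
            # Case-insensitive search is an assumption of this sketch.
            if any(re.search(p, cell.strip(), re.IGNORECASE) for p in patterns):
                hits[name] += 1
    best = max(hits, key=lambda name: hits[name])
    if hits[best] == 0:
        return None, 0.0
    return best, hits[best] / len(header)


print(signal_type_from_header(["sample_id", "Absorbance"]))
print(signal_type_from_header(["1000", "1002"]))
```

Purely numeric wavelength headers match none of the patterns, which is presumably why the real function also accepts a data array.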