nirs4all.data.detection package
Submodules
- nirs4all.data.detection.detector module
  - AutoDetector
  - DetectionResult
    - DetectionResult.delimiter
    - DetectionResult.decimal_separator
    - DetectionResult.has_header
    - DetectionResult.header_unit
    - DetectionResult.signal_type
    - DetectionResult.encoding
    - DetectionResult.n_columns
    - DetectionResult.n_rows
    - DetectionResult.confidence
    - DetectionResult.warnings
    - DetectionResult.to_params()
  - detect_file_parameters()
  - detect_signal_type()
Module contents
Auto-detection module for dataset configuration.
This module provides enhanced auto-detection capabilities for file formats, delimiters, headers, signal types, and other file parameters.
- class nirs4all.data.detection.AutoDetector(sample_lines: int = 50, min_confidence: float = 0.6)
Bases: object
Auto-detect file parameters.
Provides methods to detect CSV delimiters, decimal separators, header presence, header units, and signal types from file content.
Example
```python
from nirs4all.data.detection import AutoDetector

detector = AutoDetector()
result = detector.detect("path/to/file.csv")
print(f"Delimiter: {result.delimiter}")
print(f"Has header: {result.has_header}")
print(f"Signal type: {result.signal_type}")
```
- DELIMITERS = [',', ';', '\t', '|', ' ']
- HEADER_PATTERNS = {'cm-1': ['^\\d{4,5}(?:\\.\\d+)?$', '^\\d{4,5}(?:\\.\\d+)?cm-1$', '^\\d{4,5}(?:\\.\\d+)?wavenumber$'], 'index': ['^\\d{1,3}$'], 'nm': ['^\\d{3,4}(?:\\.\\d+)?$', '^\\d{3,4}(?:\\.\\d+)?nm$'], 'text': ['^[a-zA-Z]', '^feature_\\d+$', '^[xX]_?\\d+$']}
- SIGNAL_TYPE_PATTERNS = {'absorbance': ['abs(orbance)?', 'log\\s*\\(?1/[RT]\\)?', 'A\\s*='], 'reflectance': ['reflect(ance)?', '^R$', 'R\\s*%'], 'transmittance': ['transmit(tance)?', '^T$', 'T\\s*%']}
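The header patterns listed above overlap: a bare number such as `1100` satisfies both the `nm` and the `cm-1` regexes. The sketch below copies `HEADER_PATTERNS` verbatim and reports every unit a header token matches. The helper `matching_units` is hypothetical and is not part of nirs4all; how `AutoDetector` actually resolves ambiguous tokens is not documented on this page.

```python
import re

# Patterns copied from AutoDetector.HEADER_PATTERNS as documented above.
HEADER_PATTERNS = {
    "cm-1": [
        r"^\d{4,5}(?:\.\d+)?$",
        r"^\d{4,5}(?:\.\d+)?cm-1$",
        r"^\d{4,5}(?:\.\d+)?wavenumber$",
    ],
    "index": [r"^\d{1,3}$"],
    "nm": [r"^\d{3,4}(?:\.\d+)?$", r"^\d{3,4}(?:\.\d+)?nm$"],
    "text": [r"^[a-zA-Z]", r"^feature_\d+$", r"^[xX]_?\d+$"],
}


def matching_units(token: str) -> list[str]:
    """Hypothetical helper: list every unit whose patterns match the token."""
    return sorted(
        unit
        for unit, patterns in HEADER_PATTERNS.items()
        if any(re.match(pattern, token) for pattern in patterns)
    )


for token in ["850", "1100", "12000", "850nm", "4000.5cm-1", "feature_3"]:
    print(f"{token!r}: {matching_units(token)}")
```

Because single tokens can be ambiguous, a detector presumably has to look at the header row as a whole, which may be one reason the result carries per-parameter confidence values.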
- detect(source: str | Path | bytes | StringIO, known_params: Dict[str, Any] | None = None) → DetectionResult
Detect file parameters.
- Parameters:
source – Path to file, file content as bytes, or StringIO.
known_params – Optional parameters that are already known; detection is skipped for these.
- Returns:
DetectionResult with detected parameters.
- class nirs4all.data.detection.DetectionResult(delimiter: str = ';', decimal_separator: str = '.', has_header: bool = True, header_unit: str = 'cm-1', signal_type: str | None = None, encoding: str = 'utf-8', n_columns: int = 0, n_rows: int = 0, confidence: Dict[str, float] = <factory>, warnings: List[str] = <factory>)
Bases: object
Result of auto-detection.
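The `<factory>` defaults in the signature are how Sphinx usually renders dataclass fields declared with `default_factory`. Assuming `DetectionResult` is such a dataclass, the stand-in below mirrors the documented fields and defaults to show what that implies: each instance gets its own `confidence` dict and `warnings` list. `DetectionResultSketch` is a hypothetical name, the `"delimiter"` confidence key and the warning text are illustrative only, and `to_params()` is left out because its return value is not documented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DetectionResultSketch:
    """Hypothetical stand-in mirroring the documented DetectionResult fields."""

    delimiter: str = ";"
    decimal_separator: str = "."
    has_header: bool = True
    header_unit: str = "cm-1"
    signal_type: Optional[str] = None
    encoding: str = "utf-8"
    n_columns: int = 0
    n_rows: int = 0
    # default_factory builds a fresh container for every instance.
    confidence: Dict[str, float] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)


a = DetectionResultSketch()
b = DetectionResultSketch(delimiter=",", confidence={"delimiter": 0.9})

# Mutating one instance's list does not leak into the other.
a.warnings.append("illustrative warning")
print(a.warnings, b.warnings)
```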
- nirs4all.data.detection.detect_file_parameters(source: str | Path | bytes, known_params: Dict[str, Any] | None = None, sample_lines: int = 50) → DetectionResult
Convenience function to detect file parameters.
- Parameters:
source – Path to file or file content.
known_params – Optional known parameters.
sample_lines – Number of lines to sample.
- Returns:
DetectionResult with detected parameters.
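This page does not describe how detection works internally. As a rough illustration of what sampling a limited number of lines and scoring candidates can look like, the sketch below picks the delimiter whose per-line count is most consistent across the sample. It is a swapped-in heuristic written for this example, not the nirs4all algorithm; `guess_delimiter` and its scoring rule are hypothetical, and only the candidate list and the `sample_lines` default come from the documentation above.

```python
from collections import Counter
from typing import Optional, Tuple

# Candidate delimiters as listed in AutoDetector.DELIMITERS.
DELIMITERS = [",", ";", "\t", "|", " "]


def guess_delimiter(text: str, sample_lines: int = 50) -> Tuple[Optional[str], float]:
    """Hypothetical heuristic: favour the delimiter with the most consistent count."""
    lines = [line for line in text.splitlines() if line.strip()][:sample_lines]
    if not lines:
        return None, 0.0

    best, best_score = None, 0.0
    for delim in DELIMITERS:
        counts = Counter(line.count(delim) for line in lines)
        counts.pop(0, None)  # lines without the candidate do not support it
        if not counts:
            continue
        # Score = share of sampled lines that agree on the same non-zero count.
        _, support = counts.most_common(1)[0]
        score = support / len(lines)
        if score > best_score:
            best, best_score = delim, score
    return best, best_score


# Semicolon-separated file with decimal commas in the data rows.
sample = "1100;1102;1104\n0,12;0,13;0,15\n0,22;0,21;0,25\n"
print(guess_delimiter(sample))
```

In this sample the comma appears only in the data rows, so it scores lower than the semicolon, which occurs twice on every line.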
- nirs4all.data.detection.detect_signal_type(header: List[str] | None = None, data: ndarray | None = None) → Tuple[str | None, float]
Detect signal type from header and/or data.
- Parameters:
header – Optional list of header values.
data – Optional data array.
- Returns:
Tuple of (signal_type or None, confidence).
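To make the header-based side of this concrete, the sketch below applies the `SIGNAL_TYPE_PATTERNS` listed on `AutoDetector` to a few header labels. It rests on two assumptions that this page does not confirm: that the patterns are applied with `re.search`, and that matching is case-insensitive. The helper `signal_types_in` is hypothetical, and the sketch does not attempt to reproduce the confidence value or the data-based detection.

```python
import re
from typing import List

# Patterns copied from AutoDetector.SIGNAL_TYPE_PATTERNS as documented above.
SIGNAL_TYPE_PATTERNS = {
    "absorbance": [r"abs(orbance)?", r"log\s*\(?1/[RT]\)?", r"A\s*="],
    "reflectance": [r"reflect(ance)?", r"^R$", r"R\s*%"],
    "transmittance": [r"transmit(tance)?", r"^T$", r"T\s*%"],
}


def signal_types_in(label: str) -> List[str]:
    """Hypothetical helper: signal types whose patterns occur in the label."""
    return sorted(
        name
        for name, patterns in SIGNAL_TYPE_PATTERNS.items()
        # Assumption: case-insensitive search anywhere in the label.
        if any(re.search(pattern, label, re.IGNORECASE) for pattern in patterns)
    )


for label in ["Absorbance", "log(1/R)", "R %", "T", "1100"]:
    print(f"{label!r}: {signal_types_in(label)}")
```

A purely numeric header such as `1100` matches none of the patterns, which is presumably where the optional `data` argument becomes relevant.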