nirs4all.data.schema.validation.error_codes module
Error codes and diagnostics for dataset configuration.
This module provides comprehensive error codes, diagnostic messages, and suggestion systems for configuration issues.
Phase 8 Implementation - Dataset Configuration Roadmap Section 8.4: Error Handling & Diagnostics
- class nirs4all.data.schema.validation.error_codes.DiagnosticBuilder
Bases: object
Builder for diagnostic messages.
Example
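```python
from nirs4all.data.schema.validation.error_codes import DiagnosticBuilder, ErrorRegistry

builder = DiagnosticBuilder()

# Create an error message; keyword arguments fill the message template
error = builder.create(ErrorRegistry.E200, path="/path/to/file.csv")

# Create an error message that also records where the problem occurred
error = builder.create(
    ErrorRegistry.E401, line=10, error="Unexpected token", location="config.json:10"
)
```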
- create(error_code: ErrorCode, location: str | None = None, **kwargs) → DiagnosticMessage
Create a diagnostic message.
- Parameters:
error_code – The error code definition.
location – Optional file/line location.
**kwargs – Parameters for message template.
- Returns:
DiagnosticMessage instance.
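The sketch below illustrates one way the keyword arguments passed to `create` could fill a message template. It is not the library's implementation: `SketchErrorCode` and `render` are hypothetical stand-ins, and the use of `str.format` is an assumption based on the `{placeholder}` syntax of the templates in `ErrorRegistry`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchErrorCode:
    """Hypothetical stand-in with a subset of the documented ErrorCode fields."""
    code: str
    message_template: str
    suggestion_template: Optional[str] = None


def render(error_code: SketchErrorCode, location: Optional[str] = None,
           **kwargs) -> Tuple[str, Optional[str]]:
    """Fill both templates with the keyword arguments (assumed behaviour)."""
    # str.format raises KeyError if a placeholder has no matching keyword.
    message = error_code.message_template.format(**kwargs)
    suggestion = None
    if error_code.suggestion_template is not None:
        suggestion = error_code.suggestion_template.format(**kwargs)
    if location is not None:
        message = f"{location}: {message}"
    return message, suggestion


# Templates copied from the E401 entry of ErrorRegistry.
E401 = SketchErrorCode(
    code="E401",
    message_template="Invalid JSON configuration at line {line}: {error}",
    suggestion_template="Check JSON syntax around line {line}.",
)

message, suggestion = render(
    E401, location="config.json:10", line=10, error="Unexpected token"
)
print(message)
print(suggestion)
```

Because `location` is a named parameter of `create`, it is kept apart from the template keywords here as well; how the real `DiagnosticMessage` combines location and message text is not stated in this reference.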
- file_not_found(path: str) → DiagnosticMessage
Create a file-not-found error.
- invalid_value(field: str, value: Any, valid_values: List[Any]) → DiagnosticMessage
Create an invalid-value error.
- missing_field(field: str) → DiagnosticMessage
Create a missing-field error.
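The parameters of these convenience methods line up with the placeholders of three registry templates (E200, E103 and E101), which suggests, though this reference does not confirm it, that each helper wraps one of those codes. The sketch below shows how the placeholders a template expects can be listed with the standard library; `template_fields` is a hypothetical helper, not part of nirs4all.

```python
from string import Formatter
from typing import Set


def template_fields(template: str) -> Set[str]:
    """Return the placeholder names used in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Message templates copied from the ErrorRegistry entries.
E101_MESSAGE = "Missing required field: {field}"
E103_MESSAGE = "Invalid value for '{field}': {value}. Valid values: {valid_values}"
E200_MESSAGE = "File not found: {path}"

for label, template in [("E101", E101_MESSAGE),
                        ("E103", E103_MESSAGE),
                        ("E200", E200_MESSAGE)]:
    print(label, sorted(template_fields(template)))
```

The same check is handy before calling `create` directly, since it shows which keyword arguments a given code's template needs.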
- class nirs4all.data.schema.validation.error_codes.DiagnosticMessage(error_code: ErrorCode, message: str, suggestion: str | None = None, context: Dict[str, Any] = <factory>, location: str | None = None)
Bases: object
A diagnostic message with formatted content.
- error_code
The ErrorCode definition.
- property category: ErrorCategory
Get the error category.
- property severity: ErrorSeverity
Get the error severity.
- class nirs4all.data.schema.validation.error_codes.DiagnosticReport(messages: List[DiagnosticMessage] = <factory>, config_path: str | None = None)
Bases: object
Collection of diagnostic messages.
- messages: List[DiagnosticMessage]
List of diagnostic messages.
- add(message: DiagnosticMessage) → None
Add a diagnostic message.
- add_error(error_code: ErrorCode, location: str | None = None, **kwargs) → DiagnosticMessage
Create and add an error message.
- property errors: List[DiagnosticMessage]
Get all error messages.
- property warnings: List[DiagnosticMessage]
Get all warning messages.
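A minimal sketch of how a report could separate errors from warnings, assuming the `errors` and `warnings` properties filter `messages` by severity. `SketchSeverity`, `SketchMessage` and `SketchReport` are hypothetical stand-ins; only the `ERROR` and `WARNING` members and their values are taken from the registry entries in this reference.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SketchSeverity(Enum):
    """Stand-in for ErrorSeverity, using the two values shown in the registry."""
    ERROR = "error"
    WARNING = "warning"


@dataclass
class SketchMessage:
    """Stand-in for DiagnosticMessage, reduced to what the filter needs."""
    code: str
    severity: SketchSeverity
    text: str


@dataclass
class SketchReport:
    """Stand-in for DiagnosticReport."""
    # default_factory gives every report its own list instead of a shared one.
    messages: List[SketchMessage] = field(default_factory=list)

    def add(self, message: SketchMessage) -> None:
        self.messages.append(message)

    @property
    def errors(self) -> List[SketchMessage]:
        return [m for m in self.messages if m.severity is SketchSeverity.ERROR]

    @property
    def warnings(self) -> List[SketchMessage]:
        return [m for m in self.messages if m.severity is SketchSeverity.WARNING]


report = SketchReport()
report.add(SketchMessage("E200", SketchSeverity.ERROR, "File not found: data.csv"))
report.add(SketchMessage("E302", SketchSeverity.WARNING, "NA values removed: 3 rows affected"))
report.add(SketchMessage("E101", SketchSeverity.ERROR, "Missing required field: train_x"))

print(len(report.errors), "error(s),", len(report.warnings), "warning(s)")
```

Keeping a single `messages` list and deriving the two views on demand preserves insertion order, so a caller can still print diagnostics in the order they were raised.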
- class nirs4all.data.schema.validation.error_codes.ErrorCategory(value)
Categories of configuration errors.
- AGGREGATION = 'aggregation'
- DATA = 'data'
- FILE = 'file'
- FOLD = 'fold'
- LOADING = 'loading'
- PARTITION = 'partition'
- RUNTIME = 'runtime'
- SCHEMA = 'schema'
- VARIATION = 'variation'
- class nirs4all.data.schema.validation.error_codes.ErrorCode(code: str, category: ErrorCategory, severity: ErrorSeverity, message_template: str, suggestion_template: str | None = None, documentation_url: str | None = None)
Bases: object
Error code definition.
- category: ErrorCategory
Error category.
- severity: ErrorSeverity
Error severity.
- class nirs4all.data.schema.validation.error_codes.ErrorRegistry
Bases: object
Registry of all error codes.
- E100 = ErrorCode(code='E100', category=<ErrorCategory.SCHEMA: 'schema'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Invalid configuration structure: {details}', suggestion_template='Check that the configuration is a valid dictionary.', documentation_url=None)
- E101 = ErrorCode(code='E101', category=<ErrorCategory.SCHEMA: 'schema'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Missing required field: {field}', suggestion_template="Add the '{field}' field to your configuration.", documentation_url=None)
- E102 = ErrorCode(code='E102', category=<ErrorCategory.SCHEMA: 'schema'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Invalid type for '{field}': expected {expected}, got {actual}", suggestion_template="Change '{field}' to be of type {expected}.", documentation_url=None)
- E103 = ErrorCode(code='E103', category=<ErrorCategory.SCHEMA: 'schema'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Invalid value for '{field}': {value}. Valid values: {valid_values}", suggestion_template='Use one of the valid values: {valid_values}', documentation_url=None)
- E104 = ErrorCode(code='E104', category=<ErrorCategory.SCHEMA: 'schema'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='No data source specified', suggestion_template="Add 'train_x', 'test_x', 'folder', 'sources', or 'variations' to your configuration.", documentation_url=None)
- E200 = ErrorCode(code='E200', category=<ErrorCategory.FILE: 'file'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='File not found: {path}', suggestion_template='Check that the file path is correct and the file exists.', documentation_url=None)
- E201 = ErrorCode(code='E201', category=<ErrorCategory.FILE: 'file'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Cannot read file: {path}. Error: {error}', suggestion_template='Check file permissions and encoding.', documentation_url=None)
- E202 = ErrorCode(code='E202', category=<ErrorCategory.FILE: 'file'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Unsupported file format: {format}', suggestion_template='Supported formats: CSV, NPY, NPZ, Parquet, Excel, MATLAB', documentation_url=None)
- E203 = ErrorCode(code='E203', category=<ErrorCategory.FILE: 'file'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Empty file: {path}', suggestion_template='Ensure the file contains data.', documentation_url=None)
- E204 = ErrorCode(code='E204', category=<ErrorCategory.FILE: 'file'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template='File encoding issue: {path}. Using fallback encoding: {encoding}', suggestion_template='Specify the encoding explicitly in loading parameters.', documentation_url=None)
- E300 = ErrorCode(code='E300', category=<ErrorCategory.DATA: 'data'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Data shape mismatch: {details}', suggestion_template='Ensure all data arrays have consistent sample counts.', documentation_url=None)
- E301 = ErrorCode(code='E301', category=<ErrorCategory.DATA: 'data'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="NA values found in data and na_policy='abort': {details}", suggestion_template="Set na_policy='remove' or clean your data before loading.", documentation_url=None)
- E302 = ErrorCode(code='E302', category=<ErrorCategory.DATA: 'data'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template='NA values removed: {count} rows affected', suggestion_template='Review your data for missing values.', documentation_url=None)
- E303 = ErrorCode(code='E303', category=<ErrorCategory.DATA: 'data'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Column not found: '{column}' in {file}", suggestion_template='Available columns: {available}', documentation_url=None)
- E304 = ErrorCode(code='E304', category=<ErrorCategory.DATA: 'data'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template='Non-numeric values in feature data at column(s): {columns}', suggestion_template='Features should be numeric. Non-numeric values will be converted to NaN.', documentation_url=None)
- E400 = ErrorCode(code='E400', category=<ErrorCategory.LOADING: 'loading'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Failed to parse CSV: {error}', suggestion_template='Check delimiter and encoding settings.', documentation_url=None)
- E401 = ErrorCode(code='E401', category=<ErrorCategory.LOADING: 'loading'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Invalid JSON configuration at line {line}: {error}', suggestion_template='Check JSON syntax around line {line}.', documentation_url=None)
- E402 = ErrorCode(code='E402', category=<ErrorCategory.LOADING: 'loading'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Invalid YAML configuration at line {line}: {error}', suggestion_template='Check YAML indentation and syntax around line {line}.', documentation_url=None)
- E403 = ErrorCode(code='E403', category=<ErrorCategory.LOADING: 'loading'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Archive error: {error}', suggestion_template='Ensure the archive is not corrupted and contains the expected files.', documentation_url=None)
- E500 = ErrorCode(code='E500', category=<ErrorCategory.PARTITION: 'partition'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Invalid partition specification: {details}', suggestion_template="Use 'train', 'test', column-based, or percentage-based partition.", documentation_url=None)
- E501 = ErrorCode(code='E501', category=<ErrorCategory.PARTITION: 'partition'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Partition column not found: '{column}'", suggestion_template='Available columns: {available}', documentation_url=None)
- E502 = ErrorCode(code='E502', category=<ErrorCategory.PARTITION: 'partition'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Partition indices out of range: max index {max_index}, data has {n_samples} samples', suggestion_template='Ensure partition indices are within valid range.', documentation_url=None)
- E503 = ErrorCode(code='E503', category=<ErrorCategory.PARTITION: 'partition'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template='Overlapping partition indices detected', suggestion_template='Train and test indices should not overlap.', documentation_url=None)
- E600 = ErrorCode(code='E600', category=<ErrorCategory.AGGREGATION: 'aggregation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Aggregation column not found: '{column}'", suggestion_template='Available columns in metadata: {available}', documentation_url=None)
- E601 = ErrorCode(code='E601', category=<ErrorCategory.AGGREGATION: 'aggregation'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template="Group '{group}' has only {count} sample(s), below minimum {min_samples}", suggestion_template='Consider lowering aggregate_min_samples or reviewing your data.', documentation_url=None)
- E602 = ErrorCode(code='E602', category=<ErrorCategory.AGGREGATION: 'aggregation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Invalid aggregation method: '{method}'", suggestion_template='Valid methods: mean, median, vote, min, max, sum, std, first, last', documentation_url=None)
- E700 = ErrorCode(code='E700', category=<ErrorCategory.VARIATION: 'variation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Duplicate source name: '{name}'", suggestion_template='Each source must have a unique name.', documentation_url=None)
- E701 = ErrorCode(code='E701', category=<ErrorCategory.VARIATION: 'variation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="Duplicate variation name: '{name}'", suggestion_template='Each variation must have a unique name.', documentation_url=None)
- E702 = ErrorCode(code='E702', category=<ErrorCategory.VARIATION: 'variation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Unknown variation(s) in variation_select: {names}', suggestion_template='Available variations: {available}', documentation_url=None)
- E703 = ErrorCode(code='E703', category=<ErrorCategory.VARIATION: 'variation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template="variation_mode='select' requires 'variation_select' to be specified", suggestion_template='Add \'variation_select: ["var1", "var2"]\' to your configuration.', documentation_url=None)
- E704 = ErrorCode(code='E704', category=<ErrorCategory.VARIATION: 'variation'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Sample count mismatch across sources: {details}', suggestion_template='All sources must have the same number of samples.', documentation_url=None)
- E800 = ErrorCode(code='E800', category=<ErrorCategory.FOLD: 'fold'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Invalid fold file format: {error}', suggestion_template='Fold files should be CSV with fold columns or JSON/YAML with fold definitions.', documentation_url=None)
- E801 = ErrorCode(code='E801', category=<ErrorCategory.FOLD: 'fold'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Fold sample IDs do not match dataset: {details}', suggestion_template='Ensure fold file was generated for this dataset.', documentation_url=None)
- E802 = ErrorCode(code='E802', category=<ErrorCategory.FOLD: 'fold'>, severity=<ErrorSeverity.WARNING: 'warning'>, message_template='Fold file has {fold_samples} samples, dataset has {data_samples} samples', suggestion_template='Folds will be adjusted to match current dataset size.', documentation_url=None)
- E900 = ErrorCode(code='E900', category=<ErrorCategory.RUNTIME: 'runtime'>, severity=<ErrorSeverity.ERROR: 'error'>, message_template='Unexpected error during loading: {error}', suggestion_template='Please report this issue with the full error traceback.', documentation_url=None)
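In the entries above, the hundreds digit of each code matches its category (E1xx schema, E2xx file, and so on up to E9xx runtime), and six codes carry warning severity. The sketch below encodes that observed pattern as a lookup; `describe` is a hypothetical helper, and the numbering is a convention read off this listing rather than a documented guarantee.

```python
from typing import Tuple

# Hundreds digit -> category value, as observed in the registry listing.
CATEGORY_BY_HUNDREDS = {
    1: "schema", 2: "file", 3: "data", 4: "loading", 5: "partition",
    6: "aggregation", 7: "variation", 8: "fold", 9: "runtime",
}

# Codes listed with severity 'warning'; every other listed code is 'error'.
WARNING_CODES = frozenset({"E204", "E302", "E304", "E503", "E601", "E802"})


def describe(code: str) -> Tuple[str, str]:
    """Return (category, severity) for a code string such as 'E401'.

    This only applies the numbering pattern; it does not check that the
    code is actually defined in the registry.
    """
    if len(code) != 4 or code[0] != "E" or not code[1:].isdigit():
        raise ValueError(f"Not an error code of the form E###: {code!r}")
    category = CATEGORY_BY_HUNDREDS.get(int(code[1:]) // 100)
    if category is None:
        raise ValueError(f"No category for code range of {code!r}")
    severity = "warning" if code in WARNING_CODES else "error"
    return category, severity


print(describe("E401"))
print(describe("E204"))
```

In real code, reading `ErrorRegistry.E401.category` and `.severity` directly is preferable, since those attributes are the source of truth.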