nirs4all.operators.filters.report module

Filtering report generator for sample filtering operations.

Provides utilities to generate comprehensive reports about sample filtering, including statistics, visualizations, and export capabilities.

class nirs4all.operators.filters.report.FilterResult(filter_name: str, reason: str, n_samples: int, n_excluded: int, n_kept: int, exclusion_rate: float, excluded_indices: List[int] = <factory>, stats: Dict[str, Any] = <factory>)

Bases: object

Result of applying a single filter.

filter_name

Name/identifier of the filter

Type:

str

reason

Exclusion reason string

Type:

str

n_samples

Total samples evaluated

Type:

int

n_excluded

Number of samples excluded by this filter

Type:

int

n_kept

Number of samples kept

Type:

int

exclusion_rate

Ratio of excluded samples to total samples evaluated (n_excluded / n_samples)

Type:

float

excluded_indices

Indices of excluded samples

Type:

List[int]

stats

Additional filter-specific statistics

Type:

Dict[str, Any]

excluded_indices: List[int]
exclusion_rate: float
filter_name: str
n_excluded: int
n_kept: int
n_samples: int
reason: str
stats: Dict[str, Any]
to_dict() → Dict[str, Any]

Convert to dictionary representation.
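The <factory> defaults in the signature are how Sphinx renders dataclass fields declared with default_factory. The sketch below is a hypothetical stand-in (FilterResultSketch and make_result are illustrative names, not part of nirs4all) that mirrors the documented fields and shows how the count fields relate to each other. It assumes that to_dict() behaves like dataclasses.asdict; the real method may format values differently.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class FilterResultSketch:
    """Hypothetical stand-in mirroring the documented FilterResult fields."""
    filter_name: str
    reason: str
    n_samples: int
    n_excluded: int
    n_kept: int
    exclusion_rate: float
    # Mutable defaults need default_factory; Sphinx shows these as <factory>.
    excluded_indices: List[int] = field(default_factory=list)
    stats: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a plain field-by-field conversion, as asdict() does.
        return asdict(self)


def make_result(name: str, reason: str, excluded: List[int], n_samples: int) -> FilterResultSketch:
    """Hypothetical helper deriving the count fields from the excluded indices."""
    n_excluded = len(excluded)
    return FilterResultSketch(
        filter_name=name,
        reason=reason,
        n_samples=n_samples,
        n_excluded=n_excluded,
        n_kept=n_samples - n_excluded,
        # exclusion_rate = n_excluded / n_samples, guarded for empty input.
        exclusion_rate=n_excluded / n_samples if n_samples else 0.0,
        excluded_indices=list(excluded),
    )


result = make_result("y_outlier_iqr", "y_outlier", [3, 17], n_samples=50)
print(result.to_dict()["exclusion_rate"])  # 0.04
```

Using default_factory gives every instance its own list and dict, so appending to one result's excluded_indices never leaks into another.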

class nirs4all.operators.filters.report.FilteringReport(dataset_name: str, partition: str, timestamp: str = <factory>, filter_results: List[FilterResult] = <factory>, combined_mode: str = 'any', n_total_samples: int = 0, n_final_excluded: int = 0, n_final_kept: int = 0, cascade_to_augmented: bool = True, n_augmented_excluded: int = 0)

Bases: object

Comprehensive report of sample filtering operations.

This class aggregates results from multiple filters and provides methods for analysis, visualization, and export.

dataset_name

Name of the filtered dataset

Type:

str

partition

Partition that was filtered (e.g., “train”)

Type:

str

timestamp

When the filtering was performed

Type:

str

filter_results

List of individual filter results

Type:

List[nirs4all.operators.filters.report.FilterResult]

combined_mode

How filters were combined (“any” or “all”)

Type:

str

n_total_samples

Total samples before filtering

Type:

int

n_final_excluded

Final number of excluded samples

Type:

int

n_final_kept

Final number of kept samples

Type:

int

cascade_to_augmented

Whether augmented samples were also excluded

Type:

bool

n_augmented_excluded

Number of augmented samples excluded via cascade

Type:

int
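The two cascade fields suggest that excluding a sample can also exclude augmented samples derived from it. The sketch below shows one way such a cascade count could be obtained. It is an assumption throughout: the origin_of mapping from augmented id to source id and the cascade_exclusions helper are hypothetical and do not describe how nirs4all stores augmentation links.

```python
from typing import Dict, Set


def cascade_exclusions(excluded_ids: Set[int], origin_of: Dict[int, int]) -> Set[int]:
    """Return augmented sample ids whose source sample was excluded.

    origin_of maps each augmented sample id to the id of the sample it was
    derived from (a hypothetical representation).
    """
    return {aug for aug, origin in origin_of.items() if origin in excluded_ids}


excluded_ids = {3, 7}
origin_of = {100: 3, 101: 3, 102: 5, 103: 7}
cascaded = cascade_exclusions(excluded_ids, origin_of)
n_augmented_excluded = len(cascaded)  # 3 (samples 100, 101 and 103)
```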

add_filter_result(result: FilterResult) → None

Add a filter result to the report.

cascade_to_augmented: bool = True
combined_mode: str = 'any'
dataset_name: str
filter_results: List[FilterResult]
property final_exclusion_rate: float

Calculate final exclusion rate after combining filters.
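The sketch below illustrates how the combined_mode values "any" and "all" could turn per-filter decisions into the final counts. It assumes each filter yields a boolean mask (True meaning flagged), that "any" excludes a sample flagged by at least one filter and "all" only a sample flagged by every filter, and that the final rate is n_final_excluded / n_total_samples. combine_masks and final_exclusion_rate here are illustrative helpers, not library functions.

```python
import numpy as np


def combine_masks(masks: list, mode: str = "any") -> np.ndarray:
    """Combine per-filter exclusion masks (True = flagged) into one mask."""
    stacked = np.vstack(masks)  # shape (n_filters, n_samples)
    if mode == "any":
        return stacked.any(axis=0)  # flagged by at least one filter
    if mode == "all":
        return stacked.all(axis=0)  # flagged by every filter
    raise ValueError(f"unknown mode: {mode!r}")


def final_exclusion_rate(n_final_excluded: int, n_total_samples: int) -> float:
    # Guard against an empty report (n_total_samples defaults to 0).
    return n_final_excluded / n_total_samples if n_total_samples else 0.0


iqr = np.array([True, False, True, False, False])
zscore = np.array([True, True, False, False, False])
for mode in ("any", "all"):
    combined = combine_masks([iqr, zscore], mode)
    n_excluded = int(combined.sum())
    print(mode, n_excluded, final_exclusion_rate(n_excluded, combined.size))
```

"any" is the more aggressive mode: the same two filters exclude three of five samples under "any" but only one under "all".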

n_augmented_excluded: int = 0
n_final_excluded: int = 0
n_final_kept: int = 0
n_total_samples: int = 0
partition: str
print_report(verbose: int = 1) → None

Print the filtering report to console.

Parameters:

verbose – Verbosity level (0=minimal, 1=normal, 2=detailed)

summary() → Dict[str, Any]

Get a summary dictionary of the filtering report.

Returns:

Dict containing summary statistics

timestamp: str
to_dict() → Dict[str, Any]

Convert the full report to a dictionary.

to_json(indent: int = 2) → str

Convert report to JSON string.

Parameters:

indent – JSON indentation level

Returns:

JSON string representation
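A plausible reading of to_json(indent) is a json.dumps call over the dictionary returned by to_dict(). The sketch below assumes exactly that; report_to_json is a hypothetical helper, and the summary values, including the dataset name "corn", are made-up illustrations whose keys mirror the documented FilteringReport attributes.

```python
import json
from typing import Any, Dict


def report_to_json(report_dict: Dict[str, Any], indent: int = 2) -> str:
    """Serialize a report dictionary such as the output of to_dict().

    Assumption: to_json is a thin json.dumps wrapper; the library may encode
    NumPy scalars or timestamps differently.
    """
    # default=str converts non-JSON-native values (e.g. datetime) to strings.
    return json.dumps(report_dict, indent=indent, default=str)


# Hypothetical values; keys mirror documented FilteringReport attributes.
summary = {
    "dataset_name": "corn",
    "partition": "train",
    "combined_mode": "any",
    "n_total_samples": 80,
    "n_final_excluded": 4,
    "n_final_kept": 76,
}
text = report_to_json(summary)
assert json.loads(text) == summary  # plain values round-trip unchanged
```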

class nirs4all.operators.filters.report.FilteringReportGenerator(dataset: SpectroDataset)

Bases: object

Generator for creating comprehensive filtering reports.

This class provides utilities for collecting filter statistics, generating reports, and exporting results.

Example

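>>> from nirs4all.operators.filters.report import FilteringReportGenerator
>>> # Assumes X, y and sample_indices hold the train partition's data and
>>> # that YOutlierFilter has been imported from the filters package.
>>> generator = FilteringReportGenerator(dataset)
>>> report = generator.create_report(
...     filters=[YOutlierFilter(method="iqr")],
...     X=X,
...     y=y,
...     sample_indices=sample_indices,
...     mode="any",
...     partition="train"
... )
>>> report.print_report()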

compare_filters(filters: List[SampleFilter], X: ndarray, y: ndarray) → Dict[str, Any]

Compare multiple filters on the same data without applying them.

Useful for seeing which filter is more aggressive, or for finding where the filters' decisions overlap.

Parameters:
  • filters – List of filters to compare

  • X – Feature array

  • y – Target array

Returns:

Dictionary with comparison statistics, containing the keys:

  • individual: Per-filter stats

  • overlap: Samples flagged by multiple filters

  • unique: Samples flagged by only one filter

Return type:

Dict[str, Any]
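The individual, overlap and unique entries can be illustrated with per-filter boolean masks. The sketch below is an assumption about the shape of that computation: compare_masks is a hypothetical helper, the per-filter stats it reports (count and rate) are illustrative, and reporting unique samples per filter rather than pooled is a guess at the library's layout.

```python
from typing import Dict, List

import numpy as np


def compare_masks(masks: Dict[str, np.ndarray]) -> Dict[str, object]:
    """Summarize per-filter exclusion masks (True = flagged) without applying them."""
    names: List[str] = list(masks)
    stacked = np.vstack([masks[n] for n in names])  # (n_filters, n_samples)
    n_flags = stacked.sum(axis=0)  # number of filters flagging each sample
    individual = {
        n: {"n_excluded": int(m.sum()), "exclusion_rate": float(m.mean())}
        for n, m in zip(names, stacked)
    }
    # Samples flagged by more than one filter.
    overlap = np.flatnonzero(n_flags > 1).tolist()
    # For each filter, the samples that only this filter flags.
    unique = {n: np.flatnonzero(m & (n_flags == 1)).tolist() for n, m in zip(names, stacked)}
    return {"individual": individual, "overlap": overlap, "unique": unique}


cmp = compare_masks({
    "iqr": np.array([True, False, True, False]),
    "zscore": np.array([True, True, False, False]),
})
print(cmp["overlap"], cmp["unique"])
```

Because nothing is written back to the dataset, this kind of comparison is safe to run before deciding which filters to apply.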

create_report(filters: List[SampleFilter], X: ndarray, y: ndarray, sample_indices: ndarray, mode: str = 'any', partition: str = 'train', cascade_to_augmented: bool = True, dry_run: bool = True) → FilteringReport

Create a filtering report by applying filters to data.

Parameters:
  • filters – List of SampleFilter instances to apply

  • X – Feature array (n_samples, n_features)

  • y – Target array (n_samples,) or (n_samples, n_targets)

  • sample_indices – Array of sample indices corresponding to X/y

  • mode – Filter combination mode (“any” or “all”)

  • partition – Which partition is being filtered

  • cascade_to_augmented – Whether exclusions are cascaded to augmented samples

  • dry_run – If True, compute the statistics without marking any samples as excluded

Returns:

FilteringReport with all statistics and results
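The sample_indices parameter suggests that filters work on row positions within X and y, which must then be translated back to dataset-wide sample ids, and dry_run controls whether those ids are recorded. The sketch below illustrates both ideas under that assumption. to_sample_ids and record_exclusions are hypothetical helpers, and the plain Python set stands in for the indexer, whose real API is not shown here.

```python
from typing import Iterable, List, Set

import numpy as np


def to_sample_ids(flagged: np.ndarray, sample_indices: np.ndarray) -> List[int]:
    """Map a positional mask over the rows of X/y to dataset sample ids."""
    if flagged.shape[0] != sample_indices.shape[0]:
        raise ValueError("mask and sample_indices must have the same length")
    return sample_indices[flagged].tolist()


def record_exclusions(ids: Iterable[int], excluded: Set[int], dry_run: bool = True) -> Set[int]:
    """Add ids to a hypothetical exclusion set unless this is a dry run."""
    if not dry_run:
        excluded.update(ids)
    return excluded


sample_indices = np.array([10, 11, 12, 15, 20])  # row i of X/y is sample sample_indices[i]
flagged = np.array([False, True, False, False, True])
ids = to_sample_ids(flagged, sample_indices)              # [11, 20]
preview = record_exclusions(ids, set(), dry_run=True)     # nothing recorded
applied = record_exclusions(ids, set(), dry_run=False)    # {11, 20}
```

Defaulting dry_run to True, as the documented signature does, means a report can be inspected before any sample is actually excluded.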

generate_from_indexer(partition: str | None = 'train') → FilteringReport

Generate a report from the indexer's current exclusion state.

This method creates a report based on samples already marked as excluded in the indexer, rather than applying filters.

Parameters:

partition – Partition to report on (None for all partitions)

Returns:

FilteringReport based on current exclusion state
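Reporting from stored exclusion flags reduces to counting flags within the selected partition, with None selecting every partition. The sketch below assumes the indexer can be reduced to two parallel arrays (a partition label and an excluded flag per sample). exclusion_counts is a hypothetical helper that produces the count fields a FilteringReport would carry.

```python
from typing import Dict, Optional

import numpy as np


def exclusion_counts(partitions: np.ndarray, excluded: np.ndarray,
                     partition: Optional[str] = "train") -> Dict[str, int]:
    """Count excluded and kept samples from stored flags; None means all partitions."""
    if partition is None:
        selected = np.ones(len(partitions), dtype=bool)
    else:
        selected = partitions == partition
    n_total = int(selected.sum())
    n_excluded = int((excluded & selected).sum())
    return {"n_total_samples": n_total, "n_final_excluded": n_excluded,
            "n_final_kept": n_total - n_excluded}


partitions = np.array(["train", "train", "test", "train", "test"])
excluded_flags = np.array([True, False, False, True, True])
print(exclusion_counts(partitions, excluded_flags))        # train only
print(exclusion_counts(partitions, excluded_flags, None))  # every partition
```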