nirs4all.controllers.data.tag module

Controller for sample tagging operations.

This controller handles the tag keyword, computing and storing tags on samples without removing them. Tags can later be used for branching, analysis, or conditional processing.

class nirs4all.controllers.data.tag.TagController

Bases: OperatorController

Controller for sample tagging operations.

This controller computes tags on samples using SampleFilter instances and stores the results as tag columns in the dataset’s indexer. Unlike exclude, the tag keyword never removes samples; it only stores computed values for later use.

Tags can be used for:

- Analysis and reporting (e.g., identifying outliers)
- Conditional branching (e.g., branch: {by_tag: "is_outlier"})
- Grouping samples for specialized processing
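As a rough illustration of the grouping use case, the sketch below splits sample indices by a boolean tag column. The function name split_by_tag and the example values are hypothetical; this is not how nirs4all implements by_tag branching, only a picture of what a stored tag makes possible while every sample stays in the dataset.

```python
from typing import Dict, List


def split_by_tag(tags: List[bool]) -> Dict[bool, List[int]]:
    """Group sample indices by the value of a boolean tag column."""
    groups: Dict[bool, List[int]] = {True: [], False: []}
    for index, flagged in enumerate(tags):
        groups[bool(flagged)].append(index)
    return groups


# Hypothetical tag column, e.g. the values stored under "is_outlier".
is_outlier = [False, True, False, False, True]
groups = split_by_tag(is_outlier)

# Both groups together still cover every sample: nothing was removed.
total = len(groups[True]) + len(groups[False])
```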

Pipeline syntax:

    # Single filter (tag name from filter's tag_name or class name)
    {"tag": YOutlierFilter(method="iqr")}

    # Multiple filters (each stores its own tag)
    {"tag": [YOutlierFilter(), XOutlierFilter()]}

    # Named tags (explicit tag names)
    {"tag": {"outliers": YOutlierFilter(), "leverage": HighLeverageFilter()}}
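The three accepted forms can be reduced to a single mapping from tag name to filter. The following is a minimal sketch under stated assumptions: the classes below are hypothetical stand-ins rather than the nirs4all ones, normalize_tag_config is an illustrative helper and not part of the library, and the exact fallback naming used by the controller may differ from the plain class name used here.

```python
from typing import Any, Dict


class SampleFilter:
    """Hypothetical stand-in for the library's SampleFilter base class."""

    tag_name = None  # optional explicit tag name


class YOutlierFilter(SampleFilter):
    """Stand-in with no tag_name, so the class name is used."""


class XOutlierFilter(SampleFilter):
    """Stand-in that declares its own tag name."""

    tag_name = "x_outlier"


def normalize_tag_config(config: Any) -> Dict[str, SampleFilter]:
    """Turn a single filter, a list, or a dict into {tag_name: filter}."""
    if isinstance(config, dict):
        pairs = list(config.items())
    else:
        filters = list(config) if isinstance(config, (list, tuple)) else [config]
        pairs = [(None, f) for f in filters]

    if not pairs:
        raise ValueError("tag step needs at least one filter")

    named: Dict[str, SampleFilter] = {}
    for name, flt in pairs:
        if not isinstance(flt, SampleFilter):
            raise TypeError(f"expected a SampleFilter, got {type(flt).__name__}")
        # Explicit dict key first, then the filter's tag_name, then its class name.
        named[name or flt.tag_name or type(flt).__name__] = flt
    return named
```

The two error branches mirror the ValueError and TypeError conditions documented for execute.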

Note

Tags are recomputed in both training and prediction modes, so prediction samples can be analyzed with the same criteria used in training. In prediction mode the filters come from the loaded binaries rather than being refit.

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List]

Execute tagging operation.

This method:

1. Parses tag configuration (single, list, or dict of filters)
2. Gets samples for the current partition
3. Fits filters on training data (or uses loaded binaries in prediction)
4. Computes tag values (boolean mask from get_mask)
5. Stores tags in dataset’s indexer
6. Returns persisted artifacts for reproducibility
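The fit, mask, and store steps above can be sketched in a few lines. Everything here is an assumption made for illustration: MedianDistanceFilter is a hypothetical filter, run_tag_step is not the real execute method, a plain dict stands in for the dataset's indexer, and a list of (name, filter) pairs stands in for persisted binaries. The polarity of the mask (True meaning "flagged") is also assumed.

```python
from statistics import fmean, median
from typing import Dict, List, Optional, Tuple


class MedianDistanceFilter:
    """Hypothetical filter: flags rows whose mean lies far from the training median."""

    def fit(self, X: List[List[float]]) -> "MedianDistanceFilter":
        means = [fmean(row) for row in X]
        self.center_ = median(means)
        # Median absolute deviation of the row means.
        self.mad_ = median([abs(m - self.center_) for m in means])
        return self

    def get_mask(self, X: List[List[float]]) -> List[bool]:
        # Assumption: True marks a flagged sample.
        limit = 3 * self.mad_
        return [abs(fmean(row) - self.center_) > limit for row in X]


def run_tag_step(
    filters: Dict[str, MedianDistanceFilter],
    X: List[List[float]],
    tags: Dict[str, List[bool]],
    mode: str = "train",
    loaded: Optional[List[Tuple[str, MedianDistanceFilter]]] = None,
) -> List[Tuple[str, MedianDistanceFilter]]:
    """Fit (train) or reuse (predict) each filter, then store its mask as a tag."""
    artifacts = []
    fitted = dict(loaded or [])
    for name, flt in filters.items():
        if mode == "train":
            flt.fit(X)
            artifacts.append((name, flt))  # kept for later prediction runs
        else:
            flt = fitted[name]  # reuse the fitted filter, no refit
        tags[name] = flt.get_mask(X)  # one value per sample, none removed
    return artifacts


X_train = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [100.0, 100.0]]
train_tags: Dict[str, List[bool]] = {}
artifacts = run_tag_step({"is_outlier": MedianDistanceFilter()}, X_train, train_tags)

X_new = [[3.0, 3.0], [50.0, 50.0]]
new_tags: Dict[str, List[bool]] = {}
run_tag_step({"is_outlier": MedianDistanceFilter()}, X_new, new_tags,
             mode="predict", loaded=artifacts)
```

In this toy run the tag column always has one entry per sample, which is the behaviour that distinguishes tag from exclude.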

Parameters:

step_info – Parsed step containing operator and configuration
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index (unused, tagging is dataset-level)
mode – Execution mode ("train" or "predict")
loaded_binaries – Pre-loaded filter binaries for prediction mode
prediction_store – External prediction store (unused)

Returns:

Tuple of (updated_context, persisted_artifacts)

Raises:

ValueError – If no filters are specified
TypeError – If filter is not a SampleFilter instance

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match tag keyword in pipeline.

priority: int = 5

classmethod supports_prediction_mode() → bool

Tags are computed fresh on prediction data.

This allows identifying outliers or special cases in new data for analysis and conditional processing.

classmethod use_multi_source() → bool

Tag operations are dataset-level, not per-source.