nirs4all.controllers.data.tag module

Controller for sample tagging operations.

This controller handles the tag keyword, computing and storing tags on samples without removing them. Tags can later be used for branching, analysis, or conditional processing.

class nirs4all.controllers.data.tag.TagController

Bases: OperatorController

Controller for sample tagging operations.

This controller computes tags on samples using SampleFilter instances and stores the results as tag columns in the dataset’s indexer. Unlike exclude, the tag keyword never removes samples; it only stores computed values for later use.
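The difference between tag and exclude can be illustrated with a plain-Python sketch. It does not call nirs4all: the list-of-dicts table, the fixed threshold rule, and the column name is_outlier are illustrative assumptions, not the library's indexer or filters.

```python
# Toy per-sample table standing in for the dataset's indexer (hypothetical layout).
samples = [{"sample": i, "y": y} for i, y in enumerate([1.0, 1.1, 0.9, 10.0])]

# Any boolean rule works for the illustration; a fixed threshold keeps it short.
mask = [s["y"] > 5.0 for s in samples]

# exclude-style: flagged samples disappear from the table.
after_exclude = [s for s, flagged in zip(samples, mask) if not flagged]

# tag-style: every sample is kept and the mask becomes a new column.
after_tag = [{**s, "is_outlier": flagged} for s, flagged in zip(samples, mask)]
```

After tagging, the table still has four rows, so later steps can decide what to do with the flagged sample.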

Tags can be used for:

  • Analysis and reporting (e.g., identifying outliers)

  • Conditional branching (e.g., {"branch": {"by_tag": "is_outlier"}})

  • Grouping samples for specialized processing
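How a branch keyed on a stored tag might split samples can be sketched as a simple group-by on the tag column. The helper branch_by_tag and the row layout are hypothetical; the actual behavior of the branch controller is not documented on this page.

```python
from collections import defaultdict


def branch_by_tag(rows, tag):
    """Hypothetical helper: group row indices by the value of a stored tag column."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[tag]].append(i)
    return dict(groups)


rows = [
    {"y": 1.0, "is_outlier": False},
    {"y": 1.1, "is_outlier": False},
    {"y": 10.0, "is_outlier": True},
]
branches = branch_by_tag(rows, "is_outlier")
```

Each tag value yields one group of sample indices, and each group could then receive its own processing.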

Pipeline syntax:

    # Single filter (tag name taken from the filter's tag_name or its class name)
    {"tag": YOutlierFilter(method="iqr")}

    # Multiple filters (each stores its own tag)
    {"tag": [YOutlierFilter(), XOutlierFilter()]}

    # Named tags (explicit tag names)
    {"tag": {"outliers": YOutlierFilter(), "leverage": HighLeverageFilter()}}
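The three accepted forms can be normalized into one {tag_name: filter} mapping. This is a minimal sketch under stated assumptions: the classes below are empty stand-ins that only reuse the filter names from the syntax examples, and normalize_tag_config is a hypothetical helper, not nirs4all's parser.

```python
class _StubFilter:
    """Stand-in for a SampleFilter; only carries an optional tag_name."""

    def __init__(self, tag_name=None, **params):
        self.tag_name = tag_name
        self.params = params


# Hypothetical stand-ins named after the filters in the syntax examples.
class YOutlierFilter(_StubFilter):
    pass


class XOutlierFilter(_StubFilter):
    pass


class HighLeverageFilter(_StubFilter):
    pass


def normalize_tag_config(config):
    """Turn any accepted 'tag' value into an ordered {tag_name: filter} dict."""
    if isinstance(config, dict):
        return dict(config)  # explicit names are used as given
    filters = list(config) if isinstance(config, (list, tuple)) else [config]
    # Default name: the filter's tag_name if set, else its class name.
    return {(getattr(f, "tag_name", None) or type(f).__name__): f for f in filters}
```

With this naive rule, two filters of the same class in one list would collide on a single name; how nirs4all resolves that case is not stated here.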

Note

Tags are computed fresh during both training and prediction modes. This allows analyzing prediction samples with the same criteria.
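One way to picture "same criteria on prediction samples": the filter is fitted once on training data, persisted, and reloaded to tag new samples. The sketch assumes pickle as the persistence format and uses a hypothetical IQRFilter working on a generic per-sample score (not y, which may be unavailable at prediction time); nirs4all's actual binary format and filter classes may differ.

```python
import pickle
import statistics


class IQRFilter:
    """Stand-in filter: learns IQR bounds in fit(), flags values outside them in get_mask()."""

    def __init__(self, k=1.5):
        self.k = k
        self.bounds = None

    def fit(self, values):
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        self.bounds = (q1 - self.k * iqr, q3 + self.k * iqr)
        return self

    def get_mask(self, values):
        low, high = self.bounds
        return [not (low <= v <= high) for v in values]


# Train mode: fit on training scores and persist the fitted filter.
train_scores = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
blob = pickle.dumps(IQRFilter().fit(train_scores))

# Predict mode: reload the fitted filter and compute fresh tags for new samples.
restored = pickle.loads(blob)
new_tags = restored.get_mask([1.02, 25.0])
```

The bounds are learned only from training data, so a new sample is flagged by the same rule that was applied during training.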

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List]

Execute tagging operation.

This method:

1. Parses the tag configuration (a single filter, a list, or a dict of filters).
2. Gets the samples for the current partition.
3. Fits the filters on training data (or uses loaded binaries in prediction mode).
4. Computes tag values (the boolean mask returned by get_mask).
5. Stores the tags in the dataset’s indexer.
6. Returns the persisted artifacts for reproducibility.
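The six steps can be walked through in plain Python. Everything here is hypothetical: MedianRatioFilter stands in for a SampleFilter, run_tag_step is not nirs4all's execute(), tags live in a nested dict rather than the dataset's indexer, and the sketch returns only the artifacts rather than (updated_context, persisted_artifacts).

```python
import statistics


class MedianRatioFilter:
    """Stand-in filter: flags values larger than `ratio` times the fitted median."""

    def __init__(self, ratio=2.0):
        self.ratio = ratio
        self.median = None

    def fit(self, values):
        self.median = statistics.median(values)
        return self

    def get_mask(self, values):
        return [v > self.ratio * self.median for v in values]


def run_tag_step(filters, values, sample_ids, tag_columns, mode, loaded=None):
    """Hypothetical walk-through of the tagging steps (not nirs4all's execute())."""
    artifacts = []
    selected = [values[i] for i in sample_ids]        # step 2: samples of the partition
    for name, flt in filters.items():                  # step 1: already-parsed {name: filter}
        if mode == "predict":
            flt = loaded[name]                         # step 3: reuse the fitted filter
        else:
            flt.fit(selected)                          # step 3: fit on training samples
            artifacts.append((name, flt))
        mask = flt.get_mask(selected)                  # step 4: boolean mask
        column = tag_columns.setdefault(name, {})
        for sample_id, flag in zip(sample_ids, mask):  # step 5: store, never remove
            column[sample_id] = flag
    return artifacts                                   # step 6: fitted filters to persist


train_tags, pred_tags = {}, {}
artifacts = run_tag_step({"high": MedianRatioFilter()}, [1.0, 1.2, 0.8, 5.0],
                         [0, 1, 2, 3], train_tags, mode="train")
run_tag_step({"high": MedianRatioFilter()}, [1.0, 3.0], [0, 1], pred_tags,
             mode="predict", loaded=dict(artifacts))
```

In prediction mode the filter passed in is ignored in favor of the loaded one, so the training median (1.1) still defines the threshold.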

Parameters:
  • step_info – Parsed step containing operator and configuration

  • dataset – Dataset to operate on

  • context – Pipeline execution context

  • runtime_context – Runtime infrastructure context

  • source – Data source index (unused; tagging is dataset-level)

  • mode – Execution mode (“train” or “predict”)

  • loaded_binaries – Pre-loaded filter binaries for prediction mode

  • prediction_store – External prediction store (unused)

Returns:

Tuple of (updated_context, persisted_artifacts)

Raises:
  • ValueError – If no filters are specified

  • TypeError – If a filter is not a SampleFilter instance
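The two documented error conditions can be mirrored by a small validation step. SampleFilter below is a stand-in base class with an assumed interface, and validate_filters is a hypothetical helper; the real error messages are not documented here.

```python
class SampleFilter:
    """Stand-in for nirs4all's SampleFilter base class (assumed fit/get_mask interface)."""

    def fit(self, values):
        return self

    def get_mask(self, values):
        raise NotImplementedError


def validate_filters(filters):
    """Hypothetical check mirroring the documented Raises section."""
    if not filters:
        raise ValueError("tag step requires at least one filter")
    for name, flt in filters.items():
        if not isinstance(flt, SampleFilter):
            raise TypeError(f"filter {name!r} is {type(flt).__name__}, expected a SampleFilter")
```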

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match the tag keyword in a pipeline step.
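A keyword match of this kind can be sketched as a classmethod that compares the step keyword with "tag". TagControllerSketch is hypothetical, and the comparison is an assumption about what matches() checks; only the signature and the priority value come from this page.

```python
from typing import Any


class TagControllerSketch:
    """Hypothetical stand-in; not the real TagController."""

    priority: int = 5  # value documented for TagController

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        # Assumption: a step is handled when its keyword is exactly "tag".
        return keyword == "tag"
```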

priority: int = 5

classmethod supports_prediction_mode() → bool

Tags are computed fresh on prediction data.

This allows identifying outliers or special cases in new data for analysis and conditional processing.

classmethod use_multi_source() → bool

Tag operations are dataset-level, not per-source.
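"Dataset-level" means one tag value per sample, shared by every data source. The sketch assumes two hypothetical sources describing the same samples row by row, and shows a single tag column selecting matching rows from both; it does not use nirs4all's dataset API.

```python
# Two hypothetical sources (e.g. two instruments) describing the same 3 samples.
sources = {
    "nir": [[0.1, 0.2], [0.1, 0.3], [0.9, 0.8]],
    "mir": [[1.0], [1.1], [4.0]],
}

# One dataset-level tag value per sample, shared by every source.
is_outlier = [False, False, True]

# The same mask selects the same samples from each source.
kept = {
    name: [row for row, flag in zip(rows, is_outlier) if not flag]
    for name, rows in sources.items()
}
```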