nirs4all.controllers.data.tag module
Controller for sample tagging operations.
This controller handles the tag keyword, computing and storing tags on samples without removing them. Tags can later be used for branching, analysis, or conditional processing.
- class nirs4all.controllers.data.tag.TagController
Bases: OperatorController
Controller for sample tagging operations.
This controller computes tags on samples using SampleFilter instances and stores the results as tag columns in the dataset’s indexer. Unlike exclude, the tag keyword never removes samples - it only stores computed values for later use.
Tags can be used for:
- Analysis and reporting (e.g., identifying outliers)
- Conditional branching (e.g., branch: {by_tag: "is_outlier"})
- Grouping samples for specialized processing
- Pipeline syntax:
# Single filter (tag name from filter's tag_name or class name)
{"tag": YOutlierFilter(method="iqr")}
# Multiple filters (each stores its own tag)
{"tag": [YOutlierFilter(), XOutlierFilter()]}
# Named tags (explicit tag names)
{"tag": {"outliers": YOutlierFilter(), "leverage": HighLeverageFilter()}}
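The three accepted forms can be reduced to one mapping of tag name to filter. The following is a minimal sketch of that normalization, not the library's code: SampleFilter, YOutlierFilter and XOutlierFilter are hypothetical stand-ins defined here, and normalize_tag_config is an illustrative helper name. The naming rule (explicit name, then the filter's tag_name, then the class name) and the two error cases follow the description on this page.

```python
from typing import Any, Dict


class SampleFilter:
    """Hypothetical stand-in for the library's SampleFilter base class."""

    tag_name = None  # optional explicit tag name


class YOutlierFilter(SampleFilter):
    """Placeholder filter without an explicit tag name."""


class XOutlierFilter(SampleFilter):
    """Placeholder filter that declares its own tag name."""

    tag_name = "x_outlier"


def normalize_tag_config(config: Any) -> Dict[str, SampleFilter]:
    """Turn a single filter, a list of filters, or a dict into {tag_name: filter}."""
    if isinstance(config, dict):
        pairs = list(config.items())
    elif isinstance(config, (list, tuple)):
        pairs = [(None, f) for f in config]
    else:
        pairs = [(None, config)]

    if not pairs:
        raise ValueError("No filters specified for the tag step")

    named = {}
    for name, flt in pairs:
        if not isinstance(flt, SampleFilter):
            raise TypeError(f"Expected a SampleFilter, got {type(flt).__name__}")
        # Explicit dict key first, then the filter's tag_name, then its class name.
        named[name or flt.tag_name or type(flt).__name__] = flt
    return named


single = normalize_tag_config(YOutlierFilter())
several = normalize_tag_config([YOutlierFilter(), XOutlierFilter()])
explicit = normalize_tag_config({"outliers": YOutlierFilter()})
```

Reducing every form to a dict up front keeps the rest of a tagging step free of special cases for the input shape.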
Note
Tags are computed fresh during both training and prediction modes. This allows analyzing prediction samples with the same criteria.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List]
Execute tagging operation.
This method:
1. Parses tag configuration (single, list, or dict of filters)
2. Gets samples for the current partition
3. Fits filters on training data (or uses loaded binaries in prediction)
4. Computes tag values (boolean mask from get_mask)
5. Stores tags in dataset’s indexer
6. Returns persisted artifacts for reproducibility
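The fit, mask, and store sequence can be illustrated with plain NumPy. This is a rough sketch under stated assumptions rather than the controller's implementation: ToyOutlierFilter and run_tag_step are hypothetical names, the tags are kept in an ordinary dict instead of the dataset's indexer, and True is taken here to mean "flagged", which may not match the polarity of the library's get_mask.

```python
import numpy as np


class ToyOutlierFilter:
    """Hypothetical IQR filter on the per-sample mean; not a library class."""

    def __init__(self, factor: float = 1.5):
        self.factor = factor
        self.low_ = None
        self.high_ = None

    def fit(self, X):
        # Learn the acceptance bounds from the training samples only.
        means = X.mean(axis=1)
        q1, q3 = np.percentile(means, [25, 75])
        self.low_ = q1 - self.factor * (q3 - q1)
        self.high_ = q3 + self.factor * (q3 - q1)
        return self

    def get_mask(self, X):
        # Assumption for this sketch: True marks a flagged sample.
        means = X.mean(axis=1)
        return (means < self.low_) | (means > self.high_)


def run_tag_step(filters, X, mode="train", fitted=None):
    """Return (tags, artifacts); fit in train mode, reuse fitted filters otherwise."""
    tags, artifacts = {}, []
    for name, flt in filters.items():
        if mode == "train":
            flt.fit(X)
        else:
            flt = fitted[name]  # stands in for the loaded binaries
        tags[name] = flt.get_mask(X)  # one boolean per sample, nothing removed
        artifacts.append((name, flt))
    return tags, artifacts


X_train = np.array([[1.0], [1.1], [0.9], [1.2], [1.0], [9.0]])
tags, artifacts = run_tag_step({"is_outlier": ToyOutlierFilter()}, X_train)

# Prediction mode recomputes the tag with the bounds learned during training.
X_new = np.array([[1.05], [5.0]])
pred_tags, _ = run_tag_step(
    {"is_outlier": ToyOutlierFilter()}, X_new, mode="predict", fitted=dict(artifacts)
)
```

The tag array always has one entry per input sample, which is the practical difference from an exclusion step: downstream code decides what to do with flagged samples.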
- Parameters:
step_info – Parsed step containing operator and configuration
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index (unused, tagging is dataset-level)
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded filter binaries for prediction mode
prediction_store – External prediction store (unused)
- Returns:
Tuple of (updated_context, persisted_artifacts)
- Raises:
ValueError – If no filters are specified
TypeError – If filter is not a SampleFilter instance
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Match tag keyword in pipeline.
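A plausible reading of this method, shown only as a sketch: the controller claims a pipeline step when the step's keyword is "tag". TagControllerSketch is a hypothetical stand-in, and the assumption that matching depends on the keyword alone (ignoring step and operator) is not confirmed by this page.

```python
from typing import Any


class TagControllerSketch:
    """Hypothetical stand-in; not the library's TagController."""

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        # Assumption: the step is claimed purely by its keyword.
        return keyword == "tag"


claims_tag = TagControllerSketch.matches({"tag": None}, None, "tag")
claims_exclude = TagControllerSketch.matches({"exclude": None}, None, "exclude")
```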