nirs4all.controllers.data.outlier_excluder module
Outlier Excluder Controller for sample-based branching.
This controller creates multiple branches, each with a different outlier exclusion strategy. Each branch trains on a different subset of samples, determined by the outlier detection method applied in that branch.
This is useful for comparing how different outlier handling approaches affect model performance without creating separate pipeline runs.
Example
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.model_selection import ShuffleSplit
>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": {
...         "by": "outlier_excluder",
...         "strategies": [
...             None,  # No exclusion (baseline)
...             {"method": "isolation_forest", "contamination": 0.05},
...             {"method": "mahalanobis", "threshold": 3.0},
...             {"method": "leverage", "threshold": 2.0},
...             {"method": "lof", "contamination": 0.05},
...         ],
...     }},
...     PLSRegression(n_components=10),
... ]
- class nirs4all.controllers.data.outlier_excluder.OutlierExcluderController
Bases: OperatorController
Controller for sample-based branching with outlier exclusion strategies.
This controller creates multiple branches, each with a different outlier exclusion strategy. Samples identified as outliers are excluded from training in that branch, but predictions still cover all samples.
- Key behaviors:
Each branch applies a different outlier detection method
Outlier detection runs on training data only
Exclusion is per-branch (tracked in context, not in indexer)
Predictions include exclusion metadata for analysis
Branch 0 with None strategy serves as baseline
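The behaviors listed above can be illustrated outside nirs4all with a minimal NumPy sketch. This is not the controller's implementation: `mahalanobis_keep_mask`, the least-squares model, and the `branches` list are hypothetical stand-ins, and `X` plays the role of the training partition. It only shows the idea that each branch fits on its own subset of samples while predictions still cover every sample.

```python
import numpy as np

def mahalanobis_keep_mask(X_train, threshold=3.0):
    """True for training samples whose Mahalanobis distance is within threshold."""
    centered = X_train - X_train.mean(axis=0)
    # Pseudo-inverse tolerates collinear features, which are common in spectra.
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.clip(d2, 0.0, None)) <= threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 10.0  # five obvious outliers
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hypothetical strategy list, mirroring the shape used in the module example.
strategies = [None, {"method": "mahalanobis", "threshold": 3.0}]

branches = []
for branch_id, strategy in enumerate(strategies):
    if strategy is None:
        keep = np.ones(len(X), dtype=bool)  # baseline: nothing excluded
    else:
        keep = mahalanobis_keep_mask(X, threshold=strategy["threshold"])
    # Fit on the kept samples only, then predict for all samples.
    coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    branches.append({
        "branch_id": branch_id,
        "excluded": np.flatnonzero(~keep),  # exclusion metadata per branch
        "predictions": X @ coef,
    })
```

Keeping the exclusion mask inside each branch record, rather than modifying the shared data, is what lets several strategies coexist in one run.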
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, StepOutput]
Execute the outlier excluder branch step.
Creates branches for each outlier exclusion strategy. In train mode, applies outlier detection and marks exclusions. In predict mode, reconstructs branch contexts without applying exclusions.
- Parameters:
step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions
- Returns:
Tuple of (updated_context, StepOutput with collected artifacts)
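The difference between the two modes can be sketched in plain Python. Everything here is an assumption made for illustration: `BranchContext`, `build_branches`, and the `detect` callback are hypothetical and do not correspond to nirs4all's `ExecutionContext` or `StepOutput`. The sketch only mirrors the described behavior: train mode records exclusions per branch, while predict mode rebuilds the same branches without excluding anything.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

@dataclass
class BranchContext:
    """Hypothetical stand-in for a per-branch execution context."""
    branch_id: int
    strategy: Optional[Dict[str, Any]]
    excluded: List[int] = field(default_factory=list)

def build_branches(
    strategies: List[Optional[Dict[str, Any]]],
    mode: str,
    detect: Optional[Callable[[Dict[str, Any]], List[int]]] = None,
) -> List[BranchContext]:
    """Create one context per strategy; only train mode marks exclusions."""
    branches = []
    for branch_id, strategy in enumerate(strategies):
        ctx = BranchContext(branch_id=branch_id, strategy=strategy)
        if mode == "train" and strategy is not None and detect is not None:
            ctx.excluded = sorted(detect(strategy))
        branches.append(ctx)
    return branches

# Fake detector for the demo: pretends samples 3 and 7 are outliers.
def fake_detect(strategy: Dict[str, Any]) -> List[int]:
    return [7, 3]

strategies = [None, {"method": "lof", "contamination": 0.05}]
train_branches = build_branches(strategies, mode="train", detect=fake_detect)
predict_branches = build_branches(strategies, mode="predict")
```

In this sketch both calls return the same branch ids in the same order, so predictions made later can be matched to the branch that produced the model.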
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the step matches the outlier excluder branch pattern.
- Matches:
{"branch": {"by": "outlier_excluder", "strategies": […]}}
- Parameters:
step – Original step configuration
operator – Deserialized operator
keyword – Step keyword
- Returns:
True if this is an outlier_excluder branch definition.
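A check of this kind might look like the following standalone function. It is a sketch under the assumption that only the `step` dictionary needs inspecting; the name `looks_like_outlier_excluder_branch` is hypothetical, and the actual `matches` classmethod may also use `operator` and `keyword`.

```python
from typing import Any

def looks_like_outlier_excluder_branch(step: Any) -> bool:
    """Return True for steps shaped like {"branch": {"by": "outlier_excluder", ...}}."""
    if not isinstance(step, dict):
        return False
    branch = step.get("branch")
    # A branch defined as a list, or with another "by" value, does not match.
    return isinstance(branch, dict) and branch.get("by") == "outlier_excluder"

matching_step = {"branch": {"by": "outlier_excluder", "strategies": [None]}}
other_step = {"branch": {"by": "something_else"}}
```

Checking the types before reading keys keeps the predicate safe to call on arbitrary pipeline steps, such as estimator instances or plain strings.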