nirs4all.controllers.data.outlier_excluder module

Outlier Excluder Controller for sample-based branching.

This controller enables creating multiple branches with different outlier exclusion strategies. Each branch trains on a different subset of samples based on the outlier detection method applied.

This is useful for comparing how different outlier handling approaches affect model performance without creating separate pipeline runs.

Example

>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.model_selection import ShuffleSplit
>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": {
...         "by": "outlier_excluder",
...         "strategies": [
...             None,  # No exclusion (baseline)
...             {"method": "isolation_forest", "contamination": 0.05},
...             {"method": "mahalanobis", "threshold": 3.0},
...             {"method": "leverage", "threshold": 2.0},
...             {"method": "lof", "contamination": 0.05},
...         ],
...     }},
...     PLSRegression(n_components=10),
... ]
class nirs4all.controllers.data.outlier_excluder.OutlierExcluderController

Bases: OperatorController

Controller for sample-based branching with outlier exclusion strategies.

This controller creates multiple branches, each with a different outlier exclusion strategy. Samples identified as outliers are excluded from training in that branch, but predictions still cover all samples.

Key behaviors:
  • Each branch applies a different outlier detection method

  • Outlier detection runs on training data only

  • Exclusion is per-branch (tracked in context, not in indexer)

  • Predictions include exclusion metadata for analysis

  • Branch 0 with None strategy serves as baseline
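The behaviors listed above can be illustrated with a minimal sketch. It assumes a NumPy-only Mahalanobis detector; the helper `mahalanobis_exclusion_mask`, the toy data, and the `kept_per_branch` dictionary are hypothetical names introduced for illustration and are not part of nirs4all. Detection runs only on the training rows, and the `None` strategy keeps every training sample.

```python
import numpy as np


def mahalanobis_exclusion_mask(X_train, threshold=3.0):
    """Hypothetical helper: True marks a training row to exclude."""
    X_train = np.asarray(X_train, dtype=float)
    centered = X_train - X_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0)) > threshold


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [25.0, 25.0, 25.0]   # one obvious outlier in the training part
train_idx = np.arange(40)   # rows 40..49 play the role of held-out samples

strategies = [None, {"method": "mahalanobis", "threshold": 3.0}]
kept_per_branch = {}
for branch_id, strategy in enumerate(strategies):
    if strategy is None:
        # Baseline branch: nothing is excluded.
        mask = np.zeros(len(train_idx), dtype=bool)
    else:
        # Detection sees training rows only, never the held-out rows.
        mask = mahalanobis_exclusion_mask(X[train_idx], strategy["threshold"])
    kept_per_branch[branch_id] = train_idx[~mask]
```

Each branch ends up with its own list of retained training indices, while the dataset itself is left unchanged. This mirrors the idea of tracking exclusions per branch rather than in a shared index.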

priority

Controller priority (set to 4 so that it runs before the regular branch controller).

Type:

int

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, StepOutput]

Execute the outlier excluder branch step.

Creates branches for each outlier exclusion strategy. In train mode, applies outlier detection and marks exclusions. In predict mode, reconstructs branch contexts without applying exclusions.

Parameters:
  • step_info – Parsed step containing branch definitions

  • dataset – Dataset to operate on

  • context – Pipeline execution context

  • runtime_context – Runtime infrastructure context

  • source – Data source index

  • mode – Execution mode ("train" or "predict")

  • loaded_binaries – Pre-loaded binary objects for prediction mode

  • prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput with collected artifacts)

classmethod matches(step: Any, operator: Any, keyword: str) -> bool

Check if the step matches the outlier excluder branch pattern.

Matches:

{"branch": {"by": "outlier_excluder", "strategies": [...]}}

Parameters:
  • step – Original step configuration

  • operator – Deserialized operator

  • keyword – Step keyword

Returns:

True if this is an outlier_excluder branch definition.
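As a rough illustration of the pattern described above, the following sketch checks whether a step has the documented shape. The function `looks_like_outlier_excluder_step` is a hypothetical stand-in; the real `matches` classmethod also receives `operator` and `keyword`, and its exact logic is not shown in this page.

```python
from typing import Any


def looks_like_outlier_excluder_step(step: Any) -> bool:
    """Hypothetical shape check, not the library's implementation."""
    if not isinstance(step, dict):
        return False
    branch = step.get("branch")
    if not isinstance(branch, dict):
        return False
    # Assumption: "strategies" must be a list for the step to qualify.
    return (
        branch.get("by") == "outlier_excluder"
        and isinstance(branch.get("strategies"), list)
    )


matching_step = {
    "branch": {
        "by": "outlier_excluder",
        "strategies": [None, {"method": "lof", "contamination": 0.05}],
    }
}
other_step = {"branch": {"by": "something_else", "strategies": []}}
```

Here `looks_like_outlier_excluder_step(matching_step)` is true, while `other_step` and non-dictionary steps are rejected.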

priority: int = 4
classmethod supports_prediction_mode() -> bool

Outlier excluder should execute in prediction mode.

In prediction mode, we need to reconstruct the branch contexts but NOT apply sample exclusion (we predict on all samples).
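The difference between the two modes can be sketched as follows. This is a simplified illustration under stated assumptions: `samples_for_branch` and its arguments are hypothetical names, and the actual controller works with execution contexts rather than plain lists.

```python
from typing import Iterable, List, Optional


def samples_for_branch(
    mode: str,
    sample_ids: Iterable[int],
    excluded_ids: Optional[Iterable[int]] = None,
) -> List[int]:
    """Hypothetical helper: pick the samples a branch works on."""
    sample_ids = list(sample_ids)
    if mode == "train" and excluded_ids is not None:
        excluded = set(excluded_ids)
        # Training drops the samples flagged for this branch.
        return [s for s in sample_ids if s not in excluded]
    # Prediction (or a baseline branch) keeps every sample.
    return sample_ids


train_samples = samples_for_branch("train", range(6), excluded_ids=[1, 4])
predict_samples = samples_for_branch("predict", range(6), excluded_ids=[1, 4])
```

In this sketch `train_samples` is `[0, 2, 3, 5]`, whereas `predict_samples` still contains all six samples.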

classmethod use_multi_source() -> bool

The outlier excluder operates at the dataset level.

class nirs4all.controllers.data.outlier_excluder.OutlierExclusionResult(strategy_name: str, n_total: int, n_excluded: int, excluded_indices: List[int], exclusion_mask: ndarray)

Bases: object

Result of applying an outlier exclusion strategy.

excluded_indices: List[int]
exclusion_mask: ndarray
n_excluded: int
n_total: int
strategy_name: str
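To show how these fields relate to one another, here is a minimal sketch using a stand-in dataclass that mirrors the documented attributes. `ExclusionResultSketch` and `summarize_exclusion` are hypothetical names, and treating `excluded_indices` as the sample identifiers selected by the mask is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Sequence

import numpy as np


@dataclass
class ExclusionResultSketch:
    """Stand-in with the same fields as the documented result class."""
    strategy_name: str
    n_total: int
    n_excluded: int
    excluded_indices: List[int]
    exclusion_mask: np.ndarray


def summarize_exclusion(
    strategy_name: str, sample_indices: Sequence[int], mask: Sequence[bool]
) -> ExclusionResultSketch:
    """Hypothetical helper: derive counts and indices from a boolean mask."""
    mask_arr = np.asarray(mask, dtype=bool)
    # Assumption: excluded_indices holds the sample ids where the mask is True.
    excluded = [int(i) for i, flag in zip(sample_indices, mask_arr) if flag]
    return ExclusionResultSketch(
        strategy_name=strategy_name,
        n_total=int(mask_arr.size),
        n_excluded=len(excluded),
        excluded_indices=excluded,
        exclusion_mask=mask_arr,
    )


result = summarize_exclusion(
    "mahalanobis", [10, 11, 12, 13], [False, True, False, True]
)
```

With this input, `result.n_total` is 4, `result.n_excluded` is 2, and `result.excluded_indices` is `[11, 13]`.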