nirs4all.controllers.data.sample_partitioner module

Sample Partitioner Controller for sample-based branching.

This controller partitions the dataset into multiple branches based on a sample filter (e.g., outlier detection). Unlike OutlierExcluderController, which only excludes samples from training, this controller creates separate branches, each containing a different subset of samples.

For example, with Y-outlier detection:
  • Branch “outliers”: Contains ONLY the outlier samples

  • Branch “inliers”: Contains ONLY the non-outlier samples

This enables training separate models for different data subsets and comparing their performance.
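The split above can be sketched in a few lines. This is a minimal illustration, not the controller's code: it assumes that `threshold` in a `y_outlier` filter is a z-score cutoff on the target values, and the helper names `y_outlier_keep_mask` and `partition_indices` are hypothetical. Following the convention described for the class, the filter returns True for inliers and False for outliers.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def y_outlier_keep_mask(y: Sequence[float], threshold: float = 3.0) -> List[bool]:
    """Hypothetical Y-outlier filter: True = inlier (keep), False = outlier.

    ASSUMPTION: `threshold` is a z-score cutoff on the target values.
    """
    mu = mean(y)
    sigma = pstdev(y)
    if sigma == 0:
        # All targets identical: no sample can be an outlier.
        return [True] * len(y)
    return [abs(v - mu) / sigma <= threshold for v in y]


def partition_indices(keep: Sequence[bool]) -> Dict[str, List[int]]:
    """Split sample indices into disjoint 'outliers' and 'inliers' branches."""
    return {
        "outliers": [i for i, k in enumerate(keep) if not k],
        "inliers": [i for i, k in enumerate(keep) if k],
    }
```

For example, with nineteen targets equal to 1.0 and one equal to 100.0, only the last sample lands in the "outliers" branch.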

Example

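>>> from sklearn.model_selection import ShuffleSplit
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipeline = [
...     ShuffleSplit(n_splits=5),  # cross-validation folds
...     {"branch": {
...         "by": "sample_partitioner",  # selects this controller
...         "filter": {"method": "y_outlier", "threshold": 3.0},
...     }},
...     PLSRegression(n_components=10),  # trained separately in each branch
... ]
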
class nirs4all.controllers.data.sample_partitioner.SamplePartitionerController

Bases: OperatorController

Controller for sample-based branching via partitioning.

This controller creates two branches by partitioning samples based on a filter (e.g., outlier detection). Each branch contains a different subset of samples:

  • “outliers” branch: samples for which the filter returns False (outliers)

  • “inliers” branch: samples for which the filter returns True (non-outliers)

Unlike OutlierExcluderController, which only excludes samples from training, this controller truly partitions the samples, so each branch trains and predicts only on its own subset.
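To make the contrast between exclusion and partitioning concrete, the following sketch lists which sample indices each approach would use, given the same keep-mask (True = inlier). It assumes that excluded samples are still predicted under exclusion, ignores cross-validation folds for simplicity, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def exclusion_view(keep: Sequence[bool]) -> Dict[str, List[int]]:
    """Exclusion: outliers are dropped from training only;
    every sample is still predicted (assumed behavior)."""
    all_idx = list(range(len(keep)))
    return {"train": [i for i in all_idx if keep[i]], "predict": all_idx}


def partition_view(keep: Sequence[bool]) -> Dict[str, Dict[str, List[int]]]:
    """Partitioning: each branch trains AND predicts only on its own subset."""
    inliers = [i for i, k in enumerate(keep) if k]
    outliers = [i for i, k in enumerate(keep) if not k]
    return {
        "inliers": {"train": inliers, "predict": inliers},
        "outliers": {"train": outliers, "predict": outliers},
    }
```

The difference shows up in the "predict" lists: under exclusion an outlier is still scored by the inlier-trained model, whereas under partitioning it is only ever seen by the "outliers" branch.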

Key behaviors:
  • Each branch contains a disjoint subset of samples

  • Samples are partitioned, not excluded

  • Models train and predict only on their partition

  • Supports Y-outlier and X-outlier detection methods
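The key behaviors can be illustrated with a stand-in model. This sketch assumes a trivial mean-of-target "model" in place of a real estimator such as PLSRegression, and `fit_predict_per_branch` is a hypothetical helper. It rejects overlapping branches, then fits and predicts each branch using only its own indices.

```python
from statistics import mean
from typing import Dict, List, Sequence


def fit_predict_per_branch(
    y: Sequence[float], branches: Dict[str, List[int]]
) -> Dict[str, Dict[int, float]]:
    """Fit a trivial mean-of-y 'model' per branch and predict only that branch."""
    # Partitions must be disjoint: each sample belongs to at most one branch.
    seen = [i for idx in branches.values() for i in idx]
    if len(seen) != len(set(seen)):
        raise ValueError("branches must be disjoint")

    predictions: Dict[str, Dict[int, float]] = {}
    for name, idx in branches.items():
        if not idx:
            predictions[name] = {}  # empty partition: nothing to train or predict
            continue
        branch_mean = mean(y[i] for i in idx)  # "training" sees this branch only
        predictions[name] = {i: branch_mean for i in idx}  # predict this branch only
    return predictions
```

With `y = [1.0, 10.0, 3.0, 5.0]` and branches `{"inliers": [0, 2, 3], "outliers": [1]}`, the inlier model predicts 3.0 for its samples and the outlier model predicts 10.0, so neither branch's fit is influenced by the other's data.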

priority

Controller priority (set to 3 so that it runs before the outlier excluder).

Type:

int
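If more than one controller could accept a step, a registry needs an order in which to try them, which is what the priority value governs. The sketch below assumes that lower values are tried first and the first match wins; under that assumption, "3 runs before the outlier excluder" would mean the outlier excluder has a larger value, which this page does not state. `select_controller`, the registry layout, and the `catch_all_branch` entry are all hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# (controller name, priority, matcher); the layout is hypothetical.
Entry = Tuple[str, int, Callable[[Any], bool]]


def select_controller(step: Any, registry: List[Entry]) -> Optional[str]:
    """Return the first controller whose matcher accepts `step`.

    ASSUMPTION: lower priority values are tried first.
    """
    for name, _priority, matcher in sorted(registry, key=lambda entry: entry[1]):
        if matcher(step):
            return name
    return None


def _is_partitioner(step: Any) -> bool:
    branch = step.get("branch") if isinstance(step, dict) else None
    return isinstance(branch, dict) and branch.get("by") == "sample_partitioner"


REGISTRY: List[Entry] = [
    # Hypothetical broad controller that accepts any {"branch": ...} step.
    ("catch_all_branch", 10, lambda step: isinstance(step, dict) and "branch" in step),
    ("sample_partitioner", 3, _is_partitioner),
]
```

Although `catch_all_branch` is listed first, sorting by priority lets the more specific `sample_partitioner` claim its steps, while other branch steps still fall through to the broader controller.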

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, StepOutput]

Execute the sample partitioner branch step.

Creates two branches: one for outliers and one for inliers. Each branch contains only its subset of samples.

Parameters:
  • step_info – Parsed step containing branch definitions

  • dataset – Dataset to operate on

  • context – Pipeline execution context

  • runtime_context – Runtime infrastructure context

  • source – Data source index

  • mode – Execution mode (“train” or “predict”)

  • loaded_binaries – Pre-loaded binary objects for prediction mode

  • prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput with collected artifacts)
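A rough sketch of the context-building part of `execute`, using plain dicts in place of ExecutionContext. The keys `branch_id`, `branch_name`, and `sample_indices` and the function name are assumptions, not the real ExecutionContext fields; the order (outliers first, then inliers) follows the description of this method. Each branch receives a deep copy so that steps in one branch cannot mutate state seen by the other.

```python
from copy import deepcopy
from typing import Any, Dict, List, Sequence


def build_branch_contexts(
    base_context: Dict[str, Any], keep: Sequence[bool]
) -> List[Dict[str, Any]]:
    """Create one context per branch, each restricted to its own sample indices.

    Context keys used here are hypothetical.
    """
    partitions = {
        "outliers": [i for i, k in enumerate(keep) if not k],
        "inliers": [i for i, k in enumerate(keep) if k],
    }
    contexts = []
    for branch_id, (name, indices) in enumerate(partitions.items()):
        ctx = deepcopy(base_context)  # branches must not share mutable state
        ctx.update(branch_id=branch_id, branch_name=name, sample_indices=indices)
        contexts.append(ctx)
    return contexts
```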

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Check if the step matches the sample_partitioner branch pattern.

Matches:

{"branch": {"by": "sample_partitioner", "filter": {…}}}

Parameters:
  • step – Original step configuration

  • operator – Deserialized operator

  • keyword – Step keyword

Returns:

True if this is a sample_partitioner branch definition.
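A minimal, hypothetical version of the pattern check, written as a free function rather than a classmethod. It only tests the `"by"` key; whether the real `matches` also requires a `"filter"` entry is not stated on this page.

```python
from typing import Any


def is_sample_partitioner_step(step: Any) -> bool:
    """Return True for steps shaped like {"branch": {"by": "sample_partitioner", ...}}."""
    if not isinstance(step, dict):
        return False  # e.g. a bare operator such as an estimator instance
    branch = step.get("branch")
    return isinstance(branch, dict) and branch.get("by") == "sample_partitioner"
```

A list-valued `"branch"` (as other branching styles might use) or a different `"by"` value is rejected, leaving those steps to other controllers.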

priority: int = 3
classmethod supports_prediction_mode() → bool

Sample partitioner should execute in prediction mode.

In prediction mode, we need to reconstruct the branch contexts and apply the same sample partitioning.
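Applying "the same sample partitioning" in prediction mode suggests that whatever the filter learned during training is saved and reused rather than refitted on new data. The sketch below shows that fit-then-reuse idea with a hypothetical X-based score (the mean intensity of each spectrum, z-scored against training statistics); `FittedXFilter` and `fit_x_filter` are illustrative names, and this is not the library's X-outlier method.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class FittedXFilter:
    """Hypothetical filter state saved at train time and reloaded for prediction."""
    center: float
    scale: float
    threshold: float

    def keep_mask(self, X: Sequence[Sequence[float]]) -> List[bool]:
        # Score each spectrum against TRAINING statistics so the same rule
        # is applied in both train and predict mode (True = inlier).
        return [abs(mean(row) - self.center) / self.scale <= self.threshold for row in X]


def fit_x_filter(X_train: Sequence[Sequence[float]], threshold: float = 3.0) -> FittedXFilter:
    scores = [mean(row) for row in X_train]
    scale = pstdev(scores) or 1.0  # avoid division by zero for constant data
    return FittedXFilter(center=mean(scores), scale=scale, threshold=threshold)
```

In train mode, `fit_x_filter` would run once and its state would be stored with the other artifacts; in predict mode, the stored `FittedXFilter` would be reloaded and `keep_mask` applied to the new samples, so each one is routed to the same branch rule used in training.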

classmethod use_multi_source() → bool

The sample partitioner operates at the dataset level.