nirs4all.controllers.data.sample_partitioner module

Sample Partitioner Controller for sample-based branching.

This controller partitions the dataset into multiple branches based on a sample filter (e.g., outlier detection). Unlike OutlierExcluderController, which excludes samples from training, this controller creates separate branches, each containing a different subset of samples.

For example, with Y-outlier detection:

Branch “outliers”: Contains ONLY the outlier samples
Branch “inliers”: Contains ONLY the non-outlier samples

This enables training separate models for different data subsets and comparing their performance.

Example

>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": {
...         "by": "sample_partitioner",
...         "filter": {"method": "y_outlier", "threshold": 3.0},
...     }},
...     PLSRegression(n_components=10),
... ]
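To make the filter in the example above concrete, here is a minimal standard-library sketch of how a Y-outlier mask could be computed. It assumes that "threshold" is a z-score cutoff on the target values; the source does not state which rule nirs4all uses, and the helper name `y_outlier_mask` is hypothetical, not part of the library.

```python
from statistics import mean, pstdev


def y_outlier_mask(y, threshold=3.0):
    """Return True for inliers and False for outliers.

    Assumption: a sample is an outlier when its absolute z-score
    exceeds `threshold`. This is an illustrative rule only.
    """
    mu = mean(y)
    sigma = pstdev(y)
    if sigma == 0:
        # All targets are identical, so nothing can be flagged.
        return [True] * len(y)
    return [abs(value - mu) / sigma <= threshold for value in y]


# Eleven ordinary target values and one extreme value at the end.
y = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1, 10.3, 9.7, 10.0, 10.4, 9.6, 50.0]
mask = y_outlier_mask(y, threshold=3.0)
outlier_indices = [i for i, keep in enumerate(mask) if not keep]
```

With this data only the last sample is flagged, so it would land in the "outliers" branch and the remaining eleven in the "inliers" branch.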

class nirs4all.controllers.data.sample_partitioner.SamplePartitionerController[source]

Bases: OperatorController

Controller for sample-based branching via partitioning.

This controller creates two branches by partitioning samples based on a filter (e.g., outlier detection). Each branch contains a different subset of samples:

“outliers” branch: samples for which the filter returns False (outliers)

“inliers” branch: samples for which the filter returns True (non-outliers)

Unlike OutlierExcluderController, which only excludes samples from training, this controller truly partitions the samples, so each branch trains and predicts only on its own subset.

Key behaviors:

Each branch contains a disjoint subset of samples
Samples are partitioned, not excluded
Models train and predict only on their partition
Supports Y-outlier and X-outlier detection methods
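The key behaviors listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the controller's implementation: the helper `partition_by_mask` is hypothetical, the mask follows the convention described above (True for inliers, False for outliers), and the per-branch "model" is just the mean of the targets in that branch.

```python
def partition_by_mask(mask):
    """Split sample indices into two disjoint branches.

    True in `mask` sends the sample to "inliers", False to "outliers".
    """
    branches = {"inliers": [], "outliers": []}
    for index, keep in enumerate(mask):
        branches["inliers" if keep else "outliers"].append(index)
    return branches


mask = [True, True, False, True, False, True]
y = [1.0, 1.5, 9.0, 0.5, 11.0, 1.0]

branches = partition_by_mask(mask)

# Each branch "trains" only on its own samples (here: a mean predictor).
branch_models = {
    name: sum(y[i] for i in indices) / len(indices)
    for name, indices in branches.items()
    if indices
}

# The two branches never share a sample and together cover every sample.
shared = set(branches["inliers"]) & set(branches["outliers"])
covered = sorted(branches["inliers"] + branches["outliers"])
```

Because every sample is assigned to exactly one branch, nothing is dropped; this is the difference between partitioning and exclusion.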

priority

Controller priority (set to 3 to run before outlier excluder).

Type: int

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, StepOutput][source]

Execute the sample partitioner branch step.

Creates two branches: one for outliers and one for inliers. Each branch contains only its subset of samples.

Parameters:

step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions

Returns:

Tuple of (updated_context, StepOutput with collected artifacts)

classmethod matches(step: Any, operator: Any, keyword: str) → bool[source]

Check if the step matches the sample_partitioner branch pattern.

Matches: {"branch": {"by": "sample_partitioner", "filter": {...}}}

Parameters:

step – Original step configuration
operator – Deserialized operator
keyword – Step keyword

Returns:

True if this is a sample_partitioner branch definition.
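The pattern check described above can be approximated with a small predicate. This is a hedged sketch only: the function name `looks_like_sample_partitioner` is hypothetical, it ignores the `operator` and `keyword` arguments that the real classmethod receives, and requiring "filter" to be a dict is an assumption drawn from the documented pattern.

```python
from typing import Any


def looks_like_sample_partitioner(step: Any) -> bool:
    """Return True if `step` has the documented sample_partitioner shape."""
    if not isinstance(step, dict):
        return False
    branch = step.get("branch")
    if not isinstance(branch, dict):
        return False
    # Assumption: both the "by" selector and a dict-valued "filter" are needed.
    return (
        branch.get("by") == "sample_partitioner"
        and isinstance(branch.get("filter"), dict)
    )


partition_step = {
    "branch": {
        "by": "sample_partitioner",
        "filter": {"method": "y_outlier", "threshold": 3.0},
    }
}
other_step = {"branch": {"by": "something_else"}}

is_partition = looks_like_sample_partitioner(partition_step)
is_other = looks_like_sample_partitioner(other_step)
```

Steps that are not dictionaries, such as an estimator instance, fall through the first check and are left for other controllers to match.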

priority: int = 3

classmethod supports_prediction_mode() → bool[source]

Sample partitioner should execute in prediction mode.

In prediction mode, we need to reconstruct the branch contexts and apply the same sample partitioning.

classmethod use_multi_source() → bool[source]

Sample partitioner operates on dataset level.