nirs4all.controllers.data.sample_partitioner module
Sample Partitioner Controller for sample-based branching.
This controller partitions the dataset into multiple branches based on a sample filter (e.g., outlier detection). Unlike OutlierExcluderController, which excludes samples from training, this controller creates separate branches, each containing a different subset of samples.
- For example, with Y-outlier detection:
Branch “outliers”: Contains ONLY the outlier samples
Branch “inliers”: Contains ONLY the non-outlier samples
This enables training separate models for different data subsets and comparing their performance.
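The partitioning idea can be illustrated with a minimal sketch. It assumes the Y-outlier filter is a plain z-score test on the target values; the actual rule behind "y_outlier" is not described here, and the helper name partition_by_y_zscore is hypothetical, not part of nirs4all.

```python
from statistics import fmean, pstdev


def partition_by_y_zscore(y, threshold=3.0):
    """Hypothetical helper: split sample indices into two disjoint branches.

    Assumption: a sample is an inlier when its z-score on y is at most
    `threshold`. The real "y_outlier" filter may use a different rule.
    """
    mean, std = fmean(y), pstdev(y)
    branches = {"inliers": [], "outliers": []}
    for index, value in enumerate(y):
        # The filter returns True for samples to keep (inliers).
        keep = std == 0 or abs(value - mean) / std <= threshold
        branches["inliers" if keep else "outliers"].append(index)
    return branches


# Twenty ordinary targets and one extreme value.
y = [10.0] * 20 + [100.0]
branches = partition_by_y_zscore(y, threshold=3.0)
print(branches["outliers"])
```

Every sample index lands in exactly one branch, which is what distinguishes partitioning from exclusion.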
Example
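>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.model_selection import ShuffleSplit
>>> pipeline = [
...     ShuffleSplit(n_splits=5),
...     {"branch": {
...         "by": "sample_partitioner",
...         "filter": {"method": "y_outlier", "threshold": 3.0},
...     }},
...     PLSRegression(n_components=10),
... ]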
- class nirs4all.controllers.data.sample_partitioner.SamplePartitionerController
Bases: OperatorController
Controller for sample-based branching via partitioning.
This controller creates two branches by partitioning samples based on a filter (e.g., outlier detection). Each branch contains a different subset of samples:
“outliers” branch: samples where filter returns False (outliers)
“inliers” branch: samples where filter returns True (non-outliers)
Unlike OutlierExcluderController, which only excludes samples from training, this controller truly partitions the samples, so each branch trains and predicts only on its own subset.
- Key behaviors:
Each branch contains a disjoint subset of samples
Samples are partitioned, not excluded
Models train and predict only on their partition
Supports Y-outlier and X-outlier detection methods
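The key behaviors listed above can be sketched with a toy model. Everything here is illustrative: MeanModel is a hypothetical stand-in for a real estimator, and the branch indices are hard-coded rather than produced by the controller.

```python
from statistics import fmean


class MeanModel:
    """Toy stand-in for a real estimator: predicts the mean of its training targets."""

    def fit(self, y_train):
        self.mean_ = fmean(y_train)
        return self

    def predict(self, n_samples):
        return [self.mean_] * n_samples


y = [1.0, 2.0, 3.0, 50.0, 60.0]
# Hypothetical partition, as a sample filter might produce it.
branches = {"inliers": [0, 1, 2], "outliers": [3, 4]}

# The branches are disjoint and together cover every sample.
assert set(branches["inliers"]).isdisjoint(branches["outliers"])
assert sorted(branches["inliers"] + branches["outliers"]) == list(range(len(y)))

predictions = {}
for name, indices in branches.items():
    # Each model is fitted on its own partition only...
    model = MeanModel().fit([y[i] for i in indices])
    # ...and predicts only for the samples of that partition.
    for i, value in zip(indices, model.predict(len(indices))):
        predictions[i] = (name, value)
```

Each sample receives exactly one prediction, from the model of the branch it belongs to.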
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, StepOutput]
Execute the sample partitioner branch step.
Creates two branches: one for outliers and one for inliers. Each branch contains only its subset of samples.
- Parameters:
step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions
- Returns:
Tuple of (updated_context, StepOutput with collected artifacts)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the step matches the sample_partitioner branch pattern.
- Matches:
{“branch”: {“by”: “sample_partitioner”, “filter”: {…}}}
- Parameters:
step – Original step configuration
operator – Deserialized operator
keyword – Step keyword
- Returns:
True if this is a sample_partitioner branch definition.
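A check of this kind could look like the sketch below. It is an assumption about the shape of the test, not the library's implementation: the function name is hypothetical, and whether the real method also requires a "filter" key or inspects operator and keyword is not stated here.

```python
from typing import Any


def looks_like_sample_partitioner(step: Any) -> bool:
    """Hypothetical predicate mirroring the documented pattern."""
    if not isinstance(step, dict):
        return False
    branch = step.get("branch")
    # Only a dict-valued branch with by == "sample_partitioner" matches.
    return isinstance(branch, dict) and branch.get("by") == "sample_partitioner"


step = {
    "branch": {
        "by": "sample_partitioner",
        "filter": {"method": "y_outlier", "threshold": 3.0},
    }
}
print(looks_like_sample_partitioner(step))
```

Steps whose "branch" value is not a dict, or whose "by" value differs, fall through to other controllers under this sketch.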