nirs4all.operators.models.selection module

Source model selection strategies for meta-model stacking.

This module provides flexible strategies for selecting which models to use as sources in a stacking ensemble.

Available selectors:

AllPreviousModelsSelector: Use all models from previous steps (default)
ExplicitModelSelector: Use explicitly named models
TopKByMetricSelector: Use top K models by validation metric
DiversitySelector: Select diverse models by class type

Example

>>> from nirs4all.operators.models.selection import (
...     AllPreviousModelsSelector,
...     TopKByMetricSelector,
...     SelectorFactory
... )
>>>
>>> # Default: all previous models
>>> selector = AllPreviousModelsSelector()
>>>
>>> # Top 3 by RMSE
>>> selector = TopKByMetricSelector(k=3, metric="rmse")
>>>
>>> # Using factory
>>> selector = SelectorFactory.create("top_k", k=5, metric="r2")

class nirs4all.operators.models.selection.AllPreviousModelsSelector(include_averaged: bool = False, exclude_classnames: Set[str] | None = None)

Bases: SourceModelSelector

Select all models from previous steps in current branch.

This is the default selector that includes all models trained before the meta-model step within the same branch context.

include_averaged: If True, include fold-averaged models. Default False (uses individual fold models).

exclude_classnames: Set of model class names to exclude.

Example

>>> selector = AllPreviousModelsSelector(include_averaged=True)

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]

Select all previous models in current branch.

Parameters:

candidates – List of candidate models.
context – Execution context with step and branch info.
prediction_store – Predictions store (unused in this selector).

Returns:

Filtered list of candidates ordered by step index.
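
The filtering described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the nirs4all implementation: `Candidate`, `select_all_previous`, and the `current_step` and `branch_id` arguments are hypothetical stand-ins for ModelCandidate and ExecutionContext, and treating the fold ids "avg" and "w_avg" as the fold-averaged entries is an assumption based on the fold_id description.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Assumption: these fold ids mark fold-averaged models.
AVERAGED_FOLD_IDS = {"avg", "w_avg"}


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (only the fields used here)."""
    model_name: str
    model_classname: str
    step_idx: int
    fold_id: Optional[str] = None
    branch_id: Optional[int] = None


def select_all_previous(
    candidates: List[Candidate],
    current_step: int,
    branch_id: Optional[int] = None,
    include_averaged: bool = False,
    exclude_classnames: Optional[Set[str]] = None,
) -> List[Candidate]:
    """Keep candidates trained before current_step in the given branch."""
    excluded = exclude_classnames or set()
    kept = []
    for c in candidates:
        if c.step_idx >= current_step:
            continue  # not a previous step
        if branch_id is not None and c.branch_id not in (None, branch_id):
            continue  # belongs to another branch
        if c.model_classname in excluded:
            continue
        if not include_averaged and c.fold_id in AVERAGED_FOLD_IDS:
            continue  # skip fold-averaged entries unless requested
        kept.append(c)
    # sorted() is stable, so candidates from the same step keep their input order.
    return sorted(kept, key=lambda c: c.step_idx)


pool = [
    Candidate("RF", "RandomForestRegressor", step_idx=3, fold_id="0"),
    Candidate("PLS", "PLSRegression", step_idx=1, fold_id="0"),
    Candidate("PLS", "PLSRegression", step_idx=1, fold_id="avg"),
    Candidate("Meta", "Ridge", step_idx=5, fold_id="0"),
]
selected = select_all_previous(pool, current_step=5)
names = [(c.model_name, c.fold_id) for c in selected]
```

In this sketch the averaged PLS entry and the model at the current step are dropped, and the remaining candidates come back ordered by step index.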

class nirs4all.operators.models.selection.DiversitySelector(max_per_class: int = 1, preferred_classes: List[str] | None = None)

Bases: SourceModelSelector

Select diverse models by class type to maximize ensemble diversity.

Ensures the stacking ensemble includes different types of models rather than multiple similar models.

max_per_class: Maximum models per class type.

preferred_classes: Optional list of preferred class names.

Example

>>> selector = DiversitySelector(
...     max_per_class=2,
...     preferred_classes=["PLSRegression", "RandomForestRegressor"]
... )

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]

Select diverse models by class type.

Parameters:

candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).

Returns:

List of diverse models with at most max_per_class per type.
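
The per-class cap can be illustrated with a small standalone function. This is a sketch, not the library's code: `select_diverse` is a hypothetical helper, candidates are reduced to `(model_name, model_classname)` tuples, and the rule that preferred classes are considered first, in the listed order, is an assumption about how preferred_classes might be used.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (model_name, model_classname) pairs stand in for ModelCandidate objects.
Candidate = Tuple[str, str]


def select_diverse(
    candidates: List[Candidate],
    max_per_class: int = 1,
    preferred_classes: Optional[List[str]] = None,
) -> List[Candidate]:
    """Keep at most max_per_class candidates for each class name."""
    preferred = preferred_classes or []

    def rank(candidate: Candidate) -> int:
        # Assumption: preferred classes come first, in the order listed.
        classname = candidate[1]
        return preferred.index(classname) if classname in preferred else len(preferred)

    counts: Dict[str, int] = defaultdict(int)
    selected: List[Candidate] = []
    for candidate in sorted(candidates, key=rank):  # stable sort
        classname = candidate[1]
        if counts[classname] < max_per_class:
            counts[classname] += 1
            selected.append(candidate)
    return selected


pool = [
    ("rf_a", "RandomForestRegressor"),
    ("pls_5", "PLSRegression"),
    ("pls_10", "PLSRegression"),
    ("svr", "SVR"),
]
picked = select_diverse(pool, max_per_class=1, preferred_classes=["PLSRegression"])
```

Here the second PLS model is dropped because its class has already reached the cap of one.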

class nirs4all.operators.models.selection.ExplicitModelSelector(model_names: List[str], strict: bool = True)

Bases: SourceModelSelector

Select explicitly named models.

Uses a predefined list of model names to select sources. Model names must match exactly (case-sensitive).

model_names: List of model names to select.

strict: If True, raise error if any named model is not found.

Example

>>> selector = ExplicitModelSelector(
...     model_names=["PLS", "RandomForest", "XGBoost"],
...     strict=True
... )

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]

Select models matching the specified names.

Parameters:

candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).

Returns:

List of candidates matching specified names, in the order specified by model_names.

Raises:

ValueError – If strict=True and any model name is not found.
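
The matching, ordering, and strict behaviour documented above can be sketched as follows. `select_explicit` is a hypothetical helper that works on bare name strings rather than ModelCandidate objects, so it shows only the documented contract, not the library's code.

```python
from typing import List


def select_explicit(
    candidate_names: List[str],
    model_names: List[str],
    strict: bool = True,
) -> List[str]:
    """Return the requested names that exist, in the order of model_names."""
    available = set(candidate_names)
    # Exact, case-sensitive comparison: "pls" does not match "PLS".
    missing = [name for name in model_names if name not in available]
    if strict and missing:
        raise ValueError(f"Models not found: {missing}")
    return [name for name in model_names if name in available]


trained = ["RF", "PLS", "XGBoost"]
ordered = select_explicit(trained, ["PLS", "RF"])
lenient = select_explicit(trained, ["pls", "RF"], strict=False)
```

The result follows the order of `model_names`, not the order in which the models were trained. With `strict=False` the unmatched name is skipped instead of raising.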

class nirs4all.operators.models.selection.ModelCandidate(model_name: str, model_classname: str, step_idx: int, fold_id: str | None = None, branch_id: int | None = None, branch_name: str | None = None, val_score: float | None = None, metric: str | None = None, predictions: Dict[str, ndarray] | None = None)

Bases: object

Information about a candidate source model.

Contains metadata and optionally predictions for a model that may be selected as a source for stacking.

model_name

Name of the model.

Type: str

model_classname

Class name of the model (e.g., “PLSRegression”).

Type: str

step_idx

Pipeline step index where the model was trained.

Type: int

fold_id

Fold identifier (or “avg”/“w_avg” for averaged models).

Type: str | None

branch_id

Branch identifier if in a branched pipeline.

Type: int | None

branch_name

Human-readable branch name.

Type: str | None

val_score

Validation score for the model.

Type: float | None

metric

Metric used for scoring.

Type: str | None

predictions

Optional dictionary with predictions data.

Type: Dict[str, numpy.ndarray] | None

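
To make the optional fields concrete, the sketch below uses a stand-in dataclass that mirrors the documented constructor signature. `CandidateSketch` is hypothetical and exists only so the example runs without nirs4all installed. The `predictions` field is typed loosely here, whereas the documented type is a dictionary of NumPy arrays.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CandidateSketch:
    """Stand-in mirroring the documented ModelCandidate fields and defaults."""
    model_name: str
    model_classname: str
    step_idx: int
    fold_id: Optional[str] = None
    branch_id: Optional[int] = None
    branch_name: Optional[str] = None
    val_score: Optional[float] = None
    metric: Optional[str] = None
    predictions: Optional[Dict[str, Any]] = None  # arrays in the documented class


candidates: List[CandidateSketch] = [
    CandidateSketch("PLS", "PLSRegression", step_idx=2, fold_id="0",
                    val_score=0.41, metric="rmse"),
    CandidateSketch("PLS", "PLSRegression", step_idx=2, fold_id="avg",
                    val_score=0.39, metric="rmse"),
    CandidateSketch("RF", "RandomForestRegressor", step_idx=3),  # no score recorded
]

# Optional fields default to None, so guard before comparing scores.
scored = [c for c in candidates if c.val_score is not None]
best = min(scored, key=lambda c: c.val_score)  # RMSE: lower is better
```

Only the first three fields are required. Code that ranks or filters candidates should allow for `val_score` being None.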

class nirs4all.operators.models.selection.SelectorFactory

Bases: object

Factory for creating source model selectors.

Provides a convenient way to instantiate selectors by name.

Example

>>> selector = SelectorFactory.create("all")
>>> selector = SelectorFactory.create("explicit", model_names=["PLS", "RF"])
>>> selector = SelectorFactory.create("top_k", k=5, metric="rmse")

classmethod create(selector_type: str, **kwargs) → SourceModelSelector

Create a selector by type name.

Parameters:

selector_type – Type name (e.g., “all”, “explicit”, “top_k”, “diversity”).
**kwargs – Arguments passed to the selector constructor.

Returns:

SourceModelSelector instance.

Raises:

ValueError – If selector_type is not recognized.

classmethod register(name: str, selector_class: type) → None

Register a custom selector type.

Parameters:

name – Name to register under.
selector_class – Selector class (must inherit from SourceModelSelector).

Raises:

TypeError – If selector_class doesn’t inherit from SourceModelSelector.
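
One common way to build such a factory is a name-to-class registry. The sketch below shows that pattern with the documented error behaviour (ValueError for an unknown name, TypeError for a class that does not inherit from the base). `BaseSelector`, `KeepAll`, `FirstN`, and `RegistryFactory` are hypothetical names, and nothing here is taken from the nirs4all source.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseSelector(ABC):
    """Hypothetical stand-in for SourceModelSelector."""

    @abstractmethod
    def select(self, candidates: List[str]) -> List[str]:
        ...


class KeepAll(BaseSelector):
    def select(self, candidates: List[str]) -> List[str]:
        return list(candidates)


class RegistryFactory:
    """Maps type names to selector classes."""

    _registry: Dict[str, Type[BaseSelector]] = {"all": KeepAll}

    @classmethod
    def register(cls, name: str, selector_class: type) -> None:
        is_selector = isinstance(selector_class, type) and issubclass(
            selector_class, BaseSelector
        )
        if not is_selector:
            raise TypeError(f"{selector_class!r} must inherit from BaseSelector")
        cls._registry[name] = selector_class

    @classmethod
    def create(cls, selector_type: str, **kwargs) -> BaseSelector:
        if selector_type not in cls._registry:
            known = sorted(cls._registry)
            raise ValueError(f"Unknown selector type {selector_type!r}; known: {known}")
        # Keyword arguments are forwarded to the selector constructor.
        return cls._registry[selector_type](**kwargs)


class FirstN(BaseSelector):
    """Custom selector that keeps the first n candidates."""

    def __init__(self, n: int = 1):
        self.n = n

    def select(self, candidates: List[str]) -> List[str]:
        return list(candidates)[: self.n]


RegistryFactory.register("first_n", FirstN)
selector = RegistryFactory.create("first_n", n=2)
picked = selector.select(["PLS", "RF", "SVR"])
```

Once registered, the custom selector is created by name in the same way as the built-in types.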

class nirs4all.operators.models.selection.SourceModelSelector

Bases: ABC

Abstract base class for source model selection strategies.

Defines the interface for selecting which models to include as sources in a stacking ensemble.

Subclasses must implement the select() method to define their selection logic.

Example

>>> class CustomSelector(SourceModelSelector):
...     def select(self, candidates, context, prediction_store):
...         # Custom selection logic; val_score may be None, so guard the comparison
...         return [c for c in candidates if c.val_score is not None and c.val_score > 0.9]

abstractmethod select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]

Select source models from candidates.

Parameters:

candidates – List of candidate models to select from.
context – Execution context with current step and branch info.
prediction_store – Predictions store for accessing model data.

Returns:

List of selected ModelCandidate objects in the order they should be used as features (determines column order in meta-features).
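
The link between selection order and meta-feature columns can be shown with a toy example. `build_meta_features` is a hypothetical helper using plain lists. It only illustrates the idea that each selected model contributes one column, in selection order, and does not reflect how nirs4all assembles its feature matrix.

```python
from typing import Dict, List


def build_meta_features(
    selected_names: List[str],
    predictions: Dict[str, List[float]],
) -> List[List[float]]:
    """Stack per-model predictions column-wise, following the selection order."""
    columns = [predictions[name] for name in selected_names]
    # zip(*columns) turns a list of columns into a list of per-sample rows.
    return [list(row) for row in zip(*columns)]


preds = {
    "PLS": [1.0, 2.0, 3.0],
    "RF": [1.1, 1.9, 3.2],
}
pls_first = build_meta_features(["PLS", "RF"], preds)
rf_first = build_meta_features(["RF", "PLS"], preds)
```

Swapping the order of the selected models swaps the columns, which is why a selector should return candidates in a deterministic order.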

validate(selected: List[ModelCandidate], context: ExecutionContext) → None

Validate the selection (optional override).

Subclasses may override this method to raise ValueError when the selection is invalid for the context.

Parameters:

selected – List of selected model candidates.
context – Execution context for validation.

Raises:

ValueError – If selection is invalid.
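
A subclass might use the validate() hook to enforce a minimum number of source models. The sketch below assumes that the default hook accepts any selection. `SelectorSketch` and `ScoredOnlySelector` are hypothetical names, and candidates are plain dictionaries so the example stays self-contained.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SelectorSketch(ABC):
    """Hypothetical stand-in for SourceModelSelector."""

    @abstractmethod
    def select(self, candidates, context, prediction_store):
        ...

    def validate(self, selected, context) -> None:
        # Assumption: the default hook accepts any selection.
        return None


class ScoredOnlySelector(SelectorSketch):
    """Keeps scored candidates and requires a minimum number of them."""

    def __init__(self, min_models: int = 2):
        self.min_models = min_models

    def select(self, candidates: List[Dict[str, Any]], context, prediction_store):
        return [c for c in candidates if c.get("val_score") is not None]

    def validate(self, selected, context) -> None:
        if len(selected) < self.min_models:
            raise ValueError(
                f"Need at least {self.min_models} source models, got {len(selected)}"
            )


selector = ScoredOnlySelector(min_models=2)
pool = [{"name": "PLS", "val_score": 0.9}, {"name": "RF", "val_score": None}]
selected = selector.select(pool, context=None, prediction_store=None)
try:
    selector.validate(selected, context=None)
    error = None
except ValueError as exc:
    error = str(exc)
```

Keeping the check in validate() rather than in select() leaves the selection logic reusable while still rejecting selections that are too small.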

class nirs4all.operators.models.selection.TopKByMetricSelector(k: int, metric: str = 'val_score', ascending: bool | None = None, per_class: bool = False)

Bases: SourceModelSelector

Select top K models by a validation metric.

Ranks models by their validation score and selects the top K performers.

k: Number of top models to select.

metric: Metric to rank by (e.g., “rmse”, “r2”, “accuracy”).

ascending: Sort direction. If None, inferred from metric.

per_class: If True, select top K per model class (for diversity).

Example

>>> selector = TopKByMetricSelector(k=3, metric="rmse", ascending=True)

select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]

Select top K models by metric.

Parameters:

candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).

Returns:

Top K models sorted by metric.
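
The ranking step can be sketched as a sort followed by a slice. `top_k` is a hypothetical helper operating on `(name, score)` pairs. The set of metrics treated as "lower is better" is an assumption made for this example, since the actual inference rule is not documented here, and the per_class option is not covered.

```python
from typing import List, Optional, Tuple

# Assumption: these error metrics are minimised; any other metric is maximised.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


def top_k(
    scored: List[Tuple[str, Optional[float]]],
    k: int,
    metric: str,
    ascending: Optional[bool] = None,
) -> List[Tuple[str, float]]:
    """Return the k best (name, score) pairs, best first."""
    if ascending is None:
        ascending = metric.lower() in LOWER_IS_BETTER
    # Candidates without a score cannot be ranked.
    valid = [(name, score) for name, score in scored if score is not None]
    ranked = sorted(valid, key=lambda pair: pair[1], reverse=not ascending)
    return ranked[:k]


scores = [("PLS", 0.42), ("RF", 0.35), ("SVR", None), ("XGB", 0.51)]
best_by_rmse = top_k(scores, k=2, metric="rmse")
best_by_r2 = top_k(scores, k=2, metric="r2")
```

The same scores produce different selections depending on the sort direction, which is why `ascending` can be set explicitly when the metric name is not one the selector recognises.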