nirs4all.operators.models.selection module
Source model selection strategies for meta-model stacking.
This module provides flexible strategies for selecting which models to use as sources in a stacking ensemble.
Available selectors:
- AllPreviousModelsSelector: Use all models from previous steps (default)
- ExplicitModelSelector: Use explicitly named models
- TopKByMetricSelector: Use top K models by validation metric
- DiversitySelector: Select diverse models by class type
Example
>>> from nirs4all.operators.models.selection import (
... AllPreviousModelsSelector,
... TopKByMetricSelector,
... SelectorFactory
... )
>>>
>>> # Default: all previous models
>>> selector = AllPreviousModelsSelector()
>>>
>>> # Top 3 by RMSE
>>> selector = TopKByMetricSelector(k=3, metric="rmse")
>>>
>>> # Using factory
>>> selector = SelectorFactory.create("top_k", k=5, metric="r2")
- class nirs4all.operators.models.selection.AllPreviousModelsSelector(include_averaged: bool = False, exclude_classnames: Set[str] | None = None)
Bases: SourceModelSelector
Select all models from previous steps in the current branch.
This is the default selector that includes all models trained before the meta-model step within the same branch context.
- include_averaged
If True, include fold-averaged models. Default False (uses individual fold models).
- exclude_classnames
Set of model class names to exclude.
Example
>>> selector = AllPreviousModelsSelector(include_averaged=True)
- select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]
Select all previous models in current branch.
- Parameters:
candidates – List of candidate models.
context – Execution context with step and branch info.
prediction_store – Predictions store (unused in this selector).
- Returns:
Filtered list of candidates ordered by step index.
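The filtering described above can be sketched in standalone form. This is not the library's implementation: `Candidate` is a hypothetical stand-in for ModelCandidate, `select_all_previous` and its `current_step` / `current_branch` arguments are illustrative names replacing the ExecutionContext, and the `"avg"` fold label for fold-averaged models is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (subset of its fields)."""
    model_name: str
    model_classname: str
    step_idx: int
    fold_id: Optional[str] = None
    branch_id: Optional[int] = None


def select_all_previous(
    candidates: List[Candidate],
    current_step: int,
    current_branch: Optional[int],
    include_averaged: bool = False,
    exclude_classnames: Optional[Set[str]] = None,
) -> List[Candidate]:
    """Keep earlier-step candidates from the current branch, ordered by step."""
    excluded = exclude_classnames or set()
    kept = []
    for c in candidates:
        if c.step_idx >= current_step:
            continue  # only models trained before the meta-model step
        if current_branch is not None and c.branch_id not in (None, current_branch):
            continue  # assumption: branch-less candidates are visible everywhere
        if c.model_classname in excluded:
            continue
        if not include_averaged and c.fold_id == "avg":
            continue  # assumption: "avg" marks a fold-averaged model
        kept.append(c)
    # sorted() is stable, so candidates within a step keep their input order
    return sorted(kept, key=lambda c: c.step_idx)
```

Sorting by step index keeps the column order of the resulting meta-features deterministic across runs.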
- class nirs4all.operators.models.selection.DiversitySelector(max_per_class: int = 1, preferred_classes: List[str] | None = None)
Bases: SourceModelSelector
Select diverse models by class type to maximize ensemble diversity.
Ensures the stacking ensemble includes different types of models rather than multiple similar models.
- max_per_class
Maximum models per class type.
- preferred_classes
Optional list of preferred class names.
Example
>>> selector = DiversitySelector(
...     max_per_class=2,
...     preferred_classes=["PLSRegression", "RandomForestRegressor"]
... )
- select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]
Select diverse models by class type.
- Parameters:
candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).
- Returns:
List of diverse models with at most max_per_class per type.
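A minimal sketch of the per-class cap, assuming that preferred classes are simply considered first, in list order. The documentation above does not spell out how `preferred_classes` is applied, so that ordering rule, the `Candidate` stand-in, and the `select_diverse` function name are all illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (subset of its fields)."""
    model_name: str
    model_classname: str


def select_diverse(
    candidates: List[Candidate],
    max_per_class: int = 1,
    preferred_classes: Optional[List[str]] = None,
) -> List[Candidate]:
    """Keep at most max_per_class candidates for each model class."""
    if preferred_classes:
        rank = {name: i for i, name in enumerate(preferred_classes)}
        # Assumption: preferred classes come first; the rest keep input order.
        ordered = sorted(
            candidates, key=lambda c: rank.get(c.model_classname, len(rank))
        )
    else:
        ordered = list(candidates)

    counts: Dict[str, int] = defaultdict(int)
    selected = []
    for c in ordered:
        if counts[c.model_classname] < max_per_class:
            selected.append(c)
            counts[c.model_classname] += 1
    return selected
```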
- class nirs4all.operators.models.selection.ExplicitModelSelector(model_names: List[str], strict: bool = True)
Bases: SourceModelSelector
Select explicitly named models.
Uses a predefined list of model names to select sources. Model names must match exactly (case-sensitive).
- model_names
List of model names to select.
- strict
If True, raise error if any named model is not found.
Example
>>> selector = ExplicitModelSelector(
...     model_names=["PLS", "RandomForest", "XGBoost"],
...     strict=True
... )
- select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]
Select models matching the specified names.
- Parameters:
candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).
- Returns:
List of candidates matching specified names, in the order specified by model_names.
- Raises:
ValueError – If strict=True and any model name is not found.
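The name matching and the strict behaviour can be illustrated with a standalone sketch. `Candidate` and `select_explicit` are hypothetical names, and the handling of several candidates sharing one name (for example one per fold) is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (subset of its fields)."""
    model_name: str
    fold_id: Optional[str] = None


def select_explicit(
    candidates: List[Candidate], model_names: List[str], strict: bool = True
) -> List[Candidate]:
    """Return candidates whose name matches exactly, in model_names order."""
    by_name: Dict[str, List[Candidate]] = {}
    for c in candidates:
        # Assumption: several candidates may share a name (e.g. one per fold).
        by_name.setdefault(c.model_name, []).append(c)

    missing = [name for name in model_names if name not in by_name]
    if strict and missing:
        raise ValueError(f"Models not found: {missing}")

    selected: List[Candidate] = []
    for name in model_names:  # output order follows model_names
        selected.extend(by_name.get(name, []))
    return selected
```

With `strict=False`, names that match nothing are skipped silently in this sketch.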
- class nirs4all.operators.models.selection.ModelCandidate(model_name: str, model_classname: str, step_idx: int, fold_id: str | None = None, branch_id: int | None = None, branch_name: str | None = None, val_score: float | None = None, metric: str | None = None, predictions: Dict[str, ndarray] | None = None)
Bases: object
Information about a candidate source model.
Contains metadata and optionally predictions for a model that may be selected as a source for stacking.
- predictions
Optional dictionary with predictions data.
- Type:
Dict[str, numpy.ndarray] | None
- class nirs4all.operators.models.selection.SelectorFactory
Bases: object
Factory for creating source model selectors.
Provides a convenient way to instantiate selectors by name.
Example
>>> selector = SelectorFactory.create("all")
>>> selector = SelectorFactory.create("explicit", model_names=["PLS", "RF"])
>>> selector = SelectorFactory.create("top_k", k=5, metric="rmse")
- classmethod create(selector_type: str, **kwargs) → SourceModelSelector
Create a selector by type name.
- Parameters:
selector_type – Type name (e.g., "all", "explicit", "top_k", "diversity").
**kwargs – Arguments passed to the selector constructor.
- Returns:
SourceModelSelector instance.
- Raises:
ValueError – If selector_type is not recognized.
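One common way to build such a factory is a name-to-class registry. The sketch below shows that pattern only; it is not necessarily how SelectorFactory is written, and `AllSelector`, `TopKSelector`, `_REGISTRY`, and `create_selector` are hypothetical names.

```python
from typing import Any, Dict


class AllSelector:
    """Hypothetical stand-in for a selector that takes no arguments."""


class TopKSelector:
    """Hypothetical stand-in for a selector configured by k and metric."""

    def __init__(self, k: int, metric: str = "val_score") -> None:
        self.k = k
        self.metric = metric


# Maps the public type name to the class that implements it.
_REGISTRY: Dict[str, type] = {
    "all": AllSelector,
    "top_k": TopKSelector,
}


def create_selector(selector_type: str, **kwargs: Any) -> object:
    """Instantiate the selector registered under selector_type."""
    try:
        cls = _REGISTRY[selector_type]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(
            f"Unknown selector type {selector_type!r}; expected one of: {known}"
        ) from None
    return cls(**kwargs)  # kwargs go straight to the constructor
```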
- class nirs4all.operators.models.selection.SourceModelSelector
Bases: ABC
Abstract base class for source model selection strategies.
Defines the interface for selecting which models to include as sources in a stacking ensemble.
Subclasses must implement the select() method to define their selection logic.
Example
>>> class CustomSelector(SourceModelSelector):
...     def select(self, candidates, context, prediction_store):
...         # Keep candidates scoring above 0.9; val_score may be None.
...         return [
...             c for c in candidates
...             if c.val_score is not None and c.val_score > 0.9
...         ]
- abstractmethod select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]
Select source models from candidates.
- Parameters:
candidates – List of candidate models to select from.
context – Execution context with current step and branch info.
prediction_store – Predictions store for accessing model data.
- Returns:
List of selected ModelCandidate objects in the order they should be used as features (determines column order in meta-features).
- validate(selected: List[ModelCandidate], context: ExecutionContext) → None
Validate the selection. Subclasses may override this method.
Raise ValueError if the selection is invalid for the given context.
- Parameters:
selected – List of selected model candidates.
context – Execution context for validation.
- Raises:
ValueError – If selection is invalid.
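As an illustration of overriding validate(), the sketch below rejects an empty selection. `BaseSelector` is a hypothetical stand-in that mirrors the documented interface so the example runs without nirs4all, and passing `None` for the context and the prediction store is a simplification.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (subset of its fields)."""
    model_name: str
    step_idx: int


class BaseSelector(ABC):
    """Hypothetical stand-in mirroring the SourceModelSelector interface."""

    @abstractmethod
    def select(
        self, candidates: List[Candidate], context: Any, prediction_store: Any
    ) -> List[Candidate]:
        ...

    def validate(self, selected: List[Candidate], context: Any) -> None:
        """Default behaviour in this sketch: accept any selection."""


class NonEmptySelector(BaseSelector):
    """Select everything, but refuse to proceed with zero source models."""

    def select(self, candidates, context, prediction_store):
        return sorted(candidates, key=lambda c: c.step_idx)

    def validate(self, selected, context):
        if not selected:
            raise ValueError("No source models selected for stacking")
```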
- class nirs4all.operators.models.selection.TopKByMetricSelector(k: int, metric: str = 'val_score', ascending: bool | None = None, per_class: bool = False)
Bases: SourceModelSelector
Select top K models by a validation metric.
Ranks models by their validation score and selects the top K performers.
- k
Number of top models to select.
- metric
Metric to rank by (e.g., "rmse", "r2", "accuracy").
- ascending
Sort direction. If None, inferred from metric.
- per_class
If True, select top K per model class (for diversity).
Example
>>> selector = TopKByMetricSelector(k=3, metric="rmse", ascending=True)
- select(candidates: List[ModelCandidate], context: ExecutionContext, prediction_store: Predictions) → List[ModelCandidate]
Select top K models by metric.
- Parameters:
candidates – List of candidate models.
context – Execution context.
prediction_store – Predictions store (unused).
- Returns:
Top K models sorted by metric.
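The ranking logic, including the inferred sort direction, can be sketched as follows. The set of lower-is-better metrics, the decision to skip candidates without a score, and the names `Candidate`, `LOWER_IS_BETTER`, and `select_top_k` are assumptions made for illustration. The library's own inference rule may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: error metrics rank ascending, everything else descending.
LOWER_IS_BETTER = {"rmse", "mse", "mae"}


@dataclass
class Candidate:
    """Hypothetical stand-in for ModelCandidate (subset of its fields)."""
    model_name: str
    model_classname: str
    val_score: Optional[float] = None


def select_top_k(
    candidates: List[Candidate],
    k: int,
    metric: str = "val_score",
    ascending: Optional[bool] = None,
    per_class: bool = False,
) -> List[Candidate]:
    """Return the k best-scoring candidates, best first."""
    if ascending is None:
        ascending = metric.lower() in LOWER_IS_BETTER
    # Assumption: candidates without a score cannot be ranked and are skipped.
    scored = [c for c in candidates if c.val_score is not None]
    ranked = sorted(scored, key=lambda c: c.val_score, reverse=not ascending)
    if not per_class:
        return ranked[:k]

    # per_class=True: keep up to k candidates for each model class.
    taken: Dict[str, int] = {}
    selected = []
    for c in ranked:
        n = taken.get(c.model_classname, 0)
        if n < k:
            selected.append(c)
            taken[c.model_classname] = n + 1
    return selected
```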