nirs4all.operators.models.meta module
Meta-model operator for stacking ensemble.
This module provides the MetaModel operator for building stacking ensembles that use predictions from previously trained models as input features.
The meta-model trains on out-of-fold (OOF) predictions from base models to prevent data leakage and overfitting.
Example
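>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import KFold
>>> from sklearn.preprocessing import MinMaxScaler
>>> from nirs4all.operators.models import MetaModel
>>>
>>> # Base models come first; the MetaModel step stacks on their predictions.
>>> pipeline = [
...     MinMaxScaler(),
...     KFold(n_splits=5),
...     PLSRegression(n_components=10),
...     RandomForestRegressor(n_estimators=100),
...     {"model": MetaModel(model=Ridge(), source_models="all")},
... ]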
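The out-of-fold idea from the module summary can be illustrated without the library. The following is a minimal sketch in plain Python, not the nirs4all implementation: `kfold_indices`, `MeanRegressor`, and `out_of_fold_predictions` are hypothetical helpers, and the folds are assumed to be contiguous and unshuffled. Each sample is predicted by a model that was fitted without it, so the resulting column can be used as a meta-model feature without leaking the sample's own target.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if k < n_samples % n_splits else 0)
             for k in range(n_splits)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        held_out = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, val_idx
        start += size


class MeanRegressor:
    """Toy base model (hypothetical): predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def out_of_fold_predictions(model_factory, X, y, n_splits):
    """Predict every sample with a model that never saw it in training."""
    oof = [None] * len(X)
    for train_idx, val_idx in kfold_indices(len(X), n_splits):
        model = model_factory().fit([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        for i, pred in zip(val_idx, model.predict([X[i] for i in val_idx])):
            oof[i] = pred
    return oof


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 2.0, 4.0, 6.0]
oof = out_of_fold_predictions(MeanRegressor, X, y, n_splits=2)
```

With two folds, samples 0 and 1 are predicted by a model fitted on samples 2 and 3, and vice versa, so no prediction depends on its own target.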
- class nirs4all.operators.models.meta.BranchScope(value)[source]
Bases: Enum
Which branches to include as source models.
Controls which branches’ predictions are used for stacking when the pipeline contains branching.
- CURRENT_ONLY
Only use models from the current branch (default).
- ALL_BRANCHES
Use models from all branches (requires compatible samples).
- SPECIFIED
Use explicit list from source_models parameter.
- ALL_BRANCHES = 'all_branches'
- CURRENT_ONLY = 'current_only'
- SPECIFIED = 'specified'
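How the three scopes narrow the set of candidate source models can be sketched as below. This is an illustration only: the enum is re-declared locally with the documented values, and `select_sources` together with the `(model_name, branch_id)` tuple layout are assumptions made for the example, not part of the nirs4all API.

```python
from enum import Enum


class BranchScope(Enum):
    # Member values copied from the documented enum.
    CURRENT_ONLY = "current_only"
    ALL_BRANCHES = "all_branches"
    SPECIFIED = "specified"


def select_sources(candidates, scope, current_branch, source_models=None):
    """Return the names of candidate models allowed by the scope.

    candidates is a list of (model_name, branch_id) tuples; this layout
    is hypothetical and only serves the illustration.
    """
    if scope is BranchScope.CURRENT_ONLY:
        return [name for name, branch in candidates if branch == current_branch]
    if scope is BranchScope.ALL_BRANCHES:
        return [name for name, _ in candidates]
    if scope is BranchScope.SPECIFIED:
        wanted = set(source_models or [])
        return [name for name, _ in candidates if name in wanted]
    raise ValueError(f"unknown scope: {scope!r}")


candidates = [("PLS", 0), ("RandomForest", 0), ("XGBoost", 1)]
current = select_sources(candidates, BranchScope.CURRENT_ONLY, current_branch=0)
everything = select_sources(candidates, BranchScope.ALL_BRANCHES, current_branch=0)
explicit = select_sources(candidates, BranchScope.SPECIFIED, current_branch=0,
                          source_models=["PLS", "XGBoost"])
```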
- class nirs4all.operators.models.meta.CoverageStrategy(value)[source]
Bases: Enum
Strategy for handling partial coverage in OOF reconstruction.
When some samples are missing predictions (e.g., from sample partitioning), this determines how to handle them.
- STRICT
Raise error if any sample is missing predictions (default).
- DROP_INCOMPLETE
Drop samples missing any source model predictions.
- IMPUTE_ZERO
Fill missing predictions with zeros.
- IMPUTE_MEAN
Fill missing predictions with mean of available predictions.
- IMPUTE_FOLD_MEAN
Fill with mean from the same fold.
- DROP_INCOMPLETE = 'drop_incomplete'
- IMPUTE_FOLD_MEAN = 'impute_fold_mean'
- IMPUTE_MEAN = 'impute_mean'
- IMPUTE_ZERO = 'impute_zero'
- STRICT = 'strict'
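The effect of each strategy on a small prediction matrix with one gap can be sketched as follows. This is a hedged illustration rather than the library's code: the enum is re-declared locally, `apply_coverage` is a hypothetical helper, means are assumed to be taken per source-model column, and the fallback to 0.0 when no prediction is available is also an assumption.

```python
from enum import Enum
from statistics import mean


class CoverageStrategy(Enum):
    # Member values copied from the documented enum.
    STRICT = "strict"
    DROP_INCOMPLETE = "drop_incomplete"
    IMPUTE_ZERO = "impute_zero"
    IMPUTE_MEAN = "impute_mean"
    IMPUTE_FOLD_MEAN = "impute_fold_mean"


def apply_coverage(rows, strategy, folds=None):
    """Return (kept_sample_indices, feature_rows) for a matrix with gaps.

    rows[i][j] is the OOF prediction of source model j for sample i, or
    None when it is missing. folds[i] is the fold id of sample i.
    """
    incomplete = {i for i, row in enumerate(rows) if None in row}
    if strategy is CoverageStrategy.STRICT:
        if incomplete:
            raise ValueError(f"missing predictions for samples {sorted(incomplete)}")
        return list(range(len(rows))), [list(row) for row in rows]
    if strategy is CoverageStrategy.DROP_INCOMPLETE:
        kept = [i for i in range(len(rows)) if i not in incomplete]
        return kept, [list(rows[i]) for i in kept]
    if strategy is CoverageStrategy.IMPUTE_FOLD_MEAN and folds is None:
        raise ValueError("IMPUTE_FOLD_MEAN needs a fold id per sample")

    def fill_value(sample, col):
        if strategy is CoverageStrategy.IMPUTE_ZERO:
            return 0.0
        # Pool the available predictions of this source model, restricted
        # to the sample's own fold for IMPUTE_FOLD_MEAN.
        pool = [
            row[col]
            for i, row in enumerate(rows)
            if row[col] is not None
            and (strategy is CoverageStrategy.IMPUTE_MEAN or folds[i] == folds[sample])
        ]
        return mean(pool) if pool else 0.0  # fallback value is an assumption

    filled = [
        [v if v is not None else fill_value(i, j) for j, v in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return list(range(len(rows))), filled


rows = [[1.0, 2.0], [3.0, None], [5.0, 6.0], [7.0, 10.0]]
folds = [0, 0, 1, 1]
```

Dropping rows shrinks the meta-model's training set, while the imputing strategies keep every sample at the cost of synthetic feature values; the sketch returns the kept indices so the targets can be filtered to match.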
- class nirs4all.operators.models.meta.MetaModel(model: Any, source_models: str | List[str] = 'all', use_proba: bool = False, stacking_config: StackingConfig | None = None, selector: Any | None = None, name: str | None = None, finetune_space: Dict[str, Any] | None = None)[source]
Bases: BaseModelOperator
Wrapper for meta-model stacking using pipeline predictions.
Creates a meta-learner that uses predictions from previously trained models in the pipeline as input features. Implements stacked generalization with proper out-of-fold prediction handling to prevent data leakage.
The meta-model:
1. Collects out-of-fold (OOF) predictions from the specified source models
2. Constructs training features from these predictions
3. Trains on these features using the provided sklearn-compatible model
4. For test data, aggregates source model predictions across folds
Multi-Level Stacking (Phase 7): MetaModel supports multi-level stacking, where meta-models can use predictions from other meta-models as sources. This enables hierarchical ensemble architectures:
- Level 0: Base models (PLS, RF, XGBoost, etc.)
- Level 1: First meta-models (stack on Level 0)
- Level 2: Second meta-models (stack on Level 0 + Level 1)
- Level 3: Third meta-models (stack on all previous levels)
The level is auto-detected by default but can be explicitly set via stacking_config.level. Circular dependencies are automatically prevented.
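One plausible way to auto-detect a level and reject circular dependencies is to treat the source relationships as a graph and take one plus the deepest source level. The sketch below works under that assumption; `detect_level` and the `sources` mapping are hypothetical names, and the actual detection logic in nirs4all may differ.

```python
def detect_level(name, sources, max_level=3, _visiting=None):
    """Return the stacking level of a model.

    sources maps each meta-model name to the names it stacks on. Any name
    that is not a key is treated as a base model (level 0).
    """
    if name not in sources:
        return 0
    visiting = set() if _visiting is None else _visiting
    if name in visiting:
        # The model is reachable from itself through its own sources.
        raise ValueError(f"circular dependency involving {name!r}")
    visiting.add(name)
    level = 1 + max(
        (detect_level(src, sources, max_level, visiting) for src in sources[name]),
        default=0,
    )
    visiting.discard(name)
    if level > max_level:
        raise ValueError(f"{name!r} is at level {level}, above max_level={max_level}")
    return level


sources = {
    "meta_1": ["PLS", "RandomForest"],  # stacks on base models only
    "meta_2": ["PLS", "meta_1"],        # stacks on Level 0 + Level 1
}
levels = {name: detect_level(name, sources) for name in ["PLS", "meta_1", "meta_2"]}

cyclic = {"a": ["b"], "b": ["a"]}
try:
    detect_level("a", cyclic)
    cycle_detected = False
except ValueError:
    cycle_detected = True
```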
- model
Sklearn-compatible model to use as meta-learner.
- source_models
Which models to use as sources (“all” or list of names).
- use_proba
For classification, use probabilities instead of class predictions.
- stacking_config
Configuration for OOF reconstruction and multi-level stacking.
- selector
Optional custom source model selector.
- finetune_space
Optional hyperparameter search space for Optuna finetuning.
Example
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Lasso, LogisticRegression, Ridge
>>> from sklearn.model_selection import KFold
>>> from nirs4all.operators.models.meta import MetaModel, StackingConfig, StackingLevel
>>>
>>> # Basic usage - stack all previous models
>>> MetaModel(model=Ridge())
>>>
>>> # Explicit source selection
>>> MetaModel(
...     model=Ridge(),
...     source_models=["PLS", "RandomForest", "XGBoost"]
... )
>>>
>>> # Multi-level stacking
>>> pipeline = [
...     KFold(n_splits=5),
...     PLSRegression(n_components=5),        # Level 0
...     RandomForestRegressor(),              # Level 0
...     {"model": MetaModel(model=Ridge())},  # Level 1 (auto-detected)
...     {"model": MetaModel(                  # Level 2 (uses Level 0 + Level 1)
...         model=Lasso(),
...         stacking_config=StackingConfig(level=StackingLevel.LEVEL_2)
...     )},
... ]
>>>
>>> # With probability features for classification
>>> MetaModel(
...     model=LogisticRegression(),
...     use_proba=True
... )
>>>
>>> # With Optuna hyperparameter tuning
>>> MetaModel(
...     model=Ridge(),
...     finetune_space={"model__alpha": (0.001, 100.0)}
... )
Notes
- Source models must come from earlier steps in the pipeline.
- In branched pipelines, only models from the current branch are used by default.
- For sample_partitioner branches, stacking is done within each partition.
- Multi-level stacking supports up to 3 levels by default (configurable via the max_level field of StackingConfig).
- Circular dependencies are automatically detected and prevented.
- get_controller_type() str[source]
Return the type of controller that handles this operator.
- Returns:
“meta” to indicate MetaModelController should handle this.
- Return type:
str
- get_finetune_params() Dict[str, Any] | None[source]
Get finetuning parameters for Optuna optimization.
Returns the finetune_space with proper formatting for the Optuna manager.
- Returns:
Dict with finetune configuration or None if no finetuning configured.
- get_params(deep: bool = True) Dict[str, Any][source]
Get parameters for this operator.
- Parameters:
deep – If True, returns nested parameters from the model.
- Returns:
Parameter names mapped to their values.
- Return type:
Dict[str, Any]
- property level: int
Get the stacking level of this meta-model.
Returns the detected level if AUTO, otherwise the configured level.
- Returns:
Stacking level (1, 2, or 3).
- Return type:
int
- class nirs4all.operators.models.meta.StackingConfig(coverage_strategy: CoverageStrategy = CoverageStrategy.STRICT, test_aggregation: TestAggregation = TestAggregation.MEAN, branch_scope: BranchScope = BranchScope.CURRENT_ONLY, allow_no_cv: bool = False, min_coverage_ratio: float = 1.0, level: StackingLevel = StackingLevel.AUTO, allow_meta_sources: bool = True, max_level: int = 3)[source]
Bases: object
Configuration for meta-model training set reconstruction.
Controls how out-of-fold predictions are collected and processed to build the training features for the meta-model.
- coverage_strategy
How to handle samples with missing predictions.
- test_aggregation
How to aggregate test predictions across folds.
- branch_scope
Which branches to include as source models.
- level
Stacking level for multi-level stacking (AUTO, LEVEL_1, LEVEL_2, LEVEL_3).
Example
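>>> from nirs4all.operators.models.meta import (
...     CoverageStrategy, StackingConfig, StackingLevel, TestAggregation
... )
>>>
>>> config = StackingConfig(
...     coverage_strategy=CoverageStrategy.DROP_INCOMPLETE,
...     test_aggregation=TestAggregation.WEIGHTED_MEAN,
...     min_coverage_ratio=0.5,
...     level=StackingLevel.AUTO,
...     allow_meta_sources=True
... )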
- branch_scope: BranchScope = 'current_only'
- coverage_strategy: CoverageStrategy = 'strict'
- level: StackingLevel = 'auto'
- test_aggregation: TestAggregation = 'mean'
- class nirs4all.operators.models.meta.StackingLevel(value)[source]
Bases: Enum
Level of stacking in multi-level stacking architecture.
Indicates where this meta-model sits in a stacking hierarchy. Used for validation and dependency tracking.
- AUTO
Automatically detect level based on source models (default).
- LEVEL_1
First meta-level (stacks on base models only).
- LEVEL_2
Second meta-level (can stack on LEVEL_1 meta-models).
- LEVEL_3
Third meta-level (can stack on LEVEL_1 and LEVEL_2).
- AUTO = 'auto'
- LEVEL_1 = 1
- LEVEL_2 = 2
- LEVEL_3 = 3
- class nirs4all.operators.models.meta.TestAggregation(value)[source]
Bases: Enum
Strategy for aggregating test predictions from multiple folds.
When base models are trained with cross-validation, each fold produces predictions for the test set. This determines how to combine them.
- MEAN
Simple average across folds (default).
- WEIGHTED_MEAN
Weighted average by validation scores.
- BEST_FOLD
Use prediction from best-scoring fold only.
- BEST_FOLD = 'best'
- MEAN = 'mean'
- WEIGHTED_MEAN = 'weighted'
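The three aggregation modes can be compared on a toy case with two folds. This is a minimal sketch, not the library's implementation: `aggregate_test_predictions` is a hypothetical helper keyed on the documented enum values ('mean', 'weighted', 'best'), and it assumes that fold scores are positive, that a higher score is better, and that weights are simply the scores normalized to sum to one. For error metrics such as RMSE the scores would have to be inverted first.

```python
def aggregate_test_predictions(fold_preds, strategy, fold_scores=None):
    """Combine per-fold test predictions into one prediction per sample.

    fold_preds holds one list of test predictions per fold. fold_scores
    holds one validation score per fold, higher meaning better (assumption).
    """
    n_folds = len(fold_preds)
    if strategy == "mean":
        weights = [1.0 / n_folds] * n_folds
    elif strategy == "weighted":
        total = sum(fold_scores)
        weights = [score / total for score in fold_scores]
    elif strategy == "best":
        best = max(range(n_folds), key=lambda k: fold_scores[k])
        return list(fold_preds[best])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # zip(*fold_preds) groups the predictions of all folds per test sample.
    return [sum(w * p for w, p in zip(weights, per_sample))
            for per_sample in zip(*fold_preds)]


fold_preds = [[1.0, 2.0], [3.0, 4.0]]  # two folds, two test samples
fold_scores = [0.25, 0.75]

simple = aggregate_test_predictions(fold_preds, "mean")
weighted = aggregate_test_predictions(fold_preds, "weighted", fold_scores)
best = aggregate_test_predictions(fold_preds, "best", fold_scores)
```

The weighted result sits between the simple mean and the best fold's predictions, pulled toward the fold with the higher score.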