nirs4all.operators.models.meta module

Meta-model operator for stacking ensemble.

This module provides the MetaModel operator for building stacking ensembles that use predictions from previously trained models as input features.

The meta-model trains on out-of-fold (OOF) predictions from base models to prevent data leakage and overfitting.
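A minimal, library-free sketch of how out-of-fold predictions are produced may help here. It is not nirs4all's implementation: the helpers `kfold_indices` and `oof_predictions` and the mean-predicting base model are hypothetical. They only show that every sample's OOF prediction comes from a model fitted without that sample.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) for contiguous, unshuffled K-fold splits."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def oof_predictions(X, y, n_splits, fit, predict):
    """Return one out-of-fold prediction per sample.

    Each validation sample is predicted by a model fitted on the other folds,
    so no prediction is influenced by that sample's own target.
    """
    oof = [None] * len(y)
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in val_idx:
            oof[i] = predict(model, X[i])
    return oof


def fit_mean(X_train, y_train):
    # Hypothetical base model: remembers the mean training target.
    return mean(y_train)


def predict_mean(model, x):
    # Ignores the features and returns the stored mean.
    return model


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = oof_predictions(X, y, n_splits=3, fit=fit_mean, predict=predict_mean)
print(oof)  # [4.5, 4.5, 3.5, 3.5, 2.5, 2.5]
```

Samples 0 and 1 (targets 1.0 and 2.0) receive 4.5, the mean of the targets in the other two folds; a model evaluated on its own training rows would instead have seen those targets.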

Example

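>>> from nirs4all.operators.models import MetaModel
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import KFold
>>> from sklearn.preprocessing import MinMaxScaler
>>>
>>> pipeline = [
...     MinMaxScaler(),
...     KFold(n_splits=5),
...     PLSRegression(n_components=10),
...     RandomForestRegressor(n_estimators=100),
...     {"model": MetaModel(model=Ridge(), source_models="all")},
... ]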

class nirs4all.operators.models.meta.BranchScope(value)

Bases: Enum

Which branches to include as source models.

Controls which branches’ predictions are used for stacking when the pipeline contains branching.

CURRENT_ONLY: Only use models from the current branch (default).

ALL_BRANCHES: Use models from all branches (requires compatible samples).

SPECIFIED: Use explicit list from source_models parameter.

ALL_BRANCHES = 'all_branches'

CURRENT_ONLY = 'current_only'

SPECIFIED = 'specified'
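How the three scopes differ can be sketched in plain Python. The enum below only mirrors the documented values, and the `SourcePrediction` record and `select_sources` helper are hypothetical; in particular, treating `SPECIFIED` as a name filter that ignores branch membership is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class BranchScope(Enum):
    # Local mirror of the documented values; not imported from nirs4all.
    CURRENT_ONLY = "current_only"
    ALL_BRANCHES = "all_branches"
    SPECIFIED = "specified"


@dataclass
class SourcePrediction:
    # Hypothetical record: which model produced a prediction, and in which branch.
    model_name: str
    branch_id: int


def select_sources(candidates, scope, current_branch, source_models=None):
    """Hypothetical filter returning the candidates each scope would stack on."""
    if scope is BranchScope.CURRENT_ONLY:
        return [c for c in candidates if c.branch_id == current_branch]
    if scope is BranchScope.ALL_BRANCHES:
        return list(candidates)
    if scope is BranchScope.SPECIFIED:
        wanted = set(source_models or [])
        return [c for c in candidates if c.model_name in wanted]
    raise ValueError(f"Unknown scope: {scope}")


candidates = [
    SourcePrediction("PLS", branch_id=0),
    SourcePrediction("RandomForest", branch_id=0),
    SourcePrediction("SVR", branch_id=1),
]
for scope in BranchScope:
    chosen = select_sources(candidates, scope, current_branch=0, source_models=["SVR"])
    print(scope.name, [c.model_name for c in chosen])
# CURRENT_ONLY ['PLS', 'RandomForest']
# ALL_BRANCHES ['PLS', 'RandomForest', 'SVR']
# SPECIFIED ['SVR']
```

The sketch does not model the sample-compatibility requirement that ALL_BRANCHES imposes.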

class nirs4all.operators.models.meta.CoverageStrategy(value)

Bases: Enum

Strategy for handling partial coverage in OOF reconstruction.

When some samples lack predictions (e.g., because of sample partitioning), the strategy determines how those samples are handled.

STRICT: Raise an error if any sample is missing a prediction (default).

DROP_INCOMPLETE: Drop samples that are missing a prediction from any source model.

IMPUTE_ZERO: Fill missing predictions with zeros.

IMPUTE_MEAN: Fill missing predictions with the mean of the available predictions.

IMPUTE_FOLD_MEAN: Fill missing predictions with the mean of the available predictions from the same fold.

DROP_INCOMPLETE = 'drop_incomplete'

IMPUTE_FOLD_MEAN = 'impute_fold_mean'

IMPUTE_MEAN = 'impute_mean'

IMPUTE_ZERO = 'impute_zero'

STRICT = 'strict'
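The five strategies can be illustrated on a small samples × source-models matrix in which `None` marks a missing OOF prediction. This is a hypothetical sketch (`apply_coverage` is not a nirs4all function); it assumes that "mean of available predictions" means the per-source column mean, and it falls back to 0.0 when a column has no values at all.

```python
from statistics import mean


def _mean_or_zero(values):
    """Mean of the non-missing values, or 0.0 if none exist (assumed fallback)."""
    present = [v for v in values if v is not None]
    return mean(present) if present else 0.0


def apply_coverage(matrix, strategy, folds=None):
    """Hypothetical sketch of the documented CoverageStrategy behaviours.

    matrix: rows are samples, columns are source models; None = missing.
    folds:  fold id per row, needed only for "impute_fold_mean".
    """
    n_cols = len(matrix[0])
    if strategy == "strict":
        if any(v is None for row in matrix for v in row):
            raise ValueError("Some samples are missing source predictions")
        return [list(row) for row in matrix]
    if strategy == "drop_incomplete":
        return [list(row) for row in matrix if None not in row]
    if strategy == "impute_zero":
        return [[0.0 if v is None else v for v in row] for row in matrix]
    if strategy == "impute_mean":
        col_means = [_mean_or_zero([row[j] for row in matrix]) for j in range(n_cols)]
        return [[col_means[j] if v is None else v for j, v in enumerate(row)]
                for row in matrix]
    if strategy == "impute_fold_mean":
        filled = []
        for row, fold in zip(matrix, folds):
            same_fold = [r for r, f in zip(matrix, folds) if f == fold]
            filled.append([_mean_or_zero([r[j] for r in same_fold]) if v is None else v
                           for j, v in enumerate(row)])
        return filled
    raise ValueError(f"Unknown strategy: {strategy}")


oof = [
    [1.0, 10.0],
    [2.0, None],  # the second source model has no prediction for this sample
    [3.0, 30.0],
    [4.0, 40.0],
]
folds = [0, 0, 1, 1]
print(apply_coverage(oof, "drop_incomplete"))          # [[1.0, 10.0], [3.0, 30.0], [4.0, 40.0]]
print(apply_coverage(oof, "impute_zero")[1])           # [2.0, 0.0]
print(apply_coverage(oof, "impute_fold_mean", folds)[1])  # [2.0, 10.0]
```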

class nirs4all.operators.models.meta.MetaModel(model: Any, source_models: str | List[str] = 'all', use_proba: bool = False, stacking_config: StackingConfig | None = None, selector: Any | None = None, name: str | None = None, finetune_space: Dict[str, Any] | None = None)

Bases: BaseModelOperator

Wrapper for meta-model stacking using pipeline predictions.

Creates a meta-learner that uses predictions from previously trained models in the pipeline as input features. Implements stacked generalization with proper out-of-fold prediction handling to prevent data leakage.

The meta-model:

1. Collects out-of-fold (OOF) predictions from the specified source models.
2. Constructs training features from these predictions.
3. Trains the provided sklearn-compatible model on these features.
4. For test data, aggregates the source models' predictions across folds.
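The four steps can be traced end to end in a dependency-free sketch. It assumes two source models whose OOF and per-fold test predictions are already available (toy values), uses an ordinary least-squares meta-learner in place of `Ridge`, and combines test predictions across folds with a simple mean. The helpers `solve`, `fit_linear`, and `predict_linear` are hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(features, y):
    """Least squares with intercept via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_linear(weights, features):
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], f)) for f in features]


# Step 1: OOF predictions collected from two source models.
sources = ["A", "B"]
oof = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.5, 1.5, 3.5, 3.5]}
y_train = [1.25, 1.75, 3.25, 3.75]

# Step 2: one feature column per source model.
train_features = list(zip(*(oof[s] for s in sources)))

# Step 3: fit the meta-learner on the OOF features.
weights = fit_linear(train_features, y_train)  # approx. [0.0, 0.5, 0.5]

# Step 4: each source predicted the test set once per fold (folds x samples);
# average per source, then feed the averaged columns to the meta-learner.
test_by_fold = {
    "A": [[2.0, 3.0], [2.2, 3.2], [1.8, 2.8]],
    "B": [[2.5, 3.5], [2.5, 3.5], [2.5, 3.5]],
}
test_features = list(zip(*([mean(col) for col in zip(*test_by_fold[s])] for s in sources)))
y_test_pred = predict_linear(weights, test_features)
print(y_test_pred)  # approx. [2.25, 3.25]
```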

Multi-Level Stacking (Phase 7): MetaModel supports multi-level stacking, in which meta-models can use predictions from other meta-models as sources. This enables hierarchical ensemble architectures:

- Level 0: Base models (PLS, RF, XGBoost, etc.)
- Level 1: First-level meta-models (stack on Level 0)
- Level 2: Second-level meta-models (stack on Level 0 + Level 1)
- Level 3: Third-level meta-models (stack on all previous levels)

The level is auto-detected by default but can be explicitly set via stacking_config.level. Circular dependencies are automatically prevented.
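One plausible reading of automatic level detection and circular-dependency prevention is sketched below: base models sit at level 0, a meta-model sits one level above its highest-level source, and a depth-first traversal rejects cycles. The `stacking_levels` function and the dict-based dependency graph are hypothetical; nirs4all's actual detection logic may differ.

```python
def stacking_levels(graph, max_level=3):
    """Hypothetical level assignment over a model dependency graph.

    graph maps each model name to the names of the models it stacks on.
    Base models (no sources) are level 0; a meta-model sits one level above
    its highest-level source. Cycles and levels above max_level raise ValueError.
    """
    levels = {}
    visiting = set()  # models on the current depth-first path

    def visit(name):
        if name in levels:
            return levels[name]
        if name in visiting:
            raise ValueError(f"Circular dependency involving {name!r}")
        visiting.add(name)
        sources = graph.get(name, [])
        level = 1 + max(visit(s) for s in sources) if sources else 0
        visiting.discard(name)
        if level > max_level:
            raise ValueError(f"{name!r} would be level {level} > max_level={max_level}")
        levels[name] = level
        return level

    for name in graph:
        visit(name)
    return levels


pipeline = {
    "PLS": [],
    "RandomForest": [],
    "Meta_Ridge": ["PLS", "RandomForest"],                # Level 1
    "Meta_Lasso": ["PLS", "RandomForest", "Meta_Ridge"],  # Level 2
}
print(stacking_levels(pipeline))
# {'PLS': 0, 'RandomForest': 0, 'Meta_Ridge': 1, 'Meta_Lasso': 2}
```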

model: Sklearn-compatible model to use as meta-learner.

source_models: Which models to use as sources (“all” or list of names).

use_proba: For classification, use probabilities instead of class predictions.

stacking_config: Configuration for OOF reconstruction and multi-level stacking.

selector: Optional custom source model selector.

finetune_space: Optional hyperparameter search space for Optuna finetuning.

Example

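>>> from nirs4all.operators.models.meta import MetaModel, StackingConfig, StackingLevel
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Lasso, LogisticRegression, Ridge
>>> from sklearn.model_selection import KFold
>>>
>>> # Basic usage - stack all previous models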
>>> MetaModel(model=Ridge())
>>>
>>> # Explicit source selection
>>> MetaModel(
...     model=Ridge(),
...     source_models=["PLS", "RandomForest", "XGBoost"]
... )
>>>
>>> # Multi-level stacking
>>> pipeline = [
...     KFold(n_splits=5),
...     PLSRegression(n_components=5),         # Level 0
...     RandomForestRegressor(),               # Level 0
...     {"model": MetaModel(model=Ridge())},   # Level 1 (auto-detected)
...     {"model": MetaModel(                   # Level 2 (uses Level 0 + Level 1)
...         model=Lasso(),
...         stacking_config=StackingConfig(level=StackingLevel.LEVEL_2)
...     )},
... ]
>>>
>>> # With probability features for classification
>>> MetaModel(
...     model=LogisticRegression(),
...     use_proba=True
... )
>>>
>>> # With Optuna hyperparameter tuning
>>> MetaModel(
...     model=Ridge(),
...     finetune_space={"model__alpha": (0.001, 100.0)}
... )

Notes

Source models must be from earlier steps in the pipeline
In branched pipelines, only models from the current branch are used by default
For sample_partitioner branches, stacking is done within each partition
Multi-level stacking supports up to 3 levels by default (configurable)
Circular dependencies are automatically detected and prevented

__repr__() → str

Return a string representation.

get_controller_type() → str

Return the type of controller that handles this operator.

Returns: “meta”, indicating that MetaModelController handles this operator.
Return type: str

get_finetune_params() → Dict[str, Any] | None

Get finetuning parameters for Optuna optimization.

Returns the finetune_space formatted for the Optuna manager.

Returns: Dict with the finetune configuration, or None if no finetuning is configured.

get_params(deep: bool = True) → Dict[str, Any]

Get parameters for this operator.

Parameters: deep – If True, also return the nested parameters of the wrapped model.
Returns: Parameter names mapped to their values.
Return type: dict

property level: int

Get the stacking level of this meta-model.

Returns the detected level if AUTO, otherwise the configured level.

Returns: Stacking level (1, 2, or 3).
Return type: int

property name: str

Get the display name for this meta-model.

Returns: User-provided name or ‘MetaModel_<model_class>’.
Return type: str

set_params(**params) → MetaModel

Set the parameters of this operator.

Parameters: **params – Operator parameters. Nested parameters of the wrapped model are set with the ‘model__param_name’ syntax.
Returns: This MetaModel instance.
Return type: MetaModel
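The model__param_name syntax follows the scikit-learn convention for nested parameters, and it is the same key format used in finetune_space. The sketch below shows the routing idea with hypothetical stand-in classes (`TinyRidge`, `WrapperSketch`); it does not reproduce MetaModel's code.

```python
class TinyRidge:
    """Stand-in estimator with a single hyperparameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class WrapperSketch:
    """Hypothetical wrapper showing how 'model__<param>' keys can be routed."""

    def __init__(self, model, use_proba=False):
        self.model = model
        self.use_proba = use_proba

    def get_params(self, deep=True):
        params = {"model": self.model, "use_proba": self.use_proba}
        if deep:
            # Expose the inner model's parameters under the 'model__' prefix.
            params.update({f"model__{k}": v for k, v in self.model.get_params().items()})
        return params

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            prefix, sep, sub_key = key.partition("__")
            if sep:
                if prefix != "model":
                    raise ValueError(f"Invalid nested parameter: {key!r}")
                nested[sub_key] = value
            else:
                setattr(self, key, value)
        # Apply nested values last, so they reach a newly assigned model too.
        if nested:
            self.model.set_params(**nested)
        return self  # returning self allows call chaining


meta = WrapperSketch(TinyRidge())
meta.set_params(model__alpha=0.1, use_proba=True)
print(meta.model.alpha, meta.use_proba)  # 0.1 True
print(sorted(meta.get_params()))         # ['model', 'model__alpha', 'use_proba']
```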

class nirs4all.operators.models.meta.StackingConfig(coverage_strategy: CoverageStrategy = CoverageStrategy.STRICT, test_aggregation: TestAggregation = TestAggregation.MEAN, branch_scope: BranchScope = BranchScope.CURRENT_ONLY, allow_no_cv: bool = False, min_coverage_ratio: float = 1.0, level: StackingLevel = StackingLevel.AUTO, allow_meta_sources: bool = True, max_level: int = 3)

Bases: object

Configuration for meta-model training set reconstruction.

Controls how out-of-fold predictions are collected and processed to build the training features for the meta-model.

coverage_strategy

How to handle samples with missing predictions.

Type: nirs4all.operators.models.meta.CoverageStrategy

test_aggregation

How to aggregate test predictions across folds.

Type: nirs4all.operators.models.meta.TestAggregation

branch_scope

Which branches to include as source models.

Type: nirs4all.operators.models.meta.BranchScope

allow_no_cv

If True, allow stacking without cross-validation (with a warning).

Type: bool

min_coverage_ratio

Minimum ratio of source models required per sample.

Type: float

level

Stacking level for multi-level stacking (AUTO, LEVEL_1, LEVEL_2, LEVEL_3).

Type: nirs4all.operators.models.meta.StackingLevel

allow_meta_sources

If True, allow other MetaModels as source models.

Type: bool

max_level

Maximum allowed stacking level (for validation).

Type: int

Example

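>>> from nirs4all.operators.models.meta import (
...     CoverageStrategy, StackingConfig, StackingLevel, TestAggregation,
... )
>>> config = StackingConfig(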
...     coverage_strategy=CoverageStrategy.DROP_INCOMPLETE,
...     test_aggregation=TestAggregation.WEIGHTED_MEAN,
...     min_coverage_ratio=0.5,
...     level=StackingLevel.AUTO,
...     allow_meta_sources=True
... )

__post_init__()

Validate configuration after initialization.

allow_meta_sources: bool = True

allow_no_cv: bool = False

branch_scope: BranchScope = BranchScope.CURRENT_ONLY

coverage_strategy: CoverageStrategy = CoverageStrategy.STRICT

level: StackingLevel = StackingLevel.AUTO

max_level: int = 3

min_coverage_ratio: float = 1.0

test_aggregation: TestAggregation = TestAggregation.MEAN
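A `__post_init__` hook is the standard dataclasses mechanism for validating fields right after construction. The checks in this hypothetical `StackingConfigSketch` (the ranges allowed for min_coverage_ratio, level, and max_level) and the `sample_is_covered` reading of min_coverage_ratio are assumptions, not documented nirs4all rules.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StackingConfigSketch:
    """Hypothetical stand-in for StackingConfig; the validation rules are assumed."""

    min_coverage_ratio: float = 1.0
    level: Union[str, int] = "auto"  # "auto" or an explicit level 1..max_level
    max_level: int = 3

    def __post_init__(self):
        # Runs automatically right after the generated __init__.
        if not 0.0 < self.min_coverage_ratio <= 1.0:
            raise ValueError("min_coverage_ratio must be in (0, 1]")
        if self.max_level < 1:
            raise ValueError("max_level must be at least 1")
        if self.level != "auto" and not 1 <= self.level <= self.max_level:
            raise ValueError(f"level must be 'auto' or in 1..{self.max_level}")

    def sample_is_covered(self, n_available, n_sources):
        # Assumed meaning: keep a sample if enough source models predicted it.
        return n_available / n_sources >= self.min_coverage_ratio


config = StackingConfigSketch(min_coverage_ratio=0.5)
print(config.sample_is_covered(1, 2))  # True  (1/2 >= 0.5)
print(config.sample_is_covered(1, 3))  # False (1/3 < 0.5)

try:
    StackingConfigSketch(level=3, max_level=2)
except ValueError as err:
    print(err)  # level must be 'auto' or in 1..2
```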

class nirs4all.operators.models.meta.StackingLevel(value)

Bases: Enum

Level of stacking in multi-level stacking architecture.

Indicates where this meta-model sits in a stacking hierarchy. Used for validation and dependency tracking.

AUTO: Automatically detect level based on source models (default).

LEVEL_1: First meta-level (stacks on base models only).

LEVEL_2: Second meta-level (can stack on LEVEL_1 meta-models).

LEVEL_3: Third meta-level (can stack on LEVEL_1 and LEVEL_2).

AUTO = 'auto'

LEVEL_1 = 1

LEVEL_2 = 2

LEVEL_3 = 3

class nirs4all.operators.models.meta.TestAggregation(value)

Bases: Enum

Strategy for aggregating test predictions from multiple folds.

When base models are trained with cross-validation, each fold's model produces predictions for the test set. This strategy determines how those per-fold predictions are combined.

MEAN: Simple average across folds (default).

WEIGHTED_MEAN: Average weighted by each fold's validation score.

BEST_FOLD: Use only the prediction from the best-scoring fold.

BEST_FOLD = 'best'

MEAN = 'mean'

WEIGHTED_MEAN = 'weighted'
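The three aggregation strategies can be illustrated on a small folds × samples array. This hypothetical `aggregate_test` helper assumes each fold has a validation error where lower is better, that WEIGHTED_MEAN uses inverse-error weights, and that BEST_FOLD selects the lowest-error fold; nirs4all may weight by a different score. The string keys match the documented enum values.

```python
from statistics import mean


def aggregate_test(fold_preds, strategy="mean", fold_errors=None):
    """Hypothetical combination of per-fold test predictions (folds x samples).

    fold_errors holds one validation error per fold (lower is better).
    """
    per_sample = list(zip(*fold_preds))  # regroup as samples x folds
    if strategy == "mean":
        return [mean(p) for p in per_sample]
    if strategy == "weighted":
        weights = [1.0 / e for e in fold_errors]  # assumed inverse-error weighting
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, p)) / total for p in per_sample]
    if strategy == "best":
        best = min(range(len(fold_errors)), key=fold_errors.__getitem__)
        return list(fold_preds[best])
    raise ValueError(f"Unknown strategy: {strategy}")


fold_preds = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # 3 folds, 2 test samples
fold_errors = [0.5, 1.0, 1.0]
print(aggregate_test(fold_preds, "mean"))                   # [2.0, 4.0]
print(aggregate_test(fold_preds, "weighted", fold_errors))  # [1.75, 3.5]
print(aggregate_test(fold_preds, "best", fold_errors))      # [1.0, 2.0]
```

Here fold 0 has half the error of the others, so it gets twice their weight; MEAN ignores the scores entirely, and BEST_FOLD discards the other folds.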