nirs4all.operators.models.meta module

Meta-model operator for stacking ensembles.

This module provides the MetaModel operator for building stacking ensembles that use predictions from previously trained models as input features.

The meta-model trains on out-of-fold (OOF) predictions from base models to prevent data leakage and overfitting.
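As a rough illustration of what "out-of-fold" means, the sketch below computes OOF predictions with plain Python. The helper names (`kfold_indices`, `oof_predictions`) and the toy base model that predicts the mean of its training targets are assumptions made for this example; they are not part of nirs4all.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs for contiguous, non-shuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


def oof_predictions(y, n_splits):
    """Toy base model: predict the mean of the training targets of each fold."""
    oof = [None] * len(y)
    for train, val in kfold_indices(len(y), n_splits):
        fold_mean = mean(y[i] for i in train)  # fitted without the val samples
        for i in val:
            oof[i] = fold_mean  # prediction for a sample the model never saw
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
oof = oof_predictions(y, n_splits=3)
print(oof)
```

Every entry of `oof` comes from a model fitted without that sample, which is why a meta-learner trained on such features does not see leaked targets.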

Example

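>>> from nirs4all.operators.models import MetaModel
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import KFold
>>> from sklearn.preprocessing import MinMaxScaler
>>>
>>> pipeline = [
...     MinMaxScaler(),
...     KFold(n_splits=5),
...     PLSRegression(n_components=10),
...     RandomForestRegressor(n_estimators=100),
...     {"model": MetaModel(model=Ridge(), source_models="all")},
... ]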
class nirs4all.operators.models.meta.BranchScope(value)

Bases: Enum

Which branches to include as source models.

Controls which branches’ predictions are used for stacking when the pipeline contains branching.

CURRENT_ONLY

Only use models from the current branch (default).

ALL_BRANCHES

Use models from all branches (requires compatible samples).

SPECIFIED

Use the explicit list given in the source_models parameter.

ALL_BRANCHES = 'all_branches'
CURRENT_ONLY = 'current_only'
SPECIFIED = 'specified'
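One way to picture the three scopes is as a filter over candidate source models. The sketch below is an assumption-laden illustration: the enum is a local stand-in that mirrors the documented values, and `select_sources` together with the name-to-branch mapping are hypothetical, not the library's selection logic.

```python
from enum import Enum


class BranchScope(Enum):
    """Local stand-in mirroring the documented enum values."""
    CURRENT_ONLY = "current_only"
    ALL_BRANCHES = "all_branches"
    SPECIFIED = "specified"


def select_sources(candidates, scope, current_branch, specified=None):
    """Hypothetical rule; `candidates` maps model name -> branch id."""
    if scope is BranchScope.CURRENT_ONLY:
        return [n for n, b in candidates.items() if b == current_branch]
    if scope is BranchScope.ALL_BRANCHES:
        return list(candidates)
    # SPECIFIED: keep only the explicitly named models that exist
    wanted = specified or []
    return [n for n in wanted if n in candidates]


candidates = {"PLS": 0, "RandomForest": 0, "XGBoost": 1}
current = select_sources(candidates, BranchScope.CURRENT_ONLY, current_branch=0)
everything = select_sources(candidates, BranchScope.ALL_BRANCHES, current_branch=0)
named = select_sources(candidates, BranchScope.SPECIFIED, 0, specified=["XGBoost"])
print(current, everything, named)
```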
class nirs4all.operators.models.meta.CoverageStrategy(value)

Bases: Enum

Strategy for handling partial coverage in OOF reconstruction.

When some samples are missing predictions (e.g., from sample partitioning), this determines how to handle them.

STRICT

Raise error if any sample is missing predictions (default).

DROP_INCOMPLETE

Drop samples missing any source model predictions.

IMPUTE_ZERO

Fill missing predictions with zeros.

IMPUTE_MEAN

Fill missing predictions with mean of available predictions.

IMPUTE_FOLD_MEAN

Fill missing predictions with the mean of the predictions from the same fold.

DROP_INCOMPLETE = 'drop_incomplete'
IMPUTE_FOLD_MEAN = 'impute_fold_mean'
IMPUTE_MEAN = 'impute_mean'
IMPUTE_ZERO = 'impute_zero'
STRICT = 'strict'
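The effect of several strategies can be sketched on a small OOF matrix in which `None` marks a missing prediction. Everything here is illustrative: the enum is a local stand-in, `apply_coverage` is a hypothetical helper, IMPUTE_MEAN is assumed to mean the per-model (column) mean, and IMPUTE_FOLD_MEAN is left out because it needs fold membership.

```python
from enum import Enum
from statistics import mean


class CoverageStrategy(Enum):
    """Local stand-in for a subset of the documented values."""
    STRICT = "strict"
    DROP_INCOMPLETE = "drop_incomplete"
    IMPUTE_ZERO = "impute_zero"
    IMPUTE_MEAN = "impute_mean"


def apply_coverage(rows, strategy):
    """rows: one list per sample, one entry per source model, None = missing."""
    has_gap = any(v is None for row in rows for v in row)
    if strategy is CoverageStrategy.STRICT:
        if has_gap:
            raise ValueError("some samples are missing source predictions")
        return rows
    if strategy is CoverageStrategy.DROP_INCOMPLETE:
        return [row for row in rows if None not in row]
    if strategy is CoverageStrategy.IMPUTE_ZERO:
        return [[0.0 if v is None else v for v in row] for row in rows]
    # IMPUTE_MEAN: assumed here to be the mean of each model's available values
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


oof = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]
dropped = apply_coverage(oof, CoverageStrategy.DROP_INCOMPLETE)
zeroed = apply_coverage(oof, CoverageStrategy.IMPUTE_ZERO)
averaged = apply_coverage(oof, CoverageStrategy.IMPUTE_MEAN)
print(dropped, zeroed, averaged)
```

With STRICT, the same input raises a `ValueError` in this sketch, matching the documented "raise error" behaviour in spirit.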
class nirs4all.operators.models.meta.MetaModel(model: Any, source_models: str | List[str] = 'all', use_proba: bool = False, stacking_config: StackingConfig | None = None, selector: Any | None = None, name: str | None = None, finetune_space: Dict[str, Any] | None = None)

Bases: BaseModelOperator

Wrapper for meta-model stacking using pipeline predictions.

Creates a meta-learner that uses predictions from previously trained models in the pipeline as input features. Implements stacked generalization with proper out-of-fold prediction handling to prevent data leakage.

The meta-model:

  1. Collects out-of-fold (OOF) predictions from specified source models

  2. Constructs training features from these predictions

  3. Trains on these features using the provided sklearn-compatible model

  4. For test data, aggregates source model predictions across folds
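The steps above can be sketched with plain Python lists. The per-fold record layout (`val_idx`, `val_pred`, `test_pred`) and the variable names are assumptions made for illustration; the actual storage format used by nirs4all is not described here.

```python
from statistics import mean

# Hypothetical per-fold records for two base models,
# with 4 training samples and 2 test samples.
folds = [
    {"val_idx": [0, 1],
     "val_pred": {"PLS": [1.0, 2.0], "RF": [1.5, 2.5]},
     "test_pred": {"PLS": [10.0, 20.0], "RF": [11.0, 21.0]}},
    {"val_idx": [2, 3],
     "val_pred": {"PLS": [3.0, 4.0], "RF": [3.5, 4.5]},
     "test_pred": {"PLS": [12.0, 22.0], "RF": [13.0, 23.0]}},
]
sources = ["PLS", "RF"]
n_train, n_test = 4, 2

# Steps 1-2: put each OOF prediction at its sample index, one column per model.
X_meta_train = [[None] * len(sources) for _ in range(n_train)]
for fold in folds:
    for j, name in enumerate(sources):
        for i, pred in zip(fold["val_idx"], fold["val_pred"][name]):
            X_meta_train[i][j] = pred

# Step 4: average each model's test predictions across folds.
X_meta_test = [
    [mean(fold["test_pred"][name][k] for fold in folds) for name in sources]
    for k in range(n_test)
]

# Step 3 would then fit the meta-learner, e.g. model.fit(X_meta_train, y_train).
print(X_meta_train)
print(X_meta_test)
```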

Multi-Level Stacking (Phase 7): MetaModel supports multi-level stacking where meta-models can use predictions from other meta-models as sources. This enables hierarchical ensemble architectures:

  • Level 0: Base models (PLS, RF, XGBoost, etc.)

  • Level 1: First meta-models (stack on Level 0)

  • Level 2: Second meta-models (stack on Level 0 + Level 1)

  • Level 3: Third meta-models (stack on all previous levels)

The level is auto-detected by default but can be explicitly set via stacking_config.level. Circular dependencies are automatically prevented.
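One plausible rule for auto-detection, shown purely as an assumption, is "one more than the highest level among the sources", with base models at level 0. The function `detect_level` and the mapping of meta-model names to their sources are hypothetical; the sketch also shows how a circular dependency could be caught.

```python
def detect_level(name, sources, _visiting=None):
    """Return 0 for base models, otherwise 1 + the highest source level.

    `sources` maps each meta-model name to the names it stacks on;
    names absent from the mapping are treated as base models.
    """
    if name not in sources:
        return 0
    visiting = _visiting if _visiting is not None else set()
    if name in visiting:
        raise ValueError(f"circular dependency involving {name!r}")
    visiting.add(name)
    level = 1 + max(detect_level(s, sources, visiting) for s in sources[name])
    visiting.discard(name)
    return level


sources = {
    "Meta_Ridge": ["PLS", "RandomForest"],
    "Meta_Lasso": ["PLS", "RandomForest", "Meta_Ridge"],
}
levels = {name: detect_level(name, sources) for name in sources}
print(levels)
```

A mapping such as `{"A": ["B"], "B": ["A"]}` raises `ValueError` in this sketch instead of recursing forever.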

model

Sklearn-compatible model to use as meta-learner.

source_models

Which models to use as sources (“all” or list of names).

use_proba

For classification, use probabilities instead of class predictions.

stacking_config

Configuration for OOF reconstruction and multi-level stacking.

selector

Optional custom source model selector.

finetune_space

Optional hyperparameter search space for Optuna finetuning.

Example

>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Lasso, LogisticRegression, Ridge
>>> from sklearn.model_selection import KFold
>>> from nirs4all.operators.models.meta import (
...     MetaModel, StackingConfig, StackingLevel
... )
>>>
>>> # Basic usage - stack all previous models
>>> MetaModel(model=Ridge())
>>>
>>> # Explicit source selection
>>> MetaModel(
...     model=Ridge(),
...     source_models=["PLS", "RandomForest", "XGBoost"]
... )
>>>
>>> # Multi-level stacking
>>> pipeline = [
...     KFold(n_splits=5),
...     PLSRegression(n_components=5),         # Level 0
...     RandomForestRegressor(),               # Level 0
...     {"model": MetaModel(model=Ridge())},   # Level 1 (auto-detected)
...     {"model": MetaModel(                   # Level 2 (uses Level 0 + Level 1)
...         model=Lasso(),
...         stacking_config=StackingConfig(level=StackingLevel.LEVEL_2)
...     )},
... ]
>>>
>>> # With probability features for classification
>>> MetaModel(
...     model=LogisticRegression(),
...     use_proba=True
... )
>>>
>>> # With Optuna hyperparameter tuning
>>> MetaModel(
...     model=Ridge(),
...     finetune_space={"model__alpha": (0.001, 100.0)}
... )

Notes

  • Source models must be from earlier steps in the pipeline

  • In branched pipelines, only models from the current branch are used by default

  • For sample_partitioner branches, stacking is done within each partition

  • Multi-level stacking supports up to 3 levels by default (configurable)

  • Circular dependencies are automatically detected and prevented

__repr__() → str

Return string representation.

get_controller_type() → str

Return the type of controller that handles this operator.

Returns:

“meta” to indicate MetaModelController should handle this.

Return type:

str

get_finetune_params() → Dict[str, Any] | None

Get finetuning parameters for Optuna optimization.

Returns the finetune_space with proper formatting for the Optuna manager.

Returns:

Dict with finetune configuration or None if no finetuning configured.

get_params(deep: bool = True) → Dict[str, Any]

Get parameters for this operator.

Parameters:

deep – If True, returns nested parameters from the model.

Returns:

Parameter names mapped to their values.

Return type:

dict

property level: int

Get the stacking level of this meta-model.

Returns the auto-detected level when stacking_config.level is AUTO; otherwise returns the configured level.

Returns:

Stacking level (1, 2, or 3).

Return type:

int

property name: str

Get the display name for this meta-model.

Returns:

User-provided name or ‘MetaModel_<model_class>’.

Return type:

str

set_params(**params) → MetaModel

Set the parameters of this operator.

Parameters:

**params – Operator parameters. Supports nested parameters for the model using ‘model__param_name’ syntax.

Returns:

This MetaModel instance (self).

Return type:

MetaModel
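The 'model__param_name' syntax follows the scikit-learn convention of using a double underscore to address a nested object. The toy classes below (`ToyModel`, `ToyOperator`) are hypothetical stand-ins that show how such a key could be routed; they are not the nirs4all implementation.

```python
class ToyModel:
    """Stand-in for an sklearn-style estimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


class ToyOperator:
    """Hypothetical wrapper illustrating nested parameter routing."""

    def __init__(self, model, name=None):
        self.model = model
        self.name = name

    def set_params(self, **params):
        for key, value in params.items():
            prefix, sep, sub_key = key.partition("__")
            if sep and prefix == "model":
                # 'model__alpha' -> set attribute 'alpha' on the wrapped model
                setattr(self.model, sub_key, value)
            else:
                setattr(self, key, value)
        return self  # returning self lets calls be chained


op = ToyOperator(ToyModel()).set_params(model__alpha=0.5, name="stack")
print(op.model.alpha, op.name)
```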

class nirs4all.operators.models.meta.StackingConfig(coverage_strategy: CoverageStrategy = CoverageStrategy.STRICT, test_aggregation: TestAggregation = TestAggregation.MEAN, branch_scope: BranchScope = BranchScope.CURRENT_ONLY, allow_no_cv: bool = False, min_coverage_ratio: float = 1.0, level: StackingLevel = StackingLevel.AUTO, allow_meta_sources: bool = True, max_level: int = 3)

Bases: object

Configuration for meta-model training set reconstruction.

Controls how out-of-fold predictions are collected and processed to build the training features for the meta-model.

coverage_strategy

How to handle samples with missing predictions.

Type:

nirs4all.operators.models.meta.CoverageStrategy

test_aggregation

How to aggregate test predictions across folds.

Type:

nirs4all.operators.models.meta.TestAggregation

branch_scope

Which branches to include as source models.

Type:

nirs4all.operators.models.meta.BranchScope

allow_no_cv

If True, allow stacking without cross-validation (with warning).

Type:

bool

min_coverage_ratio

Minimum fraction of source models that must provide a prediction for each sample.

Type:

float

level

Stacking level for multi-level stacking (AUTO, LEVEL_1, LEVEL_2, LEVEL_3).

Type:

nirs4all.operators.models.meta.StackingLevel

allow_meta_sources

If True, allow other MetaModels as source models.

Type:

bool

max_level

Maximum allowed stacking level (for validation).

Type:

int

Example

>>> config = StackingConfig(
...     coverage_strategy=CoverageStrategy.DROP_INCOMPLETE,
...     test_aggregation=TestAggregation.WEIGHTED_MEAN,
...     min_coverage_ratio=0.5,
...     level=StackingLevel.AUTO,
...     allow_meta_sources=True
... )
__post_init__()

Validate configuration after initialization.

allow_meta_sources: bool = True
allow_no_cv: bool = False
branch_scope: BranchScope = 'current_only'
coverage_strategy: CoverageStrategy = 'strict'
level: StackingLevel = 'auto'
max_level: int = 3
min_coverage_ratio: float = 1.0
test_aggregation: TestAggregation = 'mean'
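The checks performed in __post_init__ are not listed here, so the sketch below only guesses at what such validation might look like, and how min_coverage_ratio could be applied to one sample. `ToyStackingConfig`, its two rules, and `sample_is_covered` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyStackingConfig:
    """Hypothetical subset of the documented fields, with assumed checks."""
    allow_no_cv: bool = False
    min_coverage_ratio: float = 1.0
    max_level: int = 3

    def __post_init__(self):
        # Assumed rule: a ratio is a fraction in [0, 1]
        if not 0.0 <= self.min_coverage_ratio <= 1.0:
            raise ValueError("min_coverage_ratio must be between 0 and 1")
        # Assumed rule: at least one meta level must be allowed
        if self.max_level < 1:
            raise ValueError("max_level must be at least 1")


def sample_is_covered(row, config):
    """row holds one prediction per source model; None marks a gap."""
    available = sum(v is not None for v in row)
    return available / len(row) >= config.min_coverage_ratio


config = ToyStackingConfig(min_coverage_ratio=0.5)
covered = sample_is_covered([1.2, None, 0.9, None], config)    # 2 of 4 present
rejected = sample_is_covered([1.2, None, None, None], config)  # 1 of 4 present
print(covered, rejected)
```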
class nirs4all.operators.models.meta.StackingLevel(value)

Bases: Enum

Level of stacking in multi-level stacking architecture.

Indicates where this meta-model sits in a stacking hierarchy. Used for validation and dependency tracking.

AUTO

Automatically detect level based on source models (default).

LEVEL_1

First meta-level (stacks on base models only).

LEVEL_2

Second meta-level (can stack on LEVEL_1 meta-models).

LEVEL_3

Third meta-level (can stack on LEVEL_1 and LEVEL_2).

AUTO = 'auto'
LEVEL_1 = 1
LEVEL_2 = 2
LEVEL_3 = 3
class nirs4all.operators.models.meta.TestAggregation(value)

Bases: Enum

Strategy for aggregating test predictions from multiple folds.

When base models are trained with cross-validation, each fold produces predictions for the test set. This determines how to combine them.

MEAN

Simple average across folds (default).

WEIGHTED_MEAN

Weighted average by validation scores.

BEST_FOLD

Use prediction from best-scoring fold only.

BEST_FOLD = 'best'
MEAN = 'mean'
WEIGHTED_MEAN = 'weighted'
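The three strategies can be compared on a toy case with two folds. The function `aggregate_test_predictions` is a hypothetical helper keyed on the documented string values; the weighting scheme (scores normalised to sum to 1, higher score meaning better) is an assumption, since the exact weighting is not specified here.

```python
def aggregate_test_predictions(fold_preds, fold_scores, strategy="mean"):
    """Combine per-fold test predictions.

    fold_preds: one list of test predictions per fold.
    fold_scores: one validation score per fold (higher = better is assumed).
    """
    n_folds = len(fold_preds)
    per_sample = list(zip(*fold_preds))  # regroup values by test sample
    if strategy == "mean":
        return [sum(vals) / n_folds for vals in per_sample]
    if strategy == "weighted":
        total = sum(fold_scores)
        weights = [s / total for s in fold_scores]
        return [sum(w * v for w, v in zip(weights, vals)) for vals in per_sample]
    if strategy == "best":
        best = max(range(n_folds), key=lambda i: fold_scores[i])
        return list(fold_preds[best])
    raise ValueError(f"unknown strategy: {strategy!r}")


fold_preds = [[10.0, 20.0], [14.0, 24.0]]
fold_scores = [0.25, 0.75]
mean_pred = aggregate_test_predictions(fold_preds, fold_scores, "mean")
weighted_pred = aggregate_test_predictions(fold_preds, fold_scores, "weighted")
best_pred = aggregate_test_predictions(fold_preds, fold_scores, "best")
print(mean_pred, weighted_pred, best_pred)
```

For error-type metrics where lower is better, the weights and the choice of best fold would need to be inverted.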