nirs4all.controllers.models.stacking.classification module

Classification Support for Meta-Model Stacking.

Phase 5 implementation. Provides utilities for:

  1. Detecting classification vs. regression task types from predictions
  2. Extracting probability features for classification stacking
  3. Handling binary and multiclass classification scenarios
  4. Generating meaningful feature names with class information

Key components:

  • ClassificationFeatureExtractor: Extracts probability features from predictions
  • TaskTypeDetector: Detects task type from prediction metadata
  • FeatureNameGenerator: Creates descriptive feature names for meta-features

class nirs4all.controllers.models.stacking.classification.ClassificationFeatureExtractor(classification_info: ClassificationInfo, use_proba: bool = False)

Bases: object

Extracts classification features from predictions.

Handles extraction of probability features for binary and multiclass classification, with proper handling of different array shapes.

extract_features(pred: Dict[str, Any], n_samples: int) → ndarray

Extract features from a single prediction entry.

Parameters:
  • pred – Prediction dictionary with y_pred and optionally y_proba.

  • n_samples – Expected number of samples.

Returns:

Feature array of shape (n_samples,) or (n_samples, n_features).
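
The two return shapes can be illustrated with a standalone sketch. It is not the library's implementation: the function name, the `task` labels, and the choice of column 1 as the positive class are assumptions made for illustration; only the `y_pred` and `y_proba` keys come from the description above.

```python
from typing import Any, Dict

import numpy as np


def extract_features_sketch(
    pred: Dict[str, Any],
    n_samples: int,
    task: str,  # hypothetical labels: "regression", "binary" or "multiclass"
    use_proba: bool = False,
) -> np.ndarray:
    """Return features of shape (n_samples,) or (n_samples, n_classes)."""
    proba = pred.get("y_proba")
    if use_proba and task != "regression" and proba is not None:
        proba = np.asarray(proba, dtype=float)
        if task == "binary":
            if proba.ndim == 2 and proba.shape[1] >= 2:
                # Assumption: column 1 holds the positive class probability.
                features = proba[:, 1]
            else:
                # Already a single probability per sample: flatten to (n,).
                features = proba.reshape(-1)
        else:
            # Multiclass: keep one column per class.
            features = proba
    else:
        # Fall back to predicted values, flattening (n, 1) to (n,).
        features = np.asarray(pred["y_pred"], dtype=float).reshape(-1)

    if features.shape[0] != n_samples:
        raise ValueError(f"expected {n_samples} samples, got {features.shape[0]}")
    return features
```

With `use_proba=False`, or when `y_proba` is missing, the sketch returns the flattened `y_pred` column, so the result always has a leading dimension of `n_samples`.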

get_n_features() → int

Get number of features that will be extracted per model.

Returns:

Number of feature columns per source model.

class nirs4all.controllers.models.stacking.classification.ClassificationInfo(task_type: StackingTaskType, n_classes: int | None = None, class_labels: List[Any] | None = None, has_probabilities: bool = False, proba_shape: Tuple[int, ...] | None = None)

Bases: object

Information about classification task detected from predictions.

task_type

Detected task type (regression/binary/multiclass).

Type:

nirs4all.controllers.models.stacking.classification.StackingTaskType

n_classes

Number of classes if classification, else None.

Type:

int | None

class_labels

Optional class labels if available.

Type:

List[Any] | None

has_probabilities

Whether y_proba is available in predictions.

Type:

bool

proba_shape

Shape of probability arrays if available.

Type:

Tuple[int, ...] | None

class_labels: List[Any] | None = None
get_n_features_per_model(use_proba: bool = False) → int

Get number of features per source model.

Parameters:

use_proba – Whether probability features are requested.

Returns:

Number of feature columns per source model:

  • Regression: 1 (y_pred)
  • Binary + use_proba: 1 (positive class probability)
  • Multiclass + use_proba: n_classes (all class probabilities)
  • Classification without use_proba: 1 (y_pred)
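
The four cases listed above reduce to a single branch, as the following standalone sketch shows. The function takes the task type as a plain string matching the StackingTaskType values. Raising when `n_classes` is missing is an assumption, not documented behavior.

```python
from typing import Optional


def n_features_per_model(
    task_type: str,
    n_classes: Optional[int] = None,
    use_proba: bool = False,
) -> int:
    """Number of meta-feature columns contributed by one source model."""
    if use_proba and task_type == "multiclass_classification":
        if n_classes is None:
            # Assumption: the sketch refuses to guess the class count.
            raise ValueError("n_classes is required for multiclass probabilities")
        return n_classes  # one column per class probability
    # Regression, binary with probabilities, and classification
    # without probabilities all contribute a single column.
    return 1
```

A stack of three multiclass source models with five classes and `use_proba=True` would therefore yield 3 × 5 = 15 meta-feature columns.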

has_probabilities: bool = False
property is_binary: bool

Check if this is binary classification.

property is_classification: bool

Check if this is a classification task.

property is_multiclass: bool

Check if this is multiclass classification.

n_classes: int | None = None
proba_shape: Tuple[int, ...] | None = None
task_type: StackingTaskType
class nirs4all.controllers.models.stacking.classification.FeatureNameGenerator(classification_info: ClassificationInfo, use_proba: bool = False, pattern: str = '{model_name}_pred')

Bases: object

Generates meaningful feature names for meta-model.

Creates descriptive feature names that include model name and, for classification with probabilities, class information.

generate_names(source_model_names: List[str]) → List[str]

Generate feature names for all source models.

Parameters:

source_model_names – List of source model names.

Returns:

List of feature column names.
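
A minimal sketch of how such names could be produced from the default pattern `'{model_name}_pred'`. The `_class{k}` suffix for multiclass probabilities and the `n_classes` parameter are hypothetical. This page does not specify the suffix format the library actually uses.

```python
from typing import List


def generate_names_sketch(
    source_model_names: List[str],
    n_classes: int = 0,  # hypothetical: 0 means "no per-class columns"
    use_proba: bool = False,
    pattern: str = "{model_name}_pred",
) -> List[str]:
    """Build one name per meta-feature column, in source-model order."""
    names: List[str] = []
    for model_name in source_model_names:
        base = pattern.format(model_name=model_name)
        if use_proba and n_classes > 2:
            # Assumed suffix: one column per class for multiclass probabilities.
            names.extend(f"{base}_class{k}" for k in range(n_classes))
        else:
            names.append(base)
    return names
```

Keeping the names in source-model order means the k-th block of columns in the meta-feature matrix can be matched to the k-th source model.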

get_feature_importance_mapping(source_model_names: List[str]) → Dict[str, List[str]]

Get mapping from source models to their feature names.

Useful for feature importance analysis.

Parameters:

source_model_names – List of source model names.

Returns:

Dictionary mapping model name to list of feature names.

class nirs4all.controllers.models.stacking.classification.MetaFeatureInfo(feature_names: List[str], source_models: List[str], feature_to_model: Dict[str, str], classification_info: ClassificationInfo, n_features_per_model: Dict[str, int] = <factory>)

Bases: object

Information about generated meta-features.

Used for tracking feature importance and providing interpretable results.

feature_names

List of all feature column names.

Type:

List[str]

source_models

List of source model names.

Type:

List[str]

feature_to_model

Mapping from feature name to source model.

Type:

Dict[str, str]

classification_info

Classification metadata.

Type:

nirs4all.controllers.models.stacking.classification.ClassificationInfo

n_features_per_model

Number of features from each model.

Type:

Dict[str, int]

aggregate_importance_by_model(feature_importances: Dict[str, float]) → Dict[str, float]

Aggregate feature importances by source model.

Sums importance scores for all features from the same source model.

Parameters:

feature_importances – Mapping from feature name to importance score.

Returns:

Mapping from model name to aggregated importance.
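
The aggregation amounts to a grouped sum over the feature-to-model mapping. The standalone sketch below takes that mapping as an explicit argument instead of reading it from the instance. Skipping features that are absent from the mapping is an assumption.

```python
from collections import defaultdict
from typing import Dict


def aggregate_importance_sketch(
    feature_importances: Dict[str, float],
    feature_to_model: Dict[str, str],
) -> Dict[str, float]:
    """Sum importance scores of all features that share a source model."""
    totals: Dict[str, float] = defaultdict(float)
    for feature_name, score in feature_importances.items():
        model_name = feature_to_model.get(feature_name)
        if model_name is None:
            continue  # assumption: unknown features are ignored
        totals[model_name] += score
    return dict(totals)
```

This is mainly useful for multiclass stacks with probability features, where one source model contributes several columns and its overall contribution is otherwise split across them.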

classification_info: ClassificationInfo
feature_names: List[str]
feature_to_model: Dict[str, str]
get_model_for_feature(feature_name: str) → str | None

Get source model name for a feature.

Parameters:

feature_name – Feature column name.

Returns:

Source model name or None if not found.

n_features_per_model: Dict[str, int]
source_models: List[str]
class nirs4all.controllers.models.stacking.classification.StackingTaskType(value)

Bases: Enum

Task type for stacking.

REGRESSION

Regression task using y_pred as features.

BINARY_CLASSIFICATION

Binary classification (2 classes).

MULTICLASS_CLASSIFICATION

Multi-class classification (>2 classes).

UNKNOWN

Could not determine task type.

BINARY_CLASSIFICATION = 'binary_classification'
MULTICLASS_CLASSIFICATION = 'multiclass_classification'
REGRESSION = 'regression'
UNKNOWN = 'unknown'
property is_classification: bool

Check if this is a classification task type.

property n_classes: int | None

Return expected number of classes or None for regression.
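
For orientation, the enum can be mirrored in a few lines. The member names and string values below are those listed above. The class name is hypothetical, the `n_classes` property is omitted, and treating UNKNOWN as non-classification is an assumption.

```python
from enum import Enum


class TaskTypeSketch(Enum):
    """Standalone mirror of the documented task types."""

    REGRESSION = "regression"
    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    UNKNOWN = "unknown"

    @property
    def is_classification(self) -> bool:
        # Assumption: only the two classification members qualify.
        return self in (
            TaskTypeSketch.BINARY_CLASSIFICATION,
            TaskTypeSketch.MULTICLASS_CLASSIFICATION,
        )
```

Because the members carry string values, a task type stored as text can be restored by value lookup, for example `TaskTypeSketch("regression")`.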

class nirs4all.controllers.models.stacking.classification.TaskTypeDetector(prediction_store: Predictions)

Bases: object

Detects task type from prediction metadata.

Uses prediction store metadata and y_proba presence to determine whether the stacking involves regression or classification.

detect(source_model_names: List[str], context: ExecutionContext) → ClassificationInfo

Detect task type from source model predictions.

Examines predictions from source models to determine task type and gather classification metadata.

Parameters:
  • source_model_names – List of source model names to examine.

  • context – Execution context with branch info.

Returns:

ClassificationInfo with detected task type and metadata.

nirs4all.controllers.models.stacking.classification.build_meta_feature_info(source_model_names: List[str], classification_info: ClassificationInfo, use_proba: bool = False, name_pattern: str = '{model_name}_pred') → MetaFeatureInfo

Build MetaFeatureInfo from source models and classification info.

Parameters:
  • source_model_names – List of source model names.

  • classification_info – Classification metadata.

  • use_proba – Whether probability features are used.

  • name_pattern – Pattern for feature names.

Returns:

MetaFeatureInfo with all mappings populated.