nirs4all.controllers.models.stacking.classification module

Classification Support for Meta-Model Stacking.

Phase 5 Implementation. Provides utilities for:

1. Detecting classification vs. regression task types from predictions
2. Extracting probability features for classification stacking
3. Handling binary and multiclass classification scenarios
4. Generating meaningful feature names with class information

Key components:

- ClassificationFeatureExtractor: Extracts probability features from predictions
- TaskTypeDetector: Detects task type from prediction metadata
- FeatureNameGenerator: Creates descriptive feature names for meta-features

class nirs4all.controllers.models.stacking.classification.ClassificationFeatureExtractor(classification_info: ClassificationInfo, use_proba: bool = False)

Bases: object

Extracts classification features from predictions.

Handles extraction of probability features for binary and multiclass classification, with proper handling of different array shapes.

extract_features(pred: Dict[str, Any], n_samples: int) → ndarray

Extract features from a single prediction entry.

Parameters:

pred – Prediction dictionary with y_pred and optionally y_proba.
n_samples – Expected number of samples.

Returns:

Feature array of shape (n_samples,) or (n_samples, n_features).

get_n_features() → int

Get the number of features that will be extracted per model.

Returns: Number of feature columns per source model.
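The shape handling that extract_features has to cover can be sketched as a standalone function. This is a minimal illustration, not the library's implementation: the function name, the explicit is_binary flag, and the choice of the last y_proba column as the "positive" class are assumptions. It returns a (n_samples,) array for regression, binary, or non-probability cases and a (n_samples, n_classes) array for multiclass with use_proba.

```python
from typing import Any, Dict

import numpy as np


def extract_features_sketch(
    pred: Dict[str, Any], n_samples: int, is_binary: bool, use_proba: bool
) -> np.ndarray:
    """Hypothetical stand-in for ClassificationFeatureExtractor.extract_features.

    Assumes y_proba is either 1-D (one score per sample) or 2-D
    (n_samples, n_classes). Illustration only, not the nirs4all code.
    """
    proba = pred.get("y_proba")
    if use_proba and proba is not None:
        proba = np.asarray(proba, dtype=float)
        if proba.ndim == 1:
            # Already one score per sample (assumed positive-class probability).
            features = proba
        elif is_binary:
            # Binary: keep only the positive-class column (assumed last) -> 1 feature.
            features = proba[:, -1]
        else:
            # Multiclass: keep every class column -> n_classes features.
            features = proba
    else:
        # Regression, or classification without use_proba: use y_pred.
        features = np.asarray(pred["y_pred"], dtype=float).ravel()

    # Guard against predictions that cover a different set of samples.
    if features.shape[0] != n_samples:
        raise ValueError(f"expected {n_samples} samples, got {features.shape[0]}")
    return features
```

Keeping a single column for binary problems avoids feeding the meta-model two perfectly collinear columns (p and 1 - p).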

class nirs4all.controllers.models.stacking.classification.ClassificationInfo(task_type: StackingTaskType, n_classes: int | None = None, class_labels: List[Any] | None = None, has_probabilities: bool = False, proba_shape: Tuple[int, ...] | None = None)

Bases: object

Information about classification task detected from predictions.

task_type

Detected task type (regression/binary/multiclass).

Type: nirs4all.controllers.models.stacking.classification.StackingTaskType

n_classes

Number of classes if classification, else None.

Type: int | None

class_labels

Optional class labels if available.

Type: List[Any] | None

has_probabilities

Whether y_proba is available in predictions.

Type: bool

proba_shape

Shape of probability arrays if available.

Type: Tuple[int, ...] | None

class_labels: List[Any] | None = None

get_n_features_per_model(use_proba: bool = False) → int

Get the number of features per source model.

Parameters: use_proba – Whether probability features are requested.
Returns: Number of feature columns per source model:

- Regression: 1 (y_pred)
- Binary + use_proba: 1 (positive-class probability)
- Multiclass + use_proba: n_classes (all class probabilities)
- Classification without use_proba: 1 (y_pred)
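The column-count rules listed above reduce to a single special case: only multiclass with use_proba contributes more than one column. The sketch below encodes that table with a local enum that mirrors the documented task-type values; the function name is hypothetical, and returning 1 for UNKNOWN is an assumption, since the documented rules do not cover it.

```python
from enum import Enum
from typing import Optional


class TaskType(Enum):
    # Mirrors the documented StackingTaskType values (sketch only).
    REGRESSION = "regression"
    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    UNKNOWN = "unknown"


def n_features_per_model(
    task: TaskType, n_classes: Optional[int], use_proba: bool = False
) -> int:
    """Column count per source model, following the documented rules."""
    if use_proba and task is TaskType.MULTICLASS_CLASSIFICATION:
        if n_classes is None:
            raise ValueError("multiclass stacking with use_proba needs n_classes")
        # One probability column per class.
        return n_classes
    # Regression, binary (with or without use_proba), classification without
    # use_proba and (by assumption) UNKNOWN all contribute a single column.
    return 1
```

With three source models on a 4-class problem and use_proba=True, the meta-model would therefore see 3 × 4 = 12 columns.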

has_probabilities: bool = False

property is_binary: bool: Check if this is binary classification.

property is_classification: bool: Check if this is a classification task.

property is_multiclass: bool: Check if this is multiclass classification.

n_classes: int | None = None

proba_shape: Tuple[int, ...] | None = None

task_type: StackingTaskType

class nirs4all.controllers.models.stacking.classification.FeatureNameGenerator(classification_info: ClassificationInfo, use_proba: bool = False, pattern: str = '{model_name}_pred')

Bases: object

Generates meaningful feature names for the meta-model.

Creates descriptive feature names that include model name and, for classification with probabilities, class information.

generate_names(source_model_names: List[str]) → List[str]

Generate feature names for all source models.

Parameters: source_model_names – List of source model names.
Returns: List of feature column names.

get_feature_importance_mapping(source_model_names: List[str]) → Dict[str, List[str]]

Get the mapping from source models to their feature names.

Useful for feature importance analysis.

Parameters: source_model_names – List of source model names.
Returns: Dictionary mapping each model name to its list of feature names.
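Name generation of this kind can be sketched with str.format and the documented default pattern '{model_name}_pred'. This is a hedged illustration: the "_proba_<label>" suffix used when one column per class is emitted is a hypothetical convention, not necessarily the one nirs4all uses, and both function names are made up for the example.

```python
from typing import Dict, List, Optional, Sequence


def generate_names_sketch(
    source_model_names: List[str],
    class_labels: Optional[Sequence[object]] = None,
    pattern: str = "{model_name}_pred",
) -> Dict[str, List[str]]:
    """Map each source model to its meta-feature column names.

    With class_labels (multiclass + use_proba), one column per class is
    produced using a hypothetical '_proba_<label>' suffix; otherwise the
    single column name comes from `pattern`.
    """
    mapping: Dict[str, List[str]] = {}
    for model_name in source_model_names:
        if class_labels:
            mapping[model_name] = [f"{model_name}_proba_{label}" for label in class_labels]
        else:
            mapping[model_name] = [pattern.format(model_name=model_name)]
    return mapping


def flatten_names(mapping: Dict[str, List[str]]) -> List[str]:
    # Flat column list in source-model order (dicts keep insertion order).
    return [name for names in mapping.values() for name in names]
```

The dictionary plays the role of get_feature_importance_mapping, and flatten_names the role of generate_names; deriving both from one mapping keeps them consistent.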

class nirs4all.controllers.models.stacking.classification.MetaFeatureInfo(feature_names: List[str], source_models: List[str], feature_to_model: Dict[str, str], classification_info: ClassificationInfo, n_features_per_model: Dict[str, int] = <factory>)

Bases: object

Information about generated meta-features.

Used for tracking feature importance and providing interpretable results.

feature_names

List of all feature column names.

Type: List[str]

source_models

List of source model names.

Type: List[str]

feature_to_model

Mapping from feature name to source model.

Type: Dict[str, str]

classification_info

Classification metadata.

Type: nirs4all.controllers.models.stacking.classification.ClassificationInfo

n_features_per_model

Number of features from each model.

Type: Dict[str, int]

aggregate_importance_by_model(feature_importances: Dict[str, float]) → Dict[str, float]

Aggregate feature importances by source model.

Sums importance scores for all features from the same source model.

Parameters: feature_importances – Mapping from feature name to importance score.
Returns: Mapping from model name to aggregated importance.
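Summing per-feature importances back onto source models only needs the feature_to_model lookup. The dataclass below is a trimmed-down, hypothetical stand-in for MetaFeatureInfo (classification_info is omitted) that shows both lookup and aggregation; the `<factory>` default in the signature corresponds to dataclasses.field(default_factory=...). Silently skipping features that have no known source model is an assumption of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetaFeatureInfoSketch:
    # Trimmed-down stand-in for MetaFeatureInfo (no classification_info).
    feature_names: List[str]
    source_models: List[str]
    feature_to_model: Dict[str, str]
    n_features_per_model: Dict[str, int] = field(default_factory=dict)

    def get_model_for_feature(self, feature_name: str) -> Optional[str]:
        # None when the feature was not produced by any known source model.
        return self.feature_to_model.get(feature_name)

    def aggregate_importance_by_model(
        self, feature_importances: Dict[str, float]
    ) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for feature, score in feature_importances.items():
            model = self.get_model_for_feature(feature)
            if model is not None:  # assumption: unknown features are skipped
                totals[model] += score
        return dict(totals)
```

For a multiclass stack this turns, for example, three per-class importances of one random-forest model into a single score for that model, which is usually the question being asked.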

classification_info: ClassificationInfo

feature_names: List[str]

feature_to_model: Dict[str, str]

get_model_for_feature(feature_name: str) → str | None

Get the source model name for a feature.

Parameters: feature_name – Feature column name.
Returns: Source model name, or None if not found.

n_features_per_model: Dict[str, int]

source_models: List[str]

class nirs4all.controllers.models.stacking.classification.StackingTaskType(value)

Bases: Enum

Task type for stacking.

REGRESSION: Regression task using y_pred as features.

BINARY_CLASSIFICATION: Binary classification (2 classes).

MULTICLASS_CLASSIFICATION: Multi-class classification (>2 classes).

UNKNOWN: Could not determine task type.

BINARY_CLASSIFICATION = 'binary_classification'

MULTICLASS_CLASSIFICATION = 'multiclass_classification'

REGRESSION = 'regression'

UNKNOWN = 'unknown'

property is_classification: bool: Check if this is a classification task type.

property n_classes: int | None: Return expected number of classes or None for regression.

class nirs4all.controllers.models.stacking.classification.TaskTypeDetector(prediction_store: Predictions)

Bases: object

Detects task type from prediction metadata.

Uses prediction store metadata and y_proba presence to determine whether the stacking involves regression or classification.

detect(source_model_names: List[str], context: ExecutionContext) → ClassificationInfo

Detect task type from source model predictions.

Examines predictions from source models to determine task type and gather classification metadata.

Parameters:

source_model_names – List of source model names to examine.
context – Execution context with branch info.

Returns:

ClassificationInfo with detected task type and metadata.
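One plausible way to turn "metadata plus y_proba presence" into a task type is to trust the probability shape first and fall back to a recorded task label. The function below is a hypothetical heuristic written for illustration; it does not reproduce TaskTypeDetector's actual rules, and the metadata strings it accepts simply reuse the documented StackingTaskType values.

```python
from typing import Optional, Tuple


def classify_task(
    metadata_task: Optional[str], proba_shape: Optional[Tuple[int, ...]]
) -> Tuple[str, Optional[int]]:
    """Return (task type value, n_classes) for one source model.

    Hypothetical heuristic: the probability shape wins; otherwise trust a
    task string recorded in metadata; otherwise report 'unknown'.
    """
    # A 2-D probability array of shape (n_samples, k) implies k classes.
    if proba_shape is not None and len(proba_shape) == 2:
        k = proba_shape[1]
        if k == 2:
            return "binary_classification", 2
        if k > 2:
            return "multiclass_classification", k
    # No usable probabilities: fall back to the metadata label.
    if metadata_task == "regression":
        return "regression", None
    if metadata_task == "binary_classification":
        return "binary_classification", 2
    if metadata_task == "multiclass_classification":
        # The class count cannot be read off without probabilities.
        return "multiclass_classification", None
    return "unknown", None
```

Since detect receives several source model names, a real detector would presumably apply a check like this to each source model and verify that the results agree before building a single ClassificationInfo.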

nirs4all.controllers.models.stacking.classification.build_meta_feature_info(source_model_names: List[str], classification_info: ClassificationInfo, use_proba: bool = False, name_pattern: str = '{model_name}_pred') → MetaFeatureInfo

Build MetaFeatureInfo from source models and classification info.

Parameters:

source_model_names – List of source model names.
classification_info – Classification metadata.
use_proba – Whether probability features are used.
name_pattern – Pattern for feature names.

Returns:

MetaFeatureInfo with all mappings populated.
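Populating "all mappings" of a MetaFeatureInfo amounts to flattening a model-to-names mapping and inverting it. The helper below sketches that step under the assumption that its input has the shape returned by get_feature_importance_mapping; the function name is hypothetical, and rejecting duplicate feature names is a safety check added for the example, not documented behavior.

```python
from typing import Dict, List, Tuple


def invert_mapping(
    model_to_features: Dict[str, List[str]]
) -> Tuple[List[str], Dict[str, str], Dict[str, int]]:
    """Derive feature_names, feature_to_model and n_features_per_model."""
    feature_names: List[str] = []
    feature_to_model: Dict[str, str] = {}
    n_features_per_model: Dict[str, int] = {}
    for model_name, names in model_to_features.items():
        feature_names.extend(names)
        n_features_per_model[model_name] = len(names)
        for name in names:
            # Two models producing the same column name would make the
            # reverse lookup ambiguous, so refuse it.
            if name in feature_to_model:
                raise ValueError(f"duplicate feature name: {name}")
            feature_to_model[name] = model_name
    return feature_names, feature_to_model, n_features_per_model
```

Deriving all three fields from one mapping guarantees that sum(n_features_per_model.values()) == len(feature_names), which downstream importance aggregation relies on.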