nirs4all.controllers.models.stacking.reconstructor module
Training Set Reconstructor for Meta-Model Stacking.
This module provides the TrainingSetReconstructor class that builds meta-model training features from out-of-fold (OOF) predictions of source models.
The key principle is that each training sample’s meta-feature comes from a fold where that sample was NOT used for training, preventing data leakage.
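The principle can be illustrated with a minimal NumPy sketch. This is not nirs4all code. The `folds` layout (a list of `(train_idx, val_idx)` pairs) and the `fit_predict` callable are hypothetical stand-ins for the fold data the real class reads from its `Predictions` store. Each sample's slot in the OOF column is filled only by the fold in which that sample was held out.

```python
import numpy as np


def reconstruct_oof(n_samples, folds, fit_predict):
    """Build one out-of-fold (OOF) meta-feature column for a single source model.

    folds: list of (train_idx, val_idx) index-array pairs (hypothetical layout).
    fit_predict: callable(train_idx, val_idx) returning predictions for val_idx.
    """
    oof = np.full(n_samples, np.nan)  # NaN marks samples never held out
    for train_idx, val_idx in folds:
        # Leakage guard: a validation sample must not be in its own fold's training set
        if np.intersect1d(train_idx, val_idx).size:
            raise ValueError("validation samples overlap the training split")
        oof[val_idx] = fit_predict(train_idx, val_idx)
    return oof


# Toy data: 6 samples split into 3 interleaved folds
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
idx = np.arange(len(y))
folds = []
for k in range(3):
    val = idx[k::3]
    folds.append((np.setdiff1d(idx, val), val))


def mean_model(train_idx, val_idx):
    # Toy "model": predicts the mean training target for every held-out sample
    return np.full(len(val_idx), y[train_idx].mean())


oof = reconstruct_oof(len(y), folds, mean_model)
```

With three interleaved folds every sample is held out exactly once, so the column has no NaN gaps. A sample that was never held out would keep its NaN, which is presumably the situation the coverage handling described for `valid_train_mask` deals with.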
- Classes:
TrainingSetReconstructor: Main class for OOF reconstruction.
FoldAlignmentValidator: Validates fold structure consistency.
ValidationResult: Container for validation errors and warnings.
ReconstructionResult: Container for reconstructed data.
- class nirs4all.controllers.models.stacking.reconstructor.FoldAlignmentValidator(prediction_store: Predictions, config: ReconstructorConfig | None = None)
Bases: object
Validates fold structure consistency across source models.
Ensures that all source models have compatible fold structures for proper out-of-fold reconstruction.
Checks performed:
1. All models have the same number of folds.
2. Fold indices are sequential (0, 1, 2, …, K-1).
3. No sample appears in multiple validation sets within a model.
4. Sample indices are consistent across folds.
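The four checks can be sketched as a standalone function. This is an illustrative re-implementation, not the validator's actual code. The input layout (`{model_name: {fold_id: validation_indices}}`) is assumed, and check 4 is interpreted here as "every model's validation sets cover the same sample indices", which is only one plausible reading of the wording.

```python
from typing import Dict, List

import numpy as np


def check_fold_alignment(folds_by_model: Dict[str, Dict[int, np.ndarray]]) -> List[str]:
    """Return a list of error messages (empty when the folds align).

    folds_by_model maps model name -> {fold_id: validation sample indices}.
    """
    errors: List[str] = []

    # Check 1: every model has the same number of folds
    fold_counts = {model: len(folds) for model, folds in folds_by_model.items()}
    if len(set(fold_counts.values())) > 1:
        errors.append(f"fold counts differ: {fold_counts}")

    reference = None
    for model, folds in folds_by_model.items():
        # Check 2: fold ids are exactly 0, 1, ..., K-1
        if sorted(folds) != list(range(len(folds))):
            errors.append(f"{model}: non-sequential fold ids {sorted(folds)}")

        # Check 3: no sample is held out in more than one fold
        val = (
            np.concatenate([np.asarray(v, dtype=int) for v in folds.values()])
            if folds
            else np.array([], dtype=int)
        )
        if val.size != np.unique(val).size:
            errors.append(f"{model}: sample appears in several validation sets")

        # Check 4 (assumed reading): all models cover the same sample indices
        covered = set(val.tolist())
        if reference is None:
            reference = covered
        elif covered != reference:
            errors.append(f"{model}: validation samples differ from the first model")
    return errors


good = {"PLS": {0: np.array([0, 2]), 1: np.array([1, 3])},
        "RF": {0: np.array([0, 1]), 1: np.array([2, 3])}}
bad = {"PLS": {0: np.array([0, 1]), 1: np.array([1, 2])},
       "RF": {0: np.array([0, 1]), 2: np.array([2, 3])}}
```

In `bad`, PLS holds sample 1 out twice, RF skips fold id 1, and the two models cover different samples, so three problems are reported.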
- prediction_store
Predictions storage for accessing fold data.
- config
Reconstructor configuration.
- validate(source_model_names: List[str], context: ExecutionContext, branch_id_override: int | None = -1) → ValidationResult
Validate fold alignment across source models.
- Parameters:
source_model_names – List of source model names to validate.
context – Execution context with branch info.
branch_id_override – Optional branch_id override. If -1 (default), use context’s branch_id. If None, don’t filter by branch (for ALL_BRANCHES scope).
- Returns:
ValidationResult with any errors or warnings.
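The `-1` default of `branch_id_override` works as a sentinel because `None` already carries a meaning (no branch filter). A minimal sketch of that resolution logic follows; the helper names and the dict-shaped prediction records are hypothetical, not part of nirs4all.

```python
from typing import Any, Dict, List, Optional

USE_CONTEXT_BRANCH = -1  # mirrors the documented default of branch_id_override


def resolve_branch_filter(branch_id_override: Optional[int],
                          context_branch_id: Optional[int]) -> Optional[int]:
    """Return the branch id to filter on, or None for no branch filtering."""
    if branch_id_override == USE_CONTEXT_BRANCH:
        return context_branch_id  # default: follow the execution context
    return branch_id_override  # None (ALL_BRANCHES) or an explicit branch id


def filter_by_branch(records: List[Dict[str, Any]],
                     branch_id: Optional[int]) -> List[Dict[str, Any]]:
    # Hypothetical prediction records, each carrying a "branch_id" key
    if branch_id is None:
        return list(records)
    return [r for r in records if r["branch_id"] == branch_id]


records = [{"model": "PLS", "branch_id": 0}, {"model": "PLS", "branch_id": 1}]
```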
- class nirs4all.controllers.models.stacking.reconstructor.ReconstructionResult(X_train_meta: ndarray, X_test_meta: ndarray, y_train: ndarray, y_test: ndarray, feature_names: List[str], source_models: List[str], valid_train_mask: ndarray, valid_test_mask: ndarray, validation_result: ValidationResult, n_folds: int, coverage_ratio: float, meta_feature_info: Any | None = None, classification_info: Any | None = None)
Bases: object
Container for reconstructed training set data.
Holds the meta-feature matrices for training and test sets, along with metadata about the reconstruction process.
- X_train_meta
Training meta-features (n_train_samples, n_features).
- Type:
ndarray
- X_test_meta
Test meta-features (n_test_samples, n_features).
- Type:
ndarray
- y_train
Training targets (n_train_samples,).
- Type:
ndarray
- y_test
Test targets (n_test_samples,).
- Type:
ndarray
- valid_train_mask
Boolean mask of valid training samples (after coverage handling).
- Type:
ndarray
- valid_test_mask
Boolean mask of valid test samples.
- Type:
ndarray
- validation_result
Validation result from fold alignment.
- meta_feature_info
Optional MetaFeatureInfo for feature importance tracking.
- Type:
Any | None
- classification_info
Optional ClassificationInfo for task type metadata.
- Type:
Any | None
Example
>>> result = reconstructor.reconstruct(dataset, context)
>>> X_train = result.X_train_meta[result.valid_train_mask]
>>> y_train = result.y_train[result.valid_train_mask]
>>> # For feature importance tracking
>>> if result.meta_feature_info:
...     model_importance = result.meta_feature_info.aggregate_importance_by_model(
...         feature_importances
...     )
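`aggregate_importance_by_model` is not documented on this page. A plausible behavior, sketched below as a hypothetical standalone function, is to sum the importances of all meta-feature columns that originate from the same source model. The `feature_to_model` mapping is an assumption standing in for whatever `MetaFeatureInfo` actually records.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def sum_importance_by_model(
    feature_importances: Sequence[float],
    feature_names: List[str],
    feature_to_model: Dict[str, str],
) -> Dict[str, float]:
    """Sum per-column importances into one score per source model (hypothetical helper)."""
    if len(feature_importances) != len(feature_names):
        raise ValueError("one importance value is needed per meta-feature column")
    totals: Dict[str, float] = defaultdict(float)
    for name, importance in zip(feature_names, feature_importances):
        totals[feature_to_model[name]] += float(importance)
    return dict(totals)


# Example: RF contributes two probability columns, PLS one prediction column
names = ["PLS_pred", "RF_proba_0", "RF_proba_1"]
owner = {"PLS_pred": "PLS", "RF_proba_0": "RF", "RF_proba_1": "RF"}
by_model = sum_importance_by_model([0.5, 0.2, 0.3], names, owner)
```

Summing per model lets models that contribute several columns (for example, one per class) be compared fairly with single-column models.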
- validation_result: ValidationResult
- class nirs4all.controllers.models.stacking.reconstructor.TrainingSetReconstructor(prediction_store: Predictions, source_model_names: List[str], stacking_config: StackingConfig | None = None, reconstructor_config: ReconstructorConfig | None = None, source_model_branch_map: Dict[str, int | None] | None = None)
Bases: object
Reconstructs meta-model training set from out-of-fold predictions.
This is the core class for Phase 2 of the meta-model stacking implementation. It handles the critical task of collecting OOF predictions from source models and constructing feature matrices for the meta-learner.
The fundamental invariant is that no sample sees a prediction from a model trained on that sample, which prevents data leakage.
- prediction_store
Predictions storage for accessing source predictions.
- source_model_names
List of source model names to use.
- stacking_config
Configuration for coverage and aggregation strategies.
- reconstructor_config
Internal configuration for reconstruction.
- fold_validator
Validator for fold alignment.
Example
>>> reconstructor = TrainingSetReconstructor(
...     prediction_store=predictions,
...     source_model_names=["PLS", "RF", "XGB"],
...     stacking_config=StackingConfig(
...         coverage_strategy=CoverageStrategy.DROP_INCOMPLETE,
...         test_aggregation=TestAggregation.MEAN
...     )
... )
>>> result = reconstructor.reconstruct(dataset, context)
>>> print(f"Coverage: {result.coverage_ratio:.1%}")
>>> print(f"Features: {result.feature_names}")
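One way the `DROP_INCOMPLETE` strategy and `coverage_ratio` could fit together is sketched below. It assumes that missing OOF predictions are stored as NaN in `X_train_meta` and that `coverage_ratio` is the fraction of training rows with a prediction from every source model; both are assumptions about internals this page does not describe, and `drop_incomplete` is a hypothetical helper.

```python
import numpy as np


def drop_incomplete(X_train_meta: np.ndarray):
    """Keep rows that have an OOF prediction from every source model.

    Assumes missing OOF predictions are NaN; returns (valid_mask, coverage_ratio).
    """
    valid = ~np.isnan(X_train_meta).any(axis=1)
    coverage_ratio = float(valid.mean()) if valid.size else 0.0
    return valid, coverage_ratio


X_train_meta = np.array([
    [0.1, 0.2],     # covered by both source models
    [np.nan, 0.3],  # first model never produced an OOF prediction
    [0.4, 0.5],
    [0.6, np.nan],  # second model never produced an OOF prediction
])
valid, ratio = drop_incomplete(X_train_meta)
X_fit = X_train_meta[valid]  # rows actually used to fit the meta-learner
print(f"Coverage: {ratio:.1%}")
```

Here two of the four rows survive, so the last line prints `Coverage: 50.0%`.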
- reconstruct(dataset: SpectroDataset, context: ExecutionContext, y_train: ndarray | None = None, y_test: ndarray | None = None, use_proba: bool = False) → ReconstructionResult
Reconstruct meta-model training and test sets from predictions.
Collects out-of-fold predictions for training samples and aggregated predictions for test samples.
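Test samples have no out-of-fold prediction: every fold model predicts the whole test set, and none of them saw it during training, so averaging across folds introduces no leakage. The sketch below illustrates mean aggregation in the spirit of `TestAggregation.MEAN`, assuming each source model contributes one prediction array per fold; the function names are hypothetical.

```python
from typing import Dict, List, Sequence

import numpy as np


def aggregate_test_predictions(fold_test_preds: Sequence[Sequence[float]]) -> np.ndarray:
    """Mean over folds: input shape (n_folds, n_test_samples) -> (n_test_samples,)."""
    return np.asarray(fold_test_preds, dtype=float).mean(axis=0)


def build_test_meta(preds_by_model: Dict[str, List[List[float]]],
                    model_order: List[str]) -> np.ndarray:
    # One column per source model, in a fixed order matching the training meta-features
    columns = [aggregate_test_predictions(preds_by_model[m]) for m in model_order]
    return np.column_stack(columns)


preds = {
    "PLS": [[1.0, 2.0], [3.0, 4.0]],  # 2 folds x 2 test samples
    "RF": [[0.0, 0.0], [4.0, 8.0]],   # 2 folds x 2 test samples
}
X_test_meta = build_test_meta(preds, ["PLS", "RF"])
```

Keeping `model_order` fixed matters: the meta-learner is fitted on training columns in a given order and must see test columns in the same order.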
Phase 5 Enhancement: Supports classification tasks with probability features for binary and multiclass classification.
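How probability outputs become meta-feature columns is not specified here. A common convention, shown as a hypothetical sketch, keeps only the positive-class column for binary tasks (the other is redundant, since the two probabilities sum to 1) and all K columns for multiclass tasks. Whether nirs4all follows this convention is an assumption, and the column naming scheme is invented for illustration.

```python
from typing import List, Tuple

import numpy as np


def proba_features(proba, model_name: str) -> Tuple[np.ndarray, List[str]]:
    """Turn an (n_samples, n_classes) probability matrix into meta-feature columns."""
    proba = np.asarray(proba, dtype=float)
    n_classes = proba.shape[1]
    if n_classes == 2:
        # Binary: P(class 0) = 1 - P(class 1), so one column carries all the information
        return proba[:, 1:2], [f"{model_name}_proba_1"]
    # Multiclass: keep one column per class
    names = [f"{model_name}_proba_{k}" for k in range(n_classes)]
    return proba, names


binary_cols, binary_names = proba_features([[0.8, 0.2], [0.3, 0.7]], "RF")
multi_cols, multi_names = proba_features([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], "XGB")
```

Dropping the redundant binary column avoids a perfectly collinear pair of features, which mainly matters when the meta-learner is linear.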
- Parameters:
dataset – SpectroDataset for sample indices.
context – Execution context with partition and branch info.
y_train – Optional pre-computed training targets.
y_test – Optional pre-computed test targets.
use_proba – If True, use probability predictions for classification.
- Returns:
ReconstructionResult containing meta-feature matrices and metadata.
- Raises:
ValueError – If no source models found or critical validation fails.
- validate_branch_compatibility(context: ExecutionContext) → ValidationResult
Validate branch compatibility for stacking.
Checks that the current branch context is compatible with stacking based on the configured BranchScope.
- Parameters:
context – Execution context with branch info.
- Returns:
ValidationResult with any errors or warnings.
- class nirs4all.controllers.models.stacking.reconstructor.ValidationError(code: str, message: str, details: Dict[str, Any] | None = None)
Bases: object
A single validation error.
- class nirs4all.controllers.models.stacking.reconstructor.ValidationResult(errors: List[ValidationError] = <factory>, warnings: List[ValidationWarning] = <factory>)
Bases: object
Container for validation errors and warnings.
Accumulates validation issues during fold alignment and coverage checks.
- errors
List of validation errors (critical issues).
- warnings
List of validation warnings (non-critical issues).
- is_valid
True if no errors (warnings are allowed).
Example
>>> result = ValidationResult()
>>> result.add_error("FOLD_MISMATCH", "Folds don't align")
>>> result.add_warning("PARTIAL_COVERAGE", "80% coverage")
>>> if not result.is_valid:
...     raise ValueError(result.format_errors())
- add_error(code: str, message: str, details: Dict[str, Any] | None = None) → None
Add a validation error.
- add_warning(code: str, message: str, details: Dict[str, Any] | None = None) → None
Add a validation warning.
- errors: List[ValidationError]
- merge(other: ValidationResult) → None
Merge another validation result into this one.
- warnings: List[ValidationWarning]
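The `<factory>` defaults in the signature suggest a dataclass whose lists are created with `default_factory`, so separate instances never share one list. The sketch below mirrors the documented surface (`add_error`, `add_warning`, `is_valid`, `merge`) as a hypothetical stand-in class; it is not the nirs4all source. `format_errors` appears only in the example, so its output format here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchIssue:
    # Stand-in for ValidationError / ValidationWarning (hypothetical)
    code: str
    message: str
    details: Optional[Dict[str, Any]] = None


@dataclass
class SketchValidationResult:
    # default_factory gives every instance its own fresh lists
    errors: List[SketchIssue] = field(default_factory=list)
    warnings: List[SketchIssue] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors  # warnings never invalidate a result

    def add_error(self, code: str, message: str,
                  details: Optional[Dict[str, Any]] = None) -> None:
        self.errors.append(SketchIssue(code, message, details))

    def add_warning(self, code: str, message: str,
                    details: Optional[Dict[str, Any]] = None) -> None:
        self.warnings.append(SketchIssue(code, message, details))

    def merge(self, other: "SketchValidationResult") -> None:
        # Fold another check's findings into this result
        self.errors.extend(other.errors)
        self.warnings.extend(other.warnings)

    def format_errors(self) -> str:
        return "\n".join(f"[{e.code}] {e.message}" for e in self.errors)


report = SketchValidationResult()
report.add_warning("PARTIAL_COVERAGE", "80% coverage")
fold_check = SketchValidationResult()
fold_check.add_error("FOLD_MISMATCH", "Folds don't align")
report.merge(fold_check)
```

Merging lets independent checks (fold alignment, coverage, branch compatibility) each build their own result and report them together.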