nirs4all.controllers.models.stacking.branch_validator module

Branch Validator for Meta-Model Stacking (Phase 4).

This module provides validation logic for stacking in branched pipelines, including support for:

  • Preprocessing branches (same samples, different features)

  • Sample partitioner branches (different sample subsets)

  • Outlier excluder branches (same samples, different exclusions)

  • Generator syntax branches (same samples, model variants)

The validator ensures that stacking is only performed in compatible scenarios and provides clear error messages for unsupported cases.

class nirs4all.controllers.models.stacking.branch_validator.BranchInfo(branch_type: BranchType, branch_id: int | None = None, branch_name: str | None = None, branch_path: List[int] = <factory>, partition_info: Dict[str, Any] | None = None, exclusion_info: Dict[str, Any] | None = None, sample_indices: List[int] | None = None, n_samples: int | None = None, is_nested: bool = False, nesting_depth: int = 0)

Bases: object

Information about branch context for stacking validation.

branch_id: int | None = None
branch_name: str | None = None
branch_path: List[int]
branch_type: BranchType
exclusion_info: Dict[str, Any] | None = None
is_nested: bool = False
n_samples: int | None = None
nesting_depth: int = 0
partition_info: Dict[str, Any] | None = None
sample_indices: List[int] | None = None
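
The `<factory>` default shown for `branch_path` is how Sphinx usually renders a dataclass field declared with `default_factory`, so each instance most likely receives its own empty list. The sketch below illustrates that behaviour with local stand-ins (`BranchTypeSketch` and `BranchInfoSketch`). Both are assumptions written for this example, cover only a subset of the documented fields, and are not the library's classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BranchTypeSketch(Enum):
    # Two of the documented BranchType values, copied for illustration.
    NONE = "none"
    SAMPLE_PARTITIONER = "sample_partitioner"


@dataclass
class BranchInfoSketch:
    # Stand-in mirroring a subset of the documented BranchInfo fields.
    branch_type: BranchTypeSketch
    branch_id: Optional[int] = None
    branch_path: List[int] = field(default_factory=list)
    sample_indices: Optional[List[int]] = None
    n_samples: Optional[int] = None


first = BranchInfoSketch(BranchTypeSketch.SAMPLE_PARTITIONER, branch_id=0)
second = BranchInfoSketch(BranchTypeSketch.NONE)

# Each instance owns its list, so mutating one does not affect the other.
first.branch_path.append(0)
```

Only `branch_type` has no default, so it is the one argument a caller must always supply.
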
class nirs4all.controllers.models.stacking.branch_validator.BranchType(value)

Bases: Enum

Types of branching in nirs4all pipelines.

GENERATOR = 'generator'
METADATA_PARTITIONER = 'metadata_partitioner'
NESTED = 'nested'
NONE = 'none'
OUTLIER_EXCLUDER = 'outlier_excluder'
PREPROCESSING = 'preprocessing'
SAMPLE_PARTITIONER = 'sample_partitioner'
UNKNOWN = 'unknown'
class nirs4all.controllers.models.stacking.branch_validator.BranchValidationResult(is_valid: bool, compatibility: StackingCompatibility, branch_info: BranchInfo, errors: List[str] = <factory>, warnings: List[str] = <factory>, source_filter_hint: Dict[str, Any] | None = None)

Bases: object

Result of branch validation for stacking.

add_error(message: str) → None

Add an error message.

add_warning(message: str) → None

Add a warning message.

branch_info: BranchInfo
compatibility: StackingCompatibility
errors: List[str]
is_valid: bool
source_filter_hint: Dict[str, Any] | None = None
warnings: List[str]
class nirs4all.controllers.models.stacking.branch_validator.BranchValidator(prediction_store: Predictions, log_warnings: bool = True)

Bases: object

Validates branch contexts for meta-model stacking.

This validator checks that the current branch context is compatible with stacking and provides clear error messages for unsupported cases.

Supported scenarios:

  • No branching: Fully compatible

  • Preprocessing branches: Stack within branch

  • Outlier excluder branches: Stack within branch (all samples have predictions)

  • Sample partitioner branches: Stack within partition only

Unsupported or limited scenarios:

  • Cross-partition stacking with sample_partitioner

  • Deeply nested branching (depth > 2)

  • Generator syntax with large variant counts

Example

>>> validator = BranchValidator(prediction_store)
>>> result = validator.validate(context, source_model_names)
>>> if not result.is_valid:
...     raise ValueError(result.errors[0])
MAX_GENERATOR_VARIANTS_WARNING = 10
MAX_NESTING_DEPTH = 2
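
How these two constants are applied is not documented here. The following is a minimal sketch, assuming the nesting limit is an exclusive upper bound (the class description mentions "depth > 2") and that exceeding the generator threshold only produces a warning. `check_limits` is a hypothetical helper written for this example, not part of nirs4all.

```python
from typing import List, Tuple

# Values copied from the documented class attributes.
MAX_GENERATOR_VARIANTS_WARNING = 10
MAX_NESTING_DEPTH = 2


def check_limits(nesting_depth: int, n_variants: int) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (errors, warnings) for the given context."""
    errors: List[str] = []
    warnings: List[str] = []
    # Assumption: depths above the limit are rejected outright.
    if nesting_depth > MAX_NESTING_DEPTH:
        errors.append(
            f"Nesting depth {nesting_depth} exceeds MAX_NESTING_DEPTH={MAX_NESTING_DEPTH}"
        )
    # Assumption: many generator variants only produce a warning.
    if n_variants > MAX_GENERATOR_VARIANTS_WARNING:
        warnings.append(
            f"{n_variants} generator variants exceed the warning threshold "
            f"of {MAX_GENERATOR_VARIANTS_WARNING}"
        )
    return errors, warnings


errors, warnings = check_limits(nesting_depth=3, n_variants=12)
```

Under these assumptions, a context at exactly depth 2 with 10 variants passes with no errors and no warnings.
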
validate(context: ExecutionContext, source_model_names: List[str], dataset: SpectroDataset | None = None) → BranchValidationResult

Validate branch context for stacking compatibility.

Parameters:
  • context – Current execution context with branch info.

  • source_model_names – List of source model names to validate.

  • dataset – Optional dataset for sample index validation.

Returns:

BranchValidationResult with validation status and any errors.

validate_sample_alignment(source_model_names: List[str], expected_sample_indices: List[int], context: ExecutionContext) → BranchValidationResult

Validate that source models have predictions for expected samples.

This is particularly important for sample_partitioner branches where different partitions have different samples.

Parameters:
  • source_model_names – List of source model names.

  • expected_sample_indices – Expected sample indices (from current partition).

  • context – Execution context.

Returns:

Validation result with any sample alignment issues.
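
Conceptually, the alignment check compares the samples each source model has predictions for against the samples expected in the current partition. The following is a minimal sketch, assuming predictions are available as a plain mapping from model name to sample indices. `find_missing_samples` and that mapping are hypothetical and do not reflect how the prediction store is actually queried.

```python
from typing import Dict, List


def find_missing_samples(
    predicted: Dict[str, List[int]],
    source_model_names: List[str],
    expected_sample_indices: List[int],
) -> Dict[str, List[int]]:
    """Hypothetical helper: map each misaligned model to its missing samples."""
    expected = set(expected_sample_indices)
    missing: Dict[str, List[int]] = {}
    for name in source_model_names:
        # A model with no entry at all is missing every expected sample.
        gap = expected - set(predicted.get(name, []))
        if gap:
            missing[name] = sorted(gap)
    return missing


# Toy data: "pls" covers the partition, "rf" was trained on another one.
predicted = {"pls": [0, 1, 2, 3], "rf": [4, 5]}
missing = find_missing_samples(predicted, ["pls", "rf"], [0, 1, 2])
```

An empty result means every source model covers the partition. Any entry identifies a model whose predictions come from a different sample subset.
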

class nirs4all.controllers.models.stacking.branch_validator.StackingCompatibility(value)

Bases: Enum

Compatibility level for stacking with a branch type.

COMPATIBLE = 'compatible'
COMPATIBLE_WITH_WARNINGS = 'compatible_with_warnings'
NOT_SUPPORTED = 'not_supported'
WITHIN_PARTITION_ONLY = 'within_partition_only'
nirs4all.controllers.models.stacking.branch_validator.detect_branch_type(context: ExecutionContext) → BranchType

Detect the type of branching from execution context.

Convenience function for quick branch type detection.

Parameters:

context – Execution context with branch info.

Returns:

Detected BranchType enum value.

nirs4all.controllers.models.stacking.branch_validator.get_disjoint_branch_info(context: ExecutionContext) → Dict[str, Any] | None

Get information about the disjoint branch if applicable.

Parameters:

context – Execution context with branch info.

Returns:

Dict with partition info, or None if not a disjoint branch. Keys may include:

  • partition_type: “metadata” or “sample”

  • column: Metadata column name (for metadata partitioner)

  • partition_value: Value(s) for this partition

  • sample_indices: List of sample indices in this partition

  • n_samples: Number of samples in this partition
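
For orientation, a return value for a metadata partition might look like the dictionary below. The values (the column name `site`, the partition value, the indices) are invented for illustration. Because the keys listed above "may" be present, the sketch reads them defensively rather than assuming all of them exist.

```python
from typing import Any, Dict, Optional

# Invented example of the documented keys; real values depend on the pipeline.
info: Optional[Dict[str, Any]] = {
    "partition_type": "metadata",
    "column": "site",
    "partition_value": "A",
    "sample_indices": [0, 3, 7],
    "n_samples": 3,
}

if info is None:
    description = "not a disjoint branch"
else:
    # Use .get() because any of the keys may be absent.
    column = info.get("column", "<unknown>")
    n_samples = info.get("n_samples", len(info.get("sample_indices", [])))
    description = f"{info.get('partition_type')} partition on {column}: {n_samples} samples"
```
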

nirs4all.controllers.models.stacking.branch_validator.is_disjoint_branch(context: ExecutionContext) → bool

Check if the current branch context represents disjoint sample branching.

Disjoint branches partition samples into non-overlapping sets, where each sample exists in exactly ONE branch. This is in contrast to copy branches where all branches see all samples.

Disjoint branch types:
  • METADATA_PARTITIONER: Branches by metadata column value

  • SAMPLE_PARTITIONER: Branches by outlier status

Copy branch types:
  • PREPROCESSING: All branches see all samples

  • GENERATOR: All branches see all samples (model variants)

Parameters:

context – Execution context with branch info.

Returns:

True if current context represents a disjoint sample branch.
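
The disjoint/copy distinction above can be expressed as a simple membership test on the branch type. This is a sketch only: `BranchTypeSketch` is a local stand-in that copies four of the documented enum values, and `is_disjoint` is a hypothetical helper that takes a branch type directly, whereas the real function takes an execution context.

```python
from enum import Enum


class BranchTypeSketch(Enum):
    # Values copied from the documented BranchType enum (subset).
    METADATA_PARTITIONER = "metadata_partitioner"
    SAMPLE_PARTITIONER = "sample_partitioner"
    PREPROCESSING = "preprocessing"
    GENERATOR = "generator"


# Branch types listed above as disjoint.
DISJOINT_TYPES = frozenset(
    {BranchTypeSketch.METADATA_PARTITIONER, BranchTypeSketch.SAMPLE_PARTITIONER}
)


def is_disjoint(branch_type: BranchTypeSketch) -> bool:
    """Hypothetical helper: True when each sample lives in exactly one branch."""
    return branch_type in DISJOINT_TYPES


disjoint_flags = {bt.name: is_disjoint(bt) for bt in BranchTypeSketch}
```
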

nirs4all.controllers.models.stacking.branch_validator.is_stacking_compatible(context: ExecutionContext) → bool

Quick check if stacking is compatible with current context.

Parameters:

context – Execution context.

Returns:

True if stacking is likely compatible.