nirs4all.controllers.data.auto_transfer_preproc module
Auto Transfer Preprocessing Controller.
This module provides the AutoTransferPreprocessingController which automatically selects optimal preprocessing for transfer learning scenarios. It uses the TransferPreprocessingSelector to analyze source and target data and select preprocessing that minimizes distributional distance while preserving signal.
- Usage in pipeline:
    # Standalone operator
    pipeline = [
        {"auto_transfer_preproc": {"preset": "balanced"}},
        "PLSRegressor",
    ]

    # With explicit configuration
    pipeline = [
        {
            "auto_transfer_preproc": {
                "preset": "thorough",
                "source_partition": "train",
                "target_partition": "test",
                "apply_recommendation": True,
            }
        },
        {"model": "PLSRegressor"},
    ]
- class nirs4all.controllers.data.auto_transfer_preproc.AutoTransferPreprocessingController
Bases: OperatorController
Controller for automatic transfer-optimized preprocessing selection.
This controller analyzes the distributional distance between source and target datasets and automatically selects preprocessing that best aligns them while preserving predictive information.
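To make "distributional distance" concrete, here is a minimal sketch of one way such a distance could be measured in a low-dimensional PCA space. It is an illustration only: `centroid_distance` and `center_rows` are hypothetical names, the PCA is done with a plain NumPy SVD, and the metric is not necessarily the one TransferPreprocessingSelector computes.

```python
import numpy as np


def centroid_distance(X_source, X_target, n_components=10):
    """Distance between source and target centroids in a shared PCA space.

    Illustrative metric only; not taken from the library.
    """
    # Fit the projection on the pooled data so both domains share one basis.
    pooled = np.vstack([X_source, X_target])
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    components = vt[:n_components]
    z_source = (X_source - mean) @ components.T
    z_target = (X_target - mean) @ components.T
    return float(np.linalg.norm(z_source.mean(axis=0) - z_target.mean(axis=0)))


def center_rows(X):
    """Hypothetical candidate preprocessing: subtract each spectrum's own mean."""
    return X - X.mean(axis=1, keepdims=True)


# Synthetic data: the target domain carries a constant baseline offset.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(40, 50))
X_target = rng.normal(size=(30, 50)) + 2.0

raw_distance = centroid_distance(X_source, X_target)
centered_distance = centroid_distance(center_rows(X_source), center_rows(X_target))
```

On this synthetic data the row-centering candidate removes the baseline offset, so it yields a much smaller distance than the raw spectra and would be the preferred choice under this toy metric.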
- Configuration options:
- preset: Preset configuration for the selector.
    "fast" (default): Quick evaluation of single preprocessings only
    "balanced": Includes stacking evaluation
    "thorough": Includes stacking and augmentation
    "full": All stages including supervised validation
    "exhaustive": Deep analysis for research/benchmarking
- source_partition: Partition to use as source data (“train” or “test”).
Default is “train”.
- target_partition: Partition to use as target data (“train” or “test”).
Default is “test”.
- apply_recommendation: Whether to apply the best preprocessing to the
dataset. If False, only stores the recommendation in context. Default is True.
- top_k: Number of top recommendations to apply if using augmentation.
Default is 1 (best single preprocessing).
- use_augmentation: If top_k > 1, whether to use feature augmentation
to concatenate outputs. Default is False.
- n_components: Number of PCA components for metric computation.
Default is 10.
- verbose: Verbosity level (0=silent, 1=progress, 2=detailed).
Default is 1.
# Stage-specific options (override preset)
- run_stage2: Enable stacking evaluation.
- stage2_top_k: Number of top candidates for stacking.
- run_stage3: Enable augmentation evaluation.
- run_stage4: Enable supervised validation.
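The interaction between a preset and the stage-specific overrides can be sketched as follows. The preset-to-stage mapping is inferred from the option descriptions above and is an assumption, as is the helper name `resolve_stages`; the "exhaustive" preset is omitted because the stages it enables are not spelled out here.

```python
# Stage flags implied by each preset, inferred from the option descriptions.
# This table is an assumption, not copied from the library.
PRESET_STAGES = {
    "fast": {"run_stage2": False, "run_stage3": False, "run_stage4": False},
    "balanced": {"run_stage2": True, "run_stage3": False, "run_stage4": False},
    "thorough": {"run_stage2": True, "run_stage3": True, "run_stage4": False},
    "full": {"run_stage2": True, "run_stage3": True, "run_stage4": True},
}
STAGE_KEYS = ("run_stage2", "run_stage3", "run_stage4")


def resolve_stages(config):
    """Start from the preset's flags, then let explicit stage keys win."""
    preset = config.get("preset", "fast")
    if preset not in PRESET_STAGES:
        raise ValueError(f"preset not covered by this sketch: {preset!r}")
    stages = dict(PRESET_STAGES[preset])
    for key in STAGE_KEYS:
        if key in config:
            stages[key] = bool(config[key])
    return stages


# "balanced" enables stacking; the explicit flag additionally turns on stage 4.
stages = resolve_stages({"preset": "balanced", "run_stage4": True})
```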
- Example pipeline configurations:
    # Simple - use defaults
    {"auto_transfer_preproc": {}}

    # With preset
    {"auto_transfer_preproc": {"preset": "balanced"}}

    # Full configuration
    {
        "auto_transfer_preproc": {
            "preset": "thorough",
            "source_partition": "train",
            "target_partition": "test",
            "apply_recommendation": True,
            "top_k": 1,
            "verbose": 2,
        }
    }

    # Multi-source with augmentation
    {
        "auto_transfer_preproc": {
            "preset": "balanced",
            "top_k": 3,
            "use_augmentation": True,
        }
    }
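As a rough illustration of what these configurations expand to, the sketch below fills in the documented defaults and rejects values outside the documented choices. `normalize_config` is a hypothetical helper written for this example; the controller's own parsing may differ.

```python
# Defaults as documented in the configuration options.
DEFAULTS = {
    "preset": "fast",
    "source_partition": "train",
    "target_partition": "test",
    "apply_recommendation": True,
    "top_k": 1,
    "use_augmentation": False,
    "n_components": 10,
    "verbose": 1,
}
VALID_PRESETS = {"fast", "balanced", "thorough", "full", "exhaustive"}
VALID_PARTITIONS = {"train", "test"}


def normalize_config(step):
    """Merge user options over the documented defaults and validate them."""
    user = step.get("auto_transfer_preproc") or {}
    config = {**DEFAULTS, **user}
    if config["preset"] not in VALID_PRESETS:
        raise ValueError(f"unknown preset: {config['preset']!r}")
    for key in ("source_partition", "target_partition"):
        if config[key] not in VALID_PARTITIONS:
            raise ValueError(f"{key} must be 'train' or 'test'")
    if config["top_k"] < 1:
        raise ValueError("top_k must be at least 1")
    return config


simple = normalize_config({"auto_transfer_preproc": {}})
augmented = normalize_config(
    {"auto_transfer_preproc": {"preset": "balanced", "top_k": 3, "use_augmentation": True}}
)
```

Under these assumptions the empty configuration resolves to the "fast" preset with train as source and test as target, while the augmentation example keeps every default except the three keys it sets.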
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, Any]]]
Execute auto transfer preprocessing selection.
- In train mode:
Extract source and target data from the dataset
Run TransferPreprocessingSelector to find best preprocessing
Apply the recommended preprocessing if configured
Store the recommendation as an artifact
- In predict mode:
Load the saved preprocessing recommendation
Apply it to the incoming data
- Parameters:
step_info – Parsed step containing the auto_transfer_preproc config
dataset – SpectroDataset to operate on
context – Execution context with selector and metadata
runtime_context – Runtime infrastructure (saver, step_number, etc.)
source – Source index (-1 for all sources)
mode – Execution mode (“train”, “predict”, “explain”)
loaded_binaries – Pre-loaded artifacts for predict/explain mode
prediction_store – Not used by this controller
- Returns:
Tuple of (updated_context, list_of_artifacts)
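The train/predict split described for execute can be sketched with plain NumPy. Everything here is hypothetical and stands in for the real objects: `execute_sketch`, the candidate pool, the toy distance, and the artifact name are assumptions made for illustration, and the recommendation is stored as JSON bytes only to keep the example self-contained.

```python
import json

import numpy as np


def _identity(X):
    return X


def _center_rows(X):
    return X - X.mean(axis=1, keepdims=True)


def _first_difference(X):
    return np.diff(X, axis=1)


# Hypothetical candidate pool of stateless preprocessings.
CANDIDATES = {
    "identity": _identity,
    "row_center": _center_rows,
    "first_difference": _first_difference,
}
ARTIFACT_NAME = "transfer_recommendation"  # hypothetical artifact key


def _distance(X_source, X_target):
    """Toy score: gap between domain mean spectra, scaled by pooled spread."""
    spread = np.vstack([X_source, X_target]).std() + 1e-12
    gap = np.linalg.norm(X_source.mean(axis=0) - X_target.mean(axis=0))
    return float(gap / spread)


def execute_sketch(X_source, X_target, mode="train", loaded_binaries=None):
    """Select and store in train mode; load and apply in predict mode."""
    if mode == "train":
        scores = {n: _distance(f(X_source), f(X_target)) for n, f in CANDIDATES.items()}
        best = min(scores, key=scores.get)
        artifacts = [(ARTIFACT_NAME, json.dumps({"best": best}).encode("utf-8"))]
    else:
        stored = dict(loaded_binaries or [])
        if ARTIFACT_NAME not in stored:
            raise ValueError("no saved recommendation to load")
        best = json.loads(stored[ARTIFACT_NAME].decode("utf-8"))["best"]
        artifacts = []
    return CANDIDATES[best](X_target), best, artifacts


rng = np.random.default_rng(1)
X_src = rng.normal(size=(20, 30))
X_tgt = rng.normal(size=(15, 30)) + 3.0  # baseline offset between domains

_, trained_best, saved = execute_sketch(X_src, X_tgt, mode="train")
X_new, loaded_best, _ = execute_sketch(X_src, X_tgt, mode="predict", loaded_binaries=saved)
```

The artifact list mirrors the `List[Tuple[str, Any]]` shape of the return value, so what train mode emits can be passed back unchanged as `loaded_binaries` in predict mode.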
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if step is an auto_transfer_preproc operation.
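A check of this kind presumably reduces to looking for the "auto_transfer_preproc" key. The function below is a guess at that logic under the stated signature, not the controller's actual implementation.

```python
def matches_sketch(step, operator, keyword):
    """Return True when the step is keyed by "auto_transfer_preproc" (assumed rule)."""
    if keyword == "auto_transfer_preproc":
        return True
    return isinstance(step, dict) and "auto_transfer_preproc" in step


is_match = matches_sketch({"auto_transfer_preproc": {"preset": "fast"}}, None, "")
is_model = matches_sketch({"model": "PLSRegressor"}, None, "model")
```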