nirs4all.controllers.data.auto_transfer_preproc module

Auto Transfer Preprocessing Controller.

This module provides the AutoTransferPreprocessingController which automatically selects optimal preprocessing for transfer learning scenarios. It uses the TransferPreprocessingSelector to analyze source and target data and select preprocessing that minimizes distributional distance while preserving signal.

Usage in pipeline:

    # Standalone operator
    pipeline = [
        {"auto_transfer_preproc": {"preset": "balanced"}},
        "PLSRegressor",
    ]

    # With explicit configuration
    pipeline = [
        {
            "auto_transfer_preproc": {
                "preset": "thorough",
                "source_partition": "train",
                "target_partition": "test",
                "apply_recommendation": True,
            }
        },
        {"model": "PLSRegressor"},
    ]

class nirs4all.controllers.data.auto_transfer_preproc.AutoTransferPreprocessingController

Bases: OperatorController

Controller for automatic transfer-optimized preprocessing selection.

This controller analyzes the distributional distance between source and target datasets and automatically selects preprocessing that best aligns them while preserving predictive information.
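As a rough illustration of the idea (not the TransferPreprocessingSelector implementation), the sketch below scores candidate preprocessings by how close they bring the source and target sample sets, then keeps the closest. The helper names (`identity`, `center`, `centroid_distance`, `select_preprocessing`) and the centroid-based metric are assumptions made for this example. The documented selector computes its metrics on PCA components and also weighs how much predictive information is preserved, which this toy omits. Plain lists are used to keep the example dependency-free.

```python
from statistics import fmean


def identity(rows):
    """Candidate 1: leave the spectra unchanged."""
    return [list(row) for row in rows]


def center(rows):
    """Candidate 2: subtract each spectrum's own mean (removes a constant offset)."""
    out = []
    for row in rows:
        m = fmean(row)
        out.append([v - m for v in row])
    return out


def centroid_distance(a, b):
    """Toy metric: Euclidean distance between the mean spectra of two sample sets."""
    mean_a = [fmean(col) for col in zip(*a)]
    mean_b = [fmean(col) for col in zip(*b)]
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)) ** 0.5


def select_preprocessing(source, target, candidates):
    """Return the best candidate name and the distance obtained for every candidate."""
    scores = {
        name: centroid_distance(fn(source), fn(target))
        for name, fn in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# The target spectra equal the source spectra plus a constant offset of 1.
source = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.2]]
target = [[2.0, 3.0, 4.0], [2.1, 3.1, 4.2]]

best, scores = select_preprocessing(
    source, target, {"identity": identity, "center": center}
)
print(best, scores)
```

Because the two sets differ only by an offset, per-spectrum centering aligns them and is selected over the identity.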

Configuration options:
preset: Preset configuration for the selector.
  • “fast” (default): Quick evaluation of single preprocessings only

  • “balanced”: Includes stacking evaluation

  • “thorough”: Includes stacking and augmentation

  • “full”: All stages including supervised validation

  • “exhaustive”: Deep analysis for research/benchmarking

source_partition: Partition to use as source data (“train” or “test”). Default is “train”.

target_partition: Partition to use as target data (“train” or “test”). Default is “test”.

apply_recommendation: Whether to apply the best preprocessing to the dataset. If False, only stores the recommendation in context. Default is True.

top_k: Number of top recommendations to apply if using augmentation. Default is 1 (best single preprocessing).

use_augmentation: If top_k > 1, whether to use feature augmentation to concatenate outputs. Default is False.

n_components: Number of PCA components for metric computation. Default is 10.

verbose: Verbosity level (0=silent, 1=progress, 2=detailed). Default is 1.

Stage-specific options (override preset):

run_stage2: Enable stacking evaluation.

stage2_top_k: Number of top candidates for stacking.

run_stage3: Enable augmentation evaluation.

run_stage4: Enable supervised validation.
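The options above can be read as a set of defaults, a preset that implies some stage flags, and explicit stage flags that take precedence. The sketch below shows one way such a merge could work. `resolve_config` is a hypothetical helper, not part of nirs4all, and the flags assigned to “exhaustive” are an assumption: the documentation only describes that preset as deep analysis. No default is given for stage2_top_k because none is documented.

```python
# Defaults taken from the option descriptions above.
DEFAULTS = {
    "preset": "fast",
    "source_partition": "train",
    "target_partition": "test",
    "apply_recommendation": True,
    "top_k": 1,
    "use_augmentation": False,
    "n_components": 10,
    "verbose": 1,
}

# Stage flags implied by each preset, read off the preset descriptions.
# ASSUMPTION: "exhaustive" enables every stage; the docs do not spell this out.
PRESET_STAGES = {
    "fast": {"run_stage2": False, "run_stage3": False, "run_stage4": False},
    "balanced": {"run_stage2": True, "run_stage3": False, "run_stage4": False},
    "thorough": {"run_stage2": True, "run_stage3": True, "run_stage4": False},
    "full": {"run_stage2": True, "run_stage3": True, "run_stage4": True},
    "exhaustive": {"run_stage2": True, "run_stage3": True, "run_stage4": True},
}


def resolve_config(user_config):
    """Hypothetical helper: merge defaults, preset stage flags, and user overrides."""
    preset = user_config.get("preset", DEFAULTS["preset"])
    if preset not in PRESET_STAGES:
        raise ValueError(f"unknown preset: {preset!r}")
    # Later dictionaries win, so explicit user keys override the preset flags.
    config = {**DEFAULTS, **PRESET_STAGES[preset], **user_config}
    for key in ("source_partition", "target_partition"):
        if config[key] not in ("train", "test"):
            raise ValueError(f"{key} must be 'train' or 'test'")
    return config


# "balanced" enables stacking; run_stage3 is switched on explicitly.
resolved = resolve_config({"preset": "balanced", "run_stage3": True})
print(resolved)
```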

Example pipeline configurations:

    # Simple - use defaults
    {"auto_transfer_preproc": {}}

    # With preset
    {"auto_transfer_preproc": {"preset": "balanced"}}

    # Full configuration
    {
        "auto_transfer_preproc": {
            "preset": "thorough",
            "source_partition": "train",
            "target_partition": "test",
            "apply_recommendation": True,
            "top_k": 1,
            "verbose": 2,
        }
    }

    # Multi-source with augmentation
    {
        "auto_transfer_preproc": {
            "preset": "balanced",
            "top_k": 3,
            "use_augmentation": True,
        }
    }
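For the last configuration, “feature augmentation to concatenate outputs” can be pictured as applying each of the top-k preprocessings to the same samples and joining the results along the feature axis. The following is a minimal sketch of that idea only. The functions `first_difference`, `identity`, and `augment` are illustrative names, and whether nirs4all concatenates in exactly this way is an assumption.

```python
from itertools import chain


def identity(rows):
    """Keep the raw spectra."""
    return [list(row) for row in rows]


def first_difference(rows):
    """Difference between neighbouring points of each spectrum."""
    return [[b - a for a, b in zip(row, row[1:])] for row in rows]


def augment(rows, preprocessings):
    """Apply every preprocessing and concatenate the outputs sample by sample."""
    outputs = [fn(rows) for fn in preprocessings]
    # zip(*outputs) groups, for each sample, its version under every preprocessing.
    return [list(chain.from_iterable(versions)) for versions in zip(*outputs)]


spectra = [[1.0, 2.0, 4.0]]
augmented = augment(spectra, [identity, first_difference])
print(augmented)
```

Each sample keeps its row position, while the number of features grows to the sum of the feature counts produced by the selected preprocessings.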

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List[Tuple[str, Any]]]

Execute auto transfer preprocessing selection.

In train mode:
  1. Extract source and target data from the dataset

  2. Run TransferPreprocessingSelector to find best preprocessing

  3. Apply the recommended preprocessing if configured

  4. Store the recommendation as an artifact

In predict mode:
  1. Load the saved preprocessing recommendation

  2. Apply it to the incoming data

Parameters:
  • step_info – Parsed step containing the auto_transfer_preproc config

  • dataset – SpectroDataset to operate on

  • context – Execution context with selector and metadata

  • runtime_context – Runtime infrastructure (saver, step_number, etc.)

  • source – Source index (-1 for all sources)

  • mode – Execution mode (“train”, “predict”, “explain”)

  • loaded_binaries – Pre-loaded artifacts for predict/explain mode

  • prediction_store – Not used by this controller

Returns:

Tuple of (updated_context, list_of_artifacts)

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Check if step is an auto_transfer_preproc operation.

priority: int = 9

classmethod supports_prediction_mode() → bool

Supports prediction mode for applying saved recommendations.

In prediction mode, the controller loads the previously computed preprocessing recommendation and applies it to the new data.

classmethod use_multi_source() → bool

Supports multi-source datasets.