nirs4all.controllers package
Subpackages
- nirs4all.controllers.charts package
- Submodules
- Module contents
- nirs4all.controllers.data package
- Submodules
- nirs4all.controllers.data.auto_transfer_preproc module
- nirs4all.controllers.data.balancing module
- nirs4all.controllers.data.branch module
- nirs4all.controllers.data.concat_transform module
- nirs4all.controllers.data.feature_augmentation module
- nirs4all.controllers.data.feature_selection module
- nirs4all.controllers.data.merge module
- nirs4all.controllers.data.metadata_partitioner module
- nirs4all.controllers.data.outlier_excluder module
- nirs4all.controllers.data.repetition module
- nirs4all.controllers.data.resampler module
- nirs4all.controllers.data.sample_augmentation module
- nirs4all.controllers.data.sample_filter module
- nirs4all.controllers.data.sample_partitioner module
- nirs4all.controllers.data.source_branch module
- Module contents
- AutoTransferPreprocessingController
- ConcatAugmentationController
- FeatureAugmentationController
- FeatureSelectionController
- MergeConfigParser
- MergeController: priority, SUPPORTED_KEYWORDS, build_config_from_meta_model(), execute(), matches(), merge_branches(), supports_prediction_mode(), use_multi_source()
- MetadataPartitionerController
- OutlierExcluderController
- RepToPPController
- RepToSourcesController
- ResamplerController
- SampleAugmentationController
- SampleFilterController
- SamplePartitionerController
- SourceBranchConfigParser
- SourceBranchController
- Submodules
- nirs4all.controllers.flow package
- nirs4all.controllers.models package
- Subpackages
- Submodules
- nirs4all.controllers.models.autogluon_model module
- nirs4all.controllers.models.base_model module
- nirs4all.controllers.models.factory module
- nirs4all.controllers.models.jax_model module
- nirs4all.controllers.models.jax_wrapper module
- nirs4all.controllers.models.meta_model module
- nirs4all.controllers.models.sklearn_model module
- nirs4all.controllers.models.tensorflow_model module
- nirs4all.controllers.models.torch_model module
- nirs4all.controllers.models.utilities module
- Module contents
- AutoGluonModelController
- BaseModelController: optuna_manager, identifier_generator, prediction_transformer, prediction_assembler, score_calculator, index_normalizer, prediction_store, verbose, execute(), finetune(), get_effective_layout(), get_preferred_layout(), get_xy(), launch_training(), load_model(), priority, process_hyperparameters(), save_model(), supports_prediction_mode(), train(), use_multi_source()
- FoldAlignmentValidator
- JaxModelController
- MetaModelController
- PyTorchModelController
- ReconstructionResult: X_train_meta, X_test_meta, y_train, y_test, feature_names, source_models, valid_train_mask, valid_test_mask, validation_result, n_folds, coverage_ratio, meta_feature_info, classification_info
- ReconstructorConfig: validate_fold_alignment, validate_sample_coverage, log_warnings, max_missing_fold_ratio, allow_partial_sources, feature_name_pattern, excluded_fold_ids, __post_init__()
- SklearnModelController
- TensorFlowModelController
- TrainingSetReconstructor: prediction_store, source_model_names, stacking_config, reconstructor_config, fold_validator, reconstruct(), validate_branch_compatibility()
- ValidationResult: errors, warnings, is_valid, add_error(), add_warning(), format_errors(), format_warnings(), merge()
- nirs4all.controllers.shared package
- nirs4all.controllers.splitters package
- nirs4all.controllers.transforms package
Submodules
Module contents
Controllers module for nirs4all package.
This module contains all controller classes for pipeline operator execution. Controllers implement the execution logic for different operator types following the operator-controller pattern.
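As a rough illustration of the operator-controller pattern, the sketch below dispatches a pipeline step to the first controller whose matches() returns True, trying controllers in ascending priority. The REGISTRY list, the select_controller() helper, and the priority values 100 and 1000 are assumptions made for this example; only the matches() signature, the priority of 5 for BranchController, and the catch-all role of DummyController come from this page. The code does not import nirs4all.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Type


class OperatorController(ABC):
    """Stand-in for the documented base class (not the nirs4all implementation)."""

    priority: int = 100  # assumption: lower value means tried earlier

    @classmethod
    @abstractmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        ...


class BranchController(OperatorController):
    priority = 5  # documented value for the branch controller

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return keyword == "branch"


class DummyController(OperatorController):
    priority = 1000  # assumed value; the catch-all must be tried last

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return True


# Hypothetical registry; registration order does not matter.
REGISTRY: List[Type[OperatorController]] = [DummyController, BranchController]


def select_controller(step: Any, operator: Any, keyword: str) -> Type[OperatorController]:
    """Return the first matching controller in ascending priority order."""
    for controller in sorted(REGISTRY, key=lambda c: c.priority):
        if controller.matches(step, operator, keyword):
            return controller
    raise LookupError(f"no controller matches keyword {keyword!r}")
```

With this registry, a step carrying the "branch" keyword resolves to BranchController, and any other keyword falls through to DummyController.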
- class nirs4all.controllers.AugmentationChartController
Bases: OperatorController
Controller for visualizing augmentation effects on spectra.
Supports two visualization modes:
1. augment_chart: Shows original vs augmented samples overlaid with different colors
2. augment_details_chart: Shows a grid with raw data and each augmentation type separately
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: Any, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) -> Tuple[ExecutionContext, Any]
Execute augmentation visualization.
- Returns:
Tuple of (context, StepOutput)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the operator matches the step and keyword.
- class nirs4all.controllers.AutoTransferPreprocessingController
Bases: OperatorController
Controller for automatic transfer-optimized preprocessing selection.
This controller analyzes the distributional distance between source and target datasets and automatically selects preprocessing that best aligns them while preserving predictive information.
- Configuration options:
- preset: Preset configuration for the selector.
    "fast" (default): Quick evaluation of single preprocessings only
    "balanced": Includes stacking evaluation
    "thorough": Includes stacking and augmentation
    "full": All stages including supervised validation
    "exhaustive": Deep analysis for research/benchmarking
- source_partition: Partition to use as source data ("train" or "test"). Default is "train".
- target_partition: Partition to use as target data ("train" or "test"). Default is "test".
- apply_recommendation: Whether to apply the best preprocessing to the dataset. If False, only stores the recommendation in context. Default is True.
- top_k: Number of top recommendations to apply if using augmentation. Default is 1 (best single preprocessing).
- use_augmentation: If top_k > 1, whether to use feature augmentation to concatenate outputs. Default is False.
- n_components: Number of PCA components for metric computation. Default is 10.
- verbose: Verbosity level (0=silent, 1=progress, 2=detailed). Default is 1.
- Stage-specific options (override preset):
- run_stage2: Enable stacking evaluation.
- stage2_top_k: Number of top candidates for stacking.
- run_stage3: Enable augmentation evaluation.
- run_stage4: Enable supervised validation.
- Example pipeline configurations:
# Simple - use defaults
{"auto_transfer_preproc": {}}
# With preset
{"auto_transfer_preproc": {"preset": "balanced"}}
# Full configuration
{
    "auto_transfer_preproc": {
        "preset": "thorough",
        "source_partition": "train",
        "target_partition": "test",
        "apply_recommendation": True,
        "top_k": 1,
        "verbose": 2,
    }
}
# Multi-source with augmentation
{
    "auto_transfer_preproc": {
        "preset": "balanced",
        "top_k": 3,
        "use_augmentation": True,
    }
}
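The interaction between a preset and the stage-specific flags can be sketched as a lookup followed by overrides. This is only an illustration under stated assumptions: the PRESET_STAGES table is inferred from the option descriptions above, "exhaustive" is assumed to enable every stage, and resolve_stages() is a hypothetical helper, not a nirs4all function.

```python
from typing import Any, Dict

# Assumed preset-to-stage mapping, read off the option descriptions.
# Stage 2 = stacking, stage 3 = augmentation, stage 4 = supervised validation.
PRESET_STAGES: Dict[str, Dict[str, bool]] = {
    "fast": {"run_stage2": False, "run_stage3": False, "run_stage4": False},
    "balanced": {"run_stage2": True, "run_stage3": False, "run_stage4": False},
    "thorough": {"run_stage2": True, "run_stage3": True, "run_stage4": False},
    "full": {"run_stage2": True, "run_stage3": True, "run_stage4": True},
    "exhaustive": {"run_stage2": True, "run_stage3": True, "run_stage4": True},
}


def resolve_stages(config: Dict[str, Any]) -> Dict[str, bool]:
    """Start from the preset, then let explicit run_stageN flags override it."""
    preset = config.get("preset", "fast")  # "fast" is the documented default
    if preset not in PRESET_STAGES:
        raise ValueError(f"unknown preset: {preset!r}")
    stages = dict(PRESET_STAGES[preset])
    for flag in stages:
        if flag in config:
            stages[flag] = bool(config[flag])
    return stages
```

For example, {"preset": "balanced", "run_stage3": True} would keep stacking enabled from the preset and switch augmentation evaluation on through the override.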
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, Any]]]
Execute auto transfer preprocessing selection.
- In train mode:
Extract source and target data from the dataset
Run TransferPreprocessingSelector to find best preprocessing
Apply the recommended preprocessing if configured
Store the recommendation as an artifact
- In predict mode:
Load the saved preprocessing recommendation
Apply it to the incoming data
- Parameters:
step_info – Parsed step containing the auto_transfer_preproc config
dataset – SpectroDataset to operate on
context – Execution context with selector and metadata
runtime_context – Runtime infrastructure (saver, step_number, etc.)
source – Source index (-1 for all sources)
mode – Execution mode (“train”, “predict”, “explain”)
loaded_binaries – Pre-loaded artifacts for predict/explain mode
prediction_store – Not used by this controller
- Returns:
Tuple of (updated_context, list_of_artifacts)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if step is an auto_transfer_preproc operation.
- class nirs4all.controllers.BaseController
Bases: ABC
Abstract base class for all controllers.
Controllers are responsible for executing operators within a pipeline context. They handle framework-specific logic, state management, and validation.
- abstractmethod can_handle(operator: Any) -> bool
Check if this controller can handle the given operator.
- Parameters:
operator (Any) – The operator to check.
- Returns:
True if this controller can handle the operator.
- Return type:
bool
- cleanup(operator: Any, context: ExecutionContext) -> None
Clean up after operator execution.
This method can be overridden to perform cleanup tasks after execution.
- Parameters:
operator (Any) – The operator that was executed.
context (ExecutionContext) – Pipeline execution context.
- abstractmethod execute(operator: Any, context: ExecutionContext) -> Any
Execute the operator within the pipeline context.
- Parameters:
operator (Any) – The operator to execute.
context (ExecutionContext) – Pipeline execution context including data, state, etc.
- Returns:
Result of operator execution.
- Return type:
Any
- prepare(operator: Any, context: ExecutionContext) -> None
Prepare the operator for execution.
This method can be overridden to perform setup tasks before execution.
- Parameters:
operator (Any) – The operator to prepare.
context (ExecutionContext) – Pipeline execution context.
- validate(operator: Any) -> None
Validate the operator before execution.
- Parameters:
operator (Any) – The operator to validate.
- Raises:
ValueError – If operator is invalid.
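The BaseController hooks suggest a simple lifecycle around execute(). The sketch below shows one plausible call order using a local stand-in class. The order (can_handle, validate, prepare, execute, cleanup), the run() driver, the ScaleController subclass, and the use of a plain dict in place of ExecutionContext are all assumptions for illustration; this page documents the methods but not how the pipeline invokes them.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseController(ABC):
    """Local stand-in that mirrors the documented method names only."""

    @abstractmethod
    def can_handle(self, operator: Any) -> bool: ...

    @abstractmethod
    def execute(self, operator: Any, context: Dict[str, Any]) -> Any: ...

    def validate(self, operator: Any) -> None:
        """Default: accept every operator."""

    def prepare(self, operator: Any, context: Dict[str, Any]) -> None:
        """Default: no setup."""

    def cleanup(self, operator: Any, context: Dict[str, Any]) -> None:
        """Default: no teardown."""


class ScaleController(BaseController):
    """Hypothetical controller whose operator is a numeric scale factor."""

    def can_handle(self, operator: Any) -> bool:
        return isinstance(operator, (int, float)) and not isinstance(operator, bool)

    def validate(self, operator: Any) -> None:
        if operator == 0:
            raise ValueError("scale factor must be non-zero")

    def prepare(self, operator: Any, context: Dict[str, Any]) -> None:
        context["log"].append("prepare")

    def execute(self, operator: Any, context: Dict[str, Any]) -> List[float]:
        context["log"].append("execute")
        return [value * operator for value in context["data"]]

    def cleanup(self, operator: Any, context: Dict[str, Any]) -> None:
        context["log"].append("cleanup")


def run(controller: BaseController, operator: Any, context: Dict[str, Any]) -> Any:
    """Assumed call order: can_handle, validate, prepare, execute, cleanup."""
    if not controller.can_handle(operator):
        raise TypeError(f"{type(controller).__name__} cannot handle {operator!r}")
    controller.validate(operator)
    controller.prepare(operator, context)
    try:
        return controller.execute(operator, context)
    finally:
        controller.cleanup(operator, context)  # runs even if execute raises


context = {"data": [1.0, 2.0], "log": []}
result = run(ScaleController(), 3, context)
```

Placing cleanup() in a finally block is a design choice of this sketch, so that teardown still happens when execute() fails.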
- class nirs4all.controllers.BranchController
Bases: OperatorController
Controller for pipeline branching.
Implements the branching mechanism that allows multiple preprocessing chains to be evaluated independently within a single pipeline execution.
- Key behaviors:
Creates independent context copies for each branch
Executes branch steps sequentially within each branch
Stores branch contexts in context.custom["branch_contexts"]
Post-branch steps iterate over all branch contexts
- priority
Controller priority (lower = higher priority). Set to 5 to execute before most other controllers.
- Type:
int
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, StepOutput]
Execute the branch step with V3 chain tracking.
Creates independent contexts for each branch, executes branch-specific steps, and stores branch contexts for post-branch iteration.
In predict/explain mode, only executes the target branch specified in runtime_context.target_model.branch_id for efficiency.
V3 improvements:
- Uses trace_recorder.enter_branch() / exit_branch() for branch path tracking
- Records each substep individually for complete trace fidelity
- Builds proper operator chains for artifact identification
- Parameters:
step_info – Parsed step containing branch definitions
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions
- Returns:
Tuple of (updated_context, StepOutput with collected artifacts)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the step matches the branch controller.
- Parameters:
step – Original step configuration
operator – Deserialized operator (may be list of branch definitions)
keyword – Step keyword
- Returns:
True if keyword is “branch”
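The key behaviors listed for BranchController (independent context copies, sequential steps within a branch, results collected under custom["branch_contexts"]) can be sketched with plain dictionaries. Everything here is hypothetical: the dict-based context, the Step callable type, and the run_branches() and add_processing() helpers stand in for the real ExecutionContext and step runner.

```python
import copy
from typing import Any, Callable, Dict, List

# Hypothetical step type: a callable that mutates a context in place.
Step = Callable[[Dict[str, Any]], None]


def run_branches(context: Dict[str, Any], branches: List[List[Step]]) -> Dict[str, Any]:
    """Run each branch on its own deep copy of the incoming context."""
    branch_contexts: List[Dict[str, Any]] = []
    for branch_id, steps in enumerate(branches):
        branch_ctx = copy.deepcopy(context)  # independent copy per branch
        branch_ctx["branch_id"] = branch_id
        for step in steps:  # steps run sequentially within the branch
            step(branch_ctx)
        branch_contexts.append(branch_ctx)
    # Later steps can iterate over the stored branch contexts.
    context.setdefault("custom", {})["branch_contexts"] = branch_contexts
    return context


def add_processing(name: str) -> Step:
    """Build a toy step that records a processing name."""
    def step(ctx: Dict[str, Any]) -> None:
        ctx["processing"].append(name)
    return step


ctx = run_branches(
    {"processing": ["raw"]},
    [[add_processing("snv")], [add_processing("savgol"), add_processing("pca")]],
)
```

The deep copy is what keeps one branch's preprocessing from leaking into another: the parent context still lists only "raw" after both branches have run.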
- class nirs4all.controllers.ConcatAugmentationController
Bases: OperatorController
Controller that concatenates multiple transformer outputs.
Semantics:
- Top-level (add_feature=False): REPLACES each processing with concatenated version
- Inside feature_augmentation (add_feature=True): ADDS one new processing
Supports:
- Single transformers: PCA(50)
- Chained transformers: [Wavelet(), PCA(50)] → sequential application
- Mixed: [PCA(50), [Wavelet(), SVD(30)], LocalStats()]
Examples
Top-level replacement:
>>> pipeline = [{"concat_transform": [PCA(50), SVD(50)]}]
# Before: (500, 3, 500) with ["raw", "snv", "savgol"]
# After: (500, 3, 100) with ["raw_concat_PCA_SVD", "snv_concat_PCA_SVD", …]
Nested inside feature_augmentation:
>>> pipeline = [{
...     "feature_augmentation": [
...         SNV(),
...         {"concat_transform": [PCA(50), SVD(50)]}
...     ]
... }]
# Before: (500, 1, 500) with ["raw"]
# After: (500, 3, 500) with ["raw", "snv", "concat_PCA_SVD"] (padded)
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute concat augmentation.
- Parameters:
step_info – Parsed step containing the concat_transform config
dataset – SpectroDataset to operate on
context – Execution context with selector and metadata
runtime_context – Runtime infrastructure (saver, step_number, etc.)
source – Source index (-1 for all sources)
mode – Execution mode (“train”, “predict”, “explain”)
loaded_binaries – Pre-fitted transformers for predict/explain mode
prediction_store – Not used by this controller
- Returns:
Tuple of (updated_context, list_of_artifacts)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if step is a concat_transform operation.
- static normalize_generator_spec(spec: Any) -> Any
Normalize generator spec for concat_transform context.
In concat_transform context, multi-selection should use combinations by default since the order of concatenated features doesn’t matter. Translates legacy ‘size’ to ‘pick’ for explicit semantics.
- Parameters:
spec – Generator specification (may contain _or_, size, pick, arrange).
- Returns:
Normalized spec with ‘size’ converted to ‘pick’ if needed.
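The 'size' to 'pick' translation can be pictured as a small dictionary rewrite. The sketch below is an assumption-laden illustration, not the library code: normalize_spec() and expand() are hypothetical helpers, and the reading of 'pick' as order-insensitive combinations and 'arrange' as order-sensitive permutations is inferred from the description above.

```python
from itertools import combinations, permutations
from typing import Any, Dict, List, Tuple


def normalize_spec(spec: Any) -> Any:
    """Rename legacy 'size' to 'pick' unless an explicit mode is already set."""
    if not isinstance(spec, dict) or "size" not in spec:
        return spec
    if "pick" in spec or "arrange" in spec:
        return spec  # explicit semantics win; leave the spec untouched
    normalized = {key: value for key, value in spec.items() if key != "size"}
    normalized["pick"] = spec["size"]
    return normalized


def expand(spec: Dict[str, Any]) -> List[Tuple[Any, ...]]:
    """Enumerate selections: 'pick' ignores order, 'arrange' respects it (assumed)."""
    options = spec["_or_"]
    if "arrange" in spec:
        return list(permutations(options, spec["arrange"]))
    return list(combinations(options, spec["pick"]))
```

With three candidate transformers and a selection size of 2, the combination reading yields 3 variants while the permutation reading yields 6, which is why defaulting to combinations avoids redundant pipelines when order is irrelevant.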
- class nirs4all.controllers.CrossValidatorController
Bases: OperatorController
Controller for any sklearn‑compatible splitter (native or custom).
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None)
Run operator.split and store the resulting folds on dataset.
- Smartly supplies y / groups only if required.
- Extracts groups from metadata if specified.
- Supports the force_group parameter to wrap any splitter with group-awareness.
- Maps local indices back to the global index space.
- Stores the list of folds into the dataset for subsequent steps.
- Parameters:
step_info (ParsedStep) – Parsed step containing the operator and original step configuration.
dataset (SpectroDataset) – The dataset to split.
context (ExecutionContext) – Current execution context.
runtime_context (RuntimeContext) – Runtime context with global settings.
source (int) – Source index (-1 for combined sources).
mode (str) – Execution mode (“train”, “predict”, or “explain”).
loaded_binaries (Any) – Pre-loaded binary data (not used).
prediction_store (Any) – Store for predictions (not used).
Notes
The force_group parameter enables any sklearn-compatible splitter to work with grouped samples by wrapping it with GroupedSplitterWrapper. This aggregates samples by group, passes “virtual samples” to the splitter, and expands fold indices back to the original dataset.
Example usage:
{"split": KFold(n_splits=5), "force_group": "Sample_ID"}
{"split": ShuffleSplit(test_size=0.2), "force_group": "ID", "aggregation": "median"}
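The group-wrapping idea in the note above (one virtual sample per group, split the virtual samples, expand back to row indices) can be sketched without any dependencies. This is not GroupedSplitterWrapper itself: kfold_indices() is a plain unshuffled k-fold written for the example, grouped_split() is a hypothetical helper, and the feature aggregation step (for instance the median) is left out because only indices matter here.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def kfold_indices(n_items: int, n_splits: int) -> Iterator[Fold]:
    """Plain, unshuffled k-fold over range(n_items)."""
    sizes = [n_items // n_splits + (1 if i < n_items % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


def grouped_split(groups: Sequence[str], n_splits: int) -> Iterator[Fold]:
    """Split on distinct groups, then expand fold indices to sample rows."""
    members: Dict[str, List[int]] = {}
    for row, group in enumerate(groups):
        members.setdefault(group, []).append(row)
    group_ids = list(members)  # one "virtual sample" per group, first-seen order
    for train_g, test_g in kfold_indices(len(group_ids), n_splits):
        train = [row for g in train_g for row in members[group_ids[g]]]
        test = [row for g in test_g for row in members[group_ids[g]]]
        yield train, test


groups = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(grouped_split(groups, n_splits=3))
```

Because the split happens at group level, repeated measurements of the same physical sample never end up on both sides of a fold.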
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Return True if operator behaves like a splitter.
Criteria – must expose a callable split whose first positional argument is named X. Optional presence of get_n_splits is a plus but not mandatory, so user‑defined simple splitters are still accepted.
Also matches on the ‘split’ keyword for group-aware splitting syntax.
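A duck-typing check along the lines of these criteria can be written with the standard inspect module. The function name looks_like_splitter() and the two toy classes are hypothetical, and the real matches() may differ in detail (for example in how it treats the 'split' keyword); the sketch covers only the "callable split whose first positional argument is named X" rule.

```python
import inspect
from typing import Any


def looks_like_splitter(operator: Any) -> bool:
    """True if operator.split is callable and its first positional arg is 'X'."""
    split = getattr(operator, "split", None)
    if not callable(split):
        return False
    try:
        params = list(inspect.signature(split).parameters.values())
    except (TypeError, ValueError):
        return False  # some builtins expose no introspectable signature
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    return bool(positional) and positional[0].name == "X"


class HalfSplitter:
    """Minimal user-defined splitter without get_n_splits."""

    def split(self, X, y=None, groups=None):
        half = len(X) // 2
        yield list(range(half)), list(range(half, len(X)))


class NotASplitter:
    """Has a split method, but its first argument is not named X."""

    def split(self, text):
        return text.split()
```

Since inspect.signature() on a bound method already drops self, the first remaining positional parameter is the one compared against "X".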
- class nirs4all.controllers.DummyController
Bases: OperatorController
Catch-all controller for operators not handled by other controllers.
This controller has the lowest priority and will catch any operators that don’t match other controllers, providing detailed debugging information about why they weren’t handled elsewhere.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Handle unmatched operators and provide detailed debugging information.
- class nirs4all.controllers.FeatureAugmentationController
Bases: OperatorController
Controller for feature augmentation with multiple action modes.
The feature_augmentation controller supports three action modes that control how preprocessing operations interact with existing processings:
extend (default): Add new processings to the set. Each operation runs independently on the base processing. If a processing already exists, it is not duplicated. Growth pattern is linear.
add: Chain each operation on top of ALL existing processings. Keep original processings alongside new chained versions. Growth pattern is multiplicative with originals (n + n×m).
replace: Chain each operation on top of ALL existing processings. Discard original processings, keeping only the chained versions. Growth pattern is multiplicative without originals (n×m).
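The three growth patterns can be made concrete by tracking processing names only. In the sketch below, augment_names() is a hypothetical helper and the underscore-joined naming scheme is an assumption; what it is meant to show is the counting: extend grows linearly, add yields n + n×m names, and replace yields n×m.

```python
from typing import List


def augment_names(existing: List[str], operations: List[str],
                  action: str = "extend", base: str = "raw") -> List[str]:
    """Return the processing names that would exist after the step."""
    if action == "extend":
        result = list(existing)
        for op in operations:
            name = f"{base}_{op}"  # each operation runs on the base processing
            if name not in result:  # existing processings are not duplicated
                result.append(name)
        return result
    # "add" and "replace" chain every operation on every existing processing.
    chained = [f"{proc}_{op}" for proc in existing for op in operations]
    if action == "add":
        return list(existing) + chained
    if action == "replace":
        return chained
    raise ValueError(f"invalid action: {action!r}")
```

With two existing processings and three operations, "add" produces 2 + 2×3 = 8 names and "replace" produces 6, which is worth keeping in mind before stacking several augmentation steps.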
Example
>>> # Extend mode (default) - linear growth
>>> {"feature_augmentation": [SNV, Gaussian], "action": "extend"}
>>> # With raw_A already present: raw_A, raw_SNV, raw_Gaussian
>>> # Add mode - multiplicative with originals
>>> {"feature_augmentation": [SNV, Gaussian], "action": "add"}
>>> # With raw_A present: raw_A, raw_A_SNV, raw_A_Gaussian
>>> # Replace mode - multiplicative, discards originals
>>> {"feature_augmentation": [SNV, Gaussian], "action": "replace"}
>>> # With raw_A present: raw_A_SNV, raw_A_Gaussian (raw_A discarded)
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute feature augmentation with specified action mode.
- Parameters:
step_info – Parsed step information containing the operation list and action mode.
dataset – The spectroscopic dataset to process.
context – Current execution context with processing state.
runtime_context – Runtime infrastructure for step execution.
source – Source index (-1 for all sources).
mode – Execution mode (“train”, “predict”, etc.).
loaded_binaries – Pre-loaded binary artifacts for prediction mode.
prediction_store – Store for prediction-time state.
- Returns:
Tuple of (updated_context, artifacts_list).
- Raises:
ValueError – If action mode is invalid.
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the operator matches the step and keyword.
- static normalize_generator_spec(spec: Any) -> Any
Normalize generator spec for feature_augmentation context.
In feature_augmentation context, multi-selection should use combinations by default since the order of parallel feature channels doesn’t matter. Translates legacy ‘size’ to ‘pick’ for explicit semantics.
- Parameters:
spec – Generator specification (may contain _or_, size, pick, arrange).
- Returns:
Normalized spec with ‘size’ converted to ‘pick’ if needed.
- class nirs4all.controllers.FoldChartController
Bases: OperatorController
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: Any, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) -> Tuple[ExecutionContext, Any]
Execute fold visualization showing train/test splits with y-value color coding. Skips execution in prediction mode.
- Returns:
Tuple of (context, StepOutput)
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the operator matches the step and keyword.
- class nirs4all.controllers.JaxModelController
Bases: BaseModelController
Controller for JAX/Flax models.
Uses lazy loading pattern - JAX is only imported when training or prediction is actually performed.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Predictions = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute JAX model controller.
- get_preferred_layout() -> str
Return the preferred data layout for JAX models.
Flax Dense layers expect (batch, features). Flax Conv layers expect (batch, length, features) i.e. (N, L, C). So ‘3d_transpose’ is suitable for Conv1D.
- class nirs4all.controllers.OperatorController
Bases: ABC
Base class for pipeline operators.
- abstractmethod execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, Any]
Run the operator with the given parameters and context.
- Parameters:
step_info – Parsed step containing operator, keyword, and metadata
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store for model predictions
- Returns:
Tuple of (updated_context, StepOutput)
- abstractmethod classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the operator matches the step and keyword.
- class nirs4all.controllers.PyTorchModelController
Bases: BaseModelController
Controller for PyTorch models.
Uses lazy loading pattern - PyTorch is only imported when training or prediction is actually performed.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Predictions = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute PyTorch model controller.
- get_preferred_layout() -> str
Return the preferred data layout for PyTorch models.
PyTorch typically expects (samples, channels, features) for 1D convs. We use ‘3d’ which gives (samples, processings, features) -> (N, C, L).
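The difference between the '3d' layout (N, C, L) described here and the '3d_transpose' layout (N, L, C) described for Flax Conv layers is a swap of the last two axes. The sketch below shows that swap on nested lists so it runs without an array library; to_channels_last() and shape() are hypothetical helpers, and real code would more likely transpose an array in place of these loops.

```python
from typing import List, Tuple

Sample = List[List[float]]  # one sample as (processings, features)


def shape(batch: List[Sample]) -> Tuple[int, int, int]:
    """Shape of a rectangular 3-level nested list."""
    return len(batch), len(batch[0]), len(batch[0][0])


def to_channels_last(batch: List[Sample]) -> List[Sample]:
    """(N, C, L) -> (N, L, C): swap the processing and feature axes."""
    return [[list(column) for column in zip(*sample)] for sample in batch]


# 2 samples, 3 processings (channels), 4 features (length)
batch = [
    [[float(100 * n + 10 * c + l) for l in range(4)] for c in range(3)]
    for n in range(2)
]
transposed = to_channels_last(batch)
```

Each value keeps its sample index while its channel and position indices trade places, so transposed[n][l][c] equals batch[n][c][l].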
- class nirs4all.controllers.ResamplerController
Bases: OperatorController
Controller for Resampler operators.
This controller:
1. Extracts wavelengths from dataset headers
2. Validates that headers are convertible to float (wavelengths in cm-1)
3. Fits the resampler with original wavelengths
4. Transforms all data to the target wavelength grid
5. Updates dataset with new features and headers
6. Supports multi-source datasets with per-source or shared parameters
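Steps 1, 2, and 4 of the list above can be sketched for a single spectrum as follows. The helper names are hypothetical and linear interpolation is an assumption chosen for brevity; this page does not say which interpolation method the Resampler operator uses.

```python
from typing import List, Sequence


def parse_wavelengths(headers: Sequence[str]) -> List[float]:
    """Convert column headers to floats, failing loudly on non-numeric ones."""
    try:
        return [float(header) for header in headers]
    except ValueError as exc:
        raise ValueError(f"headers are not numeric wavelengths: {exc}") from exc


def interp_linear(x_new: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation; xs must be strictly increasing and cover x_new."""
    if not xs[0] <= x_new <= xs[-1]:
        raise ValueError(f"{x_new} is outside the measured range")
    for i in range(1, len(xs)):
        if x_new <= xs[i]:
            t = (x_new - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    raise AssertionError("unreachable for increasing xs")


def resample(spectrum: Sequence[float], headers: Sequence[str],
             target: Sequence[float]) -> List[float]:
    """Map one spectrum from its header grid onto the target grid."""
    xs = parse_wavelengths(headers)
    return [interp_linear(x, xs, spectrum) for x in target]


resampled = resample([0.0, 1.0, 3.0], ["1000", "1010", "1020"], [1005.0, 1015.0])
```

After resampling, the dataset headers would be replaced by the target grid (step 5), so later steps see features aligned to the new axis.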
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List]
Execute resampling operation.
- Parameters:
step_info – Pipeline step configuration
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime context
source – Data source index (-1 for all sources)
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binary objects for prediction mode
prediction_store – External prediction store (unused)
- Returns:
Tuple of (updated_context, fitted_resamplers)
- class nirs4all.controllers.SampleAugmentationController
Bases: OperatorController
Sample Augmentation Controller with delegation pattern.
This controller orchestrates sample augmentation by:
1. Calculating augmentation distribution (standard or balanced mode)
2. Creating transformer→samples mapping
3. Emitting ONE run_step per transformer with target samples
The actual augmentation work is delegated to TransformerMixinController.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List]
Execute sample augmentation with standard or balanced mode.
- Step format for standard mode:
{
    "sample_augmentation": {
        "transformers": [transformer1, transformer2, …],
        "count": int,
        "selection": "random" or "all",  # Default "random"
        "random_state": int  # Optional
    }
}
- Step format for balanced mode (choose one balancing strategy):
Mode 1 - Fixed target size per class:
{
    "sample_augmentation": {
        "transformers": […],
        "balance": "y" or "metadata_column",  # Default "y"
        "target_size": int,  # Fixed target samples per class
        "selection": "random" or "all",
        "random_state": int
    }
}
Mode 2 - Multiplier for augmentation:
{
    "sample_augmentation": {
        "transformers": […],
        "balance": "y" or "metadata_column",
        "max_factor": float,  # Multiplier (e.g., 3 means class grows 3x)
        "selection": "random" or "all",
        "random_state": int
    }
}
Mode 3 - Percentage of majority class:
{
    "sample_augmentation": {
        "transformers": […],
        "balance": "y" or "metadata_column",
        "ref_percentage": float,  # Target as % of majority (0.0-1.0)
        "selection": "random" or "all",
        "random_state": int
    }
}
- Binning for regression (automatic when balance="y" and task is regression):
{
    "sample_augmentation": {
        "transformers": […],
        "balance": "y",
        "bins": int,  # Number of virtual classes (default: 10)
        "binning_strategy": "equal_width" or "quantile",  # Default: "equal_width"
        "max_factor": float,  # Choose one balancing mode
        "selection": "random" or "all",
        "random_state": int
    }
}
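To make the three balancing strategies easier to compare, the sketch below computes how many augmented samples each class would need. It is one plausible reading of the option comments above, not the library's algorithm: augmentation_counts() and equal_width_bins() are hypothetical, and capping max_factor growth at the majority class size is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def augmentation_counts(labels: Sequence[Hashable],
                        target_size: Optional[int] = None,
                        max_factor: Optional[float] = None,
                        ref_percentage: Optional[float] = None) -> Dict[Hashable, int]:
    """Number of synthetic samples to create per class under one strategy."""
    chosen = [v for v in (target_size, max_factor, ref_percentage) if v is not None]
    if len(chosen) != 1:
        raise ValueError("choose exactly one of target_size, max_factor, ref_percentage")
    counts = Counter(labels)
    majority = max(counts.values())
    result: Dict[Hashable, int] = {}
    for label, n in counts.items():
        if target_size is not None:
            target = target_size  # Mode 1: fixed size per class
        elif max_factor is not None:
            target = min(int(n * max_factor), majority)  # Mode 2, assumed cap
        else:
            target = int(majority * ref_percentage)  # Mode 3: share of majority
        result[label] = max(0, target - n)  # never remove existing samples
    return result


def equal_width_bins(y: Sequence[float], bins: int = 10) -> List[int]:
    """Assign regression targets to equal-width virtual classes."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return [0] * len(y)
    width = (hi - lo) / bins
    return [min(int((value - lo) / width), bins - 1) for value in y]


labels = ["a"] * 10 + ["b"] * 4 + ["c"] * 2
```

For a regression task, the bin indices from equal_width_bins() would play the role of labels, so the same counting applies to the virtual classes.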
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Check if the operator matches the step and keyword.
- static normalize_generator_spec(spec: Any) -> Any
Normalize generator spec for sample_augmentation context.
In sample_augmentation context, multi-selection should use combinations by default since the order of transformers doesn’t matter. Translates legacy ‘size’ to ‘pick’ for explicit semantics.
- Parameters:
spec – Generator specification (may contain _or_, size, pick, arrange).
- Returns:
Normalized spec with ‘size’ converted to ‘pick’ if needed.
- class nirs4all.controllers.SampleFilterController
Bases: OperatorController
Controller for sample filtering operations.
This controller orchestrates sample filtering by:
1. Retrieving train samples (base only, no augmented) and their X/y values
2. Applying each filter’s get_mask() method to identify outliers
3. Combining masks according to the specified mode (any/all)
4. Marking excluded samples in the dataset’s indexer
5. Generating filtering report (optional)
Sample filters are non-destructive - they mark samples as excluded in the indexer rather than removing data. Excluded samples can be re-included using dataset._indexer.mark_included().
- Pipeline syntax:
{
    "sample_filter": {
        "filters": [
            YOutlierFilter(method="iqr", threshold=1.5),
            XOutlierFilter(method="mahalanobis"),
        ],
        "mode": "any",  # "any" = exclude if ANY filter flags
        "report": True,  # Generate filtering report
        "cascade_to_augmented": True,  # Also exclude augmented samples
    }
}
Note
Filtering only runs during training mode - in prediction mode, this controller does nothing to avoid excluding prediction samples.
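The mask-combination step can be illustrated with a simple IQR rule and boolean lists. The sketch assumes that True means "flagged as outlier"; the actual polarity of get_mask() is not stated on this page. iqr_outlier_mask() and combine_masks() are hypothetical helpers, and the second mask stands in for an X-based filter.

```python
from statistics import quantiles
from typing import List, Sequence


def iqr_outlier_mask(y: Sequence[float], threshold: float = 1.5) -> List[bool]:
    """True where a value lies outside [Q1 - t*IQR, Q3 + t*IQR]."""
    q1, _, q3 = quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - threshold * iqr, q3 + threshold * iqr
    return [value < low or value > high for value in y]


def combine_masks(masks: Sequence[Sequence[bool]], mode: str = "any") -> List[bool]:
    """Combine per-filter outlier flags sample by sample."""
    if not masks:
        raise ValueError("no filters specified")
    if mode == "any":  # exclude if ANY filter flags the sample
        return [any(flags) for flags in zip(*masks)]
    if mode == "all":  # exclude only if ALL filters flag the sample
        return [all(flags) for flags in zip(*masks)]
    raise ValueError(f"invalid mode: {mode!r}")


y = [1.0, 2.0, 3.0, 4.0, 100.0]
y_mask = iqr_outlier_mask(y)
x_mask = [True, False, False, False, True]  # made-up result of an X-based filter
excluded_any = combine_masks([y_mask, x_mask], mode="any")
excluded_all = combine_masks([y_mask, x_mask], mode="all")
```

Because the result is only a mask, the underlying data stays untouched, which matches the non-destructive behavior described above.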
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List]
Execute sample filtering operation.
This method:
1. Retrieves training data (base samples only)
2. Fits and applies each filter to identify outliers
3. Combines filter masks using the specified mode
4. Marks excluded samples in the dataset’s indexer
5. Optionally prints a filtering report
- Parameters:
step_info – Parsed step containing operator and configuration
dataset – Dataset to operate on
context – Pipeline execution context
runtime_context – Runtime infrastructure context
source – Data source index (unused, filtering is dataset-level)
mode – Execution mode (“train” or “predict”)
loaded_binaries – Pre-loaded binaries (filters may be persisted)
prediction_store – External prediction store (unused)
- Returns:
Tuple of (updated_context, persisted_artifacts)
- Raises:
ValueError – If no filters are specified
ValueError – If invalid mode is specified
- classmethod matches(step: Any, operator: Any, keyword: str) -> bool
Match sample_filter keyword in pipeline.
- class nirs4all.controllers.SklearnModelController
Bases: BaseModelController
Controller for scikit-learn models.
This controller handles sklearn models with support for training on 2D data, cross-validation, hyperparameter tuning with Optuna, model persistence, and integration with the nirs4all pipeline.
- priority
Controller priority (6) - higher than TransformerMixin to prioritize supervised models over transformers.
- Type:
int
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Any | None = None) -> Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute sklearn model controller with score management.
Main entry point for sklearn model execution in the pipeline. Sets the preferred data layout to ‘2d’ and delegates to parent execute method.
- Parameters:
step_info – Parsed step containing model configuration and operator.
dataset (SpectroDataset) – Dataset containing features and targets.
context (ExecutionContext) – Pipeline execution context with state info.
runtime_context (RuntimeContext) – Runtime context managing execution state.
source (int) – Source index for multi-source pipelines. Defaults to -1.
mode (str) – Execution mode (‘train’ or ‘predict’). Defaults to ‘train’.
loaded_binaries (Optional[List[Tuple[str, bytes]]]) – Pre-loaded model binaries for prediction mode. Defaults to None.
prediction_store (Optional[Any]) – Store for managing predictions. Defaults to None.
- Returns:
Updated context and list of model binaries (name, serialized_model) for persistence.
- Return type:
Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Note
Automatically sets context[‘layout’] = ‘2d’ for sklearn compatibility
Inherits full training, evaluation, and prediction logic from BaseModelController
Respects force_layout if specified in step configuration
- get_preferred_layout() str[source]
Return the preferred data layout for sklearn models.
- Returns:
Data layout preference, always ‘2d’ for sklearn models, which expect (n_samples, n_features) input format.
- Return type:
str
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Match sklearn estimators and model dictionaries with sklearn models.
Prioritizes supervised models (regressors and classifiers) over transformers by checking for predict methods and using sklearn’s is_regressor/is_classifier.
- Parameters:
step (Any) – Pipeline step to check, can be a dict with ‘model’ key or BaseEstimator instance.
operator (Any) – Optional operator object to check if it’s a BaseEstimator.
keyword (str) – Pipeline keyword (unused in this implementation).
- Returns:
True if the step matches a sklearn estimator (regressor, classifier, or has predict method), False otherwise.
- Return type:
bool
- class nirs4all.controllers.SpectraChartController[source]
Bases:
OperatorController
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: Any, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) Tuple[ExecutionContext, Any][source]
Execute spectra visualization for both 2D and 3D plots. Skips execution in prediction mode.
- Supports optional parameters via dict syntax:
{"chart_2d": {"include_excluded": True, "highlight_excluded": True}}
- Parameters:
include_excluded – If True, include excluded samples in visualization
highlight_excluded – If True, highlight excluded samples with different style
- Returns:
Tuple of (context, StepOutput)
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Check if the operator matches the step and keyword.
- class nirs4all.controllers.SpectralDistributionController[source]
Bases:
OperatorController
Controller for spectral distribution envelope visualization.
Shows envelope (min/max/mean/IQR) for train vs test partitions, with optional per-fold visualization when cross-validation folds exist.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: Any, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) Tuple[ExecutionContext, Any][source]
Execute spectral distribution envelope visualization.
Creates envelope plots showing min/max/mean/IQR for train vs test. If CV folds exist, creates a grid showing each fold.
- Returns:
Tuple of (context, StepOutput)
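The envelope statistics named above (min, max, mean, IQR per wavelength) can be sketched with the standard library. The function name `spectral_envelope`, the dictionary layout, and the use of inclusive quartiles are assumptions for illustration; the controller's own computation and plotting code may differ.

```python
import statistics
from typing import Dict, List


def spectral_envelope(spectra: List[List[float]]) -> Dict[str, List[float]]:
    """Per-wavelength min, max, mean and quartiles for a set of spectra.

    `spectra` has shape (n_samples, n_wavelengths). Hypothetical helper,
    not part of nirs4all.
    """
    envelope: Dict[str, List[float]] = {
        "min": [], "max": [], "mean": [], "q1": [], "q3": []
    }
    # zip(*spectra) yields one column (all samples) per wavelength.
    for column in zip(*spectra):
        q1, _median, q3 = statistics.quantiles(column, n=4, method="inclusive")
        envelope["min"].append(min(column))
        envelope["max"].append(max(column))
        envelope["mean"].append(statistics.mean(column))
        envelope["q1"].append(q1)
        envelope["q3"].append(q3)
    return envelope


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
env = spectral_envelope(train)
print(env["min"], env["max"])  # [1.0, 10.0] [5.0, 50.0]
print(env["q1"], env["q3"])    # [2.0, 20.0] [4.0, 40.0]
```

Calling the same helper on the train and test partitions gives the two envelopes to compare; the interquartile range is the band between `q1` and `q3`.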
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Check if the operator matches the step and keyword.
- class nirs4all.controllers.TensorFlowModelController[source]
Bases:
BaseModelController
Controller for TensorFlow/Keras models.
This controller manages the complete lifecycle of TensorFlow/Keras models including:
- Model instantiation from various configuration formats
- Data preparation with proper tensor formatting (2D/3D)
- Model compilation with task-appropriate loss functions and metrics
- Training with callbacks (early stopping, model checkpointing)
- Hyperparameter tuning via Optuna integration
- Model evaluation and prediction
- Binary serialization for model persistence
The controller automatically detects TensorFlow models and functions decorated with @framework(‘tensorflow’). It uses lazy loading to avoid importing TensorFlow until actually needed.
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Predictions = None) Tuple[ExecutionContext, List[Tuple[str, bytes]]][source]
Execute TensorFlow model training, finetuning, or prediction.
Sets the preferred data layout to ‘3d_transpose’ for TensorFlow Conv1D models, then delegates to the base class execute method.
- Parameters:
step_info – Parsed step containing model configuration and operator.
dataset – SpectroDataset with features, targets, and fold information.
context – Execution context with step_id, processing history, partition info.
runtime_context – Runtime context managing execution state.
source – Data source index (default: -1 for primary source).
mode – Execution mode - ‘train’, ‘finetune’, ‘predict’, or ‘explain’.
loaded_binaries – Optional list of (name, bytes) tuples for prediction mode, containing serialized model and preprocessing artifacts.
prediction_store – External Predictions storage instance for managing prediction results across pipeline steps.
- Returns:
Tuple of (updated_context, list_of_artifact_metadata) where
updated_context: Context dict with added model information
artifact_metadata: List of serialized binary artifacts for persistence
- Raises:
ImportError – If TensorFlow is not installed.
- get_preferred_layout() str[source]
Return the preferred data layout for TensorFlow models.
TensorFlow Conv1D expects input shape (features, channels) where:
- features = number of wavelengths/spectral points (timesteps for convolution)
- channels = number of preprocessing methods
The ‘3d_transpose’ layout returns (samples, features, processings) which is correct for Conv1D.
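As a rough illustration of the axis order, the sketch below swaps the last two axes of a nested list. It assumes the untransposed 3D layout is (samples, processings, features); that starting layout and the helper name `to_3d_transpose` are assumptions, not documented behavior.

```python
from typing import List

Array3D = List[List[List[float]]]


def to_3d_transpose(x: Array3D) -> Array3D:
    """(samples, processings, features) -> (samples, features, processings).

    Hypothetical helper showing the axis swap only.
    """
    # For each sample, zip(*sample) groups the values of every processing
    # at a given wavelength, which become the Conv1D channels.
    return [[list(point) for point in zip(*sample)] for sample in x]


# One sample, two preprocessing variants, three wavelengths.
x = [[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]]
out = to_3d_transpose(x)
print(out)  # [[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]]
```

Each sample then has shape (features, channels) = (3, 2), matching the Conv1D input shape described above.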
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Determine if this controller should handle the given step.
Matches TensorFlow/Keras models, functions decorated with @framework(‘tensorflow’), and serialized model configurations containing TensorFlow components.
- Parameters:
step – Pipeline step configuration (dict, model instance, or function).
operator – Optional operator instance extracted from step.
keyword – Optional keyword identifier for the step.
- Returns:
True if this controller should handle the step, False otherwise. Returns False immediately if TensorFlow is not installed.
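One common way to return False early without importing a heavy framework is to probe for it with `importlib.util.find_spec`. The sketch below shows that pattern only; the helper names and the module-prefix check are assumptions, and the controller's real detection logic is not shown in this reference.

```python
import importlib.util
import json
from typing import Any


def framework_available(module_name: str = "tensorflow") -> bool:
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def matches_sketch(step: Any, framework: str = "tensorflow") -> bool:
    """Hypothetical matcher: bail out early if the framework is missing."""
    if not framework_available(framework):
        return False
    # Assumption: a step belongs to the framework when its class is
    # defined in a module under that top-level package.
    return type(step).__module__.split(".")[0] == framework


print(framework_available("json"))                        # True
print(matches_sketch(object(), "no_such_framework_xyz"))  # False
print(matches_sketch(json.JSONDecoder(), "json"))         # True
```

The standard-library `json` package stands in for TensorFlow here so the example runs on any Python installation.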
- process_hyperparameters(params: Dict[str, Any]) Dict[str, Any][source]
Process hyperparameters for TensorFlow model tuning.
Supports TensorFlow-specific parameter organization:
- Parameters prefixed with ‘compile_’ are grouped under ‘compile’ key (e.g., ‘compile_learning_rate’ → compile[‘learning_rate’])
- Parameters prefixed with ‘fit_’ are grouped under ‘fit’ key (e.g., ‘fit_batch_size’ → fit[‘batch_size’])
- Other parameters are treated as model architecture parameters
- Parameters:
params – Dictionary of sampled parameters.
- Returns:
Dictionary of processed hyperparameters with proper nesting for TensorFlow compilation and fitting.
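The prefix grouping described above can be sketched as follows. The function name `group_prefixed_params` is hypothetical, and keeping the remaining (architecture) parameters at the top level of the result is an assumption; the reference only states that they are treated as model architecture parameters.

```python
from typing import Any, Dict


def group_prefixed_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Nest 'compile_*' and 'fit_*' parameters under 'compile' and 'fit'."""
    processed: Dict[str, Any] = {}
    for name, value in params.items():
        for prefix in ("compile_", "fit_"):
            if name.startswith(prefix):
                group = prefix.rstrip("_")          # 'compile' or 'fit'
                stripped = name[len(prefix):]       # e.g. 'learning_rate'
                processed.setdefault(group, {})[stripped] = value
                break
        else:
            # No known prefix: assumed to be an architecture parameter.
            processed[name] = value
    return processed


sampled = {"compile_learning_rate": 0.001, "fit_batch_size": 32, "filters": 16}
print(group_prefixed_params(sampled))
# {'compile': {'learning_rate': 0.001}, 'fit': {'batch_size': 32}, 'filters': 16}
```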
- class nirs4all.controllers.TransformerMixinController[source]
Bases:
OperatorController
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, Any]] | None = None, prediction_store: Any | None = None)[source]
Execute transformer - handles normal, feature augmentation, and sample augmentation modes.
Supports optional fit_on_all parameter in step configuration to fit the transformer on all data instead of just training data. This is useful for unsupervised preprocessing where you want the transformation to capture the full data distribution.
- Step format:
# Standard (fit on train, transform all):
StandardScaler()
# Fit on ALL data (unsupervised preprocessing):
{"preprocessing": StandardScaler(), "fit_on_all": True}
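The difference between the two fitting strategies can be shown with a toy example. `MeanCenterer` is a hypothetical stand-in for an sklearn-style transformer and `apply_transformer` is an illustrative helper; neither is part of nirs4all, and the sketch ignores augmentation modes.

```python
from typing import List, Tuple


class MeanCenterer:
    """Hypothetical transformer: subtracts the mean learned in fit()."""

    def fit(self, values: List[float]) -> "MeanCenterer":
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [v - self.mean_ for v in values]


def apply_transformer(
    transformer: MeanCenterer,
    train: List[float],
    test: List[float],
    fit_on_all: bool = False,
) -> Tuple[List[float], List[float]]:
    """Fit on train only (default) or on train + test, then transform both."""
    fit_data = train + test if fit_on_all else train
    transformer.fit(fit_data)
    return transformer.transform(train), transformer.transform(test)


train, test = [1.0, 2.0, 3.0], [6.0]
print(apply_transformer(MeanCenterer(), train, test))
# ([-1.0, 0.0, 1.0], [4.0])   mean learned from train only (2.0)
print(apply_transformer(MeanCenterer(), train, test, fit_on_all=True))
# ([-2.0, -1.0, 0.0], [3.0])  mean learned from all data (3.0)
```

Fitting on all data lets the transformation see the test distribution, which is why the reference presents it as an option for unsupervised preprocessing rather than the default.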
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Match TransformerMixin objects.
- class nirs4all.controllers.YChartController[source]
Bases:
OperatorController
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: Any, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) Tuple[ExecutionContext, Any][source]
Execute y values histogram visualization.
If cross-validation folds exist (more than 1 fold), displays a grid showing:
- One histogram per fold validation set
- One histogram for the test partition (if available)
Otherwise, displays a simple train vs test histogram.
- Supports optional parameters via dict syntax:
{"chart_y": {"include_excluded": True, "highlight_excluded": True}}
- Parameters:
include_excluded – If True, include excluded samples in visualization
highlight_excluded – If True, show excluded samples as separate histogram
- Returns:
Tuple of (context, StepOutput)
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Check if the operator matches the step and keyword.
- class nirs4all.controllers.YTransformerMixinController[source]
Bases:
OperatorController
Controller for applying sklearn TransformerMixin operators to targets (y) instead of features (X).
Triggered by the “y_processing” keyword and applies transformations to target data, fitting on train targets and transforming all target data.
- Supports both single transformers and chained transformers (list syntax):
Single: {"y_processing": StandardScaler()}
Chained: {"y_processing": [StandardScaler, QuantileTransformer(n_quantiles=30)]}
When using chained transformers, each transformer is applied sequentially, with proper ancestry tracking and individual artifact persistence for prediction mode.
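A chained target transformation can be sketched as a loop that fits each transformer on the train targets produced by the previous step and applies it to all targets. The `Center` and `MaxAbsScale` classes and the `fit_chain` helper are hypothetical stand-ins; instantiating a bare class mirrors the class-or-instance list syntax above but is an assumption about how the controller handles it.

```python
import inspect
from typing import Any, List, Sequence, Tuple


class Center:
    """Hypothetical transformer: subtract the train mean."""

    def fit(self, y: List[float]) -> "Center":
        self.mean_ = sum(y) / len(y)
        return self

    def transform(self, y: List[float]) -> List[float]:
        return [v - self.mean_ for v in y]


class MaxAbsScale:
    """Hypothetical transformer: divide by the largest absolute train value."""

    def fit(self, y: List[float]) -> "MaxAbsScale":
        self.scale_ = max(abs(v) for v in y) or 1.0
        return self

    def transform(self, y: List[float]) -> List[float]:
        return [v / self.scale_ for v in y]


def fit_chain(
    steps: Sequence[Any], y_train: List[float], y_all: List[float]
) -> Tuple[List[Any], List[float]]:
    """Fit each step on train targets, transform all targets, in order."""
    fitted = []
    for step in steps:
        transformer = step() if inspect.isclass(step) else step
        transformer.fit(y_train)
        y_train = transformer.transform(y_train)  # input for the next fit
        y_all = transformer.transform(y_all)
        fitted.append(transformer)                # kept for prediction mode
    return fitted, y_all


fitted, y_scaled = fit_chain([Center, MaxAbsScale()], [2.0, 4.0, 6.0], [2.0, 4.0, 6.0, 8.0])
print(y_scaled)  # [-1.0, 0.0, 1.0, 2.0]
```

Returning the fitted transformers one by one corresponds to the individual artifact persistence mentioned above: each can be stored and reloaded separately in prediction mode.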
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: Any = None, prediction_store: Any = None) Tuple[ExecutionContext, List[Any]][source]
Execute transformer(s) on dataset targets, fitting on train targets and transforming all targets.
Supports both single transformers and chained transformers (list). Each transformer is applied sequentially, with proper ancestry tracking.
- Parameters:
step_info – Parsed step containing operator and metadata
dataset – Dataset containing targets to transform
context – Pipeline context with partition information
runtime_context – Runtime context containing infrastructure components
source – Source index (not used for target processing)
mode – Execution mode (“train”, “predict”, or “explain”)
loaded_binaries – Pre-loaded fitted transformers for predict/explain mode
prediction_store – Not used for y_processing
- Returns:
Tuple of (updated_context, fitted_transformers_list)
- classmethod matches(step: Any, operator: Any, keyword: str) bool[source]
Match if keyword is ‘y_processing’ and operator is TransformerMixin or list thereof.
- Parameters:
step – Original step configuration
operator – Parsed operator (TransformerMixin instance, class, or list)
keyword – Step keyword
- Returns:
True if this controller should handle the step
- nirs4all.controllers.register_controller(operator_cls: Type[OperatorController])[source]
Decorator to register a controller class.
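The general decorator-registry pattern looks roughly like the sketch below. It is not the nirs4all implementation: the registry list, the `_sketch` suffix, `EchoController`, and `find_controller` are all hypothetical, and priority ordering is deliberately left out because its semantics are not specified here.

```python
from typing import Any, List, Optional

CONTROLLER_REGISTRY: List[type] = []  # hypothetical module-level registry


def register_controller_sketch(operator_cls: type) -> type:
    """Class decorator: record the controller class and return it unchanged."""
    if operator_cls not in CONTROLLER_REGISTRY:
        CONTROLLER_REGISTRY.append(operator_cls)
    return operator_cls


@register_controller_sketch
class EchoController:
    """Hypothetical controller that handles steps with keyword 'echo'."""

    @classmethod
    def matches(cls, step: Any, operator: Any, keyword: str) -> bool:
        return keyword == "echo"


def find_controller(step: Any, operator: Any, keyword: str) -> Optional[type]:
    """Return the first registered controller whose matches() accepts the step."""
    for controller_cls in CONTROLLER_REGISTRY:
        if controller_cls.matches(step, operator, keyword):
            return controller_cls
    return None


print(find_controller({"echo": "hello"}, None, "echo"))   # the EchoController class
print(find_controller({"other": 1}, None, "other"))       # None
```

Because the decorator returns the class unchanged, a decorated controller can still be imported and subclassed normally.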