nirs4all.controllers.models.autogluon_model module

AutoGluon Model Controller - Controller for AutoGluon TabularPredictor

This controller handles AutoGluon TabularPredictor with support for:

- Automatic model selection and ensembling
- Training on tabular data (samples x features)
- Model persistence and prediction storage
- Integration with the nirs4all pipeline

AutoGluon differs from sklearn models in that:

- It trains an ensemble of models automatically
- It uses DataFrames internally, not numpy arrays
- It manages its own model directory for persistence
- It has its own hyperparameter tuning (no need for Optuna)

Lazy loading pattern: AutoGluon is only imported when actually needed for training or prediction, not at module import time.

class nirs4all.controllers.models.autogluon_model.AutoGluonModelController

Bases: BaseModelController

Controller for AutoGluon TabularPredictor.

This controller handles AutoGluon models with automatic model selection, ensembling, and integration with the nirs4all pipeline.

AutoGluon automatically:

- Trains multiple models (LightGBM, CatBoost, XGBoost, Neural Networks, etc.)
- Performs cross-validation
- Creates weighted ensembles
- Handles hyperparameter tuning internally

Uses lazy loading: AutoGluon is imported only when it is first needed for training or prediction.

priority

Controller priority (5). A lower value means higher precedence, so this controller ranks ahead of the sklearn controller (6) when AutoGluon is explicitly requested.

Type: int
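To illustrate how a numerically lower priority can win when several controllers match the same step, here is a small sketch. The classes, the simplified one-argument `matches`, the `"framework"` key, and `select_controller` are all hypothetical and are not taken from nirs4all; only the values 5 and 6 come from the description above.

```python
from typing import List, Optional, Type


class BaseController:
    """Hypothetical base class; a lower ``priority`` value wins."""

    priority: int = 100

    @classmethod
    def matches(cls, step: dict) -> bool:
        return False


class SklearnLikeController(BaseController):
    priority = 6

    @classmethod
    def matches(cls, step: dict) -> bool:
        # Assumed rule: any step that carries a model is acceptable.
        return "model" in step


class AutoGluonLikeController(BaseController):
    priority = 5

    @classmethod
    def matches(cls, step: dict) -> bool:
        # Assumed rule: the step explicitly asks for AutoGluon.
        return step.get("framework") == "autogluon"


def select_controller(
    step: dict, controllers: List[Type[BaseController]]
) -> Optional[Type[BaseController]]:
    """Return the matching controller with the lowest priority value."""
    candidates = [c for c in controllers if c.matches(step)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.priority)
```

Under these assumptions, a step that both controllers accept is routed to the AutoGluon-like controller, while a plain model step still falls through to the sklearn-like one.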

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List[ArtifactMeta]]

Execute AutoGluon model controller.

Main entry point for AutoGluon model execution in the pipeline.

Parameters:

step_info (ParsedStep) – Parsed step containing model configuration.
dataset (SpectroDataset) – Dataset containing features and targets.
context (ExecutionContext) – Pipeline execution context.
runtime_context (RuntimeContext) – Runtime context.
source (int) – Source index. Defaults to -1.
mode (str) – Execution mode. Defaults to ‘train’.
loaded_binaries (List[Tuple[str, bytes]] | None) – Pre-loaded model binaries for prediction. Defaults to None.
prediction_store (Any | None) – Store for managing predictions. Defaults to None.

Returns:

The updated context and the list of model binaries.

Return type:

Tuple[ExecutionContext, List[ArtifactMeta]]

get_preferred_layout() → str

Return the preferred data layout for AutoGluon.

Returns: Data layout preference, ‘2d’ for AutoGluon.
Return type: str

load_model(filepath: str) → Any

Load AutoGluon model from disk.

Parameters: filepath (str) – Path to the saved model directory.
Returns: Loaded AutoGluon predictor.
Return type: TabularPredictor

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match AutoGluon TabularPredictor configurations.

Parameters:

step (Any) – Pipeline step to check.
operator (Any) – Optional operator object.
keyword (str) – Pipeline keyword (unused).

Returns:

True if the step matches an AutoGluon configuration.

Return type:

bool

priority: int = 5

save_model(model: Any, filepath: str) → None

Save AutoGluon model to disk.

AutoGluon models are saved as directories. This method moves the model’s directory to the specified filepath.

Parameters:

model (TabularPredictor) – Trained AutoGluon predictor.
filepath (str) – Target path for saving.
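Because the saved model is a directory and not a single file, saving amounts to relocating a folder. The sketch below shows one way such a move could look using only the standard library. `move_model_dir` is a hypothetical helper, not the controller's implementation, and it takes the source directory explicitly instead of reading it from a predictor object.

```python
import shutil
import tempfile
from pathlib import Path


def move_model_dir(model_dir: str, filepath: str) -> Path:
    """Move a model directory to ``filepath`` and return the new location.

    Hypothetical helper: the source directory is passed in explicitly.
    """
    src = Path(model_dir)
    dst = Path(filepath)
    if not src.is_dir():
        raise FileNotFoundError(f"Model directory not found: {src}")
    if dst.exists():
        # shutil.move would nest src inside an existing directory,
        # so refuse instead of silently changing the layout.
        raise FileExistsError(f"Target already exists: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a trained model directory containing one file.
        src = Path(tmp) / "ag_model"
        src.mkdir()
        (src / "predictor.pkl").write_bytes(b"stub")
        dst = move_model_dir(str(src), str(Path(tmp) / "artifacts" / "model_0"))
        print(dst, sorted(p.name for p in dst.iterdir()))
```

Refusing an existing target is a design choice made for this sketch; a real implementation might instead overwrite or version the destination.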