nirs4all.controllers.models.autogluon_model module

AutoGluon Model Controller - Controller for AutoGluon TabularPredictor

This controller handles AutoGluon TabularPredictor with support for:
  • Automatic model selection and ensembling

  • Training on tabular data (samples x features)

  • Model persistence and prediction storage

  • Integration with the nirs4all pipeline

AutoGluon differs from sklearn models in that:
  • It trains an ensemble of models automatically

  • It uses DataFrames internally, not numpy arrays

  • It manages its own model directory for persistence

  • It has its own hyperparameter tuning (no need for Optuna)

Lazy loading pattern: AutoGluon is only imported when actually needed for training or prediction, not at module import time.
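The lazy loading pattern described above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the helper names `lazy_import` and `get_tabular_predictor_class` are hypothetical, and the only assumption about AutoGluon is that `TabularPredictor` lives in the `autogluon.tabular` package.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_import(module_name: str) -> ModuleType:
    """Import a module on first use and reuse it on later calls.

    Hypothetical helper: nothing is imported until this function runs,
    so importing the enclosing module stays cheap.
    """
    return importlib.import_module(module_name)


def get_tabular_predictor_class():
    """Resolve TabularPredictor only when training or prediction needs it."""
    try:
        return lazy_import("autogluon.tabular").TabularPredictor
    except ImportError as exc:
        # Fail with an actionable message instead of at module import time.
        raise ImportError(
            "AutoGluon is required for this step; install it with "
            "'pip install autogluon'."
        ) from exc
```

With this shape, a pipeline that never reaches an AutoGluon step does not pay the import cost, and a missing installation surfaces only when the step actually runs.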

class nirs4all.controllers.models.autogluon_model.AutoGluonModelController

Bases: BaseModelController

Controller for AutoGluon TabularPredictor.

This controller handles AutoGluon models with automatic model selection, ensembling, and integration with the nirs4all pipeline.

AutoGluon automatically:
  • Trains multiple models (LightGBM, CatBoost, XGBoost, Neural Networks, etc.)

  • Performs cross-validation

  • Creates weighted ensembles

  • Handles hyperparameter tuning internally

Uses lazy loading - AutoGluon is only imported when training starts.

priority

Controller priority (5). The value is lower than sklearn's (6), which gives AutoGluon precedence over the sklearn controller when it is explicitly requested.

Type:

int
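The priority note above implies that a lower number takes precedence (5 is preferred over 6). A minimal sketch of such a selection rule is shown below. All class names, the `framework` key, and `select_controller` are hypothetical stand-ins for illustration; the real nirs4all dispatch logic and matching rules may differ.

```python
from typing import Any, Optional, Sequence, Type


class BaseController:
    """Hypothetical base class: a priority plus a matching predicate."""

    priority: int = 100

    @classmethod
    def matches(cls, step: Any) -> bool:
        return False


class SklearnLikeController(BaseController):
    priority = 6

    @classmethod
    def matches(cls, step: Any) -> bool:
        # Assumed rule: any step that declares a model.
        return isinstance(step, dict) and "model" in step


class AutoGluonLikeController(BaseController):
    priority = 5

    @classmethod
    def matches(cls, step: Any) -> bool:
        # Assumed rule: the step explicitly asks for AutoGluon.
        return isinstance(step, dict) and step.get("framework") == "autogluon"


def select_controller(
    step: Any, controllers: Sequence[Type[BaseController]]
) -> Optional[Type[BaseController]]:
    """Return the matching controller with the lowest priority value."""
    candidates = [c for c in controllers if c.matches(step)]
    return min(candidates, key=lambda c: c.priority) if candidates else None
```

When both controllers match the same step, the one with priority 5 is chosen; a step that only the sklearn-like controller matches still falls through to it.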

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List[ArtifactMeta]]

Execute AutoGluon model controller.

Main entry point for AutoGluon model execution in the pipeline.

Parameters:
  • step_info (ParsedStep) – Parsed step containing model configuration.

  • dataset (SpectroDataset) – Dataset containing features and targets.

  • context (ExecutionContext) – Pipeline execution context.

  • runtime_context (RuntimeContext) – Runtime context.

  • source (int) – Source index. Defaults to -1.

  • mode (str) – Execution mode. Defaults to 'train'.

  • loaded_binaries (List[Tuple[str, bytes]] | None) – Pre-loaded model binaries for prediction. Defaults to None.

  • prediction_store (Any | None) – Store for managing predictions. Defaults to None.

Returns:

Updated context and list of model binaries.

Return type:

Tuple[ExecutionContext, List[ArtifactMeta]]

get_preferred_layout() → str

Return the preferred data layout for AutoGluon.

Returns:

Data layout preference, '2d' for AutoGluon.

Return type:

str

load_model(filepath: str) → Any

Load AutoGluon model from disk.

Parameters:

filepath (str) – Path to the saved model directory.

Returns:

Loaded AutoGluon predictor.

Return type:

TabularPredictor

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match AutoGluon TabularPredictor configurations.

Parameters:
  • step (Any) – Pipeline step to check.

  • operator (Any) – Optional operator object.

  • keyword (str) – Pipeline keyword (unused).

Returns:

True if the step matches an AutoGluon configuration.

Return type:

bool

priority: int = 5

save_model(model: Any, filepath: str) → None

Save AutoGluon model to disk.

AutoGluon models are saved as directories. This method moves the model’s directory to the specified filepath.

Parameters:
  • model (TabularPredictor) – Trained AutoGluon predictor.

  • filepath (str) – Target path for saving.
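Because an AutoGluon model is a directory rather than a single file, saving amounts to relocating that directory. The sketch below illustrates the idea with the standard library only. `move_model_dir` is a hypothetical helper, not the method's actual implementation, and it assumes the caller already knows the predictor's current directory (in AutoGluon this is typically exposed as the predictor's `path` attribute).

```python
import shutil
from pathlib import Path


def move_model_dir(model_dir: str, filepath: str) -> Path:
    """Move a model directory to `filepath` and return the new location.

    Hypothetical helper: refuses to overwrite an existing target so that a
    previously saved model is never silently replaced.
    """
    src = Path(model_dir)
    dst = Path(filepath)
    if not src.is_dir():
        raise FileNotFoundError(f"Model directory not found: {src}")
    if dst.exists():
        raise FileExistsError(f"Target path already exists: {dst}")
    # Create missing parent folders, then relocate the whole directory tree.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

A directory moved this way can later be passed as `filepath` to `load_model`, which expects the path to the saved model directory.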