nirs4all.controllers.models.sklearn_model module

Sklearn Model Controller - Controller for scikit-learn models

This controller handles sklearn models with support for:

- Training on 2D data (samples x features)
- Cross-validation and hyperparameter tuning with Optuna
- Model persistence and prediction storage
- Integration with the nirs4all pipeline

Matches any sklearn model object (estimators with fit/predict methods).

class nirs4all.controllers.models.sklearn_model.SklearnModelController

Bases: BaseModelController

Controller for scikit-learn models.

This controller handles sklearn models with support for training on 2D data, cross-validation, hyperparameter tuning with Optuna, model persistence, and integration with the nirs4all pipeline.

priority

Controller priority (6). It is higher than the priority used for TransformerMixin objects, so supervised models take precedence over transformers.

Type: int

execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List[Tuple[str, bytes]]]

Execute sklearn model controller with score management.

Main entry point for sklearn model execution in the pipeline. Sets the preferred data layout to ‘2d’ and delegates to the parent execute method.

Parameters:

step_info (ParsedStep) – Parsed step containing model configuration and operator.
dataset (SpectroDataset) – Dataset containing features and targets.
context (ExecutionContext) – Pipeline execution context with state info.
runtime_context (RuntimeContext) – Runtime context managing execution state.
source (int) – Source index for multi-source pipelines. Defaults to -1.
mode (str) – Execution mode (‘train’ or ‘predict’). Defaults to ‘train’.
loaded_binaries (Optional[List[Tuple[str, bytes]]]) – Pre-loaded model binaries for prediction mode. Defaults to None.
prediction_store (Optional[Any]) – Store for managing predictions. Defaults to None.

Returns:

Updated context and list of model binaries (name, serialized_model) for persistence.

Return type:

Tuple[ExecutionContext, List[Tuple[str, bytes]]]
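The (name, serialized_model) pairs returned here, and accepted again through loaded_binaries in ‘predict’ mode, can be pictured as a simple round trip. The sketch below is illustrative only: MeanRegressor, to_binaries, and from_binaries are hypothetical names, and JSON is an assumed serialization format chosen to keep the example dependency-free. The format nirs4all actually uses is not stated here.

```python
import json
from typing import Dict, List, Tuple


class MeanRegressor:
    """Hypothetical toy model with an sklearn-like fit/predict interface."""

    def __init__(self, mean: float = 0.0) -> None:
        self.mean = mean

    def fit(self, X, y) -> "MeanRegressor":
        # Learn a single parameter: the mean of the targets.
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X) -> List[float]:
        # Predict the learned mean for every sample.
        return [self.mean for _ in X]


def to_binaries(models: Dict[str, MeanRegressor]) -> List[Tuple[str, bytes]]:
    # Build one (name, serialized_model) pair per model.
    # JSON is an assumption made for this sketch.
    return [
        (name, json.dumps({"mean": model.mean}).encode("utf-8"))
        for name, model in models.items()
    ]


def from_binaries(binaries: List[Tuple[str, bytes]]) -> Dict[str, MeanRegressor]:
    # Inverse step: rebuild each model from its bytes, as a prediction run
    # might do with pre-loaded binaries.
    return {
        name: MeanRegressor(**json.loads(blob.decode("utf-8")))
        for name, blob in binaries
    }


model = MeanRegressor().fit([[0.1, 0.2], [0.3, 0.4]], [1.0, 3.0])
binaries = to_binaries({"mean_regressor": model})
restored = from_binaries(binaries)["mean_regressor"]
predictions = restored.predict([[0.5, 0.6]])
```

Keeping the name next to the bytes lets a later run look up the right model without deserializing every entry first.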

Note

Automatically sets context[‘layout’] = ‘2d’ for sklearn compatibility
Inherits full training, evaluation, and prediction logic from BaseModelController
Respects force_layout if specified in step configuration
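The layout rules in the note above can be sketched as a small resolution step. This is a minimal illustration under stated assumptions: resolve_layout is a hypothetical helper, the step configuration is assumed to be a plain dict with an optional force_layout key, and ‘3d’ is used only as an example of a forced value.

```python
from typing import Any, Dict


def resolve_layout(step_config: Dict[str, Any], preferred: str = "2d") -> str:
    # Hypothetical helper: use force_layout when the step configuration
    # provides one, otherwise fall back to the controller's preference.
    forced = step_config.get("force_layout")
    return forced if forced is not None else preferred


# Without force_layout, the sklearn preference ('2d') is applied.
context = {"layout": None}
context["layout"] = resolve_layout({"model": "some_estimator"})

# With force_layout, the configured value wins ('3d' is an assumed example).
forced_layout = resolve_layout({"model": "some_estimator", "force_layout": "3d"})
```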

get_preferred_layout() → str

Return the preferred data layout for sklearn models.

Returns:

Data layout preference, always ‘2d’ for sklearn models, which expect (n_samples, n_features) input format.

Return type:

str

classmethod matches(step: Any, operator: Any, keyword: str) → bool

Match sklearn estimators and model dictionaries with sklearn models.

Prioritizes supervised models (regressors and classifiers) over transformers by checking for predict methods and using sklearn’s is_regressor/is_classifier.

Parameters:

step (Any) – Pipeline step to check, can be a dict with ‘model’ key or BaseEstimator instance.
operator (Any) – Optional operator object to check if it’s a BaseEstimator.
keyword (str) – Pipeline keyword (unused in this implementation).

Returns:

True if the step matches a sklearn estimator (regressor, classifier, or has predict method), False otherwise.

Return type:

bool
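The matching rule described above can be approximated with duck typing. The sketch below is not the controller's implementation: it replaces the BaseEstimator and is_regressor/is_classifier checks with a plain test for callable fit and predict methods so that it runs without scikit-learn. ToyRegressor, ToyScaler, and the helper names are hypothetical.

```python
from typing import Any


class ToyRegressor:
    """Hypothetical supervised model: has fit and predict."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


class ToyScaler:
    """Hypothetical transformer: has fit and transform, but no predict."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def looks_like_model(obj: Any) -> bool:
    # Duck-typed stand-in for the sklearn checks: a supervised model
    # exposes both fit and predict.
    return callable(getattr(obj, "fit", None)) and callable(
        getattr(obj, "predict", None)
    )


def matches(step: Any, operator: Any, keyword: str) -> bool:
    # A step may be a dict carrying the model under the 'model' key,
    # or the estimator itself; keyword is unused, as in the description.
    candidate = step.get("model") if isinstance(step, dict) else step
    return looks_like_model(candidate) or looks_like_model(operator)


direct_match = matches(ToyRegressor(), None, "")
dict_match = matches({"model": ToyRegressor()}, None, "model")
operator_match = matches({}, ToyRegressor(), "")
transformer_match = matches(ToyScaler(), None, "")
```

Here the transformer is rejected because it lacks predict, which mirrors the stated goal of routing supervised models, rather than transformers, to this controller.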

priority: int = 6