nirs4all.controllers.models.sklearn_model module
Sklearn Model Controller - Controller for scikit-learn models
This controller handles sklearn models with support for:
- Training on 2D data (samples x features)
- Cross-validation and hyperparameter tuning with Optuna
- Model persistence and prediction storage
- Integration with the nirs4all pipeline
Matches any sklearn model object (estimators with fit/predict methods).
- class nirs4all.controllers.models.sklearn_model.SklearnModelController
Bases: BaseModelController
Controller for scikit-learn models.
This controller handles sklearn models with support for training on 2D data, cross-validation, hyperparameter tuning with Optuna, model persistence, and integration with the nirs4all pipeline.
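As a rough illustration of the cross-validated tuning mentioned above, the sketch below scores a few candidate hyperparameters with plain scikit-learn. It is not the controller's own code: the manual loop stands in for the Optuna search, and the synthetic data, the `alphas` grid, and the choice of `Ridge` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic 2D data (n_samples, n_features); assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

# Fixed folds so every candidate is scored on the same splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical candidate grid; a tuner such as Optuna would propose these values.
alphas = [0.01, 1.0, 100.0]
scores = {}
for alpha in alphas:
    fold_scores = cross_val_score(
        Ridge(alpha=alpha), X, y, cv=cv, scoring="neg_mean_squared_error"
    )
    scores[alpha] = float(fold_scores.mean())

# Higher (less negative) mean score is better.
best_alpha = max(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

Scoring every candidate on the same folds keeps the comparison fair; a tuner would replace only the loop that proposes `alpha`.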
- priority
Controller priority (6). It ranks higher than the TransformerMixin controller so that supervised models take precedence over transformers.
- Type:
int
- execute(step_info: ParsedStep, dataset: SpectroDataset, context: ExecutionContext, runtime_context: RuntimeContext, source: int = -1, mode: str = 'train', loaded_binaries: List[Tuple[str, bytes]] | None = None, prediction_store: Any | None = None) → Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Execute sklearn model controller with score management.
Main entry point for sklearn model execution in the pipeline. Sets the preferred data layout to ‘2d’ and delegates to parent execute method.
- Parameters:
step_info (ParsedStep) – Parsed step containing model configuration and operator.
dataset (SpectroDataset) – Dataset containing features and targets.
context (ExecutionContext) – Pipeline execution context with state info.
runtime_context (RuntimeContext) – Runtime context managing execution state.
source (int) – Source index for multi-source pipelines. Defaults to -1.
mode (str) – Execution mode (‘train’ or ‘predict’). Defaults to ‘train’.
loaded_binaries (Optional[List[Tuple[str, bytes]]]) – Pre-loaded model binaries for prediction mode. Defaults to None.
prediction_store (Optional[Any]) – Store for managing predictions. Defaults to None.
- Returns:
Updated context and list of model binaries (name, serialized_model) for persistence.
- Return type:
Tuple[ExecutionContext, List[Tuple[str, bytes]]]
Note
- Automatically sets context['layout'] = '2d' for sklearn compatibility.
- Inherits full training, evaluation, and prediction logic from BaseModelController.
- Respects force_layout if specified in the step configuration.
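The (name, serialized_model) pairs returned by execute can be pictured with the following minimal sketch. It assumes pickle as the serialization format, which this page does not confirm, and the helper names `train_and_serialize` and `predict_from_binaries` are hypothetical, not part of nirs4all.

```python
import pickle
from typing import List, Tuple

import numpy as np
from sklearn.linear_model import Ridge


def train_and_serialize(X: np.ndarray, y: np.ndarray) -> List[Tuple[str, bytes]]:
    """Fit a model and return it as a list of (name, bytes) pairs (hypothetical helper)."""
    model = Ridge(alpha=1.0).fit(X, y)
    # pickle is an assumption; the real controller may serialize differently.
    return [("ridge", pickle.dumps(model))]


def predict_from_binaries(binaries: List[Tuple[str, bytes]], X: np.ndarray) -> np.ndarray:
    """Rebuild the first stored model and predict, mimicking a 'predict' mode run."""
    _name, payload = binaries[0]
    model = pickle.loads(payload)
    return model.predict(X)


# Synthetic 2D data (n_samples, n_features), assumed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8)

binaries = train_and_serialize(X, y)            # what a 'train' run would hand back
predictions = predict_from_binaries(binaries, X)  # what a 'predict' run would reuse
print(binaries[0][0], predictions.shape)
```

Only unpickle binaries from a trusted source, since pickle can execute arbitrary code on load.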
- get_preferred_layout() → str
Return the preferred data layout for sklearn models.
- Returns:
Data layout preference, always '2d' for sklearn models, which expect input in (n_samples, n_features) format.
- Return type:
str
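To make the '2d' layout concrete, here is a small NumPy sketch that flattens higher-dimensional feature arrays into (n_samples, n_features). The 3D input shape (samples, processings, wavelengths) and the helper name `to_2d` are assumptions for illustration; the page does not describe how nirs4all stores other layouts.

```python
import numpy as np


def to_2d(features: np.ndarray) -> np.ndarray:
    """Return features as (n_samples, n_features), keeping the sample axis first."""
    if features.ndim == 2:
        return features  # already in the layout sklearn expects
    if features.ndim > 2:
        # Collapse every axis after the first into a single feature axis.
        return features.reshape(features.shape[0], -1)
    raise ValueError("expected an array with at least 2 dimensions")


# Hypothetical block: 5 samples, 3 preprocessing views, 10 wavelengths each.
block = np.arange(5 * 3 * 10, dtype=float).reshape(5, 3, 10)
flat = to_2d(block)
print(flat.shape)
```

Each sample's values stay in one row, so the flattened array can be passed directly to an estimator's fit or predict method.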
- classmethod matches(step: Any, operator: Any, keyword: str) → bool
Match sklearn estimators and model dictionaries with sklearn models.
Prioritizes supervised models (regressors and classifiers) over transformers by checking for predict methods and using sklearn’s is_regressor/is_classifier.
- Parameters:
step (Any) – Pipeline step to check, can be a dict with ‘model’ key or BaseEstimator instance.
operator (Any) – Optional operator object to check if it’s a BaseEstimator.
keyword (str) – Pipeline keyword (unused in this implementation).
- Returns:
True if the step matches a sklearn estimator (regressor, classifier, or has predict method), False otherwise.
- Return type:
bool
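The matching rule described above can be approximated as follows. This is a sketch based only on the description on this page, not the actual nirs4all implementation; the function name `looks_like_sklearn_model` is hypothetical, and the order of the checks is an assumption.

```python
from typing import Any

from sklearn.base import BaseEstimator, is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler


def looks_like_sklearn_model(step: Any, operator: Any = None) -> bool:
    """Hypothetical stand-in for the matching rule: True for supervised sklearn estimators."""
    # A dict step is assumed to carry the estimator under the 'model' key.
    candidate = step.get("model") if isinstance(step, dict) else step
    for obj in (candidate, operator):
        if not isinstance(obj, BaseEstimator):
            continue  # only query sklearn helpers on real estimators
        if is_regressor(obj) or is_classifier(obj) or hasattr(obj, "predict"):
            return True
    return False


print(looks_like_sklearn_model(LinearRegression()))               # regressor instance
print(looks_like_sklearn_model({"model": LogisticRegression()}))  # dict with 'model' key
print(looks_like_sklearn_model(StandardScaler()))                 # transformer, no predict
```

Checking `isinstance(obj, BaseEstimator)` first keeps arbitrary objects, such as strings or other pipeline keywords, away from `is_regressor` and `is_classifier`.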