nirs4all.controllers.models.components package

Module contents

Model Controller Components - Modular components for base_model refactoring

This package contains focused, testable components that replace the monolithic logic in the original launch_training() method.

Components:

identifier_generator: Generate model identifiers and names
prediction_transformer: Handle scaling/unscaling of predictions
prediction_assembler: Assemble prediction data for storage
score_calculator: Calculate evaluation scores
index_normalizer: Normalize and validate sample indices

class nirs4all.controllers.models.components.IndexNormalizer

Bases: object

Normalizes sample indices to consistent format.

Converts numpy integer types to Python int and, when requested, validates that the indices fall within bounds.

Example

>>> import numpy as np
>>> normalizer = IndexNormalizer()
>>> indices = normalizer.normalize([np.int64(0), np.int64(1), np.int64(2)], n_samples=3)
>>> indices
[0, 1, 2]

normalize(indices: List | ndarray | None, n_samples: int, default_range: bool = True, validate: bool = False) → List[int]

Normalize indices to Python int list.

Parameters:

indices – Input indices (may be None, list, or numpy array)
n_samples – Total number of samples (for validation and defaults)
default_range – If True and indices is None, return range(n_samples)
validate – If True, validate indices are within bounds

Returns:

List of Python integers

Raises:

ValueError – If validate=True and indices are out of bounds

normalize_batch(indices_dict: dict, n_samples_dict: dict) → dict

Normalize a dictionary of indices for multiple partitions.

Parameters:

indices_dict – Dictionary with keys like ‘train’, ‘val’, ‘test’ and values as index lists/arrays
n_samples_dict – Dictionary with same keys and values as sample counts

Returns:

Dictionary with same keys but normalized indices
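
A per-partition version of the same idea might look like the sketch below. The function is a hypothetical stand-in, and the fallback to range(n_samples) for a missing entry is an assumption carried over from the default_range parameter of normalize().

```python
import operator
from typing import Dict, List, Optional, Sequence


def normalize_batch(
    indices_dict: Dict[str, Optional[Sequence]],
    n_samples_dict: Dict[str, int],
) -> Dict[str, List[int]]:
    """Hypothetical stand-in for IndexNormalizer.normalize_batch."""
    normalized = {}
    for partition, indices in indices_dict.items():
        if indices is None:
            # Assumption: a missing entry falls back to the full range
            # of that partition.
            normalized[partition] = list(range(n_samples_dict[partition]))
        else:
            normalized[partition] = [operator.index(i) for i in indices]
    return normalized


batch = normalize_batch(
    {"train": [0, 1, 2], "val": None},
    {"train": 3, "val": 2},
)
print(batch)  # {'train': [0, 1, 2], 'val': [0, 1]}
```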

class nirs4all.controllers.models.components.ModelIdentifierGenerator(helper=None)

Bases: object

Generates consistent model identifiers for training and persistence.

This component extracts and centralizes all the naming logic that was previously scattered in launch_training().

Example

>>> generator = ModelIdentifierGenerator()
>>> identifiers = generator.generate(
...     model_config={'name': 'MyPLS', 'class': 'sklearn.cross_decomposition.PLSRegression'},
...     runner=runner,
...     context={'step_id': 5},
...     fold_idx=0
... )
>>> identifiers.model_id
'MyPLS_10'
>>> identifiers.display_name
'MyPLS_10_fold0'

extract_classname_from_config(model_config: Dict[str, Any]) → str

Extract classname from model configuration.

The class name is taken from the model declared in the config, from instance.__class__.__name__, or from the function name.

Parameters:

model_config – Model configuration dictionary.

Returns:

Class name of the model.

Return type:

str

extract_core_name(model_config: Dict[str, Any]) → str

Extract core name from model configuration.

The core name is the base name provided by the user or, failing that, the name derived from the class.

Parameters:

model_config – Model configuration dictionary.

Returns:

Core name extracted from config.

Return type:

str
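
The two extraction methods can be pictured with the sketch below. It is an illustration under assumptions, not the library's code: the 'name' and 'class' keys come from the example above, while the 'model_instance' key and the 'UnknownModel' fallback are hypothetical.

```python
from typing import Any, Dict


def extract_classname(model_config: Dict[str, Any]) -> str:
    """Hypothetical sketch: derive a class name from a model config."""
    declared = model_config.get("class")
    if isinstance(declared, str):
        # Dotted path such as 'sklearn.cross_decomposition.PLSRegression'.
        return declared.rsplit(".", 1)[-1]
    obj = model_config.get("model_instance")  # hypothetical key
    if obj is None:
        return "UnknownModel"  # hypothetical fallback
    if hasattr(obj, "__name__"):
        # Functions and classes expose their own name.
        return obj.__name__
    return obj.__class__.__name__


def extract_core_name(model_config: Dict[str, Any]) -> str:
    """Hypothetical sketch: user-provided name first, class name otherwise."""
    return model_config.get("name") or extract_classname(model_config)


config = {"name": "MyPLS", "class": "sklearn.cross_decomposition.PLSRegression"}
print(extract_classname(config))  # PLSRegression
print(extract_core_name(config))  # MyPLS
```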

generate(model_config: Dict[str, Any], runner: PipelineRunner, context: ExecutionContext, fold_idx: int | None = None) → ModelIdentifiers

Generate all model identifiers from configuration and context.

Parameters:

model_config – Model configuration dictionary
runner – Pipeline runner for operation counter
context – Execution context with step_number
fold_idx – Optional fold index for cross-validation

Returns:

Container with all generated identifiers

Return type:

ModelIdentifiers
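
As a rough picture of how the identifiers relate to one another, the sketch below rebuilds the values from the class example ('MyPLS_10', 'MyPLS_10_fold0'). The composition rules are inferred from that example only. The IdentifiersSketch container and build_identifiers function are hypothetical, and the operation counter and step id are passed in directly instead of being read from a runner or context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentifiersSketch:
    """Hypothetical mirror of the documented ModelIdentifiers fields."""
    classname: str
    name: str
    model_id: str
    display_name: str
    operation_counter: int
    step_id: int
    fold_idx: Optional[int]


def build_identifiers(
    name: str,
    classname: str,
    operation_counter: int,
    step_id: int,
    fold_idx: Optional[int] = None,
) -> IdentifiersSketch:
    # Assumption inferred from the example: model_id is name + counter,
    # and the display name adds a fold suffix when a fold is given.
    model_id = f"{name}_{operation_counter}"
    display_name = model_id if fold_idx is None else f"{model_id}_fold{fold_idx}"
    return IdentifiersSketch(
        classname, name, model_id, display_name, operation_counter, step_id, fold_idx
    )


ids = build_identifiers("MyPLS", "PLSRegression", operation_counter=10, step_id=5, fold_idx=0)
print(ids.model_id, ids.display_name)  # MyPLS_10 MyPLS_10_fold0
```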

generate_binary_key(model_id: str, fold_idx: int | None = None) → str

Generate the binary storage key for a model.

Parameters:

model_id – Base model identifier (e.g., “MyModel_10”)
fold_idx – Optional fold index

Returns:

Binary key string (e.g., “MyModel_10” or “MyModel_10_fold0”)
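
The two example keys above suggest a simple suffix rule. A minimal sketch, assuming that rule (the function name binary_key is hypothetical):

```python
from typing import Optional


def binary_key(model_id: str, fold_idx: Optional[int] = None) -> str:
    """Hypothetical sketch of the key format shown in the examples."""
    # Compare against None explicitly: fold 0 is a valid fold index.
    if fold_idx is None:
        return model_id
    return f"{model_id}_fold{fold_idx}"


print(binary_key("MyModel_10"))     # MyModel_10
print(binary_key("MyModel_10", 0))  # MyModel_10_fold0
```

The explicit None check matters because a truthiness test would silently drop the suffix for the first fold.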

class nirs4all.controllers.models.components.ModelIdentifiers(classname: str, name: str, model_id: str, display_name: str, operation_counter: int, step_id: int, fold_idx: int | None)

Bases: object

Container for all model identifiers.

classname: str

display_name: str

fold_idx: int | None

model_id: str

name: str

operation_counter: int

step_id: int

class nirs4all.controllers.models.components.PartitionPrediction(partition: str, indices: List[int], y_true: ndarray, y_pred: ndarray, score: float)

Bases: object

Single partition prediction data.

indices: List[int]

partition: str

score: float

y_pred: ndarray

y_true: ndarray

class nirs4all.controllers.models.components.PartitionScores(train: float, val: float, test: float, metric: str, higher_is_better: bool, detailed_scores: Dict[str, float] | None = None)

Bases: object

Scores for the train, val, and test partitions, together with the metric they were computed with.

detailed_scores: Dict[str, float] | None = None

higher_is_better: bool

metric: str

test: float

train: float

val: float

class nirs4all.controllers.models.components.PredictionDataAssembler

Bases: object

Assembles prediction data for storage.

Creates structured prediction records with all metadata required for storage in the prediction database.

Example

>>> assembler = PredictionDataAssembler()
>>> record = assembler.assemble(
...     dataset=dataset,
...     identifiers=identifiers,
...     scores={'train': 0.95, 'val': 0.90, 'test': 0.88},
...     predictions={'train': y_train_pred, 'val': y_val_pred, 'test': y_test_pred},
...     true_values={'train': y_train, 'val': y_val, 'test': y_test},
...     indices={'train': train_idx, 'val': val_idx, 'test': test_idx},
...     runner=runner,
...     X_shape=X_train.shape,
...     best_params=params
... )

assemble(dataset: Any, identifiers: Any, scores: dict, predictions: dict, true_values: dict, indices: dict, runner: Any, X_shape: Tuple[int, ...], best_params: dict | None = None, context: Any = None) → dict

Assemble complete prediction record.

Parameters:

dataset – SpectroDataset instance
identifiers – ModelIdentifiers with name, id, etc.
scores – Dictionary of scores per partition
predictions – Dictionary of prediction arrays per partition (unscaled)
true_values – Dictionary of true value arrays per partition (unscaled)
indices – Dictionary of sample indices per partition
runner – PipelineRunner instance
X_shape – Shape of input data (for n_features)
best_params – Optional hyperparameters from optimization
context – Optional ExecutionContext for branch information

Returns:

Dictionary ready for storage in prediction database

assemble_fold_average(base_prediction: dict, averaged_predictions: dict, averaged_scores: dict, is_weighted: bool = False) → dict

Assemble prediction record for fold-averaged model.

Parameters:

base_prediction – Base prediction record from a single fold (for metadata)
averaged_predictions – Dictionary of averaged prediction arrays
averaged_scores – Dictionary of averaged scores
is_weighted – Whether averaging was weighted by scores

Returns:

Dictionary ready for storage as fold-averaged prediction
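
This method receives predictions that have already been averaged. For orientation, the sketch below shows one way such an average could be computed across folds, with and without weights. It is an assumption-laden illustration: average_folds is a hypothetical helper, plain lists stand in for arrays, and how weights would be derived from fold scores is not specified in this documentation.

```python
from typing import List, Optional, Sequence


def average_folds(
    fold_predictions: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Average per-sample predictions across folds (hypothetical helper)."""
    n_folds = len(fold_predictions)
    if weights is None:
        # Unweighted case: every fold contributes equally.
        weights = [1.0] * n_folds
    total = sum(weights)
    norm = [w / total for w in weights]  # weights now sum to 1
    n_samples = len(fold_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(norm, fold_predictions))
        for i in range(n_samples)
    ]


folds = [[1.0, 2.0], [3.0, 4.0]]
print(average_folds(folds))                      # [2.0, 3.0]
print(average_folds(folds, weights=[1.0, 3.0]))  # [2.5, 3.5]
```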

class nirs4all.controllers.models.components.PredictionRecord(metadata: dict, partitions: List[Tuple[str, List[int], ndarray, ndarray]])

Bases: object

Complete prediction record for storage.

metadata: dict

partitions: List[Tuple[str, List[int], ndarray, ndarray]]

class nirs4all.controllers.models.components.PredictionTransformer

Bases: object

Transforms predictions between scaled and unscaled spaces.

Handles:

Classification tasks: Keep predictions in transformed space
Regression tasks: Transform predictions back to numeric space
Respects current y_processing from context

Example

>>> transformer = PredictionTransformer()
>>> y_pred_unscaled = transformer.transform_to_unscaled(
...     y_pred_scaled,
...     dataset,
...     context
... )

transform_batch_to_unscaled(predictions_dict: dict, dataset: SpectroDataset, context: ExecutionContext | None = None) → dict

Transform a dictionary of predictions to unscaled space.

Parameters:

predictions_dict – Dictionary with keys like ‘train’, ‘val’, ‘test’ and values as prediction arrays
dataset – Dataset with transformation info
context – Execution context

Returns:

Dictionary with same keys but unscaled predictions

transform_to_unscaled(predictions_scaled: ndarray, dataset: SpectroDataset, context: ExecutionContext | None = None) → ndarray

Transform predictions from scaled/processed space to unscaled/numeric space.

Parameters:

predictions_scaled – Predictions in scaled/processed space
dataset – Dataset with task type and target transformation info
context – Execution context with y processing info

Returns:

Predictions in unscaled/numeric space
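
To make the regression versus classification distinction concrete, here is a minimal sketch that assumes the target processing was a standard scaling with a known mean and standard deviation. That assumption is for illustration only: the actual y processing is read from the dataset and context and may be any transformation. The to_unscaled function and its parameters are hypothetical.

```python
from typing import List, Sequence


def to_unscaled(
    predictions_scaled: Sequence[float],
    task_type: str,
    mean: float,
    std: float,
) -> List[float]:
    """Hypothetical sketch: undo standard scaling for regression only."""
    if task_type != "regression":
        # Classification predictions stay in the transformed space.
        return list(predictions_scaled)
    # Inverse of z = (y - mean) / std.
    return [p * std + mean for p in predictions_scaled]


print(to_unscaled([-1.0, 0.0, 1.0], "regression", mean=10.0, std=2.0))  # [8.0, 10.0, 12.0]
print(to_unscaled([0, 1, 1], "classification", mean=10.0, std=2.0))     # [0, 1, 1]
```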

class nirs4all.controllers.models.components.ScoreCalculator

Bases: object

Calculates evaluation scores for models.

Uses ModelUtils to select appropriate metrics based on task type, and Evaluator to compute scores.

Example

>>> calculator = ScoreCalculator()
>>> scores = calculator.calculate(
...     y_true={'train': y_train, 'val': y_val, 'test': y_test},
...     y_pred={'train': y_train_pred, 'val': y_val_pred, 'test': y_test_pred},
...     task_type='regression'
... )
>>> scores.test
0.88

calculate(y_true: Dict[str, ndarray], y_pred: Dict[str, ndarray], task_type: str) → PartitionScores

Calculate scores for all partitions.

Parameters:

y_true – Dictionary of true values per partition
y_pred – Dictionary of predictions per partition
task_type – Task type string (e.g., ‘regression’, ‘classification’)

Returns:

PartitionScores with scores for train, val, test

calculate_single(y_true: ndarray, y_pred: ndarray, task_type: str, metric: str | None = None) → float

Calculate score for a single partition.

Parameters:

y_true – True values
y_pred – Predictions
task_type – Task type string
metric – Optional metric name (if None, uses best metric for task)

Returns:

Score value
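
The metric fallback can be illustrated with a small pure-Python sketch. The defaults chosen here (R2 for regression, accuracy otherwise) and the supported metric names are assumptions for the example; the documentation only states that the best metric for the task is used. The function score_single is hypothetical and does not call ModelUtils or Evaluator.

```python
import math
from typing import Optional, Sequence


def score_single(
    y_true: Sequence,
    y_pred: Sequence,
    task_type: str,
    metric: Optional[str] = None,
) -> float:
    """Hypothetical sketch of scoring one partition."""
    if metric is None:
        # Assumed defaults, not taken from the library.
        metric = "r2" if task_type == "regression" else "accuracy"
    if metric == "accuracy":
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    if metric == "rmse":
        return math.sqrt(residual / len(y_true))
    if metric == "r2":
        mean = sum(y_true) / len(y_true)
        total = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - residual / total
    raise ValueError(f"Unsupported metric: {metric}")


print(score_single([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], "regression"))          # 1.0
print(score_single([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], "regression", "rmse"))  # 1.0
```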

format_scores(scores: PartitionScores) → str

Format scores as a readable string.

Parameters:

scores – PartitionScores instance

Returns:

Formatted string like “Train: 0.95 | Val: 0.90 | Test: 0.88 (R2)”

Return type:

str
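
A string of the form "Train: 0.95 | Val: 0.90 | Test: 0.88 (R2)" could be produced as in the sketch below. The two-decimal precision and the upper-casing of the metric name are assumptions read off that one sample string, and ScoresSketch is a hypothetical reduced stand-in for PartitionScores.

```python
from dataclasses import dataclass


@dataclass
class ScoresSketch:
    """Hypothetical subset of the PartitionScores fields."""
    train: float
    val: float
    test: float
    metric: str


def format_scores(scores: ScoresSketch) -> str:
    # Assumptions: two decimals and an upper-cased metric name.
    return (
        f"Train: {scores.train:.2f} | Val: {scores.val:.2f} | "
        f"Test: {scores.test:.2f} ({scores.metric.upper()})"
    )


print(format_scores(ScoresSketch(0.95, 0.90, 0.88, "r2")))
# Train: 0.95 | Val: 0.90 | Test: 0.88 (R2)
```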