nirs4all.controllers.models.components package

Module contents

Model Controller Components - Modular components for base_model refactoring

This package contains focused, testable components that replace the monolithic logic in the original launch_training() method.

Components:
  • identifier_generator: Generate model identifiers and names

  • prediction_transformer: Handle scaling/unscaling of predictions

  • prediction_assembler: Assemble prediction data for storage

  • score_calculator: Calculate evaluation scores

  • index_normalizer: Normalize and validate sample indices

class nirs4all.controllers.models.components.IndexNormalizer

Bases: object

Normalizes sample indices to consistent format.

Converts numpy int types to Python int and validates indices are within valid ranges.

Example

>>> normalizer = IndexNormalizer()
>>> indices = normalizer.normalize([np.int64(0), np.int64(1), np.int64(2)], n_samples=3)
>>> indices
[0, 1, 2]

normalize(indices: List | ndarray | None, n_samples: int, default_range: bool = True, validate: bool = False) → List[int]

Normalize indices to Python int list.

Parameters:
  • indices – Input indices (may be None, list, or numpy array)

  • n_samples – Total number of samples (for validation and defaults)

  • default_range – If True and indices is None, return range(n_samples)

  • validate – If True, validate indices are within bounds

Returns:

List of Python integers

Raises:

ValueError – If validate=True and indices are out of bounds
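The behaviour described by these parameters can be sketched as a standalone function. This is a minimal sketch, not the library's implementation: `normalize_indices` is a hypothetical name, and two details are assumptions, namely that `default_range=False` with `indices=None` yields an empty list and that negative indices count as out of bounds.

```python
from operator import index as to_int
from typing import Iterable, List, Optional


def normalize_indices(
    indices: Optional[Iterable],
    n_samples: int,
    default_range: bool = True,
    validate: bool = False,
) -> List[int]:
    """Hypothetical stand-in mirroring the documented normalize() parameters."""
    if indices is None:
        # Assumption: without a default range, a missing input gives an empty list.
        return list(range(n_samples)) if default_range else []
    # operator.index accepts any integer-like object (Python int, numpy integer
    # scalars) and rejects floats instead of silently truncating them.
    result = [to_int(i) for i in indices]
    if validate:
        # Assumption: valid indices lie in [0, n_samples).
        bad = [i for i in result if not 0 <= i < n_samples]
        if bad:
            raise ValueError(f"Indices out of bounds for n_samples={n_samples}: {bad}")
    return result


print(normalize_indices(None, n_samples=3))                    # [0, 1, 2]
print(normalize_indices([2, 0], n_samples=3, validate=True))   # [2, 0]
```

Because numpy integer scalars implement `__index__`, the same function would accept a numpy array of integer indices without importing numpy itself.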

normalize_batch(indices_dict: dict, n_samples_dict: dict) → dict

Normalize a dictionary of indices for multiple partitions.

Parameters:
  • indices_dict – Dictionary with keys like ‘train’, ‘val’, ‘test’ and values as index lists/arrays

  • n_samples_dict – Dictionary with same keys and values as sample counts

Returns:

Dictionary with same keys but normalized indices
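A per-partition version might look like the following. It is a hedged sketch under stated assumptions: `normalize_batch_sketch` is a hypothetical name, a `None` entry is assumed to fall back to the full range for that partition, and a partition missing from `n_samples_dict` is assumed to be an error.

```python
from typing import Dict, List, Optional, Sequence


def normalize_batch_sketch(
    indices_dict: Dict[str, Optional[Sequence]],
    n_samples_dict: Dict[str, int],
) -> Dict[str, List[int]]:
    """Hypothetical sketch: normalize each partition's indices independently."""
    normalized: Dict[str, List[int]] = {}
    for partition, indices in indices_dict.items():
        n_samples = n_samples_dict[partition]  # KeyError if the two dicts disagree
        if indices is None:
            # Assumption: a missing entry means "all samples of this partition".
            normalized[partition] = list(range(n_samples))
        else:
            normalized[partition] = [int(i) for i in indices]
    return normalized


batch = normalize_batch_sketch(
    {"train": [0, 2], "val": None},
    {"train": 3, "val": 2},
)
print(batch)  # {'train': [0, 2], 'val': [0, 1]}
```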

class nirs4all.controllers.models.components.ModelIdentifierGenerator(helper=None)

Bases: object

Generates consistent model identifiers for training and persistence.

This component extracts and centralizes all the naming logic that was previously scattered in launch_training().

Example

>>> generator = ModelIdentifierGenerator()
>>> identifiers = generator.generate(
...     model_config={'name': 'MyPLS', 'class': 'sklearn.cross_decomposition.PLSRegression'},
...     runner=runner,
...     context={'step_id': 5},
...     fold_idx=0
... )
>>> identifiers.model_id
'MyPLS_10'
>>> identifiers.display_name
'MyPLS_10_fold0'

extract_classname_from_config(model_config: Dict[str, Any]) → str

Extract classname from model configuration.

The class name is taken from the model declared in the config, from instance.__class__.__name__ for a model instance, or from the function name for a callable.

Parameters:

model_config – Model configuration dictionary.

Returns:

Class name of the model.

Return type:

str

extract_core_name(model_config: Dict[str, Any]) → str

Extract core name from model configuration.

The core name is the user-provided name if one is given; otherwise it is derived from the model class.

Parameters:

model_config – Model configuration dictionary.

Returns:

Core name extracted from config.

Return type:

str

generate(model_config: Dict[str, Any], runner: PipelineRunner, context: ExecutionContext, fold_idx: int | None = None) → ModelIdentifiers

Generate all model identifiers from configuration and context.

Parameters:
  • model_config – Model configuration dictionary

  • runner – Pipeline runner for operation counter

  • context – Execution context with step_number

  • fold_idx – Optional fold index for cross-validation

Returns:

Container with all generated identifiers

Return type:

ModelIdentifiers

generate_binary_key(model_id: str, fold_idx: int | None = None) → str

Generate the binary storage key for a model.

Parameters:
  • model_id – Base model identifier (e.g., “MyModel_10”)

  • fold_idx – Optional fold index

Returns:

Binary key string (e.g., “MyModel_10” or “MyModel_10_fold0”)
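The two example keys above suggest a simple naming rule, sketched here. `binary_key` is a hypothetical helper written from those examples alone, not the library's code.

```python
from typing import Optional


def binary_key(model_id: str, fold_idx: Optional[int] = None) -> str:
    """Hypothetical sketch of the key format shown in the examples above."""
    # Compare against None explicitly: fold 0 is a valid fold and must not
    # be treated as "no fold".
    if fold_idx is None:
        return model_id
    return f"{model_id}_fold{fold_idx}"


print(binary_key("MyModel_10"))     # MyModel_10
print(binary_key("MyModel_10", 0))  # MyModel_10_fold0
```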

class nirs4all.controllers.models.components.ModelIdentifiers(classname: str, name: str, model_id: str, display_name: str, operation_counter: int, step_id: int, fold_idx: int | None)

Bases: object

Container for all model identifiers.

classname: str
display_name: str
fold_idx: int | None
model_id: str
name: str
operation_counter: int
step_id: int
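For orientation, a container with these fields could be modelled as a dataclass. Whether the real class is a dataclass is an assumption, `ModelIdentifiersSketch` is a hypothetical stand-in, and the values reuse the `generate()` example above, with `operation_counter=10` inferred from the `'MyPLS_10'` identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelIdentifiersSketch:
    """Hypothetical stand-in with the same fields as ModelIdentifiers."""
    classname: str
    name: str
    model_id: str
    display_name: str
    operation_counter: int
    step_id: int
    fold_idx: Optional[int]


ids = ModelIdentifiersSketch(
    classname="PLSRegression",
    name="MyPLS",
    model_id="MyPLS_10",
    display_name="MyPLS_10_fold0",
    operation_counter=10,  # assumption: inferred from the "_10" suffix
    step_id=5,
    fold_idx=0,
)
print(ids.display_name)  # MyPLS_10_fold0
```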
class nirs4all.controllers.models.components.PartitionPrediction(partition: str, indices: List[int], y_true: ndarray, y_pred: ndarray, score: float)

Bases: object

Single partition prediction data.

indices: List[int]
partition: str
score: float
y_pred: ndarray
y_true: ndarray

class nirs4all.controllers.models.components.PartitionScores(train: float, val: float, test: float, metric: str, higher_is_better: bool, detailed_scores: Dict[str, float] | None = None)

Bases: object

Scores for the train, val, and test partitions, together with the metric used to compute them.

detailed_scores: Dict[str, float] | None = None
higher_is_better: bool
metric: str
test: float
train: float
val: float

class nirs4all.controllers.models.components.PredictionDataAssembler

Bases: object

Assembles prediction data for storage.

Creates structured prediction records with all metadata required for storage in the prediction database.

Example

>>> assembler = PredictionDataAssembler()
>>> record = assembler.assemble(
...     dataset=dataset,
...     identifiers=identifiers,
...     scores={'train': 0.95, 'val': 0.90, 'test': 0.88},
...     predictions={'train': y_train_pred, 'val': y_val_pred, 'test': y_test_pred},
...     true_values={'train': y_train, 'val': y_val, 'test': y_test},
...     indices={'train': train_idx, 'val': val_idx, 'test': test_idx},
...     runner=runner,
...     X_shape=X_train.shape,
...     best_params=params
... )

assemble(dataset: Any, identifiers: Any, scores: dict, predictions: dict, true_values: dict, indices: dict, runner: Any, X_shape: Tuple[int, ...], best_params: dict | None = None, context: Any = None) → dict

Assemble complete prediction record.

Parameters:
  • dataset – SpectroDataset instance

  • identifiers – ModelIdentifiers with name, id, etc.

  • scores – Dictionary of scores per partition

  • predictions – Dictionary of prediction arrays per partition (unscaled)

  • true_values – Dictionary of true value arrays per partition (unscaled)

  • indices – Dictionary of sample indices per partition

  • runner – PipelineRunner instance

  • X_shape – Shape of input data (for n_features)

  • best_params – Optional hyperparameters from optimization

  • context – Optional ExecutionContext for branch information

Returns:

Dictionary ready for storage in prediction database

assemble_fold_average(base_prediction: dict, averaged_predictions: dict, averaged_scores: dict, is_weighted: bool = False) → dict

Assemble prediction record for fold-averaged model.

Parameters:
  • base_prediction – Base prediction record from a single fold (for metadata)

  • averaged_predictions – Dictionary of averaged prediction arrays

  • averaged_scores – Dictionary of averaged scores

  • is_weighted – Whether averaging was weighted by scores

Returns:

Dictionary ready for storage as fold-averaged prediction
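This method receives predictions that are already averaged. As background, the sketch below shows one way such an average could be produced before the call, either plain or weighted. `average_folds` is a hypothetical helper, plain Python lists stand in for arrays, and how the library derives weights from scores is not specified here, so the weights are taken as given.

```python
from typing import List, Optional, Sequence


def average_folds(
    fold_preds: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Hypothetical sketch: average per-fold predictions for the same samples."""
    if weights is None:
        # Unweighted case: every fold contributes equally.
        weights = [1.0] * len(fold_preds)
    total = sum(weights)
    n_samples = len(fold_preds[0])
    return [
        sum(w * fold[i] for w, fold in zip(weights, fold_preds)) / total
        for i in range(n_samples)
    ]


folds = [[1.0, 2.0], [3.0, 4.0]]
print(average_folds(folds))                      # [2.0, 3.0]
print(average_folds(folds, weights=[3.0, 1.0]))  # [1.5, 2.5]
```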

class nirs4all.controllers.models.components.PredictionRecord(metadata: dict, partitions: List[Tuple[str, List[int], ndarray, ndarray]])

Bases: object

Complete prediction record for storage.

metadata: dict
partitions: List[Tuple[str, List[int], ndarray, ndarray]]

class nirs4all.controllers.models.components.PredictionTransformer

Bases: object

Transforms predictions between scaled and unscaled spaces.

Handles:
  • Classification tasks: Keep predictions in transformed space

  • Regression tasks: Transform predictions back to numeric space

  • Respects current y_processing from context

Example

>>> transformer = PredictionTransformer()
>>> y_pred_unscaled = transformer.transform_to_unscaled(
...     y_pred_scaled,
...     dataset,
...     context
... )

transform_batch_to_unscaled(predictions_dict: dict, dataset: SpectroDataset, context: ExecutionContext | None = None) → dict

Transform a dictionary of predictions to unscaled space.

Parameters:
  • predictions_dict – Dictionary with keys like ‘train’, ‘val’, ‘test’ and values as prediction arrays

  • dataset – Dataset with transformation info

  • context – Execution context

Returns:

Dictionary with same keys but unscaled predictions

transform_to_unscaled(predictions_scaled: ndarray, dataset: SpectroDataset, context: ExecutionContext | None = None) → ndarray

Transform predictions from scaled/processed space to unscaled/numeric space.

Parameters:
  • predictions_scaled – Predictions in scaled/processed space

  • dataset – Dataset with task type and target transformation info

  • context – Execution context with y processing info

Returns:

Predictions in unscaled/numeric space
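The task-dependent rule described for this class (classification stays in transformed space, regression is mapped back) can be illustrated with a toy inverse scaling. This is a sketch under assumptions: `to_unscaled` is a hypothetical function, the target is assumed to have been standardized with a known mean and standard deviation, and plain lists stand in for arrays. The real method takes its transformation from the dataset and context instead.

```python
from typing import List, Sequence


def to_unscaled(
    predictions_scaled: Sequence[float],
    task_type: str,
    mean: float,
    std: float,
) -> List[float]:
    """Hypothetical sketch of task-dependent unscaling."""
    if task_type != "regression":
        # Classification: leave predictions in the transformed space.
        return list(predictions_scaled)
    # Regression: invert the assumed standardization y_scaled = (y - mean) / std.
    return [p * std + mean for p in predictions_scaled]


print(to_unscaled([0.0, 1.0, -1.0], "regression", mean=10.0, std=2.0))
# [10.0, 12.0, 8.0]
print(to_unscaled([0, 1], "classification", mean=10.0, std=2.0))
# [0, 1]
```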

class nirs4all.controllers.models.components.ScoreCalculator

Bases: object

Calculates evaluation scores for models.

Uses ModelUtils to select appropriate metrics based on task type, and Evaluator to compute scores.

Example

>>> calculator = ScoreCalculator()
>>> scores = calculator.calculate(
...     y_true={'train': y_train, 'val': y_val, 'test': y_test},
...     y_pred={'train': y_train_pred, 'val': y_val_pred, 'test': y_test_pred},
...     task_type='regression'
... )
>>> scores.test
0.88

calculate(y_true: Dict[str, ndarray], y_pred: Dict[str, ndarray], task_type: str) → PartitionScores

Calculate scores for all partitions.

Parameters:
  • y_true – Dictionary of true values per partition

  • y_pred – Dictionary of predictions per partition

  • task_type – Task type string (e.g., ‘regression’, ‘classification’)

Returns:

PartitionScores with scores for train, val, test

calculate_single(y_true: ndarray, y_pred: ndarray, task_type: str, metric: str | None = None) → float

Calculate score for a single partition.

Parameters:
  • y_true – True values

  • y_pred – Predictions

  • task_type – Task type string

  • metric – Optional metric name (if None, uses best metric for task)

Returns:

Score value
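The fallback from an explicit metric to a task default can be sketched as a small lookup. Everything here is illustrative: `single_score`, the metric names, and the choice of RMSE and accuracy as defaults are assumptions, since the library selects its metrics through ModelUtils and computes them with Evaluator.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumption: illustrative metric registry and per-task defaults.
METRICS: Dict[str, Callable] = {"rmse": rmse, "accuracy": accuracy}
DEFAULT_METRIC: Dict[str, str] = {"regression": "rmse", "classification": "accuracy"}


def single_score(y_true, y_pred, task_type: str, metric: Optional[str] = None) -> float:
    """Hypothetical sketch: use the given metric, else the task's default."""
    name = metric if metric is not None else DEFAULT_METRIC[task_type]
    return METRICS[name](y_true, y_pred)


print(single_score([1.0, 2.0], [1.0, 2.0], "regression"))           # 0.0
print(single_score(["a", "b"], ["a", "b"], "classification"))       # 1.0
```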

format_scores(scores: PartitionScores) → str

Format scores as a readable string.

Parameters:

scores – PartitionScores instance

Returns:

Formatted string like “Train: 0.95 | Val: 0.90 | Test: 0.88 (R2)”

Return type:

str
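A formatter producing a string of this shape might be written as follows. It is a minimal sketch: `ScoresSketch` and `format_scores_sketch` are hypothetical names, and the two-decimal precision and the metric label being printed as stored are assumptions read off the example string.

```python
from dataclasses import dataclass


@dataclass
class ScoresSketch:
    """Hypothetical subset of the PartitionScores fields used for formatting."""
    train: float
    val: float
    test: float
    metric: str


def format_scores_sketch(scores: ScoresSketch) -> str:
    # Assumption: two decimals per score, metric name appended in parentheses.
    return (
        f"Train: {scores.train:.2f} | Val: {scores.val:.2f} | "
        f"Test: {scores.test:.2f} ({scores.metric})"
    )


print(format_scores_sketch(ScoresSketch(train=0.95, val=0.90, test=0.88, metric="R2")))
# Train: 0.95 | Val: 0.90 | Test: 0.88 (R2)
```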