nirs4all.analysis.transfer_metrics module

Transfer Metrics Computation.

This module provides fast computation of transfer-focused metrics between two datasets in PCA space. The metrics are designed to assess how well preprocessing aligns the two datasets for transfer learning.

Metrics computed:
  • Centroid Distance – Euclidean distance between dataset centroids in PCA space

  • CKA (Centered Kernel Alignment) – Representation similarity

  • Grassmann Distance – Angular distance between PCA subspaces

  • RV Coefficient – Multivariate correlation structure

  • Procrustes Disparity – Shape alignment after optimal transformation

  • Trustworthiness – Neighborhood preservation

  • Spread Distance – Distribution overlap combining covariance and sample distances
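As a rough illustration of two of the metrics listed above, the sketch below computes a centroid distance and a Grassmann distance with NumPy only. It is not the module's implementation. The helper names (pca_basis, centroid_distance, grassmann_distance), the choice to fit the shared PCA basis on the pooled data, and the use of the norm of the principal angles as the Grassmann distance are all assumptions made for this example.

```python
import numpy as np


def pca_basis(X, n_components):
    """Return (mean, components) from an SVD of the centered data.

    Components are stored as rows, shape (n_components, n_features).
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def centroid_distance(X_source, X_target, n_components=2):
    # Assumption: both datasets are projected onto one PCA basis
    # fitted on the pooled (stacked) data.
    pooled = np.vstack([X_source, X_target])
    mean, comps = pca_basis(pooled, n_components)
    z_src = (X_source - mean) @ comps.T
    z_tgt = (X_target - mean) @ comps.T
    return float(np.linalg.norm(z_src.mean(axis=0) - z_tgt.mean(axis=0)))


def grassmann_distance(X_source, X_target, n_components=2):
    # Each dataset gets its own PCA subspace. The cosines of the
    # principal angles are the singular values of U_src @ U_tgt.T.
    _, u_src = pca_basis(X_source, n_components)
    _, u_tgt = pca_basis(X_target, n_components)
    cosines = np.linalg.svd(u_src @ u_tgt.T, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(50, 8))
X_shifted = X_src + 3.0                  # same shape, constant offset
X_other = rng.normal(size=(50, 8))       # unrelated random data

shift_distance = centroid_distance(X_src, X_shifted)
same_subspace = grassmann_distance(X_src, X_src)
other_subspace = grassmann_distance(X_src, X_other)
```

In this toy setup a constant offset moves the centroid without changing the centered data, so the centroid distance reacts to it while the Grassmann distance stays near zero.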

class nirs4all.analysis.transfer_metrics.TransferMetrics(centroid_distance: float, cka_similarity: float, grassmann_distance: float, rv_coefficient: float, procrustes_disparity: float, trustworthiness: float, spread_distance: float, evr_source: float, evr_target: float)

Bases: object

Container for transfer metrics between two datasets.

centroid_distance: float
cka_similarity: float
evr_source: float
evr_target: float
grassmann_distance: float
procrustes_disparity: float
rv_coefficient: float
spread_distance: float
trustworthiness: float
to_dict() → Dict[str, float]

Return the metrics as a dictionary of floats.
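The container pattern described above can be sketched with a standard-library dataclass. MetricsSketch is a hypothetical stand-in reduced to three of the documented fields, and the use of dataclasses.asdict is an assumption about how such a conversion could be written, not a statement about how TransferMetrics is implemented. The numeric values are arbitrary placeholders, not measured results.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class MetricsSketch:
    """Hypothetical, reduced stand-in for a metrics container."""

    centroid_distance: float
    cka_similarity: float
    trustworthiness: float

    def to_dict(self) -> Dict[str, float]:
        # asdict maps each field name to its current value.
        return asdict(self)


# Arbitrary placeholder values, for illustration only.
example = MetricsSketch(
    centroid_distance=1.5,
    cka_similarity=0.8,
    trustworthiness=0.9,
)
as_dict = example.to_dict()
```

A flat dictionary of floats like this is convenient for building a table row or serializing results to JSON.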
class nirs4all.analysis.transfer_metrics.TransferMetricsComputer(n_components: int = 10, k_neighbors: int = 10, random_state: int = 0)

Bases: object

Fast computation of transfer metrics between two datasets.

Key optimization: PCA is computed once per dataset and then reused for all metric computations.

Parameters:
  • n_components – Number of PCA components for projection.

  • k_neighbors – Number of neighbors for trustworthiness computation.

  • random_state – Random state for reproducibility.
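The "compute PCA once, reuse it" idea can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the class's actual code: PCAResult and fit_pca_once are hypothetical names, the decomposition uses a plain SVD, and the evr value is assumed here to mean the summed explained variance ratio of the retained components.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class PCAResult:
    """Hypothetical cache of everything later metrics might need."""

    mean: np.ndarray        # (n_features,)
    components: np.ndarray  # (n_components, n_features)
    scores: np.ndarray      # (n_samples, n_components)
    evr: float              # summed explained variance ratio (assumed meaning)


def fit_pca_once(X, n_components=10):
    mean = X.mean(axis=0)
    u, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_components, len(s))
    variances = s ** 2
    return PCAResult(
        mean=mean,
        components=vt[:k],
        scores=u[:, :k] * s[:k],  # equals (X - mean) @ components.T
        evr=float(variances[:k].sum() / variances.sum()),
    )


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
pca = fit_pca_once(X, n_components=3)

# Several downstream quantities reuse the same cached decomposition
# instead of refitting PCA each time.
centroid = pca.scores.mean(axis=0)
spread = pca.scores.std(axis=0)
```

Caching the mean, basis, and scores in one immutable object means each metric can read what it needs without triggering another decomposition.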

compute(X_source: ndarray, X_target: ndarray, compute_trust: bool = True) → TransferMetrics

Compute all transfer metrics between two datasets.

Parameters:
  • X_source – Source dataset (n_samples_src, n_features).

  • X_target – Target dataset (n_samples_tgt, n_features).

  • compute_trust – Whether to compute trustworthiness; enabling it makes the computation slower.

Returns:

TransferMetrics containing all computed metrics.

compute_raw_and_preprocessed(X_source_raw: ndarray, X_target_raw: ndarray, X_source_pp: ndarray, X_target_pp: ndarray, compute_trust: bool = True) → Tuple[TransferMetrics, TransferMetrics, Dict[str, float]]

Compute metrics for both raw and preprocessed data, plus the improvements between them.

Parameters:
  • X_source_raw – Raw source dataset.

  • X_target_raw – Raw target dataset.

  • X_source_pp – Preprocessed source dataset.

  • X_target_pp – Preprocessed target dataset.

  • compute_trust – Whether to compute trustworthiness.

Returns:

Tuple of (raw_metrics, pp_metrics, improvements_dict)
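One plausible way to derive an improvements dictionary from two sets of metrics is sketched below. The formula is an assumption, since this page does not state how improvements are computed: the example uses a relative change and flips its sign for distance-like metrics, where lower values mean better alignment. The names relative_improvements and LOWER_IS_BETTER are hypothetical, and the input values are arbitrary.

```python
from typing import Dict

# Distance-like metrics: a smaller value indicates closer alignment.
LOWER_IS_BETTER = {
    "centroid_distance",
    "grassmann_distance",
    "procrustes_disparity",
    "spread_distance",
}


def relative_improvements(
    raw: Dict[str, float], pp: Dict[str, float], eps: float = 1e-12
) -> Dict[str, float]:
    """Relative change from raw to preprocessed; positive means better."""
    out = {}
    for name, raw_value in raw.items():
        change = (pp[name] - raw_value) / (abs(raw_value) + eps)
        # Flip the sign so that a drop in a distance counts as a gain.
        out[name] = -change if name in LOWER_IS_BETTER else change
    return out


# Arbitrary placeholder values, for illustration only.
raw = {"centroid_distance": 4.0, "cka_similarity": 0.5}
pp = {"centroid_distance": 1.0, "cka_similarity": 0.75}
improvements = relative_improvements(raw, pp)
```

With this sign convention every entry reads the same way: a positive number means preprocessing moved the metric in the desirable direction.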

nirs4all.analysis.transfer_metrics.compute_transfer_score(metrics: TransferMetrics, raw_metrics: TransferMetrics | None = None, weights: Dict[str, float] | None = None) → float

Compute a composite transfer score from metrics.

Higher scores indicate better transfer potential.

Parameters:
  • metrics – TransferMetrics from preprocessed data.

  • raw_metrics – Optional baseline metrics for computing improvements.

  • weights – Optional custom weights for metric combination.

Returns:

Composite transfer score (0-1 scale, higher is better). Returns NaN if critical metrics are invalid.
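A composite score of this kind can be sketched as a weighted average of metrics mapped to the unit interval. Everything specific here is an assumption for illustration: the function name composite_score, the default weights, the 1 / (1 + d) mapping for distances, and the rule that any non-finite weighted metric yields NaN. The optional raw-metrics baseline is left out of the sketch, and the actual weighting used by compute_transfer_score is not documented on this page.

```python
import math
from typing import Dict, Optional

# Hypothetical defaults; not the library's weights.
DEFAULT_WEIGHTS = {
    "cka_similarity": 0.5,
    "rv_coefficient": 0.3,
    "centroid_distance": 0.2,
}
SIMILARITIES = {"cka_similarity", "rv_coefficient", "trustworthiness"}


def composite_score(
    metrics: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total <= 0:
        return math.nan
    score = 0.0
    for name, weight in weights.items():
        value = metrics.get(name, math.nan)
        if not math.isfinite(value):
            return math.nan  # a required metric is missing or invalid
        if name in SIMILARITIES:
            unit = min(max(value, 0.0), 1.0)      # clamp to [0, 1]
        else:
            unit = 1.0 / (1.0 + max(value, 0.0))  # distance 0 maps to 1
        score += weight * unit
    return score / total


# Arbitrary placeholder values, for illustration only.
good = {"cka_similarity": 0.8, "rv_coefficient": 0.6, "centroid_distance": 1.0}
broken = {"cka_similarity": math.nan, "rv_coefficient": 0.6, "centroid_distance": 1.0}

good_score = composite_score(good)
broken_score = composite_score(broken)
```

Dividing by the sum of the weights keeps the result on a 0-1 scale even when custom weights do not add up to one.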