nirs4all.analysis.transfer_metrics module

Transfer Metrics Computation.

This module provides fast computation of transfer-focused metrics between two datasets (a source and a target) in PCA space. The metrics are designed to assess how well a preprocessing step aligns the two datasets in transfer learning scenarios.

Metrics computed:

- Centroid Distance: Euclidean distance between dataset centroids in PCA space
- CKA (Centered Kernel Alignment): representation similarity
- Grassmann Distance: angular distance between PCA subspaces
- RV Coefficient: multivariate correlation structure
- Procrustes Disparity: shape alignment after optimal transformation
- Trustworthiness: neighborhood preservation
- Spread Distance: distribution overlap combining covariance and sample distances
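A minimal NumPy sketch of two of the metrics listed above, using their standard textbook definitions rather than nirs4all's own code: the Euclidean distance between the centroids of two projected datasets, and linear CKA. The helper names `centroid_distance` and `linear_cka` are illustrative only. Linear CKA compares two representations of the same samples, so this sketch assumes row-paired inputs of equal length; how the module pairs source and target samples is not documented here.

```python
import numpy as np


def centroid_distance(Z_src: np.ndarray, Z_tgt: np.ndarray) -> float:
    """Euclidean distance between the column means (centroids) of two datasets."""
    return float(np.linalg.norm(Z_src.mean(axis=0) - Z_tgt.mean(axis=0)))


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same, row-paired samples."""
    if X.shape[0] != Y.shape[0]:
        raise ValueError("linear CKA needs the same number of paired rows")
    # Center each feature (column) so the implied Gram matrices are centered.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)
    cross = np.linalg.norm(Yc.T @ Xc, "fro") ** 2
    norm_x = np.linalg.norm(Xc.T @ Xc, "fro")
    norm_y = np.linalg.norm(Yc.T @ Yc, "fro")
    return float(cross / (norm_x * norm_y))


# Usage: a shifted copy has a nonzero centroid distance, while linear CKA
# ignores translation, orthogonal rotation and isotropic scaling.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
print(centroid_distance(Z, Z + 2.0))        # sqrt(5 * 2**2) ≈ 4.472
print(linear_cka(Z, 3.0 * (Z @ Q) + 2.0))  # ≈ 1.0
```

For column-centered inputs, the classic RV coefficient, tr(XXᵀYYᵀ) / √(tr((XXᵀ)²) · tr((YYᵀ)²)), is algebraically identical to this linear CKA expression; the documentation does not say how the module's `cka_similarity` and `rv_coefficient` values differ.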

class nirs4all.analysis.transfer_metrics.TransferMetrics(centroid_distance: float, cka_similarity: float, grassmann_distance: float, rv_coefficient: float, procrustes_disparity: float, trustworthiness: float, spread_distance: float, evr_source: float, evr_target: float)

Bases: object

Container for transfer metrics between two datasets.

centroid_distance: float

cka_similarity: float

evr_source: float

evr_target: float

grassmann_distance: float

procrustes_disparity: float

rv_coefficient: float

spread_distance: float

to_dict() → Dict[str, float]

Convert the metrics to a dictionary.

trustworthiness: float
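`TransferMetrics` is a plain container of floats. A minimal sketch of an equivalent container, assuming that `to_dict()` maps each field name to its value (the documentation does not spell out the keys) and without claiming the real class is a dataclass; the name `TransferMetricsSketch` is hypothetical and is not part of nirs4all.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class TransferMetricsSketch:
    """Hypothetical stand-in for TransferMetrics; fields follow its documented signature."""

    centroid_distance: float
    cka_similarity: float
    grassmann_distance: float
    rv_coefficient: float
    procrustes_disparity: float
    trustworthiness: float
    spread_distance: float
    evr_source: float
    evr_target: float

    def to_dict(self) -> Dict[str, float]:
        # Assumption: one key per field, named exactly like the field.
        return {name: float(value) for name, value in asdict(self).items()}


# Illustrative input values only, not real results.
m = TransferMetricsSketch(1.2, 0.9, 0.3, 0.85, 0.05, 0.97, 0.4, 0.99, 0.98)
print(m.to_dict()["cka_similarity"])  # 0.9
```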

class nirs4all.analysis.transfer_metrics.TransferMetricsComputer(n_components: int = 10, k_neighbors: int = 10, random_state: int = 0)

Bases: object

Fast computation of transfer metrics between two datasets.

Key optimization: PCA is computed once per dataset and then reused for all metric computations.

Parameters:

n_components – Number of PCA components for projection.
k_neighbors – Number of neighbors for trustworthiness computation.
random_state – Random state for reproducibility.
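The "compute PCA once per dataset" strategy can be sketched as follows: fit an SVD-based PCA on each dataset a single time, cache the mean, the leading principal directions and the explained variance ratio, and reuse them for every subspace metric. Here the cached bases feed a Grassmann distance computed from principal angles. `pca_basis` and `grassmann_distance` are hypothetical helpers; treating `evr_source`/`evr_target` as the cumulative explained variance ratio of the first `n_components` components is an assumption, as is the geodesic (arc-length) form of the Grassmann distance.

```python
import numpy as np


def pca_basis(X: np.ndarray, n_components: int):
    """Fit PCA once; return (mean, orthonormal basis as columns, cumulative EVR)."""
    mean = X.mean(axis=0)
    # Economy SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s ** 2
    k = min(n_components, Vt.shape[0])
    evr = float(variances[:k].sum() / variances.sum())
    return mean, Vt[:k].T, evr


def grassmann_distance(U: np.ndarray, V: np.ndarray) -> float:
    """Geodesic distance between span(U) and span(V), both with orthonormal columns."""
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


rng = np.random.default_rng(0)
X_src = rng.normal(size=(100, 30))
X_tgt = X_src + 0.1 * rng.normal(size=(100, 30))

# One PCA per dataset; the cached (mean, basis, evr) triples are reused by every metric.
mu_s, U_s, evr_s = pca_basis(X_src, n_components=10)
mu_t, U_t, evr_t = pca_basis(X_tgt, n_components=10)
print(grassmann_distance(U_s, U_t), evr_s, evr_t)
```

The same cached pairs also give projections such as `(X_tgt - mu_t) @ U_t` without refitting, which is the point of computing the decomposition only once.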

compute(X_source: ndarray, X_target: ndarray, compute_trust: bool = True) → TransferMetrics

Compute all transfer metrics between two datasets.

Parameters:

X_source – Source dataset (n_samples_src, n_features).
X_target – Target dataset (n_samples_tgt, n_features).
compute_trust – Whether to compute trustworthiness (slower).

Returns:

TransferMetrics containing all computed metrics.
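Two of the metrics that `compute` returns, Procrustes disparity and trustworthiness, can be sketched in plain NumPy. `procrustes_disparity` uses the closed form of the standardized disparity reported by `scipy.spatial.procrustes` (1 minus the squared sum of singular values of the cross-product of the two centered, unit-norm matrices); `trustworthiness` follows the Venna–Kaski formula used by `sklearn.manifold.trustworthiness`. Both helper names are hypothetical, and which pairs of spaces nirs4all feeds into them (for example, raw features versus their PCA projection) is an assumption. The O(n²) pairwise distance matrices in `trustworthiness` illustrate why `compute_trust=True` is documented as slower.

```python
import numpy as np


def procrustes_disparity(A: np.ndarray, B: np.ndarray) -> float:
    """Standardized Procrustes disparity between two row-paired point sets."""
    if A.shape != B.shape:
        raise ValueError("Procrustes needs two arrays of the same shape")
    # Center each set and scale it to unit Frobenius norm.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 /= np.linalg.norm(A0)
    B0 /= np.linalg.norm(B0)
    # After the optimal rotation/reflection and scaling, the residual is 1 - (sum of singular values)^2.
    nuclear = np.linalg.svd(A0.T @ B0, compute_uv=False).sum()
    return float(1.0 - nuclear ** 2)


def _sq_dists(X: np.ndarray) -> np.ndarray:
    # Pairwise squared Euclidean distances, with self-distances set to +inf.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)
    return D


def trustworthiness(X: np.ndarray, Z: np.ndarray, k: int) -> float:
    """Venna-Kaski trustworthiness of embedding Z w.r.t. X (1.0 = no false neighbors)."""
    n = X.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n_samples / 2")
    rows = np.arange(n)[:, None]
    # rank_X[i, j] = 1-based rank of j among the neighbors of i in the original space.
    rank_X = np.empty((n, n), dtype=int)
    rank_X[rows, np.argsort(_sq_dists(X), axis=1, kind="stable")] = np.arange(1, n + 1)
    # k nearest neighbors of each point in the embedded space.
    nn_Z = np.argsort(_sq_dists(Z), axis=1, kind="stable")[:, :k]
    # Penalize embedded neighbors that rank beyond k in the original space.
    excess = rank_X[rows, nn_Z] - k
    penalty = excess[excess > 0].sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
print(procrustes_disparity(X, 2.0 * X @ Q + 1.0))       # ≈ 0: same shape up to a similarity transform
print(trustworthiness(X, X, k=5))                        # 1.0: identical neighborhoods
print(trustworthiness(X, rng.normal(size=(60, 2)), k=5))  # lower: unrelated neighborhoods
```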

compute_raw_and_preprocessed(X_source_raw: ndarray, X_target_raw: ndarray, X_source_pp: ndarray, X_target_pp: ndarray, compute_trust: bool = True) → Tuple[TransferMetrics, TransferMetrics, Dict[str, float]]

Compute metrics for both the raw and the preprocessed data, plus the improvement of the preprocessed metrics over the raw ones.

Parameters:

X_source_raw – Raw source dataset.
X_target_raw – Raw target dataset.
X_source_pp – Preprocessed source dataset.
X_target_pp – Preprocessed target dataset.
compute_trust – Whether to compute trustworthiness.

Returns:

Tuple of (raw_metrics, pp_metrics, improvements_dict)
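The keys and sign convention of `improvements_dict` are not documented here. A minimal sketch, assuming one entry per metric and that a positive value means preprocessing moved the metric in its "better" direction (down for distances and disparities, up for similarities and trustworthiness); both the direction table and the helper `improvements` are hypothetical. Its inputs could be the dictionaries produced by `to_dict()`.

```python
from typing import Dict

# Hypothetical direction table: which way counts as "better" for each metric.
LOWER_IS_BETTER = {"centroid_distance", "grassmann_distance", "procrustes_disparity", "spread_distance"}
HIGHER_IS_BETTER = {"cka_similarity", "rv_coefficient", "trustworthiness"}


def improvements(raw: Dict[str, float], pp: Dict[str, float]) -> Dict[str, float]:
    """Signed change per metric; positive means preprocessing helped (under the table above)."""
    out: Dict[str, float] = {}
    for name in sorted(LOWER_IS_BETTER | HIGHER_IS_BETTER):
        if name in raw and name in pp:
            delta = pp[name] - raw[name]
            out[name] = -delta if name in LOWER_IS_BETTER else delta
    return out


raw = {"centroid_distance": 3.0, "cka_similarity": 0.5, "evr_source": 0.9}
pp = {"centroid_distance": 1.0, "cka_similarity": 0.8, "evr_source": 0.95}
# centroid_distance improves by 2.0, cka_similarity by ≈ 0.3;
# evr_source has no "better" direction in the table, so it is skipped.
print(improvements(raw, pp))
```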

nirs4all.analysis.transfer_metrics.compute_transfer_score(metrics: TransferMetrics, raw_metrics: TransferMetrics | None = None, weights: Dict[str, float] | None = None) → float

Compute a composite transfer score from metrics.

Higher scores indicate better transfer potential.

Parameters:

metrics – TransferMetrics from preprocessed data.
raw_metrics – Optional baseline metrics for computing improvements.
weights – Optional custom weights for metric combination.

Returns:

Composite transfer score (0-1 scale, higher is better). Returns NaN if critical metrics are invalid.
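One way to build a weighted score on a 0 to 1 scale with a NaN guard, sketched without nirs4all: similarity-type metrics are clipped to [0, 1], each distance-type metric d is mapped to 1 / (1 + d), and the results are averaged with weights. The default weights, the 1 / (1 + d) mapping, the set of "critical" metrics and the helper name `composite_score` are all hypothetical. The library's actual formula, including how it uses `raw_metrics`, is not documented here, and this sketch ignores `raw_metrics`.

```python
import math
from typing import Dict, Optional

# Hypothetical defaults; the library's real weights are not documented here.
DEFAULT_WEIGHTS = {
    "cka_similarity": 0.3,
    "trustworthiness": 0.2,
    "centroid_distance": 0.25,
    "grassmann_distance": 0.25,
}
SIMILARITIES = {"cka_similarity", "rv_coefficient", "trustworthiness"}
CRITICAL = ("cka_similarity", "centroid_distance")  # hypothetical choice


def composite_score(metrics: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-metric 'goodness' values in [0, 1]; NaN if a critical metric is invalid."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if any(not math.isfinite(metrics.get(name, math.nan)) for name in CRITICAL):
        return math.nan
    total = weight_sum = 0.0
    for name, w in weights.items():
        value = metrics.get(name)
        if value is None or not math.isfinite(value):
            continue  # skip missing or invalid non-critical metrics
        if name in SIMILARITIES:
            goodness = min(max(value, 0.0), 1.0)  # already on a 0-1 scale; clip just in case
        else:
            goodness = 1.0 / (1.0 + max(value, 0.0))  # distance 0 -> 1, large distance -> near 0
        total += w * goodness
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else math.nan


perfect = {"cka_similarity": 1.0, "trustworthiness": 1.0, "centroid_distance": 0.0, "grassmann_distance": 0.0}
print(composite_score(perfect))  # 1.0
print(composite_score({"cka_similarity": float("nan"), "centroid_distance": 0.5}))  # nan
```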