nirs4all.visualization.analysis.transfer module
- class nirs4all.visualization.analysis.transfer.PreprocPCAEvaluator(r_components=10, knn=10)
Bases: object
- fit(raw_data: dict[str, ndarray], pp_data: dict[str, dict[str, ndarray]])
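raw_data: {"dataset": X_raw of shape (n, m), …}, one raw array per dataset (n samples, m features).
pp_data: either of two layouts:
{"pp_name": {"dataset": X_pp of shape (n, p), …}, …} or
{"dataset": {"pp_name": X_pp of shape (n, p), …}, …}
The layout is detected automatically and pivoted if needed.
Rows (samples) are assumed to be aligned within each dataset between raw_data and pp_data: row i of X_raw and row i of X_pp must be the same sample.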
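The documentation only says the pp_data layout is "detected and pivoted if needed". A minimal sketch of one way to do that, assuming a hypothetical detection rule: if every outer key of pp_data is a dataset name from raw_data, the dict is treated as dataset-first and pivoted. normalize_pp_layout and misaligned are hypothetical helpers for illustration, not nirs4all functions.

```python
from __future__ import annotations

import numpy as np


def normalize_pp_layout(
    raw_data: dict[str, np.ndarray],
    pp_data: dict[str, dict[str, np.ndarray]],
) -> dict[str, dict[str, np.ndarray]]:
    """Return pp_data in {pp_name: {dataset: X_pp}} layout (hypothetical helper)."""
    # Assumed rule: dataset-first input has only dataset names as outer keys.
    if pp_data and set(pp_data) <= set(raw_data):
        pivoted: dict[str, dict[str, np.ndarray]] = {}
        for dataset, by_pp in pp_data.items():
            for pp_name, X_pp in by_pp.items():
                pivoted.setdefault(pp_name, {})[dataset] = X_pp
        return pivoted
    # Already preprocessing-first: return unchanged.
    return pp_data


def misaligned(
    raw_data: dict[str, np.ndarray],
    pp_by_preproc: dict[str, dict[str, np.ndarray]],
) -> list[tuple[str, str]]:
    """List (pp_name, dataset) pairs whose row count differs from the raw array.

    Equal row counts are necessary but not sufficient: the sample order must
    also match, which cannot be verified from the arrays alone.
    """
    return [
        (pp_name, dataset)
        for pp_name, by_dataset in pp_by_preproc.items()
        for dataset, X_pp in by_dataset.items()
        if X_pp.shape[0] != raw_data[dataset].shape[0]
    ]
```

With this heuristic, a preprocessing that shares its name with a dataset would make the two layouts indistinguishable.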
- get_cross_dataset_summary(metric='centroid_improvement')
Get a summary of how preprocessing affects inter-dataset distances.
- Parameters:
metric – ‘centroid_improvement’ or ‘spread_improvement’. Higher values mean the preprocessing brought the datasets closer together.
- Returns:
DataFrame sorted by improvement (best preprocessing first)
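To make "improvement" concrete, the sketch below assumes it is the relative reduction (d_raw - d_pp) / d_raw of each pairwise inter-dataset distance, and that the summary ranks preprocessings by the mean over dataset pairs. Both definitions, the column names, and the function cross_dataset_summary are assumptions for illustration; the library may compute them differently.

```python
import numpy as np
import pandas as pd


def cross_dataset_summary(datasets, d_raw, d_pp_by_preproc, metric="centroid"):
    """Rank preprocessings by mean relative reduction of pairwise distances.

    datasets: list of k dataset names.
    d_raw: (k, k) symmetric distance matrix between datasets (raw space).
    d_pp_by_preproc: {pp_name: (k, k) matrix in that preprocessed space}.
    Assumed definition: improvement = (d_raw - d_pp) / d_raw, so positive
    values mean the datasets moved closer together.
    """
    iu = np.triu_indices(len(datasets), k=1)  # each unordered pair once
    raw_pairs = d_raw[iu]
    rows = []
    for pp_name, d_pp in d_pp_by_preproc.items():
        pp_pairs = d_pp[iu]
        improvement = (raw_pairs - pp_pairs) / raw_pairs
        for (i, j), dist_pp, imp in zip(zip(*iu), pp_pairs, improvement):
            rows.append({
                "preproc": pp_name,
                "dataset_1": datasets[i],
                "dataset_2": datasets[j],
                f"{metric}_dist_pp": dist_pp,
                f"{metric}_improvement": imp,
            })
    pairs = pd.DataFrame(rows)
    # Best preprocessing (largest mean improvement) first.
    summary = (
        pairs.groupby("preproc")[f"{metric}_improvement"]
        .mean()
        .sort_values(ascending=False)
        .reset_index()
    )
    return pairs, summary
```

The ratio is undefined when two datasets already have zero raw distance, so a real implementation would need to guard against that case.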
- get_quality_metric_convergence()
Analyze how preprocessing affects the similarity of quality metrics across datasets. Lower variance means the preprocessing makes the datasets more homogeneous in quality.
- Returns:
DataFrame with the variance of each quality metric (evr, cka, rv, etc.) across datasets, for raw versus preprocessed data. Lower values indicate better convergence.
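The convergence table boils down to a grouped variance. A minimal pandas sketch, assuming a hypothetical long-form input with one row per (dataset, preproc) pair, one column per metric, and "raw" used as just another preproc label; the layout and the function metric_convergence are illustrative, not the method's actual code.

```python
import pandas as pd


def metric_convergence(metrics: pd.DataFrame) -> pd.DataFrame:
    """Variance of each quality metric across datasets, per preprocessing.

    metrics: columns "dataset", "preproc", plus one column per metric
    (e.g. "evr", "cka", "rv"); this layout is an assumption.
    """
    metric_cols = [c for c in metrics.columns if c not in ("dataset", "preproc")]
    # Population variance (ddof=0) across datasets; the library's choice is unknown.
    return metrics.groupby("preproc")[metric_cols].var(ddof=0)
```

Dividing the result by its "raw" row (var / var.loc["raw"]) gives, per metric, the fraction of the raw-data variance that remains after each preprocessing.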
- plot_all_datasets_pca(figsize=(16, 12))
Plot all datasets together in a shared PCA space, once for the raw data and once for each preprocessing. Shows how the datasets cluster and separate in each preprocessing space.
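A shared PCA space can be obtained by fitting a single PCA on the stacked datasets and splitting the scores back per dataset. The sketch below does this with NumPy's SVD; it assumes all arrays in one space have the same number of columns (for example a common wavelength grid), and joint_pca_scores is a hypothetical helper rather than the method's actual implementation.

```python
import numpy as np


def joint_pca_scores(data_by_dataset, n_components=2):
    """Project several datasets into one PCA space fitted on all of them.

    data_by_dataset: {dataset: (n_i, m) array}; m must be shared.
    Returns {dataset: (n_i, n_components) score array}.
    """
    names = list(data_by_dataset)
    X = np.vstack([data_by_dataset[name] for name in names])
    mean = X.mean(axis=0)  # pooled mean over all datasets
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    scores = {}
    start = 0
    for name in names:
        stop = start + data_by_dataset[name].shape[0]
        scores[name] = (X[start:stop] - mean) @ components.T
        start = stop
    return scores
```

Calling it once on the raw arrays and once per preprocessing yields one score set per panel. The sign of each principal axis is arbitrary, so panels may appear mirrored.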
- plot_cross_dataset_distances(figsize=(14, 8))
Plot how preprocessing affects inter-dataset distances. Shows which preprocessing methods bring datasets closer together.
- plot_cross_dataset_heatmap(metric='centroid_improvement', figsize=(12, 10))
Create a heatmap showing pairwise dataset distances for each preprocessing.
- Parameters:
metric – ‘centroid_improvement’, ‘centroid_dist_pp’, ‘spread_improvement’, or ‘spread_dist_pp’
- plot_distance_matrices(metric='centroid', figsize=(18, 12))
Plot distance matrices of inter-dataset distances for the raw data and for every preprocessing. Shows which preprocessings reduce these distances (smaller distances are better for transfer learning).
- Parameters:
metric – ‘centroid’ or ‘spread’: the distance metric to display.
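A sketch of how the two kinds of distance matrix might be computed from per-dataset PCA scores. The "centroid" variant uses the Euclidean distance between dataset centroids, which is the natural reading of the name; the "spread" variant is an assumption made here for illustration (absolute difference between each dataset's mean distance to its own centroid). distance_matrix is a hypothetical helper, not the library's code.

```python
import numpy as np


def distance_matrix(scores_by_dataset, metric="centroid"):
    """Pairwise inter-dataset distances from {dataset: (n_i, k) scores}.

    Returns (names, D) where D is a symmetric (n_datasets, n_datasets) array.
    """
    names = list(scores_by_dataset)
    centroids = np.array([scores_by_dataset[n].mean(axis=0) for n in names])
    if metric == "centroid":
        # Euclidean distance between every pair of centroids via broadcasting.
        diff = centroids[:, None, :] - centroids[None, :, :]
        D = np.linalg.norm(diff, axis=-1)
    elif metric == "spread":
        # Assumed definition: mean distance of samples to their own centroid.
        spreads = np.array([
            np.linalg.norm(scores_by_dataset[n] - c, axis=1).mean()
            for n, c in zip(names, centroids)
        ])
        D = np.abs(spreads[:, None] - spreads[None, :])
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return names, D
```

Raw and preprocessed scores live in different spaces, so absolute distances can differ by orders of magnitude between panels; comparing them is easier on a relative or logarithmic scale.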
- plot_distance_reduction_ranking(metric='centroid', log_scale=False, figsize=(14, 8))
Bar chart showing which preprocessing methods best reduce inter-dataset distances. Directly answers: “Which preprocessing is best for transfer learning?”
- Parameters:
metric – ‘centroid’ or ‘spread’: the distance metric used for the ranking.
log_scale – If True, use a log scale on the right-hand plot (absolute distances) to handle extreme values.
- plot_pair(dataset: str, preproc: str, figsize=(10, 5))
Comparison plot for a single dataset and preprocessing pair.
- plot_preservation_summary(by='preproc', figsize=(14, 8))
Summary plot of the preservation metrics, grouped according to by (default: 'preproc').
- plot_quality_metric_convergence(figsize=(16, 10))
Visualize how preprocessing makes quality metrics more homogeneous across datasets. Shows the variance reduction for the EVR, CKA, RV, Procrustes, Trustworthiness, and Grassmann metrics.
Lower variance after preprocessing means the datasets behave more similarly, which is better for transfer learning.
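As an illustration of one of the listed metrics, here is a sketch of linear CKA between two representations of the same samples (for example raw and preprocessed PCA scores). In this textbook linear form, computed on column-centered data, CKA coincides with the RV coefficient; how nirs4all computes CKA and RV (inputs, kernels, centering) is not stated here, so treat linear_cka as a hypothetical helper.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between X (n, p1) and Y (n, p2) describing the same n samples.

    Returns a value in [0, 1]; 1 means the two configurations match up to an
    orthogonal transformation and isotropic scaling.
    """
    # Column-center both representations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)
```

Computing such a score for every dataset and preprocessing, then taking its variance across datasets, gives the kind of per-metric spread this plot summarizes.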