Evaluation Metrics¶
Metrics for Comparing Two Embeddings¶
Align diffusion coordinates via orthogonal Procrustes and report R^2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Diffusion operators to compare. |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Diffusion operators to compare. |
required |
times
|
tuple of int
|
Diffusion times; we build Φ_t for each and average the R^2. |
(1,2,4,8)
|
r
|
int
|
Number of eigenpairs for coordinates. |
64
|
symmetric_hint
|
bool
|
See |
False
|
center
|
bool
|
Mean-center coordinates before Procrustes. |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
R2_avg |
float
|
Average coefficient of determination across t (clipped to [0,1]). Higher is better (1.0 means perfect alignment up to a rotation). |
Compute the global score comparing an embedding to PCA.
The score is defined as exp(-(L_emb - L_pca) / L_pca) where L
denotes the mean reconstruction error (global loss) of a linear
projection. A score of 1 means the embedding preserves as much global
structure as PCA; scores below 1 indicate worse global preservation.
The result is clipped to [0, 1].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
array-like of shape (n_samples, n_features) or sparse matrix
|
Input feature matrix. |
required |
Y
|
array-like of shape (n_samples, n_components)
|
Low-dimensional embedding to evaluate. |
required |
Y_pca
|
array-like of shape (n_samples, n_components)
|
Pre-computed PCA embedding. If None, computed from |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float in (0, 1]
|
Global structure preservation score relative to PCA. |
Compute the global score comparing an embedding to a Laplacian Eigenmap baseline.
The score is defined as exp(-(L_emb - L_lap) / L_lap) where L
denotes the mean reconstruction error (global loss) of a linear
projection. A score of 1 means the embedding preserves as much global
structure as a Laplacian Eigenmap of the same dimension; scores below 1
indicate worse global preservation. The result is clipped to [0, 1].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
array-like of shape (n_samples, n_features) or sparse (n_samples, n_samples)
|
Input feature matrix, or precomputed affinity graph if
|
required |
Y
|
array-like of shape (n_samples, n_components)
|
Low-dimensional embedding to evaluate. |
required |
k
|
int
|
Number of neighbors used by |
10
|
data_is_graph
|
bool
|
If True, |
False
|
n_jobs
|
int
|
Number of parallel jobs for |
12
|
random_state
|
RandomState or int
|
Random state for |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float in (0, 1]
|
Global structure preservation score relative to Laplacian Eigenmaps. |
Rank correlation between reference and embedding geodesic distances.
Builds neighborhood graphs for both data and embedding, computes geodesic distances (optionally on landmarks), and returns their rank correlation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
np.ndarray of shape (n_samples, n_features) or scipy.sparse.spmatrix
|
Original high-dimensional data. If square, treated as a distance/affinity matrix. |
required |
emb
|
np.ndarray of shape (n_samples, n_components) or scipy.sparse.spmatrix
|
Low-dimensional embedding. If square, treated as a distance/affinity matrix. |
required |
landmarks
|
int or None
|
Number of landmarks. If None, uses all points. |
None
|
landmark_method
|
str
|
How to select landmarks ('random', 'kmeans', etc.). |
"random"
|
metric
|
str
|
Distance metric for data/embedding kNN graphs. |
"euclidean"
|
n_neighbors
|
int
|
Number of neighbors for kNN graphs. |
3
|
n_jobs
|
int
|
Number of parallel jobs. |
-1
|
cor_method
|
str
|
Rank correlation method ('spearman' or 'kendall'). |
"spearman"
|
random_state
|
int or RandomState
|
RNG seed. |
None
|
return_graphs
|
bool
|
If True, also return the computed graphs. |
False
|
verbose
|
bool
|
Verbosity flag. |
False
|
**kwargs
|
Extra arguments for kNN. |
{}
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float
|
Rank correlation in [-1, 1]. Higher is better. |
(data_graph, emb_graph) : tuple, optional
|
If return_graphs=True, also return the neighborhood graphs. |
Spearman rank correlation between reference and embedding geodesic distances.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data_graph
|
scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)
|
Reference distance/affinity graph or matrix. |
required |
embedding_graph
|
scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)
|
Embedding distance/affinity graph or matrix. |
required |
path_method
|
str
|
Geodesic distance method ('D' for shortest path, etc.). |
"D"
|
subsample_idx
|
np.ndarray of shape (n_subsample,)
|
Indices to subsample before computing correlation. If None, uses all points. |
None
|
unweighted
|
bool
|
If True, compute unweighted shortest paths. |
False
|
n_jobs
|
int
|
Number of jobs for parallel processing. |
1
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float
|
Spearman rank correlation coefficient in [-1, 1]. Higher is better. |
Kendall's tau rank correlation between reference and embedding geodesic distances.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data_graph
|
scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)
|
Reference distance/affinity graph or matrix. |
required |
embedding_graph
|
scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)
|
Embedding distance/affinity graph or matrix. |
required |
path_method
|
str
|
Geodesic distance method ('D' for shortest path, etc.). |
"D"
|
subsample_idx
|
np.ndarray of shape (n_subsample,)
|
Indices to subsample before computing correlation. If None, uses all points. |
None
|
unweighted
|
bool
|
If True, compute unweighted shortest paths. |
False
|
n_jobs
|
int
|
Number of jobs for parallel processing. |
1
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float
|
Kendall's tau coefficient in [-1, 1]. Higher is better. |
Metrics for Comparing Two Sparse Graphs/Operators¶
Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.
Given two (row-stochastic) operators Px and Py (csr_matrices or ndarrays), we compare each row i as a discrete probability distribution over columns (neighbors) and compute the Jensen–Shannon divergence JS(p_i, q_i). We then report a bounded similarity in [0, 1] via:
JS-similarity = 1 - mean_i JS(p_i, q_i)
where JS(·,·) = 0 for identical distributions and ≤ log(2) in nats; we use the standard normalized/base-e version implemented in scipy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.) |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.) |
required |
eps
|
float
|
Small positive value added before per-row renormalization for numerical stability. |
1e-12
|
topk
|
int or None
|
If provided, restrict each row to its top-k entries by probability mass before comparing (helps robustness on very sparse / noisy rows). If None, we compare using the full sparse support (union of supports). |
None
|
return_per_row
|
bool
|
If True, also return the per-row JS similarities (1 - JS_i) as a 1D array. |
False
|
Returns:
| Name | Type | Description |
|---|---|---|
sim |
float in [0, 1]
|
1 - mean(JS divergence) across rows (higher is better). |
per_row |
(ndarray, optional)
|
Returned only if |
Notes
• Operates sparsely: for each row we build vectors over the union of the supports of Px[i, :] and Py[i, :] (unless topk is set, in which case supports are first truncated to top-k). • Sensitive to weights (transition probabilities), unlike set-overlap metrics. • If a row is empty in both operators, it is skipped.
Operator-level set overlap: F1 of top-k transition neighborhoods per row.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Diffusion/transition operators. |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Diffusion/transition operators. |
required |
k
|
int or None
|
Number of neighbors (by transition probability) to keep per row. If None, use the natural sparsity (nnz per row) or min(nnz_x, nnz_y). |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
f1_avg |
float in [0, 1]
|
Mean F1 score across rows. 1.0 → identical sparse neighborhoods. |
Notes
- Build per-row sets of top-k columns by probability mass, then compute F1 = 2 |∩| / (|Sx| + |Sy|).
- Complements JS: insensitive to weights but sensitive to support overlap.
Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Diffusion operators to compare. |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Diffusion operators to compare. |
required |
times
|
tuple of int
|
Diffusion timescales; the score is averaged across them. |
(1,2,4,8)
|
r
|
int
|
Number of leading eigenpairs to use for diffusion distances. |
64
|
p
|
float in (0, 1)
|
Persistence parameter of RBO. Values closer to 1 give more weight to deeper ranks; smaller values emphasize the very top neighbors. |
0.9
|
k_max
|
int
|
Maximum depth of neighbor lists to compare. Truncates infinite RBO. |
100
|
symmetric_hint
|
bool
|
See |
False
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float in [0, 1]
|
Average RBO similarity across timescales. 1.0 means identical ranked neighbor lists under diffusion distances. |
Notes
- RBO compares ordered neighbor lists, unlike kNN overlap which ignores order.
- Especially useful to assess whether diffusion rankings (top-100 neighbors) are preserved, not just set membership.
- Parameter p controls top-heaviness: p=0.9 gives ≈86% of weight to top-10.
Metrics for Evaluating Laplacian Spectra¶
Operator-level spectral agreement via eigenvalues and subspaces.
Returns:
| Type | Description |
|---|---|
If return_details=False:
|
float in [0,1]: cosine of the largest principal angle between the top-r eigenspaces (higher is better). |
If return_details=True:
|
dict with {'eigenvalue_w1', 'subspace_cos'}. |
Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Diffusion operators (will be symmetrized to build Laplacians). |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Diffusion operators (will be symmetrized to build Laplacians). |
required |
r
|
int
|
If using low-rank approximation, number of leading modes for the trace. |
64
|
symmetric_hint
|
bool
|
See |
False
|
hutchinson_probes
|
int or None
|
If provided, estimate trace via Hutchinson’s method with this many random probe vectors (for very large graphs). |
None
|
random_state
|
int or RandomState or None
|
RNG for Hutchinson probes. |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
gap |
float >= 0
|
Absolute difference between (approximate) trace( L^+_x ) and trace( L^+_y ). Smaller is better (more similar commute-time geometry). |
Notes
- Build A = (P + P^T)/2, then normalized Laplacian L. Commute-time distances relate to entries of L^+; its trace summarizes overall connectivity.
- For speed, you can approximate using low-rank spectral sums or Hutchinson.
Metric for End-to-End Layout Quality¶
Composite TopoPreserve score from four operator-aware metrics.
Higher is better; returns ≈[0, 1] after internal normalizations.
Components
• PF1 : F1@k on top-k transition neighborhoods per row (set overlap).
Range [0,1], higher is better. (Weight-insensitive.)
• PJS : Row-wise Jensen–Shannon similarity of transitions (1 − JS, normalized).
Range [0,1], higher is better. (Weight-sensitive.)
• SP : Spectral Procrustes R^2 alignment of diffusion coordinates (average over times).
Range [0,1], higher is better. (Global/meso geometry.)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Px
|
(n, n) csr_matrix or ndarray
|
Diffusion (transition) operators to compare. |
required |
Py
|
(n, n) csr_matrix or ndarray
|
Diffusion (transition) operators to compare. |
required |
times
|
tuple of int
|
Diffusion times for Spectral Procrustes. Ignored by the other components. |
(1, 2, 4, 8)
|
r
|
int
|
Leading eigenpairs used for spectral metrics (SP internals). |
64
|
symmetric_hint
|
bool
|
If True, treat operators as symmetric for eigensolvers (stability hint). |
False
|
k_for_pf1
|
int or None
|
Top-k used in PF1. If None, each row uses its native sparsity. |
None
|
weights
|
dict
|
Mixture weights for the four components. Any NaN component is skipped and remaining weights are renormalized. |
{PF1=0.30, PJS=0.30, SP=0.30}
|
Returns:
| Name | Type | Description |
|---|---|---|
score |
float
|
Weighted average of the component scores in ≈[0,1] (higher is better). |
parts |
dict
|
{ 'PF1' : float in [0,1], 'PJS' : float in [0,1], 'SP' : float in [0,1], } |
Notes
• PF1 (set) and PJS (weight) together capture local neighborhood fidelity. • SP captures global/meso geometry via diffusion eigencoordinates. • All components are operator-native (diffusion/graph-based), aligning with TopoMAP/DM objectives more directly than raw Euclidean metrics.
Riemannian Distortion Diagnostics¶
Estimate and cache the Riemann metric of an embedding.
Wraps :func:riemann_metric with lazy, cached computation of the dual
metric H, the metric G, their SVD factors and the metric determinant.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Y
|
ndarray of shape (n_samples, n_dim)
|
Embedding coordinates. |
required |
L
|
ndarray or sparse matrix of shape (n_samples, n_samples)
|
Graph Laplacian of the embedding. |
required |
Estimate the dual Riemann metric of embedding Y from its Laplacian.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
Y
|
ndarray of shape (n_samples, n_dim_Y)
|
Embedding coordinates. |
required |
laplacian
|
ndarray or sparse matrix of shape (n_samples, n_samples)
|
Graph Laplacian of the embedding's neighborhood graph. |
required |
n_dim
|
int
|
Intrinsic dimension to retain; defaults to |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
H |
ndarray of shape (n_samples, n_dim_Y, n_dim_Y)
|
The dual (inverse) Riemann metric at each point. |
G |
ndarray
|
The Riemann metric (pseudo-inverse of |
Vh, S, Sinv : ndarray
|
Right singular vectors, singular values and inverted singular values
of |