Skip to content

Evaluation Metrics

Metrics for Comparing Two Embeddings

Align diffusion coordinates via orthogonal Procrustes and report R^2.

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Diffusion operators to compare.

required
Py (n, n) csr_matrix or ndarray

Diffusion operators to compare.

required
times tuple of int

Diffusion times; we build Φ_t for each and average the R^2.

(1,2,4,8)
r int

Number of eigenpairs for coordinates.

64
symmetric_hint bool

See _top_eigs_of_P.

False
center bool

Mean-center coordinates before Procrustes.

True

Returns:

Name Type Description
R2_avg float

Average coefficient of determination across t (clipped to [0,1]). Higher is better (1.0 means perfect alignment up to a rotation).

Compute the global score comparing an embedding to PCA.

The score is defined as exp(-(L_emb - L_pca) / L_pca) where L denotes the mean reconstruction error (global loss) of a linear projection. A score of 1 means the embedding preserves as much global structure as PCA; scores below 1 indicate worse global preservation. The result is clipped to [0, 1].

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features) or sparse matrix

Input feature matrix.

required
Y array-like of shape (n_samples, n_components)

Low-dimensional embedding to evaluate.

required
Y_pca array-like of shape (n_samples, n_components)

Pre-computed PCA embedding. If None, computed from X.

None

Returns:

Name Type Description
score float in (0, 1]

Global structure preservation score relative to PCA.

Compute the global score comparing an embedding to a Laplacian Eigenmap baseline.

The score is defined as exp(-(L_emb - L_lap) / L_lap) where L denotes the mean reconstruction error (global loss) of a linear projection. A score of 1 means the embedding preserves as much global structure as a Laplacian Eigenmap of the same dimension; scores below 1 indicate worse global preservation. The result is clipped to [0, 1].

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features) or sparse (n_samples, n_samples)

Input feature matrix, or precomputed affinity graph if data_is_graph=True.

required
Y array-like of shape (n_samples, n_components)

Low-dimensional embedding to evaluate.

required
k int

Number of neighbors used by SpectralEmbedding when data_is_graph=False.

10
data_is_graph bool

If True, X is treated as a precomputed affinity graph.

False
n_jobs int

Number of parallel jobs for SpectralEmbedding.

12
random_state RandomState or int

Random state for SpectralEmbedding.

None

Returns:

Name Type Description
score float in (0, 1]

Global structure preservation score relative to Laplacian Eigenmaps.

Rank correlation between reference and embedding geodesic distances.

Builds neighborhood graphs for both data and embedding, computes geodesic distances (optionally on landmarks), and returns their rank correlation.

Parameters:

Name Type Description Default
data np.ndarray of shape (n_samples, n_features) or scipy.sparse.spmatrix

Original high-dimensional data. If square, treated as a distance/affinity matrix.

required
emb np.ndarray of shape (n_samples, n_components) or scipy.sparse.spmatrix

Low-dimensional embedding. If square, treated as a distance/affinity matrix.

required
landmarks int or None

Number of landmarks. If None, uses all points.

None
landmark_method str

How to select landmarks ('random', 'kmeans', etc.).

"random"
metric str

Distance metric for data/embedding kNN graphs.

"euclidean"
n_neighbors int

Number of neighbors for kNN graphs.

3
n_jobs int

Number of parallel jobs.

-1
cor_method str

Rank correlation method ('spearman' or 'kendall').

"spearman"
random_state int or RandomState

RNG seed.

None
return_graphs bool

If True, also return the computed graphs.

False
verbose bool

Verbosity flag.

False
**kwargs

Extra arguments for kNN.

{}

Returns:

Name Type Description
score float

Rank correlation in [-1, 1]. Higher is better.

(data_graph, emb_graph) : tuple, optional

If return_graphs=True, also return the neighborhood graphs.

Spearman rank correlation between reference and embedding geodesic distances.

Parameters:

Name Type Description Default
data_graph scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)

Reference distance/affinity graph or matrix.

required
embedding_graph scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)

Embedding distance/affinity graph or matrix.

required
path_method str

Geodesic distance method ('D' for shortest path, etc.).

"D"
subsample_idx np.ndarray of shape (n_subsample,)

Indices to subsample before computing correlation. If None, uses all points.

None
unweighted bool

If True, compute unweighted shortest paths.

False
n_jobs int

Number of jobs for parallel processing.

1

Returns:

Name Type Description
score float

Spearman rank correlation coefficient in [-1, 1]. Higher is better.

Kendall's tau rank correlation between reference and embedding geodesic distances.

Parameters:

Name Type Description Default
data_graph scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)

Reference distance/affinity graph or matrix.

required
embedding_graph scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)

Embedding distance/affinity graph or matrix.

required
path_method str

Geodesic distance method ('D' for shortest path, etc.).

"D"
subsample_idx np.ndarray of shape (n_subsample,)

Indices to subsample before computing correlation. If None, uses all points.

None
unweighted bool

If True, compute unweighted shortest paths.

False
n_jobs int

Number of jobs for parallel processing.

1

Returns:

Name Type Description
score float

Kendall's tau coefficient in [-1, 1]. Higher is better.

Metrics for Comparing Two Sparse Graphs/Operators

Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.

Given two (row-stochastic) operators Px and Py (csr_matrices or ndarrays), we compare each row i as a discrete probability distribution over columns (neighbors) and compute the Jensen–Shannon divergence JS(p_i, q_i). We then report a bounded similarity in [0, 1] via:

JS-similarity = 1 - mean_i JS(p_i, q_i)

where JS(·,·) = 0 for identical distributions and ≤ log(2) in nats; we use the standard normalized/base-e version implemented in scipy.

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)

required
Py (n, n) csr_matrix or ndarray

Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)

required
eps float

Small positive value added before per-row renormalization for numerical stability.

1e-12
topk int or None

If provided, restrict each row to its top-k entries by probability mass before comparing (helps robustness on very sparse / noisy rows). If None, we compare using the full sparse support (union of supports).

None
return_per_row bool

If True, also return the per-row JS similarities (1 - JS_i) as a 1D array.

False

Returns:

Name Type Description
sim float in [0, 1]

1 - mean(JS divergence) across rows (higher is better).

per_row (ndarray, optional)

Returned only if return_per_row=True. Per-row (1 - JS_i) scores.

Notes

• Operates sparsely: for each row we build vectors over the union of the supports of Px[i, :] and Py[i, :] (unless topk is set, in which case supports are first truncated to top-k). • Sensitive to weights (transition probabilities), unlike set-overlap metrics. • If a row is empty in both operators, it is skipped.

Operator-level set overlap: F1 of top-k transition neighborhoods per row.

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Diffusion/transition operators.

required
Py (n, n) csr_matrix or ndarray

Diffusion/transition operators.

required
k int or None

Number of neighbors (by transition probability) to keep per row. If None, use the natural sparsity (nnz per row) or min(nnz_x, nnz_y).

None

Returns:

Name Type Description
f1_avg float in [0, 1]

Mean F1 score across rows. 1.0 → identical sparse neighborhoods.

Notes
  • Build per-row sets of top-k columns by probability mass, then compute F1 = 2 |∩| / (|Sx| + |Sy|).
  • Complements JS: insensitive to weights but sensitive to support overlap.

Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Diffusion operators to compare.

required
Py (n, n) csr_matrix or ndarray

Diffusion operators to compare.

required
times tuple of int

Diffusion timescales; the score is averaged across them.

(1,2,4,8)
r int

Number of leading eigenpairs to use for diffusion distances.

64
p float in (0, 1)

Persistence parameter of RBO. Values closer to 1 give more weight to deeper ranks; smaller values emphasize the very top neighbors.

0.9
k_max int

Maximum depth of neighbor lists to compare. Truncates infinite RBO.

100
symmetric_hint bool

See diffusion_eigs.

False

Returns:

Name Type Description
score float in [0, 1]

Average RBO similarity across timescales. 1.0 means identical ranked neighbor lists under diffusion distances.

Notes
  • RBO compares ordered neighbor lists, unlike kNN overlap which ignores order.
  • Especially useful to assess whether diffusion rankings (top-100 neighbors) are preserved, not just set membership.
  • Parameter p controls top-heaviness: p=0.9 gives ≈86% of weight to top-10.

Metrics for Evaluating Laplacian Spectra

Operator-level spectral agreement via eigenvalues and subspaces.

Returns:

Type Description
If return_details=False:

float in [0,1]: cosine of the largest principal angle between the top-r eigenspaces (higher is better).

If return_details=True:

dict with {'eigenvalue_w1', 'subspace_cos'}.

Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Diffusion operators (will be symmetrized to build Laplacians).

required
Py (n, n) csr_matrix or ndarray

Diffusion operators (will be symmetrized to build Laplacians).

required
r int

If using low-rank approximation, number of leading modes for the trace.

64
symmetric_hint bool

See diffusion_eigs.

False
hutchinson_probes int or None

If provided, estimate trace via Hutchinson’s method with this many random probe vectors (for very large graphs).

None
random_state int or RandomState or None

RNG for Hutchinson probes.

None

Returns:

Name Type Description
gap float >= 0

Absolute difference between (approximate) trace( L^+_x ) and trace( L^+_y ). Smaller is better (more similar commute-time geometry).

Notes
  • Build A = (P + P^T)/2, then normalized Laplacian L. Commute-time distances relate to entries of L^+; its trace summarizes overall connectivity.
  • For speed, you can approximate using low-rank spectral sums or Hutchinson.

Metric for End-to-End Layout Quality

Composite TopoPreserve score from four operator-aware metrics.

Higher is better; returns ≈[0, 1] after internal normalizations.

Components

• PF1 : F1@k on top-k transition neighborhoods per row (set overlap). Range [0,1], higher is better. (Weight-insensitive.) • PJS : Row-wise Jensen–Shannon similarity of transitions (1 − JS, normalized). Range [0,1], higher is better. (Weight-sensitive.) • SP : Spectral Procrustes R^2 alignment of diffusion coordinates (average over times). Range [0,1], higher is better. (Global/meso geometry.)

Parameters:

Name Type Description Default
Px (n, n) csr_matrix or ndarray

Diffusion (transition) operators to compare.

required
Py (n, n) csr_matrix or ndarray

Diffusion (transition) operators to compare.

required
times tuple of int

Diffusion times for Spectral Procrustes. Ignored by the other components.

(1, 2, 4, 8)
r int

Leading eigenpairs used for spectral metrics (SP internals).

64
symmetric_hint bool

If True, treat operators as symmetric for eigensolvers (stability hint).

False
k_for_pf1 int or None

Top-k used in PF1. If None, each row uses its native sparsity.

None
weights dict

Mixture weights for the four components. Any NaN component is skipped and remaining weights are renormalized.

{PF1=0.30, PJS=0.30, SP=0.30}

Returns:

Name Type Description
score float

Weighted average of the component scores in ≈[0,1] (higher is better).

parts dict

{ 'PF1' : float in [0,1], 'PJS' : float in [0,1], 'SP' : float in [0,1], }

Notes

• PF1 (set) and PJS (weight) together capture local neighborhood fidelity. • SP captures global/meso geometry via diffusion eigencoordinates. • All components are operator-native (diffusion/graph-based), aligning with TopoMAP/DM objectives more directly than raw Euclidean metrics.

Riemannian Distortion Diagnostics

Estimate and cache the Riemann metric of an embedding.

Wraps :func:riemann_metric with lazy, cached computation of the dual metric H, the metric G, their SVD factors and the metric determinant.

Parameters:

Name Type Description Default
Y ndarray of shape (n_samples, n_dim)

Embedding coordinates.

required
L ndarray or sparse matrix of shape (n_samples, n_samples)

Graph Laplacian of the embedding.

required

fit

fit(Y, L=None)

Recompute the metric for embedding Y (and optional Laplacian L).

transform

transform(Y, L=None)

Alias of :meth:fit for scikit-learn-style call sites.

Estimate the dual Riemann metric of embedding Y from its Laplacian.

Parameters:

Name Type Description Default
Y ndarray of shape (n_samples, n_dim_Y)

Embedding coordinates.

required
laplacian ndarray or sparse matrix of shape (n_samples, n_samples)

Graph Laplacian of the embedding's neighborhood graph.

required
n_dim int

Intrinsic dimension to retain; defaults to Y.shape[1].

None

Returns:

Name Type Description
H ndarray of shape (n_samples, n_dim_Y, n_dim_Y)

The dual (inverse) Riemann metric at each point.

G ndarray

The Riemann metric (pseudo-inverse of H) at each point.

Vh, S, Sinv : ndarray

Right singular vectors, singular values and inverted singular values of H from its per-point SVD.