Evaluation Metrics¶

Metrics for Comparing Two Embeddings¶

Align diffusion coordinates via orthogonal Procrustes and report R^2.

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Diffusion operators to compare.	required
`Py`	`(n, n) csr_matrix or ndarray`	Diffusion operators to compare.	required
`times`	`tuple of int`	Diffusion times; we build Φ_t for each and average the R^2.	`(1,2,4,8)`
`r`	`int`	Number of eigenpairs for coordinates.	`64`
`symmetric_hint`	`bool`	See `_top_eigs_of_P`.	`False`
`center`	`bool`	Mean-center coordinates before Procrustes.	`True`

Returns:

Name	Type	Description
`R2_avg`	`float`	Average coefficient of determination across t (clipped to [0,1]). Higher is better (1.0 means perfect alignment up to a rotation).

Compute the global score comparing an embedding to PCA.

The score is defined as exp(-(L_emb - L_pca) / L_pca) where L denotes the mean reconstruction error (global loss) of a linear projection. A score of 1 means the embedding preserves as much global structure as PCA; scores below 1 indicate worse global preservation. The result is clipped to [0, 1].

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features) or sparse matrix`	Input feature matrix.	required
`Y`	`array-like of shape (n_samples, n_components)`	Low-dimensional embedding to evaluate.	required
`Y_pca`	`array-like of shape (n_samples, n_components)`	Pre-computed PCA embedding. If None, computed from `X`.	`None`

Returns:

Name	Type	Description
`score`	`float in (0, 1]`	Global structure preservation score relative to PCA.

Compute the global score comparing an embedding to a Laplacian Eigenmap baseline.

The score is defined as exp(-(L_emb - L_lap) / L_lap) where L denotes the mean reconstruction error (global loss) of a linear projection. A score of 1 means the embedding preserves as much global structure as a Laplacian Eigenmap of the same dimension; scores below 1 indicate worse global preservation. The result is clipped to [0, 1].

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features) or sparse (n_samples, n_samples)`	Input feature matrix, or precomputed affinity graph if `data_is_graph=True`.	required
`Y`	`array-like of shape (n_samples, n_components)`	Low-dimensional embedding to evaluate.	required
`k`	`int`	Number of neighbors used by `SpectralEmbedding` when `data_is_graph=False`.	`10`
`data_is_graph`	`bool`	If True, `X` is treated as a precomputed affinity graph.	`False`
`n_jobs`	`int`	Number of parallel jobs for `SpectralEmbedding`.	`12`
`random_state`	`RandomState or int`	Random state for `SpectralEmbedding`.	`None`

Returns:

Name	Type	Description
`score`	`float in (0, 1]`	Global structure preservation score relative to Laplacian Eigenmaps.

Rank correlation between reference and embedding geodesic distances.

Builds neighborhood graphs for both data and embedding, computes geodesic distances (optionally on landmarks), and returns their rank correlation.

Parameters:

Name	Type	Description	Default
`data`	`np.ndarray of shape (n_samples, n_features) or scipy.sparse.spmatrix`	Original high-dimensional data. If square, treated as a distance/affinity matrix.	required
`emb`	`np.ndarray of shape (n_samples, n_components) or scipy.sparse.spmatrix`	Low-dimensional embedding. If square, treated as a distance/affinity matrix.	required
`landmarks`	`int or None`	Number of landmarks. If None, uses all points.	`None`
`landmark_method`	`str`	How to select landmarks ('random', 'kmeans', etc.).	`"random"`
`metric`	`str`	Distance metric for data/embedding kNN graphs.	`"euclidean"`
`n_neighbors`	`int`	Number of neighbors for kNN graphs.	`3`
`n_jobs`	`int`	Number of parallel jobs.	`-1`
`cor_method`	`str`	Rank correlation method ('spearman' or 'kendall').	`"spearman"`
`random_state`	`int or RandomState`	RNG seed.	`None`
`return_graphs`	`bool`	If True, also return the computed graphs.	`False`
`verbose`	`bool`	Verbosity flag.	`False`
`**kwargs`		Extra arguments for kNN.	`{}`

Returns:

Name	Type	Description
`score`	`float`	Rank correlation in [-1, 1]. Higher is better.
	`(data_graph, emb_graph) : tuple, optional`	If return_graphs=True, also return the neighborhood graphs.

Spearman rank correlation between reference and embedding geodesic distances.

Parameters:

Name	Type	Description	Default
`data_graph`	`scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)`	Reference distance/affinity graph or matrix.	required
`embedding_graph`	`scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)`	Embedding distance/affinity graph or matrix.	required
`path_method`	`str`	Geodesic distance method ('D' for shortest path, etc.).	`"D"`
`subsample_idx`	`np.ndarray of shape (n_subsample,)`	Indices to subsample before computing correlation. If None, uses all points.	`None`
`unweighted`	`bool`	If True, compute unweighted shortest paths.	`False`
`n_jobs`	`int`	Number of jobs for parallel processing.	`1`

Returns:

Name	Type	Description
`score`	`float`	Spearman rank correlation coefficient in [-1, 1]. Higher is better.

Kendall's tau rank correlation between reference and embedding geodesic distances.

Parameters:

Name	Type	Description	Default
`data_graph`	`scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)`	Reference distance/affinity graph or matrix.	required
`embedding_graph`	`scipy.sparse.spmatrix or np.ndarray of shape (n_samples, n_samples)`	Embedding distance/affinity graph or matrix.	required
`path_method`	`str`	Geodesic distance method ('D' for shortest path, etc.).	`"D"`
`subsample_idx`	`np.ndarray of shape (n_subsample,)`	Indices to subsample before computing correlation. If None, uses all points.	`None`
`unweighted`	`bool`	If True, compute unweighted shortest paths.	`False`
`n_jobs`	`int`	Number of jobs for parallel processing.	`1`

Returns:

Name	Type	Description
`score`	`float`	Kendall's tau coefficient in [-1, 1]. Higher is better.

Metrics for Comparing Two Sparse Graphs/Operators¶

Row-wise Jensen–Shannon (JS) similarity between two diffusion operators.

Given two (row-stochastic) operators Px and Py (csr_matrices or ndarrays), we compare each row i as a discrete probability distribution over columns (neighbors) and compute the Jensen–Shannon divergence JS(p_i, q_i). We then report a bounded similarity in [0, 1] via:

JS-similarity = 1 - mean_i JS(p_i, q_i)

where JS(·,·) = 0 for identical distributions and ≤ log(2) in nats; we use the standard normalized/base-e version implemented in scipy.

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)	required
`Py`	`(n, n) csr_matrix or ndarray`	Row-stochastic diffusion/transition operators to compare. (They need not be strictly stochastic; rows are renormalized internally.)	required
`eps`	`float`	Small positive value added before per-row renormalization for numerical stability.	`1e-12`
`topk`	`int or None`	If provided, restrict each row to its top-k entries by probability mass before comparing (helps robustness on very sparse / noisy rows). If None, we compare using the full sparse support (union of supports).	`None`
`return_per_row`	`bool`	If True, also return the per-row JS similarities (1 - JS_i) as a 1D array.	`False`

Returns:

Name	Type	Description
`sim`	`float in [0, 1]`	1 - mean(JS divergence) across rows (higher is better).
`per_row`	`(ndarray, optional)`	Returned only if `return_per_row=True`. Per-row (1 - JS_i) scores.

Notes

• Operates sparsely: for each row we build vectors over the union of the supports of Px[i, :] and Py[i, :] (unless topk is set, in which case supports are first truncated to top-k). • Sensitive to weights (transition probabilities), unlike set-overlap metrics. • If a row is empty in both operators, it is skipped.

Operator-level set overlap: F1 of top-k transition neighborhoods per row.

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Diffusion/transition operators.	required
`Py`	`(n, n) csr_matrix or ndarray`	Diffusion/transition operators.	required
`k`	`int or None`	Number of neighbors (by transition probability) to keep per row. If None, use the natural sparsity (nnz per row) or min(nnz_x, nnz_y).	`None`

Returns:

Name	Type	Description
`f1_avg`	`float in [0, 1]`	Mean F1 score across rows. 1.0 → identical sparse neighborhoods.

Notes

Build per-row sets of top-k columns by probability mass, then compute F1 = 2 |∩| / (|Sx| + |Sy|).
Complements JS: insensitive to weights but sensitive to support overlap.

Local geometry agreement via Rank-Biased Overlap (RBO) of diffusion neighbors.

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Diffusion operators to compare.	required
`Py`	`(n, n) csr_matrix or ndarray`	Diffusion operators to compare.	required
`times`	`tuple of int`	Diffusion timescales; the score is averaged across them.	`(1,2,4,8)`
`r`	`int`	Number of leading eigenpairs to use for diffusion distances.	`64`
`p`	`float in (0, 1)`	Persistence parameter of RBO. Values closer to 1 give more weight to deeper ranks; smaller values emphasize the very top neighbors.	`0.9`
`k_max`	`int`	Maximum depth of neighbor lists to compare. Truncates infinite RBO.	`100`
`symmetric_hint`	`bool`	See `diffusion_eigs`.	`False`

Returns:

Name	Type	Description
`score`	`float in [0, 1]`	Average RBO similarity across timescales. 1.0 means identical ranked neighbor lists under diffusion distances.

Notes

RBO compares ordered neighbor lists, unlike kNN overlap which ignores order.
Especially useful to assess whether diffusion rankings (top-100 neighbors) are preserved, not just set membership.
Parameter p controls top-heaviness: p=0.9 gives ≈86% of weight to top-10.

Metrics for Evaluating Laplacian Spectra¶

Operator-level spectral agreement via eigenvalues and subspaces.

Returns:

Type	Description
`If return_details=False:`	float in [0,1]: cosine of the largest principal angle between the top-r eigenspaces (higher is better).
`If return_details=True:`	dict with {'eigenvalue_w1', 'subspace_cos'}.

Global connectivity gap via (approximate) trace of Laplacian pseudoinverse.

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Diffusion operators (will be symmetrized to build Laplacians).	required
`Py`	`(n, n) csr_matrix or ndarray`	Diffusion operators (will be symmetrized to build Laplacians).	required
`r`	`int`	If using low-rank approximation, number of leading modes for the trace.	`64`
`symmetric_hint`	`bool`	See `diffusion_eigs`.	`False`
`hutchinson_probes`	`int or None`	If provided, estimate trace via Hutchinson’s method with this many random probe vectors (for very large graphs).	`None`
`random_state`	`int or RandomState or None`	RNG for Hutchinson probes.	`None`

Returns:

Name	Type	Description
`gap`	`float >= 0`	Absolute difference between (approximate) trace( L^+_x ) and trace( L^+_y ). Smaller is better (more similar commute-time geometry).

Notes

Build A = (P + P^T)/2, then normalized Laplacian L. Commute-time distances relate to entries of L^+; its trace summarizes overall connectivity.
For speed, you can approximate using low-rank spectral sums or Hutchinson.

Metric for End-to-End Layout Quality¶

Composite TopoPreserve score from four operator-aware metrics.

Higher is better; returns ≈[0, 1] after internal normalizations.

Components

• PF1 : F1@k on top-k transition neighborhoods per row (set overlap). Range [0,1], higher is better. (Weight-insensitive.) • PJS : Row-wise Jensen–Shannon similarity of transitions (1 − JS, normalized). Range [0,1], higher is better. (Weight-sensitive.) • SP : Spectral Procrustes R^2 alignment of diffusion coordinates (average over times). Range [0,1], higher is better. (Global/meso geometry.)

Parameters:

Name	Type	Description	Default
`Px`	`(n, n) csr_matrix or ndarray`	Diffusion (transition) operators to compare.	required
`Py`	`(n, n) csr_matrix or ndarray`	Diffusion (transition) operators to compare.	required
`times`	`tuple of int`	Diffusion times for Spectral Procrustes. Ignored by the other components.	`(1, 2, 4, 8)`
`r`	`int`	Leading eigenpairs used for spectral metrics (SP internals).	`64`
`symmetric_hint`	`bool`	If True, treat operators as symmetric for eigensolvers (stability hint).	`False`
`k_for_pf1`	`int or None`	Top-k used in PF1. If None, each row uses its native sparsity.	`None`
`weights`	`dict`	Mixture weights for the four components. Any NaN component is skipped and remaining weights are renormalized.	`{PF1=0.30, PJS=0.30, SP=0.30}`

Returns:

Name	Type	Description
`score`	`float`	Weighted average of the component scores in ≈[0,1] (higher is better).
`parts`	`dict`	{ 'PF1' : float in [0,1], 'PJS' : float in [0,1], 'SP' : float in [0,1], }

Notes

• PF1 (set) and PJS (weight) together capture local neighborhood fidelity. • SP captures global/meso geometry via diffusion eigencoordinates. • All components are operator-native (diffusion/graph-based), aligning with TopoMAP/DM objectives more directly than raw Euclidean metrics.

Riemannian Distortion Diagnostics¶

Estimate and cache the Riemann metric of an embedding.

Wraps :func:riemann_metric with lazy, cached computation of the dual metric H, the metric G, their SVD factors and the metric determinant.

Parameters:

Name	Type	Description	Default
`Y`	`ndarray of shape (n_samples, n_dim)`	Embedding coordinates.	required
`L`	`ndarray or sparse matrix of shape (n_samples, n_samples)`	Graph Laplacian of the embedding.	required

fit ¶

fit(Y, L=None)

Recompute the metric for embedding Y (and optional Laplacian L).

transform ¶

transform(Y, L=None)

Alias of :meth:fit for scikit-learn-style call sites.

Estimate the dual Riemann metric of embedding Y from its Laplacian.

Parameters:

Name	Type	Description	Default
`Y`	`ndarray of shape (n_samples, n_dim_Y)`	Embedding coordinates.	required
`laplacian`	`ndarray or sparse matrix of shape (n_samples, n_samples)`	Graph Laplacian of the embedding's neighborhood graph.	required
`n_dim`	`int`	Intrinsic dimension to retain; defaults to `Y.shape[1]`.	`None`

Returns:

Name	Type	Description
`H`	`ndarray of shape (n_samples, n_dim_Y, n_dim_Y)`	The dual (inverse) Riemann metric at each point.
`G`	`ndarray`	The Riemann metric (pseudo-inverse of `H`) at each point.
	`Vh, S, Sinv : ndarray`	Right singular vectors, singular values and inverted singular values of `H` from its per-point SVD.