Advanced Estimators (Custom Pipelines)¶

These scikit-learn compatible transformers can be used individually to build custom manifold learning pipelines.

The Kernel(..., fuzzy=True) path delegates UMAP-specific fuzzy simplicial-set construction to umap-learn; TopoMetry uses the resulting graph inside its own spectral-scaffold and graph-refinement pipeline.

The Kernel(..., cknn=True) path builds the paper-defined binary CkNN graph: samples are adjacent when d(i, j) < delta * sqrt(rho_i * rho_j). This is an unweighted graph construction, not a weighted adaptive kernel; use cknn_ratio_matrix separately when normalized CkNN distance ratios are needed.

topo.base.ann.kNN ¶

kNN(
    X: ndarray | csr_matrix,
    Y: ndarray | csr_matrix | None = None,
    n_neighbors: int | float | str = 5,
    metric: str = "euclidean",
    n_jobs: int = -1,
    backend: str = "sklearn",
    M: int = 60,
    efC: int = 200,
    efS: int = 200,
    verbose: bool = False,
    **kwargs: Any,
) -> csr_matrix

Compute a k-nearest-neighbor distance graph.

Parameters:

Name	Type	Description	Default
`X`	`array-like or scipy.sparse matrix, shape (n_samples, n_features)`	Reference data used to fit the neighbor index. If `metric='precomputed'`, `X` must be a square distance matrix.	required
`Y`	`array-like or scipy.sparse matrix`	Query data. If provided, the output graph has shape `(Y.shape[0], X.shape[0])`.	`None`
`n_neighbors`	`int`	Number of non-self neighbors to return for self-query graphs.	`5`
`metric`	`str`	Distance metric.	`'euclidean'`
`n_jobs`	`int`	Number of threads. `-1` uses all available CPUs.	`-1`
`backend`	`(sklearn, hnswlib)`	Neighbor-search backend.	`'sklearn'`
`verbose`	`bool`	Emit backend/timing diagnostics through logging.	`False`

Returns:

Type	Description
`csr_matrix`	kNN distance graph.

topo.tpgraph.cknn.cknn_graph ¶

cknn_graph(
    X,
    *,
    scale_k: int = 10,
    delta: float = 1.0,
    metric: str = "euclidean",
    candidate_k: int | None = None,
    exact: bool = False,
    include_self: bool = False,
    symmetrize: SymmetrizeMode = "or",
    backend: str = "sklearn",
    n_jobs: int = -1,
    verbose: bool = False,
    **kwargs,
) -> csr_matrix

Build the unweighted Continuous k-Nearest Neighbors graph.

Two samples i and j are connected when d(i, j) < delta * sqrt(rho_i * rho_j), where rho_i is the distance from sample i to its scale_k-th nearest neighbor. Exact mode thresholds all pairs. Candidate-neighbor mode is scalable but may miss edges if candidate_k is too small.

Introduced by Berry & Sauer, "Consistent manifold representation for topological data analysis" (https://arxiv.org/abs/1606.02353). The binary CkNN graph with the unnormalized Laplacian gives a consistent estimate of the Laplace-Beltrami operator regardless of sampling density.

Parameters:

Name	Type	Description	Default
`X`	`array-like or sparse matrix of shape (n_samples, n_features)`	Input data, or a square distance matrix when `metric='precomputed'`.	required
`scale_k`	`int`	Neighbor rank used for the local scale `rho_i`.	`10`
`delta`	`float`	Connection threshold; larger values produce denser graphs.	`1.0`
`metric`	`str`	Distance metric, or `'precomputed'`.	`'euclidean'`
`candidate_k`	`int`	Number of candidate neighbors searched per sample. Defaults to `max(3 * scale_k, scale_k + 15)`.	`None`
`exact`	`bool`	Threshold all pairwise distances instead of candidate neighbors only. Quadratic in memory; use for small datasets or validation.	`False`
`include_self`	`bool`	Whether to keep self-loops.	`False`
`symmetrize`	`('or', 'and', none)`	How to symmetrize the directed candidate relation.	`'or'`
`backend`	`(sklearn, hnswlib)`	Neighbor-search backend for candidate retrieval.	`'sklearn'`
`n_jobs`	`int`	Number of threads. `-1` uses all available CPUs.	`-1`
`verbose`	`bool`	Emit search diagnostics through logging.	`False`
`**kwargs`		Forwarded to :func:`topo.base.ann.kNN`.	`{}`

Returns:

Type	Description
`csr_matrix`	Binary (0/1) adjacency matrix of shape `(n_samples, n_samples)`.

topo.tpgraph.cknn.cknn_ratio_matrix ¶

cknn_ratio_matrix(
    X,
    *,
    scale_k: int = 10,
    metric: str = "euclidean",
    candidate_k: int | None = None,
    exact: bool = False,
    symmetrize: SymmetrizeMode = "or",
    backend: str = "sklearn",
    n_jobs: int = -1,
    verbose: bool = False,
    **kwargs,
) -> csr_matrix

Return sparse CkNN normalized distance ratios.

Entry (i, j) is d(i, j) / sqrt(rho_i * rho_j). This matrix is useful for persistence/order diagnostics; it is not the binary CkNN adjacency used for the unnormalized graph Laplacian. Ratios are always computed without self-loops.

topo.tpgraph.kernels.compute_kernel ¶

compute_kernel(
    X,
    metric: str = ...,
    n_neighbors: int = ...,
    fuzzy: bool = ...,
    cknn: bool = ...,
    cknn_delta: float = ...,
    cknn_candidate_neighbors: int | None = ...,
    cknn_exact: bool = ...,
    pairwise: bool = ...,
    sigma: float | None = ...,
    adaptive_bw: bool = ...,
    expand_nbr_search: bool = ...,
    alpha_decaying: bool = ...,
    *,
    return_densities: Literal[False] = ...,
    symmetrize: bool = ...,
    backend: str = ...,
    n_jobs: int = ...,
    verbose: bool = ...,
    use_angular: bool = ...,
    square_distances: bool = ...,
    random_state: int | RandomState | None = ...,
    **kwargs,
) -> csr_matrix

compute_kernel(
    X,
    metric: str = ...,
    n_neighbors: int = ...,
    fuzzy: bool = ...,
    cknn: bool = ...,
    cknn_delta: float = ...,
    cknn_candidate_neighbors: int | None = ...,
    cknn_exact: bool = ...,
    pairwise: bool = ...,
    sigma: float | None = ...,
    adaptive_bw: bool = ...,
    expand_nbr_search: bool = ...,
    alpha_decaying: bool = ...,
    *,
    return_densities: Literal[True],
    symmetrize: bool = ...,
    backend: str = ...,
    n_jobs: int = ...,
    verbose: bool = ...,
    use_angular: bool = ...,
    square_distances: bool = ...,
    random_state: int | RandomState | None = ...,
    **kwargs,
) -> tuple[
    csr_matrix, dict[str, csr_matrix | np.ndarray | int]
]

compute_kernel(
    X,
    metric: str = "cosine",
    n_neighbors: int = 10,
    fuzzy: bool = False,
    cknn: bool = False,
    cknn_delta: float = 1.0,
    cknn_candidate_neighbors: int | None = None,
    cknn_exact: bool = False,
    pairwise: bool = False,
    sigma: float | None = None,
    adaptive_bw: bool = True,
    expand_nbr_search: bool = False,
    alpha_decaying: bool = False,
    *,
    return_densities: bool = False,
    symmetrize: bool = True,
    backend: str = "hnswlib",
    n_jobs: int = -1,
    verbose: bool = False,
    use_angular: bool = True,
    square_distances: bool = True,
    random_state: int | RandomState | None = None,
    **kwargs,
) -> (
    csr_matrix
    | tuple[
        csr_matrix, dict[str, csr_matrix | np.ndarray | int]
    ]
)

Compute a kernel matrix from a set of points.

Parameters:

Name	Type	Description	Default
`X`	`array-like, shape (n_samples, n_features).`	The set of points to compute the kernel matrix for. Accepts np.ndarrays and scipy.sparse matrices. If precomputed, assumed to be a square symmetric semidefinite matrix.	required
`metric`	`string`	The metric to use when computing the kernel matrix. Possible values are: 'cosine', 'euclidean' and others. Accepts precomputed distances.	`'cosine'`
`n_neighbors`	`int`	The number of neighbors to use when computing the kernel matrix. Ignored if `pairwise` set to `True`.	`10`
`fuzzy`	`bool`	Whether to build the kernel matrix using fuzzy simplicial sets, similarly to UMAP. If set to `True`, the `pairwise`, `sigma`, `adaptive_bw`, `expand_nbr_search` and `alpha_decaying` parameters are ignored. If set to `True` at the same time that `cknn` is set to `True`, the `cknn` parameter is ignored.	`False`
`cknn`	`bool`	Whether to build the binary, unweighted continuous k-nearest-neighbors graph. If set to `True`, the `pairwise`, `sigma`, `adaptive_bw`, `expand_nbr_search` and `alpha_decaying` parameters are ignored.	`False`
`cknn_delta`	`float`	Unitless CkNN edge threshold. Ignored if `cknn` is `False`.	`1.0`
`cknn_candidate_neighbors`	`int or None`	Number of candidate neighbors tested in approximate CkNN mode. Ignored when `cknn_exact=True`.	`None`
`cknn_exact`	`bool`	If True, threshold all pairwise distances for CkNN construction.	`False`
`pairwise`	`bool`	Whether to compute the kernel using dense pairwise distances. If set to `True`, the `n_neighbors` and `backend` parameters are ignored. Uses `numba` for computations if available. If not, uses `sklearn`.	`False`
`sigma`	`float`	Scaling factor if using fixed bandwidth kernel (only used if `adaptive_bw` is set to `False`).	`None`
`adaptive_bw`	`bool`	Whether to use an adaptive bandwidth based on the distance to median k-nearest-neighbor.	`True`
`expand_nbr_search`	`bool`	Whether to expand the neighborhood search (mitigates a choice of too small a number of k-neighbors).	`False`
`alpha_decaying`	`bool`	Whether to use an adaptively decaying kernel.	`False`
`return_densities`	`bool`	Whether to return the bandwidth metrics as a dictinary. If set to `True`, the function returns a tuple containing the kernel matrix and a dictionary containing the bandwidth metric.	`False`
`symmetrize`	`bool`	Whether to symmetrize the kernel matrix after normalizations.	`True`
`backend`	`str`	Which backend to use for neighborhood computations. Defaults to 'hnswlib'. Options are 'hnswlib' and 'sklearn'.	`'hnswlib'`
`n_jobs`	`int`	The number of jobs to use for parallel computations. If set to -1, all available cores are used.	`1`
`verbose`	`bool`	Whether to print progress messages.	`False`
`**kwargs`	`dict`	Additional arguments to be passed to the nearest-neighbors backend.	`{}`

Returns:

Name	Type	Description
`K`	`(array - like, shape(n_samples, n_samples))`	The kernel matrix.
`densities`	dict, optional (if `return_densities` is set to `True`)	If `fuzzy` and `cknn` are `False`, is a dictionary containing the bandwidth metrics. If `fuzzy` is set to `True`, the dictionary contains sigma and rho estimates. If `cknn` is set to `True`, the dictionary contains the binary adjacency.

topo.tpgraph.kernels.Kernel ¶

Kernel(
    metric="cosine",
    n_neighbors=10,
    fuzzy=False,
    cknn=False,
    cknn_delta=1.0,
    cknn_candidate_neighbors=None,
    cknn_exact=False,
    pairwise=False,
    sigma=None,
    adaptive_bw=True,
    expand_nbr_search=False,
    alpha_decaying=False,
    symmetrize=True,
    backend="hnswlib",
    n_jobs=1,
    laplacian_type="normalized",
    anisotropy=1.0,
    semi_aniso=False,
    n_landmarks=None,
    cache_input=False,
    verbose=False,
    random_state=None,
    use_angular=True,
)

Bases: BaseEstimator, TransformerMixin

Scikit-learn flavored estimator for computing a kernel matrix from points.

Includes functions for computing the kernel matrix with a variety of methods (adaptive bandwidth, fuzzy simplicial sets, continuous k-nearest-neighbors, etc) and performing operations on the resulting graph, such as obtaining its Laplacian, sparsifying it, filtering and interpolating signals, obtaining diffusion operators, imputing missing values and computing shortest paths.

Use this only when you want to manually construct a graph or diffusion operator.

Example

from topo.tpgraph import Kernel
from topo.spectral import EigenDecomposition
from topo.layouts import Projector
kernel = Kernel(n_neighbors=30, metric="cosine").fit(X)
Z = EigenDecomposition(n_components=64, method="msDM").fit_transform(kernel)
Y = Projector(projection_method="MAP").fit_transform(Z)

Parameters:

Name	Type	Description	Default
`metric`	`str`	The metric to use when computing the kernel matrix. Possible values are: 'cosine', 'euclidean' and others, depending on the chosen nearest-neighbors backend. Accepts precomputed distances as 'precomputed'.	`'cosine'`
`n_neighbors`	`int`	The number of neighbors to use when computing the kernel matrix. Ignored if `pairwise` set to `True`.	`10`
`fuzzy`	`bool`	Whether to build the kernel matrix using fuzzy simplicial sets, similarly to UMAP. If set to `True`, the `pairwise`, `sigma`, `adaptive_bw`, `expand_nbr_search` and `alpha_decaying` parameters are ignored. If set to `True` at the same time that `cknn` is set to `True`, the `cknn` parameter is ignored.	`False`
`cknn`	`bool`	Whether to build the binary, unweighted continuous k-nearest-neighbors graph. If set to `True`, the `pairwise`, `sigma`, `adaptive_bw`, `expand_nbr_search` and `alpha_decaying` parameters are ignored.	`False`
`pairwise`	`bool`	Whether to compute the kernel using dense pairwise distances. If set to `True`, the `n_neighbors` and `backend` parameters are ignored. Uses `numba` for computations if available. If not, uses `sklearn`.	`False`
`sigma`	`float or None`	Scaling factor if using fixed bandwidth kernel (only used if `adaptive_bw` is set to `False`).	`None`
`adaptive_bw`	`bool`	Whether to use an adaptive bandwidth based on the distance to median k-nearest-neighbor.	`True`
`expand_nbr_search`	`bool`	Whether to expand the neighborhood search (mitigates a choice of too small a number of k-neighbors).	`False`
`alpha_decaying`	`bool`	Whether to use an adaptively decaying kernel.	`False`
`anisotropy`	`float`	The anisotropy (alpha) parameter in the diffusion maps literature for kernel reweighting.	`1.0`
`semi_aniso`	`bool`	Whether to use semi-anisotropic diffusion. This reweights the original kernel (not the renormalized kernel) by the renormalized degree.	`False`
`symmetrize`	`bool`	Whether to symmetrize the kernel matrix after normalizations.	`True`
`backend`	`(hnswlib, sklearn)`	Which backend to use for k-nearest-neighbor computations. Defaults to 'hnswlib'. Options are 'hnswlib' and 'sklearn'.	`"hnswlib"`
`n_jobs`	`int`	The number of jobs to use for parallel computations. If -1, all CPUs are used. Parallellization (multiprocessing) is *highly* recommended whenever possible.	`-1`
`laplacian_type`	`(normalized, unnormalized, random_walk)`	The type of laplacian to use.	`"normalized"`

Fitted Attributes

knn_ : scipy.sparse.csr_matrix, shape (n_samples, n_samples) The computed k-nearest-neighbors graph.

_K : scipy.sparse.csr_matrix, shape (n_samples, n_samples) The kernel matrix.

dens_dict : dict Dictionary containing the density information for each point in the dataset for the computed kernel.

L : scipy.sparse.csr_matrix, shape (n_samples, n_samples) The laplacian matrix of the kernel graph.

P : scipy.sparse.csr_matrix, shape (n_samples, n_samples) The diffusion operator of the kernel graph.

SP : scipy.sparse.csr_matrix, shape (n_samples, n_samples) The shortest-path matrix.

degree : np.ndarray, shape (n_samples,) The degree of each point in the adjacency graph.

weighted_degree : np.ndarray, shape (n_samples,) The weighted degree of each point in the kernel graph.

knn `property` ¶

knn

The k-nearest-neighbors graph (sparse distance matrix).

Contains the raw nearest-neighbor distances used to compute the bandwidths and affinities. Represented as a CSR sparse matrix.

K `property` ¶

The kernel (affinity) matrix.

Represents the pairwise similarities between samples on the manifold, typically decaying from 1 (identical) to 0 (disconnected). Used as the weighted adjacency matrix for graph operators.

A `property` ¶

The binary graph adjacency matrix.

Unweighted representation of the graph connectivity, generated from the non-zero entries of the kernel affinity matrix (K).

degree `property` ¶

degree

The node degrees of the binary adjacency matrix.

A 1-D array containing the number of connected neighbors for each sample, ignoring the affinity weights.

weighted_degree `property` ¶

weighted_degree

The weighted node degrees of the kernel affinity matrix.

A 1-D array containing the sum of affinity weights for each sample. Used to normalize graph Laplacians and diffusion operators.

L `property` ¶

The graph Laplacian matrix.

Use this property for the default cached result. Use the laplacian method when you need to pass options or recompute with non-default settings.

Evaluates and caches the Laplacian on first access using the laplacian_type specified during initialization.

Use this property for the default cached result. Use the laplacian() method when you need to pass options or recompute with non-default settings.

P `property` ¶

The graph diffusion operator (Markov transition matrix).

Use this property for the default cached result. Use the diff_op method when you need to pass options or recompute with non-default settings.

Evaluates and caches the operator on first access. Describes the probabilities of random walks across the graph.

Use this property for the default cached result. Use the diff_op() method when you need to pass options or recompute with non-default settings.

SP `property` ¶

SP

The shortest-paths (geodesic distance) matrix.

Use this property for the default cached result. Use the shortest_paths method when you need to pass options or recompute with non-default settings.

Evaluates and caches the all-pairs shortest paths on the graph on first access.

Use this property for the default cached result. Use the shortest_paths() method when you need to pass options or recompute with non-default settings.

fit ¶

fit(X, recompute=False, **kwargs)

Fit the kernel matrix to the data.

Parameters:

Name	Type	Description	Default
`X`	`array-like or sparse matrix, shape (n_samples, n_features)`	Input data. Accepts NumPy arrays and SciPy CSR sparse matrices. If `metric="precomputed"`, this should be a square distance matrix.	required
`recompute`	`bool`	Whether to recompute the kernel if it has already been computed.	`False`
`**kwargs`		Additional arguments to be passed to the k-nearest-neighbor backend.	`{}`

Returns:

Name	Type	Description
`self`	`Kernel`	The fitted Kernel instance, populating properties like `K` and `knn`.

transform ¶

transform(X=None)

Return the fitted kernel (affinity) matrix.

Provided for scikit-learn compatibility in pipelines. X is ignored because out-of-sample extension for the kernel is not supported.

fit_transform ¶

fit_transform(X, y=None, **fit_params)

Fit the kernel to the data and return the affinity matrix.

Parameters:

Name	Type	Description	Default
`X`	`array-like or sparse matrix, shape (n_samples, n_features)`	Input data.	required
`y`	`None`	Ignored.	`None`
`**fit_params`		Additional parameters passed to `fit`.	`{}`

Returns:

Name	Type	Description
`K`	`(csr_matrix, shape(n_samples, n_samples))`	The computed kernel (affinity) matrix.

adjacency ¶

adjacency()

Compute the binary graph adjacency matrix.

The adjacency matrix defines the unweighted connectivity of the graph. It is represented as an N-by-N sparse matrix where entry A_{i,j} is 1 if the kernel affinity K_{i,j} > 0, and 0 otherwise.

laplacian ¶

laplacian(laplacian_type=None, recompute=False)

Compute the graph Laplacian of this kernel's affinity matrix.

For a friendly reference, see this material from James Melville: https://jlmelville.github.io/smallvis/spectral.html

Parameters:

Name	Type	Description	Default
`laplacian_type`	`str`	The type of laplacian to use. Can be 'unnormalized', 'normalized', or 'random_walk'. If not provided, uses the default `laplacian_type` specified in the constructor.	`None`

Returns:

Name	Type	Description
`L`	`(csr_matrix, shape(n_samples, n_samples))`	The computed graph Laplacian matrix.

diff_op ¶

diff_op(anisotropy=1.0, symmetric=True, recompute=False)

Compute the diffusion operator.

Parameters:

Name	Type	Description	Default
`anisotropy`	`float`	Anisotropy to apply. 'Alpha' in the diffusion maps literature. Defaults to the anisotropy defined in the constructor.	`1.0`
`symmetric`	`bool`	Whether to use a symmetric version of the diffusion operator. This is particularly useful to yield a symmetric operator when using anisotropy (alpha > 0), as the diffusion operator P would be asymmetric otherwise, which can be problematic during matrix decomposition. Eigenvalues are the same as the asymmetric version, and the right eigenvectors of the original asymmetric operator can be recovered by left multiplying by `Kernel.D_inv_sqrt_`.	`True`

Returns:

Name	Type	Description
`P`	`(csr_matrix, shape(n_samples, n_samples))`	The graph diffusion operator (Markov transition matrix).

Notes

Populates the Kernel.D_inv_sqrt_ attribute when symmetric=True.

shortest_paths ¶

shortest_paths(
    landmark=False, indices=None, recompute=False
)

Compute the shortest paths (geodesic distances) on the graph.

Notes

This method may require O(n_samples^2) memory and heavy computation if all-pairs distances are materialized. For large datasets, use landmark mode.

Parameters:

Name	Type	Description	Default
`landmark`	`bool`	If True, the shortest paths are computed between all pairs of landmarks, rather than all sample nodes.	`False`
`indices`	`list of int`	If None, the shortest paths are computed between all pairs of nodes. Else, the shortest paths are computed between all pairs of nodes and nodes with specified indices.	`None`

Returns:

Name	Type	Description
`D`	`(ndarray, shape(n_samples, n_samples))`	The shortest paths matrix. Unreachable nodes evaluate to infinity.

get_indices_distances ¶

get_indices_distances(n_neighbors=None, kernel=True)

Return per-row neighbor indices and distances from the kernel or kNN graph.

Returns:

Type	Description
`indices, distances : tuple of ndarray`	The extracted arrays.

impute ¶

impute(Y=None, t=None, threshold=0.01, tmax=10)

Impute data Y using the diffusion operator built from X.

Although the idea behind this is far older, it was first reported in single-cell genomics by the Krishnaswamy lab in the MAGIC (Markov Affinity-based Graph Imputation of Cells) manuscript

Parameters:

Name	Type	Description	Default
`Y`	`ndarray`	The input data to impute. If None, the input data X is imputed (if it was cached by setting the `cache_input` parameter to True). Otherwise, you'll have to specify it as Y.	`None`
`t`	`int`	The number of steps to perform during diffusion. The default `None` iterates until the Procrustes disparity value is below the `threshold` parameter.	`None`
`threshold`	`float`	The threshold value for the Procrustes disparity when finding an optimal `t`.	`0.01`
`tmax`	`int`	The maximum number of steps to perform during diffusion when estimating an optimal `t`.	`10`

Returns:

Name	Type	Description
`Y_imp`	`ndarray`	The imputed data.

filter ¶

filter(
    signal,
    replicates=None,
    beta=50,
    target=None,
    filterfunc=None,
    offset=0,
    order=1,
    solver="chebyshev",
    chebyshev_order=100,
)

Estimate per-sample density over the graph by filtering a signal.

Inspired by MELD for sample-associated density estimation. However, you can naturally use this for any signal in your data, not just samples of specific conditions. In practice, this is just a simple PyGSP filter on a graph. Indeed, it calls PyGSP, so you'll need it installed to use this function.

Parameters:

Name	Type	Description	Default
`signal`		Signal(s) to filter - usually sample labels.	required
`replicates`		Replicate labels for each sample. If None, no replicates are assumed.	`None`
`beta`	`int`	Amount of smoothing to apply. Vary this parameter to get good estimates - this can vary widely from dataset to dataset.	`50`
`target`	`array - like`	Similarity matrix to use for filtering. If None, uses the kernel matrix.	`None`
`filterfunc`	`function`	Function to use for filtering. If None, the default is to use a Laplacian filter.	`None`
`offset`		Amount to shift the filter in the eigenvalue spectrum. Recommended to use an eigenvalue from the graph based on the spectral distribution. Should be in interval [0,1]	`0`
`order`		Falloff and smoothness of the filter. High order leads to square-like filters.	`1`
`solver`	`string`	Method to solve convex problem. If 'chebyshev', uses a chebyshev polynomial approximation of the corresponding filter. Else, if 'exact', uses the eigenvalue solution to the problem	`'chebyshev'`
`chebyshev_order`	`int`	Order of chebyshev approximation to use.	`100`

Returns:

Name	Type	Description
`densities`	`DataFrame`	The filtered sample densities.

is_connected ¶

is_connected()

Check if the graph is connected (cached).

A graph is connected if and only if there exists a (directed) path between any two vertices.

Returns:

Name	Type	Description
`connected`	`bool`	True if the graph is connected, False otherwise.

topo.spectral.eigen.EigenDecomposition ¶

EigenDecomposition(
    n_components=10,
    method="DM",
    eigensolver="auto",
    eigen_tol=0.0001,
    drop_first=True,
    laplacian_type="random_walk",
    anisotropy=1,
    t=1,
    random_state=None,
    return_evals=False,
    estimate_eigengap=True,
    verbose=False,
)

Transformer for eigendecomposing kernels and spectral operators.

Use this when you already have a kernel, Laplacian, adjacency/affinity matrix, or diffusion operator and want spectral coordinates. The numerical eigendecomposition is delegated to SciPy:

dense inputs use scipy.linalg.eigh when eigensolver="dense" or when eigensolver="auto" receives a dense matrix;
sparse inputs use scipy.sparse.linalg.eigsh when eigensolver="arpack" or when eigensolver="auto" receives a sparse matrix.

The main input can be either a topo.tpgraph.Kernel object or a square matrix. For Kernel inputs, the transformer reuses the fitted kernel's stored operators when possible.

Parameters:

Name	Type	Description	Default
`n_components`	`int`	Number of non-trivial components to return. One extra eigenpair is computed internally so the trivial first component can be dropped when `drop_first=True`.	`10`
`method`	`(top, bottom, msDM, DM, LE)`	Method for organizing the eigendecomposition. `'top'`: compute the largest eigenpairs of the input matrix/operator. `'bottom'`: compute the smallest eigenpairs of the input matrix/operator. `'DM'`: compute Diffusion Maps coordinates from a diffusion operator. If a `Kernel` object is provided, its fitted diffusion operator is reused when available. `'msDM'`: compute multiscale Diffusion Maps coordinates by weighting diffusion components by `lambda / (1 - lambda)`. `'LE'`: compute Laplacian Eigenmaps coordinates from a graph Laplacian. If a `Kernel` object is provided, its fitted Laplacian is reused when available.	`'top'`
`eigensolver`	`(auto, dense, arpack)`	Solver policy. `'auto'`: use dense eigendecomposition for dense inputs and ARPACK for sparse inputs. `'dense'`: use `scipy.linalg.eigh` on a dense array. Avoid for large graphs. `'arpack'`: use `scipy.sparse.linalg.eigsh` for partial sparse eigendecomposition.	`'auto'`
`eigen_tol`	`float`	Tolerance passed to `scipy.sparse.linalg.eigsh`. Ignored by the dense path.	`1e-4`
`drop_first`	`bool`	Whether to drop the first eigenpair, typically the trivial component.	`True`
`laplacian_type`	`str`	Laplacian type used when `method='LE'` and the input is a matrix. Common values are `'normalized'`, `'unnormalized'`, and `'random_walk'`.	`'random_walk'`
`anisotropy`	`float`	Diffusion-maps anisotropy parameter, usually denoted `alpha`.	`1`
`t`	`int`	Diffusion time used when `method='DM'`. Ignored by other methods.	`1`
`random_state`	`optional`	Accepted for API compatibility. The simplified eigensolver path does not use randomness.	`None`
`return_evals`	`bool`	Whether `results()` should return eigenvalues along with the representation.	`False`
`estimate_eigengap`	`bool`	Whether to store a simple eigengap estimate after fitting.	`True`
`verbose`	`bool`	Accepted for API compatibility. The simplified eigensolver path does not emit verbose solver diagnostics.	`False`

fit ¶

fit(X)

Compute the eigendecomposition of kernel matrix X per method.

Parameters:

Name	Type	Description	Default
`X`	`(array - like, shape(n_samples, n_samples))`	Matrix to be decomposed. Should generally be an adjacency, affinity/kernel/similarity, Laplacian matrix or a diffusion-type operator.	required

Returns:

Name	Type	Description
`self`	`EigenDecomposition`	The fitted instance itself. Eigenvectors and eigenvalues are stored as attributes. If method is 'DM' or 'msDM', the diffusion operator is cached as well.

rescale ¶

rescale(use_eigs=50)

Re-compute the msDM embedding using a different number of eigenvectors.

Parameters:

Name	Type	Description	Default
`use_eigs`	`int`	Number of eigenvectors to include in the embedding (must be ≤ the number retained during `fit`).	`50`

Returns:

Name	Type	Description
`self`	`EigenDecomposition`	The modified instance with the scaled embedding updated.

results ¶

results(return_evals=None)

Return the fitted spectral representation.

For DM and msDM, this computes the embedding from stored eigenpairs if needed. For LE, top, and bottom, it returns the fitted eigenvectors.

transform ¶

transform(X=None)

Return the current representation.

X is ignored; the representation is computed from the eigenpairs stored during fit. Present for scikit-learn compatibility.

Parameters:

Name	Type	Description	Default
`X`	`None`	Ignored.	`None`

Returns:

Name	Type	Description
`embedding`	`ndarray of shape (n_samples, n_components)`	The computed spectral representation.

fit_transform ¶

fit_transform(X)

Fit the model on X and return the resulting representation.

topo.spectral.eigen.eigendecompose ¶

eigendecompose(
    G: ArrayLike | spmatrix,
    n_components: int = 8,
    eigensolver: str = "auto",
    largest: bool = True,
    eigen_tol: float = 0.0001,
    random_state=None,
    verbose: bool = False,
) -> tuple[NDArray[np.float64], NDArray[np.float64]]

Return sorted eigenpairs of a square symmetric matrix/operator.

Dense inputs use scipy.linalg.eigh. Sparse inputs use scipy.sparse.linalg.eigsh unless eigensolver="dense" is requested.

random_state and verbose are accepted for compatibility and ignored.

topo.spectral.LE ¶

LE(
    W: ArrayLike | spmatrix,
    n_eigs: int = 10,
    laplacian_type: str = "random_walk",
    drop_first: bool = True,
    return_evals: bool = False,
    eigen_tol: float = 0.0,
    random_state=None,
)

Compute a Laplacian Eigenmaps embedding from an affinity graph.

Laplacian Eigenmaps were introduced by Belkin and Niyogi as a spectral embedding method based on eigenvectors of a graph Laplacian: https://www2.imm.dtu.dk/projects/manifold/Papers/Laplacian.pdf

Parameters:

Name	Type	Description	Default
`W`	`ArrayLike \| spmatrix`	Dense or sparse graph adjacency/affinity matrix. `W` must be square. No symmetrization is performed here; callers should pass a symmetric affinity matrix when using a symmetric Laplacian.	required
`n_eigs`	`int`	Number of non-trivial eigenvectors to return.	`10`
`laplacian_type`	`str`	Laplacian type understood by `topo._compat.scipy_graph.graph_laplacian`. Common values are `"unnormalized"`, `"normalized"`, and `"random_walk"`.	`'random_walk'`
`drop_first`	`bool`	Whether to drop the first eigenvector, which is typically the trivial constant mode for connected graphs.	`True`
`return_evals`	`bool`	If `True`, return `(eigenvectors, eigenvalues)`.	`False`
`eigen_tol`	`float`	Tolerance passed to `scipy.sparse.linalg.eigsh`.	`0.0`
`random_state`		Accepted for API compatibility. The current implementation uses ARPACK through `eigsh` and does not use randomness.	`None`

Returns:

Type	Description
`ndarray or tuple[ndarray, ndarray]`	Eigenvectors sorted by ascending eigenvalue. If `return_evals=True`, also returns the corresponding eigenvalues.

Raises:

Type	Description
`ValueError`	If `W` is not square or if the requested number of eigenvectors is invalid.
`RuntimeError`	If ARPACK fails to compute the requested eigendecomposition.

topo.spectral.graph_laplacian ¶

graph_laplacian(
    graph,
    *,
    laplacian_type: str = "normalized",
    return_D: bool = False,
)

Return graph Laplacian with package-standard laplacian_type names.

Supported types:

"unnormalized": D - W, delegated to scipy.sparse.csgraph.laplacian.
"normalized" / "symmetric": I - D^-1/2 W D^-1/2, delegated to SciPy.
"random_walk" / "rw": I - D^-1 W, implemented here because SciPy's normalized Laplacian is the symmetric normalized Laplacian.

Isolated nodes are handled without inf/nan values.

topo.spectral.diffusion_operator ¶

diffusion_operator(
    W: ArrayLike | spmatrix,
    alpha: float = 1.0,
    symmetric: bool = False,
    semi_aniso: bool = False,
    *,
    return_D_inv_sqrt: bool = False,
) -> csr_matrix | tuple[csr_matrix, csr_matrix]

Compute a sparse diffusion-map operator from an affinity graph.

This implements the anisotropic normalization used in diffusion maps, following Coifman and Lafon: https://doi.org/10.1016/j.acha.2006.04.006

Parameters:

Name	Type	Description	Default
`W`	`ArrayLike \| spmatrix`	Dense or sparse graph adjacency/affinity matrix. `W` must be square. No symmetrization is performed here.	required
`alpha`	`float`	Diffusion-maps anisotropy parameter. For `alpha > 0`, the affinity is reweighted by `D^-alpha W D^-alpha` before row normalization. Negative values are treated as `0`.	`1.0`
`symmetric`	`bool`	If `True`, return the symmetric conjugate form of the diffusion operator. This is useful when downstream eigensolvers require a symmetric operator. The symmetric and row-stochastic forms are related by a diagonal similarity transform under the usual diffusion-map construction.	`False`
`semi_aniso`	`bool`	If `True`, compute the density correction from the anisotropically reweighted affinity but apply the resulting row normalization to the original affinity matrix.	`False`
`return_D_inv_sqrt`	`bool`	If `True`, also return the inverse square-root degree matrix used to construct the symmetric operator. This option requires `symmetric=True`.	`False`

Returns:

Type	Description
`csr_matrix or tuple[csr_matrix, csr_matrix]`	The diffusion operator. If `return_D_inv_sqrt=True`, returns `(P_symmetric, D_inv_sqrt)`.

Notes

The return type is always sparse CSR, regardless of the input format.

topo.layouts.projector.Projector ¶

Projector(
    n_components=2,
    projection_method="MAP",
    metric="euclidean",
    n_neighbors=10,
    n_jobs=1,
    landmarks=None,
    landmark_method="kmeans",
    num_iters=800,
    init: str | ndarray = "spectral",
    nbrs_backend="hnswlib",
    keep_estimator=False,
    random_state=None,
    verbose=False,
    save_every=None,
    save_limit=None,
    save_callback=None,
    include_init_snapshot=True,
)

Bases: BaseEstimator, TransformerMixin

Scikit-learn compatible class that handles all projection methods.

Use this when you already have a spectral representation or graph and only want the final 2-D/3-D layout. Ideally, it takes in either a orthonormal eigenbasis or a graph kernel learned from such an eigenbasis. It is included in TopoMetry to allow custom TopOGraph-like pipelines (projection is the final step).

Parameters:

Name	Type	Description	Default
`n_components`	`int`	Number of dimensions to optimize the layout to. Usually 2 or 3 if you're into visualizing data.	`2`
`projection_method`	`str`	Which projection method to use. `UMAP` delegates to `umap-learn`; `MAP` is TopoMetry's graph-layout optimizer (optional initial/final snapshot capture). Current options are: * 'Isomap' - one of the first manifold learning methods * 't-SNE' - a classic manifold learning method * 'MAP' - graph-layout optimization with initial/final snapshot capture * 'UMAP' - upstream `umap-learn` estimator * 'PaCMAP' (Pairwise-controlled Manifold Approximation and Projection) - for balanced visualizations * 'TriMAP' - dimensionality reduction using triplets * 'IsomorphicMDE' - MDE with preservation of nearest neighbors * 'IsometricMDE' - MDE with preservation of pairwise distances * 'NCVis' (Noise Contrastive Visualization) - a UMAP-like method with blazing fast performance These are frankly quite direct to add, so feel free to make a feature request if your favorite method is not listed here.	`'Isomap'`
`metric`	`str`	The metric to use when computing distances. Possible values are: 'cosine', 'euclidean' and others. Accepts precomputed distances ('precomputed').	`'euclidean'`
`n_neighbors`	`int`	The number of neighbors to use when computing the kernel matrix. Ignored if `pairwise` set to `True`.	`10`
`landmarks`	`int or ndarray`	If passed as `int`, will obtain the number of landmarks. If passed as `np.ndarray`, will use the specified indexes in the array. Any value other than `None` will result in only the specified landmarks being used in the layout optimization, and will populate the Projector.landmarks_ slot.	`None`
`landmark_method`	`str`	The method to use for selecting landmarks. If `landmarks` is passed as an `int`, this will be used to select the landmarks. Can be either 'kmeans' or 'random'.	`'kmeans'`
`num_iters`	`int`	Most (if not all) methods optimize the layout up to a limit number of iterations. Use this parameter to set this number.	`1000`
`keep_estimator`	`bool`	Whether to keep the used estimator as Projector.estimator_ after fitting. Useful if you want to use it later (e.g. UMAP allows inverse transforms and out-of-sample mapping).	`False`

fit ¶

fit(
    X: ndarray | csr_matrix | Kernel, **kwargs: Any
) -> Projector

Run the desired projection method on the specified data.

Parameters:

Name	Type	Description	Default
`X`	`array-like, shape (n_samples, n_features) or topo.Kernel() class.`	The set of points to compute the kernel matrix for. Accepts np.ndarrays and scipy.sparse matrices or a `topo.Kernel()` object. If precomputed, assumed to be a square symmetric semidefinite matrix.	required
`**kwargs`	`Any`	Additional keyword arguments for the desired projection method.	`{}`

Returns:

Name	Type	Description
`self`	`Projector`	The fitted Projector instance with populated `Y_` attribute.

transform ¶

transform(
    X: ndarray | csr_matrix | None = None,
) -> np.ndarray

Return the projection, using the backend's transform when available.

If the desired method does not have a transform method, returns the results from the fit method.

Returns:

Name	Type	Description
`Y`	`ndarray of shape (n_samples, n_components)`	Projection results.

fit_transform ¶

fit_transform(X, y=None, **kwargs) -> np.ndarray

Fit the projection and return the embedding.

If the desired method does not have a fit_transform method, returns the results from the fit method.

Parameters:

Name	Type	Description	Default
`X`	`(array - like or Kernel, shape(n_samples, n_features))`	The input data to fit and transform.	required
`y`	`None`	Ignored.	`None`
`**kwargs`		Additional arguments passed to the underlying backend.	`{}`

Returns:

Name	Type	Description
`Y`	`ndarray of shape (n_samples, n_components)`	Projection results.

topo.tpgraph.intrinsic_dim.IntrinsicDim ¶

IntrinsicDim(
    methods=["fsa", "mle"],
    k=[10, 20, 50, 75, 100],
    backend="hnswlib",
    metric="euclidean",
    n_jobs=-1,
    plot=True,
    random_state=None,
    **kwargs,
)

Bases: BaseEstimator, TransformerMixin

Scikit-learn flavored estimator of intrinsic dimensionality.

Use this to estimate manifold dimensionality or inspect local geometric complexity. This class iterates over a range of possible values of k-nearest-neighbors to consider in calculations using two different methods: the Farahmand-Szepesvári-Audibert (FSA) dimension estimator and the Maximum Likelihood Estimator (MLE).

Parameters:

Name	Type	Description	Default
`methods`	`list of str, (default ['fsa'])`	The dimensionality estimation methods to use. Current options are 'fsa' () and 'mle'().	`['fsa', 'mle']`
`k`	`int, range or list of ints, (default [10, 20, 50, 75, 100])`	The number of nearest neighbors to use for the dimensionality estimation methods. If a single value of `k` is provided, then the result dictionary will have keys corresponding to the methods, and values corresponding to the dimensionality estimates. If multiple values of `k` are provided, then the result dictionary will have keys corresponding to the number of k, and values corresponding to other dictionaries, which have keys corresponding to the methods, and values corresponding to the dimensionality estimates.	`[10, 20, 50, 75, 100]`
`metric`	`str (default 'euclidean')`	The metric to use when calculating distance between instances in a feature array.	`'euclidean'`
`backend`	`str`	Which backend to use for k-nearest-neighbor computations. Defaults to 'hnswlib'. Options are 'hnswlib' and 'sklearn'.	`'hnswlib'`
`n_jobs`	`int`	The number of jobs to use for parallel computations. If -1, all CPUs are used. Parallellization (multiprocessing) is *highly* recommended whenever possible.	`1`
`plot`	`bool`	Whether to plot the results when using the `fit()` method.	`True`
`random_state`	`int or numpy.random.RandomState() (optional`	A pseudo random number generator. Used for generating colors for plotting.	`None).`
`**kwargs`	`keyword arguments`	Additional keyword arguments to pass to the backend kNN estimator.	`{}`

Properties

local_id, global_id : dictionaries containing local and global dimensionality estimates, respectivelly.

Their structure depends on the value of the `k` parameter:

* If a single value of `k` is provided, then the dictionaries will have
keys corresponding to the methods, and values corresponding to the
dimensionality estimates.

* If multiple values of `k` are provided, then the dictionaries will have
keys corresponding to the number of k, and values corresponding to other dictionaries,
which have keys corresponding to the methods, and values corresponding to the
dimensionality estimates.

plot_id ¶

plot_id(
    bins=30,
    figsize=(6, 8),
    titlesize=22,
    labelsize=16,
    legendsize=10,
)

Plot histograms of the estimated local intrinsic dimensions.

fit ¶

fit(X, **kwargs)

Estimates the intrinsic dimensionalities of the data.

Parameters:

Name	Type	Description	Default
`X`	`(array - like, shape(n_samples, n_features))`	The set of points to compute the kernel matrix for. Accepts np.ndarrays and scipy.sparse matrices. If precomputed, assumed to be a square symmetric semidefinite matrix of k-nearest-neighbors, with its k higher or equal the highest value of k used to estimate the intrinsic dimensionalities.	required
`**kwargs`		Additional keyword arguments to pass to the plotting function `IntrinsicDim.plot_id()`.	`{}`

Returns:

Type	Description
Populates the `local` and `global` properties of the class.
Shows a plot of the results if `plot=True`.

transform ¶

transform(X=None)

No-op transform kept for scikit-learn API compatibility.

IntrinsicDim exposes its estimates via attributes (local_id, global_id); this method returns self unchanged.

Returns:

Name	Type	Description
`self`	`IntrinsicDim`	The estimator itself.

topo.tpgraph.intrinsic_dim.automated_scaffold_sizing ¶

automated_scaffold_sizing(
    X,
    method: str = "fsa",
    ks=(15, 30, 60),
    backend="hnswlib",
    metric="euclidean",
    n_jobs: int = -1,
    quantile: float = 0.99,
    min_components: int = 16,
    max_components: int = 512,
    headroom: float = 0.15,
    random_state=None,
    use_median: bool = False,
    return_details: bool = False,
    **knn_kwargs,
)

Unified automated scaffold sizing.

method='fsa': - Compute local FSA i.d. for each k in ks, take per-cell median across ks, then take the upper quantile across cells, add headroom, clamp to bounds.

method='mle': - Use a single neighborhood size k (if ks is an int; if iterable, use max(ks)). - Compute local MLE i.d. at k, then global i.d. via: * median of locals if use_median=True * Levina–Bickel harmonic-mean estimator (mle_global) otherwise. - Add headroom, clamp to bounds.

Returns:

Name	Type	Description
`n_components`	`int`
`details`	`dict (if return_details=True)`

topo.layouts.diagnostics.find_ideal_projection ¶

find_ideal_projection(
    tg: Any,
    min_dist_grid: Iterable[float] | None = None,
    spread_grid: Iterable[float] | None = None,
    initial_alpha_grid: Iterable[float] | None = None,
    *,
    multiscale: bool = True,
    num_iters: int = 600,
    save_every: int = 10,
    metric: str = "euclidean",
    n_neighbors: int = 30,
    times: Sequence[int] = (1, 2, 4),
    r: int = 32,
    k_for_pf1: int | None = None,
    symmetric_hint: bool = True,
) -> dict[str, Any]

Grid-search MAP hyperparameters for a fitted TopOGraph-like object.

The object must expose: - project(...) - P_X_ - _backend_resolved - _n_jobs_effective - verbosity - msTopoMAP_snapshots / TopoMAP_snapshots

This function does not rerun the best projection automatically. It returns the best parameters and score records; the caller can decide whether to run the final projection.

topo.layouts.diagnostics.run_best_projection ¶

run_best_projection(
    tg: Any,
    params: dict[str, float],
    *,
    multiscale: bool = True,
    num_iters: int = 600,
    save_every: int = 10,
) -> np.ndarray

Run MAP once using selected hyperparameters.

Advanced Estimators (Custom Pipelines)¶

topo.base.ann.kNN ¶

topo.tpgraph.cknn.cknn_graph ¶

topo.tpgraph.cknn.cknn_ratio_matrix ¶

topo.tpgraph.kernels.compute_kernel ¶

topo.tpgraph.kernels.Kernel ¶

knn property ¶

K property ¶

A property ¶

degree property ¶

weighted_degree property ¶

L property ¶

P property ¶

SP property ¶

fit ¶

transform ¶

fit_transform ¶

adjacency ¶

laplacian ¶

diff_op ¶

shortest_paths ¶

get_indices_distances ¶

impute ¶

filter ¶

is_connected ¶

topo.spectral.eigen.EigenDecomposition ¶

fit ¶

rescale ¶

results ¶

transform ¶

fit_transform ¶

topo.spectral.eigen.eigendecompose ¶

topo.spectral.LE ¶

topo.spectral.graph_laplacian ¶

topo.spectral.diffusion_operator ¶

topo.layouts.projector.Projector ¶

fit ¶

transform ¶

fit_transform ¶

topo.tpgraph.intrinsic_dim.IntrinsicDim ¶

plot_id ¶

fit ¶

transform ¶

topo.tpgraph.intrinsic_dim.automated_scaffold_sizing ¶

topo.layouts.diagnostics.find_ideal_projection ¶

topo.layouts.diagnostics.run_best_projection ¶

knn `property` ¶

K `property` ¶

A `property` ¶

degree `property` ¶

weighted_degree `property` ¶

L `property` ¶

P `property` ¶

SP `property` ¶