API Documentation¶
Scalable Manifold learning utilities and algorithms.
Graphs are represented with their weighted adjacency matrices, preferably using sparse matrices.
A note on symmetrization and internal sparse representations¶
For performance, this code uses the FLANN library to compute approximate neighborhoods efficiently. This means that (1) the adjacency matrix produced is NOT GUARANTEED to be symmetric and (2) compute_adjacency_matrix returns a sparse matrix called adjacency_matrix. adjacency_matrix has 0.0 on the diagonal, as it should. Implicitly, the missing entries are infinity not 0 for this matrix. But (1) and (2) mean that if one tries to symmetrize adjacency_matrix, the scipy.sparse code eliminates the 0.0 entries from adjacency_matrix hence in the affinity matrix we explicitly set the diagonal to 1.0 for sparse matrices.
 We adopted the following convention:
 adjacency_matrix will NOT BE GUARANTEED symmetric
 affinity_matrix will perform a symmetrization by default
 laplacian performs symmetrization only if symmetrize_input=True (the default setting), and DOES NOT check symmetry
 these conventions are the same for dense matrices, for consistency

class
megaman.geometry.geometry.
Geometry
(adjacency_method='auto', adjacency_kwds=None, affinity_method='auto', affinity_kwds=None, laplacian_method='auto', laplacian_kwds=None, **kwargs)[source]¶ The Geometry class stores the data, distance, affinity and laplacian matrices used by the various embedding methods and is the primary object passed to embedding functions.
The Geometry class contains functions to compute the aforementioned matrices and allows for recomputation whenever necessary.
Parameters: adjacency_method : string {‘auto’, ‘brute’, ‘pyflann’, ‘cyflann’}
method for computing pairwise radius neighbors graph.
adjacency_kwds : dict
dictionary containing keyword arguments for adjacency matrix. see distance.py docmuentation for arguments for each method. If new kwargs are passed to compute_adjacency_matrix then this dictionary will be updated.
affinity_method : string {‘auto’, ‘gaussian’}
method of computing affinity matrix
affinity_kwds : dict
dictionary containing keyword arguments for affinity matrix. see affinity.py docmuentation for arguments for each method. If new kwargs are passed to compute_affinity_matrix then this dictionary will be updated.
laplacian_method : string,
type of laplacian to be computed. Possibilities are {‘symmetricnormalized’, ‘geometric’, ‘renormalized’, ‘unnormalized’, ‘randomwalk’} see laplacian.py for more information.
laplacian_kwds : dict
dictionary containing keyword arguments for Laplacian matrix. see laplacian.py docmuentation for arguments for each method. If new kwargs are passed to compute_laplacian_matrix then this dictionary will be updated.
**kwargs :
additional arguments will be parsed and used to override values in the above dictionaries. For example:  affinity_radius will override affinity_kwds[‘radius’]  adjacency_n_neighbors will override adjacency_kwds[‘n_neighbors’] etc.

compute_adjacency_matrix
(copy=False, **kwargs)[source]¶ This function will compute the adjacency matrix. In order to acquire the existing adjacency matrix use self.adjacency_matrix as comptute_adjacency_matrix() will recompute the adjacency matrix.
Parameters: copy : boolean, whether to return a copied version of the adjacency matrix
**kwargs : see distance.py docmuentation for arguments for each method.
Returns: self.adjacency_matrix : sparse matrix (N_obs, N_obs)
Non explicit 0.0 values should be considered not connected.

compute_affinity_matrix
(copy=False, **kwargs)[source]¶ This function will compute the affinity matrix. In order to acquire the existing affinity matrix use self.affinity_matrix as comptute_affinity_matrix() will recompute the affinity matrix.
Parameters: copy : boolean
whether to return a copied version of the affinity matrix
**kwargs :
see affinity.py docmuentation for arguments for each method.
Returns: self.affinity_matrix : sparse matrix (N_obs, N_obs)
contains the pairwise affinity values using the Guassian kernel and bandwidth equal to the affinity_radius

compute_laplacian_matrix
(copy=True, return_lapsym=False, **kwargs)[source]¶  Note: this function will compute the laplacian matrix. In order to acquire
 the existing laplacian matrix use self.laplacian_matrix as comptute_laplacian_matrix() will recompute the laplacian matrix.
Parameters: copy : boolean, whether to return copied version of the self.laplacian_matrix
return_lapsym : boolean, if True returns additionally the symmetrized version of
the requested laplacian and the renormalization weights.
**kwargs : see laplacian.py docmuentation for arguments for each method.
Returns: self.laplacian_matrix : sparse matrix (N_obs, N_obs).
The requested laplacian.
self.laplacian_symmetric : sparse matrix (N_obs, N_obs)
The symmetric laplacian.
self.laplacian_weights : ndarray (N_obs,)
The renormalization weights used to make laplacian_matrix from laplacian_symmetric

set_adjacency_matrix
(adjacency_mat)[source]¶ Parameters: adjacency_mat : sparse matrix (N_obs, N_obs)
The adjacency matrix to input.

set_affinity_matrix
(affinity_mat)[source]¶ Parameters: affinity_mat : sparse matrix (N_obs, N_obs).
The adjacency matrix to input.

set_data_matrix
(X)[source]¶ Parameters: X : ndarray (N_obs, N_features)
The original data set to input.

set_laplacian_matrix
(laplacian_mat)[source]¶ Parameters: laplacian_mat : sparse matrix (N_obs, N_obs).
The Laplacian matrix to input.

set_radius
(radius, override=True, X=None, n_components=2)[source]¶ Set the radius for the adjacency and affinity computation
By default, this will override keyword arguments provided on initialization.
Parameters: radius : float
radius to set for adjacency and affinity.
override : bool (default: True)
if False, then only set radius if not already defined in adjacency_args and affinity_args.
X : ndarray or sparse (optional)
if provided, estimate a suitable radius from this data.
n_components : int (default=2)
the number of components to use when estimating the radius

Riemannian Metric learning utilities and algorithms.
To use the “geometric” Laplacian from geometry.py for statistically consistent results.

class
megaman.geometry.rmetric.
RiemannMetric
(Y, laplacian, n_dim=None, mode_inv='svd')[source]¶ RiemannMetric computes and stores the Riemannian metric and its dual associated with an embedding Y. The Riemannian metric is currently denoted by G, its dual by H, and the Laplacian by L. G at each point is the matrix inverse of H.
For performance, the following choices have been made: * the class makes no defensive copies of L, Y * no defensive copies of the array attributes H, G, Hvv, .... * G is computed on request only In the future, this class will be extended to compute H only once, for mdimY dimensions, but to store multiple G’s, with different dimensions.
In the near future plans is also a “lazy” implementation, which will compute G (and maybe even H) only at the requested points.
Parameters: Y : embedding coordinates, shape = (n, mdimY)
laplacian : estimated laplacian from data shape = (n, n)
n_dim : the manifold domension
mod_inv : if mode_inv = svd, also returns Hvv, Hsvals,
Gsvals the (transposed) eigenvectors of H and the singular values of H and G
Returns: mdimG : dimension of G, H
mdimY : dimension of Y
H : dual Riemann metric, shape = (n, mdimY, mdimY)
G : Riemann metric, shape = (n, mdimG, mdimG)
Hvv : (transposed) singular vectors of H, shape = (n, mdimY, mdimY)
Hsvals : singular values of H, shape = (n, mdimY)
Gsvals : singular values of G, shape = (n, mdimG)
detG : determinants of G, shape = (n,1)
Notes
H is always computed at full dimension self.mdimY G is computed at mdimG (some contradictions persist here)
References
“Nonlinear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique PerraulJoncas, Marina Meila, arXiv:1305.7255

megaman.geometry.rmetric.
compute_G_from_H
(H, mdimG=None, mode_inv='svd')[source]¶ Parameters: H : the inverse R. Metric
if mode_inv == ‘svd’:
also returns Hvv, Hsvals, Gsvals the (transposed) eigenvectors of H and the singular values of H and G
if mdimG < H.shape[2]:
G.shape = [ n_samples, mdimG, mdimG ] with n_samples = H.shape[0]
Notes
currently Hvv, Hsvals are n_dim = H.shape[2], and Gsvals, G are mdimG (This contradicts the documentation of riemann_metric which states that riemann_metric and h_dual_metric have the same dimensions)
See the notes in RiemannMetric

megaman.geometry.rmetric.
riemann_metric
(Y, laplacian=None, n_dim=None, invert_h=False, mode_inv='svd')[source]¶ Parameters: Y: arraylike, shape = (n_samples, mdimY )
The embedding coordinates of the points
laplacian: arraylike, shape = (n_samples, n_samples)
The Laplacian of the data. It is recommended to use the “geometric” Laplacian (default) option from geometry.graph_laplacian()
n_dim : integer, optional
Use only the first n_dim <= mdimY dimensions.All dimensions n_dim:mdimY are ignored.
invert_h: boolean, optional
if False, only the “dual Riemannian metric” is computed if True, the dual metric matrices are inverted to obtain the Riemannian metric G.
mode_inv: string, optional
 How to compute the inverses of h_dual_metric, if invert_h
“inv”, use numpy.inv() “svd” (default), use numpy.linalg.svd(), then invert the eigenvalues (possibly a more numerically stable method with H is symmetric and ill conditioned)
Returns: h_dual_metric : array, shape=(n_samples, n_dim, n_dim)
Optionally :
g_riemann_metric : array, shape=(n_samples, n_dim, n_dim )
Hvv : singular vectors of H, transposed, shape = ( n_samples, n_dim, n_dim )
Hsvals : singular values of H, shape = ( n_samples, n_dim )
Gsvals : singular values of G, shape = ( n_samples, n_dim )
Notes
References
“Nonlinear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique PerraulJoncas, Marina Meila, arXiv:1305.7255

megaman.geometry.rmetric.
riemann_metric_lazy
(Y, sample, laplacian, n_dim, invert_h=False, mode_inv='svd')[source]¶ Parameters: Y: arraylike, shape = (n_samples, mdimY )
The embedding coordinates of the points
sample : arraylike, shape (n_samples)
array of indices over which to calculate the riemannian metric.
laplacian: arraylike, shape = (n_samples, n_samples)
The Laplacian of the data. It is recommended to use the “geometric” Laplacian (default) option from geometry.graph_laplacian()
n_dim : integer, optional
Use only the first n_dim <= mdimY dimensions.All dimensions n_dim:mdimY are ignored.
invert_h: boolean, optional
if False, only the “dual Riemannian metric” is computed if True, the dual metric matrices are inverted to obtain the Riemannian metric G.
mode_inv: string, optional
 How to compute the inverses of h_dual_metric, if invert_h
“inv”, use numpy.inv() “svd” (default), use numpy.linalg.svd(), then invert the eigenvalues (possibly a more numerically stable method with H is symmetric and ill conditioned)
Returns: h_dual_metric : array, shape=(n_samples, n_dim, n_dim)
Optionally :
g_riemann_metric : array, shape=(n_samples, n_dim, n_dim )
Hvv : singular vectors of H, transposed, shape = ( n_samples, n_dim, n_dim )
Hsvals : singular values of H, shape = ( n_samples, n_dim )
Gsvals : singular values of G, shape = ( n_samples, n_dim )
Notes
References
“Nonlinear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique PerraulJoncas, Marina Meila, arXiv:1305.7255