API Documentation

Scalable Manifold learning utilities and algorithms.

Graphs are represented with their weighted adjacency matrices, preferably using sparse matrices.

A note on symmetrization and internal sparse representations

For performance, this code uses the FLANN library to compute approximate neighborhoods efficiently. This means that (1) the adjacency matrix produced is NOT GUARANTEED to be symmetric and (2) compute_adjacency_matrix returns a sparse matrix called adjacency_matrix. adjacency_matrix has 0.0 on the diagonal, as it should. Implicitly, the missing entries are infinity not 0 for this matrix. But (1) and (2) mean that if one tries to symmetrize adjacency_matrix, the scipy.sparse code eliminates the 0.0 entries from adjacency_matrix hence in the affinity matrix we explicitly set the diagonal to 1.0 for sparse matrices.

We adopted the following convention:
  • adjacency_matrix will NOT BE GUARANTEED symmetric
  • affinity_matrix will perform a symmetrization by default
  • laplacian performs symmetrization only if symmetrize_input=True (the default setting), and DOES NOT check symmetry
  • these conventions are the same for dense matrices, for consistency
class megaman.geometry.geometry.Geometry(adjacency_method='auto', adjacency_kwds=None, affinity_method='auto', affinity_kwds=None, laplacian_method='auto', laplacian_kwds=None, **kwargs)[source]

The Geometry class stores the data, distance, affinity and laplacian matrices used by the various embedding methods and is the primary object passed to embedding functions.

The Geometry class contains functions to compute the aforementioned matrices and allows for re-computation whenever necessary.

Parameters:

adjacency_method : string {‘auto’, ‘brute’, ‘pyflann’, ‘cyflann’}

method for computing pairwise radius neighbors graph.

adjacency_kwds : dict

dictionary containing keyword arguments for adjacency matrix. see distance.py docmuentation for arguments for each method. If new kwargs are passed to compute_adjacency_matrix then this dictionary will be updated.

affinity_method : string {‘auto’, ‘gaussian’}

method of computing affinity matrix

affinity_kwds : dict

dictionary containing keyword arguments for affinity matrix. see affinity.py docmuentation for arguments for each method. If new kwargs are passed to compute_affinity_matrix then this dictionary will be updated.

laplacian_method : string,

type of laplacian to be computed. Possibilities are {‘symmetricnormalized’, ‘geometric’, ‘renormalized’, ‘unnormalized’, ‘randomwalk’} see laplacian.py for more information.

laplacian_kwds : dict

dictionary containing keyword arguments for Laplacian matrix. see laplacian.py docmuentation for arguments for each method. If new kwargs are passed to compute_laplacian_matrix then this dictionary will be updated.

**kwargs :

additional arguments will be parsed and used to override values in the above dictionaries. For example: - affinity_radius will override affinity_kwds[‘radius’] - adjacency_n_neighbors will override adjacency_kwds[‘n_neighbors’] etc.

compute_adjacency_matrix(copy=False, **kwargs)[source]

This function will compute the adjacency matrix. In order to acquire the existing adjacency matrix use self.adjacency_matrix as comptute_adjacency_matrix() will re-compute the adjacency matrix.

Parameters:

copy : boolean, whether to return a copied version of the adjacency matrix

**kwargs : see distance.py docmuentation for arguments for each method.

Returns:

self.adjacency_matrix : sparse matrix (N_obs, N_obs)

Non explicit 0.0 values should be considered not connected.

compute_affinity_matrix(copy=False, **kwargs)[source]

This function will compute the affinity matrix. In order to acquire the existing affinity matrix use self.affinity_matrix as comptute_affinity_matrix() will re-compute the affinity matrix.

Parameters:

copy : boolean

whether to return a copied version of the affinity matrix

**kwargs :

see affinity.py docmuentation for arguments for each method.

Returns:

self.affinity_matrix : sparse matrix (N_obs, N_obs)

contains the pairwise affinity values using the Guassian kernel and bandwidth equal to the affinity_radius

compute_laplacian_matrix(copy=True, return_lapsym=False, **kwargs)[source]
Note: this function will compute the laplacian matrix. In order to acquire
the existing laplacian matrix use self.laplacian_matrix as comptute_laplacian_matrix() will re-compute the laplacian matrix.
Parameters:

copy : boolean, whether to return copied version of the self.laplacian_matrix

return_lapsym : boolean, if True returns additionally the symmetrized version of

the requested laplacian and the re-normalization weights.

**kwargs : see laplacian.py docmuentation for arguments for each method.

Returns:

self.laplacian_matrix : sparse matrix (N_obs, N_obs).

The requested laplacian.

self.laplacian_symmetric : sparse matrix (N_obs, N_obs)

The symmetric laplacian.

self.laplacian_weights : ndarray (N_obs,)

The renormalization weights used to make laplacian_matrix from laplacian_symmetric

set_adjacency_matrix(adjacency_mat)[source]
Parameters:

adjacency_mat : sparse matrix (N_obs, N_obs)

The adjacency matrix to input.

set_affinity_matrix(affinity_mat)[source]
Parameters:

affinity_mat : sparse matrix (N_obs, N_obs).

The adjacency matrix to input.

set_data_matrix(X)[source]
Parameters:

X : ndarray (N_obs, N_features)

The original data set to input.

set_laplacian_matrix(laplacian_mat)[source]
Parameters:

laplacian_mat : sparse matrix (N_obs, N_obs).

The Laplacian matrix to input.

set_matrix(X, input_type)[source]

Set the data matrix given the (string) input_type

set_radius(radius, override=True, X=None, n_components=2)[source]

Set the radius for the adjacency and affinity computation

By default, this will override keyword arguments provided on initialization.

Parameters:

radius : float

radius to set for adjacency and affinity.

override : bool (default: True)

if False, then only set radius if not already defined in adjacency_args and affinity_args.

X : ndarray or sparse (optional)

if provided, estimate a suitable radius from this data.

n_components : int (default=2)

the number of components to use when estimating the radius

Riemannian Metric learning utilities and algorithms.

To use the “geometric” Laplacian from geometry.py for statistically consistent results.

class megaman.geometry.rmetric.RiemannMetric(Y, laplacian, n_dim=None, mode_inv='svd')[source]

RiemannMetric computes and stores the Riemannian metric and its dual associated with an embedding Y. The Riemannian metric is currently denoted by G, its dual by H, and the Laplacian by L. G at each point is the matrix inverse of H.

For performance, the following choices have been made: * the class makes no defensive copies of L, Y * no defensive copies of the array attributes H, G, Hvv, .... * G is computed on request only In the future, this class will be extended to compute H only once, for mdimY dimensions, but to store multiple G’s, with different dimensions.

In the near future plans is also a “lazy” implementation, which will compute G (and maybe even H) only at the requested points.

Parameters:

Y : embedding coordinates, shape = (n, mdimY)

laplacian : estimated laplacian from data shape = (n, n)

n_dim : the manifold domension

mod_inv : if mode_inv = svd, also returns Hvv, Hsvals,

Gsvals the (transposed) eigenvectors of H and the singular values of H and G

Returns:

mdimG : dimension of G, H

mdimY : dimension of Y

H : dual Riemann metric, shape = (n, mdimY, mdimY)

G : Riemann metric, shape = (n, mdimG, mdimG)

Hvv : (transposed) singular vectors of H, shape = (n, mdimY, mdimY)

Hsvals : singular values of H, shape = (n, mdimY)

Gsvals : singular values of G, shape = (n, mdimG)

detG : determinants of G, shape = (n,1)

Notes

H is always computed at full dimension self.mdimY G is computed at mdimG (some contradictions persist here)

References

“Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique Perraul-Joncas, Marina Meila, arXiv:1305.7255

get_dual_rmetric(invert_h=False, mode_inv='svd')[source]

Compute the dual Riemannian Metric This is not satisfactory, because if mdimG<mdimY the shape of H will not be the same as the shape of G. TODO(maybe): return a (copied) smaller H with only the rows and columns in G.

get_rmetric(mode_inv='svd', return_svd=False)[source]

Compute the Reimannian Metric

megaman.geometry.rmetric.compute_G_from_H(H, mdimG=None, mode_inv='svd')[source]
Parameters:

H : the inverse R. Metric

if mode_inv == ‘svd’:

also returns Hvv, Hsvals, Gsvals the (transposed) eigenvectors of H and the singular values of H and G

if mdimG < H.shape[2]:

G.shape = [ n_samples, mdimG, mdimG ] with n_samples = H.shape[0]

Notes

currently Hvv, Hsvals are n_dim = H.shape[2], and Gsvals, G are mdimG (This contradicts the documentation of riemann_metric which states that riemann_metric and h_dual_metric have the same dimensions)

See the notes in RiemannMetric

megaman.geometry.rmetric.riemann_metric(Y, laplacian=None, n_dim=None, invert_h=False, mode_inv='svd')[source]
Parameters:

Y: array-like, shape = (n_samples, mdimY )

The embedding coordinates of the points

laplacian: array-like, shape = (n_samples, n_samples)

The Laplacian of the data. It is recommended to use the “geometric” Laplacian (default) option from geometry.graph_laplacian()

n_dim : integer, optional

Use only the first n_dim <= mdimY dimensions.All dimensions n_dim:mdimY are ignored.

invert_h: boolean, optional

if False, only the “dual Riemannian metric” is computed if True, the dual metric matrices are inverted to obtain the Riemannian metric G.

mode_inv: string, optional

How to compute the inverses of h_dual_metric, if invert_h

“inv”, use numpy.inv() “svd” (default), use numpy.linalg.svd(), then invert the eigenvalues (possibly a more numerically stable method with H is symmetric and ill conditioned)

Returns:

h_dual_metric : array, shape=(n_samples, n_dim, n_dim)

Optionally :

g_riemann_metric : array, shape=(n_samples, n_dim, n_dim )

Hvv : singular vectors of H, transposed, shape = ( n_samples, n_dim, n_dim )

Hsvals : singular values of H, shape = ( n_samples, n_dim )

Gsvals : singular values of G, shape = ( n_samples, n_dim )

Notes

References

“Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique Perraul-Joncas, Marina Meila, arXiv:1305.7255

megaman.geometry.rmetric.riemann_metric_lazy(Y, sample, laplacian, n_dim, invert_h=False, mode_inv='svd')[source]
Parameters:

Y: array-like, shape = (n_samples, mdimY )

The embedding coordinates of the points

sample : array-like, shape (n_samples)

array of indices over which to calculate the riemannian metric.

laplacian: array-like, shape = (n_samples, n_samples)

The Laplacian of the data. It is recommended to use the “geometric” Laplacian (default) option from geometry.graph_laplacian()

n_dim : integer, optional

Use only the first n_dim <= mdimY dimensions.All dimensions n_dim:mdimY are ignored.

invert_h: boolean, optional

if False, only the “dual Riemannian metric” is computed if True, the dual metric matrices are inverted to obtain the Riemannian metric G.

mode_inv: string, optional

How to compute the inverses of h_dual_metric, if invert_h

“inv”, use numpy.inv() “svd” (default), use numpy.linalg.svd(), then invert the eigenvalues (possibly a more numerically stable method with H is symmetric and ill conditioned)

Returns:

h_dual_metric : array, shape=(n_samples, n_dim, n_dim)

Optionally :

g_riemann_metric : array, shape=(n_samples, n_dim, n_dim )

Hvv : singular vectors of H, transposed, shape = ( n_samples, n_dim, n_dim )

Hsvals : singular values of H, shape = ( n_samples, n_dim )

Gsvals : singular values of G, shape = ( n_samples, n_dim )

Notes

References

“Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery”, Dominique Perraul-Joncas, Marina Meila, arXiv:1305.7255