API Documentation
Spectral Embedding
-
class
megaman.embedding.spectral_embedding.
SpectralEmbedding
(n_components=2, radius=None, geom=None, eigen_solver='auto', random_state=None, drop_first=True, diffusion_maps=False, diffusion_time=0, solver_kwds=None)
Spectral embedding for non-linear dimensionality reduction.
Forms an affinity matrix given by the specified function and applies spectral decomposition to the corresponding graph Laplacian. The resulting transformation is given by the value of the eigenvectors for each data point.
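As an illustration of the first step described above, the following minimal sketch builds a radius-limited Gaussian affinity matrix in plain Python. The kernel form exp(-d^2 / radius^2) and the helper name radius_affinity are assumptions made for this example; they are not taken from the megaman source.

```python
import math


def radius_affinity(points, radius):
    """Gaussian affinity exp(-d^2 / radius^2) for pairs within radius, else 0.

    Hypothetical helper: the kernel form is an assumption for illustration.
    """
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= radius:
                affinity[i][j] = math.exp(-(d * d) / (radius * radius))
    return affinity


# Two nearby points and one far away: the far point gets zero affinity.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
A = radius_affinity(points, radius=1.0)
```

The result is symmetric, has ones on the diagonal, and is zero for pairs farther apart than the radius, which is the sparsity pattern a radius-based adjacency produces.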
Parameters: n_components : integer
number of coordinates for the manifold.
radius : float (optional)
radius for adjacency and affinity calculations. Will be overridden if either is set in geom
geom : dict or megaman.geometry.Geometry object
specification of geometry parameters: keys are [“adjacency_method”, “adjacency_kwds”, “affinity_method”,
“affinity_kwds”, “laplacian_method”, “laplacian_kwds”]
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. Uses a dense data array, and thus should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.RandomState
drop_first : bool, optional, default=True
Whether to drop the first eigenvector. For spectral embedding, this should be True, as the first eigenvector is a constant vector for a connected graph; for spectral clustering, it should be False to retain the first eigenvector.
diffusion_maps : boolean, optional
Whether to return the diffusion map version by re-scaling the embedding by the eigenvalues.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
References
[R1] A Tutorial on Spectral Clustering, 2007. Ulrike von Luxburg. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
[R2] On Spectral Clustering: Analysis and an algorithm, 2001. Andrew Y. Ng, Michael I. Jordan, Yair Weiss. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8100
[R3] Normalized cuts and image segmentation, 2000. Jianbo Shi, Jitendra Malik. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
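The geom parameter above may be given as a dict. The sketch below shows one way such a dict could be checked against the six documented keys before it is passed on. The helper check_geom_keys is hypothetical, and the values used (a radius of 0.5 and the 'geometric' Laplacian) are illustrative assumptions rather than confirmed defaults.

```python
# Keys accepted in a geom dict, as listed in the parameter description.
GEOM_KEYS = {
    "adjacency_method", "adjacency_kwds",
    "affinity_method", "affinity_kwds",
    "laplacian_method", "laplacian_kwds",
}


def check_geom_keys(geom):
    """Return geom unchanged, or raise ValueError on unknown keys.

    Hypothetical helper, not part of megaman.
    """
    unknown = set(geom) - GEOM_KEYS
    if unknown:
        raise ValueError(f"unknown geom keys: {sorted(unknown)}")
    return geom


# Illustrative values only; they are assumptions, not documented defaults.
geom = check_geom_keys({
    "adjacency_kwds": {"radius": 0.5},
    "affinity_kwds": {"radius": 0.5},
    "laplacian_method": "geometric",
})
```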
fit
(X, y=None, input_type='data')
Fit the model from data in X.
Parameters: input_type : string, one of: ‘data’, ‘distance’ or ‘affinity’.
How the values of input data X are interpreted. (default = ‘data’)
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
If self.input_type is distance, or affinity:
X : array-like, shape (n_samples, n_samples),
Interpret X as precomputed distance or adjacency graph computed from samples.
Returns: self : object
Returns the instance itself.
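To make the two input shapes concrete, the sketch below turns an (n_samples, n_features) data set into the (n_samples, n_samples) matrix that input_type ‘distance’ expects. It uses dense Python lists as a stand-in for the array types the library accepts; the helper name pairwise_distances is an assumption for this example.

```python
import math


def pairwise_distances(X):
    """Dense Euclidean distance matrix of shape (n_samples, n_samples)."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


# Three samples with two features each.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_distances(X)
```

With input_type ‘data’ the raw X would be passed; with ‘distance’ a square matrix such as D takes its place.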
-
megaman.embedding.spectral_embedding.
spectral_embedding
(geom, n_components=8, eigen_solver='auto', random_state=None, drop_first=True, diffusion_maps=False, diffusion_time=0, solver_kwds=None)
Project the samples onto the first eigenvectors of the graph Laplacian.
The adjacency matrix is used to compute a normalized graph Laplacian whose principal eigenvectors (associated with the smallest eigenvalues) represent the embedding coordinates of the data.
The adjacency variable is not strictly the adjacency matrix of a graph but more generally an affinity or similarity matrix between samples (for instance the heat kernel of a Euclidean distance matrix or a k-NN matrix). The Laplacian must be symmetric so that the eigenvector decomposition works as expected. This is ensured by the default setting (for more details, see the documentation in geometry.py).
The data and generic geometric parameters are passed via a Geometry object, which also computes the Laplacian. By default, the ‘geometric’ Laplacian (or “debiased”, or “renormalized” with alpha=1) is used. This is the Laplacian construction defined in [Coifman and Lafon, 2006] (see also documentation in laplacian.py). Thus:
- with diffusion_maps=False, spectral embedding is a modification of the Laplacian Eigenmaps algorithm of [Belkin and Niyogi, 2002];
- with diffusion_maps=False and geom.laplacian_method=‘symmetricnormalized’, it is exactly Laplacian Eigenmaps;
- with diffusion_maps=True and diffusion_time>0, it is the Diffusion Maps algorithm of [Coifman and Lafon, 2006];
- diffusion_maps=True with diffusion_time=0 is the same as diffusion_maps=False with the default geom.laplacian_method.
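The alpha=1 renormalization mentioned above can be sketched in a few lines. This is a minimal illustration of the Coifman and Lafon construction on a dense affinity matrix; the sign convention (L = I - P) and the absence of any radius-dependent scaling are assumptions, and megaman's own laplacian.py may differ in both.

```python
def geometric_laplacian(W):
    """Random-walk Laplacian I - P after the alpha=1 renormalization of W.

    Assumes every row of W has a positive sum. Sign and scaling conventions
    are assumptions made for illustration.
    """
    n = len(W)
    d = [sum(row) for row in W]
    # alpha = 1 renormalization: divide each affinity by both endpoint degrees.
    W1 = [[W[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    d1 = [sum(row) for row in W1]
    # Row-normalize W1 into a transition matrix P, then form L = I - P.
    return [
        [(1.0 if i == j else 0.0) - W1[i][j] / d1[i] for j in range(n)]
        for i in range(n)
    ]


W = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
L = geometric_laplacian(W)
row_sums = [sum(row) for row in L]
```

Each row of L sums to zero, so the constant vector is an eigenvector with eigenvalue 0. That is the eigenvector drop_first removes.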
Parameters: geom : a Geometry object from megaman.geometry.geometry
n_components : integer, optional
The dimension of the projection subspace.
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : int seed, RandomState instance, or None (default)
A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when eigen_solver == ‘amg’. By default, arpack is used.
drop_first : bool, optional, default=True
Whether to drop the first eigenvector. For spectral embedding, this should be True, as the first eigenvector is a constant vector for a connected graph; for spectral clustering, it should be False to retain the first eigenvector.
diffusion_maps : boolean, optional
Whether to return the diffusion map version by re-scaling the embedding coordinates by the eigenvalues to the power diffusion_time.
diffusion_time : if diffusion_maps=True, the eigenvectors of the Laplacian are rescaled by
(1-lambda)^diffusion_time, where lambda is the corresponding eigenvalue. diffusion_time plays the role of a scale parameter. One of the main ideas of the diffusion framework is that running the diffusion forward in time (taking larger and larger powers of the Laplacian/transition matrix) reveals the geometric structure of X at larger and larger scales (the diffusion process). diffusion_time = 0 empirically provides a reasonable balance from a clustering perspective. Specifically, the notion of a cluster in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time t). Credit to Satrajit Ghosh (http://satra.cogitatum.org/) for the description.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
Returns: embedding : array, shape=(n_samples, n_components)
The reduced samples.
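The diffusion-map rescaling by (1-lambda)^diffusion_time described in the parameters can be sketched as follows. The function name diffusion_rescale and the row-per-sample layout are assumptions for this example; the eigenvalues and coordinates are made-up inputs, not output from the library.

```python
def diffusion_rescale(embedding, eigenvalues, diffusion_time):
    """Scale column k of the embedding by (1 - eigenvalues[k]) ** diffusion_time.

    Hypothetical helper: rows are samples, columns are eigenvector coordinates.
    """
    scales = [(1.0 - lam) ** diffusion_time for lam in eigenvalues]
    return [[v * s for v, s in zip(row, scales)] for row in embedding]


# Made-up Laplacian eigenvalues and a two-sample, two-component embedding.
eigenvalues = [0.1, 0.5]
embedding = [[1.0, 1.0], [-1.0, 2.0]]

rescaled_t0 = diffusion_rescale(embedding, eigenvalues, 0)
rescaled_t2 = diffusion_rescale(embedding, eigenvalues, 2)
```

With diffusion_time=0 every scale factor is 1 and the embedding is unchanged. Larger times shrink the coordinates tied to larger eigenvalues more strongly, which emphasizes coarse structure.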
Notes
Spectral embedding is most useful when the graph has one connected component. If the graph has many components, the first few eigenvectors will simply uncover the connected components of the graph.
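The note above can be checked on a toy graph. The sketch uses the unnormalized Laplacian D - W as a simpler stand-in for the normalized Laplacian the function computes (an assumption made for brevity) and shows that the indicator vector of each connected component is mapped to zero, so it is an eigenvector with eigenvalue 0.

```python
def unnormalized_laplacian(W):
    """Return D - W for a dense symmetric affinity matrix W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Two disconnected edges: nodes {0, 1} and nodes {2, 3}.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
L = unnormalized_laplacian(W)

indicator_a = [1.0, 1.0, 0.0, 0.0]
indicator_b = [0.0, 0.0, 1.0, 1.0]
image_a = matvec(L, indicator_a)
image_b = matvec(L, indicator_b)
```

Both images are the zero vector, so the null space has one dimension per component and the leading eigenvectors only label the components.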
References
- http://en.wikipedia.org/wiki/LOBPCG
- Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method Andrew V. Knyazev http://dx.doi.org/10.1137%2FS1064827500366124
ISOMAP
-
class
megaman.embedding.isomap.
Isomap
(n_components=2, radius=None, geom=None, eigen_solver='auto', random_state=None, path_method='auto', solver_kwds=None)
Isomap Embedding
Non-linear dimensionality reduction through Isometric Mapping
Parameters: n_components : integer
number of coordinates for the manifold.
radius : float (optional)
radius for adjacency and affinity calculations. Will be overridden if either is set in geom
geom : dict or megaman.geometry.Geometry object
specification of geometry parameters: keys are [“adjacency_method”, “adjacency_kwds”, “affinity_method”,
“affinity_kwds”, “laplacian_method”, “laplacian_kwds”]
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. Uses a dense data array, and thus should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.RandomState
path_method : string, optional. Method for computing graph shortest path.
One of [‘auto’, ‘D’, ‘FW’, ‘BF’, ‘J’]. See scipy.sparse.csgraph.shortest_path for more information.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
References
[R4] Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500)
Attributes
embedding_ (array, shape = (n_samples, n_components)) Spectral embedding of the training matrix.
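Isomap replaces neighborhood distances with shortest-path distances through the neighborhood graph, which is what path_method controls. In scipy.sparse.csgraph.shortest_path the value ‘FW’ selects the Floyd-Warshall algorithm. The sketch below is a plain-Python version of that algorithm on a dense matrix; treating off-diagonal zeros as missing edges is an assumption chosen to mirror the sparse convention, and the code is not megaman's implementation.

```python
import math


def floyd_warshall(dist):
    """All-pairs shortest paths; off-diagonal zeros in dist mean 'no edge'."""
    n = len(dist)
    g = [
        [
            0.0 if i == j else (dist[i][j] if dist[i][j] > 0 else math.inf)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Allow each node k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][k] + g[k][j] < g[i][j]:
                    g[i][j] = g[i][k] + g[k][j]
    return g


# Chain 0 - 1 - 2 with edge lengths 1 and 2; nodes 0 and 2 are not neighbors.
dist = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
graph_dist = floyd_warshall(dist)
```

The distance between nodes 0 and 2 becomes the length of the path through node 1.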
fit
(X, y=None, input_type='data')
Fit the model from data in X.
Parameters: input_type : string, one of: ‘data’, ‘distance’.
How the values of input data X are interpreted. (default = ‘data’)
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
If self.input_type is ‘distance’:
X : array-like, shape (n_samples, n_samples),
Interpret X as precomputed distance or adjacency graph computed from samples.
eigen_solver : {None, ‘arpack’, ‘lobpcg’, or ‘amg’}
The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
Returns: self : object
Returns the instance itself.
-
megaman.embedding.isomap.
isomap
(geom, n_components=8, eigen_solver='auto', random_state=None, path_method='auto', distance_matrix=None, graph_distance_matrix=None, centered_matrix=None, solver_kwds=None)
Parameters: geom : a Geometry object from megaman.geometry.geometry
n_components : integer, optional
The dimension of the projection subspace.
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : int seed, RandomState instance, or None (default)
A pseudo-random number generator used to initialize the lobpcg eigenvector decomposition when eigen_solver == ‘amg’. By default, arpack is used.
path_method : string, method for computing graph shortest path. One of :
‘auto’, ‘D’, ‘FW’, ‘BF’, ‘J’. See scipy.sparse.csgraph.shortest_path for more information.
distance_matrix : sparse Ndarray (n_obs, n_obs), optional. Pairwise distance matrix;
sparse zeros are considered ‘infinite’.
graph_distance_matrix : Ndarray (n_obs, n_obs), optional. Pairwise graph distance
matrix. Output of graph_shortest_path.
centered_matrix : Ndarray (n_obs, n_obs), optional. Centered version of
graph_distance_matrix
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
Returns: embedding : array, shape=(n_samples, n_components)
The reduced samples.
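The centered_matrix parameter refers to a centered version of the graph distance matrix. In classical multidimensional scaling this is the double-centered matrix B = -0.5 * J * D2 * J, where D2 holds squared distances and J = I - (1/n) * ones. The sketch below computes it in plain Python; whether megaman uses exactly this scaling is an assumption.

```python
def double_center(graph_dist):
    """Return -0.5 * J * D2 * J for a dense symmetric distance matrix."""
    n = len(graph_dist)
    sq = [[d * d for d in row] for row in graph_dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [
            -0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three points on a line at positions 0, 1 and 2.
graph_dist = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
B = double_center(graph_dist)
```

For points on a line, B equals the Gram matrix of the centered coordinates -1, 0, 1, so its leading eigenvector recovers the one-dimensional layout.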
Locally Linear Embedding
-
class
megaman.embedding.locally_linear.
LocallyLinearEmbedding
(n_components=2, radius=None, geom=None, eigen_solver='auto', random_state=None, reg=1000.0, solver_kwds=None)
Locally Linear Embedding
Parameters: n_components : integer
number of coordinates for the manifold.
radius : float (optional)
radius for adjacency and affinity calculations. Will be overridden if either is set in geom
geom : dict or megaman.geometry.Geometry object
specification of geometry parameters: keys are [“adjacency_method”, “adjacency_kwds”, “affinity_method”,
“affinity_kwds”, “laplacian_method”, “laplacian_kwds”]
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. Uses a dense data array, and thus should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.RandomState
reg : float, optional
regularization constant, multiplies the trace of the local covariance matrix of the distances.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
References
[R5] Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000).
fit
(X, y=None, input_type='data')
Fit the model from data in X.
Parameters: input_type : string, one of: ‘data’, ‘distance’.
How the values of input data X are interpreted. (default = ‘data’)
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
If self.input_type is ‘distance’:
X : array-like, shape (n_samples, n_samples),
Interpret X as precomputed distance or adjacency graph computed from samples.
Returns: self : object
Returns the instance itself.
-
megaman.embedding.locally_linear.
barycenter_graph
(distance_matrix, X, reg=0.001)
Computes the barycenter weighted graph for points in X
Parameters: distance_matrix: sparse Ndarray, (N_obs, N_obs) pairwise distance matrix.
X : Ndarray (N_obs, N_dim) observed data matrix.
reg : float, optional
Amount of regularization when solving the least-squares problem. Only relevant if mode=’barycenter’. If None, use the default.
Returns: W : sparse matrix in CSR format, shape = [n_samples, n_samples]
W[i, j] is assigned the weight of edge that connects i to j.
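For a single point, the barycenter weights are the coefficients that best reconstruct it from its neighbors under a sum-to-one constraint. The sketch below follows the usual LLE recipe: form the local Gram matrix of neighbor offsets, add reg times its trace to the diagonal, solve against a vector of ones, and normalize. The helper names and the dense pure-Python solver are assumptions for illustration and are not the library's code.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def barycenter_weights(x, neighbors, reg=1e-3):
    """Weights summing to 1 that reconstruct x from its neighbors."""
    k = len(neighbors)
    # Offsets of each neighbor from x, and their Gram matrix.
    Z = [[nb[d] - x[d] for d in range(len(x))] for nb in neighbors]
    C = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(k)] for i in range(k)]
    trace = sum(C[i][i] for i in range(k))
    shift = reg * trace if trace > 0 else reg
    for i in range(k):
        C[i][i] += shift  # regularization scaled by the trace
    w = solve(C, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


# A point halfway between its two neighbors.
weights = barycenter_weights((0.0, 0.0), [(-1.0, 0.0), (1.0, 0.0)])
```

By symmetry the two neighbors receive equal weight. Without the regularization term the Gram matrix in this example would be singular.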
-
megaman.embedding.locally_linear.
locally_linear_embedding
(geom, n_components, reg=0.001, eigen_solver='auto', random_state=None, solver_kwds=None)
Perform a Locally Linear Embedding analysis on the data.
Parameters: geom : a Geometry object from megaman.geometry.geometry
n_components : integer
number of coordinates for the manifold.
reg : float
regularization constant, multiplies the trace of the local covariance matrix of the distances.
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
Returns: Y : array-like, shape [n_samples, n_components]
Embedding vectors.
squared_error : float
Reconstruction error for the embedding vectors. Equivalent to norm(Y - W Y, 'fro')**2, where W are the reconstruction weights.
References
[R6] Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000).
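The squared_error return value is described as norm(Y - W Y, 'fro')**2. The sketch below evaluates that expression on dense Python lists so the quantity can be checked by hand; the function name reconstruction_error and the toy inputs are assumptions for this example.

```python
def reconstruction_error(Y, W):
    """Squared Frobenius norm of Y - W Y for dense list-of-lists inputs."""
    n, dim = len(Y), len(Y[0])
    error = 0.0
    for i in range(n):
        for c in range(dim):
            reconstructed = sum(W[i][j] * Y[j][c] for j in range(n))
            error += (Y[i][c] - reconstructed) ** 2
    return error


# One-dimensional embedding of three points on a line.
Y = [[-1.0], [0.0], [1.0]]
# Each row of W reconstructs a point from its neighbors; rows sum to 1.
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
err = reconstruction_error(Y, W)
```

Here the middle point is reconstructed exactly from its two neighbors, while each end point is replaced by the middle point, giving an error of 1 apiece.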
Local Tangent Space Alignment
-
class
megaman.embedding.ltsa.
LTSA
(n_components=2, radius=None, geom=None, eigen_solver='auto', random_state=None, tol=1e-06, max_iter=100, solver_kwds=None)
Local Tangent Space Alignment
Parameters: n_components : integer
number of coordinates for the manifold.
radius : float (optional)
radius for adjacency and affinity calculations. Will be overridden if either is set in geom
geom : dict or megaman.geometry.Geometry object
specification of geometry parameters: keys are [“adjacency_method”, “adjacency_kwds”, “affinity_method”,
“affinity_kwds”, “laplacian_method”, “laplacian_kwds”]
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. Uses a dense data array, and thus should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.RandomState
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
References
[R7] Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004)
fit
(X, y=None, input_type='data')
Fit the model from data in X.
Parameters: input_type : string, one of: ‘data’, ‘distance’.
How the values of input data X are interpreted. (default = ‘data’)
X : array-like, shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
If self.input_type is ‘distance’, or ‘affinity’:
X : array-like, shape (n_samples, n_samples),
Interpret X as precomputed distance or adjacency graph computed from samples.
Returns: self : object
Returns the instance itself.
-
megaman.embedding.ltsa.
ltsa
(geom, n_components, eigen_solver='auto', random_state=None, solver_kwds=None)
Perform a Local Tangent Space Alignment analysis on the data.
Parameters: geom : a Geometry object from megaman.geometry.geometry
n_components : integer
number of coordinates for the manifold.
eigen_solver : {‘auto’, ‘dense’, ‘arpack’, ‘lobpcg’, or ‘amg’}
- ‘auto’ :
algorithm will attempt to choose the best method for input data
- ‘dense’ :
use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
- ‘arpack’ :
use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
- ‘lobpcg’ :
Locally Optimal Block Preconditioned Conjugate Gradient Method. A preconditioned eigensolver for large symmetric positive definite (SPD) generalized eigenproblems.
- ‘amg’ :
AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.
random_state : numpy.RandomState or int, optional
The generator or seed used to determine the starting vector for arpack iterations. Defaults to numpy.random.
solver_kwds : any additional keyword arguments to pass to the selected eigen_solver
Returns: embedding : array-like, shape [n_samples, n_components]
Embedding vectors.
squared_error : float
Reconstruction error for the embedding vectors. Equivalent to norm(Y - W Y, 'fro')**2, where W are the reconstruction weights.
References
- Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004)
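LTSA, as described by Zhang and Zha, begins by estimating a local tangent space at each point from its neighborhood, typically through a local principal component analysis. The sketch below shows that first step only, for a two-dimensional neighborhood where the leading principal axis has a closed form. It is an illustrative assumption about the local step and does not reproduce the alignment stage or megaman's code.

```python
import math


def tangent_angle(neighborhood):
    """Angle (radians) of the leading principal axis of 2-D points."""
    n = len(neighborhood)
    mean_x = sum(p[0] for p in neighborhood) / n
    mean_y = sum(p[1] for p in neighborhood) / n
    # Entries of the (unnormalized) 2x2 scatter matrix of the centered points.
    sxx = sum((p[0] - mean_x) ** 2 for p in neighborhood)
    syy = sum((p[1] - mean_y) ** 2 for p in neighborhood)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in neighborhood)
    # Closed-form orientation of the dominant eigenvector of a 2x2 symmetric matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Neighbors lying along the diagonal y = x.
neighborhood = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
angle = tangent_angle(neighborhood)
```

For points on the diagonal the estimated tangent direction is 45 degrees, that is, pi / 4 radians.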