Learning Topological Structures and Vector Fields on Manifolds with (Higher-order) Discrete Laplacians
Authors
Chen, Yu-Chia
Abstract
Unsupervised learning algorithms, which extract geometric information without labels, are pivotal in analyzing high-dimensional observational data from complex physical and social systems. These methods have already contributed to scientific discovery in several ways: (i) non-linear dimensionality reduction, also called manifold learning (ML), has revealed hidden structures in quantum chemistry and astronomy datasets; (ii) clustering analysis has been critical for categorizing stages of cellular differentiation and for analyzing community structures in social networks; and (iii) topological data analysis (TDA) has been essential for investigating cyclic or periodic structures in neuroscientific, galactic, and human action systems. Despite these early successes, many questions about unsupervised learning methodology remain largely unexplored. For instance, how can we learn from a dataset equipped with temporal information? How can we test whether the topological structures we obtain are signal rather than noise or algorithmic artifacts? Lastly, can we extend the current unsupervised learning framework to handle higher-order (e.g., triplet-wise) relations and, if so, what possibilities does that open up? In this thesis, we answer some of these questions through the lens of differential geometry, topology, and machine learning. This thesis centers on estimators for the Laplace-Beltrami operator $\Delta_0$ of a manifold $\mathcal{M}$ and its higher-order counterparts $\Delta_k$ (called the $k$-Laplacians). In particular, we are interested in the spectral properties (i.e., the eigenvalues and eigenvectors) of these estimators. First, we analyze a known deficiency in the outputs of standard embedding algorithms when the aspect ratio of the manifold is large. 
This deficiency, called the Independent Eigencoordinate Search (IES) problem, arises from functional dependencies among the eigenfunctions of $\Delta_0$. We address the IES problem by proposing a bicriterial algorithm that has low computational overhead and an analyzable asymptotic limit. Second, the discrete Helmholtzian ${\mathbf{\mathcal{L}}}_1$ (a first-order extension of the graph Laplacian ${\mathbf{\mathcal{L}}}_0$ to the edge space) is introduced to enrich the manifold learning methodology. We provide a theoretical analysis of the large-sample limit of ${\mathbf{\mathcal{L}}}_1$ and show its connection to the manifold Helmholtzian (1-Laplacian) $\Delta_1$, an operator that acts on vector fields on the manifold. The proposed Helmholtzian estimator ${\mathbf{\mathcal{L}}}_1$ makes it possible to distill higher-order topological structures, such as the first homology vector space $\mathcal{H}_1$, which encodes cyclic information. Third, we explore the possibility of using the vector field basis defined by the eigenflows of ${\mathbf{\mathcal{L}}}_1$; specifically, we study extensions of learning algorithms based on ${\mathbf{\mathcal{L}}}_0$ to vector field smoothing, vector field interpolation, and the inference of underlying vector fields from sparsely observed trajectories. Lastly, we study the decomposition of the $k$-th homology vector space $\mathcal{H}_k$ (the null space of the $k$-Laplacian ${\mathbf{\mathcal{L}}}_k$) of sparsely connected manifolds; under this condition, we show that the homology embedding can be approximately factorized. Our analysis is conducted by viewing the connected sum (gluing) of manifolds as a perturbation of the matrix ${\mathbf{\mathcal{L}}}_k$. We exemplify the efficacy of the proposed framework by applying it to the shortest homologous loop detection problem, which is known to be NP-hard in general. 
We support our claims in each section with an extensive set of experiments on synthetic manifolds along with real datasets from chemistry, biology, medical imaging, and astronomy.
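The abstract describes $\mathcal{H}_1$ as the null space of the 1-Laplacian ${\mathbf{\mathcal{L}}}_1$. As a rough illustration only, the Python sketch below builds the standard unweighted combinatorial Hodge 1-Laplacian $B_1^\top B_1 + B_2 B_2^\top$ on a tiny simplicial complex and counts its zero eigenvalues. This is not the weighted estimator analyzed in the thesis; the function names (`boundary_1`, `boundary_2`, `hodge_1_laplacian`, `betti_1`), the orientation convention, and the tolerance `tol` are all assumptions made for this example.

```python
import numpy as np

def boundary_1(n_vertices, edges):
    """Vertex-by-edge incidence matrix; edge (i, j) with i < j is oriented i -> j."""
    B1 = np.zeros((n_vertices, len(edges)))
    for e, (i, j) in enumerate(edges):
        B1[i, e] = -1.0
        B1[j, e] = 1.0
    return B1

def boundary_2(edges, triangles):
    """Edge-by-triangle incidence matrix for triangles (i, j, k) with i < j < k."""
    index = {edge: e for e, edge in enumerate(edges)}
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        # Boundary of [i, j, k] is [j, k] - [i, k] + [i, j].
        B2[index[(j, k)], t] = 1.0
        B2[index[(i, k)], t] = -1.0
        B2[index[(i, j)], t] = 1.0
    return B2

def hodge_1_laplacian(n_vertices, edges, triangles):
    """Unweighted combinatorial 1-Laplacian (an illustrative stand-in, not the thesis estimator)."""
    B1 = boundary_1(n_vertices, edges)
    B2 = boundary_2(edges, triangles)
    return B1.T @ B1 + B2 @ B2.T

def betti_1(L1, tol=1e-9):
    """Count numerically zero eigenvalues, i.e. the dimension of the null space of L1."""
    eigenvalues = np.linalg.eigvalsh(L1)
    return int(np.sum(eigenvalues < tol))

if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2)]
    hollow = hodge_1_laplacian(3, edges, [])           # triangle boundary: one loop
    filled = hodge_1_laplacian(3, edges, [(0, 1, 2)])  # filled triangle: loop is a boundary
    print(betti_1(hollow), betti_1(filled))
```

In this toy case the hollow triangle has a one-dimensional null space (one independent loop), while adding the 2-simplex removes it. On real data the complex would be built from sampled points, and the weighting and normalization of the operator matter for the large-sample limit discussed in the abstract.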
Description
Thesis (Ph.D.)--University of Washington, 2021
