Robust Approaches for Unsupervised Learning

Dorabiala, Olga

dc.contributor.advisor	Aravkin, Aleksandr
dc.contributor.advisor	Kutz, Nathan J.
dc.contributor.author	Dorabiala, Olga
dc.date.accessioned	2023-08-14T17:01:49Z
dc.date.available	2023-08-14T17:01:49Z
dc.date.issued	2023-08-14
dc.date.submitted	2023
dc.description	Thesis (Ph.D.)--University of Washington, 2023
dc.description.abstract	Today, data science and machine learning are cornerstones of the engineering, physical, social, and biological sciences. Often, the data available in these fields is large and unlabeled, motivating the development of unsupervised learning methods that can efficiently extract information about object behavior with no human supervision. Unsupervised learning discovers the underlying patterns or structures of unlabeled data through the use of methods such as clustering, dimensionality reduction, and anomaly detection. Although often treated as separate problems, these methods have significant overlap in practice. When dealing with real-world data, many traditional techniques are compromised by a lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. In this work, we focus on the robustification of existing cluster analysis and dimensionality reduction techniques. In particular, we propose three new algorithms: Robust Trimmed k-means (RTKM), Spatiotemporal k-means (STKM), and Ensemble Principal Component Analysis (EPCA). RTKM augments the objective function in k-means clustering to create a flexible method that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. Using a similar approach, STKM reframes k-means for the spatiotemporal domain to address the moving cluster problem and proposes a noise-robust extension as future work. Finally, EPCA ensembles bootstrapped Principal Component Analysis (PCA) with k-means clustering to create a scalable, noise-resistant extension of PCA that lends itself naturally to uncertainty quantification. We introduce each of our methods, demonstrate their effectiveness against current competitors, and discuss potential for future work.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Dorabiala_washington_0250E_25904.pdf
dc.identifier.uri	http://hdl.handle.net/1773/50206
dc.language.iso	en_US
dc.rights	none
dc.subject	Clustering
dc.subject	Dimensionality reduction
dc.subject	Unsupervised learning
dc.subject	Applied mathematics
dc.subject.other	Applied mathematics
dc.title	Robust Approaches for Unsupervised Learning
dc.type	Thesis

Files

Original bundle

Name: Dorabiala_washington_0250E_25904.pdf
Size: 4.71 MB
Format: Adobe Portable Document Format

Collections

Applied mathematics