Robust Approaches for Unsupervised Learning
| Field | Value |
|---|---|
| dc.contributor.advisor | Aravkin, Aleksandr |
| dc.contributor.advisor | Kutz, Nathan J. |
| dc.contributor.author | Dorabiala, Olga |
| dc.date.accessioned | 2023-08-14T17:01:49Z |
| dc.date.available | 2023-08-14T17:01:49Z |
| dc.date.issued | 2023-08-14 |
| dc.date.submitted | 2023 |
| dc.description | Thesis (Ph.D.)--University of Washington, 2023 |
| dc.description.abstract | Today, data science and machine learning are cornerstones of the engineering, physical, social, and biological sciences. The data available in these fields are often large and unlabeled, motivating the development of unsupervised learning methods that can efficiently extract information about object behavior without human supervision. Unsupervised learning discovers the underlying patterns or structures of unlabeled data through methods such as clustering, dimensionality reduction, and anomaly detection. Although often treated as separate problems, these methods overlap significantly in practice. On real-world data, many traditional techniques are compromised by a lack of clear separation between groups, noisy observations, and/or outlying data points. Robust statistical algorithms are therefore required for successful data analytics. In this work, we focus on the robustification of existing cluster analysis and dimensionality reduction techniques. In particular, we propose three new algorithms: Robust Trimmed k-means (RTKM), Spatiotemporal k-means (STKM), and Ensemble Principal Component Analysis (EPCA). RTKM augments the k-means objective function to create a flexible method that simultaneously identifies outliers and clusters points, and it can be applied to either single- or multi-membership data. Using a similar approach, STKM reframes k-means for the spatiotemporal domain to address the moving cluster problem; a noise-robust extension of STKM is proposed as future work. Finally, EPCA ensembles bootstrapped Principal Component Analysis (PCA) with k-means clustering to create a scalable, noise-resistant extension of PCA that lends itself naturally to uncertainty quantification. We introduce each of our methods, demonstrate their effectiveness against current competitors, and discuss directions for future work. |
| dc.embargo.terms | Open Access |
| dc.format.mimetype | application/pdf |
| dc.identifier.other | Dorabiala_washington_0250E_25904.pdf |
| dc.identifier.uri | http://hdl.handle.net/1773/50206 |
| dc.language.iso | en_US |
| dc.rights | none |
| dc.subject | Clustering |
| dc.subject | Dimensionality reduction |
| dc.subject | Unsupervised learning |
| dc.subject | Applied mathematics |
| dc.subject.other | Applied mathematics |
| dc.title | Robust Approaches for Unsupervised Learning |
| dc.type | Thesis |
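The abstract describes RTKM only as augmenting the k-means objective so that outliers are identified while points are clustered; this record does not give that objective. For orientation only, the sketch below implements the older and simpler classical trimmed k-means heuristic, not RTKM: each iteration flags the fraction `trim_frac` of points lying farthest from their nearest center as outliers and excludes them when the centers are recomputed. The function `trimmed_kmeans`, its parameters, and the toy data are hypothetical, chosen for illustration; initial centers are passed in explicitly so the example is deterministic.

```python
import math


def trimmed_kmeans(points, init_centers, trim_frac, max_iter=100):
    """Classical trimmed k-means sketch (hypothetical helper, NOT the thesis's RTKM).

    points:       list of equal-length numeric tuples
    init_centers: list of k starting centers (e.g. from k-means++ seeding)
    trim_frac:    fraction of points discarded as outliers in each iteration
    Returns (centers, labels); trimmed points receive label -1.
    """
    n, k = len(points), len(init_centers)
    n_trim = math.ceil(trim_frac * n)
    centers = [list(c) for c in init_centers]
    labels = [-1] * n
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        nearest, dists = [], []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            j = min(range(k), key=d.__getitem__)
            nearest.append(j)
            dists.append(d[j])
        # Trimming step: the n_trim points farthest from their nearest
        # center are flagged as outliers and left out of the update.
        by_dist = sorted(range(n), key=dists.__getitem__, reverse=True)
        trimmed = set(by_dist[:n_trim])
        labels = [-1 if i in trimmed else nearest[i] for i in range(n)]
        # Update step: each center moves to the mean of its retained points.
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            if members:
                new_centers.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: labels match the final centers
        centers = new_centers
    return centers, labels


# Two tight groups plus one gross outlier at (50, 50).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
# Deliberately poor start: both initial centers lie in the same group.
centers, labels = trimmed_kmeans(pts, init_centers=[(0, 0), (0, 1)], trim_frac=0.1)
print(centers)  # [[0.5, 0.5], [10.5, 10.5]]
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With `trim_frac=0.1`, one of the nine points is trimmed per iteration, so the point at (50, 50) never enters a center update; without trimming it would drag the second center toward (18.4, 18.4). As with plain k-means, the result still depends on the starting centers, so restarts or k-means++ seeding are common in practice.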
Files
Original bundle
- Name: Dorabiala_washington_0250E_25904.pdf
- Size: 4.71 MB
- Format: Adobe Portable Document Format
