Mixture models to fit heavy-tailed, heterogeneous or sparse data
| dc.contributor.advisor | Dobra, Adrian | |
| dc.contributor.advisor | Chen, Yen-Chi | |
| dc.contributor.author | Miao, Zhen | |
| dc.date.accessioned | 2023-08-14T17:07:52Z | |
| dc.date.available | 2023-08-14T17:07:52Z | |
| dc.date.issued | 2023-08-14 | |
| dc.date.submitted | 2023 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2023 | |
| dc.description.abstract | With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. Unfortunately, the complexity and heterogeneity of these datasets cannot be properly captured by classical statistical models. In this thesis, we develop new classes of mixture models that address these issues. Their key feature is the assumption that the overall population consists of several subpopulations, each of which can be represented by a simpler statistical model. Our new mixture models are defined through three classes of distributions for different data types, as follows. The first is the bi-$s^*$-concave distribution for continuous data. We propose this distribution as a generalization of two distributions popular in the field of estimation under shape constraints, namely the $s$-concave distribution and the bi-log-concave distribution, so as to include multimodal and heavy-tailed densities. Although its definition is not directly related to mixture models, this class includes several important mixture distributions (e.g., mixtures of Student-t distributions and mixtures of Gaussian distributions) under some conditions. The second is the nonparametric Poisson mixture distribution for count data, which generalizes the Poisson distribution by assuming that its parameter follows a completely unknown mixing distribution. We provide a minimax-optimal convergence rate for the nonparametric maximum likelihood estimation of the mixing distribution and apply it to a single-cell RNA-sequencing dataset. The third is the Ising mixture distribution for inferring associations between binary variables. This method combines the strengths of classic methods, such as Ising models and multivariate Bernoulli mixture models. We examine the conditions required for the identifiability of the Ising mixture model and develop a Bayesian framework for its implementation.
Through simulations and two real-data applications, we demonstrate the effectiveness of our proposed method. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Miao_washington_0250E_25376.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/50565 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Ising model | |
| dc.subject | Mixture model | |
| dc.subject | Nonparametric Maximum Likelihood Estimation | |
| dc.subject | Shape constraints | |
| dc.subject | Single-cell RNA sequencing data | |
| dc.subject | Statistics | |
| dc.subject.other | Statistics | |
| dc.title | Mixture models to fit heavy-tailed, heterogeneous or sparse data | |
| dc.type | Thesis |
Files
Original bundle
- Name: Miao_washington_0250E_25376.pdf
- Size: 1.11 MB
- Format: Adobe Portable Document Format
