Mixture models to fit heavy-tailed, heterogeneous or sparse data

Miao, Zhen

dc.contributor.advisor	Dobra, Adrian
dc.contributor.advisor	Chen, Yen-Chi
dc.contributor.author	Miao, Zhen
dc.date.accessioned	2023-08-14T17:07:52Z
dc.date.available	2023-08-14T17:07:52Z
dc.date.issued	2023-08-14
dc.date.submitted	2023
dc.description	Thesis (Ph.D.)--University of Washington, 2023
dc.description.abstract	With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. Unfortunately, the complexity and heterogeneity of these datasets cannot be properly captured by classical statistical models. In this thesis, we develop new classes of mixture models that alleviate these issues. Their key feature is the assumption that the overall population consists of several subpopulations, each of which can be represented by a simpler statistical model. Our new mixture models are defined through three classes of distributions for different data types, as follows. The first type of mixture model, for continuous data, is the bi-$s^*$-concave distribution. We propose this class as a generalization of two popular classes from the field of estimation under shape constraints, the $s$-concave distributions and the bi-log-concave distributions, so that it includes multimodal and heavy-tailed densities. Although its definition is not directly related to mixture models, this class includes several important mixture distributions (e.g., mixtures of Student-t distributions and mixtures of Gaussian distributions) under some conditions. The second type of mixture model is the nonparametric Poisson mixture distribution for count data, which generalizes the Poisson distribution by assuming that its parameter follows a completely unknown mixing distribution. We derive a minimax-optimal convergence rate for the nonparametric maximum likelihood estimator of the mixing distribution and apply it to single-cell RNA-sequencing data. The third type of mixture model is the Ising mixture distribution for inferring associations between binary variables. This method combines the strengths of classical methods such as Ising models and multivariate Bernoulli mixture models. We examine the conditions required for the identifiability of the Ising mixture model and develop a Bayesian framework for implementation.
Through simulations and two real data applications, we demonstrate the effectiveness of our proposed method.
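The nonparametric Poisson mixture described in the abstract models each count as $X \mid \theta \sim \mathrm{Poisson}(\theta)$ with $\theta$ drawn from an unknown mixing distribution $G$, so that $p_G(x) = \int \frac{\theta^x e^{-\theta}}{x!}\, dG(\theta)$. A minimal sketch of computing an approximate nonparametric maximum likelihood estimate of $G$ is shown below. It assumes $G$ is restricted to a fixed, user-chosen grid of rates and uses the standard EM weight update for that grid; this is a common approximation and not necessarily the algorithm used in the thesis. The function names `npmle_poisson_grid` and `log_likelihood` are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(x, lam):
    """Poisson(lam) probability of count x, computed in log space for stability."""
    if lam == 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def npmle_poisson_grid(counts, grid, n_iter=500):
    """Approximate NPMLE of the mixing distribution (hypothetical helper).

    Assumption: the mixing distribution is supported on the fixed rates in
    `grid`; only the weights on those rates are estimated, by EM.
    """
    freq = Counter(counts)            # group identical counts to save work
    n = len(counts)
    k = len(grid)
    weights = [1.0 / k] * k           # start from uniform weights
    # Precompute f(x | theta_j) for every distinct observed count.
    lik = {x: [poisson_pmf(x, t) for t in grid] for x in freq}
    for _ in range(n_iter):
        new = [0.0] * k
        for x, m in freq.items():
            joint = [w * f for w, f in zip(weights, lik[x])]
            total = sum(joint)        # mixture probability p_G(x)
            for j in range(k):
                # E-step: posterior probability that x came from grid point j,
                # counted m times because the value x appears m times.
                new[j] += m * joint[j] / total
        # M-step: new weight is the average posterior responsibility.
        weights = [v / n for v in new]
    return weights

def log_likelihood(counts, grid, weights):
    """Log-likelihood of the counts under the grid mixture (hypothetical helper)."""
    return sum(
        math.log(sum(w * poisson_pmf(x, t) for w, t in zip(weights, grid)))
        for x in counts
    )
```

For example, `npmle_poisson_grid([0, 0, 1, 2, 2, 3, 8, 9, 10, 12], [0.5 * i for i in range(1, 41)])` returns a weight vector over the rates 0.5 to 20. Because each EM step cannot decrease the likelihood, the fitted weights score at least as well as the uniform starting weights. A finer or wider grid brings the estimate closer to the unrestricted NPMLE at the cost of more computation.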
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Miao_washington_0250E_25376.pdf
dc.identifier.uri	http://hdl.handle.net/1773/50565
dc.language.iso	en_US
dc.rights	none
dc.subject	Ising model
dc.subject	Mixture model
dc.subject	Nonparametric Maximum Likelihood Estimation
dc.subject	Shape constraints
dc.subject	Single-cell RNA sequencing data
dc.subject	Statistics
dc.subject.other	Statistics
dc.title	Mixture models to fit heavy-tailed, heterogeneous or sparse data
dc.type	Thesis

Files

Original bundle

Name:: Miao_washington_0250E_25376.pdf
Size:: 1.11 MB
Format:: Adobe Portable Document Format

Collections

Statistics