Mixture models to fit heavy-tailed, heterogeneous or sparse data

dc.contributor.advisor: Dobra, Adrian
dc.contributor.advisor: Chen, Yen-Chi
dc.contributor.author: Miao, Zhen
dc.date.accessioned: 2023-08-14T17:07:52Z
dc.date.available: 2023-08-14T17:07:52Z
dc.date.issued: 2023-08-14
dc.date.submitted: 2023
dc.description: Thesis (Ph.D.)--University of Washington, 2023
dc.description.abstract: With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. Unfortunately, the complexity and heterogeneity of these datasets cannot be properly captured by classical statistical models. In this thesis, we develop new classes of mixture models that address these limitations. Their key feature is the assumption that the overall population consists of several subpopulations, each of which can be represented by a simpler statistical model. Our new mixture models are defined through three classes of distributions for different data types, as follows. The first is the bi-$s^*$-concave distribution for continuous data. We propose it as a generalization of two classes popular in estimation under shape constraints, the $s$-concave distribution and the bi-log-concave distribution, so as to include multimodal and heavy-tailed densities. Although its definition is not directly related to mixture models, this class includes several important mixture distributions (e.g., mixtures of Student-t distributions and mixtures of Gaussian distributions) under some conditions. The second is the nonparametric Poisson mixture distribution for count data, which generalizes the Poisson distribution by assuming that its parameter follows a completely unknown mixing distribution. We provide a minimax-optimal convergence rate for nonparametric maximum likelihood estimation of the mixing distribution and apply it to single-cell RNA-sequencing data. The third is the Ising mixture distribution for inferring associations between binary variables. This method combines the strengths of classic methods, such as Ising models and multivariate Bernoulli mixture models. We examine the conditions required for the identifiability of the Ising mixture model and develop a Bayesian framework for implementation. Through simulations and two real data applications, we demonstrate the effectiveness of our proposed method.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Miao_washington_0250E_25376.pdf
dc.identifier.uri: http://hdl.handle.net/1773/50565
dc.language.iso: en_US
dc.rights: none
dc.subject: Ising model
dc.subject: Mixture model
dc.subject: Nonparametric Maximum Likelihood Estimation
dc.subject: Shape constraints
dc.subject: Single-cell RNA sequencing data
dc.subject: Statistics
dc.subject.other: Statistics
dc.title: Mixture models to fit heavy-tailed, heterogeneous or sparse data
dc.type: Thesis

Files

Original bundle

Name: Miao_washington_0250E_25376.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format