Mixture models to fit heavy-tailed, heterogeneous or sparse data

Authors

Miao, Zhen

Abstract

With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. Unfortunately, the complexity and heterogeneity of these datasets cannot be properly captured by classical statistical models. In this thesis, we develop new classes of mixture models that address these issues. Their key feature is the assumption that the overall population consists of several subpopulations, each of which can be represented by a simpler statistical model. Our new mixture models are defined through three classes of distributions for different data types, as follows. The first type of mixture model is the bi-$s^*$-concave distribution for continuous data. We propose this distribution as a generalization of two popular distributions in the field of estimation under shape constraints, the $s$-concave distribution and the bi-log-concave distribution, so that it includes multimodal and heavy-tailed densities. Although its definition is not directly related to mixture models, this class includes several important mixture distributions (e.g., mixtures of Student-t distributions and mixtures of Gaussian distributions) under some conditions. The second type of mixture model is the nonparametric Poisson mixture distribution for count data, which generalizes the Poisson distribution by assuming that its parameter follows a completely unknown mixing distribution. We provide a minimax-optimal convergence rate for nonparametric maximum likelihood estimation of the mixing distribution and apply it to single-cell RNA-sequencing data. The third type of mixture model is the Ising mixture distribution for inferring associations between binary variables. This method combines the strengths of classic methods, such as Ising models and multivariate Bernoulli mixture models. We examine the conditions required for the identifiability of the Ising mixture model and develop a Bayesian framework for implementation. Through simulations and two real data applications, we demonstrate the effectiveness of our proposed method.
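
To make the nonparametric Poisson mixture concrete: each count $X$ is drawn from a Poisson distribution whose rate $\lambda$ is itself drawn from an unknown mixing distribution $G$. The sketch below is only an illustration and is not taken from the thesis. It assumes that $G$ is restricted to a fixed grid of candidate rates and that the grid weights are fitted with a standard EM iteration; the function names, grid, iteration count, and toy counts are all hypothetical.

```python
import math
import numpy as np

def poisson_pmf_matrix(counts, grid):
    """Return L with L[i, j] = P(X = counts[i] | lambda = grid[j])."""
    counts = np.asarray(counts, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # log(x!) computed via the log-gamma function for numerical stability
    log_fact = np.array([math.lgamma(x + 1.0) for x in counts])
    log_pmf = (counts[:, None] * np.log(grid)[None, :]
               - grid[None, :] - log_fact[:, None])
    return np.exp(log_pmf)

def fit_mixing_weights(counts, grid, n_iter=500):
    """EM updates for the weights of a mixing distribution supported on `grid`.

    Assumption: restricting G to a fixed grid is an approximation chosen
    for this sketch, not the estimator analyzed in the thesis.
    """
    lik = poisson_pmf_matrix(counts, grid)
    weights = np.full(len(grid), 1.0 / len(grid))  # start from uniform weights
    for _ in range(n_iter):
        joint = lik * weights                      # weight_j * P(x_i | lambda_j)
        posterior = joint / joint.sum(axis=1, keepdims=True)
        weights = posterior.mean(axis=0)           # average responsibility per grid point
    return weights

def mixture_loglik(counts, grid, weights):
    """Log-likelihood of the counts under the grid-supported mixture."""
    return float(np.log(poisson_pmf_matrix(counts, grid) @ weights).sum())

# Made-up toy counts with a low-rate and a high-rate group (hypothetical data).
counts = [0, 0, 1, 0, 1, 9, 10, 11, 10, 12]
grid = np.linspace(0.1, 15.0, 60)                  # hypothetical candidate rates
weights = fit_mixing_weights(counts, grid)
```

Because each EM step cannot decrease the likelihood, `mixture_loglik(counts, grid, weights)` is at least as large as the value under the uniform starting weights. A finer grid or more iterations would bring the fit closer to the unrestricted maximum likelihood solution, at higher computational cost.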

Description

Thesis (Ph.D.)--University of Washington, 2023
