Statistical Methods for Sparse Binary, Count Data and Treatment Effect Heterogeneity
Authors
Xie, Yuxiang
Abstract
The concept of "sparsity" appears in many areas of statistics, and it is a double-edged sword whose effect depends on the statistical context. Sometimes sparsity is convenient: a sparse statistical model, for example, has only a small number of nonzero parameters and is therefore easier to interpret than a dense model. Sparsity can also cause problems: a sparse sequencing read count table, for example, contains excessive zeros because many rare bacterial taxa are not captured in the sequencing reads, and these zeros may lead to inaccurate estimates of bacterial abundances. This dissertation develops statistical methodologies for handling sparsity in three different statistical problems. We first present a variable selection method with false discovery rate (FDR) control for a sparse model with binary covariates. We show that our proposal controls the FDR at a pre-specified level in finite samples and, under some mild assumptions, achieves asymptotic power equal to one. Next, we consider a sparse generalized linear model for studying treatment effect heterogeneity, and we propose two statistical frameworks that detect factors contributing to heterogeneous treatment effects while simultaneously controlling the FDR. Finally, we develop a statistical method based on non-negative matrix factorization (NMF) for estimating bacterial compositions from sparse count data in microbiome studies. We establish upper bounds on the estimation error of our NMF estimators and show in simulation studies that our proposal outperforms some existing methods in various settings. We also demonstrate the interpretability of our model in a real data application.
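The abstract does not specify the form of the dissertation's NMF estimator. As a generic illustration only, the sketch below factors a toy sample-by-taxon count table with the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler (Poisson) objective. It then converts the fitted low-rank matrix into per-sample relative abundances, so zero counts can receive small positive estimated abundances. The function names (`nmf_kl`, `relative_abundance`, `kl_divergence`), the toy counts, and the rank k = 2 are hypothetical choices, not the author's method.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of row lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH), using the convention 0 * log 0 = 0."""
    total = 0.0
    for v_row, f_row in zip(V, WH):
        for v, f in zip(v_row, f_row):
            total += (v * math.log(v / f) if v > 0 else 0.0) - v + f
    return total


def nmf_kl(V, k, n_iter=500, eps=1e-10, seed=0):
    """Generic (hypothetical) sketch: factor a non-negative m x n count matrix V
    as V ~ W H with Lee-Seung multiplicative updates for the KL objective."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        WH = matmul(W, H)
        # H <- H * (W^T (V / WH)) / (W^T 1); WH is held fixed during this step.
        for a in range(k):
            w_sum = sum(W[i][a] for i in range(m)) + eps
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(m))
                H[a][j] *= num / w_sum
        WH = matmul(W, H)
        # W <- W * ((V / WH) H^T) / (1 H^T), using the freshly updated H.
        for a in range(k):
            h_sum = sum(H[a]) + eps
            for i in range(m):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(n))
                W[i][a] *= num / h_sum
    return W, H


def relative_abundance(W, H):
    """Normalize each row (sample) of the fitted matrix WH to sum to one."""
    fitted = matmul(W, H)
    return [[x / sum(row) for x in row] for row in fitted]


if __name__ == "__main__":
    # Toy sample-by-taxon counts (hypothetical), with many zeros.
    counts = [
        [10, 0, 5, 0],
        [8, 1, 4, 0],
        [0, 6, 0, 9],
        [1, 5, 0, 7],
    ]
    W, H = nmf_kl(counts, k=2)
    for row in relative_abundance(W, H):
        print([round(p, 3) for p in row])
```

The multiplicative form keeps every entry of W and H non-negative without an explicit projection step, and the KL objective matches a Poisson model for counts. Both are common reasons to use this generic algorithm as a baseline for count data.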
Description
Thesis (Ph.D.)--University of Washington, 2019
