Statistical Methods for Sparse Binary, Count Data and Treatment Effect Heterogeneity

dc.contributor.advisor: Chan, Kwun Chuen Gary
dc.contributor.author: Xie, Yuxiang
dc.date.accessioned: 2019-08-14T22:29:54Z
dc.date.issued: 2019-08-14
dc.date.submitted: 2019
dc.description: Thesis (Ph.D.)--University of Washington, 2019
dc.description.abstract: The concept of 'sparsity' arises in many areas of statistics. Sparsity is a double-edged sword whose effect depends on the statistical context. Sometimes sparsity is convenient: a sparse statistical model has only a small number of nonzero parameters and is therefore easier to interpret than a dense model. In other settings sparsity causes problems: a sparse sequencing read count table contains excessive zeros because many rare bacterial taxa are not captured in the sequencing reads, and this sparsity may lead to inaccurate estimates of bacterial abundances. This dissertation develops statistical methodologies for dealing with sparsity problems in three different statistical topics. We first present a false discovery rate (FDR) controlled variable selection method for a sparse model with binary covariates. We show that our proposal controls the FDR at a pre-specified level in finite samples and achieves asymptotic power equal to one under some mild assumptions. Next, we consider a sparse generalized linear model for studying treatment effect heterogeneity, and we propose two statistical frameworks that detect factors contributing to heterogeneous treatment effects while simultaneously controlling the FDR. Finally, we develop a statistical method based on non-negative matrix factorization (NMF) for estimating bacterial compositions from sparse count data in microbiome studies. We establish upper bounds on the estimation error of our NMF estimators and show in simulation studies that our proposal outperforms some existing methods in various settings. We also demonstrate the interpretability of our model in a real data application.
dc.embargo.lift: 2020-08-13T22:29:54Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Xie_washington_0250E_20176.pdf
dc.identifier.uri: http://hdl.handle.net/1773/44067
dc.language.iso: en_US
dc.rights: none
dc.subject: Biostatistics
dc.subject.other: Biostatistics
dc.title: Statistical Methods for Sparse Binary, Count Data and Treatment Effect Heterogeneity
dc.type: Thesis

Files

Original bundle

Name: Xie_washington_0250E_20176.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format
