Testing for a difference in means after selection

Chen, Yiqun

Files

Chen_washington_0250E_24634.pdf (15.98 MB)

Date

2022-09-23

Authors

Chen, Yiqun

Abstract

In modern data analysis, we often want to test for a difference in means between groups selected based on the observed data. This is a challenging task: when the null hypothesis is selected based on the data, classical tests (e.g., a z-test) that do not account for this will fail to control the Type I error. In this dissertation, we leverage the selective inference framework to develop valid tests for a difference in means when the groups under investigation are selected based on the output of a statistical learning method. We first consider the task of quantifying the uncertainty of spikes estimated from calcium imaging data. Here, the scientific question can be cast as a test of equality of (weighted) means between groups that are defined through a changepoint detection algorithm. Next, we describe a new test of a null hypothesis that is selected based on the output of the graph fused lasso. Our proposal conditions on less information than existing approaches, thereby leading to higher power while guaranteeing Type I error control. The final chapter is motivated by statistical challenges that arise in single-cell transcriptomics studies, where researchers are interested in ascertaining whether the estimated clusters are truly different from each other. We develop a finite-sample, correctly-sized test for a difference in cluster means when the clusters are obtained via k-means clustering, and demonstrate that our method leads to conclusions that align better with the underlying truth than classical tests.
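The failure of classical tests described in the abstract can be seen in a small simulation. The sketch below is illustrative only and is not the dissertation's method. It assumes one-dimensional Gaussian data with known variance and no true group structure, splits the data either with a hand-written 2-means routine or with a fixed split that ignores the data, and then applies a naive two-sample z-test. All function names and parameter choices (`two_means_1d`, `naive_z_pvalue`, `naive_type1_error`, `n=30`, `n_sims=500`) are hypothetical.

```python
import math
import numpy as np


def two_means_1d(x, n_iter=50):
    """Lloyd's algorithm with k=2 on 1-D data; returns boolean cluster labels."""
    # Initialise the two centers at the extremes so neither cluster is empty.
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        # True means the point is closer to the second center.
        labels = np.abs(x - centers[0]) > np.abs(x - centers[1])
        new_centers = np.array([x[~labels].mean(), x[labels].mean()])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_z_pvalue(x, labels, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, ignoring selection."""
    n1 = int(labels.sum())
    n0 = int((~labels).sum())
    diff = x[labels].mean() - x[~labels].mean()
    z = diff / (sigma * math.sqrt(1.0 / n0 + 1.0 / n1))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_type1_error(data_driven, n_sims=500, n=30, alpha=0.05, seed=0):
    """Fraction of null datasets on which the naive z-test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n)  # global null: every observation has mean 0
        if data_driven:
            labels = two_means_1d(x)       # groups chosen by looking at x
        else:
            labels = np.arange(n) >= n // 2  # groups fixed before seeing x
        if naive_z_pvalue(x, labels) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print("fixed split:    ", naive_type1_error(data_driven=False))
    print("2-means split:  ", naive_type1_error(data_driven=True))
```

Under these assumptions, the fixed split should reject at a rate close to the nominal level, while the data-driven split should reject far more often, because clustering separates the observations into low and high values by construction. A selective test of the kind the abstract describes would instead account for how the groups were chosen; that correction is not implemented here.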