Rigorous and flexible statistical tests for correlations between stationary or nonstationary time series

Authors

Yuan, Alexander

Abstract

A growing body of life-sciences research seeks to infer correlational and causal relationships from time series. Yet these analyses can encounter challenges in practice, and this dissertation focuses on three of them. First, many popular statistical approaches for correlational and causal analysis of time series come with assumptions and caveats that are easy to overlook. Second, time series typically exhibit autocorrelation, which violates the fundamental assumptions of many standard statistical tests. Third, researchers are increasingly using correlation statistics that lack a known analytical null distribution and therefore a standard parametric test, which requires the development of appropriate nonparametric methods.

To address the first challenge (overlooked assumptions), Part I uses a multimedia strategy (including video) to illustrate key concepts and caveats of three popular statistical approaches that are used to make causal claims. Although primarily an interdisciplinary synthesis, Part I also describes some novel pathologies of existing methods.

Parts II and III report methodological advances targeted at the second (autocorrelation) and third (nonparametric correlation) challenges. Both describe new tests that are statistically valid, meaning that they guarantee a false positive rate that does not exceed a user-defined “significance level”. Part II reports a significance test that is applicable to any pairwise correlation statistic and is valid as long as one of the two time series is stationary (i.e., its behavior does not change systematically over time). As a demonstration, the test is used to detect known statistical dependence relationships in disciplines from microbiome science to climatology. Part III tackles the difficult setting of nonstationary time series, for which multiple biological replicates are often needed. In this context, I describe a valid significance test for correlation between time series that enables detections with higher confidence and fewer replicates than similar approaches. The test is used to verify the previously observed relationship between swimming speed and directional alignment in zebrafish, using a publicly available data set with only 3 replicates of likely-nonstationary time series.

These efforts seek to empower scientists to use current and future data-analytic approaches without sacrificing the benefits of statistical rigor.
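
The second challenge (autocorrelation) can be made concrete with a small simulation. The sketch below is not the test developed in the dissertation. It is a minimal illustration, under assumed settings, of why a standard test can fail: a naive shuffle (permutation) test for Pearson correlation treats time points as exchangeable, so it should reject at roughly the nominal rate for independent white-noise series but far more often for independent autocorrelated series. The function names (`ar1`, `pearson`, `shuffle_test_pvalue`, `false_positive_rate`), the AR(1) model, and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def ar1(n, phi, rng, burn_in=100):
    """Simulate n points of an AR(1) series x[t] = phi * x[t-1] + noise."""
    x = 0.0
    out = []
    for t in range(burn_in + n):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in so the start value is forgotten
            out.append(x)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shuffle_test_pvalue(x, y, rng, n_perm=99):
    """Naive permutation p-value: shuffles y, which destroys its time order.

    This assumes exchangeable observations, an assumption that
    autocorrelated time series violate.
    """
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


def false_positive_rate(phi, n=50, trials=200, alpha=0.05, seed=0):
    """Fraction of independent series pairs declared 'correlated'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = ar1(n, phi, rng)
        y = ar1(n, phi, rng)  # independent of x by construction
        if shuffle_test_pvalue(x, y, rng) <= alpha:
            rejections += 1
    return rejections / trials


rate_iid = false_positive_rate(phi=0.0)  # white noise, no autocorrelation
rate_ar = false_positive_rate(phi=0.9)   # strongly autocorrelated
print(f"false positive rate, white noise:    {rate_iid:.2f}")
print(f"false positive rate, autocorrelated: {rate_ar:.2f}")
```

Because both series in every pair are generated independently, every rejection is a false positive. In the sense used in the abstract, a valid test would keep that rate at or below the chosen significance level (0.05 here) in both settings; the naive shuffle test is expected to do so only in the white-noise case.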

Description

Thesis (Ph.D.)--University of Washington, 2023
