Rigorous and flexible statistical tests for correlations between stationary or nonstationary time series

dc.contributor.advisor: Shou, Wenying
dc.contributor.author: Yuan, Alexander
dc.date.accessioned: 2023-04-17T18:04:36Z
dc.date.issued: 2023-04-17
dc.date.submitted: 2023
dc.description: Thesis (Ph.D.)--University of Washington, 2023
dc.description.abstract: A growing body of life-sciences research seeks to infer correlational and causal relationships from time series. Yet these analyses can encounter challenges in practice, and this dissertation focuses on three of them. First, many popular statistical approaches for correlational and causal analysis of time series come with assumptions and caveats that are easy to overlook. Second, time series typically exhibit autocorrelation, which violates the fundamental assumptions of many standard statistical tests. Third, researchers are increasingly using correlation statistics that lack a known analytical null distribution and therefore lack a standard parametric test, which requires the development of appropriate nonparametric methods. To address the first challenge (overlooked assumptions), Part I uses a multimedia strategy (including video) to illustrate key concepts and caveats of three popular statistical approaches that are used to make causal claims. Although primarily an interdisciplinary synthesis, Part I also describes some novel pathologies of existing methods. The later parts report methodological advances targeted at the second (autocorrelation) and third (nonparametric correlation) challenges outlined above. Both parts describe new tests that are statistically valid (meaning that they can guarantee a false positive rate that does not surpass a user-defined “significance level”). Part II reports a significance test that is applicable to any pairwise correlation statistic and is valid as long as one of the two time series is stationary (i.e., its behavior does not change systematically over time). As a demonstration, the test is used to detect known statistical dependence relationships in disciplines from microbiome science to climatology. Part III tackles the difficult setting of nonstationary time series, for which multiple biological replicates are often needed. In this context, I describe a valid significance test for correlation between time series that enables detections with higher confidence and fewer replicates than similar approaches. The test is used to verify the previously observed relationship between swimming speed and directional alignment in zebrafish, using a publicly available data set with only 3 replicates of likely-nonstationary time series. These efforts seek to empower scientists to use current and future data-analytic approaches without sacrificing the benefits of statistical rigor.
dc.embargo.lift: 2024-04-16T18:04:36Z
dc.embargo.terms: Restrict to UW for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Yuan_washington_0250E_25255.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49947
dc.language.iso: en_US
dc.relation.haspart: Part_II_supplementary_data.zip; spreadsheet; Part II supplementary data files.
dc.rights: CC BY-SA
dc.subject: Bioinformatics
dc.subject: Ecology
dc.subject.other: Molecular and cellular biology
dc.title: Rigorous and flexible statistical tests for correlations between stationary or nonstationary time series
dc.type: Thesis

Files

Original bundle

Name: Yuan_washington_0250E_25255.pdf
Size: 22.74 MB
Format: Adobe Portable Document Format

Name: Part_II_supplementary_data.zip
Size: 62.8 KB
Format: Unknown data format