Statistical Issues in Microbiome Data Analysis: Batch Effects and Multi-Omics Analysis

Author: Fu, Weijia
Contributor: Wu, Michael
Publisher: My University
Date: 2019-08-14
Keywords: Association Test; Batch Effects; Microbiome; Multi-Omics Analysis
Subject: Biostatistics
Language: en-US
Type: Thesis
Format: application/pdf
File: Fu_washington_0250O_20443.pdf
URI: http://hdl.handle.net/1773/44069
Note: Thesis (Master's)--University of Washington, 2019

Progress in high-throughput sequencing has facilitated large-scale microbiome profiling studies, which have already begun to elucidate the role of microbes in many disorders and clinical outcomes. Despite these successes, statistical analysis of data from these studies continues to pose a challenge. In this thesis, we address two specific challenges: batch effects and integrative analysis of microbiome and other omics data. Both issues are increasingly relevant. As studies get larger, batching becomes inevitable, and integrative analysis is imperative for gaining clues as to the mechanisms underlying discovered associations. The thesis is composed of two projects. In the first project, we compared six existing batch correction methods, originally developed for microarray data, when applied to microbiome data. Two real microbiome data sets were used to evaluate performance using data visualization and several evaluation metrics. Our results suggest that an empirical Bayes approach (ComBat), when applied appropriately, can outperform other methods. In the second project, we proposed a robust microbiome regression-based kernel association test (MiRKAT-R) to screen a large number of genomic markers for association with microbiome profiles. This approach utilizes a recently developed robust kernel machine test. We further proposed incorporating an omnibus test that simultaneously considers different models, so as to allow for different relationships between the individual markers and microbiome composition. Systematic simulations and applications to real data show that MiRKAT-R improves both type I error control and power.