Statistics
Recent Submissions
Statistical methods for genomic sequencing data
Genomic sequencing data has revolutionized our understanding of the genetic basis of biological processes. The cost of sequencing the first human genome was estimated to be greater than 50 million dollars. However, with ...
Exponential Family Models for Rich Preference Ranking Data
Preferences can be found in a wide array of contexts, from recommender systems to opinion polls, consumer habits, and elections. The specific method of data collection and the types of data collected can greatly vary the ...
Bayesian methods for variable selection
Choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process and are required for common statistical tasks such as parameter estimation, interval estimation, ...
Estimation and Inference for Network Data
Networks play a key role in many scientific domains. In this thesis, we analyze several important questions in network analysis. The first question we analyze concerns how to understand latent structure in networks. ...
Methods for the Statistical Analysis of Preferences, with Applications to Social Science Data
Preference data, such as rankings and ratings, are prevalent in the social sciences for expressing and measuring attitudes or opinions. Oftentimes, deterministic algorithms or summary statistics are used to aggregate ...
Addressing double dipping through selective inference and data thinning
While classical statistical methods assume that we only ever test pre-specified hypotheses about pre-specified models, the reality is that scientists often explore their data before coming up with models and hypotheses of ...
Mixture models to fit heavy-tailed, heterogeneous or sparse data
With the advent of modern technologies, many scientific fields collect and analyze increasingly large datasets. Unfortunately, the complexity and heterogeneity of these datasets cannot be properly captured through classical ...
Estimating subnational health and demographic indicators using complex survey data
Subnational estimates of health and demographic indicators such as immunization coverage rates and child mortality rates are critical for identifying regional health disparities and guiding policy design. When population ...
Interpretation and Validation for Unsupervised Learning
This thesis studies two major problems in unsupervised learning: manifold learning and clustering. The motivation of this research is to establish mathematically rigorous methods that enable practitioners to have better ...
Likelihood-based haplotype frequency modeling using variable-order Markov chains
The localized haplotype-cluster model uses variable-order Markov chains (VOMCs) to create an empirical model for haplotype probabilities that adapts to the changing structure of linkage disequilibrium (LD) across the genome. ...
Statistical Divergences for Learning and Inference: Limit Laws and Non-Asymptotic Bounds
Statistical divergences have been widely used in statistics and artificial intelligence to measure the dissimilarity between probability distributions. The applications range from generative modeling to statistical inference. ...
Methods, Models, and Interpretations for Spatial-Temporal Public Health Applications
Improving the health of communities and individuals around the world is one of the great challenges of this densely connected global era, which finds itself rife with disparity. In order to make the best use of our limited ...
Causal Structure Learning in High Dimensions
Directed graphical models are commonly used to model causal relations between random variables and to understand conditional independencies in their joint distributions. We focus on the crucial task of structure learning, ...
Statistical Methods for Clustering and High Dimensional Time Series Analysis
This dissertation mainly explores two statistical tasks, namely clustering and analysis of high-dimensional time series. Clustering, a very important unsupervised learning problem, studies the structure of unlabeled datasets. ...
Missing Data Methods for Observational Health Dataset
This dissertation is motivated by missing data problems arising from two observational health datasets. The first dataset was created by the SWOG study, which linked Medicare claims to a prostate cancer prevention trial ...
Geometric algorithms for interpretable manifold learning
This thesis proposes several algorithms in the area of interpretable unsupervised learning. Chapters 3 and 4 introduce a sparse convex regression approach for identifying local diffeomorphisms from a dictionary of ...
Improving Uncertainty Quantification and Visualization for Spatiotemporal Earthquake Rate Models for the Pacific Northwest
The Pacific Northwest (PNW) has substantial earthquake risk, due both to the offshore Cascadia megathrust fault and to other fault systems that produce earthquakes under the region's population centers. Forecasts of ...
Statistical Modeling of Long Memory and Uncontrolled Effects in Neural Recordings
Scientific analyses of time series data are often formalized as statistical investigations targeting one or more aspects of a complex underlying dependence structure. In the multivariate time series setting, there are three ...
Statistical analysis of low-frequency earthquake catalogs
Low-frequency earthquakes (LFEs) are small-magnitude (less than 2) earthquakes, with reduced amplitudes at frequencies greater than 10 Hz relative to ordinary small earthquakes. They are usually grouped into families of ...
Causality, Fairness, and Information in Peer Review
In this dissertation, I study peer review (the process by which scientists evaluate one another's work for publication or funding) through three distinct but related lenses. I focus on multi-step grant proposal peer ...