Statistics
Recent Submissions

Bayesian Methods for Inferring Gene Regulatory Networks
The recent explosion in the availability of gene expression data has opened up new possibilities in advancing our understanding of the fundamental processes of life. To keep up with the increasing size of the datasets, new ... 
Finite Sampling Exponential Bounds
This dissertation develops new exponential bounds for the tail of the hypergeometric distribution. It is organized as follows. In Chapter 1, it reviews existing exponential bounds used to control the hypergeometric tail. ... 
Finite Population Inference for Causal Parameters
Randomized experiments are often employed to determine whether a treatment X has a causal effect on an outcome Y. Under the Neyman-Rubin causal model with binary X and Y, each patient is characterized by two binary potential ... 
Likelihood-Based Inference for Partially Observed Multi-Type Markov Branching Processes
Markov branching processes are a class of continuous-time Markov chains (CTMCs) frequently used in stochastic modeling with ubiquitous applications. Bivariate or multi-type processes are necessary to model phenomena such ... 
Space-Time Smoothing Models for Surveillance and Complex Survey Data
Area- and time-specific estimates of disease rates, cause-specific mortality rates and other key health indicators are of great interest for health care and policy purposes. Such estimates provide the information needed to ... 
Testing Independence in High Dimensions & Identifiability of Graphical Models
In this thesis, two problems in multivariate statistics are studied. In the first chapter, we treat the problem of testing independence between m continuous observations when m can be larger than the available sample ... 
Statistical Hurdle Models for Single Cell Gene Expression: Differential Expression and Graphical Modeling
This dissertation describes a set of statistical methods developed for analysis of single cell gene expression. A characteristic of single cell expression is bimodal expression, in which two clusters of expression are ... 
Bayesian Modeling of a High Resolution Housing Price Index
Understanding how housing values evolve over time is important to consumers, real estate professionals, and policy makers. Existing methods for constructing housing indices are computed at a coarse spatial granularity, ... 
Phylogenetic Stochastic Mapping
Phylogenetic stochastic mapping is a method for reconstructing the history of trait changes on a phylogenetic tree relating species/organisms carrying the trait. State-of-the-art methods assume that the trait evolves ... 
Degeneracy, Duration, and Coevolution: Extending Exponential Random Graph Models (ERGM) for Social Network Analysis
We address three aspects of statistical methodology in the application of Exponential family Random Graphs to modeling social network processes. The first is the topic of model degeneracy in ERGMs. We show this is a ... 
Lord's Paradox and Targeted Interventions: The Case of Special Education
Lord (1967) describes a hypothetical “paradox” in which two statisticians, analyzing the same dataset using different but defensible methods, come to very different conclusions about the effects of an intervention on student ... 
The Likelihood Pivot: Performing Inference with Confidence
Maximum likelihood estimation is a popular statistical method. To account for possible model misspecification, the sandwich estimate of variance can be used to generate asymptotically correct confidence intervals. Several ... 
Theory and Methods for Tensor Data
We present novel methods and new theory in the statistical analysis of tensor-valued data. A tensor is a multidimensional array. When data come in the form of a tensor, special methods and models are required to capture ... 
Discrete-Time Threshold Regression for Survival Data with Time-Dependent Covariates
A natural approach to survival analysis in many settings is to model the subject's "health" status as a latent stochastic process, where the terminal event is represented by the first time that the process crosses a ... 
R-squared inference under non-normal error
Assessment of the relationship between diet and health status, especially the association between diet and chronic disease risk, has attracted a lot of research interest in statistical and epidemiologic studies. However, due to ... 
Gravimetric Anomaly Detection using Compressed Sensing
We address the problem of identifying underground anomalies (e.g. holes) based on gravity measurements. This is a theoretically well-studied yet difficult problem. In all except a few special cases, the inverse problem has ... 
Probabilistic Population Projection for Countries with Generalized HIV/AIDS Epidemics
Population projection has long been an issue for researchers, governments and international organizations so that they can monitor and plan development and resources. The United Nations Population Division (UNPD) publishes ... 
Functional Quantitative Genetics and the Missing Heritability Problem
In classical quantitative genetics, the correlation between the phenotypes of individuals with unknown genotypes and a known pedigree relationship is expressed in terms of probabilities of IBD states. In existing models ... 
Monte Carlo estimation of identity by descent in populations
Genetic similarity between organisms arises from segments of shared genome, which are said to be identical by descent (IBD). Modeling IBD in pedigrees forms the basis of classical linkage analysis and has been a fruitful ...