Parameter Identification and Assessment of Independence in Multivariate Statistical Modeling
We are interested in the extent to which relationships, possibly causal, can be statistically quantified from multivariate data obtained from a system of random variables. In the ideal setting, we would begin with refined knowledge of which variables in our system can causally impact one another and be in a position to perform randomized controlled experiments in which any intervention is possible. Unfortunately, this ideal is often unrealistic. In many important cases it is impossible to conduct an intervention: we cannot ethically ask a pregnant mother to start smoking or feasibly assign a country a new governmental system. Additionally, a researcher may have little or no prior knowledge of how the variables in their system interact. This dissertation studies two problems that arise as we depart from the experimental ideal. While scientists may not always be able to conduct a controlled experiment, and thus have only observational data, they may be able to hypothesize or determine the directions in which causal relations point. For instance, "mother smoking during first trimester of pregnancy" may causally impact "baby birth weight" but, without time travel, the reverse is certainly impossible. Unfortunately, causal relationships can be infeasible to estimate from observational data due to the presence of hidden confounding variables. In a more recent shift of paradigm, pioneered by researchers such as Judea Pearl, Jamie Robins, Don Rubin, and Peter Spirtes, causal knowledge is represented by a directed graph whose vertices are the variables in the system. These directed graphs have a corresponding mathematical formalism called structural causal models. We consider the setting of linear structural causal models, that is, models in which causal effects are assumed to be linear.
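As a brief illustration of the linear setting, the following sketch writes out a linear structural causal model in a standard textbook parametrization. The symbols $\Lambda$, $\Omega$, $\varepsilon$, and the three-variable example are notation assumed here for illustration and are not taken from the dissertation itself.

```latex
% Linear structural causal model on variables X = (X_1, ..., X_p):
% \lambda_{ij} may be nonzero only if the graph contains the edge i -> j.
X = \Lambda^{\top} X + \varepsilon,
\qquad \operatorname{Cov}(\varepsilon) = \Omega .

% Hidden confounding between i and j is encoded by \omega_{ij} \neq 0.
% When I - \Lambda is invertible, the observed covariance matrix is
\Sigma = (I - \Lambda)^{-\top}\, \Omega \,(I - \Lambda)^{-1} .

% Classical instrumental variable example: Z -> X -> Y, with the errors
% of X and Y correlated and the error of Z uncorrelated with both.
X = \lambda_{zx} Z + \varepsilon_x, \qquad
Y = \lambda_{xy} X + \varepsilon_y, \qquad
\operatorname{Cov}(\varepsilon_x, \varepsilon_y) = \omega_{xy} \neq 0 .

% Since Cov(Z, X) = \lambda_{zx} Var(Z) and
% Cov(Z, Y) = \lambda_{xy} \lambda_{zx} Var(Z), it follows that
\lambda_{xy} = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, X)}
\quad \text{whenever } \lambda_{zx} \neq 0 .
```

In this example the causal effect $\lambda_{xy}$ is recovered from observable covariances alone, even though the confounding term $\omega_{xy}$ is unknown. Identification questions of this kind ask when such a formula exists for a given graph.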
We present combinatorial criteria for determining whether or not, given a graph, the corresponding causal relationships can be consistently estimated from observational data in the presence of hidden confounding. In particular, we define determinantal instrumental variables, a generalization of the well-known instrumental variables, which can be used to identify causal effects. Departing even further from the above ideal, a scientist may be in the exploratory stage of research and thus have little to no understanding of the causal or functional relationships in their data. In this case, a natural first question to ask is whether or not the observed variables are associated at all. That is, we would like to test whether or not the observed variables are independent. To this end, we develop a class of nonparametric measures of dependence which generalize many rank measures of association such as Kendall's tau, Spearman's rho, Hoeffding's D, and the more recently developed Bergsma–Dassios sign covariance tau*. This new class leads naturally to multivariate extensions of tau*. Our measures may be estimated unbiasedly using U-statistics, for which we prove results on computational efficiency and large-sample behavior. The algorithms we develop for their computation include, to the best of our knowledge, the first efficient algorithms for Hoeffding's D statistic in the multivariate setting.
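To make the U-statistic idea concrete, here is a minimal sketch for the simplest of the rank measures named above, Kendall's tau. It averages the kernel sign((x_i - x_j)(y_i - y_j)) over all pairs of observations. The function names `sign` and `kendall_tau_u` are hypothetical, and this naive quadratic-time loop is only an illustration of the definition, not one of the efficient algorithms developed in the dissertation.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_u(xs, ys):
    """Kendall's tau (tau-a) as an order-2 U-statistic.

    Averages the kernel sign(x_i - x_j) * sign(y_i - y_j) over all
    unordered pairs {i, j}. Illustrative O(n^2) implementation.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    total = 0
    for i, j in combinations(range(n), 2):
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        total += sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])

    n_pairs = n * (n - 1) / 2
    return total / n_pairs


if __name__ == "__main__":
    # One discordant pair out of six: (5 - 1) / 6
    print(kendall_tau_u([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because the estimate is an average of the kernel over all pairs, its expectation equals the expectation of the kernel on two independent draws, which is the sense in which a U-statistic is unbiased. Higher-order measures such as Hoeffding's D and tau* use kernels on more than two observations, which is why naive evaluation becomes expensive and efficient algorithms matter.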
- Statistics