Latent Variable Models for Indirectly or Imprecisely Measured Networks
In many scientific settings, networks are important structures used to represent the relationships between actors in a population of study. The most common methods for measuring networks are to survey study participants about who their connections are and to collect interaction activity between pairs of actors. However, directly measuring the exact network of interest can be challenging. In the context of surveys, participants do not always provide accurate accounts of their connections, which can result in mismeasurement of the network. In the context of logged activity data, interactions do not directly quantify relationships between individuals or their propensity to interact in the future. In this thesis, we broadly conceptualize observed data from either source as manifestations of a latent network of interest, and we seek to use the former to infer the structure of the latter. In Chapter 2, we demonstrate how using mismeasured network data can affect subsequent inference, specifically in the setting of experiments on networks. In these experiments, individuals are influenced not only by their own treatment assignments but also by those of their peers; these indirect treatment effects are often of direct scientific interest. To measure these indirect effects, researchers typically collect network data by surveying subjects about their connections. However, both survey design decisions and misreporting can lead to a mismeasured observed network. We show that mismeasured connections can in turn bias existing estimators of treatment effects, but that this bias can be attenuated by explicitly accounting for (via a mixture model) the relationship between the observed, mismeasured network and the latent network of interest. An alternative to surveys as a source of network data is relational event data, which consist of interactions between pairs of actors over time.
Typically recorded using automated data-gathering technology, relational event data can potentially sidestep the design and misreporting issues more common in survey data, but they present their own modeling challenges. These events are typically measured in continuous time and do not directly quantify relationships between actors, which prevents their direct use in inference problems such as the experimental setting considered in Chapter 2. In Chapter 3, we propose a continuous-time point-process model for inferring a network of social relations from interaction data. We allow the propensity for interactions to depend on time and covariates, in addition to the dynamic latent network, thus decoupling observed interaction counts from relational strength. In Chapter 4, we address another issue with modeling relational event data: the potentially large scale of the networks on which the data are collected. As data-gathering technology becomes more ubiquitous, relational event data can capture activity on networks much larger than those accessible through surveys. Estimation for many existing models becomes computationally prohibitive on these networks. Focusing on a dynamic latent factor model, we embed a variational Bayesian approach within an online estimation scheme in order to model activity on a network with tens of thousands of nodes.
- Statistics