Modeling Heterogeneity within and between Matrices and Arrays
Abstract
Datasets in the form of matrices and arrays arise frequently in the social and biological sciences and are characterized by measurements indexed by two or more factors. In this dissertation we address two problems relating to these datasets. In the case of a single observed array Y, the primary goal in an analysis is often to decompose the array into Y=M+E, where M=f(X,β) represents a function of covariates X and an unknown parameter β, and E represents an array of random noise. Typically the errors E are assumed to be independent or dependent along at most one of the array dimensions. Failing to account for other dependencies can lead to inefficient estimates of β, inaccurate standard errors, and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, for many arrays maximum likelihood estimates of the covariance matrices in this model do not exist. We propose a submodel of the separable covariance model that restricts some of the covariance matrices to have factor analytic structure; this model can be viewed as an extension of factor analysis to array-valued data.
The second problem we address is specific to matrices that contain network data, where the row and column index sets represent a set of actors. Frequently the objective in network analysis is to determine whether dependencies exist between a matrix of network relations and a matrix of actor-specific attributes. Existing approaches to this problem often condition on either the relations or the attributes, require specification of the exact nature of the association between the network and the attributes, and are unable to provide predictions simultaneously for missing attribute and network information. We propose a unified approach to analysis that first tests for dependencies between the relations and attributes and then, if the test concludes that such dependencies exist, jointly models the relations and attributes to conduct inference and make predictions for missing values. We investigate Bayesian estimation procedures for a general class of relational data models, significantly improve the efficiency of a Markov chain Monte Carlo algorithm, and illustrate the inadequacies of a mean-field variational approach for this model class.
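As an illustration of the first problem, the decomposition Y=M+E with a separable covariance model can be sketched for the matrix (two-way) case. This is a minimal simulation sketch, not the estimation procedure developed in the dissertation. The function names, the dimensions, the number of factors k, and the choice of an identity column covariance are all hypothetical, chosen only to show what it means for one covariance matrix to have factor analytic structure (low-rank plus diagonal).

```python
import numpy as np

rng = np.random.default_rng(0)

def factor_analytic_cov(p, k, rng):
    """Return a p x p covariance of the form Lambda Lambda^T + D.

    Lambda is a p x k loading matrix and D is a positive diagonal matrix.
    (Hypothetical helper; values are drawn at random for illustration.)
    """
    loadings = rng.normal(size=(p, k))
    uniquenesses = rng.uniform(0.5, 1.5, size=p)
    return loadings @ loadings.T + np.diag(uniquenesses)

def sample_separable(mean, row_cov, col_cov, rng):
    """Draw Y = M + E where Cov(vec(E)) = col_cov kron row_cov.

    Uses E = A Z B^T with A A^T = row_cov, B B^T = col_cov and
    Z a matrix of independent standard normal entries.
    """
    a = np.linalg.cholesky(row_cov)
    b = np.linalg.cholesky(col_cov)
    z = rng.normal(size=mean.shape)
    return mean + a @ z @ b.T

# Assumed sizes: 6 rows, 4 columns, 2 latent factors along the row dimension.
m1, m2, k = 6, 4, 2
row_cov = factor_analytic_cov(m1, k, rng)  # restricted: factor analytic
col_cov = np.eye(m2)                       # identity here only for simplicity
M = np.zeros((m1, m2))                     # stands in for f(X, beta)
Y = sample_separable(M, row_cov, col_cov, rng)
```

The factor analytic restriction replaces the m1(m1+1)/2 free parameters of an unrestricted row covariance with m1·k loadings plus m1 diagonal terms, which is the kind of reduction that can matter when the unrestricted maximum likelihood estimates do not exist.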