Discovering Interactions in Multivariate Time Series
In large collections of multivariate time series, it is often of interest to determine the interactions between each pair of series. Classically, interactions between time series have been studied using linear vector autoregressive models. However, new methodology is needed to determine time series interactions in settings that depart from the classical stationary linear model. For example, many time series interactions may be nonlinear or nonstationary. Some time series datasets are also subsampled or sampled at mixed frequencies, so classical methods cannot be applied directly. Furthermore, many collections of time series are not real-valued but may consist of categorical or event-time data. In this thesis, we develop methodology for inferring time series interactions in five settings that demand methods beyond the classical linear model for real-valued, fully observed time series. First, we explore a Bayesian framework for inferring graphical models of time series. The goal is to determine conditional independence relations between entire time series, which, for stationary series, are encoded by zeros in the inverse spectral density matrix. We place priors on (i) the graph structure and (ii) the spectral matrices given the graph. We leverage a Whittle likelihood approximation and define a conjugate prior, the hyper complex inverse Wishart, on the complex-valued, graph-constrained spectral matrices. Due to conjugacy, we analytically marginalize out the spectral matrices and obtain a closed-form marginal likelihood of the time series given a graph. Second, we take a regularized likelihood approach and formulate a convex estimation procedure for the mixture transition distribution (MTD) model of multivariate categorical time series. Traditionally, estimation in the MTD model has been plagued by a nonconvex objective, non-identifiability, and many local optima.
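The first project's starting point, that conditional independence between entire stationary series corresponds to zeros in the inverse spectral density matrix, can be checked numerically on a small example. The sketch below assumes a stable VAR(1) process x_t = A x_{t-1} + e_t with identity noise covariance, in which series 2 drives both series 1 and series 3 but neither of those two affects the other. The helper name `var1_spectral_density` and the particular matrix `A` are illustrative choices, not taken from the thesis.

```python
import numpy as np

def var1_spectral_density(A, Sigma, omega):
    """Spectral density S(omega) of the VAR(1) x_t = A x_{t-1} + e_t, Cov(e_t) = Sigma."""
    p = A.shape[0]
    # Transfer function H(omega) = (I - A e^{-i omega})^{-1}
    H = np.linalg.inv(np.eye(p) - A * np.exp(-1j * omega))
    return H @ Sigma @ H.conj().T / (2 * np.pi)

# Hypothetical example: series 2 (index 1) drives series 1 and 3 (indices 0 and 2).
A = np.array([[0.5, 0.3, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.4, 0.5]])
Sigma = np.eye(3)

omegas = np.linspace(-np.pi, np.pi, 201)
inv_S = np.array([np.linalg.inv(var1_spectral_density(A, Sigma, w)) for w in omegas])

# Largest magnitude of each off-diagonal entry of S^{-1}(omega) over the frequency grid
max_13 = np.abs(inv_S[:, 0, 2]).max()  # series 1 vs 3: conditionally independent
max_12 = np.abs(inv_S[:, 0, 1]).max()  # series 1 vs 2: directly linked
print(max_13, max_12)
```

Here the (1, 3) entry of the inverse spectral density vanishes, up to rounding, at every frequency, while the (1, 2) entry does not. In a graphical model of this process, series 1 and 3 would therefore be non-adjacent.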
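For readers unfamiliar with the MTD model, the sketch below shows how it forms a transition probability for a categorical series. The prediction is a mixture over lagged series j, with weights lambda_j on the simplex, of per-series transition distributions P_j(a | b). This is the classical MTD parameterization written out under assumed array layouts, and the function name `mtd_transition` is hypothetical. The thesis's convex reformulation is not reproduced here. One way to see the non-convexity is that the parameters enter the likelihood only through the products lambda_j P_j(a | b), which are bilinear in (lambda, P).

```python
import numpy as np

def mtd_transition(lam, P, x_prev):
    """Distribution of x_t given the lagged states of p categorical series under MTD.

    lam:    (p,) mixture weights, nonnegative and summing to 1
    P:      (p, m, m) array, P[j, a, b] = P_j(x_t = a | x_{t-1, j} = b);
            every column P[j, :, b] sums to 1
    x_prev: (p,) integer states in {0, ..., m-1} at time t-1
    """
    # Each lagged series j contributes the column of its own transition matrix
    cols = np.stack([P[j, :, x_prev[j]] for j in range(len(lam))])  # shape (p, m)
    return lam @ cols  # mixture of the p columns, shape (m,)

lam = np.array([0.7, 0.3])
P = np.array([[[0.9, 0.2],
               [0.1, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
probs = mtd_transition(lam, P, x_prev=np.array([1, 0]))
print(probs)  # 0.7 * [0.2, 0.8] + 0.3 * [0.5, 0.5]
```

In this toy example, the second series' transition matrix has identical columns, so it shifts the prediction only by a constant amount. Its weight could then be traded against the first component without changing any transition probability, which is one concrete source of the non-identifiability mentioned above.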
Our convex formulation makes it possible to apply the MTD model to high-dimensional multivariate time series using convex penalties. It also allows identifiability conditions to be stated and imposed. We further derive a novel projected gradient algorithm for optimization. Third, we study identifiability and estimation of the structural vector autoregressive model under both subsampled and mixed-frequency sampling. We find that when the errors are independent and non-Gaussian, both the lagged linear effects and the instantaneous causal effects are identifiable. This implies that the full directed acyclic graph structure of the dynamic causal model is identifiable under arbitrary subsampling and mixed frequencies. We develop an expectation-maximization algorithm for inference. Fourth, we develop two penalized neural network models that detect nonlinear Granger causality, one based on a multilayer perceptron (MLP) and one based on a recurrent long short-term memory (LSTM) network. In both cases, we add group or hierarchical group lasso penalties to the outgoing weights of each input series, shrinking all weights of an input series to zero when it does not Granger-cause the output series. We find that both the MLP and LSTM models give state-of-the-art performance for detecting Granger causal connections in the genomics DREAM challenge. Finally, we develop an efficient, linear-time alternating direction method of multipliers (ADMM) algorithm to segment locally stationary multivariate time series. Its efficiency relies on recasting the global subproblem of the ADMM iterations in state-space form, which allows the use of a fast Kalman filter-smoother for optimization. Taken together, these projects provide new methodology for inferring interactions in multivariate time series across data types, sampling regimes, and model classes.
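To see why subsampling complicates identifiability, consider a VAR(1) process x_t = A x_{t-1} + e_t observed only every k-th step. The observed series y_t = x_{kt} is again a VAR(1) with coefficient A^k and noise covariance Sigma_k = sum over i from 0 to k-1 of A^i Sigma (A^i)^T. The sketch below, with a hypothetical helper `subsampled_var1` and made-up matrices, shows that for k = 2 the matrices A and -A yield exactly the same (A^k, Sigma_k). With Gaussian errors, the subsampled process is fully determined by these two matrices, so second-order information alone cannot identify the lagged effects. The sketch does not reproduce the thesis's identifiability argument for non-Gaussian errors.

```python
import numpy as np

def subsampled_var1(A, Sigma, k):
    """Coefficient and noise covariance of y_t = x_{kt} for x_t = A x_{t-1} + e_t."""
    p = A.shape[0]
    A_k = np.linalg.matrix_power(A, k)
    # The noise of y_t is sum_{i=0}^{k-1} A^i e_{kt-i}, with covariance sum_i A^i Sigma A^i^T
    Sigma_k = np.zeros((p, p))
    A_i = np.eye(p)
    for _ in range(k):
        Sigma_k += A_i @ Sigma @ A_i.T
        A_i = A_i @ A
    return A_k, Sigma_k

A = np.array([[0.6, 0.3],
              [0.0, 0.5]])
Sigma = np.diag([1.0, 0.5])  # independent (diagonal) errors at the fine time scale

A_k_pos, S_pos = subsampled_var1(A, Sigma, k=2)
A_k_neg, S_neg = subsampled_var1(-A, Sigma, k=2)
print(np.allclose(A_k_pos, A_k_neg), np.allclose(S_pos, S_neg))
print(S_pos)  # off-diagonal entries are nonzero even though Sigma is diagonal
```

The nonzero off-diagonal entries of Sigma_k also show how lagged effects at the fine time scale can appear as instantaneous dependence after subsampling.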
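The group lasso penalty on the outgoing weights of each input series can be handled with a proximal step, which is one standard way to obtain exact zeros. The sketch below assumes the first-layer weights of a network predicting one output series are stored as an array of shape (hidden units, input series, lags), with one group per input series. The function name `prox_group_lasso` and the layout are assumptions for illustration. Only the plain group lasso is shown, not the hierarchical variant, and the surrounding training loop is omitted.

```python
import numpy as np

def prox_group_lasso(W, lam, lr):
    """Proximal step for lam * sum_j ||W[:, j, :]||_2 with step size lr.

    W has shape (hidden, p, lags); group j holds every weight leaving input series j.
    """
    threshold = lr * lam
    norms = np.sqrt((W ** 2).sum(axis=(0, 2)))  # one Euclidean norm per input series
    # Shrink each group's norm by `threshold`; groups whose norm is below it become exactly zero
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return W * scale[None, :, None]

W = np.zeros((4, 3, 2))
W[:, 0, :] = 1.0    # strong input series
W[:, 1, :] = 0.01   # weak input series
# input series 2 already has all-zero weights

W_new = prox_group_lasso(W, lam=1.0, lr=0.1)
group_norms = np.sqrt((W_new ** 2).sum(axis=(0, 2)))
granger_causes = group_norms > 0  # estimated Granger-causal inputs for this output
print(granger_causes)
```

Reading Granger causality off the zero pattern of the group norms is what makes the penalized network interpretable. An input whose entire group is zero cannot influence the prediction of the output series.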
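The reason a Kalman filter-smoother can solve a quadratic global update in linear time is that minimizing a chain-structured least-squares objective is the same as computing posterior means in a linear Gaussian state-space model. The scalar sketch below makes this concrete. It minimizes (theta_1 - m0)^2 / P0 + sum_t (theta_t - theta_{t-1} - c_t)^2 / q + sum_t (y_t - theta_t)^2 / r with a Rauch-Tung-Striebel smoother, then compares the result with a dense linear solve. In an ADMM global step, the offsets c_t would come from the current auxiliary and dual variables. That mapping, the scalar setting, and the function names are assumptions, not the thesis's exact splitting.

```python
import numpy as np

def kalman_smoother_rw(y, c, m0, P0, q, r):
    """O(T) posterior means for theta_t = theta_{t-1} + c_t + w_t, y_t = theta_t + v_t."""
    T = len(y)
    m_pred, P_pred = np.empty(T), np.empty(T)
    m_filt, P_filt = np.empty(T), np.empty(T)
    for t in range(T):
        if t == 0:
            m_pred[t], P_pred[t] = m0, P0
        else:
            m_pred[t] = m_filt[t - 1] + c[t]      # predict with known offset c_t
            P_pred[t] = P_filt[t - 1] + q
        K = P_pred[t] / (P_pred[t] + r)           # Kalman gain
        m_filt[t] = m_pred[t] + K * (y[t] - m_pred[t])
        P_filt[t] = (1.0 - K) * P_pred[t]
    m_smooth = m_filt.copy()
    for t in range(T - 2, -1, -1):                # Rauch-Tung-Striebel backward pass
        G = P_filt[t] / P_pred[t + 1]
        m_smooth[t] = m_filt[t] + G * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth

def dense_solve(y, c, m0, P0, q, r):
    """Same minimizer via an O(T^3) dense solve of the normal equations H theta = b."""
    T = len(y)
    H = np.diag(np.full(T, 1.0 / r))
    b = y / r
    H[0, 0] += 1.0 / P0
    b[0] += m0 / P0
    for t in range(1, T):
        # Term (theta_t - theta_{t-1} - c_t)^2 / q
        H[t, t] += 1.0 / q
        H[t - 1, t - 1] += 1.0 / q
        H[t, t - 1] -= 1.0 / q
        H[t - 1, t] -= 1.0 / q
        b[t] += c[t] / q
        b[t - 1] -= c[t] / q
    return np.linalg.solve(H, b)

rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
c = 0.05 * rng.standard_normal(100)  # stand-in for ADMM offsets
theta_kf = kalman_smoother_rw(y, c, m0=0.0, P0=10.0, q=0.01, r=0.04)
theta_dense = dense_solve(y, c, m0=0.0, P0=10.0, q=0.01, r=0.04)
print(np.max(np.abs(theta_kf - theta_dense)))
```

Both routines return the same minimizer, but the filter-smoother touches each time point a constant number of times, whereas the dense solve scales cubically in the series length.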
- Statistics