Interpretation and Optimization of Recurrent Neural Network Performance Through Lyapunov Exponents Methodology
Vogt, Ryan Andrew; Shlizerman, Eli
Thesis (Ph.D.)--University of Washington, 2023 (issued 2023-08-14)
http://hdl.handle.net/1773/50205

Recurrent Neural Networks (RNNs) are common deep learning models for multivariate time series data. These models are ubiquitous computing systems that have been studied for decades. The propagation of gradients over long time sequences can make RNNs particularly challenging to train and difficult to interpret. The hidden states of an RNN can be viewed as a non-autonomous dynamical system, which can be analyzed with dynamical systems tools. In this work, we leverage Lyapunov Exponents, a dynamical systems tool that measures the rate at which nearby trajectories expand or contract over time, to analyze the propagation of information in RNNs and to relate these properties to RNN training and performance. We show that several statistics of the Lyapunov spectrum correlate moderately with network loss on both classification and regression tasks and emerge early in training. We also train an autoencoder to learn the relation between the full Lyapunov spectrum and an RNN's loss on a given task. The autoencoder's latent representation distinguishes between high- and low-accuracy networks across a variety of network hyperparameters, including initialization parameter, network size, and network architecture, more effectively than direct statistics of the Lyapunov spectrum. To further analyze the Lyapunov Exponents of RNNs from a theoretical perspective, we derive a direct expression for the gradient in terms of the components of the RNN's Lyapunov Exponents: the directions (Q vectors) and factors (R scalars) of expansion and contraction over a sequence. We find that the Q vectors associated with the greatest degree of expansion become increasingly aligned with the dominant directions of the gradient extracted by singular value decomposition.
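The Lyapunov spectrum central to the abstract is conventionally estimated with a QR-based iteration along the hidden-state trajectory: propagate an orthonormal frame through the step-to-step Jacobians, re-orthonormalize with QR at each step, and average the logs of the R diagonal. The Q directions and R expansion factors referenced above are exactly the factors this iteration produces. A minimal sketch for a hypothetical vanilla tanh RNN (the network, weights, and sizes here are illustrative placeholders, not the models studied in the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vanilla RNN: h_{t+1} = tanh(W h_t + U x_t + b).
n, T = 8, 500
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
U = rng.normal(scale=0.1, size=(n, 1))
b = np.zeros(n)
xs = rng.normal(size=(T, 1))  # driving input sequence

h = np.zeros(n)
Q = np.eye(n)        # orthonormal basis of perturbation directions
log_r = np.zeros(n)  # accumulated log expansion/contraction factors

for t in range(T):
    h = np.tanh(W @ h + U @ xs[t] + b)
    J = (1.0 - h**2)[:, None] * W    # Jacobian dh_{t+1}/dh_t = diag(1-h^2) W
    Q, R = np.linalg.qr(J @ Q)       # re-orthonormalize the evolved frame
    signs = np.sign(np.diag(R))      # fix QR sign ambiguity so diag(R) > 0
    Q, R = Q * signs, R * signs[:, None]
    log_r += np.log(np.abs(np.diag(R)) + 1e-300)

lyap = np.sort(log_r / T)[::-1]      # Lyapunov spectrum, descending
print(lyap)
```

Each step's `R` diagonal gives the instantaneous expansion factors along the `Q` directions; averaging their logs over the sequence yields the exponents, with positive values indicating directions in which nearby hidden-state trajectories diverge.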
Furthermore, we show that the predictions generated by an RNN are maximally affected by input perturbations at the moments when the R values are maximal. These results showcase correlations among dynamical systems stability theory for RNNs, network performance, and loss gradients. They may open the way to hyperparameter optimization algorithms and adaptive training methods that account for state-space dynamics, as measured by Lyapunov Exponents, to improve computation. They may also provide a unifying dynamical systems framework for studying RNN performance across network architectures and tasks.

Keywords: Artificial Intelligence; Dynamical Systems; Machine Learning; Recurrent Neural Networks; Mathematics; Computer science; Applied mathematics
License: CC BY
Format: application/pdf (en-US)
Type: Thesis