Graph-Structured Random Hermitian Matrices to Model Electron Dynamics During Protein Folding

Authors

Rath, Siddharth

Abstract

Random Matrix Theory is a well-known method in physics and mathematics for modeling quantum chaos by studying the statistics of energy states of both fermionic and nucleonic systems on the ergodic timescale. Recent advances have also established Wigner’s Surmise, Gaussian Orthogonal Ensemble symmetry and Quantum Unique Ergodicity in random band matrices that model electrons interacting with one another along one and two dimensions, and have associated the width of the band around the leading diagonal with the non-locality of the eigenvectors, thereby inferring the conductivity of the system under study. Quantum chaos has so far been modeled by random matrix theory in up to 2 dimensions, and exact or numerical solutions for dimensions ≥ 3 do not exist. Consequently, such methods have not yet been applied to the study of inter-electronic behavior during protein folding in 3 dimensions, where electronic interactions are critically important and cause a marked change in the conductivity of the protein as it folds. Protein folding is primarily entropy driven and involves weak hydrophilic-hydrophobic and van der Waals interactions between residues and the backbone to create predominantly hydrogen bonds and other secondary bonds involving empty or half-empty π-orbitals. Additionally, the water molecules around proteins are known to adopt structures that are more regular than bulk water, to the extent that most of the water in living cells is structured. However, the resulting loss of configurational entropy when proteins and the water molecules around them adopt regular, stable structures is chalked up to vague entropic compensation terms, without clarification of the types of entropy that could compensate for the apparent reduction in entropy. As a result, our understanding of how protein sequences adopt unique structures, and how such structures directly map to the protein’s function, is lacking.
While a mostly classical description of protein folding is inadequate to address the underlying physical mechanisms of specificity during molecular recognition and information flow during signal transduction cascades, a quantum mechanical description is currently computationally prohibitive. There is also the issue of decoherence due to thermal noise at the temperatures at which biomolecules operate. Several problems also persist when it comes to predicting how a multitude of biomolecules specifically interact to execute signaling cascades leading to cellular functions. To determine the underlying physics, Physics Inspired Neural Networks (PINNs) have recently taken center stage. However, the fact that intrinsically disordered regions (IDRs) of proteins are involved in specific binding events makes a simple sequence-structure-function predictive PINN untenable. The dynamic structures of IDRs necessitate a variational approach to protein structure prediction. Here we use random graph-structured matrices to circumvent the < 3-dimension limit and model the electronic interaction network in a protein while it folds in an aqueous medium.
Assuming a) $n$ points in 3-dimensional finite cubic Euclidean space of volume $l^3$ with coordinates $(x, y, z)\ \forall\ x, y, z \sim U(\{-(l/2), \ldots, (l/2)\})$ i.i.d., where $U(\{-(l/2), \ldots, (l/2)\})$ is the uniform distribution over the set of real numbers in the interval $[-l/2, l/2]$; b) a threshold value $\varepsilon \in \mathbb{R}^+$, which is the radius of the sphere in the finite cubic Euclidean space within which all points interact; leading to c) an adjacency matrix $A_{n \times n}(\varepsilon)$ with $A_{ij}(\varepsilon) = 0\ \forall\ D_{ij} > \varepsilon$ and $A_{ij}(\varepsilon) = 1\ \forall\ D_{ij} < \varepsilon$, where the pairwise Euclidean distance between the $n$ points is given by the distance matrix $D_{n \times n}$ with $D_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2 + (z_j - z_i)^2}$; and d) a random real-valued square Hermitian matrix $H$ with $H^T = H$, we show that the continuous differential entropy, denoted $\mathrm{CDE}(A(\varepsilon))$, of the joint probability distribution of the nonzero elements of the leading diagonal and the lower triangle (or upper triangle) of the matrix $H \circ A(\varepsilon)$ (which directly represents the distribution of single point correlations of the eigenvalues of the matrix $H \circ A(\varepsilon)$) increases as the number of nonzero elements in the lower (or upper) triangle increases as a result of increasing $\varepsilon$. We also show that if we sample many random Hermitian matrices $H_1, H_2, \ldots, H_N$, $N \in \mathbb{Z}^+$, $N > 40$, and denote the ordered set of eigenvalues of each matrix $H_k \circ A(\varepsilon)$, $k = 1, 2, \ldots, N$, as $\lambda^{k}_{H \circ A(\varepsilon)}$, then the standard deviation $\sigma_{sp}(\varepsilon)$ of the distribution of spacing between any two consecutive eigenvalues from all the ordered eigenvalue sets $\lambda^{k}_{H \circ A(\varepsilon)}$ first decreases then increases for increasing $\varepsilon$ values. Moreover, the ratio between the $L^\infty$ and the $L^2$ norms of the eigenvectors $V^{k,\lambda}_{H \circ A(\varepsilon)}$ corresponding to the eigenvalues $\lambda \in \lambda^{k}_{H \circ A(\varepsilon)}$ approaches a lower bound of $1/\sqrt{n}$ as $\varepsilon$ increases, indicating that the eigenvectors become more delocalized on the ergodic scale.
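The construction in assumptions a) through d) can be illustrated with a minimal NumPy sketch. This is not the thesis code. The function names, the Gaussian distribution of the entries of H, the averaging of the norm ratio over all eigenvectors, and the parameter values in the demo are all assumptions made for illustration, since the abstract does not specify them. The continuous differential entropy is left out because the abstract does not say how it is estimated.

```python
import numpy as np


def random_points(n, l, rng):
    """Draw n points with x, y, z i.i.d. uniform on [-l/2, l/2]."""
    return rng.uniform(-l / 2.0, l / 2.0, size=(n, 3))


def adjacency(points, eps):
    """A_ij = 1 where the Euclidean distance D_ij < eps, otherwise 0."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < eps).astype(float)


def random_symmetric(n, rng):
    """Random real symmetric matrix (H^T = H).

    Assumption: Gaussian entries; the abstract does not name a distribution.
    """
    g = rng.standard_normal((n, n))
    return (g + g.T) / 2.0


def spectrum_stats(points, eps, n_samples, rng):
    """Return (sigma_sp, mean L-inf/L2 eigenvector norm ratio) for H o A(eps).

    sigma_sp is the standard deviation of consecutive eigenvalue spacings
    pooled over n_samples independent draws of H.
    """
    a = adjacency(points, eps)
    n = points.shape[0]
    spacings, ratios = [], []
    for _ in range(n_samples):
        masked = random_symmetric(n, rng) * a  # Hadamard product H o A(eps)
        vals, vecs = np.linalg.eigh(masked)    # eigenvalues in ascending order
        spacings.append(np.diff(vals))
        # Columns of vecs are eigenvectors; the ratio lies in [1/sqrt(n), 1].
        ratios.append(np.abs(vecs).max(axis=0) / np.linalg.norm(vecs, axis=0))
    sigma_sp = float(np.std(np.concatenate(spacings)))
    mean_ratio = float(np.mean(np.concatenate(ratios)))
    return sigma_sp, mean_ratio


# Hypothetical demo values: n = 40 points, cube side l = 10, N = 41 samples.
rng = np.random.default_rng(0)
pts = random_points(n=40, l=10.0, rng=rng)
for eps in (2.0, 5.0, 10.0, 20.0):
    sigma_sp, ratio = spectrum_stats(pts, eps, n_samples=41, rng=rng)
    print(f"eps={eps:5.1f}  sigma_sp={sigma_sp:.4f}  mean Linf/L2={ratio:.4f}")
```

Because D_ii = 0 < ε, the diagonal of A is always 1, so the leading diagonal of H survives the Hadamard product. For a very small ε the masked matrix is diagonal and the norm ratio equals 1 (fully localized eigenvectors), while ε larger than the cube diagonal recovers the full matrix H.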
Our results indicate that $\mathrm{CDE}(A(\varepsilon))$ and $\sigma_{sp}(\varepsilon)$ as defined above can be used as metrics to optimize $\varepsilon$, which is directly related to the number of nonzero elements in the lower (or upper) triangle of the adjacency matrix $A$. Our results also show that a) reduction in configurational and conformational entropy during protein folding is associated with an increase in information entropy obtained from electron energy level statistics, and b) reduction in uncertainty in electron energy levels is associated with the delocalization length of electrons along secondary bonds. We also apply our numerical simulations to the case of protein folding, where we use an all-atom molecular dynamics simulation of a standard polyalanine peptide folding in water to show that the numerical results hold for proteins as well, paving the way to use random graph-structured matrices as suitable representations of biomolecules to study their structure-function relationships in greater depth. Our results posit that the nature of information transfer and flow during signaling cascades could be the certainty in electron energies of participating molecules. We directly infer the nature of information flow during signal transduction cascades and postulate about the role of quantum phenomena and decoherence in biomolecular function. Such notions add fuel to the idea that quantum mechanical phenomena are intricately and nontrivially involved in protein structure-function relationships, and bolster the conjecture that water might be a necessary condition for life on Earth. The results show a mathematically accurate and physically relevant, first-principles-based representation of proteins and other biomolecules that would be beneficial in machine learning approaches to predict secondary structures and in the nascent field of protein design for medical and technological applications.
Therefore, based on our results, we propose a future physics-inspired deep-learning model, combining a variational autoencoder with a generative adversarial network, that could be applied as a more interpretable biomolecular structure and function predictor.

Description

Thesis (Ph.D.)--University of Washington, 2022
