Rotationally equivariant learning of generalizable protein structure-to-function maps

Pun, Michael Neal

Files

Pun_washington_0250E_25773.pdf (26.45 MB)

Date

2023-08-14

Abstract

Proteins play a central role in biology, from immune recognition to brain activity. Although major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from structure remains a major challenge. While computational structure prediction methods have recently alleviated the problem of data availability, the three-dimensional nature of protein structures complicates the application of traditional machine learning methods. Geometric deep learning offers a principled framework for efficiently extracting information from data that naturally respect physical symmetries. These symmetry-aware models have been shown to outperform non-geometric models and to generalize better. The goal of this thesis is to develop a minimal rotationally equivariant model to analyze local protein structures and to systematically test its ability to generalize to relevant tasks in protein science. Here we develop the Holographic Convolutional Neural Network (H-CNN), a rotationally equivariant neural network for predicting amino acid propensity based on local atomic micro-environments. We show that H-CNN’s predictions quantitatively reflect the physical and chemical nature of amino acids, which leads to an interpretation of H-CNN as an effective potential for amino acids. We then use this interpretation to demonstrate H-CNN’s generalizability in the zero-shot prediction of experimentally measured free energies of protein stability and binding. Finally, we apply H-CNN to the problem of determining T cell receptor (TCR) specificity by attempting to classify, predict, and design peptides that bind to given TCRs.