Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps

Authors

Moritz, Spencer A

Abstract

Understanding a protein’s structure can lead to the discovery of therapeutic protein drugs. However, imaging proteins remains a challenge due to their small size. Cryo-electron microscopy (cryo-EM) is a leading imaging technology that has recently been able to produce near-atomic-resolution images called electron density maps. Even so, predicting protein structure from these maps remains difficult for all but the most pristine density maps (< 2.5Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict the critical Cα atoms along a protein’s backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture composed of multiple CNNs, each predicting a specific aspect of a protein’s structure. The model predicts several levels of protein structure: secondary structure elements (SSEs), backbone structure, and Cα atoms. It combines the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. The method is largely automatic and requires only a recommended threshold value for each evaluated protein. A specialized tabu-search path-walking algorithm was used to produce an initial backbone trace with Cα placements. The method was tested on 50 experimental maps between 2.6Å and 4.4Å resolution. It outperformed several state-of-the-art prediction methods, including RosettaES, MAINMAST, and a Phenix-based method, by producing the most complete prediction models, as measured by the percentage of found Cα atoms. The method accurately predicted 88.5% (mean) of the Cα atoms within 3Å of a protein’s backbone structure, surpassing the 66.8% achieved by the leading alternative method (the Phenix-based fully automatic method) on the same set of density maps.
The C-CNN also achieved an average RMSD of 1.23Å across all 50 experimental density maps, which is similar to that of the Phenix-based fully automatic method. The source code and a demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
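
The two evaluation measures named in the abstract, the percentage of Cα atoms found within 3Å and the RMSD, can be illustrated with a short sketch. This is not the thesis's evaluation code: the function names, the greedy one-to-one matching of predicted to native atoms, and the toy coordinates are all assumptions made for illustration, and the published repository may pair atoms differently.

```python
import math

def match_ca_atoms(native, predicted, cutoff=3.0):
    """Pair native and predicted C-alpha atoms one-to-one, closest pairs first.

    Assumption: greedy matching within `cutoff` angstroms. The thesis may use
    a different pairing rule.
    """
    candidates = []
    for i, n in enumerate(native):
        for j, p in enumerate(predicted):
            d = math.dist(n, p)  # Euclidean distance in angstroms
            if d <= cutoff:
                candidates.append((d, i, j))
    candidates.sort()  # shortest distances are matched first

    used_native, used_predicted, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_native and j not in used_predicted:
            used_native.add(i)
            used_predicted.add(j)
            pairs.append((i, j, d))
    return pairs

def percent_found(native, predicted, cutoff=3.0):
    """Percentage of native C-alpha atoms with a matched prediction."""
    pairs = match_ca_atoms(native, predicted, cutoff)
    return 100.0 * len(pairs) / len(native)

def rmsd(native, predicted, cutoff=3.0):
    """Root-mean-square deviation over the matched pairs only."""
    pairs = match_ca_atoms(native, predicted, cutoff)
    if not pairs:
        return float("nan")
    return math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))

# Toy data (hypothetical): four native atoms spaced 3.8 angstroms apart,
# and three predictions, one of which is far from every native atom.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (20.0, 0.0, 0.0)]

found = percent_found(native, predicted)
deviation = rmsd(native, predicted)
print(f"found: {found:.1f}%  RMSD: {deviation:.2f}")
```

In this sketch the RMSD is computed only over matched atoms, so a method that finds few atoms can still report a low RMSD. That is why the two measures are best read together, as the abstract does.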

Description

Thesis (Master's)--University of Washington, 2019
