Deep Learning for Validating Resolution and Detecting Secondary Structure Elements of Proteins in 3D Cryo-Electron Microscopy Images
Avramov, Todor Kirilov
Cryo-electron microscopy (cryo-EM) is becoming the imaging method of choice for determining protein structures. Many atomic structures have been resolved based on an exponentially growing number of published three-dimensional (3D) cryo-EM density maps. However, the resolution value claimed for a reconstructed 3D density map has been a topic of scientific debate for many years. The Fourier Shell Correlation (FSC) is the currently accepted cryo-EM resolution measure, but it can be subjective, can be manipulated, and has its own limitations. This thesis proposes supervised deep learning methods that extract representative 3D features at high, medium and low resolutions from simulated protein density maps and build classification models that objectively validate the resolutions of experimental 3D cryo-EM maps. Specifically, classification models based on dense artificial neural network (DNN) and 3D convolutional neural network (3D CNN) architectures are presented. The trained models classify a given 3D cryo-EM density map into one of three resolution levels: high, medium, or low. The DNN model achieved 92.73% accuracy and the 3D CNN model achieved 99.75% accuracy on simulated test maps. When tested on simulated maps at gradually varying resolutions, the two models identified the resolution boundaries between the high, medium and low classes. The deep learning models placed maps at resolutions below 4-4.5 Å in the high-resolution class, maps between 5.0 and 8.5 Å in the medium-resolution class, and maps at resolutions of 9.0 Å or above in the low-resolution class. Applied to thirty experimental cryo-EM maps, the DNN and 3D CNN models agreed with the author-published resolution values in 60.0% and 56.7% of cases, respectively. These results suggest that deep learning can potentially improve the resolution evaluation process for cryo-EM maps, but further work is needed to account for the local variability of resolution suggested by recent studies.
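The class boundaries and the agreement figures reported above can be illustrated with a minimal Python sketch. It is not code from the thesis: the function names `resolution_class` and `agreement` are hypothetical, the upper bound of 4.5 Å for the high class is an assumed reading of the reported "4-4.5 Å" boundary, and values in the unassigned gaps (4.5-5.0 Å and 8.5-9.0 Å) are simply returned as `None`. How the thesis itself scored agreement is not stated, so exact label matching is also an assumption.

```python
from typing import Optional, Sequence


def resolution_class(resolution_angstrom: float) -> Optional[str]:
    """Map a resolution in angstroms to 'high', 'medium' or 'low'.

    Boundaries follow the abstract; 4.5 as the high-class limit is an
    assumption, and values in the reported gaps return None.
    """
    if resolution_angstrom <= 0:
        raise ValueError("resolution must be positive")
    if resolution_angstrom <= 4.5:
        return "high"
    if 5.0 <= resolution_angstrom <= 8.5:
        return "medium"
    if resolution_angstrom >= 9.0:
        return "low"
    return None  # falls between two reported class ranges


def agreement(predicted: Sequence[str], published: Sequence[str]) -> float:
    """Fraction of maps whose predicted class equals the published class."""
    if len(predicted) != len(published) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == q for p, q in zip(predicted, published))
    return matches / len(predicted)
```

Under this matching rule, the reported 60.0% and 56.7% agreement on thirty maps would correspond to 18 and 17 matching maps, respectively.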
Detection of protein secondary structure elements (SSEs), which aids the creation of accurate atomic models, especially from medium-resolution cryo-EM maps, is another area of current research. Medium-resolution experimental cryo-EM images lack detail and contain noise, and thus require additional computational and visualization analyses to fully determine protein structures. Most previous research proposed prescriptive image-processing and pattern-matching algorithms to locate α-helices and β-sheets in cryo-EM maps, but these methods were not fully automated and required subjective selection of parameters. This thesis explores a convolutional neural network model for end-to-end voxelwise segmentation of 3D cryo-EM density images. The 3D segmentation model, adapted from the U-Net architecture, was constructed in TensorFlow and optimizes a multi-class Tversky loss function with the Adam optimization algorithm. The proposed 3D U-Net model was trained to segment a cryo-EM map by classifying each voxel as part of an α-helix, a β-sheet, a turn/loop, or background. For that purpose, I first introduce and describe a novel method to generate large amounts of labeled cryo-EM maps suitable for training deep learning models for secondary structure segmentation. The model achieved higher per-class and overall precision and recall rates than previous methods when tested on 3597 simulated cryo-EM density maps. The proposed method was also shown to reliably segment experimental cryo-EM maps. Finally, the 3D U-Net segmentation model was compiled into an executable program and integrated as a plug-in in the UCSF Chimera visualization and analysis system.