Machine Learning-Based Determination of Protein Secondary Structures

Authors

Lee, Huan-Jui

Abstract

The main purpose of this thesis is to construct a machine learning model that predicts protein secondary structures from sequences and circular dichroism (CD) spectra, and to test the contribution of each part. This effort is motivated by the desire to reduce the cost and time of state-of-the-art approaches, which rely on elaborate instrumentation such as nuclear magnetic resonance (NMR) and X-ray powder diffraction (XRD). Conformational analysis based on current experimental methods requires preparations and analytical processes that are often hampered by sample impurities and aging, and by limitations originating from crystal cultures. A well-developed machine learning algorithm based on existing conformational data provides an easier and faster way to predict unknown protein conformations. In this research, we make use of CD spectra and of improvements to the machine learning model. The algorithm used in this thesis is based on Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures; we analyzed the performance of single and stacked models. The results indicate that the stacked model and the CD spectra can help improve prediction accuracy.

Description

Thesis (Master's)--University of Washington, 2021
