Machine Learning-Based Determination of Protein Secondary Structures
Authors
Lee, Huan-Jui
Abstract
The main purpose of this thesis is to construct a machine learning model that predicts protein secondary structures from amino acid sequences and circular dichroism (CD) spectra, and to test the contribution of each component. This effort is motivated by the desire to reduce the cost and time of state-of-the-art approaches, which rely on elaborate instrumentation such as nuclear magnetic resonance (NMR) spectroscopy and X-ray diffraction (XRD). Conformational analysis with current experimental methods requires sample preparation and analytical processing that are often hampered by sample impurities and aging, as well as by the difficulty of growing suitable crystals. A well-developed machine learning algorithm trained on existing conformational data offers an easier and faster way to predict the unknown conformations of proteins. In this research, we make use of CD spectra and improve the machine learning model. The models used in this thesis are based on Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures, and we analyzed the performance of single models and a stacked model. The results indicate that both the stacked model and the CD spectra help improve prediction accuracy.
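The abstract does not specify how the sequence and the CD spectrum are combined into model input, or how accuracy is measured. The sketch below is a hypothetical illustration, not the thesis's method: it assumes a 20-letter one-hot encoding per residue, assumes the protein-level CD spectrum is simply repeated at every residue position so that a per-residue CNN or LSTM can see it, and assumes the common 3-state labeling (H = helix, E = strand, C = coil) scored by Q3 accuracy. The names `encode_residues` and `q3_accuracy` are illustrative only.

```python
from typing import List, Sequence

# Hypothetical encoding: the 20 standard amino acids; any other symbol
# (e.g. 'X' for unknown) maps to an all-zero one-hot vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_residues(sequence: str, cd_spectrum: Sequence[float]) -> List[List[float]]:
    """Return one feature vector per residue: a 20-dim one-hot code for the
    amino acid followed by the protein-level CD spectrum (assumed layout)."""
    cd = [float(v) for v in cd_spectrum]
    features = []
    for residue in sequence.upper():
        one_hot = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            one_hot[idx] = 1.0
        # The same spectrum is repeated at every position, because a CD
        # spectrum describes the whole protein, not a single residue.
        features.append(one_hot + cd)
    return features


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label (H, E or C)
    matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    feats = encode_residues("MKV", [0.5, -1.2, 3.0])
    print(len(feats), len(feats[0]))      # 3 residues, 20 + 3 features each
    print(q3_accuracy("HHEC", "HHCC"))    # 3 of 4 labels match
```

A matrix built this way (sequence length by 20 plus the number of spectral points) could serve as the input to either a CNN or an LSTM, and per-residue predictions from single or stacked models could then be compared with the same Q3 score.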
Description
Thesis (Master's)--University of Washington, 2021
