Analyzing small molecule inhibition of enzymes: A preliminary machine learning approach towards drug lead generation
Authors
Philip, Pearl
Abstract
This project implements quantitative structure-activity relationship (QSAR) models in Python to predict the inhibitory action of small-molecule drugs on USP1, an enzyme essential to DNA repair in proliferating cancer cells. Molecular descriptors are calculated using PyChem and used to characterize the properties of about 400,000 drug-like compounds from a high-throughput screening assay made available on PubChem. After feature selection and processing, multiple machine learning models are built on the training data using Scikit-learn and Theano; a genetic algorithm is then used to synthesize, in silico, an ideal enzyme inhibitor to be tested for activity and for use as a drug compound. The higher errors and poorer model fits observed can be attributed to several sources: measurement of activity using AC50, a dataset imbalanced in favor of molecules with zero inhibition, an incomplete feature space, highly non-linear interactions between the enzyme and the drug, and convergence to local minima during hyperparameter optimization. Solutions are suggested for each of these issues and proposed as part of future work. As model prediction accuracy improves, the molecule produced by the genetic algorithm can be pursued as a drug lead in clinical trials. This project provides a promising pipeline for future work in open-source molecular drug design and can be extended to other datasets and target species.
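
The modeling step described in the abstract (feature selection, processing, then a Scikit-learn model fit on training data) might look roughly like the sketch below. It is an illustration only and not the thesis code: the descriptor matrix is random synthetic data standing in for PyChem output, the activity values are invented to mimic a dataset dominated by zero-inhibition compounds, and the random forest is an assumed stand-in for whichever models the thesis actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical descriptor matrix: rows are compounds, columns are descriptors.
# Real descriptors would come from a tool such as PyChem; these are random.
n_compounds, n_descriptors = 500, 20
X = rng.normal(size=(n_compounds, n_descriptors))
X[:, -1] = 0.0  # a constant descriptor that carries no information

# Hypothetical activity values: a non-linear function of a few descriptors,
# clipped so that most compounds show zero inhibition (imbalanced target).
raw = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
y = np.where(raw > 1.0, raw, 0.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature selection (drop zero-variance descriptors), scaling, then the model.
model = make_pipeline(
    VarianceThreshold(threshold=0.0),
    StandardScaler(),
    RandomForestRegressor(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

n_kept = int(model.named_steps["variancethreshold"].get_support().sum())
frac_zero = float(np.mean(y == 0.0))
r2 = model.score(X_test, y_test)  # R^2 on held-out compounds

print(f"descriptors kept: {n_kept} of {n_descriptors}")
print(f"fraction of compounds with zero activity: {frac_zero:.2f}")
print(f"held-out R^2: {r2:.2f}")
```

Wrapping selection, scaling, and the regressor in one pipeline keeps the held-out compounds from influencing the preprocessing, which matters when most targets are zero and fit quality is already hard to judge.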
Description
Thesis (Master's)--University of Washington, 2017-06
