Analyzing small molecule inhibition of enzymes: A preliminary machine learning approach towards drug lead generation

Philip, Pearl

Files

Philip_washington_0250O_17372.pdf (1.5 MB)

Date

2017-08-11

Abstract

This project implements quantitative structure-activity relationship (QSAR) models in Python to predict the inhibitory action of small-molecule drugs on USP1, an enzyme essential to DNA repair in proliferating cancer cells. Molecular descriptors are calculated using PyChem and used to characterize the properties of about 400,000 drug-like compounds from a high-throughput screening assay made available on PubChem. After feature selection and processing, multiple machine learning models are built on the training data using Scikit-learn and Theano, followed by a genetic algorithm that synthesizes an ideal enzyme inhibitor to be tested for activity and use as a drug compound. The high error and poor model fits observed can be attributed to multiple sources: measurement of activity using AC50, a dataset imbalanced in favor of molecules with zero inhibition, an incomplete feature space, highly non-linear interactions between the enzyme and drug, and convergence to local minima in hyperparameter optimization. Solutions are suggested for each of these issues and proposed as part of future work. The genetic algorithm is used to synthesize a molecule in silico; as model prediction accuracy increases, such a molecule can be pursued as a drug lead in clinical trials. This project provides a promising pipeline for future work in open-source molecular drug design and can be extended for use with other datasets and target species.
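The feature selection and model fitting steps, together with the class imbalance the abstract identifies as a source of error, can be illustrated with a minimal Scikit-learn sketch. Everything here is an assumption made for illustration and is not the thesis's own code: the descriptor matrix is synthetic rather than computed with PyChem, the activity rule is hypothetical, the task is simplified to active/inactive classification, and class weighting is just one common way to handle imbalance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a molecular descriptor matrix (NOT the PubChem assay data).
n_compounds, n_descriptors = 2000, 20
X = rng.normal(size=(n_compounds, n_descriptors))
X[:, -1] = 1.0  # a constant descriptor that carries no information

# Hypothetical activity rule: about 10% of compounds are labeled active (1),
# mimicking a dataset dominated by molecules with zero inhibition.
score = X[:, 0] + 0.5 * X[:, 1] * X[:, 2]
y = (score > np.quantile(score, 0.9)).astype(int)

# Stratify so both splits keep the same active/inactive ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(
    VarianceThreshold(threshold=0.0),  # drop zero-variance descriptors
    StandardScaler(),                  # put descriptors on a common scale
    RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",       # reweight the rare active class
        random_state=0,
    ),
)
model.fit(X_train, y_train)

# Balanced accuracy averages recall over both classes, so a model that
# always predicts "inactive" scores 0.5 rather than about 0.9.
balanced_acc = balanced_accuracy_score(y_test, model.predict(X_test))
n_kept = int(model.named_steps["variancethreshold"].get_support().sum())
print(f"descriptors kept: {n_kept}, balanced accuracy: {balanced_acc:.3f}")
```

Balanced accuracy is reported instead of plain accuracy because, on a dataset this skewed, plain accuracy would reward a model that ignores the active compounds entirely.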