Applications of Machine Learning in the Optimization of Genetically Encoded Optogenetic Sensors
Abstract
Naturally occurring proteins provide a wealth of opportunities as tools in research, industry, and medicine. However, native proteins are rarely well suited for use outside their biological setting. Therefore, a protein's function and stability must be optimized by mutating its amino acid sequence. This challenge is complicated by the vastness of each protein's mutation space, where mutants with the desired biophysical characteristics are rare and become more difficult to find as more specifications are required. Traditional engineering techniques, such as point-mutation screening, compound this issue by being time- and resource-intensive. Here, we present an alternative approach that uses machine learning models to learn from sequence-to-function libraries and screen untested mutants computationally. To showcase this technique, we identified variants of the genetically encoded calcium sensor GCaMP that improved the fluorescence response 5-fold (eGCaMP2+) and increased the decay speed 3-fold (eGCaMP). To further demonstrate the capabilities of our machine learning platform, we used the same approach to engineer the functional properties of the red-shifted calcium indicator jRCaMP1b. Our study indicates that machine learning models can learn efficiently from complex mutational datasets and that their predictions can guide the engineering of functional proteins. This methodology is poised to shift the protein engineering landscape by offering a way to rapidly engineer proteins with desired characteristics.
Description
Thesis (Ph.D.)--University of Washington, 2025
