Machine learning methods for biological hypothesis generation, facilitating new discoveries at lower costs

dc.contributor.advisor: Wang, Sheng
dc.contributor.author: Woicik, Adelaide Woods Chambers
dc.date.accessioned: 2024-09-09T23:06:28Z
dc.date.available: 2024-09-09T23:06:28Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Ph.D.)--University of Washington, 2024
dc.description.abstract: Machine learning methods for biological data have become increasingly popular in recent years, acknowledging the transformative applications, complex patterns, and latent variation underlying biological systems. Importantly, many biological measurements are very expensive to produce experimentally. This poses challenges for biological discovery, limiting the number of experiments that can practically be conducted, and for data-hungry machine learning methods, which may require massive datasets that are not publicly available. One approach to these challenges is computational simulation with generative machine learning models, leveraging available high-quality data from heterogeneous sources to synthesize additional datapoints for subsequent analyses, which can help propose novel and prioritize existing biological hypotheses that can subsequently be tested in an experimental lab. In this thesis, I present three methods for high-quality in silico data generation across three biological domains: genomic time series extrapolation with Sagittarius, high-resolution dense chromatin contact map generation with Capricorn, and approximately-automatically-curated gene network generation using augmented network integration with Gemini. These diverse applications focus on high-cost experimental data, highlighting the immense value of computational datapoint simulation, and heterogeneous biological measurements, requiring methods that account for the diverse inputs and leverage all sources of information to improve the generation process. Finally, I connect each model back to its practical applications in biology, ranging from assisting biological experts in their current work to novel hypothesis generation.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Woicik_washington_0250E_26762.pdf
dc.identifier.uri: https://hdl.handle.net/1773/51875
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Chromatin structure
dc.subject: Computational biology
dc.subject: Machine learning
dc.subject: Network integration
dc.subject: Transcriptomic time series
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Machine learning methods for biological hypothesis generation, facilitating new discoveries at lower costs
dc.type: Thesis

Files

Original bundle

Name: Woicik_washington_0250E_26762.pdf
Size: 54.69 MB
Format: Adobe Portable Document Format