Content-based Similarity Search in DNA Data Storage Systems

dc.contributor.advisor: Ceze, Luis
dc.contributor.author: Bee, Callista Lavender
dc.date.accessioned: 2021-03-19T22:53:42Z
dc.date.available: 2021-03-19T22:53:42Z
dc.date.issued: 2021-03-19
dc.date.submitted: 2020
dc.description: Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract: As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as content-based search. Here, we demonstrate the design, implementation, and evaluation of techniques for executing similarity search in DNA-based databases. By using machine learning to build a predictor of DNA hybridization reactions, we are able to create an encoding from images to DNA sequences that is optimized for similarity search. With this encoding, an encoded query image is most likely to hybridize with targets that are encoded from images visually similar to the query. This allows a query molecule to act as a molecular filter, which can select relevant results from a large database. We perform wetlab experiments with a database of 1.6 million images encoded and synthesized as DNA molecules, and show that our technique produces results which are comparable to those of state-of-the-art electronic implementations of similarity search. By demonstrating that DNA-based systems are capable of both storage and computation, we believe this work will encourage further development of this emerging technology.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Bee_washington_0250E_22399.pdf
dc.identifier.uri: http://hdl.handle.net/1773/46761
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: DNA computing
dc.subject: DNA digital storage
dc.subject: molecular programming
dc.subject: similarity search
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Content-based Similarity Search in DNA Data Storage Systems
dc.type: Thesis

Files

Original bundle

Name: Bee_washington_0250E_22399.pdf
Size: 56.36 MB
Format: Adobe Portable Document Format