Experimental Construction of Very Large Scale DNA Databases with Associative Search Capability
Lecture Notes in Computer Science
DNA can, in principle, store vast databases in a very compact volume; for example, a gram of DNA can store 4.2 × 10^21 bits of information. The encoded data can then be retrieved by associative search queries. However, no large-scale experiment has verified this so far. We describe the experimental creation of very large databases of artificially synthesized DNA sequences designed to encode digital data. A database, or library, consists of sequences of
... ed DNA; each sequence encodes a number that serves as the index of the database element. DNA subsequences, also referred to as words, were designed with computer search algorithms to ensure a sufficiently large Hamming distance between distinct words, so that words can be discriminated during annealing. The largest libraries are constructed in two phases: (1) an initial DNA library is constructed on plastic microbeads by combinatorial mix-and-split methods; (2) half the library is cleaved from the beads and concatenated onto the remaining bead-bound strands, generating a new library whose elements are twice the original length and whose diversity is the square of the original. We have completed the first phase in a series of experiments of increasing size; currently the largest uses 12^7 microbeads, each carrying approximately 10^7 DNA strands. This already constitutes by far the largest number of distinct DNA strands synthesized in a library of this type. Once the second construction phase is completed, the resulting DNA library will contain 12^14 ≈ 1.28 × 10^15 distinct data elements. We describe our ongoing experiments for executing associative search queries within the synthesized DNA databases. These queries are executed by hybridizing a target database strand with a complementary query probe strand. In our initial annealing experiments for processing associative search queries, we used fluorescently labeled query strands and separated fluorescent from non-fluorescent beads by fluorescence-activated cell sorting (FACS), a flow cytometry technique. We also tested the polymerase chain reaction (PCR) as an output method and developed a PCR technique for searching the pairwise-constructed library that exploits particular properties of the words in that library.
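The abstract does not say which computer search algorithm was used to design the words. As a rough illustration of the requirement it states (a large Hamming distance between distinct words), here is a minimal sketch of one simple approach, a greedy search over shuffled candidates. The word length of 8, minimum distance of 4, set size of 12, the extra reverse-complement checks, and all function names are assumptions chosen for illustration, not the authors' parameters or method.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word: str) -> str:
    """Watson-Crick reverse complement, i.e. the strand a word anneals to."""
    return word.translate(COMPLEMENT)[::-1]


def greedy_word_set(length: int, min_dist: int, max_words: int, seed: int = 0) -> list[str]:
    """Hypothetical greedy word design (not necessarily the authors' algorithm).

    Candidates are visited in random order. A word is kept only if it is at
    least `min_dist` away from its own reverse complement, from every accepted
    word, and from the reverse complement of every accepted word.
    """
    rng = random.Random(seed)
    candidates = ["".join(p) for p in itertools.product("ACGT", repeat=length)]
    rng.shuffle(candidates)
    chosen: list[str] = []
    for w in candidates:
        if hamming(w, reverse_complement(w)) < min_dist:
            continue  # nearly self-complementary words can bind copies of themselves
        # w must not resemble c (probe cross-talk) nor c's complement
        # (otherwise strands carrying w and c could anneal to each other)
        if all(hamming(w, c) >= min_dist and hamming(w, reverse_complement(c)) >= min_dist
               for c in chosen):
            chosen.append(w)
            if len(chosen) == max_words:
                break
    return chosen


if __name__ == "__main__":
    words = greedy_word_set(length=8, min_dist=4, max_words=12)
    for w in words:
        print(w, reverse_complement(w))
```

Greedy search gives no optimality guarantee on the size of the word set, but it is easy to reason about: every accepted word excludes only a Hamming ball around itself and its reverse complement, so for short words a small set is always found quickly.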
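The diversity arithmetic behind the two construction phases can be checked with a small model. The sketch below assumes an idealized process: in each mix-and-split round every bead lands in one of k vessels at random and receives that vessel's word, so m rounds can produce at most k^m distinct elements; phase 2 is modelled as attaching every cleaved element to every bead-bound element, which squares the diversity. Reading the abstract's 12^7 as k = 12 over m = 7 rounds is our assumption; the abstract gives only the totals. Function names are hypothetical.

```python
import itertools
import random


def mix_and_split(num_beads: int, words_per_round: int, rounds: int,
                  seed: int = 0) -> list[tuple[int, ...]]:
    """Simulate combinatorial mix-and-split synthesis on beads.

    Each round the pooled beads are split into `words_per_round` vessels,
    vessel i appends word i, and the vessels are pooled again. Random
    splitting is modelled by drawing a vessel for every bead.
    """
    rng = random.Random(seed)
    beads: list[tuple[int, ...]] = [() for _ in range(num_beads)]
    for _ in range(rounds):
        beads = [seq + (rng.randrange(words_per_round),) for seq in beads]
    return beads


def full_library(words_per_round: int, rounds: int) -> list[tuple[int, ...]]:
    """Every sequence mix-and-split can produce: words_per_round ** rounds."""
    return list(itertools.product(range(words_per_round), repeat=rounds))


def concatenate(cleaved: list[tuple[int, ...]],
                bead_bound: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
    """Idealized phase 2: append every cleaved element to every bead-bound one."""
    return [b + c for b in bead_bound for c in cleaved]


if __name__ == "__main__":
    k, m = 3, 2
    phase1 = full_library(k, m)
    phase2 = concatenate(phase1, phase1)
    print(len(phase1), len(phase2))  # 9 81
    # a finite bead pool yields at most k**m distinct sequences
    print(len(set(mix_and_split(2000, k, m))))
    # the abstract's totals: squaring 12**7 gives 12**14
    print(f"{12**7:,} -> {12**14:.2e}")
```

With 12^7 = 35,831,808 phase-1 elements, squaring gives 12^14 = 1,283,918,464,548,864 ≈ 1.28 × 10^15, matching the figure quoted in the abstract.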
We have also implemented software that simulates the experimental search procedures (viewable on the Internet), as well as input/output from conventional 2D images.
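The simulation software is not described further here. To make the search procedure concrete, the following is a minimal sketch under strong simplifying assumptions: each database element is a concatenation of words (one per index digit), a query probe is the reverse complement of the word searched for, hybridization is modelled purely as a Hamming-distance match against word-aligned windows, and FACS is modelled as splitting beads into fluorescent (probe bound) and dark sets. The three 6-base words and all function names are hypothetical; real annealing also depends on temperature, salt, and sequence thermodynamics.

```python
from itertools import product
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical 6-base words (pairwise Hamming distance 6); word d encodes digit d.
WORDS = ["ACCTGA", "GTTACG", "TAGCAT"]
WORD_LEN = 6


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def encode(digits: Tuple[int, ...]) -> str:
    """Database element: one word per digit of the element's index."""
    return "".join(WORDS[d] for d in digits)


def probe_for(digit: int) -> str:
    """Query probe: the complementary strand of the word being searched for."""
    return reverse_complement(WORDS[digit])


def anneals(strand: str, probe: str, position: Optional[int] = None,
            max_mismatch: int = 0) -> bool:
    """Idealized hybridization: the probe binds if its complement matches a
    word-aligned window (any window, or only `position`) with at most
    `max_mismatch` mismatched bases."""
    target = reverse_complement(probe)
    if position is None:
        starts = range(0, len(strand) - WORD_LEN + 1, WORD_LEN)
    else:
        fits = (position + 1) * WORD_LEN <= len(strand)
        starts = [position * WORD_LEN] if fits else []
    return any(hamming(strand[s:s + WORD_LEN], target) <= max_mismatch for s in starts)


def facs_sort(library, probe, position=None, max_mismatch=0):
    """Model of FACS output: split beads into fluorescent (probe bound) and dark."""
    bright, dark = [], []
    for digits in library:
        hit = anneals(encode(digits), probe, position, max_mismatch)
        (bright if hit else dark).append(digits)
    return bright, dark


if __name__ == "__main__":
    library = list(product(range(len(WORDS)), repeat=2))  # 3**2 = 9 elements
    bright, _ = facs_sort(library, probe_for(2))
    print(bright)  # [(0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
    bright, _ = facs_sort(library, probe_for(2), position=0)
    print(bright)  # [(2, 0), (2, 1), (2, 2)]
```

Passing `position` turns the query into a search on one field of the index, while leaving it unset matches the word anywhere in the element. Because the hypothetical words differ in all six positions, raising `max_mismatch` to 1 or 2 still returns only the intended beads, which is the practical benefit of designing words with a large Hamming distance.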