Researchers from the Henry and Marilyn Taub Faculty of Computer Science have developed an AI-based method that accelerates DNA-based data retrieval by three orders of magnitude while significantly improving accuracy. The research team included Ph.D. student Omer Sabary, Dr. Daniella Bar-Lev, Dr. Itai Orr, Prof. Eitan Yaakobi, and Prof. Tuvi Etzion.
The research is published in the journal Nature Machine Intelligence.
DNA data storage is an emerging field that leverages DNA as a platform for storing information. DNA offers significant advantages as a storage medium, including:
Long-term preservation: In 2013, researchers in Denmark successfully extracted DNA from a horse bone dating back 700,000 years. In 2021, an international team recovered DNA from mammoths that lived over a million years ago. By contrast, magnetic disks used in data centers have lifespans measured in years or, at best, a few decades. This highlights DNA’s potential for long-term storage.
Energy and cost efficiency: The “cloud” that powers most of today’s computing services relies on data centers that consume approximately 3% of global electricity and emit around 2% of total carbon emissions. With the exponential growth of data, the environmental impact of existing technologies is expected to increase significantly.
Unmatched data density: DNA storage offers data density up to 100 million times greater than traditional digital storage. This means that a volume currently holding one megabyte could theoretically store up to 100 terabytes using DNA.
DNA is a molecule composed of a sequence of organic compounds called nucleotides. These nucleotides are classified into four types, represented by the letters A, C, G, and T. Unlike traditional computing, where data is encoded using only two digits (0 and 1), DNA storage is based on sequences of four letters, dramatically increasing the number of possible combinations.
To write (store) data in this technology, DNA synthesis is required—creating DNA molecules based on the sequences encoding the information. To read the stored data, DNA sequencing is necessary.
Test tubes containing DNA encoding the information. © Rami Shlush
Challenges in DNA data storage
Developing DNA-based storage technology presents several technological challenges:
Both synthesis and sequencing are lengthy and error-prone processes, introducing deletion, insertion, and substitution errors
Due to the limitations of the synthesis process, multiple copies of each DNA molecule encoding the data are produced. These copies are stored together, unordered, in a storage container
During sequencing, many erroneous copies of these molecules are