DNA data storage might sound futuristic, but it's on the immediate horizon

As the quantity of data generated worldwide continues to expand at an aggressive rate, researchers are looking for ultra-dense and ultra-durable storage technologies capable of housing it all.

For example, Microsoft is examining the possibility of using lasers to etch data into quartz glass, or storing information in hologram form inside crystals. New developments in the field of tape storage, the current leading choice for archival use cases, are also promising.

However, one new storage medium in particular appears to have all the necessary attributes: deoxyribonucleic acid, or DNA. Researchers have found that a single gram of DNA is capable of storing 215 PB (215,000 TB) of data.
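To put the 215 PB-per-gram figure in perspective, the short back-of-the-envelope calculation below converts data volumes into grams of DNA. It is a sketch only. It uses decimal units (1 EB = 1,000 PB), and it assumes that this theoretical density is achievable in practice, ignoring the redundancy, addressing and packaging overhead a real system would add. The function name `grams_of_dna_needed` is purely illustrative.

```python
# Back-of-the-envelope density arithmetic using the 215 PB-per-gram figure
# quoted in the article. Decimal units: 1 EB = 1,000 PB, 1 ZB = 1,000,000 PB.
PB_PER_GRAM = 215.0


def grams_of_dna_needed(petabytes: float) -> float:
    """Theoretical mass of DNA (in grams) needed to hold `petabytes` of data.

    Ignores error-correction redundancy, addressing and physical packaging,
    so real-world figures would be higher.
    """
    return petabytes / PB_PER_GRAM


for label, pb in [("1 PB", 1), ("1 EB", 1_000), ("1 ZB", 1_000_000)]:
    print(f"{label}: {grams_of_dna_needed(pb):.4f} g")
```

Even allowing generous overhead, the arithmetic shows why density is the headline attribute: an exabyte works out to under five grams at the quoted figure.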

To find out more about the work going into making DNA storage a commercial reality, TechRadar Pro spoke with the DNA Data Storage Alliance, founded last year by Microsoft, Western Digital, Twist Bioscience and Illumina.

The Alliance was launched with the goal of raising awareness about the emerging storage technology and establishing a set of standards and specifications on which the industry can build.

What is DNA storage and what challenges is it expected to address?

DNA data storage is the process of encoding and decoding binary data onto and from synthesized strands of DNA (deoxyribonucleic acid). DNA has several unique properties: it is extremely dense, it is essentially free to copy, the code will always be readable, and its longevity lowers the cost of ownership over time. In addition, it saves significantly on energy costs compared with today's storage media.

Legacy storage solutions have scaled extensively over the years, but growth in the areal density of magnetic media (HDD and tape), which enables today’s mainstream archival storage solutions, is slowing, and the size of libraries is becoming unwieldy. In short, data growth is outpacing the scalability of today’s storage solutions. The industry needs a new storage medium that is more dense, durable, sustainable, and cost-effective in order to cope with the expected future growth of archival data.

How is it possible for digital information to be translated into a biological format (and back again)?

What kinds of complications can arise here?

To store data in DNA, the original (binary) digital data is encoded (mapped from 1s and 0s to sequences of the DNA bases A, C, G and T), then written (synthesized using chemical/biological processes), and stored. When the stored data is needed again, the DNA molecules are read (sequenced to reveal each individual A, C, G or T in order) and decoded (re-mapped from DNA bases back to 1s and 0s).
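The encode and decode steps can be illustrated with the simplest possible mapping, two bits per base. This is a hypothetical sketch, not the scheme used by any Alliance member: practical encoders also avoid long runs of the same base (homopolymers), keep GC content balanced, split data into short addressed oligos and add error-correction redundancy, none of which is shown here. The mapping table and function names are assumptions chosen for illustration.

```python
# Hypothetical 2-bit-per-base mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map binary data to a DNA base sequence (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Map a DNA base sequence back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)  # CAGACGGC
assert decode(strand) == b"Hi"
```

In a real pipeline, the `encode` output would be what gets synthesized and the input to `decode` would come from the sequencer, with error correction sitting between the two.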

There are some concerns about data accuracy, because errors can be introduced during oligonucleotide (short pieces of DNA) synthesis and sequencing. However, unlike oligo synthesis for healthcare, which must be perfect, DNA storage can tolerate errors thanks to the error-correction algorithms typically used in storage today. DNA data storage pioneers are already working on improved encoding and error-correction algorithms that will mitigate this risk and recover the data accurately. In addition, cost, speed, logistics, and other challenges remain barriers to data center adoption of this technology.
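As a deliberately simple illustration of why storage can tolerate imperfect synthesis and sequencing, the sketch below recovers a strand from several noisy reads by position-wise majority vote. It assumes only substitution errors (no insertions or deletions), so all reads have the same length. This is not the Alliance's method: production systems rely on much stronger codes, such as Reed-Solomon codes, and on techniques that also handle insertions and deletions. The `consensus` function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Position-wise majority vote over equal-length reads of one strand.

    Assumes only substitution errors (no insertions or deletions), so every
    read has the same length and positions line up column by column.
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    # zip(*reads) yields one tuple of bases per position across all reads.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Three reads of the same strand; two of them each carry one substitution.
reads = ["CAGACGGC", "CAGTCGGC", "CTGACGGC"]
print(consensus(reads))  # CAGACGGC
```

Majority voting only works while errors at any given position stay in the minority, which is one reason real systems pair redundancy with proper error-correcting codes.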

The DNA Data Storage Alliance was formed by Illumina, Microsoft Research, Twist Bioscience and Western Digital. Our mission is to create and promote an interoperable storage ecosystem based on manufactured DNA as a data storage medium. Our initial aim is to educate the public and raise awareness about this emerging technology. In addition, as the methods and tools for commercially viable DNA data storage become better understood and more widely available, the Alliance will consider the creation of specifications and standards (e.g., encoding, physical interfaces, retention, file systems) to promote the emergence of interoperable DNA data storage-based solutions that complement existing storage hierarchies.

What might the impact of DNA storage be on the data center industry?

DNA is an inherently environmentally friendly medium in terms of power, space, and sustainability, and it significantly reduces the need to migrate data every few years. When used as the main archival storage medium at a data center, it has the potential to reduce the data center’s size and total cost of ownership, while placing significantly lower burdens on the earth’s resources than legacy archival storage technologies.

What are the main barriers that DNA storage will need to overcome?

DNA synthesis and sequencing costs are still relatively high compared with currently used archival storage media such as HDD or tape, and significant cost reductions are required for DNA data storage to be adopted at scale. In addition, education and trust building to prepare the market for this new storage medium will also be critical, which is why the DNA Data Storage Alliance was formed.

What are the latest R&D innovations bringing DNA storage closer to reality?

Costs continue to come down thanks to Twist Bioscience's miniaturization of the DNA synthesis process. Other companies are pursuing alternative DNA synthesis methods, and both approaches enable massively parallelized synthesis and cost reductions. Next-generation sequencing (NGS) cost and throughput are also continuously improving, which makes DNA data retrieval more promising. In addition, work on encoding and decoding algorithms has already produced successful demonstrations.

What kind of timeline are we dealing with?

DNA data storage will be available in the medium term. There is more work to do, but there is also strong momentum behind making it a reality. Early adopters are likely to be applications with Write Once, Read Never (WORN) or Write Once, Read Seldom if Ever (WORSE) data. As the technology matures and gains acceptance within the community, the market will expand.

Which incumbent storage technologies is DNA most likely to rival?

The demand for long-term data storage in the cloud is reaching unprecedented levels. Existing storage technologies do not provide a cost-effective solution for storing long-lived data. Operating at such scales in the cloud requires a fundamental rethinking of how we build large-scale storage systems, as well as the underlying storage technologies that underpin them.

Are there any other emerging storage technologies under development that could be equally promising?

Researchers are exploring a variety of technologies to support this evolution, including storing data in synthetic DNA, quartz glass, and other scalable optical systems. DNA data storage is unique in its characteristics and properties – one can argue it will enable a new tier in storage.


