Overview
Challenge
The demand for data-driven decision making, coupled with the need to retain data to meet regulatory compliance requirements, has resulted in a rapid increase in the amount of archival data stored by enterprises. As the rate of data generation far outpaces the rate of improvement in the storage density of media like HDD and tape, researchers have started investigating new architectures and media types that can store such “cold”, infrequently accessed data at very low cost.
Synthetic DNA
Synthetic DNA is one such storage medium that has recently received attention due to its high density and durability. DNA possesses three key properties that make it relevant for archival storage. First, it is an extremely dense three-dimensional storage medium that has the theoretical ability to store 455 Exabytes in 1 gram; in contrast, a 3.5” hard disk drive today can store 10 Terabytes and weighs 600 grams. Second, DNA can last several centuries even in harsh storage environments; hard disk drives and tape have lifetimes of five and thirty years, respectively. Third, it is very easy, quick, and cheap to perform in-vitro replication of DNA; tape and hard disk drives have bandwidth limitations that result in hours or days for copying large Exabyte-sized archives.
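To put the density figures above on a common scale, the following is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in the text (455 Exabytes per gram for DNA, 10 Terabytes in a 600 gram hard disk drive) and assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope density comparison using the figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12

# Theoretical DNA density: 455 EB in 1 gram.
dna_bytes_per_gram = 455 * EXABYTE

# 3.5" hard disk drive: 10 TB stored in a 600 gram device.
hdd_bytes_per_gram = 10 * TERABYTE / 600

# How many times denser DNA is than the hard disk drive, per gram.
density_ratio = dna_bytes_per_gram / hdd_bytes_per_gram

print(f"DNA: {dna_bytes_per_gram:.3e} bytes per gram")
print(f"HDD: {hdd_bytes_per_gram:.3e} bytes per gram")
print(f"DNA is roughly {density_ratio:.2e} times denser by weight")
```

Under these assumptions the ratio comes out at about 2.7 x 10^10, that is, DNA is theoretically more than ten orders of magnitude denser by weight than the quoted hard disk drive.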
Proof-of-concept experiments and results
In this three-year project (funded by the EU with €3M), we will research all relevant aspects of DNA storage in a consortium of six partners across three countries (UK, France, Ireland), bringing together all necessary expertise. We will research all relevant technologies, such as encoding different types of data in DNA, scalable DNA synthesis to store data, experimental techniques to manipulate the data, efficient sequencing and decoding approaches to read back the data, as well as automation of all aspects. The final result will be an end-to-end prototype for storing data in DNA and for reading it back.
In initial work, we have developed OligoArchive, an architecture for using a DNA-based storage system as the archival tier of a relational database. We demonstrate that OligoArchive can be realized in practice by building archiving and recovery tools (pg_oligo_dump and pg_oligo_restore) for PostgreSQL that perform schema-aware encoding and decoding of relational data on DNA, and by using these tools to archive a 12KB TPC-H database to DNA, perform in-vitro computation, and restore it back again.
Our initial results are summarised in the paper available here.
A factsheet/summary of the project can be found here.
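To give a feel for what "encoding data in DNA" means at the simplest level, here is a minimal Python sketch that maps every two bits to one nucleotide. This is an illustrative assumption, not the schema-aware encoding used by pg_oligo_dump and pg_oligo_restore: the function names are hypothetical, and practical encodings additionally have to respect biological constraints (for example avoiding long runs of the same nucleotide) and add error correction.

```python
# Naive 2-bits-per-nucleotide codec (illustrative only; NOT the OligoArchive
# encoding). Mapping assumed here: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): every group of four nucleotides becomes one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for characters outside A/C/G/T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"TPC-H")
    print(strand)
    print(decode(strand))
```

With this mapping each byte costs four nucleotides, so the round trip `decode(encode(x)) == x` holds for any byte string; the sketch says nothing about synthesis or sequencing errors.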


People

Thomas Heinis
Imperial College
PI & Coordinator

Raja Appuswamy
Eurecom
PI

James MacDonald
Imperial College
PI

Paul Freemont
Imperial College
PI

Pascal Barbry
Universite Nice & CNRS
PI

Marc Antonini
Universite Nice & CNRS
PI

Sachin Chalapati
Helixworks
PI

Nimesh Pinnamaneni
Helixworks
PI
CONSORTIUM PARTNERS





PUBLICATIONS
2020
Dimopoulou, Melpomeni; Antonini, Marc Efficient Storage of Images onto DNA Using Vector Quantization (Journal Article) 2020. @article{5085-20, title = {Efficient Storage of Images onto DNA Using Vector Quantization}, author = {Melpomeni Dimopoulou and Marc Antonini}, url = {http://sigport.org/5085}, year = {2020}, date = {2020-01-01}, publisher = {IEEE SigPort}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja Storing Digital Data into DNA: A Comparative Study of Quaternary Code Construction (Inproceedings) ICASSP, Barcelona, Spain, 2020. @inproceedings{dimopoulou:hal-02549746, title = {Storing Digital Data into DNA: A Comparative Study of Quaternary Code Construction}, author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy}, url = {https://hal.archives-ouvertes.fr/hal-02549746}, year = {2020}, date = {2020-01-01}, booktitle = {ICASSP}, address = {Barcelona, Spain}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2019
Appuswamy, Raja; Brigand, Kevin Le; Barbry, Pascal; Antonini, Marc; Madderson, Olivier; Freemont, Paul; McDonald, James; Heinis, Thomas OligoArchive: Using DNA in the DBMS Storage Hierarchy (Inproceedings) CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 2019. @inproceedings{DBLP:conf/cidr/AppuswamyBBAMFM19, title = {OligoArchive: Using DNA in the DBMS Storage Hierarchy}, author = {Raja Appuswamy and Kevin Le Brigand and Pascal Barbry and Marc Antonini and Olivier Madderson and Paul Freemont and James McDonald and Thomas Heinis}, url = {http://cidrdb.org/cidr2019/papers/p98-appuswamy-cidr19.pdf}, year = {2019}, date = {2019-01-01}, booktitle = {CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 2019}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Buterez, David; Heinis, Thomas Efficient Approximation of Sequence Hybridization (Inproceedings) DNA Computing and Molecular Programming, 2019, ISBN: 978-3-030-26807-7. @inproceedings{78110.1007/978-3-030-26807-7_3, title = {Efficient Approximation of Sequence Hybridization}, author = {David Buterez and Thomas Heinis}, isbn = {978-3-030-26807-7}, year = {2019}, date = {2019-01-01}, booktitle = {DNA Computing and Molecular Programming}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Ling, Jeremy; Heinis, Thomas Encoding Information in Primers (Inproceedings) DNA Computing and Molecular Programming, 2019, ISBN: 978-3-030-26807-7. @inproceedings{10.1007/978-3-030-26807-7_31, title = {Encoding Information in Primers}, author = {Jeremy Ling and Thomas Heinis}, isbn = {978-3-030-26807-7}, year = {2019}, date = {2019-01-01}, booktitle = {DNA Computing and Molecular Programming}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja A Biologically Constrained Encoding Solution for Long-term Storage of Images onto Synthetic DNA (Inproceedings) EUSIPCO 2019, 27th European Signal Processing Conference, Coruna, Spain, 2019. @inproceedings{EURECOM+5841, title = {A Biologically Constrained Encoding Solution for Long-term Storage of Images onto Synthetic DNA}, author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy}, url = {http://www.eurecom.fr/publication/5841}, year = {2019}, date = {2019-01-01}, booktitle = {EUSIPCO 2019, 27th European Signal Processing Conference, Coruna, Spain}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Memishi, Bunjamin; Appuswamy, Raja; Paradies, Marcus Cold Storage Data Archives: More Than Just A Bunch of Tapes (Inproceedings) DAMON 2019, 15th International Workshop on Data Management on New Hardware, Held with ACM SIGMOD/PODS, Amsterdam, Netherlands, 2019. @inproceedings{EURECOM+5858, title = {Cold Storage Data Archives: More Than Just A Bunch of Tapes}, author = {Bunjamin Memishi and Raja Appuswamy and Marcus Paradies}, url = {http://www.eurecom.fr/publication/5858}, year = {2019}, date = {2019-01-01}, booktitle = {DAMON 2019, 15th International Workshop on Data Management on New Hardware, Held with ACM SIGMOD/PODS, Amsterdam, Netherlands}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2018
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja DNA Coding for Image Storage Using Image Compression Techniques (Inproceedings) CORESA 2018, 20emes journées d'étude et d'échange sur la COmpression et la REprésentation des Signaux Audiovisuels, Poitiers, France, 2018. @inproceedings{EURECOM+5788, title = {DNA Coding for Image Storage Using Image Compression Techniques}, author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy}, url = {http://www.eurecom.fr/publication/5788}, year = {2018}, date = {2018-01-01}, booktitle = {CORESA 2018, 20emes journées d'étude et d'échange sur la COmpression et la REprésentation des Signaux Audiovisuels, Poitiers, France}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }