Overview
Challenge
The demand for data-driven decision making, coupled with the need to retain data to meet regulatory compliance requirements, has resulted in a rapid increase in the amount of archival data stored by enterprises. As the rate of data generation far outpaces the rate of improvement in the storage density of media like HDD and tape, researchers have started investigating new architectures and media types that can store such “cold”, infrequently accessed data at very low cost.
Synthetic DNA
Synthetic DNA is one such storage medium that has recently received attention due to its high density and durability. DNA possesses three key properties that make it relevant for archival storage. First, it is an extremely dense three-dimensional storage medium that has the theoretical ability to store 455 Exabytes in 1 gram; in contrast, a 3.5” hard disk drive can store 10 Terabytes and weighs 600 grams today. Second, DNA can last several centuries even in harsh storage environments; hard disk drives and tape have lifetimes of five and thirty years, respectively. Third, it is very easy, quick, and cheap to perform in-vitro replication of DNA; tape and hard disk drives have bandwidth limitations that result in hours or days for copying large Exabyte-sized archives.
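To put the density figures quoted above side by side, the following back-of-the-envelope sketch computes bytes per gram for both media. It uses only the numbers from the text and assumes decimal (SI) units for Terabyte and Exabyte; the variable names are illustrative.

```python
# Back-of-the-envelope density comparison using only the figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TERABYTE = 10**12
EXABYTE = 10**18

dna_bytes_per_gram = 455 * EXABYTE   # theoretical DNA capacity per gram
hdd_capacity_bytes = 10 * TERABYTE   # one 3.5" hard disk drive
hdd_weight_grams = 600

hdd_bytes_per_gram = hdd_capacity_bytes / hdd_weight_grams
density_ratio = dna_bytes_per_gram / hdd_bytes_per_gram

# Number of 10 TB drives needed to hold what 1 gram of DNA could hold in theory.
drives_per_gram_of_dna = dna_bytes_per_gram // hdd_capacity_bytes
drive_mass_tonnes = drives_per_gram_of_dna * hdd_weight_grams / 1_000_000

print(f"HDD density:           {hdd_bytes_per_gram:.3e} bytes/gram")
print(f"DNA density:           {dna_bytes_per_gram:.3e} bytes/gram")
print(f"DNA/HDD density ratio: {density_ratio:.3e}")
print(f"Equivalent drives:     {drives_per_gram_of_dna:,} ({drive_mass_tonnes:,.0f} tonnes)")
```

Under these assumptions, the theoretical per-gram capacity of DNA is roughly ten orders of magnitude higher than that of the quoted hard disk drive. This compares a theoretical limit with a shipping product, so it is an upper bound rather than a practical figure.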
Proof of concept experiments and results
In this three-year project (€3M funded by the EU), we will research all relevant aspects of DNA storage in a consortium of six partners across three countries (UK, France, Ireland), bringing together all necessary expertise. We will research all relevant technologies, such as encoding different types of data in DNA, scalable DNA synthesis to store data, experimental techniques to manipulate the data, efficient sequencing and decoding approaches to read back the data, and automation of all aspects. The final result will be an end-to-end prototype for storing data in DNA and for reading it back.
In initial work, we have developed OligoArchive, an architecture for using a DNA-based storage system as the archival tier of a relational database. We demonstrate that OligoArchive can be realized in practice by building archiving and recovery tools (pg_oligo_dump and pg_oligo_restore) for PostgreSQL that perform schema-aware encoding and decoding of relational data on DNA, and by using these tools to archive a 12KB TPC-H database to DNA, perform in-vitro computation, and restore it back again.
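As a rough illustration of what encoding data on DNA means at the lowest level, the sketch below maps each pair of bits to one of the four nucleotides and back. This is a deliberately naive mapping chosen for illustration. It is not the schema-aware encoding performed by pg_oligo_dump and pg_oligo_restore, and the function names and sample row are hypothetical.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping.
# This is NOT the schema-aware encoding used by pg_oligo_dump / pg_oligo_restore;
# the function names and the sample row below are hypothetical.
BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): every four nucleotides become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


row = b"1|Customer#000000001"  # made-up, TPC-H-style record for demonstration
strand = encode(row)
print(strand)
print(decode(strand) == row)
```

A practical encoder has to do considerably more than this: avoid long runs of the same nucleotide, keep GC content balanced, split the data into short addressed oligonucleotides, and add error correction. The sketch ignores all of these to show only the bit-to-base correspondence.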
Our initial results are summarised in the paper available here.
A factsheet/summary of the project can be found here.
People
Thomas Heinis
Imperial College
PI & Coordinator
Raja Appuswamy
Eurecom
PI
James MacDonald
Imperial College
PI
Paul Freemont
Imperial College
PI
Pascal Barbry
Universite Nice & CNRS
PI
Marc Antonini
Universite Nice & CNRS
PI
Sachin Chalapati
Helixworks
PI
Nimesh Pinnamaneni
Helixworks
PI
CONSORTIUM PARTNERS
Publications
2023
Yan, Yiqing; Pinnamaneni, Nimesh; Chalapati, Sachin; Crosbie, Conor; Appuswamy, Raja
Scaling Logical Density of DNA storage with Enzymatically-Ligated Composite Motifs Journal Article
In: bioRxiv, 2023.
@article{Yan2023.02.02.526799,
title = {Scaling Logical Density of DNA storage with Enzymatically-Ligated Composite Motifs},
author = {Yiqing Yan and Nimesh Pinnamaneni and Sachin Chalapati and Conor Crosbie and Raja Appuswamy},
url = {https://www.biorxiv.org/content/early/2023/02/02/2023.02.02.526799},
doi = {10.1101/2023.02.02.526799},
year = {2023},
date = {2023-01-01},
journal = {bioRxiv},
publisher = {Cold Spring Harbor Laboratory},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2022
Pic, Xavier; Antonini, Marc
A constrained Shannon-Fano entropy coder for image storage in synthetic DNA Proceedings Article
In: European Signal Processing Conference (EUSIPCO 2022), IEEE, 2022.
@inproceedings{pic:hal-04056181,
title = {A constrained Shannon-Fano entropy coder for image storage in synthetic DNA},
author = {Xavier Pic and Marc Antonini},
doi = {10.23919/EUSIPCO55093.2022.9909833},
year = {2022},
date = {2022-01-01},
booktitle = {European Signal Processing Conference (EUSIPCO 2022)},
publisher = {IEEE},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Yan, Yiqing; Chaturvedi, Nimisha; Appuswamy, Raja
Optimizing the Accuracy of Randomized Embedding for Sequence Alignment Proceedings Article
In: International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022.
@inproceedings{9835534,
title = {Optimizing the Accuracy of Randomized Embedding for Sequence Alignment},
author = {Yiqing Yan and Nimisha Chaturvedi and Raja Appuswamy},
doi = {10.1109/IPDPSW55747.2022.00036},
year = {2022},
date = {2022-01-01},
booktitle = {International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Appuswamy, Raja
Towards Passive, Migration-Free, Standardized, Long-Term Database Archival Journal Article
In: SIGMOD Rec., vol. 51, no. 2, 2022.
@article{10.1145/3552490.3552506,
title = {Towards Passive, Migration-Free, Standardized, Long-Term Database Archival},
author = {Raja Appuswamy},
url = {https://doi.org/10.1145/3552490.3552506},
doi = {10.1145/3552490.3552506},
year = {2022},
date = {2022-01-01},
journal = {SIGMOD Rec.},
volume = {51},
number = {2},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Marinelli, Eugenio; Yan, Yiqing; Magnone, Virginie; Dumargne, Marie-Charlotte; Barbry, Pascal; Heinis, Thomas; Appuswamy, Raja
OligoArchive-DSM: Columnar Design for Error-Tolerant Database Archival using Synthetic DNA Journal Article
In: bioRxiv, 2022.
@article{Marinelli2022.10.06.511077,
title = {OligoArchive-DSM: Columnar Design for Error-Tolerant Database Archival using Synthetic DNA},
author = {Eugenio Marinelli and Yiqing Yan and Virginie Magnone and Marie-Charlotte Dumargne and Pascal Barbry and Thomas Heinis and Raja Appuswamy},
url = {https://www.biorxiv.org/content/early/2022/10/06/2022.10.06.511077},
doi = {10.1101/2022.10.06.511077},
year = {2022},
date = {2022-01-01},
journal = {bioRxiv},
publisher = {Cold Spring Harbor Laboratory},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2021
Sella, Omer; Apelbaum, Amir; Heinis, Thomas; Quah, Jasmine; Moore, Andrew S W
DNA archival storage, a bottom up approach Conference
ACM Workshop on Hot Topics in Storage and File Systems, 2021.
@conference{Sella2021DNA,
title = {DNA archival storage, a bottom up approach},
author = {Omer Sella and Amir Apelbaum and Thomas Heinis and Jasmine Quah and Andrew S W Moore},
year = {2021},
date = {2021-07-27},
publisher = {ACM Workshop on Hot Topics in Storage and File Systems},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Yan, Yiqing; Chaturvedi, Nimisha; Appuswamy, Raja
Accel-Align: a fast sequence mapper and aligner based on the seed--embed--extend method Journal Article
In: BMC Bioinformatics, vol. 22, no. 1, pp. 257, 2021, ISSN: 1471-2105.
@article{accel-align,
title = {Accel-Align: a fast sequence mapper and aligner based on the seed--embed--extend method},
author = {Yiqing Yan and Nimisha Chaturvedi and Raja Appuswamy},
url = {https://doi.org/10.1186/s12859-021-04162-z},
doi = {10.1186/s12859-021-04162-z},
issn = {1471-2105},
year = {2021},
date = {2021-05-20},
journal = {BMC Bioinformatics},
volume = {22},
number = {1},
pages = {257},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Antonio, Eva Gil San; Heinis, Thomas; Carteron, Louis; Dimopoulou, Melpomeni; Antonini, Marc
Nanopore Sequencing Simulator for DNA Data Storage Proceedings Article
In: Visual Communications and Image Processing (VCIP 2021), 2021.
@inproceedings{doi:10.1126/science.aat0971c,
title = {Nanopore Sequencing Simulator for DNA Data Storage},
author = {Eva Gil San Antonio and Thomas Heinis and Louis Carteron and Melpomeni Dimopoulou and Marc Antonini},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {Visual Communications and Image Processing (VCIP 2021)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marinelli, Eugenio; Ghabach, Eddy; Bolbroe, Thomas; Sella, Omer; Heinis, Thomas; Appuswamy, Raja
DNA4DNA: Preserving Culturally Significant Digital Data with Synthetic DNA Proceedings Article
In: 17th International Conference on Digital Preservation (iPRES 2021), 2021.
@inproceedings{marinelli2021dna4dna,
title = {DNA4DNA: Preserving Culturally Significant Digital Data with Synthetic DNA},
author = {Eugenio Marinelli and Eddy Ghabach and Thomas Bolbroe and Omer Sella and Thomas Heinis and Raja Appuswamy},
year = {2021},
date = {2021-01-01},
booktitle = {17th International Conference on Digital Preservation (iPRES 2021)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marinelli, Eugenio; Ghabach, Eddy; Bolbroe, Thomas; Sella, Omer; Heinis, Thomas; Appuswamy, Raja
Digital Preservation with Synthetic DNA Proceedings Article
In: 37eme Conference sur la Gestion de Donnees – Principes, Technologies et Applications (BDA 2021), 2021.
@inproceedings{doi:10.1126/science.aat0971b,
title = {Digital Preservation with Synthetic DNA},
author = {Eugenio Marinelli and Eddy Ghabach and Thomas Bolbroe and Omer Sella and Thomas Heinis and Raja Appuswamy},
year = {2021},
date = {2021-01-01},
booktitle = {37eme Conference sur la Gestion de Donnees – Principes, Technologies et Applications (BDA 2021)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marinelli, Eugenio; Appuswamy, Raja
XJoin: Portable, Parallel Hash Join across Diverse XPU Architectures with OneAPI Proceedings Article
In: International Workshop on Data Management on New Hardware (DaMoN 2021), 2021.
@inproceedings{10.1145/3465998.3466012,
title = {XJoin: Portable, Parallel Hash Join across Diverse XPU Architectures with OneAPI},
author = {Eugenio Marinelli and Raja Appuswamy},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {International Workshop on Data Management on New Hardware (DaMoN 2021)},
abstract = {Modern server hardware is increasingly heterogeneous with a diverse mix of XPU architectures deployed across CPU, GPU, and FPGAs. However, to date, database developers have had to rely on either proprietary, architecture-specific solutions (like CUDA), or low-level, cross-architecture solutions that complicate development (like OpenCL). The lack of portable parallelism caused by the absence of a common high-level programming framework is one of the main reasons preventing a wider adoption of XPUs by database systems. In this paper, we take the first steps towards solving this problem using oneAPI, a cross-industry effort for developing an open, standards-based unified programming model that extends standard C++ to provide portable parallelism across diverse processor architectures. In particular, we port a recently-proposed, highly-optimized, GPU-based hash join algorithm from CUDA to Data Parallel C++ (DPC++). We then execute the hash join on multicore CPUs, integrated GPUs (Intel GEN9), and discrete GPUs (Intel DG1 and NVIDIA GeForce) without changing a single line of kernel code to demonstrate that DPC++ enables portable parallelism. We compare the performance of DPC++ kernels with hand-optimized CUDA kernels and model-based theoretical performance bounds to demonstrate the performance-portability trade-off in using DPC++.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Marinelli, Eugenio; Appuswamy, Raja
OneJoin: Cross-architecture, scalable edit similarity join for DNA data storage using oneAPI Conference
International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, 2021.
@conference{EURECOM+6613,
title = {OneJoin: Cross-architecture, scalable edit similarity join for DNA data storage using oneAPI},
author = {Eugenio Marinelli and Raja Appuswamy},
year = {2021},
date = {2021-01-01},
booktitle = {International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Chalapati, Sachin; Crosbie, Conor; Limbachiya, Dixita; Pinnamaneni, Nimesh
Direct oligonucleotide sequencing with nanopores Journal Article
In: Open Research Europe, vol. 1, pp. 47, 2021.
@article{article,
title = {Direct oligonucleotide sequencing with nanopores},
author = {Sachin Chalapati and Conor Crosbie and Dixita Limbachiya and Nimesh Pinnamaneni},
doi = {10.12688/openreseurope.13578.1},
year = {2021},
date = {2021-01-01},
journal = {Open Research Europe},
volume = {1},
pages = {47},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Antonio, Eva Gil San; Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja
Decoding Of Nanopore-Sequenced Synthetic DNA Storing Digital Images Proceedings Article
In: 2021 IEEE International Conference on Image Processing (ICIP), 2021.
@inproceedings{9506592,
title = {Decoding Of Nanopore-Sequenced Synthetic DNA Storing Digital Images},
author = {Eva Gil San Antonio and Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy},
doi = {10.1109/ICIP42928.2021.9506592},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
booktitle = {2021 IEEE International Conference on Image Processing (ICIP)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja
Image storage onto synthetic DNA Journal Article
In: Signal Processing: Image Communication, 2021.
@article{DIMOPOULOU2021116331,
title = {Image storage onto synthetic DNA},
author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy},
url = {https://www.sciencedirect.com/science/article/pii/S0923596521001478},
doi = {10.1016/j.image.2021.116331},
year = {2021},
date = {2021-01-01},
journal = {Signal Processing: Image Communication},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dimopoulou, Melpomeni; Antonio, Eva Gil San; Antonini, Marc
A JPEG-based image coding solution for data storage on DNA Miscellaneous
2021.
@misc{dimopoulou2021jpegbased,
title = {A JPEG-based image coding solution for data storage on DNA},
author = {Melpomeni Dimopoulou and Eva Gil San Antonio and Marc Antonini},
year = {2021},
date = {2021-01-01},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Franzese, Giulio; Yan, Yiqing; Serra, Giuseppe; D'Onofrio, Ivan; Appuswamy, Raja; Michiardi, Pietro
Generative DNA: Representation Learning for DNA-based Approximate Image Storage Proceedings Article
In: International Conference on Visual Communications and Image Processing (VCIP), pp. 01-05, 2021.
@inproceedings{9675366,
title = {Generative DNA: Representation Learning for DNA-based Approximate Image Storage},
author = {Giulio Franzese and Yiqing Yan and Giuseppe Serra and Ivan D'Onofrio and Raja Appuswamy and Pietro Michiardi},
doi = {10.1109/VCIP53242.2021.9675366},
year = {2021},
date = {2021-01-01},
booktitle = {International Conference on Visual Communications and Image Processing (VCIP)},
pages = {01-05},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
Dimopoulou, Melpomeni; Antonini, Marc
Efficient Storage of Images onto DNA Using Vector Quantization Journal Article
In: 2020.
@article{5085-20,
title = {Efficient Storage of Images onto DNA Using Vector Quantization},
author = {Melpomeni Dimopoulou and Marc Antonini},
url = {http://sigport.org/5085},
year = {2020},
date = {2020-01-01},
publisher = {IEEE SigPort},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja
Storing Digital Data into DNA: A Comparative Study of Quaternary Code Construction Proceedings Article
In: ICASSP, Barcelona, Spain, 2020.
@inproceedings{dimopoulou:hal-02549746,
title = {Storing Digital Data into DNA: A Comparative Study of Quaternary Code Construction},
author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy},
url = {https://hal.archives-ouvertes.fr/hal-02549746},
year = {2020},
date = {2020-01-01},
booktitle = {ICASSP},
address = {Barcelona, Spain},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2019
Appuswamy, Raja; Brigand, Kevin Le; Barbry, Pascal; Antonini, Marc; Madderson, Olivier; Freemont, Paul; McDonald, James; Heinis, Thomas
OligoArchive: Using DNA in the DBMS Storage Hierarchy Proceedings Article
In: CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 2019, 2019.
@inproceedings{DBLP:conf/cidr/AppuswamyBBAMFM19,
title = {OligoArchive: Using DNA in the DBMS Storage Hierarchy},
author = {Raja Appuswamy and Kevin Le Brigand and Pascal Barbry and Marc Antonini and Olivier Madderson and Paul Freemont and James McDonald and Thomas Heinis},
url = {http://cidrdb.org/cidr2019/papers/p98-appuswamy-cidr19.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, 2019},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Buterez, David; Heinis, Thomas
Efficient Approximation of Sequence Hybridization Proceedings Article
In: DNA Computing and Molecular Programming, 2019, ISBN: 978-3-030-26807-7.
@inproceedings{78110.1007/978-3-030-26807-7_3,
title = {Efficient Approximation of Sequence Hybridization},
author = {David Buterez and Thomas Heinis},
isbn = {978-3-030-26807-7},
year = {2019},
date = {2019-01-01},
booktitle = {DNA Computing and Molecular Programming},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Ling, Jeremy; Heinis, Thomas
Encoding Information in Primers Proceedings Article
In: DNA Computing and Molecular Programming, 2019, ISBN: 978-3-030-26807-7.
@inproceedings{10.1007/978-3-030-26807-7_31,
title = {Encoding Information in Primers},
author = {Jeremy Ling and Thomas Heinis},
isbn = {978-3-030-26807-7},
year = {2019},
date = {2019-01-01},
booktitle = {DNA Computing and Molecular Programming},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja
A Biologically Constrained Encoding Solution for Long-term Storage of Images onto Synthetic DNA Proceedings Article
In: EUSIPCO 2019, 27th European Signal Processing Conference, Coruna, Spain, 2019.
@inproceedings{EURECOM+5841,
title = {A Biologically Constrained Encoding Solution for Long-term Storage of Images onto Synthetic DNA},
author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy},
url = {http://www.eurecom.fr/publication/5841},
year = {2019},
date = {2019-01-01},
booktitle = {EUSIPCO 2019, 27th European Signal Processing Conference, Coruna, Spain},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Memishi, Bunjamin; Appuswamy, Raja; Paradies, Marcus
Cold Storage Data Archives: More Than Just A Bunch of Tapes Proceedings Article
In: DAMON 2019, 15th International Workshop on Data Management on New Hardware, Held with ACM SIGMOD/PODS, Amsterdam, Netherlands, 2019.
@inproceedings{EURECOM+5858,
title = {Cold Storage Data Archives: More Than Just A Bunch of Tapes},
author = {Bunjamin Memishi and Raja Appuswamy and Marcus Paradies},
url = {http://www.eurecom.fr/publication/5858},
year = {2019},
date = {2019-01-01},
booktitle = {DAMON 2019, 15th International Workshop on Data Management on New Hardware, Held with ACM SIGMOD/PODS, Amsterdam, Netherlands},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2018
Dimopoulou, Melpomeni; Antonini, Marc; Barbry, Pascal; Appuswamy, Raja
DNA Coding for Image Storage Using Image Compression Techniques Proceedings Article
In: CORESA 2018, 20emes journées d'étude et d'échange sur la COmpression et la REprésentation des Signaux Audiovisuels, Poitiers, France, 2018.
@inproceedings{EURECOM+5788,
title = {DNA Coding for Image Storage Using Image Compression Techniques},
author = {Melpomeni Dimopoulou and Marc Antonini and Pascal Barbry and Raja Appuswamy},
url = {http://www.eurecom.fr/publication/5788},
year = {2018},
date = {2018-01-01},
booktitle = {CORESA 2018, 20emes journées d'étude et d'échange sur la COmpression et la REprésentation des Signaux Audiovisuels, Poitiers, France},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}