Bioinformatics Advance Access published online on November 7, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn582
Human genomes as email attachments


1Department of Computer Science and 2Institute for Genomics and Bioinformatics, University of California Irvine, Irvine, CA 92697, USA.
*To whom correspondence should be addressed. Chen Li, E-mail: chenli{at}ics.uci.edu, Xiaohui Xie, E-mail: xhx{at}ics.uci.edu
Abstract
Summary: The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large data sets are becoming difficult and time-consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We present a series of techniques that, in combination, reduce a single genome to a size small enough to be sent as an email attachment.
Availability: Our algorithms are implemented in C++ and are freely available from http://www.ics.uci.edu/~xhx/project/DNAzip.
Contact: xhx{at}ics.uci.edu or chenli{at}ics.uci.edu
Associate Editor: Prof. Alfonso Valencia
Received on October 7, 2008; revised on October 31, 2008; accepted on November 6, 2008
Joint first authors. 