A compression mechanism for sequence databases to improve the efficiency of conventional tools
Biocomputing, Basel University Biozentrum, Klingelbergstrasse 70, CH-4056 Basel, Switzerland
¹To whom correspondence should be addressed
This paper describes a method for compressing molecular biology databases that are characterized by a growing proportion of data derived from genome projects. We tested the performance of our tool on various data files of the EMBL nucleotide sequence database. The best compression ratios were achieved on expressed sequence tag (EST) data, which are typically derived from large-scale sequencing projects. We also tested the compression of sequence database updates in combination with the common Unix compression program compress. On average, our tool improved the efficiency of compress by 16%.
Received on November 1, 1994; revised on November 11, 1994; accepted on January 13, 1995