Bioinformatics Vol. 19 Suppl. 2 2003
pages ii156-ii161
© 2003 Oxford University Press
The shortest common supersequence problem in a microarray production setting
1 Department of Computational Molecular
Biology, Max Planck Institute for Molecular Genetics,
Ihnestraße 73, D-14195 Berlin, Germany
2 Department of Mathematics and Computer
Science, Freie Universität, Berlin
Received on March 17, 2003; accepted on June 9, 2003
Motivation: During microarray production, several thousand oligonucleotides (short DNA sequences) are synthesized in parallel, one nucleotide at a time. We are interested in finding the shortest possible nucleotide deposition sequence that synthesizes all oligos, in order to reduce production time and increase oligo quality. Thus, we study the shortest common supersequence problem for several thousand short strings over a four-letter alphabet.
Results: We present a statistical analysis of the basic ALPHABET-LEFTMOST approximation algorithm, and propose several practical heuristics to reduce the length of the supersequence. Our results show that it is hard to beat ALPHABET-LEFTMOST in the microarray production setting by more than 2 characters, but these savings can improve overall oligo quality by more than four percent.
Availability: Source code in C may be obtained by contacting the author, or from http://oligos.molgen.mpg.de.
Contact: Sven.Rahmann{at}molgen.mpg.de
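To make the problem setting concrete, the following is a minimal Python sketch of one plausible reading of the ALPHABET-LEFTMOST idea. The abstract does not spell the algorithm out, so the sketch assumes that the deposition sequence cycles through the alphabet in a fixed order (here A, C, G, T) and that each oligo is embedded at the leftmost possible positions, with the sequence truncated after the last step any oligo needs. The function names and the example oligos are illustrative and are not taken from the authors' C code.

```python
def alphabet_leftmost_length(oligos, alphabet="ACGT"):
    """Number of periodic deposition steps needed to embed every oligo leftmost.

    Assumption: step t (0-indexed) deposits alphabet[t % len(alphabet)].
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    k = len(alphabet)
    longest = 0
    for oligo in oligos:
        pos = 0  # number of deposition steps consumed so far by this oligo
        for ch in oligo:
            # Distance to the first step >= pos that deposits ch.
            offset = (index[ch] - pos) % k
            pos += offset + 1
        longest = max(longest, pos)
    return longest


def deposition_sequence(oligos, alphabet="ACGT"):
    """Periodic supersequence, truncated after the last step that is needed."""
    n = alphabet_leftmost_length(oligos, alphabet)
    return (alphabet * (n // len(alphabet) + 1))[:n]


def is_supersequence(s, t):
    """True if t can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in t)


if __name__ == "__main__":
    oligos = ["ACGT", "TTTT", "GA"]  # made-up example oligos
    seq = deposition_sequence(oligos)
    print(seq, len(seq))
    print(all(is_supersequence(seq, o) for o in oligos))
```

Under these assumptions an oligo of length L never needs more than 4L steps, and the worst case is reached by an oligo that repeats the last letter of the cycle. Any heuristic that shortens the supersequence has to improve on the maximum taken over all oligos.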