Bioinformatics Vol. 18 no. 90002 2002
Pages S4-S16
© 2002 Oxford University Press
Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics
1 International Computer Science Institute,
1947 Center Street, Berkeley, CA, 94704-1198, USA
2 DEIS, Universitá di Bologna,
Viale Risorgimento 2, Bologna 40136, Italy
3 Zentrum für Bioinformatik,
Universität des Saarlandes, Im Stadtwald, Saarbrücken
66123, Germany
4 Informatics Research, Celera Genomics,
45 West Gude Drive, Rockville 20850, USA
Received on April 18, 2002
; accepted on June 15, 2002
Multiple sequence alignment is one of the dominant problems in computational molecular biology. Numerous scoring functions and methods have been proposed, most of which result in NP-hard problems. In this paper we propose for the first time a general formulation for multiple alignment with arbitrary gap-costs based on an integer linear program (ILP). In addition we describe a branch-and-cut algorithm to effectively solve the ILP to optimality. We evaluate the performances of our approach in terms of running time and quality of the alignments using the BAliBase database of reference alignments. The results show that our implementation ranks amongst the best programs developed so far.
Contact: althaus{at}icsi.berkeley.edu