Bioinformatics Advance Access originally published online on November 15, 2006
Bioinformatics 2007 23(3):289-297; doi:10.1093/bioinformatics/btl578
Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment
Department of Computer Science, University of Illinois, Urbana-Champaign, Urbana, IL, USA
*To whom correspondence should be addressed.
Abstract
Motivation: A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the gaps in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework.
Results: Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most likely scenario of insertions and deletions consistent with an input multiple alignment. It can also modify the given alignment to obtain better agreement with the evolutionary model. In tests on synthetic data, Indelign performs close to optimally and substantially better than alternative methods. We use Indelign to analyze regulatory sequences in Drosophila and find an excess of insertions over deletions, which differs from what has been reported for neutral sequences.
Availability: The Indelign program may be downloaded from the website http://veda.cs.uiuc.edu/indelign/
Supplementary information: Supplementary material is available at Bioinformatics online.
Contact: sinhas{at}uiuc.edu
Associate Editor: John Quackenbush
Received on August 24, 2006; revised on October 19, 2006; accepted on November 13, 2006
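To make the idea of a "most likely scenario of insertions and deletions" concrete, the following is a minimal sketch, not Indelign's actual algorithm or model. It assumes a deliberately simplified setting: a single alignment column, a rooted tree given as nested tuples, and a two-state model (residue present or gap) with hypothetical per-branch probabilities `p_ins` and `p_del` that ignore branch lengths and indel lengths. The function name `annotate_column` and all parameter values are illustrative assumptions. Under these assumptions, a max-product pass over the tree picks the ancestral states with the highest joint probability, and each state change along a branch is reported as an insertion or a deletion.

```python
import math

def annotate_column(tree, leaf_states, p_ins=0.01, p_del=0.02, p_root_present=0.5):
    """Most likely indel events for one alignment column (toy two-state model).

    tree: nested tuples of leaf names, e.g. (("A", "B"), "C").
    leaf_states: dict mapping leaf name to 1 (residue) or 0 (gap).
    All probabilities are hypothetical, branch-independent parameters.
    """
    # Log transition probabilities, indexed as trans[parent_state][child_state].
    trans = {
        0: {0: math.log(1 - p_ins), 1: math.log(p_ins)},
        1: {0: math.log(p_del), 1: math.log(1 - p_del)},
    }

    def up(node):
        # Bottom-up pass: best log-probability of the subtree for each node state.
        if isinstance(node, str):
            s = leaf_states[node]
            return {"name": node, "kids": [],
                    "score": {s: 0.0, 1 - s: -math.inf},
                    "choice": {0: [], 1: []}}
        kids = [up(child) for child in node]
        score, choice = {}, {}
        for ps in (0, 1):
            total, picks = 0.0, []
            for k in kids:
                best = max((0, 1), key=lambda cs: trans[ps][cs] + k["score"][cs])
                total += trans[ps][best] + k["score"][best]
                picks.append(best)
            score[ps], choice[ps] = total, picks
        return {"name": None, "kids": kids, "score": score, "choice": choice}

    root = up(tree)
    prior = {1: math.log(p_root_present), 0: math.log(1 - p_root_present)}
    root_state = max((0, 1), key=lambda s: prior[s] + root["score"][s])

    events = []

    def down(info, state, label):
        # Top-down traceback: report a state change on a branch as an event.
        for idx, (kid, kid_state) in enumerate(zip(info["kids"], info["choice"][state])):
            kid_label = kid["name"] or f"{label}.{idx}"
            if state == 0 and kid_state == 1:
                events.append(("insertion", kid_label))
            elif state == 1 and kid_state == 0:
                events.append(("deletion", kid_label))
            down(kid, kid_state, kid_label)

    down(root, root_state, "root")
    return root_state, events
```

For example, with the tree `(("A", "B"), "C")` and a gap only in `C`, the default parameters (deletions more likely than insertions) explain the gap as a deletion on the branch leading to `C`. Making insertions more likely than deletions instead explains the same column as an insertion on the branch leading to the ancestor of `A` and `B`. Events are labeled by the node at the lower end of the branch, with internal nodes named by their position below the root.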


