Bioinformatics Advance Access originally published online on November 2, 2005
Bioinformatics 2006 22(1):13-20; doi:10.1093/bioinformatics/bti748
Improved spliced alignment from an information theoretic approach
1Department of Genetics, School of Medicine, Washington University in St Louis, 4566 Scott Avenue, St Louis, MO 63110, USA
2Department of Biomedical Engineering, School of Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, MO 63130, USA
*To whom correspondence should be addressed.
Motivation: mRNA sequences and expressed sequence tags represent some of the most abundant experimental data for identifying genes and alternatively spliced products in metazoans. These transcript sequences are frequently studied by aligning them to a genomic sequence template. For existing programs, error-prone, polymorphic and cross-species data, as well as non-canonical splice sites, still present significant barriers to producing accurate, complete alignments.
Results: We took a novel approach to spliced alignment that meaningfully combined information from sequence similarity with information obtained from position-specific scoring matrix (PSSM) splice site models. Scoring systems were chosen to maximize their power of discrimination, and dynamic programming (DP) was employed to guarantee that optimal solutions would be found. The resultant program, EXALIN, performed better than the other popular tools tested under a wide range of conditions, including detection of micro-exons and human–mouse cross-species comparisons. For improved speed with only a marginal decrease in splice site prediction accuracy, EXALIN could instead perform limited DP guided by a BLASTN result.
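To make the idea of combining similarity and splice-site evidence in a single dynamic program concrete, here is a minimal sketch under stated assumptions. It is not EXALIN's algorithm or scoring system. It uses a two-state DP (exon state `H`, intron state `I`) in which every intron adds a donor and an acceptor PSSM log-odds score, in bits, to the similarity score of the aligned bases. The helper names (`build_pssm`, `pssm_score`, `spliced_align`), the toy training sites, the window sizes, the minimum intron length and the match/mismatch/gap/intron constants are all hypothetical choices made for illustration. Sequences are assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"
NEG = -math.inf

# Hypothetical toy training sites, not real data. Donor windows hold 2 exon bases
# followed by 4 intron bases; acceptor windows hold 4 intron bases and 1 exon base.
DONOR_SITES = ["AGGTAA", "CAGTAA", "AGGTGA", "AGGTAG", "TGGTAA"]
ACCEPTOR_SITES = ["TTAGG", "CCAGA", "TCAGG", "TTAGT", "CTAGG"]


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PSSM in bits: w[k][b] = log2(P(b at position k) / background)."""
    pssm = []
    for k in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[k]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm


def pssm_score(pssm, seq, start):
    """Sum of PSSM weights over seq[start:start + len(pssm)]; -inf off the ends."""
    if start < 0 or start + len(pssm) > len(seq):
        return NEG
    return sum(pssm[k][seq[start + k]] for k in range(len(pssm)))


def spliced_align(tx, genome, donor, acceptor, match=1.0, mismatch=-2.0,
                  gap=-3.0, intron_open=-4.0, min_intron=10,
                  donor_exon=2, acceptor_intron=4):
    """Best score for aligning all of tx to genome, plus the introns used.

    H[i][j]: tx[:i] aligned against genome[:j], currently in an exon.
    I[i][j]: same prefixes, currently inside an intron at least min_intron long.
    Introns are returned as half-open genome intervals (start, end).
    """
    n, m = len(tx), len(genome)
    H = [[NEG] * (m + 1) for _ in range(n + 1)]
    I = [[NEG] * (m + 1) for _ in range(n + 1)]
    H_from = [[None] * (m + 1) for _ in range(n + 1)]
    I_from = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        H[0][j] = 0.0  # genomic sequence before the transcript is free
    for i in range(1, n + 1):
        H[i][0], H_from[i][0] = gap * i, "up"
        for j in range(1, m + 1):
            # Intron state: extend an open intron, or open one spanning
            # genome[j - min_intron:j] and add the donor-site PSSM score.
            I[i][j], I_from[i][j] = I[i][j - 1], "extend"
            if j >= min_intron:
                start = j - min_intron
                opened = (H[i][start] + intron_open
                          + pssm_score(donor, genome, start - donor_exon))
                if opened > I[i][j]:
                    I[i][j], I_from[i][j] = opened, "open"
            # Exon state: (mis)match, gap in either sequence, or close an intron
            # whose last base is genome[j - 1] and add the acceptor PSSM score.
            s = match if tx[i - 1] == genome[j - 1] else mismatch
            closed = I[i][j] + pssm_score(acceptor, genome, j - acceptor_intron)
            H[i][j], H_from[i][j] = max(
                (H[i - 1][j - 1] + s, "diag"),
                (H[i - 1][j] + gap, "up"),
                (H[i][j - 1] + gap, "left"),
                (closed, "close"),
            )
    # Trailing genomic sequence is free: end at the best column of the last row.
    j = max(range(m + 1), key=lambda col: H[n][col])
    score, i, state, end, introns = H[n][j], n, "H", None, []
    while i > 0:  # traceback, collecting intron coordinates
        if state == "H":
            move = H_from[i][j]
            if move == "diag":
                i, j = i - 1, j - 1
            elif move == "up":
                i -= 1
            elif move == "left":
                j -= 1
            else:  # "close": the intron's last base is genome[j - 1]
                state, end = "I", j
        elif I_from[i][j] == "open":
            j -= min_intron
            introns.append((j, end))
            state = "H"
        else:  # "extend"
            j -= 1
    return score, introns[::-1]
```

In a toy case where exon 1 ends in `...CAG`, the intron is `GTAAGT...CCAG` and exon 2 begins with `G`, the intron can be slid one, two or three bases to the left, or one base to the right, with no loss of similarity score because the boundary bases repeat. Only the PSSM terms single out the GT...AG placement, which illustrates why splice-site information is worth adding to the similarity score. This sketch fills the full O(nm) matrices; confining `j` to a band around a BLASTN hit would be one way to approximate the limited, BLASTN-guided DP described above, although how EXALIN actually restricts its DP is not specified here.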
Availability: The source code, binaries, scripts, scoring matrices and splice site models for human, mouse, rice and Caenorhabditis elegans utilized in this study are posted at http://blast.wustl.edu/exalin. The software (scripts, source code and binaries) is copyrighted but free for all to use.
Contact: gish{at}blast.wustl.edu
Supplementary information: http://blast.wustl.edu/exalin/exalin-supplement.pdf
Received on August 4, 2005; revised on October 12, 2005; accepted on October 27, 2005