Bioinformatics Advance Access originally published online on August 26, 2008
Bioinformatics 2008 24(21):2438-2444; doi:10.1093/bioinformatics/btn460
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Direct mapping and alignment of protein sequences onto genomic sequence

Osamu Gotoh 1,2,*

1 Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
2 National Institute of Advanced Industrial Science and Technology, Computational Biology Research Center, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan

*To whom correspondence should be addressed.


Abstract

Motivation: Finding protein-coding genes in a newly determined genomic sequence is the first step toward understanding the information encoded in the genome. Transcript sequences of homologous genes, when available, can considerably improve the accuracy with which genes and their structures are predicted, compared with prediction made without such evidence. Because protein sequences are generally better conserved than nucleotide sequences, remote homologs can serve as templates, extending the applicability of evidence-based gene recognition methods. However, no tool appears to have been developed so far that simultaneously maps and aligns a number of protein sequences onto a mammalian-sized genomic sequence.

Results: We have extended our computer program Spaln to accept protein sequences, as well as cDNA sequences, as queries. When the query and the target sequences are reasonably similar, e.g. between mammalian orthologs, Spaln runs one to two orders of magnitude faster than conventional approaches that rely on a Blast search followed by dynamic-programming-based spliced alignment. Exon-level and gene-level accuracies of Spaln are significantly higher than those obtained by the best available methods of the same type, particularly when the query and the target are distantly related.
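The conventional approach mentioned in the Results ends with a dynamic-programming spliced alignment of a protein against a genomic region. The Python sketch below illustrates that stage in a deliberately simplified form; it is not Spaln's algorithm. All of the following are illustrative assumptions: introns fall only between codons (phase 0), begin with GT and end with AG, span at least `min_intron` nucleotides, and carry a flat penalty regardless of length. The function name `spliced_score`, its scoring parameters, and the toy genome are hypothetical, and the sketch returns only the best score (no traceback).

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

NEG_INF = float("-inf")


def spliced_score(protein, genome, match=5, mismatch=-3, gap=-4,
                  intron=-6, min_intron=10, allow_introns=True):
    """Best score for aligning all of `protein` to some stretch of `genome`.

    Toy model (hypothetical): introns lie only between codons (phase 0),
    start with GT, end with AG, span at least `min_intron` nucleotides,
    and cost a flat `intron` penalty regardless of length.
    """
    n, m = len(protein), len(genome)
    # S[i][j]: best score with protein[:i] aligned and genome[:j] consumed.
    # Row 0 is 0 everywhere, so leading genomic sequence is free.
    S = [[0.0] * (m + 1)] + [[NEG_INF] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        prev, row = S[i - 1], S[i]
        best_donor = NEG_INF  # best row[k] over GT sites k <= j - min_intron
        for j in range(m + 1):
            best = prev[j] + gap  # residue i left unaligned
            if j >= 3:
                aa = CODON_TABLE.get(genome[j - 3:j], "X")
                score = match if aa == protein[i - 1] else mismatch
                best = max(best,
                           prev[j - 3] + score,  # residue i vs. codon
                           row[j - 3] + gap)     # codon left unaligned
            if allow_introns:
                k = j - min_intron
                if k >= 0 and genome[k:k + 2] == "GT":
                    best_donor = max(best_donor, row[k])  # donor now far enough back
                if j >= 2 and genome[j - 2:j] == "AG":
                    best = max(best, best_donor + intron)  # intron ends at acceptor
            row[j] = best
    # Trailing genomic sequence is also free.
    return max(S[n])


genome = "CCATGAAAGT" + "A" * 10 + "AGTGGCC"
print(spliced_score("MKW", genome))                       # 9.0 (M, K, intron, W)
print(spliced_score("MKW", genome, allow_introns=False))  # 7.0
```

In the toy genome, the exon encoding M-K and the exon encoding W lie in different reading frames, so only the intron transition lets all three residues match. Practical tools additionally handle introns that split a codon, frameshifts, and length-dependent intron penalties, all of which this sketch omits.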

Availability: Spaln is accessible online for a few species at http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user. The source code is freely available to academic users from the same site.

Contact: o.gotoh{at}i.kyoto-u.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Burkhard Rost


Received on June 8, 2008; revised on August 21, 2008; accepted on August 22, 2008
