
Bioinformatics 2007 23(13):i97-i103; doi:10.1093/bioinformatics/btm225

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Homology search for genes

Xuefeng Cui 1, Tomás Vinar 2, Brona Brejová 2, Dennis Shasha 3 and Ming Li 1,*

1 Cheriton School of Computer Science, University of Waterloo, Ontario, Canada N2L 3G1; 2 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; 3 Department of Computer Science, New York University, NY 10012, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation.

Results: We have developed a homology search solution that automates this process and returns complete gene structures instead of HSPs. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene.

On a test set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome, measured against orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.
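The abstract does not spell out how exon sensitivity and exon specificity are computed. A common convention in gene-finding evaluation, assumed here, counts a predicted exon as correct only when both of its boundaries match an annotated exon exactly; sensitivity is then the fraction of annotated exons recovered, and specificity the fraction of predicted exons that are correct. The sketch below illustrates that convention on made-up coordinates. The `Exon` tuple layout and the function name `exon_accuracy` are hypothetical and are not taken from the authors' Java implementation.

```python
from typing import Set, Tuple

# Hypothetical exon representation: (sequence name, start, end).
Exon = Tuple[str, int, int]


def exon_accuracy(predicted: Set[Exon], annotated: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: an exon is correct only if it matches an annotated
    exon exactly (same sequence, same start, same end).
    """
    correct = predicted & annotated
    # Sensitivity: share of annotated exons that were predicted exactly.
    sensitivity = len(correct) / len(annotated) if annotated else 0.0
    # Specificity (as used in gene finding): share of predicted exons that are exact.
    specificity = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Illustrative data only; these coordinates are invented.
annotated = {
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr1", 500, 600),
    ("chr1", 700, 800),
}
predicted = {
    ("chr1", 100, 200),  # exact match
    ("chr1", 300, 400),  # exact match
    ("chr1", 500, 610),  # wrong end boundary, counted as incorrect
}

sn, sp = exon_accuracy(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

With this convention a single misplaced splice site makes the whole exon count as wrong, which is why reliable splice-site identification has a direct effect on both figures.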

Availability: The Java implementation is available for download from http://www.bioinformatics.uwaterloo.ca/software

Contact: mli{at}uwaterloo.ca





