Homology search for genes
Vina 2, a Brejová 2
1 Cheriton School of Computer Science, University of Waterloo, Ontario, Canada N2L 3G1, 2 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA and 3 Department of Computer Science, New York University, NY 10012, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Life science researchers often require an exhaustive list of protein-coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring segment pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation.
Results: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene.
On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.
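The abstract does not define how exon sensitivity and specificity are computed. The sketch below assumes the convention common in gene-finding evaluations: a predicted exon counts as correct only if both of its boundaries match an annotated exon exactly, sensitivity is the fraction of annotated exons that were predicted correctly, and specificity is the fraction of predicted exons that are correct. The function name, the exon representation and the coordinates are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: (sequence name, start, end).
Exon = Tuple[str, int, int]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: an exon is correct only when both boundaries match exactly.
    """
    pred = set(predicted)
    ref = set(annotated)
    correct = len(pred & ref)
    # Fraction of annotated exons recovered exactly.
    sensitivity = correct / len(ref) if ref else 0.0
    # Fraction of predicted exons that are exactly right.
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only.
annotated = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 500, 650), ("chr1", 800, 900)]
predicted = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 500, 640),   # wrong end boundary, so not counted
             ("chr1", 800, 900),
             ("chr1", 950, 990)]   # extra exon absent from the annotation

sn, sp = exon_accuracy(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this toy example three of the four annotated exons are recovered exactly, and three of the five predicted exons are correct, so one shifted boundary lowers both measures.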
Availability: The Java implementation is available for download from http://www.bioinformatics.uwaterloo.ca/software
Contact: mli{at}uwaterloo.ca

