Bioinformatics Advance Access published online on February 19, 2004
Bioinformatics, doi:10.1093/bioinformatics/bth115
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Department of Molecular Profiling, Merck & Co., Inc., P.O. Box 2000, RY80-A1, Rahway, NJ 07065
* To whom correspondence should be addressed. E-mail: jeffrey_yuan{at}merck.com.
Motivation: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Homology search methods are traditionally the first approach used to determine whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes or motifs are often difficult to find when single-query homology search algorithms are run against large sequence datasets such as the human genome. This work therefore aimed to develop an approach that enhances the sensitivity of traditional single-query homology algorithms against genomic data without losing search selectivity.

Results: We demonstrate that searching against a genome fragmented into all possible reading frames enhances the sensitivity of homology-based searches without degrading their selectivity. Using the ETS-domain, the bromodomain, and the Acetyl-CoA acetyltransferase gene as queries, we show that direct protein-protein searches using BLAST2P or FASTA3 against a human genome segmented among all possible reading frames and translated were substantially more sensitive than traditional protein-DNA searches against raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic (ROC) analysis showed that the algorithms remained selective, while comparisons of the algorithms showed that the protein-protein searches were more sensitive in identifying hits. Therefore, through the overprediction of reading frames by this method and the increased sensitivity of protein-protein homology search algorithms, a genome can be deeply mined, potentially finding hits overlooked by protein-DNA searches against raw genomic data.
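As a rough illustration of what fragmenting a sequence into all possible reading frames can look like, the Python sketch below translates a DNA string in all six frames (three offsets on each strand) and splits each translation at stop codons. The function names, the use of the standard genetic code, the stop-to-stop splitting rule, and the `min_len` parameter are assumptions made for illustration only; the abstract does not specify how the genome was actually segmented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the abstract does not state which translation table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_fragments(dna, min_len=1):
    """Hypothetical helper: list (strand, offset, peptide) for every
    stop-delimited stretch in all six reading frames."""
    dna = dna.upper()
    fragments = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            # Stop codons ('*') delimit the candidate reading-frame fragments.
            for piece in protein.split("*"):
                if len(piece) >= min_len:
                    fragments.append((strand, offset, piece))
    return fragments


if __name__ == "__main__":
    for strand, offset, peptide in six_frame_fragments("ATGGCCTAAGGG"):
        print(strand, offset, peptide)
```

The resulting peptide fragments could then be written to a protein database and searched with a protein-protein tool, which is the general idea the Results describe. Raising `min_len` in this sketch would discard very short fragments, at the cost of possibly missing short motifs.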
Revised October 20, 2003
Accepted December 18, 2003
Article
Enhanced homology searching through genome reading frame predetermination