Bioinformatics Advance Access published online on April 10, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn130
Simple is beautiful: a straightforward approach to improve the delineation of true and false positives in PSI-BLAST searches
1The Ohio State Biophysics Program, 484 W 12th Av., Columbus OH 43210-1292, U.S.A.
2Departments of Biochemistry and Chemistry, Ohio State University, 484 W 12th Av., Columbus OH 43210-1292, U.S.A.
3Department of Physics, Ohio State University, 191 W Woodruff Av., Columbus OH 43210-1117, U.S.A.
*To whom correspondence should be addressed. Prof. Michael Chan, E-mail: chan{at}chemistry.ohio-state.edu
Abstract
Motivation: The deluge of biological information from different genomic initiatives and the rapid advancement in biotechnologies have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative search strategy based on a profile (a position-specific scoring matrix, PSSM), is more sensitive than BLAST in detecting weak homologies, which makes it suitable for remote homolog detection. Many refinements have been made to improve PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile via the incorporation of false positive sequences remains a major challenge.
Results: We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator between true and false hits. Accordingly, we have derived a formula that uses the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results, based on a "gold-standard" test set, indicate that this figure of merit does indeed delineate true positives from false positives better than PSI-BLAST E-values alone. Perhaps most notable is that the strategy is simple and straightforward to implement.
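To make the idea of rank-ordering hits by a combined score concrete, the following is a minimal sketch only. The abstract does not state the authors' formula, so the weighted sum of log10 E-values used here is a hypothetical stand-in, and the names `figure_of_merit`, `rank_hits`, `weight`, and `missing_e` are illustrative assumptions, not part of the published method or of PSI-BLAST itself.

```python
import math


def figure_of_merit(e_first, e_late, weight=0.5, floor=1e-300):
    """Hypothetical combined score (NOT the paper's formula).

    Returns a weighted sum of log10 E-values from the first and a later
    PSI-BLAST iteration. Lower values indicate a more confident hit.
    """
    # Clamp to a small positive floor so log10 never receives 0.0.
    e1 = max(e_first, floor)
    e2 = max(e_late, floor)
    return weight * math.log10(e1) + (1.0 - weight) * math.log10(e2)


def rank_hits(first_iter, late_iter, weight=0.5, missing_e=10.0):
    """Rank hit identifiers from best to worst by the combined score.

    first_iter and late_iter map hit id -> E-value. A hit absent from one
    iteration is given the assumed placeholder E-value `missing_e`.
    """
    hit_ids = set(first_iter) | set(late_iter)
    scored = [
        (
            figure_of_merit(
                first_iter.get(h, missing_e),
                late_iter.get(h, missing_e),
                weight,
            ),
            h,
        )
        for h in hit_ids
    ]
    return [h for _, h in sorted(scored)]


# Made-up E-values for illustration: hitB looks excellent only in the
# later iteration, as might happen after a profile has been corrupted.
first_iteration = {"hitA": 1e-20, "hitB": 5.0}
late_iteration = {"hitA": 1e-30, "hitB": 1e-40, "hitC": 1e-3}

ranking = rank_hits(first_iteration, late_iteration)
print(ranking)
```

With these invented inputs, sorting by the later-iteration E-value alone would place `hitB` first, whereas the combined score penalizes its poor first-iteration E-value and places `hitA` ahead of it. The weighting between the two iterations is a free parameter in this sketch.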
Associate Editor: Prof. Thomas Lengauer
Received on January 5, 2008; revised on February 28, 2008; accepted on April 7, 2008