Bioinformatics, Vol 14, 40-47, Copyright © 1998 by Oxford University Press
P Agarwal and DJ States
MOTIVATION: Searching a protein sequence database for homologs is a
powerful tool for discovering the structure and function of a sequence. Two
new methods for searching sequence databases have recently been described:
Probabilistic Smith-Waterman (PSW), which is based on Hidden Markov models
for a single sequence using a standard scoring matrix, and a new version of
BLAST (WU-BLAST2), which uses Sum statistics for gapped alignments.
RESULTS: This paper compares and contrasts the effectiveness of these
methods with three older methods (Smith-Waterman: SSEARCH, FASTA and
BLASTP). The analysis indicates that the new methods are useful, and often
offer improved accuracy. These tools are compared using a version of the
annotated portion of PIR 39 curated by Bill Pearson. Three different
statistical criteria are utilized: equivalence number, minimum errors and
the receiver operating characteristic. For complete-length protein query
sequences from large families, PSW's accuracy is superior to that of the
other methods, but its accuracy is poor when used with partial-length query
sequences. False negatives are twice as common as false positives
irrespective of the search methods if a family-specific threshold score
that minimizes the total number of errors (i.e. the most favorable
threshold score possible) is used. Thus, sensitivity, not selectivity, is
the major problem. Among the analyzed methods using default parameters, the
best accuracy was obtained from SSEARCH and PSW for complete-length
proteins, and the two BLAST programs, plus SSEARCH, for partial-length
proteins.
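
The three criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give their definitions, so the ones used here are assumptions: the equivalence number is taken as the number of false positives at the cutoff where false positives equal false negatives, minimum errors as the smallest total of false positives plus false negatives over all cutoffs, and the receiver operating characteristic as the full area under the ROC curve. The function names and the toy hit list are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# (similarity score, True if the hit belongs to the query's family)
Hit = Tuple[float, bool]


def error_counts(hits: List[Hit]) -> List[Tuple[int, int]]:
    """Return (false positives, false negatives) for every cutoff rank k.

    Cutoff k means "accept the k best-scoring hits"; k runs from 0 to len(hits).
    Tied scores keep their input order (an assumption of this sketch).
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    total_members = sum(1 for _, is_member in ranked if is_member)
    fp, tp = 0, 0
    counts = [(0, total_members)]
    for _, is_member in ranked:
        if is_member:
            tp += 1
        else:
            fp += 1
        counts.append((fp, total_members - tp))
    return counts


def minimum_errors(hits: List[Hit]) -> int:
    """Smallest FP + FN over all cutoffs (the most favorable threshold)."""
    return min(fp + fn for fp, fn in error_counts(hits))


def equivalence_number(hits: List[Hit]) -> int:
    """False positives at the cutoff where FP == FN.

    Accepting exactly as many hits as there are family members always
    balances the two error types, so that cutoff is used directly.
    """
    total_members = sum(1 for _, is_member in hits if is_member)
    fp, _fn = error_counts(hits)[total_members]
    return fp


def roc_area(hits: List[Hit]) -> float:
    """Area under the ROC curve: the fraction of (member, non-member)
    pairs in which the member is ranked above the non-member."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    members_seen = 0
    pairs_in_order = 0
    for _, is_member in ranked:
        if is_member:
            members_seen += 1
        else:
            pairs_in_order += members_seen
    non_members = len(ranked) - members_seen
    if members_seen == 0 or non_members == 0:
        raise ValueError("need at least one member and one non-member")
    return pairs_in_order / (members_seen * non_members)


if __name__ == "__main__":
    # Toy search result: four family members and four unrelated sequences.
    example_hits = [(95, True), (90, True), (80, False), (70, True),
                    (60, False), (50, True), (40, False), (30, False)]
    print("minimum errors:", minimum_errors(example_hits))
    print("equivalence number:", equivalence_number(example_hits))
    print("ROC area:", roc_area(example_hits))
```

With a helper like `error_counts`, the false positive and false negative totals at the error-minimizing cutoff can be read off separately, which is the kind of comparison behind the abstract's statement that false negatives are twice as common as false positives.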
Comparative accuracy of methods for protein sequence similarity search
Institute for Biomedical Computing, Washington University, St Louis, MO 63110, USA.



