Bioinformatics Advance Access published online on August 16, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti627
1 Department of Bioengineering, University of California, Berkeley, CA, USA 94720
* To whom correspondence should be addressed.
Motivation: Protein sequence comparison methods are routinely used to infer the intricate network of evolutionary relationships found within the rapidly growing library of protein sequences, and thereby to predict the structure and function of uncharacterized proteins. Here, we detail an improved statistical benchmark of pairwise protein sequence comparison algorithms. We use bootstrap resampling techniques to determine standard statistical errors, and to estimate the confidence of our conclusions. We show that the underlying structure within benchmark databases causes Efron's standard, nonparametric bootstrap to be biased. Consequently, the standard bootstrap under-predicts average performance when used in the context of evaluating sequence comparison methods. As an alternative, we have developed an unbiased statistical evaluation based upon the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap.

Results: We apply our analysis to the comparative study of amino acid substitution matrix families and find that using modern matrices results in a small, but statistically significant improvement in remote homology detection compared to the classic PAM and BLOSUM matrices.

Availability: The sequence sets and code for performing these analyses are available from http://compbio.berkeley.edu/.
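
The operational similarity between the two resampling schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' code and does not model the benchmark database structure discussed above. It assumes the textbook form of each method: Efron's bootstrap assigns each item a weight equal to its count in a with-replacement resample divided by n, while the Bayesian bootstrap draws continuous weights from a flat Dirichlet distribution. The function names and the per-query score list in the usage note are hypothetical.

```python
import random


def standard_bootstrap_weights(n, rng):
    """Efron's bootstrap: resample n items with replacement.

    Each item's weight is (times drawn) / n, so some weights can be zero.
    """
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [c / n for c in counts]


def bayesian_bootstrap_weights(n, rng):
    """Bayesian bootstrap (assumed textbook form): Dirichlet(1, ..., 1) weights.

    Normalizing n independent Exponential(1) draws yields a flat Dirichlet
    sample, so every item keeps a positive, continuous weight.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(values, weights):
    """Mean of values under the given weights (weights sum to 1)."""
    return sum(v * w for v, w in zip(values, weights))


def resample_std_error(values, weight_fn, replicates=1000, seed=0):
    """Standard deviation of the weighted mean across resampling replicates."""
    rng = random.Random(seed)
    n = len(values)
    means = [weighted_mean(values, weight_fn(n, rng)) for _ in range(replicates)]
    centre = sum(means) / replicates
    variance = sum((m - centre) ** 2 for m in means) / (replicates - 1)
    return variance ** 0.5
```

Called as `resample_std_error(scores, bayesian_bootstrap_weights)` on a hypothetical list of per-query performance scores, this returns an estimated standard error for the average score; passing `standard_bootstrap_weights` instead gives the corresponding estimate under Efron's scheme. The only difference between the two is how the weights are drawn, which is the sense in which the methods are operationally similar.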
Received April 14, 2005
Revised July 16, 2005
Accepted August 11, 2005
Article
Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap
2 Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA 94720
3 Department of Bioengineering, University of California, Berkeley, CA, USA 94720; Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA 94720
Steven E. Brenner, E-mail: brenner{at}compbio.berkeley.edu