Bioinformatics Advance Access originally published online on August 4, 2008
Bioinformatics 2008 24(18):1987-1993; doi:10.1093/bioinformatics/btn384
Powerful fusion: PSI-BLAST and consensus sequences
1Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, 2Broad Institute of MIT and Harvard University, 320 Charles St., Cambridge, MA 02141 and 3Columbia University Center for Computational Biology and Bioinformatics (C2B2), NorthEast Structural Genomics Consortium (NESG), New York Consortium on Membrane Protein Structure (NYCOMPS), 1130 St Nicholas Ave. Rm. 802, New York, NY 10032, USA
*To whom correspondence should be addressed.
Abstract
Motivation: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences.
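To illustrate why a shift in random score statistics matters, the sketch below uses the standard Karlin-Altschul relation E = K·m·n·exp(-λS), on which BLAST-style significance estimates are based. It assumes that the "score statistics" mentioned above can be summarized by the two parameters λ and K; the numeric values are illustrative placeholders, not results from this study, and effective-length corrections are ignored.

```python
import math


def evalue(raw_score, lam, k, query_len, db_len):
    """Expected number of chance alignments with score >= raw_score.

    Standard Karlin-Altschul form: E = K * m * n * exp(-lambda * S).
    Effective-length corrections are deliberately left out of this sketch.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameter sets (assumptions for this example only).
ASSUMED_NATIVE = {"lam": 0.267, "k": 0.041}     # placeholder for a native database
ASSUMED_CONSENSUS = {"lam": 0.240, "k": 0.041}  # hypothetical shifted lambda

raw = 80          # hypothetical raw alignment score
m, n = 300, 10**8  # hypothetical query length and database size in residues

e_assumed = evalue(raw, ASSUMED_NATIVE["lam"], ASSUMED_NATIVE["k"], m, n)
e_actual = evalue(raw, ASSUMED_CONSENSUS["lam"], ASSUMED_CONSENSUS["k"], m, n)

print(f"E-value computed with native parameters:    {e_assumed:.3g}")
print(f"E-value computed with consensus parameters: {e_actual:.3g}")
```

If the true λ for consensus sequences were smaller than the one assumed, the same raw score would correspond to a larger E-value, so significance computed with native parameters would be overstated.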
Results: We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a very popular and effective method could be used to identify significantly more relevant similarities among protein sequences.
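For readers unfamiliar with consensus sequences, the following is a minimal sketch of the simplest possible variant: a majority-rule consensus taken column by column from an existing multiple alignment. It is not the specialized protocol identified in this work, which the abstract does not describe; the function name, the gap symbol and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-"):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Each alignment column contributes its most frequent non-gap residue.
    Columns that contain only gaps are skipped. Ties go to the residue
    seen first in the column (Counter.most_common keeps first-seen order
    for equal counts).
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(residue for residue in column if residue != gap)
        if not counts:
            continue  # all-gap column: nothing to emit
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = [
    "MKV-L",
    "MRVAL",
    "MKIAL",
]
print(majority_consensus(toy_alignment))
```

A database of such sequences, one per native protein, could then stand in for the native database when searching with a stored profile.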
Availability: http://www.rostlab.org/services/consensus/
Contact: dariusz{at}mit.edu
Associate Editor: John Quackenbush
Received on January 30, 2008; revised on July 6, 2008; accepted on July 22, 2008
