Bioinformatics Vol. 16 no. 11 2000
Pages 988-1002
© 2000 Oxford University Press
Original Paper
Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases
1 Department of Chemistry, Rutgers University, Wright-Rieman Laboratories, 610 Taylor Rd, Piscataway, NJ 08854-8087, USA
Received on December 15, 1999; revised on May 23, 2000; accepted on May 23, 2000
Motivation: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. At sufficiently low sequence identity, it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix was first constructed from a database of three-dimensionally aligned proteins. Dynamic programming was then applied iteratively, using linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently, contributions from secondary structure are phased in, and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate.
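The combined scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring functions `aa_score` and `ss_score`, the mixing weight `w`, the linear gap penalty, and the weight schedule in `phased_scores` are all hypothetical stand-ins chosen for illustration. The sketch computes a Smith-Waterman local alignment score in which each aligned position contributes a linear combination of an amino acid term and a secondary structure term.

```python
def aa_score(a, b):
    # Toy amino acid scoring (assumption): +2 for a match, -1 for a mismatch.
    # A real search would use a substitution matrix instead.
    return 2.0 if a == b else -1.0


def ss_score(x, y):
    # Toy secondary structure scoring (assumption) over states such as H/E/C.
    # The paper derives its matrix from structurally aligned proteins.
    return 1.0 if x == y else -1.0


def combined_local_score(seq_a, ss_a, seq_b, ss_b, w, gap=2.0):
    """Local alignment score with per-position score
    (1 - w) * aa_score + w * ss_score and a linear gap penalty."""
    assert len(seq_a) == len(ss_a) and len(seq_b) == len(ss_b)
    prev = [0.0] * (len(seq_b) + 1)
    best = 0.0
    for i in range(1, len(seq_a) + 1):
        curr = [0.0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            s = ((1.0 - w) * aa_score(seq_a[i - 1], seq_b[j - 1])
                 + w * ss_score(ss_a[i - 1], ss_b[j - 1]))
            # Standard local-alignment recurrence: restart, extend, or gap.
            curr[j] = max(0.0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


def phased_scores(query, target, weights=(0.0, 0.25, 0.5)):
    # query and target are (amino_acid_sequence, secondary_structure_sequence).
    # The first weight is 0.0, so the first pass uses sequence information only.
    return [combined_local_score(*query, *target, w) for w in weights]
```

With `w = 0.0` the score depends only on the amino acid sequences; raising `w` lets two proteins with dissimilar sequences but matching secondary structure strings obtain a positive score, which is the effect the phased-in contribution is meant to exploit.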
Results: We used the SCOP40 database, where only PDB sequences
that have 40% sequence identity or less are included, to calibrate homology
detection by the combined amino acid and secondary structure
sequence alignments. Combining predicted secondary structure with
sequence information results in an 8-15% increase in homology
detection within SCOP40 relative to the pairwise alignments using
only amino acid sequence data at an error rate of 0.01 errors per
query; a 35% increase is observed when the actual secondary
structure sequences are used. Incorporating predicted secondary
structure information in the analysis of six small genomes yields an
improvement in homology detection of 20% over SSEARCH pairwise
alignments, but no improvement in the total number
of homologs detected over PSI-BLAST, at an error rate of 0.01 errors
per query. However, because the pairwise alignments based on
combinations of amino acid and secondary structure similarity are
different from those produced by PSI-BLAST and the error rates can
be calibrated, it is possible to combine the results of both
searches. An additional 25% relative improvement in the number of
genes identified at an error rate of 0.01 is observed when the data
is pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST
detected 15% of all possible homologs, whereas the pooled results
increased the total number of homologs detected to 19%. These
results are compared with recent reports of homology detection using
sequence profiling methods.
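The calibration and pooling described in the Results can be sketched as follows. This is an assumption-laden illustration, not the procedure used in the paper: the function names `calibrate_threshold` and `pool_hits`, the input layouts, and the choice to pool by taking the union of accepted hits are all hypothetical. The first function finds a score cutoff on a labeled benchmark such that the number of false positives per query does not exceed a target rate; the second accepts any pair that passes the cutoff of either search method.

```python
def calibrate_threshold(scored_pairs, n_queries, max_epq=0.01):
    """Return the lowest score cutoff at which false positives per query
    stay at or below max_epq, or None if no cutoff qualifies.

    scored_pairs: iterable of (score, is_true_homolog) from a labeled
    benchmark. Scores are assumed distinct; ties are not handled.
    """
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    false_pos = 0
    threshold = None
    for score, is_true in ranked:
        if not is_true:
            false_pos += 1
        if false_pos / n_queries > max_epq:
            break  # accepting this hit would exceed the target error rate
        threshold = score
    return threshold


def pool_hits(hits_a, thr_a, hits_b, thr_b):
    """Union of pairs accepted by either method at its own cutoff.

    hits_a, hits_b: dicts mapping (query_id, target_id) -> score.
    """
    accepted_a = {pair for pair, s in hits_a.items() if s >= thr_a}
    accepted_b = {pair for pair, s in hits_b.items() if s >= thr_b}
    return accepted_a | accepted_b
```

Note that a plain union can admit false positives from both methods, so the pooled error rate may exceed the rate of either method alone; a careful analysis would recalibrate the pooled list against the benchmark.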
Availability: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas
Contact: anders{at}rutchem.rutgers.edu; ronlevy{at}lutece.rutgers.edu
Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss·fold·predictions.
To whom correspondence should be addressed.
** Present address: Genomic Science Laboratory, RIKEN Life Science Tsukuba Center, 3-1-1 Koya-dai, Tsukuba, Ibaraki 305, Japan.