
Bioinformatics Vol. 17 no. 1 2001
Pages 63-72
© 2001 Oxford University Press


Original Paper

What are the baselines for protein fold recognition?

Liam J. McGuffin 1, Kevin Bryson 2 and David T. Jones 1,*

1 Bioinformatics Group, Department of Biological Sciences, Brunel University, Uxbridge UB8 3PH, UK
2 Agent-Based Systems Group, Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK

Received on April 20, 2000; revised on July 19, 2000; accepted on September 23, 2000

Motivation: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein—ranging from the length of the sequence to a knowledge of its secondary structure—to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods and could these methods be used to construct viable hierarchical classifications?

Experiments performed: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure.
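As an illustration of the simplest kind of method described above, the following is a minimal sketch, not the authors' implementation. It ranks library domains by absolute difference in sequence length and, as a lower control, by a random ordering. The function names, the dictionary layout and the domain lengths are all hypothetical.

```python
import random


def length_difference_score(len_a: int, len_b: int) -> int:
    """Similarity score: negated absolute difference in sequence length.

    Higher (closer to zero) means more similar. This is an assumed scoring
    convention chosen for illustration only.
    """
    return -abs(len_a - len_b)


def rank_by_length(query_len: int, library: dict) -> list:
    """Return library domain ids ordered from most to least similar in length."""
    return sorted(
        library,
        key=lambda dom: length_difference_score(query_len, library[dom]),
        reverse=True,
    )


def random_ranking(library: dict, rng: random.Random) -> list:
    """Lower control: return the library domain ids in a random order."""
    ids = list(library)
    rng.shuffle(ids)
    return ids


# Hypothetical library mapping domain id -> sequence length.
library = {"domA": 120, "domB": 95, "domC": 310}

print(rank_by_length(100, library))               # length-based ranking
print(random_ranking(library, random.Random(0)))  # random baseline ranking
```

Any ranking method, however elaborate, can be compared against these two orderings on the same library, which is the sense in which they act as baselines.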

Results: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP 59% and the optimum secondary structure alignment method 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system.
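One plausible reading of "true positives at 3% false positives" is a point on a ROC-style curve: domain pairs are sorted by score, and the fraction of same-fold pairs recovered is reported at the point where the fraction of different-fold pairs accepted would exceed 3%. The sketch below assumes that reading; the abstract does not state the exact procedure, and the function name and toy data are hypothetical.

```python
def tp_rate_at_fp_rate(scored_pairs, max_fp_rate=0.03):
    """Fraction of true pairs recovered before the false positive rate
    would exceed max_fp_rate.

    scored_pairs: iterable of (score, is_same_fold) tuples, where a higher
    score means more similar and is_same_fold is the reference label
    (for example, agreement with a reference classification).
    Tied scores are not treated specially in this sketch.
    """
    pairs = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, same in pairs if same)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    tp = 0
    fp = 0
    for _, same in pairs:
        if same:
            tp += 1
        else:
            # Stop before accepting a false positive that breaks the limit.
            if (fp + 1) / n_neg > max_fp_rate:
                break
            fp += 1
    return tp / n_pos


# Toy data: (score, same fold according to the reference classification).
toy_pairs = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
print(tp_rate_at_fp_rate(toy_pairs, max_fp_rate=0.03))
print(tp_rate_at_fp_rate(toy_pairs, max_fp_rate=0.5))
```

Fixing the false positive rate makes the random baseline, the secondary structure methods and FSSP directly comparable on one number.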

Conclusions: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by random guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.

Contact: David.Jones{at}brunel.ac.uk

* To whom correspondence should be addressed.






