Bioinformatics Advance Access originally published online on January 9, 2008
Bioinformatics 2008 24(4):586-587; doi:10.1093/bioinformatics/btn014
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

The ModFOLD server for the quality assessment of protein structural models

Liam J. McGuffin

School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK


    ABSTRACT
 

Summary: The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. The first is ModFOLD, a fast Model Quality Assessment Program (MQAP) used for the global assessment of either single or multiple models. The second is ModFOLDclust, a more intensive method that carries out clustering of multiple models and provides per-residue local quality assessment.

Availability: http://www.biocentre.rdg.ac.uk/bioinformatics/ModFOLD/

Contact: l.j.mcguffin{at}reading.ac.uk


    1 INTRODUCTION
 
The selection of the highest quality 3D model of a protein structure from a number of alternatives is a challenging problem in structural biology. Currently, there are two main approaches to the problem of ranking models. Firstly, there are the Model Quality Assessment Programs (MQAPs), which are able to provide a useful score by considering only a single model. Secondly, there are clustering or consensus-based approaches, which attempt to rank a population of alternative models based on pair-wise structural comparisons, and may also include additional information about the confidence of models from fold recognition servers or from sequence alignments. Both techniques are useful in different situations. Clustering-based methods have recently been shown to provide a more accurate assessment of model quality if many models are available from many different resources. However, MQAPs are arguably more useful than clustering when only a few models are available from a few different sources, or if many thousands of models are available, but there is insufficient CPU time available in order to carry out clustering (McGuffin, 2007).

Several MQAP servers exist for carrying out predictions on single models, for example ProQ (Wallner and Elofsson, 2003) and ProSA-web (Wiederstein and Sippl, 2007). Additionally, a few fold recognition meta-servers are now available which carry out clustering or a consensus of scores in order to rank predictions (Kajan and Rychlewski, 2007; Wallner and Elofsson, 2006). However, no servers have been implemented that allow users to accurately compare multiple 3D models of their own choice from any source in terms of predicted quality. The ModFOLD server provides a unified resource for obtaining accurate model quality information based on either individual or multiple uploaded models. The server incorporates two competitive methods: the latest version of the ModFOLD method and a clustering-based method called ModFOLDclust. Through the combination of methods the server can provide: (i) a single score and a P-value relating to the predicted quality of a single 3D model of a protein structure; (ii) rankings for multiple 3D models for the same protein target according to predicted model quality; (iii) predictions of the local quality (per-residue errors) within multiple models.


    2 THE MODFOLD METHOD
 
The original version of the ModFOLD method was recently benchmarked against several widely used top performing MQAPs and clustering-based methods. ModFOLD was shown to be competitive with clustering methods and significantly outperformed the individual MQAPs tested (McGuffin, 2007). The original ModFOLD protocol combined scores obtained from the ModSSEA method (McGuffin, 2007), the MODCHECK method (Pettitt et al., 2005) and the two ProQ methods (Wallner and Elofsson, 2003) using a neural network trained with the TM-score (Zhang and Skolnick, 2004). The latest server implementation of the method has been retrained and includes two additional secondary structure scores, similar to those used by Eramian and colleagues (Eramian et al., 2006), as inputs to the neural network.

The ModFOLD method is an MQAP, which is capable of producing a consistent score based on the analysis of a single model. The consistency of the ModFOLD score allows us to calculate a P-value, which represents a quantitative measure of the confidence in a model. P-values for the ModFOLD server were calculated using a similar approach to that previously adopted for measuring the coverage of genomic scale fold recognition (McGuffin et al., 2004, 2006). Thus, for a given predicted model quality score, the P-value reported by the server is the proportion of models with that score that do not share any similarity with the native structure (TM-score < 0.2).
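The P-value definition above can be illustrated with a minimal sketch. It assumes a benchmark set of models for which both the predicted quality score and the observed TM-score to the native structure are known. The function name, the data layout and the choice to pool all benchmark models scoring at or above the query score are assumptions made for illustration, not details taken from the server.

```python
def empirical_p_value(query_score, benchmark, tm_cutoff=0.2):
    """Estimate a P-value for a predicted model quality score.

    benchmark is an iterable of (predicted_score, observed_tm_score) pairs.
    Assumption: "models with that score" is read as benchmark models whose
    predicted score is at least query_score.
    Returns None when no benchmark model reaches the query score.
    """
    observed = [tm for score, tm in benchmark if score >= query_score]
    if not observed:
        return None
    # Models with TM-score below the cut-off share no similarity with the native.
    unrelated = sum(1 for tm in observed if tm < tm_cutoff)
    return unrelated / len(observed)


# Hypothetical benchmark data, for illustration only.
example_benchmark = [(0.9, 0.8), (0.8, 0.7), (0.5, 0.15), (0.4, 0.3), (0.2, 0.1)]
example_p = empirical_p_value(0.4, example_benchmark)
```

With a large benchmark set, a low value indicates that few models reaching that score were unrelated to the native structure.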


    3 THE MODFOLDCLUST METHOD
 
The server also includes the option of clustering multiple models using the ModFOLDclust method. The method carries out pairwise comparisons of models in order to produce both global and local predictions of model accuracy. The global clustering score is based on the 3D-Jury method (Ginalski et al., 2003), whereby each model is compared to every other model and the average structural similarity score is calculated. However, in this application, the TM-score is used for pairwise comparisons, with a cut-off score of >0.2. This emulation of the 3D-Jury score has been previously benchmarked on the set of CASP7 server models and was shown to significantly outperform every method tested for the selection of the highest quality models (McGuffin, 2007). Unlike the 3D-Jury server itself, however, where users can evaluate a single model through the comparison with a few available fold recognition models (Kajan and Rychlewski, 2007), the ModFOLD server implementation allows users to directly upload multiple models of their own choice from any source.
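As a rough illustration of the global clustering score, the sketch below averages the pairwise TM-scores of each model against all others, counting only comparisons above the 0.2 cut-off. It assumes the pairwise TM-scores have already been computed with an external program and are supplied as a square matrix. The function name, the matrix layout and the normalisation by the number of other models are assumptions, not the server's actual code.

```python
def global_clustering_scores(tm, cutoff=0.2):
    """Return one clustering score per model.

    tm[i][j] is the TM-score of model i compared with model j
    (assumed precomputed; the diagonal is ignored).
    Comparisons at or below the cut-off contribute zero, and the sum is
    divided by the number of other models (an assumed normalisation).
    """
    n = len(tm)
    scores = []
    for i in range(n):
        total = sum(tm[i][j] for j in range(n) if j != i and tm[i][j] > cutoff)
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores


# Hypothetical pairwise TM-scores for three models.
example_tm = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.5],
    [0.1, 0.5, 1.0],
]
example_scores = global_clustering_scores(example_tm)
# Rank models from highest to lowest clustering score.
example_ranking = sorted(range(len(example_scores)),
                         key=lambda i: example_scores[i], reverse=True)
```

In this toy example the second model agrees best with the rest of the set, so it is ranked first.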

In addition to the global clustering score, the ModFOLDclust method incorporates the scoring of local model quality on a per-residue basis. The local model quality is evaluated by using a score similar to the average S-score (Levitt and Gerstein, 1998), which was originally used for model evaluation in the 3D-SHOTGUN method (Fischer, 2003) and was more recently benchmarked using the Pcons server (Wallner and Elofsson, 2006). The idea in this implementation is to reuse each pairwise model superposition, carried out in the calculation of the global score, in order to evaluate the local structural conservation of each residue. Here, the S-score is used to evaluate residues that are within 3.9 Å according to pairwise TM-score superpositions, where the TM-scores >0.2. The S-score is defined as: $S_i = 1/(1 + (d_i/d_0)^2)$, where $S_i$ ranges from 0 to 1, $d_i$ is the distance between structurally aligned residues and $d_0$ is the distance threshold (3.9). An $S_i$ score of 0 is given if $d_i$ > 3.9 Å. The S-scores for each residue are then summed and the mean score is taken. The mean S-score for each residue is then converted to the predicted distance from the native structure, by simply rearranging the equation: $d_i = d_0\sqrt{(1/S_i) - 1}$.
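The S-score and its inversion to a predicted distance can be written out as a short sketch. The function names are hypothetical, and the handling of a mean S-score of zero (returned here as an infinite distance) is an assumption, since the text does not say how that case is treated.

```python
import math

D0 = 3.9  # distance threshold in Angstroms, as given in the text


def s_score(d, d0=D0):
    """S-score for one pair of structurally aligned residues d Angstroms apart."""
    if d > d0:
        return 0.0  # residues further apart than the threshold score zero
    return 1.0 / (1.0 + (d / d0) ** 2)


def mean_s_score(distances, d0=D0):
    """Mean S-score for one residue over all pairwise superpositions."""
    return sum(s_score(d, d0) for d in distances) / len(distances)


def predicted_distance(mean_s, d0=D0):
    """Rearranged S-score equation: predicted distance from the native structure."""
    if mean_s <= 0.0:
        return math.inf  # assumption: no structural support means unbounded error
    return d0 * math.sqrt(max(0.0, 1.0 / mean_s - 1.0))


# Hypothetical distances for one residue across three pairwise superpositions.
example_error = predicted_distance(mean_s_score([0.0, 3.9, 5.0]))
```

Because S-scores are averaged before conversion, the predicted distance is not the arithmetic mean of the observed distances. In this example the mean S-score is 0.5, which maps back to 3.9 Å.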


    4 SERVER IMPLEMENTATION
 
The ModFOLD server provides an intuitive web interface for job submission, whereby either single or multiple PDB files may be uploaded for evaluation with a selected method, and the results are returned via email. The results email contains a URL, which links to HTML formatted graphical results, as well as an attached file, which provides results in a machine readable format [CASP7 QMODE1 format for ModFOLD and QMODE2 for ModFOLDclust (Cozzetto et al., 2007)]. The ModFOLD results page simply consists of a table of model data sorted by decreasing predicted model quality score. Each row in the table also includes the calculated P-value and the associated colour coded confidence level (e.g. green indicates high confidence in a model, red indicates a random model). The ModFOLDclust results table differs as it is ranked according to the global clustering score. Each row also includes a small JPEG image of a plot depicting the residue error versus the residue number. Each small image links to a page that displays a larger view of the plot and a link to download a PostScript version. Each row in the results table also displays a small 3D cartoon view of the model that is colour coded with the residue error according to the temperature colouring scheme (Figure 1). Again, each small image links to a page that shows a larger image of the 3D view and contains a link to download a PDB file of the model with the predicted residue accuracy (Å) in the B-factor column. Figure 1 shows an example of the ModFOLDclust results for a model of CASP7 target T0284 compared with the observed error according to a superposition with the native structure.
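The downloadable PDB file described above, with the predicted residue error in the B-factor column, could be produced along the following lines. This is a sketch under stated assumptions: it relies on the fixed-column PDB format (residue number in columns 23-26, B-factor in columns 61-66), ignores chain identifiers and insertion codes, and caps values at 99.99 so they fit the six-character field. The function name and the cap are illustrative choices, not part of the server.

```python
def set_bfactors(pdb_lines, residue_error, cap=99.99):
    """Write per-residue predicted errors into the B-factor column.

    pdb_lines: PDB records as strings without trailing newlines.
    residue_error: dict mapping residue number to predicted error (Angstroms).
    Records for residues without a prediction are returned unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            res_num = int(line[22:26])  # columns 23-26: residue sequence number
            if res_num in residue_error:
                value = min(residue_error[res_num], cap)
                line = line.ljust(66)  # pad short records up to the B-factor field
                # columns 61-66: temperature factor, written as %6.2f
                line = line[:60] + f"{value:6.2f}" + line[66:]
        out.append(line)
    return out
```

A molecular viewer that colours by B-factor can then display the predicted error directly on the model, as in the temperature colouring scheme mentioned above.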


Fig. 1. The ModFOLDclust predicted per-residue error (left) for an example model compared to the observed error obtained from the alignment to the native structure (right). Each image was rendered using PyMOL (http://www.pymol.org). The colours represent the residue accuracy according to the temperature scheme (blue indicates residues closest to the native structure; red, those furthest from the native structure).

 

    ACKNOWLEDGEMENTS
 
This work was supported by an RCUK Academic Fellowship.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on November 14, 2007; revised on January 7, 2008; accepted on January 7, 2008

    REFERENCES
 

    Cozzetto D, et al. Assessment of predictions in the model quality assessment category. Proteins (2007) 69(Suppl. 8):175–183.

    Eramian D, et al. A composite score for predicting errors in protein structure models. Protein Sci (2006) 15:1653–1666.

    Fischer D. 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins (2003) 51:434–441.

    Ginalski K, et al. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics (2003) 19:1015–1018.

    Kajan L, Rychlewski L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics (2007) 8:304.

    Levitt M, Gerstein M. A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci USA (1998) 95:5913–5920.

    McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics (2007) 8:345.

    McGuffin LJ, et al. High throughput profile-profile based fold recognition for the entire human proteome. BMC Bioinformatics (2006) 7:288.

    McGuffin LJ, et al. The genomic threading database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res (2004) 32:D196–D199.

    Pettitt CS, et al. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics (2005) 21:3509–3515.

    Wallner B, Elofsson A. Can correct protein models be identified? Protein Sci (2003) 12:1073–1086.

    Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci (2006) 15:900–913.

    Wiederstein M, Sippl MJ. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res (2007) 35:W407–W410.

    Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins (2004) 57:702–710.

