Bioinformatics Advance Access originally published online on October 4, 2005
Bioinformatics 2005 21(23):4248-4254; doi:10.1093/bioinformatics/bti702
Pcons5: combining consensus, structural evaluation and fold recognition scores
Stockholm Bioinformatics Center, Stockholm University SE-106 91 Stockholm, Sweden
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: The success of the consensus approach to the protein structure prediction problem has led to the development of several different consensus methods. Most of them rely only on a structural comparison of a number of different models. However, there are other types of information that might be useful, such as the score from the server and structural evaluation.
Results: Pcons5 is a new and improved version of the consensus predictor Pcons. Pcons5 integrates information from three different sources: the consensus analysis, structural evaluation and the score from the fold recognition servers. We show that Pcons5 is better than the previous version of Pcons and that it performs better than using only the consensus analysis. In addition, we also present a version of Pmodeller based on Pcons5, which performs significantly better than Pcons5.
Availability: Pcons5 is the first Pcons version available as a standalone program from http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers.
Contact: bjorn{at}sbc.su.se
| INTRODUCTION |
|---|
The use of many different methods to predict the structure of a protein is now state-of-the-art in protein structure prediction. This is facilitated by the different meta- or consensus predictors that are available, e.g. through the meta-server at http://bioinfo.pl/Meta/ (Bujnicki et al., 2001). The consensus predictors use the result from different prediction methods to select the best protein model. In principle they are all based on the simple approach of selecting the most abundant representative among the set of high scoring models. Pcons (Lundström et al., 2001) was the first fully automated consensus predictor, followed by several others (Fischer, 2003; Ginalski et al., 2003). All benchmarking results obtained in the last two years, both at CASP (Moult et al., 2003) and in LiveBench (Rychlewski et al., 2003) indicate that consensus prediction methods are more accurate than the best of the independent fold recognition methods (Ginalski et al., 2005), e.g. in CAFASP3 the performance was 30% higher than that of the best independent fold recognition methods and comparable to the 23 best human CASP predictors (Fischer, 2003). The main strength of the consensus analysis is coupled to the structural comparison. However, there are also other factors that can be used in the selection process. The score from the fold recognition method and a structural evaluation of the model are such parameters. A problem with using the score from the fold recognition methods directly is that the scoring scheme might change at any time, leading to a frequent re-optimization of the parameters. In this paper we describe the newest version of Pcons, Pcons5. It consists of three parts: consensus analysis, structural evaluation and a final part dependent on the score from the fold recognition server. In addition, we also present an improved version of Pmodeller (Wallner et al., 2003) based on Pcons5 and ProQ (Wallner and Elofsson, 2003).
| METHODS |
|---|
Datasets
All datasets used in the development of Pcons5 were constructed from different versions of LiveBench (Bujnicki et al., 2001). These sets contain models that could realistically be obtained for unknown targets and that span a range of quality. LiveBench continuously measures the performance of different fold recognition web servers by submitting the sequences of recently solved protein structures with no obvious close homolog [10^-3 BLAST cutoff (Altschul et al., 1997)] to a protein in the Protein Data Bank (Berman et al., 2000).
The structural evaluation module was trained on the same dataset as used in ProQ (Wallner and Elofsson, 2003) (LiveBench-2 data). The parameters for the consensus analysis and score evaluation, as well as the final combination, were optimized on a dataset constructed from LiveBench-4. The LiveBench-4 dataset was collected during the period November 7, 2001 to April 25, 2002 and contains protein structure predictions for 107 targets from 14 individual servers and 3 consensus servers. In total, 10 974 protein models from the following 11 servers were used: PDBBLAST, FFAS (Rychlewski et al., 2000), Sam-T99 (Karplus et al., 1998), mGenTHREADER (Jones, 1999), INBGU (Fischer, 2000), three FUGUE servers (Shi et al., 2001), 3D-PSSM (Kelley et al., 2000), Orfeus (Ginalski et al., 2003) and Superfamily (Gough and Chothia, 2002). The models used were simple backbone copies of the aligned residues from the template.
| METHOD DEVELOPMENT |
|---|
Pcons5 consists of three different modules: consensus analysis, structural evaluation and score evaluation. It has been developed with the goal of being independent of any fixed set of methods/servers, i.e. it should work with any number of methods and with any number of models. Each of the modules in Pcons5 produces two scores reflecting different aspects of model quality. These scores are combined into the final score using a weighted sum. The three modules are described in the following subsections.
Consensus analysis
The consensus analysis is performed in a similar way as in 3D-Jury (Ginalski et al., 2003), with the only difference being that LGscore (Cristobal et al., 2001) is used to compare the models. Each model is compared both with all models and with the first ranked models only, as in the different versions of 3D-Jury. This results in two scores: one reflecting the average similarity to all other models [Equation (1)] and the other reflecting the similarity to all first ranked models [Equation (2)]
![]() | (1) |
![]() | (2) |
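The two consensus scores can be illustrated with a minimal sketch. It assumes a precomputed matrix of pairwise similarities (for instance LGscore values produced by an external program) and takes a plain average over the other models; the function name, the input layout and the exact normalization are assumptions made for illustration, not the published equations.

```python
from typing import List, Sequence, Tuple


def consensus_scores(
    sim: Sequence[Sequence[float]],
    first_ranked: Sequence[bool],
) -> Tuple[List[float], List[float]]:
    """Return (c_all, c_first), one value per model.

    sim[i][j]       -- precomputed similarity between models i and j
                       (e.g. an LGscore); hypothetical input layout.
    first_ranked[j] -- True if model j is the top-ranked model of its server.
    """
    n = len(sim)
    c_all, c_first = [], []
    for i in range(n):
        # Assumption: a model is never compared with itself.
        others = [j for j in range(n) if j != i]
        firsts = [j for j in others if first_ranked[j]]
        c_all.append(sum(sim[i][j] for j in others) / len(others) if others else 0.0)
        c_first.append(sum(sim[i][j] for j in firsts) / len(firsts) if firsts else 0.0)
    return c_all, c_first
```

A model that resembles many other models, and in particular many top-ranked ones, receives high values for both scores.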
Structural evaluation
The structural evaluation is done using a backbone version of ProQ (Wallner and Elofsson, 2003). ProQ uses the distribution of atom-atom contacts, residue-residue contacts, secondary structure information and surface accessibility for different amino acids to assess the quality of protein models. The original version was developed for protein models with all atoms. The version used in Pcons5 uses only the backbone atoms, usually obtained by copying the aligned coordinates from the template. This version of ProQ is not as accurate as the original ProQ version, but since there is no need to build all-atom models, the overall method is considerably faster.
Since the structural evaluation is performed on backbone models it is not possible to use exactly the same structural information as in the all-atom version of ProQ, e.g. using contacts between different types of atoms was no longer possible. However, the same six residue types as in ProQ were used, but the residue-residue contact cutoff had to be increased to 14 Å to compensate for the non-existing side-chains. The cutoff was chosen by trying different cutoffs in the range 6-20 Å. The calculation of surface accessibility also had to be changed. We chose to use a reduced representation defining buried and exposed residues based on the number of neighboring CA atoms. Residues with <16 atoms within 10 Å were defined as exposed and residues with >20 atoms within 10 Å as buried. These definitions showed the largest agreement with Naccess (Lee and Richards, 1971). The secondary structure information was the fraction of agreement between predicted secondary structure using PSIPRED (Jones, 1999) and the actual secondary structure in the model. As in the original ProQ version, neural nets were trained to predict LGscore (Cristobal et al., 2001) and MaxSub (Siew et al., 2000), based on the structural features. A final correlation coefficient of 0.65 was obtained, in comparison with 0.76 for the all-atom ProQ.
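The reduced accessibility representation described above can be sketched as a simple neighbor count over CA coordinates. The thresholds (16 and 20 neighbors within 10 Å) come from the text; excluding the residue itself from its own count, treating the 10 Å limit as inclusive, and labeling the 16-20 range "intermediate" are assumptions of this sketch, as is the function name.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def classify_burial(
    ca_coords: Sequence[Coord],
    radius: float = 10.0,      # neighbor radius in Å, from the text
    exposed_below: int = 16,   # fewer neighbors than this: exposed
    buried_above: int = 20,    # more neighbors than this: buried
) -> List[str]:
    """Label each residue from the number of CA atoms near its own CA atom."""
    labels = []
    for i, a in enumerate(ca_coords):
        # Assumption: the residue itself is not counted as a neighbor.
        count = sum(
            1
            for j, b in enumerate(ca_coords)
            if j != i and math.dist(a, b) <= radius
        )
        if count < exposed_below:
            labels.append("exposed")
        elif count > buried_above:
            labels.append("buried")
        else:
            # The text does not name this range; the label is hypothetical.
            labels.append("intermediate")
    return labels
```

The quadratic loop is adequate for single protein models; a spatial index would only matter for much larger structures.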
Score evaluation
A good indicator of model quality is the score from the fold recognition method or server; a high score is usually connected to a good model. However, since Pcons5 should be independent of a fixed number of servers it is impossible to include the raw score from the server directly in Pcons5. Instead, each raw score is scaled to a common scale based on the reliability of the score. If the reliability is not known, Pcons5 will not use the score to compute the final score.
The score evaluation was designed to be easy to update and facilitate the inclusion of new methods, without the need to re-optimize all parameters. Further, if the scoring of a method suddenly changes it should not impact the result too much. To limit the impact on the final score, all scores were scaled using two levels, good and very good. In principle it works as follows: if the score is good the model will obtain one extra point and if the score is even better (very good) the model will have the possibility to get one additional point. Thus, even an extremely high score could only yield two additional points.
The reliability of each server score was assessed by correlating the fold recognition score with model quality from LiveBench-4. For each server two cutoffs were used to define good models and two were used to define very good models. These cutoffs were decided by analyzing the quality of models associated with a certain score. In more detail, the first cutoff was set to the score for which 50% of all models had an LGscore > 1.5 and the second to the score for which 90% of all models had LGscore > 1.5. The very good models were defined in a similar manner but with LGscore > 3. For scores falling between these cutoffs a linear fit was used. The process is exemplified here by the PSI-BLAST method, which has a familiar E-value score (Figures 1 and 2). Fifty percent of all models with PSI-BLAST E-value < 10^-2 have LGscore > 1.5 and 90% of the models with an E-value < 10^-6.2 have LGscore > 1.5. For LGscore > 3 the cutoffs are 10^-20.9 and 10^-124.3, respectively. With the linear fit between the cutoff values, a score of 10^-5 would, for example, yield a scaled score of 0.81 on the good scale and 0 on the very good scale (Fig. 1). In a way this is a reflection of the reliability of a hit in the database with a certain score. A hit with an E-value of 10^-5 is most likely correct, but the alignment is probably not optimal. If, however, the E-value is 10^-72.6 the model is very likely to be of high quality, and this is also reflected by a significantly higher scaled score of 1 on the good scale and 0.5 on the very good scale (Fig. 2).
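The two-level scaling can be sketched for the PSI-BLAST example using the cutoffs quoted above (10^-2 and 10^-6.2 for good, 10^-20.9 and 10^-124.3 for very good). The exact form of the linear fit is not given in the text, so this sketch assumes plain linear interpolation in -log10(E-value), clipped to the range 0 to 1. That assumption reproduces the value 0.5 on the very good scale for an E-value of 10^-72.6, but it is not claimed to reproduce every value quoted in the text. All names are hypothetical.

```python
import math
from typing import Tuple

# PSI-BLAST cutoffs quoted in the text, as (weak end, strong end).
GOOD_CUTOFFS = (10 ** -2, 10 ** -6.2)
VERY_GOOD_CUTOFFS = (10 ** -20.9, 10 ** -124.3)


def _interpolate(evalue: float, weak: float, strong: float) -> float:
    """Map an E-value to [0, 1]; 0 at the weak cutoff, 1 at the strong one.

    Assumption: the interpolation is linear in -log10(E-value).
    """
    x = -math.log10(evalue)
    lo = -math.log10(weak)
    hi = -math.log10(strong)
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def scaled_scores(evalue: float) -> Tuple[float, float]:
    """Return (good, very_good) scaled scores for a positive E-value."""
    return (
        _interpolate(evalue, *GOOD_CUTOFFS),
        _interpolate(evalue, *VERY_GOOD_CUTOFFS),
    )
```

Because both values are clipped to at most 1, even an extremely strong hit contributes no more than two points in total, which matches the intent of limiting the impact of any single server score.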
Compiling the final Pcons5 score
The final Pcons5 score is a combination of the six different scores from the three different modules described above. They were combined using multiple linear regression to fit the LGscore quality measure, with the following coefficients
![]() | (3) |
The fit is rather good, explaining 86% of the variance in the data (Table 1). If the range of the parameters is known, their influence on the final score can be assessed directly from the size of the coefficients. The ranges of C_all, C_first and ProQ_LG are all comparable in size (as they are trained to predict the LGscore of the model), ProQ_MX needs to be multiplied by ten to put it on the same scale, and Score_good and Score_very good are roughly one-third of the first three parameters. Thus, the consensus analysis is the most important factor, followed by the structural evaluation using ProQ_LG, while ProQ_MX and the two server-specific scores influence the final score to a lesser degree. This is also in agreement with the R^2 values in Table 1.
Pmodeller5
For the previous Pcons version we have developed a corresponding Pmodeller version (Wallner et al., 2003). Pmodeller uses Modeller (Sali and Blundell, 1993) to build all-atom models which are assessed using ProQ and the final Pmodeller score is a combination of the Pcons score and ProQ score. In CASP5 it was shown that Pmodeller performs better than its corresponding Pcons version.
In Pmodeller5, we have used a slightly different approach. Instead of a linear combination of the Pcons and ProQ scores, the combination is done in two steps. First Pcons5 is used to find the best scoring models, then all-atom models are built using Modeller6v2 (Sali and Blundell, 1993) for all models with a score within 10% of the highest. These models are then subjected to a re-ranking using the original all-atom version of ProQ (Wallner and Elofsson, 2003), which is significantly better than the backbone ProQ module used in Pcons5. The final Pmodeller score consists only of the ProQ score. Restricting the re-ranking to models that score within 10% of the best ensures that only the best models are included in the final ranking. At the same time the algorithm gets significantly faster, since there is no need to build all-atom models for low scoring models.
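The two-step selection can be sketched as follows. The sketch reads "within 10% of the highest" as a score of at least 90% of the best Pcons5 score, which presupposes positive scores; that reading, the function name and the callable standing in for model building with Modeller plus all-atom ProQ scoring are all assumptions.

```python
from typing import Callable, Dict, List


def pmodeller_rerank(
    pcons_scores: Dict[str, float],
    proq_all_atom: Callable[[str], float],
    fraction: float = 0.10,
) -> List[str]:
    """Shortlist models by Pcons5 score, then order them by all-atom ProQ score.

    pcons_scores  -- model identifier -> Pcons5 score (assumed positive).
    proq_all_atom -- hypothetical stand-in for building an all-atom model
                     and scoring it with ProQ; called only for shortlisted models.
    """
    best = max(pcons_scores.values())
    threshold = best * (1.0 - fraction)
    shortlisted = [m for m, s in pcons_scores.items() if s >= threshold]
    # The final ranking uses the ProQ score alone, as described in the text.
    return sorted(shortlisted, key=proq_all_atom, reverse=True)
```

For example, with Pcons5 scores of 5.0, 4.6, 4.4 and 2.0, only the first two models pass the 4.5 threshold and are re-ranked; the expensive all-atom step is never run for the other two.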
| RESULTS AND DISCUSSION |
|---|
It is important that new methods are benchmarked and compared with existing methods. Pcons5 has been thoroughly benchmarked in LiveBench and both Pcons5 and Pmodeller5 participated in CASP6.
Performance in LiveBench
The ROC analysis from LiveBench-8 is shown in Figure 3. With the exception of two incorrect hits, Pcons5 consistently performs better than the previous versions of Pcons. In addition, it also performs better than the 3D-Jury method that uses consensus from eight selected servers (more or less the same servers that Pcons5 was using), i.e. this 3D-Jury method corresponds to the consensus analysis module in Pcons5. Consequently, the structural evaluation using ProQ and the score from the server are the reason for the 10% improvement relative to the 3D-Jury method. The performance of the best independent server used by Pcons5 in LiveBench-8, SAM-T02 (Karplus et al., 2003), is remarkable. Pcons5 is only marginally better (for >2 incorrect), and SAM-T02 outperforms all previous Pcons versions and even 3D-Jury, which also uses SAM-T02 in its consensus analysis. However, one problem with Pcons5 on LiveBench is that we have no control over which servers are used. Sometimes the results from certain independent servers are missing, e.g. if the load on the servers is high. Therefore, the comparison with the independent servers is better done on the CASP6 data, where it was guaranteed that the results from as many servers as possible were used as input to Pcons5.
Performance in CASP6
Pcons5 uses a number of independent servers as its input; normally these are submitted through the meta-server (http://bioinfo.pl/meta/). During CASP6 there was a problem running Pcons5 as an automatic server using the meta-server. Instead, two different versions of Pcons5 participated in CASP (Pcons5 and SBC-Pcons5). Pcons5 used a limited number of independent servers and was run through the genesilico.pl meta-server (http://genesilico.pl/meta/), while SBC-Pcons5 used more (and better) servers from the bioinfo.pl meta-server once these data were available. Unfortunately, these data were not always ready within the time limit to participate as a server in CASP. This forced us to have SBC-Pcons5 registered as a manual group, even though it was run without any human intervention. For each of the Pcons versions a corresponding Pmodeller version also participated in CASP (Pmodeller5 and SBC-Pmodeller5).
As expected the SBC-* versions of Pcons and Pmodeller using more and better servers performed significantly better with 10% higher sum of GDT_TS than the versions using fewer servers. This shows that the success of the consensus approach is dependent on a good set of individual servers. It can be used on a limited number of servers, but the performance can only be expected to be as good as the models it can choose between. The following analysis will be on the SBC-* versions of Pcons and Pmodeller. The relative performance of Pcons and Pmodeller is similar for the versions using fewer servers (data not shown).
To compare the performance of Pmodeller5 and Pcons5 with the servers they use and with the other groups participating in CASP6, the GDT_TS (Zemla et al., 1999) score for the first ranked models was used. Other scoring schemes like MaxSub (Siew et al., 2000) or TM-score (Zhang and Skolnick, 2004) produce similar results (data not shown). As in our previous analysis of CASP5 results (Wallner et al., 2003), we made two assumptions in our analysis. First, insignificant differences in performance were ignored, by considering two models with a difference <0.05 GDT_TS score to be of similar quality. Second, models where none of the compared methods made a correct prediction were also ignored. This was done by ignoring all targets where none of the compared methods could align >30 residues, i.e. where the GDT_TS multiplied by the length of the target is <30. Targets were also divided into comparative modeling targets (CM) by concatenating CM easy and CM hard and into fold recognition/new fold (FR/NF) by combining FR-homologous, FR-analogous and new fold as defined by the CASP assessors (see http://predictioncenter.org). This resulted in a total of 59 domains, divided into 41 CM targets and 18 FR targets, after filtering out models where no predictor made a correct prediction.
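The two filtering assumptions can be sketched as a pairwise comparison of two methods over a set of targets. The sketch assumes GDT_TS values on a 0 to 1 scale, so that GDT_TS multiplied by the target length approximates the number of correctly aligned residues; the function name and the dictionary-based inputs are hypothetical.

```python
from typing import Dict


def compare_methods(
    gdt_a: Dict[str, float],
    gdt_b: Dict[str, float],
    lengths: Dict[str, int],
    min_diff: float = 0.05,    # smaller differences count as similar
    min_aligned: float = 30,   # targets below this for both methods are skipped
) -> Dict[str, int]:
    """Count targets where method A is better than, worse than or similar to B.

    gdt_a, gdt_b -- target identifier -> GDT_TS of the first ranked model,
                    assumed to lie between 0 and 1.
    lengths      -- target identifier -> number of residues in the target.
    """
    counts = {"better": 0, "worse": 0, "similar": 0, "skipped": 0}
    for target, length in lengths.items():
        a, b = gdt_a[target], gdt_b[target]
        if max(a, b) * length < min_aligned:
            # Neither method produced a correct prediction for this target.
            counts["skipped"] += 1
        elif abs(a - b) < min_diff:
            counts["similar"] += 1
        elif a > b:
            counts["better"] += 1
        else:
            counts["worse"] += 1
    return counts
```

Counting only the "better" and "worse" targets gives the kind of tally reported when two methods are compared head to head.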
Pcons5 versus Pmodeller5
Pmodeller5 performed significantly better than Pcons5 for 10 targets and significantly worse for only 3 (Table 2). For the FR/NF targets it did not make any model worse than the corresponding Pcons5 model. Since Pmodeller5 uses the result from Pcons5 it is possible that the predictions are based on the same alignment. This is the case for 20% of the Pmodeller5 models, but in none of these cases is the model significantly improved by the homology modeling procedure. Thus the main reason for the increase in performance is the re-ranking of the models using ProQ (Wallner and Elofsson, 2003). This was also the main conclusion from comparisons of the previous versions of Pcons and Pmodeller (Wallner et al., 2003).
Comparison to servers used by Pcons5 and Pmodeller5
The servers used by Pcons5 and Pmodeller5 are listed in Table 3 together with the number of times each was selected by Pcons5 and by Pmodeller5. RAPTOR (Xu et al., 2003) is the most frequently selected method both by Pcons5 and Pmodeller5. The main differences in server preference are that Pmodeller5 selects Eidogen-SFST (http://www.eidogen.com) and INBGU (Fischer, 2000) models more frequently than Pcons5, while Pcons5 seems to prefer mGenTHREADER (Jones, 1999) and BasC (Ginalski et al., 2004) models. In fact, all four INBGU models selected by Pmodeller5 significantly increase the model quality compared with the Pcons5 model for the same targets. Four of the selected Eidogen-SFST models are also better than the corresponding Pcons5 model.
For each group an average rank was also calculated using the following formula:
![]() |
In Table 4 the performance of all the groups participating in CASP6 sorted by average rank is shown. Unfortunately not all servers used by Pcons5 participated in CASP6, e.g. the INBGU server and some servers hosted by bioinfo.pl only participated in CAFASP4. However, according to the CAFASP4 evaluation the best independent server was Eidogen-EXPM. Thus, the servers only participating in CAFASP4 could be expected to be ranked slightly below Eidogen-EXPM.
Pmodeller5 shows the highest average rank of all servers. Pcons5 performs significantly worse compared with Pmodeller5. One of the servers used by Pcons5, Eidogen-SFST, is actually ranked slightly higher than Pcons5. Even though it is disappointing that Pcons5 is not ranked higher than Eidogen-SFST, it is still impressive that the ProQ re-ranking managed to make Pmodeller5 the best server.
The sum of the GDT_TS for the first ranked model from each group was also used to measure performance (Table 5). Here, Pcons5 performs better than the best individual server, in particular for the harder targets. Pmodeller5 is better than Pcons5 for both hard and easy targets, but the real improvement is observed for the easy targets, where it performs almost as well as the best manual groups.
One advantage with a consensus method is that it most often selects a model that is better than the average model and seldom a model that is worse than average. However, even though it usually makes a good choice, it often misses the best possible model, i.e. the best model is ranked high but not at the top. Here, a more specific method or energy function can be used to evaluate the top hits. In our case, by using ProQ we increased the number of top hits produced from 13 to 24% of all targets.
| CONCLUSIONS |
|---|
We have developed a new version of Pcons, Pcons5, that uses structural evaluation and reliability assessment of the server score on top of the consensus analysis. This add-on improves the performance by 10% compared with only using consensus on the LiveBench-8 dataset. The performance compared with previous versions is also slightly higher. The new version is easy to update and works even for unseen methods.
In addition to the development of Pcons5 a new version of Pmodeller has also been developed, Pmodeller5. This method uses ProQ to evaluate the best hits from Pcons5. Pmodeller5 was among the best servers in CASP6 and consistently ranked higher than Pcons5.
Pcons5 is the first Pcons version available as a standalone program from: http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers. The model evaluation module in Pmodeller, ProQ, is also available as a standalone program from: http://www.sbc.su.se/~bjorn/ProQ
| Acknowledgments |
|---|
This work was supported by grants from the Swedish Natural Sciences Research Council and by a grant from the Graduate Research School in Genomics and Bioinformatics.
Conflict of Interest: none declared.
Received on July 20, 2005; revised on September 5, 2005; accepted on October 1, 2005
| REFERENCES |
|---|
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
Bujnicki, J.M., et al. (2001) Livebench-2: large-scale automated evaluation of protein structure prediction servers. Proteins, 45, Suppl. 5, 184-191.
Bujnicki, J.M., et al. (2001) Structure prediction meta server. Bioinformatics, 17, 750-751.
Cristobal, S., et al. (2001) A study of quality measures for protein threading models. BMC Bioinformatics, 2, 5.
Fischer, D. (2000) Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac. Symp. Biocomput., 119-130.
Fischer, D. (2003) 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins, 51, 434-441.
Ginalski, K., et al. (2003) 3D-jury: a simple approach to improve protein structure predictions. Bioinformatics, 19, 1015-1018.
Ginalski, K., et al. (2005) Practical lessons from protein structure prediction. Nucleic Acids Res., 33, 1874-1891.
Ginalski, K., et al. (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res., 31, 3804-3807.
Ginalski, K., et al. (2004) Detecting distant homology with meta-BASIC. Nucleic Acids Res., 32, (Web Server issue) W576-W581.
Gough, J. and Chothia, C. (2002) SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res., 30, 268-272.
Jones, D.T. (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol., 287, 797-815.
Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.
Karplus, K., et al. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14, 846-856.
Karplus, K., et al. (2003) Combining local-structure, fold-recognition and new fold methods for protein structure prediction. Proteins, 53, Suppl. 6, 491-496.
Kelley, L.A., et al. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 523-544.
Lee, B. and Richards, F.M. (1971) The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 55, 379-400.
Lundström, J., et al. (2001) Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci., 10, 2354-2362.
Moult, J., et al. (2003) Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins, 53, Suppl. 6, 334-339.
Rychlewski, L., et al. (2003) Livebench-6: large-scale automated evaluation of protein structure prediction servers. Proteins, 53, Suppl. 6, 542-547.
Rychlewski, L., et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci., 9, 232-241.
Sali, A. and Blundell, T.L. (1993) Comparative modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779-815.
Shi, J., et al. (2001) Fugue: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol., 310, 243-257.
Siew, N., et al. (2000) Maxsub: an automated measure to assess the quality of protein structure predictions. Bioinformatics, 16, 776-785.
Wallner, B. and Elofsson, A. (2003) Can correct protein models be identified? Protein Sci., 12, 1073-1086.
Wallner, B., et al. (2003) Automatic consensus-based fold recognition using Pcons, ProQ and Pmodeller. Proteins, 53, Suppl. 6, 534-541.
Xu, J., et al. (2003) RAPTOR: optimal protein threading by linear programming. J. Bioinform. Comput. Biol., 1, 95-117.
Zemla, A., et al. (1999) Processing and analysis of CASP3 protein structure predictions. Proteins, Suppl. 3, 22-29.
Zhang, Y. and Skolnick, J. (2004) Scoring function for automated assessment of protein structure template quality. Proteins, 57, 702-710.