Bioinformatics Advance Access originally published online on March 25, 2009
Bioinformatics 2009 25(10):1259-1263; doi:10.1093/bioinformatics/btp148
Structure similarity measure with penalty for close non-equivalent residues
1Howard Hughes Medical Institute, 2Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050 and 3Department of Biochemistry, University of Washington, J Wing, Health Sciences Building, Box 357350, Seattle, WA 98195, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Recent improvements in homology-based structure modeling emphasize the importance of sensitive evaluation measures that help identify and correct modest distortions in models compared with the target structures. Global Distance Test Total Score (GDT_TS), otherwise a very powerful and effective measure for model evaluation, is still insensitive to such distortions and can even reward them, as observed for remote homology modeling in the latest CASP8 (Critical Assessment of Structure Prediction).
Results: We develop a new measure that balances the GDT_TS reward for the closeness of equivalent model and target residues (attraction term) with a penalty for the closeness of non-equivalent residues (repulsion term). Compared with GDT_TS, the resulting score, TR (total score with repulsion), is much more sensitive to structure compression both in real remote homologs and in CASP models. TR is correlated with, yet distinct from, other measures of structure similarity. The largest difference from GDT_TS is observed in models of mid-range quality based on remote homology modeling.
Availability: The script for TR calculation is included in Supplementary Material. TR scores for all server models in CASP8 are available at http://prodata.swmed.edu/CASP8.
Contact: grishin{at}chop.swmed.edu
Supplementary information: All scripts and numerical data are available for download at ftp://iole.swmed.edu/pub/tr_score/
| 1 INTRODUCTION |
|---|
The current improvement of methods for sequence-based structure prediction (Kopp et al., 2007; Shi et al., 2009) increases the importance of accurate and biologically relevant measures of structural model quality. These measures are essential not only as criteria for benchmarking method performance, but also as standards for the development of new, more powerful approaches to protein structure modeling. Any measure rewards certain aspects of model quality. Using a measure to guide a method's design inevitably emphasizes these aspects in the models produced.
Among various measures of model quality, Global Distance Test Total Score (GDT_TS) is widely considered one of the most informative and robust. Proposed in a seminal paper (Zemla et al., 2001) as a benchmarking criterion at early stages of Critical Assessment of Structure Prediction (CASP), this measure has been successfully used by many independent assessor groups (Kinch et al., 2003; Kopp et al., 2007; Wang et al., 2005; Zemla et al., 2001). In brief, GDT_TS is based on several target–model superpositions, each maximizing the number of equivalent target and model residues separated by a distance shorter than a given distance cutoff. These maximized numbers are normalized by the chain length and then averaged over the superpositions, producing the final score. Thus, GDT_TS is a reward for placing as many model residues as possible in the vicinity of their target counterparts.
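In formula form, the description above can be sketched as follows. The symbols are assumptions introduced here for illustration: $N_d$ is the maximal number of equivalent residue pairs within $d$ Å under the superposition optimized for that cutoff, and $L$ is the chain length used for normalization. The cutoffs 1, 2, 4 and 8 Å are those listed in Section 3.1.

```latex
\mathrm{GDT\_TS} \;=\; \frac{1}{4} \sum_{d \in \{1,\,2,\,4,\,8\}} \frac{N_d}{L}
```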
This approach, although extremely effective in most cases, may still be insensitive to more modest deviations of models from native-like structure. As an example, in models based on remote sequence homologs GDT_TS does not penalize and may even reward unrealistic placement of several model residues in close vicinity of a single target residue (Aloy et al., 2003). When a target and its closest homolog of known structure share only remote similarity, it is hard to predict the correct position of model residues based on the homolog as a template. In this situation, placing model residues closer to each other may improve the chances of some of them being rewarded for closeness to the target. Noted in previous CASPs (Aloy et al., 2003), this effect has become more pronounced among recent models (Shi et al., 2009; Fig. 1).
Recent years have brought a significant improvement in the accuracy of remote homology modeling. As demonstrated by recent CASPs (Kopp et al., 2007; Shi et al., 2009), current methods produce much more accurate models whose quality often differs only by a small GDT_TS margin; thus optimizing a method for higher GDT_TS might improve the method's apparent standing, even at the cost of less realistic models. As a consequence, more fine-grained evaluation is often needed to distinguish between alternative models of reasonable quality.
Most of the existing approaches to structure comparison reward similar regions but do not explicitly penalize dissimilar regions. Taking an analogy with physical forces, these approaches concentrate on attraction but not on repulsion. It might have been reasonable a few years ago, when the quality of structure predictions was much poorer, and rewarding for the spatial closeness of structure regions to the target would suffice to discriminate between models. It was important to detect any positive feature of a model, since there were more negatives about a model than positives. Today, many models reflect structures well. When the positives start to outweigh the negatives, it becomes important to pay attention to the negatives. Here, we introduce an explicit repulsion component into the GDT_TS score, which penalizes for incorrect pairing of non-equivalent residues in the compared structures. We show that unlike GDT_TS, the new measure is sensitive to mild structure compression and thus may be a valuable tool in discriminating biologically unrealistic structure predictions.
| 2 METHODS |
|---|
2.1 Compression of real homologous domains
We choose pairs of structural classification of proteins (SCOP) domain representatives that are confident homologs but do not have a strong structure similarity. From the total of ~100 000 ASTRAL40 domain pairs that share the same superfamily, we select two subsets based on different definitions of marginal structure similarity. The first set includes ~27 000 pairs with low but still significant DALI Z-scores (DALI Z between 2.0 and 5.0). The second set includes the lower third of all domain pairs ranked by DALI Z-score (~30 000 pairs, DALI Z between 2.0 and 5.8). In each pair, the set of equivalent residues is chosen according to the blocks (capital letters) of confidently aligned positions in the DALI alignment. In separate experiments, each of the domains is compressed (compression ratio of 0.9–1.0), followed by the calculation of GDT_TS and total score with repulsion (TR) on the set of equivalent residues.
2.2 Compression of the CASP-fold recognition models
CASP targets and server models are downloaded from http://predictioncenter.org/casp8. We further process target and model structures, split them into domains and assign them to standard categories by prediction difficulty as described at http://prodata.swmed.edu/CASP8. We use target–model pairs for 25 target domains in the fold recognition category, subject each model to varying degrees of compression and calculate the corresponding GDT_TS and TR scores. As an example, Figure 3B shows the plot for models by the Robetta server, which does not introduce significant distortions of local geometry.
2.3 Perturbation of the torsion angles
We perturb torsion angles φ, ψ in fold recognition models with no chain breaks. We choose Robetta models because they have very few chain breaks: among 125 FR models by Robetta, only five models for the same target have a break. To every original φ/ψ value, we add a random Gaussian-distributed term with a mean of 0 and a certain SD. The original structure is modified by successive rotation of backbone segments according to the new angle values, keeping all other geometric parameters intact.
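The angle-perturbation step can be sketched as below. This is a minimal illustration, not the authors' script: it covers only the addition of zero-mean Gaussian noise to each φ/ψ value and omits rebuilding the backbone coordinates from the new angles. The function names, the degree units and the wrapping of angles onto [-180, 180) are assumptions made for the example.

```python
import random


def wrap_angle(deg: float) -> float:
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def perturb_torsions(angles, sd, rng=None):
    """Add independent zero-mean Gaussian noise to every (phi, psi) pair.

    angles: list of (phi, psi) tuples in degrees (hypothetical input format).
    sd:     standard deviation of the noise, in degrees.
    rng:    optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    return [
        (wrap_angle(phi + rng.gauss(0.0, sd)),
         wrap_angle(psi + rng.gauss(0.0, sd)))
        for phi, psi in angles
    ]


if __name__ == "__main__":
    helix_like = [(-60.0, -45.0)] * 5  # illustrative values only
    print(perturb_torsions(helix_like, sd=5.0, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a given perturbation level repeatable, which helps when comparing scores across several SD values.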
2.4 Correlation between TR and the existing scores
For each CASP8 target, we perform GDT_TS-based ranking of the first models submitted by servers and select the top 10 models. We average target–model similarity scores for those models and plot TR against other measures, with each target represented by a single point. To make the scale of DALI Z-scores compatible with TR, we normalize them by the DALI Z-score for the comparison of the target domain to itself.
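The per-target summary described above can be sketched as follows. The record layout (a list of dicts with a `gdt_ts` key plus other score keys) and both function names are hypothetical, chosen only to illustrate the ranking, averaging and normalization steps.

```python
def normalized_dali_z(z_pair: float, z_self: float) -> float:
    """Scale a target-model DALI Z-score by the target's self-comparison Z-score."""
    return z_pair / z_self


def target_point(models, score_key, n=10):
    """Average `score_key` over the n models with the highest GDT_TS.

    models: list of dicts, one per first server model for a single target,
            each holding 'gdt_ts' and the score named by `score_key`.
    """
    ranked = sorted(models, key=lambda m: m["gdt_ts"], reverse=True)[:n]
    return sum(m[score_key] for m in ranked) / len(ranked)


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    models = [
        {"gdt_ts": 50.0, "tr": 45.0},
        {"gdt_ts": 70.0, "tr": 65.0},
        {"gdt_ts": 60.0, "tr": 55.0},
    ]
    print(target_point(models, "tr", n=2))
```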
| 3 RESULTS |
|---|
Analyzing structure predictions produced by automated servers in the recent CASP, CASP8 (Shi et al., 2009), we observe a number of models that are assigned high GDT_TS scores but include unrealistically distorted regions (Fig. 1). These regions often show violations of secondary structure geometry, shorter distances between Cα atoms, sidechain clashes and distortion of hydrogen bond patterns.
In fold recognition modeling, when the closest homolog with 3D structure is relatively distant from the target, it is often difficult to predict the correct location of target residues, even if the general positioning of secondary structure elements can be inferred from the sequence-based alignment. Such situations typically involve ambiguity of angles between secondary structure elements, register shifts in alpha-helices and beta-strands and unknown loop conformations. By definition, GDT_TS rewards relatively close positioning of equivalent Cα atoms in model and target, but does not penalize situations where two or more model residues are close to the same target residue. Therefore, a GDT_TS-trained automatic predictor may in some cases choose to concentrate Cα atoms in a smaller volume, which increases the probability for the target residue to have the correct model residue in the vicinity, even if the correct residue is unknown. A schematic example of such a conformation is shown in Figure 2A, where both correct and incorrect model Cα atoms are located close to the target Cα.
Such deviations from native protein geometry may be loosely viewed as an implementation of the minimax principle from game theory: select the conformation that minimizes the maximal possible loss in the case of failure. For example, for an ambiguous helix register, putting residues closer to each other may produce a positive contribution to GDT_TS even if the register is wrong (Fig. 1). In the case of unknown loop conformation, predicting an unrealistic collapsed loop bears less risk of large GDT_TS loss than predicting an extended (and locally correct) conformation that may deviate further from the target if overall orientation is wrong.
More surprisingly, even a simplistic uniform contraction of the model can in many cases produce higher GDT_TS (Fig. 2; Section 3.3).
3.1 Score for model quality with explicit repulsion term
To develop a measure sensitive to the observed structure compression, we complement GDT_TS with an explicit repulsion term that penalizes for close spatial placement of incorrect residue pairs (Fig. 2A). We call this score TR. The score is calculated as follows (Fig. 2B).
- Superimpose the model with the target using the local-global alignment method (LGA) (Zemla, 2003) in the sequence-dependent mode, maximizing the number of aligned residue pairs within distance cutoffs of d = 1, 2, 4 and 8 Å.
- For each pair of target and model residues $(a_i, b_i)$, calculate a GDT_TS-like score: $s_0 = \frac{1}{4}(\delta_1 + \delta_2 + \delta_4 + \delta_8)$, where $\delta_d = 1$ if the Cα–Cα distance is < d Å and $\delta_d = 0$ otherwise.
- In the LGA superposition for d = 4 Å, consider individual aligned residues in both structures. For each residue R, choose residues in the other structure that are spatially close to R, excluding the residue aligned with R and its immediate neighbors in the chain. Count the numbers $n_1$, $n_2$ and $n_4$ of such residues with Cα–Cα distance to R within cutoffs of 1, 2 and 4 Å. (As opposed to GDT_TS, we do not use the cutoff of 8 Å as too inclusive: in native proteins, this distance is not prohibited for different residues of the same chain.)
- The average of these counts defines the penalty assigned to a given residue R: $p(R) = \frac{1}{3}(n_1 + n_2 + n_4)$.
- Finally, for each aligned residue pair $(a_i, b_i)$, the average of penalties for each residue, $\bar{p} = \frac{1}{2}[p(a_i) + p(b_i)]$, is weighted and subtracted from the GDT_TS score for this pair. The final score is prohibited from being negative, in order to avoid rewarding shorter models limited to only the confident structure core: $s_i = \max(0,\; s_0 - w\,\bar{p})$, where $w$ denotes the weight applied to the penalty.
The score for the two compared structures is calculated as the sum of scores for individual residue pairs, normalized by target chain length L: $\mathrm{TR} = \frac{1}{L}\sum_i s_i$.
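The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the script from the Supplementary Material: it takes coordinates that are already superimposed (a single superposition instead of separate LGA superpositions per cutoff), it represents each structure as a list of equivalent Cα coordinates indexed by alignment position, it searches for close non-equivalent residues only among those aligned positions, and it treats positions i-1 and i+1 as the immediate chain neighbors. The weight argument and its default of 1.0 are placeholders, since the value used for the published score is not given in the text.

```python
import math

REWARD_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # Angstrom, attraction term
PENALTY_CUTOFFS = (1.0, 2.0, 4.0)       # Angstrom, repulsion term


def reward(a, b):
    """GDT_TS-like score for one equivalent pair: fraction of cutoffs satisfied."""
    d = math.dist(a, b)
    return sum(d < c for c in REWARD_CUTOFFS) / len(REWARD_CUTOFFS)


def penalty(i, own, other):
    """Penalty for residue i of `own` from close non-equivalent residues in `other`."""
    total = 0
    for j, xyz in enumerate(other):
        if abs(j - i) <= 1:          # skip the aligned residue and its chain neighbors
            continue
        d = math.dist(own[i], xyz)
        total += sum(d < c for c in PENALTY_CUTOFFS)
    return total / len(PENALTY_CUTOFFS)


def tr_score(target, model, target_length=None, weight=1.0):
    """Sum of per-pair scores, floored at zero, normalized by target length.

    weight: hypothetical penalty weight (placeholder default).
    """
    length = target_length or len(target)
    total = 0.0
    for i in range(len(target)):
        s0 = reward(target[i], model[i])
        p_bar = 0.5 * (penalty(i, target, model) + penalty(i, model, target))
        total += max(0.0, s0 - weight * p_bar)
    return total / length


if __name__ == "__main__":
    # Toy extended chain with 3.8 A spacing, and a model collapsed onto one point.
    target = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    collapsed = [target[2]] * 5
    print(tr_score(target, collapsed, weight=0.0))  # reward only, no repulsion
    print(tr_score(target, collapsed))              # reward minus repulsion
```

In the toy example, the collapsed model still collects a substantial reward when the repulsion term is switched off (`weight=0.0`), while the penalized score drops to essentially zero, which is the behavior the repulsion term is meant to produce.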
The 4 Å cutoff for the LGA superposition used in the penalty calculation was chosen after testing multiple values around typical GDT_TS distance cutoffs. This value produced the scoring of CASP8 models that was most consistent with the manual expert assessment of model quality.
3.2 The new score has improved sensitivity to model distortions
As an example of the effects of mild structure distortion on GDT_TS and TR, we observe the behavior of these scores in two simplistic experiments. In each experiment, we calculate the scores on a pair of structures and then perturb one of the structures. We vary the degree of the perturbation and observe the corresponding changes in GDT_TS and TR scores. In the first experiment, we uniformly contract one of the structures towards its center of mass, so that its radius of gyration decreases and its residues become slightly closer to each other (Fig. 3A, B). In the second experiment, we perturb torsion angles φ, ψ by adding a random Gaussian term to each angle in the structure (Fig. 3C).
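The uniform contraction of the first experiment can be sketched as a scaling of coordinates about the centroid. This is an illustrative sketch: the function names are hypothetical, and all atoms are given equal mass (for example, Cα atoms only), so that the centroid stands in for the center of mass.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def contract(coords, ratio):
    """Scale every point towards the centroid; ratio=1.0 leaves the structure unchanged."""
    c = centroid(coords)
    return [tuple(c[k] + ratio * (p[k] - c[k]) for k in range(3)) for p in coords]


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(coords)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in coords) / len(coords))


if __name__ == "__main__":
    chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]   # toy coordinates
    for ratio in (1.0, 0.95, 0.9):
        print(ratio, radius_of_gyration(contract(chain, ratio)))
```

Because the transformation is a pure scaling about the centroid, the radius of gyration shrinks by exactly the compression ratio, so a ratio of 0.95 corresponds to the ~5% compression level discussed in the text.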
By design, the TR score is expected to improve sensitivity at medium levels of modeling difficulty, where sequence alignment to a homolog template is possible yet non-trivial. TR should not be better than GDT_TS either for models based on close homology, where model–target misalignments are rare, or for template-free models, where even a general fold prediction is a challenge. Therefore in our experiments, we concentrate on cases of clear yet distant template–target homology.
In the first experiment (Fig. 3A), we use real pairs of remotely homologous protein domains. Among all pairs of ASTRAL40 (Chandonia et al., 2004) representatives that share the same superfamily in SCOP classification (Andreeva et al., 2008), we choose ~27 000 pairs with marginal structure similarity according to DALI Z-scores (Holm and Sander, 1993) and compress one of the domains in each pair. On average, GDT_TS does not decrease until compression reaches ~5% (Fig. 3A). Moreover, in 40% of cases GDT_TS actually increases at mild compression levels. In contrast, TR is consistently reduced on compressed structures, with a rate of reduction much higher than for GDT_TS.
We perform the same experiment (Fig. 3B) on pairs of target and model domains from automatic CASP8 server predictions in the fold recognition category, as defined by our analysis (Shi et al., 2009). Similar to the first dataset, mild compression leads to a GDT_TS increase in 40% of cases. Figure 3B shows the average degree of GDT_TS gain for these models, as well as for the whole model set. At the same time, TR penalizes the compression much more effectively. Full sets of GDT_TS and TR scores for all server models of CASP8 are available at http://prodata.swmed.edu/CASP8.
When we perturb torsion angles in the same set of models, TR decreases with the variance of the perturbation significantly faster than GDT_TS, although the difference is somewhat less pronounced (Fig. 3C).
3.3 Correlation with existing measures for structure similarity
We compared the new score with other similarity scores based on the set of first server models produced for all CASP8 targets. Figure 4 shows correlation plots for TR versus GDT_TS, DALI Z-scores, and TM scores (Zhang and Skolnick, 2005). The table of correlation coefficients of TR with these and other measures is included in the supplement.
GDT_TS and TR scores are well correlated (Fig. 4A), with a Pearson correlation coefficient of 0.99. By design, TR is always equal to or lower than GDT_TS. Notably, the trend curve of the correlation is concave, so that the difference is more pronounced around the mid-range GDT_TS. This range roughly corresponds to the targets from the categories of harder comparative modeling and fold recognition, where models become less similar to targets and modeled residues are frequently placed near non-equivalent residues, which results in a higher penalty by TR. For very low model quality (GDT_TS below 30%) the reward is much lower, and the penalty drops as well (Fig. 4A).
TR also shows general correlation with other similarity measures, although to a lesser degree. The correlation coefficient with the conceptually different contact-based DALI Z-scores (normalized to have a compatible scale) is relatively high (0.95). Interestingly, the TM score (Zhang and Skolnick, 2005), based on a concept similar to GDT_TS, is less correlated with TR than DALI Z-scores are (Fig. 4C, correlation coefficient of 0.88), but the concave shape of the trend curve is similar to that in the plot of TR against GDT_TS.
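For reference, a Pearson correlation coefficient of the kind quoted above can be computed from per-target score pairs as in the following sketch. The function name and the input layout (two equal-length lists, one value per target) are assumptions for illustration; the sample numbers are made up and are not CASP8 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    gdt_ts = [30.0, 45.0, 60.0, 75.0, 90.0]   # illustrative values only
    tr = [27.0, 38.0, 52.0, 70.0, 88.0]       # illustrative values only
    print(pearson_r(gdt_ts, tr))
```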
In conclusion, we develop a new measure for protein structure similarity with an explicit repulsion term that penalizes spatially close positioning of non-equivalent residues. This measure improves on the sensitivity of GDT_TS to moderate structure distortion and has potential value for the assessment of structure similarity and homology-based structure modeling.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank Yang Zhang for stimulating discussions and critical reading of the manuscript and Bong-Hyun Kim for sharing DALI Z-scores for ASTRAL domain pairs. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing high-performance computing resources.
Funding: National Institutes of Health (GM67165 to N.V.G.).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Burkhard Rost
Received on January 21, 2009; revised on March 11, 2009; accepted on March 12, 2009
| REFERENCES |
|---|
Aloy P, et al. Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins (2003) 53(Suppl. 6):436–456.
Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. (2008) 36:D419–D425.
Chandonia JM, et al. The ASTRAL compendium in 2004. Nucleic Acids Res. (2004) 32:D189–D192.
Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. (1993) 233:123–138.
Kinch LN, et al. CASP5 assessment of fold recognition target predictions. Proteins (2003) 53(Suppl. 6):395–409.
Kopp J, et al. Assessment of CASP7 predictions for template-based modeling targets. Proteins (2007) 69(Suppl. 8):38–56.
Shi S, et al. Analysis of CASP8 targets, predictions and assessment methods. Database: The Journal of Biological Database and Curation (2009) in press.
Wang G, et al. Assessment of fold recognition predictions in CASP6. Proteins (2005) 61(Suppl. 7):46–66.
Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. (2003) 31:3370–3374.
Zemla A, et al. Processing and evaluation of predictions in CASP4. Proteins (2001) 45(Suppl. 5):13–21.
Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. (2005) 33:2302–2309.