Bioinformatics Advance Access originally published online on March 7, 2006
Bioinformatics 2006 22(11):1335-1342; doi:10.1093/bioinformatics/btl079
Predicting protein interaction sites: binding hot-spots in protein-protein and protein-ligand interfaces
Institute of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Protein assemblies are currently poorly represented in structural databases and their structural elucidation is a key goal in biology. Here we analyse clefts in protein surfaces, likely to correspond to binding hot-spots, and rank them according to sequence conservation and simple measures of physical properties including hydrophobicity, desolvation, electrostatic and van der Waals potentials, to predict which are involved in binding in the native complex.
Results: The resulting differences between predicting binding-sites at protein-protein and protein-ligand interfaces are striking. There is a high level of prediction accuracy (93%) for protein-ligand interactions, based on the following attributes: van der Waals potential, electrostatic potential, desolvation and surface conservation. Generally, the prediction accuracy for protein-protein interactions is lower, with the exception of enzymes. Our results show that the ease of cleft desolvation is strongly predictive of interfaces and strongly maintained across all classes of protein-binding interface.
Contact: r.m.jackson{at}leeds.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
| INTRODUCTION |
|---|
Protein interactions are critical for all aspects of cellular function, and determining how they interact has become a major goal in the post-genomic era (Sali et al., 2003; Rhodes et al., 2005). An emerging new approach is to take advantage of structural information to predict physical binding. In particular, prediction of protein binding-sites can guide the structural elucidation of protein complexes, allowing function prediction for numerous unannotated structural genomics targets and the design of molecules that can modulate biological function at a systems-level (Russell et al., 2004).
Protein-protein interfaces are generally considered to be circular and relatively flat (Jones and Thornton, 1997a). Studies of their properties have also shown that hydrophobic area prevails while electrostatic residues and hydrogen-bonding groups are evenly spread across the surface (Xu et al., 1997; Lo Conte et al., 1999), suggesting a uniform distribution of properties across the binding interfaces. However, alanine-scanning mutagenesis has shown that the stability of a complex is determined by only a fraction of the interface residues. This was first shown in the complex between human-growth hormone and the human-growth hormone-binding protein (Clackson and Wells, 1995), where a far greater loss in affinity was seen when two tryptophan residues were mutated to alanine than with other mutations made in the interface. Similar results have been collated for a number of other protein complexes (Bogan and Thorn, 1998; Thorn and Bogan, 2001). Analysis of these data shows that these hot-spot residues are usually found in the centre of the interface, and are surrounded by residues that have a lesser effect on stability (Bogan and Thorn, 1998). Structural analysis has shown that the clusters of conserved hot-spot residues form clefts in the surface of the protein (Li et al., 2004). Enrichment of hot-spot areas with tryptophan, tyrosine and arginine residues has been shown based on protein family conservation (Hu et al., 2000; Ma et al., 2003; Keskin et al., 2005). However, other residues, including polar residues, may also be highly conserved (Hu et al., 2000). Solvent exclusion around polar or charged interactions lowers the effective dielectric constant, thus strengthening the interaction. It has been suggested that the hydrophobic effect of hot-spot residues is almost double that of other residues in an antibody/antigen interface (Li et al., 2005).
Multiple hot-spot regions can be found in a single interface (Ma et al., 2003); these tend to cluster at the centre of the interface, and interact with equivalent hot-spot residues in the binding partner protein (Halperin et al., 2004; Keskin et al., 2005). The clustering of conserved residues has been used to predict possible interacting partners based on alignment to a known pair of interacting proteins (Aytuna et al., 2005; Espadaler et al., 2005). Clefts in the protein surface are also important for the prediction of protein-ligand interactions. Indeed, ligand-binding sites can often be predicted by identifying the largest exposed cleft (Laskowski et al., 1996). Ligand-binding sites may also be detected as cleft regions with high predicted binding affinity, based on energetic contouring with interacting molecular probes (Laurie and Jackson, 2005).
Here we analyse the ability of different key physicochemical attributes and other binding surface properties, such as surface conservation, to predict the binding interface in protein-protein and protein-ligand complexes. Our approach attempts to define binding hot-spots on the protein surface. These are defined by clefts on the protein surface that are further described by their key attributes. We predict clefts on the protein surface using Q-SiteFinder (Laurie and Jackson, 2005), an energy-based method for the prediction of protein binding clefts. The properties of these predicted clefts are then investigated and compared. Given that the key attributes we use to describe clefts are the same for both protein-protein and protein-ligand interactions, the resulting differences we find between their binding-sites are striking. This has important implications for understanding the nature of protein interactions, for identifying the suitability of sites as drug targets and for identifying critical regions for docking and structure-based drug design.
| METHODS |
|---|
Interaction datasets
Enzyme-inhibitor and antibody-antigen complexes of the protein-protein docking benchmark 1.0 (Chen et al., 2003) were supplemented with a selection of other available protein complexes to create a protein-protein interaction dataset of 97 pairwise non-obligate hetero-complexes. These additional complexes satisfied the following criteria: (1) No two complexes had a sequence identity of >60% in an alignment over >80% of the sequence of both proteins involved in the interaction. (2) In keeping with the other complexes of benchmark 1.0, no complex had a resolution of >3.25 Å. (3) No complex had large disordered regions in the interface. Oligomeric interfaces for all obligate protein-protein interactions of the dataset were obscured using the appropriate multimeric complex as the interacting monomer. This was achieved by analysis of the protein quaternary structure database (Henrick and Thornton, 1998). As such, monomers mentioned in the text may refer to protein structures of more than one peptide chain. Occasionally this left several independent interfaces between interacting proteins. In the cases where more than one independent interface to identical proteins exists, all interfaces of the protein are considered. Where proteins have only one interface, only that interface was assessed. Cognate ligands required for the function of the protein were retained, while those not indicated as such in the literature were discarded. The final dataset consisted of 22 enzyme-inhibitor complexes, 19 antibody-antigen complexes and 56 protein complexes that could not be included in either classification (other complexes). Details of the proteins used can be found in the Supplementary Material.
A total of 134 ligand-protein complexes from the Gold dataset were used (Nissink et al., 2002) and were prepared as described by Laurie and Jackson (2005). These constitute a subset of the full Gold dataset where proteins of high structural similarity are removed. The dataset was further classified into protein-ligand interactions in enzymes (95 complexes) and non-enzymes (39 complexes). The binding-site of a single ligand is assessed for each protein. Additional ligands covalently bound to the protein are treated as protein while other solvent molecules are removed. As with the protein-protein interaction dataset, the interacting proteins are assessed in the absence of the binding molecule.
Pocket detection and occupancy using Q-SiteFinder
Clefts were identified in the protein surface using Q-SiteFinder (Laurie and Jackson, 2005). The method is briefly described. A grid is built around each protein, such that the spacing between grid points is 0.9 Å in all orthogonal directions and the entire protein is covered. The non-bonded interaction energy is then calculated using the GRID forcefield (Goodford, 1985) parameters for every grid point that is sufficiently far from the protein to allow a methyl (CH3) group to be positioned without steric overlap with any protein atom (Jackson, 2002). Probes with calculated interaction energies of less than -1.3 kcal/mol are retained and clustered according to spatial proximity, whereby no probe in any cluster has a centre that is >1.0 Å from the centre of another probe in the same cluster. The total non-bonded interaction energy is then calculated across all probes in the cluster and serves as the means by which the clusters are ranked. The highest ranking site is the cluster with the highest cumulative interaction energy. Occupancy of a cleft is defined as the percentage fill of the cleft by atoms of the binding protein or ligand; a threshold of 25% is applied to define a successful cleft. Proteins with no occupied clefts were excluded from further analysis. These consisted of the larger proteins of two protein-protein interactions (1HE8 and 1DE4) and six protein-ligand complexes (1CGY, 1DR1, 1HDY, 1LDM, 1PBD and 2PDM).
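The clustering and ranking step described above can be sketched as follows. This is a minimal illustration, not the Q-SiteFinder implementation: it assumes the proximity rule amounts to single-linkage clustering (every probe lies within 1.0 Å of at least one other probe in its cluster), that favourable energies are negative so the cutoff is -1.3 kcal/mol, and that the "highest" cumulative energy means the most negative total. The function name `cluster_probes` and its arguments are hypothetical.

```python
from math import dist


def cluster_probes(coords, energies, energy_cutoff=-1.3, link_dist=1.0):
    """Cluster favourable probes by proximity and rank clusters by total energy.

    coords   : list of (x, y, z) probe centres in angstroms
    energies : list of interaction energies in kcal/mol (negative = favourable)
    Returns a list of clusters (lists of probe indices), most favourable first.
    """
    # Retain only probes whose energy is below (more favourable than) the cutoff.
    kept = [i for i, e in enumerate(energies) if e < energy_cutoff]

    # Union-find structure over the retained probes.
    parent = {i: i for i in kept}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Link any two retained probes whose centres are within link_dist (assumed rule).
    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if dist(coords[a], coords[b]) <= link_dist:
                parent[find(a)] = find(b)

    clusters = {}
    for i in kept:
        clusters.setdefault(find(i), []).append(i)

    # Rank by summed interaction energy, most negative (most favourable) first.
    return sorted(clusters.values(),
                  key=lambda members: sum(energies[i] for i in members))


coords = [(0, 0, 0), (0.9, 0, 0), (1.8, 0, 0), (10, 0, 0), (0, 5, 0)]
energies = [-2.0, -1.5, -1.4, -3.0, -1.0]
ranked_clusters = cluster_probes(coords, energies)
```

In this toy input the three adjacent probes form one cluster that outranks the single strong probe because its summed energy is lower; the fifth probe is discarded by the energy cutoff. The pairwise loop is quadratic in the number of probes, which is acceptable for a sketch but would normally be replaced by a grid-neighbour lookup.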
Pocket ranking
Hydrophobicity
The atomistic Solvent Accessible Surface Area (SASA) covered by each cleft was calculated by NACCESS (Hubbard and Thornton, 1993). The solvent accessibilities of all protein atoms were calculated in both presence and absence of the probes of each cluster. The difference between these values represents the covered area of the cleft. Hydrophobic area was defined as the area exposed by atoms that are either carbon or sulphur (as in NACCESS). The cleft with the highest proportion of hydrophobic accessible atom area was ranked first.
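As an illustration of this ranking, the sketch below takes per-atom accessibilities computed with and without the probes (for example parsed from NACCESS output, which is not shown here) and ranks clefts by the fraction of covered area contributed by carbon and sulphur atoms. The data layout and the function names are assumptions made for the example.

```python
def hydrophobic_fraction(atoms):
    """Fraction of a cleft's covered area that comes from carbon or sulphur atoms.

    atoms: iterable of (element, sasa_without_probes, sasa_with_probes) in square angstroms.
    """
    total = 0.0
    hydrophobic = 0.0
    for element, sasa_free, sasa_covered in atoms:
        # Area covered by the probes = accessibility lost when the probes are present.
        covered = max(sasa_free - sasa_covered, 0.0)
        total += covered
        if element.upper() in {"C", "S"}:
            hydrophobic += covered
    return hydrophobic / total if total > 0 else 0.0


def rank_by_hydrophobicity(clefts):
    """Return cleft names ordered from most to least hydrophobic."""
    return sorted(clefts, key=lambda name: hydrophobic_fraction(clefts[name]),
                  reverse=True)


example_clefts = {
    "A": [("C", 10.0, 4.0), ("O", 8.0, 2.0)],
    "B": [("C", 10.0, 0.0), ("N", 5.0, 4.0), ("S", 6.0, 1.0)],
}
hydrophobic_order = rank_by_hydrophobicity(example_clefts)
```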
Desolvation
The entropy associated with the removal of water from the surfaces covered by the probe clusters was estimated using coefficients for the transfer of N-acetyl derivative amino acids between water and octanol. The total transfer energies have previously been converted to atomistic-based values by linear fitting (Fauchere and Pliska, 1983) and optimized for protein-protein docking (Fernandez-Recio et al., 2004). The coverage of each atom was calculated as described above, allowing the desolvation energy of the cleft to be calculated by multiplying the covered area of each atom by the appropriate solvation parameter taken from Fernandez-Recio et al. (2004) and summing over the atoms of the cleft. Clefts were ranked such that the cleft most easily desolvated was ranked first.
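The area-weighted sum can be sketched as below. The solvation parameters in `EXAMPLE_ASP` are placeholder numbers chosen only to make the example run; they are not the values of Fernandez-Recio et al. (2004). The sign convention (lower energy means easier desolvation) is likewise an assumption of this sketch.

```python
# Placeholder atomic solvation parameters (kcal/mol per square angstrom).
# These are illustrative values only, NOT the published parameters.
EXAMPLE_ASP = {"C": -0.02, "S": -0.01, "N": 0.05, "O": 0.04}


def desolvation_energy(atoms, asp=EXAMPLE_ASP):
    """Sum of covered area times the solvation parameter of each atom type.

    atoms: iterable of (atom_type, covered_area) pairs for one cleft.
    """
    return sum(asp[atom_type] * area for atom_type, area in atoms)


def rank_by_desolvation(clefts, asp=EXAMPLE_ASP):
    """Return cleft names ordered from most to least easily desolvated.

    Assumes a lower (more negative) energy means easier desolvation.
    """
    return sorted(clefts, key=lambda name: desolvation_energy(clefts[name], asp))


desolvation_clefts = {
    "polar": [("N", 10.0), ("O", 10.0)],
    "apolar": [("C", 20.0), ("S", 5.0)],
}
desolvation_order = rank_by_desolvation(desolvation_clefts)
```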
Electrostatics
The peak, average and total electrostatic potentials for each cleft were calculated using the DelPhi v.4 package (Rocchia et al., 2001, 2002). Protons were added to the protein as described for Q-SiteFinder (Laurie and Jackson, 2005). A grid of 101 points in all dimensions was built around each protein such that the molecule occupied 50% of the cubic grid's volume. Amber charges and radii were used to describe the protein and the associated cognate ligands (protein-protein interface analysis only) (Giammona, 1984; Weiner et al., 1984; Schneider and Suhnel, 1999; Antony et al., 2000; Meagher et al., 2003). A dielectric boundary condition was used, with dielectric constants of 4 within and 80 outside the molecular surfaces defined by a water probe radius of 1.4 Å. Salt concentration in the solvent was set at 0.15 M and a 2 Å ion exclusion radius was applied around the protein. The finite difference Poisson-Boltzmann calculations were performed until the potentials at the grid points converged to 0.0001 kT/e, after which the potentials were extrapolated to the centre of the individual probes. The total cleft potential was defined as the sum of the moduli of the individual probe potentials that form the cluster, while the average potential was this total value divided by the number of probes in the cluster. The peak value for each cleft was defined as the highest modulus of the individual probe potentials of the cluster. The same calculations were repeated for all proteins with a unit charge applied to all atoms. The same Amber atom radii were used as described, as were all other variables bar the convergence criterion. Unit charge calculations were run until they converged to within 0.01 kT/e.
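The three cleft descriptors defined above reduce to simple statistics over the probe potentials. The sketch below assumes the potentials have already been interpolated to the probe centres (the Poisson-Boltzmann calculation itself is not reproduced); the function name is hypothetical.

```python
def cleft_potential_descriptors(probe_potentials):
    """Total, average and peak electrostatic descriptors for one probe cluster.

    probe_potentials: potentials (kT/e) at the centres of the probes in the cluster.
    All three descriptors use the modulus of the potential, as in the text.
    """
    magnitudes = [abs(p) for p in probe_potentials]
    if not magnitudes:
        raise ValueError("a cluster must contain at least one probe")
    total = sum(magnitudes)
    return {
        "total": total,                       # sum of |potential| over probes
        "average": total / len(magnitudes),   # total divided by number of probes
        "peak": max(magnitudes),              # largest |potential| in the cluster
    }


descriptors = cleft_potential_descriptors([-2.0, 1.0, -0.5, 0.5])
```

Because the total scales with the number of probes, it partly reflects cleft size, whereas the average and peak do not.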
Conservation
The surface conservation for each cleft was calculated by extracting the protein sequences of each chain in the protein from the associated ATOM records of their Protein Data Bank coordinates. For each chain of the protein, close homologues were found by performing a PSI-BLAST search over the non-redundant Swiss-Prot database release 47.6 (Altschul et al., 1997; Bairoch et al., 2005). Three iterations were performed and the search refined using sequences with similarity to the query sequence defined at an E-value <0.001. Redundant sequences were removed and the remaining full-length sequences were then aligned by Muscle (Edgar, 2004). The conservation of each position in the alignment relative to the initial protein sequence was calculated by Scorecons (Valdar, 2002). Conservation scores are based on the sum of the weighted pairwise exchange of residues, for each pair of sequences, at each position in the alignment. Values lie between 0 (no conservation) and 1 (completely conserved). SASA was calculated for the protein structure of the same chain by MSMS with vertices spread evenly across the protein surface at a density of 1 vertex per Å² (Sanner et al., 1996). The conservation scores for each residue were then mapped to the appropriate vertices. Cleft conservation was defined as the average conservation score of vertices within 2.0 Å of the centre of a probe belonging to the cluster.
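The final averaging step can be sketched as follows, assuming the surface vertices already carry the conservation score of their parent residue. Counting each vertex once even when it lies near several probes is an assumption of this sketch, as are the function and argument names.

```python
from math import dist


def cleft_conservation(vertices, vertex_scores, probe_centres, cutoff=2.0):
    """Mean conservation score of surface vertices near any probe of a cluster.

    vertices      : list of (x, y, z) surface vertex coordinates in angstroms
    vertex_scores : conservation score (0 to 1) mapped to each vertex
    probe_centres : list of (x, y, z) probe centres belonging to the cluster
    Returns None when no vertex lies within the cutoff of any probe.
    """
    selected = [
        score
        for xyz, score in zip(vertices, vertex_scores)
        # Each vertex is counted once, however many probes it is close to.
        if any(dist(xyz, probe) <= cutoff for probe in probe_centres)
    ]
    return sum(selected) / len(selected) if selected else None


example_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
example_scores = [0.8, 0.4, 0.1]
conservation = cleft_conservation(example_vertices, example_scores, [(0.5, 0.0, 0.0)])
```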
Calculation of true and false positive rates
The true positive rate (TPR), or sensitivity, was calculated as the number of genuine interface clefts that are ranked in the top k clefts (true positives, TP) divided by the total number of interface clefts identified for this monomer (TP plus false negatives, FN), where k increases by one each time. The false positive rate (FPR), one minus specificity, was the number of clefts ranked in the top k clefts that are not in the interface (FP) divided by the total number of non-interface clefts (FP plus true negatives, TN). Equal TPR and FPR values at all values of k indicate no discrimination by the ranking method.
The sensitivity [TPR = TP/(TP + FN)] and false positive rates [FPR = FP/(FP + TN)] were calculated for the clefts ranked by the above attributes. An attribute that is a successful predictor of interface clefts will show high sensitivity and a low error rate. Plotting TPR against FPR gives a receiver operating characteristic (ROC) plot.
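A minimal sketch of how the ROC points follow from a ranked list of clefts is given below. It assumes each cleft has already been labelled as interface or non-interface, and that the list is ordered from rank 1 downwards; the function name is hypothetical.

```python
def roc_points(ranked_is_interface):
    """ROC points (FPR, TPR) for k = 0..n over a ranked list of clefts.

    ranked_is_interface: booleans ordered by rank, True for genuine interface clefts.
    """
    positives = sum(ranked_is_interface)
    negatives = len(ranked_is_interface) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one interface and one non-interface cleft")

    points = [(0.0, 0.0)]  # k = 0: nothing predicted
    tp = fp = 0
    for is_interface in ranked_is_interface:
        if is_interface:
            tp += 1  # an interface cleft enters the top k
        else:
            fp += 1  # a non-interface cleft enters the top k
        # TPR = TP / (TP + FN), FPR = FP / (FP + TN)
        points.append((fp / negatives, tp / positives))
    return points


example_roc = roc_points([True, False, True, False])
```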
In order to give a single measure of prediction accuracy, which is independent of any decision threshold, we have calculated the ROC integral or area under the curve (AUC) (Hanley and McNeil, 1982). A value of 0.5 indicates no correlation, a value <0.5 indicates negative correlation and a value approaching 1.0 indicates the theoretical maximum or perfect prediction.
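The area under a ROC curve can be approximated by the trapezoidal rule, as in the sketch below. This is one common way of computing the integral and is not necessarily the procedure used in this study; the input is assumed to be a list of (FPR, TPR) points ordered by increasing rank cutoff.

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve.

    points: list of (fpr, tpr) pairs ordered from (0, 0) to (1, 1).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Area of the trapezoid between two consecutive ROC points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


auc_perfect = roc_auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
auc_random = roc_auc([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)])
auc_mixed = roc_auc([(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)])
```

The three examples correspond to a perfect ranking, a ranking with no discrimination, and an intermediate case.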
| RESULTS |
|---|
Analysis of protein clefts
Q-SiteFinder was used to define the clefts found on the protein surface (see Methods). Q-SiteFinder was run on both bound monomers of each hetero-protein complex in the non-obligate protein-protein interaction datasets and on all proteins in the protein-ligand dataset. Following evaluation at different probe interaction thresholds, a value of -1.3 kcal/mol was chosen, based on the success of the method in protein-ligand complexes around this value (Laurie and Jackson, 2005) and on visualization of the clefts involved in protein-protein interactions, since it also best describes the clefts of the protein that were filled by the side chains of the opposing binding protein. We found that altering the threshold did not greatly affect the predictive power for protein-protein interactions.
Q-SiteFinder was used to calculate 99 clefts on the protein surface. Occupied or true cleft predictions for each protein are defined as those occupied to >25% of their volume by atoms of the interacting molecule. Figure 1 shows that for all 194 monomers of the protein-protein dataset there is a slightly skewed distribution of the number of true interface clefts per protein monomer, with a peak around six clefts. The separate distributions of smaller and larger monomers show differences. The peak number of successful interface clefts for the smaller (ligand) monomers, at six, is higher than the peak at 2-4 clefts for the larger (receptor) monomers; however, the latter has a broad peak spanning 2-9 clefts. Figure 2 shows there are approximately equal percentages of interface areas covered in both sets of monomers. Protein-ligand receptors show far fewer occupied clefts, with a clear peak at one. Also, true interface clefts have a greater coverage than do the non-interface regions, in contrast to protein-protein interfaces. However, given the large standard deviations it is not possible to conclude that this observation is significant. Below we attempt to separate those clefts that are occupied from those that are not, based on properties that have been suggested to be important for the prediction and stability of intermolecular interactions.
Analysis of cleft properties
Intermolecular interactions are stabilized by a number of factors, including the burial of areas of hydrophobicity, the formation of hydrogen-bonds and electrostatic complementarity (Chothia and Janin, 1975). We have found clefts in the protein surfaces using Q-SiteFinder and re-ranked them according to simple measures of the cleft properties (see Methods). All the different properties were used to rank the clefts to assess their ability to predict which ones are involved in binding in the native complex. ROC plots are used for this purpose (see Methods). Whilst most of the properties used (van der Waals, electrostatic, hydrophobicity, desolvation and conservation) are readily interpretable, the unit electrostatic properties involve the calculation of electrostatic potential from a uniform charge density on all protein atoms, as opposed to the standard atom-specific charges. The method typically generates large potentials in larger or enclosed clefts (Bate and Warwicker, 2004), and does not reflect atom-specific electrostatic properties of the site.
Enzyme-inhibitor complexes
Results of re-ranking the clefts for the enzyme-inhibitor monomers are shown in Figure 3a and b. The results show that surface residue conservation is the best predictor of true protein interface regions both for the enzymes (receptors) and for their protein inhibitors (ligands). However, it is a much better predictor for enzymes, in agreement with the study of Bradford and Westhead (2003). The unit electrostatic properties (unit peak and average) are also good predictors of interface regions in the enzymes but show no predictive power (or even slight anti-correlation) in the inhibitors. These results can be rationalized in terms of known protein function, in that the substrate/inhibitor binding cleft of the enzymes is conserved in sequence owing to the conserved nature of the catalytic mechanism, and also active sites are often enclosed, pre-organized clefts, a property that may be important for enzyme catalysis. For example, the serine-protease family is well represented in the enzyme class. The sequence-conserved catalytic oxy-anion hole in serine proteases is a deep cleft which accommodates the conserved binding conformation of the protein inhibitor canonical loop (Jackson and Russell, 2000). Therefore, it is unsurprising that unit electrostatic properties capture this characteristic for the enzymes only. Both of these cleft characteristics may capture the characteristics of the binding clefts of enzyme protein families very well but are not expected to generalize well to other protein-protein interactions. The only other strong predictor of interface regions is ranking clefts by desolvation energy. This is discussed further below.
Antibody-antigen complexes
The ROC plots for the predictions of the interface clefts of the 38 bound monomers of the 19 antibody-antigen complexes can be seen in Figure 3c and d. Although the results for the antigens (ligands) do not show any strong trends, those for the prediction of interface clefts in the antibodies (receptors) do. The highly significant anti-correlation of residue conservation in the antibodies is striking, yet entirely logical given the biological role of antibodies. This reflects antibody structure, in which highly sequence-diverse loops form the complementarity determining regions (CDRs) and antigen binding interface. The CDRs determine specificity on what is otherwise a highly conserved protein framework. Both Q-SiteFinder and total unit electrostatics scores show some weak predictive capacity that may reflect their ability to define the size and position of clefts between the loops that form the CDRs of the antibodies. However, the results for the antibodies show that cleft desolvation energy is by far the best predictor of protein interface regions. It is interesting to see that it is also the most prominent of all attributes in the antigen proteins, albeit to a much lesser degree.
Other protein-protein complexes
The ROC curves for the 112 bound monomers of the 56 other complexes can be seen in Figure 3e. Unlike the antibody and enzyme results, there are few strong correlations to be seen in either the ligand or receptor sets of proteins (see Supplementary Data). This is probably because this is a highly non-homogeneous set of protein-protein complexes. This diversity almost certainly neutralizes protein sub-class specific characteristics such as those seen in the enzyme and antibody sets. The strongest correlation between interface cleft and high rank in both the receptor and ligand proteins is the desolvation energy. Conservation is only weakly predictive, and fails to re-rank the interface clefts highly for a more diverse set of protein complexes, consistent with the results of Caffrey et al. (2004). By changing the area over which conservation is assessed [from a large patch in the investigation of Caffrey et al. (2004) to a small cleft in ours] it was hoped to improve these results in line with the findings of Li et al. (2004), who showed that the conserved residues of interface regions cluster around clefts in the protein surface. By re-ranking the clefts according to conservation score, we anticipated an improved relationship for the prediction of interfaces. However, this was not the case. Consistent with the results of all the protein-protein interfaces analyzed in this work, desolvation energy is the most effective common factor in the identification of interface clefts.
Protein-ligand interactions
The ROC curves for the 134 protein-ligand complexes can be found in Figure 3f. The difference between these results and those of the protein-protein interface clefts is striking. Unlike the protein-protein interfaces, several properties are excellent predictors of protein interface regions. The ranking of clefts based on Q-SiteFinder supports both the relative merits of using this method for ligand-binding site prediction and the previous observation that this method has a 90% success rate in the top three predicted clusters when tested on the same dataset (Laurie and Jackson, 2005). Not so surprisingly, the electrostatic total (based on Amber charges) and unit electrostatic total show a very similar pattern, with the latter being slightly more successful than Q-SiteFinder. These two measures may be most dependent on the number of probe centres that make up the cleft, as defined by Q-SiteFinder, rather than the nature of the properties that define the cleft. Of all the attributes, the unit electrostatics (average and peak) stand out and rank slightly above Q-SiteFinder overall; however, their initial prioritization of clefts is in fact weaker. The finding that using unit charges rather than the more physicochemically realistic Amber charges improves results (Bate and Warwicker, 2004) is also corroborated by our results. Other strong predictors are cleft desolvation energy and surface residue conservation. The latter is not unexpected since active/ligand binding sites have been shown to be sequence conserved across many different protein families (Bartlett et al., 2002) and consequently this property is useful in the identification of enzyme active sites (Greaves and Warwicker, 2005). This is further corroborated here by the difference in predictive power of conservation for enzymes versus non-enzyme protein-ligand complexes (see Supplementary Data).
With the exception of the enzymes, protein-protein interaction clefts have a poor correlation with each of the three measures for the unit electrostatics calculations, implying that clefts in protein-protein interfaces (defined by Q-SiteFinder) are fundamentally different to those in protein-ligand complexes. The observation that high electrostatic potential is indicative of a ligand-binding site seems at odds with the finding that the ease of desolvation of the cleft also correlates well. The ease of desolvation used in this analysis would be expected to favour non-polar surface with a high aliphatic or aromatic content rather than a charged surface. However, it is primarily the unit electrostatic properties that are highly predictive, and these may be most dependent on the shape and depth of the cavity rather than true atom-based electrostatic properties. Furthermore, the poor predictive power of hydrophobicity in protein-ligand and most protein-protein interactions indicates fundamental differences between clefts defined by hydrophobicity and desolvation. This challenges the classical view that hydrophobicity, as defined by non-polar surface area, is a useful predictor of interaction interfaces (Chothia and Janin, 1975).
| DISCUSSION |
|---|
This study has assessed the applicability of using protein surface clefts for the prediction of protein-protein and protein-ligand interfaces. There are a number of multi-attribute methods for the prediction of protein-protein interfaces that use averages of properties across a patch of protein surface in order to determine whether that patch lies in an interface (Jones and Thornton, 1997b; Zhou and Shan, 2001; Fariselli et al., 2002; Neuvirth et al., 2004; Bordner and Abagyan, 2005; Bradford and Westhead, 2005). Combinations of properties allow the prediction of protein-protein interactions with successes of up to 70%, despite treating the protein interface as an average of its properties. There is some evidence to suggest that the stability of a protein complex is not spread across the interface but instead localized around clefts in the protein surface. By identifying clefts in the protein surface we investigated whether any single property could discriminate those in the protein interface from those elsewhere on the protein surface.
An overall ROC integral or AUC gives a single measure of prediction accuracy that is used in many scientific studies (Hanley and McNeil, 1982). A table of AUC results for all the studies presented is given in the Supplementary Data. For all protein-protein interactions, conservation scores appear to be less effective than anticipated (AUC: 58%), confirming the results of Caffrey et al. (2004), who concluded that protein-protein interfaces are not significantly more conserved than the rest of the protein surface. Only in the enzyme (AUC: 79%) and enzyme protein-ligand (AUC: 78%) interfaces is conservation of significant predictive value. Electrostatics also failed to show any general trends in the prediction of protein-protein interfaces on monomers (AUC: 45-54% for all interactions) other than enzymes (AUC: 65%, electrostatic total), consistent with the theory that long-range electrostatics do not play an important role in the funnel concept of protein-protein interactions (Schlosshauer and Baker, 2004), but are rather used for orientation (Schreiber and Fersht, 1996). Of the properties studied in this investigation, only desolvation energy of the clefts had any general predictive power across all protein-protein interaction types (AUC: 68%). Fernández-Recio et al. (2005) showed that desolvation scores over larger regions of protein surface area, as defined by circular patches, correlated with the interface in 58% of proteins. It appears that the cleft-based approach employed here is more general. Charged interfaces, such as those between enzymes and inhibitors, and those of the antigens form the majority of interfaces that Fernández-Recio et al. (2005) failed to predict. Desolvation scores, as implemented by our cleft-based analysis, correlate nearly as well with charged and antigen interfaces as they do with other types of interface, with the enzyme (AUC: 74%), inhibitor (AUC: 61%) and antigen (AUC: 62%) interfaces being predicted well.
A similar high level of success is also seen in protein-ligand interfaces (AUC: 72%), which also have charged interfaces with a high electrostatic potential. The level of success may be attributed to the fact that in defining the protein surface in terms of clefts we are only sampling a portion of the protein interface, as opposed to the whole surface of a circular patch. It appears that sites of high electrostatic potential and favourable desolvation coexist independently in the same interface. This is supported by the fact that the correlation between sites ranked by these two attributes for both protein-protein and protein-ligand complexes is insignificant (R² = 0.02-0.1) and slightly negative (data not shown).
Protein-protein and protein-ligand interface clefts show several differences as well as some similarities. The most obvious difference is the failure of electrostatic descriptors overall to identify protein-protein interface clefts. However, identification of enzyme active sites is possible by electrostatic methods, and the results were improved when using a unit charge applied to each atom in the protein (Bate and Warwicker, 2004). Our results confirm these observations. The power of electrostatic descriptors in the protein-ligand interactions, particularly those of the enzyme subset, where the ligands bind at active sites, is only maintained in the enzyme class of protein-protein interactions. In fact, the protein-protein enzyme class shows a high degree of correlation with protein-ligand interactions for other attributes, and both also show significant predictive power of Q-SiteFinder and residue conservation, in contrast to all other protein-protein interfaces. Therefore, it would appear that the binding-sites of enzymes are much more predictable than those of other protein-protein complexes. The very high predictive power of Q-SiteFinder (AUC: 88%) and all the unit electrostatic properties (AUC: 93%) for protein-ligand interactions is very encouraging and will allow the targeting of functionally important ligand-binding sites in functional genomics.
Overall, protein-protein and protein-ligand interface clefts do have similarities. Our results show that the ease of cleft desolvation or de-wetting is strikingly similar across all classes of interface and more indicative of interface regions than electrostatics, Q-SiteFinder or residue conservation, which although highly predictive in some cases do not generalize well to all classes of protein interaction. Recent studies by Molecular Dynamics simulation of the role of water in protein-protein association of the melittin tetramer (Liu et al., 2005) and in protein-ligand binding in mouse major urinary protein (MUP) (Barratt et al., 2005) illustrate the importance of the phenomenon of de-wetting in molecular interactions. In the former, the strongly hydrophobic surfaces induce the evaporation (drying) of water in the interface as the melittin protein subunits approach one another. The authors conclude that sufficiently hydrophobic protein surfaces can induce a liquid-vapour transition providing the driving force towards protein association. In the latter study, the MUP ligand-binding-site cleft is pre-organized, hydrophobic and poorly solvated in the unbound form, which may explain the largely enthalpy (as opposed to entropy) driven thermodynamics of ligand binding. We have found that hydrophobicity of the surface does not correlate strongly with interface clefts in either protein-protein or protein-ligand interfaces. Hydrophobic, non-polar surfaces will also correlate closely with areas of low desolvation energy; however, in proteins that form non-obligate protein/ligand complexes (and hence must be independently stable in water) extended hydrophobic solvent-exposed surfaces are unlikely to exist. Therefore, a more subtle phenomenon exists: a relationship between ease of desolvation (de-wetting) and interface propensity, which may be an intrinsic feature of all protein binding surfaces.
| Acknowledgments |
|---|
The authors would like to thank Alasdair Laurie for help with Q-SiteFinder and the protein-small molecule dataset. NJB is funded by the Medical Research Council.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Dmitrij Frishman
Received on January 20, 2006; revised on February 14, 2006; accepted on February 28, 2006
| REFERENCES |
|---|
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Antony, J., et al. (2000) Theoretical study of electron transfer between the photolyase catalytic cofactor FADH(−) and DNA thymine dimer. J. Am. Chem. Soc., 122, 1057–1065.
Aytuna, A.S., et al. (2005) Prediction of protein–protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics, 21, 2850–2855.
Bairoch, A., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.
Barratt, E., et al. (2005) Van der Waals interactions dominate ligand–protein association in a protein binding site occluded from solvent water. J. Am. Chem. Soc., 127, 11827–11834.
Bartlett, G.J., et al. (2002) Analysis of catalytic residues in enzyme active sites. J. Mol. Biol., 324, 105–121.
Bate, P. and Warwicker, J. (2004) Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J. Mol. Biol., 340, 263–276.
Bogan, A.A. and Thorn, K.S. (1998) Anatomy of hot spots in protein interfaces. J. Mol. Biol., 280, 1–9.
Bordner, A.J. and Abagyan, R. (2005) Statistical analysis and prediction of protein–protein interfaces. Proteins, 60, 353–366.
Bradford, J.R. and Westhead, D.R. (2003) Asymmetric mutation rates at enzyme-inhibitor interfaces: implications for the protein–protein docking problem. Protein Sci., 12, 2099–2103.
Bradford, J.R. and Westhead, D.R. (2005) Improved prediction of protein–protein binding sites using a support vector machines approach. Bioinformatics, 21, 1487–1494.
Caffrey, D.R., et al. (2004) Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci., 13, 190–202.
Chen, R., et al. (2003) A protein–protein docking benchmark. Proteins, 52, 88–91.
Chothia, C. and Janin, J. (1975) Principles of protein–protein recognition. Nature, 256, 705–708.
Clackson, T. and Wells, J.A. (1995) A hot spot of binding energy in a hormone-receptor interface. Science, 267, 383–386.
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
Espadaler, J., et al. (2005) Prediction of protein–protein interactions using distant conservation of sequence patterns and structure relationships. Bioinformatics, 21, 3360–3368.
Fariselli, P., et al. (2002) Prediction of protein–protein interaction sites in heterocomplexes with neural networks. Eur. J. Biochem., 269, 1356–1361.
Fauchere, J.L. and Pliska, V. (1983) Hydrophobic parameters of amino acid side chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem., 18, 369–375.
Fernandez-Recio, J., et al. (2004) Identification of protein–protein interaction sites from docking energy landscapes. J. Mol. Biol., 335, 843–865.
Fernandez-Recio, J., et al. (2005) Optimal docking area: a new method for predicting protein–protein interaction sites. Proteins, 58, 134–143.
Giammona, D.A. (1984) Ph.D. Thesis, University of California, Davis, CA.
Goodford, P.J. (1985) A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem., 28, 849–857.
Greaves, R. and Warwicker, J. (2005) Active site identification through geometry-based and sequence profile-based calculations: burial of catalytic clefts. J. Mol. Biol., 349, 547–557.
Halperin, I., et al. (2004) Protein–protein interactions; coupling of structurally conserved residues and of hot spots across interfaces. Implications for docking. Structure, 12, 1027–1038.
Hanley, J.A. and McNeil, B.J. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.
Henrick, K. and Thornton, J.M. (1998) PQS: a protein quaternary structure file server. Trends Biochem. Sci., 23, 358–361.
Hu, Z., et al. (2000) Conservation of polar residues as hot spots at protein interfaces. Proteins, 39, 331–342.
Hubbard, S.J. and Thornton, J.M. (1993) NACCESS. Manchester University, Manchester, UK.
Jackson, R.M. (2002) Q-fit: a probabilistic method for docking molecular fragments by sampling low energy conformational space. J. Comput. Aided Mol. Des., 16, 43–57.
Jackson, R.M. and Russell, R.B. (2000) The serine protease inhibitor canonical loop conformation: examples found in extracellular hydrolases, toxins, cytokines and viral proteins. J. Mol. Biol., 296, 325–334.
Jones, S. and Thornton, J.M. (1997a) Analysis of protein–protein interaction sites using surface patches. J. Mol. Biol., 272, 121–132.
Jones, S. and Thornton, J.M. (1997b) Prediction of protein–protein interaction sites using patch analysis. J. Mol. Biol., 272, 133–143.
Keskin, O., et al. (2005) Hot regions in protein–protein interactions: the organization and contribution of structurally conserved hot spot residues. J. Mol. Biol., 345, 1281–1294.
Laskowski, R.A., et al. (1996) Protein clefts in molecular recognition and function. Protein Sci., 5, 2438–2452.
Laurie, A.T. and Jackson, R.M. (2005) Q-SiteFinder: an energy-based method for the prediction of protein–ligand binding sites. Bioinformatics, 21, 1908–1916.
Li, X., et al. (2004) Protein–protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. J. Mol. Biol., 344, 781–795.
Li, Y., et al. (2005) Magnitude of the hydrophobic effect at central versus peripheral sites in protein–protein interfaces. Structure, 13, 297–307.
Liu, P., et al. (2005) Observation of a dewetting transition in the collapse of the melittin tetramer. Nature, 437, 159–162.
Lo Conte, L., et al. (1999) The atomic structure of protein–protein recognition sites. J. Mol. Biol., 285, 2177–2198.
Ma, B., et al. (2003) Protein–protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc. Natl Acad. Sci. USA, 100, 5772–5777.
Meagher, K.L., et al. (2003) Development of polyphosphate parameters for use with the AMBER force field. J. Comput. Chem., 24, 1016–1025.
Neuvirth, H., et al. (2004) ProMate: a structure based prediction program to identify the location of protein–protein binding sites. J. Mol. Biol., 338, 181–199.
Nissink, J.W., et al. (2002) A new test set for validating predictions of protein–ligand interaction. Proteins, 49, 457–471.
Rhodes, D.R., et al. (2005) Probabilistic model of the human protein–protein interaction network. Nat. Biotechnol., 23, 951–959.
Rocchia, W., et al. (2001) Extending the applicability of the nonlinear Poisson–Boltzmann equation: multiple dielectric constants and multivalent ions. J. Phys. Chem. B, 105, 6507–6514.
Rocchia, W., et al. (2002) Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J. Comput. Chem., 23, 128–137.
Russell, R.B., et al. (2004) A structural perspective on protein–protein interactions. Curr. Opin. Struct. Biol., 14, 313–324.
Sali, A., et al. (2003) From words to literature in structural proteomics. Nature, 422, 216–225.
Sanner, M.F., et al. (1996) Reduced surface: an efficient way to compute molecular surfaces. Biopolymers, 38, 305–320.
Schlosshauer, M. and Baker, D. (2004) Realistic protein–protein association rates from a simple diffusional model neglecting long-range interactions, free energy barriers, and landscape ruggedness. Protein Sci., 13, 1660–1669.
Schneider, C. and Suhnel, J. (1999) A molecular dynamics simulation of the flavin mononucleotide-RNA aptamer complex. Biopolymers, 50, 287–302.
Schreiber, G. and Fersht, A.R. (1996) Rapid, electrostatically assisted association of proteins. Nat. Struct. Biol., 3, 427–431.
Thorn, K.S. and Bogan, A.A. (2001) ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics, 17, 284–285.
Valdar, W.S. (2002) Scoring residue conservation. Proteins, 48, 227–241.
Weiner, S.J., et al. (1984) A new force-field for molecular mechanical simulation of nucleic-acids and proteins. J. Am. Chem. Soc., 106, 765–784.
Xu, D., et al. (1997) Hydrogen bonds and salt bridges across protein–protein interfaces. Protein Eng., 10, 999–1012.
Zhou, H.X. and Shan, Y. (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins, 44, 336–343.