Bioinformatics Advance Access originally published online on January 31, 2007
Bioinformatics 2007 23(7):789-792; doi:10.1093/bioinformatics/btm018
A simple shape characteristic of protein–protein recognition
1Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Rd, La Jolla, CA 92037, USA and 2Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Observation of co-crystallized protein–protein complexes and low-resolution protein–protein docking studies suggest the existence of a binding-related anisotropic shape characteristic of protein–protein complexes.
Results: Our study systematically assessed the global shape of proteins in a non-redundant database of co-crystallized protein–protein complexes by measuring the distance of the surface residues to the protein's center of mass. The results show that on average the binding site residues are closer to the center of mass than the non-binding surface residues. Thus, the study directly detects an important and simple binding-related characteristic of protein shapes. The results provide an insight into one of the fundamental properties of protein structure and association.
Contact: vakser{at}ku.edu
| 1 INTRODUCTION |
|---|
Protein–protein interactions play a key role in life processes. Structural information on these interactions is necessary for understanding them, explaining the fundamental principles of molecular recognition and mechanisms of protein association, and exploring the practical implications for structure-based drug design and other applications.
The foundation of knowledge about structures of protein–protein complexes is provided by experimental studies, primarily X-ray crystallography. The rapidly expanding Protein Data Bank (PDB) (Berman et al., 2000) contains an increasing number of co-crystallized protein–protein complexes, which serve as a unique resource for studying protein interfaces and other structural and physicochemical characteristics of protein interactions. If one defines a protein–protein complex as two separate chains associated through a biological (not crystal packing) interface, the current number of complexes runs up to tens of thousands (Douguet et al., 2006), depending on the biological interface criteria. Excluding homologous complexes, the number of non-redundant protein–protein pairs is typically reduced to several hundred, depending on the widely ranging criteria for non-redundancy (Douguet et al., 2006).
Along with experimental determination of protein–protein structures, computational approaches are increasingly important as a source of structures and as a means of studying the structures. The field of protein–protein structure prediction (docking) techniques is rapidly developing (Gray, 2006; Marshall and Vakser, 2005; Szilagyi et al., 2005; Vajda and Camacho, 2004), taking advantage of better algorithms as well as expanding data sets of experimentally determined protein interfaces.
Knowledge of the binding site is a byproduct of protein–protein docking. However, docking predictions are often unreliable. Moreover, in many cases the binding partner or its structure is unknown, which makes docking impossible. Prediction of protein binding sites independent of docking is therefore important. Such predictions are based on a variety of considerations, including evolutionary conservation (Glaser et al., 2003; Pazos and Sternberg, 2004; Res and Lichtarge, 2005) and physicochemical characteristics (Keskin et al., 2004; Larsen et al., 1998; Lijnzaad and Argos, 1997; Ofran and Rost, 2003; Young et al., 1994; Zhou and Shan, 2001). Along with these, geometry is an important determinant of the binding site. Studies suggest that cavities in the protein surface correlate with small ligand binding sites (del Sol et al., 2006; Ho and Marshall, 1990; Nayal and Honig, 2006), as well as protein binding sites (Binkowski et al., 2005; Rajamani et al., 2004). Moreover, characteristics of the entire protein shape, like principal axes of inertia, are correlated with ligand binding (Foote and Raman, 2000).
In the current study, we relate a simple protein shape characteristic to observed protein–protein binding modes. The results can be interpreted in terms of low-resolution protein–protein recognition.
| 2 METHODS |
|---|
The non-redundant database of 475 protein–protein complexes (Glaser et al., 2001) from PDB contains independent chains from co-crystallized protein pairs. The selection criteria were a minimum protein size of 30 residues and a minimum interface area of 1000 Å². The non-redundancy was achieved by requiring that no two complexes have both proteins homologous (sequence identity above a 30% threshold). The database has been extensively used in systematic studies of protein–protein recognition (e.g. Gray et al., 2003; Papoian and Wolynes, 2003; Tovchigrechko and Vakser, 2001; Tovchigrechko et al., 2002; Vakser et al., 1999).
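The non-redundancy criterion can be sketched as a simple filter. This is only an illustration under stated assumptions: the greedy, order-dependent procedure, the `identity` lookup table, the chain labels, and the inclusive 30% cutoff are all hypothetical and are not taken from the database construction itself.

```python
def non_redundant(complexes, identity, threshold=0.30):
    """Keep a complex unless BOTH of its proteins are homologous to the two
    proteins of a complex that was already kept (greedy; an assumption).

    complexes -- list of (chain_a, chain_b) labels
    identity  -- dict mapping frozenset({x, y}) to fractional sequence identity
                 (hypothetical precomputed table; missing pairs count as 0)
    """
    def homologous(x, y):
        if x == y:
            return True
        return identity.get(frozenset((x, y)), 0.0) >= threshold

    kept = []
    for a, b in complexes:
        redundant = any(
            (homologous(a, c) and homologous(b, d))
            or (homologous(a, d) and homologous(b, c))
            for c, d in kept
        )
        if not redundant:
            kept.append((a, b))
    return kept


# Invented example: A1/A2/A3 are close homologs, and so are B1/B2.
identity = {
    frozenset(("A1", "A2")): 0.90,
    frozenset(("B1", "B2")): 0.80,
    frozenset(("A1", "A3")): 0.95,
}
complexes = [("A1", "B1"), ("A2", "B2"), ("A3", "C1")]
print(non_redundant(complexes, identity))
```

In this toy input the second complex is dropped because both of its chains have homologs in the first one, while the third is kept because only one of its chains does.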
The surface residues were detected by the PSA program (Sali and Blundell, 1993). A residue was considered to be at the interface if its Cβ–Cβ (Cα, in case of Gly) distance from any residue of the other protein was within 7 Å. The position of the Cα atom was used to calculate the distance between the residue and the center of mass of the protein.
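A minimal sketch of these two geometric steps follows, assuming that each residue is already reduced to one representative atom position and that surface detection has been done elsewhere. The function names, the equal default masses, and the inclusive 7 Å cutoff are assumptions made for illustration only.

```python
import math


def center_of_mass(coords, masses=None):
    """Mass-weighted centroid of 3D points; equal masses if none are given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    )


def interface_indices(rep_a, rep_b, cutoff=7.0):
    """Indices of residues in rep_a within `cutoff` angstroms of any residue
    in rep_b. Each list holds one representative atom per residue
    (C-beta, or C-alpha for Gly) as an (x, y, z) tuple."""
    return [
        i for i, a in enumerate(rep_a)
        if any(math.dist(a, b) <= cutoff for b in rep_b)
    ]


# Invented coordinates: only the first residue of protein A is near protein B.
protein_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
protein_b = [(0.0, 6.0, 0.0)]
print(interface_indices(protein_a, protein_b))  # [0]
```

The all-pairs comparison is quadratic in the number of residues, which is adequate for a sketch; a spatial grid or k-d tree would be the usual choice for large complexes.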
| 3 RESULTS AND DISCUSSION |
|---|
3.1 Rationale
The size asymmetry in small ligand (organic compound)–receptor (protein) interaction typically results in the binding site on the receptor being a cavity (del Sol et al., 2006; Ho and Marshall, 1990; Nayal and Honig, 2006). A binding cavity on a small ligand is obviously geometrically impossible; however, geometry imposes no such restriction on the macromolecular receptor. Geometrically, the binding site on the receptor can be of any type: concave, flat, or convex. The fact that the observed binding sites are concave in all likelihood has to do with the free energy aspects of ligand–receptor interaction (which are beyond the scope of this study).
Transitioning to protein–protein complexes, one still tends to think of the larger protein as the receptor and the smaller one as the ligand. This largely unspoken tradition is common in protein–protein docking, where the larger protein is often assigned to be stationary and the smaller one moves to dock with the larger one. Beyond the semantics of this issue and the fact that in some docking algorithms a smaller moving molecule saves computer time [e.g. FFT approaches (Katchalski-Katzir et al., 1992)], casual observation of co-crystallized protein–protein complexes suggests that the smaller protein often binds in the cavity of the bigger one (e.g. enzyme–protein inhibitor complexes). One important exception would be the antigen–antibody complexes, where regardless of the antibody's target size, the antigenic site is typically convex (Novotny et al., 1986).
Quantitatively, the concave character of the binding site on the larger protein was suggested by earlier low-resolution docking studies (Tovchigrechko and Vakser, 2001; Vakser et al., 1999). These geometry-only based studies, where all structural details smaller than approximately 7 Å are deleted, showed that within a complex, the smaller protein typically has more freedom in angular orientation than the larger protein, suggesting the existence of a prominent binding-related anisotropic shape characteristic of the larger protein (e.g. a binding cavity). Often the anisotropic character of the larger protein docking orientation may be explained by a large flat interface rather than a cavity. In any case, the low-resolution docking, along with studies of binding sites geometry, principal axes of inertia, etc. (see Introduction), conclusively point to the existence of large shape characteristics of proteins that distinguish the binding site. The current study directly detects a simple geometric characteristic of the binding site on the larger protein that facilitates protein–protein recognition.
3.2 Binding site statistics
The non-redundant data set of 475 protein–protein complexes (see Methods) was used to calculate the distance of surface residues to the center of mass in the larger protein in a complex. In case of equal-size homodimers, the larger protein was chosen randomly. For each complex, the shape of the larger protein was assessed according to the formula $d = \langle d_i \rangle / \langle d_o \rangle$, where $\langle d_i \rangle$ is the average distance of the interface residues to the protein center of mass, and $\langle d_o \rangle$ is that of the non-interface surface residues. Thus, if $d < 1$ the binding site is closer than average and, if $d > 1$, farther than average to the center of mass. The number of complexes with different d-values is shown in Figure 1 for the entire database (see Methods), as well as for small (1000–2000 Å²) and large (>4000 Å²) interfaces. The data clearly show a tendency of the interface residues to be closer than average to the center of mass. The effect is not detectable for the small interfaces, but increases dramatically for the large interfaces.
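The ratio d can be computed in a few lines once the surface and interface residues are known. The sketch below assumes one position per residue, residue indices for the surface and interface sets, and a precomputed center of mass; the function name and the toy coordinates are invented for illustration.

```python
import math
from statistics import mean


def shape_ratio(coords, surface, interface, com):
    """Return d = <d_i> / <d_o> for one protein.

    coords    -- one (x, y, z) position per residue (C-alpha assumed here)
    surface   -- indices of surface residues
    interface -- indices of interface residues
    com       -- center of mass of the protein
    """
    interface = set(interface)
    outside = [i for i in surface if i not in interface]
    if not interface or not outside:
        raise ValueError("need both interface and non-interface surface residues")
    d_i = mean(math.dist(coords[i], com) for i in interface)
    d_o = mean(math.dist(coords[i], com) for i in outside)
    return d_i / d_o


# Toy data (invented): two interface residues 8 angstroms from the center,
# two other surface residues 10 angstroms away.
coords = [(8.0, 0.0, 0.0), (0.0, 8.0, 0.0), (-10.0, 0.0, 0.0), (0.0, -10.0, 0.0)]
d = shape_ratio(coords, surface=[0, 1, 2, 3], interface=[0, 1], com=(0.0, 0.0, 0.0))
print(round(d, 3))  # 0.8
```

A value below 1, as in this toy case, corresponds to an interface that sits closer to the center of mass than the rest of the surface.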
The paradigm is illustrated in Figure 2. Examples of actual interfaces are shown in Figure 3. Arguably, a small interface is geometrically less likely than a large one to have a deep concavity or significant flatness detectable by a simple measure of the average distance to the protein center of mass. On the other hand, a large interface on the larger protein within a complex geometrically can be of any type—concave, convex or flat (Fig. 2). The fact that it is by far more likely to be close to the protein center of mass than the rest of the surface does not follow from geometry, but is rather due to free energy aspects of protein binding/folding. The analysis of such possible reasons is beyond the scope of this article, which simply describes quantitative phenomenological detection of this prominent shape characteristic.
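To make the flat-interface case concrete, the following toy model (not the authors' analysis) builds a solid ball on an integer lattice, slices off a cap to create a flat face, and compares the average distance from the center of mass for the flat face and for the remaining surface. The radius, the cut height, and the six-neighbour definition of a surface point are arbitrary choices made for this sketch.

```python
import math
from statistics import mean

R, CUT = 10, 5  # ball radius and cut height in lattice units (arbitrary)

# Solid ball on an integer lattice with the cap above z = CUT sliced off.
body = {
    (x, y, z)
    for x in range(-R, R + 1)
    for y in range(-R, R + 1)
    for z in range(-R, CUT + 1)
    if x * x + y * y + z * z <= R * R
}

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def is_surface(p):
    """A lattice point is exposed if any of its six neighbours is missing."""
    return any(
        (p[0] + dx, p[1] + dy, p[2] + dz) not in body for dx, dy, dz in STEPS
    )


surface = [p for p in body if is_surface(p)]
com = tuple(mean(p[k] for p in body) for k in range(3))

flat_face = [p for p in surface if p[2] == CUT]  # the sliced, flat patch
rest = [p for p in surface if p[2] != CUT]       # remaining curved surface

d_flat = mean(math.dist(p, com) for p in flat_face) / mean(
    math.dist(p, com) for p in rest
)
print(f"d for the flat patch: {d_flat:.2f}")
```

In this model the center of mass shifts away from the cut, yet the flat face still lies closer to it on average than the curved surface does, so d comes out below 1. This only illustrates the geometry of the measure; it says nothing about why real interfaces adopt such shapes.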
The anisotropic character of the protein shape can, in principle, be used for binding site prediction. However, our estimates (data not shown) indicate that the simple d measure alone may not be sensitive enough for a useful predictive procedure. The significance of this study lies rather in the discovery of an important (and simple) characteristic of protein shapes. The results provide an insight into one of the fundamental properties of protein structure and association.
3.3 Future directions
The current study will be developed further in four directions. First, a distinction will be made between the obligate complexes (where the components exist in the co-crystallized folds only within the complex) and the non-obligate ones. One possibility is that the binding site concavity may be more pronounced in the non-obligate complexes (at least those with large interfaces), whereas the interfaces in multisubunit (presumably obligate) complexes may be flatter. Second, different complex types, according to their function (e.g. enzyme-inhibitor, electron transfer, etc.), will be tested separately. Third, a more detailed subdivision of the characteristic geometry types (Fig. 2) will be explored. Fourth, more sophisticated geometric determinants (e.g. describing the global shape, capturing local curvature, detecting the surface roughness) will be studied with regard to their capability to correlate with the binding site position.
| ACKNOWLEDGEMENTS |
|---|
The authors wish to thank Andrey Tovchigrechko for helpful comments. The study was supported by NIH R01 GM074255. Funding to pay the Open Access publication charges was provided by NIH.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontano
Received on December 18, 2006; revised on January 15, 2007; accepted on January 17, 2007
| REFERENCES |
|---|
Berman HM, et al. The Protein Data Bank. Nucleic Acids Res (2000) 28:235–242.
Binkowski TA, et al. Protein surface analysis for function annotation in high-throughput structural genomics pipeline. Protein Sci (2005) 14:2972–2981.
del Sol A, et al. Residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families. Protein Sci (2006) 15:2120–2128.
Douguet D, et al. DOCKGROUND resource for studying protein-protein interfaces. Bioinformatics (2006) 22:2612–2618.
Foote J, Raman A. A relation between the principal axes of inertia and ligand binding. Proc. Natl. Acad. Sci. USA (2000) 97:978–983.
Glaser F, et al. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics (2003) 19:163–164.
Glaser F, et al. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins (2001) 43:89–102.
Gray JJ. High-resolution protein–protein docking. Curr. Opin. Struct. Biol (2006) 16:183–193.
Gray JJ, et al. Protein–protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol (2003) 331:281–299.
Ho CMW, Marshall GR. Cavity search: an algorithm for the isolation and display of cavity-like binding regions. J. Comput. Aided Mol. Des (1990) 4:337–354.
Katchalski-Katzir E, et al. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. USA (1992) 89:2195–2199.
Keskin O, et al. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci (2004) 13:1043–1055.
Larsen TA, et al. Morphology of protein-protein interfaces. Structure (1998) 6:421–427.
Lijnzaad P, Argos P. Hydrophobic patches on protein subunit interfaces: characteristics and prediction. Proteins (1997) 28:333–343.
Marshall GR, Vakser IA. Protein-protein docking methods. In: Proteomics and Protein-Protein Interaction: Biology, Chemistry, Bioinformatics, and Drug Design. Waksman G, ed. (2005) New York: Springer. 115–146.
Nayal M, Honig B. On the nature of cavities on protein surfaces: application to the identification of drug-binding sites. Proteins (2006) 63:892–906.
Novotny J, et al. Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc. Natl. Acad. Sci. USA (1986) 83:226–230.
Ofran Y, Rost B. Analysing six types of protein–protein interfaces. J. Mol. Biol (2003) 325:377–387.
Papoian GA, Wolynes PG. The physics and bioinformatics of binding and folding – an energy landscape perspective. Biopolymers (2003) 68:333–349.
Pazos F, Sternberg MJE. Automated prediction of protein function and detection of functional sites from structure. Proc. Natl. Acad. Sci. USA (2004) 101:14754–14759.
Rajamani D, et al. Anchor residues in protein–protein interactions. Proc. Natl. Acad. Sci. USA (2004) 101:11287–11292.
Res I, Lichtarge O. Character and evolution of protein–protein interfaces. Phys. Biol (2005) 2:S36–S43.
Sali A, Blundell TL. Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol (1993) 234:779–815.
Szilagyi A, et al. Prediction of physical protein-protein interactions. Phys. Biol (2005) 2:S1–S16.
Tovchigrechko A, Vakser IA. How common is the funnel-like energy landscape in protein-protein interactions? Protein Sci (2001) 10:1572–1583.
Tovchigrechko A, et al. Docking of protein models. Protein Sci (2002) 11:1888–1896.
Vajda S, Camacho CJ. Protein–protein docking: is the glass half-full or half-empty? Trends Biotechnol (2004) 22:110–116.
Vakser IA, et al. A systematic study of low-resolution recognition in protein-protein complexes. Proc. Natl. Acad. Sci. USA (1999) 96:8477–8482.
Young L, et al. A role for surface hydrophobicity in protein-protein recognition. Protein Sci (1994) 3:717–729.
Zhou HX, Shan Y. Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins (2001) 44:336–343.


