Skip Navigation


Bioinformatics Advance Access originally published online on January 19, 2005
Bioinformatics 2005 21(8):1311-1315; doi:10.1093/bioinformatics/bti167
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/8/1311    most recent
bti167v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by del Sol, A.
Right arrow Articles by O'Meara, P.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by del Sol, A.
Right arrow Articles by O'Meara, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Topology of small-world networks of protein–protein complex structures

Antonio del Sol *, Hirotomo Fujihashi and Paul O'Meara

Bioinformatics Research Project, Research and Development Division, Fujirebio Inc. 51 Komiya-cho, Hachioji-shi, Tokyo 192-0031, Japan

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 INTRODUCTION
 SYSTEMS AND METHODS
 DISCUSSION
 REFERENCES
 

The majority of real examples of small-world networks exhibit a power law distribution of edges among the nodes, therefore not fitting into the wiring model proposed by Watts and Strogatz. However, protein structures can be modeled as small-world networks, with a distribution of the number of links decaying exponentially as in the case of this wiring model. We approach the protein–protein interaction mechanism by viewing it as a particular rewiring occurring in the system of two small-world networks represented by the monomers, where a re-arrangement of links takes place upon dimerization leaving the small-world character in the dimer network. Due to this rewiring, the most central residues at the complex interfaces tend to form clusters, which are not homogenously distributed. We show that these highly central residues are strongly correlated with the presence of hot spots of binding free energy.

Contact: ao-mesa{at}fujirebio.co.jp

Supplementary information: http://www.fujirebio.co.jp/support/index.php (under construction).


    INTRODUCTION
 TOP
 Abstract
 INTRODUCTION
 SYSTEMS AND METHODS
 DISCUSSION
 REFERENCES
 
Networks have become a powerful and useful tool for modeling and understanding the evolution of different complex systems (Kuramoto, 1984; Strogatz and Steward, 1993; Braiman et al., 1995; Gerhardt et al., 1990; Nowak and May, 1992). Although the connection topology is frequently assumed to be completely random or completely regular (Watts and Strogatz, 1998; Bollabas, 1985) in many cases both of these models seem to give a simplistic representation of real complex systems. Indeed, many real networks lie somewhere between the extremes of order and randomness with respect to their topological characteristics. This is the case of the so-called small-world network, where any pair of vertices can be connected through just a few links. The topology of these kinds of networks are characterized by large values of the clustering coefficient (as for regular graphs), defined as the average over all vertices of the fraction of the number of connected pairs of neighbors for each vertex, and small values of the characteristic path length (as for random graphs), defined as the average minimal distance between all pairs of vertices in the graph.

The representation of protein structures as small-world networks has recently become an interesting approach to study a variety of problems associated to protein function and structure, such as the identification of key residues involved in the protein folding mechanism (Vendruscolo et al., 2002) and the correlation between the topological properties of protein conformations and their kinetic ability to fold (Dokholyan et al., 2002; Greene and Higman, 2003) or the identification of functional sites in protein structures (Shemesh et al., 2004) among other examples.

An interesting application of small-world networks would be the representation of protein–protein complexes as such networks, in order to elucidate different structural characteristics associated with the presence of residues that contribute the most to the binding free energy (hot spots), which are unevenly distributed at the binding interface (Bogan and Thorn, 1998). Although different approaches involving sequence and structural information or energetic calculations have been proposed to study and predict hot spots of binding free energy (Kortemme and Baker, 2002; Sheinerman and Honig, 2002; Verkhivker et al., 2002; Ma et al., 2003; Brinda et al., 2002), the small-world representation of protein–protein complexes could give another complementary view on this problem.

Here, we show that the protein–protein interaction mechanism can be viewed as a specific rewiring occurring in the system of two small-world networks represented by the monomers, where a rearrangement of links takes place upon dimerization leaving the small-world character in the dimer network. Due to this specific rewiring, a rearrangement of residue centrality occurs, leading to the appearance of a significant percentage of central residues at the protein–protein interface. The analysis of 18 protein complexes with experimentally annotated hot spots of binding free energy shows that the most central residues at the protein–protein interface, responsible for the small-world character, are strongly correlated with the presence of hot spots.


    SYSTEMS AND METHODS
 TOP
 Abstract
 INTRODUCTION
 SYSTEMS AND METHODS
 DISCUSSION
 REFERENCES
 
Datasets
A dataset of 42 dimer complexes, which each contained at least one monomeric structure was obtained by searching the protein data bank (PDB) (http://www.rcsb.org/pdb/) (Berman et al., 2000) and the structural classification of proteins (SCOP) database (http://www.scop.berkeley.edu/) (Murzin et al., 1995). The non-complexed structures were chosen if they had an identical sequence to their bound form with no insertions and deletions. If any of the complexes contained more than two structures in the unbound form the most recently solved structures were used. As a result, a dataset of 58 monomers was compiled.

A set of 18 protein complexes with experimental information on hot spot residues was obtained by searching the Alanine Scanning Energetics database (ASEdb) (http://www.140.247.111.161/hotspot/index.php) (Thorn and Bogan, 2001). Experimentally measured hot spots of binding free energy were defined as residues with a change in binding free energy greater than or equal to 1.0 Kcal/mol. Some additional data were used from previous studies in phenylalanine substitutions (Mainfroid et al., 1996).

The conservation of residues in the protein complexes was analyzed based on multiple sequence alignments generated by ClustalW (Thompson et al., 1994) using homologous protein sequences obtained from the Swissprot database (http://www.us.expasy.org/sprot/) (Boeckmann et al., 2003).

The accessible surface areas (ASAs) of the protein complexes were determined using the DSSP program (Kabsch and Sander, 1983). Experimental enrichment of hot spot information was obtained from the literature (Bogan and Thorn, 1998).

The protein graphs
The protein structures are modeled as networks with amino acid residues being the vertices and all atom contacts between them the edges. Atom contacts are defined when the distance between at least one atom of residue i is at a distance ≤5.0 Å from an atom of residue j (Greene and Higman, 2003).

The characteristic path length L is defined as the average minimal distance between all pairs of vertices in the graph, calculated by:

where Np represents the number of pairs of vertices of the graph, and lij is the minimal path between vertices i and j (Vendruscolo et al., 2002).

The clustering coefficient C is defined as the average over all vertices of the fraction of the number of connected pairs of neighbors for each vertex, calculated by:

where Nv is the number of vertices, Ni is the number of neighbors of the vertex i, and ni is the actual number of edges between the neighbors of i (Vendruscolo et al., 2002).

Statistical analysis
The frequency distributions of the residue number of links and betweenness centrality averaged over both sets of monomers and dimers were plotted and analyzed using Systat statistical software packages. The Kolmogorov–Smirnov test was used to test the statistically significant difference between the monomer and dimer frequency distributions.

Our analysis was carried out on a PC Linux cluster with 40 nodes (dual 3.02 GHz Xeon), and on a Windows PC (3.0 GHz Pentium IV).


    DISCUSSION
 TOP
 Abstract
 INTRODUCTION
 SYSTEMS AND METHODS
 DISCUSSION
 REFERENCES
 
We start by modeling protein structures as networks (see Systems and Methods). We base our analysis on a representative set of 42 biologically diverse protein complexes (with one or both of their unbound structures available), and find, in agreement with previous studies (Vendruscolo et al., 2002; Dokholyan et al., 2002; Greene and Higman, 2003) that both the dimer and monomer structures exhibit small-world character in accordance with their values of clustering coefficients and characteristic path lengths, in comparison with random and regular graphs with the same number of vertices and average number of neighbors (see Supplementary material). Figure 1a illustrates the frequency distribution of the residue number of links N averaged in both sets of monomers and dimers, indicating that both distributions are Poisson-like, where P(x) = {lambda}xe{lambda}/x! (with the average residue number of links {lambda}), with no statistically significant difference between them. The concept of betweenness centrality used in sociology (Freeman, 1977) defined for each vertex k as the number of pairs of vertices with the shortest path among them passing through k normalized by the total number of pairs of vertices, is a good indicator of the centrality of the vertex in the network. The frequency distribution of the residue betweenness centrality ß averaged in both sets of monomers and dimers follows a power law with an exponential cut-off P(ß) = Cß{eta} exp(–ß/ßc), with the corresponding values for the power law scaling exponent {eta} and the exponential cut-off ßc approximately the same in the monomer and the dimer structures, and no statistically significant difference between the betweenness centrality distributions in the two cases (Fig. 1b). Unlike the frequency distribution of the residue number of links, the betweenness centrality frequency distribution is quite inhomogeneous, showing that a high number of residues have a small value of the betweenness centrality while only a few residues have a large value. This protein representation is in agreement with the wiring model proposed by Watts and Strogatz (1998), where an important role is played by the short cuts, responsible for the small values of the characteristic path length, while the clustering coefficient values remain high.



View larger version (17K):
[in this window]
[in a new window]
 
Fig. 1 Frequency distributions of the residue number of links and betweenness centrality averaged over both sets of monomers and dimers. (a) Bell-shaped Poisson frequency distribution of the residue number of links averaged in both the monomers (shown with the pink dots) and dimers (shown with the blue dots). The discrete Poisson fit P(x) = {lambda}xe{lambda}/x! is illustrated with the pink and blue lines for the monomers and dimers respectively. The average residue number of links {lambda} and the correlation coefficients squared R2 are shown in the graph. (b) Frequency distribution of betweenness centrality averaged over both sets of monomers (shown with the pink dots) and dimers (shown with the blue dots). The frequency distributions follow a power law with an exponential cut-off P(ß) = Cß{eta} exp(–ß/ßc) which is illustrated in the graph with the pink and blue lines for the monomers and dimers respectively. The data has been graphed using a logarithmic scale with the power law-scaling exponent {eta}, exponential cut-off ßc, constant C and the correlation coefficients squared R2 for both datasets shown in the graph. There was no statistically significant difference between the monomer and dimer frequency distributions in both (A) and (B).

 
We study the protein–protein interaction mechanism using this representation of protein structures as small-world networks in order to elucidate some of the important topological changes occurring upon dimerization and the existence of topological determinants possibly related to key residues in the complex stability.

The process of dimerization between monomers can be viewed as a particular rewiring (rather than preferential attachment) in the system of the two monomers (each corresponding to a small-world network) due to the conformational changes, with the removal and addition of links occurring in each monomer, the formation of new links between the monomers, but on the other hand, leaving the frequency distributions of the residue number of links and betweenness centrality with no statistically significant difference between both sets of monomers and dimers (see Fig. 2 in Supplementary material). Interestingly, due to this rewiring process, new central residues (with statistically significant high values of central betweenness z-score ≥ 3.0) which are not homogenously distributed appear mainly at the protein–protein interfaces, while other previously central residues in the monomeric structures lose their centrality in the dimer structure. Conversely, there are a number of central residues in the monomer structures, which remain central in the complex (see Fig. 3 in Supplementary material).

Perhaps the most interesting result of this work is the strong correlation between the statistically significant central residues at protein–protein interfaces (topological determinants) with the most contributing residues to the binding free energy in protein–protein interactions. Experimental results based on Alanine scanning mutagenesis (Thorn and Bogan, 2001) and phenylalanine substitution (Mainfroid et al., 1996) of protein–protein interfaces has shown that the free energy contribution of individual amino acids in protein–protein binding is not uniformly distributed at the binding site; instead there are hot spots of binding free energy ({Delta} {Delta} G ≥ 1.0 Kcal/mol) comprised of a small subset of residues at the complex interface (Bogan and Thorn, 1998). Our analysis based on a set of 18 protein complexes with experimental information on hot spot residues and covering different biological examples of protein–protein interactions shows that the statistically significant high betweenness residues (z-score ≥ 3.0) occurring at the protein–protein interfaces are not uniformly distributed, but instead cluster together, surrounded by regions of residues with relatively low values of betweenness centrality, resembling that of the aforementioned free energy of binding distribution. More detailed analysis reveals a clear tendency of the statistically significant high betweeness residues to be located in hot spot regions, with the experimentally annotated hot spots exhibiting statistically significant high betweenness values in the majority of the cases. Table 1 shows that in the 18 complexes analyzed, 81% of these central residues form clusters with an experimentally annotated hot spot at the cluster center with 22 of these statistically significant high betweenness residues been actual hot spots (see Fig. 4 in Supplementary material).


View this table:
[in this window]
[in a new window]
 
Table 1 Statistically significant high betweenness (z-score ≥3.0) residues obtained from the 18 complexes analyzed, and their correlation to hot spots of binding free energy

 
The remaining 19% of our predicted residues occur mainly in those examples of protein complexes with little experimental information on hot spot residues, such as the enzyme/Inhibitor complex 2ptcEI, which contains only one experimentally annotated hot spot of binding free energy. On the other hand, these residues tend to be clustered together, are highly correlated with the experimental data on hot spot enrichment, and are generally conserved in sequence alignment or non-exposed to the solvent in the dimer structure, indicating that many of them are candidates of hot spots.

Despite the complexity involved in real physical interaction networks occurring in the protein structures, our simple network representation of the latter provides some insight into this complicated picture. Indeed, by using only one network topology characteristic (betweenness centrality) we are able to identify hot spot regions at protein–protein interfaces, taking into account the global topology of the complex whilst keeping its simplicity, which in combination with the reduced computational requirements are clear advantages of our method over previous physical models proposed to identify hot spots of binding free energy (Kortemme and Baker, 2002; Sheinerman and Honig, 2002). On the other hand, the graph-spectral method proposed by Brinda et al., including some additional information, such as residue solvent accessibility and sequence conservation, shows that the betweenness centrality turns out to be a better and simpler predictor of hot spot regions. There is a possibility that the correspondence between energy hot spots and structurally conserved residues remarked upon by Ma et al., could be related to the tendency of energy hot spots to remain central in the interacting network.

Finally, we should mention that a graph theoretical representation method similar to ours has been proposed by Shemesh et al. for identifying functional sites in protein structures. These authors reported that the most central residues in protein structure networks are found in functional sites (catalytic or ligand binding sites). Although their measure of centrality differs from our definition of betweenness centrality, it would be interesting to explore the possibility of using the information of residue centrality in the monomeric structures in order to improve the current methods of protein dockings. Some initial results in this direction have been addressed in our recent work (del Sol and O'Meara, 2004) where we show that some central residues in the monomeric structures remain central after dimerization and that possible information on hot spots of binding free energy could be obtained from the unbound structures. We are planning to continue this study in the future.


    Acknowledgments
 
We would like to acknowledge interesting discussions in issues related to small-world view of protein structures with Dr Alfonso Valencia (CNB), and thank Professor Ruth Nussinov (NCI, Tel Aviv University) for helpful discussions on protein–protein interactions.

Received on September 6, 2004; revised on November 11, 2004; accepted on November 17, 2004

    REFERENCES
 TOP
 Abstract
 INTRODUCTION
 SYSTEMS AND METHODS
 DISCUSSION
 REFERENCES
 

    Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242[Abstract/Free Full Text].

    Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., Schneider, M. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370[Abstract/Free Full Text].

    Bogan, A.A. and Thorn, K.S. (1998) Anatomy of hot spots in protein interfaces. J. Mol. Biol., 280, 1–9[CrossRef][Web of Science][Medline].

    Bollabas, B. Random Graphs, (1985) , London Academic Press.

    Braiman, Y., Lindner, J.F., Ditto, W.L. (1995) Taming spatiotemporal chaos with disorder. Nature, 378, 465–467[CrossRef].

    Brinda, K.V., Kannan, N., Vishveshwara, S. Protein Eng., (2002) 15, 265–277[Abstract/Free Full Text].

    del Sol, A. and O'Meara, P. (2004) Small-world network approach to identify key residues in protein–protein interaction. Proteins, 58, 672–682.

    Dokholyan, N.V., Li, L., Ding, F., Shakhnovich, E.I. (2002) Topological determinants of protein folding. Proc. Natl Acad. Sci. USA, 99, 8637–8641[Abstract/Free Full Text].

    Freeman, L.C. (1977) A set of measures of centrality based on betweenness. Sociometry, 40, 35–43[CrossRef][Web of Science].

    Gerhardt, M., Schuster, H., Tyson, J.J. (1990) A cellular automaton model of excitable media including curvature and dispersion. Science, 247, 1563–1566[Abstract/Free Full Text].

    Greene, L.H. and Higman, V.A. (2003) Uncovering network systems within protein structures. J. Mol. Biol., 334, 781–791[CrossRef][Web of Science][Medline].

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637[CrossRef][Web of Science][Medline].

    Kortemme, T. and Baker, D. (2002) A simple model for binding free energy hot spots in protein–protein complexes. Proc. Natl Acad. Sci. USA, 99, 14116–14121[Abstract/Free Full Text].

    Kuramoto, Y. (1984) Chemical oscillation. Waves and Turbulence, , Berlin Springer.

    Ma, B., Elkayam, T., Wolfson, H., Nussinov, R. (2003) Protein–protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc. Natl Acad. Sci. USA, 100, 5772–5777[Abstract/Free Full Text].

    Mainfroid, V., Mande, S.C., Hol, W.G., Martial, J.A., Goraj, K. (1996) Stabilization of human triosephosphate isomerase by improvement of the stability of individual alpha-helices in dimeric as well as monomeric forms of the protein. Biochemistry, 35, 4110–4117[CrossRef][Medline].

    Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540[CrossRef][Web of Science][Medline].

    Nowak, M.A. and May, R.M. (1992) Evolutionary games and spatial chaos. Nature, 359, 826–829[CrossRef].

    Sheinerman, F.B. and Honig, B. (2002) On the role of electrostatic interactions in the design of protein–protein interfaces. J. Mol. Biol., 318, 161–177[CrossRef][Web of Science][Medline].

    The First Structural Bioinformatics Meeting Shemesh, A., Amitai, G., Sitbon, E., Shklar, M., Netanely, D., Venger, I., Pietrokovski, S. (2004) Structural analysis of residue interaction graphs. 22–23 ISMB/ECCB2004.

    Strogatz, S.H. and Steward, I. (1993) Coupled oscillators and biological synchronization. Sci. Am., 269, 102–109[Web of Science][Medline].

    Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680[Abstract/Free Full Text].

    Thorn, K.S. and Bogan, A.A. (2001) ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics, 17, 284–285[Abstract/Free Full Text].

    Vendruscolo, M., Dokholyan, N.V., Paci, E., Karplus, M. (2002) Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E, 65, 061910-1–061910-4.

    Verkhivker, G.M., Bouzida, D., Gehlhaar, D.K., Rejto, P.A., Freer, S.T., Rose, P.W. (2002) Monte carlo simulations of the peptide recognition at the consensus binding site of the constant fragment of human immunoglobulin G: the energy landscape analysis of a hot spot at the intermolecular interface. Proteins, 48, 539–557[CrossRef][Web of Science][Medline].

    Watts, D.J. and Strogatz, S.H. (1998) Collective dynamics of small-world networks. Nature, 393, 440–442 (London)[CrossRef][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
H. David-Eden and Y. Mandel-Gutfreund
Revealing unique properties of the ribosome using a network based analysis
Nucleic Acids Res., August 1, 2008; 36(14): 4641 - 4652.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
K. H. Paszkiewicz, M. J. E. Sternberg, and M. Lappe
Prediction of viable circular permutants using a graph theoretic approach
Bioinformatics, June 1, 2006; 22(11): 1353 - 1358.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/8/1311    most recent
bti167v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by del Sol, A.
Right arrow Articles by O'Meara, P.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by del Sol, A.
Right arrow Articles by O'Meara, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?