

Bioinformatics Advance Access originally published online on May 17, 2007
Bioinformatics 2007 23(15):1909-1918; doi:10.1093/bioinformatics/btm274

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Secondary structure based analysis and classification of biological interfaces: identification of binding motifs in protein–protein interactions

Mainak Guharoy and Pinak Chakrabarti *

Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Kolkata 700054, India

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: The increasing amount of data on protein–protein interaction needs to be rationalized for deriving guidelines for the alteration or design of an interface between two proteins.

Results: We present a detailed structural analysis and comparison of homo- versus heterodimeric protein–protein interfaces. Regular secondary structures (helices and strands) are the main components of the former, whereas non-regular structures (turns, loops, etc.) frequently mediate interactions in the latter. Interface helices get longer with increasing interface area, but only in heterocomplexes. On average, the homodimers have longer helical segments and prominent helix–helix pairs. There is a surprising distinction in the relative orientation of interface helices, with a tendency for aligned packing in homodimers and a clear preference for packing at 90° in heterodimers. Arg and the aromatic residues have a higher preference to occur in all secondary structural elements (SSEs) in the interface. Based on the dominant SSE, the interfaces have been grouped into four classes: α, β, αβ and non-regular. Identity between protein and interface classes is highest for α proteins, but rather mediocre for the other protein classes. The interface classes of the two chains forming a heterodimer are often dissimilar. Eleven binding motifs can capture the prominent architectural features of most of the interfaces.

Contact: pinak@boseinst.ernet.in

Supplementary information: A separate file is provided with 3 tables and 2 figures, which are referred to with a prefix ‘S’ in text.


    1 INTRODUCTION
 
The association and dissociation of protein molecules regulate most biological processes, and considerable effort has gone into understanding protein interactions. X-ray crystallography provides a direct snapshot of the interface formed when two protein molecules associate, and the Protein Data Bank (PDB) (Berman et al., 2000) is the repository of a wealth of such information.

Association between two (or more) protein chains can be classified as strong and permanent (‘obligate’) or as weak and transient (‘non-obligate’) (Jones and Thornton, 1996). In the former, the protein subunits occur only in the complexed state, as exemplified by protein quaternary structures such as the homodimers (Bahadur et al., 2003). Protein molecules that usually exist independently but form complexes depending on factors such as physiological conditions, chemical modifications, binding of ligands, etc., form non-obligate interactions (Chakrabarti and Janin, 2002; Janin and Chothia, 1990; Lo Conte et al., 1999). These two types of interfaces can differ in physicochemical characteristics, most notably interface area (Bahadur et al., 2004). Biological interfaces have been characterized in terms of the secondary structure elements at their interaction sites (Argos, 1988; Dou et al., 2004; Hoskins et al., 2006; Miller, 1989; Neuvirth et al., 2004). However, there has been no attempt to compare the properties of the secondary structure elements in these two interface categories or, more importantly, to examine whether increasing interface size affects these properties. Although various interaction databases and prediction servers exist, such as 3DID (Stein et al., 2005), PIBASE (Davis and Sali, 2005), InterPreTS (Aloy and Russell, 2003), DOCKGROUND (Douguet et al., 2006), PROTCOM (Kundrotas and Alexov, 2006), SCOPPI (Winter et al., 2006), etc., these do not usually distinguish intrachain (domain–domain) interactions from interchain interactions, nor, within the latter category, obligatory from non-obligatory interactions. This obscures any pattern involving geometrical and structural aspects of protein–protein interactions.

The basic forces (close packing, hydrophobic effects, shape complementarity between associating parts, electrostatic considerations, etc.) that determine the tertiary structure of proteins appear to be similar to the ones that regulate the processes of protein–protein recognition and binding (Tsai and Nussinov, 1997; Tsai et al., 1996, 1997; Saha et al., 2007). Investigating the structural properties of the recognition sites in the two distinct types of interfaces and their comparison to what is seen in protein tertiary structures should provide insights into the inter-related processes of protein folding and protein binding. This article focuses on the secondary structures of interface residues; the characterization of peptide segments at the interface in terms of secondary structure and their association across the recognition surface. These are in turn organized to form certain recognizable motifs that recur in the interfaces between unrelated proteins.

The universe of protein folds has been divided into classes. It has been suggested that the total number of interaction types (proteins sharing similar sequences tend to interact similarly) is limited. According to estimates, most interactions in nature will conform to one of about 10 000 types (Aloy and Russell, 2004), like the 1000 protein folds suggested by Chothia (1992). Although there are databases dealing with protein interfaces, there have been no attempts to classify them along the terms used for fold classification, something that we have attempted here. An offshoot would be to study the correlation between protein class and interface class, and also if the binding sites of the two partner molecules have identical interface classes.


    2 METHODS
 
2.1 Datasets used and initial calculations
This study uses two sets of non-redundant protein–protein interfaces: the first is a group of 122 homodimers (Bahadur et al., 2003) and the second a group of 204 protein–protein heterocomplexes (Pal et al., 2007). Atomic coordinates were obtained from the Protein Data Bank (PDB) (Berman et al., 2000). Identification of interface residues was carried out using the program ProFace (Saha et al., 2006). DSSP (Kabsch and Sander, 1983) was used for secondary structure assignments. The secondary structure types considered were: α- and 3₁₀-helix, β-strand, turn (involving or not involving a hydrogen bond) and the unclassified residues (assigned ‘ ’ by the program). Turn and unclassified residues were together assumed to constitute the non-regular (NR) regions.

2.2 Calculation of propensities
The propensity $(P_i)_{SSE}$ of a residue $i$ to occur in a given secondary structural element (SSE) is calculated as follows:

$$(P_i)_{SSE} = \frac{n_{i,sse,int} / N_{sse,int}}{n_{i,sse,total} / N_{sse,total}}$$

where $n_{i,sse,int}$ and $N_{sse,int}$ are the counts of residue $i$ and of all residues belonging to a particular secondary structure type in the interface, respectively; $n_{i,sse,total}$ and $N_{sse,total}$ are the corresponding counts in the entire tertiary structure. Usually, the normalization is done such that the two factors in the denominator are for the whole database; by restricting these to a given SSE in the present definition, the preference of a residue for that SSE in the interface is compared to that for the same SSE in the overall structure. Thus a value greater (or less) than 1.0 indicates that the residue is observed more (or less) in that SSE when in the interface than in the rest of the structure.
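The propensity defined above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `sse_propensities` and the input layout (lists of `(residue_name, sse_label)` tuples) are assumptions made for illustration only.

```python
from collections import Counter


def sse_propensities(interface_residues, all_residues):
    """Return {(residue, sse): propensity} for residues seen in the interface.

    Both arguments are iterables of (residue_name, sse_label) tuples
    (a hypothetical input layout): one tuple per interface residue,
    and one tuple per residue of the entire structure.
    """
    interface_residues = list(interface_residues)
    all_residues = list(all_residues)

    n_int = Counter(interface_residues)                 # n_{i,sse,int}
    n_tot = Counter(all_residues)                       # n_{i,sse,total}
    big_n_int = Counter(s for _, s in interface_residues)  # N_{sse,int}
    big_n_tot = Counter(s for _, s in all_residues)        # N_{sse,total}

    propensities = {}
    for (res, sse), count in n_int.items():
        if n_tot[(res, sse)] == 0:
            continue  # guard: residue/SSE combination absent from the full set
        frac_int = count / big_n_int[sse]
        frac_tot = n_tot[(res, sse)] / big_n_tot[sse]
        propensities[(res, sse)] = frac_int / frac_tot
    return propensities
```

A value above 1.0 for a key such as `("ARG", "H")` would mean that Arg makes up a larger share of interface helix residues than of helix residues in the structure as a whole.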

2.3 Definition of secondary structural segments (SSSs)
Interface residues along the polypeptide chain were organized into secondary structural segments (SSSs) on the basis of secondary structure. Each segment consists of interface residues that are close in the primary sequence and located on the same secondary structural element: helix (a contiguous stretch of α- and 3₁₀-helices was assumed to constitute a single helix), strand or non-regular region (an element of which would encompass a continuous stretch devoid of any helix or β-strand residues). A segment could be an entire SSE or a part of it, being bounded by the two extreme interface residues on that element; there could be intervening non-interface residues in a segment. Each interface was thus divided into a series of helical, strand and non-regular segments, labeled H, S and NR, respectively, with each type numbered sequentially from the N-terminus onwards. The labels of the SSSs in the second chain had a ‘'’ symbol suffixed, so that the SSSs from the two subunits could be distinguished (H2 and H2', for example).
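The segment definition can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' program: the function name `find_segments` and the input layout are hypothetical, and consecutive residues with the same class are treated as one element, so two distinct elements of the same type that directly abut would be merged.

```python
from itertools import groupby


def find_segments(residues):
    """Split one chain into labeled interface segments.

    residues: list of (sse_class, is_interface) in chain order, where
    sse_class is "H", "S" or "NR" (hypothetical input layout).
    Returns a list of (label, first_index, last_index) with 0-based,
    inclusive indices of the two extreme interface residues.
    """
    segments = []
    counters = {"H": 0, "S": 0, "NR": 0}
    position = 0
    # Each run of identical classes is taken as one structural element.
    for sse, run in groupby(residues, key=lambda r: r[0]):
        run = list(run)
        hits = [position + k for k, (_, is_int) in enumerate(run) if is_int]
        if hits:
            counters[sse] += 1
            # The segment is bounded by the extreme interface residues;
            # non-interface residues in between stay inside the segment.
            segments.append((f"{sse}{counters[sse]}", hits[0], hits[-1]))
        position += len(run)
    return segments
```

Elements with no interface residue produce no segment, and a single interface residue yields a segment of length one.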

2.4 Identification of SSS pairs and the calculation of surface area buried between them
We identified all interface atom–atom pairs that were within 4.5 Å (Saha et al., 2005) [as calculations using atom counts, rather than residues, provide more accurate results (Saha and Chakrabarti, 2006)]. An atom may have multiple interface contacts (within the threshold value) and the shortest one was selected. Tracing back to the secondary structures of the involved residues allowed us to assemble statistics on the number of contacts between SSS pairs. We also estimated the buried area between each SSS pair. When interface atom ‘A’ from chain 1 (belonging to SSS ‘X1’, for example) has atom ‘B’ from the chain 2 (a part of the SSS ‘X2'’, say) as its shortest contact, the buried area of atom ‘A’ was taken as contributing to the area buried between the SSS pair (‘X1-X2'’). This operation was performed sequentially for all the interface atoms (in both the chains). The surface areas buried between the different SSS pairs added up to the total interface area (or very close to it).
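A minimal sketch of the area bookkeeping described above, assuming each interface atom is given as a tuple of coordinates, SSS label and buried area. The function name `pair_buried_area` and this input layout are illustrative assumptions; the buried areas themselves would come from a separate accessible-surface calculation that is not shown.

```python
import math
from collections import defaultdict

CUTOFF = 4.5  # distance threshold in Å, as stated in the text


def pair_buried_area(chain1, chain2, cutoff=CUTOFF):
    """Sum buried area per SSS pair, keyed as (label_chain1, label_chain2).

    chain1, chain2: lists of (xyz, segment_label, buried_area), one entry
    per interface atom (hypothetical input layout).
    """
    area = defaultdict(float)

    def assign(atoms, partners, atoms_are_chain1):
        for xyz, seg, buried in atoms:
            best = None  # (distance, partner segment) of the shortest contact
            for pxyz, pseg, _ in partners:
                d = math.dist(xyz, pxyz)
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, pseg)
            if best is not None:
                key = (seg, best[1]) if atoms_are_chain1 else (best[1], seg)
                area[key] += buried

    assign(chain1, chain2, True)   # atoms of chain 1 against chain 2
    assign(chain2, chain1, False)  # atoms of chain 2 against chain 1
    return dict(area)
```

Atoms with no partner inside the cutoff contribute nothing, which is one reason the per-pair areas may add up to slightly less than the total interface area.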

2.5 Calculation of packing angle between SSS pairs
The following algorithm computes the angle between two helices or two strands that are packed across the interface. The program takes as input the entire length of the two secondary structural elements containing the two SSSs. If, however, an SSE is kinked, for example when a helix is a composite of α- and 3₁₀-helices (Pal et al., 2005), there is usually an asymmetry in the area buried on the two sides of the kink, and only the side having the maximum number of interface contacts was considered. A model helix/strand having its axis along the z-axis was superposed onto each of the input structures. The transformed z-axis provided the axis of each of the two SSEs, and the two axes were then used to calculate the angle.
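The final step, the angle between two element axes, can be sketched in Python. Note the substitution: instead of superposing a model helix or strand as described, this sketch estimates each axis crudely as the vector between the centroids of the first and last few Cα atoms. Both function names and the `window` parameter are assumptions made for illustration.

```python
import math


def approximate_axis(ca_coords, window=4):
    """Rough axis estimate: vector from the centroid of the first `window`
    C-alpha atoms to the centroid of the last `window` (a simplification,
    not the model-superposition procedure described in the text)."""
    n = len(ca_coords)
    if n < 2:
        raise ValueError("need at least two C-alpha positions")
    w = max(1, min(window, n // 2))  # shrink the window for short elements
    start = [sum(c[k] for c in ca_coords[:w]) / w for k in range(3)]
    end = [sum(c[k] for c in ca_coords[-w:]) / w for k in range(3)]
    return [e - s for s, e in zip(start, end)]


def packing_angle(axis_a, axis_b):
    """Angle in degrees (0 to 180) between two axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm = math.hypot(*axis_a) * math.hypot(*axis_b)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))
```

Because each axis keeps its N-to-C direction, angles near 0° correspond to parallel packing and angles near 180° to antiparallel packing.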

2.6 Classifying interfaces according to secondary structural features
All the interfaces were distributed into four classes [α, β, mixed αβ and non-regular (NR)] according to the overall secondary structure composition of the interface residues. The following criteria were used: α interfaces must contain at least 40% interface residues in helix and <10% in strand; likewise, β interfaces must contain at least 40% interface residues in strand with <10% participating in helices; mixed interfaces must possess at least 40% interface residues in helices and strands, with at least 10% in one of the groups; lastly, NR interfaces must have >60% residues with backbone conformations corresponding to turn, loop or other unstructured regions. This methodology ensured that all the interfaces are covered and that the classes adequately represent what is seen using a molecular graphics program.
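The four criteria can be written as a short decision function. This is a sketch under one assumption: the mixed class is treated as the remainder once the NR, α and β tests have failed, which is one reading of the somewhat ambiguous mixed criterion. The function name `classify_interface` and the class labels are illustrative.

```python
def classify_interface(n_helix, n_strand, n_nr):
    """Assign an interface class from counts of interface residues in
    helix, strand and non-regular (NR) conformations."""
    total = n_helix + n_strand + n_nr
    if total == 0:
        raise ValueError("interface has no residues")
    helix = 100.0 * n_helix / total
    strand = 100.0 * n_strand / total
    nr = 100.0 * n_nr / total

    if nr > 60:                       # more than 60% non-regular
        return "NR"
    if helix >= 40 and strand < 10:   # helix-dominated
        return "alpha"
    if strand >= 40 and helix < 10:   # strand-dominated
        return "beta"
    # Assumption: everything left has at least 40% helix plus strand
    # and is treated as the mixed class.
    return "alpha-beta"
```

Since NR ≤ 60% implies that helices and strands together account for at least 40%, the four branches cover every possible composition.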


    3 RESULTS
 
3.1 Secondary structure composition of interfaces
Forty percent of the residues in homodimeric interfaces are helical, significantly higher than the 26% in heterocomplexes (Table 1). In homodimers the contribution of β-strands is low compared to helical residues (19% versus 40%), whereas they contribute comparably in complexes (24% versus 26%). Non-regular structures (including coils, turns and loops) appear in large numbers in both, but form the single largest group in heterocomplexes. Grouping helical and strand residues as ‘regular’ and the remainder as ‘non-regular’ structures, we find a statistically significant preference for the former in the homodimeric interfaces. We also studied the influence of interface size on the relative content of regular and non-regular structures (Fig. 1A). For the homodimers, the composition does not vary with size. For the heteromers, however, regular secondary structures become more abundant as the proteins form larger interfaces (‘regular’ increases from ~40 to ~64% as the interface size increases 10-fold).



 
Table 1. Statistics on the distribution of regular and non-regular structures in interfaces

 

 
Fig. 1. Plots of (A) the fraction of interface residues occurring in regular (helix and strand) and non-regular (the rest) structures, and (B) the average lengths of the three different types of SSSs (helix, strand and non-regular) as a function of the interface area (considering the contribution of both the subunits). Interfaces are grouped according to their size into bins of 2000 Å² (homodimers) and 1000 Å² (complexes); the average values for each bin are then calculated.

 
3.2 Secondary structure preferences of interface residues
Plots of the propensities of occurrence of all 20 amino acids in three secondary structure elements (SSEs) when located in the interface compared to the total protein structure are given in Figure 2. Arg and the aromatic residues are consistently observed more in all interface SSEs. Interface strands appear more hydrophilic than a buried strand in the protein interior with significant enrichment of Arg (and also Asp in heterocomplexes). Increased hydrophobicity of non-regular (NR) regions in the interface (to facilitate burial) is achieved by a higher percentage of aromatics, Met (and Cys in complexes); Met is a preferred residue in all the SSEs in homodimers. Ala, which has a high helical propensity in proteins, is used less in interface helices. Likewise, Val, Leu and Ile, which have high β-sheet propensities, are less prominent in the interface.


 
Fig. 2. Propensities of residues to occur in a particular secondary structure type (‘Helix’, ‘Strand’ and ‘NR’) in the interface. The residues are arranged according to the environment-based classification of amino acid residues (Guharoy and Chakrabarti, 2005).

 
3.3 Pairing of interface secondary structures
Statistics on pairing of the SSEs (Fig. 3) show differences between the two datasets. Homodimer interfaces are mainly composed of helix–helix, helix–NR and NR–NR pairings.


 
Fig. 3. Interface secondary structure pairing matrix. The values for homodimers are followed by those for heterocomplexes in brackets.

 
Heterocomplex interfaces have reduced helix–helix packing, and instead, pairings involving NR regions are prominent. Helix–strand and strand–strand combinations are under-represented in both the categories. Suppression of helix–strand pairing can be attributed to their poor steric complementarity (Jiang et al., 2003). Although fewer in number, strand–strand pairs dominate the interface architecture in individual cases (discussed later).

3.4 Dissection of the recognition surface into secondary structural segments
The binding surface of each protein chain was divided into secondary structural segments (SSSs), as demonstrated using a specific example in Figure 4. Each SSS comprises a series of interface residues that are close in the primary sequence and also occur within the same SSE, i.e. helix, strand or non-regular (details in Methods section). Two interface SSSs can be contiguous (segments 5 and 6, for example) or separated by a non-interface region (segments 1 and 2). An SSS can also consist of just one residue (segment 8).


 
Fig. 4. Secondary structural segments (SSSs) (helix in orange, strand in blue and NR in red, with the rest of the structure in green) defining the interface of subunit A of the homodimeric structure with the PDB code, 1A3C. The serial number of the SSS, its identifying label and the residue range are provided.

 
The SSSs are then characterized in terms of their numbers and lengths (Table 2), which are useful parameters for assessing their relative importance for protein–protein association. The SSSs are more numerous in homodimer interfaces than in heterocomplexes. However, as the former is on average twice the size of the latter (Bahadur et al., 2004), upon normalization we find 8.7 SSSs per 1000 Å² of interface area for the homodimers versus 11 for heterocomplexes. Helices are significant contributors to homodimeric interfaces, with an average interface possessing nine helices, each having a length of 7.2 residues, significantly longer than the average lengths of both strands and unstructured segments (3.0 and 3.3). In contrast, none of the structural segment types are conspicuous by their lengths in heterocomplexes. Interestingly, however, the average length of helical interface segments increases significantly as bigger interfaces are formed by the heterodimers (Fig. 1B), a variation not observed in homodimers.



 
Table 2. Statistics on secondary structural segments in interfaces

 
3.5 Association between SSSs and their relative contribution to interface formation
The SSSs described above pack across the interface to form SSS pairs, and the extent of their interaction can be quantified in terms of the accessible surface area (ASA) buried. The higher this value, the greater the contribution of the SSS pair to the interface and, concomitantly, the more important its role in interface formation and stabilization. We chose a cutoff of 5% of the total interface area to identify the major SSS pairs. The contributions of regular and non-regular SSS pairs to the two different types of interfaces were enumerated (Table 3). In homodimers, the regular SSS pairs contribute almost half of the interface on average, whereas for the heterocomplexes, they contribute significantly less (about one-third) compared to pairs involving non-regular segments. This is true both in terms of numbers and the area buried.
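The 5% cutoff can be illustrated with a small Python sketch. The function name `major_pairs`, the dictionary input and the choice to use the sum of the per-pair areas as the total are assumptions for illustration; whether a pair at exactly 5% counts as major is not stated in the text and is treated as inclusive here.

```python
def major_pairs(area_by_pair, fraction=0.05):
    """Return the SSS pairs burying at least `fraction` of the total area.

    area_by_pair: dict mapping an SSS pair (any hashable key) to the
    surface area buried between the two segments, in Å².
    The result maps each major pair to its share of the total (0 to 1),
    sorted from largest to smallest contribution.
    """
    total = sum(area_by_pair.values())
    if total <= 0:
        return {}
    shares = {pair: area / total for pair, area in area_by_pair.items()}
    kept = {pair: share for pair, share in shares.items() if share >= fraction}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))
```

The returned shares can then be summed separately over regular and non-regular pairs to obtain contributions of the kind compared between the two datasets.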



 
Table 3. Statistics on the contribution of pairs of regular and non-regular SSSs to the interface

 
The contribution made by the SSS pairs to the interface area is plotted in Fig. S1. Most (85–90%) helix–helix and strand–strand pairs bury <20% interface area, though there are instances (mostly in homodimers) where a single HH or SS pair contributes more (Table S1A). Helix–strand pairs burying >10% interface area are common in complexes, but extremely rare in homodimers. Non-regular SSS pairs contributing >20% area are almost non-existent in homodimers, but occur often in heterocomplexes (Table S1B).

3.6 Packing geometry of the SSS pairs
The packing angles between helix–helix and strand–strand pairs were calculated. Due to geometrical differences between the two types of helices (α and 3₁₀), we separated α-helix pairs from those involving at least one 3₁₀-helix. Furthermore, we selected only those α-helical pairs where each helix had at least eight residues. This method retains the pairs that bury moderate to large extents of the total interface; usually, only these are important for binding. Almost all the 3₁₀-helices were three residues long and only a few extended up to four residues.

The packing angle distributions have reasonably clear distinctions (Fig. 5). αα pairs in homodimers have a preference for parallel or antiparallel orientations (angle <40° or >140°, respectively). For heterocomplexes, there is a reversal of the above trend, with the peak occurring at ~90°. Pairs involving 3₁₀-helices show a large preference to pack around 90° in both datasets. The preferential angle for packing between two interface strands (Fig. 5C) indicates antiparallel orientation, which is almost exclusively observed in homodimers.


 
Fig. 5. The distribution of angles (in °) between (A,B) helix axes and (C) interacting strands. In (A) only α-helices that are at least 8 residues long are considered; in (B) at least one of the helices is of the 3₁₀ type, while the other could be α or 3₁₀.

 
We also investigated the relative usage of α and 3₁₀ helix–helix pairs across the interface. The homodimer and heterocomplex datasets contain 174 and 170 pairs, out of which those having one or both 3₁₀-helices are 37 (21.3%) and 55 (32.4%), respectively. This indicates a possibly greater role of 3₁₀-helix pairs in heteromeric interfaces.

3.7 Classifying interfaces based on the prevalence of secondary structural elements
Analogous to protein structural class assignments, we grouped interfaces based on the proportion of interface residues belonging to helix, β-strand or non-regular (NR) regions. Four classes are identified: α, β, mixed (αβ) and NR. While α/β and α + β are distinct protein classes, we have used just one αβ class for the interfaces. Figure 6 shows the distribution of the interfaces among the different classes.


 
Fig. 6. Pie-charts showing the distribution of four classes of interfaces: α, β, αβ and NR.

 
Both α and αβ interfaces are more abundant in homodimers (34 and 47%, respectively) when compared to heterocomplexes (22 and 31%, respectively). β interfaces are almost equally abundant in both the datasets. However, far more heterocomplex interfaces belong to the NR-type (32%) than homodimer interfaces, of which only 8% are of this type.

A pertinent question here is whether the interface structural class is dependent upon the protein tertiary structural class. The degree of correspondence between the protein and interface classes is shown in Table 4. The identity is maximum (91%) for α proteins in heterocomplexes, and somewhat lower (79%) for homodimers. The match is mediocre (55–59%) for the mixed classes. For β it is comparable (67%) only for homodimers, but poor (26%) for heterocomplexes (47% of which use non-regular regions for complexation). Another interesting question that can be addressed relates to the equivalence of the interface classes of the two interacting chains in heterocomplexes. Results shown in Table S2A indicate that when the binding region of one protein chain is of the class α, β, αβ or NR, the interface class of the partner would be identical in only 29, 17, 22 and 26% of cases, respectively. When the equivalence of the interface classes from enzyme–inhibitor and antigen–antibody complexes was analysed separately (Tables S2B and C), mostly NR regions from both the partners were found in the interface.



 
Table 4. The match between SCOP (Andreeva et al., 2004) class of individual chains and the corresponding interface class

 
3.8 Conservation of residues in different interface classes
Interfaces can be dissected into core and rim regions (Chakrabarti and Janin, 2002) and the residues belonging to the core are usually more conserved than those in the rim, as indicated by the mean sequence entropy values obtained from an alignment of homologous proteins (Guharoy and Chakrabarti, 2005). We compared these values between interface classes (Table 5). The interfaces belonging to the α class are more conserved than the rest, based on both the average values (for the core and for the rim), as well as their ratio. The overall trend of the mean sequence entropy of the core being less than that of the rim is maintained even when the interfaces are split into classes, except for the β class where the rim region seems to be as conserved as the core, making the ratio close to 1. Even from the distribution of sequence entropies of individual interfaces (Fig. S2) it can be seen that both the core and rim regions in the α class have lower values (60 and 72% of the cases are <0.80 in homodimers and complexes, respectively) compared to NR (90 and 54% >0.80) and mixed interfaces (60 and 67% >0.80).



 
Table 5. Average sequence entropy values in the core and rim regions of different interface classes

 
3.9 Interface architectures
The SSS pairs combine to form recurring super-structures. The packing of major SSSs in individual classes was inspected visually to identify interface motifs. Interfaces that are classified as helical contain at least a pair of interacting helices, but very often contain two (and sometimes more) pairs of helices. Depending on the number and interaction patterns of the helices, and analogous to what is observed in tertiary structures, we identified four distinct motifs: single helix–helix pair (Figs 7A and B), 4-helix bundle, α-sandwich and coiled-coil (Table S3). Six types of sub-geometries are observed in four-helix bundles occurring in protein interiors and interfaces (Harris et al., 1994; Lin et al., 1995). In homodimers and heterocomplexes, the numbers observed in the various types of bundles are: square (13 and 3), splinter (7 and 2), X (17 and 4) (Fig. 7C), unicornate (18 and 5), bicornate (18 and 10) and splayed (10 and 11). While unicornate and bicornate are the favoured arrangements in homodimers, the preference is for splayed geometry [opposite to what is statistically expected (Lin et al., 1995)] in heterocomplexes, again showing the subtle differences in the two interface categories. When more than two pairs of helices occur side-by-side in aligned orientations the motif is termed α-sandwich (Fig. 7D). The intertwined helices in coiled-coil motifs are typically long and often these alone make up the entire protein chain (Fig. 7E).


 
Fig. 7. Examples of interface motifs and different modes of packing of the SSSs. Single helix–helix pair with (A) antiparallel orientation in 2ARC (PDB code), and (B) parallel orientation in 1AF5. (C) 4-Helix bundle in 1BAM; (D) α-sandwich in 1CSH; (E) coiled-coil in 2LIG; (F) continuous β-sheet in 1KBA; (G) β-sandwich in 1B5E; (H) mixed β in 1CDC; (I) helix-sheet in 1CXZ; (J) helix-NR in 1LK3; (K) strand-NR in 1EWY and (L) NR–NR in 2TEC. An example of an interface with two distinct motifs (continuous β-sheet and 4-helix bundle) is shown in (M) for 1A4I. Different levels of shading are used to distinguish the two interacting subunits, with the motif of interest shown in red. Structures shown in (A–H and M) are homodimers and the rest are heterocomplexes.

 
The next three architectural motifs involve β structures and are observed in β interfaces and also to some extent in αβ interfaces. The most common is the continuous β-sheet, formed by interface strands coming side-on from both subunits. The total number of strands in the complete β-sheet varies from 2 (one strand from each chain) to a maximum of 16 with an average of 7 and 6.2 in the two categories; an example with 6 strands is provided in Fig. 7F. The two interface strands are usually hydrogen bonded in an antiparallel fashion, with only two exceptions in homodimers and ten in heterocomplexes. The second β-motif has face-to-face packing of β-sheets and is termed the β-sandwich (Fig. 7G). In some homodimers this motif constitutes the entire interface. When the above two motifs exist simultaneously (two continuous β-sheets packing against each other to form a β-sandwich) we have the mixed β-motif (Fig. 7H). The helix-sheet motif (Fig. 7I) is more prevalent in heterocomplexes. Mostly one helix is involved from one side and the other side may have just one strand, an entire sheet belonging to a single chain or a continuous β-sheet motif (discussed above). The remaining three motifs (helix/strand/NR–NR) are more numerous in heterocomplexes (Fig. 7J–L), often comprising the entire interface. All the important contacts are provided by these motifs, whereas in homodimers these play a subordinate role to the more dominant regular motifs.


    4 DISCUSSION
 
4.1 Relative contribution of secondary structural elements
Different types of protein–protein interfaces may exhibit differences in physicochemical features (Bahadur et al., 2004; Ofran and Rost, 2003; Saha et al., 2005), which can also be seen in the two datasets used here. Data presented in Table 1 indicate that homodimers (with obligate interfaces) have more helices at the interface, with a percentage composition that essentially reproduces what has been reported (Tsai et al., 1997); in contrast, non-obligatory interfaces (in heterocomplexes) have a higher participation of non-regular regions, as noted recently (Ansari and Helms, 2005; De et al., 2005). Of greater interest however, is the fact that in heterocomplexes the involvement of regular secondary structures tends to increase with interface size (Fig. 1A). This is due to the presence of longer helical segments (Fig. 1B).

In homodimers, the contribution of regular SSS pairs is almost the same as that of the non-regular SSS pairs (Table 3). These interfaces have prominent pairs of regular SSSs, with non-regular pairs stabilizing them. Heteromeric interfaces switch between exposed and buried states and must closely mimic the properties of a generic protein surface patch (otherwise the monomeric form will be unstable in solution), and are thus enriched in non-regular SSS pairs. Their smaller size also affects the choice of SSS pairs and their relative orientation (for helices in particular, Fig. 5A). Often, interactions involving NR segments are the only SSS pair types that can be detected in these complexes (Figs 7J–L).

4.2 Propensities of residues to be in an SSE in the interface as opposed to that within the tertiary structure
Figure 2 indicates that aromatic residues and Arg are enriched in interfaces. There are subtle differences between the two datasets. For example, Met is found more in interface helices and strands in homodimers than in heterocomplexes. Asp is found more in the interface strands in the latter; this result is at variance with a recent study (Hoskins et al., 2006) that found Asp to be underrepresented in interface strands. The presence of charged side-chains, such as Asp and Arg, in what would ordinarily be the hydrophobic side of the edge β strand, may be a feature of negative design to avoid undesirable edge-to-edge aggregation (Richardson and Richardson, 2002). It is interesting to note that the three most common hot-spot residues (those contributing more than 2 kcal/mol to the binding interaction), Trp, Arg and Tyr (Bogan and Thorn, 1998), are also found more in interface SSEs. Asp is enriched in hot spots and also occurs with high propensity in interface strands; thus it may be worthwhile to see if Asp residues providing a large fraction of the binding free energy are actually located in strands.
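The idea of a propensity can be made concrete with a small sketch. It assumes the common ratio form, in which the fraction of a residue type among interface SSE residues is divided by its fraction among SSE residues of the tertiary structure; the function name and the toy residue lists are hypothetical and are not data from this study.

```python
from collections import Counter


def propensities(interface_residues, reference_residues):
    """Return {residue: propensity} as a ratio of two fractional compositions.

    A value above 1 means the residue is enriched in the interface set
    relative to the reference set (assumed ratio definition).
    """
    iface = Counter(interface_residues)
    ref = Counter(reference_residues)
    n_iface = sum(iface.values())
    n_ref = sum(ref.values())
    if n_iface == 0 or n_ref == 0:
        raise ValueError("both residue lists must be non-empty")
    result = {}
    for residue, ref_count in ref.items():
        frac_iface = iface.get(residue, 0) / n_iface
        frac_ref = ref_count / n_ref
        result[residue] = frac_iface / frac_ref
    return result


# Toy input, invented purely to show the arithmetic.
interface_helix = ["TRP", "TRP", "ARG", "ALA"]
tertiary_helix = ["TRP", "ARG", "ALA", "ALA"]
print(propensities(interface_helix, tertiary_helix))
```

With these toy lists Trp makes up half of the interface set but a quarter of the reference set, so its ratio is 2, whereas Ala, with a quarter against a half, has a ratio of 0.5.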

4.3 Protein class versus interface class and functional implications
The first three interface classes (α, β, αβ) are self-explanatory and are analogous to the ones found in protein domain classification databases [SCOP (Andreeva et al., 2004), CATH (Pearl et al., 2003)]. However, the inclusion of the NR-type interface has important connotations. Unlike protein 3D structures where unstructured regions are mainly responsible for linking the regular secondary structures, there are many complexes where the interface consists of pairs of interacting non-regular structural elements. For example, enzyme–inhibitor complexes favour using an NR interface from both (23%) or at least one (48%) of the two partners, while 30% of antibody–antigen complexes are of the NR–NR type and a further 39% use an NR interface from only one of the two participating protein components (Table S2B and C). In contrast, a larger fraction (57%) of signalling complexes do not involve any NR interface on either side and there is no instance of an NR–NR combination (Table S2D). Thus the functionality of a molecule may have some influence on the interface class.

Keskin et al. (2004) divided interfaces into three broad types depending on the degree of similarity of the interfaces vis-à-vis their parent chains. Here, we ask whether the interface class is likely to be the same as the protein structural class. Proteins of the α class are most likely to use helices in the binding region. Otherwise, the correlation is not very strong (Table 4); indeed, one striking mismatch can be seen in Fig. 7A, where a mainly β protein (2ARC, classified by SCOP as a β-protein containing a double-stranded beta-helix fold) has an α interface class. The oligomerization interface contains three helices from each of the two subunits. An interesting difference between the two datasets is that a large number of heteromers (except the ones having α-protein class) form ‘NR’ interfaces. Antibody molecules are very good examples of β-class proteins forming NR interfaces while binding. Fifty-four percent of the ‘Others’ class proteins and nearly 20% of mixed-class protein complexes use ‘NR’ interfaces for specific binding.
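A comparison of this kind amounts to cross-tabulating two labels per chain. The sketch below shows one way to do so, assuming each entry is a (protein class, interface class) pair; the function names are hypothetical, and apart from the β protein with an α interface (the 2ARC case mentioned above) the example pairs are invented for illustration.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Count occurrences of each (protein_class, interface_class) pair."""
    return Counter(pairs)


def agreement_fraction(pairs):
    """Return the fraction of entries whose two class labels coincide."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    matches = sum(1 for protein_cls, iface_cls in pairs if protein_cls == iface_cls)
    return matches / len(pairs)


# Hypothetical labels; only the first pair reflects a case described in the text.
example_pairs = [
    ("beta", "alpha"),   # a mainly beta protein with an alpha interface
    ("alpha", "alpha"),
    ("beta", "NR"),
    ("alpha", "alpha"),
]
table = cross_tabulate(example_pairs)
for (protein_cls, iface_cls), count in sorted(table.items()):
    print(protein_cls, iface_cls, count)
print(agreement_fraction(example_pairs))
```

A low agreement fraction in such a table would correspond to the weak correlation between protein class and interface class noted above.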

4.4 Structural motifs in protein–protein interactions
Interface class usually guides the nature of the binding motif. α interfaces primarily contain helical motifs, with additional stabilization from helix–NR or NR–NR; however, it is highly unlikely that the motifs would involve strands. The opposite is true for β interfaces. In the mixed interfaces, the motifs may contain helices or strands or both simultaneously. Lastly, the principal motifs in NR interfaces contain non-regular regions interacting with each other or with short segments of helices/strands from the other chain. In total, eleven binding motifs have been enumerated (Fig. 7 and Table S3). They are fairly broad, and one may use structural details for sub-classification. Interface motifs have been previously discussed in different contexts (Dou et al., 2004; Jones and Thornton, 1996; Keskin and Nussinov, 2005; Tsai et al., 1997), and some of these architectures are quite similar to those in the protein cores (Miller, 1989; Tsai et al., 1997). The motifs are not mutually exclusive and some interfaces may harbour more than one motif, as shown in Fig. 7M, which has a 4-helix bundle, as well as a continuous β-sheet. Functionally different proteins employing similar motifs for interface construction probably represent examples of convergent evolution, reinforcing the hypothesis that the existence of a limited number of folds in nature may be extended to the realm of protein–protein interactions as well.


    5 CONCLUSIONS
 
The secondary structural elements have different importance in mediating interactions in the two types of interfaces formed by homodimers and heterocomplexes: helices are common in the former and non-regular structures in the latter (Table 1). However, the complexes tend to switch the SSE preferences from non-regular to regular as larger interfaces are formed (Fig. 1). Helical segments in the two interface types show the largest distinction both in terms of average number per interface and average length (Table 2), contributing more towards homodimeric interfaces, in which helix–helix and helix–NR pairings are more prevalent, while NR–NR/H/S are observed more frequently in complexes (Fig. 3). The non-regular SSS pairs occupy three-quarters of an average hetero-interface (Table 3). The orientation of helix–helix pairs across the interface is surprisingly distinct, the homodimers showing a tendency for parallel or antiparallel packing, whereas the packing is closer to right angles in heterocomplexes (Fig. 5). However, the packed strand–strand pairs have similar features (the angle, >140°) in both the datasets. Classification of interfaces into four structural classes analogous to fold classification yields interesting results as well. The frequent use of helices in the construction of homodimer interfaces translates into a higher percentage of α and mixed (αβ) interfaces (Fig. 6). The primary use of non-regular regions in the hetero-interfaces manifests itself as a higher proportion of NR interfaces compared to homodimers. It turns out that the structural classes of the interface and of the participating proteins do not have to be the same (Table 4). Residues in the α class of interface show the highest degree of conservation (Table 5). The identification of recurring binding motifs (Fig. 7) indicates how simple patterns are used by nature to build large recognition surfaces. Lastly, aromatic residues and Arg have higher occurrences in the SSEs in interfaces relative to those within tertiary structures (Fig. 2).


    ACKNOWLEDGEMENTS
 
The work has been supported by a grant from the Department of Biotechnology, India. M.G. is a recipient of a fellowship from the CSIR, India.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on February 13, 2007; revised on May 8, 2007; accepted on May 14, 2007

    REFERENCES
 

    Aloy P, Russell RB. InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics (2003) 19:161–162.

    Aloy P, Russell RB. Ten thousand interactions for the molecular biologist. Nat. Biotechnol. (2004) 22:1317–1321.

    Andreeva A, et al. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. (2004) 32:D226–D229.

    Ansari S, Helms V. Statistical analysis of predominantly transient protein-protein interfaces. Proteins (2005) 61:344–355.

    Argos P. An investigation of protein subunit and domain interfaces. Protein Eng. (1988) 2:101–113.

    Bahadur RP, et al. Dissecting subunit interfaces in homodimeric proteins. Proteins (2003) 53:708–719.

    Bahadur RP, et al. A dissection of specific and non-specific protein-protein interfaces. J. Mol. Biol. (2004) 336:943–955.

    Berman HM, et al. The protein data bank. Nucleic Acids Res. (2000) 28:235–242.

    Bogan AA, Thorn KS. Anatomy of hot spots in protein interfaces. J. Mol. Biol. (1998) 280:1–9.

    Chakrabarti P, Janin J. Dissecting protein-protein recognition sites. Proteins (2002) 47:334–343.

    Chothia C. One thousand families for the molecular biologist. Nature (1992) 357:543–544.

    Davis F, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics (2005) 21:1901–1907.

    De S, et al. Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Struct. Biol. (2005) 5:15.

    Dou Y, et al. ICBS: a database of interactions between protein chains mediated by β-sheet formation. Bioinformatics (2004) 20:2767–2777.

    Douguet D, et al. DOCKGROUND resource for studying protein-protein interfaces. Bioinformatics (2006) 22:2612–2618.

    Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. Proc. Natl Acad. Sci. USA (2005) 102:15447–15452.

    Harris NL, et al. Four-helix bundle diversity in globular proteins. J. Mol. Biol. (1994) 236:1356–1368.

    Hoskins J, et al. An algorithm for predicting protein-protein interaction sites: abnormally exposed amino acid residues and secondary structure elements. Protein Sci. (2006) 15:1017–1029.

    Janin J, Chothia C. The structure of protein-protein recognition sites. J. Biol. Chem. (1990) 265:16027–16030.

    Jiang S, et al. The role of geometric complementarity in secondary structure packing: a systematic docking study. Protein Sci. (2003) 12:1646–1651.

    Jones S, Thornton JM. Principles of protein-protein interactions. Proc. Natl Acad. Sci. USA (1996) 93:13–20.

    Kabsch W, Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers (1983) 22:2577–2637.

    Keskin O, et al. A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications. Protein Sci. (2004) 13:1043–1055.

    Keskin O, Nussinov R. Favorable scaffolds: proteins with different sequence, structure and function may associate in similar ways. Protein Eng. Design Selection (2005) 18:11–24.

    Kundrotas PJ, Alexov E. PROTCOM: searchable database of protein complexes enhanced with domain-domain structures. Nucleic Acids Res. (2006) 35:D575–D579.

    Lin SL, et al. A study of four-helix bundles: investigating protein folding via similar architectural motifs in protein cores and in subunit interfaces. J. Mol. Biol. (1995) 248:151–161.

    Lo Conte L, et al. The atomic structure of protein-protein recognition sites. J. Mol. Biol. (1999) 285:2177–2198.

    Miller S. The structure of interfaces between subunits of dimeric and tetrameric proteins. Protein Eng. (1989) 3:77–83.

    Neuvirth H, et al. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J. Mol. Biol. (2004) 338:181–199.

    Ofran Y, Rost B. Analysing six types of protein-protein interfaces. J. Mol. Biol. (2003) 325:377–387.

    Pal L, et al. 3₁₀-helix adjoining α-helix and β-strand: sequence and structural features and their conservation. Biopolymers (2005) 78:147–162.

    Pal A, et al. Peptide segments in protein-protein interfaces. J. Biosci. (2007) 32:101–111.

    Pearl FM, et al. The CATH database: an extended protein family resource for structural and functional genomics. Nucleic Acids Res. (2003) 31:452–455.

    Richardson JS, Richardson DC. Natural β-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc. Natl Acad. Sci. USA (2002) 99:2754–2759.

    Saha RP, Chakrabarti P. Parity in the number of atoms in residue composition in proteins and contact preferences. Curr. Sci. (2006) 90:558–561.

    Saha RP, et al. Interresidue contacts in proteins and protein-protein interfaces and their use in characterizing the homodimeric interface. J. Proteome Res. (2005) 4:1600–1609.

    Saha RP, et al. ProFace: a server for the analysis of the physicochemical features of protein-protein interfaces. BMC Struct. Biol. (2006) 6:11.

    Saha RP, et al. Interaction geometry involving planar groups in protein-protein interfaces. Proteins (2007) 67:84–97.

    Stein A, et al. 3DID: interacting protein domains of known three-dimensional structure. Nucleic Acids Res. (2005) 33:D413–D417.

    Tsai C-J, Nussinov R. Hydrophobic folding units at protein-protein interfaces: implications to protein folding and to protein-protein association. Protein Sci. (1997) 6:1426–1437.

    Tsai C-J, et al. Protein-protein interfaces: architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences. Crit. Rev. Biochem. Mol. Biol. (1996) 31:127–152.

    Tsai C-J, et al. Structural motifs at protein-protein interfaces: protein cores versus two-state and three-state model complexes. Protein Sci. (1997) 6:1793–1805.

    Winter C, et al. SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res. (2006) 34:D310–D314.

