Skip Navigation


Bioinformatics Advance Access originally published online on January 19, 2008
Bioinformatics 2008 24(5):652-658; doi:10.1093/bioinformatics/btn022
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/5/652    most recent
btn022v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (7)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Bernauer, J.
Right arrow Articles by Poupon, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Bernauer, J.
Right arrow Articles by Poupon, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

DiMoVo: a Voronoi tessellation-based method for discriminating crystallographic and biological protein–protein interactions

Julie Bernauer 1,2, Ranjit Prasad Bahadur 2, Francis Rodier 3, Joël Janin 2 and Anne Poupon 2,*

1Department of Structural Biology, Fairchild Science Building, Stanford University School of Medicine, Stanford, CA 94305-5126, USA, 2Yeast Structural Genomics, IBBMC UMR 8619, bâtiment 430, Université Paris-Sud, 91405 Orsay and 3LEBS UPR 9063, CNRS, 91191 Gif-sur-Yvette, France

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 

Motivation: Knowledge of the oligomeric state of a protein is often essential for understanding its function and mechanism. Within a protein crystal, each protein monomer is in contact with many others, forming many small interfaces and a few larger ones that are biologically significant if the protein is a homodimer in solution, but not if the protein is monomeric. Telling such ‘crystal dimers’ from real ones remains a difficult task.

Results: It has already been demonstrated that the interfaces of native and non-native protein–protein complexes can be distinguished using a combination of parameters computed with a method on the Voronoi tessellation. We show in this article that the same parameters highlight significant differences between the interfaces of biological and crystal dimers. Using these parameters as descriptors in machine learning methods leads to accurate classification of specific and non-specific protein–protein interfaces.

Availability: Software is available at http://fifi.ibbmc.u-psud.fr/DiMoVo

Contact: anne{at}rezo.net


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
Proteins fulfill almost all the ‘active’ roles in life: enzymes, antibodies, receptors, etc. For most proteins, biological and biochemical function can be accomplished only through the association of two or more macromolecules. Among the different assemblies that two or more polypeptide chains (monomers) can form, homo-oligomers contain multiple copies of the same monomers, whereas hetero-oligomers and complexes associate different proteins.

The structure of proteins and protein assemblies is experimentally accessible through the interpretation of images obtained by the diffraction of X-rays by a protein crystal. In a crystal, each monomer makes contacts with many others, identical or not. These contacts are non-specific and do not exist in vivo, but they are of the same physico-chemical nature as the specific contacts that stabilize complexes and oligomeric proteins. It has been shown that most of the pairwise interfaces created by the crystal packing are smaller than those of complexes and oligomeric proteins (Bahadur et al., 2003, 2004; Carugo and Argos, 1997; Dasgupta et al., 1997; Janin, 1997). However, some are comparable in size to specific interfaces, and a pair of protein molecules that form a large packing interface in the crystal may be mistaken for a biological homodimer if it has 2-fold symmetry. We call such pairs ‘crystal dimers’.

Some dimeric proteins (either homodimers or heterodimers) are called ‘obligate’ dimers. These are proteins than can exist only associated to each other, and are never found in the monomeric state [see Nooren and Thornton (2003) for a review]. This is the case for the bacteriophage P22 Arc repressor [PDB code: 1ARR (Bonvin et al., 1994)]. The repressor consists in two identical chains that fold together, and both chains participate in the hydrophobic core (Milla et al., 1995). Other dimers are called non-obligates, and can be found as monomers. In these cases, complexation often regulates the activity of the protein. This is the case of sperm lysine (PDB code: 1LYN), which is active as a monomer and inactive as a dimer (Shaw et al., 1995).

Understanding the function of a protein often requires the knowledge of its oligomeric state. Whereas this should be done by experiment in solution, one may attempt to derive the oligomeric state from the crystal structure when it is available. This has proved surprisingly difficult. A recent study uses graph theory to find all the possible assemblies in the crystal, and then computes, for each interface, a quantity related to the free energy change upon dissociation. The PISA server reports the results (Krissinel and Henrick, 2007). Another server, PITA (Ponstingl et al., 2003) scores crystallographic contacts by their contact size and chemical complementarity. NOXClass (Zhu et al., 2006) uses descriptors of the interface, such as its size, its composition, or the chemical complementarity of the contacting residues, as parameters for a support vector machine procedure. Block et al. (2006) used feature selection methods, coupled with four different machine learning methods. Liu et al. (2006) use combinations of different physico-chemical and geometrical parameters.

When applied to test sets of specific interfaces in biological dimers and of non-specific interfaces in crystal dimers, these methods discriminate between the two types with success rates near 92% (Mintseris and Weng, 2003; Ponstingl et al., 2003; Zhu et al., 2006). However, the size of the interface is a predominant parameter in all cases, and the presence of many small interfaces in the non-specific set greatly helps. Indeed, the success rate drops drastically when the non-specific set contains crystal dimers with interfaces similar in size to those in the specific set.

We present here a method based on the Voronoi tessellation and a coarse-grained modelling of the protein structure, and use it to discriminate between the interfaces of biological homodimers and packing interfaces of a similar size formed by monomeric proteins in crystals. This new method, called DiMoVo (DIscrimination between Multimers and MOnomers by VOronoi tessellation), achieves a high accuracy even on crystal dimers that have large interfaces.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
2.1 Modeling the protein structure
Although the three dimensional structure of proteins is considered unique, atoms on the surface are weakly constrained, and many atomic movements happen when two proteins interact. As they are very difficult to predict, a low-resolution, simplified model, is often more appropriate than a detailed atomic structure when modeling protein–protein interaction. We represent protein structures by spheres representing amino acid residues and build a Voronoi tessellation to calculate a set of parameters that have proven useful in combination with machine learning algorithms (Bernauer et al., 2005, 2007; Poupon, 2004).

Residue-based Voronoi tessellations were built with a previously described procedure (Bernauer et al., 2007) that uses the Computational Geometry Algorithms Library (CGAL) (Boissonnat et al., 1999) and was optimized to take less than one second for each complex.

2.2 Parameters
As described in (Bernauer et al., 2005), the parameters are:

  • the interface area (1 parameter)
  • the number of core interface residues (1 parameter)
  • their Voronoi volumes (20 parameters)
  • the frequency of each residue type at the interface (20 parameters)
  • the frequency of the pairs of residues in contact (210 parameters)
  • the distances between their geometric center (210 parameters)
The latter two categories should comprise 210 parameters each, a number reduced to 21 by grouping residues in six bins: Hydrophobic (VILM), Aromatic (FYW), Positive (HKR), Negative (DE), Polar non-charged (NQ), Small (AGSTCP).

In addition, we tested some of the parameters described in (Bahadur et al., 2004):

  • Residue Propensity score (RP), for a residue of type i, this propensity is computed as the logarithm of the ratio between the area fraction contributed to the protein–protein interface and the area fraction contributed to the protein–solvent interface by residues of type i.
  • Global Density index (GD): the mean number of atoms per unit area in an ellipse fitting the interface.
  • Local Density index (LD): the mean number of neighbors of an interface atom.

2.3 Data sets
2.3.1 Training set
The data sets used for developing the method are those of (Bahadur et al., 2003) for the biological dimers and (Bahadur et al., 2004) for the crystal dimers. The proteins of these sets were selected manually from the Protein Data Bank, and their status of dimer or monomer in solution was checked with the biochemical literature. In addition, we required that the sequence of the crystallized fragment has to be the one used for multimeric studies. Indeed, experimental results showing that the full length protein forms a stable dimer cannot guaranty that a fragment will also form a stable dimer. The monomer set contains 178 crystal dimers selected to have an interface area greater than 800 Å2. The biological dimer set contains 113 biological homodimers.

2.3.2 Test sets
We used two other data sets to compare our method with three other: NOXClass (Zhu et al., 2006), PISA (Krissinel and Henrick, 2005) and PITA (Ponstingl et al., 2003).

  • The Zhu dataset (Zhu et al., 2006). This dataset, compiled from previously published sets (Bradford and Westhead, 2005; Neuvirth et al., 2004), contains ‘obligate’, ‘non-obligate’ interactions and crystal contacts. For this study, ‘obligates’ were equated with homodimers and ‘non-obligates’ were not considered since these were only heterodimers. This dataset has been used to train the NOXClass method.
  • The Ponstingl dataset (Ponstingl et al., 2003). This dataset contains crystallographic dimers, biological dimers, and biological multimers of larger order. For this study, only the ‘monomers’ and ‘dimers’ categories were retained. This dataset has been used to train PISA (Krissinel and Henrick, 2007) and PITA (Ponstingl et al., 2003).
It should also be noted that a few examples from these two datasets were removed for different reasons. We removed items in these datasets when: (i) the two polypeptide chains were not identical, (ii) a co-factor or metallic ion was present at the interface, (iii) absence of a symmetry matrix in file containing only one peptide chain, (iv) one of the four tested methods fails to evaluate the example, (v) the polypeptide chain was a fragment of the protein for which the quaternary structure has been experimentally established or (vi) the quaternary structure had been derived from homologous proteins and not experimentally established.

2.4 Learning
The values of the 84 Voronoi-derived parameters and the indexes RP, GD and LD measured on the Bahadur datasets of monomers and homodimers were used as input for a Support Vector Machine (SVM) procedure (Cristiani and Shawe-Taylor, 2000; Schölhopf, 1997) using a Radial Basis Function (RBF) kernel. As in Bernauer (2006), missing data were replaced by the median value of the corresponding parameter on the whole dataset.

To ensure reliable statistics, and avoid over-fitting, learning was done in leave-one-out with a 5-fold cross-validation procedure.

The R libSVM package (Chang and Lin, 2001) was used to perform support vector machines experiments. The optimization of C and {gamma} was done by a grid search (Hsu et al., 2003) using the ‘tune’ function of libSVM. The ROC curves analyses were performed using the packages ‘ROCR’ (Sing et al., 2005) and ‘verification’.

2.5 Nested tests
We performed nested tests to evaluate the contribution of each parameter and selected the best performing subset. Learning was repeated after eliminating each of the 84 parameters in turn. The parameter whose elimination leads to the highest accuracy is excluded. Each of the 83 remaining parameters is eliminated in turn, and again, the parameter whose elimination leads to the highest accuracy is excluded. The procedure was repeated until only two parameters remained. Each learning procedure was done through leave-one-out.

2.6 Accuracy and recall evaluation
The three datasets (Bahadur, Zhu and Ponstingl) have important overlaps (Fig. 1), comparing the accuracies and recalls (on both monomers and dimers) requires that each method is evaluated only on proteins not belonging to their training dataset.


Figure 1
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. learning datasets. For crystal and biological dimers, region A contains PDB codes found only in Bahadur dataset, region B those found only in the Ponstingl dataset, and region C those found only in Zhu datasets. Regions D, E and F are those examples found in two of the datasets, but not in the third one. Region G contains the examples common to the three datasets. For each dataset, and for both crystal and biological dimers, the total number of examples and the mean interface area are given.

 
For each method, a test set was constructed using all the proteins that were not present in the tested method's training set, and had no more than 30% sequence identity with proteins of the training set. Moreover, the internal redundancy of each test set was filtered out so that two proteins in the set don't share more than 30% identity. The test sets are available on the website.

As the authors of PITA estimate that scores above 70–80 indicate specific interfaces (Ponstingl et al., 2003), (i) scores below 70 were taken to identify crystal dimers, (ii) scores above biological dimers, (iii) interfaces with scores between 70 and 80 are not assigned.

The overall accuracy is the number of correctly assigned examples divided by the number of assigned examples. The crystal dimer (resp. biological dimer) accuracy is the number of correctly assigned crystal dimers (resp. biological dimers) divided by the total number of assigned crystal dimers (resp. biological dimers). The crystal dimer (resp. biological dimer) recall is the number of correctly assigned crystal dimers (resp. biological dimers) divided by the total number of crystal dimers (resp. biological dimers), assigned or not.


    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
3.1 Values of the parameters
The area of interface (computed as the sum of the areas of Voronoi facets shared by cells from the two subunits), and the number of interface residues are on average much smaller for crystal dimers than for biological dimers (Fig. 2A and B). These two parameters are highly correlated.


Figure 2
View larger version (55K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Values of the parameters for crystal and biological dimers. The values of the 84 parameters used for learning are shown in black bars for biological dimers and white bars for crystal dimers. (A) Voronoi area of the interface. (B) Number of residues participating in the interface. (C) Frequencies of each of the 20 residues at the interface. (D) Mean volume of the Voronoi cell for each of the 20 residues at the interface. (E) Frequency of each type of pair at the interface. (F) Mean distance between the geometric centers in a pair at the interface for each type of pair. In E and F, S: small residue (AGSTCP), H: hydrophobic (VILM), A: aromatic (FYW), –: negatively charged (DE), +: positively charged (NQ), P: polar (HKR).

 
Figure 2C shows the frequency of the 20 amino acids at the interfaces. As previously described, the biological interfaces are enriched in hydrophobic residues and depleted in polar and charged residues relative to crystal packing interfaces, and also to the solvent-accessible surface (Bahadur et al., 2004). Small residues have similar frequencies in both types of interfaces, except cysteine, which is more frequent in biological interfaces.

Our previous study of protein–protein complexes (Bernauer et al., 2007) indicates that the mean volumes of the Voronoi cells of interface residues are important parameters. Figure 2D shows some interesting variation. The volume occupied by leucines, which is somewhat larger at biological interfaces than in the protein core, is even larger at crystal packing interfaces.

The frequencies of residue pairs in contact at the interface also show interesting differences (Fig. 2E). The hydrophobic–hydrophobic, hydrophobic-small and hydrophobic–aromatic pairs are much more frequent in biological interfaces. This is partly due to the enrichment of biological interfaces in hydrophobic residues, but also to their better physico-chemical complementary compared to crystal packing interfaces.

3.2 Parameter selection
In order to evaluate the relative importance of the 84 parameters, we performed nested tests. The most important thing we learned during these tests is that the method is more accurate if not all parameters are used (Fig. 3). From these tests, we concluded that 27 parameters should be retained (see Table 1). Using these 27 parameters leads to an accuracy of 0.95, with a crystal dimer recall of 0.98 and a biological dimer recall of 0.89.


Figure 3
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Accuracy (white line), crystal dimer recall (grey line) and biological dimer recall (black line) as a function of the number of parameters. At each step, the accuracy with all parameters but one is evaluated. The parameter giving the highest accuracy (consequently the parameter that is less informative) is eliminated for the next step.

 

View this table:
[in this window]
[in a new window]

 
Table 1. Rank of the parameters in nested tests

 
As expected, the most important parameter is the interface area (Table 1): alone, it yields an accuracy of 0.78 with a crystal dimer recall of 0.90 and a biological dimer recall of 0.59. More surprising is the fact that both area of the interface and number of residues at the interface, although largely correlated, are very important (the number of residues at the interface is ranked 9).

A well represented type of parameter in the retained set is the volume of the Voronoi cell. Frequencies are also well represented in the selected set. In both cases, the importance of these parameters cannot be correlated to their repartitions between biological and crystal packing interfaces. Pair frequencies appear less often, however, the frequency of small-polar pairs is ranked 4. Since this type of pair is less frequent in biological interfaces, its ranking shows that the presence of this type of pair in a biological interface is unfavorable. Similarly, pair distances, which were thought to have a very weak contribution, do appear in the retained set.

To summarize, these result show that the factors that favor biological interfaces are:

  • a larger interface area,
  • more interface residues,
  • a large proportion of F, I and L, a small proportion of D, G, K, N, S and Y,
  • small volumes for C, K, L, S, T and V,
  • a large number of aromatic–aromatic pairs and a small number of small-polar, hydrophobic–polar and negative-polar pairs,
  • small distances in small-negative, hydrophobic–hydrophobic, hydrophobic-positive, aromatic–polar and negative–negative pairs.

3.3 Learning performance
Performance of learning has been evaluated by the Area Under the ROC Curve (AUC). Receiver Operating Characteristics (ROC) curves, issued from signal processing, represent the trade-off between true and false positive rates when interpreting hypotheses. The ideal hypothesis, which generates no false positive and 100% of the true positives, has a ROC curve that is a step function and an AUC of 1. A random selection yielding true and false positives in equivalent numbers has an AUC of 0.5.

Figure 4 shows three ROC curves: learning with 84 parameters, learning with 84 parameters and high-resolution geometric parameters RP, LP and GD, and learning with the selected set of 27 parameters. The curves show that whereas adding the RP, LP and GD parameters has only a minor effect on accuracy (the AUC increases from 0.91 to 0.92), a reduced set of parameters performs much better, with an AUC of 0.97.


Figure 4
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. ROC curves for three different sets of parameters: the selected set of 27 parameters (see text), the original set of 84 parameters, and the 84 low-resolution parameters with the three high-resolution parameters RP, LP and GD (see text).

 
3.4 Reliability of predictions
Because the SVM, for each submitted example, returns a score, we are able to give a reliability rate for a prediction, depending on the score obtained (Fig. 5).


Figure 5
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Accuracy of DiMoVo as a function of score.

 
As expected, since the SVM aims at giving to a crystal dimer a score close to 0, and to a biological dimer a score close to 1, the accuracy of predictions with scores lower than 0.3, or higher than 0.8 is very high. Although not very low, the accuracy around 0.5 is not as good (0.73 accuracy for scores between 0.5 and 0.55, and 0.79 accuracy for scores between 0.45 and 0.5). Consequently, we tested three different options:
  • threshold 0.5: if the score is lower than 0.5, we predict a crystal dimer, if the score is higher than 0.5, we predict a biological dimer
  • 0.45–0.55: if the score is lower than 0.45, we predict a crystal dimer, if the score is higher than 0.55, we predict a biological dimer, if the score is between 0.45 and 0.55 we don't predict.
  • 0.4–0.6: if the score is lower than 0.4, we predict a crystal dimer, if the score is higher than 0.6, we predict a biological dimer, if the score is between 0.4 and 0.6 we don't predict.
Results obtained in leave-one-out on the learning set are given in Table 2. DiMoVo in its simplest flavor (with a threshold at 0.5) is already performing well with an accuracy of 0.95. The important difference between crystal and biological dimer recalls (0.98 and 0.89, respectively) is largely due to the fact that there are more crystal than biological dimers in the learning set. However, randomly removing crystal dimers from the set, which is the usual methodology in this type of method, led to significantly lower accuracies. This could have been due to over-fitting. However, the accuracy and recalls obtained on the two other datasets show that we are not in this situation. This is also confirmed by the number of vectors (171 ± 12), which is significantly lower than the number of structures used in the learning set.


View this table:
[in this window]
[in a new window]

 
Table 2. Accuracy, crystal dimer recall and biological dimer recall for three different flavors of DiMoVo, obtained on the learning set, using a leave-one-out procedure

 
When considering that structures with scores between 0.45 and 0.55 (in DiMoVo 0.45–0.55), or scores between 0.4–0.6 (in DiMoVo 0.4–0.6) cannot be assigned to crystal or biological dimers, the accuracy is improved, but the recalls are lower.

3.5 Comparison with other existing methods
In order to assess the accuracy of our method, and to compare with already existing methods, we have used the four already cited methods: DiMoVo, PITA (Ponstingl et al., 2003), PISA (Krissinel and Henrick, 2005) and NOXclass (Zhu et al., 2006). We used the three datasets (Bahadur, Ponstingl and Zhu) for the tests. Each method was evaluated only on crystal and biological dimers that were not part of its own training set, and that have no more than 30% sequence identity with proteins of the learning set. Because of the large overlap between the different datasets, the number of examples used for the evaluation of each method is different (except for PITA and PISA which were trained on the same dataset).

The results are given Table 3. As can be seen, DiMoVo has both a better overall accuracy, but also better crystal dimer recall than the other three methods. PISA and NOXClass have a better biological dimer recall than DiMoVo, but this is counter-balanced by rather low crystal dimer recalls.


View this table:
[in this window]
[in a new window]

 
Table 3. Comparison of the performances of 4 different methods: NOXClass (Zhu et al., 2006), PISA (Krissinel and Henrick, 2005), PITA (Ponstingl et al., 2003) and DiMoVo

 
To further analyze predictions, and for comparison, we studied the reliability of predictions by the four methods as a function of the interface area. As can be seen in Figure 6, all methods show lesser accuracies for ‘problematic’ cases: those with interface areas from 1400 to 3000 Å2. However, DiMoVo accuracies are never lower than 0.8 (areas between 2000 and 2500 Å2, 0.83 for DiMoVo 0.4–0.6), whereas other methods show significantly lower values: 0.73 for PITA (areas between 2000 and 2500 Å2), 0.5 for PISA (areas between 2500 and 3000 Å2), and 0.57 for NOXClass (areas between 2000 and 2500 Å2). The comparatively lower accuracies obtained with PITA and PISA are attenuated by the fact that these two methods are not only able to predict monomers and dimers, but also multimers. Moreover, PISA gives many very useful details concerning each possible interface. The larger number of possible predictions lowers the accuracy of one particular prediction, namely the discrimination between crystal and biological dimer.


Figure 6
View larger version (33K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Accuracy as a function of the interface area. The accuracies of crystal/biological assignment by PITA, PISA, NOXClass and the three different flavors of DiMoVo (see text) were estimated for examples with interface areas in different intervals.

 
Our improved performances for difficult cases are due to three facts. Firstly, our learning set contains more difficult cases, and especially more crystal dimers with large interface areas. To highlight this point, we have trained DiMoVo using (i) obligates and crystal dimers from Zhu dataset, (ii) obligates, non-obligates and crystal dimers from Zhu dataset and (iii) monomers and dimers form Ponstingl dataset. The results in Table 3 clearly show that these three learning sets lead to lower accuracies and recall. Secondly, and this is a direct consequence of the previous point, interface area, although a very important parameter, has a lower weight in our method as compared to other ones. Finally, we consider many more parameters than other methods, and use these to build a scoring function using a machine-learning procedure.


    4 CONCLUSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
In this study, we set up an effective method to discriminate between crystal packing and biological interactions using a coarse-grained model for the structure. This method correctly discriminates between crystal and biological dimers for 97% of the tested cases (DiMoVo 0.4–0.6), with very good recalls for both types of dimers. DiMoVo has been shown to compare very favorably with existing methods, especially for difficult cases, namely those for which the area of the interface is around 2000 Å2.

Interestingly, the initial set of 84 parameters had to be reduced to a subset of 27 parameters to obtain maximal accuracy. The parameters of this subset belong to all of the initial categories: volumes, frequencies, pair frequencies and pair distances. It has also shown that strong correlation of the two parameters was not a criterion for excluding one. Indeed, the number of residues constituting the interface, and the area of the interface, which are two strongly correlated parameters, are both present in the final subset, and excluding one decreases significantly the accuracy.

This work shows that the formalism used for the description of the protein structure is very powerful. Although only one point per residue is used, and the relative weights of the different types of residues are ignored, the obtained Voronoi tessellation allows computing well-discriminating descriptors. To try to improve further the discrimination between biological and crystal contact interfaces, we will try to use more complex mathematical tessellations, such as power diagrams. However, the first attempts we have made in this direction show that it is very difficult to adjust the weights assigned to each type of residue, since no simple relation can be established between these weights and the size or molecular weights of the residues. Moreover, if the parameters obtained show a larger gap between specific and non-specific interfaces, the standard deviations are also larger.


    ACKNOWLEDGEMENT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 
J. Bernauer thanks the Association pour la Recherche contre le Cancer (ARC) for financial support.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on October 30, 2007; revised on January 10, 2008; accepted on January 10, 2008

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 CONCLUSION
 ACKNOWLEDGEMENT
 REFERENCES
 

    Bahadur RP, et al. Dissecting subunit interfaces in homodimeric proteins. Proteins (2003) 53:708–719.[CrossRef][Web of Science][Medline]

    Bahadur RP, et al. A dissection of specific and non-specific protein-protein interfaces. J. Mol. Biol (2004) 336:943–955.[CrossRef][Web of Science][Medline]

    Bernauer J, et al. A new protein–protein docking scoring function based on interface residue properties. Bioinformatics (2007) 23:555–562.[Abstract/Free Full Text]

    Bernauer J, et al. A docking analysis of the statistical physics of protein–protein recognition. Phys. Biol (2005) 2:S17–S23.[CrossRef][Web of Science][Medline]

    Block P, et al. Physicochemical descriptors to discriminate protein–protein interactions in permanent and transient complexes selected by means of machine learning algorithms. Proteins (2006) 65:607–622.[CrossRef][Web of Science][Medline]

    Boissonnat J-D, et al. Programming with CGAL: The example of triangulations. Symp. Comput. Geometry (1999) 1999:421–422.

    Bonvin AM, et al. Nuclear magnetic resonance solution structure of the arc repressor using relaxation matrix calculations. J. Mol. Biol (1994) 236:328–341.[CrossRef][Web of Science][Medline]

    Bradford JR, Westhead DR. Improved prediction of protein–protein binding sites using a support vector machines approach. Bioinformatics (2005) 21:1487–1494.[Abstract/Free Full Text]

    Carugo O, Argos P. Protein–protein crystal-packing contacts. Protein Sci (1997) 6:2261–2263.[Web of Science][Medline]

    Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. Multiple Classifier Systems 6th International Workshop (2001) CA, USA: Seaside.

    Cristiani N, Shawe-Taylor J. An Introduction to Support Vector Machines and other Kernel-based Learning Methods. (2000) Cambridge: Cambridge University Press.

    Dasgupta S, et al. Extent and nature of contacts between protein molecules in crystal lattices and between subunits of protein oligomers. Proteins (1997) 28:494–514.[CrossRef][Web of Science][Medline]

    Hsu C, et al. A Practical Guide to Support Vector Classification. (2003) National Taiwan University Editions, Taiwan: Department of Computer Science and Information Engineering.

    Janin J. Specific versus non-specific contacts in protein crystals. Nat. Struct. Biol (1997) 4:973–974.[CrossRef][Web of Science][Medline]

    Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol (2007) 372:774–797.[CrossRef][Web of Science][Medline]

    Liu S, et al. A combinatorial score to distinguish biological and nonbiological protein–protein interfaces. Proteins (2006) 64:68–78.[CrossRef][Web of Science][Medline]

    Milla ME, et al. P22 Arc repressor: transition state properties inferred from mutational effects on the rates of protein unfolding and refolding. Biochemistry (1995) 34:13914–13919.[CrossRef][Web of Science][Medline]

    Mintseris J, Weng Z. Atomic contact vectors in protein-protein recognition. Proteins (2003) 53:629–639.[CrossRef][Web of Science][Medline]

    Neuvirth H, et al. ProMate: a structure based prediction program to identify the location of protein-protein binding sites. J. Mol. Biol (2004) 338:181–199.[CrossRef][Web of Science][Medline]

    Nooren IM, Thornton JM. Diversity of protein–protein interactions. EMBO J (2003) 22:3486–3492.[CrossRef][Web of Science][Medline]

    Ponstingl H, et al. Automatic inference of protein quaternary structure from crystals. J. Appl. Cryst (2003) 36:1116–1122.[CrossRef][Web of Science]

    Poupon A. Voronoi and Voronoi-related tessellations in studies of protein structure and interaction. Curr. Opin. Struct. Biol (2004) 14:233–241.[CrossRef][Web of Science][Medline]

    Shaw A, et al. Crystal structure and subunit dynamics of the abalone sperm lysin dimer: egg envelopes dissociate dimers, the monomer is the active species. J. Cell. Biol (1995) 130:1117–1125.[Abstract/Free Full Text]

    Schölhopf B. Support Vector Learning. (1997) Munich: R. Oldenbourg Verlag.

    Sing T, et al. ROCR: visualizing classifier performance in R. Bioinformatics (2005) 21:3940–3941.[Abstract/Free Full Text]

    Zhu H, et al. NOXclass: prediction of protein-protein interaction types. BMC Bioinformatics (2006) 7:27.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Brief BioinformHome page
N. Tuncbag, G. Kar, O. Keskin, A. Gursoy, and R. Nussinov
A survey of available tools and web servers for analysis of protein-protein interactions and interfaces
Brief Bioinform, May 1, 2009; 10(3): 217 - 232.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/5/652    most recent
btn022v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (7)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Bernauer, J.
Right arrow Articles by Poupon, A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Bernauer, J.
Right arrow Articles by Poupon, A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?