Bioinformatics Advance Access originally published online on April 1, 2008
Bioinformatics 2008 24(10):1313-1315; doi:10.1093/bioinformatics/btn115
FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps
1Department of Computer Science and 2Department of Biology, Biocomputing group, University of Bologna, Bologna, Italy
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Summary: Fault Tolerant Contact Map Reconstruction (FT-COMAR) is a heuristic algorithm for the reconstruction of the protein three-dimensional structure from (possibly) incomplete (i.e. containing unknown entries) and noisy contact maps. FT-COMAR runs within minutes, allowing its application to a large-scale number of predictions.
Availability: http://bioinformatics.cs.unibo.it/FT-COMAR
Contact: vassura{at}cs.unibo.it
Supplementary information: Supplementary data are available on Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
The knowledge of the protein tertiary structure may help in understanding its biological function. However, three-dimensional (3D) protein structure prediction directly from its primary structure is still a really complex and unsolved problem (Lesk, 2006). A possible approach is the prediction of the protein contact map as an intermediate step between primary and tertiary structure. While different methods have been described for predicting protein contact maps, the problem of an efficient 3D reconstruction has been poorly addressed. A contact map of a given (known) protein 3D structure is a binary matrix M such that Mi,j = 1 iff the Euclidean distance between residue C
atoms i and j in the native structure is less than or equal to a pre-assigned threshold value. The general problem of recovering a set of 3D coordinates having a specific contact map has been proved to be NP-hard (Breu and Kirkpatrick, 1998). Other well studied similar problems are NMR structure determination (Havel, 1998; Moré and Wu, 1999) and protein conformational freedom (De Groot et al., 1997). However, the different nature of distance constraints available in such settings requires the use of methods and tools which are different from the techniques needed for C
-trace contact map reconstruction. A series of heuristic algorithms have been proposed to solve the contact map reconstruction problem (Bohr et al., 1993; Galaktionov and Marshall, 1994; Pollastri et al., 2006; Vendruscolo et al., 1997). Most of them have been tested only on the contest of a specific predictor performance and none of them is available to the scientific community. The Fault Tolerant Contact Map Reconstruction (FT-COMAR) implementation described here has at least the same reconstruction quality of the above quoted methods with a running time short enough to allow a large scale number of predictions (Vassura et al., 2007a, b). | 2 FT-COMAR |
|---|
|
|
|---|
FT-COMAR implements a heuristic procedure for recovering a set of 3D coordinates from a possibly erroneous and incomplete contact map (Vassura et al., 2007a, b, 2008). The algorithm is in two phases. The first phase relies on the metric matrix embedding algorithm (Havel, 1998) to retrieve an initial set of 3D coordinates from partially randomized statistical values of inter residues distances. The second phase of the algorithm iteratively applies a correction/perturbation procedure to the randomly generated set of coordinates. Correction applies to each residue a pseudo-force derived from the input contact map, while perturbation slightly moves each residue coordinate in order to relax the current computed structure, allowing a better correction. This is performed in order to obtain a new set of coordinates as consistent as possible with the given contact map (Vassura et al., 2007a, b, 2008). The refinement applies until the set of coordinates is consistent with the given contact map or until some control parameter reaches the stop condition. The current version of FT-COMAR applies a correction-perturbation procedure to ensure that the distances of each C
atoms preserve the protein constraints. Since a set of 3D coordinates consistent with an erroneous contact map may not exist, the quality of the solution found by FT-COMAR is highly influenced by the degree of correctness of the input contact map. To improve the reconstruction it can be useful to mark as unknown some (possibly) highly unsafe areas of the input contact map. FT-COMAR does not consider the coordinate points related to unknown entries. Ignoring faulty regions during the refinement phase it eventually avoids the propagation of errors, increasing reconstruction quality. The FT-COMAR implementation takes as input an ASCII file containing a symmetric contact map matrix whose possible entries are 0 (non contact), 1 (contact) and –1 (unknown entry) or a list of contacts in the widely used EVAcon format (Graña et al., 2005). Such contact map may be computed with any threshold value, even if previous results show that higher thresholds allow the contact map to carry more information (Vassura et al., 2007b, 2008).
Marking unsafe areas as unknown can greatly improve reconstruction quality when the contact map is noisy. For such reason, a simple filtering procedure is provided: it can be used to detect possibly unsafe entries of the input contact map. This filtering procedure (Vassura et al., 2007a) is based on the common neighbor's property of contact maps at threshold 12 Å (so it has better performances when the input contact map threshold is 12 Å). FT-COMAR has been tested on physical contact maps, obtaining structures with RMSD from the native ones on average < 4 Å, under the following conditions:
- contact thresholds ranging from 7 to 18 Å;
- protein chains ranging from 55 to 786 residue long;
- number of errors ranging from 0% to 10% of the whole contact map.
| 3 RESULTS |
|---|
|
|
|---|
The current version of FT-COMAR has been tested on a set of 100 non-redundant mono domain protein chains evenly distributed among the four main structural classes: all Alpha, all Beta, Alpha + Beta and Alpha/Beta. The reconstruction performances have been compared in terms of RMSD by introducing on the native contact maps three different types of random errors: generic errors, missing contacts and false contacts. Our tests show that in presence of errors the reconstruction quality decreases with the length of the protein and that FT-COMAR largely tolerates missing contacts. In particular, the experimental results show that the reconstruction quality of contact maps with 50% missing contacts is comparable to the reconstruction quality of contact maps with 15% false contacts. FT-COMAR is much more tolerant to under prediction than to over prediction of contacts. Moreover, FT-COMAR can ignore up to 75% of the contact map and still compute a protein 3D structure whose RMSD from the native one is < 4 Å (assuming that the remaining 25% contains no errors). Furthermore, in this last case, the reconstruction quality is independent from the protein length (Vassura et al., 2007a). This suggests that, to improve protein reconstruction from contact maps, contact map prediction should put much more emphasis on the quality more than on the quantity of the prediction. In all our tests, FT-COMAR runs within minutes, allowing its adoption for a large-scale number of predictions. Figures showing results for all experiments updated to the current version of FT-COMAR can be found as Supplementary Material at http://bioinformatics.cs.unibo.it/FT-COMAR/index.html#exp. A direct comparison with other results has been previously described (Vassura et al., 2007a, b, 2008), from which it appears that FT-COMAR outperforms the other methods.
As a blind test for FT-COMAR, we evaluate its reconstruction performances on predicted contact maps for the test set of 100 proteins described above. The predictions are performed with CORNET (Fariselli et al., 2001). When the number of predicted contacts is set equal to the protein length/5, on the set of 100 proteins, CORNET average accuracy (sequence separation >24 residues) is 16.9%. This performance is comparable with those assessed in CASP7, where the mean across all groups was 13% (Izarzugaza et al., 2007). We tested FT-COMAR on these predicted contact maps performing 10 reconstructions per map without using the filtering procedure. The mean RMSD values obtained are shown in Figure 1 as a function of the whole accuracy of the map (sequence separation >6 residues). Not surprisingly, the quality of the reconstructed structure increases at increasing accuracy values. Although we never reached RMSD <5 Å (the lowest average RMSD equals 7.4 Å), we note that, even in presence of extremely noisy and erroneous contact maps, many reconstructed structures have average RMSD lower than 10 Å.
|
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Funding: We thank MIUR for the following grants: PNR 2001-2003 (FIRB art. 8) and PNR 2003 projects (FIRB art. 8) on Bioinformatics for Genomics and Proteomics and LIBI-Laboratorio Internazionale di BioInformatica, both delivered to RC. This work was also supported by the Biosapiens Network of Excellence project no LSHG-CT-2003-503265 (a grant of the European Unions VI Framework Programme).
Conflicts of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on January 3, 2008; revised on March 17, 2008; accepted on March 28, 2008
| REFERENCES |
|---|
|
|
|---|
Bohr J, et al. Protein structures from distance inequalities. J. Mol. Biol (1993) 231:861–869.[CrossRef][Web of Science][Medline]
Breu H, Kirkpatrick DG. Unit disk graph recognition is NP-hard. Comp. Geom (1998) 9:3–24.[CrossRef]
De Groot BL, et al. Prediction of protein conformational freedom from distance constraints. Proteins (1997) 29:240–251.[CrossRef][Web of Science][Medline]
Galaktionov SG, Marshall GR. Properties of intraglobular contacts in proteins: an approach to prediction of tertiary structure. In. In: System Sciences, Vol. V, Proceedings of the Twenty-Seventh Hawaii International Conference on Biotechnology Computing Vol. 5, 4–7 Jan. 1994 (1994) Los Alamitos, California, Wailea, HI, USA: IEEE Computer Society Press. 326–335.
Fariselli P, et al. Prediction of contact maps with neural networks and correlated mutations. Protein eng (2001) 14:835–843.
Graña O, et al. EVAcon: a protein contact prediction evaluation service. Nucleic Acids Res (2005) 33(Web Server issue):W347–W351.
Havel TF. Distance Geometry: Theory, Algorithms, and Chemical Applications (1998) In: Encyclopedia of Computational Chemistry, J. Wiley & Sons Ltd.
Izarzugaza JM, et al. Assessment of intramolecular contact predictions for CASP7. Proteins (2007) 69(Suppl. 8):152–158.[CrossRef][Web of Science][Medline]
Lesk A. Introduction to Bioinformatics (2006) New York: Oxford University Press.
Moré J, Wu Z. Distance geometry optimization for protein structures. J. Global Optim (1999) 15:219–234.[CrossRef]
Pollastri G, et al. Modular DAG-RNN architectures for assembling coarse protein structures. J. Comp. Biol (2006) 13:631–650.[CrossRef]
Vassura M, et al. Fault Tolerance for Large Scale Protein 3D Reconstruction from Contact Maps. Seventh International Workshop on Algorithms in Bioinformatics (WABI 2007) (2007a) 25–37. Vol. 4645 of Springer Verlag Lecture Notes in Bioinformatics. Springer Berlin / Heidelberg, Pennsylvania.
Vassura M, et al. Reconstruction of 3D structures from protein contact maps. In. Proceedings of Bioinformatics Research and Applications third International Symposium (ISBRA 2007) (2007b) 578–589. Vol. 4463 of Springer Lecture Notes in BioInformatics. Springer Berlin / Heidelberg, Atlanta.
Vassura M, et al. Reconstruction of 3D structures from protein contact maps. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2008) IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15 Feb 2008. IEEE Computer Society Digital Library. IEEE Computer Society, 15 April 2008 <http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.27>.
Vendruscolo M, et al. Recovery of protein structure from contact maps. Fold. Des (1997) 2:295–306.[CrossRef][Web of Science][Medline]
This article has been cited by other articles:
![]() |
A. N. Tegge, Z. Wang, J. Eickholt, and J. Cheng NNcon: improved protein contact map prediction using 2D-recursive neural networks Nucleic Acids Res., July 1, 2009; 37(suppl_2): W515 - W518. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

