Bioinformatics Advance Access originally published online on February 18, 2005
Bioinformatics 2005 21(10):2167-2170; doi:10.1093/bioinformatics/bti330
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Recoverable one-dimensional encoding of three-dimensional protein structures

Akira R. Kinjo * and Ken Nishikawa

Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima 411-8540, Japan and Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan

*To whom correspondence should be addressed.


    Abstract

Summary: One-dimensional (1D) structures of proteins, such as secondary structure and contact number, provide intuitive pictures of how the native three-dimensional (3D) structure of a protein is encoded in the amino acid sequence. However, it is still not clear whether a given set of 1D structures contains sufficient information for recovering the underlying 3D structure. Here we show that the 3D structure of a protein can be recovered from a set of three types of 1D structures, namely, secondary structure, contact number and residue-wise contact order, which is introduced here for the first time. Using simulated annealing molecular dynamics simulations, structures satisfying the given native 1D structural restraints were sought for 16 proteins of various structural classes and of sizes ranging from 56 to 146 residues. When the structures best satisfying the restraints were selected, all the proteins showed a coordinate RMS deviation of <4 Å from the native structure and, for most of them, the deviation was even <2 Å. The present result opens a new possibility for protein structure prediction and for our understanding of the sequence–structure relationship.

Contact: akinjo{at}genes.nig.ac.jp


    1 INTRODUCTION
Deciphering how the three-dimensional (3D) structure of a protein is encoded into the corresponding amino acid sequence is a fundamental step toward understanding a wide spectrum of complex biological phenomena. One approach to this problem is to develop a method for structure prediction, and to interpret the encoding scheme in terms of model parameters and optimization algorithms. However, de novo or ab initio methods for 3D structure prediction are often too complicated to clarify the relation between sequence and structure.

One-dimensional (1D) structure prediction (Rost, 2003) is a more intuitive route to understanding the sequence–structure relationship. 1D structures are 3D structural features projected onto strings of residue-wise structural assignments (Rost, 2003), which include secondary structures (SS), solvent accessibility and contact numbers (CN). Although 1D structures can show an intuitive correspondence between amino acid sequence and protein structure, it is still not known whether a given set of 1D structures is sufficient to specify the underlying 3D structure uniquely. Clearly, SS alone cannot specify the 3D structure of a globular protein. Using SS and/or other 1D structures such as CN, is it possible at all to recover the native structure? The recent remarkable result by Porto et al. (2004) suggests that the answer is affirmative. They have shown that the principal eigenvector of the contact map of a protein is essentially equivalent to the contact map itself (Porto et al., 2004). Using the correct contact map, we can safely recover the native 3D structure (Vendruscolo et al., 1997). However, when the principal eigenvector is used to reconstruct the contact map with the algorithm of Porto et al. (2004), the following conditions must be strictly met. First, the principal eigenvector must be extremely accurate. Second, very strict definitions for residue–residue contact must be used. Third, the target protein must be compact and consist of a single domain. Failure to meet any one of these conditions results in a combinatorial explosion. It should also be noted that, although the principal eigenvector shows a significant correlation with the contact number vector, it is difficult to interpret its geometrical meaning. Therefore, it is desirable to find 1D structures which are more robust, easier to understand, but still sufficient for the reconstruction of the native 3D structure.

Kabakçıoğlu et al. (2002) have shown that the number of 3D structures that satisfy the native CN is limited. The contact number $n_i$ of the i-th residue is defined as $n_i = \sum_j C_{i,j}$, where $C_{i,j}$ is the contact map of the native structure of a protein, i.e. $C_{i,j} = 1$ if the residues i and j are in contact and $C_{i,j} = 0$ otherwise. In our preliminary study, we constructed many 3D structures that satisfy the native SS and CN for a small all-α protein, and found that a small percentage of the structures were highly native-like (Kinjo et al., 2005), supporting the result by Kabakçıoğlu et al. (2002). However, we have also found that it is difficult to recover the native structures of larger proteins or those with complex topologies using only SS and CN restraints. Therefore, either some very powerful optimization techniques or other types of 1D structures seemed necessary.

In this paper, we introduce a new kind of 1D structure called residue-wise contact order (RWCO), and show that, given the native SS, CN and RWCO, it is possible to recover the native 3D structures of proteins of various topologies. The contact order was originally introduced to quantify the complexity of the native topology of proteins to investigate the correlation between the native structure and its folding rate (Plaxco et al., 1998). As such, the contact order is a per-protein quantity. Here, we extend the definition of the contact order to make it a per-residue quantity. Using the same notation as the definition of CN, the RWCO $o_i$ of the i-th residue is defined by $o_i = \sum_j |i - j| C_{i,j}$, i.e. the RWCO of a residue is expressed as the sum of sequence separations of contacting residues. An example of CN and RWCO is shown in Figure 1. We can see that CN and RWCO exhibit similar trends, but the value of RWCO is larger for the residues making long-range contacts (e.g. the N-terminal and C-terminal strands in Fig. 1), and smaller for those making short-range contacts (e.g. the central α helix in Fig. 1). Similar to SS and CN, RWCO has a clear geometrical meaning, and the combination of the three types of 1D structures is expected to be more tolerant against small perturbations for the reconstruction of 3D structures.
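The definitions of CN and RWCO can be illustrated with a minimal sketch in Python, assuming a binary contact map is already available as a symmetric list of lists. The function names and the five-residue toy map are hypothetical and serve only as an illustration; they are not taken from the authors' software.

```python
def contact_number(contact_map):
    """CN of residue i: the number of residues in contact with i (row sum)."""
    return [sum(row) for row in contact_map]


def residue_wise_contact_order(contact_map):
    """RWCO of residue i: the sum of sequence separations |i - j| over contacts."""
    return [
        sum(abs(i - j) * c for j, c in enumerate(row))
        for i, row in enumerate(contact_map)
    ]


# Hypothetical toy contact map for a 5-residue chain (not a real protein):
# contacts between residue pairs (0, 3), (0, 4) and (1, 4).
toy_map = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]

cn = contact_number(toy_map)
rwco = residue_wise_contact_order(toy_map)
print(cn)    # [2, 1, 0, 1, 2]
print(rwco)  # [7, 3, 0, 3, 7]
```

In this toy map the terminal residues have the same CN as they would with two short-range contacts, but their RWCO is large because both contacts span most of the chain.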



Fig. 1 An example of CN and RWCO. The MolScript (Kraulis, 1991) drawing in the upper panel shows the native fold of Protein G (PDB: 2gb1); in the bottom panel is the corresponding CN (solid line, left ordinate) and RWCO (dashed line, right ordinate).

 

    2 MATERIALS AND METHODS
To identify the 3D structures that satisfy the given 1D structural restraints, we used simulated annealing molecular dynamics simulations. In the present paper, two residues are defined to be in contact if the distance between their Cβ atoms (or Cα atoms in the case of glycines) is <12 Å. This rather generous cut-off distance has been shown to maximize the correlation between predicted and observed contact numbers (Kinjo et al., 2005). To exclude trivial nearest-neighbor contacts, we set $C_{i,j} = 0$ if $|i - j| < 3$. To make CN and RWCO differentiable with respect to atomic coordinates, we slightly modified the definition of residue–residue contact by using a sigmoid function of inter-residue distance: $C_{i,j} = 1/\{1 + \exp[w(r_{i,j} - 12)]\}$, where $r_{i,j}$ is the distance between the Cβ atoms of residues i and j (Kinjo et al., 2005); the parameter w determines the sharpness of the sigmoid function and was set to 3 in this work. We used the EMBOSS distance geometry program (Nakai et al., 1993) with default parameters and modifications for CN and RWCO restraint functions, and an all-atom representation of proteins derived from the AMBER force field (Weiner et al., 1986). The force field used is the so-called distance geometry force field in which all the energy terms are expressed as penalty functions including bond lengths, bond angles (1–3 distance), torsion angles (1–4 distance), short-range (1–4) and long-range (1–5) soft repulsions (no attractions) together with chiral center and chiral volume restraints (Nakai et al., 1993). Therefore, if a structure perfectly satisfies the ideal peptide geometry and all the restraints, the energy value should be the minimum value of zero. Disulphide bonds, if any, were ignored, and no ligands or co-factors were taken into account.
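The smoothed contact definition can be sketched as follows, using the 12 Å cut-off, the sharpness w = 3 and the exclusion of pairs with sequence separation below 3 stated above. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be one representative point per residue (Cβ, or Cα for glycine), and nothing here reproduces the modified distance geometry program itself.

```python
import math

CUTOFF = 12.0       # contact cut-off distance in Å, as stated in the text
SHARPNESS = 3.0     # sigmoid sharpness parameter w, as stated in the text
MIN_SEPARATION = 3  # pairs with |i - j| < 3 are not counted as contacts


def smooth_contact(r, w=SHARPNESS, cutoff=CUTOFF):
    """Sigmoid contact value 1 / (1 + exp(w * (r - cutoff))) for a distance r in Å."""
    x = w * (r - cutoff)
    if x > 700.0:  # exp() would overflow; the contact value is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def smooth_cn_rwco(coords):
    """Return (CN, RWCO) lists computed from one 3D point per residue."""
    n = len(coords)
    cn = [0.0] * n
    rwco = [0.0] * n
    for i in range(n):
        for j in range(i + MIN_SEPARATION, n):
            c = smooth_contact(math.dist(coords[i], coords[j]))
            cn[i] += c
            cn[j] += c
            rwco[i] += (j - i) * c
            rwco[j] += (j - i) * c
    return cn, rwco


# Hypothetical 4-residue chain on a line with 1 Å spacing (not a realistic geometry).
line = [(float(k), 0.0, 0.0) for k in range(4)]
cn, rwco = smooth_cn_rwco(line)
```

Because the sigmoid is smooth in the distance, both quantities have well-defined gradients with respect to the coordinates, which is what a molecular dynamics restraint requires.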

Secondary structures were assigned by the DSSP program (Kabsch and Sander, 1983). For α helices, distance restraints were imposed on hydrogen-bonding pairs, and dihedral angle restraints were imposed on φ and ψ angles. For β strands, distance restraints were imposed between Cα atoms within each strand segment, and loose dihedral angle restraints for φ and ψ angles were also included.

Given a set of native CN $\{\hat{n}_i\}$, the CN restraints were imposed as $w_n \sum_i (n_i - \hat{n}_i)^2$, where $w_n$ is a weight factor set to 5. Similarly, with the native RWCO $\{\hat{o}_i\}$, the RWCO restraints were imposed as $w_o \sum_i (o_i - \hat{o}_i)^2$, with the weight factor $w_o$ set to 0.5 divided by the sequence length.
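As a rough illustration of these penalty terms, the sketch below evaluates the combined CN and RWCO restraint energy for a trial structure. It assumes that the CN term has the same quadratic form as the RWCO term, with a weight of 5 for CN and 0.5 divided by the sequence length for RWCO; the function name and the two-residue example values are hypothetical.

```python
def restraint_energy(cn, rwco, native_cn, native_rwco, w_n=5.0):
    """Quadratic penalty for deviations of CN and RWCO from their native values."""
    length = len(native_cn)
    w_o = 0.5 / length  # RWCO weight: 0.5 divided by the sequence length
    e_cn = w_n * sum((n - n_hat) ** 2 for n, n_hat in zip(cn, native_cn))
    e_rwco = w_o * sum((o - o_hat) ** 2 for o, o_hat in zip(rwco, native_rwco))
    return e_cn + e_rwco


# Hypothetical two-residue example: one CN off by 1, one RWCO off by 4.
energy = restraint_energy(
    cn=[1.0, 2.0], rwco=[10.0, 10.0],
    native_cn=[1.0, 3.0], native_rwco=[10.0, 14.0],
)
print(energy)  # 5 * 1 + (0.5 / 2) * 16 = 9.0
```

Dividing the RWCO weight by the sequence length plausibly keeps the two terms on comparable scales, since RWCO values grow with chain length while CN values do not.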

To construct a structure, we first generated a random coil, which was minimized by 500 steps of the conjugate gradient method. Then a canonical molecular dynamics simulation at a temperature of 1000 K was performed for 10 000 steps, after which the system was cooled by 2 K per 100 steps until the temperature was 100 K. Then, the system was further cooled by 1 K per 100 steps down to 10 K. The molecular dynamics simulations were performed in 4D space to relax the multiple minima problem (Havel, 1991; Nakai et al., 1993). Finally, conjugate gradient minimization was applied for 2000 steps to recover the structure in 3D space. This procedure was iterated 300 times with different initial random coils to yield 300 independent structures for each target protein. We sorted these structures in increasing order of their total energy and selected the best 100 structures.
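The cooling protocol and the final selection step can be written down as a short sketch. The exact discretization is an assumption: here the temperature is lowered first and then held for 100 steps, which gives 64 000 dynamics steps in total; the function names are hypothetical and no actual dynamics is run.

```python
def annealing_schedule(t_hot=1000, hot_steps=10_000, t_mid=100, t_final=10, block=100):
    """Yield one target temperature (in K) per molecular dynamics step."""
    # Phase 1: constant high temperature.
    for _ in range(hot_steps):
        yield t_hot
    # Phase 2: cool by 2 K per block of 100 steps down to t_mid.
    t = t_hot
    while t > t_mid:
        t -= 2
        for _ in range(block):
            yield t
    # Phase 3: cool by 1 K per block of 100 steps down to t_final.
    while t > t_final:
        t -= 1
        for _ in range(block):
            yield t


def best_by_energy(energies, keep=100):
    """Return the indices of the `keep` lowest-energy structures, best first."""
    order = sorted(range(len(energies)), key=lambda k: energies[k])
    return order[:keep]


temps = list(annealing_schedule())
print(len(temps), temps[0], temps[-1])  # 64000 1000 10
```

With 300 independent runs per target, `best_by_energy(energies, keep=100)` would correspond to keeping the 100 lowest-energy structures described above.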

As target proteins, we chose from the Protein Data Bank (Berman et al., 2000) four all-α, four all-β, five α + β and three α/β proteins whose sequence lengths range from 56 to 146 residues (Table 1, first column). These structures were arbitrarily selected so as to include proteins of varying structural classes and sizes.


Table 1 Summary of 3D structures recovered from 1D structures

 

    3 RESULTS AND DISCUSSION
For 14 of the 16 target proteins, we obtained reconstructed structures whose Cα root mean square deviations (RMSDs) from the native structure are <2 Å (Table 1, columns 2–4). Many of them exhibit even <1 Å RMSD. For two other targets, namely 2pcy (plastocyanin) and 135l (turkey egg white lysozyme), we still find structures <3.5 Å RMSD. By selecting the structures of the lowest energy, we can almost always identify highly native-like structures (Table 1, column 5). One exception is 2pcy, whose ‘best’ structure shows 10.9 Å RMSD. However, this structure is actually the mirror image of the native structure. Applying the mirror image transformation to this structure, its RMSD from the native structure is 1.4 Å. Occurrence of mirror image structures is an inherent problem of methods which use distance-based restraints (CN and RWCO are based on interatomic distances). Nevertheless, the result for 2pcy suggests that it is also possible to obtain structures with <2 Å RMSD if we generate a sufficiently large number of structures.
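The mirror image ambiguity follows directly from the fact that a reflection preserves all pairwise distances, and therefore any contact map, CN and RWCO derived from them. The sketch below checks this on a hypothetical four-point set (not a protein structure) and uses a signed volume to show that the handedness is nevertheless inverted; all names are illustrative.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between the given 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mirror_image(coords):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for (x, y, z) in coords]


def signed_volume(a, b, c, d):
    """Triple product (b - a) . ((c - a) x (d - a)); its sign encodes handedness."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


# Hypothetical chiral arrangement of four points.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
flipped = mirror_image(toy)

same_distances = all(
    math.isclose(p, q)
    for row_p, row_q in zip(distance_matrix(toy), distance_matrix(flipped))
    for p, q in zip(row_p, row_q)
)
print(same_distances)                               # True
print(signed_volume(*toy), signed_volume(*flipped))  # 1.0 -1.0
```

A distance-only penalty therefore cannot distinguish the two enantiomeric folds; restraints that carry handedness, such as chiral volume terms, are needed for that.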

The minimum RMSDs are shown in the last column of Table 1. These structures do not always correspond to those with the lowest energy. However, since the total energy averaged over the 300 generated structures is greater by one or two orders of magnitude, the energies of most of the minimum-RMSD structures are still close to the lowest energy.

The yield of native-like structures varies greatly depending on the target protein. The native fold of 1utg (uteroglobin) is a very simple one with four relatively short α helices, and all the 100 selected structures are within 2 Å RMSD from the native structure. In contrast, only a handful of native-like structures were obtained for 2pcy, which has a complex β sandwich topology. In general, it seems to be more difficult to obtain native-like structures for proteins with a large number of long-range contacts.

A reason for the relatively low yield of native-like structure is the use of a simple simulated annealing method for the optimization. Since all the native-like structures with <2 Å RMSD exhibit low energy values, the restraints used are sufficient for specifying the native-like structures, but many structures are trapped in local minima during optimization. In fact, we observed that setting a high temperature in the initial phase of simulated annealing increased the yield of native-like structures. Therefore, the yield is expected to be even higher if we apply more powerful optimization techniques or improved algorithms.

As can be seen in Figure 1, CN and RWCO are highly correlated with each other. Are they both required to reconstruct the native structures? Performing calculations without using RWCO but following exactly the same protocol as above yielded a much smaller total number of native-like structures (Table 2, values before \). We obtained native-like structures only for small and/or simple proteins such as 1r69, 1utg, 256bA or 1ctf. The optimized structures for larger proteins such as 1mba tended to form only relatively short-range contacts. Furthermore, even if the correct native structures were recovered, it was difficult to discriminate them by the penalty function. A slightly better, but qualitatively similar, result was obtained when CN was omitted in the calculations (Table 2, values after \). In this case, compared to the case without RWCO, the optimized structures tended to contain a comparable or smaller number of contacts, but of longer range. From these observations, we conclude that CN and RWCO contain complementary information required to accurately determine the native-like structures.


Table 2 Summary of 3D structures recovered from 1D structures without RWCO (values before \) or without CN (values after \) (cf. Table 1)

 
It is of interest to ask whether SS, CN and RWCO uniquely specify the native 3D structure of a protein (except for the mirror image). We expect that this is the case, although we cannot draw a definite conclusion because the method used in this study is restraint-based rather than constraint-based. All the optimized structures do satisfy the given 1D structural restraints to a certain extent, but those with high energies tend to contain significant distortions in their local geometry and large steric overlaps. Thus, given the native SS, CN and RWCO, the number of structures consistent with these restraints as well as with the ideal peptide chain geometry should be very limited. It should be noted that this argument probably applies only if the full-atom representation is used; otherwise there may exist non-native-like structures with low energy values.

Although we have performed a direct optimization of 3D structures by imposing 1D structural restraints, it may also be possible to first reconstruct the contact map satisfying the 1D restraints and then recover the 3D structure from the contact map. In an initial phase of the present study, we applied a deterministic depth-first search algorithm similar to that of Porto et al. (2004). However, this method failed to converge. Since both CN and RWCO are accumulative quantities, there may not be any strategy to efficiently eliminate unsuccessful candidates in early stages of the search. Another possibility is applying a Monte Carlo method in contact map space. We have applied a variant of the multicanonical methods (Wang and Landau, 2001), but failed to find a solution exactly satisfying the 1D restraints. Nevertheless, for small proteins, the contact maps thus obtained, which best but not exactly satisfy the restraints, contained at least 30–40% of the correct native contacts, and appeared similar to the native contact map by visual inspection. Therefore, it may be possible to use such contact maps to construct starting conformations for further optimizations.

Since the three types of 1D structures, SS, CN and RWCO, are sufficient for determining the native 3D structure, it is possible to predict the native structure of a protein if we can accurately predict these 1D structures. Methods for secondary structure prediction are now quite mature and are already routinely used in de novo 3D structure prediction (Rost, 2003). We have previously developed a method to predict CN from amino acid sequence to a decent accuracy with a correlation coefficient of 0.63 (Kinjo et al., 2005). We have recently developed a simple linear regression method for RWCO prediction which yields a moderate correlation of 0.59 between the predicted and native RWCOs (Kinjo and Nishikawa, 2005). At present, we do not expect that the native 3D structure can be obtained by using the predicted 1D structures: 1D predictions of higher accuracies must be achieved. Nevertheless, if the accuracies of 1D structure prediction are sufficiently improved, the missing link between amino acid sequence and the native 3D structure of globular proteins may be completed.


    Acknowledgments
 
We thank Takehiro Nagasima for valuable comments. Most of the computations were carried out at the supercomputing facility of the National Institute of Genetics, Japan. This work was supported in part by a grant-in-aid from the MEXT, Japan.

Received on October 12, 2004; revised on February 2, 2005; accepted on February 12, 2005

    REFERENCES

    Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

    Havel, T.F. (1991) An evaluation of computational strategies for use in the determination of protein structure from distance constraints obtained by nuclear magnetic resonance. Prog. Biophys. Mol. Biol., 56, 43–78.

    Kabakçıoğlu, A., et al. (2002) Statistical properties of contact vectors. Phys. Rev. E, 65, 041904.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Kinjo, A.R. and Nishikawa, K. (2005) Predicting residue-wise contact orders of native protein structure from amino acid sequence. arXiv.org. q-bio.BM/0501015.

    Kinjo, A.R., et al. (2005) Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins, 58, 158–165.

    Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst., 24, 946–950.

    Nakai, T., et al. (1993) Intrinsic nature of the three-dimensional structure of proteins as determined by distance geometry with good sampling properties. J. Biomol. NMR, 3, 19–40.

    Plaxco, K.W., et al. (1998) Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol., 277, 985–994.

    Porto, M., et al. (2004) Reconstruction of protein structures from a vectorial representation. Phys. Rev. Lett., 92, 218101.

    Rost, B. (2003) Prediction in 1D: secondary structure, membrane helices, and accessibility. In Bourne, P.E. and Weissig, H. (eds), Structural Bioinformatics. Wiley-Liss, Inc., Hoboken, USA, pp. 559–587.

    Vendruscolo, M., et al. (1997) Recovery of protein structure from contact maps. Fold. Des., 2, 295–306.

    Wang, F. and Landau, D.P. (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett., 86, 2050–2053.

    Weiner, S.J., et al. (1986) An all atom force field for simulations of proteins and nucleic acids. J. Comput. Chem., 7, 230–252.





