Bioinformatics Advance Access originally published online on August 11, 2005
Bioinformatics 2005 21(18):3622-3628; doi:10.1093/bioinformatics/bti621
A geometric invariant-based framework for the analysis of protein conformational space
1Kanwal Rekhi School of Information Technology, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
2Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
3Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
4Department of Chemical Engineering, University of Delaware, Newark, DE 19716, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Characterization of the restricted nature of the protein local conformational space has remained a challenge, thereby necessitating a computationally expensive conformational search in protein modeling. Moreover, owing to the lack of unilateral structural descriptors, conventional data mining techniques, such as clustering and classification, have not been applied in protein structure analysis.
Results: We first map the local conformations into a fixed-dimensional space by using a carefully selected suite of geometric invariants (GIs) and then reduce the number of dimensions via principal component analysis (PCA). The distribution of the conformations in the space spanned by the first four PCs is visualized as a set of conditional bivariate probability distribution plots, in which the peaks correspond to the preferred conformations. The locations of the different canonical structures in the PC space are interpreted in terms of the weights of the GIs in the first four PCs. Clustering of the available conformations reveals that the number of preferred local conformations is several orders of magnitude smaller than previously suggested.
Contact: pramodw{at}iitb.ac.in
Supplementary information: www.it.iitb.ac.in/~ashish/bioinfo2005/
1 INTRODUCTION
Local conformations in protein structures have typically been subjected to a three-way classification: helix, strand and loop. Helices and strands are characterized by the regularity of their backbone torsion angles, whereas loops can potentially occupy a vast conformational continuum (Richardson, 1981). The Ramachandran plot was the first step in demarcating the feasible and infeasible regions of the conformational space, even for loops (Ramachandran et al., 1963). Subsequently, it was shown that loop regions comprise recurring canonical structures (Milner-White, 1986; Sibanda and Thornton, 1985). More recently, loops have been systematically classified using automated procedures based on structural similarity (Oliva et al., 1997; Wintjens et al., 1996). Loop classification has important implications for interpreting electron density maps (Sibanda et al., 1989) and for protein structure modeling (Bystroff et al., 2000).
The question of whether the protein local conformational space is restricted has been addressed more directly by Sims et al. (2005). They describe the backbone structure of an n-mer peptide by a set of 2n dihedral angles (i.e. the φ and ψ angles). Dimension reduction and visualization of this conformational space on a 2D or 3D plot show clusters corresponding to the canonical classes of local conformation. As previously discussed (Tendulkar et al., 2003; Tendulkar et al., 2004), dihedral angles are related to the volume of the tetrahedron traced by the four atoms and are therefore third order geometric invariants, which are highly sensitive to minor geometric perturbations. Thus, minor errors in assigning atom positions can cause large errors in dihedral angles. Moreover, dimension reduction techniques such as principal component analysis are not applicable to a data vector consisting of φ and ψ angles, owing to the angular identity of 0° and 360°.
The current work is an attempt to provide a systematic framework for the analysis of protein structures using geometric invariant theory. Geometric invariant theory is a well-established field in its own right (Mumford et al., 1994; Weyl, 1939), with many applications in diverse areas (Assadi et al., 2001; Hruska, 2005). Previously, we described a fine-grained clustering of local structures by mapping them into the geometric invariant (GI) space (Tendulkar et al., 2004). Our results showed that local structures are biased in favor of a finite number of conformations. In the present work, we visualize the allowed local conformational space using geometric invariant theory. We address the following key issues: (1) visualization of the allowed regions in the conformational continuum; (2) correspondence between the dense regions in the conformational space and known conformational classes, such as the α-helix, β-strand and β-hairpin; and (3) interpretation of the localization of the known conformational classes in the space spanned by the first four PCs.
2 METHODS
2.1 The dataset of local conformations and computation of geometric invariants
The local conformations were drawn from the ASTRAL_95 dataset, version 1.67 (Brenner et al., 2000), resulting in approximately $1.7 \times 10^{6}$ overlapping octapeptide fragments. We use only the Cα atoms as an approximate representation of the backbone geometry (Oldfield and Hubbard, 1994; Tendulkar et al., 2004). The geometric invariants were computed from the x, y, z coordinates of the octapeptides. A GI is a quantity that remains unchanged under a set of transformations, such as rotation and translation. The procedure for selecting and computing the GIs of octapeptide structures has been described earlier (Tendulkar et al., 2003; Tendulkar et al., 2004). The specific set of non-redundant geometric invariants that we use to describe an octapeptide backbone structure is listed in the legend to Figure 4. The sensitivity of a geometric invariant $g_i$ to a perturbation $l_j$ in geometry can be nominally represented as $\partial g_i / \partial l_j$. Based on their sensitivity to perturbations in geometry, the GIs can be grouped into three classes: (1) first order invariants, such as edges and perimeters; (2) second order invariants, such as surface areas; and (3) third order invariants, such as volumes. Of the 29 GIs used in this work, 15 are first order, 6 are second order and 7 are third order geometric invariants.
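To make the three orders of invariants concrete, the following minimal sketch computes one representative of each from a list of eight Cα coordinates: an edge length $d_{i,j}$ and a triangle perimeter (first order), a triangle area (second order) and a signed tetrahedron volume (third order). The function names, the 1-based residue indexing and the triple-product sign convention are our own illustrative assumptions; this is not the exact 29-invariant suite listed in the legend to Figure 4.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def edge(ca: Sequence[Point], i: int, j: int) -> float:
    """First-order GI d_{i,j}: distance between residues i and j (1-based)."""
    return math.dist(ca[i - 1], ca[j - 1])


def triangle_perimeter(ca: Sequence[Point], i: int, j: int, k: int) -> float:
    """First-order GI: sum of the three edge lengths."""
    return edge(ca, i, j) + edge(ca, j, k) + edge(ca, i, k)


def triangle_area(ca: Sequence[Point], i: int, j: int, k: int) -> float:
    """Second-order GI: half the norm of the cross product of two edge vectors."""
    c = cross(sub(ca[j - 1], ca[i - 1]), sub(ca[k - 1], ca[i - 1]))
    return 0.5 * math.sqrt(dot(c, c))


def tetra_volume(ca: Sequence[Point], i: int, j: int, k: int, l: int) -> float:
    """Third-order GI: signed volume (1/6) a.(b x c); the sign encodes handedness."""
    p = ca[i - 1]
    a, b, c = (sub(ca[m - 1], p) for m in (j, k, l))
    return dot(a, cross(b, c)) / 6.0
```

All four quantities are unchanged by rotations and translations of the fragment. Only the volume changes sign under a mirror reflection, which is why the third order invariants carry the handedness information that distances alone cannot.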
2.2 Principal component analysis
Algebraically, principal components are particular linear combinations of the p random variables $X_1, X_2, \ldots, X_p$ and depend solely on the covariance matrix of $X_1, X_2, \ldots, X_p$. The original data matrix [X] is an n × 29 matrix with one row per peptide and one column per geometric invariant. Standardization of [X] yields [Z], whose columns are mean-centered with unit standard deviation. Let [Z] have covariance matrix $\Sigma$ with eigenvalue–eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_{29}, e_{29})$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{29}$. Principal components are obtained by transforming the standardized data matrix [Z] as $[PC] = [Z][e]$, where $[e] = [e_1, e_2, \ldots, e_{29}]$. Note that although [X] contains dimensional quantities, such as volume and length, [Z] and [PC] contain non-dimensional quantities.
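The standardization and projection steps above can be sketched in a few lines of numpy. This is an illustrative re-implementation under our own assumptions (sample standard deviation with `ddof=1`, eigendecomposition of the covariance of [Z] via `numpy.linalg.eigh`), not the authors' code. The fraction of variance explained by PC i is $\lambda_i / \sum_k \lambda_k$, which is how a figure such as 51.38% for PC1 is obtained.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Column-wise z-scores: each GI column gets mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(X: np.ndarray):
    """Return ([PC], eigenvalues, eigenvectors), eigenvalues sorted in descending order."""
    Z = standardize(X)
    sigma = np.cov(Z, rowvar=False)       # covariance matrix of the standardized data
    lam, e = np.linalg.eigh(sigma)        # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(lam)[::-1]         # reorder so lambda_1 >= lambda_2 >= ...
    lam, e = lam[order], e[:, order]
    pcs = Z @ e                           # [PC] = [Z][e]
    return pcs, lam, e


def explained_variance(lam: np.ndarray) -> np.ndarray:
    """Fraction of the total variance carried by each PC."""
    return lam / lam.sum()
```

Because [Z] is standardized, the eigenvalues sum to the number of invariants, and the variance of each PC column equals its eigenvalue. The sign of each eigenvector is arbitrary, so only the relative signs of the weights within one PC are meaningful.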
2.3 Clustering of peptide structures
The octapeptide structures were clustered in the space spanned by the first five principal components using a standard K-means clustering algorithm with K = 150 (Matlab, Mathworks Inc., USA). Cluster indices were assigned in descending order of cluster size. The cluster centroids were then mapped onto the conditional bivariate distributions to assign cluster indices to the individual peaks visible in the plots (Fig. 5).
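The clustering and size-ordered labelling can be approximated with plain Lloyd's K-means, as in the numpy sketch below. It stands in for the Matlab routine used in this work; the farthest-first initialization and the function names are our own illustrative choices, and the all-pairs distance matrix is only practical for small samples, not for all 1.7 million fragments.

```python
import numpy as np


def _assign(P: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid for every row of P (squared Euclidean distance)."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_size_ordered(P: np.ndarray, k: int, n_iter: int = 100):
    """Lloyd's K-means; returns (labels, centroids, sizes) with label 1 = largest cluster."""
    # Farthest-first initialization: start at row 0, then repeatedly take the point
    # farthest from all centroids chosen so far (an illustrative, deterministic choice).
    idx = [0]
    d2 = ((P - P[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((P - P[nxt]) ** 2).sum(axis=1))
    C = P[idx].astype(float)
    for _ in range(n_iter):
        labels = _assign(P, C)
        new = np.array([P[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                        for c in range(k)])
        if np.allclose(new, C):
            break
        C = new
    labels = _assign(P, C)
    # Relabel in descending order of cluster size, using 1-based indices (peak 1 = largest).
    sizes = np.bincount(labels, minlength=k)
    order = np.argsort(-sizes, kind="stable")   # old cluster ids, largest first
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(1, k + 1)           # old id -> 1-based size rank
    return rank[labels], C[order], sizes[order]
```

In use, `P` would be the first five PC columns and `k = 150`; the returned centroids are ordered so that row 0 corresponds to cluster 1.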
3 RESULTS
3.1 Univariate marginal distributions of the geometric invariant values
The geometric invariants can be used to distinguish between the broad secondary structural categories, as can be seen from the marginal univariate distributions for the eight categories of GIs (Fig. 1). The distance-related GIs, which include edges and perimeters, clearly separate compact structures such as α-helices from extended structures such as β-strands (Fig. 1A-E). For example, the values of d1,4 show a multimodal distribution with a sharp peak around −1.1 that corresponds to α-helices (Fig. 1A), a broad peak around +1.3 that corresponds to β-strand structures, and the rest of the distribution occupied by other structures. Similarly, the distributions of d1,8, the perimeter of triangle1,5,8, and the perimeters of tetrahedron1,2,3,4 and tetrahedron1,3,5,7 show multiple modes (Fig. 1B-E). Each of these distributions shows a sharp peak for α-helices in the negative region and a broad peak for β-strands in the positive region. The distance-related GIs are positively correlated with one another, with correlation coefficients ranging from 0.35 to 0.97; the highest correlation coefficient is between d1,4 and the perimeter of tetrahedron1,2,3,4. Thus, the distributions of d1,4 and the perimeter of tetrahedron1,2,3,4 are similar (Fig. 1A and 1D). This high degree of correlation arises mainly from the highly regular nature of α-helices and the moderately regular nature of β-strands; it is not expected for loops. It is therefore important to use all the invariants, as they are complementary and can jointly distinguish between the different structures. For example, d1,8 shows a small but important peak around 2.0, which corresponds to β-hairpins (Fig. 1B). None of the other length-related invariants, such as the perimeter of a tetrahedron, can distinguish between, say, a hairpin turn and a diverging turn.
The area of triangle1,5,8 is related to the angle between the lines 1-5 and 5-8. Interestingly, α-helices form an approximately collinear system, resulting in area values close to 0. Tight turns, such as β-hairpins, also give area values close to zero, whereas β-strands take intermediate values. Large area values are observed for diverging turns and other loop structures. The distribution of the area of triangle1,5,8 shows two peaks, one around −1.2 and the other around 0.0 (Fig. 1F). These peaks correspond to the α-helical and β-strand structures, respectively, with the extreme positive region occupied mainly by diverging turns. Unlike the distributions of the length-related invariants, the β-strand does not occupy the extreme positive region of the distribution of area values.
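The link between the area of triangle1,5,8 and the angle between the lines 1-5 and 5-8 can be written out explicitly. Writing $\mathbf{r}_k$ for the position of Cα atom k and $\theta$ for the angle between the two lines (our notation, introduced only for this illustration; $\sin\theta$ is the same whether $\theta$ is taken at vertex 5 or as its supplement), the raw, unstandardized area is

$$
A_{1,5,8} = \tfrac{1}{2}\,\bigl\lVert (\mathbf{r}_1 - \mathbf{r}_5) \times (\mathbf{r}_8 - \mathbf{r}_5) \bigr\rVert = \tfrac{1}{2}\, d_{1,5}\, d_{5,8} \sin\theta .
$$

The area therefore vanishes both when the chain continues in nearly the same direction through Cα5 (the approximately collinear α-helix) and when it folds back so that Cα8 returns toward Cα1 (a tight hairpin). Large areas require long edges $d_{1,5}$ and $d_{5,8}$ together with an angle well away from both extremes, as in diverging turns.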
The volume of tetrahedron1,2,3,4 is related to the angle between two adjacent planes, each defined by three consecutive Cα atoms. This angle has previously been referred to as a pseudo-dihedral angle (Oldfield and Hubbard, 1994). The sign of the volume distinguishes a right-handed system from a left-handed one. For example, tetrahedron1,2,3,4 of a right-handed α-helix forms a right-handed system and yields a positive volume (Fig. 2). A β-strand forms a left-handed to nearly coplanar system and gives rise to a small negative volume. The distribution of the volume of tetrahedron1,2,3,4 captures this effectively, with a sharp peak around +1.0 for helices and two broad peaks around 0.8 and 1.5 (Fig. 1G). Of these, the peak around 0.8 corresponds to β-strands. In contrast to tetrahedron1,2,3,4, tetrahedron1,3,5,7 of a right-handed α-helix forms a left-handed system, whereas that of a β-strand is nearly coplanar (Fig. 2). Thus, the distribution of the volume of tetrahedron1,3,5,7 shows a sharp peak in the negative region corresponding to α-helices and a peak near 0.0 corresponding to β-strands (Fig. 1H). Together, the volumes of tetrahedron1,2,3,4 and tetrahedron1,3,5,7 allow us to distinguish between right-handed and left-handed systems in the local conformations.
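The opposite signs of the two helix volumes can be checked on an idealized Cα trace. The sketch below assumes textbook α-helix parameters (Cα radius 2.3 Å, rise 1.5 Å and 100° of twist per residue) and computes the signed volume as one sixth of the scalar triple product; these parameters and function names are illustrative assumptions rather than values taken from the dataset. With this convention, tetrahedron1,2,3,4 of the right-handed helix has a positive volume and tetrahedron1,3,5,7 a negative one, the mirror-image helix reverses both signs, and a planar zigzag strand gives zero.

```python
import math


def ideal_helix(n=8, radius=2.3, rise=1.5, twist_deg=100.0, right_handed=True):
    """Idealized C-alpha trace of a helix (assumed alpha-helix parameters by default)."""
    s = 1.0 if right_handed else -1.0
    pts = []
    for k in range(n):
        phi = math.radians(s * twist_deg * k)
        pts.append((radius * math.cos(phi), radius * math.sin(phi), rise * k))
    return pts


def signed_volume(p, q, r, s):
    """(1/6) (q-p) . [(r-p) x (s-p)]: positive for a right-handed arrangement."""
    a = [q[i] - p[i] for i in range(3)]
    b = [r[i] - p[i] for i in range(3)]
    c = [s[i] - p[i] for i in range(3)]
    bxc = (b[1] * c[2] - b[2] * c[1],
           b[2] * c[0] - b[0] * c[2],
           b[0] * c[1] - b[1] * c[0])
    return sum(a[i] * bxc[i] for i in range(3)) / 6.0


def tetra(ca, *idx):
    """Signed volume of the tetrahedron on 1-based residues, e.g. tetra(ca, 1, 2, 3, 4)."""
    return signed_volume(*(ca[i - 1] for i in idx))


helix = ideal_helix()
mirror = ideal_helix(right_handed=False)
strand = [(3.3 * k, (-1) ** k * 1.0, 0.0) for k in range(8)]  # planar zigzag, illustrative
```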
3.2 Interpretation of contributions of geometric invariants to principal components and location of conformations in PC space
It was of interest to analyze the separation of the different conformations along individual principal components. To that end, we examined the univariate probability distributions of the first four PCs (Fig. 3). Note that the separation between the various structural conformations on an individual PC results from the contributions of the different geometric invariants to that PC. In the eigenvector $e_i = [e_{1,i}, \ldots, e_{k,i}, \ldots, e_{p,i}]$, the magnitude of $e_{k,i}$ measures the importance of the k-th GI to the i-th PC, irrespective of the other GIs. Here we analyze the separation of the conformations on individual PCs in light of the corresponding eigenvectors.
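Reading off the dominant invariants of a PC, as done below for PC1 to PC4, amounts to ranking the entries of one eigenvector column by magnitude. A small helper of the following kind could be used; the function name and the GI labels in the usage line are hypothetical.

```python
import numpy as np


def top_loadings(e: np.ndarray, names, pc: int, n: int = 5):
    """Return the n GIs with the largest |e_{k,i}| for PC number `pc` (1-based),
    together with their signed weights."""
    w = e[:, pc - 1]
    order = np.argsort(-np.abs(w), kind="stable")[:n]   # ties keep the original GI order
    return [(names[k], float(w[k])) for k in order]
```

For example, `top_loadings(e, gi_names, 1, 4)` would list the four invariants that dominate PC1. Because an eigenvector is defined only up to sign, the interpretation rests on the relative signs of the weights within a PC rather than on their absolute signs.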
The first principal component (PC1) explains 51.38% of the variance in the data. PC1 values show a multimodal distribution with maximum separation between α-helices and β-strands. The sharp peak around +5.5 corresponds to helices, whereas the broad peak around −5.2 corresponds to β-strands (Fig. 3A). The eigenvector for PC1 has positive weights for volumes and negative weights for the other geometric invariants. The largest negative weights were observed for the perimeters of triangle1,5,8, tetrahedron1,3,5,7 and tetrahedron2,4,6,8, followed by that of tetrahedron3,4,5,6 (Fig. 4A). Thus, larger weights are observed for tetrahedra that span the entire length of the peptide and for the tetrahedron that spans its middle portion. Among volumes, the largest weight was observed for the central tetrahedron, i.e. tetrahedron3,4,5,6. The length-related invariants take large positive values for extended structures, such as β-strands, and large negative values for compact structures, such as helices (Fig. 1A-E). However, the volumes of tetrahedra traced by consecutive Cα atoms are positive for right-handed α-helices and negative or zero for left-handed and planar structures such as β-strands and loops (Fig. 1G). The PC1 value therefore gives a quick sense of both the extended/compact nature and the handedness of a local conformation. Thus, based on the PC1 weights, α-helices are expected to occupy the extreme positive end of the PC1 distribution, while β-strands occupy the extreme negative end. Other structures are located between these extremes.
PC2 explains approximately 10% of the variance in the data. PC2 values show an essentially unimodal distribution, with regular secondary structures, such as α-helices and β-strands, taking values close to 0, whereas irregular structures take positive and negative values (Fig. 3B). Thus, PC2 separates the regular secondary structures from the irregular ones. The corresponding eigenvector receives positive contributions from the length-related invariants of the tetrahedra at the N-terminus and negative contributions from those at the C-terminus (Fig. 4B). For example, the top contributors with positive weights are the perimeters of tetrahedron1,2,3,4 and tetrahedron2,3,4,5 and the edges d1,4 and d2,5, whereas those with negative weights are the perimeters of tetrahedron5,6,7,8 and tetrahedron4,5,6,7 and the edges d5,8 and d4,7. The volumes of tetrahedron1,2,3,4 and tetrahedron2,3,4,5 have negative weights, whereas those of tetrahedron5,6,7,8 and tetrahedron4,5,6,7 have positive weights. The weights associated with the volume and the perimeter of a given tetrahedron are of opposite sign owing to the negative correlation between them. Thus, for a regular structure, the values of each of these invariants are similar from the N- to the C-terminus, and their weighted contributions cancel. A PC2 value of zero therefore indicates regularity in the octapeptide structure, examples being α-helices and β-strands. For α-helical conformations, an irregularity or extended structure at the N-terminus leads to positive PC2 values, whereas an irregularity at the C-terminus leads to negative PC2 values. The opposite trend is observed for β-strands, where an irregularity or compaction at the N-terminus leads to negative PC2 values, whereas an irregularity at the C-terminus leads to positive values.
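The cancellation argument for PC2 can be illustrated with a toy end-to-end contrast. The weights below are hypothetical (+1 on two N-terminal tetrahedron perimeters, −1 on their C-terminal counterparts) and are not the fitted PC2 eigenvector; the idealized helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue) are likewise assumptions. The sketch only shows why a regular structure scores zero and why extending the N-terminal end moves the score in the opposite direction from extending the C-terminal end.

```python
import math
from itertools import combinations


def tetra_perimeter(ca, *idx):
    """Sum of the six edge lengths of the tetrahedron on 1-based residues idx."""
    pts = [ca[i - 1] for i in idx]
    return sum(math.dist(p, q) for p, q in combinations(pts, 2))


def toy_end_contrast(ca):
    """Hypothetical PC2-like score: N-terminal perimeters minus mirrored C-terminal ones."""
    n_term = tetra_perimeter(ca, 1, 2, 3, 4) + tetra_perimeter(ca, 2, 3, 4, 5)
    c_term = tetra_perimeter(ca, 5, 6, 7, 8) + tetra_perimeter(ca, 4, 5, 6, 7)
    return n_term - c_term


# Idealized alpha-helix C-alpha trace (assumed parameters)
helix = [(2.3 * math.cos(math.radians(100 * k)),
          2.3 * math.sin(math.radians(100 * k)),
          1.5 * k) for k in range(8)]

# Extend one end by pulling residue 1 or residue 8 a further 3 A along the helix axis
n_extended = [(helix[0][0], helix[0][1], helix[0][2] - 3.0)] + helix[1:]
c_extended = helix[:7] + [(helix[7][0], helix[7][1], helix[7][2] + 3.0)]
```

Here `toy_end_contrast(helix)` is zero up to rounding, `toy_end_contrast(n_extended)` is positive and `toy_end_contrast(c_extended)` is negative, mirroring the behaviour described for helical fragments.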
PC3 explains 6.12% of the variance in the data. The distribution of PC3 values is unimodal, with the peak at 1.0 corresponding to α-helices (Fig. 3C). The β-strand structures take values between −2.2 and 0.0. The majority of the irregular structures occupy the positive region of the PC3 distribution. The eigenvector for PC3 has positive contributions from the area of triangle1,5,8, the volumes of tetrahedron1,2,3,4 and tetrahedron5,6,7,8, and the edges d1,4 and d5,8; and negative contributions from the edges d1,8, d1,7 and d2,8, the volumes of tetrahedra1,2,3,4 and 5,6,7,8, and the perimeters of tetrahedron3,4,5,6, tetrahedron4,5,6,7, tetrahedron1,3,5,7 and tetrahedron2,4,6,8 (Fig. 4C). Thus, diverging turns are expected to take large positive values on PC3, whereas extended structures are expected to take large negative values. PC4 explains 5.43% of the variance in the data. PC4 values show a mostly bimodal distribution, with a sharp peak around 0.0 corresponding to the α-helix and a broad peak around 0.5 corresponding to the β-strand (Fig. 3D). Thus, PC4 provides some separation between these two regular structures. PC4 is a weighted sum of length-related invariants, with positive weights for invariants of the central tetrahedra, such as tetrahedron3,4,5,6 and tetrahedron2,3,4,5, and negative weights for d1,8, d1,7, d2,8 and the perimeter of triangle1,5,8 (Fig. 4D). Thus, extended structures are expected to take large negative values on PC4, whereas loops are expected to take positive values. The contributions to the fifth and subsequent PCs are more difficult to interpret.
3.3 Visualization of the conformational space as conditional bivariate plots in PC space
Our goal is to visualize the allowed and disallowed regions in the local conformational space spanned by the first four PCs. Direct visualization of a probability distribution in four dimensions is infeasible; we have therefore chosen to visualize the allowed conformational space as conditional bivariate plots (Fig. 5). Note that PCA provides an orthogonal but not statistically independent basis, which is why the distribution of peptide conformations is captured as conditional probability distribution plots rather than as products of marginal distributions. The plots show regions of relatively high density and regions that are either empty or sparsely populated (Fig. 5). These two types of regions correspond to preferred and disallowed conformations, respectively. We clustered the data in the PC space with a K-means clustering algorithm (K = 150), with members of a cluster representing geometrically similar octapeptide structures. Note that closeness of two peptides in the geometric invariant space is sufficient to guarantee that the peptides are superimposable, without having to compute the superimposing transformation (Tendulkar et al., 2004). Furthermore, we verified that this holds true in the reduced-dimensional PC space. We found a one-to-one correspondence between the peaks in the conditional bivariate plots and the clusters obtained by K-means clustering.
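One panel of such a conditional plot can be sketched as a normalized two-dimensional histogram of (PC1, PC2) over the fragments whose PC3 and PC4 values fall inside a chosen window. The window boundaries, the bin count and the helper name are illustrative assumptions; the exact slicing used for Figure 5 is not specified here. The sketch relies on `numpy.histogram2d` with `density=True`.

```python
import numpy as np


def conditional_bivariate(pcs, pc3_window, pc4_window, bins=40, extent=None):
    """Density of (PC1, PC2) for fragments whose PC3 and PC4 lie in the half-open
    windows [lo, hi). `pcs` is an (n, >=4) array of PC scores; `bins` is an int."""
    mask = ((pcs[:, 2] >= pc3_window[0]) & (pcs[:, 2] < pc3_window[1]) &
            (pcs[:, 3] >= pc4_window[0]) & (pcs[:, 3] < pc4_window[1]))
    n_sel = int(mask.sum())
    if n_sel == 0:
        # Empty slice: nothing to normalize, so return an all-zero grid and no edges.
        return np.zeros((bins, bins)), None, None, 0
    H, xedges, yedges = np.histogram2d(pcs[mask, 0], pcs[mask, 1],
                                       bins=bins, range=extent, density=True)
    return H, xedges, yedges, n_sel
```

Peaks in `H` then correspond to preferred conformations within that PC3/PC4 slice, and empty or sparse cells to disallowed ones. Scanning a grid of windows reproduces a panel layout of the kind shown in Figure 5.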
The peaks vary substantially in height and sharpness (Fig. 5). A peak represents a preferred conformation, and the volume under the peak is proportional to the number of octapeptide fragments adopting that conformation. The breadth of a peak along the first two PCs is a measure of the tolerance for structural perturbation around the preferred conformation. The peaks are well separated in some bivariate distributions, such as Figure 5I-K and 5M-O, and less well separated in others. The panels with well-separated peaks mainly contain regular secondary structures, such as the α-helix, the β-strand and their variants with perturbations at either end of the octapeptide.
3.3.1 Location of regular secondary structures and their variants in the PC space
The largest peak (peak 1) corresponds to the right-handed α-helix (Fig. 5I), whereas the third largest peak (peak 3) corresponds to β-strand structures (Fig. 5M). The peak for α-helical structures is much sharper than that for β-strands (Fig. 5I and 5M). The wider spread of the β-strand peak along PC1 and PC2 implies a greater tolerance of β-strands for structural perturbations. Variants of the α-helix and the β-strand are concentrated in the bivariate distributions of Figure 5J, 5N and 5O, with a few additional examples in Figure 5B, 5E and 5F. These variants arise from small structural perturbations, which produce different values on the third and fourth PCs. Interestingly, differences in the geometry of the perturbed region give rise to differences in peak location. For example, peak 4 (Fig. 5N) corresponds to a helix with a loop in the N-terminal region. Peak 4 is shifted on all four PCs relative to peak 1, the peak for a regular α-helix. Specifically, the PC1 value of peak 4 is smaller than that of peak 1, indicating reduced compactness, whereas its PC2 value is positive compared with a value of zero for peak 1, indicating a deviation from regularity in the N-terminal region. Peak 12 (Fig. 5O) takes a smaller positive PC1 value and a larger positive PC2 value than peak 4 (Fig. 5N). Thus, peak 12 indicates a greater deviation from regularity in the N-terminal region than peak 4. This agrees with the secondary structure assignments of the structures corresponding to these two peaks: peak 4 is L1H7 (one residue in a loop and seven residues in an α-helix) and peak 12 is L2H6. However, peaks 12 and 30 (Fig. 5O and 5K) share exactly the same secondary structure assignment, L2H6, but differ in their loop regions. Peak 12 corresponds to a diverging turn, whereas peak 30 corresponds to a compact turn in the N-terminal region. Thus, peak 30 takes a larger positive PC1 value than peak 12. Similarly, peaks 55 and 85 (Fig. 5B and 5F) share the secondary structure assignment L3H5 but differ in their peak positions owing to differences in their loop regions. Deviations in the C-terminal region of a helix result in negative PC2 values, as exemplified by peaks 19 and 14 (Fig. 5O and 5K), whose secondary structure assignments are H6L2 and H5L3, respectively.
It is well known that, in addition to structural deviations in the N- and C-terminal regions, helices can also deviate in their middle portion; these are conventionally known as kinked helices (Richardson, 1981). We find that peak 75 (Fig. 5E) corresponds to a kinked helix, with a small shift in the PC1 and PC4 values compared with peak 1 (Fig. 5I). The PC2 value does not deviate from that of peak 1, as the kinked helix is likely to be symmetric on both sides of the kink.
Examples of deviations in β-strand structures are visible in peaks 79 (Fig. 5O) and 44 (Fig. 5J). Although both peaks correspond to the secondary structure assignment B4L4, peak 79 is a diverging turn, whereas peak 44 has a relatively compact turn at the C-terminus. This results in a smaller negative PC1 value for peak 79. Similarly, peak 38 shows a deviation in the N-terminal region and thus takes a negative PC2 value. Other deviations in β-strand structures are not as notable as those in the α-helix, possibly because of the broad nature of the β-strand peak (Fig. 5M).
3.3.2 Location of loops in PC space
The peaks in some of the bivariate distributions, such as Figure 5A-H, 5K and 5L, are not well separated. These peaks are better separated when the bivariate distributions are conditioned on additional principal components, such as PC5 (data not shown). These bivariate distributions are dominated by loops, such as helix-loop-helix and β-strand-loop-helix motifs. Based on the weight matrix for the principal components, compact loops (e.g. β-hairpin, helix-hairpin) are expected to take positive PC1 values, whereas extended loops (e.g. diverging β-turns) take zero or negative values. Moreover, right-handed turns take larger positive PC1 values than their left-handed counterparts. This is attributed to the significant contribution of tetrahedron volumes to PC1, since the volumes are positive for right-handed turns. For example, peaks 85 (Fig. 5F), 56 (Fig. 5H), 37 (Fig. 5D) and 26 (Fig. 5C) all correspond to compact loop structures, whereas peaks 25 (Fig. 5E), 41 (Fig. 5J) and 97 (Fig. 5G) are examples of diverging turns. We find several peaks with identical secondary structure nomenclature but different peak locations. For example, peaks 86, 95 (Fig. 5L) and 37 (Fig. 5D) share the secondary structure assignment B2L5B but differ in their loop structures. These and other loop peaks correspond to previously reported loop conformations (Espadaler et al., 2004; Milner-White, 1986; Oliva et al., 1997).
4 DISCUSSION
It has been previously reported that the protein local conformational space is highly restricted. Visualizing this conformational space has nevertheless remained a challenge, mainly because of the need for pairwise comparison and alignment of structures (Wangikar et al., 2003) and the lack of unilateral structure descriptors. Recent reports by Sims et al. (2005) and Ikeda et al. (2005) are positive steps toward visualization of the conformational space. Both essentially use a set of geometric invariants as unilateral descriptors of local conformation: Sims et al. use only third order geometric invariants (i.e. a set of dihedral angles), whereas Ikeda et al. use only first order geometric invariants (i.e. a set of distances). We used a collection of geometric invariants of the first order (lengths), the second order (areas) and the third order (volumes). GIs of different orders are fairly uncorrelated with one another and are thus non-redundant. Furthermore, GIs of different orders differ in their sensitivity to perturbations in the geometric structure. For example, first order GIs such as distances are insensitive to handedness and therefore take identical values for non-superimposable mirror images. It is therefore important to carefully select geometric invariants that uniquely describe a geometric structure while minimizing redundancy by omitting highly correlated invariants.
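The requirement to omit highly correlated invariants can be made operational in several ways. One simple, hypothetical option is a greedy filter that walks through the candidate GIs in a fixed order and discards any candidate whose absolute Pearson correlation with an already-retained invariant exceeds a threshold. The threshold and the function name are our assumptions, and this is not necessarily the selection procedure of Tendulkar et al. (2003, 2004). Note that the suite used here retains d1,4 and the perimeter of tetrahedron1,2,3,4 despite a correlation of 0.97, so a threshold above that value would be needed to keep both.

```python
import numpy as np


def drop_correlated(X: np.ndarray, names, threshold: float = 0.98):
    """Greedy filter: keep columns in their given order, skipping any column whose
    |Pearson r| with an already-kept column exceeds `threshold`."""
    R = np.corrcoef(X, rowvar=False)          # GI-by-GI correlation matrix
    kept = []
    for j in range(X.shape[1]):
        if all(abs(R[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], X[:, kept]
```

Because the filter is order-dependent, listing the invariants that are known to be informative (for example those carrying handedness) first ensures they are the ones retained.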
The distribution of the conformations in the geometric invariant-based PC space provides a visual map of the allowed and disallowed conformations. Furthermore, various other known canonical structures, such as N-capping helices (Aurora et al., 1994) and loops (Milner-White, 1986; Oliva et al., 1997; Wintjens et al., 1996), are well separated in the space spanned by the first four PCs. Note that the weights of the geometric invariants in the PCs were determined automatically by PCA, which maximizes explained variance without any structural knowledge; the resulting PCs nevertheless separate the major canonical structures. This implies that we have been able to select a suite of geometric invariants that provides an adequate description of the Cα geometry. Thus, the strategy presented here, computing geometric invariants and then reducing the dimensions via PCA, can be used directly by structural biologists for various kinds of structure analysis.
The method presented here can be applied to visualize the local conformational space using a peptide of arbitrary length as the unit of local conformation. Although the observed distribution of structures in the PC space depends on the peptide length and the selected geometric invariants, we expect the conclusion that the local conformational space is restricted to remain unchanged as long as a reasonable suite of geometric invariants is selected for a reasonable peptide length.
The method presented here has potential applications in protein structure prediction and validation. Current protein structure prediction algorithms search a vast conformational space using computationally expensive energy minimization protocols (Sali and Blundell, 1990; Tramontano, 1998). A map of the allowed and disallowed regions of the conformational space offers a way to eliminate disallowed conformations, with potentially significant savings in computational time. Moreover, the size of a peak in the distribution indicates the likelihood of the corresponding structure occurring in a randomly selected natural protein, which can be useful for checking the integrity of both predicted and experimentally determined structures. Furthermore, it would be of interest to examine the distributions of local conformations for proteins made up of unnatural or D-amino acids; we expect such proteins to adopt conformations that are typically forbidden for proteins made up of natural L-amino acids.
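A minimal, hypothetical sketch of the screening idea: map a fragment's PC scores to the nearest cluster centroid, report that cluster's share of all fragments as a rough likelihood proxy, and flag the fragment as lying in a sparsely populated (possibly disallowed) region when it is farther than a cutoff from every centroid. The cutoff `max_dist`, the use of Euclidean distance in PC space and the function name are assumptions for illustration; the cluster sizes and centroids would come from the K-means step.

```python
import numpy as np


def score_fragment(pc_vec, centroids, sizes, max_dist):
    """Return (1-based cluster index, population fraction), or (None, 0.0) when the
    fragment is farther than `max_dist` from every centroid in PC space."""
    d = np.linalg.norm(centroids - pc_vec, axis=1)
    c = int(d.argmin())
    if d[c] > max_dist:
        return None, 0.0
    return c + 1, float(sizes[c] / sizes.sum())
```

A predicted or experimental model could then be scanned fragment by fragment, with unusually many `None` results or very small population fractions pointing to regions worth re-examining.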
Acknowledgments
The authors acknowledge useful discussions with Sunita Sarawagi of the Indian Institute of Technology Bombay. This work was partially funded by a grant from the Council of Scientific and Industrial Research, Government of India. AVT gratefully acknowledges support from the Infosys Fellowship.
Conflict of Interest: none declared.
Received on June 16, 2005; revised on August 5, 2005; accepted on August 8, 2005
REFERENCES
Assadi, A., et al. (2001) A learning theoretic approach to perceptual geometry in natural scenes. Neurocomputing, 38, 1077-1085.
Aurora, R., et al. (1994) Rules for alpha-helix termination by glycine. Science, 264.
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
Brenner, S.E., et al. (2000) The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res., 28, 254-256.
Bystroff, C., et al. (2000) HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol., 301, 173-190.
Espadaler, J., et al. (2004) ArchDB: automated protein loop classification as a tool for structural genomics. Nucleic Acids Res., 32, D185-D188.
Hruska, G.C. (2005) Geometric invariants of spaces with isolated flats. Topology, 44, 441-458.
Ikeda, K., et al. (2005) Visualization of conformational distribution of short to medium size segments in globular proteins and identification of local structural motifs. Protein Sci., 14, 1253-1265.
Milner-White, E.J. (1986) Classification of beta-hairpin turns. Biochem. Soc. Trans., 14, 877.
Mumford, D., Fogarty, J. and Kirwan, F. (1994) Geometric Invariant Theory. Springer, New York.
Oldfield, T.J. and Hubbard, R.E. (1994) Analysis of C-alpha geometry in protein structures. Proteins, 18, 324-337.
Oliva, B., et al. (1997) An automated classification of the structure of protein loops. J. Mol. Biol., 266, 814-830.
Ramachandran, G.N., et al. (1963) Conformation of polypeptides and proteins. J. Mol. Biol., 7, 95-99.
Richardson, J.S. (1981) The anatomy and taxonomy of protein structure. Adv. Protein Chem., 34, 167-339.
Sali, A. and Blundell, T.L. (1990) Definition of general topological equivalence in protein structures: a procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J. Mol. Biol., 212, 403-428.
Sibanda, B.L., et al. (1989) Conformation of β-hairpins in protein structures: a systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol., 206, 759-777.
Sibanda, B.L. and Thornton, J.M. (1985) Beta-hairpin families in globular proteins. Nature, 316, 170-175.
Sims, G.E., et al. (2005) Protein conformational space in higher order φ-ψ maps. Proc. Natl Acad. Sci. USA, 102, 618-621.
Tendulkar, A.V., et al. (2004) Clustering of protein structural fragments reveals modular building block approach of nature. J. Mol. Biol., 338, 611-629.
Tendulkar, A.V., et al. (2003) Parameterization and classification of the protein universe via geometric techniques. J. Mol. Biol., 334, 157-172.
Tramontano, A. (1998) Homology modeling with low sequence identity. Methods, 14, 293-300.
Wangikar, P.P. (2003) Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J. Mol. Biol., 326, 955-978.
Weyl, H. (1939) The Classical Groups, Their Invariants and Representations. Princeton University Press, Princeton.
Wintjens, R.T. (1996) Automatic classification and analysis of αα-turn motifs in proteins. J. Mol. Biol., 255, 235-253.