

Bioinformatics Advance Access originally published online on August 29, 2006
Bioinformatics 2006 22(21):2695-2696; doi:10.1093/bioinformatics/btl461

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Bio3d: an R package for the comparative analysis of protein structures

Barry J. Grant 1,*, Ana P. C. Rodrigues 2, Karim M. ElSawy 3, J. Andrew McCammon 1,4 and Leo S. D. Caves 3

1 Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093, USA
2 The Burnham Institute for Medical Research, La Jolla, CA 92037, USA
3 Department of Biology, University of York, York YO10 5YW, UK
4 Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: An automated procedure for the analysis of homologous protein structures has been developed. The method facilitates the characterization of internal conformational differences and inter-conformer relationships and provides a framework for the analysis of protein structural evolution. The method is implemented in bio3d, an R package for the exploratory analysis of structure and sequence data.

Availability: The bio3d package is distributed with full source code as a platform-independent R package under a GPL2 license from: http://mccammon.ucsd.edu/~bgrant/bio3d/

Contact: bgrant{at}mccammon.ucsd.edu


    1 INTRODUCTION
 
The detailed comparison of homologous protein structures can be used to infer pathways for evolutionary adaptation and, at closer evolutionary distances, mechanisms for conformational change. Traditionally, such investigations have involved careful visual inspection combined with structural alignment methods. These procedures are both time consuming and labor intensive, and require expert insight into the systems studied. With the growing number of determined protein structures, the availability of automatic procedures for analyzing the differences and similarities between structures becomes increasingly desirable.

The bio3d package contains utilities to process, organize and explore structure and sequence data. Features include the ability to read and write structure, sequence and dynamic trajectory data, and to perform atom summaries, atom selection, re-orientation, superposition, rigid core identification, clustering, distance matrix analysis, structure and sequence conservation analysis, and principal component analysis (PCA). Bio3d takes advantage of the extensive graphical and statistical capabilities of the R environment (R Development Core Team, 2006; http://www.R-project.org), and thus represents a useful framework for exploratory analysis of structural data.


    2 COMPARATIVE ANALYSIS OF PROTEIN STRUCTURES WITH Bio3d
 
The bio3d package employs refined structural superposition and PCA to examine the relationship between different conformers. Conventionally, superposition of protein structures minimizes the root mean square difference over the full set of equivalent residues. However, for the current application such a superposition procedure can be inappropriate. For example, in the comparison of a multi-domain protein that has undergone a hinge-like rearrangement of its domains, standard ‘all atom’ superposition attempts to fit all domains at once (whole structure superposition) and therefore underestimates the true atomic displacement. A more appropriate and insightful superposition would be anchored at the most invariant region and hence highlight the domain rearrangement more clearly (sub-structure superposition). To avoid such problems, the current protocol includes an iterated superposition procedure, in which the residues displaying the largest positional differences are excluded at each round until only the invariant ‘core’ residues remain (Gerstein and Altman, 1995).
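
As a rough illustration of the iterated procedure, the sketch below uses Python with NumPy rather than bio3d itself. It assumes only two conformers with equivalent atoms in matching order, a least-squares (Kabsch) fit, and a simple distance cutoff as the stopping rule. The function names `kabsch_fit` and `find_core`, and the `cutoff` parameter with its default value, are hypothetical and are not part of the bio3d interface; the criterion used by bio3d and by Gerstein and Altman (1995) may differ.

```python
import numpy as np

def kabsch_fit(mobile, target, idx):
    """Least-squares superposition of `mobile` onto `target`.

    The rotation and translation are derived from the atoms in `idx` only,
    but are applied to every atom of `mobile`. Arrays have shape (n_atoms, 3).
    """
    mob_c = mobile[idx].mean(axis=0)
    tgt_c = target[idx].mean(axis=0)
    P = mobile[idx] - mob_c
    Q = target[idx] - tgt_c
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (mobile - mob_c) @ R.T + tgt_c

def find_core(a, b, cutoff=1.0):
    """Iteratively drop the worst-fitting atom until all remaining atoms
    deviate by no more than `cutoff` (an assumed stopping rule).

    Returns the indices of the remaining 'core' atoms and the coordinates
    of `a` superposed onto `b` using that core.
    """
    idx = np.arange(len(a))
    while len(idx) > 3:
        fitted = kabsch_fit(a, b, idx)
        dev = np.linalg.norm(fitted[idx] - b[idx], axis=1)
        if dev.max() <= cutoff:
            break
        idx = np.delete(idx, np.argmax(dev))  # exclude the largest outlier
    return idx, kabsch_fit(a, b, idx)
```

Calling `find_core(a, b)` on two conformers returns the core indices together with the core-fitted copy of `a`; per-atom displacements measured after this fit reflect the rearrangement relative to the invariant region rather than a compromise over the whole structure.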

Following core identification and subsequent superposition, PCA is employed to examine the relationship between different conformers/structures based on their equivalent residues. The application of PCA to both distributions of experimental structures and Molecular Dynamics trajectories, along with its ability to provide considerable insight into the nature of conformational differences in a range of protein families and other biomolecules, has been discussed previously (Abseher et al., 1998; Caves et al., 1998; Elsawy et al., 2005; van Aalten et al., 1997). Briefly, the resulting principal components (orthogonal eigenvectors) describe the axes of maximal variance of the distribution of structures. Projection of the distribution onto the subspace defined by the largest principal components results in a lower dimensional representation of the structural dataset. The percentage of the total mean square displacement (or variance) of atom positional fluctuations captured in each dimension is characterized by the corresponding eigenvalue. Experience suggests that 3–5 dimensions are often sufficient to capture over 70% of the total variance in a given family of structures. Thus, a handful of principal components can provide a useful description while still retaining most of the variance in the original distribution. These low-dimensional representations, here termed ‘conformer plots’, succinctly display the relationships between different conformers, highlight the major differences between structures and enable the interpretation and characterization of multiple interconformer relationships (see example conformer plot, Fig. 1).
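
The PCA step can be sketched in a few lines of Python with NumPy. This is not the bio3d implementation: it assumes the structures have already been superposed and flattened to one row of 3N coordinates each, and it diagonalizes the sample covariance matrix. The names `pca_conformers` and `n_components_for` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def pca_conformers(xyz):
    """PCA of superposed structures.

    xyz: array of shape (n_structures, 3 * n_atoms).
    Returns the mean structure, eigenvalues (descending), eigenvectors
    (as columns), projections of each structure onto the components, and
    the percentage of total variance captured by each component.
    """
    mean = xyz.mean(axis=0)
    centered = xyz - mean
    cov = centered.T @ centered / (len(xyz) - 1)
    evals, evecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)  # remove tiny negative round-off
    evecs = evecs[:, order]
    scores = centered @ evecs                 # coordinates for a conformer plot
    explained = 100.0 * evals / evals.sum()
    return mean, evals, evecs, scores, explained

def n_components_for(explained, threshold=70.0):
    """Smallest number of leading components whose cumulative share of the
    variance reaches `threshold` percent."""
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    return min(k, len(explained))
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a two-dimensional view of the kind the text calls a conformer plot, and `n_components_for(explained)` reports how many dimensions are needed to pass the 70% mark for a given dataset.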


Fig. 1 Results of PCA on the kinesin molecular motor using standard Euclidean distance. (a) Conformer plot: Projection of the kinesin X-ray structures (circles) and transient MD conformers (shaded density contours) onto the principal planes obtained from analysis of all kinesin X-ray structures. (b) Eigenvalue spectrum: Results obtained from diagonalization of the atomic displacement correlation matrix of Cα atom coordinates from the kinesin crystal structures. Inset shows histograms for the projection of the distribution of structures onto the first six principal components. (c) Interpolation: Front and back views of the kinesin motor domain, with the first principal component represented as equidistant atomic displacements from the mean structure. Displacements are scaled by two times the standard deviation of the distribution along the first principal component. Molecular figure was generated using VMD (Humphrey et al., 1996).

 
To further aid interpretation, a graphic ‘trajectory’ can be produced that interpolates between the most dissimilar structures in the distribution, as determined from the conformer plots. This involves dividing the difference between the conformers into a number of evenly spaced steps along the principal components, forming the frames of the trajectory. Such trajectories can be directly visualized in a molecular graphics program, such as VMD (Humphrey et al., 1996). Furthermore, the interpolated structures can be analyzed for possible domain and shear movements with the DynDom package (Hayward and Berendsen, 1998), or used as initial seed structures for more advanced reaction path refinement methods such as Conjugate Peak Refinement (Fischer and Karplus, 1992).
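
A minimal sketch of such an interpolation, again in Python with NumPy and not taken from bio3d, is given below. It assumes a single principal component, a mean structure and eigenvector stored as flat arrays of 3N coordinates, and end points given as projections (scores) along that component. The function name `interpolate_along_pc` and the `n_frames` parameter are hypothetical.

```python
import numpy as np

def interpolate_along_pc(mean, evec, score_start, score_end, n_frames=10):
    """Evenly spaced frames between two positions along one principal component.

    mean, evec: arrays of length 3 * n_atoms (mean structure and eigenvector).
    score_start, score_end: projections of the two end-point conformers.
    Returns an array of shape (n_frames, 3 * n_atoms), one frame per row.
    """
    steps = np.linspace(score_start, score_end, n_frames)
    # Each frame is the mean structure displaced along the eigenvector.
    return mean + np.outer(steps, evec)
```

Taking the minimum and maximum scores along the first component as end points yields frames spanning the most dissimilar structures in that dimension; each row can be reshaped to (n_atoms, 3) and written out in whatever format the chosen molecular graphics program reads.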


    3 SUMMARY
 
The bio3d comparative analysis results are in good agreement with descriptions established by human experts (Grant, 2004). In addition, the tools provide quantitative and visual information allowing for a more complete appreciation of interconformer relationships. The open source software, full documentation, a quick start guide and example data are available at http://mccammon.ucsd.edu/~bgrant/bio3d/


    4 CONCLUSIONS AND PERSPECTIVES
 
The structure comparison procedures described here should facilitate the examination of diverse protein families, helping to identify common structural and dynamic features. Such analysis of structural homologues can provide invaluable conformational landmarks useful for assessing both new crystallographic structures and the results of theoretical methods. More generally, the current analysis methods may prove valuable to any study in which backbone flexibility must be modeled, for example in flexible protein–protein docking and in the generation of homology models, where sampling along identified principal components may generate plausible alternative conformations. Another important area of research is deciphering possible networks of communication within proteins and, in particular, understanding allosteric mechanisms that appear to be preserved in distant relatives. Theoretical studies combined with comparative analysis of structural homologues are an initial step in this direction.


    Acknowledgments
 
We would like to thank members of the Caves and McCammon groups for fruitful and entertaining discussions. This work was supported in part by the National Institutes of Health, National Science Foundation, the Howard Hughes Medical Institute, the National Biomedical Computation Resource and the National Science Foundation Center for Theoretical Biological Physics. Funding to pay the Open Access publication charges was provided by The Howard Hughes Medical Institute.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on July 18, 2006; revised on August 22, 2006; accepted on August 23, 2006

    REFERENCES
 

    Abseher, R., et al. (1998) Essential spaces defined by NMR structure ensembles and molecular dynamics simulation show significant overlap. Proteins, 31, 370–382.

    Caves, L.S.D., et al. (1998) Locally accessible conformations of proteins: multiple molecular dynamics simulations of crambin. Protein Sci., 7, 649–666.

    Elsawy, K.M., et al. (2005) The physical determinants of the DNA conformational landscape. Nucleic Acids Res., 33, 5749–5762.

    Fischer, S. and Karplus, M. (1992) Conjugate peak refinement: an algorithm for finding reaction paths and accurate transition states in systems with many degrees of freedom. Chem. Phys. Lett., 194, 252–261.

    Gerstein, M. and Altman, R.B. (1995) Average core structures and variability measures for protein families: application to the immunoglobulins. J. Mol. Biol., 251, 161–175.

    Grant, B.J. (2004) Kinesin sequence, structure and dynamics. PhD Thesis, University of York, York, UK.

    Hayward, S. and Berendsen, H. (1998) Systematic analysis of domain motions in proteins from conformational change: new results on citrate synthase and T4 lysozyme. Proteins, 30, 144–154.

    Humphrey, W., et al. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 33–38.

    R Development Core Team. (2006) R: a language and environment for statistical computing. Vienna, Austria.

    van Aalten, D.M.F., et al. (1997) Protein dynamics derived from clusters of crystal structures. Biophys. J., 73, 2891–2896.

