Bioinformatics Advance Access originally published online on June 2, 2005
Bioinformatics 2005 21(15):3316-3317; doi:10.1093/bioinformatics/bti523
PROVAT: a tool for Voronoi tessellation analysis of protein structures and complexes
Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: Voronoi tessellation has proved to be a useful tool in protein structure analysis. We have developed PROVAT, a versatile public domain software that enables computation and visualization of Voronoi tessellations of proteins and protein complexes. It is a set of Python scripts that integrate freely available specialized software (Qhull, Pymol etc.) into a pipeline. The calculation component of the tool computes Voronoi tessellation of a given protein system in a way described by a user-supplied XML recipe and stores resulting neighbourhood information as text files with various styles. The Python pickle file generated in the process is used by the visualization component, a Pymol plug-in, that offers a GUI to explore the tessellation visually.
Availability: PROVAT source code can be downloaded from http://raven.bioc.cam.ac.uk/~swanand/Provat1, which also provides a webserver for its calculation component, documentation and examples.
Contact: swanand{at}cryst.bioc.cam.ac.uk
| 1 INTRODUCTION |
|---|
Voronoi tessellation has emerged as an effective tool in protein structure analysis (Poupon, 2004). It is essentially a space-filling division of space among given points (sites), assigning a convex polyhedral region of space, bounded or unbounded, to each. Previously it has been used for estimation of atomic volumes, detection of cavities, analysis of folds, derivation of statistical potentials and so on. A major attraction of Voronoi tessellation is the robust neighbourhood definition between sites.
A computer application of Voronoi tessellation should be versatile and flexible. It should allow choice of sites at different levels of detail, e.g. one site per residue, one site each for sidechain and mainchain and one site per atom, in order to facilitate the study of packing. It should be possible to solvate the structure using several different approaches and to detect the missing atoms so that erroneous volumes and neighbour relationships are not obtained, but the program should also be capable of tolerating missing atoms, for example in Cα-only models. Thus generating sites is not trivial. Moreover, visualization of tessellation is important for the obvious benefits associated with it.
While these issues can be tackled individually with specialized tools, doing so makes it tedious to calculate and visualize a tessellation of the desired kind, and difficult to experiment with different tessellation strategies. PROVAT provides a flexible tessellation pipeline that can cope with all these challenges. Site extraction, missing-atom checks, solvation and the style of the contact description file can be altered at the command line or through the webserver. Visualization can be performed on all platforms that run Pymol, using a versatile plug-in.
| 2 PROGRAM OVERVIEW |
|---|
PROVAT defines a metasite as any group of atoms, e.g. a peptide group, an aromatic ring, a Cα atom etc. Coordinates of member atoms in a metasite can be averaged to produce a single site for tessellation, or the coordinates of each member atom can be treated as a separate site, in which case metasite polyhedra may be non-convex. Metasites can be assigned a physicochemical character, which can later be used for grouping metasites together, for colouring faces and so on. Metasites of interest are specified in an XML file called a tessellation recipe. A reference frame consisting of three atoms can be specified for each metasite, so that the orientation of neighbouring metasites can be output if desired, in addition to the area of shared faces.
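The two ways of turning a metasite into tessellation sites can be sketched as follows. This is a minimal illustration and not PROVAT's own code: the dictionary layout, the function name `metasite_sites`, the coordinates and the physicochemical label are all hypothetical.

```python
from statistics import fmean

# Hypothetical metasite: a name, a physicochemical type and member atom
# coordinates in Å. None of these values come from a real structure.
peptide_group = {
    "name": "peptide_12_13",
    "type": "partially negative",
    "atoms": {
        "C": (1.0, 0.0, 0.0),
        "O": (1.0, 1.2, 0.0),
        "N": (2.2, -0.6, 0.0),
    },
}


def metasite_sites(metasite, average=True):
    """Return the tessellation sites contributed by one metasite."""
    coords = list(metasite["atoms"].values())
    if average:
        # One site per metasite: the unweighted mean of the member coordinates.
        return [tuple(fmean(axis) for axis in zip(*coords))]
    # One site per member atom: the metasite region is then the union of the
    # atoms' Voronoi cells, which need not be convex.
    return coords


averaged = metasite_sites(peptide_group)
per_atom = metasite_sites(peptide_group, average=False)
```

With averaging, the metasite contributes a single convex polyhedron; without it, the three atomic cells would have to be merged after tessellation.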
PROVAT detects missing protein backbone atoms and missing intermediate residues by checking the distance between consecutive Cα atoms in the same chain. Missing sidechain atoms are detected and can be corrected with Scwrl (Canutescu et al., 2003). All quality checks can be turned off if desired.
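A distance check of this kind might look like the sketch below. It relies on the fact that Cα atoms of residues joined by a trans peptide bond lie about 3.8 Å apart. The cutoff of 4.2 Å, the function name `find_chain_breaks` and the input layout are assumptions made for illustration; the threshold PROVAT actually applies is not stated here.

```python
import math


def find_chain_breaks(ca_atoms, max_ca_distance=4.2):
    """Flag consecutive C-alpha pairs that are too far apart to be bonded.

    ca_atoms: list of (residue_number, (x, y, z)) in chain order.
    max_ca_distance: assumed cutoff in Å, not a documented PROVAT value.
    Returns a list of (residue_a, residue_b, distance) tuples.
    """
    breaks = []
    for (res_a, xyz_a), (res_b, xyz_b) in zip(ca_atoms, ca_atoms[1:]):
        distance = math.dist(xyz_a, xyz_b)
        if distance > max_ca_distance:
            breaks.append((res_a, res_b, round(distance, 2)))
    return breaks


# Made-up chain: residues 13 and 14 are absent, so 12 and 15 are far apart.
chain = [
    (10, (0.0, 0.0, 0.0)),
    (11, (3.8, 0.0, 0.0)),
    (12, (7.6, 0.0, 0.0)),
    (15, (18.0, 0.0, 0.0)),
]
gaps = find_chain_breaks(chain)
```

Here `gaps` holds one entry, for the pair of residues 12 and 15.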
PROVAT offers two ways to solvate the given structure. The first option uses the molecular dynamics package Gromacs (Lindahl et al., 2001); this is recommended for high quality structures that contain all the expected atoms. A second simpler way is to construct solvent atoms (oxygens only, all hydrogens are ignored) on a cubic grid around the protein, parameterized with the minimum distance between any pair of solvent atoms (dll) and the minimum distance between any pair of solvent and protein atoms (dpl). This approach has been validated by Zimmer et al. (1998) with dll and dpl recommended as 3 and 4 Å respectively, though these values are adjustable in PROVAT.
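The grid-based option can be illustrated with a short sketch. It assumes that the grid spacing is set equal to dll, so that no two solvent oxygens are closer than dll, and that grid points closer than dpl to any protein atom are discarded. The function name `grid_solvent` and the `padding` parameter, which sets how far the solvent box extends beyond the protein, are hypothetical.

```python
import math
from itertools import product


def grid_solvent(protein_atoms, d_ll=3.0, d_pl=4.0, padding=8.0):
    """Place solvent oxygens on a cubic grid around the protein.

    d_ll: grid spacing in Å, hence the minimum solvent-solvent distance.
    d_pl: minimum allowed solvent-protein distance in Å.
    padding: assumed margin of the solvent box around the protein, in Å.
    """
    lows = [min(atom[i] for atom in protein_atoms) - padding for i in range(3)]
    highs = [max(atom[i] for atom in protein_atoms) + padding for i in range(3)]
    axes = []
    for low, high in zip(lows, highs):
        n_points = int((high - low) // d_ll) + 1
        axes.append([low + k * d_ll for k in range(n_points)])
    # Keep only grid points that are far enough from every protein atom.
    return [
        point
        for point in product(*axes)
        if all(math.dist(point, atom) >= d_pl for atom in protein_atoms)
    ]


# Two made-up protein atoms stand in for a real structure.
protein = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
waters = grid_solvent(protein)
```

The all-against-all distance test is quadratic in the number of atoms; a real structure would call for a spatial index, which is omitted here for brevity.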
PROVAT extracts sites from the structure according to a tessellation recipe and uses Qhull (Barber et al., 1996) to compute the tessellation. Various styles (all neighbours with or without orientation, neighbours grouped by physicochemical nature, exposed surface) are provided for storing the neighbourhood information thus derived.
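To make the neighbourhood definition concrete, the sketch below finds Voronoi neighbours for a handful of sites. It does not call Qhull: it substitutes a brute-force empty-circumsphere test, using the fact that, for sites in general position, two Voronoi cells share a face exactly when the sites are joined by an edge of the Delaunay triangulation. All function names are hypothetical, and the cost grows so quickly with the number of sites that this is suitable only for toy inputs.

```python
import math
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Centre and radius of the sphere through four points, or None if coplanar."""
    others = (p1, p2, p3)
    rows = [[2 * (p[k] - p0[k]) for k in range(3)] for p in others]
    rhs = [sum(p[k] ** 2 - p0[k] ** 2 for k in range(3)) for p in others]
    d = det3(rows)
    if abs(d) < 1e-12:
        return None
    centre = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(det3(m) / d)
    return tuple(centre), math.dist(centre, p0)


def voronoi_neighbours(sites, eps=1e-9):
    """Index pairs (i, j), i < j, of sites whose Voronoi cells share a face."""
    pairs = set()
    for tet in combinations(range(len(sites)), 4):
        sphere = circumsphere(*(sites[i] for i in tet))
        if sphere is None:
            continue
        centre, radius = sphere
        # A tetrahedron is Delaunay if no other site lies inside its circumsphere.
        if all(math.dist(centre, s) >= radius - eps
               for j, s in enumerate(sites) if j not in tet):
            pairs.update(combinations(tet, 2))
    return pairs


# A central site surrounded by six sites on the coordinate axes.
sites = [(0, 0, 0), (2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
neighbours = voronoi_neighbours(sites)
```

In this arrangement the central site is a neighbour of all six outer sites, whereas opposite outer sites such as indices 1 and 2 are separated by the central cell and are not neighbours, regardless of any distance cutoff.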
For visualization, the Python object containing tessellation information is pickled. Pickling is Python's object serialization mechanism, used here for object persistence. The pickled object is later read into the visualization module, which is a Pymol (DeLano, 2002) plug-in. Since Pymol is a popular cross-platform molecular visualization package, it is possible to render tessellations alongside the usual Pymol features for exploring the structure, which might not be possible with a stand-alone visualizer. With the PROVAT plug-in it is possible to render (and store as a text file) Voronoi regions associated with metasites, their neighbours, solvent-exposed protein surface, interaction surface between protein and another protein/ligand/DNA, sticks representation of neighbours and so on for metasites of interest in the structure. Transparency of faces, colours and stick width can be altered. It is possible to ignore faces between covalently bonded metasites while writing neighbourhood information files and visualizing surfaces. Lists of metasites can be constructed for selective visualization of surfaces, polyhedra and interfaces.
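The hand-over between the calculation and visualization components can be sketched with the standard `pickle` module. The dictionary below is a hypothetical stand-in for the tessellation object, whose actual layout is not described here; the metasite names, the face area and the file name are made up.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the tessellation object produced by the
# calculation step. All names and numbers are illustrative only.
tessellation = {
    "structure": "1A1F",
    "metasites": {
        0: {"name": "ring_A", "type": "aromatic"},
        1: {"name": "solvent_17", "type": "solvent"},
    },
    # Shared-face area in Å^2, keyed by a pair of metasite indices.
    "faces": {(0, 1): {"area": 6.4}},
}


def save_tessellation(obj, path):
    """Write the object to disk at the end of the calculation step."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_tessellation(path):
    """Read the object back, as a visualization plug-in would."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "tessellation.pkl"
    save_tessellation(tessellation, pickle_path)
    restored = load_tessellation(pickle_path)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded when they come from a trusted source.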
We have chosen flexibility over speed in developing PROVAT. For a 300 residue system, it takes about 15 s to run PROVAT on an Intel Pentium 4, 3 GHz PC with 512 MB RAM running Suse Linux 9.2. PROVAT has been developed on a Linux platform owing to its popularity in the bioinformatics community. Due to decoupling of calculation and visualization components, command-line execution for large-scale analysis of structures is possible.
| 3 ILLUSTRATION |
|---|
Here we illustrate PROVAT visualization features with PDB entry 1A1F, in which three zinc finger motifs are complexed with DNA (Fig. 1). The tessellation recipe used here has protein atoms grouped among metasites with five physicochemical types: partially positive (green), partially negative (red), hydrophobic (grey), aromatic (yellow) and solvent (blue). Phosphate oxygens in DNA are assigned as partially negative, whereas separate metasites are defined for sugars, bases and zinc. The exposed surface of the system, generated with PROVAT, is shown in Figure 1a. The interface between protein and DNA is shown in Figure 1b. Using PROVAT, the complex environment of a residue (His-153 here) can be visualized with its polyhedron coloured according to the physicochemical nature of neighbours (Fig. 1c).
The hydrogen bond network can be viewed as sticks between spatially adjacent metasites of mainchain carbonyl and amide groups (Fig. 1d). Specific interactions between protein and DNA can be viewed between adjacent metasites of protein sidechains and DNA bases (Fig. 1e). Similarly, non-specific interactions can be viewed between adjacent metasites of mainchain carbonyls/amides and DNA phosphate oxygens (Fig. 1f). More illustrations and usage instructions for PROVAT can be found online.
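Selections of this kind amount to filtering the list of shared faces by the annotations of the two metasites involved. The sketch below shows the idea on made-up data; the dictionary layout, the metasite names, the face areas and both function names are hypothetical and do not reflect PROVAT's file formats or the contacts actually present in 1A1F.

```python
from collections import defaultdict

# Made-up metasite annotations: which molecule each belongs to and its type.
metasites = {
    "P1_ring": {"molecule": "protein", "type": "aromatic"},
    "P2_sidechain": {"molecule": "protein", "type": "partially positive"},
    "D1_base": {"molecule": "DNA", "type": "base"},
    "D1_phosphate": {"molecule": "DNA", "type": "partially negative"},
    "W1": {"molecule": "solvent", "type": "solvent"},
}

# Made-up shared-face areas in Å^2 between neighbouring metasites.
faces = {
    ("P1_ring", "D1_base"): 5.0,
    ("P1_ring", "W1"): 3.5,
    ("P2_sidechain", "D1_phosphate"): 4.0,
    ("P2_sidechain", "P1_ring"): 2.5,
}


def interface_faces(faces, metasites, mol_a, mol_b):
    """Faces shared between a metasite of mol_a and a metasite of mol_b."""
    wanted = {mol_a, mol_b}
    return {
        pair: area
        for pair, area in faces.items()
        if {metasites[pair[0]]["molecule"], metasites[pair[1]]["molecule"]} == wanted
    }


def area_by_neighbour_type(faces, metasites, name):
    """Total shared-face area of one metasite, grouped by neighbour type."""
    totals = defaultdict(float)
    for (a, b), area in faces.items():
        if name in (a, b):
            other = b if a == name else a
            totals[metasites[other]["type"]] += area
    return dict(totals)


protein_dna = interface_faces(faces, metasites, "protein", "DNA")
ring_environment = area_by_neighbour_type(faces, metasites, "P1_ring")
```

The first selection corresponds to an interface view between two molecules, the second to colouring one polyhedron by the nature of its neighbours.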
| 4 CONCLUSION |
|---|
The modularity and flexibility of PROVAT distinguish it from similar tools such as Voro3D (Dupuis et al., 2005). With the versatility of PROVAT, many interesting analyses are possible by calculating unambiguous neighbourhood descriptions based on the Voronoi definition. By clustering the data generated with PROVAT, packing preferences can be derived at various levels of granularity. Orientation sensitivity can be added to these preferences by providing appropriate local reference frames to PROVAT. These preferences are interesting in their own right from a physicochemical viewpoint. In a typical protein structure prediction exercise, such preferences derived to match the coarseness of protein models may be useful to refine and score the models. In evolutionary analysis of protein structures, packing preferences may assist the identification of functional sites. PROVAT-enabled analysis of interactions of proteins with other proteins, ligands and DNA can provide valuable insights. In conclusion, we believe that PROVAT will facilitate new perspectives on macromolecular structures.
| Acknowledgments |
|---|
S.G. would like to thank the Cambridge Commonwealth Trust and Overseas Research Studentship for financial support.
Conflict of Interest: none declared.
Received on April 19, 2005; revised on May 28, 2005; accepted on May 29, 2005
| REFERENCES |
|---|
Barber, C.B., et al. (1996) The quickhull algorithm for convex hulls. ACM Trans. Math. Softw., 22, 469-483.
Canutescu, A.A., et al. (2003) A graph theory algorithm for protein side-chain prediction. Protein Sci., 12, 2001-2014.
DeLano, W. (2002) The PyMOL User's Manual. DeLano Scientific, San Carlos, CA, USA.
Dupuis, F., et al. (2005) Voro3d: 3d Voronoi tessellations applied to protein structures. Bioinformatics, 21, 1715-1716.
Lindahl, E., et al. (2001) Gromacs 3.0: A package for molecular simulation and trajectory analysis. J. Mol. Mod., 7, 306-317.
Poupon, A. (2004) Voronoi and Voronoi-related tessellations in studies of protein structure and interaction. Curr. Op. Struct. Biol., 14, 233-241.
Zimmer, R., Wohler, M., Thiele, R. (1998) New scoring schemes for protein fold recognition based on Voronoi contacts. Bioinformatics, 14, 295-308.
