Bioinformatics Advance Access originally published online on April 12, 2005
Bioinformatics 2005 21(12):2856-2860; doi:10.1093/bioinformatics/bti444
Three-dimensional computation of atom depth in complex molecular structures
1Biomolecular Structure Research Center and Department of Molecular Biology, Università di Siena I-53100 Siena, Italy
2SienaBioGrafix Srl I-53100 Siena, Italy
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: For a complex molecular system, the delineation of atom–atom contacts, exposed surface and binding sites represents a fundamental step in predicting its interaction with solvent, ligands and other molecules. Recently, atom depth has also been considered as an additional structural descriptor to correlate protein structure with folding and functional properties. The distance between an atom and the nearest water molecule or the closest surface dot has been proposed as a measure of atom depth but, in both cases, the 3D character of depth is largely lost. In the present study, a new approach is proposed to calculate atom depths in a way that takes the molecular shape into account.
Results: An algorithm has been developed to calculate intersections between the molecular volume and spheres centered on the atoms whose depth has to be quantified. Many proteins of different size and shape have been chosen to compare the results obtained from distance-based and volume-based depth calculations. From the wealth of experimental data available for hen egg white lysozyme, H/D exchange rates and TEMPOL-induced paramagnetic perturbations have been analyzed in terms of both depth indexes and atom distances to the solvent accessible surface. The algorithm proposed here yields better correlations between experimental data and atom depth, particularly for atoms located near the protein surface.
Availability: Instructions to obtain source code and the executable program are available either from http://sienabiografix.com or http://sadic.sourceforge.net
Contact: niccolai{at}unisi.it
Supplementary information: http://www.Sienabiogzefix.com/publication
| INTRODUCTION |
|---|
Structural biology is nowadays rapidly growing, due to a synergistic post-genomic effect of the large developments of X-ray crystallography, nuclear magnetic resonance (NMR) and bioinformatics. In the Protein Data Bank (PDB) (Berman et al., 2000), a wealth of resolved and predicted protein structures are available and, on this basis, structural descriptors have been developed to correlate accessible molecular surface (Lee and Richards, 1971; Richmond, 1984; Quillin and Matthews, 2000; Totrov and Abagyan, 1996; Gerstein et al., 1995), molecular volumes (Richmond, 1984) and potential binding sites (Lo Conte et al., 1999; Shulman-Peleg et al., 2004; Tsuchiya et al., 2004; Innis et al., 2004; Gutteridge et al., 2003) with functional features, protein folding and structural stability (Serrano et al., 1992).
Recently, the calculation of atom depth from the protein surface has been proposed as an additional criterion to define protein structures more accurately (Pintar et al., 2003b; Chakravarty and Varadarajan, 1999) by exploring the interior of the molecule, as the strength of van der Waals (VdW) and electrostatic interactions might be dependent on the distance from the molecular surface (Chakravarty and Varadarajan, 1999; Richards, 1977). Moreover, once the deepest molecular moieties can be defined, a systematic analysis of their properties can be carried out to gain information on molecular structure and stability.
Atom depth can be defined as the distance between an atom and the nearest surface water molecule, either experimentally defined or hypothetically present. However, by evaluating the closest distance between an atom and a dot of the solvent accessible surface (Chakravarty and Varadarajan, 1999) or the distance between an atom and its closest solvent accessible neighbor (Pintar et al., 2003a), contributions from the 3D molecular shape to the actual atom depth are largely lost.
Hence, to estimate atom depth, a new approach reflecting the molecular shape is proposed here, based on measuring intersections between the molecular volume and a sphere of a suitable radius centered on the atom whose depth has to be quantified. Indeed, the smaller the exposed volume, the deeper the 3D insertion of the investigated atom inside the molecular structure. Since depth is, in general, a relative quantity which depends on the overall size and shape of the object under discussion, an atom depth index is suggested as a more appropriate parameter to discuss atom insertion within each investigated molecular system. These depth indexes, calculated by using the SADIC (Simple Atom Depth Index Calculator) algorithm, could constitute, for instance, a new rational basis to reanalyze inner and outer amino acid compositions of proteins or to improve the analysis of depth-related physical phenomena. Among the latter, H/D isotopic exchange of protein amide hydrogens is particularly relevant, being commonly related to molecular surface exposure. Exchange rates are very frequently determined from NMR (Roder et al., 1985) or mass spectrometry (Miranker et al., 1996) studies to explore protein conformations and dynamics. It has also been shown that NMR studies of through-space interactions occurring between paramagnetic probes and protein nuclei can be interpreted in terms of protein surface exposure (Niccolai et al., 2001, 2003; Pintacuda and Otting, 2002).
In the present report, volume-based and distance-based atom depths have been evaluated and compared for proteins of different size and shape. Calculated depths have also been correlated with H/D exchange and paramagnetic perturbation data available for hen egg white lysozyme (HEWL).
| ALGORITHM |
|---|
The SADIC algorithm is based on the simple idea of sampling the space around each atom of a given molecule by evaluating, for selected distances from the atom center, the portion of volume that is external to any protein atom. In other words, this volume, henceforth called the exposed volume, represents the space external to the molecular surface that lies within a distance r of the atom in all directions. Therefore, the size of the exposed volume is a direct measure of atom depth with respect to the molecular surface: the smaller the exposed volume, the deeper the atom within the molecular structure. When dealing with exposed volumes instead of the linear distances previously proposed for depth calculations (Chakravarty and Varadarajan, 1999; Pintar et al., 2003b), the information on surface shape is taken into account. The SADIC algorithm can yield an accurate indication of atom depth, since distances from the atom center to the solvent exposed surface are simultaneously evaluated in all directions. It follows that atoms located in protruding loops have exposed volumes greater than those exhibited by atoms which are equally close to the surface but located at the bottom of a pocket.
In principle, this algorithm can be used to analyze local depth for objects having any size and shape, provided that only an inside and an outside can be unambiguously assigned. The 3D model of a molecule, as an assembly of sphere shaped atoms, satisfies this requirement, since all the points located inside one of these spheres, i.e. closer to an atom center than the VdW atom radius, can be considered inside the molecule.
In order to approximate the volume of the intersection between the molecule and a sphere with a given radius r and center C, the sphere interior is split into units whose volume is known. For each volume unit a representative point is taken: we approximate the exposed volume by testing the sampling points against the molecule and summing the volume relative to all the points outside the molecular model.
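The sampling step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the SADIC implementation: it splits the sphere interior into cubic units on a plain Cartesian grid (SADIC uses its own sampling pattern), and the names `inside_molecule` and `exposed_volume`, as well as the atom tuples `(x, y, z, vdw_radius)`, are hypothetical.

```python
import math

def inside_molecule(p, atoms):
    """True if point p lies within the VdW radius of any atom (x, y, z, radius)."""
    return any(math.dist(p, a[:3]) <= a[3] for a in atoms)

def exposed_volume(center, r, atoms, n=15):
    """Approximate the part of the sphere (center, r) that lies outside all atoms.

    The bounding cube is split into (2n)^3 cubic units of known volume; each
    unit is represented by its midpoint, and the volumes of the units whose
    midpoint is inside the sphere but outside the molecular model are summed.
    """
    h = r / n              # edge length of one cubic volume unit
    unit = h ** 3          # volume represented by one sampling point
    total = 0.0
    for i in range(-n, n):
        for j in range(-n, n):
            for k in range(-n, n):
                p = (center[0] + (i + 0.5) * h,
                     center[1] + (j + 0.5) * h,
                     center[2] + (k + 0.5) * h)
                if math.dist(p, center) > r:
                    continue       # sampling point outside the sphere
                if not inside_molecule(p, atoms):
                    total += unit  # point is exposed: add its volume unit
    return total
```

For a toy model made of a single atom, for example `exposed_volume((0, 0, 0), 5.0, [(0, 0, 0, 1.7)])`, the result should approach the sphere volume minus the atom volume as `n` grows.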
The choice of r is of critical importance, since too small or too large r values would yield null or large exposed volumes, respectively, to a very similar extent for all the atoms of the investigated molecule. It should be noted that the values obtained by sampling inside a sphere of radius r can also be used to calculate the exposure for any sphere of radius r' ≤ r. To better exploit this possibility, sampling points are chosen over concentric spheres with growing radii r0 < … < rn = r.
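The reuse of one sampling pass for every smaller radius can be illustrated with the sketch below. It is an assumption-laden example rather than the SADIC pattern: points on each shell are placed with a golden-angle spiral (a technique swapped in here for simplicity), every shell gets the same number of points, and the names `sphere_points` and `exposed_profile` are hypothetical.

```python
import math
from itertools import accumulate

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def sphere_points(m):
    """Return m roughly evenly spread unit vectors (golden-angle spiral)."""
    pts = []
    for i in range(m):
        z = 1.0 - (2.0 * i + 1.0) / m        # z strictly between -1 and 1
        rho = math.sqrt(1.0 - z * z)
        phi = i * GOLDEN_ANGLE
        pts.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return pts

def exposed_profile(center, radii, atoms, m=400):
    """Exposed volume inside each radius of `radii` (increasing order).

    The space between consecutive radii is one shell; each shell is sampled
    at its mid radius and every point stands for shell_volume / m. The
    cumulative sum over shells gives the exposed volume for every r' <= r.
    """
    dirs = sphere_points(m)
    per_shell = []
    inner = 0.0
    for outer in radii:
        mid = 0.5 * (inner + outer)
        unit = 4.0 / 3.0 * math.pi * (outer ** 3 - inner ** 3) / m
        outside = 0
        for d in dirs:
            p = tuple(c + mid * u for c, u in zip(center, d))
            if not any(math.dist(p, a[:3]) <= a[3] for a in atoms):
                outside += 1
        per_shell.append(outside * unit)
        inner = outer
    return list(accumulate(per_shell))
```

A call such as `exposed_profile((0, 0, 0), [1, 2, 3, 4, 5], atoms)` returns one exposed volume per radius, so the behavior of a candidate r value can be inspected without sampling again.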
A simplistic pattern to sample a sphere interior consists of a regular grid in spherical coordinates. This method has the drawback of producing the same number of samples at each radius, thus yielding more densely packed points toward the poles and the center, and sparser points toward the equator and the outside. In order to overcome this problem, a different sampling pattern is used by SADIC, as described in detail in the Supplementary information section.
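The unevenness of a regular grid in spherical coordinates can be made concrete by computing the volume that each grid cell represents. The short sketch below is only an illustration; the function name `cell_volumes` and the grid sizes are hypothetical choices.

```python
import math

def cell_volumes(r_max, n_r, n_theta, n_phi):
    """Volume of every cell of a regular (r, theta, phi) grid in a sphere.

    Each cell volume is the exact integral of r^2 sin(theta) over the cell.
    """
    dr = r_max / n_r
    dtheta = math.pi / n_theta
    dphi = 2.0 * math.pi / n_phi
    vols = []
    for i in range(n_r):
        r0, r1 = i * dr, (i + 1) * dr
        for j in range(n_theta):
            t0, t1 = j * dtheta, (j + 1) * dtheta
            v = (r1 ** 3 - r0 ** 3) / 3.0 * (math.cos(t0) - math.cos(t1)) * dphi
            vols.extend([v] * n_phi)   # same volume for every phi step
    return vols

vols = cell_volumes(10.0, 10, 18, 36)
ratio = max(vols) / min(vols)   # largest cell (outer equator) vs smallest (inner pole)
```

With one sample per cell, a large `ratio` means that samples near the poles and the center stand for far less volume than those near the equator and the outside, which is the drawback described above.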
| IMPLEMENTATION |
|---|
The current SADIC implementation is written in the Python and C programming languages (see Supplementary information). The program consists of an object-oriented library providing classes responsible for generating sampling patterns, modeling solid objects (they can be subclassed to add new capabilities to the framework) and parsing PDB (Berman et al., 2000) entity files. An executable with a command line interface is provided: the program can read PDB entity files either from a local file system or from an external database through its URL (http, ftp and file protocols are supported) or its PDB ID code. The user can perform a molecule sampling on a list of given points in space or on a selection of entity atoms, either referred to absolutely by serial number or selected by atom name (e.g. using CA to refer to the protein backbone α-carbon), residue number or chain identifier. If the entity file contains more than one structure, as is the case for NMR-determined structures, the sampling can be performed separately on a selection of structures. In this case, the average and SD of the results may be automatically calculated.
| RESULTS AND DISCUSSION |
|---|
The SADIC program has been developed as a new tool for the structural characterization of complex molecules, such as proteins and nucleic acids, by considering atom depth. Since depth can be meaningfully discussed only in relation to the size of each investigated system, SADIC outputs are more conveniently analyzed as atom depth indexes, D, rather than as absolute exposed volumes.
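Thus, for an atom i of a given molecule and a sampling radius r, a depth index Di,r may be defined as
$$D_{i,r} = \frac{2\,V_{i,r}}{V_{0,r}} \qquad (1)$$
where Vi,r is the exposed volume of a sphere of radius r centered on atom i and V0,r is the exposed volume of the same sphere when centered on an isolated atom.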
As already pointed out in the Algorithm section, the selection of the r value represents a very critical step to avoid flattening of the algorithm outputs towards similar Di,r values. Figure 1 shows the evolution of Di,r for a representative selection of HEWL Cα carbons: the Thr47 and Ile58 Cα atoms are equally close to the solvent exposed surface, but lie in convex and concave molecular regions, respectively. The Trp28 and Ser50 Cα atoms are, instead, each deeply inserted in one of the two HEWL domains. For small and large r values, all Di,r values converge to 0 and 2, respectively, and the atom depth index calculated by SADIC loses its structural information. Conversely, in an intermediate region of r values, centered for HEWL at about 9 Å, a large dispersion of Di,r values can be observed. Therefore, to analyze atom depths conveniently, it seems appropriate to choose the largest sphere radius for which the condition Dn,r = 0 holds for only one atom n, the innermost one. In the case of HEWL this condition is met by the Trp28 Cα carbon, which thus results as the most internal atom, at an r value of 9 Å.
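The radius selection rule just described can be sketched in a few lines. The example below rests on explicit assumptions: the depth index is taken as twice the ratio between the exposed volume around atom i and a reference exposed volume around an isolated atom (a form chosen because it reproduces the limits 0 and 2 quoted above), the function names are hypothetical, and the volumes are made-up toy numbers, not HEWL data.

```python
def depth_index(v_exposed, v_reference):
    """Assumed form of the index: D = 2 * V_exposed / V_reference."""
    return 2.0 * v_exposed / v_reference

def choose_radius(exposed, reference, tol=1e-9):
    """Pick the largest radius at which exactly one atom has a zero index.

    exposed:   {r: [exposed volume of each atom at radius r]}
    reference: {r: exposed volume around an isolated atom at radius r}
    Returns (r, index of the innermost atom), or None if no radius qualifies.
    """
    for r in sorted(exposed, reverse=True):
        d = [depth_index(v, reference[r]) for v in exposed[r]]
        buried = [i for i, x in enumerate(d) if x <= tol]
        if len(buried) == 1:
            return r, buried[0]
    return None

# Toy, hypothetical exposed volumes for three atoms at three radii.
exposed = {
    7.0: [0.0, 0.0, 300.0],
    9.0: [0.0, 40.0, 900.0],
    11.0: [15.0, 200.0, 2500.0],
}
reference = {7.0: 1400.0, 9.0: 3000.0, 11.0: 5500.0}
selected = choose_radius(exposed, reference)
```

In this toy case two atoms are still fully buried at the smallest radius and none at the largest, so the intermediate radius is selected and the first atom is reported as the innermost one.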
Thus, once a suitable sphere radius has been chosen, the calculated Di,r values readily describe the topology of each atom, as values close to 0 or 1 define inner or outer atoms, respectively (Fig. 2). Furthermore, the Di,r > 1 condition identifies atoms which are very close to a convex molecular surface, as in the case of the HEWL Cα carbons of Thr47, Asp48 and Gly117, whose Di,9 values are 1.22, 1.10 and 1.15, respectively.
The validity of the proposed algorithm has been tested on many proteins by comparing SADIC outputs with distance-based atom depths. Thus, as shown in Figure 3, the Di,r values of a small spherical protein and of a large oblate one are compared with atom distances calculated from the closest exposed neighbor, dpxi (Pintar et al., 2003b), and from the nearest surface water molecule, dnwi (Chakravarty and Varadarajan, 1999). A good agreement exists among the different sets of data, as the Cα carbons exhibiting the highest Di,r values correspond to the shortest dpxi and dnwi values. Conversely, for the Cα carbons having the longest dpxi and dnwi values, Di,r values close to 0 are found. It is also evident that SADIC reaches a higher level of detail in describing atom depth, particularly for atoms located near the protein surface. This feature derives directly from the fact that only the Di,r values depend on both surface distance and molecular shape, so that atoms equally distant from the surface, but close to concave or convex surface regions, exhibit very different depth indexes.
To check how useful Di,r values can be in the structural interpretation of experimental data, reported H/D exchange rates of HEWL amide protons (Pedersen et al., 1993) and paramagnetic perturbations of NMR signals (Niccolai et al., 2003) have been analyzed in terms of atom depth. As shown in Figure 4, Di,9, dpxi and dnwi values correlate similarly with H/D exchange rates, Kexi, and paramagnetic signal attenuations, Ai, only in the case of the innermost HEWL atoms. For the outer ones, Kexi and Ai values are, indeed, all grouped in a very narrow range of both dpxi and dnwi values. By simple inspection of Figure 4, it is apparent that Di,9 values generally correlate better with the experimental data than distance-based depths do, as all the slowest exchange rates of HEWL amide hydrogens are found for atoms with Di,9 < 0.6. It should be noted that for the latter amide hydrogens a large variety of distance-based atom depths is derived. Furthermore, the overlap of the experimentally derived parameters observed at the closest surface distances is largely resolved, while the slow exchange rates measured for the Ala10, Phe34 and Leu83 amide groups are more consistent with the corresponding Di,9 values.
A linear or higher level dependence of Di,9 values on HEWL Kexi and Ai values cannot be delineated, as fast H/D exchange rates were measured for the deeply inserted Thr40 and Cys94 amide hydrogens, in spite of their close proximity to a concave surface. Moreover, a small Ai value is exhibited by the surface exposed Asp48 Cα carbon, while for the buried Ile98 a strong paramagnetic attenuation is observed. These four cases represent the most evident discrepancies, but many other anomalous behaviors of Kexi and Ai values versus both volume-based and distance-based depth can be seen in the data shown in Figure 4. In this respect, it should be stressed that both experimental parameters depend on atom depth only to a first approximation and that a more detailed discussion of exchange rates and paramagnetic perturbations in terms of atom depth would be needed. The H/D exchange process, determined by the dynamics of the hydrogen bond network within the protein and its hydration shell, is commonly related to solvent accessibility. The fact that SADIC outputs are more consistent with Kexi values than atom depths obtained from distance-based calculations suggests that a step forward in the structural interpretation of this experimental parameter might be achieved. On the other hand, the weaker correlation observed between paramagnetic perturbations induced by TEMPOL and atom depths confirms that complex dynamics control the interaction of protein surfaces with paramagnetic probes (Niccolai et al., 2003). On the basis of the B factors reported in the crystal structure of HEWL with PDB (Berman et al., 2000) ID code 4lzt, it is apparent that local structural flexibility is not responsible for the limited correlation between atom depth and accessibility-dependent experimental data. It can be concluded that the use of the SADIC algorithm might favor improved depth-oriented discussions of experimental data, possibly enhancing our understanding of the structural stability and dynamics of complex molecules.
| Acknowledgments |
|---|
Thanks are due to grants from the Italian Ministry of University PRIN03-059395 and from the University of Siena (PAR 2002). Special thanks are also due to Francesco Niccolai for technical assistance.
Received on February 9, 2005; revised on March 30, 2005; accepted on April 7, 2005
| REFERENCES |
|---|
Berman, H.M., et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
Chakravarty, S. and Varadarajan, R. (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold. Des., 7, 723–732.
Gerstein, M., et al. (1995) The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra. J. Mol. Biol., 249, 955–966.
Gutteridge, A., et al. (2003) Using a neural network and spatial clustering to predict the location of active sites in enzymes. J. Mol. Biol., 330, 719–734.
Innis, C.A., et al. (2004) Prediction of functional sites in proteins using conserved functional group analysis. J. Mol. Biol., 337, 1053–1068.
Koradi, R., et al. (1996) MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph., 14, 51–55, 29–32.
Lee, B. and Richards, F.M. (1971) The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 55, 379–400.
Lo Conte, L., et al. (1999) The atomic structure of protein–protein recognition sites. J. Mol. Biol., 285, 2177–2198.
Miranker, A., et al. (1996) Investigation of protein folding by mass spectrometry. FASEB J., 10, 93–101.
Niccolai, N., et al. (2001) NMR studies of protein surface accessibility. J. Biol. Chem., 276, 42455–42461.
Niccolai, N., et al. (2003) NMR studies of protein hydration and TEMPOL accessibility. J. Mol. Biol., 332, 437–447.
Pedersen, T.G., et al. (1993) Determination of the rate constants k1 and k2 of the Linderstrøm-Lang model for protein amide hydrogen exchange. A study of the individual amides in hen egg-white lysozyme. J. Mol. Biol., 230, 651–660.
Pintacuda, G. and Otting, G. (2002) Identification of protein surfaces by NMR measurements with a paramagnetic Gd(III) chelate. J. Am. Chem. Soc., 124, 372–373.
Pintar, A., et al. (2003a) Atom depth as a descriptor of the protein interior. Biophys. J., 84, 2553–2561.
Pintar, A., et al. (2003b) DPX: for the analysis of the protein core. Bioinformatics, 19, 313–314.
Quillin, M.L. and Matthews, B.W. (2000) Accurate calculation of the density of proteins. Acta Crystallogr. D Biol. Crystallogr., 56, 791–794.
Richards, F.M. (1977) Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng., 6, 151–176.
Richmond, T.J. (1984) Solvent accessible surface area and excluded volume in proteins. Analytical equations for overlapping spheres and implications for the hydrophobic effect. J. Mol. Biol., 178, 63–89.
Roder, H., et al. (1985) Individual amide proton exchange rates in thermally unfolded basic pancreatic trypsin inhibitor. Biochemistry, 24, 7407–7411.
Serrano, L., et al. (1992) The folding of an enzyme. II. Substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol., 224, 783–804.
Shulman-Peleg, A., et al. (2004) Recognition of functional sites in protein structures. J. Mol. Biol., 339, 607–633.
Totrov, M. and Abagyan, R. (1996) The contour-buildup algorithm to calculate the analytical molecular surface. J. Struct. Biol., 116, 138–143.
Tsuchiya, Y., et al. (2004) Structure-based prediction of DNA-binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces. Proteins, 55, 885–894.
Zhang, X. and Brian, B.W. (1995) EdPDB: a multifunctional tool for protein structure analysis. J. Appl. Cryst., 28, 624–630.