Bioinformatics Advance Access originally published online on February 22, 2005
Bioinformatics 2005 21(10):2347-2355; doi:10.1093/bioinformatics/bti337
Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons
EMBL-EBI Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
*To whom correspondence should be addressed.
Abstract
Motivation: An increasing number of protein structures are being determined for which no biochemical characterization is available. The analysis of protein structure and function assignment is becoming an unexpected challenge and a major bottleneck towards the goal of well-annotated genomes. As shape plays a crucial role in biomolecular recognition and function, the examination and development of shape description and comparison techniques is likely to be of prime importance for understanding protein structure–function relationships.
Results: A novel technique is presented for the comparison of protein binding pockets. The method uses the coefficients of a real spherical harmonics expansion to describe the shape of a protein's binding pocket. Shape similarity is computed as the L2 distance in coefficient space. Several thousand such comparisons per second can be carried out on a standard Linux PC. Other properties such as the electrostatic potential fit seamlessly into the same framework. The method can also be used directly for describing the shape of proteins and other molecules.
Availability: A limited version of the software for the real spherical harmonics expansion of a set of points in PDB format is freely available upon request from the authors. Binding pocket comparisons and ligand prediction will be made available through the protein structure annotation pipeline Profunc (written by Roman Laskowski) which will be accessible from the EBI website shortly.
Contact: thornton{at}ebi.ac.uk
1 INTRODUCTION
With the worldwide rise of Structural Genomics initiatives and the increasing number of protein structures that are being solved for which little or no biochemical characterization exists, the challenge of understanding how protein structure is related to function is becoming much more than a mere academic interest. Structure provides an excellent interpretational aid for biochemical data and a good starting point for generating further hypotheses, potentially leading to drug design. However, function assignment based on structural analyses is in most cases far from trivial. Structural Genomics may well provide many new folds which will aid and greatly enhance the number of sequences for which homology models can be built, but without functional annotation and sufficient understanding of structure, it could be argued that these new structures are not being fully exploited and add little new biological information.
In this paper, a method for modelling shapes, especially binding pockets in protein structures, is presented. Shapes are considered as functions on the unit sphere by describing each surface point by its spherical coordinates (r, θ, φ) and setting f(θ, φ) = r. We present an efficient approach to describe this shape function by the coefficients of a real spherical harmonics expansion. Any (square-integrable) function on the unit sphere can be expanded in this manner and therefore the same methodology can be applied to enhance the description by including other properties such as the electrostatic potential.
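As a minimal sketch of the shape function f(θ, φ) = r, the following Python snippet converts a Cartesian surface point into spherical coordinates relative to an expansion centre. The function name `to_spherical` and the angle conventions (θ as the polar angle measured from the z axis, φ as the azimuth) are assumptions made for illustration; the paper does not state which convention its implementation uses.

```python
import math

def to_spherical(point, centre=(0.0, 0.0, 0.0)):
    """Return (r, theta, phi) of `point` relative to `centre`.

    theta is the polar angle in [0, pi] measured from the z axis and
    phi is the azimuth wrapped into [0, 2*pi) (assumed conventions).
    """
    x, y, z = (p - c for p, c in zip(point, centre))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the expansion centre itself has no direction")
    # Clamp guards against rounding pushing z / r slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x) % (2.0 * math.pi)
    return r, theta, phi
```

A surface is then represented by the value r recorded for each sampled direction (θ, φ), which is only well defined when the surface is single-valued along every ray from the centre.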
1.1 Function prediction from structure
There has recently been a flood of interesting approaches to extract structural information about protein active sites and potential binding partners (Aloy et al., 2001; Exner et al., 2002; Cai et al., 2002; Schmitt et al., 2002; Laskowski et al., 2003; Bate and Warwicker, 2004; Shulman-Peleg et al., 2004 and references within). For reviews related to function prediction from structure, see Orengo et al. (1999), Teichmann et al. (2001) and Wild and Saqi (2004); see Whisstock and Lesk (2003) in particular for a clear description of the problems involved and many examples.
1.2 Shape
Shape has long been recognized as a key concept in chemistry (Mezey, 1993), especially in molecular biology, where shape is a major factor (largely through non-covalent interactions) in virtually all processes and interactions within the cell and plays an important role in molecular recognition. It is perhaps surprising that precisely in this area, shape is somewhat ill-defined, although a few accepted operational definitions exist.
For rigid bodies and macroscopic objects, shape is often described by combinations of standard geometrical objects such as the Platonic solids, by sets of intersecting planes or by algebraic equations defining the extent of the object. At the molecular level, the electron density ultimately defines shape (Bader, 1990; Mezey, 1993) either by isocontours or extrema and saddle-points. For some applications (e.g. medium resolution scattering techniques, molecular graphics, structural bioinformatics) this approach is simplified and atoms are commonly approximated by a solid sphere representation.
Biomolecules under physiological conditions are constantly in motion, often undergoing quite dramatic conformational changes induced by the thermal energy of the surroundings and enhanced by the low-lying multiminima form of the free energy landscape (Onuchic et al., 1997). The concept of shape is rather more tricky to define in such cases. A proper description of shape in this context would incorporate some probabilistic method to include coordinate uncertainties and motion, similar to the Gaussian surface description (Duncan and Olson, 1993; Laskowski, 1995) or fuzzy surfaces (Agishtein, 1992). However, it is common to consider the shape of a molecule as being determined by its surface and to treat it as a static entity. Surfaces are frequently determined using van der Waals atomic radii and solvent/probe accessible regions (Lee and Richards, 1971; Greer and Bush, 1978; Connolly, 1983; Voorintholt et al., 1989; Gabdoulline and Wade, 1996).
Mathematically there are several ways of describing and representing shapes including triangulations, polygons, distance distributions and landmark theory. The focus here will be on functional forms. Functions can be either local (piecewise, such as splines) or global in which the whole shape is described by an often very complex expression. Global representations are often termed parametric as the whole shape can be reduced to a number of parameters and each parameter affects the entire shape. Functions can either be explicit, meaning that one coordinate is expressed in terms of the others, or implicit, meaning that the surface points satisfy a given equation (isosurfaces). For example, spherical harmonics can be used for explicit functions, whereas super- and hyperquadrics are implicit representations.
2 APPROACH
The goal is to extract functional information from protein structure based on the analysis and comparison of binding pockets. Our approach makes the basic assumption that proteins which bind similar ligands have clefts of similar size, shape and chemistry.
The first step is therefore to determine what we think might be a binding pocket and to get the best guess of its spatial extent. Although we wish to elaborate here mainly on the shape comparison algorithm, a brief overview of the whole process may help to put things in context. The overall flow of our approach for binding pocket shape description can be summarized as follows:
- Compute the clefts and cavities for a given macromolecule (SURFNET, Laskowski, 1995).
- Use characteristics such as the residue conservation score to identify which clefts may be binding pockets (ConSurf, Armon et al., 2001; Glaser et al., 2003) and to reduce their volume to encompass the most likely site of ligand binding.
- Transform the binding pocket to a standard frame of reference and compute the real spherical harmonic expansion coefficients that best approximate the shape.
- Scan a database of precomputed expansion coefficients of protein binding pockets and ligands and choose the most significant matches.
2.1 Determination of potential binding pockets
Protein surfaces contain clefts and indentations of varying sizes and depths. It has been shown that in enzymes, the active site is commonly found within the largest cleft (Laskowski et al., 1996). While this observation makes the detection of the active site relatively easy, the shapes of the clefts often extend significantly beyond the region occupied by the ligand (Figure 2 in Laskowski et al., 1996) and as such are not well suited for direct shape comparisons with ligands. These larger volumes may be of functional importance but need to be reduced in size for our purposes. A detailed description and analysis of a new algorithm for predicting potential binding pockets will be presented elsewhere (Glaser,F., Morris,R., Laskowski,R. and Thornton,J., in preparation). In brief, the algorithm uses the SURFNET program (Laskowski, 1995) to generate three-dimensional (3D) shapes of the protein's cavities and clefts. SURFNET does this by placing spheres between pairs of protein atoms such that the spheres do not penetrate the atomic (van der Waals) radii of these two atoms or any nearby atoms. The union of these spheres is used by SURFNET to describe the 3D cleft shape. The SURFNET spheres that define the clefts in the protein are then filtered by the residue conservation score of the nearest residue. Residue conservation is calculated using the ConSurf algorithm (Armon et al., 2001). The retained non-overlapping clusters of SURFNET spheres are ranked by volume and total assigned residue conservation. The top-ranking cluster is taken to determine the potential binding pocket. The binding pocket shape is thus determined by the union of SURFNET spheres that are near conserved residues.
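The sphere-placement idea described above can be illustrated with a simplified sketch. This is not SURFNET's code: the function `gap_sphere`, the tuple layout of atoms and the `min_radius` cut-off are hypothetical choices for illustration. A sphere is placed midway between the van der Waals surfaces of two atoms and shrunk until it no longer penetrates any neighbouring atom.

```python
import math

def gap_sphere(atom_a, atom_b, neighbours, min_radius=1.0):
    """Place a sphere in the gap between two atoms (simplified illustration).

    Atoms are (x, y, z, vdw_radius) tuples. Returns (centre, radius), or
    None if the atoms overlap or the sphere shrinks below `min_radius`
    (a hypothetical cut-off in angstroms).
    """
    (ax, ay, az, ra), (bx, by, bz, rb) = atom_a, atom_b
    d = math.dist((ax, ay, az), (bx, by, bz))
    gap = d - ra - rb                      # distance between the two surfaces
    if gap <= 0.0:
        return None
    t = (ra + gap / 2.0) / d               # fractional position of the midpoint
    centre = (ax + t * (bx - ax), ay + t * (by - ay), az + t * (bz - az))
    radius = gap / 2.0                     # just touches both atom surfaces
    for nx, ny, nz, rn in neighbours:
        # Shrink the sphere until it no longer penetrates this neighbour.
        radius = min(radius, math.dist(centre, (nx, ny, nz)) - rn)
    if radius < min_radius:
        return None
    return centre, radius
```

The union of all retained spheres would then describe the cleft volume; conservation-based filtering is a separate step not shown here.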
2.2 Harmonics and real spherical harmonics
Spherical harmonics have a well-established standing in the molecular sciences. They are perhaps best known as the functions that determine orbital shapes, being solutions of the angular part of Schrödinger's equation for the hydrogen atom (eigenfunctions of the angular momentum operators L² and L_z), although they occur in a great variety of different physical problems such as electromagnetism, gravity, mechanics or hydrodynamics. Their attractive properties when dealing with rotations, spherical averaging procedures or smooth surface representations on the sphere have led to their extensive use in protein crystallography for the rotation function in molecular replacement (Crowther, 1972), in the computation of radially averaged normalized structure factor profiles (Morris and Bricogne, 2003), in molecular docking (Ritchie and Kemp, 1999), in small-angle scattering low resolution shape determination (Stuhrmann, 1970; Svergun, 1991) and protein surface display routines (Duncan and Olson, 1993). Recently, Cai et al. (2002) built on a very similar approach to that of Ritchie and Kemp (1999, 2000) and described an efficient virtual screening algorithm using spherical harmonic molecular surfaces.
Spherical harmonics, Y_lm(θ, φ), are single-valued, smooth (infinitely differentiable), complex functions of two variables, θ and φ, indexed by two integers, l and m. In quantum physics terminology, l is the angular quantum number and m the azimuthal quantum number. Roughly speaking, l gives the number of local minima of the function and therefore represents a spatial frequency. [See any quantum mechanics or functional analysis textbook for more definitions and properties, e.g. Cohen-Tannoudji et al. (1977) and Edmonds (1996).]
Spherical harmonics form a complete set of orthonormal functions and thus form a vector space analogous to unit basis vectors. In the same way that vector projections onto each axis (scalar product between vectors) can be used to describe any vector in the familiar form x = (x, y, z)^T, expansion coefficients (scalar product between functions) can be used to describe functions. Any (square-integrable) function of θ and φ can be expanded as follows:
$$ f(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_{lm}\, Y_{lm}(\theta,\phi) \tag{1} $$
where l = 0, 1, 2, ... and m = −l, ..., l. For a closed surface in 3D to be single-valued, and therefore a function of θ and φ, any ray leaving the expansion centre must penetrate the surface only once, i.e. a continuous mapping exists between the surface and the unit sphere S². Such figures are called single-valued surfaces or star-shape surfaces (Fig. 1). Proteins and ligands at high resolution are often not star-shaped, although their low resolution images often approximately fulfil this requirement (Svergun, 1991; Ritchie and Kemp, 1999). Even if a surface is not truly star-shaped, the approximation of using an outer shell can nevertheless give useful and discriminative information about the shape. However, for binding pockets we have found that the star-shape requirement is often fulfilled (the star-shape requirement is not used at any point for the construction of the binding pockets).
The expansion coefficients, c_lm, can be obtained by multiplying the above equation by the complex-conjugate spherical harmonics and integrating over the solid angle Ω,
$$ c_{lm} = \int_{S^2} f(\theta,\phi)\, Y^{*}_{lm}(\theta,\phi)\, d\Omega \tag{2} $$
In this manner, a spectral decomposition of any (square-integrable) function may be carried out. The lower l values correspond to the low frequencies and describe the overall low-resolution shape, whereas the higher values add finer, high-frequency detail to the picture. The termination of the series at a given l thus corresponds to a (spatial) frequency filtering method (low-pass filter).
The complex-valued spherical harmonics can be combined to give real valued functions that share the same orthonormal and completeness properties. Real spherical harmonics are better suited for describing real surface functions. Surface harmonics are often defined as any combination of real spherical harmonics for fixed l and commonly used in shape analysis and deformational studies. In general, a surface harmonic is simply a harmonic function whose domain is a surface and is not restricted to any coordinate system or specific family of functions.
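For reference, one common way of writing the real combinations is given below. The symbol S_lm, the normalization factor N_lm and the assignment of cosine terms to m > 0 and sine terms to m < 0 are conventions assumed here for illustration; the text does not state which convention the authors use, and other sign choices give an equally valid orthonormal set.

$$
S_{lm}(\theta,\phi) =
\begin{cases}
\sqrt{2}\, N_{lm}\, P_l^{m}(\cos\theta)\, \cos(m\phi), & m > 0,\\
N_{l0}\, P_l(\cos\theta), & m = 0,\\
\sqrt{2}\, N_{l|m|}\, P_l^{|m|}(\cos\theta)\, \sin(|m|\phi), & m < 0,
\end{cases}
\qquad
N_{lm} = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}
$$

Here P_l^m denotes the associated Legendre function. With real basis functions the expansion coefficients of a real-valued surface function are themselves real, so a shape truncated at degree l_max is described by (l_max + 1)² real numbers.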
3 ALGORITHMIC DETAILS
3.1 Orientation
To be able to compare expansion coefficients directly they must be put in a standard frame of reference. Each molecule or binding pocket is translated so that its centre of geometry coincides with the origin of the coordinate system. The system is then rotated such that the second moment about the mean, the variance–covariance matrix,
$$ C = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{T} \tag{3} $$
is diagonal. This fixes the orientation only up to rotations by π around each axis (axis flips) owing to the symmetry of the second moment tensor (and the fact that negated eigenvectors remain eigenvectors with the same eigenvalues). In a similar manner, one can compute the third moment around the mean, which is a measure related to the skewness of a distribution. We have chosen to define an orientation for which the two diagonal elements of the third moment with the largest absolute values are made positive (the third diagonal element is then determined by the requirement that the system remains right-handed) as our standard orientation. This can always be achieved by a rotation about any of the axes, x, y, z by π. The final position is indistinguishable from the original one by the first and second moments. As spherical harmonics enjoy mathematically convenient rotational properties (the coefficients can be rotated in the same way as vectors with so-called Wigner matrices; Edmonds, 1996; Chaichian and Hagedorn, 1997), the orientation convention introduced above is merely to speed up the process by avoiding the need to search for optimal rotations between coefficients. The registration of 3D shapes is, however, not a trivial task and can be severely hampered by errors in the original shapes (Besl and McKay, 1992; Lanzavecchia et al., 2001; Dugan and Altman, 2004). This skewness method should therefore be seen as a heuristic. Another approach would be either to store and search for all axis flips or to fall back on the optimization problem of finding the best alignment. An attractive alternative would be the use of rotationally invariant descriptors (Kazhdan et al., 2003).
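The orientation heuristic can be sketched in a few lines of NumPy. This is an illustrative reading of the description above rather than the authors' code: the function name `standard_orientation` is hypothetical, and placing the largest variance along x is an assumption, since the text does not state how the principal axes are ordered.

```python
import numpy as np

def standard_orientation(points):
    """Centre a point set, align it to its principal axes and fix axis flips."""
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)              # first moment: centre of geometry
    cov = centred.T @ centred / len(centred)      # second moment about the mean
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]               # assumption: largest variance along x
    axes = evecs[:, order]
    if np.linalg.det(axes) < 0:                   # keep the frame right-handed
        axes[:, 2] = -axes[:, 2]
    aligned = centred @ axes
    m3 = (aligned ** 3).mean(axis=0)              # diagonal third-moment elements
    rank = np.argsort(-np.abs(m3))
    signs = np.ones(3)
    for k in rank[:2]:                            # two largest |m3| are made positive
        signs[k] = 1.0 if m3[k] >= 0 else -1.0
    signs[rank[2]] = signs[rank[0]] * signs[rank[1]]  # product +1: a proper rotation
    return aligned * signs
```

Because the product of the three signs is always +1, the correction is a rotation by π about one axis (or the identity), never a reflection. For point sets with nearly equal eigenvalues or near-zero third moments the result becomes unstable, which is consistent with the text describing the method as a heuristic.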
3.2 Legendre polynomials
As solutions of the angular part of Laplace's equation in spherical coordinates, spherical harmonics are functions of θ and φ that can be separated further into a purely θ-dependent term multiplied by a purely φ-dependent term. The θ functions are the well-known Legendre polynomials with argument cos θ, and the φ functions are simply exponential functions of imφ. The computation of spherical harmonics therefore requires the evaluation of Legendre polynomials. It is well known, especially in areas such as astrophysics and geophysics, that routines for the Legendre polynomials based on standard recurrence relationships (as found in the Numerical Recipes, Press et al., 1996) become unstable and lead to overflow problems for higher orders of l. We therefore employ a stable recursion formula using extended-range arithmetic (Smith et al., 1981).
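The sketch below shows one way to sidestep factorial overflow in plain double precision. It is not the extended-range arithmetic of Smith et al. (1981) that the text refers to; it is the swapped-in and widely used alternative of running the recurrence directly on fully normalized associated Legendre functions. The function names and the cosine/sine convention for the real harmonics are assumptions for illustration.

```python
import math

def normalized_legendre(l, m, x):
    """Fully normalized associated Legendre function for 0 <= m <= l, |x| <= 1.

    Normalized so that normalized_legendre(l, m, cos(theta)) * exp(i*m*phi)
    is an orthonormal complex spherical harmonic (Condon-Shortley phase).
    """
    if not 0 <= m <= l:
        raise ValueError("need 0 <= m <= l")
    pmm = math.sqrt(1.0 / (4.0 * math.pi))
    sin_theta = math.sqrt(max(0.0, 1.0 - x * x))
    for k in range(1, m + 1):                      # climb the diagonal l = m
        pmm *= -math.sqrt((2.0 * k + 1.0) / (2.0 * k)) * sin_theta
    if l == m:
        return pmm
    prev, curr = pmm, x * math.sqrt(2.0 * m + 3.0) * pmm
    for ll in range(m + 2, l + 1):                 # three-term recurrence in l
        a = math.sqrt((4.0 * ll * ll - 1.0) / (ll * ll - m * m))
        b = math.sqrt(((ll - 1.0) ** 2 - m * m) / (4.0 * (ll - 1.0) ** 2 - 1.0))
        prev, curr = curr, a * (x * curr - b * prev)
    return curr

def real_spherical_harmonic(l, m, theta, phi):
    """Real spherical harmonic (assumed convention: cos for m > 0, sin for m < 0)."""
    p = normalized_legendre(l, abs(m), math.cos(theta))
    if m > 0:
        return math.sqrt(2.0) * p * math.cos(m * phi)
    if m < 0:
        return math.sqrt(2.0) * p * math.sin(-m * phi)
    return p
```

Working with normalized values keeps the intermediate quantities bounded, so no factorials are ever formed. At very high degree and close to the poles the diagonal term can still underflow, which is the kind of situation extended-range arithmetic is designed to handle.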
3.3 Integration on S² and spherical t-designs
As may be seen from Equation (2), the computation of the expansion coefficients requires that the function to be expanded is integrated over the whole unit sphere. For functions that are available in analytical form, this integration can be carried out analytically in favourable cases; otherwise one must resort to numerical integration. The determination of integration points and their correct weights in a summation approach to the integral is, however, far from trivial and is still an active area of research (Jetter et al., 1998). Techniques exist for determining the integration weights given a set of sample points and also for creating such a layout. Such techniques are often demanding and would require intensive computation to first establish a good point layout and then to optimize the parameters. Instead, we have employed mathematical objects known as spherical t-designs (Goethals and Seidel, 1979). A spherical t-design is a set of points, {p1,p2,...,pN}, for which the integral of any polynomial, f(x), of degree at most t over the sphere is equal to the average value (with equal weights) of the polynomial over the set of points
$$ \frac{1}{4\pi}\int_{S^2} f(\mathbf{x})\, d\Omega = \frac{1}{N}\sum_{i=1}^{N} f(\mathbf{p}_i) \tag{4} $$
The integral in Equation (2) can therefore be replaced by an equal-weight sum over the design points, where the function value at each point is the radius of the surface in the direction given by θ and φ. For binding pockets, this radius is computed by rolling a 1.4 Å ball over the union of spheres obtained from the combination of SURFNET and conservation score filtering. The radius is the distance from the expansion centre to the closest surface point of the ball that is penetrated by a ray travelling outwards from the expansion centre at θ and φ. This gives a smoothed surface that is well suited for the expansion in spherical harmonics. For molecules (ligands), the approach is similar but using the original molecule's atoms (and their van der Waals radii) instead of the SURFNET spheres. Spherical t-designs are actually only proven to exist algebraically up to t = 9, but considerable numerical testing provides evidence to suggest that spherical t-designs have been found up to t = 21. These points are very uniformly spread and represent a highly efficient (optimal) sampling on the sphere for a given degree of function variability (order of the polynomial). We have tested a number of published sphere integration schemes and have consistently found the best and most stable results with the spherical t-designs. In particular, we have used the 240 point set that is suspected to be a spherical design of order 21 (Hardin and Sloane, 1996). For higher order expansions we use approximate equal-weight integration layouts of up to 900 points, published by Fliege and Maier (1996). Triangulation methods typically represent a huge oversampling of the surface (depending on the details one is interested in and the degree of surface subdivision). Our set of points is far smaller than that typically obtained from a surface triangulation algorithm, yet still sufficient, and requires neither a triangulation nor the computation of integration weights. It is therefore a major factor in the efficiency of our method.
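To make the equal-weight integration concrete, the sketch below uses the 12 vertices of a regular icosahedron, a classical spherical 5-design, in place of the 240-point set used in the paper (whose coordinates are not reproduced here). The toy surface r = 1 + 0.3 z and all function names are assumptions for illustration.

```python
import math

def icosahedron_design():
    """The 12 vertices of a regular icosahedron on the unit sphere (a 5-design)."""
    g = (1.0 + math.sqrt(5.0)) / 2.0              # golden ratio
    n = math.sqrt(1.0 + g * g)                    # normalizes vertices to unit length
    pts = []
    for a in (-1.0, 1.0):
        for b in (-g, g):
            pts.append((0.0, a / n, b / n))
            pts.append((a / n, b / n, 0.0))
            pts.append((b / n, 0.0, a / n))
    return pts

def expansion_coefficient(radius, harmonic, points):
    """Equal-weight estimate of the integral of radius * harmonic over the sphere."""
    total = sum(radius(p) * harmonic(p) for p in points)
    return 4.0 * math.pi * total / len(points)

def toy_radius(p):
    """Toy star-shaped surface (illustrative assumption): r = 1 + 0.3 z."""
    return 1.0 + 0.3 * p[2]

def y00(p):
    return math.sqrt(1.0 / (4.0 * math.pi))

def y10(p):
    return math.sqrt(3.0 / (4.0 * math.pi)) * p[2]

points = icosahedron_design()
c00 = expansion_coefficient(toy_radius, y00, points)
c10 = expansion_coefficient(toy_radius, y10, points)
```

Since the products integrated here are polynomials of degree at most 2, well within the degree 5 that this design handles exactly, the 12-point sums reproduce the analytical values c00 = sqrt(4π) and c10 = 0.3 sqrt(4π/3) to rounding error. Higher-degree expansions need designs of correspondingly higher order.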
Although this direct coefficient computation has huge advantages in terms of speed, one can envision cases in which a different approach may be more appropriate. Protein models can exhibit a great deal of variance with respect to the accuracy of individual atoms. To a very first approximation this is reflected in the spread of the crystallographic atomic displacement factors (temperature or B-factors). There are means of estimating the individual coordinate errors from Cruickshank's diffraction precision index (Cruickshank, 1999; Schneider, 2000) and this positional uncertainty (Ten Eyck, 2003) could be taken into account to construct a probability surface. It would then seem natural to fit the expansion coefficients so as to minimize the error-weighted squared differences between the probability surface values and the reconstructed radii. A similar method has been used for geophysical data and can be implemented to perform very efficiently due to the independency of the coefficients (Matheny and Goldgof, 1995).
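The error-weighted fit mentioned above can be written as a standard weighted least-squares problem. The notation below is an assumption for illustration: r_i is the probability-surface radius sampled in direction (θ_i, φ_i), σ_i its estimated uncertainty, S_lm the real spherical harmonics and M the number of sample directions.

$$
\min_{\mathbf{c}} \; \sum_{i=1}^{M} \frac{1}{\sigma_i^{2}} \Big( r_i - \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} c_{lm}\, S_{lm}(\theta_i,\phi_i) \Big)^{2}
\quad\Longrightarrow\quad
(A^{T} W A)\, \mathbf{c} = A^{T} W\, \mathbf{r}
$$

Here A is the M × (l_max + 1)² matrix with entries A_{i,(lm)} = S_lm(θ_i, φ_i) and W = diag(1/σ_i²). When the sample directions form an equal-weight integration layout and all σ_i are equal, A^T A is approximately proportional to the identity by orthonormality, and the solution reduces to the direct summation used in the previous paragraph.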
3.4 Implementation
The above method was developed in LISP and then rewritten in C for speed and portability reasons. For the computation of the surface and the expansion coefficients to the order of l = 20, the compiled LISP code typically required execution times of under 1 s on a standard Linux box running Red Hat 7.3 and using the LISP package CMUCL (CMUCL User's Manual, 2004). In C we made use of the GNU GSL (Galassi et al., 2003) libraries for matrix manipulations and the associated Legendre polynomials (stable to l = 150) for the computation of the spherical harmonics. The C code allowed for aggressive optimization that pushed the execution time down to typically well under 0.01 s per structure. A Java version of the method using the Colt Library takes approximately 2 s for the same task. Note that this is in no way a fair comparison of the programming languages mentioned here, but merely indicates that the method runs in reasonable times even as byte code.
4 RESULTS
Real spherical harmonic expansions have been computed for all ligands in the EBI-MSD (Boutselakis et al., 2003), the HIC-Up database (Kleywegt and Jones, 1998), for a few selected enzyme families with well-known binding pockets and for a large number of predicted binding pockets derived from surface cleft analyses and residue conservation scores. A detailed analysis with a more biochemical focus will not be presented here. In this paper, we instead focus on the methodology but give examples that show the power of our approach and also the current limitations with suggestions for future improvements.
4.1 Accuracy
For functions on the sphere, spherical harmonic expansions can be constructed to any arbitrary error threshold (within the numerical accuracy of the method). However, it makes sense to truncate the expansion to match the type of feature detail one is looking at and thereby decrease the chances of fitting noise. To roughly capture the overall shape of a small molecule or binding pocket, we have found that terminating the expansion series at l_max = 6 is usually sufficient (giving rise to (l_max + 1)² = 49 coefficients). However, for the chosen spherical t-design integration layout, the best results in terms of reproducing the original values were obtained for l_max = 14 (N = 225 expansion coefficients). The accuracy was measured in root mean square deviations (rmsd) between the original sample points and reconstructed values. Taking different sets of points with which to compute the rmsd changed the picture very little, showing that the spherical harmonics are very well behaved between the sample points and represent the overall shape well. We found values of about 0.001–0.02 Å rmsd for most small molecules and binding pockets we tested (the values typically increase with the overall size). Figure 2 shows a few selected approximations for a predicted binding pocket. The comparison and clustering studies take place in 225-dimensional space (l_max = 14). The difference in coefficient space between shape i and shape j is calculated as a standard L2 distance,
$$ d_{ij} = \left[ \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} \left( c^{(i)}_{lm} - c^{(j)}_{lm} \right)^{2} \right]^{1/2} \tag{5} $$
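A minimal sketch of this comparison step is shown below, with each shape stored as a flat list of its expansion coefficients in a fixed (l, m) order. The function names and the dictionary-based database are hypothetical choices for illustration.

```python
import math

def l2_distance(c_i, c_j):
    """Euclidean (L2) distance between two coefficient vectors of equal length."""
    if len(c_i) != len(c_j):
        raise ValueError("coefficient vectors must use the same l_max")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_i, c_j)))

def rank_matches(query, database):
    """Return (distance, name) pairs sorted from most to least similar.

    `database` maps a shape name to its precomputed coefficient vector.
    """
    return sorted((l2_distance(query, coeffs), name)
                  for name, coeffs in database.items())
```

Because each comparison is a single pass over (l_max + 1)² numbers, scanning a database of precomputed coefficients is cheap, provided all shapes were expanded in the same standard orientation.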
4.2 Sensitivity
The combination of SURFNET with residue conservation scores often gives a good binding pocket prediction but unfortunately of varying accuracy when compared with the known location of the ligand (Glaser et al., 2003, in preparation). Small changes in the PDB coordinates (for instance due to side chain flexibility) can give rise to a different set of spheres from which the pocket is built. In Figure 5 we show the effect of randomizing the SURFNET sphere centres for a predicted binding pocket. As can be seen, this can potentially introduce quite large shifts in the expansion coefficient vectors, given significant binding pocket rearrangements (top curve). When comparing binding pockets with ligands using shape alone, it therefore does not make sense to distinguish between the different molecules shown in Figure 4, as the binding pocket error will be of the order of these differences or larger.
Figure 6 shows the average deviations that result from not obtaining exactly the same reference frame. The rotations were selected by randomly sampling quaternion space (Kuipers, 2002). The plots show that deviations from the correct centre of geometry (the expansion centre) of about 0.5 Å and rotational deviations of up to about 20° generate differences in the expansion coefficients comparable to the differences within the ligand cluster shown in Figure 4.
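One standard way to sample rotations through quaternion space is sketched below: four independent Gaussian values are normalized to a unit quaternion, which gives uniformly distributed rotations. The paper does not describe its sampling procedure in detail, so this is an assumed scheme with hypothetical function names.

```python
import math
import random

def random_unit_quaternion(rng=random):
    """Draw a uniformly distributed unit quaternion (w, x, y, z)."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        if norm > 1e-12:                      # avoid dividing by a vanishing norm
            return tuple(v / norm for v in q)

def quaternion_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotation_angle_degrees(q):
    """Rotation angle in degrees (0 to 180) encoded by a unit quaternion."""
    return math.degrees(2.0 * math.acos(min(1.0, abs(q[0]))))
```

To study small misalignments such as the deviations of up to about 20° discussed above, sampled quaternions can be filtered by `rotation_angle_degrees` before applying the corresponding matrix to the surface points.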
4.3 Conformational flexibility
Structural flexibility is a major challenge in any comparison technique based on 3D objects. It is not immediately clear how best to handle the multiple conformations commonly observed in high resolution X-ray structures and especially in NMR models in structural comparisons, although there have been promising attempts (Schneider, 2002). For our approach, we are faced with conformational flexibility on the side of both the protein and the ligand. On the protein side, the overall effect results in a plot very similar to Figure 5 for small changes. For larger changes such as domain movements, the binding pockets are often no longer detectable or are distorted beyond recognition from the liganded protein structure. In Figure 7 the distances between expansion coefficients [computed with Equation (5)] are displayed for some diverse conformations of current depositions of nicotinamide-adenine-dinucleotide (NAD) and adenosine-triphosphate (ATP) in the PDB (Bernstein et al., 1977). For other flexible ligands the behaviour and spread of coefficient distances is very similar (data not shown).
4.4 Binding pocket comparisons
In order to show the potential of the present method for binding pocket comparisons, we generated a test set of 40 proteins with low pairwise sequence identity. The test set contains 10 examples each of three different ligands [ATP, NAD and heme (HEM)] with the remaining 10 examples all bound to five distinct but chemically similar steroids [estradiol, EST; progesterone, STR; equilenin, EQU; testosterone, TES; and dihydrotestosterone, DHT]. Binding pockets were determined using SURFNET (Laskowski et al., 1996) as described in Section 2.1 with an additional filtering method based on the proximity of the SURFNET spheres to atoms belonging to residues known to interact with the ligand in question. The residue interaction information was obtained from PDBsum (Laskowski et al., 2005). The binding pocket shapes were then expanded in real spherical harmonics and their coefficients were compared using Equation (5). Figure 8 shows a dendrogram obtained from hierarchical clustering based on these distances and indicates the extent to which predicted binding pockets can be matched. As can be seen, steroid binding pocket shapes are similar to each other and sufficiently dissimilar to the remaining three binding pocket types to provide a clear distinction. The other binding pockets do not show such a clear separation; the ATP binding pockets, in particular, show large variability and do not cluster well. The predicted heme and NAD binding pockets exhibit pronounced tendencies to cluster into separate groups. We expect the performance to increase as more properties are included in the binding pocket feature vector.
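The clustering step can be illustrated with a small agglomerative procedure operating on a precomputed distance matrix. The paper does not state which linkage criterion was used for the dendrogram, so average linkage is an assumption here, as are the function name and the stopping rule based on a target number of clusters.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage (assumed criterion).

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean distance over all cross-cluster pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return clusters
```

Recording the merge distances in order would give the heights needed to draw a dendrogram. This cubic-time version is only meant for small sets such as the 40 pockets discussed above.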
5 DISCUSSION
A fast method has been presented for capturing the global shape of a protein's binding pocket or ligand. The method can also be applied directly to the protein itself. The surface is treated as a (single-valued) function of the spherical coordinates, θ and φ, that can then be expanded in a linear combination of real spherical harmonics. The use of spherical designs for the integration provides a robust, fast and elegant approach for the determination of the expansion coefficients. The computation of the expansion typically takes well under 0.01 s (C version) on a standard i686 Linux PC. If two objects share a common orientation, these coefficients can be directly compared using the standard Euclidean distance (L2) metric in N-dimensional space, where N is determined by the order of the spherical harmonics chosen to represent the shape. We have presented a robust heuristic that defines a standard frame of reference based on the first, second and third moments of a 3D object.
The method presented here has, however, a number of inherent difficulties. A problematic hurdle is predicting the binding pocket correctly. The whole idea behind comparing binding pockets by shape is based on actually having something close to the functionally relevant shape to start with. When comparing binding pocket shapes to ligands, in particular, it is important to get a good geometric model of where the ligand may bind. Our approach will, therefore, perform badly in cases where the ligand lies on a fairly flat surface without much indentation or the ligand sticks out into the solvent. In many instances, the binding pocket could not be determined well with current methods, thus hampering any further steps. Given that each expansion coefficient is an integral over the whole surface with the spherical harmonics [Equation (2)], i.e. their support is the full S², it is not possible to relate the coefficients to local shape features. Each coefficient always acts globally. The method is therefore not well suited to finding local matches (subsolutions).
Another problem is how to deal with flexibility within the protein and the ligand. The inclusion of such effects into our approach is not straightforward without dealing with ensembles of potential conformations, probability surfaces or averaged coefficients for similar conformations. Given the speed of our approach, we are currently circumventing this problem and obtaining satisfactory results by simply storing all coefficients of various conformations (based on PDB entries rather than exploring the whole conformational space) and comparing against these.
The registration of a 3D object is not trivial and our heuristic for determining the coordinate frame of reference is not faultless. In the field of computer vision this problem has led to the development of 3D shape retrieval systems based on rotation invariant descriptors (Funkhouser et al., 2003; Kazhdan et al., 2003). This approach loses information in that the original shape cannot be reconstructed from its descriptors but this is over-compensated for by the avoidance of registration errors.
Shape alone does not determine when interactions occur. Of at least equal importance is the electronic configuration of all interacting partners. The electrostatic potential describes the total effective interaction energy that would be exerted on a point charge placed in this field. As the electrostatic potential is governed by Poisson's or Laplace's equation (for zero charge density), spherical harmonics are again a good choice for describing its solutions (Tsirelson and Ozerov, 1996; Ritchie and Kemp, 2000). The integration of the electrostatic potential is currently being analysed and will be presented elsewhere.
| 6 CONCLUSION |
|---|
In this paper, the method of using real spherical harmonics to describe surfaces has been explained in detail, including implementation issues. It has been shown how this approach can be employed to compare (star-like) shapes efficiently. For protein binding pockets, this method offers a robust, compact and fast shape-driven description and comparison method. It is not well suited to locating subgroups or subpatterns, for which alternative approaches are currently being tested.
| Acknowledgments |
|---|
R.J.M. is grateful for financial support from SPINE contract no. QLG2-CT-2002-00988. We thank Roman Laskowski for providing help with and modifications to SURFNET, Gareth Stockwell for the NAD and ATP datasets, Fabian Glaser for advice on ConSurf and Jonathan Barker for help with Figure 4. R.J.M. would like to thank Gérard Bricogne for the exposure to some of the mathematical methods employed in this approach.
Received on November 5, 2004; revised on February 3, 2005; accepted on February 17, 2005
| REFERENCES |
|---|
Agishtein, M.E. (1992) Fuzzy molecular surfaces. J. Biomol. Struct. Dynam., 9, 759–768.
Aloy, P., et al. (2001) Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol., 311, 395–408.
Armon, A., et al. (2001) ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J. Mol. Biol., 307, 447–463.
Bader, R.F.W. (1990) Atoms in Molecules: A Quantum Theory. Oxford University Press, Oxford. ISBN 0-19855-865-1.
Bate, P. and Warwicker, J. (2004) Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J. Mol. Biol., 340, 263–276.
Bernstein, F.C., et al. (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. J. Mol. Biol., 112, 535–542.
Besl, P.J. and McKay, N.D. (1992) A method for registration of 3D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 14, 239–256.
Boutselakis, H., et al. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res., 31, 458–462.
Cai, W., et al. (2002) Protein–ligand recognition using spherical harmonic molecular surfaces: towards a fast efficient filter for large virtual throughput screening. J. Mol. Graph. Modell., 20, 313–328.
Chaichian, M. and Hagedorn, R. (1997) Symmetries in Quantum Mechanics: From Angular Momentum to Supersymmetry. IOP Institute of Physics. ISBN 0-750304073.
CMUCL User's Manual (2004) Technical Report CMU-CS-92-161.
Cohen-Tannoudji, C., Diu, B., Laloë, F. (1977) Quantum Mechanics, Vols. 1 and 2. Wiley Interscience. ISBN 0-471-16432-1 and 0-471-16434-8.
Connolly, M.L. (1983) Analytical molecular surface calculation. J. Appl. Cryst., 16, 548–558.
Crowther, R.A. (1972) The fast rotation function. In Rossmann, M.G. (Ed.), The Molecular Replacement Method. Gordon and Breach, NY, pp. 173–178.
Cruickshank, D.W.J. (1999) Remarks about protein structure precision. Acta Cryst. D, 55, 583–601.
Dugan, J.M. and Altman, R. (2004) Using surface envelopes for discrimination of molecular models. Protein Sci., 13, 15–24.
Duncan, B.S. and Olson, A.J. (1993) Shape analysis of molecular surfaces. Biopolymers, 33, 219–229.
Edmonds, A.R. (1996) Angular Momentum in Quantum Mechanics. Princeton University Press, New Jersey. ISBN 0-691-02589-4.
Exner, T.E., et al. (2002) Pattern recognition strategies for molecular surfaces. I. Pattern generation using fuzzy set theory. J. Comput. Chem., 23, 1176–1187.
Fliege, J. and Maier, U. (1996) A two-stage approach for computing cubature formulae for the sphere. Technical Report, Ergebnisberichte Angewandte Mathematik 139T, Fachbereich Mathematik, Universität Dortmund.
Funkhouser, T., et al. (2003) A search engine for 3D models. ACM Trans. Graph., 22, 128.
Gabdoulline, R.R. and Wade, R.C. (1996) Analytically defined surfaces to analyze molecular interaction properties. J. Mol. Graph., 14, 341–353.
Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Booth, M., Rossi, F. (2003) GNU Scientific Library Reference Manual, 2nd edition. Network Theory Ltd., Bristol, UK. ISBN 0-9541617-3-4.
Glaser, F., et al. (2003) ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics, 19, 163–164.
Goethals, J.-M. and Seidel, J.J. (1979) Spherical designs. In Ray-Chaudhuri, D.K. (Ed.), Relations between Combinatorics and Other Parts of Mathematics, Proceedings of the Symposium on Pure Mathematics, Vol. 34. AMS, Boston, MA, USA, pp. 255–272.
Greer, J. and Bush, B. (1978) Macromolecular shape and surface maps by solvent exclusion. Proc. Natl Acad. Sci. USA, 75, 303–307.
Hardin, R.H. and Sloane, N.J.A. (1996) McLaren's improved snub cube and other new spherical designs in three dimensions. Discrete Comput. Geometry, 15, 429–441.
Jetter, K., Stöckler, J., Ward, J.D. (1998) Norming sets and spherical cubature formulas. In Chen, Z., Li, Y., Micchelli, C.A., Xu, Y. (Eds.), Computational Mathematics. Marcel Dekker Inc., NY, pp. 237–245. ISBN 0-8247-1946-8.
Kazhdan, M., Funkhouser, T., Rusinkiewicz, S. (2003) Rotation invariant spherical harmonic representation of 3D shape descriptors. In Kobbelt, L., Schröder, P., Hoppe, H. (Eds.), Eurographics Symposium on Geometry Processing. EG Digital Library.
Kleywegt, G.J. and Jones, T.A. (1998) Databases in protein crystallography. Acta Cryst. D, 54, 1119–1131.
Kuipers, J.B. (2002) Quaternions and Rotations: A Primer with Applications to Orbits, Aerospace, and Virtual Reality. Princeton University Press, New Jersey. ISBN 0-691-10298-8.
Lanzavecchia, S., et al. (2001) Alignment of 3D structures of macromolecular assemblies. Bioinformatics, 17, 58–62.
Laskowski, R. (1995) SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J. Mol. Graph., 13, 323–330.
Laskowski, R., et al. (1996) Protein clefts in molecular recognition and function. Protein Sci., 5, 2438–2452.
Laskowski, R., et al. (2003) From protein structure to biochemical function? J. Struct. Func. Genomics, 4, 163–177.
Laskowski, R.A., et al. (2005) PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res., 33, D266–D268.
Lee, B. and Richards, F.M. (1971) The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol., 55, 379–400.
Matheny, A. and Goldgof, D.B. (1995) The use of three- and four-dimensional surface harmonics for rigid and nonrigid shape recovery and representation. IEEE Trans. Pattern Anal. Mach. Intell., 17, 967–981.
Mezey, P.G. (1993) Shape in Chemistry. An Introduction to Molecular Shape and Topology. VCH Publishers. ISBN 0-89573-727-2.
Morris, R.J. and Bricogne, G. (2003) Sheldrick's 1.2 Å rule and beyond. Acta Crystallogr. D, 59, 615–617.
Onuchic, J.N., et al. (1997) Theory of protein folding: the energy landscape perspective. Annu. Rev. Phys. Chem., 48, 545–600.
Orengo, C.A., et al. (1999) From protein structure to function. Curr. Opin. Struct. Biol., 9, 374–382.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1996) Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, UK. ISBN 0-521-43108-5.
Ritchie, D.W. and Kemp, G.J.L. (1999) Fast computation, rotation, and comparison of low resolution spherical harmonic molecular surfaces. J. Comp. Chem., 20, 383–395.
Ritchie, D.W. and Kemp, G.J.L. (2000) Protein docking using spherical polar Fourier correlations. Proteins, 39, 178–194.
Shulman-Peleg, A., et al. (2004) Recognition of functional sites in protein structures. J. Mol. Biol., 339, 607–633.
Schneider, T.R. (2000) Objective comparison of protein structures: error-scaled difference distance matrices. Acta Cryst. D, 56, 714–721.
Schneider, T.R. (2002) A genetic algorithm for the identification of conformationally invariant regions in protein molecules. Acta Cryst. D, 58, 196–298.
Schmitt, S., et al. (2002) A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol., 323, 387–406.
Smith, J.M., et al. (1981) Extended-range arithmetic and normalised Legendre polynomials. ACM Trans. Math. Software, 93–105.
Stuhrmann, H. (1970) Interpretation of small-angle scattering functions of dilute solution and gases. A representation of the structures related to a one-particle scattering function. Acta Crystallogr. A, 26, 297–306.
Svergun, D. (1991) Mathematical methods in small-angle scattering data analysis. J. Appl. Cryst., 24, 485–492.
Teichmann, S.A., et al. (2001) Determination of protein function, evolution and interactions by structural genomics. Curr. Opin. Struct. Biol., 11, 354–363.
Ten Eyck, L.F. (2003) Full matrix refinement as a tool to discover the quality of a refined structure. Methods Enzymol., 374, 345–369.
Tsirelson, V.G. and Ozerov, R.P. (1996) Electron Density and Bonding in Crystals. Institute of Physics Publishing, pp. 147–167. ISBN 0-7503-0284-4.
Voorintholt, R., et al. (1989) A very fast program for visualizing protein surfaces, channels and cavities. J. Mol. Graph., 7, 243–245.
Whisstock, J.C. and Lesk, A. (2003) Prediction of protein function from protein sequence and structure. Quart. Rev. Biophysics, 36, 307–340.
Wild, D.L. and Saqi, M.A. (2004) Structural proteomics: inferring function from protein structure. Curr. Proteomics, 1, 59–65.