Skip Navigation


Bioinformatics Advance Access originally published online on October 31, 2007
Bioinformatics 2007 23(23):3139-3146; doi:10.1093/bioinformatics/btm503
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/23/3139    most recent
btm503v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Sommer, I.
Right arrow Articles by Lengauer, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sommer, I.
Right arrow Articles by Lengauer, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Moment invariants as shape recognition technique for comparing protein binding sites

Ingolf Sommer 1,*, Oliver Müller 1, Francisco S. Domingues 1, Oliver Sander 1, Joachim Weickert 2 and Thomas Lengauer 1

1Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken and 2Faculty of Mathematics and Computer Science, Saarland University, Building E1.1, 66041 Saarbrücken, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: An approach for identifying similarities of protein–protein binding sites is presented. The geometric shape of a binding site is described by computing a feature vector based on moment invariants. In order to search for similarities, feature vectors of binding sites are compared. Similar feature vectors indicate binding sites with similar shapes.

Results: The approach is validated on a representative set of protein–protein binding sites, extracted from the SCOPPI database. When querying binding sites from a representative set, we search for known similarities among 2819 binding sites. A median area under the ROC curve of 0.98 is observed. For half of the queries, a similar binding site is identified among the first two of 2819 when sorting all binding sites according the proposed similarity measure. Typical examples identified by this method are analyzed and discussed. The nitrogenase iron protein-like SCOP family is clustered hierarchically according to the proposed similarity measure as a case study.

Availability: Python code is available on request from the authors.

Contact: sommer{at}mpi-inf.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
Geometric similarity plays an important role in structural biology, e.g. in the detection of the similarity of protein structures (Sierk and Kleywegt, 2004), or in the analysis of protein–protein binding interactions (Via et al., 2000). To support the analysis of protein interactions, there are methods available for backbone structure comparison (Holm and Sander, 1993; Shindyalov and Bourne, 1998), for local spatial comparison of patterns in ligand binding sites (Artymiuk et al., 2005; Kleywegt, 1999; Stark et al., 2003), for comparing surfaces of ligand binding sites (Hofbauer et al., 2004; Morris et al., 2005), for searching ligand binding sites (Jambon et al., 2005; Shulman-Peleg et al., 2004b), for comparing protein binding sites (Hofbauer and Aszodi, 2005) or protein–protein interfaces (Bock et al., 2005, 2007; Shulman-Peleg et al., 2004a; 2005), and for analyzing protein–protein interactions (Cazals et al., 2006). There are datasets of protein–protein interactions (Keskin et al., 2004) and databases emerging for the classification of protein–protein interfaces, such as SCOPPI (Kim and Ison, 2005; Winter et al., 2006) and SNAPPI (Jefferson et al., 2007).

Recently, the structural analysis of protein–protein interactions has received a lot of attention, but still there is a need for methods assisting in the structural comparison of protein–protein binding sites. Here, we address the problem of efficiently identifying similarities in defined binding sites employing shape recognition techniques.

While the concept of shape is mathematically difficult to formulate (Edelsbrunner and Mücke, 1994), shape recognition techniques have been developed and employed very successfully. Among others, there are techniques based on extended Gaussian images (Horn, 1984), geometric hashing (Wolfson and Rigoutsos, 1997), spherical harmonics (Kazhdan et al., 2003), shape distributions (Osada et al., 2002), barcode shape descriptors (Collins et al., 2004) and moment invariants (Hu, 1962) used for pattern recognition. Some of these, like geometric hashing (Shulman-Peleg et al., 2005), spherical harmonics (Cai et al., 2002; Morris et al., 2005) and shape histograms (Ankerst et al., 1999) have been successfully employed in structural bioinformatics. Also for identifying similarities between small-molecule ligands, shape comparison techniques have been successfully employed (Ballester and Richards, 2007; Grant et al., 1996).

Moment invariants are well established in 2D image analysis (Hu, 1962), have been extended for 3D pattern only much later (Flusser et al., 2003; Mamistvalov, 1998) and have not been applied to problems in structural bioinformatics yet.

We propose a novel, efficient approach for characterizing protein–protein binding sites such that similarities among them can be identified and protein binding sites can be classified structurally. The approach is based on a descriptor, capturing the essentials of the shape of each binding site in a vector of 3D moment invariants. This allows for comparing the shape independently from the sequence similarities between the binding sites. The approach is thus alignment-free, furthermore it does not need spatial superposition, and it is very runtime efficient.

In the rest of the article, we will briefly review the theory of 3D moment invariants, describe how to represent parts of macromolecules such that moment invariants can be computed, and present experiments on a representative set of structurally known binding sites to demonstrate the power of the proposed descriptor.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
2.1 Geometric and central moments in 3D
The raw moments of order p of a 3D density function {rho}(x) = {rho}(x1, x2, x3) are defined in terms of the integral


Formula

where p = p1 + p2 + p3. The central moments are defined as


Formula

whereFormula ,Formula andFormula . Trivially, when the center of massFormula is at the origin, the raw moments become the central moments.

2.2 Moment invariants
Moment invariants are rational functions of the moments that remain constant in value when the density {rho} is subjected to transformation. For example, the following functions O3, O4, O5, FL remain invariant, when {rho} is subjected to orthogonal coordinate transformations (Flusser et al., 2003; Mamistvalov, 1998); i.e. the values of O3, O4, O5, FL remain constant when {rho} is rotated in 3D space:


Formula

These moment invariants characterize the density of an object independently from the object's position or orientation. The particular functions are not invariant to scale. Since moments are continuous, the employed invariant functions of the moments are continuous as well. Slight changes in the density correspond to slight changes in the moment invariants. Similar density functions can be identified by identifying similar moment invariants. Thus, a feature vector v = (O3, O4, O5, FL) of moment invariants can serve to describe densities independently from their position and orientation in 3D space.

Comparing different density functions now corresponds to comparing their feature vectors. In order to use a Euclidean metric, we normalize the feature vectors. Given N feature vectors, following the scale normalization routine in R (Ihaka and Gentleman, 1996), each feature is divided by its sample variance, i.e.


Formula

where the features of vector vi are indexed with j, and


Formula

Other normalization schemes were suggested, e.g. (Mukundan and Ramakrishnan, 1998); in our experiments the specific kind of normalization affects the overall result only to a limited extent (data not shown).

Normalized feature vectors v' of density functions can be stored and a database or data structure of feature vectors can be queried for similarities. Search for similarities can be performed efficiently, e.g. by storing feature vectors in a space efficient k-d-tree or in a more runtime-efficient and less space-efficient range tree (Mehlhorn, 1984).

2.3 Describing molecules with moment invariants
For many molecules, coordinates of atoms are available from X-ray crystallography or NMR experiments. The coordinates of proteins, e.g. are deposited in the Protein Data Bank (PDB). With coordinates available, fitting a 3D Gaussian onto the center of each atom and summing over the Gaussians, the density of a molecule can be approximated (Grant et al., 1996). The density can be interpreted as the molecule's shape. A feature vector characterizing the density thus serves to describe the molecule's shape.

When all atoms are employed in the above summation of Gaussians, the complete molecule's shape is characterized. Alternatively, the summation can be performed for only selected parts of the molecule. For example, the shape of the surface of a protein molecule can be approximated by summing over its surface atoms. Similarly, by selecting atoms comprising the binding site of a protein, the shape of that binding site is described.

2.4 Computing 3D moments of sums of Gaussians
With a molecule's density represented as sum of Gaussians, one can easily compute its moments and moment invariants. For a 3D Gaussian gk, with standard deviation {sigma}, centered at position ak


Formula

the first few raw moments are as follows (Rinne, 2003, for example):


Formula

And since the distributions are independent along the coordinate axes:


Formula

The raw moments of a density function {rho} that is a sum of Gaussians {rho} = {sum}kgk, are readily computed as the sum of the moments of the individual Gaussians. In order to compute the central moments of the function {rho}, its center of mass can be shifted to the origin such that:


Formula

and


Formula

These values are used to compute the moment invariants O3, O4, O5, FL introduced in Section 2.2.

2.5 Experiments
We assess the moment invariant method on a representative set of protein–protein binding sites. We test how well the method identifies similar binding sites among a large representative set of protein binding sites.

2.5.1 Definition of the binding sites
For the experiments, we use a set of protein–protein binding sites from the SCOPPI database (Kim and Ison, 2005; Kim et al., 2006; Winter et al., 2006). The database provides an evolutionary and structural classification of protein–protein interactions. Based on SCOP (Murzin et al., 1995), SCOPPI superposes the domains of each family and classifies inter-domain interactions. One classification criterion, in particular, is the relative position of a binding site with respect to the structurally superposed domains of the family. Distinct positions are classified as distinct face types. The SCOPPI face type thus defines the position of the binding site with respect to the protein domain. Two binding sites with the same face type can be geometrically similar, but this is not always the case. SCOPPI classifies each binding site uniquely into one group defined by its family and face type.

SCOPPI provides removal of redundancy according to percent sequence identity of the protein domains. Here, we use a subset of SCOPPI whose domains have pairwise < 80% sequence identity and use only those SCOP families with at least 15 members. For the SCOP 1.69 database, this leaves us with 2819 binding sites from 96 SCOP families, falling into 501 groups of family and face type. We will refer to this set as PBSALL (for all protein binding sites).

For each binding site, the residues involved in the interaction are computed according to a criterion defined in Jones and Thornton (1995). Namely, binding site residues are defined as those residues whose side chains have an accessible surface area that decreases by more than 1Å2 on dimerization. A visual catalog of the binding sites and binding site residues employed is provided in the Supplementary Material.

2.5.2 Guaranteeing geometric similarity within binding sites
While binding sites within the same group of family and face type bind at similar positions, some display large differences (see Supplementary Material for some examples). Starting from the set PBSALL, a set of binding sites that falls into groups that are known to be geometrically similar is constructed by employing two restriction criteria.

(1) Binding sites that differ substantially in the number of residues involved in the binding cannot be geometrically similar. Consequently, groups for which the minimal and maximal number of binding site residues that occur within a group differ by at least 50 % are discarded.

(2) Furthermore, we confirm structural relatedness of the binding sites using the TM-align program (Zhang and Skolnick, 2005). TM-align is intended to compare predicted protein structure models against native structures. For the comparison, TM-align relies on the sequence order of the models to be compared; it still can align non-contiguous residues in sites. Independently from the sequence length, it scores model versus native structure on a scale of 0.0 (unrelated) to 1.0 (highly similar). Here, TM-align is employed for discarding binding sites, which are geometrically not similar. For evolutionary-related binding sites, the structure of binding site residues, corresponding to a subset of the atoms defined in the PDB files, can be directly compared with TM-align. Comparing only binding sites from the same group of family and face type, TM-align score values range from 0.05 to 0.8, mostly. The values for arbitrary unrelated binding sites range from 0.0 to 0.45, mostly (see Supplementary Material for details). Thus, all groups which contain a binding site that, compared to another binding site within the same group, displays a TM-align score of less than or equal to 0.45 are discarded.

Applying these two criteria leaves only groups of geometrically similar binding sites. We call such a group a SIMGROUP. Selecting SIMGROUPs containing at least three binding sites one obtains 53 SIMGROUPs from 32 SCOP families, containing a total of 224 binding sites. We will refer to this set of binding sites as PBSGEOM (for protein binding sites, restricted to geometrical similarity).

2.5.3 Precomputation of moment invariants
The residues involved in the interaction of each binding site are computed as described above. For each of these residues, all atoms are modeled as Gaussian distributions. As an approximation, we choose the standard deviation {sigma} = 0.523 such that for each Gaussian 99% of the density is within the Van-der-Waals radius 1.76 of a Carbon atom [VDW radius as in (Chothia, 1975), for the derivation of {sigma}, see Supplementary Material]. Varying {sigma} to some extent (0.1 ≤ {sigma} ≤ 1; data not shown) does not have a large impact on the results.

For each binding site, from the sum of Gaussians the moment invariants are computed as described in Section 2.4. For the 2819 binding sites in PBSALL, this yields a 2819 x 4 dimensional matrix. This matrix is normalized columnwise as described in Section 2.2.

2.5.4 Experimental protocol
We test the method on all binding sites b from PBSGEOM. By construction of PBSGEOM, each b belongs to a SIMGROUP of geometrically similar binding sites and we measure how well we can distinguish these members from the other unrelated binding sites. For b, based on the moment invariants we compute the similarity to each binding site in PBSALL and sort those decreasingly. In a perfect scenario, the binding sites from b's SIMGROUP would be on the first few ranks with all other binding sites following.

When querying for an individual b, and searching for members from its SIMGROUP, we report the best hit that corresponds to the top rank of one of the SIMGROUP members. Furthermore, we analyze the number of interspersed false positives when identifying all SIMGROUP members. This number is computed as maximum rank of a SIMGROUP member minus the size of the SIMGROUP. In a perfect scenario the best hit would be on rank one, the number of interspersed false positives would be zero.

As an additional summarizing quality measure we use the AUC-value, the ‘area under the curve’ in the ROC plot (Fawcett, 2006; Sing et al., 2005). The AUC-value summarizes the distribution of the ranks of the other binding sites from b's SIMGROUP for one query. An AUC-value of 1.0 corresponds to perfect classification, whereas for random assignments an AUC-value of 0.5 can be expected.


    3 RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
3.1 All against all
As described in the previous Section 2.5.4, for each binding site from PBSGEOM we report the ranks of its SIMGROUP members when searching within PBSALL. For 224 queries we report best hit, interspersed false positives and AUC-value. Tables 1 and 2 provide summary statistics (minima, 25%, 50%, 75% quantiles and maxima) of these values, Figure 1 provides histograms for the best hit ranks and AUC values; the Supplementary Material provides the associated P-values.


Figure 1
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. (A) Histogram of the best hit ranks (i.e. the closest SIMGROUP members identified, c.f. Section 3.1). The possible scale of ranks is 1–2818 with an observed maximum of 1902 (x axis); for the histogram we use a bin width of 30. The figure summarizes 224 queries; the relative frequency of 0.8 corresponds to 179 cases (y axis). The insert details the observed counts within the first bin; 40%, i.e. 89 of the cases identify their closest SIMGROUP member on rank one, which is optimal. (B) Histogram of the AUC values observed for 224 queries. The x axis is split into bins of width 0.012. The relative frequency of observations is normalized with 224 (y axis).

 

View this table:
[in this window]
[in a new window]

 
Table 1. Summary statistics for the best hit ranks and interspersed false positives for the search results of 224 queries

 

View this table:
[in this window]
[in a new window]

 
Table 2. Summary statistics for the AUC values for the search results of 224 queries

 
3.1.1 Identifying any similar binding site
The median of the best hit ranks is 2, thus for at least half of the binding sites we find a geometrically similar binding site among the first two (0.1 %) of 2819 inspected binding sites. The maximum of the best hit ranks is 1902 (67.5%), thus for this ‘worst case’ instance, geometrically similar binding sites are not detected. For three quarters of the query binding sites, we detect a geometrically similar one among the first 19 (0.7%) rank positions. For comparison, we also used the difference in the number of binding site residues for measuring similarity of binding sites (histogram provided in the Supplementary Material). This simple measure yields a median of the best hit ranks of 52; it is significantly outperformed by the moment invariant method (with a median of 2).

3.1.2 Identifying all similar binding sites
Similarly, the median of the interspersed false positives is 149 (5.3%), the 75% quantile of the number of interspersed is 432.3 (15.3%). For 25% of the queries, we identify all SIMGROUP members with less than 27 (1%) false positives. For 5% of the queries, we identify all SIMGROUP members without any false positives.

In summary it is fair to say, that we find similar binding sites for most queries, but for many we do not identify all related binding sites.

3.2 Inspecting individual cases
3.2.1 Nomenclature
In order to discuss individual binding sites (as defined in Section 2.5.1) we will refer them by the PDB identifier of the structure, the SCOP domain on which the binding site resides and the SCOP domain with which it is interacting. Using the sunids from SCOP, we concatenate this information to obtain an identifier that is unique for each binding site: < PDB – ID>_<SCOP-domain of binding site>_<Chain – ID of binding site>_<SCOP-domain of binding partner>_<Chain – ID of binding partner >.

3.2.2 One case that went wrong
We face 15 out of 224 cases for which the ranks of the best hits are above 200. Here, we analyze the worst case. When querying for 1bgy_75810_O_43691_Q, we identify the members of its group 1bcc_75804_C_43699_E, and 1kyo_75907_N_73272_P on ranks 1902 and 2041, respectively. Instead, we falsely identify 1d3a_30144_D_42110_C on rank 1. The situation is depicted in Figure 2A. The binding sites within the similarity group of 1bgy_75810_O_43691_Q have disconnected residues. For 1bgy_75810_O_43691_Q, the disconnected residues are shifted quite a bit with respect to the two other structures. This affects the center of the mass (yellow spheres in the graphics), which is used as reference point for computing the moment invariants. Therefore, the feature vectors are not similar anymore. Furthermore, this example clearly points out that the method is unable to identify partial matches between structures. For other binding sites, we observe a tendency that they are not identified when larger appendices are present in one but not in the other structure.


Figure 2
View larger version (49K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Visualization of three sample cases (A, B and C). In each case we have a query binding site (leftmost box) and geometric similarities as identified by the moment invariants (adjacent boxes from left to right). Each box contains a front view (above) and side view (below) of the protein domain (in gray) and of the binding sites atoms (red spheres). Moment invariants were computed for the binding sites atoms (red spheres) around the center of mass (yellow spheres, depicted in A only). For each query, we have members of the SIMGROUP in green boxes. (A) One case that went badly wrong (see Section 3.2.2). With query 1bgy_75810_O_43691_Q, we identify 1d3a_30144_D_42110_C (orange box) as geometrically similar instead of the two other binding sites from the SIMGROUP 1bcc_75804_C_43699_E, 1kyo_75907_N_73272_P. (B) With query 1hbm_60890_B_60888_A, we identify the SIMGROUP members 1e6y_18536_E_18530_D and 1e6v_18534_E_18528_D on rank one and three, respectively; binding site 1pv8_95155_E_95156_F is identified on rank two. (C) Query 1eys_43516_M_43515_L identifies both its SIMGROUP members 3prc_43438_M_43437_L and 1pcr_43495_M_43494_L on the first two ranks.

 
3.2.3 Cases that work well
When querying for binding site 1hbm_60890_B_60888_A (see Fig. 2B), we correctly identify its SIMGROUP members 1e6y_18536_E_18530_D_48911 and 1e6v_18534_E_18528_D on ranks one and three, respectively; 1pv8_95155_E_95156_F that is no member of the SIMGROUP is incorrectly identified on rank two.

Figure 2C depicts query 1eys_43516_M_43515_L, which identifies its two SIMGROUP members 3prc_43438_M_43437_L and 1pcr_43495_M_43494_L on the first two ranks.

3.3 Clustering a family of binding sites
Figure 3 shows an example for applying clustering with moment invariant similarities to binding sites within a SCOP family. The SCOP family c.37.1.10 of nitrogenase iron protein-like proteins are clustered according to similarities of their binding sites. All members (within PBSALL) of that family, irrespective of their face type, are clustered hierarchically using average linkage. Cutting the hierarchical tree, four distinct clusters are identified in the example and the average silhouette values are computed for them. Generally, silhouette values lie in the range [– 1, 1]. Entries with a silhouette value close to 1.0 are well clustered in the sense that distances to other entries in the same cluster are small compared to distances to entries in the closest different cluster. We observe two clusters with average silhouette values of 0.85 and 0.78, one cluster joins a variety of the binding sites with a silhouette value of 0.53, and one binding site is clearly marked as a geometric outlier with a silhouette value of 0.0. Low silhouette values correspond to inhomogenity regarding the external SCOPPI labeling, high silhouette values are found in clusters that reproduce the face types well.


Figure 3
View larger version (56K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. The binding sites of SCOP family (c.37.1.10) of nitrogenase iron protein-like proteins are clustered according to moment invariant geometric similarity. All binding sites (within PBSALL) of that family are clustered irrespective of their face type. The height refers to distances when clustering hierarchically using average linkage. The average silhouette width is computed for the four clusters (shown in pastel colors), which result from cutting the hierarchical tree. The protein structures are oriented such that the binding sites face the camera, they have not been structurally superposed.

 
This example serves to visualize the method's capabilities as well as to illustrate a concrete application. Comparing all members within one family, the SCOPPI face types are reproduced to a certain extent. Another application (data not shown) would be, to cluster according to geometrical similarities within the SCOPI face types.


    4 CONCLUSIONS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
Moment invariants are presented here as shape recognition technique for classifying protein–protein binding sites. Compared to other shape recognition techniques, moment invariants have some drawbacks and some advantages. Like some other shape recognition techniques, e.g. spherical harmonic-based concepts, the computation heavily depends on a reference point. The moment invariant feature vector continuously varies when affinely transforming the density with respect to the reference point. With their global, coarse-grained, view on density, moment invariant methods complement the geometric hashing methods popular in bioinformatics, which successfully detect similarities of features conserved in detail.

Computing moment invariants is extremely fast; in preprocessing, parsing the PDB files is the limiting factor for the descriptor construction, and with given descriptors a binding site can be compared to several thousand others within a second on a standard PC.

The current technique affords a global comparison of densities. For comparisons on a finer level, detail can be added in various ways: geometrically, the density can be partitioned—e.g. into a number of concentric shells within a sphere—to yield a larger and more specific feature vector. Similarly, chemical types of atoms can be used to partition the density, resulting in a more specific feature vector.

The problem of detecting similarities in protein–protein binding sites is highly relevant; there are no established ‘gold standards’ yet to which new methods can be compared to. Thus we assess the method proposed here, based on the SCOPPI database of protein–protein interfaces. From this database, groups of binding sites are extracted and the geometric similarity within the groups is verified using the sequence-based TM-align tool for structure superposition. While the verified similarity groups are used to demonstrate the performance of the method proposed here, our method does not rely on sequence alignment and can detect geometric similarity when there is no sequence similarity.

The method currently compares complete binding sites, but does not find partial matches. The main application will be to group predefined binding sites according to geometrical similarities. In our view, complete matching is also a necessary step in the direction of partial matching, as partial matching can be handled by subdivision of the objects to be compared. We are in the process of refining the method in order to also handle partial matches and to be less dependent on reference points.

In conclusion, moment invariants are a fast and robust technique for comparing densities on a global level, which we believe to be useful also for other applications in structural bioinformatics.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 
We thank Adrian Alexa, Holger Theisel, Christof Winter and Hongbo Zhu for helpful comments on the manuscript. We thank Christof Winter for providing SCOPPI data. This work forms part of the BioSapiens project, which is funded by the European Commission within its FP6 Programme under the thematic area ‘Life sciences, genomics and biotechnology for health’, contract number LSHG-CT-2003-503265.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on July 7, 2007; revised on September 13, 2007; accepted on October 2, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Ankerst M, et al. Nearest neighbor classification in 3D protein data bases. In: Proceedings 7th International Conference on Intelligent Systems for Molecular Biology (1999) 34–43.

    Artymiuk PJ, et al. Graph theoretic methods for the analysis of structural relationships in biological macromolecules. J. Am. Soc. Inf. Sci. Technol (2005) 56:518–528.[CrossRef]

    Ballester PJ, Richards WG. Ultrafast shape recognition to search compound databases for similar molecular shapes. J. Comput. Chem (2007) 28:1711–1723.[CrossRef][Web of Science][Medline]

    Bock ME, et al. Identifying similar surface patches on proteins using a spin-image surface representation. In: Combinatorial Pattern Matching: 16th Annual Symposium, CPM 2005, Lecture Notes in Computer Science 3537 (2005) Heidelberg: Springer, Berlin. 417–428.

    Bock ME, et al. Discovery of similar regions on protein surfaces. J. Comput. Biol (2007) 14:285–299.[CrossRef][Web of Science][Medline]

    Cai W, et al. Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening. J. Mol. Graph. Model (2002) 20:313–328.[CrossRef][Web of Science][Medline]

    Cazals F, et al. Revisiting the voronoi description of protein–protein interfaces. Protein Sci (2006) 15:2082–2092.[CrossRef][Web of Science][Medline]

    Chothia C. Structural invariants in protein folding. Nature (1975) 254:304–308.[CrossRef][Medline]

    Collins A, et al. A barcode shape descriptor for curve point cloud data. In: Eurographics Symposium on Point-Based Graphics—Alexa M, Rusinkiewicz S, eds. (2004) http://graphics.ethz.ch/events/pbg/.

    Edelsbrunner H, Mücke EP. Three-dimensional alpha shapes. ACM Trans. Graph (1994) 13:43–72.[CrossRef]

    Fawcett T. An introduction to ROC analysis. Pattern Recognit. Lett (2006) 27:861–874.[CrossRef]

    Flusser J, et al. Moment forms invariant to rotation and blur in arbitrary number of dimensions. IEEE Trans. Pattern Anal. Mach. Intell (2003) 25:234–245.[CrossRef]

    Grant JA, et al. A fast method of molecular shape comparison: a simple application of a gaussian description of molecular shape. J. Comput. Chem (1996) 17:1653–1666.[CrossRef][Web of Science]

    Hofbauer C, Aszodi A. Sh2 binding site comparison: a new application of the surfcomp method. J. Chem. Inf. Model (2005) 45:414–421.[CrossRef][Web of Science][Medline]

    Hofbauer C, et al. Surfcomp: a novel graph-based approach to molecular surface comparison. J. Chem. Inf. Comput. Sci (2004) 44:837–847.[CrossRef][Web of Science][Medline]

    Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol (1993) 233:123–138.[CrossRef][Web of Science][Medline]

    Horn B. Extended Gaussian images. Proc. IEEE (1984) 72:1671–1686.[CrossRef]

    Hu M-K. Visual pattern recognition by moment invatiants. IRE. Tran. Inf. Theory (1962) 8:179–187.

    Ihaka R, Gentleman R. R: a language for data analysis and graphics. J. Comput. Graph. Stat (1996) 5:299–314.[CrossRef]

    Jambon M, et al. The SuMo server: 3D search for protein functional sites. Bioinformatics (2005) 21:3929–3930.[Abstract/Free Full Text]

    Jefferson ER, et al. SNAPPI-DB: a database and api of structures, interfaces and alignments for protein–protein interactions. Nucleic Acids Res (2007) 35:D580–D589.[Abstract/Free Full Text]

    Jones S, Thornton JM. Protein-protein interactions: a review of protein dimer structures. Prog. Biophys. Mol. Biol (1995) 63:31–65.[CrossRef][Web of Science][Medline]

    Kazhdan M, et al. Rotation invariant spherical harmonic representation of 3D shape descriptors. Kobbelt L, Schröder P, Hoppe H, eds. (2003) Eurographics Symposium on Geometry Processing, http://www-i8.informatik.rwth-aachen.de/old-site/SGP/sgp03/.

    Keskin O, et al. A new, structurally nonredundant, diverse data set of protein–protein interfaces and its implications. Protein Sci (2004) 13:1043–1055.[CrossRef][Web of Science][Medline]

    Kim WK, Ison JC. Survey of the geometric association of domain-domain interfaces. Proteins (2005) 61:1075–1088.[CrossRef][Web of Science][Medline]

    Kim WK, et al. The many faces of protein–protein interactions: a compendium of interface geometry. PLoS. Comput. Biol (2006) 2:1151–1164.[Web of Science]

    Kleywegt G. Recognition of spatial motifs in protein structures. J. Mol. Biol (1999) 285:1887–1997.[CrossRef][Web of Science][Medline]

    Mamistvalov A. n-dimensional moment invariants and conceptual mathematical theory of recognition of n-dimensional data sets. IEEE. Trans. Pattern Anal. Mach. Intell (1998) 20:819–831.[CrossRef]

    Mehlhorn K. Data Structures and Efficient Algorithms. Volume 3: Multi-dimensional Searching and Computational Geometry, volume 3 of EATCS Monographs on Theoretical Computer Science (1984) Berlin, Germany: Springer.

    Morris RJ, et al. Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics (2005) 21:2347–2355.[Abstract/Free Full Text]

    Mukundan R, Ramakrishnan K. Moment Functions in Image Analysis (1998) Singapore, New Jersey, London, Hong Kong: World Scientific Publishing.

    Murzin AG, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol (1995) 247:536–540.[CrossRef][Web of Science][Medline]

    Osada R, Funkhouser T, Chazelle B, Dobkin D. Shape distributions. ACM Trans. Graph (2002) 21:807–832.[CrossRef]

    Rinne H. Taschenbuch der Statistik (2003) Germany: Harri Deutsch, Frankfurt a.M.

    Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng (1998) 11:739–747.[Abstract/Free Full Text]

    Shulman-Peleg A, et al. Protein-protein interfaces: recognition of similar spatial and chemical organizations. In: Algorithms in Bioinformatics: 4th International Workshop, WABI 2004, Bergen, Norway, 2004, Volume 3240 of Lecture Notes in Computer Science—Jonassen I, Kim J, eds. (2004a) Berlin, Heidelberg: Springer Verlag. 194–205.

    Shulman-Peleg A, et al. Recognition of functional sites in protein structures. J. Mol. Biol (2004b) 339:607–633.[CrossRef][Web of Science][Medline]

    Shulman-Peleg A, et al. SiteEngines: recognition and comparison of binding sites and protein–protein interfaces. Nucleic Acids Res (2005) 33(Web Server issue):W337–W341.[Abstract/Free Full Text]

    Sierk ML, Kleywegt GJ. Déjá vu all over again: finding and analyzing protein structure similarities. Structure (2004) 12(12):2103–2111.[Medline]

    Sing T, et al. ROCR: visualizing classifier performance in R. Bioinformatics (2005) 21(20):3940–3941.[Abstract/Free Full Text]

    Stark A, et al. A model for statistical significance of local similarities in structure. J. Mol. Biol (2003) 326:1307–1316.[CrossRef][Web of Science][Medline]

    Via A, et al. Protein surface similarities: a survey of methods to describe and compare protein surfaces. Cell. Mol. Life Sci (2000) 57:1970–1977.[CrossRef][Web of Science][Medline]

    Winter C, et al. Scoppi: a structural classification of protein–protein interfaces. Nucleic Acids Res (2006) 34:D310–D314.[Abstract/Free Full Text]

    Wolfson HJ, Rigoutsos I. Geometric hashing: an overview. IEEE Comput. Sci. Eng (1997) 97:10–21.

    Zhang Y, Skolnick J. The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci. USA (2005) 102:1029–1034.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J R Soc InterfaceHome page
P. J. Ballester, I. Westwood, N. Laurieri, E. Sim, and W. G. Richards
Prospective virtual screening with Ultrafast Shape Recognition: the identification of novel inhibitors of arylamine N-acetyltransferases
J R Soc Interface, July 8, 2009; (2009) rsif.2009.0170v1.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow All Versions of this Article:
23/23/3139    most recent
btm503v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (4)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Sommer, I.
Right arrow Articles by Lengauer, T.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sommer, I.
Right arrow Articles by Lengauer, T.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?