Bioinformatics Advance Access originally published online on March 1, 2005
Bioinformatics 2005 21(10):2537-2538; doi:10.1093/bioinformatics/bti331
Protein structure topological comparison, discovery and matching service
1 Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ, UK
2 School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: We describe a fast, fold-level protein comparison and motif matching facility based on the TOPS representation of structure. It updates a previous service at the EBI with improved graph matching, faster results, and visualization of both the structures being compared against and the pattern each of them shares with the target domain.
Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via the main TOPS site at http://www.tops.leeds.ac.uk. Software is also available for download from these sites.
Contact: tops{at}brc.dcs.gla.ac.uk
| INTRODUCTION |
|---|
Protein topological comparison provides a means of assessing structural similarities between distantly related domains. In addition, the abstract idea of a fold can be expressed as a motif (pattern), and searched for in the database of known structures. Finally, discovery of patterns for sets of structures is a simple extension of the pairwise pattern discovery used for comparison.
Two of these services, comparison and matching, have been available from a site at the EBI for some time (Gilbert et al., 1999, 2001). This update provides faster computation and visualization of the results both as topology cartoons (Westhead et al., 1999) and as diagrams. Results are now available immediately (Fig. 1) rather than by email reply, thanks to a new, faster graph-matching algorithm that exploits the fact that the graphs are vertex oriented. Full details of the algorithm can be found in Viksna and Gilbert (2001). Some of these services have been integrated with the main TOPS site at Leeds (Michalopoulos et al., 2004).
Comparison is here defined as pairwise pattern discovery; matching is determination of the presence or absence of a pattern as a subgraph of a structure graph. Indeed, pattern discovery relies on matching as it tries to find the largest pattern that will match all the members of a set of structures.
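The definition of matching above can be illustrated with a minimal sketch. It assumes a much simplified graph encoding that is not taken from the TOPS software: vertices are secondary structure elements in sequence order, labelled by type and orientation (for example "E+" or "H-"), and edges are a dictionary from index pairs to a label such as "A" (antiparallel) or "P" (parallel). The function name `match_pattern` is hypothetical, and the brute-force search stands in for the much faster algorithm of Viksna and Gilbert (2001).

```python
from itertools import combinations


def match_pattern(pattern, target):
    """Return the target positions matched by the pattern, or None.

    Both arguments are (vertices, edges) pairs. A match is an
    order-preserving mapping of pattern vertices onto target vertices
    with equal labels, such that every pattern edge is present in the
    target with the same label. This is an illustrative brute-force
    search, not the algorithm used by the service.
    """
    p_vertices, p_edges = pattern
    t_vertices, t_edges = target
    n = len(p_vertices)
    # combinations() yields increasing index tuples, which preserves
    # the sequence order of the secondary structure elements.
    for positions in combinations(range(len(t_vertices)), n):
        if any(p_vertices[i] != t_vertices[positions[i]] for i in range(n)):
            continue
        if all(t_edges.get((positions[i], positions[j])) == label
               for (i, j), label in p_edges.items()):
            return positions
    return None


# A two-strand antiparallel hairpin as the pattern (assumed encoding).
hairpin = (["E+", "E-"], {(0, 1): "A"})
# A helix followed by a three-stranded antiparallel sheet as the target.
domain = (["H+", "E+", "E-", "E+"], {(1, 2): "A", (2, 3): "A"})

print(match_pattern(hairpin, domain))  # positions of the first match found
```

The sketch compares orientation labels literally; a fuller treatment would also have to decide whether a pattern may match with all orientations flipped.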
| COMPARISON |
|---|
Structures for comparison are submitted as coordinate files that are converted by DSSP (Kabsch and Sander, 1983) and the TOPS program (Gilbert et al., 2001) into a more abstract topological form. This topology graph is then compared by pairwise pattern discovery with subsets of a database of other, precomputed, domain graphs. Domain definitions from both the CATH and SCOP classifications are available, with subsets of these classifications at different levels: homologous superfamily and topology for CATH; superfamily and fold for SCOP.
The purpose of the comparison service is to give an overview of how a structure relates to all of its neighbours, and this can quickly be achieved by a comparison with the CATH topology representatives (or the SCOP folds). The results are ranked by a value called compression, which is a measure of how well the common pattern between two structures describes those structures. When comparing a probe structure to a database, this value can be useful for distinguishing reasonable hits from more distant similarities.
| DISCOVERY |
|---|
Common patterns can be discovered for sets of more than two structures. As pattern discovery scales well with the number of graphs in the input set, there is no practical limit to the size of family that could be used. A service is provided for browsing the common patterns of groups of structures based on the representative structures for a particular level of the CATH classification tree. These dynamically discovered patterns can be matched back against the database to determine how many structures each one matches outside the group from which it was generated. This gives an idea of how specific the pattern is for that group.
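The discover-then-match-back procedure can be sketched as follows. This is a deliberately reduced illustration, not the TOPS method: each structure is assumed to be just a string of secondary structure symbols ("E" for strand, "H" for helix), a pattern matches a structure when it is a subsequence of it, and discovery is an exhaustive search for the longest string matching every member of the group. The names `discover`, `count_outside` and the toy database are hypothetical, and unlike the real algorithm this brute-force search does not scale to large inputs.

```python
from itertools import combinations


def is_subsequence(pattern, structure):
    """True if the pattern's symbols occur in the structure in order."""
    remaining = iter(structure)
    return all(symbol in remaining for symbol in pattern)


def discover(structures):
    """Longest string that is a subsequence of every input string."""
    shortest = min(structures, key=len)
    # Try candidates from the largest size downwards, so the first
    # candidate that matches every member is a largest common pattern.
    for size in range(len(shortest), 0, -1):
        for indices in combinations(range(len(shortest)), size):
            candidate = "".join(shortest[i] for i in indices)
            if all(is_subsequence(candidate, s) for s in structures):
                return candidate
    return ""


def count_outside(pattern, database, group):
    """Number of database entries outside the group that the pattern matches."""
    return sum(1 for name, s in database.items()
               if name not in group and is_subsequence(pattern, s))


# Toy database with invented domain names and SSE strings.
database = {"d1": "EEHEE", "d2": "EHEHE", "d3": "HHHH",
            "d4": "EEE", "d5": "HEEHEH"}
group = {"d1", "d2"}

pattern = discover([database[name] for name in sorted(group)])
print(pattern, count_outside(pattern, database, group))
```

A low count outside the group suggests that the pattern is specific to it; a high count suggests the pattern is too general to characterise the group.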
| MATCHING |
|---|
Several predefined classic patterns are provided to match against the database (Fig. 2). These are based on widely known fold types (the superfolds: Rossmann folds, TIM barrels, jellyrolls, OB fold, plaits and immunoglobulins). To answer more abstract research questions such as 'what sizes of porin exist?', there is a form for user-defined patterns, which must be encoded as a compact string representation of the graph. This format is similar in principle to that recently used for fold comparison by alignment of strings (Johannissen and Taylor, 2003), although in our case the strings are only a representation, not a data format.
| MULTIPLE STRUCTURE ALIGNMENT |
|---|
The patterns discovered by TOPS can be thought of as aligned cores of a set of structures. Two other programs use this capability of multiple structural alignment as the basis for alignment of multiple sequences.
The MSAT (multiple sequence alignment by topology) site (Ren et al., 2004) provides alignments of sequences driven by the common TOPS pattern of their structures. The sequence segments corresponding to those secondary structure elements that have been matched are aligned with ClustalW. The Topsalign program, on the other hand, uses simulated annealing to optimize the initial TOPS alignment of the secondary structure elements (Williams et al., 2003).
| DISPLAY |
|---|
The graphical display of each common pattern as a diagram (similar to those first produced by Koch et al., 1992) gives a more obvious, and more comprehensible, idea of why a particular structure was ranked at that position on the result list. Since the common pattern is an intermediate result in the process of comparison, it is easily available for display and does not require any further calculation. Cartoons have also been pre-generated for all the structures whose graphs are in the database, and these are shown alongside each result. These cartoons are easily scaled since they are stored as coordinates and images are generated anew each time; they are also available in PostScript, PDF and SVG formats.
The cartoon drawing is largely independent of the comparison, matching and discovery services, as is the drawing of the linear diagrams, so both can easily be reused wherever a web page needs to show a result. Other sites can therefore use them to visualize their own results. For example, MSAT patterns can be displayed as highlights on the cartoons of a set of protein topologies. Even though the pictures are generated dynamically, the CGI parameters are assembled into a pseudo-URL so that browsers will use the correct filename when saving the image.
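The pseudo-URL idea can be sketched as below. The layout is an assumption made for illustration: parameters are written as alternating name/value path segments and the final segment is the filename the browser should offer when saving. The function name `pseudo_url`, the base path, the parameter names and the domain identifier are all hypothetical and do not describe the actual URLs of the service.

```python
from urllib.parse import quote


def pseudo_url(base, params, filename):
    """Encode parameters as path segments, ending with a filename.

    Because the last path segment looks like an ordinary file, a
    browser saving the dynamically generated image will suggest that
    name instead of the name of the generating script.
    """
    segments = []
    for key, value in params.items():
        # Percent-encode each segment so that "/" or "?" inside a
        # value cannot alter the path structure.
        segments.append(quote(str(key), safe=""))
        segments.append(quote(str(value), safe=""))
    return "/".join([base.rstrip("/")] + segments + [quote(filename, safe="")])


url = pseudo_url("/tops/cartoon", {"domain": "1abcA0", "scale": 2}, "1abcA0.svg")
print(url)
```

On the server side, a handler would split the path back into name/value pairs and ignore the trailing filename; that part is omitted here.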
| IMPLEMENTATION |
|---|
All the software has been written in Java, apart from two accessory C programs (DSSP and TOPS). The service is implemented as servlets running in an Apache Tomcat container on a Linux web server. Picture generation is enabled by the Pure Java AWT (PJA) toolkit. A MySQL database is also part of the system.
| Acknowledgments |
|---|
GMT and IM were supported by a joint BBSRC/EPSRC grant, project number 320/BIO14429.
Received on July 23, 2004; revised on February 4, 2005; accepted on February 14, 2005
| REFERENCES |
|---|
Gilbert, D., et al. (1999) Motif-based searching in TOPS protein topology databases. Bioinformatics, 15, 317-326.
Gilbert, D., et al. (2001) A computer system to perform structure comparison using TOPS representations of protein structure. Comput. Chem., 26, 23-30.
Johannissen, L.O. and Taylor, W.R. (2003) Protein fold comparison by the alignment of topological strings. Protein Eng., 12, 949-955.
Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
Koch, I., et al. (1992) Analysis of protein sheet topologies by graph theoretical methods. Proteins, 12, 314-323.
Michalopoulos, I., et al. (2004) TOPS: an enhanced database of protein structural topology. Nucleic Acids Res., 32, D251-D254.
Ren, T., et al. (2004) MSAT: a multiple sequence alignment tool based on TOPS. Appl. Bioinformatics, 3, 149-158.
Viksna, J. and Gilbert, D. (2001) Pattern matching and pattern discovery algorithms for protein topologies. In Algorithms in Bioinformatics: First International Workshop, WABI 2001, Proceedings. Lecture Notes in Computer Science, Vol. 2149, pp. 98-111. ISBN 3540425160.
Westhead, D., et al. (1999) Protein structural topology: automated analysis, diagrammatic representation and database searching. Protein Sci., 8, 897-904.
Williams, A., et al. (2003) Multiple structural alignment for distantly related all β structures using TOPS pattern discovery and simulated annealing. Protein Eng., 16, 913-923.