Bioinformatics Advance Access originally published online on March 1, 2005
Bioinformatics 2005 21(10):2537-2538; doi:10.1093/bioinformatics/bti331

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Protein structure topological comparison, discovery and matching service

G. M. Torrance 1, D. R. Gilbert 1,*, I. Michalopoulos 2 and D. W. Westhead 2

1Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ, UK
2School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK

*To whom correspondence should be addressed.


    Abstract

Summary: We describe a fast, fold-level protein comparison and motif matching facility based on the TOPS representation of structure. It updates a previous service at the EBI with improved graph matching, faster results, and visualization of both the database structures being compared against and the pattern that each shares with the target domain.

Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via the main TOPS site at http://www.tops.leeds.ac.uk. Software is also available for download from these sites.

Contact: tops{at}brc.dcs.gla.ac.uk


    INTRODUCTION
Protein topological comparison provides a means of assessing structural similarities between distantly related domains. In addition, the abstract idea of a ‘fold’ can be expressed as a motif (pattern), and searched for in the database of known structures. Finally, discovery of patterns for sets of structures is a simple extension of the pairwise pattern discovery used for comparison.

Two of these services, comparison and matching, have been available from a site at the EBI for some time (Gilbert et al., 1999; 2001). This update provides faster computation and visualizes the results both as topology cartoons (Westhead et al., 1999) and as diagrams. Results are now available immediately (Fig. 1) rather than by email reply, thanks to a new, faster graph-matching algorithm which exploits the fact that the graphs are vertex oriented. Full details of the algorithm can be found in Viksna and Gilbert (2001). Some of these services have been integrated with the main TOPS site at Leeds (Michalopoulos et al., 2004).



Fig. 1 Matching results.

 
Comparison is here defined as pairwise pattern discovery; matching is determination of the presence or absence of a pattern as a subgraph of a structure graph. Indeed, pattern discovery relies on matching as it tries to find the largest pattern that will match all the members of a set of structures.
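The definition of matching above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical encoding of a TOPS graph: an ordered list of secondary structure element labels (for example "E+" for an up strand, "H-" for a down helix) plus a set of labelled edges ("A" for antiparallel, "P" for parallel). The brute-force search below is for illustration only; it is not the algorithm of Viksna and Gilbert (2001), which is considerably faster.

```python
from itertools import combinations

def matches(pattern, target):
    """Return True if `pattern` occurs in `target` as an order-preserving subgraph.

    Both arguments are (nodes, edges) pairs in a hypothetical encoding:
    nodes is a list of labels in sequence order, edges is a set of
    (i, j, kind) tuples with i < j indexing into nodes.
    """
    p_nodes, p_edges = pattern
    t_nodes, t_edges = target
    n = len(p_nodes)
    # Every increasing choice of target positions preserves sequence order.
    for positions in combinations(range(len(t_nodes)), n):
        # Node labels (type and direction) must agree.
        if any(p_nodes[i] != t_nodes[positions[i]] for i in range(n)):
            continue
        # Every pattern edge must be present between the mapped target nodes.
        if all((positions[a], positions[b], kind) in t_edges
               for (a, b, kind) in p_edges):
            return True
    return False

# Toy data: a beta hairpin pattern and a small mixed structure.
hairpin = (["E+", "E-"], {(0, 1, "A")})
parallel_pair = (["E+", "E+"], {(0, 1, "P")})
structure = (["H+", "E+", "E-", "E+"], {(1, 2, "A"), (2, 3, "A")})
```

With this toy data, `matches(hairpin, structure)` succeeds because strands 1 and 2 form an antiparallel pair, whereas `matches(parallel_pair, structure)` fails because the two up strands are not joined by a parallel edge.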


    COMPARISON
Structures for comparison are submitted as coordinate files that are converted by DSSP (Kabsch and Sander, 1983) and the TOPS program (Gilbert et al., 2001) into a more abstract topological form. This topology graph is then compared by pairwise pattern discovery with subsets of a database of other, precomputed, domain graphs. Domain definitions from both the CATH and SCOP classifications are available, with subsets of these classifications at different levels: homologous superfamily and topology for CATH; superfamily and fold for SCOP.

The purpose of the comparison service is to give an overview of how a structure relates to all of its neighbours, which can be achieved quickly by a comparison with the CATH topology representatives (or the SCOP folds). The results are ranked by a value called compression, which is a measure of how well the common pattern between two structures describes those structures. When comparing a probe structure to a database, this value can be useful for distinguishing reasonable hits from more distant similarities.


    DISCOVERY
Common patterns can be discovered for sets of more than two structures. As pattern discovery scales well with the number of graphs in the input set, there is no practical limit to the size of family that could be used. A service is provided for ‘browsing’ the common patterns of groups of structures based on the representative structures for a particular level of the CATH classification tree. Each dynamically discovered pattern can be matched back against the database to determine how many structures it matches outside the group from which it was generated. This gives an idea of how specific the pattern is for that group.
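The specificity check described above amounts to counting matches inside and outside the originating group. The sketch below assumes a hypothetical database mapping domain identifiers to graphs and takes the matcher as a caller-supplied predicate; the toy data use plain strings in place of real TOPS graphs, and none of the names come from the TOPS software.

```python
def pattern_specificity(matches_pattern, database, group):
    """Count database entries matched by a pattern inside and outside a group.

    matches_pattern: callable taking a graph and returning True on a match
                     (hypothetical; stands in for the real matcher).
    database:        dict mapping domain identifier -> graph.
    group:           set of identifiers the pattern was discovered from.
    Returns (inside, outside); a small `outside` suggests a specific pattern.
    """
    inside = outside = 0
    for domain_id, graph in database.items():
        if matches_pattern(graph):
            if domain_id in group:
                inside += 1
            else:
                outside += 1
    return inside, outside

# Toy data: strings of element types stand in for topology graphs.
toy_database = {"d1": "EEH", "d2": "EEEH", "d3": "HHH", "d4": "HEE"}
toy_group = {"d1", "d2"}
counts = pattern_specificity(lambda g: "EE" in g, toy_database, toy_group)
```

Here `counts` is `(2, 1)`: the pattern matches both group members and one structure outside the group.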


    MATCHING
Several predefined ‘classic’ patterns are provided to match against the database (Fig. 2). These are based on widely known fold types (the superfolds: Rossmann folds, TIM barrels, jellyrolls, OB fold, plaits and immunoglobulins). To answer more abstract research questions such as ‘what sizes of porin exist?’, there is a form for user-defined patterns, which must be encoded in a compact ‘string’ graph format. This format is similar in principle to that recently used for fold comparison by alignment of strings (Johannissen and Taylor, 2003), although in our case the strings are only a representation, not a data format.



Fig. 2 Predefined patterns

 

    MULTIPLE STRUCTURE ALIGNMENT
The patterns discovered by TOPS can be thought of as aligned cores of a set of structures. Two other programs use this capability of multiple structural alignment as the basis for alignment of multiple sequences.

The MSAT (multiple sequence alignment by topology) site (Ren et al., 2004) provides alignments of sequences driven by the common TOPS pattern of their structures. The sequence segments corresponding to those secondary structure elements that have been matched are aligned with ClustalW. The Topsalign program, on the other hand, uses simulated annealing to optimize the initial TOPS alignment of the secondary structure elements (Williams et al., 2003).


    DISPLAY
The graphical display of each common pattern as a diagram (similar to those first produced by Koch et al., 1992) gives a more obvious, and more comprehensible, idea of why a particular structure was ranked at that position on the result list. Since the common pattern is an intermediate result in the process of comparison, it is easily available for display and does not require any further calculation. Cartoons have also been pre-generated for all the structures whose graphs are in the database, and these are shown alongside each result. These cartoons are easily scaled since they are stored as coordinates and images are generated anew each time; they are also available in PostScript, PDF and SVG formats.

The cartoon drawing is largely independent of the comparison, matching and discovery services, as is the drawing of the linear diagrams, so both can easily be reused in the results of other web pages. Other sites can therefore use them to visualize their own results. For example, MSAT patterns can be displayed as highlights on the cartoons of a set of protein topologies. Even though the pictures are generated dynamically, the CGI parameters are constructed into a pseudo-URL so that browsers will use the correct filename when saving the image.
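The pseudo-URL idea can be sketched as follows. The path layout, parameter names and base path are assumptions made for illustration, not the scheme actually used by the TOPS server: the request parameters are folded into path segments so that the final segment looks like an ordinary filename with an extension, which is what a browser offers when saving the image.

```python
from urllib.parse import quote, unquote

def cartoon_pseudo_url(base, classification, domain_id, fmt):
    """Fold request parameters into a path that ends in a filename.

    All parameter names and the path layout are hypothetical.
    """
    path = "/".join(quote(part, safe="") for part in (classification, domain_id))
    return f"{base.rstrip('/')}/{path}.{fmt}"

def parse_pseudo_url(url, base):
    """Recover the parameters on the server side from a pseudo-URL."""
    tail = url[len(base.rstrip("/")) + 1:]           # e.g. "cath/1abcA0.svg"
    classification, filename = tail.split("/", 1)
    domain_id, fmt = filename.rsplit(".", 1)          # split off the extension
    return {
        "classification": unquote(classification),
        "domain": unquote(domain_id),
        "format": fmt,
    }

example_url = cartoon_pseudo_url("/tops/cartoon/", "cath", "1abcA0", "svg")
example_params = parse_pseudo_url(example_url, "/tops/cartoon/")
```

In this sketch `example_url` is `/tops/cartoon/cath/1abcA0.svg`, so a saved file would be named `1abcA0.svg` rather than after the generating script.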


    IMPLEMENTATION
All the software has been written in Java, apart from two accessory C programs (DSSP and TOPS). The service is implemented as servlets running in an Apache Tomcat container on a Linux web server. Picture generation is enabled by the Pure Java AWT (PJA) toolkit. A MySQL database is also part of the system.


    Acknowledgments
 
GMT and IM were supported by a joint BBSRC/EPSRC grant, project number 320/BIO14429.

Received on July 23, 2004; revised on February 4, 2005; accepted on February 14, 2005

    REFERENCES

    Gilbert, D., et al. (1999) Motif-based searching in TOPS protein topology databases. Bioinformatics, 15, 317–326.

    Gilbert, D., et al. (2001) A computer system to perform structure comparison using TOPS representations of protein structure. Comput. Chem., 26, 23–30.

    Johannissen, L.O. and Taylor, W.R. (2003) Protein fold comparison by the alignment of topological strings. Protein Eng., 16, 949–955.

    Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

    Koch, I., et al. (1992) Analysis of protein sheet topologies by graph theoretical methods. Proteins, 12, 314–323.

    Michalopoulos, I., et al. (2004) TOPS: an enhanced database of protein structural topology. Nucleic Acids Res., 32, D251–D254.

    Ren, T., et al. (2004) MSAT: a multiple sequence alignment tool based on TOPS. Appl. Bioinformatics, 3, 149–158.

    Viksna, J. and Gilbert, D. (2001) Pattern matching and pattern discovery algorithms for protein topologies. In Algorithms in Bioinformatics: First International Workshop, WABI 2001, Proceedings. Lecture Notes in Computer Science, Vol. 2149, pp. 98–111. ISBN 3-540-42516-0.

    Westhead, D., et al. (1999) Protein structural topology: automated analysis, diagrammatic representation and database searching. Protein Sci., 8, 897–904.

    Williams, A., et al. (2003) Multiple structural alignment for distantly related all β structures using TOPS pattern discovery and simulated annealing. Protein Eng., 16, 913–923.




This article has been cited by other articles:

S. Shi, B. Chitturi, and N. V. Grishin
ProSMoS server: a pattern-based search using interaction matrix representation of protein structures
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W526–W531.

M. Veeramalai and D. Gilbert
A novel method for comparing topological models of protein structures enhanced with ligand information
Bioinformatics, December 1, 2008; 24(23): 2698–2705.

S. Shi, Y. Zhong, I. Majumdar, S. Sri Krishna, and N. V. Grishin
Searching for three-dimensional secondary structural patterns in proteins with ProSMoS
Bioinformatics, June 1, 2007; 23(11): 1331–1338.

