Bioinformatics Advance Access originally published online on January 2, 2008
Bioinformatics 2008 24(3):426-427; doi:10.1093/bioinformatics/btm622
A note on difficult structure alignment problems
Center of Applied Molecular Engineering, Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Progress in structural biology depends on several key technologies. In particular, tools for the alignment and superposition of protein structures are indispensable. Here we describe the use of the TopMatch web service, an effective computational tool for protein structure alignment, for the visualization of structural similarities, and for highlighting relationships found in protein classifications. We provide several instructive examples.
Availability: TopMatch is available as a public web service at http://services.came.sbg.ac.at
Contact: sippl{at}came.sbg.ac.at
Today we face an explosion of newly determined protein structures, fueled in part by the various protein structure initiatives. As a result, the public repository (PDB) will soon surpass 50 000 entries (Berman et al., 2000). This database represents our knowledge of protein molecules, but the amount of information is overwhelming. To make progress, the structures need to be organized, classified and quantified in various ways. For this task, and for the subsequent retrieval, analysis and visualization of the often intricate relationships, structure comparison techniques are indispensable.
Michael Levitt and coworkers (Kolodny et al., 2005) recently presented a comprehensive analysis of the major structure alignment programs. They remark that comparing the various programs is a delicate task and, after highlighting the limitations of existing methods, conclude that better structural alignment methods are needed. It is indeed surprising that, after half a century of protein structure research, no generally accepted standards for protein structure alignment have emerged.
A particular difficulty is that, as long as an existing structural similarity remains undetected, we cannot check whether any particular method is able to recognize it. According to Kolodny et al. (2005), such difficult examples may be found in existing protein structure classifications by searching for similarities among distinct SCOP (Andreeva et al., 2007) folds or distinct CATH (Greene et al., 2007) architectures or topologies. Here we take up this suggestion and provide a small selection of examples drawn from ongoing classification projects. In these projects we make extensive use of a suite of structure alignment techniques called TopMatch. TopMatch is the successor of ProSup, a program previously used in several large-scale structure comparison projects (e.g. Sippl et al., 2001).
We have now completed a web service that makes the TopMatch program accessible to the structural biology community. The quality of alignments is essential, but ease of use, speed and, in particular, proper visualization are also important for the interpretation and analysis of structure alignments. The chief goal of this communication is to demonstrate the use of this service with a set of instructive examples drawn from ongoing structure classification initiatives (Suhrer et al., 2007a, b).
In the description of alignments we call the first structure the query (q) and the second structure the target (t). In general, a query and a target can be aligned in many different ways (Feng and Sippl, 1996). Hence, TopMatch reports a ranked list of alignments. The alignments are characterized by a small set of parameters. The most significant of these is the length of an alignment, i.e. the number of residue pairs that are structurally equivalent. We call this the absolute similarity S(q,t). From the alignment we compute a sequence score using a structure-derived substitution matrix (Prlic et al., 2000). If this score is positive, it is added to S(q,t), and the combined score is used to rank the alignments. Additional useful parameters are the root-mean-square error of superposition (RMS), the percentage of sequence identity (Identity), the relative similarity s(q,t) = 100 × 2S(q,t)/(Lq + Lt), and the relative query and target cover, defined as cq = 100 × S(q,t)/Lq and ct = 100 × S(q,t)/Lt, respectively (here Lq and Lt are the sequence lengths of query and target). Relative similarity and relative cover are simple and intuitive measures of the extent of mutual similarity between two structures.
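The measures defined above can be illustrated with a minimal Python sketch. It is not part of TopMatch: the `Alignment` class and all function names are hypothetical, and the code only restates the formulas given in the text, assuming that S(q,t), the sequence score and the two sequence lengths are already known.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical container for the summary values of one alignment."""
    n_equivalent: int      # S(q,t): number of structurally equivalent residue pairs
    sequence_score: float  # score from a structure-derived substitution matrix


def relative_similarity(s_qt, len_q, len_t):
    """s(q,t) = 100 * 2 * S(q,t) / (Lq + Lt), in percent."""
    return 100.0 * 2.0 * s_qt / (len_q + len_t)


def relative_cover(s_qt, length):
    """cq or ct: 100 * S(q,t) / L, in percent, for the chosen chain length L."""
    return 100.0 * s_qt / length


def ranking_score(aln):
    """S(q,t) plus the sequence score, which is added only when it is positive."""
    return aln.n_equivalent + max(aln.sequence_score, 0.0)


def rank_alignments(alignments):
    """Return the alignments sorted from highest to lowest combined score."""
    return sorted(alignments, key=ranking_score, reverse=True)


if __name__ == "__main__":
    # Made-up numbers: 80 equivalent residue pairs, Lq = 100, Lt = 160.
    print(relative_similarity(80, 100, 160))
    print(relative_cover(80, 100), relative_cover(80, 160))
```

With these made-up numbers the query is 80% covered and the target 50% covered, which shows how the two cover values can differ considerably when the chain lengths differ, while the relative similarity lies between them.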
Figure 1 illustrates the application of TopMatch using a small set of examples. We first demonstrate that, for the investigation of structural similarities, it is often necessary, and also convenient, to take into account the variety of distinct alignments. We then present several examples that may be considered difficult in the sense of Kolodny et al. (2005), where the respective structures reside in distinct SCOP folds and CATH topologies although they share extensive structural similarity.
We note that the 2D projections shown in Figure 1 do not fully reveal the often complex, intricate or obscure relationships. We therefore encourage the interested reader to examine these examples in 3D using the TopMatch service. We have made considerable efforts to make the use of this service as convenient as possible. For example, whereas the computation and visualization of structural alignments of SCOP and CATH domains generally requires that the domain definitions be supplied by the user, TopMatch recognizes the domain names automatically. Additional information on the efficient use of TopMatch and the proper interpretation of the results is provided by the web service.
| ACKNOWLEDGEMENTS |
|---|
The structure superposition program TopMatch is provided by Proceryon GmbH. Figure 1 was prepared using PyMOL (http://www.pymol.org).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Burkhard Rost
Received on November 23, 2007; revised on December 12, 2007; accepted on December 13, 2007
| REFERENCES |
|---|
Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res (2007) doi:10.1093/nar/gkm993.
Berman HM, et al. The Protein Data Bank. Nucleic Acids Res (2000) 28:235–242.
Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Fold. Des (1996) 1:123–132.
Greene LH, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res (2007) 35:D291–D297.
Kolodny R, et al. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J. Mol. Biol (2005) 346:1173–1188.
Prlic A, et al. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng (2000) 13:545–550.
Sippl MJ, et al. Assessment of the CASP4 Fold Recognition Category. Proteins (2001) 45:55–67.
Suhrer SJ, et al. QSCOP-BLAST–fast retrieval of quantified structural information for protein sequences of unknown structure. Nucleic Acids Res (2007a) 35(Web Server issue):W411–W415.
Suhrer SJ, et al. QSCOP–SCOP quantified by structural relationships. Bioinformatics (2007b) 23:513–514.