Bioinformatics Advance Access originally published online on January 24, 2008
Bioinformatics 2008 24(6):870-871; doi:10.1093/bioinformatics/btn020
A discrete view on fold space
Center of Applied Molecular Engineering, Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: The database of known protein structures contains an overwhelming number of structural similarities that frequently point to intriguing biological relationships. The similarities are often difficult to spot and, once detected, require proper visualization to be understood. Here we introduce the new concept of a Fold Space Navigator, a user interface enabling efficient navigation through fold space and instantaneous visualization of pairwise structure similarities.
Availability: The Fold Space Navigator is accessible as a public web service at http://services.came.sbg.ac.at
Contact: sippl{at}came.sbg.ac.at
With the number of known protein structures soon surpassing 50 000 PDB entries (Berman et al., 2000), wide regions of fold space have become densely populated. Comprehension of this vast amount of information requires efficient computational tools for structure alignment and superposition, and adequate data structures for storage and retrieval of structural relationships. Moreover, exploration of the contents of such databases demands suitable user interfaces for systematic navigation through fold space and instantaneous visualization of structural similarities.
We have implemented a software application, called Fold Space Navigator, to meet these requirements. The navigator is part of our ongoing efforts to build quantified Classifications Of Protein Structures, called qCOPS, pronounced as in queue cops (Feng and Sippl, 1996; Sippl, 1982; Sippl et al., 2001; Sippl and Wiederstein, 2008; Suhrer et al., 2007a, b). We have now finished a web service to make the navigator available to the structural community. The navigator provides simultaneous access to qCOPS, and the popular classifications SCOP (Andreeva et al., 2007) and CATH (Greene et al., 2007). The chief goal of this communication is to provide a brief summary of this service and a primer for navigation where qCOPS, SCOP and CATH serve as three distinct representations of the currently known regions of fold space.
The qCOPS database uses relative similarities between protein domains $a$ and $b$ defined as $s_{a,b} = 100 \times 2S_{a,b}/(L_a + L_b)$, where $S_{a,b}$ is the number of structurally equivalent residues and $L_a$ and $L_b$ are the sequence lengths of the respective domains (Sippl and Wiederstein, 2008). Protein structures are organized in the form of a hierarchical tree where protein domains are represented as nodes and where edges between nodes correspond to pairwise structure similarities. The navigator provides direct access to tree nodes and enables the navigation along tree edges and through hierarchical layers.
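As an illustration of the relative similarity defined above, the following minimal Python sketch evaluates the formula for a single pair of domains. The function name and the input checks are assumptions made for this example (in particular, that the number of equivalent residues cannot exceed the length of the shorter domain); it is not the code used by qCOPS.

```python
def relative_similarity(n_equivalent: int, len_a: int, len_b: int) -> float:
    """Return s_ab = 100 * 2 * S_ab / (L_a + L_b).

    n_equivalent -- S_ab, number of structurally equivalent residues
    len_a, len_b -- L_a and L_b, sequence lengths of the two domains
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("domain lengths must be positive")
    # Assumption for this sketch: an alignment cannot pair more residues
    # than the shorter domain contains.
    if not 0 <= n_equivalent <= min(len_a, len_b):
        raise ValueError("n_equivalent must lie in [0, min(len_a, len_b)]")
    return 100.0 * 2.0 * n_equivalent / (len_a + len_b)


if __name__ == "__main__":
    # Two identical 100-residue domains with all residues equivalent.
    print(relative_similarity(100, 100, 100))
    # 80 equivalent residues between domains of length 100 and 120.
    print(round(relative_similarity(80, 100, 120), 1))
```

With these inputs the first call yields 100.0 and the second about 72.7, which shows how the normalization by the mean domain length penalizes partial matches between domains of different size.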
The SCOP and CATH databases have many features in common. Both are structure-based classifications but at the same time they are eclectic collections of various categories of relationships like sequence similarity, chemical function, biological role and protein evolution which frequently take precedence over structure similarity. qCOPS on the other hand is strictly based on quantified structure similarity. The SCOP and CATH classifications are organized in a hierarchical scheme where the individual layers correspond to subsets or groups of protein domains. In SCOP the major layers, from bottom to top, are called Family, Superfamily, Fold and Class. In the nomenclature of CATH, the layers are called Homologous Superfamily, Topology, Architecture and Class. This organization is equivalent to the hierarchical tree structure as described above except that metric information on the edges is not available from SCOP and CATH. Since quantification of similarities is essential for navigation and comprehension of structural neighborhoods, we have computed the required data from scratch.
Given the metric information stored in qCOPS, it is straightforward to define an arbitrary number of hierarchical layers where the spacing between layers is controlled by quantified structural relationships. The qCOPS layers currently displayed by the Fold Space Navigator are chosen to resemble the SCOP and CATH hierarchies. From bottom to top the layers of qCOPS are called Equivalent (100–99), Similar (99–90), Related (90–75), Remote (75–50), Distant (50–30) and Unrelated (30–0). Numbers in parentheses are ranges of relative similarity. The names are chosen to relate the numerical definitions to their intuitive meaning (Suhrer et al., 2007a).
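The layer definitions above amount to a lookup from a relative similarity value to a layer name. The sketch below shows one way to express this in Python. The text gives ranges that share their end points (for example 99 appears in both Equivalent and Similar), so the choice to assign a shared boundary value to the higher layer is an assumption of this example, as are the names `LAYERS` and `qcops_layer`.

```python
# Lower bound of each layer, from most to least similar.
# Assumption: a value exactly on a boundary belongs to the higher layer.
LAYERS = [
    (99.0, "Equivalent"),
    (90.0, "Similar"),
    (75.0, "Related"),
    (50.0, "Remote"),
    (30.0, "Distant"),
    (0.0, "Unrelated"),
]


def qcops_layer(similarity: float) -> str:
    """Map a relative similarity in [0, 100] to a layer name."""
    if not 0.0 <= similarity <= 100.0:
        raise ValueError("relative similarity must lie in [0, 100]")
    for lower_bound, name in LAYERS:
        if similarity >= lower_bound:
            return name
    raise AssertionError("unreachable: 0.0 is the lowest bound")


if __name__ == "__main__":
    for s in (100.0, 95.0, 72.7, 10.0):
        print(s, qcops_layer(s))
```

Because the bounds are stored as data, adding or shifting layers only requires editing the list, which mirrors the remark that the metric information allows an arbitrary number of layers to be defined.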
To display the hierarchical tree we reuse the familiar concept of a file browser, where parent domains (nodes) correspond to folders and terminal domains (leaves) to files. This is an appealing representation which enables efficient exploration of neighborhoods of individual structures as well as extensive walks through fold space. Figure 1 shows a screen shot of the qCOPS hierarchy as displayed by the Fold Space Navigator and provides examples for navigation and visualization of structural relationships.
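The file-browser analogy can be sketched with a small tree structure in which domains with children are rendered as folders and terminal domains as files. This is only an illustration of the idea: the class `DomainNode`, the function `render`, the domain names and the similarity values are all hypothetical and do not reflect the navigator's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainNode:
    """A protein domain in the tree; the edge to its parent carries a similarity."""
    name: str
    similarity_to_parent: Optional[float] = None  # None for the root
    children: List["DomainNode"] = field(default_factory=list)


def render(node: DomainNode, depth: int = 0) -> List[str]:
    """Return indented lines: parents as folders, leaves as files."""
    kind = "[folder]" if node.children else "[file]"
    label = f"{'  ' * depth}{kind} {node.name}"
    if node.similarity_to_parent is not None:
        label += f" (s={node.similarity_to_parent:.0f})"
    lines = [label]
    # List the most similar children first.
    ordered = sorted(node.children, key=lambda c: -(c.similarity_to_parent or 0.0))
    for child in ordered:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical domains and similarity values, for illustration only.
    root = DomainNode("domA", children=[
        DomainNode("domC", 80.0, [DomainNode("domD", 99.0)]),
        DomainNode("domB", 95.0),
    ])
    print("\n".join(render(root)))
```

Storing the similarity on each node as the weight of the edge to its parent keeps the metric information next to the hierarchy, so a browser view can show both at once.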
The hierarchical organization of the qCOPS tree implies that the similarity among domains progressively decreases from bottom to top. This is indeed the case. On the other hand, in SCOP and CATH the progressive decrease in similarity is observed in some cases but does not hold in general since in these classifications family membership and location in the hierarchical tree may reflect a variety of relationships. Simultaneous access to all three classifications provides a wealth of information, from pure structure similarity on the one hand, to sequence similarity and expert knowledge on protein function, biological role and evolutionary kinships on the other.
The question frequently arises whether fold space is discrete or continuous. In the ensuing discussions, it is perhaps useful to distinguish the intrinsic nature of fold space from the various possible representations of fold space. SCOP and CATH are built on the concept of discrete families and similarly the hierarchical tree of qCOPS emphasizes the discrete view. Proper representations of the continuous nature of fold space require different concepts. We finally note that qCOPS is built automatically and is updated weekly with new releases of PDB. Hence, the whole repertoire of known protein structures is accessible through the Fold Space Navigator.
| ACKNOWLEDGEMENTS |
|---|
The qCOPS classification and the structure superposition program TopMatch are provided by Proceryon GmbH under an academic license agreement.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Burkhard Rost
Received on December 21, 2007; revised on January 10, 2008; accepted on January 10, 2008
| REFERENCES |
|---|
Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucl. Acids Res (2008) 36(Database issue):D419–D425.
Berman HM, et al. Protein data bank. Nucl. Acids Res (2000) 28:235–242.
Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Fold. Des (1996) 1:123–132.
Greene LH, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucl. Acids Res (2007) 35(Database issue):D291–D297.
Sippl MJ. On the problem of comparing protein structures. Development and applications of a new method for the assessment of structural similarities of polypeptide conformations. J. Mol. Biol (1982) 156:359–388.
Sippl MJ, Wiederstein M. A note on difficult structure alignment problems. Bioinformatics (2008) 24:426–427.
Sippl MJ, et al. Assessment of the CASP4 Fold Recognition Category. Proteins (2001) 45:55–67.
Suhrer SJ, et al. QSCOP — SCOP quantified by structural relationships. Bioinformatics (2007a) 23:513–514.
Suhrer SJ, et al. QSCOP-BLAST—fast retrieval of quantified structural information for protein sequences of unknown structure. Nucl. Acids Res (2007b) 35(Web Server issue):W411–W415.