Bioinformatics Advance Access originally published online on January 24, 2008
Bioinformatics 2008 24(6):870-871; doi:10.1093/bioinformatics/btn020
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A discrete view on fold space

Manfred J. Sippl*, Stefan J. Suhrer, Markus Gruber and Markus Wiederstein

Center of Applied Molecular Engineering, Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The database of known protein structures contains an overwhelming number of structural similarities that frequently point to intriguing biological relationships. The similarities are often difficult to spot, and once detected their comprehension needs proper visualization. Here we introduce the new concept of a Fold Space Navigator, a user interface enabling the efficient navigation through fold space and the instantaneous visualization of pairwise structure similarities.

Availability: The Fold Space Navigator is accessible as a public web service at http://services.came.sbg.ac.at

Contact: sippl{at}came.sbg.ac.at

With the number of known protein structures soon surpassing 50 000 PDB entries (Berman et al., 2000), wide regions of fold space have become densely populated. Comprehension of this vast amount of information requires efficient computational tools for structure alignment and superposition, and adequate data structures for storage and retrieval of structural relationships. Moreover, exploration of the contents of such databases demands suitable user interfaces for the systematic navigation through fold space and the instantaneous visualization of structural similarities.

We have implemented a software application, called Fold Space Navigator, to meet these requirements. The navigator is part of our ongoing efforts to build quantified Classifications Of Protein Structures, called qCOPS, pronounced as in ‘queue cops’ (Feng and Sippl, 1996; Sippl, 1982; Sippl et al., 2001; Sippl and Wiederstein, 2008; Suhrer et al., 2007a, b). We have now finished a web service to make the navigator available to the structural community. The navigator provides simultaneous access to qCOPS and to the popular classifications SCOP (Andreeva et al., 2008) and CATH (Greene et al., 2007). The chief goal of this communication is to provide a brief summary of this service and a primer for navigation where qCOPS, SCOP and CATH serve as three distinct representations of the currently known regions of fold space.

The qCOPS database uses relative similarities between protein domains a and b, defined as $s_{a,b} = 100 \times 2S_{a,b}/(L_a + L_b)$, where $S_{a,b}$ is the number of structurally equivalent residues and $L_a$ and $L_b$ are the sequence lengths of the respective domains (Sippl and Wiederstein, 2008). Protein structures are organized in the form of a hierarchical tree where protein domains are represented as nodes and where edges between nodes correspond to pairwise structure similarities. The navigator provides direct access to tree nodes and enables the navigation along tree edges and through hierarchical layers.
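The relative similarity defined above can be illustrated with a minimal Python sketch. The function name and the input numbers are hypothetical and are not taken from qCOPS or TopMatch; the range check assumes that the number of equivalent residues cannot exceed the length of the shorter domain.

```python
def relative_similarity(n_equivalent: int, len_a: int, len_b: int) -> float:
    """Relative similarity in percent: 100 * 2 * S_ab / (L_a + L_b)."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("domain lengths must be positive")
    # Assumption: at most one equivalence per residue of the shorter domain.
    if not 0 <= n_equivalent <= min(len_a, len_b):
        raise ValueError("n_equivalent must lie between 0 and the shorter length")
    return 100.0 * 2.0 * n_equivalent / (len_a + len_b)


# Hypothetical example: 90 equivalent residues, domains of 100 and 110 residues.
print(round(relative_similarity(90, 100, 110), 1))
```

Because the number of equivalent residues is normalized by the mean of the two domain lengths, two identical domains score 100, and a short domain fully contained in a much longer one scores well below 100.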

The SCOP and CATH databases have many features in common. Both are structure-based classifications but at the same time they are eclectic collections of various categories of relationships like sequence similarity, chemical function, biological role and protein evolution which frequently take precedence over structure similarity. qCOPS on the other hand is strictly based on quantified structure similarity. The SCOP and CATH classifications are organized in a hierarchical scheme where the individual layers correspond to subsets or groups of protein domains. In SCOP the major layers, from bottom to top, are called Family, Superfamily, Fold and Class. In the nomenclature of CATH, the layers are called Homologous Superfamily, Topology, Architecture and Class. This organization is equivalent to the hierarchical tree structure as described above except that metric information on the edges is not available from SCOP and CATH. Since quantification of similarities is essential for navigation and comprehension of structural neighborhoods, we have computed the required data from scratch.

Given the metric information stored in qCOPS, it is straightforward to define an arbitrary number of hierarchical layers where the spacing between layers is controlled by quantified structural relationships. The qCOPS layers currently displayed by the Fold Space Navigator are chosen to resemble the SCOP and CATH hierarchies. From bottom to top the layers of qCOPS are called Equivalent (100–99), Similar (99–90), Related (90–75), Remote (75–50), Distant (50–30) and Unrelated (30–0). Numbers in parentheses are ranges of relative similarity. The names are chosen to relate the numerical definitions to their intuitive meaning (Suhrer et al., 2007a).
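As a rough sketch, the mapping from a relative similarity to a qCOPS layer name could look as follows in Python. The layer names and ranges are those listed above; the treatment of boundary values (a value of exactly 99, 90, 75, 50 or 30 is assigned to the more similar layer) is an assumption, since adjacent ranges share their end points, and the function name is hypothetical.

```python
# (layer name, lower bound of its range in percent), most similar layer first.
QCOPS_LAYERS = [
    ("Equivalent", 99.0),
    ("Similar", 90.0),
    ("Related", 75.0),
    ("Remote", 50.0),
    ("Distant", 30.0),
    ("Unrelated", 0.0),
]


def qcops_layer(similarity: float) -> str:
    """Return the layer whose range contains the given relative similarity."""
    if not 0.0 <= similarity <= 100.0:
        raise ValueError("relative similarity must lie in [0, 100]")
    for name, lower_bound in QCOPS_LAYERS:
        # Assumption: shared boundary values belong to the more similar layer.
        if similarity >= lower_bound:
            return name
    raise AssertionError("unreachable: the last lower bound is 0")


print(qcops_layer(85.7))
```

Keeping the layers in a plain list of thresholds reflects the point made in the text: with metric information on the edges, the number and spacing of layers is a free choice and can be changed without touching the underlying similarities.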

To display the hierarchical tree we reuse the familiar concept of a file browser, where parent domains (nodes) correspond to folders and terminal domains (leaves) to files. This is an appealing representation which enables efficient exploration of neighborhoods of individual structures as well as extensive walks through fold space. Figure 1 shows a screen shot of the qCOPS hierarchy as displayed by the Fold Space Navigator and provides examples for navigation and visualization of structural relationships.
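The file-browser analogy can be sketched with a small tree data structure in Python. This is an illustration under stated assumptions, not the implementation behind the Fold Space Navigator: the class, the domain names and the similarity values are hypothetical, and sorting the listing by decreasing similarity is our own choice. Following the figure, the listing of a parent includes the parent itself with a self similarity of 100.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class DomainNode:
    name: str
    similarity_to_parent: Optional[float] = None  # percent; None for the root
    parent: Optional["DomainNode"] = field(default=None, repr=False)
    children: List["DomainNode"] = field(default_factory=list, repr=False)

    def add_child(self, name: str, similarity: float) -> "DomainNode":
        """Attach a child domain; the edge carries the pairwise similarity."""
        child = DomainNode(name, similarity, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> List[str]:
        """Left pane of the browser: node names from the root down to this node."""
        path = []
        node: Optional["DomainNode"] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

    def listing(self) -> List[Tuple[str, float]]:
        """Right pane: the parent itself (100) followed by its child domains."""
        rows = [(self.name, 100.0)]
        rows += [(c.name, c.similarity_to_parent) for c in self.children]
        return sorted(rows, key=lambda row: -row[1])


# Hypothetical tree: names and similarities are placeholders, not qCOPS data.
root = DomainNode("root")
parent_dom = root.add_child("parent_dom", 62.0)
query_dom = parent_dom.add_child("query_dom", 81.5)
parent_dom.add_child("sister_dom", 78.0)

print(query_dom.path_from_root())
print(parent_dom.listing())
```

Storing a parent reference in every node makes the path from the root cheap to recover for any query domain, which is what the left window of the browser displays.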


Fig. 1. Fold Space Navigator. When a PDB code is entered a list of all domains originating from the respective PDB file is displayed. From this list a single query domain is selected and at the same time its location is shown in the hierarchical tree. In analogy to a file browser, the left window shows the path from the root to a parent node in the hierarchy and on the right all child domains of this parent are displayed (including query and parent) together with information on the structure and sequence similarities between parent and child nodes, links to the respective PDB entry and a brief description of each molecule. The figure shows the tree opened with the layer called Remote (50–75% similarity) on the left and the subfolder called Related on the right (75–90% similarity). The numbers in parentheses along the tree shown in the left box correspond to the total number of child nodes below the respective parent node. The query chosen in the example is qCOPS domain c3bdeA_ (gray bar in the right box), corresponding to the A chain of the structural genomics target 3bde, reported as a ferredoxin-like fold of unknown function. The parent of c3bdeA_ is c1vqyB_ (gray bar in the left box), which is also found in the right box. This is the only node which points to itself where the corresponding self similarity is necessarily 100%. The parent domain, c1vqyB_, corresponds to the B chain of the structural genomics target 1vqy, a member of the NIPSNAP family (Pfam PF07978). The list of structural neighbors on the right contains mostly proteins of unknown function, except for 2jdj, a protein involved in prodigiosin biosynthesis, and 2omo, a putative antibiotic biosynthesis monooxygenase (Pfam PF03992). To visualize structural relationships the respective domain names are dragged and dropped into the TopMatch web service (Sippl and Wiederstein, 2008) on the same page (not shown). 
Figures A–C show superimpositions of the query c3bdeA_ with the parent node c1vqyB_ (A), and the two sister nodes c2jdjB_ (B) and c2omoB_ (C), respectively. In the figures, the query domain is always in blue, the target domains in green and the regions of structural similarity are highlighted in red (query) and orange (target). Although the structures of the proteins shown in the right box are all very similar, their mutual sequence similarities deduced from the structure alignments are 15% (identical residues) or less. The example shows the dramatic impact of structural genomics on our knowledge of protein structures and it is an exciting exercise to analyze these relationships in some detail. This, however, is beyond the scope of this communication. The figure was prepared using PyMOL (http://www.pymol.org).

 
The hierarchical organization of the qCOPS tree implies that the similarity among domains progressively decreases from bottom to top. This is indeed the case. On the other hand, in SCOP and CATH the progressive decrease in similarity is observed in some cases but does not hold in general since in these classifications family membership and location in the hierarchical tree may reflect a variety of relationships. Simultaneous access to all three classifications provides a wealth of information, from pure structure similarity on the one hand, to sequence similarity and expert knowledge on protein function, biological role and evolutionary kinships on the other.

The question frequently arises whether fold space is discrete or continuous. In the ensuing discussions, it is perhaps useful to distinguish the intrinsic nature of fold space from the various possible representations of fold space. SCOP and CATH are built on the concept of discrete families and similarly the hierarchical tree of qCOPS emphasizes the discrete view. Proper representations of the continuous nature of fold space require different concepts. We finally note that qCOPS is built automatically and is updated weekly with new releases of PDB. Hence, the whole repertoire of known protein structures is accessible through the Fold Space Navigator.


    ACKNOWLEDGEMENTS
The qCOPS classification and the structure superposition program TopMatch are provided by Proceryon GmbH under an academic license agreement.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on December 21, 2007; revised on January 10, 2008; accepted on January 10, 2008

    REFERENCES

    Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucl. Acids Res. (2008) 36 (Database issue): D419–D425.

    Berman HM, et al. Protein data bank. Nucl. Acids Res. (2000) 28: 235–242.

    Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Fold. Des. (1996) 1: 123–132.

    Greene LH, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucl. Acids Res. (2007) 35 (Database issue): D291–D297.

    Sippl MJ. On the problem of comparing protein structures. Development and applications of a new method for the assessment of structural similarities of polypeptide conformations. J. Mol. Biol. (1982) 156: 359–388.

    Sippl MJ, Wiederstein M. A note on difficult structure alignment problems. Bioinformatics (2008) 24: 426–427.

    Sippl MJ, et al. Assessment of the CASP4 Fold Recognition Category. Proteins (2001) 45: 55–67.

    Suhrer SJ, et al. QSCOP — SCOP quantified by structural relationships. Bioinformatics (2007a) 23: 513–514.

    Suhrer SJ, et al. QSCOP-BLAST—fast retrieval of quantified structural information for protein sequences of unknown structure. Nucl. Acids Res. (2007b) 35 (Web Server issue): W411–W415.

