

Bioinformatics Advance Access originally published online on November 24, 2006
Bioinformatics 2007 23(4):513-514; doi:10.1093/bioinformatics/btl594

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

QSCOP—SCOP quantified by structural relationships

Stefan J. Suhrer, Markus Wiederstein and Manfred J. Sippl*

Center of Applied Molecular Engineering, Department of Bioinformatics, Division of Molecular Biology, University of Salzburg, Hellbrunnerstraße 34, 5020 Salzburg, Austria

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The database SCOP (Structural Classification Of Proteins) has become a major resource in bioinformatics and protein science. A particular strength of SCOP is the flexibility of its rules enabling the preservation of the many details spotted by experts in the classification process. Here we endow classic SCOP Families with quantified structural information and comment on the structural diversity found in the SCOP hierarchy.

Availability: Quantified SCOP (QSCOP) is available as a public web service at http://services.came.sbg.ac.at

Contact: sippl{at}came.sbg.ac.at

SCOP defines its aim as providing ‘a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known’ (Murzin et al., 1995; Andreeva et al., 2004). SCOP is organized into three principal hierarchical levels called Family (clear evolutionary relationship), Superfamily (probable common evolutionary origin) and Fold (major structural similarity). The SCOP hierarchy is generally assumed to reflect quantitative structural relationships among protein domains, with structural similarity decreasing progressively from SCOP Family to Superfamily to Fold. Such assumptions are made, explicitly or implicitly, in many applications of SCOP, ranging from the construction of various benchmark sets to the analysis of the number of new folds discovered by structural genomics efforts (Chandonia and Brenner, 2006).

However, SCOP neither quantifies structural relationships nor follows rigid, rigorous computational rules. In fact, a particular strength of SCOP is its flexibility regarding the criteria used in the classification process. It uses a variety of considerations that frequently take precedence over structural relations. Hints on the reasoning behind a particular SCOP family are often found in the SCOP browser as free-form text. For example, the defining feature of SCOP family a.138.1.3, called ‘di-heme elbow motif’, is a short motif frequently found in multiple copies in heme-binding proteins. Another example is SCOP family b.71.1.1, called ‘alpha-amylases C-terminal beta-sheet domain’, which is defined by the relative position of the domain along the amino acid sequence, i.e. ‘this domain follows the catalytic beta/alpha barrel domain’. Such criteria contain no information on the structural similarity among the members of SCOP families. Consequently, as exemplified in Figure 1, SCOP families are often structurally diverse, and this diversity is inherited by the superordinate Superfamily and Fold levels of SCOP.


Figure 1
Fig. 1 Structural diversity of SCOP families. Each pair of structures shown corresponds to two members of the same SCOP family that belong to two distinct Distant QSCOP groups and thus share <50% equivalent residues (relative to the larger protein). The first protein of each pair is shown in blue, the second in green. Regions of similar structure are indicated in red. (a) The two domains d1fgja_ (hydroxylamine oxidoreductase) and d1m1qa_ (flavocytochrome c3) share 42 equivalent residues (red, rms 2.6 Å); the sequence identity in the superimposed region (red) is 40%. The domains belong to the same SCOP family, a.138.1.3, ‘di-heme elbow motif’, which splits into eight Distant QSCOP groups. (b) Domains d1iqpd2 (replication factor C) and d1u0ja_ (rep 40 protein helicase domain) share 99 equivalent residues; sequence identity 20%; the protruding subdomains reside at opposite ends of the chains and are unrelated. The number of equivalent residues remains <50% even if the subdomains are removed. The associated SCOP family, c.37.1.20, ‘extended AAA-ATPase domain’, contains six Distant QSCOP groups. (c) d.26.1.1, ‘FKBP immunophilin/proline isomerase’, domains d1pina2 and d1bkf_. (d) b.71.1.1, ‘alpha-Amylases, C-terminal beta-sheet domain’, domains d1hx0a1 and d1ht6a1.

 
It follows that assumptions about the extent of structural similarity within the SCOP hierarchy are generally unfounded, and naive use of SCOP may lead to improper, imprecise or even erroneous conclusions. On the other hand, investigating the structural variability within the SCOP hierarchy requires sophisticated structure comparison tools and considerable computational resources that are inaccessible to the general user.

QSCOP extends classic SCOP by four additional layers, called Distant, Related, Similar and Equivalent. These layers are defined entirely in terms of quantitative structural relationships among the domains of individual SCOP Families. Other criteria, such as sequence or functional similarity, are deliberately ignored. In this way QSCOP quantifies the structural diversity of SCOP families while leaving the integrity and contents of classic SCOP untouched. The structural diversity, also called granularity, of a classic SCOP family is represented by the number of distinct groups it splits into in the QSCOP sub-layers.

QSCOP quantifies structural similarity in terms of the number of structurally equivalent residue pairs shared by two SCOP domains. These values are obtained by superimposing pairs of SCOP domains with structure superposition tools such as ProSup (Feng and Sippl, 1996), applied in the context of large datasets (Sippl et al., 2001). The similarity of a pair of structures is then expressed as the percentage of equivalent residues relative to the length of the larger domain. For example, SCOP families that split into two or more Distant QSCOP groups contain pairs of domains sharing <50% equivalent residues. Conversely, two domains within the same Distant QSCOP group share ≥50% equivalent residues. The corresponding thresholds for the remaining QSCOP layers are 75% (Related), 95% (Similar) and 100% (Equivalent). In particular, the groups in the Equivalent layer contain domains of identical sequence length whose structures can be superimposed with an rms error of <2.5 Å.
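The pairwise criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the QSCOP implementation: it assumes that the number of equivalent residue pairs and the rms error have already been obtained from a superposition (e.g. with ProSup), and the names `percent_equivalent`, `finest_shared_layer` and `LAYER_THRESHOLDS` are hypothetical. It encodes only the thresholds stated in the text.

```python
def percent_equivalent(n_equivalent: int, len_a: int, len_b: int) -> float:
    """Percentage of equivalent residue pairs, relative to the larger domain."""
    larger = max(len_a, len_b)
    if larger <= 0:
        raise ValueError("domain lengths must be positive")
    # Equivalences are residue pairs, so they cannot exceed the shorter domain.
    if not 0 <= n_equivalent <= min(len_a, len_b):
        raise ValueError("equivalent pairs cannot exceed the shorter domain length")
    return 100.0 * n_equivalent / larger


# Hypothetical table of QSCOP layers and their minimum percentage of
# equivalent residues, ordered from the finest to the coarsest layer.
LAYER_THRESHOLDS = [
    ("Equivalent", 100.0),
    ("Similar", 95.0),
    ("Related", 75.0),
    ("Distant", 50.0),
]


def finest_shared_layer(n_equivalent, len_a, len_b, rms):
    """Return the finest QSCOP layer whose pairwise criterion this domain pair
    meets, or None if the pair shares <50% equivalent residues."""
    pct = percent_equivalent(n_equivalent, len_a, len_b)
    for layer, threshold in LAYER_THRESHOLDS:
        if pct >= threshold:
            # Equivalent additionally requires identical length and rms < 2.5 Å.
            if layer == "Equivalent" and not (len_a == len_b and rms < 2.5):
                continue
            return layer
    return None
```

For a hypothetical pair of domains of 150 and 140 residues sharing 120 equivalent residues, `finest_shared_layer(120, 150, 140, 1.8)` returns `'Related'` (80%). Because equivalent residues are paired one-to-one, 100% relative to the larger domain already implies identical lengths, so the explicit length check only documents the stated condition.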

The nomenclature used for the QSCOP layers reflects the extent of structural similarity. For example, two domains taken from the same Equivalent group are practically identical, so that one of the structures provides an excellent model for all the others in the same group. At the other extreme, two domains found in the same classic SCOP Family but in two distinct Distant QSCOP groups share <50% equivalent residues. As exemplified in Figure 1, such pairs of structures are quite dissimilar, and predicting the structure of one domain from knowledge of the other is generally difficult and often impossible. Such considerations are particularly relevant for the selection and determination of structural genomics targets, where the declared goal is to establish a complete set of protein structures that can serve as a basis for modeling all other proteins.

The most recent SCOP release (1.69) classifies roughly 70 000 protein domains into 3114 SCOP families. There are 2653 families containing >1 domain, and 431 of these split into ≥2 Distant QSCOP groups. Such families are characterized by considerable structural diversity (Fig. 1). Naturally, the potential structural diversity of a SCOP family depends primarily on the number of individual domains it contains; families with only one or a few domains necessarily have low diversity. Table 1 exemplifies the granularity of several SCOP families. For example, the 522 domains of the most diverse SCOP family, i.1.1.1, split into 48 Distant (D), 54 Related (R), 65 Similar (S) and 68 Equivalent (E) QSCOP groups. This family belongs to the class called ‘low resolution protein structures’, which may be considered a receptacle for unclassified structures. However, the more elaborately classified SCOP families in the all alpha (a), all beta (b), alpha/beta (c) and alpha+beta (d) classes also frequently split into ≥2 distinct QSCOP groups (Table 1).
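How D, R, S and E counts could arise for a single family is sketched below. The grouping procedure QSCOP actually uses is not described here, so this sketch assumes a simple greedy scheme in which a domain joins the first existing group whose members all meet the layer criterion with it; this guarantees the pairwise property stated for QSCOP groups. All names (`pair_meets`, `group_family`, `granularity`) and the toy percentages are hypothetical.

```python
# Minimum % equivalent residues (relative to the larger domain) per QSCOP layer.
THRESHOLDS = {"Distant": 50.0, "Related": 75.0, "Similar": 95.0, "Equivalent": 100.0}
MAX_EQUIVALENT_RMS = 2.5  # Å; additional condition for the Equivalent layer


def pair_meets(layer, pct, rms):
    """True if a domain pair (pct equivalent residues, rms in Å) satisfies the layer."""
    if pct < THRESHOLDS[layer]:
        return False
    return layer != "Equivalent" or rms < MAX_EQUIVALENT_RMS


def group_family(domains, pairs, layer):
    """Hypothetical greedy grouping: each domain joins the first group in which it
    meets the layer criterion with every member; otherwise it starts a new group."""
    groups = []
    for d in domains:
        for g in groups:
            if all(pair_meets(layer, *pairs[frozenset((d, m))]) for m in g):
                g.append(d)
                break
        else:  # no existing group accepted the domain
            groups.append([d])
    return groups


def granularity(domains, pairs):
    """Number of groups in each QSCOP layer (D, R, S, E) for one SCOP family."""
    return {layer: len(group_family(domains, pairs, layer)) for layer in THRESHOLDS}


# Toy family with four hypothetical domains; values are (% equivalent, rms in Å).
toy_domains = ["d1", "d2", "d3", "d4"]
toy_pairs = {
    frozenset(("d1", "d2")): (100.0, 2.6),
    frozenset(("d1", "d3")): (80.0, 1.9),
    frozenset(("d2", "d3")): (78.0, 2.0),
    frozenset(("d1", "d4")): (60.0, 2.8),
    frozenset(("d2", "d4")): (58.0, 2.9),
    frozenset(("d3", "d4")): (65.0, 2.5),
}

counts = granularity(toy_domains, toy_pairs)
print(counts)  # {'Distant': 1, 'Related': 2, 'Similar': 3, 'Equivalent': 4}
# Would this family count among those splitting into >=2 Distant groups?
splits = counts["Distant"] >= 2
```

Greedy grouping of this kind depends on the order in which domains are visited, so it illustrates how the layer counts grow from D to E rather than reproducing the published QSCOP numbers.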



 
Table 1 Granularity of several SCOP families

 
The QSCOP online service maintains a complete list of the granularity of all classic SCOP families and provides quick access to the subsets of domains within individual SCOP families that share the structural relationships defined by the QSCOP layers.


    Acknowledgments
 
The structure superposition programs ProHit/ProSup and TopMatch used to construct QSCOP were provided by Proceryon GmbH, which is gratefully acknowledged. Figure 1 was prepared using RasMol, MolScript and Raster3D. This work was supported by FWF Austria, grant number P13710-MOB.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Anna Tramontano

Received on September 22, 2006; revised on November 17, 2006; accepted on November 20, 2006

    REFERENCES

    Andreeva, A., et al. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226–D229.

    Chandonia, J.-M. and Brenner, S.E. (2006) The impact of structural genomics: expectations and outcomes. Science, 311, 347–351.

    Feng, Z.K. and Sippl, M.J. (1996) Optimum superimposition of protein structures: ambiguities and implications. Fold. Des., 1, 123–132.

    Murzin, A.G., et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.

    Sippl, M.J., et al. (2001) Assessment of the CASP4 fold recognition category. Proteins, 45, 55–67.




This article has been cited by other articles:

S. J. Suhrer, M. Wiederstein, M. Gruber, and M. J. Sippl
COPS--a novel workbench for explorations in fold space
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W539 - W544.

M. J. Sippl, S. J. Suhrer, M. Gruber, and M. Wiederstein
A discrete view on fold space
Bioinformatics, March 15, 2008; 24(6): 870 - 871.

M. J. Sippl
On distance and similarity in fold space
Bioinformatics, March 15, 2008; 24(6): 872 - 873.

M. J. Sippl and M. Wiederstein
A note on difficult structure alignment problems
Bioinformatics, February 1, 2008; 24(3): 426 - 427.

