Bioinformatics Vol. 17 no. 10 2001
Pages 935-941
© 2001 Oxford University Press

Clustering protein sequences—structure prediction by transitive homology

Eva Bolten 3, Alexander Schliep 2,*, Sebastian Schneckener 3, Dietmar Schomburg 1 and Rainer Schrader 2

1 Institut für Biochemie
2 ZAIK/ZPR, Universität zu Köln, Weyertal 80, D-50937 Köln, Germany
3 Science Factory, Köln, Germany

Received on April 20, 2001; revised on July 9, 2001; accepted on July 9, 2001

Motivation: It is widely believed that, for two proteins A and B, a sequence identity above some threshold implies structural similarity due to a common evolutionary ancestor. Since this is only a sufficient, but not a necessary, condition for structural similarity, the question remains what other criteria can be used to identify remote homologues. Transitivity refers to the concept of deducing a structural similarity between proteins A and C from the existence of a third protein B such that A and B as well as B and C are homologues, as established when the sequence identity between A and B as well as that between B and C is above the aforementioned threshold. It is not fully understood whether transitivity always holds and whether it can be extended ad infinitum.

Results: We developed a graph-based clustering approach in which transitivity plays a crucial role. We determined all pair-wise similarities for the sequences in the SwissProt database using the Smith–Waterman local alignment algorithm. These data were transformed into a directed graph in which protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the similarity of sequences A and B, scaled with respect to the self-similarity of A, exceeded a fixed threshold. Transitivity was important in the clustering process, as intermediate sequences were used, though limited by the requirement that proteins linked over such sequences be connected by directed paths in both directions. The length dependency of the score scaling, which is implied by the self-similarity, appears to be an effective criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed an efficient library. Its methods include the novel graph-based clustering algorithm capable of handling multi-domain proteins, as well as cluster comparison algorithms. The Structural Classification of Proteins (SCOP) was used as an evaluation data set for our method, yielding a 24% improvement over pair-wise comparisons in terms of detecting remote homologues.
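The edge rule and the two-way path requirement described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' library. It assumes a toy Smith–Waterman with a linear gap penalty and made-up scores (match 2, mismatch -1, gap -2) in place of an amino-acid substitution matrix with affine gaps. It also approximates the paper's novel clustering algorithm by plain mutual reachability, that is, the strongly connected components of the directed graph. The function names `sw_score`, `build_graph` and `mutual_reachability_clusters` and the threshold 0.5 are illustrative assumptions.

```python
"""Toy sketch: directed similarity graph plus mutual-reachability clustering.

Assumptions (not from the paper): linear-gap Smith-Waterman with
match=2, mismatch=-1, gap=-2; a hypothetical threshold of 0.5; clusters are
strongly connected components, a simplified stand-in for the authors' method.
"""
from itertools import product


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so a new alignment can start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def build_graph(seqs: dict[str, str], threshold: float) -> dict[str, set[str]]:
    """Edge A -> B if SW(A, B) / SW(A, A) >= threshold (scaled by A's self-similarity)."""
    self_sim = {name: sw_score(s, s) for name, s in seqs.items()}
    graph = {name: set() for name in seqs}
    for a, b in product(seqs, repeat=2):
        if a != b and sw_score(seqs[a], seqs[b]) / self_sim[a] >= threshold:
            graph[a].add(b)
    return graph


def mutual_reachability_clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Group vertices joined by directed paths in both directions (Kosaraju's SCC)."""
    # Pass 1: iterative DFS on the original graph, recording finishing order.
    order, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next((n for n in it if n not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))
    # Pass 2: DFS on the reversed graph in decreasing finishing order.
    reverse = {v: set() for v in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)
    clusters, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, todo = set(), [root]
        assigned.add(root)
        while todo:
            u = todo.pop()
            comp.add(u)
            for v in reverse[u]:
                if v not in assigned:
                    assigned.add(v)
                    todo.append(v)
        clusters.append(comp)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy data: "L" is the domain "A" plus an unrelated second domain.
    multi = {"A": "ACDEFGHIKL", "A2": "ACDEFGHIKM",
             "L": "ACDEFGHIKL" + "PQRSTVWYPQRSTVWYPQRS"}
    print(sorted(sorted(c) for c in mutual_reachability_clusters(build_graph(multi, 0.5))))
    # [['A', 'A2'], ['L']]
    # Chain: A and C overlap only through the intermediate sequence B.
    chain = {"A": "ACDEFGHIKL", "B": "FGHIKLMNPQ", "C": "KLMNPQRSTV"}
    print(sorted(sorted(c) for c in mutual_reachability_clusters(build_graph(chain, 0.5))))
    # [['A', 'B', 'C']]
```

In the first toy case, the short sequence A appears verbatim as one domain of the longer L. A therefore gets an edge to L (ratio 20/20) but L gets no edge back (20/60), so the length-dependent scaling keeps them in separate clusters. In the second case, A and C share too little to be linked directly. They still end up together through B because every step is supported in both directions, which mirrors the limited transitivity described above.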

Availability: The software is available to academic users on request from the authors.

Contact: e.bolten{at}science-factory.com; schliep{at}zpr.uni-koeln.de; s.schneckener{at}science-factory.com; d.schomburg{at}uni-koeln.de; schrader{at}zpr.uni-koeln.de

Supplementary information: http://www.zaik.uni-koeln.de/~schliep/ProtClust.html

* To whom correspondence should be addressed.




This article has been cited by other articles:


C. G. Roessler, B. M. Hall, W. J. Anderson, W. M. Ingram, S. A. Roberts, W. R. Montfort, and M. H. J. Cordes
Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds
PNAS, February 19, 2008; 105(7): 2343-2348.


K. Horan, J. Lauricha, J. Bailey-Serres, N. Raikhel, and T. Girke
Genome Cluster Database. A Sequence Family Analysis Platform for Arabidopsis and Rice
Plant Physiology, May 1, 2005; 138(1): 47-54.


H. Xie, A. Wasserman, Z. Levine, A. Novik, V. Grebinskiy, A. Shoshan, and L. Mintz
Large-Scale Protein Annotation through Gene Ontology
Genome Res., May 1, 2002; 12(5): 785-794.