Bioinformatics, Vol 14, 144-150, Copyright © 1998 by Oxford University Press
J Park and SA Teichmann
MOTIVATION: Large-scale determination of relationships between the proteins
produced by genome sequences is now common. All protein sequences are
matched and those that have high match scores are clustered into families.
In cases where the proteins are built of several domains or duplication
modules, this can lead to misleading results. Consider the very simple
example of three proteins: 1, formed by duplication modules A and B; 2,
formed by duplication modules B' and C; and 3, formed by duplication
modules C' and D. Duplication modules B and B' are homologous, as are C and
C'. Matching the sequences of 1, 2 and 3 followed by simple single-linkage
clustering would put all three in the same family, even though proteins 1
and 3 are not related. This is because the different parts of 2 match 1 and
3. This paper describes a procedure, DIVCLUS, that divides such complex
clusters of partially related sequences into simple clusters that contain
only related duplication modules. In the example just given, it would
produce two groups of sequences: the first with domain B of sequence 1 and
B' of sequence 2, and the second with domain C of sequence 2 and C' of
sequence 3. DIVCLUS is part of a package called GEANFAMMER, for GEnome
ANalysis and protein FAMily MakER. The package automates the detection of
families of duplication modules from a protein sequence database.

RESULTS: DIVCLUS has been applied to the division of single-linkage clusters
generated from the protein sequences of six completely sequenced bacterial
genomes. Out of 12 013 genes in these six genomes, 4563 single- and
multi-domain sequences formed 1071 complex clusters. Application of the
DIVCLUS program resolved these clusters into 2113 clusters corresponding to
single duplication modules.

AVAILABILITY: The perl5 program and its documentation are available at the
following address: http://www.mrc-lmb.cam.ac.uk/genomes/ and by anonymous
ftp at ftp.mrc-lmb.cam.ac.uk in the directory /pub/genomes/Software/.

CONTACT: sat@mrc-lmb.cam.ac.uk; jong@mrc-lmb.cam.ac.uk
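
The three-protein example in the abstract can be made concrete with a short sketch. This is not the DIVCLUS implementation (which is a perl5 program) and it does not reproduce its actual division procedure. It is only a minimal Python illustration of why single-linkage clustering on whole proteins chains proteins 1 and 3 together, and how clustering matched regions instead keeps the B and C modules apart. The residue coordinates, the names `P1` to `P3`, the helper functions and the 50% overlap threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical pairwise matches: (protein, start, end, protein, start, end).
# All coordinates are invented for illustration only.
MATCHES = [
    ("P1", 101, 200, "P2", 1, 100),   # module B of protein 1 matches B' of protein 2
    ("P2", 101, 200, "P3", 1, 100),   # module C of protein 2 matches C' of protein 3
]


class UnionFind:
    """Minimal union-find; linking every matched pair gives single-linkage clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def groups(self):
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return sorted(sorted(g) for g in out.values())


def whole_protein_clusters(matches):
    """Single linkage on whole proteins: any match links the two proteins."""
    uf = UnionFind()
    for p, _, _, q, _, _ in matches:
        uf.union(p, q)
    return uf.groups()


def overlaps(a, b, min_fraction=0.5):
    """True if two regions share at least min_fraction of the shorter region.

    The 0.5 threshold is an assumed value, not one taken from the paper.
    """
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    shorter = min(a[2] - a[1], b[2] - b[1]) + 1
    return shared >= min_fraction * shorter


def region_clusters(matches):
    """Single linkage on matched regions rather than on whole proteins."""
    uf = UnionFind()
    regions = []
    for p, ps, pe, q, qs, qe in matches:
        a, b = (p, ps, pe), (q, qs, qe)
        uf.union(a, b)
        regions += [a, b]
    # Regions on the same protein are merged only if they substantially overlap.
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if a[0] == b[0] and overlaps(a, b):
                uf.union(a, b)
    return uf.groups()


print(whole_protein_clusters(MATCHES))
print(region_clusters(MATCHES))
```

With these invented coordinates, the whole-protein clustering yields one family containing all three proteins, while the region-level clustering yields two groups: one pairing the B region of `P1` with the B' region of `P2`, and one pairing the C region of `P2` with the C' region of `P3`.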
DIVCLUS: an automatic method in the GEANFAMMER package that finds homologous domains in single- and multi-domain proteins
MRC Laboratory of Molecular Biology, Cambridge, UK.







