Bioinformatics, Vol 14, 174-187, Copyright © 1998 by Oxford University Press
J Gracy and P Argos
MOTIVATION: Decomposing each protein into modular domains is a basic
prerequisite for accurately classifying structural units in biological
molecules. Boundaries between domains are indicated by two similar amino
acid sequence segments located within the same protein (repeats) or within
homologous proteins at notably different distances from their respective N-
or C-termini. RESULTS: We have developed an automated method that combines
such positional constraints derived from various detected pairwise sequence
similarities to delineate the modular organization of proteins. The
procedure has been applied to a non-redundant data set of 26 990 proteins
whose sequences were taken from the PIR and SWISS-PROT databanks and shared
<60% sequence identity amongst pairs. The resultant clustering,
delineation and multiple alignment of 24 380 sequence fragments yielded a
new database of 4364 domain families. Comparison of the domain collection
with that of PRODOM indicates a clear improvement in the number and size of
domain families, domain boundaries and multiple sequence alignments. The
accuracy and sensitivity of the method are illustrated by results obtained
for ankyrin-like repeats and EGF-like modules. AVAILABILITY: The resulting
database, called DOMO, is available through the database search routine SRS
at Infobiogen (http://www.infobiogen.fr/srs5/), EBI
(http://srs.ebi.ac.uk:5000/) and EMBL
(http://www.embl-heidelberg.de/srs5/) World Wide Web sites. CONTACT: gracy@infobiogen.fr
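
The positional constraint described in the abstract can be illustrated with a minimal sketch. It is not the DOMO procedure itself: the `Match` record, the `boundary_hints` function and the `min_shift` threshold of 30 residues are all hypothetical names and values chosen for illustration. The sketch takes one pairwise local similarity and suggests a domain boundary wherever the matched segment lies at a notably different distance from the N- or C-terminus in the two proteins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Match:
    """One local similarity between proteins A and B (1-based, inclusive).

    Hypothetical record for illustration; not a format defined by the paper.
    """
    a_start: int
    a_end: int
    a_len: int
    b_start: int
    b_end: int
    b_len: int


def boundary_hints(m: Match, min_shift: int = 30) -> List[Tuple[str, int]]:
    """Return (protein, position) boundary suggestions implied by one match.

    min_shift is an assumed threshold for a "notably different" distance.
    """
    hints: List[Tuple[str, int]] = []

    # Residues preceding the matched segment in each protein.
    n_a, n_b = m.a_start - 1, m.b_start - 1
    # Residues following the matched segment in each protein.
    c_a, c_b = m.a_len - m.a_end, m.b_len - m.b_end

    # A much longer N-terminal extension suggests an extra domain before
    # the match, so a boundary is proposed at the start of the match.
    if abs(n_a - n_b) >= min_shift:
        hints.append(("A", m.a_start) if n_a > n_b else ("B", m.b_start))

    # The same reasoning applied to the C-terminal side.
    if abs(c_a - c_b) >= min_shift:
        hints.append(("A", m.a_end) if c_a > c_b else ("B", m.b_end))

    return hints
```

A real pipeline would have to combine many such hints from different pairwise similarities, as the abstract indicates; this sketch evaluates a single match only.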
Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities
European Molecular Biology Laboratory, Heidelberg, Germany.




