Bioinformatics, Vol 14, 164-173, Copyright © 1998 by Oxford University Press


ARTICLES

Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment

J Gracy and P Argos
European Molecular Biology Laboratory, Heidelberg, Germany.

MOTIVATION: Genome sequencing projects require the periodic application of analysis tools that can classify and multiply align related protein sequence domains. Full automation of this task requires an efficient integration of similarity and alignment techniques.

RESULTS: We have developed a fully automated process that classifies entire protein sequence databases and aligns the homologous sequences. The successive steps of the procedure are based on compositional and local sequence similarity searches followed by multiple sequence alignments. Global similarities are detected from the pairwise comparison of the amino acid and dipeptide compositions of each protein. After the elimination of all but one sequence from each detected cluster of closely related proteins, the remaining sequences are compiled in a suffix tree, which is self-compared to detect local sequence similarities. Sets of proteins that share similar sequence segments are then weighted according to their closeness and multiply aligned using a fast hierarchical dynamic programming algorithm. Computational strategies were devised to minimize computer processing time and memory space requirements. The accuracy of the sequence classifications has been evaluated for 12 462 primary structures distributed over 341 known families. The percentage of sequences with missed or incorrect family assignments was 6.8% on the test set. This low error level is only twice that of the manually constructed PROSITE database (3.4%) and is substantially better than that found for the automatically built PRODOM database (34.9%).

AVAILABILITY: The resulting database, called DOMO, is available through the SRS database search routine at the Infobiogen (http://www.infobiogen.fr/srs5/), EBI (http://srs.ebi.ac.uk:5000/) and EMBL (http://www.embl-heidelberg.de/srs5/) World Wide Web sites.

CONTACT: gracy@infobiogen.fr
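The compositional step described under RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which distance measure or threshold is used, so the Euclidean distance below, and the function names `composition` and `composition_distance`, are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
DIPEPTIDE_SET = set(DIPEPTIDES)


def composition(seq):
    """Return a 420-element vector: 20 residue and 400 dipeptide frequencies."""
    seq = seq.upper()
    aa = Counter(c for c in seq if c in AMINO_ACIDS)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i:i + 2] in DIPEPTIDE_SET
    )
    n_aa = sum(aa.values()) or 1  # avoid division by zero on empty input
    n_di = sum(di.values()) or 1
    return [aa[a] / n_aa for a in AMINO_ACIDS] + [di[d] / n_di for d in DIPEPTIDES]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between composition vectors (assumed metric)."""
    va, vb = composition(seq_a), composition(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


if __name__ == "__main__":
    # Toy sequences, not taken from the paper's test set.
    print(composition_distance("MKVLAAGIVGL", "MKVLAAGIVGL"))
    print(composition_distance("MKVLAAGIVGL", "WWPPCCHHDDE"))
```

Because each protein is reduced to a fixed-length vector, a pairwise comparison costs the same regardless of sequence length, which is consistent with using composition as an inexpensive first pass before the local similarity search.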