Bioinformatics Vol. 17 no. 3 2001
Pages 272-279
© 2001 Oxford University Press
Original Paper
Picasso: generating a covering set of protein family profiles
Structural Genomics Group, EMBL-EBI, Cambridge CB10 1SD, UK
Received on December 16, 1999; revised on October 31, 2000; accepted on November 6, 2000
Motivation: Evolutionary classification leads to an economical description of protein sequence data because attributes of function and structure are inherited in protein families. This paper presents Picasso, a procedure for deriving a minimal set of protein family profiles that cover all known protein sequences.
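The covering goal stated above, a small set of families that together include every known sequence, is an instance of the set-cover problem. The sketch below uses the generic greedy set-cover heuristic to illustrate that goal. It is not Picasso's procedure, which merges overlapping sequence neighbourhoods into multiple alignments. It assumes each neighbourhood is given as a set of sequence identifiers, and the function name greedy_cover and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(neighbourhoods: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover over hypothetical sequence neighbourhoods.

    `neighbourhoods` maps a neighbourhood ID (for example, the query
    sequence of a Blast search) to the set of sequence IDs it contains.
    Returns the IDs of the chosen neighbourhoods, in selection order.
    """
    # Every sequence that appears in at least one neighbourhood must be covered.
    uncovered: Set[str] = set().union(*neighbourhoods.values())
    chosen: List[str] = []
    while uncovered:
        # Take the neighbourhood that covers the most still-uncovered sequences.
        # Iterating over sorted IDs makes max() break ties by the smallest ID.
        best = max(sorted(neighbourhoods),
                   key=lambda k: len(neighbourhoods[k] & uncovered))
        chosen.append(best)
        uncovered -= neighbourhoods[best]
    return chosen


if __name__ == "__main__":
    toy = {
        "A": {"s1", "s2", "s3"},
        "B": {"s3", "s4"},
        "C": {"s4", "s5"},
        "D": {"s1", "s5"},
    }
    print(greedy_cover(toy))  # ['A', 'C']
```

With the toy data, "A" is picked first because it covers three sequences, and "C" then covers the remaining two. Greedy selection gives a small cover, but not necessarily the smallest one, because exact minimum set cover is NP-hard.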
Results: Picasso starts from highly overlapping sequence neighbourhoods revealed by all-on-all pairwise Blast alignment. Overlaps are reduced by merging sequences or parts of sequences into multiple alignments. For maximum unification, the multiple alignments must reach into the twilight zone of sequence similarity. Sensitive and selective profile-profile comparison allows unification down to about 15% pairwise sequence identity. Families unified through a short conserved sequence motif are associated with multiple full-length alignments describing different subfamilies. Domains that are mobile modules are identified by their association with different sets of neighbours. The result is 10 000 unified domain families (excluding singletons), which represent functionally related proteins and recover the classical prolific domain types in large numbers. The classification is useful, for example, for developing strategies for efficient database searching and for selecting targets to complete the map of all 3-D structures.
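The 15% figure quoted above depends on how pairwise identity is measured, and definitions vary between tools. The sketch below is a minimal illustration. It assumes identity is counted over alignment columns where neither sequence has a gap; other conventions divide by the length of the shorter sequence or by the full alignment length. The function name percent_identity is hypothetical and is not part of Picasso.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumed convention: identical residues divided by the number of
    columns where neither sequence has a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Five ungapped columns (M/M, K/R, V/V, L/L, A/S), three of them identical.
    print(percent_identity("MKV-LA", "MRVGLS"))  # 60.0
```

Because the denominator changes with the convention, the same alignment can fall above or below a cutoff such as 15% depending on the definition used, so thresholds should always be compared under the same measure.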
Availability: http://www.embl-ebi.ac.uk/picasso/picasso.html
Contact: {heger,holm}@embl-ebi.ac.uk

