Bioinformatics Vol. 17 no. 3 2001
Pages 249-261
© 2001 Oxford University Press


Original Paper

MetaFam: a unified classification of protein families. I. Overview and statistics

Kevin A. T. Silverstein*, Elizabeth Shoop, James E. Johnson and Ernest F. Retzel*

Computational Biology Centers, Academic Health Center, University of Minnesota, Mayo Mail Code 43, 420 Delaware St SE, Minneapolis, MN 55455-0312, USA

Received on May 31, 2000; revised on July 4, 2000; accepted on October 12, 2000

Motivation: Protein sequence classification is becoming an increasingly important means of organizing the voluminous data produced by large-scale genome sequencing projects. At present, there are several independent classification methods. To aid the general classification effort, we have created a unified protein family resource, MetaFam. MetaFam is a protein family classification built upon 10 publicly accessible protein family databases (Blocks, DOMO, Pfam, PIR-ALN, PRINTS, PROSITE, ProDom, PROTOMAP, SBASE, and SYSTERS). MetaFam’s family ‘supersets’, as we call them, are created automatically using set theory to compare families among the databases. A family in one database is matched to a family in another when the intersection of their members exceeds that of all other possible family pairings between the two databases. Pairwise family matches are then joined transitively to create a new list of protein family supersets.
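The matching and transitive-joining steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the MetaFam implementation. The function names (`best_matches`, `build_supersets`), the toy family and protein identifiers, the decision to drop ties, and the reading of the matching rule as "the overlap must strictly exceed every other pairing that involves either family" are all assumptions made for this example.

```python
from itertools import combinations


def best_matches(db_a, db_b):
    """Return (family_a, family_b) pairs whose member overlap strictly exceeds
    every other pairing involving either family.

    Assumption: this is one possible reading of the matching rule; ties and
    empty overlaps produce no match.
    """
    overlap = {
        (a, b): len(members_a & members_b)
        for a, members_a in db_a.items()
        for b, members_b in db_b.items()
    }
    matches = []
    for (a, b), size in overlap.items():
        if size == 0:
            continue
        # Rival pairings share exactly one family with (a, b).
        rivals = [s for (x, y), s in overlap.items() if (x == a) != (y == b)]
        if all(size > s for s in rivals):
            matches.append((a, b))
    return matches


def build_supersets(databases):
    """Join pairwise matches transitively using a union-find structure."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(x, y):
        parent[find(x)] = find(y)

    # Register every family so that unmatched families form singleton supersets.
    for name, families in databases.items():
        for family in families:
            find((name, family))

    # Compare every pair of databases and link the matched families.
    for (name_a, db_a), (name_b, db_b) in combinations(databases.items(), 2):
        for a, b in best_matches(db_a, db_b):
            union((name_a, a), (name_b, b))

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical toy data: the family names and protein IDs are invented.
databases = {
    "Pfam": {"PF_kinase": {"P1", "P2", "P3"}, "PF_globin": {"P7", "P8"}},
    "PROSITE": {"PS_kinase": {"P2", "P3", "P4"}, "PS_globin": {"P8", "P9"}},
    "ProDom": {"PD_1": {"P3", "P4", "P5"}},
}

supersets = build_supersets(databases)
for superset in supersets:
    print(sorted(superset))
```

On this toy input the sketch produces two supersets: the three kinase-like families are joined through their pairwise matches, and the two globin-like families form the second group. The union of members in the first superset (P1 to P5) is larger than any single component family, which is the effect noted in result (1) below.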

Results: MetaFam family supersets have several useful features: (1) each superset contains more members than the families from which it is composed, because each of the component family databases only works with a subset of our full non-redundant set of proteins; (2) conflicting assignments can be pinpointed quickly, since our analysis identifies individual members that are in conflict with the majority consensus; (3) family descriptions that are absent from automated databases can frequently be assigned; (4) statistics have been computed comparing domain boundaries, family size distributions, and overall quality of MetaFam supersets; (5) the supersets have been loaded into a relational database to allow for complex queries and visualization of the connections among families in a superset and the consensus of individual domain members; and (6) the quality of individual supersets has been assessed using numerous quantitative measures such as family consistency, connectedness, and size. We anticipate this new resource will be particularly useful to genomic database curators.
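As an illustration of result (2), the sketch below shows one plausible way to flag members that disagree with the majority consensus. It is not taken from MetaFam itself. It assumes each record says which superset a protein falls into according to one source database, and it flags any record that disagrees with the superset chosen by a strict majority of that protein's records. The function name `find_conflicts`, the record layout, and the identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def find_conflicts(assignments):
    """Flag assignments that disagree with a protein's majority superset.

    assignments: list of (protein, database, superset_id) tuples.
    Returns a list of (protein, database, assigned_superset, majority_superset).
    Assumption: a protein with no strict majority produces no conflicts.
    """
    votes = defaultdict(Counter)
    for protein, _database, superset in assignments:
        votes[protein][superset] += 1

    conflicts = []
    for protein, database, superset in assignments:
        counts = votes[protein]
        top, top_count = counts.most_common(1)[0]
        has_majority = top_count > sum(counts.values()) / 2
        if has_majority and superset != top:
            conflicts.append((protein, database, superset, top))
    return conflicts


# Hypothetical records: protein P3 is placed in superset S1 by three
# databases and in S2 by one; protein P8 is placed consistently.
assignments = [
    ("P3", "Pfam", "S1"),
    ("P3", "PROSITE", "S1"),
    ("P3", "ProDom", "S1"),
    ("P3", "SBASE", "S2"),
    ("P8", "Pfam", "S2"),
    ("P8", "PROSITE", "S2"),
]

conflicts = find_conflicts(assignments)
print(conflicts)
```

Here only the SBASE record for P3 is reported, because it is the single assignment that departs from the three-to-one majority.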

Availability: Free access to the MetaFam web server is provided to all users at http://metafam.ahc.umn.edu/.

Contact: metafam@ahc.umn.edu

Supplementary information: Detailed distribution plots on MetaFam 2.0 supersets and its constituent family databases (e.g. superset/family sizes, domain boundary comparisons) are shown at http://metafam.ahc.umn.edu/mf2.0/stats.html. Statistics on the current release of MetaFam can be found at http://metafam.ahc.umn.edu/current_release/stats.html.

* To whom correspondence should be addressed.



