Bioinformatics Advance Access originally published online on March 22, 2007
Bioinformatics 2007 23(10):1282-1288; doi:10.1093/bioinformatics/btm098

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

UniRef: comprehensive and non-redundant UniProt reference clusters

Baris E. Suzek*, Hongzhan Huang, Peter McGarvey, Raja Mazumder and Cathy H. Wu

Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA

*To whom correspondence should be addressed.


Abstract

Motivation: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences.

Results: The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90% and 50% sequence identity levels, respectively. UniRef100, UniRef90 and UniRef50 yield a database size reduction of ~10%, 40% and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, the member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis.
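
As a rough illustration of the UniRef100 step described above (identical sequences and subfragments combined into a single entry), the toy Python sketch below merges exact duplicates and then folds any sequence that occurs as a substring of a longer one into that longer sequence's cluster. This is not the authors' pipeline: the function names, the accession labels and sequences, and the simple quadratic substring scan are all assumptions chosen for readability. The sketch also does not cover the identity-based clustering used for UniRef90 and UniRef50.

```python
from collections import defaultdict


def build_uniref100_like(records):
    """Merge identical sequences and subfragments into clusters.

    records: dict mapping a (hypothetical) accession to its sequence.
    Returns a list of (representative_sequence, sorted_member_accessions).
    """
    # Step 1: collapse exact duplicates onto one key per unique sequence.
    by_sequence = defaultdict(list)
    for accession, sequence in records.items():
        by_sequence[sequence].append(accession)

    # Step 2: visit unique sequences longest first, so that any sequence
    # containing the current one is already a cluster representative.
    clusters = []  # each item is [representative_sequence, member_list]
    for sequence in sorted(by_sequence, key=lambda s: (-len(s), s)):
        for cluster in clusters:
            if sequence in cluster[0]:  # subfragment of the representative
                cluster[1].extend(by_sequence[sequence])
                break
        else:
            clusters.append([sequence, list(by_sequence[sequence])])
    return [(rep, sorted(members)) for rep, members in clusters]


def size_reduction_percent(n_source, n_clusters):
    """Percentage by which clustering shrinks the source sequence set."""
    return 100.0 * (n_source - n_clusters) / n_source


# Made-up toy input; accessions and sequences are illustrative only.
records = {
    "A1": "MKTAYIAKQRQISFVKSHFSRQ",
    "A2": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to A1
    "A3": "AYIAKQRQISF",             # subfragment of A1
    "B1": "MSDNELLKKAVEW",           # unrelated sequence
}
clusters = build_uniref100_like(records)
reduction = size_reduction_percent(len(records), len(clusters))

for representative, members in clusters:
    print(len(members), members, representative)
print(f"size reduction: {reduction:.0f}%")
```

The longest-first ordering means the longest member becomes the representative of each cluster in this sketch. The linear scan over existing clusters would not scale to millions of sequences; it only makes the merging rule explicit.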

Availability: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref

Contact: bes23{at}georgetown.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Alex Bateman


Received on January 25, 2007; revised on March 2, 2007; accepted on March 7, 2007



