

Bioinformatics Advance Access originally published online on March 21, 2006
Bioinformatics 2006 22(10):1284-1285; doi:10.1093/bioinformatics/btl105

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

UniSave: the UniProtKB Sequence/Annotation Version database

Rasko Leinonen, Francesco Nardone, Weimin Zhu and Rolf Apweiler*

EMBL Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The UniProtKB Sequence/Annotation Version database (UniSave) is a comprehensive archive of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. All changed Swiss-Prot and TrEMBL entries are loaded into the UniSave as part of the public bi-weekly UniProtKB releases. Unlike the UniProtKB, which contains only the latest Swiss-Prot and TrEMBL entry versions, the UniSave provides access to previous versions of these entries.

Availability: http://www.ebi.ac.uk/uniprot/unisave

Contact: rolf.apweiler{at}ebi.ac.uk


    INTRODUCTION
The Universal Protein Resource (UniProt) combines the activities of Swiss-Prot, TrEMBL and Protein Information Resource (PIR) databases (Bairoch et al., 2005). The UniProt Knowledgebase (UniProtKB), the central part of UniProt, consists of the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases. Swiss-Prot entries are manually curated to the highest standard, and the TrEMBL entries are annotated using powerful automated annotation, classification and cross-referencing algorithms.

The Swiss-Prot and TrEMBL entries are subject to changes, but only the most recent versions are preserved in the UniProtKB. However, access to previous entry versions may be highly desirable, especially when references to entries are made from journal articles. Entries in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL go through numerous annotation changes, become secondary to other entries (are replaced) or are removed from UniProtKB without replacement (are deleted). Because of constant annotation improvements, the original annotation may only be accessible by having access to earlier entry versions.

In this article we describe UniSave, a new publicly available service that provides interactive and programmatic access to all versions of Swiss-Prot and TrEMBL entries. It is similar to the EMBL sequence version archive (Leinonen et al., 2003) and complements the UniProt Archive (Leinonen et al., 2004), which is the world's most comprehensive protein sequence repository.


    CONTENT OF UNISAVE
All new and updated UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries are distributed to the public in bi-weekly releases. These entries become accessible from UniSave shortly after they are made public as part of the UniProtKB releases. All obtainable entry versions, starting from the ninth Swiss-Prot release in November 1988 and from the first TrEMBL release in November 1996, are available through UniSave. By UniProtKB release 7.0 in February 2006, there were 27 539 591 and 5 071 382 different entry versions for TrEMBL and Swiss-Prot, respectively.


    ENTRY STORAGE
The entry versions are stored in an Oracle database. To minimize space consumption, changed entry versions are not stored as a whole, but are compared against previous entry versions using the Hunt–Szymanski algorithm (Hunt and Szymanski, 1977). If the entry differential is smaller in size than the original entry, only the differential is stored in the database. When the entry is unloaded from the archive, the entry version is reconstructed by applying the differential to the original entry. If the entry differential is larger than the original entry, the entry is stored as a whole, and subsequent versions are compared against this new whole version. As a result, a differential is only ever applied to a single whole entry, never to another differential. To further reduce storage requirements, the entries and their differentials are compressed in 16 kb blocks using zlib (http://www.zlib.org). Compressing in blocks increases the compressibility of the entries by introducing more redundancy in each compressed unit.
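The storage scheme described above can be illustrated with a minimal Python sketch. It is not the UniSave implementation: Python's difflib.SequenceMatcher stands in for the Hunt–Szymanski algorithm, the JSON edit-script format and all names (make_delta, apply_delta, EntryArchive) are hypothetical, the size test compares the differential with the full text of the new version (one possible reading of the rule), and each record is compressed on its own rather than in 16 kb blocks.

```python
import difflib
import json
import zlib


def make_delta(base_lines, new_lines):
    """Build a line-level edit script that turns base_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=base_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])              # reuse lines from the base
        elif j2 > j1:
            ops.append(["insert", new_lines[j1:j2]])  # lines only in the new version
    return ops


def apply_delta(base_lines, ops):
    """Reconstruct a version by applying an edit script to its whole base entry."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class EntryArchive:
    """Hypothetical in-memory archive for the versions of a single entry."""

    def __init__(self):
        self.records = []       # (kind, index of whole base entry, compressed payload)
        self.base_index = None
        self.base_lines = None

    def add(self, text):
        lines = text.splitlines(keepends=True)
        full = text.encode("utf-8")
        if self.base_lines is not None:
            delta = json.dumps(make_delta(self.base_lines, lines)).encode("utf-8")
            if len(delta) < len(full):
                # Assumption: keep the differential only when it is the smaller option.
                self.records.append(("delta", self.base_index, zlib.compress(delta)))
                return
        # Store the entry whole; later versions are compared against it.
        self.records.append(("full", None, zlib.compress(full)))
        self.base_index = len(self.records) - 1
        self.base_lines = lines

    def get(self, n):
        kind, base, payload = self.records[n]
        data = zlib.decompress(payload).decode("utf-8")
        if kind == "full":
            return data
        base_text = zlib.decompress(self.records[base][2]).decode("utf-8")
        return "".join(apply_delta(base_text.splitlines(keepends=True), json.loads(data)))
```

Because every differential refers directly to a whole entry, reconstructing any version in this sketch takes at most one decompression of the base and one application of an edit script, regardless of how many versions exist.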


    PROGRAMMATIC ACCESS
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries and Fasta formatted sequences can be retrieved programmatically using dbfetch (HTTP GET protocol) at http://www.ebi.ac.uk/cgi-bin/dbfetch, using UniSave/Batch (HTTP POST protocol) at http://www.ebi.ac.uk/uniprot/unisave?&do_batch=1 or SOAP at http://www.ebi.ac.uk/uniprot/unisave/unisave.wsdl. Up to 200 and 10 000 entries can be downloaded using dbfetch and UniSave/Batch, respectively, by providing a list of primary accession numbers or entry names. As an example, the following URL returns all UniProtKB entry versions with accession number Q00001 using dbfetch: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=UniSave&id=Q00001&format=default&style=raw. The n-th entry version is returned by id=Q00001.n, and the latest entry version by id=Q00001. More fine-grained access is provided through the SOAP service, which is designed to support rich interactive clients.
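As an illustration of the dbfetch access described above, the following Python sketch builds request URLs from the parameters shown in the example URL (db, id, format, style). The helper names (unisave_url, batches, fetch) are hypothetical, and the sketch makes no assumption about how several identifiers are combined in one request; it only splits a list into groups that respect the 200-entry limit. Whether the endpoint still responds at this address is not guaranteed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

DBFETCH = "http://www.ebi.ac.uk/cgi-bin/dbfetch"  # address as given in the article
MAX_IDS_PER_REQUEST = 200                         # dbfetch limit stated in the article


def unisave_url(accession, version=None):
    """Build a dbfetch URL for an accession, optionally for its n-th entry version."""
    entry_id = accession if version is None else f"{accession}.{version}"
    query = urlencode({"db": "UniSave", "id": entry_id,
                       "format": "default", "style": "raw"})
    return f"{DBFETCH}?{query}"


def batches(ids, size=MAX_IDS_PER_REQUEST):
    """Split a list of identifiers into groups no larger than the request limit."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch(url, timeout=30):
    """Retrieve the raw text behind a URL (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, unisave_url("Q00001") reproduces the URL quoted in the text, and unisave_url("Q00001", 3) requests the third entry version via id=Q00001.3.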


    INTERACTIVE ACCESS
UniProtKB entries and Fasta formatted sequences can be viewed and downloaded interactively at http://www.ebi.ac.uk/uniprot/unisave. Entries can be retrieved using primary accession numbers or entry names. The first result of a query is a list of matching entry versions together with the UniProtKB database name, entry status, primary accession number, entry name, entry version, sequence version, release and release date (Fig. 1). The matches are ordered by release date, latest version first. If a snapshot date is provided, only the version of the entry that was current at that date is displayed. The entry version status is one of incorporated, active, changed, replaced or deleted. An incorporated entry version is the first entry version added into UniProtKB, an active entry version is part of the latest public release, a changed entry version has been superseded by a newer entry version, a replaced entry has become secondary to another entry and a deleted entry has been removed from UniProtKB without becoming secondary to any other entry. For replaced entry versions, the status ‘Replaced’ can be clicked to return all entries that have the given entry as a secondary entry. Two entry versions can be compared by selecting them and clicking the ‘Compare Selected’ button. Whenever a comparison is made, a Smith–Waterman sequence alignment is computed using SSEARCH (Pearson and Lipman, 1988) and displayed at the bottom of the entry.
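The ordering and snapshot-date behaviour described above can be sketched in Python as follows. This only illustrates the selection logic and is not the UniSave code; the EntryVersion record and the function names are hypothetical, and it assumes that a version is current from its release date until a later version is released.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class EntryVersion:
    accession: str
    entry_version: int
    release_date: date


def order_latest_first(versions: List[EntryVersion]) -> List[EntryVersion]:
    """Order matches by release date, latest version first."""
    return sorted(versions, key=lambda v: v.release_date, reverse=True)


def version_at(versions: List[EntryVersion], snapshot: date) -> Optional[EntryVersion]:
    """Return the version that was current on the snapshot date, if any."""
    for version in order_latest_first(versions):
        if version.release_date <= snapshot:
            return version  # most recent version released on or before the date
    return None             # the entry did not yet exist on that date
```

With versions released on three different dates, version_at returns the newest one whose release date does not exceed the snapshot date, and None when the snapshot date precedes the first release.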


Figure 1. The first result of an interactive query is a list of entry versions with UniProtKB database name, entry status, primary accession number, entry name, entry version, sequence version, release and release date information.

 

    ACCESS FROM SRS AT EBI
The interactive web client at http://www.ebi.ac.uk/uniprot/unisave is also accessible from SRS at http://srs.ebi.ac.uk by following links provided with UniProtKB query results.


    Acknowledgments
 
The authors thank Allyson Williams and Daniel Barrell for help with old Swiss-Prot and TrEMBL releases; Maria-Jesus Martin, Claire O'Donovan, Elisabeth Gasteiger, Nicole Redaschi, Isabelle Phan, Raja Mazumder, Baris Suzek, Darren Natale and Eric Jain for their suggestions for the web client; Quan Lin and Andrey Sitnov for their contribution to the bi-weekly UniSave production; Mike Donnelly for database support; Alberto Labarga for web support; and Mikael Andersson for dbfetch integration. Funding to pay the Open Access publication charges for this article was provided by the authors.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Dmitrij Frishman

Received on February 10, 2006; revised on March 17, 2006; accepted on March 17, 2006

    REFERENCES

    Bairoch, A., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res., 33, D154–D159.

    Hunt, J.W. and Szymanski, T.G. (1977) A fast algorithm for computing longest common subsequences. Commun. ACM, 20, 350–353.

    Leinonen, R., et al. (2003) The EMBL sequence version archive. Bioinformatics, 19, 1861–1862.

    Leinonen, R., et al. (2004) UniProt archive. Bioinformatics, 20, 3236–3237.

    Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

