Bioinformatics Advance Access originally published online on December 8, 2005
Bioinformatics 2006 22(3):376-377; doi:10.1093/bioinformatics/bti822
SubLoc: a server/client suite for protein subcellular location based on SOAP
Institute of Bioinformatics and Systems Biology, MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing 100084, China
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: Based on SOAP (Simple Object Access Protocol) technology, the SubLoc server/client suite offers a user-friendly interface for searching and predicting protein subcellular location.
Availability: The SubLoc SOAP server is available over HTTP at http://www.bioinfo.tsinghua.edu.cn/~tigerchen/cgi-bin/subloc_soap.pl. The client software can be downloaded at http://www.bioinfo.tsinghua.edu.cn/~guotao/soapclient.html. More information is given at http://www.bioinfo.tsinghua.edu.cn/~guotao/soapserver.html.
Contact: sunzhr{at}mail.tsinghua.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
| INTRODUCTION |
|---|
Proteins are sorted into different cellular compartments, such as the cytoplasm, nuclear region and mitochondrion, or may be secreted out of the cell, and their proper functioning relies on this precise process of subcellular localization. Hence, subcellular location information can give clues to a protein's function. DBSubLoc (Guo et al., 2004) is a protein subcellular localization annotation database, available at http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html. The database contains >60 000 protein sequences from viruses, bacteria, fungi, plants and animals. However, its service is delivered via the WWW and its user interface is built for web browsers, that is, designed to be accessed by humans, not by machines. Thus, it is troublesome for users to use DBSubLoc in an automated manner.
We adapted SOAP (Simple Object Access Protocol; http://www.w3.org/TR/soap) technology to the DBSubLoc database and developed a server/client suite to facilitate automated information retrieval. SOAP uses XML to define an extensible messaging framework, providing a message construct that can be exchanged over a variety of underlying protocols. Because of its simplicity and extensibility, it is being applied to a growing number of bioinformatics databases, such as KEGG (Kanehisa et al., 2004; http://www.genome.jp/kegg/soap) and the NCBI Entrez Utilities Web Service (Maglott et al., 2005; http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html). The SOAP server enables users to write their own programs to access the DBSubLoc server automatically. A client with a GUI is also provided for users who are not familiar with programming.
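To make the "message construct" concrete, the following minimal Python sketch wraps a single remote call in a SOAP 1.1 envelope and reads it back, using only the standard library. Python is used here simply because it needs no extra modules for this purpose. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:SubLoc` and the parameter name `query` are assumptions made for illustration, since the real values are defined by the server's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:SubLoc"  # hypothetical; the real namespace comes from the WSDL


def build_envelope(method, argument):
    """Wrap a single-argument remote call in a SOAP 1.1 envelope (bytes)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, method))
    param = ET.SubElement(call, "query")  # hypothetical parameter name
    param.text = argument
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_call(message):
    """Return (method name, first argument) from an envelope built above."""
    root = ET.fromstring(message)
    call = root.find("{%s}Body" % SOAP_ENV)[0]  # first child of Body is the call
    method = call.tag.split("}", 1)[1]          # strip the "{namespace}" prefix
    return method, call[0].text


message = build_envelope("name_search", "p53")
print(message.decode("utf-8"))
print(read_call(message))
```

Because the call is plain XML, any language with an XML library and an HTTP client can produce or consume it, which is what makes the service accessible to programs as well as to people.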
| OVERVIEW |
|---|
The DBSubLoc SOAP server provides seven functions that remote users can access over HTTP. Of the seven functions, three are for querying and searching the database with Swiss-Prot ID (Boeckmann et al., 2003), Gene Ontology (GO) ID (Ashburner et al., 2000), DBSubLoc ID, protein name and BLAST (Altschul et al., 1997); one is for submitting users' new sequences to the DBSubLoc database; and the other three are for protein subcellular location prediction using the support vector machine (SVM) method (Hua and Sun, 2001) and the PSORT program (Nakai and Horton, 1999). More details of these functions are shown in Table 1. Supplementary tables describe the objects and returned values.
The SOAP server also comes with a WSDL (Web Services Description Language, http://www.w3.org/TR/wsdl) file at http://www.bioinfo.tsinghua.edu.cn/~tigerchen/SubLoc.wsdl, which makes it easy to access the server from a number of programming languages. For instance, with Perl version 5.8 and the SOAP::Lite module, searching the DBSubLoc database by the protein name p53 is as simple as the following example:

    use SOAP::Lite;
    # Location of the service description (WSDL)
    my $wsdl = 'http://www.bioinfo.tsinghua.edu.cn/~tigerchen/SubLoc.wsdl';
    # Build a client proxy from the WSDL
    my $serv = SOAP::Lite->service($wsdl);
    # Call the remote name_search function with the protein name
    my $res = $serv->name_search('p53');

More detailed examples in several programming languages are available at http://www.bioinfo.tsinghua.edu.cn/~guotao/soapclient.html.
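For languages without a WSDL-aware SOAP library, the same call can in principle be sent as an ordinary HTTP POST. The Python sketch below only prepares such a request and does not send it. The endpoint URL is the one given in the Availability section; the `SOAPAction` value, the `urn:SubLoc` namespace and the `query` parameter name are hypothetical placeholders, and the real values should be taken from the WSDL.

```python
import urllib.request

# Endpoint quoted in the Availability section of this article.
ENDPOINT = "http://www.bioinfo.tsinghua.edu.cn/~tigerchen/cgi-bin/subloc_soap.pl"


def make_soap_request(envelope, soap_action):
    """Prepare (but do not send) an HTTP POST carrying a SOAP 1.1 envelope."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '"%s"' % soap_action,  # SOAP 1.1 expects a quoted value
    }
    return urllib.request.Request(
        ENDPOINT, data=envelope, headers=headers, method="POST"
    )


# Hand-written envelope; namespace and parameter name are assumptions.
envelope = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b'<soap:Body><m:name_search xmlns:m="urn:SubLoc">'
    b'<query>p53</query></m:name_search></soap:Body>'
    b'</soap:Envelope>'
)

request = make_soap_request(envelope, "urn:SubLoc#name_search")
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen() would send it;
# that step is left out so the sketch runs without network access.
```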
A statically precompiled binary of this open-source client with a graphical user interface is available for download at http://www.bioinfo.tsinghua.edu.cn/~guotao/soapclient.html, together with screenshots, a manual and an FAQ. Executable binary files are provided for Linux and Microsoft Windows 2000/XP. Besides querying by protein ID and name, the client software allows the user to load files containing multiple sequences in FASTA format for homology search with BLAST or for subcellular location prediction with SVM and PSORT. The results of searching and prediction can be saved to a user-defined text file.
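A script that mimics the batch mode described above first has to split a multi-sequence FASTA file into individual records. The sketch below is a minimal standard-library parser written for illustration only; it is not the client's own code, which relies on BioPerl, and the two sequences are made-up placeholders.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Made-up example input, not real database entries.
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2
MSDNE
"""

records = list(parse_fasta(example.splitlines()))
for name, sequence in records:
    print(name, len(sequence))
```

Each (header, sequence) pair could then be submitted in turn to one of the server's search or prediction functions.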
| IMPLEMENTATION |
|---|
The SOAP server was implemented in Perl version 5.8. The server employs the SOAP::Lite module to implement SOAP over HTTP. MySQL version 4.0 serves as the database back end, and Apache is the HTTP server daemon. The client depends on three Perl modules: the Tk module for drawing the GUI, the SOAP::Lite module for communicating with the SOAP server and the BioPerl module for parsing FASTA files.
| Acknowledgments |
|---|
We are grateful to anonymous reviewers for their constructive comments. This work was supported by Foundational Science Research Grant from the 973 project (2003CB715900), a National Nature Science Grant (No. 90303017, No. 90408019) and the 863 projects (2002AA234041, 2002AA231031).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alvis Brazma
Received on September 9, 2005; revised on November 11, 2005; accepted on December 5, 2005
| REFERENCES |
|---|
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
Boeckmann, B., et al. (2003) The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
Guo, T., et al. (2004) DBSubLoc: database of protein subcellular localization. Nucleic Acids Res., 32, D122–D124.
Hua, S. and Sun, Z. (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 17, 721–728.
Kanehisa, M., et al. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.
Maglott, D., et al. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 33, D54–D58.
Nakai, K. and Horton, P. (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci., 24, 34–36.