Bioinformatics Advance Access originally published online on February 4, 2005
Bioinformatics 2005 21(9):2142-2143; doi:10.1093/bioinformatics/bti306

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

MPSS: an integrated database system for surveying a set of proteins

Pei Hao 1,2, Wei-Zhong He 3, Yin Huang 2, Liang-Xiao Ma 3, Ying Xu 2, Hong Xi 3, Chuan Wang 2, Bo-Shu Liu 2, Jin-Miao Wang 2, Yi-Xue Li 2,* and Yang Zhong 1,*

1 School of Life Sciences, Fudan University, Shanghai 200433, China
2 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
3 Shanghai Center for Bioinformation Technology, Shanghai 200235, China

*To whom correspondence should be addressed.


    Abstract

Summary: We have designed and implemented an integrated database system, the ‘multi-protein survey system’ (MPSS), which provides a platform for retrieving information about many proteins at a time. The system integrates several important and widely used databases, including SwissProt, TrEMBL, PDB and InterPro, together with cross-references to other resources such as GO and KEGG. Users may submit a group of protein IDs, entry names, SwissProt/TrEMBL accession numbers or GenBank GIs through the MPSS web interface and quickly obtain protein annotations from public databases along with pre-computed molecular properties. MPSS can also supply comprehensive information about the query proteins, including 3D structures, domains, pathways, gene ontology terms and a visual presentation of their mapping to the GO tree and KEGG pathways, giving an up-to-date view of available knowledge about the structures and molecular functions of the proteins under study.

Availability: MPSS is freely accessible at http://www.scbit.org/mpss/

Contact: yangzhong{at}fudan.edu.cn, yxli{at}sibs.ac.cn


    INTRODUCTION
Proteomics is the large-scale study of all proteins expressed in a cell, tissue, organ or whole organism. It creates a need for efficient systems to retrieve, correlate and analyze large datasets. However, many existing online protein information systems process only one protein at a time. Gathering all the relevant information for a large set of proteins is therefore a laborious and time-consuming task for people working on high-throughput proteomics projects.

We experienced this frustration first-hand while carrying out the annotation work for the rat liver proteome project (Jiang et al., 2004). As a result, we created a database system called the ‘multi-protein survey system’ (MPSS). It integrates several popular protein information databases: the sequence databases SwissProt and TrEMBL (Boeckmann et al., 2003), the structural database PDB (Deshpande et al., 2005), and protein family databases such as Pfam (Bateman et al., 2004) and InterPro (Mulder et al., 2003). It also contains many useful references to other databases such as Gene Ontology (Gene Ontology Consortium, 2004) and KEGG (Kanehisa et al., 2004). In addition, we pre-computed several parameters for each protein, such as the isoelectric point (pI), the aliphatic index (Ikai, 1980) and the grand average of hydropathicity, or GRAVY value (Kyte and Doolittle, 1982).


    USAGE
MPSS is publicly accessible as a web application with a very simple user interface. Users may enter a group of protein IDs, entry names or accession numbers from SwissProt, TrEMBL or GenBank on the submission page. The system returns basic protein information in a tabular view, which includes SwissProt annotations such as keywords, cellular location and function, together with the pre-computed molecular weight, pI, aliphatic index and GRAVY value. The various IDs on the result page link to their source databases, such as SwissProt, where the user can get a detailed view. Users can also go to the ‘Domain Information’, ‘Gene Ontology’, ‘Pathway’ and ‘3D Structure’ pages to obtain the respective results for the same set of protein IDs. The ‘Save As...’ link on each page saves the result to a local file in tab-delimited format, which can then be imported into a spreadsheet program such as Microsoft Excel for further analysis or reporting. On the Gene Ontology result page, the ‘Tree View’ link shows the functional distribution of the set of proteins at a glance. The Pathway result contains links to a pictorial view that highlights the location of the query protein in the pathway.
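As an illustration of the tab-delimited export described above, the following is a minimal Python sketch of how a result table could be serialized. The column labels, the function name to_tab_delimited and the sample row are assumptions made for this example; the paper does not specify the exact file layout that MPSS writes.

```python
import csv
import io

# Hypothetical column labels: the paper names these fields but not their headers.
COLUMNS = ["accession", "entry_name", "mol_weight", "pI", "aliphatic_index", "gravy"]

def to_tab_delimited(rows):
    """Serialize a list of dict rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # silently drop keys that are not in COLUMNS
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Placeholder values for illustration only, not real annotation data.
example_rows = [
    {"accession": "X00001", "entry_name": "DEMO1_RAT", "mol_weight": 12345.6,
     "pI": 6.1, "aliphatic_index": 80.0, "gravy": -0.25},
]
tsv_text = to_tab_delimited(example_rows)
```

A file written this way opens directly in a spreadsheet program, with one protein per row and one property per column.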

MPSS also provides a convenient way for users to study homologues. BLAST results can be pasted directly into the submission text box, and the IDs that follow each ‘>’ in the BLAST output are selected automatically. A researcher examining a new sequence without any known public identifier is advised to run BLAST with the new sequence against a suitable resource (e.g. the GenBank ‘nr’ dataset; Benson et al., 2004) to find a set of homologous proteins. This can be done locally or on the web. Many public websites provide such facilities, including NCBI, EBI and the SRS installation at our website, http://www.scbit.org/. The section of the BLAST output that contains the hit IDs can then be pasted into the MPSS submission page. This approach gives users a comprehensive set of annotations relevant to the query sequence.
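The automatic selection of IDs after ‘>’ can be pictured with a short Python sketch. The exact parsing rules used by MPSS are not given in the paper, so the functions extract_hit_ids and split_ncbi_token, the regular expression and the sample text below are all assumptions for illustration. The sketch takes the first whitespace-free token after each ‘>’ and, for NCBI-style tokens of the form gi|number|database|accession|name, splits out the individual identifiers.

```python
import re

# A definition line starts with '>' followed by an identifier token.
DEFLINE = re.compile(r"^>[ \t]*(\S+)", re.MULTILINE)

def extract_hit_ids(blast_text):
    """Return the token after each '>' in order of appearance, without duplicates."""
    seen = set()
    ids = []
    for token in DEFLINE.findall(blast_text):
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids

def split_ncbi_token(token):
    """Split a 'gi|number|db|accession|name' token into labelled fields."""
    parts = token.split("|")
    fields = {}
    if len(parts) >= 2 and parts[0] == "gi":
        fields["gi"] = parts[1]
    if len(parts) >= 4:
        fields["accession"] = parts[3]
    if len(parts) >= 5 and parts[4]:
        fields["entry_name"] = parts[4]
    return fields

# Made-up BLAST fragment; the identifiers are placeholders, not real entries.
sample = """\
>gi|1000001|sp|X00001|DEMO1_RAT hypothetical demo protein
          Length = 120

 Score = 50.1 bits (118), Expect = 1e-05
>gi|1000002|sp|X00002|DEMO2_RAT another demo protein
          Length = 98
"""

hits = extract_hit_ids(sample)
parsed = [split_ncbi_token(h) for h in hits]
```

Lines such as the score and alignment rows are ignored because they do not begin with ‘>’, so a user can paste a whole section of output without trimming it first.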


    APPLICATION DESIGN
The architecture of MPSS is shown in Figure 1. Data content is updated automatically at a configurable frequency, currently once every two weeks. At the time of writing, the MPSS database consists of 1,612,378 non-redundant sequences: 163,496 from version 45.1 of SwissProt and 1,448,882 from version 28.1 of TrEMBL. Together, these protein sequence entries are involved in 10,000 pathways from KEGG and are linked to over 6,190 protein families, 7,000 domains from Pfam and 21,000 3D structures from PDB.



Fig. 1 An overview of the MPSS architecture. The normalized protein ID(s) are mapped onto related information in MPSS. Data sources include SwissProt, TrEMBL, PDB, GO and KEGG. The main functions of MPSS are listed in the gray panels on the right.
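The paper does not describe how submitted identifiers are normalized before mapping. As a hedged sketch only, one plausible first step is to classify each token as a GenBank GI, a SwissProt/TrEMBL accession number or an entry name. The patterns below are simplified assumptions based on the identifier shapes in use around 2005, and the function names are hypothetical.

```python
import re

# Simplified, assumed patterns; real identifier rules have more cases.
GI_RE = re.compile(r"[0-9]+")                         # GenBank GI: digits only
ACCESSION_RE = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")  # six-character accession
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,6}_[A-Z0-9]{1,5}")  # NAME_SPECIES

def classify_identifier(raw):
    """Return (kind, normalized_token) for one user-supplied identifier."""
    token = raw.strip().upper()
    if GI_RE.fullmatch(token):
        return ("gi", token)
    if ACCESSION_RE.fullmatch(token):
        return ("accession", token)
    if ENTRY_NAME_RE.fullmatch(token):
        return ("entry_name", token)
    return ("unknown", token)

def normalize_batch(text):
    """Split free-form input on whitespace, commas or semicolons and classify it."""
    tokens = [t for t in re.split(r"[\s,;]+", text) if t]
    return [classify_identifier(t) for t in tokens]

# Sample strings chosen for their shape only.
result = normalize_batch("p12345, DEMO_RAT 1000001\nnot-an-id")
```

Each classified token could then be looked up in the matching index of the local database, while tokens marked unknown are reported back to the user.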

 
MPSS collects Gene Ontology information and integrates it into the local database, then maps submitted proteins onto the GO tree and creates rich links to other important databases. MPSS also provides useful molecular properties for the proteins under study, including molecular weight, theoretical pI, aliphatic index and GRAVY value. These can help to confirm protein identities and facilitate further experimental design. The properties are pre-computed with the ProtParam tool from ExPASy (http://www.expasy.org/tools/protpar-ref.html) so that they do not have to be calculated at query time.
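Two of these properties follow directly from published formulas, and the Python sketch below shows how they can be computed from a sequence. It is an independent illustration, not the ProtParam code that MPSS uses. GRAVY is the sum of the Kyte and Doolittle (1982) hydropathy values divided by the sequence length. The aliphatic index follows Ikai (1980): X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names are chosen for this example.

```python
# Hydropathy scale of Kyte and Doolittle (1982) for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value over all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Non-standard residue codes raise KeyError rather than being guessed at.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def aliphatic_index(sequence):
    """Aliphatic index (Ikai, 1980) from mole percents of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

# Toy peptide used only to exercise the functions.
toy_gravy = gravy("AVIL")
toy_aliphatic = aliphatic_index("AVIL")
```

Because both values depend only on amino acid composition, they can be computed once per database release and stored alongside each entry.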

MPSS can also retrieve the domains found in query proteins. Domains are the building blocks of protein structure and function, so this information can help to classify the query proteins and predict their functions. Domain information from a number of sources, such as Pfam and InterPro, is included.

In addition, MPSS provides access to the pathway information in KEGG, which is useful for mapping the relationships among a whole system of proteins and is especially valuable for those interested in signal transduction networks. 3D protein structures, if available in the PDB, can be obtained easily from MPSS. Users can also view protein classification information based on gene ontology and map proteins onto the GO tree.
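Mapping a set of proteins onto the GO tree amounts to counting, for every term, the proteins annotated to that term or to any of its descendants. The Python sketch below shows this idea on a toy hierarchy. The term names, the PARENTS table and the function names are invented for illustration, and the paper does not state how MPSS builds its tree view.

```python
from collections import defaultdict

# Toy hierarchy (child -> parents) with made-up term names, not real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "kinase": ["catalysis"],
}

def ancestors(term, parents=PARENTS):
    """Return the term together with all of its ancestors in the hierarchy."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found

def tree_counts(annotations, parents=PARENTS):
    """Count the distinct proteins annotated at or below each term."""
    proteins_by_term = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            for node in ancestors(term, parents):
                proteins_by_term[node].add(protein)
    return {term: len(members) for term, members in proteins_by_term.items()}

# Placeholder protein labels and annotations.
annotations = {
    "prot1": ["ion_binding"],
    "prot2": ["kinase", "ion_binding"],
    "prot3": ["kinase"],
}
counts = tree_counts(annotations)
```

Collecting proteins in sets rather than incrementing counters keeps a protein from being counted twice at a shared ancestor when it carries several annotations.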


    DISCUSSION AND FUTURE WORK
MPSS provides a novel information service concept that emphasizes fast and flexible access to comprehensive protein information. It integrates rich data from widely used protein-related databases, including SwissProt, TrEMBL, InterPro and PDB, with links to GO and KEGG. People working on proteomics and microarray studies may benefit from the flexible batch-processing approach of MPSS, which leaves them more time to address interesting biological questions. For instance, during the rat proteomics project (Jiang et al., 2004), we identified a large set of proteins and relied on MPSS to retrieve the most up-to-date annotations from public-domain databases.

Given the rapid advances in the biological sciences, an information service system must be able to integrate new data types and adapt to new data formats in order to succeed. The simple internal database structure of MPSS makes it easy to incorporate additional protein information in the future. As a result, users will have increasing amounts of valuable information at their fingertips through MPSS.

Future work will focus on customizing the service for different users. For example, researchers interested in signal transduction may want more information about the relationships among proteins together with some functional information, while those in protein engineering may prefer very detailed information on the whole structure along with the exact properties of the 20 amino acids.

We believe that this new approach to obtaining molecular information will become increasingly prevalent due to the increasing volume and complexity of data involved in bench work. Thus, our ultimate goal is to turn MPSS into a fully automatic pipeline for researchers to retrieve protein information.


    Acknowledgments
 
We would like to thank Mr. Alex Michie for improving the readability of the manuscript. This work was supported in part by grants from the EU-China Scientific Collaborative Project, Shanghai Commission for Science and Technology (03DJ14011), the ‘863’ National High-Tech Programs (2003AA231010, 2004BA711A21) and the ‘973’ National Basic Research Programs (2001CB510203, 2002CB512801, 2003CB715901).

Received on November 24, 2004; revised on January 16, 2005; accepted on January 30, 2005

    REFERENCES

    Bateman, A., et al. (2004) The Pfam Protein Families Database. Nucleic Acids Res., 32, D138–D141.

    Benson, D.A., et al. (2004) GenBank: update. Nucleic Acids Res., 32, D23–D26.

    Boeckmann, B., et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.

    Deshpande, N., et al. (2005) The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res., 33, D233–D237.

    Gene Ontology Consortium. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 32, D258–D261.

    Ikai, A. (1980) Thermostability and aliphatic index of globular proteins. J. Biochem., 88, 1895–1898.

    Jiang, X.S., et al. (2004) A high-throughput approach for subcellular proteome: identification of rat liver proteins using subcellular fractionation coupled with two-dimensional liquid chromatography tandem mass spectrometry and bioinformatic analysis. Mol. Cell Proteomics, 3, 441–455.

    Kanehisa, M., et al. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.

    Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157, 105–132.

    Mulder, N., et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res., 31, 315–318.

