Bioinformatics Advance Access originally published online on February 3, 2005
Bioinformatics 2005 21(9):2085-2087; doi:10.1093/bioinformatics/bti291
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

PROTEIOS: an open source proteomics initiative

Per Gärdén 1,*, Rikard Alm 2 and Jari Häkkinen 1

1Complex Systems Division and Lund Swegene Bioinformatics Facility, Department of Theoretical Physics, Lund University SE-223 62 Lund, Sweden
2Department of Biochemistry, Center for Chemistry and Chemical Engineering, Lund University SE-221 00 Lund, Sweden

*To whom correspondence should be addressed.


    Abstract

Summary: PROTEIOS is an initiative for the development of a comprehensive open source system for storage, organization, analysis and annotation of proteomics experiments. The PROTEIOS platform is based on commonly acknowledged principles for proteomics data publishing.

Availability: http://www.proteios.org

Contact: per{at}thep.lu.se


    INTRODUCTION
The need to organize proteomics data in a standardized form is greater than ever (Prince et al., 2004). The advantage of standards is not limited to software development, but extends to allow the seamless exchange of data between researchers, thus making data publication less strenuous. The Proteomics Standards Initiative (PSI) (http://psidev.sourceforge.net) (Orchard et al., 2003) together with manufacturers of mass spectrometry (MS) equipment have recognized these benefits within MS data and are consequently moving towards standardization. Currently, the PSI covers MS experimental data in the mzData (http://psidev.sourceforge.net) standard and in a related initiative the PSI also covers protein–protein interactions. Future development of PSI standards will cover the larger experimental context, including parts dealing with samples and protein identification.

After a standard has been developed, its adoption by laboratories still takes time. Much effort is needed to effect data exchange and develop solutions for organizing data. In the experimental setting, the data of one experiment are aggregated from several inputs, where each piece of laboratory equipment typically covers only certain parts of an experiment. Moreover, the data output is difficult to handle, as it contains superfluous data and does not comply with open standards.

PROTEIOS supports the user in managing and connecting data from heterogeneous sources with the aim of tracking all information relevant to an experiment: sample, processing, MS and protein identification. This scope sets it apart from other applications, most of which either focus on MS [e.g. Sashimi (http://sashimi.sourceforge.net), OPD (http://bioinformatics.icmb.utexas.edu/OPD) (Prince et al., 2004), PROTEOME-3d (Lundgren et al., 2003)] or do not enable automated data capture [e.g. PEDRo (http://pedro.man.ac.uk) (Taylor et al., 2003)]. In this respect, PROTEIOS aims to become for proteomics what BASE (Saal et al., 2002), also maintained by our group, is for microarray research. As data formats for microarray experiments differ from those of proteomics experiments, existing microarray database platforms should probably not be used. Rather than forcing proteomics data into a tool like BASE, one is better off creating separate applications and thus avoiding compromises in data models.

PROTEIOS manages biomaterials information, raw data, images and analysis results, and provides integrated ‘plug-in’-able protein identification, data viewing and analysis tools. The organization and interface of PROTEIOS are designed to closely follow the natural work-flow of the proteomics researcher, and are currently compatible with both LC–MSMS and 2D-gel experiments. As open source software, PROTEIOS can be used independently of equipment manufacturers and extended or modified to fit local needs.


    THE APPLICATION
PROTEIOS is a client–server application, with a many-to-many relationship between clients and servers. This architecture, where PROTEIOS handles import and export of data to and from databases, makes it possible for researchers to share data with colleagues worldwide. PROTEIOS maintains data ownership and accessibility on each database as one unit using standard SQL privilege mechanisms. The PROTEIOS data model is implemented as an XML-schema and as database tables. XML-schemas prescribe the format for files which are directly importable to PROTEIOS. Although XML-files can be very large, XML has a great advantage in that it allows data to be validated. This is important since validation prevents corrupt data from being added into the database. The mass spectrometry standard format mzData is also directly importable since PROTEIOS uses mzData to describe the MS part of the data. Furthermore, the sample generation and sample processing parts of PEDRo (Taylor et al., 2003) can be readily imported. Other imports exist (e.g. mzXML as raw files) and further ones are easily added.
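
The validation step described above can be illustrated with a minimal sketch that uses the standard Java XML validation API (javax.xml.validation). The schema, the element names (sample, name, massToCharge) and the class ImportValidation are hypothetical and chosen only for illustration; they are not the PROTEIOS or mzData schemas, and this is not the PROTEIOS import code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class ImportValidation {

    // Hypothetical, deliberately tiny schema. The real PROTEIOS and mzData
    // schemas are much richer and are not reproduced here.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='sample'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='massToCharge' type='xs:double'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true only if the document is well-formed and conforms to SCHEMA.
    // An importer would refuse to write anything to the database otherwise.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no custom error handler, validation errors are thrown
            // as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<sample><name>spot 12</name>"
                    + "<massToCharge>842.51</massToCharge></sample>";
        String bad  = "<sample><name>spot 12</name>"
                    + "<massToCharge>not-a-number</massToCharge></sample>";
        System.out.println("good document accepted: " + isValid(good));
        System.out.println("bad document accepted:  " + isValid(bad));
    }
}
```

Rejecting a file before any database write keeps a partially corrupt document from leaving partial records behind, which is the benefit the text attributes to schema validation.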

PROTEIOS is implemented in Java and SQL, and is thus platform-independent. Specifically, the PROTEIOS client runs as a Java application on virtually any workstation and connects to server database(s) through the Hibernate (http://www.hibernate.org) middle-ware. Hibernate adds a database abstraction layer that supports most SQL database providers, enabling a wide range of databases to be used as PROTEIOS back-end servers. Currently, two PROTEIOS client applications exist, a graphical user interface (GUI) and a batch handling client.
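
The layering idea behind a database abstraction can be sketched in plain Java. The example below is not the Hibernate API and not PROTEIOS code; the names SpectrumStore, InMemorySpectrumStore and BatchClient are hypothetical, and an in-memory map stands in for an SQL back end. It only shows how client code written against an interface stays unchanged when the storage back end is swapped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage interface: clients depend on this and never on a
// specific database product. This is NOT the Hibernate API.
interface SpectrumStore {
    void save(String spectrumId, double precursorMz);
    Double findPrecursorMz(String spectrumId); // null when the id is unknown
    List<String> listIds();
}

// Stand-in back end that keeps everything in memory. A real back end would
// delegate to an SQL database through a persistence layer.
final class InMemorySpectrumStore implements SpectrumStore {
    private final Map<String, Double> rows = new LinkedHashMap<>();

    @Override
    public void save(String spectrumId, double precursorMz) {
        rows.put(spectrumId, precursorMz);
    }

    @Override
    public Double findPrecursorMz(String spectrumId) {
        return rows.get(spectrumId);
    }

    @Override
    public List<String> listIds() {
        return new ArrayList<>(rows.keySet());
    }
}

final class BatchClient {
    // Written only against the interface, so the back end can be replaced
    // without touching this method. Returns the number of stored spectra.
    static int importAll(SpectrumStore store, Map<String, Double> spectra) {
        for (Map.Entry<String, Double> entry : spectra.entrySet()) {
            store.save(entry.getKey(), entry.getValue());
        }
        return store.listIds().size();
    }

    public static void main(String[] args) {
        Map<String, Double> spectra = new LinkedHashMap<>();
        spectra.put("scan-1", 500.25);
        spectra.put("scan-2", 612.5);
        SpectrumStore store = new InMemorySpectrumStore();
        System.out.println("stored spectra: " + importAll(store, spectra));
    }
}
```

The same separation would let a GUI client and a batch client share one data access layer, with the choice of database confined to the implementation of the interface.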

PROTEIOS graphical user interface
The GUI is the main PROTEIOS client and the common interface to view and analyse the data. It presents the data as graphical objects, which makes data viewing easy and intuitive. By interacting with these objects, a user can also import and export data. The GUI enables the user to tie together data from different experimental sources in a project (Fig. 1).



Fig. 1 The PROTEIOS graphical user interface.

 
The data presentation is a tree structure, which can be rearranged to highlight items of interest. This functionality is part of the very flexible data import and export feature. The data can be annotated and extended with, for instance, protein identifications from search engines like PIUMS (http://idelnx81.hh.se/bioinf/mass_spectro.html) and Mascot (http://www.matrixscience.com/) (Perkins et al., 1999).

PROTEIOS batch handling
The same functionality as in the GUI is provided for batch processing.


    OUTLOOK
So far most of the effort has been on developing the data repository infrastructure, including validation capabilities, import and export of data, and enabling asynchronous entry of experimental data. The focus is now on extending analysis features and incorporating third party tools. Future development will include a stand-alone PROTEIOS server, which will enable web interaction tools to be connected.

PROTEIOS is evolving rapidly, with new features constantly being added. At the same time, the aim of PROTEIOS is to remain compatible with the upcoming PSI standards and turn them into useful functionality. Among other things, future features will include interoperability with more protein identification search tools [e.g. Mascot and Sequest (http://fields.scripps.edu/sequest) (Eng et al., 1994)], better support for plug-ins and ontology handling. PROTEIOS is freely available for download (including sample datasets) at the PROTEIOS website http://www.proteios.org under the GPL [GNU General Public License (http://www.gnu.org/copyleft/gpl.html)]. The GPL allows anyone to use the PROTEIOS software free of charge. Restrictions may apply to redistribution and modification of the application.


    Acknowledgments
 
This work is in part supported by the Knut and Alice Wallenberg Foundation through the Swegene consortium and by grants from FORMAS (22.6/2002–0042).

Received on November 25, 2004; revised on January 11, 2005; accepted on January 25, 2005

    REFERENCES

    Eng, J.K., et al. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom., 5, 976–989.

    Lundgren, D.H., et al. (2003) Proteome-3d: an interactive bioinformatics tool for large-scale data exploration and knowledge discovery. Mol. Cell. Proteomics, 2, 1164–1176.

    Orchard, S., et al. (2003) The proteomics standards initiative. Proteomics, 3, 1374–1376.

    Perkins, D.N., et al. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20, 3551–3567.

    Prince, J.T., et al. (2004) The need for a public proteomics repository. Nat. Biotechnol., 22, 471–472.

    Saal, L.H., et al. (2002) Bioarray software environment: a platform for comprehensive management and analysis of microarray data. Genome Biol., 3, software0003.1–0003.6.

    Taylor, C.F., et al. (2003) A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat. Biotechnol., 21, 247–254.



