Bioinformatics Advance Access originally published online on October 10, 2006
Bioinformatics 2006 22(22):2838-2840; doi:10.1093/bioinformatics/btl487
Integration of gel-based proteome data with pProRep
1 Laboratory of Plant Physiology and Plant Biochemistry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
2 CEPROMA, Center for Proteome Analysis and Mass Spectrometry, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
3 Department of Biochemistry & Molecular Biology, University of Southern Denmark, Odense, Denmark
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: pProRep is a web application integrating electrophoretic and mass spectral data from proteome analyses into a relational database. The graphical web-interface allows users to upload, analyse and share experimental proteome data. It offers researchers the possibility to query all previously analysed datasets and can visualize selected features, such as the presence of a certain set of ions in a peptide mass spectrum, on the level of the two-dimensional gel.
Availability: The pProRep package and instructions for its use can be downloaded from http://www.ptools.ua.ac.be/pProRep. The application requires a web server that runs PHP 5 (http://www.php.net) and MySQL. Some (non-essential) extensions need additional freely available libraries: details are described in the installation instructions.
Contact: kris.laukens{at}ua.ac.be
| 1 INTRODUCTION |
|---|
Proteome analysis depends on separation techniques to reduce the complexity of the protein mixture. These techniques are either gel-electrophoresis-based or chromatography-based. In gel-based proteome analysis, intact proteins are usually separated by two-dimensional gel electrophoresis before proteolytic digestion and mass-spectrometric analysis of the resulting peptides. This technique has certain constraints, but since digestion is done after the separation, the connectivity between multiple peptides originating from a single spot is retained. This unique feature makes gel-based analysis an indispensable tool especially for proteome analysis of non-sequenced organisms, as well as in many other cases where maintaining the connectivity between all fragments of a single protein is a requirement.
Knowledge of a previous analysis of a sample is often of great value when the analysis is repeated and its results are interpreted. Making all individual analysis results available in a proteomics environment requires dedicated tools. This report describes such a tool, pProRep, which was designed specifically to integrate and query gel-based proteome data. In contrast to the available public proteome data repositories (Martens et al., 2005; Desiere et al., 2005; Prince et al., 2004; Craig et al., 2004), pProRep was developed mainly to support in-house data integration. It was designed to comply with the following criteria:
- Visual access to the whole database, including annotated gel image maps, through a web browser, as in a federated 2-DE database (Appel et al., 1996; Hoogland et al., 1997).
- Integration of all data types, from sample/experiment/project details to mass spectrometry peak lists and protein identification results, in one relational database (similar, for example, to the PROTEIOS tool; Garden et al., 2005).
- Advanced query functions through all levels of the analytical workflow and graphical visualization of the query results.
| 2 DESIGN AND IMPLEMENTATION |
|---|
The pProRep interface is based entirely on open-source technology. It was developed in the web-scripting language PHP (PHP 5.0) and tested on an Apache 2.0 web server running Fedora Core 4. Some of the non-essential graphical extensions depend on the freely available open-source packages GD2 and JpGraph (instructions are provided with the documentation). The database was set up in MySQL 4.0.
Tables supporting the following data types were defined: projects, experiments, samples, gels, spots, hits and mass spectrum peak lists. The database schema is largely based on the original PEDRo schema (Taylor et al., 2003), but can be adjusted via a number of configuration files. The HTML-based web interface was tested using Firefox 1.0-1.5 and IE 6.0.
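As a rough illustration of how the data types listed above could relate to one another, the following sketch builds a toy schema in an in-memory SQLite database from Python. All table layouts, column names and foreign keys are assumptions made for illustration; they are not taken from the pProRep or PEDRo schema, and pProRep itself uses MySQL accessed from PHP.

```python
import sqlite3

# Hypothetical, simplified schema: names and relations are illustrative only.
SCHEMA = """
CREATE TABLE project    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE experiment (id INTEGER PRIMARY KEY,
                         project_id INTEGER REFERENCES project(id), name TEXT);
CREATE TABLE sample     (id INTEGER PRIMARY KEY,
                         experiment_id INTEGER REFERENCES experiment(id), name TEXT);
CREATE TABLE gel        (id INTEGER PRIMARY KEY,
                         sample_id INTEGER REFERENCES sample(id), image_path TEXT);
CREATE TABLE spot       (id INTEGER PRIMARY KEY,
                         gel_id INTEGER REFERENCES gel(id), x REAL, y REAL);
CREATE TABLE hit        (id INTEGER PRIMARY KEY,
                         spot_id INTEGER REFERENCES spot(id), accession TEXT, score REAL);
CREATE TABLE peaklist   (id INTEGER PRIMARY KEY,
                         spot_id INTEGER REFERENCES spot(id),
                         parent_id INTEGER REFERENCES peaklist(id));
CREATE TABLE peak       (peaklist_id INTEGER REFERENCES peaklist(id),
                         mz REAL, intensity REAL);
"""


def build_demo_db():
    """Create an empty in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def samples_for_experiment(conn, experiment_id):
    """Return the names of all samples linked to one experiment, in id order."""
    rows = conn.execute(
        "SELECT name FROM sample WHERE experiment_id = ? ORDER BY id",
        (experiment_id,),
    )
    return [row[0] for row in rows]
```

In this sketch each level of the workflow points to its parent through a foreign key, so a question such as "which samples belong to this experiment" becomes a single-table lookup.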
The core of the application consists of the system-wide classes, a user-based permission/authentication system and the template-based data visualization code. A database layer in the system core takes care of the interaction of the system with the database. The output is generated by a number of blocks, modules and extensions. A series of configuration files define the complete mapping of all back-end tables to the various interface modules. An overview of the implementation is provided on the website.
| 3 FEATURES |
|---|
The different modules in the pProRep-interface can be grouped according to their functionality.
3.1 Data browsing
The listview and recordview modules display any type of information in a formatted table. These modules are used to browse, sort and filter datasets, get information related to a selected information type (e.g. retrieve all associated samples for a given experiment), and offer hyperlinks to other specialized modules designed to perform specific tasks.
The gel module provides all information about a specific gel and uses an associated extension to dynamically generate the gel-image with all associated spots. Annotated spots are automatically hyperlinked to related spot information. Spots present in one or several clipboard channels (see Section 3.4) can be selectively highlighted. A related spot-positioning module enables a user with data administration privileges to visually adjust and update the coordinates of any given spot on a gel.
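One common way to hyperlink spots on a gel image in a web page is an HTML image map. The sketch below, written in Python rather than pProRep's PHP, generates such a map from a list of spot records. The record keys (`id`, `x`, `y`, `label`), the circle radius and the `spot.php?id=...` URL pattern are all hypothetical; the source does not state how pProRep produces its links.

```python
from html import escape


def gel_image_map(map_name, spots, radius=6):
    """Build an HTML <map> element with one circular, clickable area per spot."""
    areas = []
    for spot in spots:
        # Image-map coordinates are integer pixels: centre x, centre y, radius.
        coords = f"{round(spot['x'])},{round(spot['y'])},{radius}"
        href = escape(f"spot.php?id={spot['id']}")  # hypothetical URL pattern
        label = escape(str(spot["label"]))
        areas.append(
            f'<area shape="circle" coords="{coords}" href="{href}" alt="{label}">'
        )
    return f'<map name="{escape(map_name)}">\n' + "\n".join(areas) + "\n</map>"
```

The resulting markup would be attached to the gel image through the `usemap` attribute of the `<img>` element.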
The spectrum module shows a server-generated mass spectrum graph based on the peak list, a text area with the corresponding list of peaks, and a list of associated spectra: precursor- or product-ion spectra.
3.2 Data import and manipulation
The data administration module offers multiple ways to enter, upload, change or delete data. Generic forms are available to add, update or delete individual data rows. Filters offer more advanced data management features. Two filters are currently included: one to upload tab-delimited text files, which facilitates the import of large datasets, and a peak list import filter used to upload spectrum peak lists in mzData format, Applied Biosystems 4000 Explorer format and Mascot generic peak list format. An example batch peak list import script that can be run directly from the command line is also available.
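To give an idea of what a peak list import filter has to do, here is a minimal Python sketch that reads the Mascot generic format, in which each spectrum is enclosed by `BEGIN IONS` and `END IONS`, with `KEY=value` header lines followed by lines of m/z and intensity values. The function name and the returned structure are assumptions for illustration, only lines starting with `#` are treated as comments, and this is not pProRep's own import code.

```python
def parse_mgf(text):
    """Parse Mascot generic format text into a list of spectra.

    Each spectrum is a dict with 'params' (the KEY=value header lines)
    and 'peaks' (a list of (m/z, intensity) tuples).
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z, optionally followed by intensity
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra
```

Lines outside a `BEGIN IONS`/`END IONS` pair are ignored here; a production filter would also need to validate values and report malformed input.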
3.3 Data export
Current export functions allow for the generation of universal tab/comma-delimited text files from a series of records. Additional export functions can be defined and assigned to specific modules.
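A delimited-text export of this kind can be sketched in a few lines of Python with the standard `csv` module. The function name, the dict-based record layout and the field names in the usage note are hypothetical and only serve to illustrate the idea.

```python
import csv
import io


def export_records(records, fields, delimiter="\t"):
    """Serialise a series of records (dicts) to delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    for record in records:
        # Missing fields are written as empty cells.
        writer.writerow([record.get(field, "") for field in fields])
    return buffer.getvalue()
```

Calling `export_records(rows, ["spot_id", "x", "y"])` yields tab-delimited text, while passing `delimiter=","` gives comma-delimited output with values quoted where necessary.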
3.4 Advanced queries and data analysis
A query module enables the user to search a database in pre-defined ways through dedicated forms. Two query types are currently defined. Table query allows a table to be queried according to specific field criteria. Spectrum query offers a simple way to query the peak list library with a list of masses, and returns all peak lists in which at least a predefined number of peptide masses lie within a given mass tolerance of one of the query masses. This allows rapid searches for spectra in the database that are similar to an experimentally observed mass list. The speed of this type of query depends on database performance. Some tips for optimization are given in the online documentation.
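The matching rule behind the spectrum query can be sketched in Python as follows. This is an in-memory illustration under stated assumptions: the library is a plain dict mapping a peak list identifier to its masses, and the default tolerance and minimum match count are arbitrary. pProRep performs the search in the database, and its actual implementation is not described in the source.

```python
from bisect import bisect_left


def count_matches(peak_masses, sorted_queries, tolerance):
    """Count stored masses lying within `tolerance` of at least one query mass."""
    matched = 0
    for mass in peak_masses:
        # Find the first query mass that is >= mass - tolerance ...
        i = bisect_left(sorted_queries, mass - tolerance)
        # ... and check that it does not exceed mass + tolerance.
        if i < len(sorted_queries) and sorted_queries[i] <= mass + tolerance:
            matched += 1
    return matched


def spectrum_query(library, query_masses, tolerance=0.2, min_matches=3):
    """Return (peaklist_id, n_matches) for peak lists with enough matching masses.

    Results are ordered by decreasing number of matches, then by identifier.
    """
    sorted_queries = sorted(query_masses)
    hits = []
    for peaklist_id, masses in library.items():
        n = count_matches(masses, sorted_queries, tolerance)
        if n >= min_matches:
            hits.append((peaklist_id, n))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Sorting the query masses once and using a binary search keeps the cost per stored mass logarithmic in the number of query masses; in a database setting the same effect would typically be sought with an index on the mass column.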
The clipboard module is dedicated to handling all types of analysis results. It is a user session-based tool to which individual data items can be temporarily added from within any of the other modules. The clipboard contains three independent channels, enabling complex comparative tasks. Associated colours make it easy to track the clipboard status of any item in the database through all other modules.
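The clipboard concept can be pictured as a small data structure holding three independent sets of items. The Python sketch below is an illustration only: the class and method names, the specific colours and the `(item_type, item_id)` keying are assumptions, and in pProRep the clipboard lives in the user's web session.

```python
class Clipboard:
    """Session-scoped clipboard with three independent channels."""

    COLOURS = ("red", "green", "blue")  # assumed colours, one per channel

    def __init__(self):
        # One set of (item_type, item_id) pairs per channel.
        self.channels = [set() for _ in self.COLOURS]

    def add(self, channel, item_type, item_id):
        """Put an item on the given channel (0, 1 or 2)."""
        self.channels[channel].add((item_type, item_id))

    def remove(self, channel, item_type, item_id):
        """Take an item off the given channel; absent items are ignored."""
        self.channels[channel].discard((item_type, item_id))

    def colours_for(self, item_type, item_id):
        """Return the colours of all channels that currently hold the item."""
        key = (item_type, item_id)
        return [c for c, ch in zip(self.COLOURS, self.channels) if key in ch]

    def common(self, channel_a, channel_b):
        """Return the items present on both channels (a simple comparison)."""
        return self.channels[channel_a] & self.channels[channel_b]
```

For example, placing the spots returned by two different queries on two channels and calling `common` gives the spots matched by both, which could then be highlighted on a gel.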
3.5 System configuration
Configuration options are defined in configuration files. Users and their associated information (including encrypted passwords) are stored in a dedicated database table, which can belong either to a separate database or to the proteome database.
| 4 APPLICATIONS AND PERSPECTIVES |
|---|
This web-based tool enables database integration of the outcomes of a laboratory's gel-based proteome experiments. It was extensively evaluated and is currently in use in our laboratory to integrate high-throughput gel-based proteome data, and user feedback is taken directly into account in ongoing development.
Its most useful features lie in its capabilities to retrieve, analyse and query the available data. New results (e.g. from a mass spectrometric analysis), or even theoretical data (e.g. a set of mass values associated with a potential post-translational modification), can be rapidly compared against the database. Using the clipboard features, query results can be graphically highlighted, for example, on a labelled two-dimensional gel. Integrated datasets can more easily be re-analysed at any time in the future. Rapid access to previous data can also be of help during the acquisition of new data, and even guide new experiments. The tool can be employed to share the data obtained in a project across different laboratories, e.g. within collaborations. It gives permission-based access to up-to-date information on the status of a project. The application can also be set up to distribute (validated) datasets over the web, e.g. as a supplement to a paper.
To cope with the rapidly evolving requirements of proteome data analysis, this application was designed with flexibility and expandability in mind. Future developments will include support for additional data types, more complete support for existing standard data types such as mzData (e.g. other descriptive fields), and new analytical functions. The new Proteomic Standards Initiative directives (Orchard et al., 2005) will be closely followed.
| Acknowledgments |
|---|
Stefaan Vandamme and Peter Deckers are acknowledged for valuable feedback. K.L. and E.W. are postdoctoral fellows of the Fund for Scientific Research-Flanders (Belgium) (F.W.O.-Vlaanderen). R.M. was supported by grants from EU TEMBLOR and by Carlsberg Foundation Fellowships.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
Received on March 29, 2006; revised on September 5, 2006; accepted on September 15, 2006
| REFERENCES |
|---|
Appel, R.D., et al. (1996) Federated two-dimensional electrophoresis database: a simple means of publishing two-dimensional electrophoresis data. Electrophoresis, 17, 540-546.
Craig, R., et al. (2004) Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res., 3, 1234-1242.
Desiere, F., et al. (2005) Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol., 6, R9.
Garden, P., et al. (2005) PROTEIOS: an open source proteomics initiative. Bioinformatics, 21, 2085-2087.
Hoogland, C., et al. (1997) Make2ddb: a simple package to set up a two-dimensional electrophoresis database for the World Wide Web. Electrophoresis, 18, 2755-2758.
Martens, L., et al. (2005) PRIDE: the PRoteomics IDEntifications database. Proteomics, 5, 3537-3545.
Orchard, S., et al. (2005) Further steps towards data standardisation: the Proteomic Standards Initiative HUPO 3rd annual congress, Beijing 25-27th October, 2004. Proteomics, 5, 337-339.
Prince, J.T., et al. (2004) The need for a public proteomics repository. Nat. Biotechnol., 22, 471-472.
Taylor, C.F., et al. (2003) A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat. Biotechnol., 21, 247-254.