Bioinformatics Advance Access originally published online on July 17, 2008
Bioinformatics 2008 24(18):2115-2116; doi:10.1093/bioinformatics/btn376

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ProCope—protein complex prediction and evaluation

Jan Krumsiek , Caroline C. Friedel * and Ralf Zimmer

Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Recent advances in high-throughput technology have increased the quantity of available data on protein complexes and stimulated the development of many new prediction methods. In this article, we present ProCope, a Java software suite for the prediction and evaluation of protein complexes from affinity purification experiments, which integrates the major methods for calculating interaction scores and predicting protein complexes published in recent years. Methods can be accessed via a graphical user interface, command line tools and a Java API. Using ProCope, existing algorithms can be applied quickly and reproducibly to new experimental results, individual steps of the different algorithms can be combined in new ways, and new methods can be implemented and integrated into the existing prediction framework.

Availability: Source code and executables are available at http://www.bio.ifi.lmu.de/Complexes/ProCope/

Contact: Caroline.Friedel{at}bio.ifi.lmu.de


    1 INTRODUCTION
Protein complexes, as modular components of the cell, carry out many important molecular functions. Thus, the identification and characterization of protein complexes is key to understanding cellular processes. Since the publication of two genome-wide tandem affinity purification (TAP) assays for yeast (Gavin et al., 2006; Krogan et al., 2006), much effort has been put into predicting the actual protein complexes from these experimental datasets (Collins et al., 2007; Friedel et al., 2008; Gavin et al., 2006; Hart et al., 2007; Krogan et al., 2006; Pu et al., 2007; Zhang et al., 2008). Generally, these methods first calculate scoring networks for protein interactions from the purification results, which are then clustered to derive protein complexes.

Here we present ProCope, an extensible software package written in Java for the prediction and evaluation of protein complexes from purification datasets, which integrates efficient implementations of the major prediction methods. The ProCope package provides a convenient graphical user interface (GUI), command line tools suitable for batch job processing and a Java application programming interface (API). The Java API makes it possible to use the functionality of ProCope within other software applications or to extend the user interfaces with new prediction algorithms. Thus, ProCope is a useful tool for researchers both for applying published methods to new datasets to obtain reproducible and high-quality predictions and for developing and evaluating new prediction methods.


    2 IMPLEMENTATION
2.1 Methods
The general procedure for protein complex prediction and evaluation used within ProCope is outlined in Figure 1. ProCope provides implementations of the following scoring methods: socio-affinity scores (Gavin et al., 2006), Purification Enrichment scores (Collins et al., 2007), a scoring scheme based on the hypergeometric distribution (Hart et al., 2007), Dice coefficients (Zhang et al., 2008) and bootstrap confidence scores (Friedel et al., 2008).
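To make the idea of a scoring network concrete, the following is a minimal sketch of the simplest of these schemes, a Dice coefficient computed over the sets of purifications in which two proteins were observed. It is not ProCope's implementation and does not reproduce the details of Zhang et al. (2008); the class name `DiceScoreSketch`, its methods and the toy purification data are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the ProCope API.
class DiceScoreSketch {

    /** Maps each protein to the indices of the purifications in which it was observed. */
    static Map<String, Set<Integer>> occurrences(List<Set<String>> purifications) {
        Map<String, Set<Integer>> occ = new TreeMap<>();
        for (int p = 0; p < purifications.size(); p++) {
            for (String protein : purifications.get(p)) {
                occ.computeIfAbsent(protein, k -> new HashSet<>()).add(p);
            }
        }
        return occ;
    }

    /** Dice coefficient: 2 * |A intersect B| / (|A| + |B|). */
    static double dice(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> shared = new HashSet<>(a);
        shared.retainAll(b);
        return 2.0 * shared.size() / (a.size() + b.size());
    }

    /** Scores every protein pair that co-occurs at least once; keys are "A|B" with A sorted before B. */
    static Map<String, Double> scoreNetwork(List<Set<String>> purifications) {
        Map<String, Set<Integer>> occ = occurrences(purifications);
        List<String> proteins = new ArrayList<>(occ.keySet()); // sorted, since occ is a TreeMap
        Map<String, Double> scores = new TreeMap<>();
        for (int i = 0; i < proteins.size(); i++) {
            for (int j = i + 1; j < proteins.size(); j++) {
                double d = dice(occ.get(proteins.get(i)), occ.get(proteins.get(j)));
                if (d > 0) {
                    scores.put(proteins.get(i) + "|" + proteins.get(j), d);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy data (assumed): each set lists the proteins recovered in one purification.
        List<Set<String>> purifications = List.of(
                Set.of("A", "B", "C"),
                Set.of("A", "B"),
                Set.of("C", "D"));
        scoreNetwork(purifications)
                .forEach((pair, score) -> System.out.println(pair + "\t" + score));
    }
}
```

In this toy example, A and B always co-occur and receive a score of 1.0, while A and D never co-occur and therefore have no edge in the resulting network. The more elaborate schemes listed above differ mainly in how they weight bait and prey observations and correct for protein abundance.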


Fig. 1. Outline of the protein complex set prediction and evaluation procedure. (1) A protein–protein interaction scoring network is derived from the purification data. (2) The scoring network is clustered to derive a set of predicted protein complexes. (3) The quality of the predicted complexes is evaluated using functional annotations, experimental data or reference complex sets. (4) Alternatively, the quality of the scoring networks can be evaluated directly on a reference complex set.

 
For the clustering of protein–protein interaction (PPI) networks, ProCope provides access to the Markov Clustering algorithm (van Dongen, 2000) and implements several variants of hierarchical agglomerative clustering (Murtagh, 1984). Furthermore, two methods have been implemented for identifying proteins involved in more than one complex (Friedel et al., 2008; Pu et al., 2007).
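As an illustration of the clustering step, the sketch below shows one common variant of hierarchical agglomerative clustering, average linkage with a score cut-off, applied to a symmetric score matrix. Which variants ProCope implements is not specified here, so this is a naive cubic-time sketch under that assumption rather than the package's algorithm; the class name `AverageLinkageSketch` and the example matrix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the ProCope API.
class AverageLinkageSketch {

    /** Mean pairwise score between the members of two clusters. */
    static double averageScore(List<Integer> a, List<Integer> b, double[][] score) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += score[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    /**
     * Starts with one cluster per protein and repeatedly merges the two clusters
     * with the highest average score, stopping once no pair reaches the cut-off.
     */
    static List<List<Integer>> cluster(double[][] score, double cutoff) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < score.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > 1) {
            int bestA = -1;
            int bestB = -1;
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double s = averageScore(clusters.get(a), clusters.get(b), score);
                    if (s > best) {
                        best = s;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            if (best < cutoff) {
                break; // remaining clusters are too dissimilar to merge
            }
            // bestB > bestA, so removing bestB leaves the index of bestA unchanged.
            List<Integer> merged = clusters.remove(bestB);
            clusters.get(bestA).addAll(merged);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Toy symmetric score matrix for five proteins (assumed values).
        double[][] score = {
                {0.0, 0.9, 0.8, 0.1, 0.1},
                {0.9, 0.0, 0.7, 0.1, 0.1},
                {0.8, 0.7, 0.0, 0.1, 0.1},
                {0.1, 0.1, 0.1, 0.0, 0.6},
                {0.1, 0.1, 0.1, 0.6, 0.0}};
        System.out.println(cluster(score, 0.5));
    }
}
```

With a cut-off of 0.5, the example groups proteins 0, 1 and 2 into one predicted complex and proteins 3 and 4 into another. Other linkage variants (single or complete linkage, for instance) would only change how `averageScore` combines the pairwise scores.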

The accuracy of scoring networks in predicting interactions within reference complexes can be assessed using receiver operating characteristic (ROC) curves. Predicted complexes can be evaluated by calculating sensitivity and positive predictive value compared to reference complexes (Brohée and van Helden, 2006) or functional similarity and co-localization within protein complexes. Functional similarity is evaluated in terms of semantic similarity (Schlicker et al., 2006) of the GO annotations within the complex. Several alternative definitions of semantic similarity are implemented. Furthermore, ProCope provides two different co-localization measures based on experimental data (Friedel et al., 2008; Pu et al., 2007).
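The comparison against reference complexes can be sketched as follows. The code follows our reading of the sensitivity and positive predictive value (PPV) definitions of Brohée and van Helden (2006), which are based on a contingency table of shared proteins between reference and predicted complexes; it is an illustrative sketch, not ProCope's code, and the class name `ComplexEvaluationSketch`, its methods and the example complexes are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the ProCope API.
class ComplexEvaluationSketch {

    /** t[i][j] = number of proteins shared by reference complex i and predicted complex j. */
    static int[][] overlaps(List<Set<String>> reference, List<Set<String>> predicted) {
        int[][] t = new int[reference.size()][predicted.size()];
        for (int i = 0; i < reference.size(); i++) {
            for (int j = 0; j < predicted.size(); j++) {
                Set<String> shared = new HashSet<>(reference.get(i));
                shared.retainAll(predicted.get(j));
                t[i][j] = shared.size();
            }
        }
        return t;
    }

    /** Sum over reference complexes of their best overlap, divided by the total reference size. */
    static double sensitivity(List<Set<String>> reference, List<Set<String>> predicted) {
        int[][] t = overlaps(reference, predicted);
        int covered = 0;
        int total = 0;
        for (int i = 0; i < reference.size(); i++) {
            int best = 0;
            for (int v : t[i]) {
                best = Math.max(best, v);
            }
            covered += best;
            total += reference.get(i).size();
        }
        return total == 0 ? 0.0 : (double) covered / total;
    }

    /** Sum over predicted complexes of their best overlap, divided by the sum of all overlaps. */
    static double ppv(List<Set<String>> reference, List<Set<String>> predicted) {
        int[][] t = overlaps(reference, predicted);
        int bestSum = 0;
        int total = 0;
        for (int j = 0; j < predicted.size(); j++) {
            int best = 0;
            for (int i = 0; i < reference.size(); i++) {
                best = Math.max(best, t[i][j]);
                total += t[i][j];
            }
            bestSum += best;
        }
        return total == 0 ? 0.0 : (double) bestSum / total;
    }

    public static void main(String[] args) {
        // Toy reference and predicted complex sets (assumed).
        List<Set<String>> reference = List.of(Set.of("A", "B", "C", "D"), Set.of("E", "F"));
        List<Set<String>> predicted = List.of(Set.of("A", "B", "C"), Set.of("E", "F"), Set.of("D"));
        System.out.println("Sensitivity: " + sensitivity(reference, predicted));
        System.out.println("PPV:         " + ppv(reference, predicted));
    }
}
```

In the toy example, splitting protein D off from its reference complex lowers the sensitivity to 5/6, while the PPV stays at 1.0 because every predicted complex falls entirely within a single reference complex. The two measures therefore penalize over-splitting and over-merging differently, which is why they are usually reported together.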

All methods have been tested extensively against the results of the original publications. ProCope has been designed to be easily extensible as well as highly efficient, to allow for higher-order algorithms that require many repeated calculations, such as the bootstrap algorithm we presented recently (Friedel et al., 2008).

2.2 User interfaces and Java API
ProCope provides an intuitive, easy-to-use GUI (see Figure 2) which can also be used as a Cytoscape plugin (Shannon et al., 2003). The user can quickly load purification data, calculate and cluster scoring networks, load evaluation data, compare complex sets, apply cut-offs to score networks or complex sets and much more. A Java Web Start version of the GUI is also available to start the program directly from within the web browser without the need to install the package. In addition to the GUI, a set of command line tools is provided for calculating interaction scores, clustering and complex evaluation. With these tools, different calculation processes can be easily integrated into scripts and batch jobs and distributed on computing clusters.


Fig. 2. This figure shows the GUI of ProCope and example results: a ROC curve for different scoring networks and a histogram of socio-affinity scores.

 
All functionality of ProCope can be accessed via a well-documented Java API and integrated easily into new software programs. Furthermore, the user interfaces of ProCope offer plugin functionality to extend them with custom score calculation and clustering methods. Detailed examples of using the GUI, command line tools and the Java API, including example source code, are provided with the ProCope release.


    3 RESULTS AND DISCUSSION
ProCope is an extensible Java suite for the prediction and evaluation of protein complexes from affinity purification assays which implements state-of-the-art methods published in the field in recent years. Its potential can be illustrated with two examples. First, using the GUI, biologists can rapidly analyse data from new affinity purification experiments using previously published methods or a combination of those. Thus, time-consuming and error-prone reimplementations are avoided. Only a few steps are necessary to load the purification data, calculate interaction scores, predict complexes from the scoring networks and evaluate the results (see the ProCope documentation).

Second, the existing functionality of ProCope can be easily extended using the Java API. Researchers developing new prediction methods can rely on the infrastructure and evaluation methods provided by ProCope. New interaction score definitions or complex prediction methods can be implemented quickly by extending the appropriate classes, and the performance of the methods can be assessed without delay. Moreover, powerful new approaches can be made available to the research community immediately using the plugin option of the user interfaces. Using ProCope, we have developed an unsupervised bootstrap approach which does not require additional training data apart from the purification experiments (Friedel et al., 2008). Bootstrap confidence scores have been shown to be more accurate than previously published scoring methods, and the predicted complexes are of equivalent quality to the best published predictions.

Because of the easy access to its methods and the efficiency and extensibility of its algorithms, the ProCope package will be a valuable tool for researchers seeking to apply existing algorithms to new data or to develop new prediction methods.

Requirements

ProCope requires the Java 5.0 (or higher) Runtime Environment, which is freely available at http://www.java.com.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on May 27, 2008; revised on July 7, 2008; accepted on July 15, 2008

    REFERENCES

    Brohée S, van Helden J. Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics (2006) 7:488.

    Collins SR, et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics (2007) 6:439–450.

    Friedel CC, et al. Bootstrapping the interactome: unsupervised identification of protein complexes in yeast. In: Vingron M, Wong L, eds. RECOMB 2008. (2008) Berlin/Heidelberg: Springer. 3–16. LNBI 4955.

    Gavin A-C, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature (2006) 440:631–636.

    Hart GT, et al. A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics (2007) 8:236.

    Krogan NJ, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature (2006) 440:637–643.

    Murtagh F. Complexities of hierarchic clustering algorithms: state of the art. Computat. Stat. Q. (1984) 1:101–113.

    Pu S, et al. Identifying functional modules in the physical interactome of Saccharomyces cerevisiae. Proteomics (2007) 7:944–960.

    Schlicker A, et al. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics (2006) 7:302.

    Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. (2003) 13:2498–2504.

    van Dongen S. Graph Clustering by Flow Simulation. Ph.D. Thesis. (2000) Utrecht, The Netherlands: University of Utrecht.

    Zhang B, et al. From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics (2008) 24:979–986.

