

Bioinformatics Advance Access originally published online on May 22, 2008
Bioinformatics 2008 24(13):1554-1555; doi:10.1093/bioinformatics/btn238

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A Bayesian estimator of protein–protein association probabilities

Jason M. Gilmore 1, Deanna L. Auberry 1, Julia L. Sharp 2, Amanda M. White 1, Kevin K. Anderson 1 and Don S. Daly 1,*

1Pacific Northwest National Laboratory, Battelle Boulevard, Richland, WA 99352 and 2Department of Applied Economics and Statistics, Clemson University, Clemson, SC 29634, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 ALGORITHM
 3 SENSITIVITY AND SPECIFICITY
 4 IMPLEMENTATION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Summary: The Bayesian Estimator of Protein–Protein Association Probabilities (BEPro3) is a software tool for estimating probabilities of protein–protein association between bait and prey protein pairs using data from multiple-bait, multiple-replicate, protein liquid chromatography tandem mass spectrometry (LC–MS/MS) affinity isolation experiments.

Availability: BEPro3 is public domain software, has been tested on Windows XP, Linux and Mac OS, and is freely available from http://www.pnl.gov/statistics/BEPro3.

Contact: ds.daly{at}pnl.gov

Supplementary Information: A user guide, example dataset with analysis and additional documentation are included with the BEPro3 download.


    1 INTRODUCTION
 
Identifying associations between proteins is essential to the larger goal of inferring protein networks and their functions. Two closely related techniques for uncovering protein associations are the endogenous and exogenous protein affinity isolation assays (Dziembowski and Seraphin, 2004; Markillie et al., 2005). These purification methods isolate ‘prey’ proteins that interact with an affinity-tagged ‘bait’ protein. The isolated prey proteins are digested, analyzed, and identified using liquid chromatography tandem mass spectrometry (LC–MS/MS). Prey identities, however, are uncertain due to mislabeling (a false positive ID) or missed labeling (a false negative ID).

Certainty in a prey identity and, hence, in a prey–bait association is gained through replicate assays. The proportion of replicates of a bait in which a prey protein is observed is a naïve estimate of the probability of association between that prey and bait. The estimate is naïve in that it does not account for false positive and false negative identifications. Further, not all true prey–bait associations are of interest. For instance, a prey protein, such as a ribosomal protein, that binds indiscriminately, or ‘ubiquitously’, across the featured bait proteins may not be of interest.
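The naïve estimate described above is a simple proportion. The following minimal Python sketch illustrates it; the function name and the boolean encoding of detections are assumptions made for illustration and are not part of BEPro3.

```python
def naive_association_estimate(detections):
    """Proportion of replicates of one bait in which a prey was observed.

    `detections` holds one boolean per replicate (hypothetical encoding).
    No correction is made for false positive or false negative IDs.
    """
    if not detections:
        raise ValueError("at least one replicate is required")
    observed = sum(1 for d in detections if d)
    return observed / len(detections)


# A prey observed in 3 of 4 replicates of a bait.
estimate = naive_association_estimate([True, True, False, True])
```

Because a single mislabeled spectrum changes this proportion directly, the estimate is best read as a starting point that the Bayesian step then refines.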

An affinity isolation experiment identifies a set of potential prey–bait associations: some false, some true, some true but not interesting. This set is informative about both the probabilities of prey–bait association and the rates of false positive and false negative identifications. The Bayesian Estimator of Protein–Protein Association Probabilities (BEPro3) refines the naïve estimate of the probability of association using an estimated prior probability of association, and estimated rates of false positive and false negative identifications. BEPro3 then scores the ubiquity of the prey protein from the prey's Bayes’ Odds.


    2 ALGORITHM
 
BEPro3 applies a three-step statistical algorithm. In the first step, a likelihood ratio test (LRT) establishes whether a prey's pattern of observations across a set of bait proteins is random (uniform) or non-random (non-uniform) with respect to at least one bait protein. In the second step, the Bayes’ Odds of a prey–bait association, or posterior probability of an association, is calculated using prior probabilities of false positive and false negative identifications whose estimates depend upon the LRT outcomes. Finally, the ubiquity of a prey protein is scored using a weighted average of its Bayes’ Odds scores (Sharp et al., 2007). A prey with many large Bayes’ Odds across baits receives a high ubiquity score, and a prey with few large Bayes’ Odds values receives a low ubiquity score.
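The exact form of the LRT is not given here. As an illustration of the first step only, the sketch below assumes a binomial model in which each bait has its own detection probability, and tests it against a single probability shared by all baits. The function names and the example counts are hypothetical; this is not the BEPro3 implementation.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k detections in n replicates,
    omitting the binomial coefficient (it cancels in the ratio)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def uniformity_lrt(counts):
    """Likelihood ratio statistic for uniform detection across baits.

    `counts` is a list of (detections, replicates) pairs, one per bait.
    Returns (G, degrees_of_freedom). Under the uniform null, G is
    approximately chi-square distributed for large replicate numbers.
    """
    total_k = sum(k for k, _ in counts)
    total_n = sum(n for _, n in counts)
    pooled = total_k / total_n
    ll_null = sum(binom_loglik(k, n, pooled) for k, n in counts)
    ll_alt = sum(binom_loglik(k, n, k / n) for k, n in counts)
    return 2.0 * (ll_alt - ll_null), len(counts) - 1


# Same detection rate for every bait: no evidence against uniformity.
g_uniform, df_uniform = uniformity_lrt([(2, 4), (2, 4), (2, 4)])

# Prey seen in every replicate of one bait and in none of the others.
g_specific, df_specific = uniformity_lrt([(4, 4), (0, 4), (0, 4)])

# 0.95 quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_CRIT_DF2 = 5.991
is_nonuniform = g_specific > CHI2_CRIT_DF2
```

With only a handful of replicates per bait the chi-square approximation is rough, so a statistic near the critical value should be interpreted with caution.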

A BEPro3 Bayes’ Odds calculation depends upon three parameters: the chance that a randomly chosen protein associates with another randomly chosen protein, and the false positive and false negative identification rates. For a large proteome of N proteins, our prior belief is that the probability that two randomly chosen proteins associate is small, say 1/(N–1). The false positive and false negative identification rates may vary from prey to prey and from bait to bait because one prey may be more easily observed by LC–MS than another, and because of analytical differences in the affinity isolation assay, or differences in sample concentrations submitted for LC–MS. For a prey with a statistically significant LRT, the algorithm estimates the prey true positive rate (one minus the false negative rate) and false positive rate by segregating the observed frequencies of prey observation into high and low frequency classes. For a prey with a non-significant LRT, the true positive and false positive rates are estimated with the medians of the rates for those prey with statistically significant LRTs.
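To make the role of the three parameters concrete, the following sketch applies Bayes’ rule to one prey–bait pair. It assumes independent replicates with a binomial likelihood, which is an assumption made for this illustration and not necessarily the likelihood used by BEPro3. The proteome size and the rates in the example are hypothetical values.

```python
def posterior_association(k, n, n_proteins, tpr, fpr):
    """Posterior probability that a prey associates with a bait.

    k          -- replicates in which the prey was detected
    n          -- total replicates of the bait
    n_proteins -- proteome size N; the prior is 1/(N - 1)
    tpr        -- true positive rate (1 - false negative rate)
    fpr        -- false positive rate
    Assumes independent replicates (binomial likelihood).
    """
    if not (0.0 < tpr < 1.0 and 0.0 < fpr < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    prior = 1.0 / (n_proteins - 1)
    like_assoc = tpr ** k * (1.0 - tpr) ** (n - k)
    like_none = fpr ** k * (1.0 - fpr) ** (n - k)
    numerator = prior * like_assoc
    return numerator / (numerator + (1.0 - prior) * like_none)


# Hypothetical values: 4000-protein proteome, tpr 0.9, fpr 0.05.
p_all = posterior_association(4, 4, 4000, 0.9, 0.05)   # seen in 4 of 4
p_once = posterior_association(1, 4, 4000, 0.9, 0.05)  # seen in 1 of 4
```

Under these assumed rates, consistent detection across replicates overcomes the small prior, whereas a single detection leaves the posterior close to zero. This is the behaviour the replicate design relies on.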

It is important to note that the interpretations of the LRT, Bayes’ Odds and ubiquity statistics depend upon the design of the affinity isolation experiment. An experiment in which all featured bait proteins are known to associate should result in non-significant LRTs, high Bayes’ Odds across almost all bait proteins and high ubiquity scores for those bait proteins also observed as prey. In contrast, an experiment featuring randomly selected bait proteins should result in statistically significant LRTs (i.e. non-uniform observation frequencies across baits), high Bayes’ Odds for certain bait–prey combinations, and low ubiquity scores for all but the promiscuously sticky prey.


    3 SENSITIVITY AND SPECIFICITY
 
The statistical sensitivity and specificity of the BEPro3 algorithm may be assessed using an endogenous affinity isolation experiment with known protein complexes (Sharp et al., 2007). The experiment involved 75 LC–MS injections covering 2–10 replicates of each of 16 bait proteins, with 9 baits having 4 replicates and 4 baits having 5 or more replicates. The resulting 200 prey by 16 bait frequency matrix contained 43 prey that were observed in all but one injection, and 106 that were observed in 5 or fewer injections.

We estimate BEPro3 sensitivity with the true positive fraction of prey–bait associations, or the proportion of known prey–bait pairs (as identified by literature mining and previous affinity isolation experiments) that have a high Bayes’ Odds. Similarly, we estimate BEPro3 specificity with one minus the false positive fraction of prey–bait associations, or one minus the proportion of prey–bait pairs not known to interact that have a high Bayes’ Odds.

Assuming the preliminary identification of prey–bait interactors is true, 3104 of 3200 observed prey–bait associations, or 97%, fall in the ‘not known interactors’ category in the example affinity isolation experiment. The estimated BEPro3 specificity, with a Bayes’ Odds cutoff of 0.5, is about 95% (Fig. 1A). The estimated sensitivity is about 50% (Fig. 1B). Alternatively, if we accept the BEPro3 identification of prey–bait interactors, then 165 prey–bait pairs with high Bayes’ Odds in the ‘not known interactors’ category (Fig. 1A) deserve further investigation as potential protein interactors. Further, the 48 pairs with low Bayes’ Odds in the ‘known interactors’ category (Fig. 1B) may provide guidance to improving the assay.
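The arithmetic behind these estimates can be checked directly from the counts quoted above. In the sketch below, the number of known interactor pairs (96) is not stated in the text and is obtained by subtraction; the function names are illustrative.

```python
def sensitivity(high_odds_known, known_total):
    """True positive fraction among known interactor pairs."""
    return high_odds_known / known_total


def specificity(high_odds_not_known, not_known_total):
    """One minus the false positive fraction among pairs not known
    to interact."""
    return 1.0 - high_odds_not_known / not_known_total


total_pairs = 200 * 16                 # prey-by-bait frequency matrix
not_known = 3104                       # 'not known interactors'
known = total_pairs - not_known        # derived by subtraction

high_odds_not_known = 165              # Bayes' Odds above the 0.5 cutoff
low_odds_known = 48                    # Bayes' Odds below the 0.5 cutoff

frac_not_known = not_known / total_pairs
spec = specificity(high_odds_not_known, not_known)
sens = sensitivity(known - low_odds_known, known)
```

The derived values agree with the rounded figures in the text: 97% of pairs are not known interactors, specificity is about 95% and sensitivity is 50%.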


Fig. 1. Empirical distributions of Bayes’ Odds for prey proteins that are not known interactors (A) and known interactors (B) with the bait proteins featured in the known complexes experiment.

 

    4 IMPLEMENTATION
 
BEPro3 was designed, developed and packaged to ensure a sound implementation of its sophisticated statistical algorithm, to facilitate easy, sensible usage, to ensure easy availability and to encourage modification. The core statistical routines are written in the R language (The R Project for Statistical Computing. Vienna, Austria. http://www.r-project.org). A Java user interface (Sun Microsystems, Inc., Santa Clara, CA, USA. http://java.sun.com) facilitates data management and the setting of analysis parameters. BEPro3 returns tabular results and an HTML-annotated analysis summary as text compatible with Excel (Microsoft, Inc., Redmond, WA, USA), Cytoscape (Institute for Systems Biology, Seattle, WA, USA. http://www.cytoscape.org) or a file/internet browser. This software requires R version 2.2 and Java 1.5.0, or more recent versions. BEPro3, R and Java are free and open, allowing for unrestricted distribution under a general GNU license. The self-installing BEPro3 package includes a highly integrated user guide, example dataset and analysis, and supplementary documentation that detail the specifics of the algorithm and its implementation.

BEPro3 reads affinity isolation LC–MS protein detection scores and LC–MS sample pedigree information, as text, from comma-separated value (CSV) files. Protein detection scores, such as peptide counts or total ion abundance, may be in either a prey-by-bait cross-tabulated format with Prey IDs heading rows and LC–MS sample IDs heading columns, or a long format with columns for Prey IDs, Bait IDs and detection scores. The latter is offered in recognition that a cross-tabbed matrix of LC–MS detection scores is usually quite large, but sparsely populated due to missing observations. Bait IDs are linked to LC–MS sample IDs and input via the sample pedigree file. Parameters that manage the flow of data and results, and guide or constrain the analysis, are entered from the keyboard into the GUI.
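The relationship between the cross-tabulated score file and the sample pedigree file can be sketched as follows. The column names (`sample_id`, `bait_id`, `prey_id`), the example rows and the rule that a positive score counts as a detection are all assumptions for illustration; the file layouts that BEPro3 actually expects are documented in its user guide.

```python
import csv
import io
from collections import defaultdict

# Hypothetical pedigree file: links each LC-MS sample to its bait.
PEDIGREE_CSV = """sample_id,bait_id
S1,BaitA
S2,BaitA
S3,BaitB
"""

# Hypothetical cross-tabulated scores: prey rows, sample columns.
SCORES_CSV = """prey_id,S1,S2,S3
PreyX,5,3,0
PreyY,0,,2
"""


def read_pedigree(text):
    """Map each sample ID to its bait ID."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample_id"]: row["bait_id"] for row in reader}


def detection_frequencies(scores_text, sample_to_bait):
    """Return {(prey, bait): (detections, replicates)}.

    A missing or zero score is treated as 'not detected' (assumption).
    """
    counts = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(scores_text)):
        prey = row["prey_id"]
        for sample, bait in sample_to_bait.items():
            raw = (row.get(sample) or "").strip()
            detected = raw != "" and float(raw) > 0
            counts[(prey, bait)][0] += int(detected)
            counts[(prey, bait)][1] += 1
    return {pair: tuple(c) for pair, c in counts.items()}


pedigree = read_pedigree(PEDIGREE_CSV)
frequencies = detection_frequencies(SCORES_CSV, pedigree)
```

The resulting detections and replicate counts per prey–bait pair are the quantities from which detection frequencies are tabulated.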

BEPro3 output, in text format, includes an analysis summary, and tables of prey–bait detection frequencies and Bayes’ Odds with supplemental statistics. BEPro3 also creates a Cytoscape network ‘edge’ file with attributes that include Bayes’ Odds and ubiquity scores. The known complexes example, featured in ‘Statistically Inferring Protein–Protein Associations with Affinity Isolation LC-MS/MS Assays’ (Sharp et al., 2007), provides examples of the input and corresponding output.
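As a rough illustration of an edge table carrying these attributes, the sketch below writes tab-delimited text that a network tool could import as a table. The column names, the example scores and the filtering at the 0.5 cutoff are assumptions; the layout of the edge file that BEPro3 actually writes is not described here.

```python
import csv
import io


def edge_table(rows, cutoff=0.5):
    """Write prey-bait edges with Bayes' Odds at or above `cutoff`
    as tab-delimited text (hypothetical layout).

    `rows` holds (prey, bait, bayes_odds, ubiquity) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["prey", "bait", "bayes_odds", "ubiquity"])
    for prey, bait, odds, ubiquity in rows:
        if odds >= cutoff:
            writer.writerow([prey, bait, f"{odds:.3f}", f"{ubiquity:.3f}"])
    return buffer.getvalue()


# Hypothetical scores for two prey observed with the same bait.
table = edge_table([
    ("PreyX", "BaitA", 0.96, 0.10),
    ("PreyY", "BaitA", 0.02, 0.80),
])
```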


    ACKNOWLEDGEMENTS
 
Funding: Funding was provided by the US Department of Energy Office of Advanced Scientific Computing Research under contract 47901 and by the Office of Biological and Environmental Research under contracts 41966 and 43930 with the Pacific Northwest and the Oak Ridge National Laboratories.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on October 8, 2007; revised on May 15, 2008; accepted on May 15, 2008

    REFERENCES
 

    Dziembowski A, Seraphin B. Recent developments in the analysis of protein complexes. FEBS Lett. (2004) 556:1–6.

    Markillie LM, et al. Simple protein complex purification and identification method for high-throughput mapping of protein interaction networks. J. Proteome Res. (2005) 4:268–274.

    Sharp JL, et al. Statistically inferring protein-protein associations with affinity isolation LC-MS/MS assays. J. Proteome Res. (2007) 6:3788–3795.

