Bioinformatics Advance Access originally published online on February 26, 2008
Bioinformatics 2008 24(7):979-986; doi:10.1093/bioinformatics/btn036
From pull-down data to protein interaction networks and complexes with biological relevance

1Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 and 2Computer Science Department, North Carolina State University, Raleigh, NC 27695, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Recent improvements in high-throughput mass spectrometry (MS) technology have expedited genome-wide discovery of protein–protein interactions by making it possible to detect protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessing protein–protein interaction affinities, representing protein interaction networks mathematically, discovering protein complexes and evaluating their biological relevance.
Results: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses the interaction affinity between two proteins based on the similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess the biological relevance of each complex found. On Saccharomyces cerevisiae pull-down data, the framework outperformed other, more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes suggested hypotheses about the causes of false identifications. Namely, co-purification of different protein complexes mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.
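The four steps named above (affinity scoring, thresholding, complex detection, statistical evaluation) can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure, threshold or graph algorithm the framework uses, so everything here is an assumption chosen for illustration: Jaccard similarity stands in for the affinity score, a fixed cutoff stands in for the knowledge-guided threshold, maximal cliques stand in for the graph-theoretical complex detection, and a hypergeometric tail probability stands in for the GO enrichment test. The toy data and all function names are hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical toy data: bait -> set of proteins identified in that pull-down.
pulldowns = {
    "bait1": {"A", "B", "C"},
    "bait2": {"A", "B", "C"},
    "bait3": {"A", "B", "D"},
    "bait4": {"D", "E"},
    "bait5": {"D", "E"},
}

def purification_profiles(pulldowns):
    """Map each protein to the set of pull-downs in which it was observed."""
    profiles = {}
    for bait, preys in pulldowns.items():
        for protein in preys:
            profiles.setdefault(protein, set()).add(bait)
    return profiles

def jaccard(a, b):
    """Jaccard similarity of two co-purification profiles (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_network(profiles, threshold):
    """Keep an edge wherever profile similarity reaches the (assumed) cutoff."""
    graph = {p: set() for p in profiles}
    for p, q in combinations(sorted(profiles), 2):
        if jaccard(profiles[p], profiles[q]) >= threshold:
            graph[p].add(q)
            graph[q].add(p)
    return graph

def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in sorted(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

def enrichment_pvalue(N, K, n, k):
    """Hypergeometric tail P(X >= k): N proteins in total, K carrying a GO
    term, n proteins in the candidate complex, k of them carrying the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

if __name__ == "__main__":
    profiles = purification_profiles(pulldowns)
    graph = build_network(profiles, threshold=0.5)
    # Singleton cliques are isolated proteins, not candidate complexes.
    complexes = [sorted(c) for c in maximal_cliques(graph) if len(c) >= 2]
    print(complexes)
    print(enrichment_pvalue(N=10, K=3, n=3, k=3))
```

In this sketch, lowering the cutoff merges more proteins into each candidate complex and raising it fragments them, which is the trade-off a threshold selection step has to settle.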
Contact: samatovan{at}ornl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Jonathan Wren
Present address: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received on October 25, 2007; revised on January 2, 2008; accepted on January 22, 2008

