Bioinformatics Advance Access published online on February 26, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn036
From pull-down data to protein interaction networks and complexes with biological relevance

1Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
2Computer Science Department, North Carolina State University, Raleigh, NC 27695
*To whom correspondence should be addressed. Dr. Nagiza F. Samatova, E-mail: samatovan{at}ornl.gov
Abstract
Motivation: Recent improvements in high-throughput mass spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by making it possible to detect protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessing protein-protein interaction affinities, for representing protein interaction networks mathematically, for discovering protein complexes, and for evaluating their biological relevance.
Results: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses the interaction affinity between two proteins based on the similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess the biological relevance of each complex found. On Saccharomyces cerevisiae pull-down data (Nature, vol. 440, pp. 631-636, 2006), the framework outperformed other, more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes suggested hypotheses about the causes of false identifications. Namely, co-purification of different protein complexes mediated by a common non-protein molecule, such as DNA, might be a source of false positives, and protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.
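Illustration: The four steps named in the Results can be sketched on toy data as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure, the threshold selection rule, the graph algorithm, or the statistical test. Jaccard similarity, a fixed threshold of 0.5, connected components, and a hypergeometric tail probability are used here as simple stand-ins, and all function names and data are hypothetical.

```python
from itertools import combinations
from math import comb


def jaccard(a, b):
    """Jaccard similarity of two sets of purification IDs (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(purifications, threshold):
    """Keep protein pairs whose co-purification similarity reaches the threshold.

    purifications maps each protein to the set of pull-down experiments in
    which it was detected. The threshold is passed in by hand here; the
    paper selects it with a knowledge-guided method not reproduced here.
    """
    edges = {}
    for p, q in combinations(sorted(purifications), 2):
        score = jaccard(purifications[p], purifications[q])
        if score >= threshold:
            edges[(p, q)] = score
    return edges


def connected_components(nodes, edges):
    """Group proteins into connected components (stand-in for complex discovery)."""
    adjacency = {n: set() for n in nodes}
    for p, q in edges:
        adjacency[p].add(q)
        adjacency[q].add(p)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


def enrichment_pvalue(N, K, n, k):
    """Hypergeometric P(X >= k): chance that a complex of n proteins drawn from
    N proteins, K of which carry a given annotation, has at least k annotated."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Toy data: protein -> IDs of pull-down experiments where it was observed.
    toy = {
        "A": {1, 2, 3}, "B": {1, 2, 3}, "C": {2, 3},
        "D": {7, 8}, "E": {7, 8, 9}, "F": {5},
    }
    network = build_network(toy, threshold=0.5)
    complexes = [c for c in connected_components(toy, network) if len(c) >= 2]
    print(network)
    print(complexes)
    # Suppose 3 of the 6 proteins share an annotation and all 3 fall in one complex.
    print(enrichment_pvalue(N=6, K=3, n=3, k=3))
```

Connected components are the coarsest possible grouping. A clique-based or density-based method would separate overlapping complexes and expose core components, which is closer in spirit to what the abstract describes.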
Contact: samatovan{at}ornl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Dr. Jonathan Wren
+These authors contributed equally to this work
Current address: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232
Received on October 25, 2007; revised on January 2, 2008; accepted on January 22, 2008
