Kernel methods for predicting protein-protein interactions
1 Department of Genome Sciences, University of Washington, Seattle, WA, USA
2 Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
*To whom correspondence should be addressed.
Motivation: Despite advances in high-throughput methods for discovering protein-protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions.
Results: We present a kernel method for predicting protein-protein interactions using a combination of data sources, including protein sequences, Gene Ontology annotations, local properties of the network, and homologous interactions in other species. Whereas protein kernels proposed in the literature provide a similarity between single proteins, prediction of interactions requires a kernel between pairs of proteins. We propose a pairwise kernel that converts a kernel between single proteins into a kernel between pairs of proteins, and we illustrate the kernel's effectiveness in conjunction with a support vector machine classifier. Furthermore, we obtain improved performance by combining several sequence-based kernels based on k-mer frequency, motif, and domain content, and by further augmenting the pairwise sequence kernel with features that are based on other sources of data.
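To make the idea of a pairwise kernel concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not spell out the construction, so the sketch assumes one common choice: a symmetrized product of the single-protein kernel, K((a1, a2), (b1, b2)) = K'(a1, b1) K'(a2, b2) + K'(a1, b2) K'(a2, b1), which scores a pair identically regardless of the order of its two proteins. The function name `pairwise_kernel`, the toy linear base kernel, and the random feature vectors are all hypothetical and used only for illustration.

```python
import numpy as np


def pairwise_kernel(K, pairs_a, pairs_b):
    """Build a kernel between protein pairs from a single-protein kernel matrix K.

    Assumed form (not taken from the abstract):
        K_pair((a1, a2), (b1, b2)) = K[a1, b1] * K[a2, b2] + K[a1, b2] * K[a2, b1]
    pairs_a and pairs_b are sequences of (i, j) index pairs into K.
    """
    a = np.asarray(pairs_a)
    b = np.asarray(pairs_b)
    # Column vectors for the first set of pairs, row vectors for the second,
    # so that fancy indexing broadcasts to a (len(a), len(b)) matrix.
    a1, a2 = a[:, 0][:, None], a[:, 1][:, None]
    b1, b2 = b[:, 0][None, :], b[:, 1][None, :]
    return K[a1, b1] * K[a2, b2] + K[a1, b2] * K[a2, b1]


# Toy single-protein kernel: a linear kernel on random feature vectors
# standing in for, e.g., k-mer frequency profiles of four proteins.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
K = X @ X.T

# (0, 1) and (1, 0) denote the same unordered pair of proteins.
pairs = [(0, 1), (1, 0), (2, 3)]
P = pairwise_kernel(K, pairs, pairs)
```

The resulting matrix `P` can be passed to any classifier that accepts a precomputed kernel. Under the same assumption, several single-protein kernels (k-mer, motif, domain) could be combined by summing their kernel matrices before calling `pairwise_kernel`, since a sum of kernels is again a kernel.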
We apply our method to predict physical interactions in yeast using data from the BIND database. At a false positive rate of 1% the classifier retrieves close to 80% of a set of trusted interactions. We thus demonstrate the ability of our method to make accurate predictions despite the sizeable fraction of false positives that are known to exist in interaction databases.
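The figure quoted above is a sensitivity measured at a fixed false positive rate. As a rough illustration of how such a number can be computed from classifier scores, here is a small sketch; the function name `tpr_at_fpr` and the toy scores are hypothetical, and this is not the evaluation code used in the paper.

```python
import numpy as np


def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Fraction of positives scoring above a threshold chosen so that
    at most max_fpr of the negatives score above it."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    neg = np.sort(scores[~labels])[::-1]  # negative scores, highest first
    # Number of negatives allowed above the threshold (small epsilon
    # guards against floating-point round-off in the product).
    allowed = int(np.floor(max_fpr * neg.size + 1e-9))
    if allowed >= neg.size:
        return 1.0  # every negative may be called positive
    threshold = neg[allowed]
    # Predict "interacting" only for scores strictly above the threshold.
    return float(np.mean(scores[labels] > threshold))


# Toy example: ten negative scores and four positive scores.
neg_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
pos_scores = [0.95, 0.85, 0.5, 1.5]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

sensitivity = tpr_at_fpr(scores, labels, max_fpr=0.1)
```

With a budget of 10% false positives, one of the ten negatives may exceed the threshold, and the function reports the fraction of positives that score above that cut-off.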
Availability: The classification experiments were performed using PyML, available at http://pyml.sourceforge.net. Data are available at http://noble.gs.washington.edu/proj/sppi
Contact: asa{at}gs.washington.edu
Received on January 15, 2005; accepted on March 27, 2005




