Bioinformatics Advance Access originally published online on February 22, 2005
Bioinformatics 2005 21(10):2488-2495; doi:10.1093/bioinformatics/bti339
Selective integration of multiple biological data for supervised network inference
1Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-43 Aomi, Koto-ku, Tokyo, Japan
2Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany
3Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
*To whom correspondence should be addressed.
Motivation: Inferring networks of proteins from biological data is a central issue in computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is entirely unknown at the outset and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the datasets excluded by our algorithm need not be collected.
Results: First, we formulate supervised network inference as a kernel matrix completion problem, in which inferring edges reduces to estimating the missing entries of a kernel matrix. We then propose an expectation-maximization algorithm that simultaneously infers the missing entries of the kernel matrix and the weights of the multiple datasets. Introducing the weights lets us integrate multiple datasets selectively and thereby exclude irrelevant and noisy ones. Our approach is tested, with favorable results, on two biological networks: a metabolic network and a protein interaction network.
Availability: Software is available on request.
Contact: kato-tsuyoshi{at}aist.go.jp
Supplementary information: A supplementary report including mathematical details is available at www.cbrc.jp/~kato/faem/faem.html
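The kernel matrix completion idea in the Results can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not give the update equations, so the code below assumes a standard Kullback-Leibler-based fill in which the missing blocks of the network kernel are chosen so that the completed matrix is as close as possible to a weighted sum of dataset kernels. The function names `combine_kernels` and `complete_kernel`, the toy random data, and the hand-fixed weights are all hypothetical; the weight-update step of the expectation-maximization procedure is omitted.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of dataset kernels, M = sum_i w_i * K_i (assumed model)."""
    return sum(w * K for w, K in zip(weights, kernels))


def complete_kernel(K_known, M, n_known):
    """Fill the missing blocks of a partially known kernel matrix.

    K_known : (n_known, n_known) block for proteins whose network is known.
    M       : (n, n) model kernel built from the datasets (must be
              positive definite on the known block).
    Assumption: the fill minimizes KL divergence between zero-mean
    Gaussians with covariances K and M while keeping K_known fixed.
    """
    v = slice(0, n_known)            # proteins with known network
    h = slice(n_known, M.shape[0])   # proteins to be inferred
    M_vv, M_vh, M_hh = M[v, v], M[v, h], M[h, h]
    A = np.linalg.solve(M_vv, M_vh)  # M_vv^{-1} M_vh
    K_vh = K_known @ A
    K_hh = M_hh - M_vh.T @ A + A.T @ K_known @ A
    return np.block([[K_known, K_vh], [K_vh.T, K_hh]])


# Toy data: three random "dataset" kernels over six proteins (illustrative only).
rng = np.random.default_rng(0)
n, n_known = 6, 4
kernels = []
for _ in range(3):
    X = rng.standard_normal((n, 8))
    kernels.append(X @ X.T + 1e-6 * np.eye(n))

# Hand-fixed weights; a zero weight excludes that dataset entirely.
weights = [0.6, 0.4, 0.0]
M = combine_kernels(kernels, weights)

# Stand-in for the kernel of the known part of the network.
Y = rng.standard_normal((n_known, 8))
K_known = Y @ Y.T

K_full = complete_kernel(K_known, M, n_known)
```

In this sketch the entries of `K_full` outside the known block play the role of the inferred part of the network, and a dataset whose weight is zero contributes nothing to the result, which is the sense in which weighting selects datasets.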

