Bioinformatics Advance Access originally published online on May 6, 2004
Bioinformatics 2004 20(16):2626-2635; doi:10.1093/bioinformatics/bth294
Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.
A statistical framework for genomic data fusion
1 Department of Electrical Engineering and Computer Science, 2 Division of Computer Science, Department of Statistics, University of California, Berkeley 94720, USA, 3 Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven 3001, Belgium, 4 Department of Statistics, University of California, Davis 95618, USA and 5 Department of Genome Sciences, University of Washington, Seattle 98195, USA
Received on January 29, 2004; revised on April 7, 2004; accepted on April 23, 2004
Advance Access Publication May 6, 2004
Motivation: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data.
Results: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimizing kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained from all of these data to recognize particular classes of proteins (membrane proteins and ribosomal proteins) performs significantly better than the same algorithm trained on any single type of data.
Availability: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm
Contact: noble{at}gs.washington.edu
* To whom correspondence should be addressed at: Health Sciences Center, Box 357730, 1705 NE Pacific Street, Seattle, WA 98195, USA.
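The kernel combination described in the Results can be illustrated with a minimal sketch. It assumes two toy feature matrices standing in for expression and hydropathy data, and it uses weights fixed by hand; the framework itself learns the weights by semidefinite programming, which this sketch does not attempt. All function names, the random data and the weight values are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of inner products between rows of X."""
    return X @ X.T

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def normalize_kernel(K):
    """Scale K so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; non-negative weights keep it a valid kernel."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative to guarantee a valid kernel")
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data (assumption): 6 genes described by two unrelated feature views.
rng = np.random.default_rng(0)
n_genes = 6
expression = rng.normal(size=(n_genes, 10))  # hypothetical expression profiles
hydropathy = rng.normal(size=(n_genes, 4))   # hypothetical hydropathy features

K_expr = normalize_kernel(linear_kernel(expression))
K_hyd = normalize_kernel(rbf_kernel(hydropathy))

# Hand-picked weights (assumption); the paper's method optimizes these instead.
K_combined = combine_kernels([K_expr, K_hyd], weights=[0.7, 0.3])

# A valid kernel matrix is symmetric with no negative eigenvalues.
min_eig = np.linalg.eigvalsh(K_combined).min()
print(K_combined.shape, bool(min_eig >= -1e-10))
```

Because a non-negative weighted sum of positive semidefinite matrices is itself positive semidefinite, the combined matrix can be passed to any kernel-based learner in place of a single-source kernel.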