


Bioinformatics Advance Access published online on May 6, 2004

Bioinformatics, doi:10.1093/bioinformatics/bth294
Bioinformatics © Oxford University Press 2004; all rights reserved
All versions of this article: 20/16/2626 (most recent), bth294v1

Received January 29, 2004
Revised April 7, 2004
Accepted April 23, 2004

Article

A statistical framework for genomic data fusion

Gert R. G. Lanckriet 1, Tijl De Bie 2, Nello Cristianini 3, Michael I. Jordan 4, William Stafford Noble 5*

1 Department of Electrical Engineering and Computer Science, University of California, Berkeley
2 Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven, Belgium
3 Department of Statistics, University of California, Davis
4 Division of Computer Science, Department of Statistics, University of California, Berkeley
5 Department of Genome Sciences, University of Washington

* To whom correspondence should be addressed. E-mail: noble{at}gs.washington.edu.


   Abstract

Motivation: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data.

Results: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each data set is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimal kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide data sets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained on all of these data to recognize particular classes of proteins (membrane proteins and ribosomal proteins) performs significantly better than the same algorithm trained on any single type of data.

Availability: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm.
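
The kernel combination described in the Results can be illustrated with a minimal sketch. It assumes two hypothetical data views (random stand-ins for gene expression and hydropathy features), hypothetical helper names (`linear_kernel`, `rbf_kernel`, `normalize_kernel`, `combine_kernels`), and hand-picked weights. In the paper the weights are learned by semidefinite programming; that step is not reproduced here. The sketch only shows that a non-negative weighted sum of kernel matrices is again a valid (positive semidefinite) kernel matrix.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix of inner products between the rows of X."""
    return X @ X.T


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) Gram matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def normalize_kernel(K):
    """Rescale K so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights must be non-negative."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))


# Hypothetical data: 6 genes described by two unrelated feature sets.
rng = np.random.default_rng(0)
n_genes = 6
expression = rng.normal(size=(n_genes, 10))  # stand-in for expression profiles
hydropathy = rng.normal(size=(n_genes, 4))   # stand-in for hydropathy features

K_expr = normalize_kernel(rbf_kernel(expression))
K_hyd = normalize_kernel(linear_kernel(hydropathy))

# Hand-picked weights (an assumption); the paper optimizes them instead.
K_fused = combine_kernels([K_expr, K_hyd], weights=[0.7, 0.3])

# A valid kernel matrix is symmetric with no negative eigenvalues.
min_eig = np.linalg.eigvalsh(K_fused).min()
print("fused kernel shape:", K_fused.shape)
print("smallest eigenvalue is non-negative:", min_eig > -1e-9)
```

The fused matrix can then be passed to any kernel-based learner that accepts a precomputed Gram matrix. Normalizing each kernel first keeps views with different scales from dominating the sum, which is a design choice of this sketch rather than a detail taken from the abstract.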




This article has been cited by other articles:


N. Beagley, K. G. Stratton, and B.-J. M. Webb-Robertson
VIBE 2.0: Visual Integration for Bayesian Evaluation
Bioinformatics, January 15, 2010; 26(2): 280 - 282.


P. J. Bickel, J. B. Brown, H. Huang, and Q. Li
An overview of recent developments in genomics and associated statistical methods
Phil Trans R Soc A, November 13, 2009; 367(1906): 4313 - 4337.


C. Huttenhower, M. A. Hibbs, C. L. Myers, A. A. Caudy, D. C. Hess, and O. G. Troyanskaya
The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction
Bioinformatics, September 15, 2009; 25(18): 2404 - 2410.


K. Y. Yip and M. Gerstein
Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions
Bioinformatics, January 15, 2009; 25(2): 243 - 250.


L. Jacob and J.-P. Vert
Protein-ligand interaction prediction: an improved chemogenomics approach
Bioinformatics, October 1, 2008; 24(19): 2149 - 2156.


S. Ji, L. Sun, R. Jin, S. Kumar, and J. Ye
Automated annotation of Drosophila gene expression patterns using a controlled vocabulary
Bioinformatics, September 1, 2008; 24(17): 1881 - 1888.


F. Mordelet and J.-P. Vert
SIRENE: supervised inference of regulatory networks
Bioinformatics, August 15, 2008; 24(16): i76 - i82.


T. Damoulas and M. A. Girolami
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection
Bioinformatics, May 15, 2008; 24(10): 1264 - 1270.


I. V. Tetko, I. V. Rodchenkov, M. C. Walter, T. Rattei, and H.-W. Mewes
Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information
Bioinformatics, March 1, 2008; 24(5): 621 - 628.


H. Shin, A. M. Lisewski, and O. Lichtarge
Graph sharpening plus graph integration: a synergy that improves protein functional classification
Bioinformatics, December 1, 2007; 23(23): 3217 - 3224.


K. Bleakley, G. Biau, and J.-P. Vert
Supervised reconstruction of biological networks with local models
Bioinformatics, July 1, 2007; 23(13): i57 - i65.


T. De Bie, L.-C. Tranchevent, L. M. M. van Oeffelen, and Y. Moreau
Kernel-based data fusion for gene prioritization
Bioinformatics, July 1, 2007; 23(13): i125 - i132.


Y. Yamanishi, F. Bach, and J.-P. Vert
Glycan classification with tree kernels
Bioinformatics, May 15, 2007; 23(10): 1211 - 1216.


J. Qiu, M. Hue, A. Ben-Hur, J.-P. Vert, and W. S. Noble
A structural alignment kernel for protein structures
Bioinformatics, May 1, 2007; 23(9): 1090 - 1098.


Z. Barutcuoglu, R. E. Schapire, and O. G. Troyanskaya
Hierarchical multi-label prediction of gene function
Bioinformatics, April 1, 2006; 22(7): 830 - 836.


T. Kato, K. Tsuda, and K. Asai
Selective integration of multiple biological data for supervised network inference
Bioinformatics, May 15, 2005; 21(10): 2488 - 2495.


