Bioinformatics Advance Access originally published online on May 6, 2004
Bioinformatics 2004 20(16):2626-2635; doi:10.1093/bioinformatics/bth294
Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.

A statistical framework for genomic data fusion

Gert R. G. Lanckriet 1, Tijl De Bie 3, Nello Cristianini 4, Michael I. Jordan 2 and William Stafford Noble 5,*

1 Department of Electrical Engineering and Computer Science, 2 Division of Computer Science, Department of Statistics, University of California, Berkeley 94720, USA, 3 Department of Electrical Engineering, ESAT-SCD, Katholieke Universiteit Leuven 3001, Belgium, 4 Department of Statistics, University of California, Davis 95618, USA and 5 Department of Genome Sciences, University of Washington, Seattle 98195, USA

Received on January 29, 2004; revised on April 7, 2004; accepted on April 23, 2004
Advance Access Publication May 6, 2004

Motivation: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data.

Results: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimizing kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein–protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained from all of these data to recognize particular classes of proteins—membrane proteins and ribosomal proteins—performs significantly better than the same algorithm trained on any single type of data.
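As a rough illustration of the kernel representation described above, the following minimal sketch builds one Gram matrix per data type and combines them with a weighted sum. It relies on a standard fact about kernels: a sum of positive semidefinite matrices with non-negative weights is itself positive semidefinite. Everything specific here is an assumption for illustration only: the function names are hypothetical, the random arrays merely stand in for real yeast measurements, and the weights are fixed by hand rather than learned by semidefinite programming as in the paper.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix of inner products between the rows of X."""
    return X @ X.T


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) Gram matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clamp tiny negative distances caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def normalize_kernel(K):
    """Scale K so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices; non-negative weights keep the result PSD."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))


def is_psd(K, tol=1e-8):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    return bool(np.min(np.linalg.eigvalsh((K + K.T) / 2.0)) >= -tol)


# Toy data (assumption): random features standing in for two genomic views
# of the same six proteins.
rng = np.random.default_rng(0)
n_proteins = 6
expression = rng.normal(size=(n_proteins, 10))  # stand-in for expression profiles
hydropathy = rng.normal(size=(n_proteins, 4))   # stand-in for hydropathy features

K_expr = normalize_kernel(rbf_kernel(expression))
K_hydro = normalize_kernel(linear_kernel(hydropathy))

# Hand-picked weights (assumption); the paper instead optimizes them.
weights = [0.7, 0.3]
K_combined = combine_kernels([K_expr, K_hydro], weights)
print(K_combined.shape, is_psd(K_combined))
```

The combined matrix can be passed to any kernel-based classifier that accepts a precomputed Gram matrix. Normalizing each kernel first keeps the views on a comparable scale, so the weights reflect relative importance rather than differences in raw magnitude.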

Availability: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm

Contact: noble@gs.washington.edu

* To whom correspondence should be addressed at: Health Sciences Center, Box 357730, 1705 NE Pacific Street, Seattle, WA 98195, USA.




