Bioinformatics Advance Access originally published online on August 20, 2009
Bioinformatics 2009 25(22):2929-2936; doi:10.1093/bioinformatics/btp485
A boosting approach to structure learning of graphs with and without prior knowledge
1Medical Research Council, Harwell, UK, 2The Institute of Statistical Mathematics, Tokyo, Japan and 3Department of Statistics, University of Oxford, Oxford, UK
*To whom correspondence should be addressed.
Abstract
Motivation: Identifying the network structure through which genes and their products interact can help to elucidate normal cell physiology as well as the genetic architecture of pathological phenotypes. Recently, a number of gene network inference tools based on Gaussian graphical model representations have appeared. Following this line of work, we introduce a novel Boosting approach to learning the structure of a high-dimensional Gaussian graphical model, motivated by applications in genomics. Particular emphasis is placed on the inclusion of partial prior knowledge about the structure of the graph. With the increasing availability of pathway information and large-scale gene expression datasets, we believe that conditioning on prior knowledge will be an important aspect of raising the statistical power of structural learning algorithms to infer true conditional dependencies.
Results: Our Boosting approach, termed BoostiGraph, is conceptually and algorithmically simple. It complements recent work on the network inference problem based on Lasso-type approaches. BoostiGraph is computationally cheap and is applicable to very high-dimensional graphs. For example, on graphs of the order of 5000 nodes, it is able to map out paths for the conditional independence structure in a few minutes. Using computer simulations, we investigate the ability of our method, with and without prior information, to infer Gaussian graphical models from artificial as well as actual microarray datasets. The experimental results demonstrate that our method can recover the true network topology with relatively high accuracy.
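To make the general idea concrete, the following is a minimal sketch of boosting-based neighbourhood selection for a Gaussian graphical model. It is not the BoostiGraph algorithm itself, whose details are not given in this abstract. It assumes componentwise L2-boosting of each variable on all the others, an "OR" rule to symmetrize the selected neighbourhoods, and a simple treatment of prior knowledge in which known neighbours are fitted by least squares before boosting and are always kept as edges. The function names `boost_neighbourhood` and `boost_graph`, and the parameters `n_steps`, `nu`, `threshold` and `prior_edges`, are hypothetical.

```python
import numpy as np


def boost_neighbourhood(Z, j, n_steps=50, nu=0.1, known=()):
    """Componentwise L2-boosting of column j of Z on all other columns.

    Z is assumed to be standardized (zero mean, unit variance per column).
    `known` lists neighbours of j assumed from prior knowledge (hypothetical
    handling: they are fitted by ordinary least squares before boosting).
    """
    n, p = Z.shape
    resid = Z[:, j].copy()
    beta = np.zeros(p)
    known = sorted(set(k for k in known if k != j))
    if known:
        coef, *_ = np.linalg.lstsq(Z[:, known], resid, rcond=None)
        beta[known] = coef
        resid -= Z[:, known] @ coef
    others = np.array([k for k in range(p) if k != j])
    for _ in range(n_steps):
        # Least-squares coefficient of the residual on each single predictor;
        # columns have unit variance, so Z_k' Z_k = n.
        coef = Z[:, others].T @ resid / n
        best = int(np.argmax(np.abs(coef)))
        k = others[best]
        # Take a small step (shrinkage nu) along the best-fitting predictor.
        beta[k] += nu * coef[best]
        resid -= nu * coef[best] * Z[:, k]
    return beta


def boost_graph(X, n_steps=50, nu=0.1, threshold=0.15, prior_edges=()):
    """Estimate an undirected graph as a boolean adjacency matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Z.shape[1]
    known = {j: set() for j in range(p)}
    for a, b in prior_edges:
        known[a].add(b)
        known[b].add(a)
    B = np.vstack([
        boost_neighbourhood(Z, j, n_steps, nu, known[j]) for j in range(p)
    ])
    adj = np.abs(B) > threshold
    adj = adj | adj.T  # "OR" rule: keep an edge if either regression selects it
    for a, b in prior_edges:
        adj[a, b] = adj[b, a] = True  # assumed-known edges are always kept
    np.fill_diagonal(adj, False)
    return adj
```

Calling `boost_graph(X)` on an `n x p` data matrix returns a symmetric `p x p` boolean matrix. Passing, for example, `prior_edges=[(3, 4)]` forces that edge into the result and lets the boosting steps for nodes 3 and 4 explain only what the known neighbour does not. The number of boosting steps and the threshold act as tuning parameters here and would need to be chosen by a data-driven criterion in practice.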
Availability: This method and all other associated files are freely available from http://www.stats.ox.ac.uk/
anjum/.
Contact: s.anjum{at}har.mrc.ac.uk; cholmes{at}stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Olga Troyanskaya
Received on April 7, 2009; revised on July 24, 2009; accepted on August 9, 2009