Bioinformatics Advance Access originally published online on June 29, 2006
Bioinformatics 2006 22(16):2028-2036; doi:10.1093/bioinformatics/btl344
Pathway analysis using random forests classification and regression
1 Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA
2 W. M. Keck Biotechnology Resource Laboratory, Yale University School of Medicine, New Haven, CT 06520, USA
3 Boyer Center for Molecular Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
4 Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
5 Pfizer Groton Laboratories, Safety Sciences, Groton, CT 06340, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.
Results: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data.
Availability: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm
Contact: hongyu.zhao{at}yale.edu
Supplementary Information: Supplementary Data are available at http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm
Associate Editor: John Quackenbush
Received on April 8, 2006; revised on June 5, 2006; accepted on June 20, 2006
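The pathway ranking described under Results can be illustrated with a minimal sketch. It is not the authors' R code: it swaps in Python with scikit-learn, fits one random forest per pathway on that pathway's genes only, and assumes pathways are ranked by out-of-bag (OOB) error, which the abstract does not state. The function `rank_pathways`, the pathway names, the gene column indices and the synthetic data are all hypothetical, and the impurity-based `feature_importances_` is used only as a stand-in for whatever gene importance measure the paper applies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_pathways(expr, labels, pathways, n_trees=200, seed=0):
    """Fit one random forest per pathway and sort pathways by OOB error.

    expr     : (n_samples, n_genes) expression matrix
    labels   : (n_samples,) class labels
    pathways : dict mapping pathway name -> list of gene column indices
               (hypothetical stand-in for an external pathway database)
    """
    results = []
    for name, gene_idx in pathways.items():
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        # Only the genes belonging to this pathway are used as predictors.
        forest.fit(expr[:, gene_idx], labels)
        # oob_score_ is OOB accuracy, so the OOB error is its complement.
        oob_error = 1.0 - forest.oob_score_
        # Gene with the largest impurity-based importance in this pathway.
        top_gene = gene_idx[int(np.argmax(forest.feature_importances_))]
        results.append((name, oob_error, top_gene))
    # Assumption: a lower OOB error means a more informative pathway.
    return sorted(results, key=lambda r: r[1])


# Synthetic data: genes 0-4 are shifted in class 1, all others are noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 80, 30
labels = np.repeat([0, 1], n_samples // 2)
expr = rng.normal(size=(n_samples, n_genes))
expr[labels == 1, :5] += 2.0

pathways = {
    "pathway_A": list(range(0, 5)),    # contains the informative genes
    "pathway_B": list(range(10, 20)),  # noise only
    "pathway_C": list(range(20, 30)),  # noise only
}

ranking = rank_pathways(expr, labels, pathways)
for name, err, top_gene in ranking:
    print(f"{name}: OOB error = {err:.3f}, top gene index = {top_gene}")
```

For a continuous outcome, the same loop could plausibly use `RandomForestRegressor` with `oob_score=True`, in which case `oob_score_` is an R² value and pathways would be sorted in descending order instead.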


