Bioinformatics Advance Access published online on June 29, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl344

© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received April 8, 2006
Revised June 5, 2006
Accepted June 20, 2006

Article

Pathway analysis using random forests classification and regression

Herbert Pang 1, Aiping Lin 2, Matthew Holford 1, Bradley E. Enerson 3, Bin Lu 4, Michael P. Lawton 4, Eugenia Floyd 4, and Hongyu Zhao 5 *

1 Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, 06520 USA
2 W. M. Keck Biotechnology Resource Laboratory, Yale University School of Medicine, New Haven, CT, 06520 USA
3 Boyer Center for Molecular Medicine, Yale University School of Medicine, New Haven, CT, 06520 USA
4 Pfizer Groton Laboratories, Safety Sciences, Groton, CT, 06340 USA
5 Division of Biostatistics, Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, 06520 USA; W. M. Keck Biotechnology Resource Laboratory, Yale University School of Medicine, New Haven, CT, 06520 USA; Department of Genetics, Yale University School of Medicine, New Haven, CT, 06520 USA

* To whom correspondence should be addressed.
Hongyu Zhao, E-mail: hongyu.zhao{at}yale.edu


   Abstract

Motivation: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single-gene-based methods neglect interactions among genes, leaving room for novel approaches. For example, most classification and regression methods for microarray data operate on the whole set of genes and do not use pathway information. Pathway-based analysis in microarray studies may yield more informative and relevant knowledge for biological researchers.

Results: In this paper, we describe a pathway-based classification and regression method that uses Random Forests to analyze gene expression data. The proposed method allows researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases, and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods on several datasets and found that Random Forests classification error rates were either the lowest or the second lowest. By combining pathway information with novel statistical methods, this procedure represents a promising computational strategy for dissecting pathways and can provide biological insight into the study of microarray data.
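
As a rough illustration of what ranking pathways with a tree ensemble could look like, the sketch below scores each pathway by the out-of-bag (OOB) error of a small ensemble trained only on that pathway's genes. It is not the authors' R implementation. Several things here are assumptions made for illustration: the use of OOB error as the ranking criterion, the replacement of full Random Forests trees with bagged one-split stumps over random gene subsets, the helper names (`fit_stump`, `oob_error`, `rank_pathways`), and the synthetic expression data and pathway labels.

```python
import random

def fit_stump(X, y, features):
    """Return (feature, threshold, class_if_above) with the fewest training errors."""
    # Fallback: always predict the majority class (threshold of -inf).
    majority = int(sum(y) * 2 >= len(y))
    best = (sum(label != majority for label in y), features[0], float("-inf"), majority)
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (0, 1):
                errors = sum((above if row[f] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]

def predict_stump(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above

def oob_error(X, y, n_trees=100, seed=0):
    """OOB error of bagged stumps, a simplified stand-in for a Random Forest."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, round(p ** 0.5))            # genes considered per stump
    votes = [[0, 0] for _ in range(n)]     # per-sample OOB votes for class 0 / 1
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(boot)
        feats = rng.sample(range(p), m)
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        for i in range(n):
            if i not in in_bag:            # only out-of-bag samples vote
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [(v, label) for v, label in zip(votes, y) if sum(v) > 0]
    wrong = sum(int(v[1] > v[0]) != label for v, label in scored)
    return wrong / len(scored)

def rank_pathways(expr, y, pathways, **kwargs):
    """Sort pathways by OOB error (lowest first). expr maps gene -> per-sample values."""
    results = []
    for name, genes in pathways.items():
        X = [[expr[g][i] for g in genes] for i in range(len(y))]
        results.append((oob_error(X, y, **kwargs), name))
    return sorted(results)

# Synthetic example (hypothetical data): 40 samples, 8 genes, two made-up pathways.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
expr = {f"g{j}": [data_rng.gauss(0, 1) for _ in y] for j in range(8)}
for g in ("g0", "g1", "g2", "g3"):         # shift these genes in class 1
    expr[g] = [v + 3.0 * label for v, label in zip(expr[g], y)]
pathways = {"informative": ["g0", "g1", "g2", "g3"],
            "noise": ["g4", "g5", "g6", "g7"]}

ranking = rank_pathways(expr, y, pathways, n_trees=100, seed=0)
for err, name in ranking:
    print(f"{name}: OOB error = {err:.3f}")
```

In this toy setting the pathway whose genes differ between the two classes should come out ahead of the pathway made of pure noise. A real analysis would use a full Random Forests implementation and pathway definitions taken from an external database.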

Availability: Source code written in R is available from this URL: http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.


Associate Editor: John Quackenbush


This article has been cited by other articles:

K. K. Nicodemus and J. D. Malley. Predictor correlation impacts machine learning algorithms: implications for genomic studies. Bioinformatics, August 1, 2009; 25(15): 1884-1890.

J. S. Chang, R.-F. Yeh, J. K. Wiencke, J. L. Wiemels, I. Smirnov, A. R. Pico, T. Tihan, J. Patoka, R. Miike, J. D. Sison, et al. Pathway Analysis of Single-Nucleotide Polymorphisms Potentially Associated with Glioblastoma Multiforme Susceptibility Using Random Forests. Cancer Epidemiol. Biomarkers Prev., June 1, 2008; 17(6): 1368-1373.

F. Tai and W. Pan. Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data. Bioinformatics, December 1, 2007; 23(23): 3170-3177.

F. Tai and W. Pan. Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms. Bioinformatics, July 15, 2007; 23(14): 1775-1782.

N. Goffard and G. Weiller. PathExpress: a web-based tool to identify relevant pathways in gene expression data. Nucleic Acids Res., July 13, 2007; 35(suppl_2): W176-W181.


