Bioinformatics Advance Access published online on December 7, 2004
Bioinformatics, doi:10.1093/bioinformatics/bti192
1 Department of Computer Science, Dartmouth College, Hanover, NH 03755
* To whom correspondence should be addressed.
Motivation: Recent studies have shown that microarray gene expression data are useful for phenotype classification of many diseases. In this classification problem, the number of features (genes) greatly exceeds the number of instances (tissue samples). It has been shown that selecting a small set of informative genes can lead to improved classification accuracy. Many approaches have been proposed for this gene selection problem. Most previous gene ranking methods select the 50-200 top-ranked genes, and these genes are often highly correlated. Our goal is to select a small set of non-redundant marker genes that are most relevant for the classification task.
Results: To achieve this goal, we developed a novel hybrid approach that combines gene ranking and clustering analysis. In this approach, we first apply feature filtering algorithms to select a set of top-ranked genes, and then apply hierarchical clustering to these genes to generate a dendrogram. Finally, the dendrogram is analyzed by a sweep-line algorithm, and marker genes are selected by collapsing dense clusters. An empirical study using three public data sets shows that our approach is capable of selecting relatively few marker genes while offering the same or better leave-one-out cross-validation accuracy compared to approaches that use top-ranked genes directly for classification.
Availability: The HykGene software is freely available at http://www.cs.dartmouth.edu/~wyh/software.htm.
Supplementary Information: Supplementary material is available from http://www.cs.dartmouth.edu/~wyh/hykgene/supplement/index.htm.
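The rank-then-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the HykGene implementation: the function names (`welch_t`, `corr_distance`, `select_markers`) and the toy data are hypothetical, a Welch t-statistic stands in for the unspecified feature filtering algorithms, a fixed target number of clusters stands in for the sweep-line analysis of the dendrogram, and keeping the best-ranked gene of each cluster as its representative is an assumption.

```python
import math
from statistics import mean, variance


def welch_t(values, labels):
    """Absolute Welch t-statistic of one gene between classes 0 and 1."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def corr_distance(x, y):
    """1 - |Pearson r|: close to 0 for strongly (anti-)correlated genes."""
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0 or sy == 0:
        return 1.0
    return 1.0 - abs(cov / (sx * sy))


def select_markers(expr, labels, n_top, n_clusters):
    """expr maps gene name -> expression values, one value per sample."""
    # Step 1 (filter): keep the n_top genes with the largest statistic.
    score = {g: welch_t(v, labels) for g, v in expr.items()}
    top = sorted(expr, key=lambda g: score[g], reverse=True)[:n_top]

    # Step 2 (cluster): average-linkage agglomerative clustering.
    # Assumption: stop at a fixed cluster count instead of a sweep line.
    clusters = [[g] for g in top]

    def linkage(c1, c2):
        return mean(corr_distance(expr[a], expr[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    # Step 3 (collapse): keep the best-ranked gene of each cluster.
    return sorted(max(c, key=lambda g: score[g]) for c in clusters)


# Toy data (made up): six samples, three per class.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # informative
    "g2": [1.1, 1.0, 1.2, 2.8, 3.3, 2.7],  # informative, redundant with g1
    "g3": [1.0, 3.0, 2.0, 3.5, 5.5, 4.5],  # informative, different pattern
    "g4": [2.0, 3.0, 2.5, 2.4, 2.1, 3.0],  # uninformative
}
markers = select_markers(expr, labels, n_top=3, n_clusters=2)
print(markers)
```

In this toy example the two highly correlated genes fall into one cluster and only one of them is kept, which is the redundancy-removal effect the abstract describes.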
Received August 15, 2004
Revised November 24, 2004
Accepted November 25, 2004
Article
HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data
2 Department of Medicine, Dartmouth-Hitchcock Medical Center, Dartmouth Medical School, Hanover, NH 03755; Department of Radiology, Dartmouth-Hitchcock Medical Center, Dartmouth Medical School, Hanover, NH 03755
Yuhang Wang, E-mail: wyh{at}cs.dartmouth.edu
This article has been cited by other articles:
T. W. Chittenden, J. A. Sherman, F. Xiong, A. E. Hall, A. A. Lanahan, J. M. Taylor, H. Duan, J. D. Pearlman, J. H. Moore, S. M. Schwartz, et al. Transcriptional Profiling in Coronary Artery Disease: Indications for Novel Markers of Coronary Collateralization. Circulation, October 24, 2006; 114(17): 1811-1820.
