Bioinformatics Advance Access published online on August 22, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl442
Eigengene based linear discriminant model for tumor classification using gene expression microarray data
Ronglai Shen 1, Debashis Ghosh 1, Arul Chinnaiyan 2, and Zhaoling Meng 3 *
1 Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, USA
2 Department of Pathology and Urology, University of Michigan, Ann Arbor, MI 48109-0602, USA
3 Biostatistics and Programming, sanofi-aventis, PO Box 6800, Bridgewater, NJ 08807-0800, USA
* To whom correspondence should be addressed. Zhaoling Meng, E-mail: zhaoling.meng{at}sanofi-aventis.com
Received May 2, 2006; revised July 20, 2006; accepted August 14, 2006
Associate Editor: Satoru Miyano

Abstract
Motivation: The nearest shrunken centroids classifier (Tibshirani et al., 2002) has become a popular algorithm for tumor classification with gene expression microarray data. Feature selection is embedded in the method: top-ranking genes are selected on the basis of a univariate distance statistic computed for each gene individually. Because these univariate statistics summarize each gene's expression profile outside the context of the gene co-regulation network, the selection procedure can include redundant information.
Results: We propose an Eigengene based Linear Discriminant Analysis (ELDA) that addresses gene selection in a multivariate framework. The algorithm uses a modified rotated Spectral Decomposition (SpD) technique to select hub genes associated with the most important eigenvectors. Using three benchmark cancer microarray data sets, we show that ELDA selects the most characteristic genes, yielding substantially smaller classifiers than analogues based on univariate feature selection. The resulting de-correlated expression profiles make the gene-wise independence assumption more realistic for the shrunken centroids classifier and other diagonal linear discriminant-type models. Our algorithm further incorporates a misclassification cost matrix, allowing one type of error to be penalized more heavily than another. In the breast cancer data, we show that false-negative prognoses can be controlled via a cost-adjusted discriminant function.
Availability: R code for the ELDA algorithm is available from the authors upon request.
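The abstract does not spell out ELDA's modified rotated spectral decomposition, so the Python sketch below only illustrates the general pipeline it describes, under assumptions that are ours rather than the authors'. The hub rule (keep the not-yet-selected gene with the largest absolute loading on each leading eigenvector of the gene covariance matrix) is hypothetical. The eigenvectors come from plain power iteration with deflation, not a rotated SpD, and the power iteration assumes its fixed starting vector is not orthogonal to a leading eigenvector. The classifier is an unshrunken diagonal LDA, not the shrunken centroids classifier. The cost rule picks the class with the smallest expected misclassification cost. All function names and the toy data are illustrative.

```python
import math


def covariance(X):
    """Sample covariance between the columns (genes) of X (rows = samples)."""
    n, p = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / (n - 1)
             for b in range(p)] for a in range(p)]


def leading_eigenvectors(A, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix by power iteration with
    deflation (a simple stand-in for a full spectral decomposition)."""
    p = len(A)
    A = [row[:] for row in A]  # work on a copy; deflation modifies it
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary starting vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(p)) for a in range(p)]
            s = math.sqrt(sum(x * x for x in w))
            if s == 0.0:
                break
            v = [x / s for x in w]
        lam = sum(v[a] * A[a][b] * v[b] for a in range(p) for b in range(p))
        vectors.append(v)
        for a in range(p):  # remove this component before finding the next
            for b in range(p):
                A[a][b] -= lam * v[a] * v[b]
    return vectors


def select_hub_genes(X, k):
    """Hypothetical hub rule: for each leading eigenvector, keep the
    not-yet-selected gene with the largest absolute loading."""
    chosen = []
    for v in leading_eigenvectors(covariance(X), k):
        ranked = sorted(range(len(v)), key=lambda j: -abs(v[j]))
        chosen.append(next(j for j in ranked if j not in chosen))
    return chosen


def fit_dlda(X, y, genes):
    """Diagonal LDA on the selected genes: class centroids, class priors and
    pooled within-class variances (no shrinkage)."""
    classes = sorted(set(y))
    n = len(y)
    centroid, prior = {}, {}
    for c in classes:
        rows = [X[i] for i in range(n) if y[i] == c]
        prior[c] = len(rows) / n
        centroid[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    var = [sum((X[i][g] - centroid[y[i]][t]) ** 2 for i in range(n)) / (n - len(classes))
           for t, g in enumerate(genes)]
    return {"genes": genes, "classes": classes, "centroid": centroid,
            "prior": prior, "var": var}


def predict_with_cost(model, x, cost):
    """Return the class k minimising sum_j cost[j][k] * P(j | x), where
    cost[j][k] is the cost of predicting k when the true class is j."""
    xs = [x[g] for g in model["genes"]]
    score = {c: math.log(model["prior"][c])
             - 0.5 * sum((xi - m) ** 2 / v
                         for xi, m, v in zip(xs, model["centroid"][c], model["var"]))
             for c in model["classes"]}
    top = max(score.values())
    z = sum(math.exp(s - top) for s in score.values())
    post = {c: math.exp(score[c] - top) / z for c in score}
    return min(model["classes"],
               key=lambda k: sum(cost[j][k] * post[j] for j in model["classes"]))


# Toy data (made up): genes 0 and 1 are identical (perfectly co-regulated),
# gene 2 varies independently of them, gene 3 is low-variance noise.
X = [[-3, -3, 1, 0.1], [-1, -1, -1, 0.1], [-3, -3, -1, -0.1], [-1, -1, 1, -0.1],
     [1, 1, 1, -0.1], [3, 3, -1, -0.1], [1, 1, -1, 0.1], [3, 3, 1, 0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
genes = select_hub_genes(X, 2)
model = fit_dlda(X, y, genes)
x_new = [-0.5, -0.5, 0.0, 0.0]
equal_costs = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
costly_false_negative = {0: {0: 0, 1: 1}, 1: {0: 10, 1: 0}}
print(genes, predict_with_cost(model, x_new, equal_costs),
      predict_with_cost(model, x_new, costly_false_negative))
```

On this toy data the sketch keeps genes 0 and 2 and skips gene 1, the redundant copy of gene 0. The new sample lies closer to the class 0 centroid, so it is assigned to class 0 under equal costs. Once missing a true class 1 case costs ten times as much, the expected-cost rule switches the prediction to class 1, which mirrors the kind of false-negative control the abstract describes.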

