Bioinformatics Advance Access published online on December 13, 2005
Bioinformatics, doi:10.1093/bioinformatics/bti827
1 Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN, 55455, USA
* To whom correspondence should be addressed.
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Because the number of genes p is large and the number of samples n is small (p >> n), microarray data pose major challenges for statistical analysis. An obvious problem in this "large p, small n" setting (West, 2003) is over-fitting: by chance alone, we are likely to find some non-differentially expressed genes that classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences (Donoho and Johnstone, 1994). Shrinkage has been successfully applied in microarray data analysis. The SAM statistic proposed by Tusher et al. (2001) and the "nearest shrunken centroid" proposed by Tibshirani et al. (2002) are ad hoc shrinkage methods. Both methods are simple and intuitive, and have proven useful in empirical studies. Wu (2005) proposed penalized t/F-statistics with shrinkage by formally using L1 penalized linear regression models for two-class microarray data, and showed that they perform well. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed in Wu (2005) to multi-class microarray data. We formally derive the ad hoc shrunken centroid used in Tibshirani et al. (2003) from L1 penalized regression models. We also show that penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection. For supplementary information, including the computer programs, detailed analysis results, and R functions for the proposed methods, please see http://www.biostat.umn.edu/~baolin/research/L1C-mc.html.
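The link between L1 penalized regression and the shrunken centroid rests on a standard fact: the minimizer of 0.5*(d - t)^2 + delta*|t| over t is the soft-thresholded value sign(d)*max(|d| - delta, 0). The sketch below illustrates this with the published nearest-shrunken-centroid formulation, assuming a pooled within-class standard deviation s_i, a fudge factor s_0 set to the median of the s_i, and m_k = sqrt(1/n_k - 1/n) (the standard-error factor for the difference between a class mean and the overall mean). It is a minimal Python/NumPy illustration, not the author's R code from the supplementary site, and the function names soft_threshold, shrunken_centroids and predict are hypothetical.

```python
import numpy as np

def soft_threshold(d, delta):
    """Minimizer of 0.5*(d - t)**2 + delta*|t| over t (the L1 shrinkage step)."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def shrunken_centroids(X, y, delta, s0=None):
    """Nearest-shrunken-centroid fit (illustrative sketch).

    X: (n_samples, n_genes) expression matrix; y: class labels.
    Returns the sorted class labels, the shrunken centroids (K, p),
    and the per-gene scale s_i + s_0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted, so searchsorted below maps labels to rows
    n, p = X.shape
    K = len(classes)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class standard deviation s_i for each gene
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    if s0 is None:
        s0 = np.median(s)  # assumption: s_0 = median of the s_i
    scale = s + s0
    n_k = np.array([(y == k).sum() for k in classes])
    m_k = np.sqrt(1.0 / n_k - 1.0 / n)
    # Standardized centroid differences d_ik, then L1 shrinkage toward zero
    d = (centroids - overall) / (m_k[:, None] * scale)
    d_shrunk = soft_threshold(d, delta)
    # Map the shrunken differences back to the expression scale
    shrunk = overall + m_k[:, None] * scale * d_shrunk
    return classes, shrunk, scale

def predict(x_new, classes, shrunk, scale, priors):
    """Assign x_new to the class with the smallest standardized distance
    to its shrunken centroid, adjusted by -2*log(prior)."""
    scores = (((x_new - shrunk) / scale) ** 2).sum(axis=1) - 2.0 * np.log(priors)
    return classes[np.argmin(scores)]
```

Increasing delta pulls more genes' class centroids all the way back to the overall mean. A gene whose standardized differences are all zeroed contributes the same distance term to every class, so it drops out of the classification rule; this is how shrinkage performs gene selection.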
Received October 11, 2005
Revised December 6, 2005
Accepted December 7, 2005
Article
Differential gene expression detection and sample classification using penalized linear regression models
Baolin Wu 1 *
Baolin Wu, E-mail: baolin{at}biostat.umn.edu
This article has been cited by other articles:
Y. Lai. Genome-wide co-expression based prediction of differential expressions. Bioinformatics, March 1, 2008; 24(5): 666-673.
T. Abeel, Y. Saeys, E. Bonnet, P. Rouze, and Y. Van de Peer. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res., February 1, 2008; 18(2): 310-323.
F. Tai and W. Pan. Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data. Bioinformatics, December 1, 2007; 23(23): 3170-3177.
S. Wang and J. Zhu. Improved centroids estimation for the nearest shrunken centroid classifier. Bioinformatics, April 15, 2007; 23(8): 972-979.

