Bioinformatics Advance Access published online on April 26, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm125
Group SCAD Regression Analysis for Microarray Time Course Gene Expression Data
a Department of Biostatistics and Epidemiology, b Department of Bioengineering, and c Genomics and Computational Biology Graduate Group, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
To whom correspondence should be addressed. Hongzhe Li, E-mail: hongzhe{at}mail.med.upenn.edu
| Abstract |
|---|
Motivation: Many important biological systems and processes are dynamic, so capturing the dynamic behavior of gene expression requires studying expression patterns over time on a genomic scale. Microarray technologies make it possible to measure the expression levels of essentially all genes during a given biological process. To determine which transcription factors are involved in gene regulation during such a process, we propose a functional response model with varying coefficients to model the transcriptional effects on gene expression levels, together with a group smoothly clipped absolute deviation (SCAD) regression procedure for selecting the transcription factors with time-varying coefficients that are involved in gene regulation.
Results: In simulation studies, the procedure was effective both in selecting the relevant variables with time-varying coefficients and in estimating those coefficients. Applied to the yeast cell cycle microarray time course gene expression data set, it identified 19 of the 21 known transcription factors related to the cell cycle process. It also identified another 52 transcription factors that have periodic transcriptional effects on gene expression during the cell cycle process. Compared with simple linear regression analysis at each time point, our procedure identified more of the known cell cycle related transcription factors.
Conclusions: The proposed group SCAD regression procedure is very effective for identifying variables with time-varying coefficients and, in particular, for identifying the transcription factors that are related to gene expression over time. By identifying the transcription factors related to variation in gene expression over time, the procedure can potentially provide more insight into gene regulatory networks.
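For readers unfamiliar with the penalty named above, the following is a minimal sketch of a group SCAD penalty, not the authors' implementation. It assumes the standard SCAD form of Fan and Li (2001) with the conventional default a = 3.7, and it assumes that "group" means the penalty is applied to the Euclidean norm of each variable's coefficient vector (for example, the basis coefficients that represent one time-varying coefficient). The function names `scad_penalty` and `group_scad_penalty` are illustrative only.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at |theta| (elementwise).

    Linear near zero (like the lasso), quadratic in between,
    and constant beyond a * lam, so large effects are not shrunk further.
    """
    theta = np.abs(np.asarray(theta, dtype=float))
    linear = lam * theta
    quadratic = (2 * a * lam * theta - theta**2 - lam**2) / (2 * (a - 1))
    constant = (a + 1) * lam**2 / 2
    return np.where(theta <= lam, linear,
                    np.where(theta <= a * lam, quadratic, constant))


def group_scad_penalty(groups, lam, a=3.7):
    """Sum of SCAD penalties applied to the L2 norm of each coefficient group.

    `groups` is a list of 1-D arrays, one per candidate variable
    (an assumption made for this sketch).
    """
    norms = np.array([np.linalg.norm(g) for g in groups])
    return float(np.sum(scad_penalty(norms, lam, a)))
```

Because the penalty acts on the norm of a whole group, a variable is either dropped with all of its coefficients set to zero or kept as a unit, which is what makes this kind of penalty suitable for selecting variables with time-varying coefficients.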
Supplementary Information: http://www.cceb.med.upenn.edu/~hli/gSCAD-Appendix.pdf.
Associate Editor: Prof. John Quackenbush
Received on January 4, 2007; revised on March 1, 2007; accepted on March 23, 2007