Bioinformatics Advance Access originally published online on January 29, 2004
Bioinformatics 20(6) © Oxford University Press 2004; all rights reserved.
ESPD: a pattern detection model underlying gene expression profiles
¹Department of Computer Science and Engineering and ²Department of Pharmaceutical Sciences, State University of New York at Buffalo, Buffalo, NY 14260, USA
Received on January 3, 2003; revised on August 9, 2003; accepted on October 16, 2003
Advance Access Publication January 29, 2004
Motivation: DNA arrays permit rapid, large-scale screening for patterns of gene expression, simultaneously yielding the expression levels of thousands of genes for each sample. Because the number of samples is usually limited, such datasets are very sparse in the high-dimensional gene space. Furthermore, most of the genes measured may not be of interest, and uncertainty about which genes are relevant makes it difficult to construct an informative gene space. Unsupervised discovery of empirical sample patterns and identification of informative genes in such sparse, high-dimensional datasets are interesting but challenging problems.
Results: A new model called empirical sample pattern detection (ESPD) is proposed to delineate pattern quality with informative genes. By integrating statistical metrics, data mining and machine learning techniques, this model dynamically measures and manipulates the relationship between samples and genes while iteratively detecting both the informative gene space and the empirical sample pattern. The performance of the proposed method is illustrated on various array datasets.
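The abstract does not specify ESPD's metrics, so the following is only a generic illustration of the alternating idea it describes: partition the samples using the current gene subset, re-score the genes against that partition, and repeat until the partition stops changing. It is a minimal sketch under stated assumptions, not the authors' MATLAB implementation. The clustering step (plain k-means with farthest-first seeding), the gene score (a between-group to within-group sum-of-squares ratio), and the names `alternate_select` and `n_keep` are all hypothetical stand-ins.

```python
from statistics import mean


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Deterministic seeding: start at row 0, then repeatedly add the row
    farthest from all centers chosen so far."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(rows, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    centers = farthest_first(rows, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def gene_scores(data, labels, k):
    """Hypothetical gene score: between-group / within-group sum of squares."""
    scores = []
    for g in range(len(data[0])):
        col = [row[g] for row in data]
        grand = mean(col)
        between = within = 0.0
        for c in range(k):
            vals = [v for v, lab in zip(col, labels) if lab == c]
            if not vals:
                continue
            m = mean(vals)
            between += len(vals) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append(between / (within + 1e-9))
    return scores


def alternate_select(data, k, n_keep, max_iter=10):
    """Alternate between clustering samples (rows) on the current gene subset
    and keeping the n_keep genes (columns) that best separate the clusters."""
    genes = list(range(len(data[0])))
    labels = None
    for _ in range(max_iter):
        sub = [[row[g] for g in genes] for row in data]
        new_labels = kmeans(sub, k)
        scores = gene_scores(data, new_labels, k)
        ranked = sorted(range(len(scores)), key=lambda g: -scores[g])
        genes = sorted(ranked[:n_keep])
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels, genes
```

On a toy matrix where genes 0 and 1 differ sharply between samples 0 to 2 and samples 3 to 5 while genes 2 and 3 are noise, this sketch should recover the two sample groups and keep genes 0 and 1. The fixed, deterministic seeding is chosen only so the toy run is reproducible; real array data would need a more careful clustering and scoring scheme.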
Availability: Software code is available by request from the first author. All programs were written in MATLAB.
Contact: chuntang{at}cse.buffalo.edu