Bioinformatics Advance Access published online on December 16, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn644
A Genetic Programming Based Approach to the Classification of Multiclass Microarray Datasets


1Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui, 230031, China.
2School of Software, Xiamen University, Xiamen, Fujian, 361005, China.
3Department of Automation, University of Science and Technology of China, Hefei, Anhui, 230026, China.
4School of Life Science, University of Science and Technology of China, Hefei, Anhui, 230026, China.
*To whom correspondence should be addressed. Dr. Kun-Hong Liu, E-mail: lkhqz{at}163.com, khliu1977{at}gmail.com
Abstract
Motivation: Feature selection approaches have been widely applied to deal with the small sample size problem in the analysis of microarray datasets. For the multiclass problem, previously proposed methods are based on the idea of selecting a single gene subset to distinguish all classes. However, it would be more effective to solve a multiclass problem by splitting it into a set of two-class problems and solving each of them with its own classification system.
Results: We propose a genetic programming (GP) based approach to analyze multiclass microarray datasets. Unlike in traditional GP, each individual in the proposed approach consists of a set of small-scale ensembles, called sub-ensembles (SEs). Each SE consists of a set of trees. In application, a multiclass problem is first divided into a set of two-class problems, each of which is tackled by one SE. The SEs tackling the respective two-class problems are then combined to construct a GP individual, so each individual can deal with a multiclass problem directly. Effective methods are proposed to solve the problems arising in the fusion of SEs, and a greedy algorithm is designed to keep high diversity in SEs. The approach is tested on five datasets. The results show that the proposed method effectively performs both the feature selection and the classification tasks.
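The structure described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the multiclass problem is split or how the SE outputs are fused, so the sketch assumes a one-vs-rest split and a "highest support wins" fusion rule. The names `SubEnsemble`, `Individual`, `support` and `one_vs_rest_labels` are hypothetical, and the hand-written lambda functions only stand in for evolved GP trees.

```python
from typing import Callable, List, Sequence

# A "tree" is any function mapping an expression vector to a real score.
# Assumption: a positive score is a vote for the SE's target class.
Tree = Callable[[Sequence[float]], float]


class SubEnsemble:
    """A small ensemble of trees for one two-class problem (target vs. rest)."""

    def __init__(self, target: str, trees: List[Tree]):
        self.target = target
        self.trees = trees

    def support(self, x: Sequence[float]) -> float:
        """Fraction of trees voting that x belongs to the target class."""
        votes = sum(1 for tree in self.trees if tree(x) > 0)
        return votes / len(self.trees)


class Individual:
    """A GP individual built from one sub-ensemble per two-class problem."""

    def __init__(self, sub_ensembles: List[SubEnsemble]):
        self.sub_ensembles = sub_ensembles

    def predict(self, x: Sequence[float]) -> str:
        # Hypothetical fusion rule: the class whose SE has the highest support
        # wins; ties go to the first SE in the list.
        best = max(self.sub_ensembles, key=lambda se: se.support(x))
        return best.target


def one_vs_rest_labels(labels: Sequence[str], target: str) -> List[int]:
    """Relabel a multiclass label list as a two-class problem (assumed split)."""
    return [1 if y == target else 0 for y in labels]


# Toy individual over three "genes"; each lambda stands in for an evolved tree.
individual = Individual([
    SubEnsemble("A", [lambda x: x[0] - x[1], lambda x: x[0] - 1.0, lambda x: x[0] - x[2]]),
    SubEnsemble("B", [lambda x: x[1] - x[0], lambda x: x[1] - 1.0, lambda x: x[1] - x[2]]),
    SubEnsemble("C", [lambda x: x[2] - x[0], lambda x: x[2] - 1.0, lambda x: x[2] - x[1]]),
])

print(individual.predict([2.0, 0.5, 0.1]))  # prints A
```

Because each SE only has to separate one class from the others, its trees can rely on a different small set of genes than the other SEs, which is the motivation given for the two-class decomposition.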
These authors contributed equally to this work.
Associate Editor: Prof. David Rocke
Received on March 23, 2008; revised on November 14, 2008; accepted on December 11, 2008