Bioinformatics Advance Access originally published online on December 16, 2008
Bioinformatics 2009 25(3):331-337; doi:10.1093/bioinformatics/btn644
A genetic programming-based approach to the classification of multiclass microarray datasets


1School of Software, Xiamen University, Xiamen, Fujian, 361005, China, 2Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui, 230031, China, 3Department of Automation and 4School of Life Science, University of Science and Technology of China, Hefei, Anhui 230026, China
*To whom correspondence should be addressed.
Abstract
Motivation: Feature selection approaches have been widely applied to deal with the small sample size problem in the analysis of microarray datasets. For multiclass problems, previously proposed methods are based on the idea of selecting a gene subset to distinguish all classes. However, it is more effective to solve a multiclass problem by splitting it into a set of two-class problems and solving each of them with its own classification system.
Results: We propose a genetic programming (GP)-based approach to analyze multiclass microarray datasets. Unlike in traditional GP, each individual proposed in this article consists of a set of small-scale ensembles, called sub-ensembles (SEs). Each SE consists of a set of trees. In application, a multiclass problem is divided into a set of two-class problems, each of which is first tackled by an SE. The SEs tackling the respective two-class problems are then combined to construct a GP individual, so each individual can deal with a multiclass problem directly. Effective methods are proposed to solve the problems arising in the fusion of SEs, and a greedy algorithm is designed to maintain high diversity in SEs. The approach is tested on five datasets. The results show that the proposed method effectively performs the feature selection and classification tasks.
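The structure of an individual described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the multiclass problem is split or how SE outputs are fused, so the one-vs-rest decomposition, the vote-fraction fusion rule, and the names `Tree`, `SubEnsemble`, `Individual`, and `support` are all assumptions made for illustration. The trees are hand-written functions standing in for evolved GP trees, and the expression values are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an evolved GP tree: maps an expression
# vector to a real number; a positive output is a vote for the class.
Tree = Callable[[Sequence[float]], float]


@dataclass
class SubEnsemble:
    """A small ensemble of trees for one two-class problem."""
    trees: List[Tree]

    def support(self, x: Sequence[float]) -> float:
        # Fraction of trees whose output is positive.
        votes = sum(1 for tree in self.trees if tree(x) > 0)
        return votes / len(self.trees)


@dataclass
class Individual:
    """One GP individual: one sub-ensemble per class (one-vs-rest is assumed)."""
    sub_ensembles: Dict[str, SubEnsemble]

    def predict(self, x: Sequence[float]) -> str:
        # Assumed fusion rule: choose the class whose SE gives the highest
        # support; ties go to the class inserted first into the dict.
        return max(self.sub_ensembles,
                   key=lambda label: self.sub_ensembles[label].support(x))


# Toy individual over a 3-gene expression vector (illustrative values only).
individual = Individual({
    "A": SubEnsemble([lambda x: x[0] - 1.0, lambda x: x[0] - x[1], lambda x: x[0] - x[2]]),
    "B": SubEnsemble([lambda x: x[1] - 1.0, lambda x: x[1] - x[0], lambda x: x[1] - x[2]]),
    "C": SubEnsemble([lambda x: x[2] - 1.0, lambda x: x[2] - x[0], lambda x: x[2] - x[1]]),
})

print(individual.predict([2.0, 0.5, 0.1]))
```

Because each SE only has to separate one class from the rest, its trees can rely on a different small set of genes than the other SEs, which is how such a structure could perform feature selection per two-class problem. The greedy diversity procedure and the actual fusion methods mentioned in the abstract are not reproduced here.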
Contact: lkhqz{at}163.com; khliu1977{at}gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
The authors wish it to be known that, in their opinion, they should be regarded as joint First Authors.
Received on March 23, 2008; revised on November 14, 2008; accepted on December 11, 2008