Bioinformatics Advance Access published online on November 28, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm486
Monte Carlo feature selection for supervised classification
Dramiński a
a Institute of Computer Science, Polish Acad. Sci., Ordona 21, PL-01-237 Warsaw, Poland; b Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University; c The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University for Agricultural Sciences, Box 758, SE-751 24 Uppsala, Sweden; d Interdisciplinary Centre for Mathematical and Computer Modelling, Warsaw University, Poland
2 To whom correspondence should be addressed: Prof. Jan Komorowski, E-mail: Jan.Komorowski{at}lcb.uu.se
Abstract
Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features which contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this paper, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on the repeated construction of a tree classifier for many training sets randomly chosen from the original sample set, where each sample in a training set is described by only a fraction of all of the observed features.
Results: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a list of genes significantly different from the lists produced by other methods. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma, rather than being genes common to several forms of cancer, as is the case for the other methods.
Availability: Prototype available upon request.
Contact: Jan.Komorowski{at}lcb.uu.se
Associate Editor: Dr. Joaquin Dopazo
1 These authors contributed equally.
Received on December 13, 2006; revised on August 28, 2007; accepted on September 25, 2007
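The resampling scheme described under Motivation can be illustrated with a minimal Python sketch. It is not the authors' prototype. Several choices here are assumptions made only for illustration: a depth-one tree (a single Gini split) stands in for the full tree classifier, a feature is scored by the accumulated Gini gain of the splits it wins, and the names `mc_feature_ranking`, `best_split`, `feature_fraction` and `sample_fraction` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Best single threshold split over the given features.

    Returns (feature, threshold, gini_gain), or None if no feature
    takes more than one distinct value. This depth-one tree is an
    assumed stand-in for the tree classifier named in the abstract.
    """
    parent, n, best = gini(y), len(y), None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def mc_feature_ranking(X, y, n_rounds=200, feature_fraction=0.3,
                       sample_fraction=0.66, seed=0):
    """Rank features by how much they contribute across many small trees.

    Each round draws a random subset of features and a random training
    set, fits one tree on that view of the data, and credits the feature
    used by the tree with its Gini gain (an assumed scoring rule).
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    m = max(1, int(feature_fraction * n_features))
    k = max(2, int(sample_fraction * n_samples))
    score = [0.0] * n_features
    for _ in range(n_rounds):
        feats = rng.sample(range(n_features), m)
        rows = rng.sample(range(n_samples), k)
        split = best_split([X[i] for i in rows], [y[i] for i in rows], feats)
        if split is not None:
            feature, _, gain = split
            score[feature] += gain
    ranking = sorted(range(n_features), key=lambda f: score[f], reverse=True)
    return ranking, score
```

Calling `mc_feature_ranking(X, y)` on a list of feature rows `X` and class labels `y` returns the feature indices ordered from most to least credited, together with the raw scores. Because every feature competes only against the few others drawn in the same round, a feature that is informative but slightly weaker than another still gets rounds in which it can be selected.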