Bioinformatics Advance Access originally published online on January 20, 2006
Bioinformatics 2006 22(7):837-842; doi:10.1093/bioinformatics/btl008

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Genetic test bed for feature selection

Ashish Choudhary 1, Marcel Brun 2, Jianping Hua 2, James Lowey 2, Ed Suh 2 and Edward R. Dougherty 1,2,*

1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
2 TGen, 445 North Fifth Street, Phoenix, AZ 85004, USA

*To whom correspondence should be addressed.

Motivation: Given a large set of potential features, such as the set of all gene-expression values from a microarray, it is necessary to find a small subset with which to classify. Finding an optimal feature set of a given size is inherently combinatorial because, to assure optimality, all feature sets of that size must be checked. Thus, numerous suboptimal feature-selection algorithms have been proposed. There are strong impediments to evaluating feature-selection algorithms using real data when data are limited, a common situation in genetic classification. The difficulty is threefold. First, there are no class-conditional distributions from which to draw data points, only a single small labeled sample. Second, there are no test data with which to estimate the feature-set errors, and one must depend on a training-data-based error estimator. Finally, there is no optimal feature set with which to compare the feature sets found by the algorithms.
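
To give a sense of why exhaustive checking is combinatorial, the following minimal sketch counts the feature sets of a given size. The figure of 20,000 features is a hypothetical illustration, not a number taken from the study, and the function name is ours.

```python
from math import comb


def num_feature_sets(n_features: int, subset_size: int) -> int:
    """Number of distinct feature sets of size `subset_size` drawn from `n_features`."""
    return comb(n_features, subset_size)


if __name__ == "__main__":
    # Hypothetical microarray with 20,000 gene-expression features.
    n_features = 20_000
    for k in range(1, 6):
        print(f"size {k}: {num_feature_sets(n_features, k):,} candidate feature sets")
```

Each candidate set would also require designing and evaluating a classifier, which is why the count grows out of reach so quickly as the set size increases.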

Results: This paper describes a genetic test bed for the evaluation of feature-selection algorithms. It begins with a large biological feature-label dataset that is used as an empirical distribution and, using massively parallel computation, finds the top feature sets of various sizes based on a given sample size and classification rule. The user can draw random samples from the data, apply a proposed algorithm, and evaluate the proficiency of the proposed algorithm via three different measures (code provided). A key feature of the test bed is that, once a dataset is input, a single command creates the entire test bed relative to the dataset. The particular dataset used for the first version of the test bed comes from a microarray-based classification study that analyzes a large number of microarrays, prepared with RNA from breast tumor samples from each of 295 patients.
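
The workflow described above (treat a large dataset as an empirical distribution, draw small samples, run a selection algorithm, and compare against an exhaustively found reference) might be sketched as follows. This is not the authors' code: the synthetic data, the nearest-mean classification rule, the mean-gap selector, and the single comparison measure are all assumptions made for illustration, and the measure shown is not claimed to be one of the three provided with the test bed.

```python
import random
from itertools import combinations


def make_population(n_points=200, n_features=6, seed=0):
    """Synthetic stand-in for the large feature-label dataset (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        label = rng.randint(0, 1)
        # Features 0 and 1 carry class signal; the remaining features are noise.
        x = [rng.gauss(1.5 * label if j < 2 else 0.0, 1.0) for j in range(n_features)]
        data.append((x, label))
    return data


def draw_sample(population, n, rng):
    """Draw a class-balanced training sample; all remaining points act as test data."""
    chosen = set()
    for c in (0, 1):
        idx = [i for i, (_, y) in enumerate(population) if y == c]
        chosen.update(rng.sample(idx, n // 2))
    train = [population[i] for i in sorted(chosen)]
    test = [p for i, p in enumerate(population) if i not in chosen]
    return train, test


def class_means(train, feats):
    """Per-class mean vector restricted to the features in `feats`."""
    means = {}
    for c in (0, 1):
        rows = [x for x, y in train if y == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return means


def error_rate(train, test, feats):
    """Train a nearest-mean classifier on `train`; return its error on `test`."""
    means = class_means(train, feats)
    wrong = 0
    for x, y in test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, means[c])) for c in (0, 1)}
        wrong += int(min(dist, key=dist.get) != y)
    return wrong / len(test)


def select_by_mean_gap(train, k):
    """Toy selector under test: keep the k features with the largest class-mean gap."""
    n_features = len(train[0][0])
    means = class_means(train, range(n_features))
    gap = [abs(means[1][j] - means[0][j]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -gap[j])
    return tuple(sorted(ranked[:k]))


def evaluate(population, selector, k=2, n=30, trials=20, seed=1):
    """Average held-out error of the selector versus the best size-k set overall."""
    rng = random.Random(seed)
    all_sets = list(combinations(range(len(population[0][0])), k))
    totals = {fs: 0.0 for fs in all_sets}
    selected_total = 0.0
    for _ in range(trials):
        train, test = draw_sample(population, n, rng)
        for fs in all_sets:  # exhaustive reference; feasible only for tiny problems
            totals[fs] += error_rate(train, test, fs)
        selected_total += error_rate(train, test, selector(train, k))
    best_set = min(totals, key=totals.get)
    return selected_total / trials, totals[best_set] / trials, best_set


if __name__ == "__main__":
    population = make_population()
    sel_err, best_err, best_set = evaluate(population, select_by_mean_gap)
    print("selector error:", sel_err)
    print("best set:", best_set, "error:", best_err)
```

In this toy setting the exhaustive reference covers only 15 feature pairs; the test bed described here precomputes the analogous reference for a real dataset using massively parallel computation, so that users only need to run their own algorithm on drawn samples.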

Availability: The software and supplementary material are available at http://public.tgen.org/tgen-cb/support/testbed/

Contact: edward@ece.tamu.edu


Received on September 30, 2005; revised on January 12, 2006; accepted on January 13, 2006
