Bioinformatics Advance Access published online on November 7, 2007

Bioinformatics, doi:10.1093/bioinformatics/btm528

© The Author (2007). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Unsupervised Feature Selection under Perturbations: Meeting the Challenges of Biological Data

Roy Varshavsky 1,*, Assaf Gottlieb 2, David Horn 2 and Michal Linial 3

1School of Computer Science and Engineering, The Hebrew University of Jerusalem 91904, Israel
2School of Physics and Astronomy, Tel Aviv University 69978, Israel
3Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem 91904, Israel

*To whom correspondence should be addressed. Mr. Roy Varshavsky, E-mail: royke{at}cs.huji.ac.il


Abstract

Motivation: Feature selection methods aim to reduce the complexity of data and to uncover the most relevant biological variables. In reality, information in biological datasets is often incomplete as a result of untrustworthy samples and missing values. The reliability of selection methods may therefore be questioned.

Methods: Information loss is incorporated into a perturbation scheme that tests which features remain stable under it. The scheme is applied to data analysis by unsupervised feature filtering (UFF), which has been shown to be very successful in the analysis of gene-expression data.
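The perturbation scheme described above can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption: features are scored by the leave-one-out change in an SVD-based entropy (the keywords mention SVD, but this particular score is assumed), features scoring above mean plus one standard deviation are selected, information loss is simulated by masking random entries, and missing values are imputed with the row mean. All function names (`svd_entropy`, `feature_scores`, `select`, `perturb`, `stability`) are hypothetical.

```python
import numpy as np


def svd_entropy(data):
    """Normalized entropy of the squared singular-value spectrum (assumed score)."""
    s = np.linalg.svd(data, compute_uv=False)
    v = s**2 / np.sum(s**2)
    v = v[v > 0]  # treat 0 * log(0) as 0
    return -np.sum(v * np.log(v)) / np.log(len(s))


def feature_scores(data):
    """Leave-one-out entropy contribution of each feature (row of `data`)."""
    full = svd_entropy(data)
    return np.array([
        full - svd_entropy(np.delete(data, i, axis=0))
        for i in range(data.shape[0])
    ])


def select(data):
    """Boolean mask of features scoring above mean + one std (assumed threshold)."""
    scores = feature_scores(data)
    return scores > scores.mean() + scores.std()


def perturb(data, loss, rng):
    """Mask a random fraction `loss` of entries, then impute with the row mean."""
    mask = rng.random(data.shape) < loss
    damaged = np.where(mask, np.nan, data)
    row_means = np.nanmean(damaged, axis=1, keepdims=True)
    row_means = np.nan_to_num(row_means)  # fully masked row: fall back to 0
    return np.where(mask, row_means, damaged)


def stability(data, loss=0.2, trials=50, seed=0):
    """Fraction of perturbed trials in which each feature is still selected."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(data.shape[0])
    for _ in range(trials):
        counts += select(perturb(data, loss, rng))
    return counts / trials


# Synthetic demo data: 30 features (rows) by 12 samples (columns).
demo_rng = np.random.default_rng(1)
demo = demo_rng.normal(size=(30, 12))
demo_stability = stability(demo, loss=0.2, trials=20)
```

Each entry of `demo_stability` lies between 0 and 1; under these assumptions, a feature that stays selected across most perturbed copies of the data would be considered stable. Swapping the row-mean step in `perturb` for another imputation method is one way such a sketch could be used to compare imputation choices.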

Results: We find that UFF quality degrades smoothly with information loss and that the method remains successful even under substantial damage. Our method also allows selecting the best imputation method for a dataset treated by UFF. More importantly, scoring features by their stability under information loss is shown to correlate with biological importance in cancer studies. This scoring may lead to novel biological insights.

Supplementary Data and Code availability: attached

Keywords and Abbreviations: Feature Selection, Imputation, Information Loss, Comparative Genomic Hybridization (CGH), Singular Value Decomposition (SVD), Unsupervised Feature Filtering (UFF).

Associate Editor: Prof. David Rocke


Received on July 23, 2007; revised on September 12, 2007; accepted on October 15, 2007
