Bioinformatics Advance Access published online on July 4, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl346
All versions of this article: 22/17/2114 (most recent); btl346v1
© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received November 28, 2005
Revised May 28, 2006
Accepted June 22, 2006

Article

Detecting potential labeling errors in microarrays by data perturbation

Andrea Malossini 1 *, Enrico Blanzieri 1, and Raymond T. Ng 2

1 Department of Information and Communication Technology, University of Trento, 38050 Povo, Italy
2 Department of Computer Science, University of British Columbia, Vancouver, B.C., V6T1Z4, Canada

* To whom correspondence should be addressed.
Andrea Malossini, E-mail: malossin{at}dit.unitn.it


Abstract

Motivation: Classification is widely used in medical applications, but the quality of a classifier depends critically on accurate labeling of the training data. In many medical applications, labeling a sample or grading a biopsy can be subjective. Existing studies confirm this and show that even a very small number of mislabeled samples can severely degrade the performance of the resulting classifier, particularly when the sample size is small. This paper addresses the problem of automatically detecting samples that are possibly mislabeled.

Results: We propose two algorithms for detecting possibly mislabeled samples: a classification-stability algorithm and a leave-one-out-error-sensitivity algorithm. Both rely on computing the Leave-One-Out perturbation matrix. The classification-stability algorithm measures how stable the label of a sample is with respect to label changes of other samples. The version of this algorithm based on the support vector machine (SVM) appears to be quite accurate on three real data sets, and the suspect list it produces is of high quality. Furthermore, when human intervention is not available, the correction heuristic appears to be beneficial.
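
The abstract does not define the Leave-One-Out perturbation matrix or the stability measure in detail, so the following Python sketch is only one plausible reading, not the authors' method. It assumes binary labels, that a "perturbation" means flipping the label of a single sample, and that entry (i, j) holds the leave-one-out prediction for sample i after the label of sample j has been flipped. A simple nearest-centroid classifier stands in for the SVM used in the paper, and the names `nearest_centroid_predict`, `loo_perturbation_matrix` and `suspect_scores`, as well as the toy data, are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Return the label (0 or 1) whose class centroid is closest to x.

    Stand-in classifier (assumption); the paper's accurate variant uses an SVM.
    """
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [p for p, y in zip(train_x, train_y) if y == label]
        if not members:
            continue  # a class can become empty after a label flip
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_perturbation_matrix(xs, ys):
    """matrix[i][j] = leave-one-out prediction for sample i when the label
    of sample j is flipped (assumed definition).

    For j == i the flipped sample is the one held out, so the diagonal holds
    the ordinary leave-one-out prediction on unperturbed labels.
    """
    n = len(xs)
    matrix = [[0] * n for _ in range(n)]
    for j in range(n):
        flipped = list(ys)
        flipped[j] = 1 - flipped[j]
        for i in range(n):
            train_x = xs[:i] + xs[i + 1:]
            train_y = flipped[:i] + flipped[i + 1:]
            matrix[i][j] = nearest_centroid_predict(train_x, train_y, xs[i])
    return matrix


def suspect_scores(matrix, ys):
    """Fraction of perturbations under which the prediction for sample i
    disagrees with its given label (illustrative stability score)."""
    n = len(ys)
    return [sum(matrix[i][j] != ys[i] for j in range(n)) / n for i in range(n)]


# Toy data (invented for illustration): two well-separated clusters,
# with sample 3 lying in the first cluster but carrying label 1.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
      [5.0, 5.1], [5.2, 4.9], [4.9, 5.3], [5.1, 5.0]]
ys = [0, 0, 0, 1, 1, 1, 1, 1]

scores = suspect_scores(loo_perturbation_matrix(xs, ys), ys)
ranked = sorted(range(len(ys)), key=lambda i: scores[i], reverse=True)
print("suspects, most suspicious first:", ranked)
```

On this toy data the deliberately mislabeled sample 3 ranks first. The sketch retrains the classifier n × n times, which is affordable only because microarray sample sizes are small.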


Associate Editor: Satoru Miyano