Bioinformatics Advance Access published online on July 4, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl346
Article
Detecting potential labeling errors in microarrays by data perturbation
Andrea Malossini 1 *, Enrico Blanzieri 1 and Raymond T. Ng 2
1 Department of Information and Communication Technology, University of Trento, 38050 Povo, Italy
2 Department of Computer Science, University of British Columbia, Vancouver, B.C., V6T1Z4, Canada
* To whom correspondence should be addressed. Andrea Malossini, E-mail: malossin{at}dit.unitn.it
Received November 28, 2005
Revised May 28, 2006
Accepted June 22, 2006
Associate Editor: Satoru Miyano
Abstract
Motivation: Classification is widely used in medical applications. However, the quality of a classifier depends critically on accurate labeling of the training data, and in many medical applications labeling a sample or grading a biopsy can be subjective. Existing studies confirm this phenomenon and show that even a very small number of mislabeled samples can severely degrade the performance of the resulting classifier, particularly when the sample size is small. The problem we address in this paper is to develop a method for automatically detecting samples that are possibly mislabeled.
Results: We propose two algorithms for detecting possibly mislabeled samples: a classification-stability algorithm and a leave-one-out-error-sensitivity algorithm. For both algorithms, the key structure is the computation of the leave-one-out perturbation matrix. The classification-stability algorithm measures the stability of a sample's label with respect to label changes of other samples, and its version based on the support vector machine (SVM) appears to be quite accurate on three real data sets. The suspect list produced by the SVM version is of high quality. Furthermore, when human intervention is not available, the correction heuristic appears to be beneficial.
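The abstract does not spell out how the leave-one-out perturbation matrix is built, so the following is only a minimal sketch, under stated assumptions, of how a classification-stability check of this kind could look. Everything here is an assumption, not the paper's method: labels are binary (+1/-1); a nearest-centroid classifier stands in for the SVM; entry P[i][j] records whether flipping the label of sample j changes the leave-one-out prediction for sample i; and a sample is flagged when its leave-one-out prediction disagrees with its given label, with more stable disagreements ranked first. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid_predict(X: List[Vector], y: List[int], x: Vector) -> int:
    """Stand-in classifier (NOT an SVM): assign +1 or -1 by the closer class mean."""
    pos = [p for p, lab in zip(X, y) if lab == 1]
    neg = [p for p, lab in zip(X, y) if lab == -1]
    if not pos:  # a label flip emptied the +1 class
        return -1
    if not neg:  # a label flip emptied the -1 class
        return 1
    return 1 if sq_dist(x, centroid(pos)) <= sq_dist(x, centroid(neg)) else -1


def loo_perturbation_matrix(X: List[Vector], y: List[int]) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical construction: P[i][j] = 1 if flipping the label of sample j
    changes the leave-one-out prediction for sample i (diagonal stays 0).
    Also returns the unperturbed leave-one-out prediction for every sample."""
    n = len(X)
    base: List[int] = []
    P = [[0] * n for _ in range(n)]
    for i in range(n):
        rest = [k for k in range(n) if k != i]  # hold sample i out
        Xr = [X[k] for k in rest]
        yr = [y[k] for k in rest]
        b = nearest_centroid_predict(Xr, yr, X[i])
        base.append(b)
        for idx, j in enumerate(rest):
            flipped = list(yr)
            flipped[idx] = -flipped[idx]  # perturb only sample j's label
            if nearest_centroid_predict(Xr, flipped, X[i]) != b:
                P[i][j] = 1
    return base, P


def suspects(X: List[Vector], y: List[int]) -> Tuple[List[int], List[float]]:
    """Flag samples whose leave-one-out prediction disagrees with their given
    label; rank the most stable disagreements (fewest changes) first."""
    base, P = loo_perturbation_matrix(X, y)
    n = len(X)
    stability = [1.0 - sum(P[i]) / max(n - 1, 1) for i in range(n)]
    flagged = [i for i in range(n) if base[i] != y[i]]
    return sorted(flagged, key=lambda i: -stability[i]), stability


if __name__ == "__main__":
    # Toy 1-D data: sample 6 sits among the +1 samples but is labeled -1.
    X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [1.5]]
    y = [1, 1, 1, -1, -1, -1, -1]
    ranked, _ = suspects(X, y)
    print(ranked)  # → [6]
```

Swapping in an SVM would change only `nearest_centroid_predict`. The perturbation loop retrains the classifier n(n-1) + n times, which stays manageable for the small sample sizes the abstract highlights.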
