Bioinformatics Advance Access originally published online on August 6, 2009
Bioinformatics 2009 25(20):2708-2714; doi:10.1093/bioinformatics/btp478

© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Methods for labeling error detection in microarrays based on the effect of data perturbation on the regression model

Chen Zhang 1, Chunguo Wu 1, Enrico Blanzieri 2,*, You Zhou 1, Yan Wang 1, Wei Du 1 and Yanchun Liang 1,*

1 College of Computer Science and Technology, Jilin University, 130012, China and 2 Department of Information and Communication Technology, University of Trento, 38050 Povo, Italy

* To whom correspondence should be addressed.


   Abstract

Motivation: Mislabeled samples often appear in gene expression profiles because of the similarity between different disease subtypes and because of subjective misdiagnosis. Mislabeled samples degrade supervised learning procedures. The LOOE-sensitivity algorithm is an approach to mislabeled sample detection in microarray data based on data perturbation. However, it fails to measure the effect of the perturbation, which leads to poor performance. The purpose of this article is to design a novel method for detecting mislabeled samples in microarray data that takes advantage of measuring the effect of data perturbations.

Results: To measure the effect of data perturbation, we define an index named the perturbing influence value (PIV), based on the support vector machine (SVM) regression model. Three PIV-based algorithms are proposed to detect mislabeled samples: the Column Algorithm (CAPIV), the Row Algorithm (RAPIV) and the progressive Row Algorithm (PRAPIV). Experimental results on six artificial datasets and five microarray datasets demonstrate that all methods proposed in this article are superior to LOOE-sensitivity. Moreover, compared with the simple SVM and CL-stability, the PRAPIV algorithm shows an increase in precision together with high recall.
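
The abstract does not give the definition of the PIV, so the following is only a minimal sketch of the general label-perturbation idea and not the authors' method. It assumes a one-dimensional feature, class labels coded as -1/+1, and an ordinary least-squares line standing in for the SVM regression model; the class and method names (`LabelPerturbationSketch`, `fitError`, `flipScores`, `mostSuspicious`) and the toy data are hypothetical. Each label is flipped in turn, the model is refitted, and the resulting drop in fit error is used as a suspicion score.

```java
import java.util.Arrays;

// Illustrative only: a least-squares line stands in for SVM regression,
// and the score below is NOT the PIV defined in the article.
class LabelPerturbationSketch {

    /** Sum of squared residuals of a 1-D least-squares line fitted to (x, y). */
    static double fitError(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = (sxx == 0.0) ? 0.0 : sxy / sxx;
        double intercept = meanY - slope * meanX;
        double sse = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (slope * x[i] + intercept);
            sse += residual * residual;
        }
        return sse;
    }

    /** For each sample, the drop in fit error obtained by flipping its label. */
    static double[] flipScores(double[] x, double[] labels) {
        double baseline = fitError(x, labels);
        double[] scores = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            double[] perturbed = labels.clone();
            perturbed[i] = -perturbed[i];          // perturb one label
            scores[i] = baseline - fitError(x, perturbed);
        }
        return scores;
    }

    /** Index of the sample whose label flip improves the fit the most. */
    static int mostSuspicious(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: two well-separated groups; index 2 is deliberately mislabeled.
        double[] x = {0.1, 0.2, 0.3, 0.4, 1.6, 1.7, 1.8, 1.9};
        double[] labels = {-1, -1, +1, -1, +1, +1, +1, +1};
        double[] scores = flipScores(x, labels);
        System.out.println(Arrays.toString(scores));
        System.out.println("Most suspicious sample index: " + mostSuspicious(scores));
    }
}
```

On this toy data only the deliberately mislabeled sample receives a positive score, because flipping it is the only perturbation that makes the labels easier to fit. Real microarray data would need a multivariate model and a principled threshold on the scores, which this sketch does not attempt.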

Availability: The program and source code (in JAVA) are publicly available at http://ccst.jlu.edu.cn/CSBG/PIVS/index.htm

Contact: blanzier{at}dit.unitn.it; ycliang{at}jlu.edu.cn

Associate Editor: Joaquin Dopazo


Received on April 8, 2009; revised on July 14, 2009; accepted on August 3, 2009
