
Bioinformatics Vol. 19 no. 9 2003
Pages 1090-1099
© 2003 Oxford University Press

Bagging to improve the accuracy of a clustering procedure

Sandrine Dudoit 1,* and Jane Fridlyand 2

1 Division of Biostatistics, School of Public Health, University of California, Berkeley, 140 Earl Warren Hall, #7360, Berkeley, CA 94720-7360, USA
2 Jain Lab, Comprehensive Cancer Center, University of California, San Francisco, 2340 Sutter St., #N412, San Francisco, CA 94143-0128, USA

Received on November 15, 2001; revised on November 8, 2002; accepted on November 11, 2002

Motivation: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical question associated with tumor classification is the identification of new tumor classes using gene expression profiles. Essential aspects of this clustering problem include identifying accurate partitions of the tumor samples into clusters and assessing the confidence of cluster assignments for individual samples.

Results: Two new resampling methods, inspired by bagging in prediction, are proposed to improve and assess the accuracy of a given clustering procedure. In these ensemble methods, a partitioning clustering procedure is applied to bootstrap learning sets and the resulting multiple partitions are combined by voting or by creating a new dissimilarity matrix. As in prediction, the motivation behind bagging is to reduce variability in the partitioning results via averaging. The performance of the new and existing methods was compared using simulated data and gene expression data from two recently published cancer microarray studies. The bagged clustering procedures were in general at least as accurate as, and often substantially more accurate than, a single application of the partitioning clustering procedure. A valuable by-product of bagged clustering is the set of cluster votes, which can be used to assess the confidence of cluster assignments for individual observations.
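The voting variant described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes plain k-means as a stand-in for the partitioning procedure, assigns every original observation to the nearest bootstrap centroid, and aligns labels to a reference partition by brute-force permutation (practical only for a small number of clusters). All function names (`kmeans_centers`, `assign`, `align`, `bagged_clustering`) and parameters are hypothetical.

```python
import itertools
import numpy as np


def kmeans_centers(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed stand-in for the partitioning procedure)."""
    uniq = np.unique(X, axis=0)  # avoid duplicate starting centers in bootstrap samples
    centers = uniq[rng.choice(len(uniq), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def align(labels, reference, k):
    """Relabel `labels` with the permutation that best agrees with `reference`."""
    best = max(
        itertools.permutations(range(k)),
        key=lambda p: np.sum(np.asarray(p)[labels] == reference),
    )
    return np.asarray(best)[labels]


def bagged_clustering(X, k, n_boot=50, seed=0):
    """Return (bagged labels, per-observation vote proportion for that label)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    reference = assign(X, kmeans_centers(X, k, rng))  # partition of the original data
    votes = np.zeros((n, k))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap learning set (with replacement)
        centers = kmeans_centers(X[idx], k, rng)
        labels = align(assign(X, centers), reference, k)
        votes[np.arange(n), labels] += 1  # one vote per bootstrap partition
    votes /= n_boot
    return votes.argmax(axis=1), votes.max(axis=1)
```

Under these assumptions, the second return value plays the role of the cluster votes: an observation whose winning label receives a proportion near 1 is assigned consistently across bootstrap partitions, while a proportion near 1/k suggests an unstable assignment.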

Contact: sandrine{at}stat.berkeley.edu

Supplementary information: For supplementary information on datasets, analyses, and software, consult http://www.stat.berkeley.edu/~sandrine and http://www.bioconductor.org.

* To whom correspondence should be addressed.

The authors wish it to be known that, in their opinion, both authors should be regarded as joint First Authors.






