Bioinformatics Advance Access originally published online on March 25, 2004
Bioinformatics 20(12) © Oxford University Press 2004; all rights reserved.
A comparison of cluster analysis methods using DNA methylation data
1 Department of Preventive Medicine, 2 Department of Surgery and 3 Department of Biochemistry and Molecular Biology, Norris Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
Received on July 15, 2003; revised on March 1, 2004; accepted on March 2, 2004
Advance Access Publication March 25, 2004
Motivation: Aberrant DNA methylation is common in cancer. DNA methylation profiles differ between tumor types and subtypes and provide a powerful diagnostic tool for identifying clusters of samples and/or genes. DNA methylation data obtained with the quantitative, highly sensitive MethyLight technology are not normally distributed; they frequently contain an excess of zeros. No established tools exist for analyzing this type of data. Here, we evaluate a variety of cluster analysis methods to determine which is most reliable.
Results: We introduce a Bernoulli-lognormal mixture model for clustering DNA methylation data obtained using MethyLight. We model the outcomes using a two-part distribution having discrete and continuous components. The model is compared with standard cluster analysis approaches for continuous data and for discrete data. In a simulation study, we find that the two-part model has the lowest classification error rate for mixture outcome data compared with the other approaches. The methods are illustrated using DNA methylation data from a study of lung cancer cell lines. Compared with competing hierarchical clustering methods, the mixture model approaches have the lowest cross-validation error for detecting lung cancer subtype (non-small cell versus small cell). The Bernoulli-lognormal mixture assigns observations to subgroups with the lowest uncertainty.
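The abstract does not give the fitting procedure, so the following is only a minimal sketch of how a Bernoulli-lognormal mixture could be clustered with the EM algorithm, under assumptions that are ours, not necessarily the authors': within a cluster, each locus is treated as independent, a zero occurs with probability 1 - p, and a positive value occurs with probability p with log(x) ~ Normal(mu, sigma^2). The names `fit_mixture`, `m_step`, `e_step`, `log_two_part`, `EPS` and `MIN_VAR` are hypothetical, the random soft start is one arbitrary choice, and this is not the authors' software (which is available on request).

```python
import math
import random

EPS = 1e-6      # assumption: keeps Bernoulli probabilities away from 0 and 1
MIN_VAR = 1e-3  # assumption: floor on the variance of log-values (avoids degenerate fits)


def log_two_part(x, p, mu, sigma):
    """Log-density of one Bernoulli-lognormal observation.

    x == 0 occurs with probability 1 - p; a positive x occurs with
    probability p and then log(x) ~ Normal(mu, sigma**2).
    """
    if x == 0:
        return math.log(1.0 - p)
    z = (math.log(x) - mu) / sigma
    return (math.log(p) - 0.5 * z * z
            - math.log(x * sigma * math.sqrt(2.0 * math.pi)))


def m_step(data, resp, k):
    """Weighted estimates of mixing weights and per-cluster, per-locus (p, mu, sigma)."""
    n, m = len(data), len(data[0])
    weights, params = [], []
    for c in range(k):
        rc = [resp[i][c] for i in range(n)]
        tot = max(sum(rc), 1e-12)
        weights.append(tot / n)
        cluster = []
        for j in range(m):
            # (weight, log-value) pairs for the positive observations at locus j
            pos = [(rc[i], math.log(data[i][j])) for i in range(n) if data[i][j] > 0]
            w_pos = sum(r for r, _ in pos)
            p = min(max(w_pos / tot, EPS), 1.0 - EPS)
            if w_pos > 1e-12:
                mu = sum(r * y for r, y in pos) / w_pos
                var = sum(r * (y - mu) ** 2 for r, y in pos) / w_pos
            else:  # no positive values carry weight in this cluster
                mu, var = 0.0, 1.0
            cluster.append((p, mu, math.sqrt(max(var, MIN_VAR))))
        params.append(cluster)
    return weights, params


def e_step(data, weights, params):
    """Posterior cluster probabilities, computed stably with log-sum-exp."""
    resp = []
    for row in data:
        logs = [math.log(max(w, 1e-300))
                + sum(log_two_part(x, *params[c][j]) for j, x in enumerate(row))
                for c, w in enumerate(weights)]
        top = max(logs)
        expd = [math.exp(lp - top) for lp in logs]
        s = sum(expd)
        resp.append([e / s for e in expd])
    return resp


def fit_mixture(data, k, n_iter=100, seed=0):
    """EM for a k-cluster Bernoulli-lognormal mixture from a random soft start."""
    rng = random.Random(seed)
    resp = []
    for _ in data:
        w = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(w)
        resp.append([wi / total for wi in w])
    for _ in range(n_iter):
        weights, params = m_step(data, resp, k)
        resp = e_step(data, weights, params)
    return weights, params, resp


if __name__ == "__main__":
    # Toy data (invented for illustration): rows are samples, columns are loci.
    data = [
        [0, 0, 0.2], [0, 0, 0.3], [0, 0, 0.25], [0, 0, 0.15], [0, 0, 0.2],
        [50, 60, 40], [55, 45, 50], [60, 50, 45], [45, 55, 60], [50, 50, 50],
    ]
    weights, params, resp = fit_mixture(data, k=2)
    labels = [max(range(2), key=lambda c: r[c]) for r in resp]
    print(labels)
```

Each sample is assigned to the cluster with the largest posterior probability. One simple summary of assignment uncertainty is 1 minus that largest posterior; whether this matches the measure used in the paper is not stated in the abstract. Like any EM fit, the result can depend on the starting values, so in practice several seeds would typically be tried and the fit with the highest likelihood kept.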
Availability: Software is available upon request from the authors.
Supplementary information: http://www-rcf.usc.edu/~kims/SupplementaryInfo.html
Contact: kims{at}usc.edu
* To whom correspondence should be addressed.