Bioinformatics Advance Access originally published online on March 25, 2004
Versions of this article: 20/12/1896 (most recent), bth176v1

Bioinformatics 20(12) © Oxford University Press 2004; all rights reserved.

A comparison of cluster analysis methods using DNA methylation data

Kimberly D. Siegmund 1,*, Peter W. Laird 2,3 and Ite A. Laird-Offringa 2,3

1 Department of Preventive Medicine, 2 Department of Surgery and 3 Department of Biochemistry and Molecular Biology, Norris Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA

Received on July 15, 2003; revised on March 1, 2004; accepted on March 2, 2004
Advance Access Publication March 25, 2004

Motivation: Aberrant DNA methylation is common in cancer. DNA methylation profiles differ between tumor types and subtypes and provide a powerful diagnostic tool for identifying clusters of samples and/or genes. DNA methylation data obtained with the quantitative, highly sensitive MethyLight technology are not normally distributed and frequently contain an excess of zeros. No established tools exist for analyzing this type of data. Here, we evaluate a variety of methods for cluster analysis to determine which is most reliable.

Results: We introduce a Bernoulli–lognormal mixture model for clustering DNA methylation data obtained using MethyLight. The model describes each outcome with a two-part distribution that has a discrete component and a continuous component. We compare it with standard cluster analysis approaches for continuous data and for discrete data. In a simulation study, the two-part model has the lowest classification error rate for mixture outcome data among the approaches compared. We illustrate the methods using DNA methylation data from a study of lung cancer cell lines. Compared with competing hierarchical clustering methods, the mixture model approaches have the lowest cross-validation error for detecting lung cancer subtype (non-small cell versus small cell). The Bernoulli–lognormal mixture assigns observations to subgroups with the lowest uncertainty.
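
The two-part idea can be illustrated with a minimal Python sketch. It is not the authors' software, and the abstract does not give the model's parametrization, so everything here is an assumption made for illustration: the discrete part is taken to be a point mass at zero with P(y > 0) = p_pos, the continuous part is taken to be lognormal with parameters mu and sigma, loci are treated as independent within a cluster, and the function names and numeric values are hypothetical. The sketch computes the posterior probability that one sample belongs to each cluster once the cluster parameters are known; fitting those parameters (for example by EM) is not shown.

```python
import math


def two_part_logpdf(y, p_pos, mu, sigma):
    """Log-density of one measurement under an assumed two-part model.

    Assumptions (not taken from the paper): P(y > 0) = p_pos with
    0 < p_pos < 1, and log(y) ~ Normal(mu, sigma^2) given y > 0.
    """
    if y == 0:
        # Discrete part: probability of an exact zero.
        return math.log1p(-p_pos)
    z = (math.log(y) - mu) / sigma
    # Continuous part: lognormal density, including the 1/y Jacobian term.
    return (math.log(p_pos) - math.log(y) - math.log(sigma)
            - 0.5 * math.log(2 * math.pi) - 0.5 * z * z)


def posterior_memberships(sample, weights, params):
    """Posterior probability of each cluster for one sample.

    sample:  measurements, one per locus
    weights: mixing proportions, one per cluster
    params:  params[k][j] = (p_pos, mu, sigma) for cluster k, locus j
    Loci are treated as independent within a cluster (an assumption).
    """
    log_joint = [
        math.log(w) + sum(two_part_logpdf(y, *theta)
                          for y, theta in zip(sample, cluster))
        for w, cluster in zip(weights, params)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint)
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical parameters for two clusters and two loci (made-up values).
example_params = [
    [(0.2, 0.0, 1.0), (0.3, 0.5, 1.0)],  # cluster 0: zeros are common
    [(0.9, 2.0, 1.0), (0.8, 2.5, 1.0)],  # cluster 1: zeros are rare
]
example_post = posterior_memberships([0.0, 0.0], [0.5, 0.5], example_params)
```

With these made-up parameters, a sample that is zero at both loci is assigned mostly to cluster 0. One simple way to express assignment uncertainty for a sample is one minus its largest posterior probability; whether the paper measures uncertainty this way is not stated in the abstract.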

Availability: Software is available upon request from the authors.

Supplementary information: http://www-rcf.usc.edu/~kims/SupplementaryInfo.html

Contact: kims@usc.edu

* To whom correspondence should be addressed.
