Bioinformatics Advance Access published online on May 16, 2006

Bioinformatics, doi:10.1093/bioinformatics/btl185

© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
Received February 6, 2006
Revised April 20, 2006
Accepted May 10, 2006

Article

Distance-based clustering of CGH Data

Jun Liu 1 *, Jaaved Mohammed 1, James Carter 1, Sanjay Ranka 1, Tamer Kahveci 1, and Michael Baudis 2

1 Computer and Information Science and Engineering, University of Florida, Gainesville, FL, 32611
2 Institut fuer Humangenetik, Rheinisch-Westfaelische Technische Hochschule, Aachen, Germany


   Abstract

Motivation: We consider the problem of clustering a population of Comparative Genomic Hybridization (CGH) data samples. The goal is to develop a systematic way of placing patients with similar CGH imbalance profiles into the same cluster. Our expectation is that patients with the same cancer type will generally belong to the same cluster, because their underlying CGH profiles will be similar.

Results: We focus on distance-based clustering strategies, which proceed in two steps: (1) the distances between all pairs of CGH samples are computed, and (2) the CGH samples are clustered based on these distances. We develop three pairwise distance/similarity measures, namely raw, cosine, and sim. The raw measure disregards correlation between contiguous genomic intervals and compares the aberrations in each genomic interval separately. The remaining measures assume that consecutive genomic intervals may be correlated. Cosine maps pairs of CGH samples into vectors in a high-dimensional space and measures the angle between them. Sim measures the number of independent common aberrations. We test our distance/similarity measures on three well-known clustering algorithms: bottom-up, top-down, and k-means, with and without centroid shrinking. Our results show that sim consistently performs better than the remaining measures. This indicates that the correlation of neighboring genomic intervals should be considered in the structural analysis of CGH data sets. The combination of sim with top-down clustering emerged as the best approach.
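The abstract does not give formulas for the three measures, so the following is only a minimal sketch of how they might look, not the authors' implementation. It assumes (hypothetically) that each CGH sample is encoded as one status per genomic interval, with 1 for gain, -1 for loss, and 0 for no aberration. The function names, the encoding, and in particular the run-counting interpretation of "independent common aberrations" are illustrative assumptions.

```python
import math
from itertools import groupby

# Assumed encoding (not from the paper): one status per genomic interval,
# 1 = gain, -1 = loss, 0 = no aberration.


def raw_similarity(x, y):
    """Interval-by-interval comparison: count the intervals in which
    both samples carry the same aberration, ignoring neighbors."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)


def cosine_similarity(x, y):
    """Treat each sample as a vector and return the cosine of the
    angle between the two vectors (0.0 if either has no aberration)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def segment_similarity(x, y):
    """Assumed reading of 'sim': count maximal runs of contiguous
    intervals sharing the same aberration, so that a long common
    aberration counts once rather than once per interval."""
    shared = [a if a == b else 0 for a, b in zip(x, y)]
    return sum(1 for status, _run in groupby(shared) if status != 0)


def pairwise(samples, measure):
    """Step 1 of the two-step scheme: score every pair of samples."""
    n = len(samples)
    return [[measure(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    a = [1, 1, 1, 0, 0, -1, -1, 0]
    b = [1, 1, 0, 0, 0, -1, -1, -1]
    print(raw_similarity(a, b))       # shared aberrant intervals
    print(cosine_similarity(a, b))    # angle-based similarity
    print(segment_similarity(a, b))   # shared contiguous aberrations
```

In this toy pair the two samples share four aberrant intervals but only two contiguous aberrations, which illustrates why a measure that accounts for neighboring intervals can rank pairs differently from an interval-by-interval count. The resulting pairwise matrix would then be passed to whichever clustering algorithm is used in the second step.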

Availability: All software developed in this paper and all the datasets are available from the authors upon request.


Associate Editor: Martin Bishop




