Bioinformatics Advance Access originally published online on March 4, 2009
Bioinformatics 2009 25(9):1152-1157; doi:10.1093/bioinformatics/btp123
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets

Ashok Sharma 1, Robert Podolsky 1,2, Jieping Zhao 1 and Richard A. McIndoe 1,3,*

1 Center for Biotechnology and Genomic Medicine, 2 Department of Medicine and 3 Department of Pathology, Medical College of Georgia, Augusta, GA, USA

*To whom correspondence should be addressed.


Abstract

Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. Fast algorithms are needed that can cluster extremely large datasets without degrading cluster quality. Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high input/output costs involved and the large distance matrices calculated, most agglomerative clustering algorithms fail on large datasets (30 000+ genes/200+ arrays). In this article, we propose a new two-stage algorithm which partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and the second stage is a conventional k-means clustering technique. This algorithm has been implemented in a software tool (HPCluster) designed to cluster gene expression data. We compared the clustering results of the two-stage hyperplane algorithm with those of the conventional k-means algorithm from other available programs. Because the first stage traverses the data in a single scan, performance and speed increase substantially. The data reduction accomplished in the first stage reduces the memory requirements, allowing us to cluster 44 460 genes without failure, and significantly decreases the time to completion compared with popular k-means programs. The software was written in C# (.NET 1.1).
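
The two-stage idea described above can be illustrated with a minimal sketch. It is written in Python rather than the C# of HPCluster, and it is not the authors' implementation: the first stage here is a simplified BIRCH-style threshold pass that stands in for the hyperplane partitioning, and the function names (`single_scan_reduce`, `weighted_kmeans`) and the `threshold` parameter are hypothetical. The sketch only shows why a single scan that keeps a count and a linear sum per summary lets the k-means stage run on far fewer points than the original data.

```python
import math
import random


def single_scan_reduce(rows, threshold):
    """Stage 1 (assumed simplification, not the HPCluster hyperplane step).

    One pass over the data: each row joins the nearest summary whose centroid
    lies within `threshold`, otherwise it opens a new summary. A summary keeps
    only a count and a per-dimension linear sum, so memory grows with the
    number of summaries rather than the number of rows.
    """
    summaries = []  # each entry is [count, linear_sum]
    for row in rows:
        best, best_dist = None, threshold
        for s in summaries:
            centroid = [v / s[0] for v in s[1]]
            d = math.dist(row, centroid)
            if d <= best_dist:
                best, best_dist = s, d
        if best is None:
            summaries.append([1, list(row)])
        else:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], row)]
    # Return (count, centroid) pairs for the second stage.
    return [(n, [v / n for v in total]) for n, total in summaries]


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def weighted_kmeans(summaries, k, iterations=50, seed=0):
    """Stage 2: conventional k-means on the summary centroids.

    Each centroid is weighted by the number of rows it represents, so the
    updated centers match what k-means would compute from those rows.
    Assumes k <= len(summaries).
    """
    rng = random.Random(seed)
    centers = [list(c) for _, c in rng.sample(summaries, k)]
    for _ in range(iterations):
        labels = [nearest(c, centers) for _, c in summaries]
        new_centers = []
        for j in range(k):
            members = [(n, c) for (n, c), lab in zip(summaries, labels) if lab == j]
            if not members:
                new_centers.append(centers[j])  # keep an empty cluster's center
                continue
            total = sum(n for n, _ in members)
            dim = len(members[0][1])
            new_centers.append(
                [sum(n * c[i] for n, c in members) / total for i in range(dim)]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(c, centers) for _, c in summaries]
    return centers, labels
```

In this sketch, a call such as `weighted_kmeans(single_scan_reduce(rows, threshold=1.0), k=2)` clusters the summaries rather than the raw rows. The threshold controls the trade-off: a larger value gives fewer summaries and a faster second stage, at the cost of coarser input to k-means.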

Availability: The program is freely available and can be downloaded from http://www.amdcc.org/bioinformatics/bioinformatics.aspx.

Contact: rmcindoe{at}mail.mcg.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: David Rocke


Received on October 7, 2008; revised on January 27, 2009; accepted on February 28, 2009
