Bioinformatics Advance Access published online on May 6, 2004
Bioinformatics, doi:10.1093/bioinformatics/bth292
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Department of Computer Science, University of Texas at Dallas, Richardson, TX 75252
* To whom correspondence should be addressed. E-mail: lkhan{at}utdallas.edu.
Motivation: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. a fixed topology structure, and mis-clustered data that cannot be re-evaluated). In this paper we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks.
Result: We propose a new tree-structured self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying data set can be found. In addition, we propose a new cluster validation criterion, based on the geometric property of the Voronoi partition of the data set, to find the proper number of clusters at each hierarchical level. This criterion uses the Minimum Spanning Tree (MST) concept of graph theory and is computationally inexpensive for large data sets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution during hierarchy construction, is used to improve the clustering accuracy. The KLD mechanism allows data mis-clustered in the early stages to be re-evaluated at a later stage, which increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. On a yeast cell cycle microarray expression data set, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable.
Availability: DGSOT is available upon request from the authors.
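The abstract does not spell out the MST-based validation criterion, so the following is only a minimal sketch of the general idea, not the authors' method. It builds a minimum spanning tree over cluster centroids with Prim's algorithm and, as an assumption made purely for illustration, scores separation as the ratio of the shortest to the longest MST edge. The names `mst_edges` and `separation_score` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) tuples, one per MST edge.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each node to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the attachment cost of every remaining node via u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def separation_score(centroids):
    """Illustrative score (an assumption, not the paper's definition):
    shortest MST edge divided by longest MST edge, in (0, 1]."""
    lengths = [w for _, _, w in mst_edges(centroids)]
    if not lengths:
        return 1.0
    return min(lengths) / max(lengths)


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
    print(mst_edges(centroids))
    print(separation_score(centroids))
```

Under this assumed score, a value near 1 means the centroids are about evenly spaced along the tree, while a value near 0 means at least two centroids sit very close together relative to the rest, which could suggest that one cluster too many was created at that level. The MST over k centroids costs O(k²) here regardless of how many expression profiles each cluster holds, which is consistent with the abstract's remark that the criterion is inexpensive for large data sets.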
Revised April 23, 2004
Accepted April 24, 2004
Article
A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles
2 Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831