Skip Navigation

This Article
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow FREE Full Text (Screen PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (31)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Xu, Y.
Right arrow Articles by Xu, D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Xu, Y.
Right arrow Articles by Xu, D.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Bioinformatics Vol. 18 no. 4 2002
Pages 536-545
© 2002 Oxford University Press

Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees

Ying Xu *, Victor Olman and Dong Xu

Protein Informatics Group, Life Sciences Division, Oak Ridge National Laboratory, MS 6480, Oak Ridge, TN 27831-6480, USA

Received on August 28, 2001 ; revised on November 2, 2001 ; accepted on November 7, 2001

Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem.

Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from the graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that though the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages in representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) as an MST-based clustering does not depend on detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms as a computer software EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we have tested it on three data sets, i.e. expression data from yeast Saccharomyces cerevisiae , expression data in response of human fibroblasts to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging.

Availability: EXCAVATOR is available on request from the authors.

Contact: xyn{at}ornl.gov

* To whom correspondence should be addressed.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
A. Bhattacharya and R. K. De
Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles
Bioinformatics, June 1, 2008; 24(11): 1359 - 1366.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
S. C. Wieland, J. S. Brownstein, B. Berger, and K. D. Mandl
Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes
PNAS, May 29, 2007; 104(22): 9404 - 9409.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Gupta, C. D. Maranas, and R. Albert
Elucidation of directionality for co-expressed genes: predicting intra-operon termination sites
Bioinformatics, January 15, 2006; 22(2): 209 - 214.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Z. Liu, F. Mao, J.-t. Guo, B. Yan, P. Wang, Y. Qu, and Y. Xu
Quantitative evaluation of protein-DNA interactions using an optimized knowledge-based potential
Nucleic Acids Res., January 26, 2005; 33(2): 546 - 558.
[Abstract] [Full Text] [PDF]


Home page
Adv. Dent. Res.Home page
W.P. Kuo
Overview of Bioinformatics and its Application to Oral Genomics
Adv. Dent. Res., December 1, 2003; 17(1): 89 - 94.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
F. Katagiri and J. Glazebrook
Local Context Finder (LCF) reveals multidimensional relationships among mRNA expression profiles of Arabidopsis responding to pathogen infection
PNAS, September 16, 2003; 100(19): 10842 - 10847.
[Abstract] [Full Text] [PDF]



Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.