Bioinformatics Advance Access originally published online on February 24, 2006
Bioinformatics 2006 22(10):1259-1268; doi:10.1093/bioinformatics/btl065
Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data
1 Department of Mathematics, China Medical University, Shenyang, China
2 Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA
*To whom correspondence should be addressed.
Motivation: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering.
Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the first-step clustering of the genes with known functions fixed, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them either to one of the clusters obtained in the first step or to a new cluster. A simulation study and an application to gene function prediction in yeast demonstrate the advantage of our proposal over the standard method.
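As an illustration only, the shrinkage idea and the second-step assignment described above might be sketched as follows. The abstract does not give the form of the metric, so the multiplicative shrinkage, the `shrink` parameter, the Euclidean base distance, the medoid-based assignment and the `new_cluster_threshold` cut-off are all assumptions made for this sketch, and every function name is hypothetical.

```python
from math import dist


def expression_distance(x, y):
    """Euclidean distance between two expression profiles (an assumed choice)."""
    return dist(x, y)


def shrinkage_distance(x, y, funcs_x, funcs_y, shrink=0.5):
    """Expression distance, shrunk towards 0 only if the genes share a function.

    `shrink` in [0, 1] is a hypothetical tuning parameter: 0 leaves the
    distance unchanged, 1 collapses it to 0. The multiplicative form is an
    assumption, not taken from the paper.
    """
    d = expression_distance(x, y)
    if set(funcs_x) & set(funcs_y):  # at least one common annotation
        return (1.0 - shrink) * d
    return d


def assign_unannotated(profile, medoids, new_cluster_threshold):
    """Step-2 sketch: pick the nearest step-1 medoid by plain expression
    distance, or return None to signal that a new cluster should be opened.
    The threshold rule is an assumption for illustration."""
    best_label, best_d = None, float("inf")
    for label, medoid in medoids.items():
        d = expression_distance(profile, medoid)
        if d < best_d:
            best_label, best_d = label, d
    return best_label if best_d <= new_cluster_threshold else None


# Two genes at expression distance 5.0 (toy data).
a, b = [0.0, 0.0], [3.0, 4.0]
shared = shrinkage_distance(a, b, {"ribosome"}, {"ribosome", "transport"})  # 2.5
unshared = shrinkage_distance(a, b, {"ribosome"}, {"transport"})            # 5.0
```

In this sketch the first step would pass `shrinkage_distance` to any distance-based clustering routine for the annotated genes, while the second step uses only `expression_distance`, so genes of unknown function are placed without reference to annotations.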
Contact: weip{at}biostat.umn.edu
Received on November 28, 2005; revised on January 16, 2006; accepted on February 20, 2006