Bioinformatics Advance Access originally published online on June 24, 2004
Bioinformatics 2004 20(17):3137-3145; doi:10.1093/bioinformatics/bth373
Bioinformatics vol. 20 issue 17 © Oxford University Press 2004; all rights reserved.
Constrained clusters of gene expression profiles with pathological features
1 Undergraduate Program for Bioinformatics and Systems Biology and 2 Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Bunkyo, Tokyo, Japan, 3 Department of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan and 4 Department of Surgery and Clinical Oncology, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan
Received on December 28, 2003; revised on April 22, 2004; accepted on June 18, 2004
Motivation: Gene expression profiles should be useful in distinguishing variations in disease, since they accurately reflect the status of cells. The primary clustering of gene expression reveals the genotypes that are responsible for the proximity of members within each cluster, while further clustering elucidates the pathological features of the individual members of each cluster. However, because the first clustering process and the second classification step, in which the features are associated with clusters, are performed independently, the initial set of clusters may omit genes that are associated with pathologically meaningful features. It is therefore important to devise a way of identifying gene expression clusters that are associated with pathological features.
Results: We present a novel technique, itemset constrained clustering (IC-Clustering), which computes the optimal cluster, that is, the one that maximizes the interclass variance of gene expression between groups. The groups are subject to the constraint that only divisions that can be expressed using common features are allowed. This constraint automatically labels each cluster with the set of pathological features that characterizes it. When applied to liver cancer datasets, IC-Clustering revealed informative gene expression clusters that could be annotated with various pathological features, such as 'tumor and man' or 'except tumor and normal liver function'. In contrast, the k-means method overlooked these clusters.
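The idea of restricting clusters to those describable by a set of shared features can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a brute-force enumeration over small itemsets, and it assumes that "interclass variance" means the between-group sum of squares for the split into samples that carry every feature in the itemset and all remaining samples. The function names, the `max_size` parameter, and the toy data are hypothetical.

```python
from itertools import combinations


def interclass_variance(profiles, members):
    """Between-group sum of squares for the split members / non-members.

    Assumed definition: sum over both groups of
    group_size * squared distance(group centroid, overall centroid).
    """
    dim = len(profiles[0])
    inside = [p for i, p in enumerate(profiles) if i in members]
    outside = [p for i, p in enumerate(profiles) if i not in members]
    if not inside or not outside:
        return 0.0  # a split with an empty side separates nothing

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    overall = centroid(profiles)
    total = 0.0
    for group in (inside, outside):
        c = centroid(group)
        total += len(group) * sum((c[d] - overall[d]) ** 2 for d in range(dim))
    return total


def best_itemset_cluster(profiles, features, max_size=2):
    """Exhaustively search itemsets of up to max_size features.

    Returns (score, itemset, member indices) for the itemset whose
    matching samples give the largest interclass variance.
    """
    all_items = sorted(set().union(*features))
    best = (0.0, (), frozenset())
    for size in range(1, max_size + 1):
        for itemset in combinations(all_items, size):
            # A sample belongs to the cluster only if it has every feature.
            members = frozenset(
                i for i, f in enumerate(features) if set(itemset) <= f
            )
            score = interclass_variance(profiles, members)
            if score > best[0]:
                best = (score, itemset, members)
    return best


# Hypothetical toy data: five samples, two expression values each.
profiles = [[5.0, 1.0], [5.2, 0.9], [1.0, 1.1], [0.9, 1.0], [1.1, 0.8]]
features = [
    {"tumor", "male"},
    {"tumor", "male"},
    {"tumor", "female"},
    {"normal", "male"},
    {"normal", "female"},
]

score, itemset, members = best_itemset_cluster(profiles, features)
print(itemset, sorted(members), round(score, 3))
```

On this toy data the search selects the itemset `("male", "tumor")`, which covers samples 0 and 1, so the cluster comes with its own feature label. Exhaustive enumeration grows combinatorially with the number of features, so a practical implementation would need pruning, which this sketch does not attempt.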
Supplementary information: Our dataset is available on the following web page: http://love2.aist-nara.ac.jp/laboratory/data_download.html.
Contact: sesejun{at}gi.k.u-tokyo.ac.jp
* To whom correspondence should be addressed.