Bioinformatics Vol. 17 no. 9 2001
Pages 763-774
© 2001 Oxford University Press

Principal component analysis for clustering gene expression data

K. Y. Yeung* and W. L. Ruzzo

Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA

Received on January 1, 2001; revised on May 3, 2001; accepted on May 23, 2001

Motivation: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes.

Results: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has a different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.
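
Illustration: The following toy sketch shows one way the first PC can miss cluster structure. It is not the authors' data, algorithms, or evaluation measure. It assumes a hypothetical two-dimensional data set in which one axis carries most of the variance but no cluster information, and it replaces a real clustering algorithm with a simple split at zero along each PC. All function names are made up for this example.

```python
import math
import random


def make_toy_data(n_per_cluster=200, seed=0):
    """Two synthetic clusters that differ only along the low-variance axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, y_center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_cluster):
            x = rng.gauss(0.0, 10.0)      # high variance, no cluster signal
            y = rng.gauss(y_center, 0.3)  # low variance, separates the clusters
            points.append((x, y))
            labels.append(label)
    return points, labels


def pca_2d(points):
    """Closed-form PCA for 2-D data: returns eigenvalues and PC scores."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    eigvals = (half_trace + disc, half_trace - disc)
    # Angle of the leading eigenvector (direction of largest variance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    scores = [
        (x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
        for x, y in centered
    ]
    return eigvals, scores


def split_agreement(values, labels):
    """Agreement between a split at zero and the true labels.

    Cluster names are arbitrary, so the better of the two labelings is used.
    """
    predicted = [1 if v > 0 else 0 for v in values]
    acc = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
    return max(acc, 1.0 - acc)


points, labels = make_toy_data()
eigvals, scores = pca_2d(points)
variance_fraction_pc1 = eigvals[0] / (eigvals[0] + eigvals[1])
agreement_pc1 = split_agreement([s[0] for s in scores], labels)
agreement_pc2 = split_agreement([s[1] for s in scores], labels)

print("fraction of variance on PC1:", round(variance_fraction_pc1, 3))
print("agreement with true clusters using PC1 only:", round(agreement_pc1, 3))
print("agreement with true clusters using PC2 only:", round(agreement_pc2, 3))
```

In this construction PC1 holds nearly all of the variance, yet a split along it agrees with the true clusters at roughly chance level, while the low-variance PC2 separates them almost perfectly. Real expression data are higher dimensional and less clean, so this only illustrates the mechanism.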

Contact: kayee{at}cs.washington.edu

Supplementary information: http://www.cs.washington.edu/homes/kayee/pca

* To whom correspondence should be addressed.


