Bioinformatics Advance Access originally published online on August 9, 2005
Bioinformatics 2005 21(19):3748-3754; doi:10.1093/bioinformatics/bti617
Accounting for probe-level noise in principal component analysis of microarray data
1 Department of Computer Science, Regent Court, 211 Portobello Road, Sheffield S1 4DP, UK
2 School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK
3 Department of Biomedical Science, Addison Building, Western Bank, Sheffield S10 2TN, UK
*To whom correspondence should be addressed.
Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis.
Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to denoise a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.
Availability: The software used in the paper is available from http://www.bioinf.man.ac.uk/resources/puma. The microarray data are deposited in the NCBI database.
Contact: neil{at}dcs.shef.ac.uk
Received on June 14, 2005; revised on July 15, 2005; accepted on August 8, 2005