Bioinformatics Advance Access published online on August 9, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti617
All versions of this article: 21/19/3748 (most recent); bti617v1

© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received June 14, 2005
Revised July 15, 2005
Accepted August 8, 2005

Article

Accounting for probe-level noise in principal component analysis of microarray data

Guido Sanguinetti 1, Marta Milo 2, Magnus Rattray 3, and Neil D. Lawrence 1*

1 Department of Computer Science, Regent Court, 211 Portobello Road, Sheffield, S1 4DP, UK
2 Department of Computer Science, Regent Court, 211 Portobello Road, Sheffield, S1 4DP, UK; Department of Biomedical Science, Addison Building, Western Bank, Sheffield, S10 2TN, UK
3 School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK

* To whom correspondence should be addressed.
Neil D. Lawrence, E-mail: neil{at}dcs.shef.ac.uk


   Abstract

Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate treatment of noise is one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene- and experiment-specific, and can be propagated through an appropriate probabilistic downstream analysis.

Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to ‘denoise’ a microarray dataset, leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained.
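
The idea of letting each measurement carry its own variance can be illustrated with a small sketch. The abstract does not give the model equations, so everything below is an assumption made for illustration: a probabilistic PCA model x_n = W z_n + mu + eps_n, where the noise on entry (n, d) has variance sigma2 + S[n, d] and S holds the known measurement variances. The function name noisy_ppca_em is hypothetical, and sigma2 is held fixed rather than estimated, so this is a simplified EM-style procedure and not the authors' algorithm.

```python
import numpy as np

def noisy_ppca_em(X, S, q, sigma2=0.01, n_iter=100):
    """Sketch of EM for PCA with known per-entry noise variances.

    X: (N, D) data, S: (N, D) measurement variances (assumed known),
    q: number of latent dimensions, sigma2: fixed residual variance
    (an assumption of this sketch; it is not estimated here).
    """
    N, D = X.shape
    B = 1.0 / (sigma2 + S)                      # per-entry precisions
    mu = (B * X).sum(axis=0) / B.sum(axis=0)    # precision-weighted mean
    # Initialise W from standard PCA of the centred data.
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:q].T * (s[:q] / np.sqrt(N))
    for _ in range(n_iter):
        # E-step: Gaussian posterior over each latent point z_n.
        R = X - mu
        WtBW = np.einsum('dq,nd,dr->nqr', W, B, W)
        M = np.linalg.inv(np.eye(q) + WtBW)             # posterior covariances
        m = np.einsum('nqr,dr,nd->nq', M, W, B * R)     # posterior means
        Ezz = M + m[:, :, None] * m[:, None, :]         # second moments
        # M-step: each row of W solves a precision-weighted least squares.
        A = np.einsum('nd,nqr->dqr', B, Ezz)
        r = np.einsum('nd,nq->dq', B * R, m)
        W = np.linalg.solve(A, r[..., None])[..., 0]
        mu = (B * (X - m @ W.T)).sum(axis=0) / B.sum(axis=0)
    return W, mu, m
```

Under these assumptions, a denoised version of the data is the posterior-mean reconstruction mu + m @ W.T, in which entries with large measurement variance are pulled more strongly towards the low-dimensional fit than precisely measured ones.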

Availability: The software used in the paper is available from http://www.cs.man.ac.uk/~liux/puma. The microarray data is deposited in the NCBI database.




This article has been cited by other articles:

E. C. Pacheco-Pinedo, M. T. Budak, U. Zeiger, L. H. Jorgensen, S. Bogdanovich, H. D. Schroder, N. A. Rubinstein, and T. S. Khurana
Transcriptional and functional differences in stem cell populations isolated from extraocular and limb muscles
Physiol Genomics, March 3, 2009; 37(1): 35-42.

J. Kim, D. G. Bates, I. Postlethwaite, P. Heslop-Harrison, and K.-H. Cho
Linear time-varying models can reveal non-linear interactions of biomolecular regulatory networks using multiple time-series data
Bioinformatics, May 15, 2008; 24(10): 1286-1292.

G. Sanguinetti, J. Noirel, and P. C. Wright
MMG: a probabilistic tool to identify submodules of metabolic pathways
Bioinformatics, April 15, 2008; 24(8): 1078-1084.

G. Sanguinetti, N. D. Lawrence, and M. Rattray
Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities
Bioinformatics, November 15, 2006; 22(22): 2775-2781.

X. Liu, M. Milo, N. D. Lawrence, and M. Rattray
Probe-level measurement error improves accuracy in detecting differential gene expression
Bioinformatics, September 1, 2006; 22(17): 2107-2113.

G. Sanguinetti, M. Rattray, and N. D. Lawrence
A probabilistic dynamical model for quantitative inference of the regulatory mechanism of transcription
Bioinformatics, July 15, 2006; 22(14): 1753-1759.

M. Rattray, X. Liu, G. Sanguinetti, M. Milo, and N. D. Lawrence
Propagating uncertainty in microarray data analysis
Brief Bioinform, March 1, 2006; 7(1): 37-47.



