

Bioinformatics Advance Access originally published online on February 25, 2005
Bioinformatics 2005 21(10):2548-2549; doi:10.1093/bioinformatics/bti343

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A web-based tool for principal component and significance analysis of microarray data

Alexei A. Sharov, Dawood B. Dudekula and Minoru S. H. Ko*

Developmental Genomics and Aging Section, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD 21224, USA

*To whom correspondence should be addressed.


    Abstract

Summary: We have developed a program for microarray data analysis that uses the false discovery rate for testing statistical significance and principal component analysis, computed by the singular value decomposition method, for detecting global trends in gene-expression patterns. Additional features include analysis of variance with multiple methods for error-variance adjustment, correction of cross-channel correlation for two-color microarrays, identification of genes specific to each cluster of tissue samples, biplots of tissues and the corresponding tissue-specific genes, clustering of genes that are correlated with each principal component (PC), three-dimensional graphics based on the Virtual Reality Modeling Language (VRML) and sharing of PCs between different experiments. The software also supports parameter adjustment, gene search and graphical output of results. Because the software is implemented as a web tool, the speed of analysis does not depend on the power of the client computer.

Availability: The tool can be used on-line or downloaded at http://lgsun.grc.nia.nih.gov/ANOVA/

Contact: kom{at}mail.nih.gov

Global gene-expression analysis with microarrays has become a routine procedure in biomedical research. Although many programs have been developed to support the statistical analysis of microarray results (Kim et al., 2001; Theilhaber et al., 2004; TIGR, 2004 http://www.tigr.org/software/tm4/; Tusher et al., 2001), they do not necessarily include all of the advanced analysis methods. To facilitate the use of these relatively new methods, we developed the NIA Array Analysis software. A complete description of the software, as well as a glossary of technical and statistical terms, is available at http://lgsun.grc.nia.nih.gov/ANOVA/. In this paper, we describe its main features.

The NIA Array Analysis software can be used for both single-color and two-color microarrays, with or without a dye swap. It takes a tab-delimited text file as input and generates output in both graphical and text formats. An additional tool (Arrayjoin) assembles multiple input files from different experiments into one input file. The software can also take an annotation file that hyperlinks each microarray probe to various web resources, including UniGene, TIGR, MGI and the NIA Mouse Gene Index. These gene links allow users to incorporate microarray data into other programs, e.g. GenMAPP for Gene Ontology analysis. All results can be saved as a stand-alone web page for sharing or releasing the data.
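The input format of Arrayjoin is not specified here, so the following is only a minimal sketch of what joining several tab-delimited tables into one input file involves. It assumes, hypothetically, that each file has a header row with a probe-identifier column named `ID` and that only probes present in every file are kept; the function name `join_tab_files` is illustrative and is not part of the NIA software.

```python
import csv
import io
from typing import Dict, List, TextIO


def join_tab_files(files: List[TextIO], id_column: str = "ID") -> str:
    """Merge tab-delimited tables on a shared probe-ID column (hypothetical).

    Data columns are concatenated in file order; probes missing from any
    file are dropped, and probe order follows the first file.
    """
    tables: List[Dict[str, List[str]]] = []
    headers: List[str] = []
    for f in files:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        key = header.index(id_column)
        headers.extend(h for i, h in enumerate(header) if i != key)
        rows: Dict[str, List[str]] = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            rows[row[key]] = [v for i, v in enumerate(row) if i != key]
        tables.append(rows)
    shared = [p for p in tables[0] if all(p in t for t in tables[1:])]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column] + headers)
    for probe in shared:
        merged: List[str] = []
        for t in tables:
            merged.extend(t[probe])
        writer.writerow([probe] + merged)
    return out.getvalue()
```

For example, joining a file with a `liver` column and another with a `heart` column yields one table whose rows contain both values for each shared probe.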

The software offers an optional adjustment of signal intensities for two-color hybridizations. It is based on our observation that signal intensities in one channel (e.g. red) often increase with increasing signal intensities in the other channel (e.g. green), even when the same reference RNA is always used in the red channel. If the readings from the two channels were independent, signal intensities in the reference (red) channel would not vary among experiments; any such variation should therefore be corrected.
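The correction formula itself is not given here. One simple way to express the idea is sketched below, under stated assumptions: intensities are on a log scale, and for each gene the reference (red) values across hybridizations are regressed on the sample (green) values by ordinary least squares, after which the fitted trend is subtracted while the red mean is kept. This is an illustration of the concept, not necessarily the method used by the NIA tool; `correct_cross_channel` is a hypothetical name.

```python
from statistics import mean
from typing import List


def correct_cross_channel(red: List[float], green: List[float]) -> List[float]:
    """Remove the linear dependence of reference-channel log intensities on
    sample-channel log intensities for one gene across hybridizations.

    Hypothetical sketch: assumes log-scale values and an OLS slope.
    """
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("need paired values from at least two hybridizations")
    mg, mr = mean(green), mean(red)
    sxx = sum((g - mg) ** 2 for g in green)
    if sxx == 0:
        return list(red)  # green does not vary: nothing to correct
    slope = sum((g - mg) * (r - mr) for g, r in zip(green, red)) / sxx
    # Subtract the part of red explained by green; the red mean is preserved.
    return [r - slope * (g - mg) for r, g in zip(red, green)]
```

If the red channel tracks the green channel perfectly linearly, the corrected red values collapse to a constant, which is what independent channels with a fixed reference RNA would be expected to show.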

We have implemented single-factor analysis of variance (ANOVA) for testing statistical significance. Testing multiple hypotheses with ANOVA requires some modifications, such as error-variance averaging and use of the false discovery rate (FDR). The average error variance for genes with similar signal intensities is estimated with a sliding window of adjustable size applied to genes sorted by their average signal intensity. Because some genes (outliers) may have unusually high error variance, the genes with the highest variance (the top 1% by default) are excluded from error-variance averaging. To estimate the true error variance, the software offers five error models: (1) actual error variance (each gene is processed independently), (2) intensity-specific average error variance, (3) Bayesian error model (Baldi and Long, 2001), (4) the maximum of the intensity-specific average error variance and the actual error variance and (5) the maximum of the intensity-specific average error variance and the Bayesian error variance. Option (4), the most conservative model, is the default. However, if the error variance of a gene is too high, none of these models is reliable; genes with high error variance (more than five times the average) are therefore tagged for visual examination. Users can also select a more stringent criterion for removing outliers (i.e. a lower z-threshold). The default threshold (z = 8) removes only the most extreme outliers. Because z-values are estimated from the ANOVA results, ANOVA is applied iteratively, with outliers removed in each cycle, until no new outliers are detected.
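The sliding-window averaging and the five error models described above can be sketched as follows. This is an illustration under assumptions, not the NIA implementation: the default window size (101 genes), the per-gene degrees of freedom `df` and the prior weight `prior_df` are hypothetical, and option (3) is approximated by a generic Bayesian-style shrinkage of each gene's variance toward the averaged variance rather than the exact formula of Baldi and Long (2001).

```python
from typing import List, Sequence


def averaged_error_variance(
    intensity: Sequence[float],
    variance: Sequence[float],
    window: int = 101,           # hypothetical default window size
    outlier_fraction: float = 0.01,
) -> List[float]:
    """Intensity-specific average error variance (error model 2).

    Genes are sorted by average intensity and each gene receives the mean
    error variance of the genes in a sliding window centred on it. Genes with
    the highest variances (top 1% by default) are left out of the averages.
    """
    n = len(intensity)
    order = sorted(range(n), key=lambda g: intensity[g])
    n_out = int(n * outlier_fraction)
    by_var = sorted(range(n), key=lambda g: variance[g], reverse=True)
    excluded = set(by_var[:n_out])
    half = window // 2
    averaged = [0.0] * n
    for pos, gene in enumerate(order):
        lo, hi = max(0, pos - half), min(n, pos + half + 1)
        vals = [variance[g] for g in order[lo:hi] if g not in excluded]
        averaged[gene] = sum(vals) / len(vals) if vals else variance[gene]
    return averaged


def model_error_variance(
    actual: Sequence[float],
    averaged: Sequence[float],
    model: int = 4,              # option (4) is the default
    df: int = 2,                 # hypothetical error degrees of freedom per gene
    prior_df: int = 10,          # hypothetical weight of the averaged variance
) -> List[float]:
    """Error variance under options (1) to (5)."""
    # Bayesian-style shrinkage toward the averaged variance (an assumption).
    bayes = [(prior_df * a + df * s) / (prior_df + df) for s, a in zip(actual, averaged)]
    options = {
        1: list(actual),
        2: list(averaged),
        3: bayes,
        4: [max(s, a) for s, a in zip(actual, averaged)],
        5: [max(b, a) for b, a in zip(bayes, averaged)],
    }
    return options[model]


def high_variance_genes(actual, averaged, factor=5.0):
    """Indices of genes whose error variance exceeds factor x the average."""
    return [g for g, (s, a) in enumerate(zip(actual, averaged)) if s > factor * a]
```

Taking the maximum in option (4) means a gene's variance is never smaller than that of its intensity neighbours, which is why it is the conservative choice.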

The FDR is the expected proportion of false positives among genes called significant (Benjamini and Hochberg, 1995; Reiner et al., 2003). Traditional p-values, which are designed for testing a single hypothesis, are not suited to comparing several thousand genes. The Bonferroni correction is not appropriate either, because it is too stringent: it is designed to allow no false positives among significant genes. We have implemented the original method (Benjamini and Hochberg, 1995):

$$\mathrm{FDR}_r = \min_{i \ge r} \frac{N p_i}{i} \qquad (1)$$

where $r$ is the rank of a gene ordered by increasing p-values, $p_i$ is the p-value of the gene with rank $i$ and $N$ is the total number of genes tested. $\mathrm{FDR}_r$ indicates the expected proportion of false positives among all genes with p-values less than or equal to the p-value of the gene of interest.
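A minimal sketch of the Benjamini and Hochberg (1995) calculation, written as the minimum over ranks $i \ge r$ of $N p_i / i$. Walking from the largest p-value downward keeps a running minimum, so the whole computation needs only one sort. Capping values at 1 is an added convention, and the optional `n_tests` argument (a hypothetical parameter) lets N differ from the number of p-values supplied.

```python
from typing import List, Optional, Sequence


def benjamini_hochberg(pvalues: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """FDR for each gene: min over ranks i >= r of N * p_i / i, capped at 1."""
    m = len(pvalues)
    n = m if n_tests is None else n_tests
    order = sorted(range(m), key=lambda g: pvalues[g])  # rank 1 = smallest p
    fdr = [0.0] * m
    running = 1.0
    # From the largest p-value down, so each value is the minimum over i >= r.
    for rank in range(m, 0, -1):
        g = order[rank - 1]
        running = min(running, n * pvalues[g] / rank)
        fdr[g] = running
    return fdr
```

For p-values 0.01, 0.04, 0.03 and 0.5, the FDRs are 0.04, about 0.053, about 0.053 and 0.5; the gene with p = 0.03 inherits the smaller value from the gene ranked after it.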

The software offers two methods for clustering tissue samples and subsequently identifying correlated genes. First, samples (e.g. tissues and cells) are clustered hierarchically using the average distance method. A set of genes specific to each cluster is then identified as follows. For each gene g, we first identify the sample T1(g) with the lowest average expression, E[T1(g)], within the cluster and the sample T2(g) with the highest average expression, E[T2(g)], outside the cluster. The K genes that satisfy E[T1(g)] > E[T2(g)] have higher expression in every sample within the cluster than in any sample outside it. To determine whether the difference E[T1(g)] − E[T2(g)] is statistically significant, we calculate z-values based on the error model and p-values based on the one-tailed normal distribution. Finally, we estimate FDR values using Equation (1), in which N is the minimum of 2K and the total number of genes. Because the K genes represent only half (the positive part) of the normal distribution, K is doubled when estimating the FDR.
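The cluster-specific gene procedure above can be sketched as follows. The text does not say exactly how the z-value is derived from the error model, so this sketch assumes, hypothetically, that both sample means share the gene's error variance and have `n_rep` replicates each, giving a standard error of sqrt(2σ²/n_rep) for the difference. The FDR step applies the minimum-over-ranks formula with N = min(2K, total number of genes), as described.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cluster_specific_genes(
    means: Dict[str, Sequence[float]],   # gene -> average expression per sample
    error_var: Dict[str, float],         # gene -> error variance from the error model
    in_cluster: Sequence[bool],          # True for samples inside the cluster
    n_rep: int = 2,                      # hypothetical replicates per sample
) -> List[Tuple[str, float, float, float]]:
    """Return (gene, z, p, FDR) for genes higher in every in-cluster sample."""
    candidates = []
    for gene, m in means.items():
        e1 = min(v for v, c in zip(m, in_cluster) if c)      # E[T1(g)]: lowest inside
        e2 = max(v for v, c in zip(m, in_cluster) if not c)  # E[T2(g)]: highest outside
        if e1 > e2:
            se = math.sqrt(2.0 * error_var[gene] / n_rep)      # assumed SE of difference
            z = (e1 - e2) / se
            p = 0.5 * math.erfc(z / math.sqrt(2.0))            # one-tailed normal p-value
            candidates.append((gene, z, p))
    k = len(candidates)
    n = min(2 * k, len(means))  # K doubled: only the positive tail was kept
    candidates.sort(key=lambda t: t[2])
    results = []
    running = 1.0
    for rank in range(k, 0, -1):  # minimum over ranks i >= r
        gene, z, p = candidates[rank - 1]
        running = min(running, n * p / rank)
        results.append((gene, z, p, running))
    results.reverse()
    return results
```

Results are returned in order of increasing p-value. A gene that dips below any out-of-cluster sample (such as gene `B` in the test data) is never a candidate.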

Second, we have implemented principal component analysis (PCA). One advantage of PCA is that the principal components are always orthogonal (uncorrelated), whereas other methods (e.g. K-means clustering) often produce redundant, correlated clusters. PCA is computed with the singular value decomposition method, which reduces the dimensionality of both the columns and the rows of the data matrix. This allows samples and genes to be combined in a single graph (a biplot) so that their associations can be analyzed visually (Chapman et al., 2002; Gabriel, 1971). The NIA Array Analysis tool generates interactive two-dimensional (2D) and 3D biplots (Fig. 1). Each gene in a biplot is hyperlinked to its annotation and to a histogram of its expression levels in each sample. For each principal component (PC), we identify two sets of genes: those positively and those negatively correlated with it. A gene is considered correlated with a PC if the degree of gene-expression change associated with that PC exceeds a user-defined threshold.
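A minimal NumPy sketch of PCA by singular value decomposition on a genes × samples matrix of log-expression values, returning biplot coordinates and the genes correlated with each PC. Several choices here are assumptions rather than the tool's documented behaviour: each gene is centred on its mean, samples are placed at principal coordinates and genes at standard coordinates, and the "degree of change" for a gene on PC k is taken, hypothetically, as the range across samples of its rank-1 reconstruction u_gk s_k v_kj. Note that SVD signs are arbitrary, so "positive" and "negative" sets can swap between runs or libraries.

```python
import numpy as np


def pca_biplot(data, n_pc=2, threshold=1.0):
    """PCA of a genes x samples log-expression matrix via SVD (sketch).

    Returns sample coordinates, gene coordinates and, for each PC, the
    indices of genes positively and negatively correlated with it.
    """
    x = np.asarray(data, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)          # centre each gene
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    sample_xy = vt[:n_pc].T * s[:n_pc]               # samples x PCs
    gene_xy = u[:, :n_pc]                            # genes x PCs
    correlated = []
    for k in range(n_pc):
        # Range across samples of the rank-1 term u[g, k] * s[k] * vt[k, j]
        # (hypothetical measure of the change associated with PC k).
        spread = vt[k].max() - vt[k].min()
        change = np.abs(u[:, k]) * s[k] * spread
        big = change > threshold
        pos = [int(g) for g in np.flatnonzero(big & (u[:, k] > 0))]
        neg = [int(g) for g in np.flatnonzero(big & (u[:, k] < 0))]
        correlated.append((pos, neg))
    return sample_xy, gene_xy, correlated
```

With this measure, a gene rising from 1 to 4 across samples along PC 1 gets a change of exactly 3, its observed range, so the threshold can be read in log-expression units.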



Fig. 1. 3D biplot of gene expression during preimplantation mouse development (data from Hamatani et al., 2004).

 
The NIA Array Analysis tool has been used successfully over the past two years (Hamatani et al., 2004; Sharov et al., 2003). We expect this open-source, unrestricted software to be a valuable resource for the research community.

Received on January 14, 2005; revised on February 15, 2005; accepted on February 17, 2005

    REFERENCES

    Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509–519.

    Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B, 57, 289–300.

    Chapman, S., et al. (2002) Using biplots to interpret gene expression patterns in plants. Bioinformatics, 18, 202–204.

    Gabriel, K.R. (1971) The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58, 453–467.

    Hamatani, T., et al. (2004) Dynamics of global gene expression changes during mouse preimplantation development. Dev. Cell, 6, 117–131.

    Kim, S.K., et al. (2001) A gene expression map for Caenorhabditis elegans. Science, 293, 2087–2092.

    Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics, 19, 368–375.

    Sharov, A.A., et al. (2003) Transcriptome analysis of mouse stem cells and early embryos. PLoS Biol., 1, e74.

    Theilhaber, J., et al. (2004) GECKO: a complete large-scale gene expression analysis platform. BMC Bioinformatics, 5, 195.

    TIGR (2004) TM4 microarray software suite.

    Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.




