Bioinformatics Advance Access originally published online on May 1, 2008
Bioinformatics 2008 24(15):1735-1736; doi:10.1093/bioinformatics/btn211
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A correction for estimating error when using the Local Pooled Error Statistical Test

Carl Murie 1 and Robert Nadon 1,2,*

1McGill University and Genome Quebec Innovation Centre, 740 avenue du Docteur Penfield, Montreal, Quebec, Canada, H3A 1A4 and 2McGill University, Department of Human Genetics, Montreal, 1205 avenue du Docteur Penfield N5/13, Quebec, Canada, H3A 1A4

*To whom correspondence should be addressed.


    ABSTRACT

Jain et al. introduced the Local Pooled Error (LPE) statistical test, designed for microarray gene-expression data with small sample sizes. On the basis of an asymptotic proof, the test multiplicatively adjusts the squared standard error for a test of differences between two classes of observations by π/2, because medians rather than means are used as measures of central tendency. At small sample sizes, however, the adjustment is upwardly biased, producing fewer small P-values than expected and a consequent loss of statistical power. We present an empirical correction to the adjustment factor that removes the bias and produces theoretically expected P-values when distributional assumptions are met. Our adjusted LPE measure should prove useful to ongoing methodological studies designed to improve the LPE's performance for microarray and proteomics applications, and to future work with other high-throughput biotechnologies.

Availability: The software is implemented in the R language and can be downloaded from the Bioconductor project website (http://www.bioconductor.org).

Contact: robert.nadon{at}mcgill.ca


    1 INTRODUCTION
The Local Pooled Error (LPE) statistical test (Jain et al., 2003) is one of a number of small-n tests developed for microarray gene-expression data that borrow strength from other genes when estimating the error variance associated with differential expression between two classes of observations. The test has the added advantage of being resistant to outliers because it estimates expression differences between medians rather than means.

One problematic assumption of the test, however, is that error variance varies solely (or at least primarily) as a function of signal intensity. This assumption is called into question by empirical evidence that error variability in microarray data varies across genes. In this circumstance, the LPE test will underestimate P-values for some genes and overestimate them for others. Modifications of the LPE test that address this limitation have been proposed for microarrays and for mass spectrometry proteomics.

Cho and Lee (2004), for example, propose an Empirical Bayes approach for microarray data in which LPE error estimates are used as priors to be updated by gene-specific information. Park et al. (2007) take probe error heterogeneity into account in a method that combines LPE and weighted ANOVA error estimates for tiling expression arrays. For mass spectrometry proteomics applications, Cho et al.'s (2007) Lw statistic estimates error variance as a weighted function of the LPE algorithm and individual protein error estimates.

The effectiveness of these and other modifications to the LPE, e.g. Allet et al. (2004), depends crucially on the test's assumption that the multiplicative adjustment to error variance estimates, which is needed because differential expression is derived from medians rather than means, is independent of sample size. We demonstrate, however, that the adjustment is overly conservative for small sample sizes, and we propose an empirical correction.


    2 LOCAL POOLED ERROR
The LPE method pools variance estimates of genes with similar intensities to obtain an improved error estimate with increased degrees of freedom. A calibration curve of variance versus mean intensity is generated for each group, and each gene's median intensity is used to read its variance estimate off the calibration curve. A multiplicative adjustment of π/2 is then applied to this variance estimate for the purpose of statistical testing.
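A minimal Python sketch of the pooling idea, assuming genes are grouped into equal-count bins by mean intensity and the per-gene variances within each bin are simply averaged. The published LPE procedure differs in its details (how bins are formed and how the curve is smoothed); the function names `calibration_curve` and `lookup_variance` are hypothetical.

```python
import numpy as np

def calibration_curve(x, n_bins=20):
    """Simplified variance-vs-intensity calibration curve for ONE group.

    x has shape (genes, replicates). Genes are sorted by mean intensity,
    split into equal-count bins, and their variances averaged per bin.
    """
    mean_int = x.mean(axis=1)                 # gene-level mean intensity
    gene_var = x.var(axis=1, ddof=1)          # gene-level replicate variance
    order = np.argsort(mean_int)
    centers, pooled = [], []
    for chunk in np.array_split(order, n_bins):   # bins from low to high intensity
        centers.append(mean_int[chunk].mean())
        pooled.append(gene_var[chunk].mean())     # pooled variance for the bin
    return np.array(centers), np.array(pooled)

def lookup_variance(median_intensity, centers, pooled):
    """Read a gene's variance off the curve at its median intensity (linear interpolation)."""
    return float(np.interp(median_intensity, centers, pooled))

# Illustrative data: 5000 genes, 3 replicates, constant replicate SD of 0.1.
rng = np.random.default_rng(0)
mu = rng.normal(7.0, 1.0, size=(5000, 1))
x = mu + rng.normal(0.0, 0.1, size=(5000, 3))
centers, pooled = calibration_curve(x)
v0 = lookup_variance(np.median(x[0]), centers, pooled)
```

Because the simulated replicate variance is constant (0.01), the curve should be roughly flat near that value; with real data it would typically change with intensity.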

The LPE z-statistic is as follows:


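$$z = \frac{\mathrm{Med}_1 - \mathrm{Med}_2}{\sigma_{\mathrm{pooled}}} \qquad (1)$$

where

$$\sigma_{\mathrm{pooled}}^2 = \frac{\pi}{2}\left(\frac{\sigma_1^2(\mathrm{Med}_1)}{n_1} + \frac{\sigma_2^2(\mathrm{Med}_2)}{n_2}\right) \qquad (2)$$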

$\sigma_1^2(\mathrm{Med}_1)$ and $\sigma_2^2(\mathrm{Med}_2)$ are the variances derived from each group's calibration curve at the median of the gene's intensities in that group, and $n_i$ is the number of replicates in group $i$. The probability associated with the z-statistic under the null hypothesis is obtained from the standard normal distribution.
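A short sketch of the original LPE statistic, assuming the pooled variance is π/2 times the sum of each group's calibration variance divided by its replicate count, and assuming a two-sided P-value. The inputs (medians 8.2 and 8.0, calibration variances 0.01, three replicates per group) are made up for illustration, and `lpe_z` and `two_sided_p` are hypothetical names.

```python
import math
from statistics import NormalDist

def lpe_z(med1, med2, var1, var2, n1, n2):
    """LPE z-statistic with the pi/2 median adjustment on the pooled variance."""
    pooled_sd = math.sqrt((math.pi / 2) * (var1 / n1 + var2 / n2))
    return (med1 - med2) / pooled_sd

def two_sided_p(z):
    """Two-sided P-value from the standard normal distribution (assumed here)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

z = lpe_z(med1=8.2, med2=8.0, var1=0.01, var2=0.01, n1=3, n2=3)
p = two_sided_p(z)
```

With these inputs the statistic falls just short of the conventional 0.05 cutoff.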

A proof by Mood et al. (1974) shows that, for normal data, the ratio of the squared standard error of the median to that of the mean is asymptotically π/2. Figure 1 shows that the ratio converges to π/2 for large sample sizes (around 100) but is less than π/2 for small sample sizes (3 to 10). At small sample sizes the ratio also alternates between lower and higher values depending on whether the sample size is even or odd. This fluctuation arises from the way the median is computed: with an odd sample size the median is the middle value of the ordered sample, whereas with an even sample size it is the mean of the two middle values. Taking a single middle value (odd sample size) is more variable than averaging the two middle values (even sample size) (Stuart and Ord, 1994).
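The small-sample behaviour in Figure 1 can be approximated by Monte Carlo simulation. This sketch assumes standard normal data and relies on `numpy.median`, which averages the two middle values for even n, matching the definition above. The function name is hypothetical, and the printed ratios will vary slightly with the seed and the number of simulated samples.

```python
import numpy as np

def median_mean_variance_ratio(n, reps=50_000, seed=0):
    """Monte Carlo estimate of Var(sample median) / Var(sample mean) for N(0, 1) data."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    var_median = np.var(np.median(samples, axis=1), ddof=1)
    var_mean = np.var(np.mean(samples, axis=1), ddof=1)
    return float(var_median / var_mean)

ratios = {n: median_mean_variance_ratio(n) for n in (3, 4, 5, 6, 51)}
for n, r in ratios.items():
    print(f"n={n:3d}  ratio={r:.3f}  (pi/2 = {np.pi / 2:.3f})")
```

The odd/even alternation shows up directly: the ratio for n = 4 falls below that for n = 3, and n = 5 rises again.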


Fig. 1. The ratio of the variance of sample medians to that of sample means across a range of sample sizes. Sampling was repeated 1000 times for each sample size (ranging from 3 to 1000).

 

    3 MODIFICATION OF THE LPE METHOD
Replacing the π/2 adjustment with an empirically estimated variance ratio, $c_i$, that depends on sample size corrects the bias. The π/2 term in the LPE z-statistic is replaced by the empirically generated ratio of the sampling variance of the median to the sampling variance of the mean. Equation (1) then becomes:


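$$z = \frac{\mathrm{Med}_1 - \mathrm{Med}_2}{\sqrt{c_1\,\dfrac{\sigma_1^2(\mathrm{Med}_1)}{n_1} + c_2\,\dfrac{\sigma_2^2(\mathrm{Med}_2)}{n_2}}} \qquad (3)$$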

The parameters $c_1$ and $c_2$ are the ratios of the sampling variance of the median to that of the mean for the number of replicates in each group (Fig. 1).
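A sketch of the adjusted statistic, assuming each $c_i$ is estimated by simulating normal data with the group's replicate count. `empirical_c` and `adjusted_lpe_z` are hypothetical names, not the authors' Bioconductor code, and the inputs (medians 8.2 and 8.0, calibration variances 0.01, three replicates per group) are made up for illustration.

```python
import math
from functools import lru_cache
from statistics import NormalDist

import numpy as np

@lru_cache(maxsize=None)
def empirical_c(n, reps=200_000, seed=1):
    """Hypothetical helper: Monte Carlo Var(median) / Var(mean) for n normal replicates."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal((reps, n))
    return float(np.var(np.median(s, axis=1)) / np.var(np.mean(s, axis=1)))

def adjusted_lpe_z(med1, med2, var1, var2, n1, n2):
    """z-statistic with pi/2 replaced by group-specific factors c1 and c2."""
    c1, c2 = empirical_c(n1), empirical_c(n2)
    return (med1 - med2) / math.sqrt(c1 * var1 / n1 + c2 * var2 / n2)

z_adj = adjusted_lpe_z(8.2, 8.0, 0.01, 0.01, 3, 3)
p_adj = 2.0 * (1.0 - NormalDist().cdf(abs(z_adj)))  # two-sided P-value (assumed)
```

Because $c_i$ < π/2 for three replicates, the same difference in medians yields a larger |z| than under the π/2 adjustment. The cache avoids re-simulating $c_i$ for every gene that shares a replicate count.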

Figure 2 shows that the LPE test has a lower than expected false positive rate (FPR), which fluctuates between even and odd sample sizes (average FPR of 0.030 and 0.022 for odd and even sample sizes, respectively) in a manner similar to the ratio of variances in Figure 1. The LPE method also yields a non-uniform P-value distribution with fewer small P-values than expected. Because the π/2 adjustment inflates the variance by too large a factor, the LPE test statistics are smaller in magnitude than they should be, which shifts the P-value distribution toward larger values. In contrast, the adjusted LPE test produces the theoretically expected P-values.
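The direction of the FPR bias can be checked with a small null simulation in the spirit of Fig. 2. To isolate the adjustment factor, this sketch assumes the calibration-curve variance equals the true replicate variance (no curve is fitted), treats the 0.1 in N(µ, 0.1) as a standard deviation, and uses a two-sided test at P ≤ 0.05. Because of these simplifications its FPRs need not match the values reported above; `var_ratio` and `null_fpr` are hypothetical names.

```python
import math
from statistics import NormalDist

import numpy as np

Z_CRIT = NormalDist().inv_cdf(0.975)  # |z| cutoff for a two-sided P <= 0.05

def var_ratio(n, reps=200_000, seed=0):
    """Monte Carlo Var(median) / Var(mean) for n standard normal replicates."""
    s = np.random.default_rng(seed).standard_normal((reps, n))
    return float(np.var(np.median(s, axis=1)) / np.var(np.mean(s, axis=1)))

def null_fpr(n, factor, genes=200_000, sigma=0.1, seed=42):
    """Fraction of null genes called significant when the variance is scaled by `factor`."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(7.0, 1.0, size=(genes, 1))          # gene-level means
    g1 = mu + rng.normal(0.0, sigma, size=(genes, n))   # group 1 replicates
    g2 = mu + rng.normal(0.0, sigma, size=(genes, n))   # group 2: same mean (null)
    diff = np.median(g1, axis=1) - np.median(g2, axis=1)
    # Simplification: the known variance sigma**2 stands in for the calibration curve.
    sd = math.sqrt(factor * (sigma**2 / n + sigma**2 / n))
    return float(np.mean(np.abs(diff) / sd >= Z_CRIT))

for n in (3, 4, 5, 6):
    print(f"n={n}: LPE FPR={null_fpr(n, math.pi / 2):.3f}  "
          f"adjusted FPR={null_fpr(n, var_ratio(n)):.3f}")
```

Under these assumptions the π/2 factor gives an FPR below 0.05 that dips further at even sample sizes, while the empirical factor brings it close to the nominal level.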


Fig. 2. (a) FPR for the LPE and adjusted LPE methods using simulated data with no differentially expressed genes, evaluated at a P ≤ 0.05 threshold. The LPE showed variable and low FPR. In contrast, the adjusted LPE showed appropriate FPR for all sample sizes. (b) The adjusted LPE, but not the LPE, shows the theoretically expected uniform P-value distribution. Each dataset had 10 000 genes, with each gene's replicate intensities drawn from a N(µ, 0.1) distribution, where µ was drawn from a N(7, 1) distribution.

 
Figure 3 summarizes the results of the LPE and adjusted LPE methods applied to the HGU95 Affymetrix spike-in dataset (www.affymetrix.com). The HGU95 data are based on a 14 × 14 Latin square design of ‘spiked-in’ transcripts (14 concentrations per microarray chip × 14 groups) with three replicates for each group. The concentrations of the ‘spiked-in’ transcripts were doubled for each consecutive group (0 and 0.25 to 1024 pM inclusive). To assess the performance of the statistical tests, we used the FPR; the true positive rate (TPR), which is the proportion of transcripts correctly identified as differentially expressed; and the partial area under the curve (pAUC), which measures the area under a receiver operating characteristic (ROC) curve below a false positive cutoff of 0.05. The pAUC ranges from 0 (worst performance) to 1 (perfect performance).
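The TPR, FPR and pAUC summaries can be computed from per-transcript P-values and known spike-in labels. This sketch assumes the pAUC is the ROC area up to an FPR of 0.05 divided by 0.05 (so it lies between 0 and 1), with ties in P-values broken by input order; the helper names and the random example data are hypothetical.

```python
import numpy as np

def fpr_tpr(pvals, is_de, alpha=0.05):
    """FPR among null transcripts and TPR among spiked-in transcripts at P <= alpha."""
    called = pvals <= alpha
    return float(np.mean(called[~is_de])), float(np.mean(called[is_de]))

def partial_auc(pvals, is_de, max_fpr=0.05):
    """ROC area for FPR in [0, max_fpr], rescaled by 1 / max_fpr to lie in [0, 1]."""
    labels = is_de[np.argsort(pvals, kind="stable")]   # most significant first
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~labels) / (~labels).sum()))
    keep = fpr <= max_fpr
    # Past the last kept point the ROC curve runs horizontally at that TPR.
    x = np.append(fpr[keep], max_fpr)
    y = np.append(tpr[keep], tpr[keep][-1])
    area = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)  # trapezoidal rule
    return float(area / max_fpr)

# Hypothetical example: 50 spiked-in transcripts among 1000.
rng = np.random.default_rng(0)
is_de = np.zeros(1000, dtype=bool)
is_de[:50] = True
p = np.where(is_de, rng.uniform(0, 0.01, 1000), rng.uniform(0, 1, 1000))
print(fpr_tpr(p, is_de), partial_auc(p, is_de))
```

Restricting the area to low FPRs focuses the comparison on the region of the ROC curve that matters when only a small number of false positives can be tolerated.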


Fig. 3. P-value histograms and boxplots of FPR, TPR and pAUC from the LPE and adjusted LPE methods applied to the HGU95 Latin square dataset. The data were normalized using six different normalization methods (labeled by row).

 
The adjusted LPE method has higher TPRs and pAUCs across all normalization methods. It also has a more uniform P-value distribution and an FPR closer to the expected value of 0.05 (using a P-value cutoff of 0.05) than the original LPE method. We have applied the LPE and adjusted LPE methods to other simulated and experimental datasets (data not shown), and the adjusted LPE method consistently exhibits higher power and generates more uniform P-value distributions.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Joaquin Dopazo

Received on November 17, 2007; revised on March 28, 2008; accepted on April 25, 2008

    REFERENCES

    Allet N, et al. In vitro and in silico processes to identify differentially expressed proteins. Proteomics (2004) 4:2333–2351.

    Cho H, Lee JK. Bayesian hierarchical error model for analysis of gene expression data. Bioinformatics (2004) 20:2016–2025.

    Cho H, et al. Statistical identification of differentially labeled peptides from liquid chromatography tandem mass spectrometry. Proteomics (2007) 7:3681–3692.

    Jain N, et al. Local pooled error test for identifying differentially expressed genes with a small number of replicated microarrays. Bioinformatics (2003) 19:1945–1951.

    Mood AM, et al. Introduction to the Theory of Statistics (1974) 3rd edn. New York: McGraw-Hill.

    Park T, et al. Error-pooling-based statistical methods for identifying novel temporal replication profiles of human chromosomes observed by DNA tiling arrays. Nucleic Acids Res. (2007) 35:e69.

    Stuart A, Ord K. Kendall's Advanced Theory of Statistics, Vol. 1, Distribution Theory (1994) 6th edn. London: Edward Arnold.

