Bioinformatics Advance Access originally published online on August 25, 2006
Bioinformatics 2006 22(21):2699-2701; doi:10.1093/bioinformatics/btl459
An application for assessing quality of RNA hybridized to Affymetrix GeneChips
1 Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298-0032, USA
2 Technische Universität Chemnitz, 09107 Chemnitz, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: This paper describes a stand-alone application for estimating the 3' to 5' ratio by fitting a mixed effects model to the interior pixel intensities of perfect match probes for selected control probe sets from an Affymetrix *.DAT file. The effectiveness of this method was demonstrated previously by an application of the method to two microarray datasets for which external verification of RNA quality was known. This application provides a more objective assessment of sample quality in that both a point estimate and 95% confidence interval about the 3' to 5' ratio are provided.
Availability: The software and installation instructions are freely available for download at http://www.people.vcu.edu/~kjarcher/Research/Data.htm
Contact: kjarcher{at}vcu.edu
| INTRODUCTION |
|---|
Affymetrix eukaryotic GeneChips include probe sets that interrogate both the 3' and 5' ends of selected transcripts. Once a sample has been hybridized to an Affymetrix GeneChip, probe set expression summaries are obtained and the ratios of the 3' to 5' signal intensities are calculated for each internal control gene, as a quality assessment demonstrating the degree to which the gene was transcribed (Affymetrix, 2001). Many have recommended that the 3' to 5' ratio of internal control genes be less than some predetermined threshold, such as 3. A large 3' to 5' ratio is considered to be indicative of a problem during RNA extraction, cDNA synthesis reaction and/or IVT/Biotin labeling reaction steps.
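The conventional check described above reduces to a single comparison per control gene. A minimal sketch, assuming the 3' and 5' probe set signal summaries are already available on the raw (not log) scale; the function names, the example signal values, and the default cutoff of 3 are illustrative and are not taken from Affymetrix software:

```python
def three_to_five_ratio(signal_3p: float, signal_5p: float) -> float:
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    if signal_5p <= 0:
        raise ValueError("5' signal must be positive")
    return signal_3p / signal_5p


def passes_threshold(signal_3p: float, signal_5p: float,
                     threshold: float = 3.0) -> bool:
    """True when the 3'/5' ratio is below the chosen cutoff (assumed: 3)."""
    return three_to_five_ratio(signal_3p, signal_5p) < threshold


# Hypothetical signal summaries for one control gene.
print(passes_threshold(1200.0, 800.0))   # ratio 1.5, below the cutoff
print(passes_threshold(4000.0, 1000.0))  # ratio 4.0, above the cutoff
```

A rule of this kind gives a yes/no answer with no statement of uncertainty about the ratio itself.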
Recently, the use of mixed effects models was proposed for estimating the 3' to 5' ratio, which permits estimation of a confidence interval about the point estimate of interest (Archer et al., 2006). This model uses the interior pixel level data from the resulting *.DAT file and takes into account the correlation structure of the probes within probe sets, and of probe sets interrogating the same transcript. As described previously, since pixel level intensities share the same level of a classification factor, the mixed effects model is
$$y_{ijklm} = \mu + \alpha_j + \gamma_{l(k)} + \delta_{m(kl)} + \beta x_{klm} + \varepsilon_{ijklm} \qquad (1)$$

where, for the $i$th pixel intensity $y_{ijklm}$,

- $\mu$ represents the overall mean;
- $\alpha_j$ represents the fixed effect associated with the $j$th end of the transcript;
- $\gamma_{l(k)}$ represents the effect associated with the $l$th probe set nested within the $k$th gene;
- $\delta_{m(kl)}$ represents the random effect of the $m$th PM probe nested within the $l$th probe set nested within the $k$th gene;
- $x_{klm}$ represents the log2 transformed percent GC content for the $m$th probe;
- $\beta$ represents the regression coefficient associated with the log2 transformed percent GC content; and
- $\varepsilon_{ijklm}$ represents the error for the $i$th pixel; the errors are assumed to be $N(0, \sigma^2)$, and $\varepsilon_{ijklm}$ and $\varepsilon_{i'jklm}$ are correlated.

$\gamma_{l(k)}$ and $\delta_{m(kl)}$ are treated as random effects and assumed to be normally distributed and independent of each other and of $\varepsilon_{ijklm}$. The variance-covariance structure is such that the $k$ genes are independent, but covariance of probes and probe sets within the same gene is allowed. The group covariance structure used was compound symmetry. The effectiveness of this method was demonstrated previously by application of the method to two microarray datasets for which external verification of RNA quality was known (Archer et al., 2006). The following control probe sets on the Affymetrix HG-U133A and HG-Focus GeneChips that interrogate GAPDH, β-actin, and ISGF were used: AFFX-HUMGAPDH/M33197_3_at, AFFX-HUMGAPDH/M33197_5_at, AFFX-HSAC07/X00351_3_at, AFFX-HSAC07/X00351_5_at, AFFX-HUMISGF3A/M97935_3_at, and AFFX-HUMISGF3A/M97935_5_at. Under conditions of good quality RNA, the confidence interval about the 3' to 5' ratio should include 1; under conditions of poor quality RNA, the confidence interval will not include 1.
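The decision rule can be sketched as follows. The sketch assumes, for illustration only, that the fitted model yields an estimate of the 3' minus 5' difference on the log2 scale together with its standard error, and that the ratio is obtained by back-transforming with a power of 2. The function names, the input values, and the way the critical value is supplied are hypothetical and are not taken from the application:

```python
from typing import NamedTuple


class RatioInterval(NamedTuple):
    estimate: float
    lower: float
    upper: float


def ratio_interval(log2_diff: float, std_err: float, t_crit: float) -> RatioInterval:
    """Back-transform a log2-scale difference and its interval to a ratio.

    Assumption: log2_diff is the estimated 3' minus 5' effect on the log2
    scale, and t_crit is the two-sided critical value chosen by the analyst.
    """
    if std_err < 0 or t_crit <= 0:
        raise ValueError("std_err must be >= 0 and t_crit must be > 0")
    half_width = t_crit * std_err
    return RatioInterval(
        estimate=2.0 ** log2_diff,
        lower=2.0 ** (log2_diff - half_width),
        upper=2.0 ** (log2_diff + half_width),
    )


def includes_one(interval: RatioInterval) -> bool:
    """True when the interval for the 3'/5' ratio covers 1."""
    return interval.lower <= 1.0 <= interval.upper


# Hypothetical estimates for two samples.
good = ratio_interval(log2_diff=0.0, std_err=0.1, t_crit=4.303)
poor = ratio_interval(log2_diff=2.0, std_err=0.2, t_crit=4.303)
print(includes_one(good), includes_one(poor))
```

An interval that excludes 1, as for the second hypothetical sample, would flag the sample for closer inspection.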
This paper describes a stand-alone application, which extracts the interior pixel level data from an Affymetrix *.DAT file and fits the mixed effects model described in Equation 1. This application provides a more objective assessment of sample quality in that both a point estimate and 95% confidence interval about the 3' to 5' ratio are provided.
| IMPLEMENTATION |
|---|
The application is written in Microsoft Visual C++ and additionally accesses the R programming environment (R Development Core Team, 2005) using R(D)Com Server (Baier and Neuwirth, http://cran.r-project.org/contrib/extra/dcom/). The user is prompted to select the directory in which the DAT file is stored, identify the appropriate DAT file, and subsequently select the control probe sets on the given GeneChip to be used in the analysis. The user can then choose between two options: Export, which writes the pixel level data to a text file for subsequent analysis in an alternative statistical software program such as SAS, or Export and Analyze, which exports the pixel level data and performs the analysis. In the next sections, details for using the various options are described. The application currently permits estimation for all human and mouse GeneChips; instructions for adding additional GeneChips are available at the software download web address.
Export interior pixel intensities from DAT file to TXT file
To create a text file containing the interior pixel level intensities from the user-selected control probe sets, the instructions are as follows:
- Click Browse to specify the directory containing the DAT file of interest.
- Select the DAT file from the drop down menu labeled DAT File.
- Choose the control probe sets from the given GeneChip by selecting each one and transferring it with the Select >> button. To delete a selected probe set, click Clear next to the corresponding probe set. Note: once the 3' probe set is selected, its corresponding 5' probe set will be automatically entered.
- Click Export to create the TXT file with the extracted interior pixel level intensities to the same directory containing the DAT file. In the event that DAT files are changed, the resulting TXT file additionally includes the creation date and time to aid in proper file identification.
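The automatic pairing noted in the probe set selection step can be illustrated with the control probe set names listed earlier, which differ only in their `_3_at` and `_5_at` suffixes. A minimal sketch, assuming that suffix convention holds for the selected control probe sets; the function name is hypothetical, and the lookup the application actually performs is not described in this paper:

```python
SUFFIX_3P = "_3_at"
SUFFIX_5P = "_5_at"


def paired_five_prime(probe_set_3p: str) -> str:
    """Return the 5' probe set name matching a 3' probe set name.

    Assumption: the pair differs only in the trailing _3_at / _5_at suffix.
    """
    if not probe_set_3p.endswith(SUFFIX_3P):
        raise ValueError(f"not a 3' probe set name: {probe_set_3p!r}")
    return probe_set_3p[: -len(SUFFIX_3P)] + SUFFIX_5P


print(paired_five_prime("AFFX-HUMGAPDH/M33197_3_at"))
# AFFX-HUMGAPDH/M33197_5_at
```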
Analyze pixel data from existing TXT file
Once a TXT file has been exported, the user can fit the mixed effects model at any time by loading the TXT file into the application and clicking the Analyze button. The specific instructions are as follows:
- Click Browse... to specify the folder that contains the TXT file.
- Type the file name in the box next to the Analyze button.
- Click Analyze. The results will be displayed in the corresponding boxes.
Export and analyze pixel data from DAT file
Most often, users will want to use the application to both export the pixel level data and perform the analysis. The process one should follow is the same as that for Export interior pixel intensities from DAT file to TXT file with the exception that in step 4, the user should select the Export and Analyze button. This causes the program to first extract the interior pixel level intensities from each perfect match (PM) probe belonging to the selected control probe sets. Thereafter, the lme function in the R nlme library (Pinheiro and Bates, 2006) is called, which fits the mixed effects model described in Equation 1. The resulting output is the point estimate and 95% confidence interval for the 3' to 5' ratio. As recommended in the initial study (Archer et al., 2006), any interval not overlapping 1 may be indicative of poor sample quality.
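What "interior" means can be pictured with a toy example. The sketch below assumes a probe cell is held as a small rectangular grid of pixel intensities and that the interior is what remains after dropping a one-pixel border. The grid size, the border width, and the function name are assumptions for illustration, and the layout of a real DAT file is not reproduced here:

```python
from typing import List


def interior_pixels(cell: List[List[int]], border: int = 1) -> List[List[int]]:
    """Return the pixels of a rectangular cell with its outer border removed.

    Assumption: `cell` is a list of equal-length rows and `border` is the
    number of pixels trimmed from every edge.
    """
    if not cell or any(len(row) != len(cell[0]) for row in cell):
        raise ValueError("cell must be a non-empty rectangular grid")
    n_rows, n_cols = len(cell), len(cell[0])
    if border < 0 or 2 * border >= min(n_rows, n_cols):
        raise ValueError("border leaves no interior pixels")
    return [row[border:n_cols - border] for row in cell[border:n_rows - border]]


# Hypothetical 4 x 4 cell with intensities 0..15.
toy_cell = [[r * 4 + c for c in range(4)] for r in range(4)]
print(interior_pixels(toy_cell))
# [[5, 6], [9, 10]]
```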
Two changes are noted between the initial study and the software application described in this paper. First, the initial study fit the mixed effects models using Proc Mixed in The SAS System while the current implementation uses the lme function in the R programming environment. The two software implementations estimate the degrees of freedom and variance somewhat differently. For a linear combination $L$ of fixed effects estimates $\hat{\boldsymbol{\beta}}$, the SAS Proc Mixed procedure computes the degrees of freedom corresponding to those of the contrast variance $\widehat{\mathrm{Var}}(L\hat{\boldsymbol{\beta}})$, and various options, such as DDFM = SATTERH, can be used to approximate the degrees of freedom (Fai and Cornelius, 1996). The R lme function calculates the degrees of freedom as $K - 1$, where $K$ is the number of genes included in the model. Therefore, the confidence coefficient used by R lme is larger than that used by SAS Proc Mixed.
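The practical effect of the degrees of freedom can be seen from the two-sided 95% critical values of the t distribution. The sketch below uses a short table of standard 0.975 quantiles and takes the lme degrees of freedom to be the number of genes minus one. K = 3 corresponds to the three control genes named earlier, whereas the comparison value of 30 degrees of freedom and the standard error are hypothetical and do not come from either software package:

```python
# Standard 0.975 quantiles of Student's t distribution, rounded to 3 decimals.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 10: 2.228, 30: 2.042}


def half_width(std_err: float, df: int) -> float:
    """Half-width of a 95% interval on the model scale for tabulated df."""
    if df not in T_975:
        raise KeyError(f"no tabulated critical value for df={df}")
    return T_975[df] * std_err


k_genes = 3              # GAPDH, beta-actin, ISGF
df_lme = k_genes - 1     # degrees of freedom taken as K - 1
std_err = 0.1            # hypothetical standard error

print(df_lme, half_width(std_err, df_lme))  # few df, wide interval
print(30, half_width(std_err, 30))          # hypothetical larger df, narrower
```

With only two degrees of freedom the critical value is roughly twice that at 30, so intervals based on fewer degrees of freedom are wider and exclude 1 less readily for the same estimate and standard error.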
Second, in the initial study probes that were not sequence verified (Mecham et al., 2004) were eliminated from analysis. To more widely disseminate this application, we have omitted the sequence verification step. Due to these two changes, the point estimates differ somewhat, although the conclusions based on the estimated confidence intervals are essentially the same for each GeneChip, as reported in the initial study (see Table 1 for comparisons).
| SUMMARY |
|---|
Due to the labile nature of RNA, verification of sample quality is an important aspect of the microarray experimental process (Auer et al., 2003; Dumur et al., 2004), particularly since conclusions drawn from statistical analyses of microarray data from poor quality samples could yield misleading findings. The application in this paper provides an inferential method for assessing the quality of a hybridized sample.
| Acknowledgments |
|---|
This research was supported in part by National Institute of Diabetes & Digestive & Kidney Diseases DK069859 and IRG-73-001-28 from the American Cancer Society.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Trey Ideker
Received on July 10, 2006; revised on August 18, 2006; accepted on August 22, 2006
| REFERENCES |
|---|
Affymetrix (2001) Microarray Suite version 5.0. Santa Clara, CA.
Archer, K.J., et al. (2006) Assessing quality of hybridized RNA in Affymetrix GeneChip experiments using mixed-effects models. Biostatistics, 7, 198-212.
Auer, H., et al. (2003) Chipping away at the chip bias: RNA degradation in microarray analysis. Nature Genet., 35, 292-293.
Dumur, C.I., et al. (2004) Evaluation of quality control criteria in microarray gene expression analysis. Clin. Chem., 50, 1994-2002.
Fai, A.H.T. and Cornelius, P.L. (1996) Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments. J. Stat. Comput. Simulation, 54, 363-378.
Mecham, B.H., et al. (2004) Increased measurement accuracy for sequence-verified microarray probes. Physiol. Genomics, 18, 308-315.
Pinheiro, J.C. and Bates, D.M. (2006) Mixed Effects Models in S and S-Plus 2002. Springer, NY.
R Development Core Team (2005) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.