Skip Navigation


Bioinformatics Advance Access originally published online on August 18, 2005
Bioinformatics 2005 21(20):3880-3886; doi:10.1093/bioinformatics/bti636
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow Supplementary Data
Right arrow All Versions of this Article:
21/20/3880    most recent
bti636v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Meintrup, D.
Right arrow Articles by Reisinger, E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Meintrup, D.
Right arrow Articles by Reisinger, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org

A statistical model providing comprehensive predictions for the mRNA differential display

David Meintrup 1 and Ellen Reisinger 2,*

1Institut EIT 1, Universitaet der Bundeswehr Muenchen 85577 Neubiberg, Germany
2Department of Physiology II, Universitaet Freiburg 79104 Freiburg, Germany

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 

Motivation: Differential display (DD) or arbitrarily primed fingerprinting serves to identify differentially expressed genes, but these techniques cannot determine how many of the theoretically available genes have been uncovered. Previous mathematical models are unsatisfying as they are not suitable to analyze experimental data.

Results: In the present study, we provide a statistical model based on the redundancy of cDNA fragments amplified during DD experiments. This model is applicable to any DD and predicts (1) the total number of genes expressed in a sample cell type or tissue, (2) the number of differentially expressed genes, (3) the coverage obtained with any given number of primer combinations. In a DD experiment comparing two developmental stages of the post natal rat inner ear, we estimated the total number of differentially expressed genes accessible by DD to be 445, and the number of primer combinations required to uncover 90% of these to be 127.

Availability: The algorithms were implemented in MatlabTM (The Mathworks, Inc., Natick, MA) environment and are available at www.physiologie.uni-freiburg.de/download.html

Contact: ellen.reisinger{at}physiologie.uni-freiburg.de


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
The mRNA differential display (DD) is a reliable and highly sensitive technique to compare gene expression in different cell types or between distinct developmental stages of a given cell type (Liang and Pardee, 1992; Liang et al., 1993; McClelland et al., 1995; Wan et al., 1996). cDNA fragments are amplified in a PCR using arbitrary primers combined with one- or two-base anchored oligo(dT)-primers each. These primers bind multiply in the transcriptome in a partly degenerated way (Liang and Pardee, 1992; Bauer et al., 1993). Mixtures of DNA fragments resulting from DD-PCRs are displayed after electrophoresis and autoradiography. Selected cDNA fragments are recovered from the gel and identified by cycle sequencing. As the intensity of a band in the display reflects the mRNA amount of the respective gene, the DD allows selective identification of genes that differ in their expression level.

To date, proven arbitrary primers as well as reliable protocols are available (see Systems and Methods section), and since genomic sequences of common laboratory animals are on hand, most sequences of the identified bands match with public database entries, making this technique easier to handle than some years ago. However, at present little is known about the coverage, i.e. the portion of differentially expressed genes detected by the DD.

Several efforts were undertaken to predict the number of arbitrary primers required to cover most expressed genes in a cell. First estimations by Liang and Pardee (1992) predicted 240 primer combinations required to statistically cover all mRNA sequences expressed in one type of cell. These predictions, however, suffered from inaccurate estimations on the total number of genes in higher organisms, and did not account for redundancy, i.e. the possibility that the same cDNA might be ‘hit’ by several primer combinations. Recently, a highly elaborated mathematical model (Yang and Liang, 2004) predicted the probability for a single cDNA to be amplified at least once after a certain number of DD-PCRs, therefore being independent of redundancy or the total number of genes. This model is based on predictions on the hybridization of primers and cDNA under certain experimental conditions and considers the degeneracy of primer binding in detail. However, it exhibits major disadvantages: first, no experimental data on the degenerate binding of the 5' end of arbitrary primers other than 13mers are provided, making the model barely applicable to a DD using longer or shorter arbitrary primers. Second, changes in experimental conditions will alter the binding properties of the primers and affect the basic assumptions of this model. Third, at least some of the parameters required for predictions in this model may not be verified experimentally. Fourth, as small changes in the parameters result in major deviations in the predicted coverage, this model gives inaccurate predictions. Moreover, none of the models proposed so far is applicable to so-called ‘RAP-PCR’ (‘RNA fingerprinting by arbitrarily primed PCR’), where two arbitrary primers are combined with each other (Welsh et al., 1992).

In the present study, we provide a statistical model for mRNA DDs as well as RAP-PCRs that was developed on experimental data of a combined DD/RAP experiment consisting of 28 arbitrarily primed PCRs. Different from previous models, the model presented here is based on experimental observations of redundantly amplified genes, provided that a finite number of expressed genes in any examined tissue will result in repeated amplification of the same gene after a sufficient number of DD cycles. It is shown that this approach provides valid predictions for mRNA DD or RAP experiments, e.g. the total number of expressed genes in the examined tissue, the number of differentially expressed genes, and the course of coverage and redundancy.


    2 SYSTEMS AND METHODS
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
2.1 cDNA preparation for DD
mRNA samples were isolated from organs of Corti of newborn (P0) and of rats on post natal day 14 (P14) using DYNABeads (Dynal, Oslo, Norway) according to the manufacturer's instructions. cDNA was synthesized by reverse transcription of ~150 ng of P0 or P14 mRNA with Superscript II (Invitrogen, Karlsruhe, Germany) in supplemented buffer and 10 mM DTT, 1 mM dNTPs, 1 pmol oligo(dT)-primer and 0.3 µl RNasin (Promega, Madison, WI) for 2 h at 42°C. The integrity of the cDNA was confirmed by a PCR amplifying a fragment of GAPDH cDNA.

2.2 DD-PCR
Primer sequences were derived from Delta Differential Display Kit (BD Biosciences Clontech, Palo Alto, CA). Arbitrary ‘P-primers’ consisted of a consensus sequence of 16 nt (5'-ATTAACCCTCACTAAA-3') and a 9 nt arbitrary sequence at the 3' end:

5'-TGCTGGGGA-3' (P1), 5'-TCGGTCATAG-3' (P2), 5'-TGCTGGTAG-3' (P4), 5'-GATCTGACTG-3' (P5), 5'-TGCTGGGTG-3' (P6), 5'-TGCTGTATG-3' (P7), 5'-TGTGGCAGG-3' (P9) and 5'-GCACCGTCC-3' (P10).

The consensus sequence of the ‘T-primers’ consisted of 28 nt (5'-CATTATGCTGAGTGATATCTTTTTTTTT-3') followed by 5'-AG-3' (T3), 5'-CC-3' (T5) or 5'-GA-3' (T7) at the 3' end. These primers were combined for 28 DD-PCRs as indicated in Table 1. The differential display procedure was performed as recommended in the user manual (PT 1173-1) supplemented with the Delta Differential Display Kit (Clontech), that is based on a long-distance DD-PCR described in Diachenko et al. (1996). Briefly, DD-PCRs were carried out in duplicate with 5 pmol of each primer and cDNA corresponding to 2.4 ng mRNA (higher concentration) or 0.6 ng mRNA (lower concentration) in a total volume of 20 µl consisting of 1x Klen Taq PCR–Buffer (Clontech), 50 µM dNTPs each, 50 nM [{alpha}-33P]dATP (2500 Ci/mmol, Amersham, Freiburg, Germany) and 0.4 µl Advantage Klen Taq Polymerase Mix (Clontech). The temperature cycle protocol comprised 5 min 94°C, 5 min 40°C, 5 min 68°C followed by two cycles with 1 min 94°C, 2 min 40°C and 5 min 68°C and 25 cycles with 45 s at 94°C, 45 s 40°C and 2 min 68°C and a final elongation step at 68°C for 7 min. The resulting DNA fragments were separated electrophoretically on a denaturing acrylamide gel and visualized through autoradiography.


View this table:
[in this window]
[in a new window]
 
Table 1 Numbers of bands in the DD experiment

 
2.3 Analysis of the DD
The number of bands in each fingerprint was counted visually at least three times using the mean value as the result. To reduce the number of false positives, only bands occurring in both duplicates were considered (Diachenko et al., 1996), and bands were regarded as differential only if both bands in the duplicate fingerprint were stronger than both bands in the other. Owing to our special interest in upregulated genes, only DNA bands with higher intensity at the later stage of development are indicated as differential; bands with higher intensity at the earlier stage are included in the non-differential bands. Differential bands were excised from the gel, reamplified and sequenced by cycle sequencing. If possible, reamplified PCR products were sequenced directly, otherwise the cDNA fragments were cloned in pBluescript SK, prior to sequencing. In the latter case, to ensure that the major cDNA of this band had been identified, the sequences of at least two individual clones needed to be identical. The sequences of the DD bands were checked for redundancy by comparing these sequences against each other and furthermore, against sequences in public databases to identify the fragments and thus uncover non-overlapping but redundant sequences.

Redundancy occurring within the same primer combination, in most cases visible as double or triple bands, was eliminated prior to statistical analysis. This accorded to 73 of the identified 296 DNA bands. The number of unidentified non-differential bands was multiplied with the ratio of 223/296, assuming the same proportion of double or triple bands within the non-differential bands.

2.4 Computer simulation
The computer simulated experiments were performed by generating pseudo-random numbers as realizations of the random variables that will be introduced in the next section. Based on the 90 combinations of T-primers (two-base anchored primers with 5'-Linker-T9 VV-3', where V is A, G or C) with 10 different P-primers and 45 combinations of P-primers with each other possible in a commercially available DD kit, we exemplarily simulated a DD experiment consisting of n=135 primer combinations.


    3 ALGORITHM
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
A DD experiment identified genes that were upregulated during the post natal development of the inner ear. Identity and significance of the differentially expressed genes for inner ear physiology will be published elsewhere (Reisinger et al., 2005), an extract of the autoradiography is available as Supplementary information at the Journal's web page. In this DD, a total of 1407 cDNA fragments was amplified, and 223 of these bands were found to be differential. Subsequent analysis by quantitative real-time PCR on independent RNA isolates confirmed that 93% of the identified genes were indeed differential, i.e. exhibiting at least 2-fold upregulation during inner ear maturation (data not shown). From these 223 differential bands 174 originated from different mRNA sequences, whereas 49 bands corresponded to redundantly amplified genes. Closer investigation of this redundancy showed no clear correlation between the frequency with which a gene was identified and the expression level quantified by real-time PCR (data not shown), as observed earlier (Wan et al., 1996). Furthermore, even though it has been described that fingerprints using the same two-base anchored oligo(dT) primer exhibit increased redundancy owing to mispriming effects (Liang et al., 1993), only two redundantly occurring genes were amplified with the same two-base anchored oligo(dT)-primer, and both did not originate from mispriming. Therefore, we consider the event of two primers matching a cDNA so that a 250–1500 bp fragment is amplified as a random event with equal probability.

The number of redundant bands increased with increasing numbers of DD-PCRs performed. From this observation we hypothesized that the number of genes identified during the experiment will saturate, leading to the idea of using a detailed analysis of the redundancy to predict the total number of differentially expressed cDNAs and furthermore, the total number of cDNAs in the compared tissues. To this end, we developed a statistical model for the mRNA DD providing reliable answers of the following questions in statistical mean:

  1. What is the total number of differentially expressed genes?
  2. What is the total number of expressed genes?
  3. How does the redundancy, i.e. the number of repeatedly identified genes increase during the series of DD-PCRs?
  4. What is the development of the coverage?
  5. How many primer combinations are required to achieve a certain coverage?

For the following considerations, the total of fingerprints originating from the same primer combination are termed as ‘subexperiment’. To describe the statistical model for the mRNA DD, the subsequent notation is introduced:

Generally, three types of bands may be distinguished in any of our DD experiments: redundant differential (br), non-redundant differential (bu) and non-differential (bs) ones. Every subexperiment in which a subset of cDNA fragments is amplified corresponds to a random choice of bands that will consist of these three possible types. Thus, with yi denoting the sample size of the i-th subexperiment and zi–1 denoting the total number of identified non-redundant differential bands during the first i–1 subexperiments, we can describe the i-th subexperiment by a probability space given by the sample space

and by the multinomial distribution, (br, bu, bs) {Omega}i:

The success probabilities q1, q2, q3 are given by the respective proportions of the three types of bands, i.e.

Hence, Pi(br, bu, bs) is the probability that the i-th subexperiment produces br redundant differential bands, bu non-redundant differential bands and bs non-differential bands.

In the DD, the sample size yi is unknown, too. Therefore, we also have to model the sample size in each subexperiment. To that end let the random variable

Yi denote the sample size of the ith subexperiment, i = 1, ..., n.
We assume that the random variables Y1, ..., Yn are independent and identically distributed. In total, our statistical model for the DD consists of
({Omega}i, Pi): probability space with multinomial distribution,
Y1, ..., Yn: independent and identically distributed.

With this model in hand we can derive other random variables of interest. We denote by

Xi the number of differential bands in the ith subexperiment,
Di the total number of differential bands after i subexperiments,
Zi the total number of non-redundant differential bands after i subexperiments,
Ui the number of new non-redundant differential bands in the i-th subexperiment,
Ri the redundancy (i.e. the number of redundant bands) in the i-th subexperiment,
Ci the coverage after i subexperiments.
The coverage Ci is the proportion of differentially expressed genes that had been identified after i subexperiments, i.e.

(1)
This statistical model allows to derive estimators for some parameters of interest (see the Appendix in Supplementary information for details). Denoting by the experimental sample sizes and by the estimator, we use

(2)
as best unbiased estimator for the sample size. For the parameter p we use the maximum-likelihood estimator, which after i subexperiments is given by

(3)
where is a sample of X1, ..., Xi, i.e. the experimentally observed number of differential bands. Similarly, we get an estimator for the total number of cDNAs by maximizing the function

(4)
so that

(5)
where fi(N)(k) is the binomial density function of Ui, i.e.

Here denotes the number of new non-redundant bands in the i-th subexperiment. Note that the numerical maximization of F(N) will not exhibit a maximum if no redundancy in the experimental data is provided. Using the estimators and one can derive estimators , , , and for the total number of differential cDNAs N1, the total number of non-redundant differential bands Zi, the number of new non-redundant differential bands Ui and the redundancy Ri in the i-th subexperiment and finally for the coverage Ci after i subexperiments (for details see the Appendix, Supplementary information).

(6)

(7)

(8)

(9)

(10)

(11)
These estimators provide answers for all questions set out above. This will be demonstrated for an experimental dataset in the following section.


    4 IMPLEMENTATION
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
The numbers of bands in total, differential and non-redundant bands in each of the 28 DD-PCRs as well as the applied primers are listed in Table 1. Note that in the first 12 subexperiments an arbitrary P-primer was combined with a T-primer (see Systems and Methods), whereas in the following subexperiments two P-primers were combined (RAP-PCRs). Using the data in Table 1, estimators for all parameters may be calculated.

First, according to Equation (2) the mean sample size is given as

with a standard deviation of 9.05 (see Appendix in Supplementary information). The sample sizes differed slightly between the classical DD experiments (1–12) with and the RAP-PCRs (13–28) with . In the following considerations, the mean value of all 28 subexperiments was used. Next, the maximum-likelihood estimator for p is calculated after any subexperiment according to Equation (3) and given in the last column of Table 1. Figure 1A illustrates the convergence of the graph for p with an increasing number of subexperiments. The most accurate estimator for p is the last one,

with a 90% confidence interval (CI) of [0.142, 0.176] (see Appendix in Supplementary information). As shown in Figure 1A, the accuracy of the values for increases with an increasing number of subexperiments. More precisely, the length of the confidence interval for is proportional to with t being the summed sample size.



View larger version (11K):
[in this window]
[in a new window]
 
Fig. 1 Estimation of the parameters p and N. (A) The ratio of the sum of differential and non-differential bands is the maximum-likelihood estimator for p. Accuracy of increases with an increasing number of subexperiments. (B) The numerical maximization of the likelihood function for N after 28 subexperiments exhibits a maximum at 2800, corresponding to the total number of cDNAs (available under standard PCR conditions). (C) Training of the model with the first 15–27 subexperiments only leads to the plotted estimations for N.

 
According to Equation (4), to calculate the function F(N) has to be maximized, which is the product of the densities fi(N)(). Therefore, the data of Table 1 are implemented in fi(N)(k). For instance, for every we have

Subsequently, the maximization is carried out numerically. Figure 1B shows the graph of F(N). The numerical optimization leads to a maximum of f(N) for

Figure 1C shows stabilization of with an increasing number of subexperiments. Hence, approximately 2800 different cDNA transcripts may be amplified under standard PCR conditions from the rat organ of Corti. Using Equation (6) leads to an estimated number of upregulated genes of

Combining the estimators for , and and Equations (7) and (8), the expected number of differential bands as well as the number of non-redundant differential bands after i subexperiments can be computed, as illustrated in Figure 2A, and at an expanded scale (subexperiments 1–28) in Figure 2B, here complemented by the experimental data. To verify the predictions of the model, we performed computer simulated experiments by generating pseudo-random numbers for y and x whose distributions are given by the corresponding random variables Yi and Xi and determined by the calculated estimators and . The respective graphs (gray lines in Fig. 2) represent a computed DD experiment using the same estimators for , and as in the experimental DD. These simulated experiments correlate with the calculated course for zi, ci and di (Fig. 2A). The graphs of the simulated experiments and of the experimental data show similar features, both supporting the suitability of the statistical model (Fig. 2B).



View larger version (16K):
[in this window]
[in a new window]
 
Fig. 2 Predictions on differential bands and non-redundant differential bands. (A) Number of differential bands (left axis) and the coverage (right axis). The expected number of non-redundant bands (black solid line) is plotted together with a computer simulation of non-redundant bands (gray line). Furthermore, the expected number of all (non-redundant + redundant) differential bands (black dashed line) and a simulation for the number of all differential bands (gray dashed line) are shown. (B) Magnification of (A) for the first 28 subexperiments, complemented by experimental data (dotted line). The graphs of the experimental data and of the simulated experiments show similar features, supporting the suitability of the statistical model. Typically these graphs lie completely under or over the expected mean as we look at the sum of non-redundant differential bands.

 
The model predictions were further evaluated by training the model with the first 14–27 subexperiments and predicting the number of differential and non-redundant differential bands after 28 experiments (Fig. 3, gray dots). Although the number of training experiments is rather small, the predictions for d28 and z28 are in fair agreement with the experimentally obtained values. Another validation method that shows an even better match is obtained by performing a sort of resampling, also shown in Figure 3. For this purpose, 20 sets of subexperiments were chosen randomly without replacement and the resulting predictions for d28 and z28 were averaged. For example, the averaged predictions of 20 sets comprised 20 subexperiments each leading to a total number of expected differential genes to 226, compared with 223 experimentally obtained differential bands. Similarly, the averaged predictions for z28 using the data of 20 subexperiments resulted in 173 non-redundant differential bands to be expected after 28 subexperiments, what is in agreement with the experimentally observed 174 bands. The predictions of the resampling procedure improve with a growing number of considered subexperiments, as indicated by decreasing error intervals shown in Figure 3.



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 3 Comparison of model predictions and experimental results. The model was trained with the first 14–27 subexperiments and predicted the number of differential bands d28 (upper panel) and non-redundant bands z28 (lower panel) after 28 subexperiments, gray dots. For the resampling (black), a randomly chosen subset of subexperiments is used for prediction and the results are averaged, shown with error intervals (see Appendix in Supplementary information). The predictions improve with an increasing number of considered subexperiments, as indicated by the error intervals. Predictions are in agreement with the experimentally observed number of differential bands (223) and non-redundant bands (174).

 
Implementation of the model requires redundancy to occur. Equation (10) estimates the number of redundant bands that may be expected in any subexperiment. Setting leads to

allowing to estimate from which experiment on at least one redundant band may be expected. For our experimental conditions this will be from the ninth experiment onwards.

The statistical model allows further predictions on the coverage of DD experiments: according to Equation (11), the coverage after any number of subexperiments can be estimated. Theoretically, 28 subexperiments lead to a coverage of

From Equation (11) follows

(12)
This equation allows the estimation of the required number of subexperiments to gain a certain coverage c. In our DD experiment with a mean sample size of 50.25, i(0.9)=127 subexperiments are predicted to cover 90% of the differentially expressed genes. If one assumes that the sample size for the same experimental conditions in any DD experiment is proportional to the total number of cDNAs, the number of subexperiments to reach a certain coverage would be independent of the number of expressed genes.

Finally, the subexperiment where the number of redundant bands equals the non-redundant ones can be estimated by setting , resulting in

In our series of subexperiments, the number of redundant bands is predicted to equal the number of non-redundant bands in the 39th subexperiment.


    5 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
Since the original publication of the mRNA DD by Liang and Pardee (1992), the method has systematically been improved (Liang et al., 1993, 1994; Mou et al., 1994) and is now available as a commercial set of primer sequences and a reliable description of the experimental procedure. Starting from such a DD experiment on the developing inner ear, we analyzed the frequency with which individual sequences were amplified within the first 28 subexperiments. The observation that redundancy increased with an increasing number of subexperiments was used for the establishment of a statistical model that allows predictions for further DD-PCRs. The resulting model enables one to estimate the total number of genes expressed in the examined tissue, the number of differentially expressed genes and the coverage in dependence of the number of applied primer combinations. The model is strongly supported by our experimental data, the validation of the predictions by the resampling procedure and the results of computer simulated experiments. These validations also show a reasonable robustness of the predictions toward small changes in the dataset.

Previous mathematical models relied on theoretical predictions of the hybridization of arbitrary P-primers and T-primers with cDNA molecules. The first estimations on how many primer combinations are required to cover the cellular mRNAs (Liang and Pardee, 1992; Bauer et al., 1993) based on the assumption that higher organisms express ~100 000 genes and individual cells make use of 15% of these. This assumption might easily be corrected using the most recent approximations for the number of expressed genes from the International Human Genome Sequencing Consortium (IHGSC, 2004). However, their predictions on the DD coverage underestimate the required primer combinations as they do not account for the possibility that more than one cDNA fragment of the same gene might be amplified. A more detailed study of the statistics of primer binding of Yang and Liang (2004) predicted 160 arbitrary primers combined with 3 one-base anchored T-primers will amplify a cDNA of 600 bp length with a probability of 93%. The resulting 480 primer combinations for 93% coverage are obviously higher than the 167 primer combinations leading to 93% coverage estimated from Equation (12) in our model (using ). This difference might be explained by the fact that hybridization-based models rely on rough estimations of parameter values that are at least in part not accessible to experimental tests. Moreover, the predictions vary over a broad range if small changes in parameter values as length of the cDNAs or number of base matches between arbitrary primers and cDNA are assumed. Since all parameters in our model are calculated from experimental data, this model is supposed to give more accurate and reliable predictions than other mathematical models describing DD experiments. Furthermore, our statistical model is the first suitable model for experiments using combinations of internally binding arbitrary primers (RAP-PCRs). Another possible application are DD experiments comparing more than two samples, as these can be regarded as a set of one-by-one comparisons (e.g. bands that arise from only one sample but not from all others), and thus each set can be analyzed separately. Moreover, the model predictions are not affected by the use of total RNA instead of mRNA, since most mRNA preparations are not completely free of rRNA. Taken together, the major benefit of the mathematical model proposed here relies on the parameters that are estimated from experimental data, making this model widely applicable. When adopted to a new dataset, the reliability of the model predictions can be verified by the provided validation methods (as in Figs 1A, C and 3).

The usefullness of this model is demonstrated by implementation on a DD comparing two developmental stages of the organ of Corti, a rather small and homogeneous tissue of neuroepithelial origin. The considerable proportion of 15% of upregulated genes reflects intense morphological and physiological maturation processes that take place during inner ear development (Pujol et al., 1997). The calculated 2800 genes predict ~11–14% of the protein encoding genes of higher organisms to be expressed in the mammalian organ of Corti tissue (IHGSC, 2004). For two reasons, however, we expect that the total number of expressed genes in the organ of Corti is somewhat higher: first, a few cDNA fragments were amplified with only one (mostly arbitrary) primer binding in sense and antisense direction. Hence, the use of the same primer in different combinations can lead to a redundant amplification that is not truly independent, resulting in a small overestimation of redundancy. Consequently, the most accurate application for the model proposed here would be an experiment using every primer only once. Second, it should be kept in mind that not every DNA sequence is successfully amplified in a standard PCR, and an efficient amplification of sequences with high GC content requires the supplementation of DMSO or other hydrogen bond weakeners (Arezi et al., 2003). As our DD was performed under standard PCR conditions, we conclude that a series of cDNA sequences, difficult to amplify, is not accounted for the here proposed estimation on the total number of expressed genes in the inner ear and therefore, for the number of differentially expressed genes. However, according to the fact that such sequences will not be amplified with an increasing number of primer combinations, the coverage calculated from our statistical model respects only genes available under standard PCR conditions.

The statistical model proposed in this study is applicable to any DD experiment after the analysis of a sample of subexperiments as soon as substantial redundancy is observed. Hence for any examined tissue, a DD experiment will be suitable to determine the number of cDNA molecules that are amplified under standard PCR conditions and thus the number of expressed genes might be estimated. Although, to get reliable results it is necessary to avoid redundancy originating from mispriming of the one- or two-base anchored oligo(dT) primers described in Liang et al. (1993). This mispriming can be overcome if the reverse transcription reaction (RT) is performed with non-anchored oligo(dT) primers, and anchored primers are used during the PCR only (Rothschild et al., 1997; Frost and Guggenheim, 1999). The increased priming specificity might result from the higher annealing temperature for PCR than during RT.


    SUPPLEMENTARY DATA
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 
Autoradiography of the DD and Mathematical Appendix are available on Bioinformatics online.


    Acknowledgments
 
We thank Dr Bernd Fakler for helpful the discussions and for critically reading the manuscript. This work was supported by DFG-Graduiertenkolleg 843.

Conflict of Interest: none declared.

Received on June 30, 2005; revised on August 9, 2005; accepted on August 16, 2005

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 ALGORITHM
 4 IMPLEMENTATION
 5 DISCUSSION
 SUPPLEMENTARY DATA
 REFERENCES
 

    Arezi, B., et al. (2003) Amplification efficiency of thermostable DNA polymerases. Analyt. Biochem., 321, 226–235.

    Bauer, D., et al. (1993) Identification of differentially expressed mRNA species by an improved display technique (DDRT–PCR). Nucleic Acids Res., 21, 4272–4280[Abstract/Free Full Text].

    Diachenko, L.B., et al. (1996) Combining the technique of RNA fingerprinting and differential display to obtain differentially expressed mRNA. Biochem. Biophys. Res. Commun., 219, 824–828[CrossRef][ISI][Medline].

    Frost, M.R. and Guggenheim, J.A. (1999) Mammalian polyadenylation sites: implications for differential display. Nucleic Acids Res., 27, 1386–1391[Abstract/Free Full Text].

    International Human Genome Sequencing Consortium (IHGSC). (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945[CrossRef][Medline].

    Liang, P., et al. (1993) Distribution and cloning of eukaryotic mRNAs by means of differential display: refinements and optimization. Nucleic Acids Res., 21, 3296–3275.

    Liang, P. and Pardee, A.B. (1992) Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science, 257, 967–971[Abstract/Free Full Text].

    Liang, P., et al. (1994) Differential display using one-base anchored oligo-dT primers. Nucleic Acids Res., 22, 5763–5764[Free Full Text].

    McClelland, M., et al. (1995) RNA fingerprinting and differential display using arbitrarily primed PCR. Trends Genet., 11, 242–246[CrossRef][ISI][Medline].

    Mou, L., et al. (1994) Improvements to the differential display method for gene analysis. Biochem. Biophys. Res. Commun., 199, 564–569[CrossRef][ISI][Medline].

    Reisinger, E., et al. (2005) Gene expression associated with the onset of hearing detected by mRNA differential display in mammalian organ of Corti. in press.

    Rothschild, C.B., et al. (1997) DD/AP-PCR: combination of differential display and arbitrarily primed PCR of oligo(dT) cDNA. Analyt. Biochem., 245, 48–54.

    Pujol, R., Lavigne-Rebillard, M., Lenoir, M. In Rubel, E.W., Popper, A.W., Fay, R.R. (Eds.). Development of the Auditory System, (1997) , NY Springer, pp. 146–192.

    Wan, J.S., et al. (1996) Cloning differenitally expressed mRNAs. Nat. Biotechnol., 14, 1685–1691[CrossRef][ISI][Medline].

    Welsh, J., et al. (1992) Arbitrarily primed PCR fingerprinting of RNA. Nucleic Acids Res., 20, 4965–4970[Abstract/Free Full Text].

    Yang, S. and Liang, P. (2004) Global analysis of gene expression by differential display: a mathematical model. Mol. Biotechnol., 27, 197–208[CrossRef][ISI][Medline].


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow Supplementary Data
Right arrow Supplementary Data
Right arrow All Versions of this Article:
21/20/3880    most recent
bti636v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Meintrup, D.
Right arrow Articles by Reisinger, E.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Meintrup, D.
Right arrow Articles by Reisinger, E.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?