Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(5):687-688; doi:10.1093/bioinformatics/bti078
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Implementation of a gene expression index calculation method based on the PDNN model

Henrik Bjørn Nielsen *, Laurent Gautier and Steen Knudsen

Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark

*To whom correspondence should be addressed.


    Abstract

Summary: Gene expression index calculations from Affymetrix GeneChips have been dominated by the Affymetrix MAS, dChip, and RMA methods. A new method to estimate the gene expression value utilizing the probe sequence information named position-dependent nearest-neighbor (PDNN) has been suggested by Zhang et al. (2003). Here we describe an open source implementation of the PDNN method for the statistical language R.

Availability: The package can be downloaded from http://www.bioconductor.org/repository/devel/package/html/affypdnn.html

Contact: hbjorn{at}cbs.dtu.dk


    INTRODUCTION
Short oligonucleotide microarrays, such as the GeneChip from Affymetrix, use multiple probes per targeted transcript. Experience with this type of microarray has shown that the signals are not always consistent between different probes targeting the same transcript. This inconsistency is only marginally due to noise in the measurements; the main differences in signal are due to differences in the probes' properties.

In order to obtain a single expression index value representing the expression of a gene, several data processing methods have been developed. The most widely used are MAS v.5, dChip, and RMA [Affymetrix, Inc., Santa Clara, CA, USA; Li and Wong, 2001; Irizarry et al., 2003]. Zhang et al. (2003) developed a position-dependent nearest-neighbor (PDNN) model of the probe signals that enables estimation of a gene expression index. In terms of variation in expression index values between experiments, the PDNN method was shown to be superior to both the MAS v.5 and the dChip methods (Zhang et al., 2003). However, our studies show that the results are comparable to those of the dChip method when the latter uses only PM probes in the expression index calculation (Fig. 1). The PDNN method is nevertheless justified not only by its performance but also by its applicability. It requires only one chip to estimate gene expression index values, whereas the dChip method requires a series of chips to perform well. In theory, the method should also be able to calculate the correct gene expression value when probes have very similar properties or partially overlap. Such probes may be problematic for the dChip method, because dChip assumes independent probe measurements within a probe set. In contrast, the PDNN method bases its corrections on the total set of probes on the array.



Fig. 1 The standard deviation versus the averages of the log transformed expression levels, shown for both the PDNN and the dChip method where the latter uses PM only (Li and Wong, 2001; Gautier et al., 2004). Each plot represents 12,474 genes through 14 experiments (data is from the ‘1532 series’ in the Human Latin square dataset, available at http://www.affymetrix.com/support/technical/sample_data/datasets.affx).

 
It should be noted that the PDNN model assumes that the majority of probes are designed specifically for their target, and that they will only capture specific signals from the target. Here we describe the implementation of the PDNN method, as a package named affyPDNN for the statistical language R.


    IMPLEMENTATION
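Following Zhang et al. (2003), the PDNN model minimizes

$$F = \frac{1}{M} \sum_{i,j} \left( \ln \hat{I}_{ij} - \ln I_{ij} \right)^2$$

where Î_ij and I_ij are the estimated and the observed probe intensities of the ith probe in the probe set targeting gene j, respectively, and M is the number of probes. Î_ij is estimated using the following equation, in which N_j denotes the amount of target RNA for gene j:

$$\hat{I}_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B \qquad (1)$$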

where B is the global background and N* is the amount of RNA molecules that contribute to non-specific binding (NSB).

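If we consider a probe as a string of bases {b_1, b_2, ..., b_25}, and ω_k as position-dependent weights, the target-specific free energy E and the average free energy for NSB, E*, can each be calculated as a weighted sum of stacking energies (ε). In the formulation of Zhang et al. (2003), with ω*_k and ε* denoting the corresponding weights and stacking energies for NSB:

$$E_{ij} = \sum_{k=1}^{24} \omega_k \, \varepsilon(b_k, b_{k+1}), \qquad E^*_{ij} = \sum_{k=1}^{24} \omega^*_k \, \varepsilon^*(b_k, b_{k+1})$$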
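The weighted sum of stacking energies described above, together with the PDNN intensity model of Zhang et al. (2003), can be sketched in a few lines of Python. This is only an illustration: the function names, the toy stacking table and the uniform weights are hypothetical and are not taken from the affyPDNN package or fitted to any chip type.

```python
import math

def free_energy(probe, weights, stacking):
    """Weighted sum of nearest-neighbor stacking energies along a probe.

    probe    -- base string, e.g. a 25-mer
    weights  -- one position-dependent weight per adjacent base pair
    stacking -- dict mapping a dinucleotide such as "AC" to its stacking energy
    """
    if len(weights) != len(probe) - 1:
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * stacking[probe[k:k + 2]] for k, w in enumerate(weights))

def expected_intensity(n_gene, n_nsb, background, e_specific, e_nsb):
    """Specific binding term + non-specific binding term + global background."""
    return (n_gene / (1.0 + math.exp(e_specific))
            + n_nsb / (1.0 + math.exp(e_nsb))
            + background)

# Toy parameters (hypothetical values chosen only to make the example run)
bases = "ACGT"
toy_stacking = {a + b: 0.1 * (bases.index(a) + bases.index(b))
                for a in bases for b in bases}
probe = "ACGTACGTACGTACGTACGTACGTA"   # 25 bases, 24 adjacent pairs
toy_weights = [1.0] * 24
energy = free_energy(probe, toy_weights, toy_stacking)
```

In a real fit the weights and the 16 stacking energies would be estimated per chip type for both specific and non-specific binding; here a single toy table stands in for both.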


The minimization of F is done by varying the B and N* values and recalculating Î. We investigated the F landscape for a series of different chip types and found it to be smooth with only one minimum, the global minimum (Fig. 2). A steepest descent method was therefore implemented for minimizing F.
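As a rough illustration of how a steepest descent search over B and N* might look, the sketch below minimizes a toy version of F with a numerical gradient and a backtracking step size. It is not the package's code: the helper names, the toy probe sensitivities and the simplification of holding the gene amounts fixed while B and N* change are all assumptions made for the example.

```python
import math

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def steepest_descent(f, x0, step=1.0, tol=1e-8, max_iter=10000):
    """Follow the negative gradient, halving the step until f decreases enough."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        grad = numerical_gradient(f, x)
        grad_sq = sum(g * g for g in grad)
        if math.sqrt(grad_sq) < tol:
            break
        t, accepted = step, False
        while t > 1e-20:
            candidate = [xi - t * gi for xi, gi in zip(x, grad)]
            f_candidate = f(candidate)
            # Armijo sufficient-decrease test guards against oscillation
            if f_candidate <= fx - 1e-4 * t * grad_sq:
                accepted = True
                break
            t *= 0.5
        if not accepted:
            break
        x, fx = candidate, f_candidate
    return x, fx

def make_objective(observed, n_gene, s_specific, s_nsb):
    """Build F(B, N*) for toy probes; n_gene is held fixed (a simplification)."""
    def objective(params):
        background, n_nsb = params
        total = 0.0
        for i_obs, n, s, s_star in zip(observed, n_gene, s_specific, s_nsb):
            i_hat = n * s + n_nsb * s_star + background
            if i_hat <= 0.0:
                return math.inf  # logarithm undefined; treat as infeasible
            total += (math.log(i_hat) - math.log(i_obs)) ** 2
        return total / len(observed)
    return objective

# Toy data generated with B = 50 and N* = 2000 (hypothetical values)
n_gene = [1000.0, 1000.0, 4000.0, 4000.0, 250.0, 250.0]
s_specific = [0.30, 0.10, 0.25, 0.05, 0.40, 0.20]
s_nsb = [0.010, 0.030, 0.020, 0.015, 0.025, 0.005]
observed = [n * s + 2000.0 * q + 50.0
            for n, s, q in zip(n_gene, s_specific, s_nsb)]

objective = make_objective(observed, n_gene, s_specific, s_nsb)
start = [10.0, 500.0]
fitted, f_min = steepest_descent(objective, start, step=1e6, max_iter=2000)
```

Because B and N* live on very different scales, plain steepest descent can need many iterations on such a problem; the backtracking rule only guarantees that F never increases from one accepted step to the next.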



Fig. 2 F landscape for a series of values for N* and B. The plot is based on the 94394hgu95a11.cel file from the ‘1532 series’ in the Human Latin square dataset, available at http://www.affymetrix.com/support/technical/sample_data/datasets.affx.

 
To calculate Î quickly for many candidate N* and B values, N* and B were isolated in Equation (1). Î can then be calculated from three matrices, k1_ij, k2_ij and k3_ij, that are constant with respect to N* and B.
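One way such an isolation can work is sketched below. It rests on an assumption that is not confirmed by the text here: that the gene amount N_j is taken as the mean, over the probes of a probe set, of the per-probe estimates (I_ij - B - N* s*_ij) / s_ij, where s_ij = 1 / (1 + exp(E_ij)) and s*_ij = 1 / (1 + exp(E*_ij)). Under that assumption Î_ij becomes linear in B and N*, so three arrays can be computed once and reused. The names k1, k2 and k3 are borrowed for illustration and need not match the definitions used in the package.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def intensity_direct(observed, s, s_star, background, n_nsb):
    """Recompute N_j and the estimated intensities from scratch for one probe set."""
    n_gene = mean((i - background - n_nsb * q) / p
                  for i, p, q in zip(observed, s, s_star))
    return [p * n_gene + n_nsb * q + background for p, q in zip(s, s_star)]

def precompute_constants(observed, s, s_star):
    """Terms that do not depend on B or N* (assumed form, see lead-in)."""
    a = mean(i / p for i, p in zip(observed, s))
    b = mean(1.0 / p for p in s)
    c = mean(q / p for p, q in zip(s, s_star))
    k1 = [p * a for p in s]
    k2 = [1.0 - p * b for p in s]
    k3 = [q - p * c for p, q in zip(s, s_star)]
    return k1, k2, k3

def intensity_fast(k1, k2, k3, background, n_nsb):
    """Estimated intensity as k1 + B * k2 + N* * k3, element by element."""
    return [x + background * y + n_nsb * z for x, y, z in zip(k1, k2, k3)]

# Toy probe set (hypothetical sensitivities and intensities)
observed = [420.0, 180.0, 650.0, 300.0]
s = [0.30, 0.10, 0.45, 0.20]
s_star = [0.010, 0.030, 0.020, 0.015]

k1, k2, k3 = precompute_constants(observed, s, s_star)
direct = intensity_direct(observed, s, s_star, background=50.0, n_nsb=2000.0)
fast = intensity_fast(k1, k2, k3, background=50.0, n_nsb=2000.0)
```

With the constants in hand, each new (B, N*) pair costs one multiply-add per probe instead of a full re-estimation, which is what makes repeated evaluation during the minimization cheap.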



Zhang et al. (2003) suggest an outlier rejection criterion for probes where Î deviates by more than three standard deviations from I. We tested this criterion and found that the overall variation between replicates diminished when the criterion was less strict (data not shown). However, it cannot be excluded that chip hybridizations of dubious quality will benefit from such a criterion. We therefore implemented the criterion as a user-specifiable option.
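A user-specifiable rejection threshold of this kind could look like the following sketch. The details are assumptions made for illustration: residuals are taken on the log scale, the standard deviation is computed over all probes passed in, and the function name and the n_sd parameter are hypothetical rather than part of the package interface.

```python
import math
import statistics

def flag_outliers(estimated, observed, n_sd=3.0):
    """Return one boolean per probe: True where the probe would be rejected.

    n_sd -- number of standard deviations allowed; None disables rejection.
    """
    residuals = [math.log(e) - math.log(o) for e, o in zip(estimated, observed)]
    if n_sd is None:
        return [False] * len(residuals)
    sd = statistics.pstdev(residuals)
    return [abs(r) > n_sd * sd for r in residuals]

# Toy example: 19 well-fitted probes and one whose estimate is ten-fold off
observed = [100.0] * 20
estimated = [100.0] * 19 + [1000.0]

strict = flag_outliers(estimated, observed, n_sd=3.0)
lenient = flag_outliers(estimated, observed, n_sd=5.0)
disabled = flag_outliers(estimated, observed, n_sd=None)
```

Raising n_sd makes the criterion less strict, and passing None keeps every probe, which mirrors the idea of leaving the choice to the user.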

The package was developed as an add-on for the affy package (Gautier et al., 2004) for the analysis of Affymetrix GeneChips and can be used in this context. This not only allows the scientist to use the PDNN method in concert with a series of normalization and higher-level analysis tools, but also allows easy comparison between different gene expression index calculation methods, such as dChip, MAS v.5, AverageDifference, etc.

Received on March 9, 2004; revised on September 14, 2004; accepted on September 27, 2004

    REFERENCES

    Li, C. and Wong, W.H. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, RESEARCH0032.

    Gautier, L., Cope, L., Bolstad, B.M., Irizarry, R.A. (2004) Affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307–315.

    Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.

    Zhang, L., Miles, M.F., Aldape, K.D. (2003) A model of molecular interactions on short oligonucleotide microarrays. Nat. Biotechnol., 21, 818–821.

