Bioinformatics Advance Access originally published online on October 27, 2004
Bioinformatics 2005 21(5):687-688; doi:10.1093/bioinformatics/bti078
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org
Implementation of a gene expression index calculation method based on the PDNN model
Henrik Bjørn Nielsen *,
Laurent Gautier and
Steen Knudsen
Center for Biological Sequence Analysis, BioCentrum-DTU Technical University of Denmark Building 208, DK-2800 Lyngby, Denmark
*To whom correspondence should be addressed.
Abstract
Summary: Gene expression index calculations from Affymetrix GeneChips have been dominated by the Affymetrix MAS, dChip, and RMA methods. A new method, named position-dependent nearest-neighbor (PDNN), that uses probe sequence information to estimate the gene expression value has been suggested by Zhang et al. (2003). Here we describe an open source implementation of the PDNN method for the statistical language R.
Availability: The package can be downloaded from http://www.bioconductor.org/repository/devel/package/html/affypdnn.html
Contact: hbjorn{at}cbs.dtu.dk
INTRODUCTION
Short oligonucleotide microarrays, such as the GeneChip from
Affymetrix, use multiple probes per targeted transcript. This
type of microarray has shown that the probe signals are not
always consistent between different probes. This inconsistency
is only marginally due to noise in the measurements. The main
differences in the signal are due to the differences in the
probes' properties.
Several data processing methods have been developed to obtain a single expression index value representing the expression of a gene. The most widely used are MAS v.5, dChip, and RMA (Affymetrix, Inc., Santa Clara, CA, USA; Li and Wong, 2001; Irizarry et al., 2003). Zhang et al. (2003) developed a position-dependent nearest-neighbor (PDNN) model of the probe signals that enables estimation of a gene expression index. With respect to the variation in expression index values between experiments, the model was shown to be superior to both the MAS v.5 and the dChip methods (Zhang et al., 2003). However, our studies show that the results are comparable to those of the dChip method when the latter uses only PM probes in the expression index calculation (Fig. 1). The PDNN method is nevertheless justified not only by its performance, but also by its applicability. It requires only one chip to estimate gene expression index values, in contrast to the dChip method, which requires a series of chips to perform well. In theory, the method should also be able to calculate the correct gene expression value in cases where the probes have very similar properties or partially overlap. Such probes may be problematic for the dChip method, because dChip assumes independent probe measurements within a probe set. In contrast, the PDNN method bases its corrections on the total set of probes on the array.
It should be noted that the PDNN model assumes that the majority
of probes are designed specifically for their target, and that
they will only capture specific signals from the target. Here
we describe the implementation of the PDNN method, as a package
named affyPDNN for the statistical language R.
IMPLEMENTATION
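The PDNN model minimizes

$$F = \frac{1}{M} \sum_{i,j} \left( \ln \hat{I}_{ij} - \ln I_{ij} \right)^2$$

where Î_ij and I_ij are the estimated and the observed probe intensity of the ith probe in a probe set targeted for gene j, respectively, and M is the number of probes. Î_ij is estimated using the following equation:

$$\hat{I}_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E^*_{ij}}} + B \qquad (1)$$

where B is the global background, N* is the amount of RNA molecules that contribute to non-specific binding (NSB), and N_j is the amount of RNA molecules from gene j that contribute to gene-specific binding.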
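To make the roles of the terms concrete, the following is a minimal Python sketch, assuming Equation (1) has the form published by Zhang et al. (2003): a gene-specific term N_j / (1 + exp(E_ij)), a non-specific term N* / (1 + exp(E*_ij)) and the background B, with F taken as the mean squared difference of log intensities. The function names and all numeric values are hypothetical and are not taken from the affyPDNN package.

```python
import math


def predicted_intensity(n_gene, n_star, background, e_specific, e_nonspecific):
    """Assumed form of Equation (1): specific + non-specific binding + background."""
    specific = n_gene / (1.0 + math.exp(e_specific))
    nonspecific = n_star / (1.0 + math.exp(e_nonspecific))
    return specific + nonspecific + background


def objective(observed, predicted):
    """Assumed form of F: mean squared difference of log intensities."""
    m = len(observed)
    return sum(
        (math.log(p) - math.log(o)) ** 2 for o, p in zip(observed, predicted)
    ) / m


# Hypothetical free energies (E, E*) for three probes of one probe set.
energies = [(-2.0, 1.0), (0.5, 1.5), (-1.0, 0.0)]
predicted = [
    predicted_intensity(1000.0, 50.0, 30.0, e, e_star) for e, e_star in energies
]
```

A lower free energy E gives a larger gene-specific term, so probes with stronger predicted binding are expected to be brighter for the same amount of target.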
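If we consider a probe as a string of bases {b_1, b_2, ..., b_25}, and ω_k as position-dependent weights, the target-specific free energy E and the average free energy for NSB, E*, can be calculated as a weighted sum of stacking energies (ε):

$$E_{ij} = \sum_{k=1}^{24} \omega_k \, \varepsilon(b_k, b_{k+1}), \qquad E^*_{ij} = \sum_{k=1}^{24} \omega^*_k \, \varepsilon^*(b_k, b_{k+1})$$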
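The weighted sum of stacking energies can be sketched in a few lines of Python. This is an illustration only: the weights and the stacking-energy table below are made-up toy values, not the fitted PDNN parameters, and `free_energy` is a hypothetical helper name.

```python
def free_energy(sequence, weights, stacking_energy):
    """Weighted sum of stacking energies over adjacent base pairs of a probe."""
    pairs = [sequence[k:k + 2] for k in range(len(sequence) - 1)]
    if len(weights) != len(pairs):
        raise ValueError("need exactly one weight per adjacent base pair")
    return sum(w * stacking_energy[p] for w, p in zip(weights, pairs))


# Toy parameters (assumptions for illustration, not real PDNN energies).
BASES = "ACGT"
toy_energy = {
    a + b: -0.1 * (1 + BASES.index(a) + BASES.index(b))
    for a in BASES
    for b in BASES
}
probe = "ACGT" * 6 + "A"      # 25 bases, hence 24 adjacent pairs
toy_weights = [1.0] * 24      # flat position weights for simplicity

e_specific = free_energy(probe, toy_weights, toy_energy)
```

The same function would serve for E and E*, called with the respective set of weights and stacking energies.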
The minimization of F is done by varying the B and N* values and recalculating Î. We investigated the F landscape for a series of different chip types and found it to be smooth and with only one minimum, i.e., the global minimum (Fig. 2). Therefore a steepest descent method for minimizing F was implemented.
In order to calculate Î fast for N* and B values, N* and B were isolated in Equation (1). Thus, Î can be calculated as:

[equation not recovered]

where k1_ij, k2_ij and k3_ij are constant matrices:

[equations not recovered]
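One way such an isolation could work is sketched below in Python. This is a hypothetical reconstruction, not the authors' definitions of k1, k2 and k3. It assumes the Zhang et al. (2003) form of Equation (1) and additionally assumes that N_j is estimated per probe set as the mean of (I_ij - B - N* c_ij) / a_ij, where a_ij = 1 / (1 + exp(E_ij)) and c_ij = 1 / (1 + exp(E*_ij)). Under those assumptions Î is linear in N* and B, so the three terms can be computed once and reused for every trial pair of values.

```python
import math


def precompute_constants(observed, e_specific, e_nonspecific):
    """Per-probe terms of one probe set that do not depend on N* or B."""
    a = [1.0 / (1.0 + math.exp(e)) for e in e_specific]      # specific binding factor
    c = [1.0 / (1.0 + math.exp(e)) for e in e_nonspecific]   # non-specific binding factor
    m = len(observed)
    mean_i_over_a = sum(i / ai for i, ai in zip(observed, a)) / m
    mean_c_over_a = sum(ci / ai for ci, ai in zip(c, a)) / m
    mean_inv_a = sum(1.0 / ai for ai in a) / m
    k1 = [ai * mean_i_over_a for ai in a]
    k2 = [ci - ai * mean_c_over_a for ai, ci in zip(a, c)]
    k3 = [1.0 - ai * mean_inv_a for ai in a]
    return k1, k2, k3


def predict(k1, k2, k3, n_star, background):
    """Estimated intensities as k1 + N* k2 + B k3 (hypothetical arrangement)."""
    return [x + n_star * y + background * z for x, y, z in zip(k1, k2, k3)]


# Synthetic probe set generated from known (hypothetical) parameter values.
e_s = [-2.0, 0.5, -1.0, 1.0]
e_n = [1.0, 1.5, 0.0, 2.0]
true_n, true_n_star, true_b = 1000.0, 50.0, 30.0
observed = [
    true_n / (1.0 + math.exp(es)) + true_n_star / (1.0 + math.exp(en)) + true_b
    for es, en in zip(e_s, e_n)
]
k1, k2, k3 = precompute_constants(observed, e_s, e_n)
estimated = predict(k1, k2, k3, true_n_star, true_b)
```

With noise-free synthetic data and the true N* and B, the estimate reproduces the observed intensities, which is a quick check that the algebra is consistent.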
Zhang et al. (2003) suggest an outlier rejection criterion for probes where Î deviates more than three standard deviations from I. We tested this criterion and found that the overall variation between replicates diminished when the criterion was less strict (data not shown). However, it cannot be excluded that chip hybridizations of dubious quality will benefit from such a criterion. As a consequence, we made the criterion user-specifiable.
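A user-specifiable rejection threshold could be sketched as follows in Python. The details are assumptions: the deviation is measured here on log intensities and compared with the sample standard deviation of the log residuals, and `flag_outliers` with its `n_sd` parameter is a hypothetical name, not the package's interface.

```python
import math
import statistics


def flag_outliers(observed, predicted, n_sd=3.0):
    """Flag probes whose log residual exceeds n_sd standard deviations.

    Passing n_sd=None disables the criterion. At least two probes are needed.
    """
    residuals = [math.log(p) - math.log(o) for o, p in zip(observed, predicted)]
    if n_sd is None:
        return [False] * len(residuals)
    sd = statistics.stdev(residuals)
    if sd == 0.0:
        return [False] * len(residuals)
    return [abs(r) > n_sd * sd for r in residuals]


# Hypothetical probe data: 19 well-fitted probes and one badly fitted probe.
observed = [100.0] * 20
predicted = [100.0] * 19 + [100.0 * math.exp(2.0)]

strict = flag_outliers(observed, predicted, n_sd=3.0)
lenient = flag_outliers(observed, predicted, n_sd=5.0)
```

In this toy data the badly fitted probe is rejected at three standard deviations but kept at five, which illustrates why the strictness is worth exposing to the user.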
The package was developed as an add-on for the affy package (Gautier et al., 2004) for the analysis of Affymetrix GeneChips and can be used in this context. This not only allows the scientist to use the PDNN method in concert with a series of normalization and higher-level analysis tools, but also allows easy comparison between different gene expression index calculation methods, like dChip, MAS v.5, AverageDifference, etc.
Received on March 9, 2004; revised on September 14, 2004; accepted on September 27, 2004
REFERENCES
Li, C. and Wong, W.H. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, RESEARCH0032.
Gautier, L., Cope, L., Bolstad, B.M., Irizarry, R.A. (2004) affy - analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307-315.
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249-264.
Zhang, L., Miles, M.F., Aldape, K.D. (2003) A model of molecular interactions on short oligonucleotide microarrays. Nat. Biotechnol., 21, 818-821.
