Bioinformatics Advance Access originally published online on July 26, 2005
Bioinformatics 2005 21(18):3629-3636; doi:10.1093/bioinformatics/bti593

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions{at}oupjournals.org

TileMap: create chromosomal map of tiling array hybridizations

Hongkai Ji 1 and Wing Hung Wong 2,*

1Department of Statistics, Harvard University Cambridge, MA 02138, USA
2Department of Statistics, Stanford University Stanford, CA 94305, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 IMPLEMENTATION
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 

Motivation: Tiling array is a new type of microarray that can be used to survey genomic transcriptional activities and transcription factor binding sites at high resolution. The goal of this paper is to develop effective statistical tools to identify genomic loci that show transcriptional or protein binding patterns of interest.

Results: A two-step approach is proposed and is implemented in TileMap. In the first step, a test-statistic is computed for each probe based on a hierarchical empirical Bayes model. In the second step, the test-statistics of probes within a genomic region are used to infer whether the region is of interest or not. Hierarchical empirical Bayes model shrinks variance estimates and increases sensitivity of the analysis. It allows complex multiple sample comparisons that are essential for the study of temporal and spatial patterns of hybridization across different experimental conditions. Neighboring probes are combined through a moving average method (MA) or a hidden Markov model (HMM). Unbalanced mixture subtraction is proposed to provide approximate estimates of false discovery rate for MA and model parameters for HMM.

Availability: TileMap is freely available at http://biogibbs.stanford.edu/~jihk/TileMap/index.htm

Contact: whwong{at}stanford.edu

Supplementary information: http://biogibbs.stanford.edu/~jihk/TileMap/index.htm (includes coloured versions of all figures)


    1 INTRODUCTION
Tiling array is a new type of microarray that interrogates genomes with high-density probes. In a typical tiling array, probes are distributed along chromosomes approximately evenly at a density of one probe per 10–100 bp. When hybridized with RNA or chromatin immunoprecipitation (ChIP) samples, the array will detect genomic loci that show transcriptional or transcription factor binding patterns of interest (Kapranov et al., 2002, Kapranov et al., 2003; Cawley et al., 2004; Kampa et al., 2004). Owing to the high density of the probes, a whole genome can be surveyed in an unbiased manner at a high resolution.

Identifying genomic loci that show transcriptional or protein binding patterns of interest is a key step in digesting information from tiling array experiments. Currently, few tools are available for this task. Examples include G-TRANS (Kampa et al., 2004), a moving average (MA) method proposed by Keles et al. (2004) and a hidden Markov model (HMM) method proposed by Li et al. (2005). These tools are not sufficient to meet the diverse needs of the biology community. For example, current tools mainly rely on one-sample and two-sample comparisons. However, in order to study a complex developmental process, one may need to perform tiling array experiments at a number of different developmental stages and identify genomic loci with specific temporal or spatial transcriptional or transcription factor binding patterns. This will inevitably involve sophisticated multiple-sample comparisons that current tools cannot handle. Moreover, if one wishes to perform experiments under multiple conditions, the number of replicates within each condition will be small owing to cost constraints. How to make efficient use of a small number of replicates was not specifically considered in previous work.

The goal of this paper is to develop effective statistical models and algorithms to detect genomic loci that show hybridization patterns of interest. We will emphasize the tool's ability to do flexible multiple sample comparisons and to make efficient use of a small number of replicates. A two-step approach, TileMap, is proposed. In the first step, a test-statistic is computed for each probe based on a hierarchical empirical Bayes model. In the second step, test-statistics of probes within a genomic region are combined to infer whether the region has the hybridization pattern of interest or not. Hierarchical empirical Bayes model shrinks variance estimates and increases sensitivity of the analysis when the number of replicates is small. It also provides a flexible way to do complex multiple sample comparisons. Two different methods—an MA method and an HMM—are used to combine neighboring probes. Unbalanced mixture subtraction (UMS) is proposed to provide an approximate estimate of local false discovery rate for MA and model parameters for HMM. Cawley et al.'s (2004) chromosome 21 and 22 ChIP-chip experiment data are used to test and illustrate TileMap, where it shows improved performance over existing methods.

The moving average method was initially used by Keles et al. (2004) in analyzing tiling array data. In addition to the ability to do multiple sample comparisons, there are two main differences between TileMap and Keles's method. First, to compute a probe level test-statistic, Keles's method uses data only from the probe in question, whereas TileMap pools information from all the probes in the array via a closed-form empirical Bayes shrinkage estimator of variance. Recent studies showed that pooling information from all probes is an effective way to increase the sensitivity of gene selection from microarray experiments when the number of replicates in the experiment is small (Baldi and Long, 2001; Newton et al., 2004; Smyth, 2004), and variance is the main component through which information pooling takes effect (H. Ji and W. H. Wong, submitted for publication). TileMap applies this idea to tiling array analysis. Second, a different strategy is adopted by TileMap to determine the cutoff for making rejections. Keles's method uses bootstrap to estimate the null distribution of their ‘scan-statistics’ for choosing the cutoff. They made an implicit equal mean assumption, i.e. under the null hypothesis H0, mean hybridization intensities are equal under different experimental conditions. Although this assumption may be reasonable for two-sample comparisons with H0: µ1 = µ2, it is often inappropriate for false discovery rate (FDR) estimation when H0 contains some random effects, e.g. H0: µ1 – µ2 ~ N(0,1), and for FDR control of multiple-sample comparisons, such as mutant1 (mt1) < wild type (wt) < mutant2 (mt2). For the latter case, the correct null hypothesis is H0: NOT {mt1 < wt < mt2} instead of H0': mt1 = wt = mt2. H0 not only includes H0', but also mt1 = wt < mt2, mt1 > wt < mt2, etc. FDR control for such a complicated composite null is difficult. To deal with this problem, TileMap adopts an empirical technique, i.e. UMS, to get an approximate estimate of the local FDR and to choose a cutoff. In contrast to TileMap and Keles's method, Affymetrix's G-TRANS uses a different strategy. In G-TRANS, probes are grouped into overlapping windows, and a Wilcoxon signed-rank or rank sum test is carried out for each window. It is difficult to generalize this method to complex multiple sample comparisons, and no FDR estimate is provided by G-TRANS. Recently, Li et al. (2005) also proposed an HMM method for tiling array analysis, but their method was again limited to two-sample comparisons and did not pool information across probes when estimating the variance of individual probes.


    2 METHODS
2.1 Hierarchical empirical Bayes model for computing probe level test-statistics
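After proper preprocessing of the data, the first step of TileMap is to compute a test-statistic for each probe. Assume that there are I probes in the array, hybridizations are done in J different conditions, and there are Kj replicates under condition j. Let Xijk denote the normalized and log-transformed PM or PM-MM value of probe i under condition j and replicate k. The following model is used to describe Xijk:

$$X_{ijk} \mid \mu_{ij}, \sigma_i^2 \sim N(\mu_{ij}, \sigma_i^2) \qquad (1)$$

$$p(\mu_{ij}) \propto 1 \qquad (2)$$

$$\sigma_i^2 \mid \nu_0, \omega_0^2 \sim \text{Inv-}\chi^2(\nu_0, \omega_0^2) \qquad (3)$$
Define $\nu = \sum_j (K_j - 1)$, $\bar{X}_{ij} = \sum_k X_{ijk}/K_j$, and $s_i^2 = \sum_j \sum_k (X_{ijk} - \bar{X}_{ij})^2/\nu$. The basic idea to compute probe level statistics is to estimate $\sigma_i^2$ by pooling information from all $s_i^2$, then treat the estimate $\hat{\sigma}_i^2$ as known and compare µijs in terms of their posterior distribution.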

To estimate $\sigma_i^2$ (the variance of probe i), we use the following empirical Bayes shrinkage estimator proposed by H. Ji and W. H. Wong (submitted for publication), based on the theory of Natural Exponential Family with Quadratic Variance Function (Morris, 1983):

(4)

(5)
The derivation of the estimator is outlined in Section 1, Supplementary materials.

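Once the shrinkage estimate $\hat{\sigma}_i^2$ is obtained, the posterior distribution of µij will be approximated by $N(\bar{X}_{ij}, \hat{\sigma}_i^2/K_j)$ (with $\bar{X}_{ij}$ the replicate mean of probe i under condition j), and probe level statistics will be constructed based on this approximation. For two-sample comparisons (J = 2), the probe level test-statistic is

$$t_i = \frac{\bar{X}_{i1} - \bar{X}_{i2}}{\sqrt{\hat{\sigma}_i^2\,(1/K_1 + 1/K_2)}} \qquad (6)$$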
For multiple-sample comparisons (J > 2), e.g. (m1 > wt) or (m2 > wt), the probe level test-statistics will be computed as follows: (1) draw µijs from their approximate posterior distribution C times; (2) for each probe i, count how many times the prespecified condition is satisfied, and denote this number by Si; (3) use ti = 1 – (Si/C) as the probe level summary. The advantage of this simulation-based method is its flexibility, making it especially useful for studying, e.g., hybridizations at a specific time and place in developmental processes.
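The two kinds of probe level summary described above can be sketched as follows. This is an illustrative sketch, not TileMap's C implementation: the function names are hypothetical, the shrunken variance estimate is assumed to be supplied by the caller, formula (6) is assumed to take the usual difference-of-means form, and the posterior of each condition mean is assumed to be normal with the replicate mean as its center and the supplied variance divided by Kj as its variance.

```python
import math
import random

def two_sample_stat(x1, x2, sigma2_hat):
    """Difference of condition means scaled by a caller-supplied
    (assumed already shrunken) variance estimate, as in formula (6)."""
    k1, k2 = len(x1), len(x2)
    mean1 = sum(x1) / k1
    mean2 = sum(x2) / k2
    return (mean1 - mean2) / math.sqrt(sigma2_hat * (1.0 / k1 + 1.0 / k2))

def multi_sample_stat(groups, sigma2_hat, pattern, n_draws=1000, rng=None):
    """Simulation-based summary t_i = 1 - S_i / C for one probe.

    groups  : list of replicate lists, one per condition
    pattern : function taking the drawn means and returning True when
              the prespecified condition holds
    """
    rng = rng or random.Random(0)
    means = [sum(g) / len(g) for g in groups]
    sds = [math.sqrt(sigma2_hat / len(g)) for g in groups]
    hits = 0
    for _ in range(n_draws):
        # One draw of the condition means from the approximate posterior.
        mu = [rng.gauss(m, s) for m, s in zip(means, sds)]
        if pattern(mu):
            hits += 1
    return 1.0 - hits / n_draws

# Example: control vs. IP (small values indicate IP > control).
t_two = two_sample_stat([1.0, 1.2], [3.0, 3.4], sigma2_hat=0.04)
# Example: pattern "condition 1 < condition 2 < condition 3".
t_multi = multi_sample_stat([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]], 0.01,
                            lambda mu: mu[0] < mu[1] < mu[2])
```

Any pattern that can be written as a predicate on the drawn means can be passed in, which is the flexibility the simulation-based approach is meant to provide.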

Formula (6) above looks like a t-statistic, but the two differ in the way the denominator is derived. The estimate $\hat{\sigma}_i$ in formula (6) pools information from all probes, whereas si, which is used in canonical t-statistics, uses only information from probe i to estimate its own standard deviation. Pooling information to estimate variance significantly increases the sensitivity of the method, since it provides a better estimate of variance in terms of mean square error and results in better separation of test-statistic distributions under the null and alternative hypotheses. The same principle applies to multiple-sample comparisons. We also tried to shrink µijs by setting a proper prior in formula (2). However, mean shrinking usually does not provide much additional gain in sensitivity, whereas it may incur a significant amount of extra computation. This explains why a flat prior was used in formula (2). In formula (1), we assume common variance under different conditions, but this assumption is not crucial. One can assume unequal variance and apply the shrinkage estimator for each condition separately.

Without loss of generality, in what follows, we assume that small ti corresponds to the hybridization pattern of interest. Depending on individual studies, this can be met by setting appropriate group labels (e.g. in a ChIP-chip experiment, define conditions 1 and 2 in formula (6) to be the control and IP samples, respectively) or by taking transformations, such as –ti, if necessary.

2.2 Combining information from neighboring probes
TileMap provides two ways to combine information from neighboring probes. The first method uses an MA. In other words,

$$m_i = \frac{1}{2w+1}\sum_{j=i-w}^{i+w} t_j \qquad (7)$$
is computed as the final summary statistic for probe i. This is identical to Keles et al.'s (2004) scan-statistic, except for the way ti is calculated. Keles's method uses canonical t-statistics, whereas here we use a modified version of it and pool information from all probes to estimate variance. Keles's method only considers two-sample comparisons, whereas TileMap can handle multiple-sample comparisons. Before taking the average, ti in the multiple-sample case will be transformed by log[ti/(1 – ti)]. Notice that in the multiple-sample case, ti is a posterior probability and belongs to [0, 1]; if ti = 0 or 1, it is replaced by ε or 1 – ε, where ε is a small number (e.g. 1 × 10^-6). For the two-sample case, ti is given by formula (6) and belongs to (–∞, +∞), and it will be used directly in formula (7). The choice of w was discussed in Keles et al. (2004) and will not be the focus here.
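The moving average step can be illustrated with a short sketch. The helper names are hypothetical, the window is assumed to be symmetric with half-width w, and truncating the window at the two ends of the probe sequence is an assumption made here for simplicity; the text does not say how boundaries are handled.

```python
import math

def logit_clip(t, eps=1e-6):
    """Logit transform for multiple-sample summaries in [0, 1];
    values of exactly 0 or 1 are moved to eps or 1 - eps first."""
    t = min(max(t, eps), 1.0 - eps)
    return math.log(t / (1.0 - t))

def moving_average(stats, w):
    """m_i = average of t_j over j = i-w, ..., i+w.
    Windows are truncated at the ends (an assumption of this sketch)."""
    n = len(stats)
    out = []
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        window = stats[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Two-sample case: use the statistics directly.
m_two = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], w=1)
# Multiple-sample case: transform posterior-probability summaries first.
m_multi = moving_average([logit_clip(t) for t in [0.0, 0.2, 0.5, 1.0]], w=1)
```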

The second method uses HMM. The advantage of using HMM is that there is no need to preselect a w before analyzing new data. The HMM structure is shown in Figure 1b. More precisely, let Hi denote the hybridization state of probe i. Hi = 1 if probe i shows the pattern of interest; otherwise Hi = 0. Hereafter, we assume that i = 1, ..., I correspond to the probe's physical order on the chromosome. Define di,j to be the physical distance between the centers of probes i and j. Assume that (1) P(Hi = 0) = π0, P(Hi = 1) = π1 = 1 – π0; (2) if di,i+1 ≤ d0, the transition probabilities are P(Hi+1 = 1|Hi = 0,di,i+1 ≤ d0) = a0, P(Hi+1 = 0|Hi = 1,di,i+1 ≤ d0) = a1; if di,i+1 > d0, P(Hi+1 = 0|Hi,di,i+1 > d0) = π0, P(Hi+1 = 1|Hi,di,i+1 > d0) = π1; (3) the conditional distribution of probe level test-statistics is f(ti = t|Hi = 0) = f0(t), f(ti = t|Hi = 1) = f1(t). Under these assumptions, once d0, π0, a0, a1, f0(t) and f1(t) are known, the standard forward–backward algorithm can be applied to infer the hidden state Hi through probe level test-statistics ti.
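A minimal sketch of the forward-backward computation under assumptions (1) to (3) is given below. It is not TileMap's code: the function name and argument layout are hypothetical, the emission densities f0 and f1 are passed in as plain functions that are assumed to be strictly positive, and per-probe rescaling is used only to avoid numerical underflow.

```python
def posterior_target(t, pos, f0, f1, pi0, a0, a1, d0):
    """Return P(H_i = 1 | t_1..t_I) for every probe.

    t   : probe level test-statistics in chromosomal order
    pos : probe center positions (same order as t)
    """
    n = len(t)
    pi = (pi0, 1.0 - pi0)
    emit = [(f0(x), f1(x)) for x in t]

    def trans(i):
        # Transition matrix from probe i to probe i + 1.
        if pos[i + 1] - pos[i] <= d0:
            return ((1.0 - a0, a0), (a1, 1.0 - a1))
        # Gap larger than d0: the next state is drawn from (pi0, pi1).
        return (pi, pi)

    # Forward pass, rescaled at each probe.
    alpha = [None] * n
    a = [pi[s] * emit[0][s] for s in (0, 1)]
    z = sum(a)
    alpha[0] = [v / z for v in a]
    for i in range(1, n):
        p = trans(i - 1)
        a = [emit[i][s] * sum(alpha[i - 1][r] * p[r][s] for r in (0, 1))
             for s in (0, 1)]
        z = sum(a)
        alpha[i] = [v / z for v in a]

    # Backward pass, also rescaled (constants cancel in the posterior).
    beta = [None] * n
    beta[n - 1] = [1.0, 1.0]
    for i in range(n - 2, -1, -1):
        p = trans(i)
        b = [sum(p[s][r] * emit[i + 1][r] * beta[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        z = sum(b)
        beta[i] = [v / z for v in b]

    post = []
    for i in range(n):
        g0 = alpha[i][0] * beta[i][0]
        g1 = alpha[i][1] * beta[i][1]
        post.append(g1 / (g0 + g1))
    return post
```

Because each probe's posterior is normalized separately, the arbitrary scaling constants introduced in the two passes do not affect the result.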



Fig. 1 TileMap overview. (a) Illustration of TileMap procedures. Raw data, TileMap probe level test-statistics, MA summaries and HMM posterior probabilities are shown from top to bottom. In TileMap, small test-statistics correspond to the hybridization pattern of interest. The posterior probability shown is the posterior probability of not being a target probe. (b) The HMM structure in TileMap.

 
In MA, mis are used to rank and select probes to form target regions. The entire set of mi can be viewed as a sample from a mixture distribution π0f0(m) + π1f1(m), where f0(m) and f1(m) are distributions of mi under Hi = 0 and Hi = 1, respectively. We need to estimate π0, f0(m) and f1(m) in order to control FDR. In HMM, tis are used to infer the hidden states, and target regions are selected based on the posterior probability of Hi = 1. The tis can be treated as a sample from another mixture π0f0(t) + π1f1(t), and one needs to know π0, f0(t) and f1(t) before decoding the HMM. TileMap adopts unbalanced mixture subtraction to deal with these two similar issues.

2.3 UMS
The goal of UMS is to recover different components of a mixture distribution h(t) ≡ π0f0(t) + (1 – π0)f1(t), where t represents a generic statistic. In reality, h(t) is observed, but π0 and f1(t) are unknown. Canonical FDR procedures assume f0(t) to be known and try to restore f1(t) by subtracting f0(t) from h(t). These procedures usually work well in two-sample comparisons where f0(t) can be obtained, either by theory or by simulation techniques, such as permutations. However, they cannot be applied directly to cases where f0(t) is hard to obtain, e.g. multiple-sample comparisons or comparisons involving complex composite null. To circumvent this difficulty, UMS makes use of additional information (see below) to construct two ‘unbalanced’ distributions. Both are mixtures of f0(t) and f1(t), but they differ in the abundance of the f1(t) component. The two ‘unbalanced’ mixtures can then be used to reconstruct π0, f0(t) and f1(t) and estimate FDR.

UMS is illustrated in Figure 2. We first construct two mixtures g0(t) = p0f0(t) + (1 – p0)f1(t) and g1(t) = q0f0(t) + (1 – q0)f1(t), where p0 > π0 ≥ q0. If two such mixtures can be constructed and if there exists t0 such that f0(t)/f1(t) → ∞ as t → t0, then the limit of g1(t)/g0(t) as t → t0 equals q0/p0 ≡ r. Once r is known, f1(t) can be obtained by formula (8).

$$f_1(t) = \frac{g_1(t) - r\,g_0(t)}{1 - r} \qquad (8)$$
To estimate f0(t), notice that π1 = 1 – π0 is usually small; therefore, g0(t) can provide an approximation of f0(t). Given f1(t) and g0(t), π0 can be estimated by fitting h(t) using θ0g0(t) + (1 – θ0)f1(t) such that ∫{h(t) – [θ0g0(t) + (1 – θ0)f1(t)]}² dt is minimized. The resulting estimate $\hat{\theta}_0 \ge \pi_0$; therefore, $1 - \hat{\theta}_0$ provides a conservative estimate of π1, which is desirable if we want to keep a relatively stringent criterion in detecting regions of interest. Once π1, f0(t) and f1(t) are estimated, the local false discovery rate at a point t can be estimated by $\mathrm{lfdr}(t) = \hat{\pi}_0 \hat{f}_0(t) / [\hat{\pi}_0 \hat{f}_0(t) + \hat{\pi}_1 \hat{f}_1(t)]$, and FDR for a rejection region Z, e.g. {t ≤ tcut}, can be estimated using the relationship FDR(Z) = E[lfdr(t)|Z]. In the special case where the null distribution f0(t) is known, we can set p0 = 1, g0(t) = f0(t), and g1(t) = h(t); then UMS reduces to the q-value method discussed in Storey and Tibshirani (2003).
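For readers who want to check the algebra, the subtraction step and the conservative direction of the mixing-weight estimate follow directly from the definitions of g0(t), g1(t) and r = q0/p0. The second display assumes, for illustration only, that the fit is exact and that the true f1(t) is used:

```latex
\begin{align*}
g_1(t) - r\,g_0(t)
  &= q_0 f_0(t) + (1-q_0) f_1(t)
     - \frac{q_0}{p_0}\bigl[p_0 f_0(t) + (1-p_0) f_1(t)\bigr] \\
  &= \Bigl[(1-q_0) - \frac{q_0}{p_0}(1-p_0)\Bigr] f_1(t)
   = (1-r)\, f_1(t), \\
\text{so}\quad f_1(t) &= \frac{g_1(t) - r\,g_0(t)}{1-r}.
\end{align*}

\begin{align*}
\theta_0 g_0(t) + (1-\theta_0) f_1(t)
  &= \theta_0 p_0 f_0(t) + (1 - \theta_0 p_0) f_1(t), \\
\text{which matches } h(t) = \pi_0 f_0(t) + (1-\pi_0) f_1(t)
  &\text{ when } \theta_0 = \pi_0 / p_0 \ge \pi_0 .
\end{align*}
```

Since p0 ≤ 1, the fitted weight is at least π0, so one minus the fitted weight is at most π1, which is the sense in which the estimate is conservative.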



Fig. 2 Unbalanced Mixture Subtraction. Left panel is a conceptual example to illustrate UMS. See Section 2.3 for a detailed description. Right panel is a real example where UMS was applied to analyze 18 arrays of a cMyc ChIP-chip experiment to estimate f0(t) and f1(t) (Section 4).

 
To construct the two unbalanced mixtures g0(t) and g1(t), we need additional information. If biological knowledge tells that certain regions are more likely to be transcribed or bound by the transcription factors under study, this piece of information can be used, e.g. one can map known transcription factor binding motifs to the genome to collect regions of potential interest. If such biological information is not available, the correlation structure provided by tiling array itself can be used to get an approximate estimate of g0(t) and g1(t) as discussed in the following paragraphs.

For HMM, we pick probes with ti > t(p), where t(p) is the p-th percentile of all tis. Then ti+1, the test-statistics immediately downstream of the selected probes, are used to form an empirical distribution $\hat{g}_0(t)$. We then pick probes with ti ≤ t(q), and use their downstream ti+1 to form $\hat{g}_1(t)$. For MA, ti+1 is replaced by mi+w+1, and a similar procedure is used to construct $\hat{g}_0(m)$ and $\hat{g}_1(m)$. We then use $\hat{g}_0(\cdot)$ and $\hat{g}_1(\cdot)$ as surrogates for g0(.) and g1(.). The intuition behind these procedures is that if a DNA/cDNA fragment hybridizes to a probe, it also tends to hybridize to its neighboring probes. Thus, if a probe has very small ti, its neighboring probes are more likely to have the pattern of interest than random probes do, and vice versa.

To generalize the above procedures, we define a selection statistic ui. We use the distribution of Ti among probes with ui ∈ A, f(Ti = t|ui ∈ A), to approximate g0(t), and f(Ti = t|ui ∈ R) to approximate g1(t). For MA, $u_i = t_{i-w-1}$, Ti = mi. For HMM, $u_i = t_{i-1}$, Ti = ti. Both MA and HMM use A = {ui > t(p)} and R = {ui ≤ t(q)}. By default, t(p) = t(1) and t(q) = t(5) (see Section 5.3, Supplementary material, for discussions about the choice of t(p) and t(q)). As an illustration, the right panel of Figure 2 provides a real example for estimating HMM parameters in this way.
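For the HMM case, the construction of the two unbalanced samples can be sketched as below. The function names are hypothetical and the nearest-rank percentile rule is an assumption of this sketch; the text does not specify how percentiles are computed. Note that with the default cutoffs the two selection sets overlap, since a probe whose upstream statistic lies between the 1st and 5th percentiles satisfies both conditions.

```python
import math

def percentile_cut(values, pct):
    """Nearest-rank percentile: smallest value with at least pct percent
    of the data at or below it (an assumption of this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def unbalanced_samples(t, p=1.0, q=5.0):
    """Split downstream statistics into the samples used as surrogates
    for g0 (null-enriched) and g1 (target-enriched)."""
    t_p = percentile_cut(t, p)
    t_q = percentile_cut(t, q)
    g0_sample, g1_sample = [], []
    for i in range(1, len(t)):
        u = t[i - 1]              # selection statistic u_i = t_{i-1}
        if u > t_p:               # upstream probe not extreme: set A
            g0_sample.append(t[i])
        if u <= t_q:              # upstream probe among the smallest: set R
            g1_sample.append(t[i])
    return g0_sample, g1_sample

# Toy input: 100 increasing statistics.
g0_s, g1_s = unbalanced_samples([float(i) for i in range(100)])
```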

It can be shown that if (1) P(Hi = 0|I{ui ∈ A} = 1) > π0, (2) f(Ti = t|Hi, I{ui ∈ A} = 1) = f(Ti = t|Hi), then f(Ti = t|ui ∈ A) is a valid surrogate for g0(t). Similarly, if (1) P(Hi = 0|I{ui ∈ R} = 1) ≤ π0, (2) f(Ti = t|Hi, I{ui ∈ R} = 1) = f(Ti = t|Hi), then f(Ti = t|ui ∈ R) is a valid surrogate for g1(t). Usually, condition (1) is not hard to meet. Condition (2) is implied in the HMM case, but in general, it only holds approximately or may not even hold owing to possible selection bias or residual correlations between ui and Ti after accounting for Hi. We, therefore, label the application of UMS here as an ‘approximate’ procedure, meaning that it only provides a rough and possibly imprecise or optimistic estimate of FDR under the null model, unless the previous assumptions are completely satisfied. The advantage of UMS over permutation-based FDR estimation, such as SAM (Tusher et al., 2001), is that, if conditions (1) and (2) are indeed satisfied, UMS can provide an FDR estimate for a complex composite null hypothesis, such as ‘not µ1 < µ2 < µ3 or µ4 < µ5’, whereas the latter cannot. Also, UMS provides an interface to incorporate other sources of information (e.g. empirical biological knowledge about which genes/regions are more likely to show the desired pattern) to evaluate false positive rates. When applying UMS, however, it is important to understand that there is always a possibility that bias may be introduced by the new sources of information.

For HMM, one also needs to determine a0, a1 and d0. One can choose a1 and d0 according to the typical length of hybridizations. For example, in ChIP-chip experiments, IP fragments are usually ~1 kb. If the probe density in the chip is 1 probe/35 bp, a typical hybridization would contain ~28 probes; correspondingly, a1 can be set to 1/28 to match the mean length of continuous Hi = 1 segments in HMM, and d0 can be set to 1000. To estimate a0, assume that (π0, π1) is the stationary distribution for the Markov chain without gaps (i.e. without di,i+1 > d0), then π1 = a0/(a0 + a1), and a0 can be estimated by $\hat{a}_0 = a_1 \hat{\pi}_1 / (1 - \hat{\pi}_1)$, where $\hat{\pi}_1$ is the estimate of π1 obtained from UMS.
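The arithmetic in this paragraph can be written out as a short sketch. The function name is hypothetical, and the unrounded probe count (1000/35 ≈ 28.6 rather than 28) is used, so the numbers differ slightly from the rounded values quoted in the text.

```python
def hmm_transition_params(fragment_bp, probe_spacing_bp, pi1):
    """Choose a1 from the typical fragment length and solve the
    stationarity relation pi1 = a0 / (a0 + a1) for a0."""
    probes_per_fragment = fragment_bp / probe_spacing_bp
    a1 = 1.0 / probes_per_fragment      # mean run length of H = 1 is 1 / a1
    a0 = a1 * pi1 / (1.0 - pi1)
    return a0, a1

# ~1 kb fragments, one probe per 35 bp, and an illustrative pi1 of 0.05.
a0, a1 = hmm_transition_params(1000.0, 35.0, 0.05)
```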


    3 IMPLEMENTATION
TileMap is implemented in ANSI C. In terms of computation time, it is usually >10 times faster than G-TRANS (refer to Section 2, Supplementary material). TileMap includes functions to do raw data normalization, local repeat filtering, probe level summary, UMS, MA and HMM. Local repeat filtering masks any probe that occurs more than once in a 2 kb local window. In UMS, users may choose to use their own selection statistics. For MA, permutation-based FDR estimation routine is also provided. The output of TileMap includes final summaries for each probe and a *.bed file containing selected genomic regions. The latter is defined by lfdr in MA or posterior probability of Hi = 0 in HMM being smaller than a user specified cutoff.

In UMS, all statistics are transformed to [0, 1], e.g. the t-statistic is transformed using exp(ti)/[1 + exp(ti)]. [0, 1] is then equally divided into n (default = 1000) intervals. g0(.) and g1(.) were estimated using empirical distributions of test-statistics in these intervals. To estimate r, we compute rt = [1 – G1(t)]/[1 – G0(t)] for t = t(50), t(51), ..., t(99). r is then set to be the median of these 50 rts. To get stable estimates of f1(.), we also assumed a monotone likelihood ratio in implementing UMS, i.e. as t → t0, f0(t)/f1(t) is increasing.
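The estimation of r from upper-tail ratios can be sketched as follows. This is a simplified illustration with a hypothetical function name: it works directly on the sample values instead of the 1000-bin histogram, it assumes that the percentiles t(50), ..., t(99) are taken over all statistics with a nearest-rank rule, and it skips any cutoff at which the g0 sample has an empty upper tail.

```python
import math
import statistics

def estimate_r(all_stats, g0_sample, g1_sample):
    """Median over the 50th..99th percentiles of
    r_t = [1 - G1(t)] / [1 - G0(t)], with G0 and G1 the empirical CDFs."""
    ordered = sorted(all_stats)
    n = len(ordered)
    ratios = []
    for pct in range(50, 100):
        cut = ordered[max(1, math.ceil(pct * n / 100.0)) - 1]
        tail0 = sum(1 for x in g0_sample if x > cut) / len(g0_sample)
        tail1 = sum(1 for x in g1_sample if x > cut) / len(g1_sample)
        if tail0 > 0:
            ratios.append(tail1 / tail0)
    return statistics.median(ratios)

# Toy check: g0 is pure null (p0 = 1); g1 is half null, half target (q0 = 0.5),
# with targets concentrated at small values, so r should be close to 0.5.
null_a = [(i + 0.5) / 1000 for i in range(1000)]
null_b = [(i + 0.5) / 500 for i in range(500)]
targets = [(i + 0.5) / 5000 for i in range(500)]
r_hat = estimate_r(null_a + null_b + targets, null_a, null_b + targets)
```

Using the upper tail reflects the convention that small statistics indicate the pattern of interest, so large values are where the null component dominates.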


    4 RESULTS
TileMap was tested using a ChIP-chip experiment performed by Cawley et al. (2004) as well as simulations. In this section, we will present the global design and the main results of the tests. Details of how tests and simulations were done are provided in Supplementary material (Sections 3–6). Cawley's experiment tried to identify binding regions for three transcription factors using Affymetrix chromosomes 21 and 22 tiling arrays. Their cMyc data on Chip A and p53-FL (full length antibody) data on Chips A, B, C were used for testing. For convenience of discussion, Chips A, B and C in the p53 experiment are treated here as a combined single chip. For each transcription factor, hybridizations were done for two biological replicates under three different conditions: IP, control GST (C1) and control input (C2). For each biological replicate and experimental condition, three technical replicates were obtained. In total, there were 18 arrays for each transcription factor. Before analysis, raw data were quantile normalized (Bolstad et al., 2003), and PM-only intensities were log transformed and adjusted for batch effect (Section 3.1, Supplementary material). Local repeats were filtered out. The 18 arrays were then randomly divided into three groups G1, G2 and G3 for later use. Each group contained six arrays: two for IP, two for C1 and two for C2. Within each condition, the two arrays were from different biological replicates.

4.1 Sensitivity test based on cMyc data
In order to see how variance shrinking helps increase sensitivity in the small replicate case, MA with variance shrinking (MA-S) was first compared with MA without variance shrinking (MA-N). Notice that in two-sample comparisons, MA-N is equivalent to Keles's scan statistic. Before the comparison, a gold standard was constructed by applying MA-N to all 18 arrays to select probes showing IP > C1 and IP > C2 (Section 3.2, Supplementary material). The gold standard contained 1654 probes (0.5% of all probes) and was grouped into 180 binding regions. For the comparison, one or two of the G1–G3 groups were excluded. MA-S (w = 5) and MA-N (w = 5) were applied to the remaining arrays to rank probes according to (1) IP > C1 using one of the G1–G3 groups only; (2) IP > C1 and IP > C2 using one of the G1–G3 groups only; and (3) IP > C1 and IP > C2 using two of the G1–G3 groups. For simplicity, we use s2r2, s3r2 and s3r4 to denote the three settings above. s2r2 stands for two-sample comparison where each sample has two replicates, s3r2 stands for three-sample two replicates, etc. In each of the three settings above, 0.5% of top ranking probes were selected to form binding regions. This guaranteed that both methods had the same coverage of the genome. Two probes, if separated by <500 bp, were treated as belonging to a single region. Regions were ranked according to the minimum of the mis in each region. MA-S and MA-N were then compared in terms of what fraction of their top ranking probes were gold standard probes (Fig. 3a) and how many of their top ranking regions overlapped gold standard regions (Fig. 3b). There were three possibilities to choose one group or two groups to exclude, and the results shown here were averages over the three possibilities. According to Figure 3, MA-S was indeed more powerful than MA-N, even though the way we defined the gold standard was biased toward MA-N. In the s2r2 case, the effect was most striking. At probe level, the rate of correct rejections increased from ~0.2 to ~0.85 when 500 rejections were made; at binding region level, MA-S identified >70 more gold standard regions among the top 160 regions. The gain from shrinking decreases as the number of arrays increases, but given that 2–3 replicates are most often encountered, we would expect to gain from variance shrinking in a significant number of real studies.
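The evaluation steps used here (merging selected probes into regions and scoring top-ranked probes against a gold standard) can be sketched as follows. The helper names and the toy inputs are hypothetical; only the 500 bp merging rule and the "fraction of top-ranked probes in the gold standard" criterion come from the text.

```python
def group_into_regions(positions, max_gap=500):
    """Merge probe positions into (start, end) regions; two consecutive
    probes separated by less than max_gap bp fall in the same region."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] < max_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [tuple(r) for r in regions]

def fraction_correct(ranked_probes, gold, n_top):
    """Fraction of the n_top best-ranked probes that are gold standard probes."""
    top = ranked_probes[:n_top]
    return sum(1 for p in top if p in gold) / len(top)

regions = group_into_regions([100, 135, 170, 900, 935, 400])
frac = fraction_correct([5, 3, 9, 1], gold={3, 1}, n_top=2)
```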



Fig. 3 Comparisons of MA, HMM, Keles's method and G-TRANS in cMyc data analysis. The fraction/number of correct rejections among a given number of total rejections is shown. (a) MA-S and MA-N were compared at probe level, probe density = 1/35 bp; (b) MA-S and MA-N were compared at binding region level, probe density = 1/35 bp; (c) MA-S and MA-N were compared at binding region level, probe density = 1/70 bp; (d) G-TRANS, Keles's method (MA-N), MA-S and HMM were compared at binding region level, probe density = 1/35 bp, G–M was used as gold standard.

 
We also reduced the probe density from 1 probe/35 bp to 1 probe/70 bp by discarding half of the probes. MA-S (w = 2) and MA-N (w = 2) were compared again using the data with the reduced probe density (Fig. 3c). The gold standard used in Figure 3c was the same as that in Figure 3b, which was constructed using all probes including the probes discarded. The gain from variance shrinking became more evident. Interestingly, in the s3r2 case, MA-S found ~100 ‘true’ binding regions among its top 160 rejections (Fig. 3c). The same sensitivity was achieved by MA-N but with doubled probe density (Fig. 3b). This means that we only need half as many probes for MA-S as for MA-N to achieve the same sensitivity in this case. If MA-N were used to survey 100 genes, then with the same number of probes MA-S would allow us to survey 200 genes without losing the ability to detect the true targets. Surveying more genes, in turn, can increase the chance of finding unknown players in a biological process.

In order to compare G-TRANS, Keles's method, MA-S and HMM, they were applied to analyze cMyc data as we did in Figure 3b. Two gold standards, ‘G–M’ and ‘G–H’, were constructed using all 18 arrays (Section 3.3, Supplementary material). G–M standard contained 78 regions, which is the intersection of the regions identified by G-TRANS and MA-N. G–H standard contained 73 regions, which is the intersection of G-TRANS and HMM regions. Different methods were compared at binding region level using both G–M standard (Fig. 3d) and G–H standard (Supplementary Figure S2). The two results were similar. When replicates were few (s2r2, s3r2), MA-S and HMM showed clear improvement in sensitivity compared with G-TRANS and Keles's method. As the number of arrays became large (s3r4), all methods began to show similar performance. Keles's method and G-TRANS cannot be used to do multiple sample comparisons. In order to get summary statistics for IP > C1 and IP > C2, Keles's method was replaced by MA-N; G-TRANS was applied twice to do two-sample comparisons IP > C1 and IP > C2 separately, and the maximum of the two P-values for each probe was taken as its final summary statistics to derive binding regions. We did not compare TileMap with the HMM method proposed by Li et al. (2005), since the latter was not available at the time this work was done.

Next, we compared the enrichment of cMyc binding sites in regions identified by different methods. cMyc consensus binding pattern CA[C/T]G[T/C]G was mapped to chromosome 21, yielding 17 563 potential binding sites (TFBS) in a total of 18.3 Mb non-repeat genomic sequences. Among these TFBS, 4496 were located in regions whose human–mouse–rat cross-species conservation score was among the top 30% of the whole chromosome (Section 3.4, Supplementary material). G-TRANS, MA-N (Keles), MA-S and HMM were all applied to select the top 0.5% probes and group them into binding regions using 18 cMyc arrays (s3r6) as well as reduced data (s2r2, s3r2, s3r4). The number of TFBS and conserved TFBS (cTFBS) in the identified regions were counted. Binding site enrichment was computed as the ratio of TFBS and cTFBS densities in selected regions to their chromosome-wide counterparts. Site enrichment for different methods is listed in Table 1. Based on the results, MA-S and HMM consistently showed higher or nearly equal TFBS and cTFBS enrichment than G-TRANS. They also showed higher TFBS enrichment than MA-N (Keles) in s2r2 case. The differences, however, diminished as more arrays were included.
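The motif mapping and the enrichment ratio can be illustrated with a small sketch. Several details are assumptions made for illustration and are not stated in the text: only the given strand is scanned, overlapping matches are allowed, a site counts as inside a region when its start coordinate falls in the half-open interval [start, end), and the function names and the toy sequence are hypothetical.

```python
import re

# Consensus CA[C/T]G[T/C]G; the lookahead lets overlapping matches be found.
MOTIF = re.compile(r"(?=(CA[CT]G[TC]G))")

def motif_sites(seq):
    """Start positions (0-based) of consensus matches on the given strand."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]

def enrichment(sites, regions, total_length):
    """Site density inside the selected regions divided by the
    chromosome-wide site density."""
    region_len = sum(end - start for start, end in regions)
    in_regions = sum(1 for s in sites
                     if any(start <= s < end for start, end in regions))
    return (in_regions / region_len) / (len(sites) / total_length)

seq = "AAACACGTGAAACATGCGTTT"          # toy sequence, 21 bp
sites = motif_sites(seq)
ratio = enrichment(sites, regions=[(0, 10)], total_length=len(seq))
```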


Table 1 cMyc binding site enrichment in predicted binding regions

 
4.2 Sensitivity test based on p53 data
The methods were further compared through the analysis of 18 p53-FL arrays. Cawley et al. (2004) verified 14 p53 binding regions by qPCR, using either the p53_FL or the p53_DO1 antibody. These regions were used here as the gold standard. Each method was applied under different settings (s2r2–s3r6) to select the top 0.5% of probes and group them into binding regions. The methods were then compared in terms of how many experimentally verified regions were identified among their top 10, top 20 and all selected regions. The results are listed in the top panel of Table 2. HMM and MA-S again detected more experimentally verified regions than G-TRANS and MA-N (Keles) when replicates were few (e.g. s2r2, s3r2). We also reduced the probe density from 1/35 to 1/100 bp by discarding two-thirds of all the probes. MA-N, MA-S and HMM were compared again (the bottom panel of Table 2). The better performance of MA-S and HMM over MA-N in the small replicate case became more evident (e.g. s3r2). G-TRANS was not compared here since we were unable to use it to analyze a set of specified probes.
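
The bookkeeping behind this comparison can be sketched in Python. The sketch assumes that predicted regions are already ranked best first and that a verified region counts as detected if it overlaps any selected region; it also assumes that the density reduction keeps every third probe, which the text does not specify. All names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def verified_hits(predicted, verified, top_k=None):
    """Count verified regions overlapped by the top_k predicted regions.

    `predicted` must be sorted from most to least significant;
    top_k=None means all predicted regions are used.
    """
    selected = predicted if top_k is None else predicted[:top_k]
    return sum(1 for v in verified if any(overlaps(v, p) for p in selected))

def thin_probes(probe_positions, keep_every=3):
    """Keep one probe out of `keep_every` (assumed thinning scheme)."""
    return probe_positions[::keep_every]

# Toy coordinates (made up for illustration).
predicted = [(100, 200), (1000, 1100), (5000, 5200), (300, 350)]
verified = [(150, 180), (5100, 5300), (9000, 9100)]
hits_top2 = verified_hits(predicted, verified, top_k=2)
hits_all = verified_hits(predicted, verified)
thinned = thin_probes(list(range(0, 350, 35)))
```

Calling verified_hits with top_k set to 10, 20 and None would give the three counts reported per method and setting.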


Table 2 Sensitivity of G-TRANS, MA-N, MA-S and HMM on p53 data

 
4.3 Performance of UMS
To see how UMS works, a series of simulations was carried out. In all simulations, six arrays were generated and equally divided into three groups D1, D2 and D3, each of size two. Each array contained 50 000 probes. Probe intensities were generated according to formulae (1) and (3). The model parameters (e.g. v0 = 4.64) were chosen to match the typical values observed in real data. A number of binding regions with pattern D1 < D2 < D3 were generated to serve as the targets we wish to identify (Section 4.1, Supplementary material). In total, these regions covered 50 000 π1 probes. The simulations differ in the way Δi1 = µi2 – µi1 and Δi2 = µi3 – µi2 were generated, which was designed to test UMS from different perspectives. Table 3 lists the designs for simulations I–III. In each simulation, 10 different datasets were generated, and the results below are averages over the 10 datasets. Here, we use π1 = 0.05 to illustrate the results, although π1 = 0.01, 0.02 and 0.10 were also tried and similar results were obtained.


Table 3 Simulation design for testing UMS

 
Simulations I and II tested UMS when its assumptions were true. In simulation I, we tested D1 = D2 = D3 versus D1 < D2 < D3. Probe level test-statistics t were computed for the three-sample comparison D1 < D2 < D3. UMS was applied to estimate π1 and lfdr based on t. In UMS, t(p) = t(1), t(q) was set to t(2), t(5), t(10) and t(50) respectively, and [0, 1] was divided into n = 50 intervals. For comparison, a permutation test was also applied to estimate lfdr. Since we knew exactly which probes were true targets, the true lfdr could be obtained. Both the true lfdr and the lfdr obtained by the permutation test are shown together with the UMS estimates in Figure 4a and b. In Figure 4a, estimates were based on t without variance shrinking. As expected, both UMS and the permutation test gave the desired lfdr, with UMS being slightly more conservative. In Figure 4b, estimates were based on t with variance shrinking. Surprisingly, the permutation test failed to provide the desired lfdr, even though the null hypothesis here was D1 = D2 = D3. This was owing to the combined effect of permutation and shrinking. The sample variance of probes with D1 < D2 < D3 tends to become bigger after permutation; therefore, the variance estimates of all probes were pulled toward a bigger value when the shrinkage estimator was applied, and the test-statistics tend to become more centralized in the permutation distribution. As a result, the number of false probes was underestimated in the tails, causing optimistic lfdr estimates. In contrast to the permutation test, however, UMS still provided conservative lfdr estimates.
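
The variance inflation that causes this failure can be reproduced with a small Python sketch. It is an illustration under simplifying assumptions, not the simulation code used here: intensities are plain Gaussian rather than generated from formulae (1) and (3), the variance is an ordinary pooled within-group variance, labels are shuffled independently per probe, and all names and parameter values are hypothetical.

```python
import numpy as np

def pooled_within_group_variance(values, labels):
    """Pooled within-group sample variance of one probe."""
    total_ss, total_df = 0.0, 0
    for g in np.unique(labels):
        x = values[labels == g]
        total_ss += float(((x - x.mean()) ** 2).sum())
        total_df += len(x) - 1
    return total_ss / total_df

def mean_variance(data, labels, rng=None):
    """Average pooled variance over probes.

    If `rng` is given, group labels are shuffled separately for each probe,
    which mimics averaging over label permutations.
    """
    out = []
    for row in data:
        lab = labels if rng is None else rng.permutation(labels)
        out.append(pooled_within_group_variance(row, lab))
    return float(np.mean(out))

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])          # three groups of two arrays
n_probes, sigma = 2000, 0.3                    # assumed toy values
null_probes = rng.normal(0.0, sigma, size=(n_probes, 6))
group_means = np.array([0.0, 1.5, 3.0])        # a D1 < D2 < D3 pattern
target_probes = group_means[labels] + rng.normal(0.0, sigma, size=(n_probes, 6))

null_obs = mean_variance(null_probes, labels)
null_perm = mean_variance(null_probes, labels, rng)
target_obs = mean_variance(target_probes, labels)
target_perm = mean_variance(target_probes, labels, rng)
```

For probes with D1 = D2 = D3 the permuted and observed variances agree on average, whereas for target probes the between-group differences leak into the within-group variance after permutation. Any shrinkage target computed from all probes is therefore larger in the permuted data than in the observed data.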



Fig. 4 Local false discovery rate estimates by UMS and permutation test in simulations I–III. t(p) = t(1). UMS was applied under four different t(q) settings: t(q) = 2nd, 5th, 10th, 50th percentile of t. Black curves correspond to true lfdr. (a) Simulation I, estimations based on t without variance shrinking; (b) simulation I, estimations based on t with variance shrinking; (c) simulation II, estimations based on t without variance shrinking; (d) simulation III, estimations based on t with variance shrinking.

 
In simulation II (Section 4.2, Supplementary material), probes outside the target regions were assigned some random changes. This introduced random components, such as D1 < D2 > D3 and D1 > D2 < D3, into the null hypothesis, which is no longer D1 = D2 = D3. UMS and the permutation test were both applied to estimate lfdr, and the results based on t without and with variance shrinking are shown in Figure 4c and Supplementary Figure S3b, respectively. Now, even in the non-shrinking case, the permutation test failed to provide the desired lfdr for D1 < D2 < D3. UMS, however, again provided conservative estimates for both non-shrinking and shrinking t under different t(q) settings.

Simulations I and II tested UMS when its conditional independence assumption [i.e. f(Ti = t|Hi, I{ui ∈ A} = 1) = f(Ti = t|Hi)] was true. Analysis of Cawley's experiment showed that this assumption provides a first-order approximation to the real data (Section 5.1, Supplementary material). To see how UMS performs when this assumption does not hold, in simulations III–VI we challenged UMS by violating the assumption in different ways.

In simulation III (Section 4.3, Supplementary material), we introduced some additional binding regions with pattern D1 = D2 < D3 or D1 < D2 = D3 into the background. Each type of new region also covered π1 of the total probes. The additional regions belonged to the null hypothesis and were not the targets we wish to detect. This design broke the conditional independence assumption under Hi = 0, since D1 = D2 < D3 and D1 < D2 = D3 are more likely to generate significant test-statistics than D1 = D2 = D3 and, unlike in simulation II, probes from D1 = D2 < D3 and D1 < D2 = D3 tend to be clustered together. The lfdr estimates by UMS and the permutation test in this simulation are shown in Figure 4d and Supplementary Figure S3c. When t(q) was small (q% ≤ π1 in this case), UMS was still able to provide reasonable lfdr estimates. When t(q) became large, the estimates became optimistic. In both cases, however, UMS performed much better than the permutation test. A theoretical analysis of why UMS works in such a situation when t(q) is small is given in the Supplementary material (Section 5.2).

Simulations IV–VI (Section 4.4, Supplementary material) were adapted from simulations I–III, respectively. Residual correlations between Ti and ui were introduced into binding regions, which broke the conditional independence assumption under Hi = 1. The results are shown in Supplementary Figure S3 and were similar to those in Figure 4, suggesting that this type of violation of the assumption did not influence the performance of UMS significantly.

We carried out additional theoretical analyses and tests (Sections 4.5–4.8 and Section 5, Supplementary material). Together with the simulations here, they showed that (1) when the conditional independence assumption of UMS holds, UMS can provide reasonable lfdr and π1 estimates, and the performance is robust to the choices of t(p) and t(q); (2) when the assumption does not hold, UMS can provide reasonable lfdr and π1 estimates when t(q) is small, and under this condition the performance of UMS is robust to the choice of t(p); however, if t(q) is big, UMS is sensitive to the choice of t(p); (3) UMS works reasonably well when mi instead of ti is used to estimate lfdr. In our experience, setting t(q) ≤ t(5) and t(1) ≤ t(p) ≤ t(20) usually gives reasonable performance.

Finally, when UMS was applied to Cawley's experiment at the lfdr = 0.5 level, MA detected 30 and 19 regions with pattern IP > C1 and IP > C2 for the cMyc (ChipA) and p53-FL data, respectively. At the posterior probability = 0.5 level, HMM detected 168 and 142 regions. As a comparison, at the P-value = 0.001 level, G-TRANS reported 48 and 152 regions. HMM tends to report more regions than MA, many of which are shorter than the window size specified by MA and are therefore not reported by MA (Section 6, Supplementary material). Whether the shorter regions found by HMM are more likely to be true signals or noise cannot be clearly resolved using current data. When we checked the probe intensities, many such regions did look like true binding regions (Supplementary Figure S8). Future experimental verification is needed to resolve this issue.


    5 DISCUSSION
 
Compared with previous tools, TileMap provides a flexible way to study tiling array hybridizations under multiple experimental conditions. The variance shrinking component increases the sensitivity in finding genomic loci of interest when the number of replicates is small. Though we have only illustrated the use of TileMap in ChIP-chip experiments, it can also be used to analyze transcriptional activities of the genome. In terms of computation time, TileMap is substantially more efficient than G-TRANS.

The main difficulty in multiple sample comparisons is obtaining the distribution of the test-statistics under the null hypothesis, which is needed for FDR control or HMM decoding. TileMap adopts an approximate procedure, UMS, to deal with this issue. UMS is not a perfect solution. However, estimating null distributions under a complex composite null is a difficult problem in general, for which there are currently no good solutions. UMS is an initial attempt to address this issue. The rough estimate provided by UMS can be used to guide the choice of cutoffs, and in many cases such an imprecise estimate is enough for practical use for several reasons. First, FDR is always model dependent, e.g. assuming H0 : µ1 = µ2 and H0 : µ1 – µ2 ~ N(0,1) will result in very different FDR estimates. Therefore, unless the statistical null model (e.g. µ1 = µ2) matches the scientific null (e.g. not tumor-related), FDR could be very misleading. Second, compared with power, FDR is of secondary importance if we are only interested in a few top regions. What we really care about is having a higher chance of finding regions of real scientific interest, rather than getting an FDR estimate for a statistical model that may be an oversimplification of the real world. Despite all these arguments, we acknowledge that how to control FDR under a composite null per se deserves further investigation. Such investigations will provide a basis for rigorous statistical inference in complex multiple sample comparisons.

Neither MA nor HMM as used here considers the real distributions of the lengths of hybridizations. Current knowledge about such distributions is limited. If one can determine these distributions, the models here can be refined and may provide further resolving power. For MA, the average can be replaced by a weighted average; for HMM, a distance dependent transition probability can be used. All these aspects deserve further investigation. Finally, TileMap is only a first step toward utilizing the information provided by tiling arrays. Future efforts to integrate TileMap with cis-regulatory module discovery, alternative splicing analysis, etc. will help us gain a deeper understanding of various biological systems.
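
One possible form of the weighted average suggested for MA is sketched below in Python. This is purely an illustration of the idea and is not implemented in TileMap: the exponential decay of weights with genomic distance, the window defined in base pairs, and the names distance_weighted_average, half_window_bp and decay_bp are all assumptions made for this sketch.

```python
import numpy as np

def distance_weighted_average(positions, stats, half_window_bp, decay_bp):
    """Smooth probe statistics with weights that decay with genomic distance.

    For each probe, all probes within `half_window_bp` contribute with weight
    exp(-distance / decay_bp). A very large `decay_bp` gives equal weights,
    i.e. an ordinary moving average over the same window.
    """
    positions = np.asarray(positions, dtype=float)
    stats = np.asarray(stats, dtype=float)
    smoothed = np.empty_like(stats)
    for i, center in enumerate(positions):
        dist = np.abs(positions - center)
        inside = dist <= half_window_bp
        weights = np.exp(-dist[inside] / decay_bp)
        smoothed[i] = np.sum(weights * stats[inside]) / np.sum(weights)
    return smoothed

# Toy probes spaced 35 bp apart plus one isolated probe (made-up values).
positions = [0, 35, 70, 105, 1000]
stats = [1.0, 2.0, 3.0, 4.0, 10.0]
flat = distance_weighted_average(positions, stats, half_window_bp=40, decay_bp=1e12)
decayed = distance_weighted_average(positions, stats, half_window_bp=40, decay_bp=35)
```

With a short decay length each smoothed value stays closer to its own probe, so isolated probes and probes at region boundaries are influenced less by distant neighbours than in an unweighted window.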


    Acknowledgments
 
The authors thank Simon Cawley for providing the chromosome 21 and 22 ChIP-chip data, Xiaole S. Liu for helpful discussions about using HMM in analyzing tiling arrays, and the two anonymous referees for their invaluable suggestions to improve the paper. The work was partially supported by NIH grant GM-067250. Funding to pay the Open Access publication charges for this article was provided by the same grant.

Conflict of Interest: none declared.

Received on May 2, 2005; revised on July 19, 2005; accepted on July 20, 2005

    REFERENCES
 

    Baldi, P. and Long, A.D. (2001) A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 17, 509–519.

    Bolstad, B.M., et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.

    Cawley, S., et al. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 116, 499–509.

    Kampa, D., et al. (2004) Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res., 14, 331–342.

    Kapranov, P., et al. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science, 296, 916–919.

    Kapranov, P., et al. (2003) Beyond expression profiling: next generation uses of high density oligonucleotide arrays. Brief. Funct. Genomic. Proteomic., 2, 47–56.

    Keles, S., van der Laan, M.J., Dudoit, S. and Cawley, S.E. (2004) Multiple testing methods for ChIP-Chip high density oligonucleotide array data. Working Paper Series, Paper 147, U.C. Berkeley Division of Biostatistics, University of California, Berkeley, CA.

    Li, W., et al. (2005) A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics, 21, Suppl. 1, i274–i282.

    Morris, C.N. (1983) Natural exponential families with quadratic variance functions: statistical theory. The Annals of Statistics, 11, 515–529.

    Newton, M.A., et al. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155–176.

    Smyth, G.K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol., 3, Article 3.

    Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100, 9440–9445.

    Tusher, V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116–5121.



