Bioinformatics Advance Access originally published online on May 25, 2009
Bioinformatics 2009 25(14):1838-1840; doi:10.1093/bioinformatics/btp320

© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

NTAP: for NimbleGen tiling array ChIP-chip data analysis

Kun He 1,2,*, Xueyong Li 3,4, Junli Zhou 3,5, Xing-Wang Deng 3, Hongyu Zhao 6 and Jingchu Luo 1,*

1 College of Life Sciences, Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China, 2 Department of Plant Biology, Carnegie Institution, Stanford, CA 94305, 3 Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA, 4 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, 5 Beijing Kaituo DNA Biotech Research Center CO., Ltd., 39 West Shangdi Rd, Haidian District, Beijing 100085, China and 6 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

* To whom correspondence should be addressed.


    ABSTRACT

Summary: NTAP is designed to analyze ChIP-chip data generated by the NimbleGen tiling array platform and to accomplish various pattern recognition tasks that are especially useful for epigenetic studies. The modular design of NTAP makes the data processing highly customizable. Users can either use NTAP to perform the full process of NimbleGen tiling array data analysis, or choose post-processing modules in NTAP to analyze pre-processed epigenetic data generated by other platforms. The output of NTAP can be saved in standard GFF format files and visualized in GBrowse.

Availability and Implementation: The source code of NTAP is freely available at http://ntap.cbi.pku.edu.cn/. It is implemented in Perl and R and can be used on Linux, Mac and Windows platforms.

Contact: ntap{at}mail.cbi.pku.edu.cn; luojc{at}pku.edu.cn; hekun78{at}gmail.com


    1 INTRODUCTION
Genome-level high-density tiling arrays are becoming more accessible for genome-wide profiling studies, including transcriptome identification (Bertone et al., 2004), transcription factor binding site identification (Lee et al., 2007), histone modification profiling (Gendrel et al., 2005; Li et al., 2008), DNA methylation profiling (Hayashi et al., 2007) and comparative genome hybridization. Each type of study requires specific analysis methods and tools because the strategies behind different tiling array applications vary extensively. As a result, several models have been proposed and software tools developed for the analysis of different types of tiling array data (Chung et al., 2007; Ji et al., 2008; Li et al., 2005; Wang et al., 2006; Zhang et al., 2007). However, there is still room for improvement in the analysis of epigenetic features, including histone modifications and DNA methylation. Recognizing the distribution patterns of modifications at both the local (gene) and global (chromosome) levels is usually required to draw biologically meaningful conclusions (Hayashi et al., 2007; Li et al., 2008).

Here, we present a NimbleGen Tiling array data Analysis Package (NTAP) designed for histone modification profiling analysis (Li et al., 2008) that can also be applied to other ChIP-chip data (Lee et al., 2007). The advantage of our package is its ability to generate reports for various pattern recognition questions instead of focusing only on identifying significantly enriched oligos or genomic regions.


    2 FUNCTIONS AND FEATURES
NTAP was developed in the R statistical language to take advantage of the powerful statistical functions of other open source packages, especially those from the Bioconductor project (http://www.bioconductor.org/). The analysis consists of five main steps: data importing, normalization, feature identification, oligo mapping and post-processing for pattern recognition.

2.1 Data importing
We implemented an R function similar to the ‘read.maimages’ function in the limma package (Smyth, 2004) to import NimbleGen raw data into limma data objects for normalization.

2.2 Data normalization
Users can apply various microarray normalization methods to the imported datasets through the limma functions ‘normalizeBetweenArrays’ and ‘normalizeWithinArrays’. Unlike expression profiling arrays, whose log-transformed ratio distributions are usually symmetric around zero, the distribution of ChIP-chip ratios tends to be skewed toward the ChIP channel: because only protein-bound DNA fragments are pulled down by a specific antibody, more positive log-transformed ChIP/Input ratios are expected. The rank-invariant set scheme (Buck and Lieb, 2004) was therefore incorporated for better data normalization.
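The idea behind a rank-invariant set can be illustrated with a short Python sketch. It is not NTAP's R code and not the exact procedure of Buck and Lieb (2004): it simply assumes that oligos whose rank is nearly the same in both channels are unenriched, and shifts all log ratios so that the median log ratio of that set is zero. The function names and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def ranks(values):
    """Return 0-based ranks of the values (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def rank_invariant_normalize(chip_log2, input_log2, tolerance=0.05):
    """Center log2(ChIP/Input) ratios on a rank-invariant oligo set.

    An oligo is treated as rank-invariant when its ranks in the two
    channels differ by at most `tolerance` * number of oligos
    (an assumed, simplified selection rule).
    Returns (normalized_log_ratios, invariant_indices).
    """
    n = len(chip_log2)
    if n != len(input_log2):
        raise ValueError("channels must have the same number of oligos")
    rank_chip = ranks(chip_log2)
    rank_input = ranks(input_log2)
    max_shift = tolerance * n
    invariant = [i for i in range(n)
                 if abs(rank_chip[i] - rank_input[i]) <= max_shift]
    if not invariant:
        raise ValueError("no rank-invariant oligos; increase tolerance")
    log_ratios = [c - i for c, i in zip(chip_log2, input_log2)]
    # Shift every ratio by the median ratio of the invariant set only,
    # so strongly enriched oligos do not pull the center upward.
    offset = median(log_ratios[i] for i in invariant)
    return [m - offset for m in log_ratios], invariant
```

Because the offset is estimated only from the invariant oligos, a skewed tail of enriched oligos does not affect the centering, which is the motivation given in the text above.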

2.3 Feature identification
Tiling arrays usually contain several oligos per gene rather than one oligo per gene. For example, the traditional whole-genome array for expression studies in Arabidopsis thaliana usually contains only 23k oligos, while a customized whole-genome tiling array tiled at ~250 bp resolution may contain ~400k oligos. The much larger number of oligos on a single array makes the traditional methods for feature identification impractical. In tiling array data, expressed mRNA or pulled-down DNA fragments can cause the signals of a group of neighboring oligos to increase simultaneously. Our package therefore implements the non-parametric Wilcoxon rank-sum method, applied in sliding windows, to compare the signals of the ChIP channel and the reference channel for a group of oligos. Under certain circumstances, however, the density of a tiling array may not be high enough to use the Wilcoxon method. In these cases, we use the simple comparison linear models implemented in limma (Smyth, 2004) to identify single oligos whose signal increased significantly in the ChIP channel. We then consider a genomic region ‘positive’ if it contains either a single oligo that meets stringent user-defined criteria or a group of neighboring oligos that meet less stringent criteria.
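The sliding-window rank-sum test described above can be sketched in Python as follows. This is an illustration only, not NTAP's R implementation: it uses the normal approximation to the rank-sum statistic without a tie correction for the variance, and the window half-width, the minimum number of oligos and all function names are assumptions.

```python
import math


def rank_sum_p_greater(x, y):
    """One-sided p-value that values in x tend to exceed values in y.

    Wilcoxon rank-sum (Mann-Whitney U) with average ranks for ties and
    a normal approximation (no tie correction of the variance).
    """
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += average_rank * sum(
            1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability


def sliding_window_test(positions, chip, reference,
                        half_window=500, min_oligos=4):
    """Test each oligo-centered window; positions must be sorted ascending.

    Returns a list of (position, p_value); p_value is None when the
    window holds fewer than `min_oligos` oligos.
    """
    results = []
    n = len(positions)
    start = end = 0
    for i in range(n):
        while positions[i] - positions[start] > half_window:
            start += 1
        while end < n and positions[end] - positions[i] <= half_window:
            end += 1
        if end - start >= min_oligos:
            p = rank_sum_p_greater(chip[start:end], reference[start:end])
        else:
            p = None  # too sparse for a window-based test
        results.append((positions[i], p))
    return results
```

Windows that return `None` correspond to the sparse regions mentioned in the text, where a single-oligo test would be used instead.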

2.4 Mapping oligos to gene models
Genome data are kept up to date by genome sequencing consortia or curation groups, who usually release their data as standard XML format files that can be parsed easily to obtain the coordinates of gene models. A Perl module was implemented to retrieve the gene model positions on each chromosome and to determine the position of each oligo relative to its nearby gene model(s). Signal distribution patterns among different groups of genes can then be determined from the stored relative position information.
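A minimal Python sketch of the relative-position calculation is given below. It is not the NTAP Perl module: the `Gene` record, the 1-based inclusive coordinates, the `flank` distance and the region labels are all assumptions made for illustration. The key point is that the offset is measured from the transcription start site in the direction of transcription, so genes on the minus strand are handled by mirroring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model record (1-based, inclusive, start <= end)."""
    name: str
    start: int
    end: int
    strand: str  # '+' or '-'


def relative_position(oligo_mid, gene):
    """Return (region, offset_bp, fraction) of an oligo relative to a gene.

    offset_bp is the distance from the transcription start site along
    the direction of transcription; fraction is offset_bp / gene length.
    """
    length = gene.end - gene.start + 1
    if gene.strand == "+":
        offset = oligo_mid - gene.start
    elif gene.strand == "-":
        offset = gene.end - oligo_mid
    else:
        raise ValueError("strand must be '+' or '-'")
    if offset < 0:
        region = "upstream"
    elif offset >= length:
        region = "downstream"
    else:
        region = "body"
    return region, offset, offset / length


def map_oligo(oligo_mid, genes, flank=1000):
    """List nearby genes (within `flank` bp) with the oligo's position."""
    hits = []
    for gene in genes:
        region, offset, fraction = relative_position(oligo_mid, gene)
        length = gene.end - gene.start + 1
        if -flank <= offset < length + flank:
            hits.append((gene.name, region, offset, fraction))
    return hits
```

An oligo can map to more than one gene model when neighboring genes lie within the flanking distance of each other, which is why a list is returned.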

2.5 Post-processing functions
The following questions are frequently asked in epigenomics research. What is the modification distribution pattern relative to genes and does it vary between different organs/tissues? Is there an association between specific histone modification levels and gene sizes, or gene expression levels? To answer these questions, we implemented several R functions to align genes, to calculate the average ChIP/Input intensity ratio of the oligos within sliding windows, and to plot the final results for different groups (Fig. 1).


Fig. 1. Demonstration of two different methods for the alignment of gene models and reorganization of histone modification patterns. (A) Two different strategies to align genes (three genes of different lengths are used as examples). Alignment without gene length normalization overlays all oligos by their absolute distance (kb) to the transcription start site, whereas alignment with length normalization overlays them by their relative position (percentile) to the transcription start site. (B) The histone modification distribution patterns of different user-defined gene sub-groups, which in this particular case contain genes of various lengths. (C) The tissue-specific histone modification distribution pattern on all genes, obtained with the two strategies demonstrated in (A).
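The two alignment strategies of Fig. 1A can be sketched in Python as follows. This is an illustrative simplification, not NTAP's R functions: it averages log ratios in fixed, non-overlapping bins rather than true sliding windows, and the record layout, `bin_size` and `n_bins` are assumptions.

```python
from collections import defaultdict


def average_profile(records, mode="absolute", bin_size=500, n_bins=10):
    """Average ChIP/Input log ratios across aligned genes.

    records: iterable of (offset_bp, gene_length_bp, log_ratio), where
             offset_bp is the integer distance of the oligo from the
             transcription start site.
    mode:    "absolute" bins oligos by distance to the start site
             (no length normalization); "relative" bins oligos inside
             the gene body by their fractional position (length
             normalization).
    Returns a dict mapping bin key -> mean log ratio, in sorted key order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, gene_length, log_ratio in records:
        if mode == "absolute":
            # Floor division keeps upstream (negative) offsets in
            # their own bins, e.g. -100 -> bin starting at -500.
            key = (offset // bin_size) * bin_size
        elif mode == "relative":
            if not 0 <= offset < gene_length:
                continue  # only the gene body has a percentile position
            key = offset * n_bins // gene_length
        else:
            raise ValueError("mode must be 'absolute' or 'relative'")
        sums[key] += log_ratio
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

With `mode="relative"` an oligo 600 bp into a 1 kb gene and an oligo 1200 bp into a 2 kb gene fall into the same bin, whereas with `mode="absolute"` they are kept apart; this is the difference between the two panels of Fig. 1A.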

 
2.6 Result visualization
Quality control is a key step to guarantee the validity of the overall data analysis. An R function was implemented to calculate the correlation coefficient of the raw intensities between any pair of replicates. MA-plots of the array hybridization results are also generated to examine the intensity ratio (M) versus the averaged intensity (A) and to reveal possible non-linear biases that require special normalization methods. After raw data processing, all the oligos are mapped back to the most up-to-date chromosomes, and the ChIP/Input ratio of each oligo can then be plotted along the chromosome. These values can be displayed either by a program within NTAP or exported in the GFF format for display in the Generic Genome Browser GBrowse (Stein et al., 2002).
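The quantities used in these quality-control and export steps can be sketched in Python as follows. The sketch is not NTAP's R code: it assumes the usual two-channel definitions M = log2(ChIP) - log2(Input) and A = (log2(ChIP) + log2(Input)) / 2, uses the Pearson correlation coefficient, and writes a generic nine-column GFF line whose source, feature type and attribute values are hypothetical.

```python
import math


def ma_values(chip, reference):
    """Return a list of (M, A) pairs from raw (positive) intensities."""
    pairs = []
    for c, r in zip(chip, reference):
        log_c, log_r = math.log2(c), math.log2(r)
        pairs.append((log_c - log_r, (log_c + log_r) / 2.0))
    return pairs


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gff_line(seqid, start, end, score, name,
             source="NTAP_sketch", feature="oligo"):
    """Format one oligo as a tab-separated, nine-column GFF record.

    Columns: seqid, source, type, start, end, score, strand, phase,
    attributes. Strand and phase are left as '.' (not applicable).
    """
    fields = [seqid, source, feature, str(start), str(end),
              f"{score:.3f}", ".", ".", f"Name={name}"]
    return "\t".join(fields)
```

For example, `gff_line("Chr1", 100, 149, 1.5, "probe1")` yields a record whose score column holds the ChIP/Input log ratio, which a genome browser can draw as a quantitative track.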


    3 IMPLEMENTATION
Most of the functions are implemented in the R statistical language (http://www.r-project.org/) and Perl. Users can also choose any other software to pre-process their data before using our post-processing modules.


    ACKNOWLEDGEMENTS
We thank Dr Kate Dreher for providing critical comments.

Funding: NSFC (grants 90408015, 863: 2006AA02Z334); China high-tech platform; Monsanto Fellowship and the China Postdoctoral Program (to K.H.).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on January 24, 2009; revised on May 10, 2009; accepted on May 12, 2009

    REFERENCES

    Bertone P, et al. Global identification of human transcribed sequences with genome tiling arrays. Science (2004) 306:2242–2246.

    Buck MJ, Lieb JD. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics (2004) 83:349–360.

    Chung HR, et al. A physical model for tiling array analysis. Bioinformatics (2007) 23:i80–i86.

    Gendrel AV, et al. Profiling histone modification patterns in plants using genomic tiling microarrays. Nat. Methods (2005) 2:213–218.

    Hayashi H, et al. High-resolution mapping of DNA methylation in human genome using oligonucleotide tiling array. Hum. Genet. (2007) 120:701–711.

    Ji H, et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat. Biotech. (2008) 26:1293–1300.

    Lee J, et al. Analysis of transcription factor HY5 genomic binding sites revealed its hierarchical role in light regulation of development. Plant Cell (2007) 19:731–749.

    Li W, et al. A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics (2005) 21(Suppl_1):i274–i282.

    Li X, et al. High-resolution mapping of epigenetic modifications of the rice genome uncovers interplay between DNA methylation, histone methylation, and gene expression. Plant Cell (2008) 20:259–276.

    Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. (2004) 3:.

    Stein LD, et al. The generic genome browser: a building block for a model organism system database. Genome Res. (2002) 12:1599–1610.

    Wang X, et al. NMPP: a user-customized NimbleGen microarray data processing pipeline. Bioinformatics (2006) 22:2955–2957.

    Zhang ZD, et al. Tilescope: online analysis pipeline for high-density tiling microarray data. Genome Biol. (2007) 8:R81.

