Bioinformatics Advance Access originally published online on September 1, 2005
Bioinformatics 2005 21(21):4067-4068; doi:10.1093/bioinformatics/bti652
BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing
1Max-Planck-Institut für Informatik Saarbrücken, Germany
2Universität des Saarlandes FR 8.3 Biowissenschaften, Genetik/Epigenetik, Saarbrücken, Germany
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: Manual processing of DNA methylation data from bisulfite sequencing is a tedious and error-prone task. Here we present an interactive software tool that provides start-to-end support for this process. In an easy-to-use manner, the tool helps the user to import the sequence files from the sequencer, to align them, to exclude or correct critical sequences, to document the experiment, to perform basic statistics and to produce publication-quality diagrams.
Emphasis is put on quality control: The program automatically assesses data quality and provides warnings and suggestions for dealing with critical sequences. The BiQ Analyzer program is implemented in the Java programming language and runs on any platform for which a recent Java virtual machine is available.
Availability: The program is available without charge for non-commercial users and can be downloaded from http://biq-analyzer.bioinf.mpi-inf.mpg.de/
Contact: cbock{at}mpi-inf.mpg.de
| 1 INTRODUCTION |
|---|
DNA methylation is a frequent biochemical modification of eukaryotic DNA. In vertebrates, it almost exclusively affects the C5 position of cytosines that belong to CpG dinucleotides (i.e. a cytosine directly followed by a guanine). Although this phenomenon has been known for several decades, it has recently attracted increased attention. DNA methylation is assumed to play an important role in cancer (Feinberg and Tycko, 2004) and ageing (Issa, 2003). It is the cause of several developmental diseases (Walter and Paulsen, 2003). It has also been linked to chromatin remodeling (Reik et al., 2003), low success rates in mammalian cloning (Reik et al., 2003) and RNA interference (Kawasaki and Taira, 2004).
The most accurate and probably the most widely used experimental protocol for analyzing DNA methylation makes use of a selective conversion of unmethylated cytosines to uracils by bisulfite treatment (Frommer et al., 1992; Hajkova et al., 2002). Subsequent amplification, cloning, sequencing and comparison to the genomic sequence make it possible to identify the unmethylated cytosines, which appear as thymines in a multiple sequence alignment, whereas methylated cytosines remain cytosines. Although this protocol is generally reliable, it has several potential error sources, which we address with our program.
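The principle of the protocol can be illustrated with a minimal Python sketch. It is not part of BiQ Analyzer; the function names and the toy sequence are hypothetical, and the read is assumed to be already aligned to the genomic sequence without gaps.

```python
from typing import Dict, Optional, Set


def bisulfite_convert(genomic: str, methylated: Set[int]) -> str:
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C is read as T (via uracil); methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(genomic)
    )


def call_cpg_methylation(genomic: str, read: str) -> Dict[int, Optional[bool]]:
    """Call the methylation state of every CpG by comparing read and genome.

    True = methylated (C retained), False = unmethylated (C read as T),
    None = unknown (any other base at the cytosine position).
    """
    calls: Dict[int, Optional[bool]] = {}
    for i in range(len(genomic) - 1):
        if genomic[i:i + 2] == "CG":
            calls[i] = {"C": True, "T": False}.get(read[i])
    return calls


genomic = "ACGTCCGATCGA"          # toy sequence with CpGs at positions 1, 5 and 9
read = bisulfite_convert(genomic, methylated={1})
calls = call_cpg_methylation(genomic, read)
print(read)   # ACGTTTGATTGA
print(calls)  # {1: True, 5: False, 9: False}
```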
Currently, few software tools exist that are tailored to support DNA methylation research. On the one hand, several primer design websites (Li and Dahiya, 2002; Tusnady et al., 2005) help the experimenter to prepare DNA methylation experiments, a problem that is upstream of the data processing task that we consider here. On the other hand, there is a basic Microsoft Excel template (Anbazhagan et al., 2001), which can assist with the calculation of average methylation and similar statistics once methylation data have been generated and cleaned up (downstream of our task). The only software that partially overlaps in scope with BiQ Analyzer is MethTools (Grunau et al., 2000), a set of Perl scripts that generate publication-quality diagrams (lollipops and logos) from methylation data. BiQ Analyzer differs from MethTools in several respects. First, BiQ Analyzer imports sequence files directly from the sequencer without the need for any manual intervention and assists the user with all steps of alignment and quality control. Second, BiQ Analyzer not only calculates summary statistics but can also export the methylation data in full detail, in a format that is easy to import into any statistics package or spreadsheet program. Third, BiQ Analyzer supports standardized experiment documentation. Finally, BiQ Analyzer provides an interactive graphical interface that guides the user through quality control and gives continuous feedback on problematic sequences.
| 2 QUALITY CONTROL METHODS |
|---|
Potential error sources in bisulfite sequencing arise from three phases of the experimental protocol: bisulfite conversion, PCR and sequencing. Each of these steps can give rise to characteristic errors in the sequences, which the experimenter must address before deriving methylation profiles.
Here we describe these error types, their impact on methylation data and the quality control methods that BiQ Analyzer applies to identify the critical sequences.
Incomplete conversion. In bisulfite sequencing we assume that all unconverted Cs were originally methylated. Therefore, when the bisulfite treatment fails to convert unmethylated Cs, methylation will be overestimated. Fortunately, for vertebrates it is possible to identify those sequences with a low conversion rate. Assuming that Cs outside a CpG context are always unmethylated (Reik et al., 2003), BiQ Analyzer calculates the conversion rate of a sequence as the number of converted Cs outside a CpG context divided by the total number of converted and unconverted Cs outside a CpG context. By default, BiQ Analyzer highlights all sequences with a conversion rate lower than 90% as critical.
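The conversion rate defined above can be written down in a few lines of Python. This is only a sketch of the formula, not BiQ Analyzer's code: the function name is hypothetical, the two sequences are assumed to be aligned position by position, and gaps or other bases at a C position are simply skipped.

```python
def conversion_rate(genomic: str, read: str) -> float:
    """Fraction of non-CpG cytosines that were converted (read as T)."""
    converted = unconverted = 0
    for i, base in enumerate(genomic):
        if base != "C":
            continue
        if genomic[i + 1:i + 2] == "G":
            continue  # C in a CpG context: may be methylated, so not informative
        if read[i] == "T":
            converted += 1
        elif read[i] == "C":
            unconverted += 1
        # gaps, Ns and other bases are ignored in this sketch
    total = converted + unconverted
    return converted / total if total else float("nan")


MIN_CONVERSION_RATE = 0.90  # default threshold stated in the text

genomic = "CCACGTCTCA"   # non-CpG Cs at positions 0, 1, 6 and 8
read = "TTACGTCTTA"      # the C at position 6 was not converted
rate = conversion_rate(genomic, read)
is_critical = rate < MIN_CONVERSION_RATE
print(rate, is_critical)  # 0.75 True
```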
Clone sequences. PCR amplification can produce a vast over-representation of sequences from one or a few individual chromosomes. Including such identical sequences biases the estimate of DNA methylation. BiQ Analyzer implements a heuristic clone detection method: it highlights as critical those sequences that are identical at all correctly aligned C positions. The advantage of this method over simple sequence comparison is that it is insensitive to sequence truncations and to sequencing errors at non-C positions.
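One way to read this heuristic is sketched below in Python. The details are assumptions: a C position counts as "correctly aligned" when the read shows C or T there, and two reads are flagged when they agree at every such position they share. Function names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def c_pattern(genomic: str, read: str) -> Dict[int, str]:
    """Read bases at genomic C positions, keeping only C or T (gaps etc. dropped)."""
    return {
        i: read[i]
        for i, base in enumerate(genomic)
        if base == "C" and read[i] in ("C", "T")
    }


def likely_clones(genomic: str, reads: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of reads that agree at all shared, usable C positions."""
    patterns = {name: c_pattern(genomic, seq) for name, seq in reads.items()}
    pairs = []
    for a, b in combinations(patterns, 2):
        shared = patterns[a].keys() & patterns[b].keys()
        if shared and all(patterns[a][i] == patterns[b][i] for i in shared):
            pairs.append((a, b))
    return pairs


genomic = "ACGTCCGATCGA"       # Cs at positions 1, 4, 5 and 9
reads = {
    "r1": "ACGTTTGATTGA",
    "r2": "ACGATTGA----",      # truncated, with a non-C sequencing error at position 3
    "r3": "ATGTTCGATTGA",      # different pattern at C positions 1 and 5
}
pairs = likely_clones(genomic, reads)
print(pairs)  # [('r1', 'r2')]
```

In this sketch, two reads that are both completely unmethylated are also reported as a pair, so a flag is a prompt for manual review rather than proof of clonality.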
Sequencing errors. Sequencing errors changing C to T and vice versa can lead to errors in the methylation data derived from the sequences. Therefore, BiQ Analyzer suggests excluding all sequences that fall below a local sequence identity level of 80% against the genome sequence (conversions and truncations are ignored). Furthermore, in our experiments we regularly observe ambiguous base insertions within a CpG context (i.e. CG → CTG or CG → TCG). In these cases, BiQ Analyzer reports the methylation state of the CpG dinucleotide as unknown.
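The identity check can be sketched in Python as follows. The text does not spell out how local sequence identity is computed, so the definition used here is an assumption: leading and trailing gaps in the read are treated as truncation and skipped, a T opposite a genomic C counts as a match (bisulfite conversion), and every other difference counts as a mismatch. The function name and sequences are hypothetical.

```python
def sequence_identity(genomic: str, read: str) -> float:
    """Identity of an aligned read against the genome, ignoring conversions
    (genomic C read as T) and truncations (leading or trailing gaps)."""
    covered = [i for i, base in enumerate(read) if base != "-"]
    if not covered:
        return 0.0
    start, end = covered[0], covered[-1]
    matches = compared = 0
    for g, r in zip(genomic[start:end + 1], read[start:end + 1]):
        compared += 1
        if r == g or (g == "C" and r == "T"):
            matches += 1
    return matches / compared


MIN_IDENTITY = 0.80  # default threshold stated in the text

genomic = "ACGTCCGATCGA"
good_read = "--GTTTGAACGA"   # one mismatch in ten compared positions
bad_read = "--GAATGAACAA"    # four mismatches in ten compared positions

good_identity = sequence_identity(genomic, good_read)
bad_identity = sequence_identity(genomic, bad_read)
print(good_identity, good_identity < MIN_IDENTITY)  # 0.9 False
print(bad_identity, bad_identity < MIN_IDENTITY)    # 0.6 True
```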
The threshold levels for minimum conversion rate and minimum sequence identity are based on our experience with bisulfite sequencing and the user can change them in the configuration file of BiQ Analyzer.
| 3 PROGRAM OVERVIEW |
|---|
BiQ Analyzer is a software tool designed to mimic the manual process of DNA methylation analysis. In several steps, the user is guided from the import of sequences, through several phases of quality control and multiple sequence alignment, to a questionnaire documenting the experiment. In each of the quality control steps, the program suggests how to handle critical sequences, but the ultimate decision to include or exclude a sequence always stays with the user. Based on the user's decisions during that process, the program finally generates a single-file HTML documentation (including publication-quality methylation diagrams in the widely used lollipop style) and copies the derived methylation data to the system clipboard, ready for subsequent analysis with a spreadsheet or a statistics program.
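To make the two kinds of output concrete, the following Python sketch renders methylation calls as a text-only lollipop row and as a tab-separated table. It is an illustration only: the symbols, the column names and the `NA` marker are assumptions and do not describe BiQ Analyzer's actual HTML or clipboard format.

```python
from typing import Dict, List, Optional

Call = Optional[bool]  # True = methylated, False = unmethylated, None = unknown


def lollipop_row(calls: List[Call]) -> str:
    """Text rendering of one clone: filled circle = methylated, open = unmethylated."""
    symbols = {True: "●", False: "○", None: "x"}
    return "-".join(symbols[c] for c in calls)


def to_tsv(cpg_positions: List[int], clones: Dict[str, List[Call]]) -> str:
    """Tab-separated table (one row per clone) for spreadsheets or statistics tools."""
    lines = ["\t".join(["clone"] + [f"CpG_{p}" for p in cpg_positions])]
    for name, calls in clones.items():
        cells = ["NA" if c is None else str(int(c)) for c in calls]
        lines.append("\t".join([name] + cells))
    return "\n".join(lines)


cpg_positions = [1, 5, 9]
clones = {
    "clone1": [True, False, False],
    "clone2": [True, None, True],
}
for name, calls in clones.items():
    print(name, lollipop_row(calls))
table = to_tsv(cpg_positions, clones)
print(table)
```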
As a Java application, BiQ Analyzer runs on almost any platform, requiring only a recent version of the Java virtual machine (which can be downloaded from www.javasoft.com) and a screen resolution of at least 1024 × 768 pixels. For the multiple sequence alignment, a local version of ClustalW (Thompson et al., 1994) is used, which we include in the standard download package. The alignment step is computationally expensive and can be slow on older computers. Therefore, the program also provides an option to calculate the alignment over the internet on a high-performance computer at Max-Planck-Institut für Informatik.
| 4 CONCLUSION |
|---|
BiQ Analyzer provides start-to-end support for the visualization and quality control of DNA methylation data from bisulfite sequencing. For the frequent user of bisulfite sequencing, it will significantly speed up the data analysis process. The occasional user will benefit from the extensive hints that help to perform rigorous quality control. Beyond that, BiQ Analyzer promises to be a first step towards standardization in quality control and documentation. This is a necessary prerequisite for the second generation of DNA methylation databases that will validate data quality and accept direct submissions from the public. Non-commercial users can download BiQ Analyzer free of charge from http://biq-analyzer.bioinf.mpi-inf.mpg.de/.
| Acknowledgments |
|---|
We thank Tarang Khare for helpful discussions and Joachim Büch for technical support. This work was conducted within the context of the EU Network of Excellence Biosapiens (LSHG-CT-2003-503265) and the EU Network of Excellence The Epigenome (LSHG-CT-2004-503433).
Conflict of Interest: none declared.
Received on June 9, 2005; revised on August 17, 2005; accepted on August 26, 2005
| REFERENCES |
|---|
- Anbazhagan, R., et al. (2001) Spreadsheet-based program for the analysis of DNA methylation. Biotechniques, 30, 110–114.
- Feinberg, A.P. and Tycko, B. (2004) The history of cancer epigenetics. Nat. Rev. Cancer, 4, 143–153.
- Frommer, M., et al. (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl Acad. Sci. USA, 89, 1827–1831.
- Grunau, C., et al. (2000) MethTools - a toolbox to visualize and analyze DNA methylation data. Nucleic Acids Res., 28, 1053–1058.
- Hajkova, P., et al. (2002) DNA-methylation analysis by the bisulfite-assisted genomic sequencing method. Methods Mol. Biol., 200, 143–154.
- Issa, J.P. (2003) Age-related epigenetic changes and the immune system. Clin. Immunol., 109, 103–108.
- Kawasaki, H. and Taira, K. (2004) Induction of DNA methylation and gene silencing by short interfering RNAs in human cells. Nature, 431, 211–217.
- Li, L.C. and Dahiya, R. (2002) MethPrimer: designing primers for methylation PCRs. Bioinformatics, 18, 1427–1431.
- Reik, W., et al. (2003) Mammalian epigenomics: reprogramming the genome for development and therapy. Theriogenology, 59, 21–32.
- Thompson, J.D., et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
- Tusnady, G.E., et al. (2005) BiSearch: primer-design and search tool for PCR on bisulfite-treated genomes. Nucleic Acids Res., 33, e9.
- Walter, J. and Paulsen, M. (2003) Imprinting and disease. Semin. Cell Dev. Biol., 14, 101–110.