Bioinformatics Advance Access originally published online on November 17, 2007
Bioinformatics 2008 24(1):1-10; doi:10.1093/bioinformatics/btm546

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Computational epigenetics

Christoph Bock * and Thomas Lengauer

Max-Planck-Institut für Informatik, Saarbrücken, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 TWO FACETS OF...
 3 MECHANISMS OF EPIGENETIC...
 4 GENERATION, LOW-LEVEL...
 5 EPIGENOME DATA ANALYSIS
 6 EPIGENOME PREDICTION:...
 7 CANCER EPIGENETICS: TOWARD...
 8 FUTURE DIRECTIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Epigenetic research aims to understand heritable gene regulation that is not directly encoded in the DNA sequence. Epigenetic mechanisms such as DNA methylation and histone modifications modulate the packaging of the DNA in the nucleus and thereby influence gene expression. Patterns of epigenetic information are faithfully propagated over multiple cell divisions, which makes epigenetic regulation a key mechanism for cellular differentiation and cell fate decisions. In addition, incomplete erasure of epigenetic information can lead to complex patterns of non-Mendelian inheritance. Stochastic and environment-induced epigenetic defects are known to play a major role in cancer and ageing, and they may also contribute to mental disorders and autoimmune diseases. Recent technical advances such as ChIP-on-chip and ChIP-seq have started to convert epigenetic research into a high-throughput endeavor, to which bioinformatics is expected to make significant contributions. Here, we review pioneering computational studies that have contributed to epigenetic research. In addition, we give a brief introduction to epigenetics, targeted at bioinformaticians who are new to the field, and we outline future challenges in computational epigenetics.

Contact: cbock{at}mpi-inf.mpg.de


    1 INTRODUCTION
 
Epigenetics is commonly defined as the ‘study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence’ (Russo et al., 1996). Its fundamental objective is to elucidate how genetic information encoded in the DNA sequence and non-genetic aspects such as the way the DNA is packaged inside the nucleus jointly control gene expression. This touches upon two central problems of biology: How do cells specialize when a complex multi-cellular organism develops from a single fertilized egg (Reik, 2007)? And which molecular mechanisms contribute to phenotypic inheritance (Richards, 2006)?

The field of epigenetics has recently received a boost of attention and is currently among the fastest moving areas in molecular biology. Unprecedented technological advances enable genome-scale analysis of epigenetic mechanisms and render comprehensive epigenome projects feasible (Bernstein et al., 2007). Epigenetic analysis of embryonic stem (ES) cells has started to unveil the basic circuitry of mammalian development (Surani et al., 2007). And in cancer research, epigenetics opens up novel approaches for early diagnosis and treatment (Jones and Baylin, 2007).

Various bioinformatic challenges arise from the analysis of epigenetic data, and computational methods have already played a role in solving important epigenetic problems. In this review, we introduce the basic concepts of epigenetics and we summarize relevant computational and bioinformatic work performed in this area. Furthermore we outline future directions, arguing that upcoming epigenome projects will constitute a major challenge for the emerging field of computational epigenetics.


    2 TWO FACETS OF EPIGENETIC INHERITANCE
 
Epigenetic mechanisms influence phenotype through heritable regulation of gene expression. The defining property of epigenetic inheritance is that it is encoded in covalent modifications of the DNA and the chromatin proteins attached to it, rather than in the DNA sequence itself (as is the case for genetic inheritance). Because such modifications are more readily altered than the DNA sequence, epigenetic information can be reprogrammed dynamically during cellular differentiation, but is also propagated with substantially lower fidelity than genetic information. An error rate of 10^-3 has been estimated per site and cell division for DNA methylation (Ushijima et al., 2003), in contrast to values on the order of 10^-8 per basepair and cell division for genetic mutations (Drake et al., 1998). Epigenetic inheritance occurs both between generations of cells (mitotic inheritance) and between generations of a species (meiotic inheritance).
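
To make the difference in fidelity concrete, the following minimal sketch compares the two error rates quoted above over a series of cell divisions. It assumes that errors occur independently at each division and that an altered site never reverts; the number of divisions (50) and the function name are illustrative assumptions, not values taken from the cited studies.

```python
def fraction_of_sites_changed(error_rate: float, divisions: int) -> float:
    """Probability that a site suffers at least one copying error
    within the given number of cell divisions (independent errors)."""
    return 1.0 - (1.0 - error_rate) ** divisions


METHYLATION_ERROR_RATE = 1e-3  # per site and division (Ushijima et al., 2003)
MUTATION_ERROR_RATE = 1e-8     # per basepair and division (Drake et al., 1998)
DIVISIONS = 50                 # illustrative assumption

epigenetic = fraction_of_sites_changed(METHYLATION_ERROR_RATE, DIVISIONS)
genetic = fraction_of_sites_changed(MUTATION_ERROR_RATE, DIVISIONS)

print(f"methylation sites affected: {epigenetic:.4f}")
print(f"basepairs affected:         {genetic:.2e}")
print(f"ratio:                      {epigenetic / genetic:.0f}")
```

Under these assumptions, just under 5% of methylation sites would be affected after 50 divisions, roughly five orders of magnitude more than the fraction of basepairs affected by mutation.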

Epigenetic mitotic inheritance is critically involved in cellular differentiation and cell fate decisions. Recent research has provided a mechanistic understanding of the key phases of epigenetic regulation during development (Reik, 2007). To start with, germ cells carry highly specialized and parent-specific epigenetic information. Shortly after fertilization, a fundamental reprogramming step resets most epigenetic information to a default state, which might be derived from properties of the DNA sequence. This reprogrammed epigenetic state seems to be crucial for the pluripotency of ES cells (i.e. for their ability to differentiate into diverse tissue types). During cellular differentiation, ES cells reprogram their epigenetic state once again when tissue-specific transcription factors are activated and pluripotency-specific genes become silenced. In terminally differentiated cells, epigenetic information is faithfully propagated during cell division. However, cellular ageing leads to increasing heterogeneity within a cell population and can also contribute to tumor development (Fraga and Esteller, 2007). Finally, the specialized cells of the germline reprogram epigenetic information in a parent-specific way, before it is passed on to the offspring as sperm or egg.

Epigenetic meiotic inheritance is caused by incomplete reprogramming in the early embryo, which results in the propagation of epigenetic information from parent to offspring. This phenomenon gives rise to patterns of phenotypic inheritance that are inconsistent with Mendelian rules. First, imprinted genes are inherited and expressed in a parent-specific way, i.e. only the maternal allele is transcribed while the paternal allele is epigenetically silenced or vice versa. Imprinted genes play a central role in the development of placenta and brain, and they have been linked to several rare neurogenetic disorders as well as to cancer (Solter, 2006). Second, acquired traits can be epigenetically transmitted over multiple generations. While this type of inheritance is relatively rare in mammals (Peaston and Whitelaw, 2006), for plants it seems to be a common way of adapting gene regulation to a changing environment (Grant-Downton and Dickinson, 2006).


    3 MECHANISMS OF EPIGENETIC REGULATION
 
Epigenetic regulation exploits the fact that the packaging of DNA inside the nucleus directly influences gene expression (Dillon, 2006). In general, the tighter a gene's DNA is wrapped up, the more likely it is switched off. Conversely, the more accessible it is to the transcription machinery, the more likely it is actively transcribed. Physically, the genome of eukaryotic cells is stored in a highly regulated protein–DNA complex called chromatin, which controls DNA accessibility for cellular processes such as transcription, replication and DNA repair (Woodcock, 2006). Epigenetic mechanisms can be both activating (i.e. fostering open chromatin structure, called euchromatin) or repressive (i.e. fostering condensed chromatin structure, called heterochromatin), and different epigenetic mechanisms frequently act synergistically. Three biochemical mechanisms are commonly referred to as epigenetic: (i) DNA methylation, (ii) histone modifications and (iii) binding of non-histone proteins such as Polycomb and trithorax group complexes.

DNA methylation (Bird, 2002; Weber and Schübeler, 2007) is the only epigenetic modification that directly affects the DNA. Biochemically, a hydrogen atom at carbon 5 of the cytosine base is replaced by a methyl group (Fig. 1, left). This does not alter the way in which the cytosine is transcribed into mRNA, but it fosters a locally more compact chromatin structure and affects transcription factor binding. In mammals, DNA methylation is largely confined to cytosines in a CpG context (‘CpG’ stands for cytidine and guanosine, separated by a phosphate group), which has two important implications. First, any genomic position that can be methylated is symmetric, i.e. there is a cytosine (methylated or unmethylated) on the forward strand as well as on the reverse strand. Therefore, after DNA replication a specific enzyme can read the DNA methylation pattern of the parent strand and faithfully copy it to the newly synthesized strand, thereby maintaining heritable DNA methylation patterns. Second, in mammalian genomes CpG dinucleotides occur in clusters, and the genomic regions with highest CpG density, termed CpG islands, exhibit the lowest levels of DNA methylation. This phenomenon is most likely caused by the fact that mutation rates are substantially higher for methylated CpGs than for unmethylated CpGs; hence, absence of DNA methylation, at least in the germline, seems to be constitutive for long-term maintenance of most CpG islands.


Fig. 1. Carriers of epigenetic information: DNA and nucleosome. The left panel shows a DNA double helix that is methylated symmetrically on both strands (orange spheres) at its center CpG (PDB structure: 329d). DNA methylation is the only epigenetic mechanism that directly targets the DNA. The right panel shows a nucleosome spindle consisting of eight histone proteins (center), around which two loops of DNA are wound (PDB structure: 1KX5). The nucleosome is subject to covalent modifications of its histones and to the binding of non-histone proteins.

 
Histone modifications (Kouzarides, 2007) are post-translational modifications of the core histone proteins that constitute the nucleosome (Fig. 1, right). The long and unstructured N-terminal tails by which histone proteins interact with neighboring nucleosomes are subject to various types of covalent modifications, including lysine and arginine methylation, lysine acetylation and serine phosphorylation. Histone modifications influence the nucleosome's assembly into higher-order packaging structures by moderating its DNA-binding affinity and by recruiting further chromatin remodeling complexes. The concept of the histone code (Turner, 2007) suggests that histone modifications are used combinatorially to program genes for activation during subsequent steps of cellular differentiation. Although the generality of this concept is controversial and remains difficult to test experimentally, it provides a plausible model for a number of recent observations, such as the programmed activation of tissue-specific transcription factors during differentiation of ES cells (Bernstein et al., 2006).

Non-histone proteins influence chromatin structure by interacting with histones and DNA in a number of ways. ATP-dependent chromatin remodeling complexes act like molecular machines and can directly move or displace nucleosomes along the DNA (Gangaraju and Bartholomew, 2007). A second group of proteins, which includes HP1 as well as the Polycomb and trithorax group complexes, can be thought of as the readers and writers of the epigenome. They bind to the DNA or to specifically modified histones and catalyze other histone modifications or DNA methylation. The Polycomb group complex 2, for example, catalyzes repressive histone methylations and recruits DNA methylation through its interaction with a DNA methyltransferase (Schuettengruber et al., 2007). Transcription factors can also affect chromatin structure, e.g. through recruitment of histone acetylases. Interestingly, there is evidence that transcription factor binding is sometimes maintained during cell division and would therefore qualify as mitotically heritable (Zhou et al., 2005). Nevertheless, by convention rather than by definition, transcription factor binding is not usually regarded as epigenetic.

In summary, a variety of epigenetic mechanisms jointly control the packaging of the DNA, thereby regulating which genes are accessible for transcription. Epigenetic mechanisms are highly interwoven and regulate their target genes (and each other) in a complex network of synergistic and antagonistic interactions. Disentangling this network both biochemically for a small number of representative genes and statistically from a whole-genome perspective, and relating the results to development and disease are important goals of epigenetic research. In the remainder of this review, we discuss arising bioinformatic challenges, and we show how computational methods have contributed and will continue to contribute to answering important epigenetic questions.


    4 GENERATION, LOW-LEVEL PROCESSING AND QUALITY CONTROL OF EPIGENETIC DATA
 
Various experimental techniques have been developed for genome-wide mapping of epigenetic information (Table 1). These techniques follow a basic three-stage design. First, the epigenetic information is biochemically converted into genetic information, e.g. by enriching genomic regions that carry a particular histone modification in a DNA library. Second, standard DNA techniques such as tiling microarrays or sequencing are applied. Third, computational algorithms are used to infer the epigenetic information from the tiling array data or sequencing output. All experimental methods for epigenome mapping generate large amounts of data and require efficient ways of low-level data processing and quality control.


Table 1. Methods for genome-wide mapping of epigenetic information

 
For ChIP-on-chip (Table 1), the key bioinformatic challenge is to derive a ranked list of overrepresented genomic regions from raw probe intensities. Although there are some similarities to the analysis of tiling array data for transcriptome mapping (see Royce et al., 2005 for a review), most available algorithms are specifically targeted to peak finding in ChIP-on-chip data. The initial and still widely used solution employs a three-step process (Cawley et al., 2004). First, the microarrays are quantile-normalized and standardized to a common median intensity. Second, a Wilcoxon rank sum test is applied locally on a sliding window to test for differential hybridization and to derive a Z-score for each probe. Third, significant probes are merged into regions of overrepresentation if sufficiently close to each other, and these regions are ranked by their combined Z-score. More recently, hidden Markov models were introduced to improve the detection accuracy (first implemented in HMMTiling, Li et al., 2005), linear models were applied to control for differences in probe sensitivity [implemented in MAT (Johnson et al., 2006) for Affymetrix one-color arrays and in MA2C (Song et al., 2007) for NimbleGen two-color arrays] and probabilistic binding models were used to improve spatial resolution (implemented in the JBD algorithm, Qi et al., 2006). Furthermore, several peak finding toolkits have been developed to facilitate routine processing of ChIP-on-chip datasets. 
TileMap is an easy-to-use peak finder for Affymetrix tiling array data, which has been applied in a number of independent studies (Ji and Wong, 2005); Ringo is a Bioconductor package for the analysis of ChIP-on-chip data from the widely used NimbleGen platform (Toedling et al., 2007); ChIPOTle is a basic peak finding macro for Excel, which does not take platform-specific information into account (Buck et al., 2005); and Tilescope is a fully integrated analysis pipeline that is applicable to data from both the Affymetrix and the NimbleGen platforms (Zhang et al., 2007). In spite of the abundance of algorithms published recently, the peak finding problem for ChIP-on-chip data cannot be regarded as solved. In particular, current peak finders have problems with histone modifications that cover extended genomic regions and they seem to miss a substantial number of weak binding sites. In order to select a biologically meaningful cutoff that distinguishes between significant peaks and random fluctuations, experimental validation of a moderate number of detected peaks continues to be crucial. To guide this process, a framework has been proposed that can help identify the most informative regions for validation (Du et al., 2006).
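
The three-step process described above (normalization, sliding-window rank sum test, merging of significant probes) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the implementation of Cawley et al. or of any of the tools listed: quantile normalization is applied within each replicate group, the Z-score uses the normal approximation without tie correction, the combined score of a region is taken as the sum of its probe Z-scores, and all function names, window sizes, cutoffs and data are hypothetical.

```python
import math
from statistics import mean, median


def quantile_normalize(arrays):
    """Give all arrays the same intensity distribution (mean of sorted values)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result


def scale_to_median(arrays, target=1.0):
    """Rescale a replicate group so that its pooled median equals target."""
    factor = target / median(v for a in arrays for v in a)
    return [[v * factor for v in a] for a in arrays]


def rank_sum_z(treatment, control):
    """Z-score of the Wilcoxon rank sum test (midranks, normal approximation)."""
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    rank_sum, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(treatment), len(control)
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (rank_sum - expected) / sd


def probe_z_scores(positions, treatment, control, half_window=200):
    """Test each probe using all probes within half_window bp of it."""
    scores = []
    for center in positions:
        idx = [i for i, p in enumerate(positions) if abs(p - center) <= half_window]
        t = [rep[i] for rep in treatment for i in idx]
        c = [rep[i] for rep in control for i in idx]
        scores.append(rank_sum_z(t, c))
    return scores


def merge_regions(positions, scores, z_cutoff=2.0, max_gap=300):
    """Merge nearby significant probes; rank regions by summed Z-score."""
    regions = []
    for pos, z in zip(positions, scores):
        if z < z_cutoff:
            continue
        if regions and pos - regions[-1]["end"] <= max_gap:
            regions[-1]["end"] = pos
            regions[-1]["z"] += z
        else:
            regions.append({"start": pos, "end": pos, "z": z})
    return sorted(regions, key=lambda r: r["z"], reverse=True)


# Hypothetical toy data: 30 probes, enrichment between 1000 and 1400 bp.
positions = list(range(0, 3000, 100))


def toy_array(replicate, enriched):
    values = []
    for i, pos in enumerate(positions):
        background = 1.0 + 0.01 * ((2 * i + 3 * replicate) % 5)
        values.append(background + (2.0 if enriched and 1000 <= pos <= 1400 else 0.0))
    return values


treatment = scale_to_median(quantile_normalize([toy_array(0, True), toy_array(1, True)]))
control = scale_to_median(quantile_normalize([toy_array(2, False), toy_array(3, False)]))
regions = merge_regions(positions, probe_z_scores(positions, treatment, control))
print(regions[0])
```

In practice the cutoff and the maximum gap would have to be calibrated against experimentally validated peaks, as discussed above.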

The key bioinformatic step of ChIP-seq (Table 1) is the fast and accurate mapping of short sequence reads to the reference genome. In principle, any seed-based alignment program such as blastn (http://www.ncbi.nlm.nih.gov/BLAST) or BLAT (Kent, 2002) is applicable to this task. Nevertheless, seed alignment strategies that are specifically optimized for reads from a particular sequencing platform have been reported to yield substantial increases in speed and coverage (Synamatix Sdn. Bhd., 2007). Two commercial solutions for short ChIP-seq reads are currently available, namely, the ELAND tool included in the Solexa analysis pipeline (http://www.solexa.com/) and the SXOligosearch software (http://www.synamatix.com/). In addition, a customized alignment protocol has been developed at the Broad Institute (Mikkelsen et al., 2007). Unlike relative probe intensities in ChIP-on-chip, each sequence read in a ChIP-seq experiment directly corresponds to a single chromatin fragment that was bound by the antibody during immunoprecipitation. For this reason, it is commonly assumed that ChIP-seq requires almost no normalization and that data analysis can be based directly on sequence read counts (Barski et al., 2007) or sliding window read counts (Mikkelsen et al., 2007). However, an important caveat is that the process of mapping tags to the reference genome can bias the analysis toward genomic regions with unique and complex sequence patterns. This is because short sequencing reads that (partially) overlap with low-complexity regions or with interspersed repeats stand a higher chance of being discarded for lack of unique genomic alignment.
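
As a minimal sketch of the sliding window read counts mentioned above, the following function counts mapped read start positions in overlapping windows along one chromosome. The window size, step size, function name and read positions are hypothetical and not taken from the cited studies; the sketch also assumes that the window size is a multiple of the step size.

```python
from collections import Counter


def sliding_window_counts(read_starts, chrom_length, window=1000, step=250):
    """Count reads whose start lies in each window [start, start + window)."""
    if window % step != 0:
        raise ValueError("window must be a multiple of step")
    # Pre-count reads per step-sized bin, then sum the bins of each window.
    per_bin = Counter(pos // step for pos in read_starts)
    bins_per_window = window // step
    counts = []
    for start in range(0, chrom_length - window + 1, step):
        first_bin = start // step
        total = sum(per_bin[b] for b in range(first_bin, first_bin + bins_per_window))
        counts.append((start, total))
    return counts


# Hypothetical mapped read start positions on a 3 kb toy chromosome.
read_starts = [120, 130, 900, 1010, 1020, 1100, 1240, 2600]
counts = sliding_window_counts(read_starts, chrom_length=3000)
for start, n in counts:
    print(start, n)
```

Because reads that cannot be mapped uniquely are discarded before this step, windows in repetitive or low-complexity sequence will tend to show lower counts for purely technical reasons, which is the caveat raised in the text.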

Bisulfite sequencing (Table 1) requires customized analysis software that accounts for the ‘fifth base’, 5-methyl-cytosine. When bisulfite-treated DNA is sequenced directly (i.e. without vector cloning), the average methylation levels can be estimated using the ESME software (Lewin et al., 2004, freely available from http://www.epigenome.org/index.php?page=download). This software corrects for systematic bias induced by different molecular weights at methylation-specific SNPs and facilitates quality control. When subclones of bisulfite-treated DNA are sequenced, which is regarded as the gold standard for DNA methylation analysis, methylation patterns are inferred by aligning the clonal sequences to the genomic sequence. The BiQ Analyzer software (Bock et al., 2005) has been developed to simplify this analysis, to perform stringent quality control and to visualize the results. In addition, specialized primer design programs exist, of which Methyl Primer Express (freely available from http://www.appliedbiosystems.com/) is probably the most widely used. However, manual refinement is often necessary, suggesting that further improvements of primer design programs are needed.
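
The inference of methylation patterns from clonal bisulfite sequences can be illustrated as follows. Bisulfite treatment converts unmethylated cytosines so that they are read as T, whereas methylated cytosines remain C. The sketch below is not the algorithm of BiQ Analyzer or ESME; it assumes uppercase, ungapped reads that are already aligned to the forward strand of the reference, and the conversion-rate threshold of 0.95, the function names and the sequences are hypothetical.

```python
def cpg_positions(reference):
    """Positions of cytosines that are followed by a guanine."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def call_methylation(reference, read):
    """Map each CpG cytosine to True (methylated), False or None (unclear)."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    calls = {}
    for i in cpg_positions(reference):
        calls[i] = True if read[i] == "C" else False if read[i] == "T" else None
    return calls


def conversion_rate(reference, read):
    """Fraction of non-CpG cytosines read as T (a quality-control measure)."""
    cpgs = set(cpg_positions(reference))
    non_cpg = [i for i, b in enumerate(reference) if b == "C" and i not in cpgs]
    if not non_cpg:
        return float("nan")
    return sum(1 for i in non_cpg if read[i] == "T") / len(non_cpg)


def average_methylation(reference, reads, min_conversion=0.95):
    """Per-CpG methylation level over all clones that pass quality control."""
    kept = [r for r in reads if conversion_rate(reference, r) >= min_conversion]
    levels = {}
    for i in cpg_positions(reference):
        calls = [call_methylation(reference, r)[i] for r in kept]
        calls = [c for c in calls if c is not None]
        levels[i] = sum(calls) / len(calls) if calls else None
    return levels


reference = "ACGTCCGATCGA"
clones = [
    "ACGTTTGATCGA",  # CpGs at positions 1 and 9 methylated
    "ATGTTTGATTGA",  # fully unmethylated
    "ACGTCCGATCGA",  # unconverted clone, removed by quality control
]
levels = average_methylation(reference, clones)
print(levels)
```

The third clone shows why quality control matters: without the conversion-rate filter, an unconverted sequence would be indistinguishable from a fully methylated one.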


    5 EPIGENOME DATA ANALYSIS
 
Rapid progress of experimental technologies has given rise to several epigenome mapping initiatives (Table 2). These projects have been breaking ground not only in terms of applying and improving large-scale experimental methods, but also in terms of developing bioinformatic methods for analyzing their data.


Table 2. Large-scale epigenome mapping projects as of October 2007

 
This is particularly true for the ENCODE project, which has been designed from the outset as a close cooperation between experimental and computational biologists. Although the ENCODE project aims to map functional elements in the human genome rather than to resolve epigenetic questions, the methods and tools that emerged from this project contribute to epigenome data analysis in a number of ways. First, a method for unsupervised segmentation of chromatin data was developed based on wavelet smoothing and hidden Markov models (Thurman et al., 2007). When applied to selected ChIP-on-chip datasets from the ENCODE pilot phase, the algorithm neatly recovered the two main chromatin states: open and transcriptionally competent euchromatin as well as inaccessible and transcriptionally silent heterochromatin. Second, the joint statistical analysis of all 105 ChIP-on-chip datasets from the ENCODE pilot phase (Zhang et al., 2007) provides an example of exploratory data analysis on a large and heterogeneous dataset that includes substantial amounts of epigenetic information. Third, several alternative prediction methods for annotating functional promoters were developed and evaluated (Trinklein et al., 2007), indicating that epigenetic data can substantially improve the accuracy of promoter annotation. Fourth, a rigorous statistical test was developed that assesses the significance of overlap between two sets of genomic features, e.g. between CpG islands and unmethylated genomic regions (ENCODE Project Consortium, 2007). The authors show that, under relatively weak assumptions, their Genome Structure Correction method yields realistic P-values while other randomization-based methods tend to over-estimate significance. Fifth, the ENCODE project was accompanied by the systematic incorporation of epigenome datasets into the UCSC Genome Browser (Thomas et al., 2007), which now provides integrated visualization and standardized retrieval of various genome and epigenome datasets. 
Finally, the successful collaboration of experimental and bioinformatic researchers in the ENCODE project has raised the awareness of synergies between wet lab and computational research. The AHEAD task force for example acknowledges the critical importance of bioinformatic methods and infrastructure in their proposal for a human epigenome project (Alliance for Human Epigenomics and Disease, 2007).
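
To illustrate the idea of segmenting chromatin data into two states with a hidden Markov model, the following sketch decodes a one-dimensional signal with the Viterbi algorithm. It is a deliberately simplified toy and not the method of Thurman et al.: the wavelet smoothing step is omitted, the model parameters (Gaussian emission means, standard deviation, probability of staying in a state) are fixed by assumption rather than learned from data, and the signal values are hypothetical.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi_two_state(signal, means=(0.0, 2.0), sigma=1.0, stay_prob=0.95):
    """Most likely state path (0 = closed, 1 = open) for a 1D signal."""
    log_stay, log_switch = math.log(stay_prob), math.log(1.0 - stay_prob)
    score = [math.log(0.5) + gaussian_logpdf(signal[0], m, sigma) for m in means]
    backpointers = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for state in (0, 1):
            candidates = [
                score[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + gaussian_logpdf(x, means[state], sigma))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical enrichment signal with one noisy probe inside an open region.
signal = [0.1, -0.2, 0.0, 0.2, 2.1, 1.9, 2.3, 0.4, 2.0, 1.8, 0.1, -0.1, 0.0, 0.2]
path = viterbi_two_state(signal)
print(path)
```

The high probability of staying in a state is what keeps the single low value inside the enriched stretch from splitting the open region in two, which is the practical benefit of a segmentation model over a simple per-probe threshold.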

Although the bioinformatic focus of the other large-scale epigenome projects (Table 2) was less pronounced than in the ENCODE project, important bioinformatic progress arose from them as well. The HEROIC project played a catalyzing role for the development of epigenome data storage, visualization and analysis infrastructure in Europe. In fact, in its regulatory builds the Ensembl genome browser (Hubbard et al., 2007) will increasingly incorporate epigenetic information such as genome-wide maps of DNA methylation and histone modifications (P.Flicek, personal communication). The HEP project for the first time explored the challenges and opportunities of high-resolution epigenome analysis in multiple unrelated individuals. And the two large-scale ChIP-seq projects that have been completed recently underline the relevance of analyzing various epigenetic mechanisms simultaneously in a single cell type (Barski et al., 2007) and at multiple stages during cellular differentiation (Mikkelsen et al., 2007). While the general picture emerging from these studies is consistent with mammalian epigenomes being segmented into alternating regions of open and condensed chromatin, many more sophisticated concepts become visible only at high resolution and when analyzing various epigenetic mechanisms simultaneously. For example, it has been shown recently that computational integration of several histone modification maps can be used to predict the locations of enhancers in the human genome, even where these are invisible to phylogenetic methods (Heintzman et al., 2007; Roh et al., 2007).

However, these pioneering epigenome mapping projects also highlight two major impediments to epigenome data analysis: the unsolved problem of public data storage and the lack of experimental standardization. Public data storage in databases such as GenBank and ArrayExpress has played an important role for bioinformatic research, by making primary data available for meta-analysis and benchmarking studies. However, with the advent of ChIP-seq, the central collection of primary data is hitting technical limitations. A typical three-day run on a Solexa sequencer gives rise to hundreds of gigabytes of primary image data and several gigabases of sequence reads, and in less than a year a single Solexa sequencer could generate the equivalent of all sequence data stored in GenBank until 2005. In addition to developing more efficient methods for data processing and storage, it will therefore be necessary to work out policies that regulate how primary data should be archived and how the benefits of publicly available primary data can be maintained when central storage is no longer an option. The second problem, lack of experimental standardization, hampers the computational integration of epigenetic datasets from different studies. Because epigenetic information is tissue-specific and because methods such as ChIP-on-chip are highly sensitive to variation in the experimental protocol, most epigenome datasets that have been published to date are—strictly speaking—incomparable. Nevertheless, several meta-analyses of ChIP-on-chip data have been published and significant correlations have been observed for epigenetic modifications that are associated with an open chromatin structure (Bock et al., 2007; Parisi et al., 2007; Zhang et al., 2007), while an initial comparison for repressive histone modifications indicated substantially less correlation between different datasets (C.Bock, unpublished data). 
Although complete standardization is neither realistic nor desirable, it seems advisable to focus different epigenome mapping projects on the same set of cell lines, as is done in the ENCODE project.


    6 EPIGENOME PREDICTION: INFERRING EPIGENETIC STATES FROM THE DNA SEQUENCE
 
A substantial amount of bioinformatic research has been devoted to the prediction of epigenetic information from characteristics of the genomic sequence. Such predictions serve a dual purpose. First, accurate epigenome predictions can substitute for experimental data, to some degree, which is particularly relevant for newly discovered epigenetic mechanisms and for species other than human and mouse. Second, prediction algorithms build statistical models of epigenetic information from training data and can therefore act as a first step toward quantitative modeling of an epigenetic mechanism.

Promoter prediction—an important topic in bioinformatics since the early 1990s—can be regarded as the first attempt to predict epigenetic states from the DNA sequence. This is because active promoters are characterized by an open and transcriptionally permissive chromatin structure and exhibit specific epigenetic properties such as absence of DNA methylation and enrichment for histone acetylation. A large number of promoter prediction methods have been developed during the last two decades, most of which use DNA sequence characteristics combined with a machine-learning algorithm to identify candidate promoters (see Bajic et al., 2004 for a comprehensive overview and benchmarking analysis). In the highly annotated human genome, promoter prediction has lost some of its relevance and researchers are increasingly focusing on advanced questions of transcription control, such as inferring tissue-specific signals (Smith et al., 2007) and reconstructing transcriptional networks (Bulcke et al., 2006).

CpG island prediction has some overlap with promoter prediction because the majority of promoters in mammalian genomes co-localize with CpG islands (Antequera, 2003). However, CpG islands play a more general role as mediators of open chromatin structure, and they frequently overlap with enhancers and other regulatory elements. CpG islands were originally discovered by a striking absence of DNA methylation (Cooper et al., 1983), which is regarded as a constitutive feature of CpG islands. The absence of DNA methylation in the germline reduces CpG-to-TpG mutation rates inside CpG islands, leading to overrepresentation of CpGs relative to the genomic average. CpG islands are often predicted solely based on their GC and CpG frequencies, and multiple variants of the original definition (Gardiner-Garden and Frommer, 1987) are in use. However, a recent study showed that these definitions yield high false positive rates, and a refined concept of bona fide CpG islands based on large-scale epigenome prediction was proposed (Bock et al., 2007).
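
The sequence-based criteria can be sketched as follows, using the commonly quoted thresholds associated with the original definition of Gardiner-Garden and Frommer (a window of at least 200 bp, GC content of at least 50% and an observed/expected CpG ratio of at least 0.6). The scanning and merging strategy, the function names and the toy sequence are assumptions made for illustration and do not correspond to any particular published program.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) of a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Expected number of CpGs under independence is (C * G) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def find_cpg_islands(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Merge all windows that meet both thresholds into [start, end) intervals."""
    islands = []
    for start in range(0, len(seq) - window + 1):
        gc, obs_exp = cpg_stats(seq[start:start + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = end
            else:
                islands.append([start, end])
    return [tuple(island) for island in islands]


# Toy sequence: a 300 bp CpG-rich stretch flanked by AT-rich sequence.
toy = "AT" * 150 + "CG" * 150 + "AT" * 150
islands = find_cpg_islands(toy)
print(islands)
```

Note that the reported interval extends into the AT-rich flanks, because any window that is at least half covered by the CpG-rich stretch passes both thresholds. Purely compositional criteria of this kind are coarse, which is consistent with the high false positive rates mentioned above.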

DNA methylation prediction is conceptually easier than the prediction of more volatile epigenetic mechanisms because DNA methylation patterns exhibit relatively low tissue specificity compared to other epigenetic information. Therefore, it is not surprising that similar approaches applied to DNA methylation data for blood (Bock et al., 2006) and brain tissue (Das et al., 2006; Fang et al., 2006) yielded comparable results. In all three cases, machine-learning methods were used to derive a classifier for presence or absence of DNA methylation in a given region. Prediction accuracies were high, and the most predictive attributes included CpG-rich sequence patterns (Bock et al., 2006; Das et al., 2006; Fang et al., 2006), specific DNA structure properties and repetitive DNA elements (Bock et al., 2006) as well as certain transcription factor binding sites (Fang et al., 2006). Interestingly, a similar method could also predict which genomic regions are prone to becoming methylated in a cell line overexpressing the DNA methyltransferase DNMT1 (Feltus et al., 2003).

Prediction of nucleosome positioning is based on the observation that the sequence composition of DNA molecules strongly affects their nucleosome affinity, i.e. how easily they can be wound around a nucleosome (Satchwell et al., 1986). Several recent papers showed that this in vitro effect has significant impact on the genomic positioning of nucleosomes in vivo (Ioshikhes et al., 2006; Peckham et al., 2007; Segal et al., 2006). Although all three papers focus their analysis on yeast, the highly conserved nature of the nucleosome suggests a general applicability of these results. Indeed, Segal et al. observe that the predictions change little when training is performed on nucleosome positioning data from chicken instead of yeast, and Ioshikhes et al. find that an alignment of multiple yeast species can increase prediction accuracy.
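One simple way to look for the kind of sequence signal described here is to count how often selected dinucleotides recur at a fixed distance, since the nucleosome literature reports that certain dinucleotides (notably AA/TT) tend to recur with a period of roughly 10 bp in nucleosome-bound DNA. The sketch below is an assumption-laden illustration: the motif set, function names and the synthetic test sequence are hypothetical, and the cited studies use considerably richer probabilistic or comparative models.

```python
def dinucleotide_positions(seq, motifs=("AA", "TT", "TA")):
    """Return start positions of the given dinucleotides in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in motifs]


def lag_counts(seq, max_lag=30, motifs=("AA", "TT", "TA")):
    """For each lag, count motif positions p such that p + lag is also a motif position."""
    positions = set(dinucleotide_positions(seq, motifs))
    return {
        lag: sum(1 for p in positions if p + lag in positions)
        for lag in range(1, max_lag + 1)
    }


if __name__ == "__main__":
    # Synthetic sequence with an AA dinucleotide every 10 bp (not real data).
    synthetic = "AAGCGCGCGC" * 10
    counts = lag_counts(synthetic)
    for lag in (5, 10, 20):
        print(lag, counts[lag])
```

Peaks in the lag counts near multiples of 10 would be consistent with a helical-repeat periodicity; on real genomic DNA the signal is much weaker and needs to be assessed against a suitable background.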

Successful prediction has also been reported for several other epigenetic phenomena: DNase I hypersensitive sites could be distinguished from a random control set using support vector machines with k-mer sequence motifs as prediction attributes (Noble et al., 2005). Polycomb/trithorax response elements in Drosophila were identified by sequence criteria (Ringrose et al., 2003), a finding that may not easily translate to humans because mammalian Polycomb/trithorax response elements exhibit less identifiable sequence patterns. Imprinted genes were predicted using a wide range of genomic features (sequence motifs, CpG islands, repeats, predicted transcription factor binding sites) and a commercial support vector machine-based data mining suite (Luedi et al., 2005). Finally, genes that escape X-chromosome inactivation were predicted by a support vector machine and found to be enriched in Alu repeats and CpG-rich sequence motifs (Wang et al., 2006). However, a conclusive assessment of prediction methods for imprinted genes and for genes that escape inactivation seems problematic due to the small number of affected genes, their clustering in small genomic regions and the difficulty of independent experimental validation.
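Several of the studies above describe sequences by k-mer content before passing them to a support vector machine. As a hedged illustration of that feature-extraction step only, the sketch below converts a sequence into a normalized k-mer frequency vector and computes a linear (dot-product) similarity between two sequences. The function names and the choice of k are assumptions; no claim is made that this reproduces the exact features or kernels of the cited papers, and the classifier itself is omitted.

```python
from itertools import product


def kmer_spectrum(seq, k=3):
    """Return normalized k-mer frequencies in fixed lexicographic (ACGT) order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def linear_kernel(seq_a, seq_b, k=3):
    """Dot product of two k-mer frequency vectors (a simple sequence similarity)."""
    a = kmer_spectrum(seq_a, k)
    b = kmer_spectrum(seq_b, k)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(len(kmer_spectrum("ACGTACGT", k=3)))   # 64 possible 3-mers
    print(linear_kernel("ACGTACGT", "ACGTACGT"))
    print(linear_kernel("ACGTACGT", "GGGGGGGG"))
```

The resulting vectors can be fed to any standard classifier; identical sequences score highest against each other, and sequences sharing no k-mers score zero.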

In summary, a large number of genomic regions exhibit clearly detectable epigenetic footprints in their DNA sequence. This has practical applications for genome annotation and also challenges the notion of genome and epigenome as two largely independent systems of inheritance working at different time scales. Rather, the genome seems to encode not only genes and cis-regulatory elements, but also a default epigenetic state that becomes active in the absence of other regulatory influences such as the binding of transcription factors or the activity of chromatin remodeling complexes. This interpretation is consistent with the emerging concept of multi-tasking genomes, which simultaneously (and on top of each other) encode genes and their regulation (Kapranov et al., 2007). Furthermore, this model provides an explanation for the fact that only a small subset of suitable consensus binding motifs are actually used by transcription factors in vivo. A new generation of in silico methods for detecting transcription factor binding has already started to benefit from epigenome prediction in order to distinguish functional from non-functional sites (Narlikar et al., 2007).


    7 CANCER EPIGENETICS: TOWARD IMPROVED DIAGNOSIS AND THERAPY
 
It has long been known that mutations and chromosomal deletions can irreversibly destroy tumor suppressor genes and are pivotal events in cancer progression. In contrast, the importance of epigenetic mechanisms for tumor development has been appreciated more recently (see Feinberg and Tycko, 2004 for a historical account of cancer epigenetics). It is now clear that a substantial proportion of silenced tumor suppressor genes are lost due to epigenetic deactivation rather than sequence damage (Esteller, 2007; Jones and Baylin, 2007). Furthermore, a comparison between the epigenetic characteristics of cancer cells and stem cells suggests that epigenetic deregulation may program cells for cancer-like behavior long before they are visually identifiable as tumor cells (Feinberg et al., 2006).

The important role of epigenetic defects for cancer opens up new opportunities for improved diagnosis and therapy. Early diagnosis profits from the fact that epigenetic aberrations occur early during tumorigenesis and are frequently detectable in peripheral blood when destroyed tumor cells leak DNA into the bloodstream (Laird, 2003). Epigenetic cancer therapy exploits the fact that—in contrast to genomic damage—epigenetic aberrations are pharmacologically reversible (Yoo and Jones, 2006). These active areas of research give rise to two questions that are particularly amenable to bioinformatic analysis. First, given a list of genomic regions exhibiting epigenetic differences between tumor cells and controls (or between different disease subtypes), can we detect common patterns or find evidence of a functional relationship of these regions to cancer? Second, can we use bioinformatic methods in order to improve diagnosis and therapy by detecting and classifying important disease subtypes?

Keshet et al. faced a typical instance of the first question after MeDIP analysis in two cancer cell lines and in a set of primary tumors had detected hundreds of genes whose promoters were selectively methylated in cancer (Keshet et al., 2006). They applied several bioinformatic methods to identify common patterns among these genes, including overrepresentation analysis of Gene Ontology terms, sequence motif discovery, genomic clustering analysis and comparison with public gene expression data. Based on these computational analyses, they concluded that only a small percentage of epigenetically silenced genes in cancer cells are tumor suppressor genes. In contrast, many of the genes that are unlikely to be tumor suppressor genes exhibit certain sequence patterns, which may predispose them to epigenetic silencing as a side effect rather than a cause of tumor development. A recent study elaborated on this finding by applying a more advanced motif discovery pipeline and identified additional sequence motifs in the same dataset (Eden et al., 2007). The observation that epigenetically silenced genes often share certain sequence motifs in their promoters has also been used to detect new candidates for cancer-specific hypermethylation (Goh et al., 2007). To address the substantial class bias (only a small percentage of genes become hypermethylated in a particular cancer) and the lack of an experimental control set, Goh et al. devised an algorithm that iteratively combines unsupervised clustering and supervised prediction. Furthermore, the recent discovery of a link between DNA hypermethylation in cancer and Polycomb binding in ES cells, made through a combination of bioinformatic comparisons and experimental validation (Ohm et al., 2007; Schlesinger et al., 2007; Widschwendter et al., 2007), highlights the synergistic power of computational and experimental methods in cancer epigenetics. Future studies toward understanding the epigenetic characteristics of cancer cells will benefit from the recently launched PubMeth database, which aggregates literature data about which genes have been reported as hypermethylated in which cancer (Ongenaert et al., 2007).
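Overrepresentation analysis of Gene Ontology terms, mentioned above, is commonly framed as a one-sided hypergeometric test: given a background of genes, some of which carry a particular annotation, how likely is it that a gene list of a given size contains at least the observed number of annotated genes by chance? The sketch below assumes this standard formulation; the function name and the example numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided p-value P(X >= hits) for drawing `sample` genes without
    replacement from `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 background genes, 300 annotated with a term,
    # 150 genes in the methylated set, 12 of which carry the annotation.
    print(hypergeom_enrichment_p(20000, 300, 150, 12))
```

Because many terms are tested at once, the resulting p-values would normally be adjusted for multiple testing before any term is reported as enriched.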

The second question is aimed at the discovery and validation of biomarkers for cancer diagnosis, prognosis and therapy optimization (Laird, 2003). In an early study on DNA methylation patterns in leukemia, support vector machines applied to DNA methylation microarray data could accurately distinguish between two important disease subtypes, acute lymphoblastic leukemia and acute myeloid leukemia (Model et al., 2001). In a series of papers, Siegmund and coworkers developed (Marjoram et al., 2006; Siegmund et al., 2004) and applied (Weisenberger et al., 2006) clustering methods for unsupervised discovery of epigenetically distinct cancer subtypes. They showed that a well-defined subset of colon cancer patients exhibit substantially elevated levels of DNA hypermethylation, and they developed a biomarker for diagnosing this disease subtype. Epigenetic biomarkers also play an increasing role in therapy optimization. For example, clinical trials showed that cancer-specific DNA methylation of the MGMT gene promoter can make glioblastomas (brain tumors) more susceptible to chemotherapy with alkylating agents (Hegi et al., 2005). A combination of bioinformatic methods and experiments was recently used to optimize DNA methylation analysis of MGMT and to develop it into a routine clinical biomarker for personalized cancer therapy (Mikeska et al., 2007). However, in spite of the fast progress in epigenetic cancer diagnosis, few epigenetic cancer biomarkers have yet been validated in large patient cohorts, and substantial work remains to be done before epigenetic cancer diagnosis starts to have a measurable positive effect on disease burden in the population.
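The idea of unsupervised subtype discovery can be illustrated with a very small clustering example. The sketch below uses plain k-means with a deterministic farthest-first initialization as a stand-in; it is not the clustering methodology of the cited papers, which compare and develop more specialized approaches. Each sample is represented as a vector of methylation levels between 0 and 1 at a few loci, and all values shown are synthetic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k=2, iterations=100):
    """Cluster points into k groups; return (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Synthetic methylation profiles (3 loci per sample); not patient data.
    profiles = [
        [0.10, 0.20, 0.10],
        [0.20, 0.10, 0.15],
        [0.90, 0.80, 0.85],
        [0.80, 0.90, 0.95],
        [0.15, 0.10, 0.20],
    ]
    labels, centroids = kmeans(profiles, k=2)
    print(labels)
```

With real data, the number of clusters and the stability of the resulting groups would need to be assessed carefully before interpreting them as disease subtypes.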


    8 FUTURE DIRECTIONS
 
The first wave of research in computational epigenetics was driven by rapid progress in experimental methods for data generation. This progress required adequate computational methods for low-level data processing and quality control, prompted epigenome prediction studies as a means of understanding the genomic distribution of epigenetic information, and provided the foundation for initial projects on cancer epigenetics. While these topics will remain highly relevant areas of research, and the sheer quantity of epigenetic data arising from epigenome projects poses a major bioinformatic challenge, we also expect the focus of computational studies to broaden and deepen significantly. First, epigenome data analysis will increasingly take into account the proteins that read and write epigenetic information, as well as their interaction partners and regulatory networks. Such reverse engineering of epigenetic regulation could lead to a quantitative model and, ultimately, rational manipulation of the core circuitry that controls cell fate and pluripotency (Boyer et al., 2005). Second, the decreasing cost of epigenome mapping will enable quantitative analysis of epigenetic variation in human populations. Recent twin studies suggest that both environmental influences (Fraga et al., 2005) and genetic variation (Heijmans et al., 2007) influence epigenetic variation. It will be a daunting bioinformatic task to distill putative functional connections from the integration of epigenome data with gene expression profiles and haplotype maps for a large sample from a heterogeneous population. Third, epigenome mapping in multiple species will add an evolutionary perspective to computational epigenetics. Initial results suggest that orthologous regions in different mammals carry similar epigenetic information (Bernstein et al., 2005; Enard et al., 2004), which is expected because the DNA encodes parts of its epigenetic state (Bock et al., 2007; Segal et al., 2006). It will be interesting to see whether comparative epigenomics can significantly improve our ability to identify functionally important sites in the human genome, as is the case for comparative genomics. Fourth, theoretical modeling will provide a way to test and deepen our mechanistic and quantitative understanding of epigenetic mechanisms. For example, two recent studies showed that co-operativity among the proteins that write epigenetic information is required for stably maintaining the state of an epigenetic switch in the presence of highly dynamic fluctuations at the molecular level (Dodd et al., 2007; Sontag et al., 2006). Modeling studies can thus help explain how the high-level phenomena that we observe for epigenetic regulation emerge from the dynamic interplay of various epigenetic mechanisms. Fifth, the development of powerful and easy-to-use ‘statistical genome browsers’ will enable biologists to perform complex epigenome data analysis without requiring strong statistical or programming skills. The Galaxy web service (Blankenberg et al., 2007), which lets users design and execute genome analyses through an intuitive web front-end, is a first step in this direction, and further tools that are more specifically targeted to epigenetic data will follow. Sixth, epigenetic mechanisms could turn out to play a role in diseases other than cancer, as there is strong circumstantial evidence for epigenetic regulation being involved in mental disorders, autoimmune diseases and other complex diseases (Bjornsson et al., 2004; Feinberg, 2007). Bioinformatic methods such as text mining and exploratory data mining may play a role in identifying and prioritizing concrete hypotheses for experimental validation.

In conclusion, exciting times are ahead for research in computational epigenetics!


    ACKNOWLEDGEMENTS
 
We would like to thank Jörn Walter and Paul Flicek for critically reading the manuscript. Funding by the Max Planck Society is gratefully acknowledged.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on August 25, 2007; revised on October 28, 2007; accepted on October 28, 2007

    REFERENCES
 

    Alliance for Human Epigenomics and Disease. Proposal for an International AHEAD Pilot Project. (2007) 28 October 2007, date last accessed. Available: http://www.aacr.org/Uploads/DocumentRepository/TaskForces/ahead_pilot_project_proposal_may2007.pdf.

    Antequera F. Structure, function and evolution of CpG island promoters. Cell. Mol. Life Sci (2003) 60:1647–1658.

    Bajic VB, et al. Promoter prediction analysis on the whole human genome. Nat. Biotechnol (2004) 22:1467–1473.

    Barski A, et al. High-resolution profiling of histone methylations in the human genome. Cell (2007) 129:823–837.

    Bernstein BE, et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell (2005) 120:169–181.

    Bernstein BE, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell (2006) 125:315–326.

    Bernstein BE, et al. The mammalian epigenome. Cell (2007) 128:669–681.

    Bird A. DNA methylation patterns and epigenetic memory. Genes Dev (2002) 16:6–21.

    Bjornsson HT, et al. An integrated epigenetic and genetic approach to common human disease. Trends Genet (2004) 20:350–358.

    Blankenberg D, et al. A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res (2007) 17:960–964.

    Bock C, et al. BiQ Analyzer: visualization and quality control for DNA methylation data from bisulfite sequencing. Bioinformatics (2005) 21:4067–4068.

    Bock C, et al. CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats and predicted DNA structure. PLoS Genet (2006) 2:e26.

    Bock C, et al. CpG island mapping by epigenome prediction. PLoS Comput. Biol (2007) 3:e110.

    Boyer LA, et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell (2005) 122:947–956.

    Buck MJ, Lieb JD. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics (2004) 83:349–360.

    Buck MJ, et al. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol (2005) 6:R97.

    Bulcke D, et al. Inferring transcriptional networks by mining 'omics' data. Curr. Bioinformatics (2006) 1:313.

    Cawley S, et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell (2004) 116:499–509.

    Cooper DN, et al. Unmethylated domains in vertebrate DNA. Nucleic Acids Res (1983) 11:647–658.

    Das R, et al. Computational prediction of methylation status in human genomic sequences. Proc. Natl Acad. Sci. USA (2006) 103:10713–10716.

    Dillon N. Gene regulation and large-scale chromatin organization in the nucleus. Chromosome Res (2006) 14:117–126.

    Dodd IB, et al. Theoretical analysis of epigenetic cell memory by nucleosome modification. Cell (2007) 129:813–822.

    Drake JW, et al. Rates of spontaneous mutation. Genetics (1998) 148:1667–1686.

    Du J, et al. A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge. Bioinformatics (2006) 22:3016–3024.

    Eckhardt F, et al. DNA methylation profiling of human chromosomes 6, 20 and 22. Nat. Genet (2006) 38:1378–1385.

    Eden E, et al. Discovering motifs in ranked lists of DNA sequences. PLoS Comput. Biol (2007) 3:e39.

    Enard W, et al. Differences in DNA methylation patterns between humans and chimpanzees. Curr. Biol (2004) 14:R148–R149.

    ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science (2004) 306:636–640.

    ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 447:799–816.

    Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet (2007) 8:286–298.

    Fang F, et al. Predicting methylation status of CpG islands in the human brain. Bioinformatics (2006) 22:2204–2209.

    Feinberg AP. Phenotypic plasticity and the epigenetics of human disease. Nature (2007) 447:433–440.

    Feinberg AP, Tycko B. The history of cancer epigenetics. Nat. Rev. Cancer (2004) 4:143–153.

    Feinberg AP, et al. The epigenetic progenitor origin of human cancer. Nat. Rev. Genet (2006) 7:21–33.

    Feltus FA, et al. Predicting aberrant CpG island methylation. Proc. Natl Acad. Sci. USA (2003) 100:12253–12258.

    Fraga MF, Esteller M. Epigenetics and aging: the targets and the marks. Trends Genet (2007) 23:413–418.

    Fraga MF, et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl Acad. Sci. USA (2005) 102:10604–10609.

    Gangaraju VK, Bartholomew B. Mechanisms of ATP dependent chromatin remodeling. Mutat. Res (2007) 618:3–17.

    Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J. Mol. Biol (1987) 196:261–282.

    Goh L, et al. Genomic sweeping for hypermethylated genes. Bioinformatics (2007) 23:281–288.

    Grant-Downton RT, Dickinson HG. Epigenetics and its implications for plant biology 2. The ‘epigenetic epiphany’: epigenetics, evolution and beyond. Ann. Bot. (Lond.) (2006) 97:11–27.

    Hajkova P, et al. DNA-methylation analysis by the bisulfite-assisted genomic sequencing method. Methods Mol. Biol (2002) 200:143–154.

    Hegi ME, et al. MGMT gene silencing and benefit from temozolomide in glioblastoma. N. Engl. J. Med (2005) 352:997–1003.

    Heijmans BT, et al. Heritable rather than age-related environmental and stochastic factors dominate variation in DNA methylation of the human IGF2/H19 locus. Hum. Mol. Genet (2007) 16:547–554.

    Heintzman ND, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet (2007) 39:311–318.

    HEROIC Project Consortium. High-throughput Epigenetic Regulatory Organisation In Chromatin - Project Fact Sheet. (2005) 28 October 2007, date last accessed. Available: http://cordis.europa.eu/fetch?CALLER=FP6_PROJ&ACTION=D&DOC=1&CAT=PROJ&QUERY=1183993108794&RCN=78439.

    Hubbard TJ, et al. Ensembl 2007. Nucleic Acids Res (2007) 35:D610–D617.

    Ioshikhes IP, et al. Nucleosome positions predicted through comparative genomics. Nat. Genet (2006) 38:1210–1215.

    Ji H, Wong WH. TileMap: create chromosomal map of tiling array hybridizations. Bioinformatics (2005) 21:3629–3636.

    Johnson WE, et al. Model-based analysis of tiling-arrays for ChIP-chip. Proc. Natl Acad. Sci. USA (2006) 103:12457–12462.

    Jones PA, Baylin SB. The epigenomics of cancer. Cell (2007) 128:683–692.

    Jones PA, Martienssen R. A blueprint for a Human Epigenome Project: the AACR Human Epigenome Workshop. Cancer Res (2005) 65:11241–11246.

    Kapranov P, et al. Genome-wide transcription and the implications for genomic organization. Nat. Rev. Genet (2007) 8:413–423.

    Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res (2002) 12:656–664.

    Keshet I, et al. Evidence for an instructive mechanism of de novo methylation in cancer cells. Nat. Genet (2006) 38:149–153.

    Kouzarides T. Chromatin modifications and their function. Cell (2007) 128:693–705.

    Laird PW. The power and the promise of DNA methylation markers. Nat. Rev. Cancer (2003) 3:253–266.

    Lewin J, et al. Quantitative DNA methylation analysis based on four-dye trace data from direct sequencing of PCR amplificates. Bioinformatics (2004) 20:3005–3012.

    Li W, et al. A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics (2005) 21(Suppl. 1):i274–i282.

    Luedi PP, et al. Genome-wide prediction of imprinted murine genes. Genome Res (2005) 15:875–884.

    Marjoram P, et al. Cluster analysis for DNA methylation profiles having a detection threshold. BMC Bioinformatics (2006) 7:361.

    Microarray and Gene Expression Data Society. The MIAME Checklist – update January 2005. (2005) 28 October 2007, date last accessed. Available: http://www.mged.org/Workgroups/MIAME/MIAMEchecklist_chipchip.pdf.

    Mikeska T, et al. Optimization of Quantitative MGMT Promoter Methylation Analysis Using Pyrosequencing and Combined Bisulfite Restriction Analysis. J. Mol. Diagn (2007) 9:368–381.

    Mikkelsen TS, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature (2007) 448:553–560.

    Model F, et al. Feature selection for DNA methylation based cancer classification. Bioinformatics (2001) 17(Suppl. 1):S157–S164.

    Narlikar L, et al. Nucleosome occupancy information improves de novo motif discovery. In: Research in Computational Molecular Biology, 11th Annual International Conference, RECOMB 2007, Oakland, CA, USA, April 21–25, 2007, Proceedings. Speed TP, Huang H, eds. (2007) New York: Springer-Verlag.

    Noble WS, et al. Predicting the in vivo signature of human gene regulatory sequences. Bioinformatics (2005) 21(Suppl. 1):i338–i343.

    Ohm JE, et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat. Genet (2007) 39:237–242.

    Ongenaert M, et al. PubMeth: a cancer methylation database combining text-mining and expert annotation. Nucleic Acids Res (2007).

    Parisi F, et al. Identifying synergistic regulation involving c-Myc and sp1 in human tissues. Nucleic Acids Res (2007) 35:1098–1107.

    Peaston AE, Whitelaw E. Epigenetics and phenotypic variation in mammals. Mamm. Genome (2006) 17:365–374.

    Peckham HE, et al. Nucleosome positioning signals in genomic DNA. Genome Res (2007) 17:1170–1177.

    Qi Y, et al. High-resolution computational models of genome binding events. Nat. Biotechnol (2006) 24:963–970.

    Rakyan VK, et al. DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol (2004) 2:e405.

    Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature (2007) 447:425–432.

    Richards EJ. Inherited epigenetic variation – revisiting soft inheritance. Nat. Rev. Genet (2006) 7:395–401.

    Ringrose L, et al. Genome-wide prediction of Polycomb/Trithorax response elements in Drosophila melanogaster. Dev. Cell (2003) 5:759–771.

    Roh TY, et al. Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns. Genome Res (2007) 17:74–81.

    Royce TE, et al. Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet (2005) 21:466–475.

    Russo VEA, et al. Epigenetic Mechanisms of Gene Regulation. (1996) Plainview, N.Y: Cold Spring Harbor Laboratory Press.

    Satchwell SC, et al. Sequence periodicities in chicken nucleosome core DNA. J. Mol. Biol (1986) 191:659–675.

    Schlesinger Y, et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat. Genet (2007) 39:232–236.

    Schuettengruber B, et al. Genome regulation by polycomb and trithorax proteins. Cell (2007) 128:735–745.

    Segal E, et al. A genomic code for nucleosome positioning. Nature (2006) 442:772–778.

    Siegmund KD, et al. A comparison of cluster analysis methods using DNA methylation data. Bioinformatics (2004) 20:1896–1904.

    Smith AD, et al. Tissue-specific regulatory elements in mammalian promoters. Mol. Syst. Biol (2007) 3:73.

    Solter D. Imprinting today: end of the beginning or beginning of the end? Cytogenet. Genome Res (2006) 113:12–16.

    Song JS, et al. Model-based analysis of two-color arrays (MA2C). Genome Biol (2007) 8:R178.

    Sontag LB, et al. Dynamics, stability and inheritance of somatic DNA methylation imprints. J. Theor. Biol (2006) 242:890–899.

    Surani MA, et al. Genetic and epigenetic regulators of pluripotency. Cell (2007) 128:747–762.

    Synamatix Sdn. Bhd. SXOligoSearch Supporting Document. (2007) 28 October 2007, date last accessed. Available: http://synasite.mgrc.com.my:8080/sxog/files/SXOligoSearch_benchmark.pdf.

    Thomas DJ, et al. The ENCODE Project at UC Santa Cruz. Nucleic Acids Res (2007) 35:D663–D667.

    Thurman RE, et al. Identification of higher-order functional domains in the human ENCODE regions. Genome Res (2007) 17:917–927.

    Toedling J, et al. Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics (2007) 8:221.

    Trinklein ND, et al. Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome. Genome Res (2007) 17:720–731.

    Turner BM. Defining an epigenetic code. Nat. Cell Biol (2007) 9:2–6.

    Ushijima T, et al. Fidelity of the methylation pattern and its variation in the genome. Genome Res (2003) 13:868–874.

    Wang Z, et al. Evidence of influence of genomic DNA sequence on human X chromosome inactivation. PLoS Comput. Biol (2006) 2:e113.

    Weber M, Schübeler D. Genomic patterns of DNA methylation: targets and function of an epigenetic mark. Curr. Opin. Cell Biol (2007) 19:273–280.

    Weisenberger DJ, et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat. Genet (2006) 38:787–793.

    Widschwendter M, et al. Epigenetic stem cell signature in cancer. Nat. Genet (2007) 39:157–158.

    Woodcock CL. Chromatin architecture. Curr. Opin. Struct. Biol (2006) 16:213–220.

    Yoo CB, Jones PA. Epigenetic therapy of cancer: past, present and future. Nat. Rev. Drug Discov (2006) 5:37–50.

    Zhang ZD, et al. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res (2007a) 17:787–797.

    Zhang ZD, et al. Tilescope: online analysis pipeline for high-density tiling microarray data. Genome Biol (2007b) 8:R81.

    Zhou GL, et al. Memory mechanisms of active transcription during cell division. Bioessays (2005) 27:1239–1245.

