Bioinformatics Advance Access originally published online on July 12, 2006
Bioinformatics 2006 22(18):2204-2209; doi:10.1093/bioinformatics/btl377
Predicting methylation status of CpG islands in the human brain
1 Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University 100084 China
2 Cold Spring Harbor Laboratory, Cold Spring Harbor NY 11274, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Over 50% of human genes contain CpG islands in their 5'-regions. Methylation patterns of CpG islands are involved in tissue-specific gene expression and regulation. Mis-epigenetic silencing associated with aberrant CpG island methylation is one mechanism leading to the loss of tumor suppressor functions in cancer cells. Large-scale experimental detection of DNA methylation is still both labor-intensive and time-consuming. Therefore, it is necessary to develop in silico approaches for predicting methylation status of CpG islands.
Results: Based on a recent genome-scale dataset of DNA methylation in human brain tissues, we developed a classifier called MethCGI for predicting methylation status of CpG islands using a support vector machine (SVM). Nucleotide sequence contents as well as transcription factor binding sites (TFBSs) are used as features for the classification. The method achieves a specificity of 84.65% and a sensitivity of 84.32% on the brain data, and also correctly predicts about two-thirds of the data from other tissues reported in the MethDB database.
Availability: An online predictor based on MethCGI is available at http://166.111.201.7/MethCGI.html
Contact: mzhang{at}cshl.edu
Supplementary Information: Supplementary data available at Bioinformatics online and http://166.111.201.7/help.html
| 1 INTRODUCTION |
|---|
DNA methylation is involved in various biological phenomena including gene silencing, stabilization of chromosomal structure and suppression of the mobility of retrotransposons (Bird and Wolffe, 1999; Walsh and Bestor, 1999). In vertebrates, DNA methylation mainly occurs at the fifth carbon position of the cytosine residue in a 5'-CG-3' dinucleotide (called CpG; the methylated CpG can be denoted as m5CpG), and this biochemical modification results from the enzymatic activity of DNA methyltransferases (DNMTs) (Bird, 1978; Gruenbaum et al., 1981). DNA methylation, together with histone modification, is commonly regarded as a major epigenetic phenomenon, responsible for heritable alteration of gene expression patterns without changes in the primary DNA sequence (Singal and Ginder, 1999).
The dinucleotide CpG is notably under-represented in the human genome. Its frequency is only about 20% of the frequency expected on the basis of the genomic G+C content. This is owing to the spontaneous deamination of methylated cytosines to yield thymine, which generates a T:G mismatch that will be fixed as TpG (or CpA on the complementary strand) if the thymine is not repaired back to cytosine before the next round of DNA replication (Lander et al., 2001; Venter et al., 2001; Sved and Bird, 1990). However, there are many regions (about 1 kb long) where the frequency of CpGs is about 10 times higher than the genome average. These regions are called CpG islands (CGIs) (Bird, 1986). Most CGIs are found at the 5' ends of genes (Antequera and Bird, 1993). CGIs were traditionally thought to be unmethylated. However, many CGIs were subsequently found to be hypermethylated in imprinted genes (Jones and Baylin, 2002). It is now known that some CGIs in non-imprinted regions are also methylated in normal cells, and this is believed to be related to tissue-restricted gene expression patterns (Grunau et al., 2000; Song et al., 2005). A large number of methylated CGIs are also found in tumor cells (Jones and Baylin, 2002).
DNA methylation is involved in the regulation of many genes. At present, there are two major views regarding the causal relationship between DNA methylation and transcription. One popular view is that DNA methylation could repress transcription by either or both of the following two mechanisms: (1) the methyl group may disrupt the binding sites of transcription factors and result in the failure of transcription (Kim et al., 2003; Strunnikova et al., 2005); (2) methylated cytosines may attract methyl-CpG binding domain (MBD) proteins, which would bring repressors to silence the chromatin (Bird and Wolffe, 1999). In contrast, the alternative view argues that transcriptionally inactive chromatin domains may be the true cause, as they can serve as targets for de novo methylation (Bird, 2002). Regardless of which view is more accurate, it is clear that the methylation status of a CGI is highly correlated with transcription. Studying such epigenetic events is crucial for our understanding of transcriptional regulation in development and differentiation, and may also provide new biomarkers for medical diagnostics such as the early detection of tumorigenesis.
Recent investigations have indicated that the propensity of a CGI to de novo methylation may be sequence-dependent (Das et al., 2006). For instance, Alu repeats could cause methylation spreading, while other factors such as Sp1 binding may protect a CGI from being methylated (Graff et al., 1997). Based on RLGS (Restriction Landmark Genome Scanning) analysis of de novo DNA methylation driven by over-expression of DNMT1 in vitro, Feltus et al. (2003) proposed that there exist some intrinsic DNA sequence preferences between methylation-prone and methylation-resistant CGIs, and they identified seven sequence patterns that are regarded as most discriminating between the two classes. More recently, Bhasin et al. (2005) developed an SVM-based prediction tool (named Methylator) to predict the methylation status of a single CpG. The algorithm was trained on the public dataset MethDB (Grunau et al., 2001), which contains some annotated m5CpG sites in a limited number of sequences for species ranging from plants to human.
Recently, a large in vivo genome-scale methylation dataset was published (Rollins et al., 2006), which contains about 4000 methylated and unmethylated domains. These sequences represent a random sample of about 30 Mb of DNA from normal human brain. Based on this dataset, we have developed an SVM program (called MethCGI) to predict the methylation status (methylation-prone versus methylation-resistant) of CGIs in the human brain according to their local genomic sequences. Evaluated on randomly extracted independent test sets, MethCGI reaches a specificity (SP) of 84.65% and a sensitivity (SE) of 84.32%. The study also highlights some top discriminating features for predicting CGI methylation status, many of which have been reported to be related to the binding sites of methylation-sensitive transcription factors. The method is compared with a recent method by Bock et al. (2006) on the HEP data (Rakyan et al., 2004). We also applied the method to methylation data from different tissues reported in the MethDB database, and about two-thirds of them are correctly predicted. The MethCGI tool for predicting the methylation status of CGIs is freely accessible at http://166.111.201.7/MethCGI.html
| 2 MATERIALS AND METHODS |
|---|
2.1 Dataset
The dataset of Rollins et al. (2006) is the first large-scale description of the in vivo DNA methylation landscape of the human brain. It consists of the sequences of about 4000 unmethylated and methylated DNA domains originally obtained with different sets of endonucleases: McrBC (Rm5C-N40–500-Rm5C) was used to digest the DNA into largely unmethylated domains, and five other restriction endonucleases (REs), namely TaiI (ACGT), BstUI (CGCG), HhaI (GCGC), HpaII (CCGG) and AciI (CCGC and GCGG), were used to digest the DNA into mostly methylated domains (see Rollins et al., 2006 for details). We used the procedures illustrated in Figure 1 to pinpoint the locations of the boundaries in the original data by shrinking each fragment boundary to the inner side of the nearest endonuclease recognition site. We adopted the conventional definition of CpG islands: a CGI should be no shorter than a given length, with G+C content (%G+C) >50% and (observed/expected) CpG ratio >0.6 (Gardiner-Garden and Frommer, 1987). For a given length (we used 200, 300, 400 and 500 bp in this study), all the non-overlapping CpG island fragments (called CGIFs for convenience) in the original data were extracted to compose our sample set. We defined the CGIFs extracted from methylated domains as methylation-prone CGIFs, and those from unmethylated domains as methylation-resistant CGIFs. We refer to such sample sets as CGIF-sets for convenience of discussion.
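The CpG island criteria above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the observed/expected ratio follows the usual Gardiner-Garden and Frommer form (number of CpGs times length, divided by the product of the C and G counts), and the greedy left-to-right scan for non-overlapping windows is an assumption, since the text does not describe how the fragments were tiled.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def is_cgi_fragment(seq, min_gc=0.5, min_ratio=0.6):
    """True if the fragment has %G+C > 50% and CpG ratio > 0.6."""
    return gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_ratio


def extract_cgifs(domain, window=400):
    """Greedy scan (an assumption) returning non-overlapping
    (start, end) windows of fixed length that satisfy the criteria."""
    fragments = []
    start = 0
    while start + window <= len(domain):
        if is_cgi_fragment(domain[start:start + window]):
            fragments.append((start, start + window))
            start += window  # jump past the accepted window: no overlap
        else:
            start += 1       # slide by one base and try again
    return fragments
```

Under these assumptions, `extract_cgifs(domain, window=400)` would be applied to each methylated or unmethylated domain, and the resulting fragments labelled by the class of the domain they came from.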
2.2 Support vector machines
Support vector machines (SVMs) are a machine learning method widely used for classification tasks in fields including computational biology. The method and its theoretical advantages have been described in many references (e.g. Vapnik, 1995, 1999). The basic principle of SVM classification is as follows: based on a training set of n samples, $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$ are vectors of d features and $y_i \in \{+1, -1\}$ are labels indicating the classes (+1 for methylation-resistant and −1 for methylation-prone in this study), SVM obtains a decision function or classifier of the form

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \tag{1}$$

where the $\alpha_i$ and $b$ are optimized in the training procedure with the objective of minimizing the prediction error on the training data while maximizing the separation margin between the two classes. $K(\cdot, \cdot)$ is a kernel function that can be regarded as a measure of the similarity between two samples. We used the software SVMTorch (Collobert and Bengio, 2001) (http://www.idiap.ch/learning/SVMTorch.html) for the implementation of the SVM algorithm, and adopted the linear kernel function (linear SVM) with default parameters. Details about the SVM method and our choice of the kernel function are provided in the Supplementary Materials.
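As a minimal sketch of how an SVM decision function with a linear kernel is evaluated once training is done, the following Python computes the sign of the weighted kernel sum plus the offset. It does not reproduce SVMTorch; the function names and the toy support vectors, weights and offset are hypothetical values chosen only for illustration.

```python
def linear_kernel(u, v):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Return +1 or -1 from sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    score = sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + b
    # A score of exactly zero is assigned to +1 here (arbitrary choice).
    return 1 if score >= 0 else -1


# Toy model (hypothetical values, not taken from the paper).
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]      # +1 methylation-resistant, -1 methylation-prone
alphas = [0.5, 0.5]
offset = 0.0

print(svm_decision((2.0, 0.0), support_vectors, labels, alphas, offset))
print(svm_decision((-1.0, -3.0), support_vectors, labels, alphas, offset))
```

With a linear kernel the sum collapses to a single weight vector, which is why linear SVMs are cheap to apply to large numbers of fragments.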
2.3 Performance evaluation
We randomly separated all samples into a training set and a test set. The SVM method was trained on the training set and evaluated on the test set. We used the specificity (SP), sensitivity (SE), accuracy (ACC) and correlation coefficient (CC) to assess the performance of classification. Taking methylation-resistant CGIFs as the positive class, and methylation-prone CGIFs as the negative class, we calculated the expressions for SP, SE, ACC and CC as follows:
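$$\mathrm{SP} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{SE} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{CC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5}$$

Here TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively, and the measures are given in their standard forms.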
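The four performance measures can be computed directly from the confusion-matrix counts. The sketch below assumes the standard definitions of specificity, sensitivity, accuracy and the Matthews-style correlation coefficient, with methylation-resistant CGIFs as the positive class; the function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute SP, SE, ACC and CC from confusion-matrix counts.

    Positive class: methylation-resistant; negative class: methylation-prone.
    """
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    se = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SP": sp, "SE": se, "ACC": acc, "CC": cc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(metrics)
```

The correlation coefficient is the most informative single number when the two classes differ in size, because accuracy alone can look good for a classifier that favours the larger class.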
2.4 Features used for the classification
2.4.1 %G+C, CpG ratio and TpG content
The conventional definition of a CpG island involves three parameters: the sequence length, %G+C and CpG ratio. We compared the values of %G+C and CpG ratio for the methylation-prone and the methylation-resistant classes using the Wilcoxon rank-sum test and found highly significant differences (Table 1). These two parameters are used as two of the features for the classification. It is worth noting that this result contradicts the work by Feltus et al. (2003), where no such differences were found between the corresponding two classes based on in vitro experiments. This may be owing to the different definitions of the methylation-prone and the methylation-resistant CpG islands. Feltus et al. defined these two classes from in vitro data obtained by over-expression of DNMT1, while we defined them from the in vivo data of the human brain. In addition, they extracted the CGIs using the NEWCPGREPORT program (www.uk.embnet.org/Software/EMBOSS), which connects 200 bp CpG island fragments and extends them into longer CpG islands.
The distribution of the dinucleotide TpG (and its reverse complement CpA) is presumably also related to the deamination of mCpGs during evolution (Sved and Bird, 1990). We counted the dinucleotide TpG in the methylation-prone and the methylation-resistant samples, respectively, and also compared the distribution of such TpG content between the two classes with the Wilcoxon rank-sum test (Table 1). The significant difference implies association between the distribution of TpG content and DNA methylation. We adopted the TpG content as another feature for the classification.
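To illustrate the kind of comparison described here, the sketch below counts TpG dinucleotides and runs a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction. It is a plain-Python stand-in, not the statistical software used in the study; the function names are hypothetical, and whether the TpG count was normalized by length or pooled with CpA is not stated in the text, so the raw forward-strand count is an assumption.

```python
import math


def tpg_count(seq):
    """Number of TpG dinucleotides on the given strand (raw count)."""
    return seq.upper().count("TG")


def rank_sum_test(xs, ys):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for xs, approximate p-value). Ties receive
    midranks and the variance is tie-corrected.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))
```

In practice `xs` and `ys` would hold the feature values (for example TpG counts) of the methylation-prone and methylation-resistant CGIFs, respectively.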
2.4.2 The distribution of Alu Y
When studying CpG islands in mammalian genomes, one can notice many repeat-associated CGIs. The majority of these repeats are Alus, and most of them occur in the methylation-prone class. Alu elements make up the majority of the short interspersed elements (SINEs), accounting for >10% of the human genome. The distribution of Alu elements is associated with DNA methylation, as implied by the fact that one-third of the CpG dinucleotides in the human genome are located within Alu sequences and these CpGs tend to be targeted by DNA methylation (Batzer and Deininger, 2002).
Alu elements can be divided into several subfamilies according to their age, and the most recently integrated Alu elements compose the young Alu subfamily, Alu Y (Batzer and Deininger, 2002). The numbers of each type of Alu element (e.g. Alu Y and Alu S) overlapping with the CGIFs are compared between the methylation-prone and the methylation-resistant classes. Significant differences are found by the Wilcoxon rank-sum test: both P-values are <2.2 × 10⁻¹⁶. Our experiments show that the Alu Y feature provides the most discriminating information among all Alu elements, so the number of Alu Y sequences overlapping with each CGIF is selected as a feature in the predictor.
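Counting the Alu Y sequences that overlap a CGIF amounts to an interval-overlap count. The sketch below assumes half-open `(start, end)` coordinates on the same chromosome and a pre-computed list of Alu Y annotations; the function name and coordinates are hypothetical, and the source of the repeat annotation is not specified here.

```python
def count_overlapping_repeats(fragment, repeats):
    """Count repeat intervals that overlap the fragment by at least 1 bp.

    fragment: (start, end), half-open coordinates.
    repeats:  iterable of (start, end) intervals, e.g. Alu Y annotations.
    """
    f_start, f_end = fragment
    return sum(
        1 for r_start, r_end in repeats
        if r_start < f_end and f_start < r_end
    )


# Hypothetical coordinates, for illustration only.
alu_y = [(50, 120), (500, 600), (300, 350)]
print(count_overlapping_repeats((100, 500), alu_y))
```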
2.4.3 Transcription factor binding sites (TFBSs)
As CpG islands are often associated with promoter regions (Antequera and Bird, 1993), we expected that the distribution of certain TFBSs might differ between the methylation-prone and the methylation-resistant CGIFs. Using TRANSFAC (v9.2) together with the program MatCompare (Schones et al., 2005), we derived a non-redundant set of PWMs (position weight matrices) for 122 vertebrate TFBSs. Then, with the software Match provided in TRANSFAC, we searched for those TFBSs around each CGIF sample using cut-offs that minimize false positive rates (Kel et al., 2003). We use the occurrences of the TFBSs as features for the classification, i.e. for each TFBS, the number of its occurrences in each CGIF is used as a feature describing this sample. TFBSs that cannot be found in any sample are excluded from this study. For instance, with 400 bp windows, 74 TFBSs are finally used as features for the classification (see Supplementary Materials for their TRANSFAC IDs).
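The idea of turning PWM matches into occurrence counts can be sketched as follows. This is a simplified log-odds scan on the forward strand only and is not the Match algorithm, which uses its own matrix and core similarity scores and cut-offs; the function names, the pseudocount, the uniform background and the toy motif are all assumptions made for illustration.

```python
import math


def pwm_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts mapping 'A', 'C', 'G', 'T' to observed counts.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def count_hits(seq, pwm, threshold):
    """Count forward-strand windows whose summed score reaches threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = 0
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        if sum(pwm[k][base] for k, base in enumerate(window)) >= threshold:
            hits += 1
    return hits


# Toy motif "GCC" built from hypothetical counts.
toy_pwm = pwm_log_odds([{"G": 10}, {"C": 10}, {"C": 10}])
print(count_hits("AAGCCTTGCCGCA", toy_pwm, threshold=4.0))
```

One such count per TFBS, computed for every CGIF, gives the TFBS part of the feature vector; a fuller version would also scan the reverse complement.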
| 3 RESULTS AND DISCUSSION |
|---|
3.1 Classification performance
The feature vector used for representing each CGIF sample in the SVM classifier comprises three features for nucleotide contents (%G+C, CpG ratio and TpG content), the number of overlapped Alu Y sequences and the occurrences of certain TFBSs in the sample.
We trained the SVM classifier with a training set and tested its performance on the corresponding test set. The training set is randomly selected from all CGIFs under the considered window length, ensuring that the training sample sizes of the two classes are balanced. The remaining CGIFs compose the test set. The training sample size for each class is chosen to be about 75% of the CGIFs in the class with fewer samples. For each window length, this experiment was repeated 100 times with different random selections of training and test sets. Table 2 summarizes the results.
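The balanced sampling scheme described above can be sketched as follows. The function name, the use of seeded `random.Random` instances and the rounding of the training size with `int()` are assumptions; the text only states that each class contributes about 75% of the size of the smaller class to the training set and that the remaining CGIFs form the test set.

```python
import random


def balanced_split(resistant, prone, train_fraction=0.75, rng=None):
    """Split two classes into a balanced training set and a test set.

    Each class contributes the same number of training samples, equal to
    train_fraction of the smaller class; everything else goes to the test set.
    Labels: +1 methylation-resistant, -1 methylation-prone.
    """
    rng = rng or random.Random()
    n_train = int(train_fraction * min(len(resistant), len(prone)))

    def split(samples):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        return shuffled[:n_train], shuffled[n_train:]

    res_train, res_test = split(resistant)
    pro_train, pro_test = split(prone)
    train = [(x, 1) for x in res_train] + [(x, -1) for x in pro_train]
    test = [(x, 1) for x in res_test] + [(x, -1) for x in pro_test]
    return train, test


# 100 repetitions with different random selections (toy sample IDs).
resistant_ids = list(range(100))
prone_ids = list(range(100, 140))
splits = [
    balanced_split(resistant_ids, prone_ids, rng=random.Random(seed))
    for seed in range(100)
]
```

Because only the training set is balanced, the test set keeps the natural class imbalance, which is one reason to report SP, SE and CC alongside accuracy.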
From these results, we suggest that a window of 400 bp is a better choice for predicting the methylation status of CpG islands. However, it should be noted that a conclusion on the optimal window length still needs more experimental evidence, as the observed performance can be affected by the limited sample sizes. In the software we provide online, for any new sample, we combine the prediction results from the 100 classifiers and report the final decision. The window length of CGIFs is an optional parameter and the default value is set to 400 bp.
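The text does not say how the predictions of the 100 classifiers are combined. One simple possibility, shown here purely as an assumption, is a majority vote over the individual +1/−1 decisions; the function name and the toy classifiers are hypothetical.

```python
def ensemble_predict(classifiers, x):
    """Majority vote over classifiers that each return +1 or -1.

    Assumption: ties are resolved in favour of +1 (methylation-resistant).
    """
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes >= 0 else -1


# Toy ensemble: 60 classifiers vote +1 and 40 vote -1 for any input.
toy_classifiers = [lambda x: 1] * 60 + [lambda x: -1] * 40
print(ensemble_predict(toy_classifiers, x=None))
```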
3.2 Relevant transcription factors
We have used 74 TFBSs as the input features for the classification when using the window of 400 bp. Table 3 shows the top four TFBSs that are most differentially distributed between the two classes according to the P-value by Wilcoxon rank-sum test. Figure 2 shows their binding motifs represented as sequence logos. Comparison results of the other 70 TFBSs are shown in Supplementary Materials.
The AP-2 family of transcription factors is a group of three closely related and evolutionarily conserved proteins, AP-2α, AP-2β and AP-2γ, which regulate gene expression in a number of tissues including neural tissues (Hilger-Eversheim et al., 2000). In this family, AP-2α and AP-2γ have been identified as methylation-sensitive transcription factors (McPherson et al., 1997; McPherson and Weigel, 1999). They only bind to unmethylated sites in the brain. The distribution of AP-2 binding sites between the methylation-prone and the methylation-resistant CGIFs corroborates this well: more than three times as many AP-2 binding sites are found in methylation-resistant CGIFs as in methylation-prone CGIFs. The Egr family of transcription factors (Egr-1, Egr-2, Egr-3, Egr-4) regulates genes involved in neuronal plasticity (O'Donovan et al., 1999). Egr-1 has been shown to be sensitive to methylation, i.e. the methyl group disrupts the binding of Egr to its sites (Ogishima et al., 2005). This is consistent with our observation that Egr sites are more prevalent in methylation-resistant CGIFs in the brain.
Expression of ZF5 has been found in neuronal tissue (Dimitroulakos et al., 1999), but it has not been reported to be sensitive to methylation. It is suggested by our experiment that the binding of ZF5 may be methylation-sensitive, and the binding sites of ZF5 tend to be methylation-resistant in the brain tissue.
FOXM1 has been identified to be active in human malignant glioma, but inactive in normal brain tissue (Liu et al., 2006). This is in agreement with our observation that for normal brain tissue, targets of FOXM1 are mostly located in methylated domains.
3.3 Other methods for methylation prediction
The method reported by Feltus et al. (2003) is the first method for predicting methylation-prone or methylation-resistant CpG islands. It was trained on a very different dataset obtained by in vitro experiments, and the authors used an in-house DNA pattern discovery tool to select sequence motifs and a discriminant analysis method based on linear programming optimization for the classification. Unfortunately, we have not been able to obtain access to their program for a comparison.
Bhasin et al. (2005) designed the software Methylator to predict the methylation status of a single CpG using a much smaller window. It has a publicly available web server (http://bio.dfci.harvard.edu/Methylator). To apply it to the prediction of the methylation status of CpG islands or CGIFs, we calculated a score defined as the proportion of m5CpGs out of all CpGs in the fragment based on the Methylator predictions. CGIFs with a score >0.5 were reported as methylation-prone; otherwise they were reported as methylation-resistant. We applied this method to all the 400 bp CGIFs in the CGIF-set, and the results are shown in Table 4. The difference in performance between MethCGI and Methylator may be a result of differences in the methods, or it may be due to differences in the training data used by the two methods (Methylator was trained on data from several different human tissues).
Recently, Bock et al. (2006) also investigated DNA attributes that discriminate between methylation-prone and methylation-resistant CpG islands, based on data for 132 CpG islands (with %G+C >50%, CpG ratio >0.6 and length greater than 400 bp) across Chromosome 21 in human peripheral blood lymphocytes. They used 918 DNA attributes from local genomic regions to predict the methylation status of CpG islands. To validate their prediction model, they applied it to data from the HEP pilot study (Rakyan et al., 2004). The HEP data contain 253 amplicons related to seven human tissues. After mapping to the NCBI35 genome, Bock et al. calculated the average methylation level of each amplicon across the seven tissues. With a threshold of 60% for the methylation level, 163 methylated and 47 unmethylated amplicons were obtained. As Bock et al. did not provide accessible software, we applied MethCGI to the same HEP data for comparison, using a window length of 400 bp. The comparison results are shown in Table 5. MethCGI seems to perform slightly better, but larger datasets will be necessary for a conclusive comparison.
3.4 The performance of MethCGI on MethDB
It is known that some CpG islands may have differential methylation patterns in different tissues (Shiota et al., 2002; Song et al., 2005). We would like to see how MethCGI performs on data in MethDB that are collected from different tissues. We collected all records of healthy humans in the database and obtained 233 sequences for 18 human tissues.
Since the length of sequence segments in MethDB varies, we extracted CGIFs from these sequences using different window lengths (200, 300 and 400 bp). For each CGIF, we obtained its actual methylation status according to an m-score, which is defined as the proportion of m5CpGs out of all CpGs in the region, based on the annotation in the MethDB database. If the m-score is <0.5, the corresponding CGIF is defined as methylation-resistant; otherwise the CGIF is regarded as methylation-prone. We predicted the methylation status of these CGIFs with MethCGI. With windows of different lengths, the total prediction accuracy ranges from 64.81 to 72.87%.
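The m-score labelling rule can be written down directly. In this sketch the per-CpG annotation is assumed to be available as a list of booleans (True for a methylated CpG); the function names are hypothetical, and the handling of a region with no annotated CpGs (raising an error) is an assumption.

```python
def m_score(cpg_calls):
    """Proportion of methylated CpGs among all annotated CpGs in a region.

    cpg_calls: list of booleans, True meaning the CpG is methylated.
    """
    if not cpg_calls:
        raise ValueError("region has no annotated CpGs")
    return sum(1 for call in cpg_calls if call) / len(cpg_calls)


def status_from_m_score(score, threshold=0.5):
    """m-score < 0.5 -> methylation-resistant, otherwise methylation-prone."""
    return "methylation-resistant" if score < threshold else "methylation-prone"


print(status_from_m_score(m_score([True, False, False, False])))
print(status_from_m_score(m_score([True, True, False, False])))
```

Note that a region with exactly half of its CpGs methylated falls into the methylation-prone class under this rule.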
It is encouraging that MethCGI, although trained on brain data, predicts data from other tissues quite well. This agrees with the reported observation that many tissues have very similar methylation landscapes (Song et al., 2005; Grunau et al., 2000), and it also indicates that MethCGI may be applied to other tissues as a preliminary scan. However, we are aware that the data used in this experiment are very limited (most tissues have <10 samples). More large-scale, high-resolution DNA methylation data from different tissues and developmental stages are needed, both for the development of prediction tools with higher accuracy and for a better understanding of the nature of DNA methylation.
| Acknowledgments |
|---|
The authors thank Dustin Schones for providing the PWMs of non-redundant vertebrate transcription factors and for his help on improving the English of the manuscript. The authors thank Drs Bock and Lengauer for providing the DNA methylation data from the HEP pilot study and helpful comments on an early version of the manuscript. This work is partially supported by NSFC grant 60234020 and the National Basic Research Program of China (2004CB518605) (X.Z and M.Q.Z), the Changjiang Professorship Award of China (M.Q.Z) and NIH grant HG001696 (M.Q.Z).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Keith A Crandall
Received on May 29, 2006; revised on June 24, 2006; accepted on July 5, 2006
| REFERENCES |
|---|
Antequera, F. and Bird, A. (1993) Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA, 90, 11995–11999.
Baldi, P., et al. (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16, 412–424.
Batzer, M.A. and Deininger, P.L. (2002) Alu repeats and human genomic diversity. Nat. Rev. Genet., 3, 370–379.
Bhasin, M., et al. (2005) Prediction of methylated CpGs in DNA sequences using a support vector machine. FEBS Lett., 579, 4302–4308.
Bird, A.P. (1978) Use of restriction enzymes to study eukaryotic DNA methylation. II. The symmetry of methylated sites supports semi-conservative copying of the methylation pattern. J. Mol. Biol., 118, 49–60.
Bird, A.P. (1986) CpG-rich islands and the function of DNA methylation. Nature, 321, 209–213.
Bird, A.P. and Wolffe, A.P. (1999) Methylation-induced repression-belts, braces, and chromatin. Cell, 99, 451–454.
Bird, A. (2002) DNA methylation patterns and epigenetic memory. Genes Dev., 16, 6–21.
Bock, C., et al. (2006) CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. PLoS Genet., 2, e26.
Collobert, R. and Bengio, S. (2001) SVMTorch: Support vector machines for large-scale regression problems. J. Mach. Learn. Res., 1, 143–160.
Das, R., et al. (2006) Computational prediction of methylation status in human genomic sequences. Proc. Natl Acad. Sci. USA, 103, 10713–10716.
Dimitroulakos, J., et al. (1999) Identification of a novel zinc finger gene, zf5-3, as a potential mediator of neuroblastoma differentiation. Int. J. Cancer, 81, 970–978.
Feltus, F.A., et al. (2003) Predicting aberrant CpG island methylation. Proc. Natl Acad. Sci. USA, 100, 12253–12258.
Gardiner-Garden, M. and Frommer, M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol., 196, 261–282.
Graff, J.R., et al. (1997) Mapping patterns of CpG island methylation in normal and neoplastic cells implicates both upstream and downstream regions in de novo methylation. J. Biol. Chem., 272, 22322–22329.
Gruenbaum, Y., et al. (1981) Methylation of CpG sequences in eukaryotic DNA. FEBS Lett., 124, 67–71.
Grunau, C., et al. (2000) Large-scale methylation analysis of human genomic DNA reveals tissue-specific differences between the methylation profiles of genes and pseudogenes. Hum. Mol. Genet., 9, 2651–2663.
Grunau, C., et al. (2001) MethDB-a public database for DNA methylation data. Nucleic Acids Res., 29, 270–274.
Hilger-Eversheim, K., et al. (2000) Regulatory roles of AP-2 transcription factors in vertebrate development, apoptosis and cell-cycle control. Gene, 260, 1–12.
Jones, P.A. and Baylin, S.B. (2002) The fundamental role of epigenetic events in cancer. Nat. Rev. Genet., 3, 415–428.
Kel, A.E., et al. (2003) MATCH™: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res., 31, 3576–3579.
Kim, J., et al. (2003) Methylation-sensitive binding of transcription factor YY1 to an insulator sequence within the paternally expressed imprinted gene, Peg3. Hum. Mol. Genet., 12, 233–245.
Lander, E.S., et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
Liu, M., et al. (2006) FoxM1B is overexpressed in human glioblastomas and critically regulates the tumorigenicity of glioma cells. Cancer Res., 66, 3593–3602.
McPherson, L.A., et al. (1997) Identification of ERF-1 as a member of the AP2 transcription factor family. Proc. Natl Acad. Sci. USA, 94, 4342–4347.
McPherson, L.A. and Weigel, R.J. (1999) AP2α and AP2γ: a comparison of binding site specificity and trans-activation of the estrogen receptor promoter and single site promoter constructs. Nucleic Acids Res., 27, 4040–4049.
O'Donovan, K.J., et al. (1999) The EGR family of transcription-regulatory factors: progress at the interface of molecular and systems neuroscience. Trends Neurosci., 22, 167–173.
Ogishima, T., et al. (2005) Promoter CpG hypomethylation and transcription factor EGR1 hyperactivate heparanase expression in bladder cancer. Oncogene, 24, 6765–6772.
Rakyan, V.K., et al. (2004) DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol., 2, e405.
Rollins, R.A., et al. (2006) Large-scale structure of genomic methylation patterns. Genome Res., 16, 157–163.
Schones, D.E., et al. (2005) Similarity of position frequency matrices for transcription factor binding sites. Bioinformatics, 21, 307–313.
Shiota, K., et al. (2002) Epigenetic marks by DNA methylation specific to stem, germ and somatic cells in mice. Genes Cells, 7, 961–969.
Singal, R. and Ginder, G.D. (1999) DNA methylation. Blood, 93, 4059–4070.
Song, F., et al. (2005) Association of tissue-specific differentially methylated regions (TDMs) with differential gene expression. Proc. Natl Acad. Sci. USA, 102, 3336–3341.
Strunnikova, M., et al. (2005) Chromatin inactivation precedes de novo DNA methylation during the progressive epigenetic silencing of the RASSF1A promoter. Mol. Cell. Biol., 25, 3923–3933.
Sved, J. and Bird, A. (1990) The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc. Natl Acad. Sci. USA, 87, 4692–4696.
Vapnik, V.N. (1995) The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V.N. (1999) An overview of statistical learning theory. IEEE Trans. Neural Netw., 10, 988–999.
Venter, J.C., et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
Walsh, C.P. and Bestor, T.H. (1999) Cytosine methylation and mammalian development. Genes Dev., 13, 26–34.