Bioinformatics Advance Access published online on January 22, 2004
Bioinformatics, doi:10.1093/bioinformatics/btg450
Bioinformatics © Oxford University Press 2004; all rights reserved
1 The State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China
* To whom correspondence should be addressed. E-mail: jwang{at}nju.edu.cn.
Motivation: Over-represented k-mers in non-coding genomic regions often lead to the identification of potential transcriptional regulatory sites (TRS). Many algorithms exploit this phenomenon to predict TRS in silico, yet improving these algorithms requires a deeper understanding of the enrichment feature. To obtain a general distributional profile of TRS in different regions of a genome as well as across different genomes, we performed a systematic analysis of the over-representation of TRS in the intergenic regions and gene upstream regions of yeast and viral genomes, and of the distributional pattern of TRS in the intergenic and intron regions of the Drosophila genome. We also explored how to evaluate the accuracy of TRS consensus sequences by measuring their enrichment.
Results: To measure enrichment, we introduced a statistical background model that compares the TRS frequency in a given class of genomic regions with either the frequency in the whole genome or the frequency in exon regions. This model was applied to different classes of non-coding genomic regions in four genomes. Most TRS were over-represented in the intergenic regions of the S. cerevisiae, S. pombe and Epstein-Barr virus (EBV) genomes. The enrichment of S. cerevisiae TRS in the 600 bp upstream regions of genes was also significant. In the Drosophila genome, TRS showed no enrichment in intergenic and intron regions when the TRS frequency in the whole genome was taken as background, as was done for the other genomes. However, when the TRS frequency in exon regions was taken as background, over 70% of TRS were over-represented in those two classes of non-coding regions. This indicates the existence of transcriptional regulatory signals in introns. The analysis of some S. cerevisiae TRS that have inconsistent consensus sequences with different levels of enrichment in intergenic regions suggests that the accuracy of experimentally determined TRS can be evaluated by measuring their enrichment in non-coding genomic regions.
Availability: Free programs are available at http://dii.nju.edu.cn/~xuewen/enrichment/
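The abstract does not spell out the statistical model, so the following is only a minimal sketch of the underlying idea: compare how often a site occurs in one class of regions with how often it occurs in a chosen background (whole genome or exons). It assumes exact-match motifs counted on a single strand and a plain frequency ratio in place of any significance test. The function names and toy sequences are hypothetical and are not taken from the authors' programs.

```python
def count_occurrences(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def site_frequency(seqs, motif: str) -> float:
    """Occurrences of motif per possible start position, pooled over seqs."""
    hits = sum(count_occurrences(s, motif) for s in seqs)
    positions = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
    return hits / positions if positions else 0.0


def enrichment_ratio(region_seqs, background_seqs, motif: str) -> float:
    """Frequency in the regions of interest divided by the background frequency.

    A ratio above 1 means the motif is over-represented relative to the
    background. This is an illustrative measure (assumption), not the
    statistical model used in the paper.
    """
    fg = site_frequency(region_seqs, motif)
    bg = site_frequency(background_seqs, motif)
    if bg == 0:
        return float("inf") if fg > 0 else float("nan")
    return fg / bg


# Toy data (made up): an "intergenic" set and a background set.
intergenic = ["TGACTCATGACTCA"]
background = ["TGACTCA" + "A" * 15]
ratio = enrichment_ratio(intergenic, background, "TGACTCA")
print(ratio)
```

Swapping the background argument between whole-genome sequences and exon sequences mirrors the two backgrounds described in the Results, which is where the Drosophila outcome differs.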
Revised September 3, 2003
Accepted September 26, 2003
Article
Enrichment of transcriptional regulatory sites in non-coding genomic region
2 The Center for Theoretical Biology, Beijing University, Beijing 100781, China