The prediction of vertebrate promoter regions using differential hexamer frequency analysis
Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
1Correspondence address: c/o RabbitHutch Biotechnology Corporation, PO Box 506, 108 Mile Ranch, British Columbia, V0K 2Z0, Canada
MOTIVATION: To develop an algorithm utilizing differential hexamer frequency analysis to discriminate promoter from non-promoter regions in vertebrate DNA sequence, without relying upon an extensive database of known transcriptional elements.
RESULTS: Using hexamer frequencies derived from known promoter regions, coding regions and non-coding regions in vertebrate DNA sequence, together with a formula first applied by Claverie and Bougueleret (1986), a discriminant measure was created that compares promoter regions with coding (D1) and non-coding (D2) sequence. The algorithm correctly identifies the promoter regions in 18 of 29 loci (62.1%) from an independent test data set. With program options set to identify only one promoter region in the forward strand, there are 11 false-positive predictions in 208 714 nucleotides (one false positive in 18 974 single-stranded bp). With options set to analyze sequence in discrete segments, there is no appreciable improvement in sensitivity, whereas the specificity falls off predictably. It is of particular interest that a search for a peak score (independent of an absolute threshold) is more accurate than a search based upon a fixed scoring threshold. This suggests that the selection of promoter sites may be influenced by the global properties of an entire sequence domain, rather than exclusively by local characteristics.
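The scoring idea described in RESULTS can be illustrated with a minimal Python sketch. It is not the PromFind implementation. The abstract does not reproduce the formula of Claverie and Bougueleret (1986), so the ratio f_a / (f_a + f_b) used here is an assumption, as are the summing of D1 and D2 into one score, the sliding-window peak search, and all function names (`hexamer_freqs`, `differential`, `window_scores`, `peak_window`).

```python
from collections import Counter

K = 6  # hexamer length


def hexamer_freqs(seqs, k=K):
    """Relative frequency of every k-mer in a list of training sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def differential(f_a, f_b, kmer):
    """ASSUMED differential: f_a / (f_a + f_b), centred on 0.

    Returns 0.0 when the k-mer was seen in neither training set.
    """
    a = f_a.get(kmer, 0.0)
    b = f_b.get(kmer, 0.0)
    return a / (a + b) - 0.5 if a + b > 0 else 0.0


def window_scores(window, f_prom, f_coding, f_noncoding, k=K):
    """Mean differential over a window: (D1, D2).

    D1 compares promoter with coding, D2 promoter with non-coding.
    """
    window = window.upper()
    n = len(window) - k + 1
    if n <= 0:
        return 0.0, 0.0
    kmers = [window[i:i + k] for i in range(n)]
    d1 = sum(differential(f_prom, f_coding, x) for x in kmers) / n
    d2 = sum(differential(f_prom, f_noncoding, x) for x in kmers) / n
    return d1, d2


def peak_window(seq, f_prom, f_coding, f_noncoding, width, step=1):
    """Return (start, score) of the highest-scoring window.

    The peak is chosen relative to the rest of the sequence, with no
    fixed threshold. Combining D1 and D2 by addition is an assumption.
    """
    best = None
    for start in range(0, len(seq) - width + 1, step):
        d1, d2 = window_scores(seq[start:start + width],
                               f_prom, f_coding, f_noncoding)
        score = d1 + d2
        if best is None or score > best[1]:
            best = (start, score)
    return best
```

Because `peak_window` always returns the single best window, it corresponds to the "only one promoter region" mode described above. A fixed-threshold search would instead report every window whose score exceeds a preset cutoff.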
AVAILABILITY: A binary-executable, MS-DOS version of PromFind is available free of charge by anonymous ftp, address: iubio.bio.indiana.edu, directory: molbio/ibmpc.
CONTACT: E-mail: hutch{at}netshop.bc.ca


