© Oxford University Press

PromFD 1.0: a computer program that predicts eukaryotic pol II promoters using strings and IMD matrices

Qing K. Chen¹, Gerald Z. Hertz and Gary D. Stormo

Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309–0347, USA

¹To whom correspondence should be addressed

Motivation: As the Human Genome Project progresses, it generates a large number of new DNA sequences whose functions are virtually unknown. It is therefore essential to develop computer algorithms that can predict the function of DNA segments from their primary sequences, including algorithms that can predict promoters. Although several promoter-predicting algorithms are available, they have high false-positive rates, and their promoter detection rates need further improvement.

Results: We have developed PromFD, a computer program that recognizes vertebrate RNA polymerase II promoters. Both vertebrate promoter and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database and the non-promoter sequences from the GenBank sequence databank; each collection is divided into a training set and a test set. The first step is to search, among all possible permutations, for strings 5–10 bp long that are significantly over-represented in the promoter set. The program also searches for IMD (Information Matrix Database) matrices that have a significantly higher presence in the promoter set. The results of these searches are stored in the PromFD database, and PromFD scores input DNA sequences according to their content of the database entries. PromFD reports the locations of predicted promoters and of potential TATA boxes, if found. The program detects 71% of promoters in the training set with a false-positive rate of under 1 in every 13 000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach, and its false-positive rate compares favorably with that of other available promoter recognition algorithms. The source code for PromFD is written in C++.
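The string-search and scoring steps described above can be illustrated with a minimal Python sketch. It is not PromFD's implementation: the abstract does not say which statistic defines "significantly over-represented" or how database hits are combined into a score, so the ratio threshold with a pseudocount, the log-ratio weights, and the names `kmer_presence`, `overrepresented`, and `score` are all assumptions made for illustration. The IMD matrix search is not covered.

```python
from collections import Counter
from math import log


def kmer_presence(seqs, k_min=5, k_max=10):
    """Count, for each string of length k_min..k_max, how many sequences contain it."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)  # each sequence contributes at most once per string
    return counts


def overrepresented(promoters, background, min_ratio=2.0, min_count=2,
                    k_min=5, k_max=10):
    """Return {string: weight} for strings enriched in the promoter set.

    ASSUMPTION: a frequency ratio with a +1 pseudocount stands in for the
    significance test, which the abstract does not specify.
    """
    p = kmer_presence(promoters, k_min, k_max)
    b = kmer_presence(background, k_min, k_max)
    n_p, n_b = len(promoters), len(background)
    weights = {}
    for kmer, c in p.items():
        if c < min_count:
            continue  # ignore strings seen in too few promoters
        ratio = ((c + 1) / (n_p + 1)) / ((b.get(kmer, 0) + 1) / (n_b + 1))
        if ratio >= min_ratio:
            weights[kmer] = log(ratio)
    return weights


def score(seq, weights, k_min=5, k_max=10):
    """Sum the weights of all stored strings found in seq (hypothetical scoring rule)."""
    seq = seq.upper()
    total = 0.0
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k + 1):
            total += weights.get(seq[i:i + k], 0.0)
    return total
```

With a toy training set in which every promoter contains `TATAAA` and no background sequence does, `overrepresented` keeps only the strings shared across promoters, and `score` then ranks an input containing that motif above one that lacks it. A real predictor would additionally need a score cutoff calibrated against the non-promoter training set to control the false-positive rate.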

Availability: PromFD is available for Unix platforms by anonymous ftp to beagle.colorado.edu (cd pub; get promFD.tar). A Java version of the program is also available for Netscape 2.0 at http://beagle.colorado.edu/~chenq.

Contact: E-mail: chenq@beagle.colorado.edu


Received on May 28, 1996; revised on August 29, 1996; accepted on August 29, 1996
