ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles
1Department of Plant Systems Biology, VIB, 2Department of Molecular Genetics and 3Laboratoire Associé de l'INRA, Ghent University, Technologiepark 927, 9052 Gent, Belgium
*To whom correspondence should be addressed.
Abstract
Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet a common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.
Results: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering by using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state of the art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision.
Availability: Predictions for the human genome, the validation datasets and the program (ProSOM) are available upon request.
Contact: yves.vandepeer{at}psb.ugent.be
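The Results paragraph above describes two steps: converting DNA sequences into structural profiles from a dinucleotide property such as base stacking energy, and clustering those profiles with a self-organizing map. The sketch below illustrates that general idea only; it is not the ProSOM implementation. The function names (`structural_profile`, `train_som`, `best_matching_unit`), the smoothing window, the map size and the learning schedule are all assumptions made for illustration. The dinucleotide table `TOY_ENERGY` holds placeholder values, not measured stacking energies, and would need to be replaced by a published table for any real use.

```python
import numpy as np

BASES = "ACGT"

# Placeholder dinucleotide table: these are NOT measured stacking energies.
# Each value is simply -(1 + number of G/C bases) so the example can run;
# substitute a published base stacking energy table for real work.
TOY_ENERGY = {a + b: -(1.0 + (a in "GC") + (b in "GC")) for a in BASES for b in BASES}


def structural_profile(seq, table=TOY_ENERGY, window=3):
    """Map each dinucleotide to its table value, then smooth with a moving average."""
    seq = seq.upper()
    raw = np.array([table[seq[i:i + 2]] for i in range(len(seq) - 1)])
    kernel = np.ones(window) / window
    return np.convolve(raw, kernel, mode="valid")


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows * cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise every unit from a randomly chosen training profile.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    # Grid coordinates of each unit, used for the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull the winning unit and its grid neighbours toward the sample.
            weights += lr * h[:, None] * (x - weights)
    return weights


def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to profile x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))
```

In a setting like the one the abstract describes, each map unit would afterwards be inspected to see whether it mostly collects promoter profiles or other genomic profiles; how ProSOM itself labels units and makes predictions is not specified in this text, so that step is left out of the sketch.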