ARTS: accurate recognition of transcription starts in human
1 Fraunhofer Institute FIRST Kekuléstr. 7, Berlin, Germany
2 Max Planck Institute for Biological Cybernetics Spemannstr. 38, Tübingen, Germany
3 Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39, Tübingen, Germany
*To whom correspondence should be addressed.
We develop new methods for finding transcription start sites (TSS) of genes transcribed by RNA polymerase II in genomic DNA sequences. Employing Support Vector Machines with advanced sequence kernels, we achieve drastically higher prediction accuracies than state-of-the-art methods.
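To illustrate what a sequence kernel computes, the following is a minimal sketch of the plain k-mer spectrum kernel, a deliberately simple stand-in rather than the kernels used in ARTS. The function names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions made for this example only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of sequences x and y.

    Illustrative stand-in only: it ignores where in the sequence a
    k-mer occurs, which more advanced sequence kernels can use.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; a Counter returns 0 for missing keys.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(n * cy[kmer] for kmer, n in cx.items())
```

A Support Vector Machine only needs such pairwise similarity values between sequences, so any valid kernel of this kind can be plugged in without building explicit feature vectors.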
Motivation: Protein-coding genes are among the most important features of genomic DNA. While it is of great value to identify those genes and the proteins they encode, it is also crucial to understand how their transcription is regulated. To this end, one has to identify the corresponding promoters and the transcription factor binding sites they contain. TSS finders can be used to locate potential promoters. They may also be used in combination with other signal and content detectors to resolve entire gene structures.
Results: We have developed a novel kernel-based method called ARTS that accurately recognizes transcription start sites in human. Support Vector Machines would otherwise be too computationally expensive for this task; their application was made possible by efficient training and evaluation techniques based on suffix tries. In a carefully designed experimental study, we compare our TSS finder to state-of-the-art methods from the literature: McPromoter, Eponine and FirstEF. For given false positive rates within a reasonable range, we consistently achieve considerably higher true positive rates. For instance, ARTS finds about 35% true positives at a false positive rate of 1/1000, whereas the other methods find only about half as many (18%).
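The comparison above reports the true positive rate at a fixed false positive rate. As a minimal sketch of how such a figure can be computed from classifier scores, the helper below sweeps a decision threshold from the highest score downwards. The name `tpr_at_fpr` and the label convention (1 for a true TSS, 0 otherwise) are assumptions for this example and are not taken from the ARTS evaluation code.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best true positive rate reachable with false positive rate <= max_fpr.

    scores: classifier outputs, higher means more TSS-like.
    labels: 1 for a true TSS, 0 for a negative example (assumed convention).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples by descending score and lower the threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Examples with equal scores fall on the same side of any threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further only adds false positives
        best = tp / n_pos
        i = j
    return best
```

With `max_fpr=0.001`, this corresponds to the operating point of 1/1000 quoted above; a meaningful estimate at that rate requires well over a thousand negative examples.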
Availability: Datasets, model selection results, whole genome predictions, and additional experimental results are available at http://www.fml.tuebingen.mpg.de/raetsch/projects/arts
Contact: Gunnar.Raetsch{at}tuebingen.mpg.de