Bioinformatics Advance Access first published online on January 18, 2007
This version published online on January 19, 2007
Bioinformatics, doi:10.1093/bioinformatics/btl662
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Computing exact p-values for DNA motifs (Part I)
1State Key Laboratory of Intelligent Technology & System, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
2Bioinformatics Devision, TNLIST and Department of Antomation, Tsinghua University, Beijing 100084, China.
3School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.
4CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands.
5Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11274, USA.
Michael Q. Zhang, Ming Li, E-mail: mzhang{at}cshl.edu, mli{at}uwaterloo.ca
| Abstract |
|---|
Motivation: Many heuristic algorithms have been designed to approximate p-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often significantly deviate from the true p-value by orders of magnitude. Exact p-value computation is needed for ranking the motifs. Furthermore, surprisingly, the complexity of the problem is unknown.
Results: We show the problem to be NP-hard, and present MotifRank, software based on dynamic programming, to calculate exact p-values of motifs. We define the exact p-value on a general and more precise model. Asymptotically, MotifRank is faster than the best exact p-value computing algorithm, and is in fact practical. Our experiments clearly demonstrate that MotifRank significantly improves the accuracy of existing approximation algorithms.
Availability: MotifRank is available from http://bio.dlg.cn
Associate Editor: Dmitrij Frishman
Received on July 7, 2006; revised on December 1, 2006; accepted on December 22, 2006
This article has been cited by other articles:
![]() |
P. Ribeca and E. Raineri Faster exact Markovian probability functions for motif occurrences: a DFA-only approach Bioinformatics, December 15, 2008; 24(24): 2839 - 2848. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. G. S. da Fonseca, K. S. Guimaraes, and M.-F. Sagot Efficient representation and P-value computation for high-order Markov motifs Bioinformatics, August 15, 2008; 24(16): i160 - i166. [Abstract] [Full Text] [PDF] |
||||
![]() |
U. J. Pape, S. Rahmann, and M. Vingron Natural similarity measures between position frequency matrices with an application to clustering Bioinformatics, February 1, 2008; 24(3): 350 - 357. [Abstract] [Full Text] [PDF] |
||||
