Bioinformatics Advance Access published online on December 9, 2008
Bioinformatics, doi:10.1093/bioinformatics/btn609
ARCS-Motif: Discovering Correlated Motifs from Unaligned Biological Sequences
Case Western Reserve University, Cleveland, OH 44106 USA
To whom correspondence should be addressed. Jiong Yang, E-mail: jiong.yang{at}case.edu
Abstract
Motivation: The goal of motif discovery is to detect novel, unknown, and important signals in biological sequences. In most models, the importance of a motif is the sum of the similarity scores of its individual positions. Song et al. (2006) introduced the ARCS measure, which incorporates correlation information into the evaluation of motif importance, and showed that it is superior to other measures. Because the ARCS motif model is complex, existing sequential motif discovery methods cannot be applied directly to find motifs with high ARCS values.
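To illustrate the position-wise additive scoring that most models use, the following is a minimal Python sketch. It is not the ARCS measure, whose definition is not given in this abstract. The function names, the use of per-column information content as the similarity score, and the toy occurrences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_score(column, alphabet_size=20):
    """Information content (bits) of one aligned column.

    Hypothetical per-position similarity; alphabet_size=20 assumes proteins.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return log2(alphabet_size) - entropy


def additive_motif_score(occurrences, alphabet_size=20):
    """Sum of independent per-column scores over equal-length occurrences."""
    width = len(occurrences[0])
    if any(len(o) != width for o in occurrences):
        raise ValueError("all occurrences must have the same length")
    # zip(*occurrences) yields one tuple of residues per motif position.
    return sum(column_score(col, alphabet_size) for col in zip(*occurrences))


# Position 1 and position 2 co-vary perfectly here (A pairs with C, G with T).
correlated = ["AC", "AC", "GT", "GT"]
# Same residues in each column, but the pairing between positions is broken.
shuffled = ["AT", "AC", "GC", "GT"]

score_correlated = additive_motif_score(correlated)
score_shuffled = additive_motif_score(shuffled)
```

Both toy sets receive the same additive score because each column has the same residue composition in both. Information about which residues co-occur across positions is invisible to this kind of scoring, and that is the information a correlation-aware measure is meant to capture.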
Results: This paper presents a novel mining algorithm, ARCS-Motif, to discover related sequential motifs in biological sequences. ARCS-Motif is applied to 400 PROSITE data sets and compared with five alternative methods (CONSENSUS, Gibbs sampler, MEME, SPLASH, and DIALIGN-TX). ARCS-Motif outperforms all of these methods in accuracy and most of them in efficiency. Although SPLASH is more efficient than ARCS-Motif, ARCS-Motif is much more accurate than SPLASH. On average, ARCS-Motif produces motifs that are at least 10% better than those of the best alternative method. Among the 400 PROSITE data sets, ARCS-Motif produces the best motifs for more than 200 families. Excluding SPLASH, the execution time of ARCS-Motif is less than a third of that of the fastest alternative method, and its execution time grows at the slowest rate among all methods with respect to the number of sequences and the average sequence length.
Availability: Software: http://beijing.case.edu/ARCS_Motif/ARCS_Motif; results: http://beijing.case.edu/ARCS_Motif.
Associate Editor: Dr. Limsoon Wong
Received on May 3, 2008; revised on November 19, 2008; accepted on November 22, 2008