Bioinformatics Advance Access originally published online on November 17, 2007
Bioinformatics 2008 24(1):46-55; doi:10.1093/bioinformatics/btm543
A profile-based deterministic sequential Monte Carlo algorithm for motif discovery
Columbia University, Department of Electrical Engineering, New York, NY 10025, USA
*To whom correspondence should be addressed.
Abstract
Motivation: Conserved motifs are often biologically significant, providing insight into gene transcription regulation, biomolecular secondary structure, the presence of non-coding RNAs and evolutionary history. With the increasing volume of sequenced genomic data, faster and more accurate tools are needed to automate the process of motif discovery.
Results: We propose a deterministic sequential Monte Carlo (DSMC) motif discovery technique based on the position weight matrix (PWM) model to locate conserved motifs in a given set of nucleotide sequences, and extend our model to search for instances of the motif with insertions/deletions. We show that the proposed method can align instances of a motif that differ by insertions and deletions, a task that other multiple alignment and motif discovery algorithms do not perform satisfactorily.
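To make the PWM model mentioned above concrete, the following is a minimal Python sketch of how a position weight matrix can be estimated from aligned motif instances and used to score candidate sites in a sequence. It is not the DSMC algorithm and is not taken from the authors' MATLAB code; the function names (`build_pwm`, `log_odds`, `best_site`), the pseudocount, the uniform background and the example sequences are all assumptions made for illustration, and insertions/deletions are not handled.

```python
import math

BASES = "ACGT"

def build_pwm(instances, pseudocount=1.0):
    """Estimate per-position base probabilities from equal-length instances.

    The pseudocount (an assumption of this sketch) keeps every probability
    strictly positive so that log-odds scores stay finite.
    """
    width = len(instances[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for inst in instances:
            counts[inst[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def log_odds(pwm, window, background=0.25):
    """Log-odds of a window under the PWM versus a uniform background."""
    return sum(math.log(col[base] / background)
               for col, base in zip(pwm, window))

def best_site(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scores = [log_odds(pwm, sequence[i:i + width])
              for i in range(len(sequence) - width + 1)]
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]

# Hypothetical motif instances and test sequence, for illustration only.
instances = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = build_pwm(instances)
start, score = best_site(pwm, "GGCGCTATAATGCGC")
print(start, round(score, 3))
```

A motif discovery method has to infer both the matrix and the site locations from unaligned sequences; the sketch only shows the scoring step that a PWM makes possible once an estimate of the matrix is available.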
Availability: MATLAB code is available at http://www.ee.columbia.edu/~kcliang
Contact: xw2008{at}columbia.edu
Associate Editor: Martin Bishop
Received on April 16, 2007; revised on October 11, 2007; accepted on October 27, 2007