Bioinformatics Vol. 19 no. 5 2003
Pages 607-617
© 2003 Oxford University Press
Greedy mixture learning for multiple motif discovery in biological sequences
Department of Computer Science, University of Ioannina 45110 Ioannina, Greece and Biomedical Research Institute, Foundation for Research and Technology, Hellas, 45110 Ioannina, Greece
Received on January 24, 2002; revised on April 20, 2002 and June 20, 2002; accepted on October 7, 2002
Motivation: This paper studies the problem of discovering subsequences, known as motifs, that are common to a given collection of related biosequences. It proposes a greedy algorithm that learns a mixture-of-motifs model through likelihood maximization. The approach adds new motifs to the mixture model one at a time, initializing the parameters of each new motif with a combined scheme of global and local search. In addition, a hierarchical partitioning scheme based on kd-trees is presented for partitioning the input dataset in order to speed up the global search. The proposed method compares favorably with the well-known MEME approach and addresses several of its drawbacks.
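As a rough illustration of the kd-tree idea, the Python sketch below partitions all fixed-length substrings of a toy DNA dataset into small groups and turns each group into a candidate motif profile that a global search could use as a starting point. Everything in it is an assumption made for illustration, not the authors' Matlab implementation: the one-hot encoding, the splitting rule (split on the most balanced binary coordinate), the `max_leaf` stopping threshold, the pseudocount, and all function names are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; protein data would use 20 letters


def windows(seqs, w):
    """Return every length-w substring of every sequence."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]


def one_hot(sub):
    """Flatten a substring into a 0/1 vector of length w * len(ALPHABET)."""
    return [1.0 if c == a else 0.0 for c in sub for a in ALPHABET]


def kd_partition(items, max_leaf):
    """Recursively split (substring, vector) pairs into leaves.

    Assumed rule: split on the coordinate with the highest variance,
    which for 0/1 data is the one whose mean is closest to 0.5.
    Recursion stops when a node is small enough or cannot be split.
    """
    if len(items) <= max_leaf:
        return [items]
    n, dim = len(items), len(items[0][1])
    best_d, best_var = None, 0.0
    for d in range(dim):
        p = sum(vec[d] for _, vec in items) / n
        var = p * (1.0 - p)
        if var > best_var:
            best_d, best_var = d, var
    if best_d is None:  # all vectors identical, nothing left to split
        return [items]
    left = [it for it in items if it[1][best_d] == 0.0]
    right = [it for it in items if it[1][best_d] == 1.0]
    return kd_partition(left, max_leaf) + kd_partition(right, max_leaf)


def leaf_profile(leaf, w, pseudo=1.0):
    """Column-wise letter frequencies of a leaf: one candidate motif."""
    total = len(leaf) + pseudo * len(ALPHABET)
    profile = []
    for j in range(w):
        counts = Counter(sub[j] for sub, _ in leaf)
        profile.append({a: (counts[a] + pseudo) / total for a in ALPHABET})
    return profile


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]  # toy data
    w = 4
    items = [(s, one_hot(s)) for s in windows(seqs, w)]
    leaves = kd_partition(items, max_leaf=4)
    candidates = [leaf_profile(leaf, w) for leaf in leaves]
    print(len(items), "substrings ->", len(candidates), "candidate profiles")
```

The point of the sketch is that the global search only needs to score one candidate per leaf instead of one per substring, which is the kind of saving a hierarchical partitioning scheme is meant to provide.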
Results: Experimental results indicate that the algorithm is advantageous in identifying larger groups of motifs characteristic of biological families with significant conservation. In addition, it offers better diagnostic capabilities by building more powerful statistical motif-models with improved classification accuracy.
Availability: Source code in Matlab is available at http://www.cs.uoi.gr/~kblekas/greedy/GreedyEM.html
Contact: kblekas{at}cs.uoi.gr
* To whom correspondence should be addressed.