
© Oxford University Press

Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon

Yishai M. Fraenkel, Yael Mandel, Devorah Friedberg and Hanah Margalit 1

Departments of Molecular Genetics and Applied Microbiology, Hebrew University-Hadassah Medical School PO Box 12272, Jerusalem 91120, Israel

1 To whom correspondence should be addressed. Email: hanah{at}hujivms.huji.ac.il

We describe a relatively simple method for identifying common motifs in DNA sequences that are known to share a common function. The input sequences are unaligned, and no information is given about the position or orientation of the motif. Such data often exist for protein-binding regions, where genetic or molecular information defines the binding region but the specific recognition site within it is unknown. The method is based on the principle of ‘divide and conquer’: we first search for dominant submotifs and then build full-length motifs around them. The method has several useful features: (i) it screens all submotifs, so the results are independent of the order of the sequences in the data; (ii) it allows the submotifs to contain spacers; (iii) it identifies an existing motif even if the data contain ‘noise’; and (iv) its running time depends linearly on the total length of the input. The method is demonstrated on two groups of protein-binding sequences: a well-studied group of known CRP-binding sequences, and a more recently identified group of genes known to be regulated by Lrp. The Lrp motif that we identify, based on 23 gene sequences, is similar both to a previously identified motif based on a smaller data set and to a consensus sequence of experimentally defined binding sites. Individual Lrp sites are evaluated and compared with regard to their mode of regulation.
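
The first stage described above, screening all spacer-containing submotifs on both strands, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the submotif shape (two short words of length `half` separated by a spacer of up to `max_gap` bases) and the ranking by number of supporting sequences are all assumptions made for illustration, and the second stage (building full-length motifs around the dominant submotifs) is not shown.

```python
from collections import defaultdict

# Translation table used to complement a DNA strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gapped_submotifs(seq, half, max_gap):
    """Yield (left, gap, right) for every window of seq.

    Hypothetical submotif shape: two words of length `half`
    separated by a spacer of 0..max_gap unspecified bases.
    """
    for gap in range(max_gap + 1):
        span = 2 * half + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + half]
            right = seq[i + half + gap:i + span]
            yield (left, gap, right)


def submotif_support(seqs, half=3, max_gap=3):
    """Map each submotif to the number of sequences containing it.

    Both strands are scanned because the motif orientation is unknown.
    Every window of every sequence is visited once per gap value, so the
    work grows linearly with the total input length for fixed parameters,
    and the result does not depend on the order of the sequences.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(seqs):
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for key in gapped_submotifs(strand, half, max_gap):
                support[key].add(idx)
    return {key: len(ids) for key, ids in support.items()}


def dominant_submotifs(seqs, half=3, max_gap=3, top=5):
    """Return the `top` submotifs ranked by number of supporting sequences."""
    support = submotif_support(seqs, half, max_gap)
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]
```

Counting supporting sequences rather than raw occurrences is one simple way to tolerate ‘noise’: a submotif that is missing from a few sequences can still rank highly. A real scoring scheme would also need to account for background base composition, which this sketch ignores.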







