Bioinformatics Vol. 18 no. 4 2002
Pages 513-528
© 2002 Oxford University Press

Distribution patterns of over-represented k-mers in non-coding yeast DNA

Steven Hampson, Dennis Kibler and Pierre Baldi*

Department of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697-3425, USA

Received on May 24, 2001; revised on November 30, 2001; accepted on December 5, 2001

Motivation: Over-represented k-mers in genomic DNA regions are often of particular biological interest. For example, over-represented k-mers in co-regulated families of genes are associated with the DNA binding sites of transcription factors. To measure over-representation, we introduce a statistical background model based on single mismatches and apply it to the pooled 500 bp ORF Upstream Regions (USRs) of yeast. More importantly, we investigate the context and spatial distribution of over-represented k-mers in yeast USRs.
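The abstract does not spell out how the single-mismatch background model is computed. One plausible reading, sketched here purely as an assumption and not as the authors' exact formula, estimates the expected count of a k-mer as the mean count of its 3k single-mismatch neighbours in the pooled sequences and scores over-representation as observed/expected. The function names (count_kmers, single_mismatch_neighbors, mismatch_ratio) are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(seqs, k):
    """Count every k-mer over the pooled sequences, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set(ALPHABET):
                counts[kmer] += 1
    return counts


def single_mismatch_neighbors(kmer):
    """Yield the 3k k-mers that differ from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def mismatch_ratio(kmer, counts):
    """Observed count divided by the mean count of the single-mismatch neighbours.

    Assumption: the neighbour mean serves as the background expectation.
    A Counter returns 0 for absent keys, so unseen neighbours count as zero.
    """
    neighbors = list(single_mismatch_neighbors(kmer))
    expected = sum(counts[n] for n in neighbors) / len(neighbors)
    observed = counts[kmer]
    if expected == 0:
        # No background evidence: report infinite enrichment only if the k-mer occurs.
        return float("inf") if observed else 0.0
    return observed / expected


counts = count_kmers(["AACAG"], 2)
print(mismatch_ratio("AA", counts))
```

In a real analysis one would presumably rank all k-mers by this ratio and impose a minimum observed count, since ratios built from very small counts are unstable.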

Results: Single- and double-stranded spatial distributions of most over-represented k-mers are highly non-random and predominantly cluster into a small number of classes that are robust with respect to the choice of over-representation measure. Specifically, we show that the three most common distribution patterns can be related to DNA structure, function, and evolution, and correspond to: (a) homologous ORF clusters, associated with sharply localized distributions; (b) regulatory elements, associated with a symmetric, broad hill-shaped distribution in the 50–200 bp region of the USR; and (c) runs of As, Ts, and ATs, which have extreme structural properties and are associated with a broad hill-shaped distribution, also in the 50–200 bp region of the USR. Analyses of over-representation, homology, localization, and DNA structure are essential components of a general data-mining approach to finding biologically important k-mers in raw genomic DNA and to understanding the ‘lexicon’ of regulatory regions.
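A double-stranded spatial distribution of the kind described above can be approximated by pooling occurrences of a k-mer and its reverse complement and binning them by distance from the ORF start. The sketch below is an illustrative assumption, not the authors' procedure: it assumes each USR string is given 5'→3' on the coding strand with the ORF start immediately after its last base, and the function names and the 50 bp default bin are hypothetical.

```python
from collections import Counter


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def positional_histogram(usrs, kmer, bin_size=50, both_strands=True):
    """Bin k-mer occurrences by distance upstream of the ORF start.

    Assumption: the ORF start lies just past the last base of each USR string,
    so distance = number of bases between the end of the k-mer and the ORF.
    """
    kmer = kmer.upper()
    k = len(kmer)
    targets = {kmer}
    if both_strands:
        # A set means reverse-complement palindromes are counted once per position.
        targets.add(reverse_complement(kmer))
    histogram = Counter()
    for usr in usrs:
        usr = usr.upper()
        n = len(usr)
        for i in range(n - k + 1):
            if usr[i:i + k] in targets:
                distance = n - (i + k)
                histogram[distance // bin_size] += 1
    return histogram


print(positional_histogram(["TTTTACGTTT"], "ACG"))
```

Comparing such histograms against a flat profile is one simple way to separate sharply localized k-mers from those with broad, hill-shaped distributions.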

Contact: hampson{at}ics.uci.edu; kibler{at}ics.uci.edu; pfbaldi{at}ics.uci.edu

* To whom correspondence should be addressed. Also at Department of Biological Chemistry, College of Medicine.
