
Bioinformatics Vol. 16 no. 10 2000
Pages 865-889
© 2000 Oxford University Press


Original Paper

Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths

Pierre Baldi 1,2 and Pierre-François Baisnée 1

1 Department of Information and Computer Science
2 Department of Biological Chemistry, College of Medicine, University of California, Irvine, CA 92697-3425, USA

Received on April 24, 2000; accepted on May 25, 2000

Motivation: DNA structure plays an important role in a variety of biological processes. Different di- and tri-nucleotide scales have been proposed to capture various aspects of DNA structure including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. Yet, a general framework for the computational analysis and prediction of DNA structure is still lacking. Such a framework should in particular address the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales.
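
To make the notion of an additive scale concrete, the following is a minimal sketch, assuming that a sequence is scored by summing a per-dinucleotide (or per-trinucleotide) value over all overlapping windows. The table `TOY_SCALE` and the function `additive_score` are hypothetical names introduced here for illustration; the numbers are placeholders and are not taken from any published structural scale.

```python
from typing import Dict

# Hypothetical dinucleotide scale: placeholder values for illustration only,
# NOT a published scale such as base stacking energy or propeller twist.
TOY_SCALE: Dict[str, float] = {a + b: 0.0 for a in "ACGT" for b in "ACGT"}
TOY_SCALE.update({"AA": -1.0, "TT": -1.0, "AT": -0.5,
                  "TA": 1.5, "GC": 2.0, "CG": 1.0})


def additive_score(seq: str, scale: Dict[str, float], k: int = 2) -> float:
    """Sum the scale value of every overlapping k-mer in seq.

    Assumes the additive convention (plain sum over overlapping windows);
    a length-normalised variant would divide by len(seq) - k + 1.
    """
    seq = seq.upper()
    return sum(scale[seq[i:i + k]] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # ATAT contains the dinucleotides AT, TA, AT.
    print(additive_score("ATAT", TOY_SCALE))
```

Under this convention a trinucleotide scale would use the same function with `k=3` and a 64-entry table.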

Results: We develop a general framework for sequence analysis based on additive scales, structural or other, that addresses all these issues. We show how to construct extremal sequences and calibrate scores for automatic genomic and database extraction. We show that distributions rapidly converge to normality as N increases. Pairwise correlations between scales depend both on background distribution and sequence length and rapidly converge to an analytically predictable asymptotic value. For di- and tri-nucleotide scales, normal behavior and asymptotic correlation values are attained over a characteristic window length of about 10–15 bp. With a uniform background distribution, pairwise correlations between empirically-derived scales remain relatively small and roughly constant at all lengths, except for propeller twist and protein deformability which are positively correlated. There is a positive (resp. negative) correlation between dinucleotide base stacking (resp. propeller twist and protein deformability) and AT-content that increases in magnitude with length. The framework is applied to the analysis of various DNA tandem repeats. We derive exact expressions for counting the number of repeat unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.
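
One way to construct an extremal sequence for a dinucleotide scale is dynamic programming over the last nucleotide, in the style of the Viterbi algorithm. The sketch below is an illustration under that assumption and is not necessarily the construction used in the paper; `extremal_sequence` and `TOY_SCALE` are hypothetical names, and the scale values are placeholders rather than published data.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"

# Hypothetical dinucleotide scale: placeholder values for illustration only.
TOY_SCALE: Dict[str, float] = {a + b: 0.0 for a in ALPHABET for b in ALPHABET}
TOY_SCALE.update({"AA": -1.0, "TT": -1.0, "AT": -0.5,
                  "TA": 1.5, "GC": 2.0, "CG": 1.0})


def extremal_sequence(scale: Dict[str, float], n: int,
                      maximize: bool = True) -> Tuple[str, float]:
    """Return one length-n sequence with extremal additive score, and that score.

    The score is assumed to be the sum of scale values over the n - 1
    overlapping dinucleotides. Ties are broken by alphabet order.
    """
    if n < 2:
        raise ValueError("n must be at least 2 for a dinucleotide scale")
    sign = 1.0 if maximize else -1.0
    # best[b]: best signed score of any prefix that ends in nucleotide b.
    best = {b: 0.0 for b in ALPHABET}
    # back[i][b]: predecessor of b chosen at step i, used for traceback.
    back: List[Dict[str, str]] = []
    for _ in range(n - 1):
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for b in ALPHABET:
            prev = max(ALPHABET, key=lambda a: best[a] + sign * scale[a + b])
            new_best[b] = best[prev] + sign * scale[prev + b]
            ptr[b] = prev
        best = new_best
        back.append(ptr)
    last = max(ALPHABET, key=lambda b: best[b])
    score = sign * best[last]
    # Follow the back-pointers from the final nucleotide to the first.
    seq = [last]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return "".join(reversed(seq)), score


if __name__ == "__main__":
    print(extremal_sequence(TOY_SCALE, 4))
    print(extremal_sequence(TOY_SCALE, 3, maximize=False))
```

The running time is linear in the sequence length (16 dinucleotide lookups per position), so extremal sequences of any length can be obtained without enumerating all 4^N candidates. A trinucleotide scale would need the state to carry the last two nucleotides instead of one.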

Contact: pfbaldi{at}ics.uci.edu; baisnee{at}ics.uci.edu

To whom all correspondence should be addressed.






