Bioinformatics Vol. 16 no. 10 2000
Pages 865-889
© 2000 Oxford University Press
Original Paper
Sequence analysis by additive scales: DNA structure for sequences and repeats of all lengths
1 Department of Information and Computer Science
2 Department of Biological Chemistry, College of Medicine, University of California, Irvine, CA 92697-3425, USA
Received on April 24, 2000; accepted on May 25, 2000
Motivation: DNA structure plays an important role in a variety of biological processes. Different di- and tri-nucleotide scales have been proposed to capture various aspects of DNA structure, including base stacking energy, propeller twist angle, protein deformability, bendability, and position preference. Yet a general framework for the computational analysis and prediction of DNA structure is still lacking. Such a framework should in particular address the following issues: (1) construction of sequences with extremal properties; (2) quantitative evaluation of sequences with respect to a given genomic background; (3) automatic extraction of extremal sequences and profiles from genomic databases; (4) distribution and asymptotic behavior as the length N of the sequences increases; and (5) complete analysis of correlations between scales.
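As an illustration of issue (1), the sketch below scores a sequence with an additive dinucleotide scale and builds an extremal sequence of a given length by dynamic programming over the last base. It is a minimal sketch under stated assumptions: the score is taken to be the plain sum of scale values over overlapping dinucleotides, the names `additive_score` and `extremal_sequence` are hypothetical, and `TOY_SCALE` is an arbitrary made-up table, not any published structural scale.

```python
from itertools import product

BASES = "ACGT"

# Hypothetical toy scale (NOT a published scale): one arbitrary value per
# ordered dinucleotide, used only to make the example runnable.
TOY_SCALE = {a + b: float((4 * BASES.index(a) + BASES.index(b)) % 5 - 2)
             for a, b in product(BASES, repeat=2)}


def additive_score(seq, scale):
    """Sum the scale values of all overlapping dinucleotides in seq."""
    return sum(scale[seq[i:i + 2]] for i in range(len(seq) - 1))


def extremal_sequence(n, scale, maximize=True):
    """Return (score, sequence) of an extremal length-n sequence.

    Because the score is additive over dinucleotides, the best sequence
    ending in base b only depends on the best sequences ending in each
    possible previous base, so a Viterbi-style recursion suffices.
    """
    pick = max if maximize else min
    best = {b: (0.0, b) for b in BASES}  # best (score, seq) ending in b
    for _ in range(n - 1):
        best = {b: pick((best[a][0] + scale[a + b], best[a][1] + b)
                        for a in BASES)
                for b in BASES}
    return pick(best.values())


if __name__ == "__main__":
    print(extremal_sequence(12, TOY_SCALE))
    print(extremal_sequence(12, TOY_SCALE, maximize=False))
```

The recursion runs in time linear in N, whereas enumerating all 4^N sequences is only feasible for very short lengths. A trinucleotide scale could be handled the same way by tracking the last two bases instead of one.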
Results: We develop a general framework for sequence analysis based on additive scales, structural or other, that addresses all these issues. We show how to construct extremal sequences and calibrate scores for automatic genomic and database extraction. We show that distributions rapidly converge to normality as N increases. Pairwise correlations between scales depend both on background distribution and sequence length and rapidly converge to an analytically predictable asymptotic value. For di- and tri-nucleotide scales, normal behavior and asymptotic correlation values are attained over a characteristic window length of about 10-15 bp. With a uniform background distribution, pairwise correlations between empirically-derived scales remain relatively small and roughly constant at all lengths, except for propeller twist and protein deformability which are positively correlated. There is a positive (resp. negative) correlation between dinucleotide base stacking (resp. propeller twist and protein deformability) and AT-content that increases in magnitude with length. The framework is applied to the analysis of various DNA tandem repeats. We derive exact expressions for counting the number of repeat unit classes at all lengths. Tandem repeats are likely to result from a variety of different mechanisms, a fraction of which is likely to depend on profiles characterized by extreme structural features.
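The counting of repeat unit classes can be checked by brute force for short units. The sketch below is not the exact expressions derived in the paper. It assumes that two repeat units belong to the same class when one can be obtained from the other by circular permutation and/or reverse complementation (the abstract does not spell out the equivalence relation), and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(unit):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return unit.translate(COMPLEMENT)[::-1]


def canonical(unit):
    """Smallest string among all rotations of unit and of its reverse
    complement; used as the representative of the unit's class."""
    both = (unit, reverse_complement(unit))
    return min(u[i:] + u[:i] for u in both for i in range(len(u)))


def is_primitive(unit):
    """True if unit is not itself a repetition of a shorter unit."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d)
                   for d in range(1, n))


def count_classes(n, primitive_only=False):
    """Count repeat unit classes of length n by exhaustive enumeration."""
    classes = set()
    for letters in product("ACGT", repeat=n):
        unit = "".join(letters)
        if primitive_only and not is_primitive(unit):
            continue
        classes.add(canonical(unit))
    return len(classes)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_classes(n), count_classes(n, primitive_only=True))
```

Under this assumed equivalence, the enumeration gives 2, 4, 10, and 33 primitive classes for unit lengths 1 to 4. Exhaustive enumeration grows as 4^N, so a closed-form count is needed for longer units.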
Contact: pfbaldi{at}ics.uci.edu; baisnee{at}ics.uci.edu
To whom all correspondence should be addressed.


