Genome inhomogeneity is determined mainly by WW and SS dinucleotides
Institute of Control Sciences, Moscow 117342
1 Laboratory of Mathematical Methods, Institute of Genetics of Microorganisms, Moscow 113545, USSR
2 To whom reprint requests should be sent
According to the hypothesis of the modular structure of DNA, genomes consist of modules of different kinds that may differ in their statistical characteristics. Statistical analysis can reveal these differences and thereby help predict the modular structure. This raises the question of how much each word of length l (l-tuple) contributes to the inhomogeneity of a genetic text. The distinction between stationary l-tuples (i.e. those distributed relatively evenly over a genome) and non-stationary ones has been introduced previously. In this paper, the dinucleotide distributions in all long GenBank sequences were analyzed, and non-stationary dinucleotides were shown to be closely associated with polyW and polyS tracts (W denotes the weak nucleotides A or T, and S the strong nucleotides G or C). Genome inhomogeneity is therefore determined mainly by the WW dinucleotides AA, TT, AT and TA and the SS dinucleotides GG, CC, GC and CG. It is also demonstrated that neither codon usage nor the isochore model can account for this phenomenon.
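To make the stationary versus non-stationary distinction concrete, the sketch below splits a sequence into consecutive windows, counts each of the 16 dinucleotides per window, and ranks them by an index of dispersion (variance of the window counts divided by their mean). This statistic is an assumption chosen for illustration, not the measure used in the paper: a value near or below 1 is consistent with an evenly spread word, while values well above 1 indicate clustering, as expected for words that sit in polyW or polyS tracts. The function names and the window size are hypothetical, and dinucleotides that straddle a window boundary are simply not counted.

```python
from itertools import product
from statistics import mean, pvariance

WEAK = set("AT")    # W: weak nucleotides A or T
STRONG = set("GC")  # S: strong nucleotides G or C


def dinucleotide_class(dn: str) -> str:
    """Return 'WW', 'SS' or 'mixed' for a two-letter word over ACGT."""
    if dn[0] in WEAK and dn[1] in WEAK:
        return "WW"
    if dn[0] in STRONG and dn[1] in STRONG:
        return "SS"
    return "mixed"


def window_counts(seq: str, word: str, window: int) -> list[int]:
    """Count overlapping occurrences of `word` inside each non-overlapping window.

    Occurrences that cross a window boundary are ignored (a simplification).
    """
    counts = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        hits = sum(1 for i in range(len(chunk) - len(word) + 1)
                   if chunk[i:i + len(word)] == word)
        counts.append(hits)
    return counts


def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio of window counts (hypothetical stationarity score)."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else 0.0


def rank_dinucleotides(seq: str, window: int) -> list[tuple[str, float]]:
    """Rank all 16 dinucleotides from least to most evenly distributed."""
    seq = seq.upper()
    if len(seq) < 2 * window:
        raise ValueError("sequence must contain at least two full windows")
    scores = {}
    for a, b in product("ACGT", repeat=2):
        dn = a + b
        scores[dn] = dispersion_index(window_counts(seq, dn, window))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy sequence: a poly-A tract followed by a repetitive mixed region.
    toy = "A" * 40 + "ACGT" * 30
    for dn, score in rank_dinucleotides(toy, window=20)[:4]:
        print(dn, dinucleotide_class(dn), round(score, 2))
```

On this toy input the clustered AA word (a WW dinucleotide) receives the highest score, whereas the evenly repeated AC, CG, GT and TA words score close to 1. Real genomic analyses would need a principled null model and window size; the dispersion index is used here only because it is simple and easy to inspect.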
