Bioinformatics Vol. 15 no. 12 1999
Pages 994-999
© 1999 Oxford University Press
On the complexity measures of genetic sequences
1 Institute of Mathematics, Siberian Branch of Russian Academy of Science, 630090 Novosibirsk, Russia
2 Department of Computer Science, Cardiff University, PO Box 916, Cardiff CF24 3XF, UK
Motivation: It is well known that the regulatory regions of genomes are highly repetitive. They are rich in direct, symmetric and complemented repeats, and there is no doubt about the functional significance of these repeats. Among known measures of complexity, the Ziv-Lempel complexity measure most adequately reflects the repeats occurring in a text. However, this measure does not take isomorphic repeats into account. By isomorphic repeats we mean fragments that are identical (or symmetric) modulo some permutation of the alphabet letters.
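The definition of an isomorphic repeat can be illustrated with a minimal Python sketch. It is only an illustration, not the authors' implementation: the function names `is_isomorphic` and `is_isomorphic_repeat` are hypothetical, and "symmetric" is assumed here to mean that one fragment is read in reverse order.

```python
def is_isomorphic(u: str, v: str) -> bool:
    """Return True if v equals u after a one-to-one relabelling of letters."""
    if len(u) != len(v):
        return False
    forward, backward = {}, {}
    for a, b in zip(u, v):
        # Each letter of u must map to exactly one letter of v, and vice versa.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def is_isomorphic_repeat(u: str, v: str) -> bool:
    """Assumed reading: identical or reversed, modulo a letter permutation."""
    return is_isomorphic(u, v) or is_isomorphic(u, v[::-1])


if __name__ == "__main__":
    # A complemented repeat is one special case: A<->T, C<->G.
    print(is_isomorphic("ACGA", "TGCT"))
    # Isomorphic only when the second fragment is read in reverse.
    print(is_isomorphic_repeat("AAC", "TGG"))
```

Under this reading, direct repeats (the identity permutation) and complemented repeats (the permutation A↔T, C↔G) are both special cases of isomorphic repeats.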
Results: In this paper, two complexity measures of symbolic sequences are proposed that generalize the Ziv-Lempel complexity measure by taking into account any isomorphic repeats in the text (rather than just direct repeats, as in Ziv-Lempel). The first of them, the complexity vector, is designed for small alphabets such as the alphabet of nucleotides. The second is based on a search for the longest isomorphic fragment in the history of sequence synthesis and can be used for alphabets of arbitrary cardinality. These measures have been used to recognize structural regularities in DNA sequences. Some interesting structures related to the regulatory region of the human growth hormone are reported.
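For orientation, the classical measure that these proposals generalize can be sketched as follows. This is a minimal sketch of the standard Lempel-Ziv (1976) complexity, which counts the components of a left-to-right parse in which each component is the shortest fragment that cannot be copied from the preceding text. It considers direct repeats only and is not the authors' generalized measure; the function name `lz_complexity` is hypothetical.

```python
def lz_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parse of s."""
    n = len(s)
    i = 0           # start of the current component
    components = 0
    while i < n:
        length = 1
        # Extend the component while s[i:i+length] can be copied from an
        # occurrence that starts before position i (overlap is allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        components += 1
        i += length
    return components


if __name__ == "__main__":
    print(lz_complexity("AAAA"))  # parse: A | AAA
    print(lz_complexity("ACGT"))  # every letter is new
```

A generalization in the spirit of the abstract would replace the substring test with a search for an earlier fragment that matches up to a permutation of the alphabet letters; how that search is organized is specific to the paper and is not reproduced here.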
Availability: Available on request from the authors.
Contact: gusev{at}math.nsc.ru; nadia.chuzhanova{at}cs.cf.ac.uk
Received on April 23, 1999; revised on July 12, 1999; accepted on August 4, 1999




