Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm
Institut für Säugetiergenetik, Germany
1Institut für Medizinische Informatik und Systemforschung, GSF-Forschungszentrum für Umwelt und Gesundheit GmbH, Ingolstädter Landstraße 1, D-85758 Oberschleißheim, Germany
2 To whom correspondence should be addressed. E-mail: werner{at}gsf.de
We present an algorithm that identifies potential functional elements, such as protein binding sites, in DNA sequences solely from nucleotide sequence data. It requires a set of at least seven sequences that are not closely related and that share a common biological function, where this function is correlated with one or more unknown sequence elements present in most, but not necessarily all, of the sequences. The algorithm is based on a search for n-tuples that occur in at least a minimum percentage of the sequences with zero or one mismatch; the mismatch may be at any position of the tuple. In contrast to functional tuples, random tuples show no preferred pattern of mismatch locations within the tuple, nor does their conservation extend beyond the tuple. Both features of functional tuples are used to eliminate random tuples. Selection is carried out by maximizing the information content, first for the n-tuple, then for a region containing the tuple, and finally for the complete binding site. Further matches are found in an additional selection step using the previously described ConsInd method. The algorithm is capable of identifying and delimiting elements (e.g. protein binding sites) represented by single short cores (e.g. the TATA box) in sets of unaligned sequences of about 500 nucleotides, using no information other than the nucleotide sequences. Furthermore, we show its ability to identify multiple elements in a set of complete LTR sequences (more than 600 nucleotides per sequence).
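The first two ideas in the abstract, the tuple search with at most one mismatch and the scoring by information content, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `find_tuples` and `information_content`, the default tuple length of 6, the 80% threshold, and the use of Shannon information in bits against a uniform background are all assumptions made for illustration. The later steps (mismatch-position patterns, extension beyond the tuple, and the ConsInd selection) are not modelled.

```python
from math import log2


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_tuples(seqs, n=6, min_fraction=0.8, max_mismatch=1):
    """Map each qualifying n-tuple to the set of sequence indices it matches.

    A tuple qualifies if it matches at least `min_fraction` of the sequences
    with at most `max_mismatch` mismatches at any position (hypothetical
    defaults, not taken from the paper).
    """
    # Candidates: every n-mer that occurs exactly in at least one sequence.
    candidates = {s[i:i + n] for s in seqs for i in range(len(s) - n + 1)}
    needed = min_fraction * len(seqs)
    hits = {}
    for cand in candidates:
        support = set()
        for idx, s in enumerate(seqs):
            windows = (s[i:i + n] for i in range(len(s) - n + 1))
            if any(hamming(cand, w) <= max_mismatch for w in windows):
                support.add(idx)
        if len(support) >= needed:
            hits[cand] = support
    return hits


def information_content(aligned):
    """Sum of per-column information (bits) for equal-length aligned matches.

    Assumes a uniform background over A, C, G, T, so a fully conserved
    column contributes 2 bits and a uniformly random one contributes 0.
    """
    total = 0.0
    for column in zip(*aligned):
        ic = 2.0
        for base in "ACGT":
            p = column.count(base) / len(column)
            if p > 0:
                ic += p * log2(p)
        total += ic
    return total
```

For example, calling `find_tuples` on a handful of short sequences that mostly contain `TATAAA` (one of them with a single substitution) returns that tuple together with the indices of the supporting sequences; `information_content` can then rank the aligned matches of competing tuples. The exhaustive scan is quadratic in the total sequence length, which is acceptable for sets of the size mentioned above (about 500 nucleotides per sequence) but would need indexing for larger inputs.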