Bioinformatics, Vol 15, 563-577, Copyright © 1999 by Oxford University Press
GZ Hertz and GD Stormo
MOTIVATION: Molecular biologists can often gain insight by aligning a set
of related DNA, RNA or protein sequences. Such alignments can be used to
determine either evolutionary or functional relationships. Our interest is
in identifying functional relationships. Unless the sequences are very
similar, it is necessary to have a specific strategy for measuring, or
scoring, the relatedness of the aligned sequences. If the alignment is not
known, one can be determined by finding an alignment that optimizes the
scoring scheme.

RESULTS: We describe four components of our approach for determining
alignments of multiple sequences. First, we review a log-likelihood scoring
scheme we call information content. Second, we describe two methods for
estimating the P value of an individual information content score: (i) a
method that combines a technique from large-deviation statistics with
numerical calculations; (ii) a method that is exclusively numerical. Third,
we describe how we count the number of possible alignments given the
overall amount of sequence data. This count is multiplied by the P value to
determine the expected frequency of an information content score and, thus,
the statistical significance of the corresponding alignment. Statistical
significance can be used to compare alignments having differing widths and
containing differing numbers of sequences. Fourth, we describe a greedy
algorithm for determining alignments of functionally related sequences.
Finally, we test the accuracy of our P value calculations, and give an
example of using our algorithm to identify binding sites for the
Escherichia coli CRP protein.

AVAILABILITY: Programs were developed under the UNIX operating system and
are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
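
The scoring and significance steps summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the formula for information content, so the code assumes the common relative-entropy form: for each column, the sum over letters of f * ln(f / p), where f is the observed letter frequency and p is the background frequency. The function names, the natural-log base, and the absence of any small-sample correction are all assumptions made for illustration; this is not the authors' program. The expected frequency simply multiplies a supplied alignment count by a supplied P value, as the abstract describes; how the count and the P value are obtained is not shown here.

```python
import math
from collections import Counter


def information_content(sites, background):
    """Information content of an ungapped alignment (assumed form).

    sites      -- list of equal-length strings, one aligned site per sequence
    background -- dict mapping each letter to its a priori probability

    Assumption: score = sum over columns of sum over letters of
    f * ln(f / p). The exact definition is not given in the abstract.
    """
    if not sites:
        raise ValueError("at least one aligned site is required")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all aligned sites must have the same width")

    total = 0.0
    for col in range(width):
        counts = Counter(s[col] for s in sites)
        for letter, n in counts.items():
            f = n / len(sites)  # observed frequency in this column
            total += f * math.log(f / background[letter])
    return total


def expected_frequency(num_alignments, p_value):
    """Expected number of alignments scoring at least this well by chance.

    Follows the abstract: the count of possible alignments multiplied by
    the P value of the score. Both inputs are supplied by the caller.
    """
    return num_alignments * p_value


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # Hypothetical aligned sites, invented for illustration only.
    sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
    print(information_content(sites, uniform))
    # Hypothetical count and P value, not taken from the paper.
    print(expected_frequency(1_000_000, 1e-9))
```

Under these assumptions, a perfectly conserved column contributes ln(1/p) for its letter and a column that matches the background contributes zero, so the score rewards columns that deviate from what is expected by chance.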
Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA. hertz@colorado.edu