Bioinformatics, Vol 15, 563-577, Copyright © 1999 by Oxford University Press


ARTICLES

Identifying DNA and protein patterns with statistically significant alignments of multiple sequences

GZ Hertz and GD Stormo
Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA. hertz@colorado.edu

MOTIVATION: Molecular biologists can frequently gain useful insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring, or scoring, the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme.

RESULTS: We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein.

AVAILABILITY: Programs were developed under the UNIX operating system and are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
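
The first and third components can be illustrated with a short sketch. It is not the authors' program: it assumes the common relative-entropy form of information content, in which each alignment column contributes the sum of f * ln(f / p) over letters, where f is the observed letter frequency in that column and p is the background frequency. The function names, the natural-log base, and the omission of any small-sample correction are assumptions made for illustration only.

```python
import math
from collections import Counter


def information_content(sites, background):
    """Score an ungapped alignment of equal-width sites.

    Assumed form: sum over columns i and letters j of
    f_ij * ln(f_ij / p_j), with f_ij the observed frequency of
    letter j in column i and p_j its background frequency.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same width")
    n = len(sites)
    total = 0.0
    for i in range(width):
        # Count the letters observed in column i.
        counts = Counter(site[i] for site in sites)
        for letter, count in counts.items():
            f = count / n
            total += f * math.log(f / background[letter])
    return total


def expected_frequency(p_value, num_alignments):
    """Expected number of alignments scoring at least this well:
    the P value of the score times the number of possible alignments."""
    return p_value * num_alignments


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    # Hypothetical toy sites, not data from the article.
    toy_sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
    print(information_content(toy_sites, uniform))
```

Under this form, a perfectly conserved column scores ln(1/p) for its letter, and a column that matches the background frequencies scores zero. The P value and the alignment count passed to `expected_frequency` would come from the methods the abstract describes; the sketch does not compute them.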

