Bioinformatics Advance Access originally published online on May 19, 2005
Bioinformatics 2005 21(14):3164-3165; doi:10.1093/bioinformatics/bti481
TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs
1 Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142, USA
2 MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
*To whom correspondence should be addressed.
Abstract
Summary: TAMO (Tools for Analysis of MOtifs) is an object-oriented computational framework for interpreting transcriptional regulation using DNA-sequence motifs. To simplify the application of multiple motif discovery programs to genome-wide data, TAMO provides a sophisticated motif object with interfaces to several popular programs. In addition, TAMO provides modules for integrating motif analysis with diverse data sources including genomic sequences, microarrays and various databases. Finally, TAMO includes tools for sequence analysis, algorithms for scoring, comparing and clustering motifs, and several useful statistical tests. Recently, we have applied these tools to analyze tens of thousands of motifs derived from hundreds of microarray experiments.
Availability: TAMO is a Python/C++ package and requires Python 2.3 or higher. Source code and documentation are available at http://web.wi.mit.edu/fraenkel/TAMO/
Contact: efraenkel{at}wi.mit.edu
1 INTRODUCTION
Motif discovery from genome-wide data improves when multiple programs are used in concert (Harbison et al., 2004; Tompa et al., 2005), but managing such calculations is difficult. The TAMO package provides tools that help users combine motif information from different motif-finding programs and interpret motif data with biological databases. Unlike other packages, TAMO provides an integrated motif analysis framework in a high-level language, built on fast C++ routines.
The TAMO package includes (1) command-line programs for motif discovery, scoring and analysis, and (2) source code that programmers can use to develop new analysis tools in Python. The package provides both a simple interface and a wealth of supporting object definitions and lower-level functions for operating on motifs, microarrays, sequences and other bioinformatic data sources. These routines are supported by functions for manipulating genome sequence data and by a library of statistical tests. Finally, TAMO includes a utility that automatically downloads data from sources such as SGD and UCSC (Cherry et al., 1997; Karolchik et al., 2003).
2 EXAMPLE USAGE
Table 1 illustrates how the TAMO framework can be used to unify diverse algorithms and data sources. In this example, we use three programs to search for motifs among the promoters of a set of genes in yeast. We then identify and display the best-scoring motif, and test whether promoters containing it are significantly associated with any categories in the Gene Ontology (GO) database (Ashburner et al., 2000).
In the example, AlignACE (Hughes et al., 2000) is run 10 times with different random number seeds, MDscan (Liu et al., 2002) is run repeatedly with different motif widths, and MEME (Bailey and Elkan, 1995) is then applied to the same data. Next, the sample code computes the group specificity score used by AlignACE (called church in TAMO) for the motifs found by all three programs, identifies the motif with the best score and displays it in several ways. To help interpret the biological meaning of the motif, the sample code identifies promoters containing subsequences that match the motif and searches the corresponding list of ORFs for statistically over-represented GO categories.
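The scoring and enrichment steps in this example both reduce to hypergeometric tail probabilities. The sketch below is plain Python, not TAMO's API. It assumes the group specificity score is the probability of seeing at least the observed overlap between the `n_top` highest-scoring genes and the target group, following Hughes et al. (2000). The names `group_specificity`, `best_motif`, `go_enrichment` and `n_top` are hypothetical. The source does not say how TAMO chooses the top set, and the Bonferroni correction is our own choice.

```python
from __future__ import annotations

from math import comb


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


def group_specificity(gene_scores: dict[str, float], group: set[str], n_top: int) -> float:
    """Hypothetical church-like score: tail probability of the overlap between
    the n_top genes with the highest motif scores and the target group.
    Smaller values mean the motif is more specific to the group."""
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    top = set(ranked[:n_top])
    group_in_genome = group.intersection(gene_scores)
    overlap = len(top & group_in_genome)
    return hypergeom_tail(overlap, len(gene_scores), len(group_in_genome), len(top))


def best_motif(candidates: dict[str, dict[str, float]], group: set[str], n_top: int) -> str:
    """Pick the candidate motif (pooled from any program) with the smallest score."""
    return min(candidates, key=lambda name: group_specificity(candidates[name], group, n_top))


def go_enrichment(hits: set[str], annotations: dict[str, set[str]], universe: set[str],
                  alpha: float = 0.05) -> list[tuple[str, float, float]]:
    """Categories over-represented among `hits`, with Bonferroni-corrected P-values."""
    hits = hits & universe
    results = []
    for category, members in annotations.items():
        members = members & universe
        p = hypergeom_tail(len(hits & members), len(universe), len(members), len(hits))
        p_adj = min(1.0, p * len(annotations))  # Bonferroni over all tested categories
        if p_adj <= alpha:
            results.append((category, p, p_adj))
    return sorted(results, key=lambda r: r[1])
```

Because the pooled candidates are compared only by their scores, motifs from AlignACE, MDscan and MEME can be ranked together once each has been used to score every gene.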
3 PACKAGE FEATURE OVERVIEW
Motifs and motif discovery. TAMO is built around a unified motif representation, the position-specific scoring matrix (PSSM), which can be assembled from many sources. The package also contains interfaces to publicly available motif discovery programs as well as its own internal motif discovery programs.
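The following sketch shows how a PSSM can be built from equal-length aligned binding sites as a matrix of log-odds scores. It is not TAMO's motif class. The uniform background, the pseudocount of 0.25 per base and the function names `build_pssm` and `score_window` are assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, background=None, pseudocount=0.25):
    """Build a log-odds PSSM (in bits) from equal-length aligned binding sites.

    Returns one dict per motif column, mapping each base to log2(p_base / bg_base).
    Characters other than A, C, G and T in a site are ignored.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}  # uniform background (assumption)
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            base = site[col].upper()
            if base in counts:
                counts[base] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background[b]) for b in ALPHABET})
    return pssm


def score_window(pssm, window):
    """Sum of per-column log-odds; ambiguous bases such as N contribute 0 here."""
    return sum(col.get(base, 0.0) for col, base in zip(pssm, window.upper()))


sites = ["TGACTCA", "TGACTCA", "TGAGTCA"]
pssm = build_pssm(sites)
```

The pseudocount keeps unseen bases from receiving a score of negative infinity, which matters when a motif is built from only a handful of sites.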
External data sources. Several TAMO modules provide access to public repositories of genomic information. For example, there are modules that provide access to SGD feature maps and functions for translating between different types of feature identifiers (e.g. from gene name to Swiss-Prot ID). The GO module uses gene annotations from GO-slim and has facilities for finding functional categories that are statistically over-represented within a set of genes. TAMO also provides interfaces to the human Gene Atlas (Su et al., 2004), to yeast transcription rates (Holstege et al., 1998) and to other genome-wide data. Finally, TAMO provides fast, random-access interfaces to human and yeast (Saccharomyces cerevisiae) genome sequences.
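One common way to get random access into a large genome file is to rely on fixed line widths. The byte offset of any base can then be computed directly and read with a single `seek`. The sketch below makes several assumptions: a single-record FASTA file, newline (LF) line endings, and the same number of bases on every line except possibly the last. This is the same layout FASTA index (.fai) tools rely on. It illustrates the idea only and is not TAMO's genome interface, whose storage format is not described here. `FastaRandomAccess` is a hypothetical name.

```python
class FastaRandomAccess:
    """Fetch subsequences from a single-record FASTA file without reading it all.

    Assumes one '>' header line, LF line endings, and that every sequence line
    except possibly the last holds the same number of bases.
    """

    def __init__(self, path):
        self.path = path
        with open(path, "rb") as fh:
            self.seq_start = len(fh.readline())            # byte offset of the first base
            self.width = len(fh.readline().rstrip(b"\n"))  # bases per full line
        if self.width == 0:
            raise ValueError("empty sequence")

    def fetch(self, start, end):
        """Return bases [start, end) (0-based, half-open) as an upper-case string.

        Requests running past the end of the sequence are truncated.
        """
        if start < 0 or end < start:
            raise ValueError("invalid interval")
        n = end - start
        # Each full line holds `width` bases followed by one newline byte.
        offset = self.seq_start + start + start // self.width
        with open(self.path, "rb") as fh:
            fh.seek(offset)
            raw = fh.read(n + n // self.width + 1)  # enough bytes to span embedded newlines
        return raw.replace(b"\n", b"")[:n].decode("ascii").upper()
```

Each fetch costs one seek and a read proportional to the requested length, independent of chromosome size. Genomes with many records would additionally need a per-record table of header offsets and line widths.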
Motif scoring, comparison and clustering. TAMO includes metrics for evaluating motif quality including the group specificity score (Hughes et al., 2000), the enrichment score (Harbison et al., 2004), the ROC AUC metric (Clarke and Granek, 2003) and several others. Functions are provided for finding the optimal alignment of two motifs and for quantitatively reporting their similarity (or divergence) with several choices of distance metrics. TAMO also includes implementations of the k-medoids and UPGMA algorithms for clustering motifs.
Sequence and microarray data. TAMO has fast routines for reading, writing, manipulating and using motifs to scan large collections of sequences. A general-purpose dataset object stores collections of microarray experiments and provides methods to quickly extract sets of genes or experiments that satisfy user-supplied P-value or ratio thresholds.
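Scanning sequences with a motif can be sketched as scoring every window on both strands against a threshold. The sketch takes a PSSM as a list of per-column dictionaries of log-odds scores. `scan_sequences` and its tuple output are hypothetical and do not reproduce TAMO's scanning routines.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def window_score(pssm, window):
    """Sum of per-column log-odds; bases missing from a column (e.g. N) score 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pssm, window))


def scan_sequences(pssm, sequences, threshold):
    """Return (name, position, strand, score) for every window scoring >= threshold.

    `pssm` is a list of per-column dicts mapping base -> log-odds score;
    `sequences` maps a name (e.g. an ORF's promoter) to its sequence;
    positions are 0-based on the forward strand.
    """
    width = len(pssm)
    hits = []
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            for strand, site in (("+", window), ("-", reverse_complement(window))):
                score = window_score(pssm, site)
                if score >= threshold:
                    hits.append((name, i, strand, score))
    return hits
```

A palindromic site is reported once per strand. Collapsing the names with hits into a gene set gives the input for an enrichment test.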
Statistics. TAMO includes a set of useful statistical routines for computing P-values for normal, binomial, Poisson and hypergeometric distributions. The Shapiro–Wilk normality test and the Wilcoxon–Mann–Whitney rank-sum test are also provided.
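Upper-tail P-values of the kind listed above can be computed with the Python standard library alone. The sketch below covers the binomial, Poisson and standard normal cases. It does not reproduce TAMO's statistics module, and the function names are assumptions.

```python
import math


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(max(k, 0), n + 1))


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for moderate tails; very small P-values lose precision to
    cancellation, where summing the upper tail directly would be better.
    """
    if k <= 0:
        return 1.0
    term = cdf = math.exp(-lam)
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def normal_sf(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```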
Acknowledgments
E.F. is a Whitehead Fellow and was funded in part by Pfizer. D.B.G. was supported by NIH/NIGMS NRSA award GM068278.
Received on February 18, 2005; revised on April 14, 2005; accepted on April 30, 2005
REFERENCES
Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
Bailey, T.L. and Elkan, C. (1995) The value of prior knowledge in discovering motifs with MEME. Proc. Int. Conf. Intell. Syst. Mol. Biol., 3, 21–29.
Cherry, J.M., et al. (1997) Genetic and physical maps of Saccharomyces cerevisiae. Nature, 387, 67–73.
Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics, 19, 212–218.
Harbison, C.T., et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104.
Holstege, F.C., et al. (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell, 95, 717–728.
Hughes, J.D., et al. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol., 296, 1205–1214.
Karolchik, D., et al. (2003) The UCSC Genome Browser Database. Nucleic Acids Res., 31, 51–54.
Liu, X.S., et al. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotechnol., 20, 835–839.
Su, A.I., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
Tompa, M., et al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol., 23, 137–144.

