

Bioinformatics Advance Access originally published online on May 19, 2005
Bioinformatics 2005 21(14):3164-3165; doi:10.1093/bioinformatics/bti481

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs

D. Benjamin Gordon 1, Lena Nekludova 1, Scott McCallum 1 and Ernest Fraenkel 1,2,*

1 Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA 02142, USA
2 MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA

*To whom correspondence should be addressed.


    Abstract

Summary: TAMO (Tools for Analysis of MOtifs) is an object-oriented computational framework for interpreting transcriptional regulation using DNA-sequence motifs. To simplify the application of multiple motif discovery programs to genome-wide data, TAMO provides a sophisticated motif object with interfaces to several popular programs. In addition, TAMO provides modules for integrating motif analysis with diverse data sources including genomic sequences, microarrays and various databases. Finally, TAMO includes tools for sequence analysis, algorithms for scoring, comparing and clustering motifs, and several useful statistical tests. Recently, we have applied these tools to analyze tens of thousands of motifs derived from hundreds of microarray experiments.

Availability: TAMO is a Python/C++ package and requires Python 2.3 or higher. Source code and documentation are available at http://web.wi.mit.edu/fraenkel/TAMO/

Contact: efraenkel{at}wi.mit.edu


    1 INTRODUCTION
Motif discovery from genome-wide data is improved by using multiple programs in concert (Harbison et al., 2004; Tompa et al., 2005), but the management of such calculations is difficult. The purpose of the TAMO package is to provide tools that facilitate combining motif information from different motif-finding programs and interpreting motif data using biological databases. In contrast to other packages, TAMO provides an integrated motif analysis framework in a high-level language; however, it is built on fast C++ routines.

The TAMO package includes (1) command-line programs for motif discovery, scoring and analysis, and (2) source code that programmers can use to develop new analysis tools in the Python language. The package provides both a simple interface and a wealth of supporting object definitions and lower-level functions for operating on motifs, microarrays, sequences and other bioinformatic data sources. These routines are supported by functions for manipulating genome sequence data and by a library of statistical tests. Finally, TAMO includes a utility that automatically downloads data from sources such as SGD and UCSC (Cherry et al., 1997; Karolchik et al., 2003).


    2 EXAMPLE USAGE
Table 1 illustrates how the TAMO framework can be used to unify diverse algorithms and data sources. In this example, we use three programs to search for motifs among the promoters of a set of genes in yeast. We then identify and display the best-scoring motif, and test whether promoters containing it are significantly associated with any categories in the GO database (Ashburner et al., 2000).


Table 1 Sample Python TAMO code

 
In the example, AlignACE (Hughes et al., 2000) is run 10 times with different random number seeds and MDscan (Liu et al., 2002) is run repeatedly with different motif widths. Finally, MEME (Bailey and Elkan, 1995) is applied to the data. Next, the sample code computes the ‘group specificity score’ of AlignACE (called ‘church’ by TAMO) for the motifs found by all three programs, identifies the motif with the best score and displays it in various ways. To help interpret the biological meaning of the motif, the sample code identifies promoters containing sub-sequences that match the motif and searches the corresponding list of ORFs for statistically over-represented GO categories.
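The step of finding promoters that contain sub-sequences matching a motif can be illustrated with a minimal sketch in modern Python 3. This is not TAMO's API: the function names, the toy log-odds matrix, the promoter sequences and the ORF labels are all assumptions made for illustration, and sequences are assumed to contain only uppercase A, C, G and T.

```python
# Illustrative sketch only; names and data are hypothetical, not TAMO's API.

def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def best_window_score(seq, pssm):
    """Best score of any motif-width window on either strand of seq."""
    width = len(pssm)
    best = float("-inf")
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            score = sum(pssm[j][strand[start + j]] for j in range(width))
            best = max(best, score)
    return best


def promoters_with_motif(promoters, pssm, threshold):
    """Names of promoters with at least one window scoring >= threshold."""
    return sorted(
        name for name, seq in promoters.items()
        if best_window_score(seq, pssm) >= threshold
    )


def one_position(preferred):
    """Toy log-odds column: +2 for the preferred base, -2 otherwise."""
    return {base: (2.0 if base == preferred else -2.0) for base in "ACGT"}


# Toy matrix whose best-scoring sequence is TGAC (maximum score 8.0).
toy_pssm = [one_position(base) for base in "TGAC"]

# Made-up promoter sequences keyed by made-up ORF labels.
toy_promoters = {
    "ORF1": "AATGACTT",   # TGAC on the forward strand
    "ORF2": "CCGTCAGG",   # TGAC on the reverse strand
    "ORF3": "AAAAAAAA",   # no match
}

matching_orfs = promoters_with_motif(toy_promoters, toy_pssm, threshold=8.0)
print(matching_orfs)
```

The resulting list of ORF names is the kind of input that a test for over-represented functional categories would then take.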


    3 PACKAGE FEATURE OVERVIEW
Motifs and motif discovery. TAMO is developed around a unified motif representation of a position-specific scoring matrix (PSSM) that can be assembled from many sources. The package also contains interfaces to publicly available motif discovery programs as well as its own internal motif discovery programs.
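To make the PSSM representation concrete, the sketch below builds a log-odds matrix from a set of aligned binding sites. It is a generic illustration in Python 3 and does not reproduce TAMO's motif object: the function names, the pseudocount of 1.0 per base and the uniform background are assumptions chosen for the example.

```python
import math

# Illustrative sketch only; not TAMO's motif class.

def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds matrix (list of per-position dicts) from aligned sites."""
    bases = "ACGT"
    if background is None:
        background = {base: 0.25 for base in bases}  # assumed uniform background
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(bases)
    pssm = []
    for j in range(width):
        column = [site[j] for site in sites]
        row = {}
        for base in bases:
            freq = (column.count(base) + pseudocount) / total
            row[base] = math.log2(freq / background[base])
        pssm.append(row)
    return pssm


def consensus(pssm):
    """Highest-scoring base at each position."""
    return "".join(max(row, key=row.get) for row in pssm)


# Made-up aligned sites for illustration.
example_sites = ["TGAC", "TGAC", "TGAT", "TGAC"]
example_pssm = build_pssm(example_sites)
print(consensus(example_pssm))
```

Because every source of motifs (a discovery program, a database entry, a list of known sites) can be reduced to such a matrix, a single representation is enough for downstream scoring and comparison.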

External data sources. Several TAMO modules provide access to public repositories of genomic information. For example, there are modules that provide access to SGD feature maps and functions for translating between different types of feature identifiers (e.g. gene name to Swiss-Prot ID). The GO module uses gene annotations from GO-slim and has facilities for finding functional categories that are statistically over-represented within a set of genes. TAMO also provides interfaces to the human Gene Atlas (Su et al., 2004), to yeast transcription rates (Holstege et al., 1998) and to other genome-wide data. Finally, TAMO provides fast, random-access interfaces to human and yeast (Saccharomyces cerevisiae) genome sequences.
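One common way to test whether a functional category is over-represented in a gene set is the upper tail of the hypergeometric distribution. The sketch below shows that calculation in Python 3.8 or later; whether TAMO's GO module uses exactly this procedure is not stated here, and the function names, category labels and gene identifiers are hypothetical. No multiple-testing correction is applied.

```python
from math import comb

# Illustrative sketch only; not TAMO's GO module.

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def enriched_categories(selected, annotations, universe):
    """Return (category, overlap, p_value) tuples sorted by increasing p-value."""
    selected = set(selected) & set(universe)
    results = []
    for category, members in annotations.items():
        members = set(members) & set(universe)
        overlap = len(selected & members)
        p_value = hypergeom_upper_tail(
            overlap, len(selected), len(members), len(universe)
        )
        results.append((category, overlap, p_value))
    return sorted(results, key=lambda item: item[2])


# Made-up universe of ten genes with two made-up categories.
toy_universe = ["g%d" % i for i in range(10)]
toy_annotations = {
    "stress response": {"g0", "g1", "g2", "g3"},
    "transport": {"g4", "g5", "g6", "g7", "g8", "g9"},
}
toy_selected = ["g0", "g1", "g5"]

go_results = enriched_categories(toy_selected, toy_annotations, toy_universe)
for category, overlap, p_value in go_results:
    print(category, overlap, round(p_value, 4))
```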

Motif scoring, comparison and clustering. TAMO includes metrics for evaluating motif quality including the ‘group specificity score’ (Hughes et al., 2000), the enrichment score (Harbison et al., 2004), the ROC AUC metric (Clarke and Granek, 2003) and several others. Functions are provided for finding the optimal alignment of two motifs and for quantitatively reporting their similarity (or divergence) with several choices of distance metrics. TAMO also includes implementations of the k-medoids and UPGMA algorithms for clustering motifs.
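The idea of finding the optimal alignment of two motifs and reporting their divergence can be sketched as follows. This Python 3 example slides one motif along the other without gaps and scores each offset by the mean Euclidean distance between overlapping columns. The distance metric, the minimum overlap of three columns and all names are assumptions for illustration; TAMO's own metrics may differ, and the reverse-complement orientation is omitted for brevity.

```python
import math

# Illustrative sketch only; not TAMO's comparison functions.

def column_distance(p, q):
    """Euclidean distance between two base-probability columns."""
    return math.sqrt(sum((p[base] - q[base]) ** 2 for base in "ACGT"))


def best_alignment(m1, m2, min_overlap=3):
    """Return (distance, offset) of the best ungapped alignment.

    offset is the position in m1 at which the first column of m2 is placed.
    Returns (inf, None) if no offset gives at least min_overlap columns.
    """
    best = (float("inf"), None)
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [
            (m1[i], m2[i - offset])
            for i in range(len(m1))
            if 0 <= i - offset < len(m2)
        ]
        if len(pairs) < min_overlap:
            continue
        distance = sum(column_distance(p, q) for p, q in pairs) / len(pairs)
        if distance < best[0]:
            best = (distance, offset)
    return best


def motif_from_consensus(consensus):
    """Toy probability matrix: 0.85 for the consensus base, 0.05 for the others."""
    return [
        {base: (0.85 if base == letter else 0.05) for base in "ACGT"}
        for letter in consensus
    ]


motif_a = motif_from_consensus("TGACTC")
motif_b = motif_from_consensus("GACT")
alignment_distance, alignment_offset = best_alignment(motif_a, motif_b)
print(alignment_distance, alignment_offset)
```

A pairwise distance of this kind is what clustering methods such as k-medoids or UPGMA take as input. Averaging over the overlap can favor short overlaps, which is why a minimum overlap is enforced.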

Sequence and microarray data. TAMO has fast routines for reading, writing, manipulating and using motifs to scan large collections of sequences. A general-purpose ‘dataset’ object stores collections of microarray experiments and provides methods to quickly extract sets of genes or experiments that satisfy user-supplied P-value or ratio thresholds.
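The threshold-based extraction described above might look like the following Python 3 sketch. The class, its method names, the default thresholds and the data values are hypothetical and serve only to illustrate the pattern; they are not TAMO's 'dataset' object.

```python
# Illustrative sketch only; a hypothetical stand-in for a microarray dataset object.

class ExpressionDataset:
    """Toy container mapping experiment -> gene -> (ratio, p_value)."""

    def __init__(self):
        self._values = {}

    def add(self, experiment, gene, ratio, p_value):
        self._values.setdefault(experiment, {})[gene] = (ratio, p_value)

    @staticmethod
    def _passes(ratio, p_value, max_p, min_ratio):
        return p_value <= max_p and (min_ratio is None or ratio >= min_ratio)

    def genes_passing(self, experiment, max_p=0.001, min_ratio=None):
        """Genes in one experiment that satisfy the thresholds."""
        genes = self._values.get(experiment, {})
        return sorted(
            gene for gene, (ratio, p_value) in genes.items()
            if self._passes(ratio, p_value, max_p, min_ratio)
        )

    def experiments_passing(self, gene, max_p=0.001, min_ratio=None):
        """Experiments in which one gene satisfies the thresholds."""
        return sorted(
            experiment for experiment, genes in self._values.items()
            if gene in genes and self._passes(*genes[gene], max_p, min_ratio)
        )


# Made-up measurements for illustration.
data = ExpressionDataset()
data.add("heat_shock", "geneA", 2.5, 0.0004)
data.add("heat_shock", "geneB", 1.1, 0.2)
data.add("heat_shock", "geneC", 3.0, 0.0009)
data.add("nitrogen_depletion", "geneA", 0.9, 0.5)
data.add("nitrogen_depletion", "geneC", 2.2, 0.0001)

print(data.genes_passing("heat_shock"))
print(data.experiments_passing("geneC"))
```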

Statistics. TAMO includes a set of useful statistical routines for computing P-values for normal, binomial, Poisson and hypergeometric distributions. The Shapiro–Wilk normality test and the Wilcoxon–Mann–Whitney rank sum test are also provided.
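As an illustration of one of the tests named above, the sketch below implements a Wilcoxon–Mann–Whitney rank sum test in Python 3 using the normal approximation. It is a textbook version written for this example, not TAMO's routine: ties receive average ranks, but no tie correction or continuity correction is applied to the variance, so the P-value is only approximate for small or heavily tied samples.

```python
import math

# Illustrative sketch only; not TAMO's statistics module.

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, z score, two-sided P-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, z, p_value


# Made-up scores for two groups of genes.
u_stat, z_score, p_two_sided = rank_sum_test([1, 2, 3], [4, 5, 6])
print(u_stat, round(z_score, 3), round(p_two_sided, 4))
```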


    Acknowledgments
 
E.F. is a Whitehead Fellow and was funded in part by Pfizer. D.B.G. was supported by NIH/NIGMS NRSA award GM068278.

Received on February 18, 2005; revised on April 14, 2005; accepted on April 30, 2005

    REFERENCES

    Ashburner, M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

    Bailey, T.L. and Elkan, C. (1995) The value of prior knowledge in discovering motifs with MEME. Proc. Int. Conf. Intell. Syst. Mol. Biol., 3, 21–29.

    Cherry, J.M., et al. (1997) Genetic and physical maps of Saccharomyces cerevisiae. Nature, 387, 67–73.

    Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics, 19, 212–218.

    Harbison, C.T., et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature, 431, 99–104.

    Holstege, F.C., et al. (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell, 95, 717–728.

    Hughes, J.D., et al. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol., 296, 1205–1214.

    Karolchik, D., et al. (2003) The UCSC Genome Browser Database. Nucleic Acids Res., 31, 51–54.

    Liu, X.S., et al. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat. Biotechnol., 20, 835–839.

    Su, A.I., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.

    Tompa, M., et al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol., 23, 137–144.




