Bioinformatics Advance Access originally published online on November 24, 2006
Bioinformatics 2007 23(4):502-503; doi:10.1093/bioinformatics/btl601

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mclip: motif detection based on cliques of gapped local profile-to-profile alignments

Tancred Frickey and Georg Weiller *

ARC Centre of Excellence for Integrative Legume Research and Bioinformatics Laboratory, Genomic Interactions Group, Research School of Biological Sciences, Australian National University, GPO Box 475, Canberra, ACT 2601, Australia

*To whom correspondence should be addressed.


    ABSTRACT

Summary: A multitude of motif-finding tools have been published, which can generally be assigned to one of three classes: expectation-maximization, Gibbs-sampling or enumeration. Irrespective of this grouping, most motif detection tools only take into account similarities across ungapped sequence regions, so that short motifs located peripherally and at varying distances from a ‘core’ motif may be missed. We present a new method, adding to the set of expectation-maximization approaches, that permits the use of gapped alignments for motif elucidation.

Availability: The program is available for download from: http://bioinfoserver.rsbs.anu.edu.au/downloads/mclip.jar

Contact: Georg.Weiller{at}anu.edu.au

Supplementary information: http://bioinfoserver.rsbs.anu.edu.au/utils/mclip/info.php


    1 INTRODUCTION
Motif detection methods can generally be classified into one of three groups: enumeration methods [Weeder (Pavesi et al., 2001)], Gibbs-sampling [AGLAM (Tharakaraman et al., 2005)] and expectation-maximization [MEME (Bailey and Elkan, 1994)]. The basic premise of all these methods is that, for a given set of co-expressed sequences, the motifs responsible for this co-expression will be more conserved and present more frequently in those sequences than in other sets of sequences. Finding all possible motifs of any length in a highly variable number of sequences, some of which may contain the motif and some not, is a daunting task and, most likely, the reason why many motif detection tools require the user to specify bounding parameters, such as specific motif lengths or the number of times a motif is to be found in a sequence. Unfortunately, regulatory motifs vary in size [TRANSFAC (Matys et al., 2003)], and it is frequently uncertain whether the observed co-expression of a set of sequences is due to common regulation or chance. Both factors may cause relevant motifs to be missed if inappropriate bounding parameters are used. In addition, many programs simply output one or more sequence regions in which a motif was detected without providing an estimate of how likely this motif is to have occurred there by chance. A further disadvantage is that many tools are available only via a web-interface, making large-scale analyses tedious, or require extensive dependencies, making installation of the programs a major stumbling block to their everyday use. Fortunately, there are also many easy-to-use, readily installable programs that provide adequate significance measures for their results, such as AlignACE (Hughes et al., 2000), AGLAM and MEME.

We wish to extend that list by presenting a tool that bases its motif detection on cliques of gapped local profile–profile alignments. Here, ‘cliques’ refers to sets of alignment traces for which all profiles share co-aligned residues. The use of gapped alignments comes at the cost of increased complexity and longer running time, but may increase sensitivity by enabling the detection of gapped or additional motifs located peripherally and at variable distances from a ‘core’ motif. An example application of Mclip to sets of co-expressed sequences and a more detailed explanation of the alignment and motif-finding procedure can be found in the supplementary information.
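The notion of a clique of co-aligned residues can be illustrated with a small sketch. It is not taken from Mclip: the input layout (a dictionary of co-aligned position pairs per sequence pair) and the function names `residue_graph`, `maximal_cliques` and `supported_columns` are assumptions made for illustration, and the basic Bron-Kerbosch recursion is used here simply as a well-known way to enumerate maximal cliques.

```python
def residue_graph(alignments):
    """Build an undirected graph whose nodes are (sequence_id, position).

    `alignments` (hypothetical layout) maps a pair of sequence ids to the
    list of position pairs that a local alignment places in the same column,
    e.g. {("s1", "s2"): [(4, 10), (5, 11)]}.
    """
    graph = {}
    for (seq_a, seq_b), pairs in alignments.items():
        for pos_a, pos_b in pairs:
            u, v = (seq_a, pos_a), (seq_b, pos_b)
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


def supported_columns(alignments, min_sequences=3):
    """Return residue sets that are mutually co-aligned in >= min_sequences sequences."""
    graph = residue_graph(alignments)
    return sorted(
        sorted(clique)
        for clique in maximal_cliques(graph)
        if len(clique) >= min_sequences
    )


# Toy input: position 4 of s1, 10 of s2 and 7 of s3 are pairwise co-aligned.
toy = {
    ("s1", "s2"): [(4, 10), (5, 11)],
    ("s2", "s3"): [(10, 7), (11, 8)],
    ("s1", "s3"): [(4, 7)],
}
columns = supported_columns(toy)
```

In this toy input only one residue triple is supported by all three pairwise alignments; the pairs (5, 11) and (11, 8) are not confirmed by an s1/s3 alignment and therefore do not form a three-sequence clique.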


    2 IMPLEMENTATION
The program uses a multi-step approach to finding motifs (Fig. 1). Local alignments are generated for all sequence pairs. Based on these alignments, 5-state profiles [‘A’,‘C’,‘G’,‘T’,‘gap’] are derived for each sequence and provide a numerical representation of the residues contained in the alignments covering that sequence. Local alignments are then generated for all pairs of profiles by maximizing the log-odds ratio of one profile region emitting the residue counts present in a region of the other profile and vice versa (similar to COMPASS; Sadreyev and Grishin, 2003). Motifs can then be inferred from cliques of local profile–profile alignments sharing co-aligned residues. A motif is derived by combining the position-specific residue frequencies of the profile regions covered by the clique and adding gaps as specified by the local alignments.
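As a rough illustration of the profile step, the sketch below builds a 5-state count column and scores two columns by the log-odds of each emitting the other's counts. The exact scoring used by Mclip is described in the supplementary information and is not reproduced here; the pseudocount, the uniform background and the function names are assumptions chosen for this example.

```python
import math

STATES = "ACGT-"  # four nucleotides plus a gap state


def column_counts(aligned_residues):
    """Count how often each of the five states was aligned to one position."""
    counts = dict.fromkeys(STATES, 0)
    for residue in aligned_residues:
        counts[residue] += 1
    return counts


def frequencies(counts, pseudocount=1.0):
    """Turn counts into frequencies; the pseudocount (an assumption) avoids zeros."""
    total = sum(counts.values()) + pseudocount * len(STATES)
    return {s: (counts[s] + pseudocount) / total for s in STATES}


def symmetric_log_odds(counts_a, counts_b, background):
    """Log-odds of column A emitting B's counts, plus the reverse direction."""
    freq_a, freq_b = frequencies(counts_a), frequencies(counts_b)
    a_emits_b = sum(counts_b[s] * math.log(freq_a[s] / background[s]) for s in STATES)
    b_emits_a = sum(counts_a[s] * math.log(freq_b[s] / background[s]) for s in STATES)
    return a_emits_b + b_emits_a


# Uniform background over the five states (an illustrative assumption).
uniform = {s: 1.0 / len(STATES) for s in STATES}
similar = symmetric_log_odds(column_counts("AAAA"), column_counts("AAAT"), uniform)
different = symmetric_log_odds(column_counts("AAAA"), column_counts("CCCC"), uniform)
```

Columns with similar residue composition receive a positive score and dissimilar columns a negative one, which is the property a local profile-to-profile alignment needs in order to extend across conserved regions and stop elsewhere.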


Fig. 1 Motif detection by Mclip. The motif shown is one of the gapped motifs derived from the –1 kb upstream region of groups of Arabidopsis thaliana sequences with expression levels correlating strongly over a set of eight microarray experiments.

 
This produces a set of possible motifs. Which of these are present in which sequences is determined by aligning the input sequences back to the motifs. The program returns the motifs and sequence regions with high-scoring alignments to the motifs.
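Aligning a sequence back to a motif can be sketched as a gapped local alignment of the sequence against the motif's position-specific frequencies. The following is a minimal Smith-Waterman style sketch under stated assumptions, not Mclip's or Mmatch's actual routine: the uniform background of 0.25, the linear gap penalty of 2.0 and the function name `local_motif_alignment` are illustrative choices, and all motif frequencies are assumed to be strictly positive.

```python
import math


def local_motif_alignment(sequence, motif, background=0.25, gap_penalty=2.0):
    """Gapped local alignment of a DNA sequence to a motif.

    `motif` is a list of dicts mapping 'A', 'C', 'G', 'T' to positive
    frequencies, one dict per motif column.
    Returns (best_score, sequence_end, motif_end) with 1-based end indices.
    """
    n, m = len(sequence), len(motif)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = (0.0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Log-odds of motif column j emitting sequence residue i.
            match = math.log(motif[j - 1][sequence[i - 1]] / background)
            cell = max(
                0.0,                               # start a new local alignment
                score[i - 1][j - 1] + match,       # align residue to column
                score[i - 1][j] - gap_penalty,     # skip a sequence residue
                score[i][j - 1] - gap_penalty,     # skip a motif column
            )
            score[i][j] = cell
            if cell > best[0]:
                best = (cell, i, j)
    return best


def consensus_motif(consensus, strong=0.85, weak=0.05):
    """Helper for the example: a motif that strongly prefers `consensus`."""
    return [{b: (strong if b == c else weak) for b in "ACGT"} for c in consensus]


hit = local_motif_alignment("GGCTATAGG", consensus_motif("TATA"))
```

Because gaps are allowed, a sequence such as TAGTA can still align to a TATA-like motif across the inserted G, which an ungapped scan would score only as two separate half-matches.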


    3 APPLICATION
Default input is a set of unaligned sequences in FASTA format. Parameters can be modified via the command line or the web-interface. Mclip automatically determines the size of motifs and aligns them to the input sequences, providing a statistical estimate for each motif-sequence alignment. The output is similar to that of MEME and consists of a list of detected motifs and the sequences with significant similarities to the motifs, together with the start, motif-match, end, alignment score, Z-score and E-value of each match. The motif-sequence alignment routine is available separately (Mmatch) and can be used to search for motifs found by Mclip in a different set of sequences. Both programs are written in Java and run under MacOS, Windows and Unix/Linux; a Java 1.5 or later runtime environment is required. The programs are available under the GNU General Public License; all source code is included in the jar archives.
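The text does not state how Mclip computes its Z-score. One common way to obtain such a value, shown here purely as an assumption, is to compare the observed alignment score with the scores obtained for shuffled versions of the same sequence. The function name `shuffled_z_score`, the number of shuffles and the toy scoring function are hypothetical, and no E-value calculation is attempted.

```python
import random
import statistics


def shuffled_z_score(sequence, score_fn, n_shuffles=200, seed=0):
    """Z-score of score_fn(sequence) against residue-shuffled versions of it.

    `score_fn` is any callable mapping a sequence string to a number, for
    example a motif-to-sequence alignment score.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = score_fn(sequence)
    residues = list(sequence)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # preserves composition, destroys order
        null_scores.append(score_fn("".join(residues)))
    mean = statistics.mean(null_scores)
    spread = statistics.stdev(null_scores)
    if spread == 0:
        return 0.0  # no variation in the null scores, Z is undefined
    return (observed - mean) / spread


def best_match(sequence, pattern="TATAAT"):
    """Toy score: the highest number of identities to `pattern` at any offset."""
    windows = range(len(sequence) - len(pattern) + 1)
    return max(sum(a == b for a, b in zip(sequence[i:], pattern)) for i in windows)


z = shuffled_z_score("GCGCGCGCTATAATGCGCGCGC", best_match)
```

Shuffling keeps the nucleotide composition of each sequence fixed, so a high Z-score reflects the arrangement of residues rather than, say, an AT-rich upstream region.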

Mclip is available for download from http://bioinfoserver.rsbs.anu.edu.au/downloads/mclip.jar. Mmatch is available for download from http://bioinfoserver.rsbs.anu.edu.au/downloads/mmatch.jar.

In addition, Mclip can be run via the web-interface at http://bioinfoserver.rsbs.anu.edu.au/utils/mclip/.


    Acknowledgments
 
This research was funded by an Australian Research Council Centre of Excellence grant. Funding to pay the Open Access publication charges for this article was provided by the same grant.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on September 15, 2006; revised on November 5, 2006; accepted on November 20, 2006

    REFERENCES

    Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Altman, R.B., Brutlag, D.L., Karp, P.D., Lathrop, R.H. and Searls, D.B. (eds), Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 28–36.

    Hughes, J.D., et al. (2000) Computational identification of cis-regulatory elements associated with functionally coherent groups of genes in Saccharomyces cerevisiae. J. Mol. Biol., 296, 1205–1214.

    Matys, V., et al. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378.

    Pavesi, G., et al. (2001) An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics, 17, S207–S214.

    Sadreyev, R. and Grishin, N. (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol., 326, 317–336.

    Tharakaraman, K., et al. (2005) Alignments anchored on genomic landmarks can aid the identification of regulatory elements. Bioinformatics, 21, i440–i448.

