Bioinformatics Advance Access originally published online on April 12, 2005
Bioinformatics 2005 21(12):2912-2913; doi:10.1093/bioinformatics/bti434

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Visualizing profile–profile alignment: pairwise HMM logos

Benjamin Schuster-Böckler * and Alex Bateman

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SA, UK

*To whom correspondence should be addressed.


    Abstract

Summary: The availability of advanced profile–profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools that are not presently available. We introduce an approach built upon the concept of HMM logos, which illustrates the similarities between pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other, and the aligned states are highlighted and connected.

Availability: A web interface for creating pairwise HMM logos online is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Software developers can also download a Perl package that includes methods for creating pairwise HMM logos locally.

Contact: bsb{at}sanger.ac.uk


    INTRODUCTION
The problem of profile–profile comparison has a long history but has received much attention recently (Söding, 2005; Lyngsø et al., 1999; Madera, 2005; Edgar and Sjölander, 2004a). This is a result of the growing number of well-characterized protein families in databases such as Pfam (Bateman et al., 2004). Profile–profile methods, which exploit information about the properties of an entire family, have been shown to be significantly more sensitive than profile–sequence comparison (Edgar and Sjölander, 2004b). Several different concepts for profile–profile comparison have been reported; here we focus on the visualization of HMM–HMM alignments. The algorithms behind all currently available HMM alignment programs are very similar; newer approaches differ mainly in the details of the scoring function and in which transitions are taken into account. The common approach is to find the sequence of state-to-state pairings that maximizes the probability of both HMMs emitting the same sequence (frequently called the co-emission probability). This can be done efficiently by constructing a pair HMM (Durbin et al., 1998; Söding, 2005) from the two source HMMs and applying the standard Forward or Viterbi algorithm to find an optimal solution. Nevertheless, the raw output of these alignment tools can be difficult to interpret: from the state-to-state pairings alone, it is not immediately obvious which features the two protein families have in common. Our aim was to develop a graphical representation of HMM–HMM alignments that resolves this issue.
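To make the idea of maximizing co-emission concrete, here is a minimal Python sketch of profile–profile alignment by dynamic programming. It is a deliberate simplification, not the algorithm of PRC, HHsearch or COACH: only match states are used, state transitions are replaced by a single linear gap penalty (the `gap` parameter is our assumption), and the pair-HMM Viterbi recursion is reduced to a Smith–Waterman-style local alignment. Each pair of states is scored by the log-odds that both emit the same residue, log Σ_a p_i(a) q_j(a) / b(a), where b is a background distribution. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumption: a profile is one emission distribution per match state,
# all over the same alphabet; transitions are ignored in this sketch.
Profile = Sequence[Sequence[float]]


def coemission_score(p: Sequence[float], q: Sequence[float],
                     background: Sequence[float]) -> float:
    """Log-odds that two states emit the same residue, versus background."""
    return math.log(sum(pa * qa / ba for pa, qa, ba in zip(p, q, background)))


def align_profiles(prof1: Profile, prof2: Profile, background: Sequence[float],
                   gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Local alignment of match states; returns (score, 0-based state pairs)."""
    n, m = len(prof1), len(prof2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (0.0, None),  # start a new local alignment here
                (score[i - 1][j - 1]
                 + coemission_score(prof1[i - 1], prof2[j - 1], background), "D"),
                (score[i - 1][j] + gap, "U"),  # state of prof1 left unaligned
                (score[i][j - 1] + gap, "L"),  # state of prof2 left unaligned
            ]
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell to recover the state-to-state pairings.
    pairs: List[Tuple[int, int]] = []
    i, j = best_cell
    while move[i][j] is not None:
        step = move[i][j]
        if step == "D":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "U":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    bg = [0.5, 0.5]  # toy two-letter alphabet with uniform background
    top = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
    bottom = [[0.1, 0.9], [0.9, 0.1]]
    print(align_profiles(top, bottom, bg)[1])  # [(1, 0), (2, 1)]
```

The list of pairs is exactly the kind of raw state-to-state output the paragraph describes: correct, but hard to interpret without a picture.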


    FEATURES
Pairwise HMM Logos can currently be accessed in two ways. First, they can be created online at http://www.sanger.ac.uk/Software/analysis/logomat-p. Second, they can be constructed locally by downloading and installing the Perl sources. In the near future, pairwise HMM Logos will also be added to the Pfam website. A typical pairwise HMM Logo is shown in Figure 1. We designed pairwise HMM Logos to look as similar to HMM Logos as possible, which should make them easy to read for users already familiar with HMM Logos. We therefore draw two HMM Logos, one for each aligned family. Aligned states are framed in both logos and connected by a block; unaligned states are shaded in grey. In a local alignment, positions before the first and after the last aligned state are not shown. A brief summary of the features of simple HMM logos is given in the caption to Figure 1; a more detailed description can be found in Schuster-Böckler et al. (2004).
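The drawing rules in this paragraph (frame and connect aligned states, shade unaligned ones in grey, and in a local alignment hide everything outside the first and last aligned state) amount to a small layout computation. The Python sketch below illustrates it under our own assumptions: the alignment is given as a list of 0-based (top state, bottom state) pairs, and the function name `logo_columns` is hypothetical, not part of the HMM Perl package.

```python
from typing import List, Sequence, Tuple

Column = Tuple[int, bool]  # (state index, True if aligned and framed)


def logo_columns(pairs: Sequence[Tuple[int, int]], length: int, side: int,
                 local: bool = True) -> List[Column]:
    """Columns to draw for one of the two logos (side 0 = top, 1 = bottom).

    Aligned states are flagged True (framed and connected to the partner
    logo); every other displayed state is flagged False (shaded grey).
    """
    aligned = {pair[side] for pair in pairs}
    if local and aligned:
        # Local alignment: omit states before the first and after the last
        # aligned state of this HMM.
        first, last = min(aligned), max(aligned)
    else:
        first, last = 0, length - 1
    return [(k, k in aligned) for k in range(first, last + 1)]


if __name__ == "__main__":
    pairs = [(1, 0), (3, 2)]  # toy alignment with one unaligned state per side
    print(logo_columns(pairs, length=6, side=0))
    print(logo_columns(pairs, length=4, side=1))
```

The connecting blocks are then simply drawn between the framed columns listed in `pairs`.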



Fig. 1 Alignment of the Toxin_7 Pfam family against the Toxin_9 Pfam family. For each family, an HMM logo is drawn. The numbers above and below each logo show state positions in the HMM. The overall height of each letter stack represents the information content of the state, and the relative height of each letter corresponds to its emission probability. The column width denotes the relative contribution of the state: the product of the probability that the state is traversed and the expected number of self transitions for that state. This accounts for the varying length of insertions. Insert states are drawn in red. Their relative contribution is frequently very small, which makes them hard to see; narrow insert states appear, for example, at positions 27 and 28 of the Toxin_7 family. The aligned states in each HMM are framed and connected by a block. Omitted states are shaded in grey.
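The quantities in the caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HMM Perl package's code. Stack height is computed as the relative entropy (in bits) of a state's emission distribution against a background distribution; for a uniform background this equals the classic log2(K) − H information content of sequence logos (Schneider and Stephens, 1990). Letter heights split the stack in proportion to the emission probabilities. For column width we assume the number of visits to a state with self-transition probability p is geometric with mean 1/(1 − p); the exact convention used for the figure may differ.

```python
import math
from typing import List, Sequence


def stack_height(emissions: Sequence[float], background: Sequence[float]) -> float:
    """Relative entropy (bits) of a state's emissions versus the background."""
    return sum(p * math.log2(p / b) for p, b in zip(emissions, background) if p > 0)


def letter_heights(emissions: Sequence[float],
                   background: Sequence[float]) -> List[float]:
    """Each letter's share of the stack is proportional to its emission probability."""
    height = stack_height(emissions, background)
    return [p * height for p in emissions]


def relative_widths(traverse_prob: Sequence[float],
                    self_transition: Sequence[float]) -> List[float]:
    """Width = traversal probability x expected visits (assumed 1/(1 - p_self)),
    normalized so that all columns of the logo sum to 1."""
    contrib = [t / (1.0 - s) for t, s in zip(traverse_prob, self_transition)]
    total = sum(contrib)
    return [c / total for c in contrib]


if __name__ == "__main__":
    uniform = [0.25] * 4  # toy four-letter alphabet
    print(letter_heights([0.5, 0.5, 0.0, 0.0], uniform))
    # A rarely used insert state (middle) between two match states:
    print(relative_widths([1.0, 0.2, 1.0], [0.0, 0.5, 0.0]))
```

The second call shows why insert columns are often narrow: a low traversal probability shrinks the column even when the insert state, once entered, tends to repeat.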

 
In our previous work (Schuster-Böckler et al., 2004), we introduced the HMM Perl package, which provides generalized methods to access and modify HMMs. Emission and transition probabilities are stored and retrieved as multidimensional matrices using PDL, the Perl Data Language. HMMER files can be parsed and written, and HMM logos can be created from profile HMMs. To this existing framework we added a class called HMM::Alignment, which acts as an abstraction layer for the HMM alignment program PRC (Madera, 2005; http://supfam.mrc-lmb.cam.ac.uk/PRC/). It can parse and write PRC output, and it can run PRC directly if PRC is installed on the system. Because it is integrated into the HMM package, it accepts HMM::Profile objects, HMMER files, Pfam IDs or any combination of these as arguments when creating alignment objects.
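The flexibility of the alignment constructor (profile objects, HMMER files or Pfam IDs, in any combination) comes down to deciding how each argument should be loaded. The Python sketch below shows one way to express that dispatch. It is hypothetical throughout: `Profile` is only a stand-in for an HMM::Profile object, and the rule "an existing file is an HMMER model, anything else is a Pfam ID" is our own assumption, not the package's documented behaviour.

```python
import os
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Profile:
    """Hypothetical stand-in for an HMM::Profile object: a family name plus an
    emission matrix with one row per state and one column per residue."""
    name: str
    emissions: List[List[float]] = field(default_factory=list)


Source = Union[Profile, str]


def classify_source(source: Source) -> Tuple[str, Source]:
    """Decide how one alignment argument should be loaded (illustrative rules)."""
    if isinstance(source, Profile):
        return ("profile", source)      # already parsed, use as-is
    if os.path.isfile(source):
        return ("hmmer_file", source)   # an HMMER model on disk, to be parsed
    return ("pfam_id", source)          # anything else: look the family up in Pfam


def prepare_pair(first: Source, second: Source) -> List[Tuple[str, Source]]:
    """Accept any combination of profiles, HMMER files and Pfam IDs."""
    return [classify_source(first), classify_source(second)]


if __name__ == "__main__":
    toxin7 = Profile("Toxin_7", [[0.25, 0.25, 0.25, 0.25]])
    for kind, src in prepare_pair(toxin7, "Toxin_9"):
        print(kind, src if isinstance(src, str) else src.name)
```

Normalizing every argument up front lets the rest of the alignment code work on a single representation, which is the point of an abstraction layer.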


    REQUIREMENTS
On-the-fly creation of pairwise HMM Logos from HMMER files, multiple sequence alignments or Pfam IDs is available from the website http://www.sanger.ac.uk/Software/analysis/logomat-p. Uploaded HMMs are aligned directly with PRC. Multiple alignments in ClustalW, MSF or SELEX format are first converted into HMMs with HMMER and then aligned. The plain PRC output can be downloaded separately. Local installation of the HMM Perl package requires the PDL and Imager Perl packages, together with a working PRC binary. Both Perl packages can be downloaded from http://www.cpan.org, and PRC is available from http://supfam.mrc-lmb.cam.ac.uk/PRC/. The software was tested with PRC version 1.5.2.
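The web workflow treats uploads differently depending on what they are: HMMs go straight to PRC, while alignments are first turned into HMMs. The Python sketch below shows how such routing might look. The format-detection heuristics are our assumptions, not the website's actual logic: HMMER 2 model files begin with a `HMMER` header line, ClustalW files with `CLUSTAL`, MSF files contain a header line with `MSF:` ending in `..`, and anything else is treated as SELEX. The step descriptions are plain strings, not real command lines.

```python
from typing import List


def detect_format(text: str) -> str:
    """Guess the upload format from its header lines (heuristic, assumed)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    first = lines[0]
    if first.startswith("HMMER"):
        return "hmmer"
    if first.upper().startswith("CLUSTAL"):
        return "clustalw"
    if any("MSF:" in line and line.rstrip().endswith("..") for line in lines):
        return "msf"
    return "selex"  # fallback: name/sequence lines without a header


def processing_steps(text: str) -> List[str]:
    """Steps before drawing: HMMs are aligned directly, alignments built first."""
    fmt = detect_format(text)
    if fmt == "hmmer":
        return ["align with PRC"]
    return [f"build HMM from {fmt} alignment with HMMER", "align with PRC"]


if __name__ == "__main__":
    print(processing_steps("HMMER2.0  [2.3.2]\nNAME  Toxin_7\n"))
    print(processing_steps("CLUSTAL W (1.83) multiple sequence alignment\n\ns1 ACDE\n"))
```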


    Acknowledgments
 
We would like to thank Martin Madera and Robert Finn for valuable information about theoretical and practical aspects of PRC. Johannes Söding kindly answered numerous questions about his HHsearch algorithm. The authors are grateful for the valuable suggestions and corrections made by the reviewers. B.S.-B. is funded by the Wellcome Trust.

Received on February 8, 2005; revised on March 29, 2005; accepted on March 31, 2005

    REFERENCES

    Bateman, A., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.

    Durbin, R., Eddy, S.R., Krogh, A. and Mitchison, G. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK.

    Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.

    Eddy, S.R. (2001) HMMER User's Guide: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.2. Washington University School of Medicine, http://hmmer.wustl.edu.

    Edgar, R.C. and Sjölander, K. (2004a) COACH: profile–profile alignment of protein families using hidden Markov models. Bioinformatics, 20, 1309–1318.

    Edgar, R.C. and Sjölander, K. (2004b) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.

    Lyngsø, R., et al. (1999) Metrics and similarity measures for hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1999, 178–186.

    Madera, M. (2005) PRC—the profile comparer.

    Schneider, T.D. and Stephens, R. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.

    Schuster-Böckler, B., Schultz, J. and Rahmann, S. (2004) HMM Logos for visualization of protein families. BMC Bioinformatics, 5, 7.

    Söding, J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.

