Bioinformatics Advance Access originally published online on April 12, 2005
Bioinformatics 2005 21(12):2912-2913; doi:10.1093/bioinformatics/bti434
Visualizing profile–profile alignment: pairwise HMM logos
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SA, UK
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: The availability of advanced profile–profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools that are not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities between pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other. The aligned states are then highlighted and connected.
Availability: A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creation of pairwise HMM logos locally.
Contact: bsb{at}sanger.ac.uk
| INTRODUCTION |
|---|
The problem of profile–profile comparison has a long history but has received a lot of attention recently (Söding, 2004; Lyngsø et al., 1999; Madera, 2005; Edgar and Sjölander, 2004a). This is a result of the growing number of well-characterized protein families in databases such as Pfam (Bateman et al., 2004). By adding information about properties of the entire family, profile–profile methods have been shown to significantly increase sensitivity compared with profile–sequence comparison (Edgar and Sjölander, 2004b). Several different concepts for profile–profile comparison have been reported. We focused on the visualization of HMM–HMM alignments. The algorithms behind all currently available HMM alignment programs are very similar. Newer approaches differ mainly in details of the scoring function and in the transitions that are taken into account. The common approach is to find a sequence of state-to-state pairings that maximizes the probability of both HMMs emitting the same sequence (frequently called the co-emission probability). This can be done efficiently by creating a pair HMM (Durbin et al., 1998; Söding, 2004) from the two source HMMs and using the standard forward or Viterbi algorithms to search for an optimal solution. Nevertheless, the raw output of the alignment tools can be difficult to understand. From the state-to-state pairings alone, it is not immediately obvious which features the two protein families have in common. Our aim was to develop a graphical representation of HMM–HMM alignments that resolves this issue.
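The idea of pairing states so that both models are likely to emit the same residues can be illustrated with a heavily simplified sketch. It is not the algorithm of PRC or HHsearch: it assumes that each profile is reduced to a list of match-state emission distributions, that transitions are replaced by a fixed gap penalty, and that a Viterbi-style local dynamic programme is run over a log-odds co-emission score. The names `column_score`, `align_profiles`, `gap` and the two-letter alphabet are hypothetical and serve illustration only.

```python
import math


def column_score(p, q, background):
    """Log-odds co-emission score of two emission distributions (assumed form)."""
    return math.log(sum(p[a] * q[a] / background[a] for a in background))


def align_profiles(prof_a, prof_b, background, gap=1.0):
    """Viterbi-style local alignment of two lists of emission distributions.

    Returns (best_score, pairs), where pairs holds 0-based (i, j) state pairings.
    Transition probabilities are ignored; a fixed gap penalty stands in for them.
    """
    n, m = len(prof_a), len(prof_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = column_score(prof_a[i - 1], prof_b[j - 1], background)
            candidates = [
                (0.0, None),                               # start a new local alignment
                (score[i - 1][j - 1] + s, (i - 1, j - 1)),  # pair state i with state j
                (score[i - 1][j] - gap, (i - 1, j)),        # skip a state in profile A
                (score[i][j - 1] - gap, (i, j - 1)),        # skip a state in profile B
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell, collecting only diagonal (paired) moves.
    pairs, cell = [], best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        prev = back[i][j]
        if prev == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        cell = prev
    pairs.reverse()
    return best, pairs


# Toy two-letter alphabet, purely for illustration.
background = {"A": 0.5, "B": 0.5}
a_rich = {"A": 0.9, "B": 0.1}
b_rich = {"A": 0.1, "B": 0.9}
best, pairs = align_profiles([a_rich, b_rich, a_rich], [b_rich, a_rich], background)
print(pairs)
```

In this toy case the second and third states of the first profile are paired with the two states of the second profile. A real pair HMM would additionally weight each move by the transition probabilities of both source models.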
| FEATURES |
|---|
Pairwise HMM Logos can currently be accessed in two different ways. First, they can be created online at http://www.sanger.ac.uk/Software/analysis/logomat-p. Second, they can be constructed locally by downloading and installing the Perl sources. In the near future, pairwise HMM Logos will also be added to the Pfam website. A typical pairwise HMM Logo is shown in Figure 1. We designed pairwise HMM Logos to look as similar to HMM Logos as possible, which should make them easier to understand for users accustomed to HMM Logos. We therefore draw two HMM Logos, one for each aligned family. To illustrate individual aligned states, they are framed and connected by a block. Unaligned states are shaded in grey. In a local alignment, positions before the first and after the last aligned state are not shown. A brief summary of the features of simple HMM logos is given in the caption to Figure 1. A more detailed description can be found in Schuster-Böckler et al. (2004).
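The display rules described above can be expressed as a small sketch. It is not taken from the HMM Perl package. It assumes that the alignment is available as a list of 0-based state pairings, and the function name `layout_states` and the labels `"aligned"` and `"unaligned"` are hypothetical.

```python
def layout_states(pairs, len_a, len_b, local=True):
    """Decide which states of each profile are drawn and how.

    pairs: list of (i, j) pairings between states of profile A and profile B.
    Returns (states_a, states_b, connectors). Each states list contains
    (state_index, label) tuples, where "aligned" states would be framed and
    "unaligned" states shaded in grey; connectors are the pairings to link.
    """
    aligned_a = {i for i, _ in pairs}
    aligned_b = {j for _, j in pairs}

    def visible(length, aligned):
        # In a local alignment, hide everything outside the aligned region.
        if local and aligned:
            first, last = min(aligned), max(aligned)
        else:
            first, last = 0, length - 1
        return [
            (k, "aligned" if k in aligned else "unaligned")
            for k in range(first, last + 1)
        ]

    return visible(len_a, aligned_a), visible(len_b, aligned_b), list(pairs)


states_a, states_b, connectors = layout_states([(1, 0), (3, 1)], len_a=5, len_b=3)
print(states_a)
print(states_b)
```

With `local=True`, the first profile is trimmed to states 1 to 3, with state 2 marked as unaligned, and the second profile to states 0 and 1. Passing `local=False` keeps every state of both profiles.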
In our previous work (Schuster-Böckler et al., 2004), we introduced the HMM Perl package. It provides generalized methods to access and modify HMMs. Emission and transition probabilities are stored and retrieved as multidimensional matrices using PDL, the Perl Data Language. HMMER files can be parsed and written. The package also allows the creation of HMM logos from profile HMMs. To this existing framework we added a class called HMM::Alignment, which works as an abstraction layer to the HMM alignment program PRC (Madera, 2005; http://supfam.mrc-lmb.cam.ac.uk/PRC/). It can parse and write PRC output, and it can run PRC directly if PRC is installed on the system. Because it integrates into the HMM package, it accepts HMM::Profile objects, HMMER files, Pfam IDs or combinations thereof as arguments for creating alignment objects.
| REQUIREMENTS |
|---|
On-the-fly creation of pairwise HMM Logos from HMMER files, multiple sequence alignments or Pfam IDs is available from the website http://www.sanger.ac.uk/Software/analysis/logomat-p. Uploaded HMMs are aligned directly using PRC. Multiple alignments in ClustalW, MSF or SELEX format are used to create HMMs using HMMER before aligning them. The plain PRC output can be downloaded separately. Local installation of the HMM Perl package requires the PDL and Imager packages to be installed on the system together with a working PRC binary. Both Perl packages can be downloaded from http://www.cpan.org. PRC is available from http://supfam.mrc-lmb.cam.ac.uk/PRC/. This software was tested against PRC version 1.5.2.
| Acknowledgments |
|---|
We would like to thank Martin Madera and Robert Finn for the valuable information about theoretical and practical aspects of PRC. Johannes Söding kindly answered numerous questions about his HHsearch algorithm. The authors are grateful for the valuable suggestions and corrections made by the reviewers. B.S.-B. is funded by the Wellcome Trust.
Received on February 8, 2005; revised on March 29, 2005; accepted on March 31, 2005
| REFERENCES |
|---|
Bateman, A. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Durbin, R., Eddy, S.R., Krogh, A. and Mitchison, G. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK.
Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Eddy, S.R. (2001) HMMER User's Guide: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.2. Washington University School of Medicine, http://hmmer.wustl.edu
Edgar, R.C. and Sjölander, K. (2004a) COACH: profile–profile alignment of protein families using hidden Markov models. Bioinformatics, 20, 1309–1318.
Edgar, R.C. and Sjölander, K. (2004b) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.
Lyngsø, R. et al. (1999) Metrics and similarity measures for hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1999, 178–186.
Madera, M. (2005) PRC – the profile comparer.
Schneider, T.D. and Stephens, R. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
Schuster-Böckler, B., Schultz, J. and Rahmann, S. (2004) HMM Logos for visualization of protein families. BMC Bioinformatics, 5, 7.
Söding, J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.