Bioinformatics Advance Access originally published online on November 3, 2005
Bioinformatics 2006 22(1):112-114; doi:10.1093/bioinformatics/bti761
LogoBar: bar graph visualization of protein logos with gaps


1Department of Biosciences at Novum, and Center for Genomics and Bioinformatics, Karolinska Institutet Alfred Nobels Allé 7, SE-141 89 Huddinge, Sweden
2Department of Life Sciences, Södertörns högskola Alfred Nobels Allé 7, SE-141 89 Huddinge, Sweden
3International Master's Programme in Bioinformatics, Chalmers University of Technology Gothenburg, Sweden
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: LogoBar is a Java application to display protein sequence logos. In our software, gaps are accounted for when calculating the information content present at each residue position in a multiple alignment. The resulting logo is displayed as a graph consisting of bars, although the traditional letter representation is also possible. Amino acids are displayed from the bottom up in order of decreasing frequency, i.e. the most abundant residue is placed at the bottom of the logo. The bars can be color-coded according to user specifications. Gaps in the alignment are also displayed, either on top or at the bottom of the logo. Furthermore, residues can either be arranged according to their relative abundance or grouped according to user criteria to emphasize the conserved nature of particular positions.
Availability: LogoBar and further documentation are available at http://www.biosci.ki.se/groups/tbu/logobar/
Contact: thomas.burglin{at}biosci.ki.se
| 1 INTRODUCTION |
|---|
Sequence logos, first described by Schneider and Stephens (1990), graphically represent the information present in a multiple sequence alignment. Each position in the logo is represented by a stack of the letters occurring at the corresponding position in the alignment. The height of each character is proportional to its frequency, and the most common character is placed at the top of the stack. The height of the entire stack indicates the information content available at that position. Several implementations for creating sequence logos exist; two commonly used ones are available via a web interface (Crooks et al., 2004; Gorodkin et al., 1997). Existing protein sequence logo implementations suffer from visibility problems when several characters at a position are stacked on top of each other. The letters are often substantially distorted, either very tall at highly conserved positions or extremely squashed at less conserved positions. Depending on the settings of the y-axis scale, certain aspects of the logos become difficult to read. Further, none of the existing applications visualizes gaps. Instead, if a column in the alignment contains gaps, the height of the displayed amino acids is adjusted to compensate for this (Schneider and Stephens, 1990). By displaying the gaps, as we do in LogoBar, gapped positions can be distinguished from poorly conserved positions. Further, we display the protein logo as a graph consisting of color-coded bars instead of using only height-scaled characters. The amino acids are still represented as letters within each bar, but with normal font proportions. The goal was to improve the visibility of the information content of a protein logo.
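For reference, the stack and letter heights described above follow the standard formulation of Schneider and Stephens (1990). The notation below is that of the original logo method, not necessarily LogoBar's own: $f(b,l)$ is the frequency of character $b$ at position $l$, $s$ is the number of possible character states (20 for amino acids) and $e(n)$ is the small-sample correction for $n$ sequences.

$$
H(l) = -\sum_{b} f(b,l)\,\log_2 f(b,l), \qquad
R_{seq}(l) = \log_2 s - \bigl(H(l) + e(n)\bigr), \qquad
\text{height}(b,l) = f(b,l)\,R_{seq}(l)
$$

For proteins, a perfectly conserved position therefore reaches at most $\log_2 20 \approx 4.32$ bits.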
| 2 METHODS |
|---|
The algorithm for calculating protein logos has been described previously (Schneider and Stephens, 1990). Here, we modify the algorithm to treat gaps as a distinct character state. Even though a gap is obviously not an amino acid, it nevertheless represents a character state at a particular position in an alignment of a conserved motif, i.e. a particular position can contain a gap or different types of amino acids. Just as amino acids occur with different frequencies at particular positions, gaps occur with different likelihoods at particular positions within a conserved motif. Since a protein logo should provide information about the possible states at each position, it seems logical to include gaps as well.
Our modified formula looks like this:
[Equation image not available.]
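The modified formula itself is not reproduced here, so the following is only a minimal sketch of one plausible reading of the text: the gap is counted as a 21st character state alongside the 20 amino acids, and the Schneider and Stephens form (maximum entropy minus observed entropy) is applied without the small-sample correction. The function name `column_heights` and the constant `STATES` are hypothetical and are not taken from LogoBar.

```python
import math
from collections import Counter

# ASSUMPTION: 20 standard amino acids plus the gap character form 21 states.
STATES = "ACDEFGHIKLMNPQRSTVWY-"


def column_heights(column, states=STATES):
    """Return (information_bits, {symbol: bar_height}) for one alignment column.

    Hypothetical helper: R = log2(S) - H with the gap treated as an ordinary
    state (S = 21) and no small-sample correction. This is an assumption,
    not LogoBar's published formula.
    """
    # Count only recognised symbols; anything else is ignored.
    counts = Counter(ch for ch in column.upper() if ch in states)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {sym: n / total for sym, n in counts.items()}
    # Shannon entropy of the observed symbol distribution, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(len(states)) - entropy
    # Each bar gets a share of the stack height proportional to its frequency.
    heights = {sym: f * info for sym, f in freqs.items()}
    return info, heights


info, heights = column_heights("AA--")
print(round(info, 3), heights)
```

Under these assumptions, a column that is half gaps yields a gap bar as tall as the residue bar, so a gapped position stays visually distinct from a position that is merely poorly conserved.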
| 3 RESULTS |
|---|
When LogoBar reads a multiple sequence alignment file, it creates an output text file [output.txt] in the background. This output file stores various parameters from the input file (name, length, number), as well as three tables created by LogoBar containing the count, the frequency and the height of the amino acids at every position in the alignment. A complete data table with the frequency of each residue at each position is thus generated. Once the output file is created, LogoBar draws the sequence logo as a graph consisting of colored bars with the most frequent amino acid at the bottom, i.e. the amino acids are displayed in reversed frequency order compared with other logos (Fig. 1A). Within each bar, the letter for the corresponding amino acid is shown. Gaps in the logo are also represented as bars, labeled '-'. Various parameters modify the appearance of the bar graphs: gap bar placement, font size and residue placement within the bar. Further, for long alignments the logo can be split into blocks. Optionally, a traditional character-based logo can also be displayed. Underneath the LogoBar graph a consensus sequence is shown (Fig. 1A). Optionally, additional residues up to a selectable frequency cut-off value (e.g. 5%) can be shown underneath in decreasing order of frequency (Fig. 1B).
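The per-position tables, the bottom-up bar order and the consensus rows with a frequency cut-off can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, ties are broken alphabetically, and gaps are treated like any other symbol in the consensus, none of which is confirmed by the description of LogoBar.

```python
from collections import Counter


def column_tables(alignment):
    """Return a list of (counts, frequencies) dicts, one pair per column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    tables = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        freqs = {sym: n / len(alignment) for sym, n in counts.items()}
        tables.append((dict(counts), freqs))
    return tables


def bottom_up(freqs, cutoff=0.0):
    """Symbols at or above the cut-off, most frequent first (drawn at the bottom).

    ASSUMPTION: ties are broken alphabetically.
    """
    kept = [sym for sym in freqs if freqs[sym] >= cutoff]
    return sorted(kept, key=lambda sym: (-freqs[sym], sym))


def consensus_rows(alignment, cutoff=0.05):
    """Per column: consensus symbol first, then further symbols above the cut-off."""
    return [bottom_up(freqs, cutoff) for _, freqs in column_tables(alignment)]


# Toy alignment, invented for illustration.
alignment = ["ACD", "ACE", "A-E", "GCE"]
rows = consensus_rows(alignment, cutoff=0.05)
consensus = "".join(row[0] for row in rows)
print(consensus, rows)
```

The first symbol of each row gives the consensus sequence; the remaining symbols correspond to the optional extra rows shown beneath it.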
A key feature of LogoBar is that amino acids and gaps can be assigned to different groups: at most 21, but usually only 6–8, depending on how one groups amino acids according to their properties. Each group is given its own color. For example, all acidic residues are placed in group 1, which is given the color red, all basic residues are placed in group 2, which is blue, etc. The legend for these residue assignments is shown in a floating window. Hence, the colored bars of LogoBar give an immediate sense of the residue properties at particular positions. An option allows the bar graph to be sorted such that residues in the same color group are placed together, see e.g. positions 1, 6 and 22 in Fig. 1B. We have provided several default color groupings and the user can create new ones, e.g. one color scheme highlights conserved cysteine residues. The screen images drawn by LogoBar can be saved in EPS format for printing or further editing.
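The grouped sorting option can be illustrated with a short sketch. Only the acidic/red and basic/blue assignments come from the text; the other groups, the membership of each group and the ordering rule (groups ranked by combined frequency, then residues by frequency within a group) are assumptions made for illustration and are not LogoBar's documented defaults.

```python
# ASSUMPTION: illustrative grouping only; not LogoBar's shipped color schemes.
GROUPS = {
    "acidic": ("red", "DE"),
    "basic": ("blue", "KR"),
    "cysteine": ("yellow", "C"),  # hypothetical group
    "gap": ("grey", "-"),         # hypothetical group
}

# Reverse lookup: symbol -> group name.
GROUP_OF = {sym: name for name, (_, syms) in GROUPS.items() for sym in syms}


def grouped_order(freqs, group_of=GROUP_OF):
    """Order symbols bottom-up so that members of one group stay adjacent.

    Hypothetical rule: groups are ranked by their combined frequency, and
    symbols inside a group by their own frequency (ties alphabetical).
    """
    totals = {}
    for sym, f in freqs.items():
        group = group_of.get(sym, "other")
        totals[group] = totals.get(group, 0.0) + f

    def key(sym):
        group = group_of.get(sym, "other")
        return (-totals[group], group, -freqs[sym], sym)

    return sorted(freqs, key=key)


# Toy column frequencies, invented for illustration.
freqs = {"D": 0.3, "K": 0.4, "E": 0.3}
print(grouped_order(freqs))
```

In this toy column, plain frequency order would place K at the bottom, whereas the grouped order places D and E together first because the acidic group is more abundant as a whole.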
LogoBar uses a slightly altered algorithm when calculating the sequence logo from a multiple sequence alignment. However, the key difference between LogoBar and other sequence logo applications is the way in which the logo is represented. We find that our representation allows better visualization of protein logo features because its colored bars can be grouped. For longer sequences in particular, it is easy to scroll along the logo interactively. Further, many options for coloring bars and displaying gaps are provided. We believe these features will make the program of interest to many other users.
| Acknowledgments |
|---|
We are grateful to Liyi Meng for fruitful discussions about Java features and to Marija Cvijovic for mathematical advice. This research is supported by the Swedish Foundation for Strategic Research.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors. Associate Editor: Christos Ouzounis
Received on June 22, 2005; revised on October 27, 2005; accepted on November 2, 2005
| REFERENCES |
|---|
Chenna, R., et al. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res., 31, 3497–3500.
Crooks, G., et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
Gorodkin, J., et al. (1997) Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci., 13, 583–586.
Schneider, T.D. and Stephens, R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
Thompson, J.D., et al. (1997) The Clustal X windows interface: flexible strategies for multiple alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.




