Bioinformatics Advance Access originally published online on November 3, 2005
Bioinformatics 2006 22(1):112-114; doi:10.1093/bioinformatics/bti761

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

LogoBar: bar graph visualization of protein logos with gaps

Åsa Pérez-Bercoff 1,2,3,†, Johan Koch 1,2,† and Thomas R. Bürglin 1,2,*

1 Department of Biosciences at Novum, and Center for Genomics and Bioinformatics, Karolinska Institutet, Alfred Nobels Allé 7, SE-141 89 Huddinge, Sweden
2 Department of Life Sciences, Södertörns högskola, Alfred Nobels Allé 7, SE-141 89 Huddinge, Sweden
3 International Master's Programme in Bioinformatics, Chalmers University of Technology, Gothenburg, Sweden

*To whom correspondence should be addressed.


    ABSTRACT

Summary: LogoBar is a Java application to display protein sequence logos. In our software, gaps are accounted for when calculating the information content at each residue position in a multiple alignment. The resulting logo is displayed as a graph of bars, although the traditional letter representation is also possible. Amino acids are displayed from the bottom up in order of decreasing frequency, i.e. the most abundant residue is placed at the bottom of the logo. The bars can be color-coded according to user specifications. Gaps in the alignment are also displayed, either at the top or at the bottom of the logo. Furthermore, residues can be arranged either according to their relative abundance or grouped according to user criteria to emphasize the conserved nature of particular positions.

Availability: LogoBar and further documentation are available at http://www.biosci.ki.se/groups/tbu/logobar/

Contact: thomas.burglin{at}biosci.ki.se


    1 INTRODUCTION
Sequence logos, first described by Schneider and Stephens (1990), graphically represent the information present in a multiple sequence alignment. Each position in the logo is represented by a stack of the letters occurring at the corresponding position in the alignment. The height of each letter is proportional to its frequency, and the most common letter is placed at the top of the stack. The height of the entire stack indicates the information content at that position. Several implementations for creating sequence logos exist; two commonly used ones are available via web interfaces (Crooks et al., 2004; Gorodkin et al., 1997). Existing protein sequence logo implementations suffer from visibility problems when several characters at a position are stacked on top of each other. The letters are often substantially distorted, either very tall at highly conserved positions or extremely squashed at less conserved positions. Depending on the y-axis scale setting, certain aspects of the logos become difficult to read. Further, none of the existing applications visualizes gaps. Instead, if a column in the alignment contains gaps, the height of the displayed amino acids is adjusted to compensate (Schneider and Stephens, 1990). By displaying the gaps, as LogoBar does, gaps and poorly conserved positions can be distinguished. Further, we display the protein logo as a graph of color-coded bars instead of using only height-scaled characters. The amino acids are still represented as letters within each bar, but with normal font proportions. The goal was to improve the visibility of the information content of a protein logo.


    2 METHODS
The algorithm for calculating protein logos has been described previously (Schneider and Stephens, 1990). Here, we modify the algorithm and treat gaps in a distinct way. Even though a gap is obviously not an amino acid, it nevertheless represents a character state at a particular position in an alignment of a conserved motif, i.e. a particular position can hold a gap or one of several types of amino acids. Just as amino acids occur with different frequencies at particular positions, gaps may occur with different likelihoods at particular positions within a conserved motif. Since a protein logo should provide information about the possible states at each position, it seems logical to include gaps as well.
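To make the gap-as-state idea concrete, here is a minimal Python sketch (not LogoBar's own Java code) of a per-column frequency table in which '-' is counted as a 21st character state alongside the 20 amino acids. The alphabet constant and the function names are hypothetical, and rejecting any other symbol (e.g. ambiguity codes such as X) is an assumption of this sketch.

```python
from collections import Counter

# Assumed alphabet: the 20 standard amino acids plus '-' for a gap (21 states).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"
STATES = AMINO_ACIDS + GAP


def column(alignment: list[str], pos: int) -> str:
    """Characters at 0-based alignment position `pos`, one per sequence."""
    return "".join(seq[pos] for seq in alignment)


def state_frequencies(col: str) -> dict[str, float]:
    """Frequency of each of the 21 states in one alignment column.

    Gaps are counted like residues, so the frequencies sum to 1 over
    amino acids plus gaps rather than over amino acids alone.
    """
    col = col.upper()
    unexpected = set(col) - set(STATES)
    if unexpected:
        # Assumption: other symbols are rejected rather than silently dropped.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    counts = Counter(col)
    return {state: counts[state] / len(col) for state in STATES}


if __name__ == "__main__":
    alignment = ["ACD-", "ACE-", "GCDK"]
    freqs = state_frequencies(column(alignment, 3))
    print(freqs["-"], freqs["K"])  # 2 of the 3 sequences have a gap here
```

With gaps kept in the same table, a half-gapped column such as "AA--" retains a gap frequency of 0.5 instead of having its residues rescaled, so it can be told apart from a diverse, gap-free column such as "ACDE".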

Our modified formula is:

$$R_{\mathrm{seq}}(l) = \log_2 21 - \left( H(l) + e(n) \right), \qquad H(l) = -\sum_{b} f(b,l)\,\log_2 f(b,l)$$

where $f(b,l)$ is the frequency of character $b$ (one of the 20 amino acids or a gap) at position $l$. Here $\log_2 21$ replaces the $\log_2 20$ of the original formula, since we account for the 20 residues plus gaps. This also slightly alters the correction factor $e(n) = (s-1)/(2\ln(2)\,n)$, an approximation for the expectation of a sampled uncertainty, where $s$ is the number of possible characters (21 in LogoBar and 20 in traditional sequence logo programs) and $n$ is the number of aligned sequences. The correction factor approaches zero as the number of sequences in the alignment increases. Presently, LogoBar can handle multiple alignments created in Clustal, ClustalX or ClustalW (Thompson et al., 1997; Chenna et al., 2003). EPS output is produced via the epsgraphics.jar package developed by Paul Mutton (http://www.jibble.org/).
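The modified calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not LogoBar's implementation: it assumes the Schneider and Stephens form R_seq(l) = log2 s - (H(l) + e(n)) with s = 21, where H(l) sums over the 20 amino acids and the gap, and it assumes that, as in traditional logos, each state's height is its frequency multiplied by R_seq(l). The function names are hypothetical. For small n and diverse columns R_seq(l) can be negative; the sketch does not clamp it.

```python
import math
from collections import Counter

S_LOGOBAR = 21  # 20 amino acids + gap


def correction(n: int, s: int = S_LOGOBAR) -> float:
    """Small-sample correction e(n) = (s - 1) / (2 ln(2) n)."""
    return (s - 1) / (2 * math.log(2) * n)


def entropy(col: str) -> float:
    """H(l) = -sum_b f(b,l) log2 f(b,l); '-' counts as its own state b."""
    n = len(col)
    return -sum((c / n) * math.log2(c / n) for c in Counter(col.upper()).values())


def information(col: str) -> float:
    """R_seq(l) = log2(21) - (H(l) + e(n)); may be negative for small n."""
    return math.log2(S_LOGOBAR) - (entropy(col) + correction(len(col)))


def heights(col: str) -> dict[str, float]:
    """Assumed bar height per state: f(b,l) * R_seq(l), as in traditional logos."""
    n = len(col)
    r = information(col)
    return {b: (c / n) * r for b, c in Counter(col.upper()).items()}


if __name__ == "__main__":
    for col in ["AAAAAAAA", "AAAA----", "ACDEFGHI"]:
        print(col, round(information(col), 3))
```

For the 58-sequence alignment of Fig. 1, e(n) works out to about 0.249 bits with s = 21, compared with about 0.236 bits with s = 20, so the change in the correction itself is small.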


    3 RESULTS
When LogoBar reads a multiple sequence alignment file, it creates an output text file (output.txt) in the background. This file stores various parameters from the input file (name, length, number), as well as three tables, created by LogoBar, listing the number, the frequency and the height of the amino acids at every position in the alignment. Thus a complete data table with the frequency of each residue at each position is generated. Once the output file is created, LogoBar draws the sequence logo as a graph of colored bars with the most frequent amino acid at the bottom, i.e. the amino acids are displayed in reverse frequency order compared with other logos (Fig. 1A). Within each bar, the letter for the corresponding amino acid is shown. Gaps in the logo are also represented as bars labeled ‘-’. Various parameters modify the appearance of the bar graphs: gap bar placement, font size and residue placement within the bar. Further, for long alignments the logo can be split into blocks. Optionally, a traditional character-based logo can also be displayed. A consensus sequence is shown underneath the LogoBar graph (Fig. 1A). Optionally, additional residues down to a selectable frequency cut-off (e.g. 5%) can be shown underneath it in decreasing order of frequency (Fig. 1B).
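The bar ordering and the consensus rows described above can be sketched in Python. This is a hedged illustration rather than LogoBar's code: the function names, the alphabetical tie-breaking between equally frequent residues, and the rule that the consensus is the most frequent non-gap residue (always shown, regardless of the cut-off) are all assumptions made for the example.

```python
from collections import Counter

GAP = "-"


def bar_stack(col: str, gap_position: str = "top") -> list[tuple[str, float]]:
    """Bars for one logo position, listed bottom to top as (char, frequency).

    Residues are sorted by decreasing frequency, so the most frequent one is
    at the bottom; the gap bar, if present, goes to the top or the bottom.
    """
    if gap_position not in ("top", "bottom"):
        raise ValueError("gap_position must be 'top' or 'bottom'")
    n = len(col)
    counts = Counter(col.upper())
    gap_freq = counts.pop(GAP, 0) / n
    # Assumption: ties between equally frequent residues are broken alphabetically.
    residues = sorted(((r, c / n) for r, c in counts.items()),
                      key=lambda rf: (-rf[1], rf[0]))
    if gap_freq == 0:
        return residues
    gap_bar = (GAP, gap_freq)
    return residues + [gap_bar] if gap_position == "top" else [gap_bar] + residues


def consensus_rows(alignment: list[str], cutoff: float = 0.05) -> list[str]:
    """Row 0 is the consensus; later rows list residues with frequency >= cutoff."""
    stacks = []
    for i in range(len(alignment[0])):
        col = "".join(seq[i] for seq in alignment)
        residues = [(r, f) for r, f in bar_stack(col) if r != GAP]
        # Assumption: the top residue is always shown; the rest must pass the cut-off.
        stacks.append([r for k, (r, f) in enumerate(residues) if k == 0 or f >= cutoff])
    depth = max((len(s) for s in stacks), default=0)
    return ["".join(s[k] if k < len(s) else " " for s in stacks) for k in range(depth)]


if __name__ == "__main__":
    for row in consensus_rows(["ACD", "ACE", "GCD", "A-D"]):
        print(row)
```

Passing gap_position="bottom" mirrors the option of drawing the gap bar below the residues instead of on top of them; a column made only of gaps contributes a blank to every consensus row.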


Figure 1
Fig. 1 Protein sequence logo of a 79-residue alignment of 58 sequences created using LogoBar. (A) Screenshot of LogoBar: graph of the amino acids' relative abundance, with the most frequent residue at the bottom. Note the legend in the floating window that shows the color grouping of the amino acids. (B) Graph of color-grouped amino acids sorted according to the abundance of the color groups. Note that residues with the same color are now grouped together, e.g. at position 6. Further, the option to show all residues in decreasing order of frequency underneath the graph is used.

 
A key feature of LogoBar is that amino acids and gaps can be assigned to different groups: up to 21, but usually only 6–8, depending on how one groups amino acids according to their properties. Each group is given its own color. For example, all acidic residues can be placed in group 1, colored red, all basic residues in group 2, colored blue, and so on. The legend for these residue assignments is shown in a floating window. Hence, the colored bars of LogoBar give an immediate sense of the residue properties at particular positions. An option allows the bar graph to be sorted such that residues in the same color group are grouped together; see e.g. positions 1, 6 and 22 in Fig. 1B. We provide several default color groupings, and users can create new ones; e.g. one color scheme highlights conserved cysteine residues. The screen images drawn by LogoBar can be saved in EPS format for printing or further editing.
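The group-sorting option can be sketched in Python as below. The groups and colors in EXAMPLE_GROUPS are a hypothetical scheme made up for illustration (LogoBar's own default groupings are not reproduced here). Ordering groups by their summed frequency, with the most abundant group at the bottom, follows the description of Fig. 1B; treating the gap as just another group and the tie-breaking rules are assumptions.

```python
from collections import Counter

# Hypothetical grouping: group name -> (member characters, color).
EXAMPLE_GROUPS: dict[str, tuple[str, str]] = {
    "acidic": ("DE", "red"),
    "basic": ("KRH", "blue"),
    "polar": ("STNQ", "green"),
    "hydrophobic": ("AVLIMFWY", "black"),
    "special": ("CGP", "orange"),
    "gap": ("-", "grey"),
}


def membership(groups: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Map each character to its group, rejecting characters placed in two groups."""
    owner: dict[str, str] = {}
    for name, (members, _color) in groups.items():
        for ch in members:
            if ch in owner:
                raise ValueError(f"{ch!r} is in both {owner[ch]!r} and {name!r}")
            owner[ch] = name
    return owner


def grouped_stack(col: str, groups=EXAMPLE_GROUPS) -> list[tuple[str, str, float]]:
    """Bars from bottom to top as (char, color, frequency), grouped by color group.

    Groups are ordered by decreasing summed frequency; within a group, residues
    are ordered by decreasing frequency (ties broken by name, an assumption).
    """
    owner = membership(groups)
    counts = Counter(col.upper())
    missing = set(counts) - set(owner)
    if missing:
        raise ValueError(f"characters without a group: {sorted(missing)}")
    totals = Counter()
    for ch, c in counts.items():
        totals[owner[ch]] += c

    def key(item):
        ch, c = item
        return (-totals[owner[ch]], owner[ch], -c, ch)

    n = len(col)
    return [(ch, groups[owner[ch]][1], c / n) for ch, c in sorted(counts.items(), key=key)]


if __name__ == "__main__":
    for bar in grouped_stack("DDKKKEE"):
        print(bar)
```

For the column "DDKKKEE", plain frequency ordering would put K (3/7) at the bottom, whereas grouping places the two acidic residues D and E (4/7 together) below it, which is the kind of regrouping described for Fig. 1B.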

LogoBar uses a slightly altered algorithm when calculating the sequence logo from a multiple sequence alignment. However, the key difference between LogoBar and other sequence logo applications is the way the logo is represented. We find that our representation allows better visualization of protein logo features because of its colored bars, which can be grouped. Especially for longer sequences, the logo is easy to scroll through interactively. Further, many options for coloring bars and displaying gaps are provided. We believe these features will make this program useful to many other users.


    Acknowledgments
 
We are grateful to Liyi Meng for fruitful discussions about Java features and to Marija Cvijovic for mathematical advice. This research is supported by the Swedish Foundation for Strategic Research.

Conflict of Interest: none declared.


    FOOTNOTES
 
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Associate Editor: Christos Ouzounis

Received on June 22, 2005; revised on October 27, 2005; accepted on November 2, 2005

    REFERENCES

    Chenna, R., et al. (2003) Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res., 31, 3497–3500.

    Crooks, G., et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.

    Gorodkin, J., et al. (1997) Displaying the information contents of structural RNA alignments: the structure logos. Comput. Appl. Biosci., 13, 583–586.

    Schneider, T.D. and Stephens, M.R. (1990) Sequence logos: A new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.

    Thompson, J.D., et al. (1997) The clustal x windows interface: flexible strategies for multiple alignment aided by quality analysis tools. Nucleic Acids Res., 25, 4876–4882.




This article has been cited by other articles:


J. M. Eirin-Lopez, T. Ishibashi, and J. Ausio
H2A.Bbd: a quickly evolving hypervariable mammalian histone that destabilizes nucleosomes in an acetylation-independent way
FASEB J, January 1, 2008; 22(1): 316 - 326.


D. W. Abbott, J. M. Eirin-Lopez, and A. B. Boraston
Insight into Ligand Diversity and Novel Biological Roles for Family 32 Carbohydrate-Binding Modules
Mol. Biol. Evol., January 1, 2008; 25(1): 155 - 167.


K. Mukherjee and T. R. Burglin
MEKHLA, a Novel Domain with Similarity to PAS Domains, Is Fused to Plant Homeodomain-Leucine Zipper III Proteins.
Plant Physiology, April 1, 2006; 140(4): 1142 - 1150.