Bioinformatics Advance Access originally published online on October 6, 2007
Bioinformatics 2007 23(22):3093-3094; doi:10.1093/bioinformatics/btm489
BiasViz: visualization of amino acid biased regions in protein alignments
1Molecular Medicine, Ottawa Health Research Institute, 501 Smyth Road, Ottawa, ON, Canada K1H 8L6, 2Department of Cell and Developmental Biology, John Innes Centre, Norwich UK and 3Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Canada
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: About a third of all protein sequences have at least one composition-biased region (CBR). Such regions might act as linkers between protein domains but often confer specific binding to various molecules; therefore, characterizing their boundaries and over-represented residues is important. Analysis of CBRs in a particular sequence can be time-consuming if several types of bias have to be explored and their positions visualized. The significance of the detected CBRs can be assessed by comparison with homologous protein sequences. To assist this procedure, we have developed BiasViz, a tool for graphically studying local amino acid composition in the protein sequences of a multiple sequence alignment.
Availability: The BiasViz Java applet and source code can be accessed at http://biasviz.sourceforge.net
Contact: matthuska{at}alumni.uwaterloo.ca
| 1 INTRODUCTION |
|---|
Most protein sequences are a complex series of amino acids with side chains of varied properties. However, about a third of protein sequences contain composition-biased regions (CBRs), also described as regions of low complexity, that are unusually rich in one amino acid or in amino acids with similar properties (Wootton, 1994). CBRs can act as flexible linkers (spacers) between compact domains, in which case their precise sequence is unimportant, but they have also been reported to function in the binding of proteins and other substrates (see e.g. Sim and Creamer, 2004; Ulbert et al., 2006; Wootton, 1994 and references therein). It is therefore important to characterize the extent and composition properties of such regions.
In some cases, the characterization of a CBR is straightforward (e.g. the N-terminal polyglutamine tract of mammalian Huntingtins, which in the human sequence is a series of 23 consecutive glutamine residues). However, functionally relevant composition biases are often small (e.g. a frequency of 30% for a given amino acid in a region, in contrast to 10% in the unbiased regions), and the property of the amino acids involved in the bias might not be obvious at first sight: the bias can be produced by a particular amino acid, such as lysine, or by amino acids that share a property, such as being positively charged or polar.
Programs have been developed to detect low-complexity regions [e.g. seg (Wootton and Federhen, 1996)]. These programs are routinely used to filter out such regions before sequence analysis to avoid false positives in pairwise sequence comparisons (Bork and Koonin, 1998), but they do not report the significance or the composition bias of the region. Pairwise sequence comparison algorithms can assess the statistical significance of the similarity between phylogenetically divergent proteins, but they assume that local amino acid composition is close to random and therefore cannot be used to characterize CBRs by sequence similarity (Altschul et al., 1994). As a result, CBRs tend to escape homology detection by pairwise sequence comparisons. An alternative way to assess the significance of a CBR in a particular protein sequence is to examine whether the CBR is present in some of its homologous sequences at equivalent positions of their multiple sequence alignment (MSA) (Sim and Creamer, 2004).
We recently applied this idea to analyze the CBR of AIR9-like proteins (characterized by a basic serine/threonine-rich region) implicated in microtubule binding (Buschmann et al., 2006). CBRs of AIR9-like proteins from plants show little homology in linear sequence alignments, but show a conserved bias for basic and hydroxylated residues. This became obvious after an MSA of the family was studied and plots of sequence composition of the plant members were compared with those of other sequences of the family (Buschmann et al., 2007). Following these ideas, we have developed a tool, BiasViz, which allows the interactive visualization of amino acid composition biases of protein sequences in a multiple sequence alignment at variable ranges.
| 2 JAVA TOOL |
|---|
Input to BiasViz is a multiple sequence alignment in FASTA format (one example is preloaded at the BiasViz web site). The alignment is entered into a simple web form which, when submitted, launches the BiasViz applet. Once the applet has loaded, the alignment is displayed along with controls that change how sequence composition is visualized, including the amino acids of interest and the window size.
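As a rough illustration of this input step, the sketch below reads an aligned FASTA text into a map from sequence name to aligned row and checks that all rows have the same length. It is not taken from the BiasViz source: the class name `AlignedFastaReader` and its `parse` method are hypothetical, and the code targets a current Java release rather than Java 1.5.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of BiasViz): parses an aligned FASTA text. */
final class AlignedFastaReader {

    /** Returns sequence name -> aligned row, preserving input order. */
    static Map<String, String> parse(String fasta) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String rawLine : fasta.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.startsWith(">")) {
                // Header line: start a new sequence record.
                String name = line.substring(1).trim();
                if (rows.containsKey(name)) {
                    throw new IllegalArgumentException("Duplicate name: " + name);
                }
                current = new StringBuilder();
                rows.put(name, current);
            } else {
                if (current == null) {
                    throw new IllegalArgumentException("Sequence data before first header");
                }
                current.append(line); // sequences may span several lines
            }
        }

        // In an alignment every row must have the same number of columns.
        Map<String, String> alignment = new LinkedHashMap<>();
        int width = -1;
        for (Map.Entry<String, StringBuilder> entry : rows.entrySet()) {
            String row = entry.getValue().toString();
            if (width < 0) {
                width = row.length();
            } else if (row.length() != width) {
                throw new IllegalArgumentException(
                        "Row " + entry.getKey() + " has length " + row.length()
                        + ", expected " + width);
            }
            alignment.put(entry.getKey(), row);
        }
        return alignment;
    }
}
```

Rejecting rows of unequal length early is a design choice of this sketch; it catches unaligned FASTA input before any per-column composition is computed.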
The user can select any combination of amino acids for composition analysis and view the sum of their local frequencies in the sequences contained in the alignment. The visualization is generated by running a sliding window across each sequence (excluding the gaps inserted in the alignment) and recording the fraction of amino acids within the window that belong to the set of amino acids selected by the user. This information can be displayed on a scale from white (100%) to black (0%), or scaled up so that the location with the highest value is displayed as white (Fig. 1a). A threshold can be set so that intensity values above a cutoff are displayed as white and those below it as black (Fig. 1c). Alignment gaps are represented in red. Output from the program can be saved as a comma-delimited table containing the currently displayed intensity values at each location in each sequence, which can be used for further graphing (Fig. 1b).
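The sliding-window calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the BiasViz implementation: the class `BiasWindowSketch` and its methods are hypothetical, the window is assumed to be centred on each residue and truncated at the sequence ends, `-` and `.` are assumed to be the gap characters, gap columns are reported as `NaN`, and values equal to the cutoff are treated as white.

```java
import java.util.Arrays;

/** Hypothetical sketch of a gap-skipping sliding-window composition scan. */
final class BiasWindowSketch {

    private static boolean isGap(char c) {
        return c == '-' || c == '.'; // assumed gap characters
    }

    /**
     * For one aligned row, returns one value per alignment column: the
     * fraction (0..1) of selected residues in a window centred on that
     * residue, counting ungapped residues only. Gap columns are NaN.
     */
    static double[] windowFractions(String alignedRow, String selected, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        String residueSet = selected.toUpperCase();
        int n = alignedRow.length();

        // Alignment columns that hold residues, so the window skips gaps.
        int[] columns = new int[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!isGap(alignedRow.charAt(i))) {
                columns[count++] = i;
            }
        }

        // prefix[k] = number of selected residues among the first k residues.
        int[] prefix = new int[count + 1];
        for (int k = 0; k < count; k++) {
            char aa = Character.toUpperCase(alignedRow.charAt(columns[k]));
            prefix[k + 1] = prefix[k] + (residueSet.indexOf(aa) >= 0 ? 1 : 0);
        }

        double[] out = new double[n];
        Arrays.fill(out, Double.NaN);
        for (int k = 0; k < count; k++) {
            int start = Math.max(0, k - (window - 1) / 2);
            int end = Math.min(count, k - (window - 1) / 2 + window);
            out[columns[k]] = (prefix[end] - prefix[start]) / (double) (end - start);
        }
        return out;
    }

    /** Rescales so that the highest value becomes 1.0 (white); NaN stays NaN. */
    static double[] scaleToMax(double[] values) {
        double max = 0.0;
        for (double v : values) {
            if (!Double.isNaN(v) && v > max) {
                max = v;
            }
        }
        double[] out = values.clone();
        if (max > 0.0) {
            for (int i = 0; i < out.length; i++) {
                out[i] = values[i] / max;
            }
        }
        return out;
    }

    /** Maps values at or above the cutoff to 1.0 (white), others to 0.0 (black). */
    static double[] threshold(double[] values, double cutoff) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Double.isNaN(values[i]) ? Double.NaN
                    : (values[i] >= cutoff ? 1.0 : 0.0);
        }
        return out;
    }
}
```

For example, `BiasWindowSketch.windowFractions("KK-AKA", "K", 3)` gives 1.0 for the first column (a truncated window containing two lysines) and `NaN` for the gap column. Using prefix sums keeps the scan linear in the row length regardless of window size.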
2.1 Technical specifications
BiasViz is implemented as a Java 1.5 applet and includes a small PHP form for entering the multiple sequence alignment to be visualized. As such, the program runs on any platform in a standard browser that has the Java plug-in installed; BiasViz itself requires no installation. The source code is licensed under the permissive MIT open source license (http://www.opensource.org/licenses/mit-license.php).
| 3 CONCLUSION |
|---|
BiasViz has been developed to assist the analysis of amino acid composition bias in sets of protein sequences arranged in a multiple sequence alignment. BiasViz fills a gap that is not covered by algorithms to study sequence complexity or by pairwise sequence comparison methods. We expect that this tool will serve molecular biologists wishing to explore and describe composition bias in protein families and to produce graphical representations for the communication of results.
| ACKNOWLEDGEMENTS |
|---|
M.A.A. is a recipient of a Canada Research Chair in Bioinformatics. H.B. was supported by a BBSRC grant to Clive W. Lloyd (John Innes Centre, Norwich, UK).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Limsoon Wong
Received on July 25, 2007; revised on July 25, 2007; accepted on September 14, 2007
| REFERENCES |
|---|
Altschul SF, et al. Issues in searching molecular sequence databases. Nat. Genet. (1994) 6:119–129.
Bork P, Koonin EV. Predicting functions from protein sequences – where are the bottlenecks? Nat. Genet. (1998) 18:313–318.
Buschmann H, et al. Microtubule-associated AIR9 recognizes the cortical division site at preprophase and cell-plate insertion. Curr. Biol. (2006) 16:1938–1943.
Buschmann H, et al. Homologues of Arabidopsis microtubule-associated AIR9 in trypanosomatid parasites: hints on evolution and function. Plant Signal. Behav. (2007) 16:1938–1943.
Howard L, et al. Interaction of the metalloprotease disintegrins MDC9 and MDC15 with two SH3 domain-containing proteins, endophilin I and SH3PX1. J. Biol. Chem. (1999) 274:31693–31699.
Huang L, et al. Screen and identification of proteins interacting with ADAM19 cytoplasmic tail. Mol. Biol. Rep. (2002) 29:317–323.
Kang Q, et al. Metalloprotease-disintegrin ADAM 12 binds to the SH3 domain of Src and activates Src tyrosine kinase in C2C12 cells. Biochem. J. (2000) 352(Pt 3):883–892.
Sim KL, Creamer TP. Protein simple sequence conservation. Proteins (2004) 54:629–638.
Thompson JD, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Ulbert S, et al. Direct membrane protein-DNA interactions required early in nuclear envelope assembly. J. Cell Biol. (2006) 173:469–476.
Wolfsberg TG, et al. ADAM, a novel family of membrane proteins containing the disintegrin and metalloprotease domain: multipotential functions in cell-cell and cell-matrix interactions. J. Cell Biol. (1995) 131:275–278.
Wootton JC. Sequences with unusual amino acid compositions. Curr. Opin. Struct. Biol. (1994) 4:413–421.
Wootton JC, Federhen S. Analysis of compositionally biased regions in sequence databases. Meth. Enzymol. (1996) 266:554–571.
