Bioinformatics Advance Access originally published online on September 17, 2004
Bioinformatics 2005 21(2):257-259; doi:10.1093/bioinformatics/bth489
Bioinformatics vol. 21 issue 2 © Oxford University Press 2005; all rights reserved.
RALEE: RNA ALignment Editor in Emacs
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CAMBS, CB10 1SA, UK
| Abstract |
|---|
Summary: Production of high-quality multiple sequence alignments of structured RNAs relies on an iterative combination of manual editing and structure prediction. An essential feature of an RNA alignment editor is the facility to mark up the alignment based on how it matches a given secondary structure prediction, but few available alignment editors offer such a feature. The RALEE (RNA ALignment Editor in Emacs) tool provides a simple environment for RNA multiple sequence alignment editing, including structure-specific colour schemes, the use of helper applications for structure prediction, and many more conventional editing functions. This is accomplished by extending the commonly used text editor, Emacs, which is available for Linux, most UNIX systems, Windows and Mac OS.
Availability: The ELISP source code for RALEE is freely available from http://www.sanger.ac.uk/Users/sgj/ralee/ along with documentation and examples.
Contact: sgj{at}sanger.ac.uk
| INTRODUCTION |
|---|
Non-coding RNA (ncRNA) genes often produce structured RNA products, some of the best known of which are involved in essential ribonucleoprotein complexes, such as the ribosome and the spliceosome. Such structured RNAs are often poorly conserved in sequence, but conserve a secondary structure with patterns of base covariation. This covariation forms the basis of several algorithms for de novo prediction of ncRNA genes (Rivas and Eddy, 2001; di Bernardo et al., 2003). Statistical models incorporating both sequence and structure information [termed covariance models, or stochastic context-free grammars (Eddy, 2002)] have recently allowed the Rfam database of ncRNA families to be established (http://www.sanger.ac.uk/Software/Rfam/, Griffiths-Jones et al., 2003).
Computational alignment of ncRNAs is a challenging problem, because the correct alignment is often not evident without knowledge of the secondary structure. However, the best secondary structure predictions rely on comparative analysis of good multiple sequence alignments. Algorithms that align sequence and structure simultaneously are starting to emerge (http://dart.sourceforge.net/stemloc/, Holmes and Rubin, 2002; Gorodkin et al., 1997; Mathews and Turner, 2002), but are in their infancy and are often prohibitively expensive in both time and memory. Production of high quality alignments of structured RNAs is thus a laborious and iterative process of manual alignment and structure prediction.
Several excellent multiple sequence alignment editors are available, including BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), GeneDoc (http://www.psc.edu/biomed/genedoc), DCSE (http://rrna.uia.ac.be/dcse/) and Seaview (http://pbil.univ-lyon1.fr/software/seaview.html, Galtier et al., 1996). Of particular note, Jalview (http://www.jalview.org/, Clamp et al., 2004) provides extensive functionality for editing alignments of both proteins and nucleic acids. However, few editors cater specifically to the problem of aligning structured RNAs. A simple but effective solution to the problem is presented here, using ELISP extensions to the widely used, multi-platform text editor, Emacs (http://www.gnu.org/software/emacs/).
| FEATURES |
|---|
The primary requirement of an RNA alignment editing tool is to be able to mark up the alignment based on the prediction of its consensus secondary structure. Such annotation, in the form of a structure-based colouring scheme, allows the user to quickly and intuitively identify regions of the alignment that do not match the structure well, and thus to refine both the alignment and structure manually. RALEE (RNA ALignment Editor in Emacs) provides such a colouring scheme (shown in Fig. 1), as well as allowing the user to colour the alignment more conventionally by conservation or residue identity. Other RNA-specific features of RALEE include the ability to integrate secondary structure predictions of arbitrary sequences in the alignment (using an external package such as ViennaRNA, http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003), and to test how the alignment matches the new structure prediction. Helper applications also allow the user to view depictions of predicted secondary structures. Standard alignment editing methods, such as insertion and deletion of whole columns of gaps, trimming the alignment at either end and removing columns that contain only gaps, are also accessible through user-customized control character combinations, or by using the menus provided. A split-screen mode facilitates the viewing and editing of base-paired regions that may be far apart in sequence (see Fig. 1).
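The idea behind structure-based mark-up can be sketched in a few lines. The Python fragment below is an illustration only and is not taken from RALEE's ELISP source: it assumes the consensus structure is written in bracket notation (for example `<<<..>>>`), it assumes that a pair "matches" the structure when the two residues form a Watson-Crick or G-U pair, and the function names `paired_columns` and `mismatched_pairs` are hypothetical.

```python
# Illustrative sketch; names and pairing rules are assumptions, not RALEE code.
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
OPEN_TO_CLOSE = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def paired_columns(structure):
    """Return sorted (i, j) column index pairs from a bracket-notation line."""
    stacks = {opener: [] for opener in OPEN_TO_CLOSE}
    pairs = []
    for j, char in enumerate(structure):
        if char in OPEN_TO_CLOSE:
            stacks[char].append(j)
        elif char in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[char]]
            if not stack:
                raise ValueError(f"unbalanced '{char}' at column {j}")
            pairs.append((stack.pop(), j))
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure line")
    return sorted(pairs)


def mismatched_pairs(sequence, pairs):
    """Return the pairs whose residues cannot base-pair in this sequence.

    Gap characters never appear in CANONICAL_PAIRS, so a gapped position
    is reported as a mismatch.
    """
    seq = sequence.upper().replace("T", "U")
    return [(i, j) for i, j in pairs if seq[i] + seq[j] not in CANONICAL_PAIRS]


pairs = paired_columns("<<<..>>>")
print(mismatched_pairs("GGCAAGCC", pairs))  # []
print(mismatched_pairs("GACAAGAC", pairs))  # [(1, 6)]
```

An editor could colour the columns of each consistent pair and leave the mismatched ones plain, so that rows which disagree with the structure stand out.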
Using an available and well-used text editor such as Emacs as the basis for the RALEE tool has a number of advantages:
- Many core features are already available and well tested, including file handling, cut and paste, deep undo and menus. Development can therefore concentrate on useful user-driven features.
- The interface is familiar to a large user base. RALEE additions conform to the Emacs look and feel.
- Emacs is available for a wide range of platforms, including GNU/Linux, most UNIX systems, Windows and Mac OS.
| REQUIREMENTS |
|---|
RALEE extends GNU Emacs 21 (http://www.gnu.org/software/emacs/) and the vast majority of the functionality is also compatible with XEmacs 21 (http://www.xemacs.org/). If available, the ViennaRNA package (http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003) can be used as a helper application for on-the-fly structure prediction and display. RALEE at present reads alignments in Stockholm format (http://www.cgr.ki.se/cgb/groups/sonnhammer/Stockholm.html), which is the format in which the Rfam database distributes alignments of RNA families (http://www.sanger.ac.uk/Software/Rfam/). Future development should allow import and export of alignments in a variety of formats, as well as the facility to handle mark-up of pseudoknot interactions. RALEE is being used actively by Rfam curators to improve the quality of alignments in the database.
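For readers unfamiliar with Stockholm format, the following Python sketch shows how a simple alignment with a consensus structure line might be read. It is not RALEE's parser: the function name `parse_stockholm` and the example alignment are hypothetical, and the sketch assumes a single alignment per file and ignores all mark-up other than per-column `#=GC` lines.

```python
# Illustrative sketch; not RALEE code. Handles one alignment and #=GC lines only.
def parse_stockholm(text):
    """Parse one Stockholm alignment into (sequences, column_annotations)."""
    sequences = {}           # name -> aligned sequence, in file order
    column_annotations = {}  # feature (e.g. "SS_cons") -> aligned string
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("# STOCKHOLM"):
            continue
        if line == "//":  # end-of-alignment marker
            break
        if line.startswith("#=GC"):
            _, feature, value = line.split(None, 2)
            column_annotations[feature] = (
                column_annotations.get(feature, "") + value.strip()
            )
        elif line.startswith("#"):
            continue  # other mark-up (#=GF, #=GS, #=GR) is skipped here
        else:
            name, residues = line.split(None, 1)
            # Appending supports interleaved files, where a name recurs per block.
            sequences[name] = sequences.get(name, "") + residues.strip()
    return sequences, column_annotations


EXAMPLE = """# STOCKHOLM 1.0
seq1          GGCAAGCC
seq2          GACAAGAC
#=GC SS_cons  <<<..>>>
//
"""

sequences, annotations = parse_stockholm(EXAMPLE)
print(sequences["seq2"], annotations["SS_cons"])
```

Keeping the structure line as a per-column string of the same length as each sequence is what lets an editor apply column edits to sequences and structure together.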
| Acknowledgments |
|---|
I would like to thank Simon Moxon, Alex Bateman, Gayle McEwan, Tobias Mourier and Ashwin Hajarnavis for testing features and providing feedback. S.G.-J. is funded by the Wellcome Trust.
Received on July 1, 2004; revised on August 16, 2004; accepted on August 17, 2004
| REFERENCES |
|---|
Clamp, M., Cuff, J., Searle, S.M., Barton, G.J. (2004) The Jalview Java alignment editor. Bioinformatics, 20, 426–427.
di Bernardo, D., Down, T., Hubbard, T. (2003) ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics, 19, 1606–1611.
Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.
Galtier, N., Gouy, M., Gautier, C. (1996) SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci., 12, 543–548.
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R. (2003) Rfam: an RNA family database. Nucleic Acids Res., 31, 439–441.
Gorodkin, J., Heyer, L.J., Stormo, G.D. (1997) Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res., 25, 3724–3732.
Hofacker, I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429–3431.
Holmes, I. and Rubin, G.M. (2002) Pairwise RNA structure comparison using stochastic context-free grammars. Proceedings, Pacific Symposium on Biocomputing, 163–174.
Mathews, D.H. and Turner, D.H. (2002) Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol., 317, 191–203.
Rivas, E. and Eddy, S.R. (2001) Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics, 2, 8.