Bioinformatics Advance Access originally published online on August 19, 2008
Bioinformatics 2008 24(19):2252-2253; doi:10.1093/bioinformatics/btn428

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

A toolkit for analysing large-scale plant small RNA datasets

Simon Moxon 1,†, Frank Schwach 1,†, Tamas Dalmay 2, Dan MacLean 3, David J. Studholme 3 and Vincent Moulton 1,*

1School of Computing Sciences, 2School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ and 3The Sainsbury Laboratory, Colney Lane, Norwich, NR4 7UH, UK

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Recent developments in high-throughput sequencing technologies have generated considerable demand for tools to analyse large datasets of small RNA sequences. Here, we describe a suite of web-based tools for processing plant small RNA datasets. Our tools can be used to identify microRNAs and their targets, compare expression levels in sRNA loci, and find putative trans-acting siRNA loci.

Availability: The tools are freely available for use at http://srna-tools.cmp.uea.ac.uk

Contact: vincent.moulton@cmp.uea.ac.uk


    1 INTRODUCTION
Several classes of small (20–30 nt) non-coding RNAs (sRNAs) can be distinguished by their biogenesis and their function in post-transcriptional gene regulation and epigenetic control in plants, animals and fungi (for reviews see: Brodersen and Voinnet, 2006; Lippman and Martienssen, 2004). MicroRNAs (miRNAs) and trans-acting siRNAs (ta-siRNAs) are two important classes of sRNAs that both induce post-transcriptional silencing of target genes. Computationally, miRNAs can be identified by their characteristic fold-back precursors, while ta-siRNAs are found by a ‘phased’ alignment pattern at their genomic regions of origin (Axtell et al., 2006).

Novel high-throughput sequencing technologies greatly facilitate small RNA detection and analysis (Hafner et al., 2007). However, the lack of supporting data analysis tools presents a major bottleneck. Here, we present an easy-to-use web-based toolkit that is specifically geared towards the analysis of large-scale plant sRNA datasets. Plant-specific tools are necessary because of important differences in biogenesis and mode of action between plant and animal sRNAs (Millar and Waterhouse, 2005).


    2 DESCRIPTION OF THE TOOLS
2.1 miRCat: miRNA detection
miRCat identifies mature miRNAs and their precursors. Users upload a FASTA file of sRNA sequences, which are mapped to a plant genome using PatMaN (Prüfer et al., 2008) and grouped into loci. To enrich for miRNA candidates, the software applies a number of empirical and published criteria for bona fide miRNA loci (Jones-Rhoades et al., 2006); details are listed on the tool's website. In brief, the program searches for a two-peak alignment pattern of sRNAs on one strand of the locus and assesses the secondary structures of a series of putative precursor transcripts using the RNAfold (Hofacker et al., 1994) and randfold (Bonnet et al., 2004) programs. miRCat produces three output files: (i) a comma-separated text (csv) file with the details for predicted miRNA candidates, (ii) the RNAfold output for candidate precursors and (iii) a FASTA file of predicted mature miRNA sequences.

miRCat has been tested on several high-throughput plant sRNA datasets and shows a high level of sensitivity and specificity. When tested on a publicly available Arabidopsis leaf sRNA dataset (GEO accession GSM118373; Rajagopalan et al., 2006) containing 186 899 sRNA sequences, miRCat predicted 89 miRNA loci using default parameters. Eighty-three of these predictions were known miRNA sequences and six were novel miRNA loci (Fig. 1a). The dataset contained 91 known miRNA loci with an sRNA abundance of five or more (the default threshold for miRCat), so this corresponds to a sensitivity of 91.2% (83 of 91). Even if all six novel predictions were false positives, the specificity would still be 99.93% (8362 loci tested). As a web-based tool, miRCat complements related software developed for local installation and command line use, such as a recently published program for discovering miRNAs in animal datasets (Friedländer et al., 2008).
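The sensitivity and specificity figures quoted for the Arabidopsis leaf dataset can be checked with a few lines of arithmetic. The sketch below is only one reading of those numbers: it assumes that the negatives are the tested loci that are not among the 91 detectable known miRNA loci, and that all six novel predictions are counted as false positives. The function names are illustrative and are not part of miRCat.

```python
def sensitivity(true_positives: int, known_positives: int) -> float:
    """Fraction of detectable known miRNA loci that were recovered."""
    return true_positives / known_positives


def worst_case_specificity(false_positives: int, loci_tested: int,
                           known_positives: int) -> float:
    """Specificity if every novel prediction is treated as a false positive.

    Assumption: negatives = loci tested minus the known miRNA loci.
    """
    negatives = loci_tested - known_positives
    return (negatives - false_positives) / negatives


# Figures taken from the text above.
sens = sensitivity(true_positives=83, known_positives=91)
spec = worst_case_specificity(false_positives=6, loci_tested=8362,
                              known_positives=91)

print(f"sensitivity  = {sens:.1%}")
print(f"specificity >= {spec:.2%}")
```

Under these assumptions the two values round to the 91.2% and 99.93% reported above.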


Fig. 1. Example results: (a) secondary structure plot of a putative novel miRNA identified by miRCat in a publicly available dataset (Rajagopalan et al., 2006). The miRNA and miRNA* are highlighted in red and purple, respectively, using the RNAfold/annotate tool on our website. (b) The top-ranking differentially expressed locus found by SiLoCo in a comparison of two public Arabidopsis sRNA datasets (leaf and flower tissue; Rajagopalan et al., 2006), shown in the ASRP genome browser (Backman et al., 2008). The identified sRNA locus is highlighted in yellow and sRNA matches are shown by coloured arrows. Genome browser tracks: genes (top), leaf sRNAs (middle), flower sRNAs (bottom).

 
2.2 SiLoCo: sRNA locus expression comparison
High-throughput sequencing can be used to compare sRNA expression profiles under varying conditions or between mutants and wild-type to gain insights into the biogenesis and function of sRNAs. Plant sRNA populations are highly complex, with many genomic loci producing highly diverse sets of sRNAs. In such cases, individual sequences may not be found more than once even in very large datasets, so it becomes necessary to group sRNAs by their locus of origin in the genome and to compare expression at the level of loci rather than individual sequences. Such an approach also needs to take into account the degree of repetitiveness of sRNA matches to the genome.

SiLoCo identifies sRNA loci on plant genomes from two sRNA datasets, which can be uploaded by the user and/or selected from publicly available datasets. SiLoCo maps sRNA sequences to the genome using PatMaN (Prüfer et al., 2008) and weights each sRNA hit by its repetitiveness in the genome. Loci are defined as described previously (Mosher et al., 2008; Molnár et al., 2007) by a minimum number of sRNA hits to a region and a maximum ‘gap’, i.e. absence of sRNA hits, between them. Hit counts are normalized to the total number of genome-matching reads in each sample to make them comparable. For each locus, the log2 ratio and the average of the normalized sRNA hit counts are calculated and ranked independently. The sum of the two ranks is also provided, and the results can be downloaded as a csv-formatted file. Sorting the list of loci by the rank sum in a spreadsheet program is an easy way of finding the best candidates for differentially expressed loci, where sRNA abundance differs greatly at a high overall expression level (Fig. 1b). Hyperlinks to some public genome browsers can also be included in the result file.
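The locus definition and the ranking scheme described above can be illustrated with a short sketch. This is not the SiLoCo implementation. Several details are assumptions made only for the example: the hit weight is taken to be 1 divided by the number of genome matches, counts are scaled to hits per million genome-matching reads, a pseudocount avoids division by zero, and loci are ranked by the absolute log2 ratio. All function and parameter names are hypothetical.

```python
import math


def hit_weight(genome_matches: int) -> float:
    """Weight of one sRNA hit (assumed: 1 / number of genome matches)."""
    return 1.0 / genome_matches


def group_into_loci(positions, max_gap, min_hits):
    """Group hit start positions on one chromosome into (start, end) loci.

    A new locus starts when the distance to the previous hit exceeds
    max_gap; groups with fewer than min_hits hits are discarded.
    """
    loci, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_hits:
                loci.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) >= min_hits:
        loci.append((current[0], current[-1]))
    return loci


def compare_loci(counts_a, counts_b, total_a, total_b, pseudocount=1.0):
    """Rank loci by log2 ratio and by average normalized count.

    counts_a, counts_b: weighted hit counts per locus in each sample.
    total_a, total_b: genome-matching reads in each sample.
    Returns rows sorted by rank sum (smallest = best candidate).
    """
    rows = []
    for locus in sorted(counts_a.keys() | counts_b.keys()):
        norm_a = counts_a.get(locus, 0.0) / total_a * 1e6
        norm_b = counts_b.get(locus, 0.0) / total_b * 1e6
        rows.append({
            "locus": locus,
            "log2_ratio": math.log2((norm_a + pseudocount) /
                                    (norm_b + pseudocount)),
            "average": (norm_a + norm_b) / 2,
        })
    # Rank 1 is the largest absolute log2 ratio or the largest average.
    by_ratio = sorted(rows, key=lambda r: abs(r["log2_ratio"]), reverse=True)
    by_average = sorted(rows, key=lambda r: r["average"], reverse=True)
    for rank, row in enumerate(by_ratio, start=1):
        row["ratio_rank"] = rank
    for rank, row in enumerate(by_average, start=1):
        row["average_rank"] = rank
    for row in rows:
        row["rank_sum"] = row["ratio_rank"] + row["average_rank"]
    return sorted(rows, key=lambda r: r["rank_sum"])


# Toy data, invented for illustration only.
print(group_into_loci([100, 120, 150, 1000, 1010, 5000],
                      max_gap=100, min_hits=2))
sample_a = {"locA": 50.0, "locB": 5.0, "locC": 10.0}
sample_b = {"locA": 2.0, "locB": 5.0, "locC": 40.0}
for row in compare_loci(sample_a, sample_b, total_a=1000, total_b=1000):
    print(row["locus"], round(row["log2_ratio"], 2), row["rank_sum"])
```

Sorting by the rank sum favours loci that combine a large fold change with high overall abundance, which is the selection strategy suggested in the text.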

2.3 ta-siRNA prediction
ta-siRNAs are produced from a double-stranded RNA molecule. Alignments of ta-siRNAs to their region of origin exhibit a characteristic ‘phased’ pattern (Axtell et al., 2006) that can be identified computationally. Our tool is a web-based implementation of an algorithm proposed by Chen et al. (2007) for calculating the probability of obtaining the observed percentage (or more) of phased sRNA matches by chance. An adjustable P-value cutoff is used to filter for loci with a significant degree of 21 nt phasing. Results are downloadable as a csv file. A test run with a publicly available Arabidopsis dataset (Rajagopalan et al., 2006) returned eight candidate loci, including four known ta-siRNA loci and three phased loci also reported by Chen et al. (2007).
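The idea of asking how likely the observed number of phased matches is by chance can be sketched as a hypergeometric tail probability. The code below is a generic illustration, not the exact formulation of Chen et al. (2007) or of our tool: it assumes that the occupied sRNA start positions in a window are drawn without replacement from all possible positions, a fixed subset of which are in phase. The window size and the number of phased positions in the usage example are illustrative values, and the function name is hypothetical.

```python
from math import comb


def phasing_p_value(n_positions: int, n_phased_positions: int,
                    n_occupied: int, k_phased: int) -> float:
    """P(X >= k_phased) for a hypergeometric random variable X.

    n_positions:        possible sRNA start positions in the window
    n_phased_positions: positions that are in the phase register
    n_occupied:         distinct positions with at least one sRNA match
    k_phased:           occupied positions that are in phase
    """
    upper = min(n_occupied, n_phased_positions)
    if not 0 <= k_phased <= upper:
        raise ValueError("k_phased is outside the possible range")
    out_of_phase = n_positions - n_phased_positions
    tail = sum(
        comb(n_phased_positions, x) * comb(out_of_phase, n_occupied - x)
        for x in range(k_phased, upper + 1)
    )
    return tail / comb(n_positions, n_occupied)


# Illustrative window: 461 possible positions, 21 of them in phase,
# 10 occupied positions of which 6 are in phase.
p = phasing_p_value(n_positions=461, n_phased_positions=21,
                    n_occupied=10, k_phased=6)
print(f"p = {p:.3g}")
```

A locus would be reported when this probability falls below the chosen P-value cutoff.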

2.4 Helper tools
We provide a web tool to find target transcripts of sRNAs based on published rules for plant miRNAs (Allen et al., 2005; Schwab et al., 2005). This tool allows batch searching of up to 50 sRNAs against 20 different plant gene datasets. In addition, we provide an interface to the RNAfold/RNAplot programs (Hofacker et al., 1994) that allows the visualization of miRNA candidates. This tool accepts a precursor RNA sequence and one or more sRNA sequences, which are highlighted on the resulting secondary structure (Fig. 1a).


    3 DISCUSSION
High-throughput sRNA sequencing has great potential to identify new members of known sRNA classes, especially in tissues or under environmental conditions that have not yet been investigated. The technology can also be used to compare sRNA profiles and so gain further insights into sRNA biogenesis and function. Our tools are well suited to these types of analyses on plant sRNA data and are easy to use.


    ACKNOWLEDGEMENTS
The authors wish to thank D.C. Baulcombe and K. Kelly for helpful ideas and discussions and A. Courtenay, C. Collins and M. Burrell for IT support.

Funding: This work was supported by the Biotechnology and Biological Sciences Research Council [grant number BB/E004091/1] and the Gatsby Charitable Foundation (to D.M. and D.J.S.).

Conflict of Interest: none declared.


    FOOTNOTES
 
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Associate Editor: Ivo Hofacker

Received on May 27, 2008; revised on July 14, 2008; accepted on August 10, 2008

    REFERENCES

    Allen E, et al. microRNA-directed phasing during transacting siRNA biogenesis in plants. Cell (2005) 121:207–221.

    Axtell MJ, et al. A two-hit trigger for siRNA biogenesis in plants. Cell (2006) 127:565–577.

    Backman TW, et al. Update of ASRP: the Arabidopsis small RNA project database. Nucleic Acids Res (2008) 36:D982–D985.

    Brodersen P, Voinnet O. The diversity of RNA silencing pathways in plants. Trends Genet (2006) 22:268–280.

    Bonnet E, et al. Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics (2004) 20:2911–2917.

    Chen H, et al. Bioinformatic prediction and experimental validation of a microRNA-directed tandem trans-acting siRNA cascade in Arabidopsis. Proc. Natl Acad. Sci. USA (2007) 104:3318–3323.

    Friedländer MR, et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat. Biotechnol (2008) 26:407–415.

    Hafner M, et al. Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods (2007) 44:3–12.

    Hofacker IL, et al. Fast folding and comparison of RNA secondary structures. Monatsh. Chem (1994) 125:167–188.

    Jones-Rhoades MW, et al. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol (2006) 57:19–53.

    Lippman Z, Martienssen R. The role of RNA interference in heterochromatic silencing. Nature (2004) 431:364–370.

    Millar A, Waterhouse PM. Plant and animal microRNAs: similarities and differences. Funct. Integr. Genomics (2005) 5:129–135.

    Molnár A, et al. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature (2007) 447:1126–1129.

    Mosher RA, et al. PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc. Natl Acad. Sci. USA (2008) 105:3145–3150.

    Prüfer K, et al. PatMaN: rapid alignment of short sequences to large databases. Bioinformatics (2008) 24:1530–1531.

    Rajagopalan R, et al. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev (2006) 20:3407–3425.

    Schwab R, et al. Specific effects of microRNAs on the plant transcriptome. Dev. Cell (2005) 8:517–527.




