Bioinformatics Advance Access originally published online on October 29, 2008
Bioinformatics 2008 24(24):2807-2813; doi:10.1093/bioinformatics/btn560
Small RNA gene identification and mRNA target predictions in bacteria
1Unité Pathogénie Bactérienne des Muqueuses, Institut Pasteur, 25-28 Rue du Docteur Roux, 75724, Paris and 2Inserm U835, UPRES JE2311, Laboratoire de Biochimie Pharmaceutique, Université de Rennes 1, 2 Av. du Pr. Léon Bernard 35043 Rennes, France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Bacterial small ribonucleic acids (sRNAs) that are neither ribosomal, transfer nor messenger RNAs were first identified in the 1960s, and their molecular functions are still under active investigation today. It is now widely accepted that most play central roles in regulating gene expression in response to environmental changes. Interestingly, some are also implicated in bacterial virulence. Functional studies revealed that a large subset of these sRNAs act by an antisense mechanism, pairing with dedicated mRNA targets, usually around their translation start sites, to modulate gene expression at the posttranscriptional level. Some sRNAs modulate protein activity or mimic the structure of other macromolecules. In the last few years, in silico methods have been developed to detect more bacterial sRNAs. Among these, computational analyses of bacterial genomes by comparative genomics have predicted the existence of a plethora of sRNAs, some of which were confirmed to be expressed in vivo. The prediction accuracy of these computational tools is highly variable and leaves room for improvement. Here we review the computational studies that have contributed to detecting sRNA genes and their mRNA targets in bacteria, and the methods for testing them experimentally. The remaining challenges are also discussed.
Contact: bfelden{at}univ-rennes1.fr
| 1 INTRODUCTION |
|---|
Small ribonucleic acids (sRNAs) function directly at the RNA level in cells in all three domains of life. sRNAs are also called nonconventional, noncoding, functional or regulatory RNAs, but are not messenger (mRNA), transfer (tRNA) or ribosomal (rRNA) RNAs. In recent years, many computational strategies have identified a wealth of sRNA candidates in various organisms, from bacteria (Vogel and Sharma, 2005) to humans (Washietl et al., 2005b). Some were subsequently validated experimentally and considered as novel sRNAs [see Gottesman et al. (2006) for the sRNAs in bacteria]. They are highly heterogeneous in size (from 50 to 600 nt long), structure and function, and are usually not translated into proteins (there are some exceptions). In bacterial metabolism, sRNAs are involved in a growing number of regulatory pathways in response to environmental changes. Until a decade ago, only a handful of bacterial sRNAs were known, but post-genomic investigations have revealed, unexpectedly, their abundance in many bacterial species (Argaman et al., 2001; Axmann et al., 2005; Ostberg et al., 2004; Pichon and Felden, 2005; Rivas and Eddy, 2001; Wassarman et al., 2001). The field of bacterial sRNA regulation has recently received a boost of attention thanks to the availability of many sequenced bacterial genomes, and it is a fast-moving and stimulating area of research. sRNAs have been most intensively tracked in Escherichia coli. To date, the majority of the prokaryotic sRNAs annotated in the available databases [e.g. the Rfam database, Griffiths-Jones et al. (2005)] were identified in this bacterium. The quest for new sRNAs among diverse bacteria is now very intense. For bacterial genes encoding proteins, sophisticated algorithms have been developed that predict coding sequences (CS) with high accuracy (Kang et al., 2007). Using computational methods to discover novel bacterial sRNAs, however, is difficult because sRNAs (i) usually do not contain recurrent nucleotide motifs (biased word occurrences) such as the ribosome binding sites or the codons specifying the 20 amino acids in protein CS, (ii) are generally small, (iii) are mostly conserved only among closely related bacterial species and (iv) are sometimes expressed only in pathotype-specific strains of the same bacterial species. Powerful general methods based on the statistical analysis of genomic sequences, RNA structure similarity searches and comparative genomics allow searching for novel sRNAs in most sequenced bacterial genomes, and they are reviewed here.
Combining the prediction of promoters and terminators, base composition statistics, genome annotations, and sequence and structure conservation suggests the existence of novel sRNA genes, which then have to be tested experimentally (Fig. 1). The effectiveness of all the reported computational methods, however, can still be improved, suggesting that new computational approaches have to be developed. Nevertheless, in the last few years, the number of validated sRNAs has increased tremendously, and thousands of candidate sRNA-encoding genes have yet to be verified experimentally. This recent success in detecting novel sRNAs has come from bioinformatic searches for sRNAs outside of E.coli.
The majority of the recently discovered bacterial sRNAs are conserved only among closely related species. A significant portion of the sRNAs characterized to date interact with dedicated mRNA targets at and around their translation start sites, affecting their stability and/or translation. Trans-encoded sRNAs are encoded at genomic locations distant from the mRNA-encoding genes they regulate. The mRNA targets of each of these sRNA regulators are, for the most part, currently unknown. To complicate matters, each sRNA is usually thought to regulate the expression of more than one mRNA. As a specific example, the interactions between RNAIII from Staphylococcus aureus and its mRNA targets involve several structural domains from its 3'-domain, via one or two loop–loop interactions with its mRNA targets (Boisset et al., 2007). Based on structural probing, the RNAIII–mRNA pairings are imperfect and contain mismatches, with a few nucleotides being unpaired. Predicting such sRNA–mRNA interactions computationally is therefore difficult. Based on experimentally validated sRNA–mRNA interactions, algorithms have been developed that predict putative interaction sites and propose candidate mRNA targets (Mandin et al., 2007; Rehmsmeier et al., 2004; Tjaden et al., 2006), but their success rates can still be improved. Another growing subset of functionally characterized sRNAs corresponds to the protein sequestrators, for example the 6S RNA, which regulates complex formation between the σ70 promoter and the σ70–RNA polymerase complex. There are currently no general in silico methods to predict the existence of new sequestrator-like sRNAs. Excellent reviews summarize the identification of sRNAs by biochemical approaches (Altuvia, 2007; Hüttenhofer and Vogel, 2006) and of mRNA targets (Vogel and Wagner, 2007) in E.coli and other species (Livny and Waldor, 2007). Here, the primary focus is on predictive bioinformatic searches for sRNAs and mRNA targets in bacterial species, with their accompanying experimental testing methods.
| 2 DETECTING SMALL RNA ENCODING GENES THROUGH BIOINFORMATICS |
|---|
The methods are based on RNA primary sequences, on RNA secondary structures or, at a larger scale, on the comparison of phylogenetically related bacterial genomic sequences. Recently reported methods combine these approaches (Fig. 2, intersections).
2.1 Base-composition statistics
The hypothesis that genomic regions rich in sRNAs can be identified using local variations in single-base and dinucleotide statistics has been investigated for some prokaryotic species (Carter et al., 2001; Pichon and Felden, 2003; Rivas and Eddy, 2000; Schattner, 2002). In bacteria, the GC content of the whole genome is not correlated with the optimal growth temperature of the organism, whereas the GC content of structural RNAs is a selective response to high temperature and is higher in micro-organisms living in hot environments (Hurst and Merchant, 2001). Consequently, structural RNA sequences (e.g. tRNAs) display a higher percentage of GC than mRNAs. Computing a local GC content curve, however, is not sufficient by itself to automatically detect features such as sRNAs. In a first attempt to find sRNAs by RNA secondary structure analyses in an entire genome (Rivas and Eddy, 2000), a base composition model, formally a stochastic regular grammar with one state, was used to filter out non-sRNA sequences by calculating the local GC content of the genomic sequence. Unfortunately, no experimental validations were performed. This approach can be used to detect sRNA genes in AT-rich genomes, as demonstrated for Pyrococcus furiosus and Methanococcus jannaschii, two AT-rich archaea (Klein et al., 2002). Analyzing the local GC content of genome sequences has also proved effective for a low-GC firmicute, S.aureus (Pichon and Felden, 2005). The expression of all sRNA candidates was tested by northern blot experiments in all the organisms cited above, showing that this kind of methodology enhances the discovery of new sRNA genes. A combination of comparative genomics, gene structure prediction and local GC calculations for sRNA gene identification is implemented in the ISI software (Pichon and Felden, 2003).
ISI creates a local GC content curve by calculating the GC percentage of the sequence within a small window that moves along the genomic sequence. Other base composition statistics are computed by a more recent version of ISI (C. Pichon et al., unpublished data). The analysis of the GC content is useful for sRNA gene detection in bacteria, especially for sRNAs that are subject to structural constraints that influence their dinucleotide composition. In 2002, Schattner combined local GC content, GC skew and dinucleotide frequencies to eliminate false positive results when detecting sRNA candidates in the M.jannaschii genome (Schattner, 2002). Combining base composition statistics with RNA secondary structure predictions (Yachie et al., 2006) or with neural networks (Carter et al., 2001) has been proposed, but the usefulness of these approaches is difficult to assess because the detected candidates were not tested experimentally.
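The sliding-window calculation can be illustrated with a minimal Python sketch. It is not the ISI implementation: the function names, the window and step sizes and the `excess` threshold are assumptions chosen for illustration, and GC skew is computed here as (G − C)/(G + C).

```python
def gc_profile(seq, window=50, step=10):
    """Return (start, gc_fraction, gc_skew) for each window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc = (g + c) / window
        # GC skew is undefined when the window has no G or C; report 0.0.
        skew = (g - c) / (g + c) if g + c else 0.0
        profile.append((start, gc, skew))
    return profile


def gc_rich_windows(seq, window=50, step=10, excess=0.15):
    """Windows whose GC fraction exceeds the whole-sequence GC by `excess`.

    `excess` is an arbitrary illustrative threshold, not a published value.
    """
    seq = seq.upper()
    genome_gc = (seq.count("G") + seq.count("C")) / len(seq)
    return [(start, gc) for start, gc, _ in gc_profile(seq, window, step)
            if gc - genome_gc >= excess]
```

On an AT-rich toy sequence with an embedded GC-rich island, `gc_rich_windows` returns the windows overlapping the island. As noted above, such windows are only candidates and a GC curve alone is not sufficient to call an sRNA gene.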
2.2 RNA structure similarity searches
2.2.1 Algorithms to detect RNA structural patterns
tRNAs are highly structured RNAs acting as amino acid donors during protein synthesis. With the exception of the CCA 3'-end, the primary sequences of tRNAs vary, whereas their secondary and tertiary structures are well conserved and fold as cloverleaves and L-shapes, respectively. These conserved structural characteristics have allowed the design of specific bioinformatic software for tRNA identification in genomic sequences. For example, tRNAscan detects genes containing a cloverleaf structure with variable lengths for the stems and loops. To reduce false positives, the training is performed on a set of known tRNA genes (Fichant and Burks, 1991). Combining several tRNA genefinders led to the design of tRNAscan-SE, which detects tRNA genes with a 100% success rate (Lowe and Eddy, 1997). These approaches were also used to detect sRNAs including the 4.5S RNA (Regalia et al., 2002) and tmRNA (Laslett and Canback, 2004), with increased difficulty since such RNAs have longer, more variable and more intricate structures. Such computational approaches, however, are only applicable when the sRNA secondary structure is known. In a more general case, RNAMotif (Lambert et al., 2004; Macke et al., 2001) interprets a user-defined file in which the RNA structure template is described. This software can, in theory, describe and search for any RNA structural element. In addition, its scoring system is flexible, since users can implement their own values.
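To make the idea of a descriptor-driven structural search concrete, the following Python sketch scans a sequence for a simple hairpin template. It does not use RNAMotif or its descriptor syntax; the function name and the parameters `stem_len`, `min_loop` and `max_loop` are hypothetical, and only perfect stems with Watson-Crick or G-U pairs are accepted.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(rna, stem_len=5, min_loop=3, max_loop=8):
    """Return (start, end, loop_len) for every perfect stem-loop match.

    `start` is inclusive and `end` exclusive; the stem has exactly
    `stem_len` base pairs enclosing a loop of min_loop..max_loop nucleotides.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    hits = []
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop
            if end > n:
                break
            # Pair the k-th base of the 5' arm with the k-th base from the 3' end.
            if all((rna[start + k], rna[end - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                hits.append((start, end, loop))
    return hits
```

A real descriptor language allows mismatches, bulges, nested helices and user-defined scores; this sketch only shows how a template turns into a scan over candidate positions.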
2.2.2 Algorithms based on RNA secondary structure predictions
Bacterial sRNA secondary structures can be predicted by algorithms that compute the global minimum free energy (Zuker, 1989); such predictions should be validated experimentally using structural probes. Unfortunately, secondary structure prediction alone is not sufficient for the detection of bacterial sRNAs (Rivas and Eddy, 2000). These algorithms assume that RNAs for which the native state (minimum free energy secondary structure) is functionally important all have lower folding energy than random RNAs of the same length and dinucleotide frequency (Clote et al., 2005). These tools are useful during an initial screening but should be combined with additional tools searching for RNA structure conservation among phylogenetically related organisms. Thermodynamic methods based on free energy (ΔG) minimization, the identification of conserved structural motifs and the use of sequence covariation between phylogenetically related species are accurate tools for sRNA detection (Babak et al., 2007; Coventry et al., 2004; di Bernardo et al., 2003; Gruber et al., 2007; Rivas and Eddy, 2001; Washietl et al., 2005a). These studies describe various algorithms and scoring systems for predicting candidates but, unfortunately, some are restricted to the intergenic regions (IGRs) of the genomes, with no experimental testing of the predictions. Comparative genomics between related species, in combination with RNA structure prediction, is considered the most effective method to predict the existence of novel bacterial sRNAs (Pichon and Felden, 2005).
More recent developments (Gruber et al., 2007; Uzilov et al., 2006) are promising because they mix RNA structure identification methods and comparative genomics.
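The comparison of a sequence's folding score with that of randomized sequences can be sketched as follows. Two simplifications are made and should be kept in mind: the maximum number of nested base pairs (Nussinov-style dynamic programming) stands in for a true minimum free energy calculation, and a plain shuffle that preserves only mononucleotide composition stands in for a dinucleotide-preserving shuffle. All names and defaults are illustrative assumptions.

```python
import random
import statistics

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs, with hairpin loops >= min_loop nt."""
    n = len(rna)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def fold_zscore(rna, n_shuffles=200, seed=0):
    """Z-score of the pairing score against shuffled versions of the sequence.

    A positive value means more pairing than in the shuffled controls
    (the counterpart of a lower folding energy).
    """
    rng = random.Random(seed)
    observed = max_pairs(rna)
    letters = list(rna)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(max_pairs("".join(letters)))
    sd = statistics.pstdev(scores)
    return (observed - statistics.mean(scores)) / sd if sd else 0.0
```

In line with the caveat above, a high score from such a test is at best a first filter and says nothing about conservation across related genomes.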
2.3 Comparative genomics
The use of sequence homology between closely related species to identify novel sRNA genes in IGRs has initiated a new era in sRNA identification, initially in E.coli (Argaman et al., 2001; Rivas et al., 2001; Wassarman et al., 2001) and more recently in many other bacterial species [for recent works, see Silvaggi et al. (2006) for searches in Bacillus subtilis and del Val et al. (2007) for searches in S.meliloti genomes]. Parts of the nucleotide sequences and secondary structures of sRNA genes are conserved between closely related species (a few sRNAs, however, are present in all bacterial species); structures presumably required for function are sometimes maintained through compensatory mutations.
QRNA and RNAz are computational tools for sRNA gene prediction based on comparative sequence analysis and structure prediction (Rivas and Eddy, 2001; Washietl et al., 2005a). The QRNA software was one of the first attempts to combine RNA structure prediction and comparative genomics. QRNA is based on three probabilistic models: one for detecting coding regions, another for locating putative sRNA loci, and a null hypothesis model. Protein-coding genes are detected from the variation of the third base of codon triplets when sequences are compared, whereas the first two nucleotides of the codon are conserved. The sRNA probabilistic model was designed to detect covariation in stem-loop structures by implementing a stochastic context-free grammar (SCFG). The third model detects mutations that can occur at single positions within a nucleotide sequence. QRNA, however, can only compare two nucleotide sequences.
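The signal exploited by the coding model, mutations concentrated at third codon positions, can be illustrated with a simple counting heuristic. This is not QRNA, which scores alignments with probabilistic models; the function names and the thresholds `min_fraction` and `min_mismatches` are assumptions made for the example, and the alignment is assumed to be ungapped.

```python
def mismatches_by_codon_position(aln_a, aln_b, frame=0):
    """Count mismatches at codon positions 1, 2 and 3 of an ungapped alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    counts = [0, 0, 0]
    for i in range(frame, len(aln_a)):
        if aln_a[i] != aln_b[i]:
            counts[(i - frame) % 3] += 1
    return counts


def looks_coding(aln_a, aln_b, min_fraction=0.6, min_mismatches=3):
    """True if, in some reading frame, third positions carry most mismatches."""
    for frame in range(3):
        counts = mismatches_by_codon_position(aln_a, aln_b, frame)
        total = sum(counts)
        if total >= min_mismatches and counts[2] / total >= min_fraction:
            return True
    return False
```

For two aligned sequences that differ only at every third position in one frame, `looks_coding` returns `True`; when the differences are spread over all three positions it returns `False`.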
Unlike QRNA, in which the RNA structure prediction is based on a probabilistic model, RNAz computes a global RNA consensus secondary structure with the Vienna software package and identifies covariations from multiple alignments (for details, see Washietl et al., 2005a). Together with primary sequence and RNA secondary structure homologies, the search for putative transcription signals, including promoters and intrinsic transcription terminators, and the GC content within bacterial IGRs, especially for AT-rich bacterial genomes, were of particular interest for selecting a subset of candidate IGRs. Algorithms that combine these selection criteria have become available (ISI, Pichon and Felden, 2003; sRNAPredict2, Livny et al., 2005). These powerful methods have allowed the identification of novel sRNAs in many bacterial species, including Gram-positive bacteria. The current limitations of these approaches are often the small number of phylogenetically related sequences available for the bacterium under study and the difficulty in extending the search for sRNA-encoding genes into protein-encoding regions. Also, comparative genomics often cannot detect sRNA genes that are specific to a given bacterial strain, only the core RNome of a species.
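Covariation in a multiple alignment can be illustrated with a short sketch that, given a consensus structure in dot-bracket notation, reports paired columns where the base pair is maintained in every sequence although the nucleotides differ. This is only the underlying idea, not the RNAz procedure: here the consensus structure is supplied by hand, the alignment is assumed to be ungapped, and the function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return the sorted list of (i, j) paired positions of a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def covarying_pairs(alignment, structure):
    """Paired columns showing at least two different, always valid, base pairs."""
    result = []
    for i, j in pairs_from_dot_bracket(structure):
        observed = {(seq[i], seq[j]) for seq in alignment}
        # Every sequence must be able to pair, and the pair type must vary.
        if observed <= PAIRS and len(observed) >= 2:
            result.append((i, j, sorted(observed)))
    return result
```

With the alignment `["GGCAAAAGCC", "GACAAAAGUC", "GGCAAAAGCC"]` and the structure `"(((....)))"`, only columns 1 and 8 are reported, where a G-C pair in two sequences is replaced by an A-U pair in the third.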
Many sRNA genefinder algorithms have evaluated their detection performance by counting the known sRNAs they could detect in a given micro-organism. However, each was tested on a different subset of sRNAs. The reported values are therefore not comparable, which makes the various in silico methods difficult to assess.
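Scoring every genefinder against one shared list of validated sRNA coordinates would make such values comparable. A minimal sketch, assuming predictions and known sRNAs are given as (start, end) intervals on the same strand and replicon (the function names are hypothetical):

```python
def overlaps(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def sensitivity_and_ppv(predicted, known):
    """Sensitivity and positive predictive value against a common benchmark."""
    found = sum(any(overlaps(k, p) for p in predicted) for k in known)
    correct = sum(any(overlaps(p, k) for k in known) for p in predicted)
    sensitivity = found / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

The positive predictive value obtained this way is a lower bound, because a prediction absent from the benchmark may still correspond to a genuine sRNA that has not yet been tested.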
| 3 COMBINING PREDICTIVE BIOINFORMATIC SEARCHES WITH EXPERIMENTAL RNOMICS |
|---|
Initially, a few bacterial sRNA genes were discovered fortuitously, in the absence of any computational methods. Highly expressed bacterial sRNAs such as the 4.5S RNA (part of the secretion machinery), the 6S RNA, which modulates RNA polymerase activity, the RNase P RNA, which is the catalytic part of the ribozyme for 5'-end pre-tRNA maturation, tmRNA, which releases eubacterial ribosomes stalled on defective mRNAs, and the antisense regulator spot42 were all discovered in E.coli by metabolic labeling of total RNA and direct analysis by fractionation, together with a substantial amount of serendipity. These approaches favor the detection of the few abundant and stable sRNAs. Starting in 2000, computational and global tracking of sRNAs, based on sequence conservation, suggested the presence of a plethora of novel sRNA genes (Argaman et al., 2001; Rivas and Eddy, 2000; Rivas et al., 2001; Wassarman et al., 2001). Most of the currently experimentally validated sRNAs were detected from these genome-wide analyses. Of the huge number of predicted bacterial sRNA genes, in vivo expression was later confirmed for only a fraction. This suggests that computational tools yield false positive candidates and also that, for some predictions, the experimental conditions that trigger their expression have not been found yet.
Several methods are available to experimentally assay computer predictions (Fig. 3A). For the few sRNA genes that are highly expressed under specific conditions, their cellular expression can be directly verified after purification, size separation by denaturing PAGE, extraction, labeling and RNA sequencing by enzymes or chemicals (Pichon and Felden, 2005). Those expressed at lower levels can be detected by northern blots, using oligonucleotides complementary to the predicted transcribed DNA strand. An alternative strategy consists of generating cDNA libraries: isolating RNAs approximately 50 to 500 nt long by size separation on denaturing PAGE, reverse transcribing them into cDNA, then shotgun cloning and sequencing (Vogel et al., 2003). This method requires powerful sequencing facilities, and many false positive sequences arising from tRNA and rRNA degradation must be expected. Wider testing includes microarray analyses using high-density oligonucleotide probe arrays covering each bacterial gene and the IGRs, efficiently assaying various experimental conditions for high-throughput detection of transcripts (Tjaden et al., 2002), some of which can be expressed at very low levels. Microarray analyses of transcripts from a given growth condition should reveal the existence of freestanding transcripts in places where no gene has been annotated. Positive signals revealed by fluorescent dyes or by antibodies detecting DNA–RNA hybrids, however, should be analyzed further by RACE (rapid amplification of cDNA ends) mapping to discriminate novel sRNA genes from the mRNA leader sequences of neighboring genes. Moreover, the microarray data must be validated by independent methods, such as northern blots or RT quantitative PCR. Reliable computational tools are essential for transcriptomic data analysis, including intensity normalization (Pichon and Felden, 2005; Tjaden et al., 2002).
Elegant methods combine an initial round of immunoprecipitation of the extracted RNAs in complex with a known purified sRNA-binding protein, followed by the direct detection of the bound RNAs on genomic microarrays. This approach has been described for the subset of E.coli sRNAs that associate with the RNA chaperone Hfq (Zhang et al., 2003).
| 4 DETECTING THE mRNA TARGETS OF BACTERIAL sRNAS |
|---|
Many bacterial sRNAs act as post-transcriptional regulators by base-pairing with target mRNAs (Storz and Gottesman, 2006). Unlike cis-encoded antisense sRNAs, trans-encoded sRNAs have imperfect and sometimes only short complementarities with their mRNA targets, including non-canonical pairings, which makes identifying these targets computationally tricky. The starting point has to be all the experimentally supported sRNA–target mRNA interactions in bacteria. In the easiest cases, simple BlastN or Fasta3 searches could detect mRNA targets that were confirmed experimentally by genetic (Chen et al., 2004) or biochemical (Pichon and Felden, 2005) approaches. Dedicated algorithms for predicting the secondary structure of two interacting RNA molecules by means of free energy minimization have been proposed (Alkan et al., 2006). Another approach consists of calculating optimal hybridization scores between an sRNA and all the mRNAs from a given genome, focusing on the translational start sites, to provide a list of candidate mRNAs (TargetRNA, Tjaden et al., 2006). An approach that was successfully applied to the micA–ompA regulation searches for sRNA complementarities in sequence windows containing translation initiation sites, allows noncontiguous pairing and compares each candidate target with the results of repeated searches in related bacteria (Udekwu et al., 2005). Another strategy uses a training set of validated sRNA–target mRNA pairs: pairing energies between an sRNA and putative mRNA targets are calculated and maximized, and genes are selected on their ability to pair with the sRNA around the translation start and stop sites (Mandin et al., 2007). The RNAup software computes the probabilities that a sequence interval is unpaired, allowing the binding free energies of short oligomers to mRNA targets to be determined (Mückstein et al., 2006).
Another program, RNAhybrid, predicts multiple potential binding sites of sRNAs in target mRNAs, finding the most energetically favorable hybridization sites (Rehmsmeier et al., 2004).
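The general strategy of scanning windows around translation start sites for complementarity to an sRNA can be sketched in a few lines. The sketch ranks mRNAs by the longest contiguous antiparallel duplex (Watson-Crick and G-U pairs); this crude score replaces the hybridization or free energy scores used by the published tools and does not reproduce any of them. The function names, the window sizes and the input format (a dictionary mapping a gene name to its sequence and start codon index) are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_duplex(srna, target):
    """Length of the longest contiguous antiparallel duplex between two RNAs."""
    reversed_srna = srna[::-1]  # read the sRNA 3'->5' against the target 5'->3'
    best = 0
    prev = [0] * (len(target) + 1)
    for a in reversed_srna:
        cur = [0] * (len(target) + 1)
        for j, b in enumerate(target, 1):
            if (a, b) in PAIRS:
                cur[j] = prev[j - 1] + 1  # extend the duplex ending at j - 1
                best = max(best, cur[j])
        prev = cur
    return best


def rank_targets(srna, mrnas, upstream=30, downstream=20):
    """Rank mRNAs by duplex length in a window around the start codon.

    `mrnas` maps a gene name to (sequence, index of the start codon).
    """
    scores = []
    for name, (seq, start) in mrnas.items():
        window = seq[max(0, start - upstream):start + downstream]
        scores.append((longest_duplex(srna, window), name))
    return sorted(scores, reverse=True)
```

Because it requires contiguous pairing, such a score misses the interrupted, imperfect duplexes described above for trans-encoded sRNAs, and it ignores the intrinsic structure of the mRNA.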
The likelihood of detecting all the mRNA targets of a given sRNA by the above computational methods, however, is still low, since several known interactions escape detection by these tools. The intrinsic structure of the mRNA target should be systematically considered during sRNA–mRNA pairing predictions. Therefore, experimental strategies, either alone or in combination with bioinformatic approaches, are essential for mRNA target detection (Fig. 3B). Global experimental searches for the mRNA targets of an sRNA can be performed by microarrays, either by affinity capture of target mRNAs by a biotinylated sRNA (Douchin et al., 2006) or by comparing the mRNA profiles of an sRNA-overproducing strain versus a control strain, using differential fluorescence labeling of the two cDNA pools (Massé et al., 2005). Also, the analysis of cellular RNAs that co-immunoprecipitate with the Hfq protein on an Affymetrix E.coli K12 high-density oligonucleotide array detected mRNA fragments that are putative mRNA targets of Hfq-associated sRNAs in E.coli (Zhang et al., 2003). sRNAs or sRNA–protein complexes can be used as bait for capturing target mRNAs by affinity purification (Antal et al., 2005). mRNA targets can also be detected by proteomic approaches, by comparing total protein extracts of a bacterial strain lacking or overexpressing a given sRNA against a wild-type strain (Udekwu et al., 2005). This is achieved by using 1D or 2D gels followed by identification of the proteins by mass spectrometry. When the expression level of a given protein is reduced or increased when the sRNA is lacking or overexpressed, the regulation can be either direct (interaction between the sRNA and the mRNA encoding the protein) or indirect, via additional regulators.
| 5 FUTURE PROSPECTS |
|---|
In this review, the available computational methods for the detection of sRNA genes and for the prediction of the mRNAs whose expression levels are regulated by sRNAs acting through antisense pairing are described. The most recent approaches for sRNA gene detection combine several pre-existing independent methods to increase their sensitivity and predictive potential. A plethora of RNA secondary structure prediction methods are available, some of which have been tested in combination with comparative genomic approaches (Pichon and Felden, 2003) or with statistical methods (Uzilov et al., 2006). While these strategies are interesting, they are limited by the extensive secondary structure variations of some bacterial sRNAs, which escape identification by these methods (the structural homology is probably only detectable at the 3D level). The current tendency to combine various approaches can be extended further, mixing comparative genomics, RNA structure, transcription unit and rho-independent terminator detection, and any other signatures specific to sRNA-encoding genes.
In bacterial genomics, most algorithms were initially designed for and applied to the Gram-negative bacterium E.coli, and their use on other genomes comes with serious limitations and requires adjustments. Indeed, transcription promoters are highly variable among bacterial species, and their DNA sequence consensus is unknown in most of them. Also, E.coli is a mesophile, and sRNA gene detection has to be adjusted in the case of GC- or AT-rich genomes. Another important limitation of the current studies is that computational searches for novel sRNA genes are restricted to the IGRs. Recent studies using a hidden Markov model (Yachie et al., 2006) enabled the identification of sRNAs in protein-coding regions, but their efficiency should be improved. Bacterial sRNAs that partially or entirely overlap protein-coding genes on the opposite DNA strand escape detection by the current tools: an interesting but difficult challenge would be to discriminate sequence conservation due to the presence of a protein (or polypeptide) coding sequence from conservation due to the presence of an sRNA gene.
Quite a few recent reports describe novel sRNA genefinders whose validation consists of counting the number of previously known sRNAs that the tools are able to detect, with no detection of new sRNAs, which limits the interest of these new tools for biologists. None of the existing sRNA genefinders, however, is able to identify all the experimentally validated E.coli sRNA genes, indicating that all these in silico methods can be improved and also that their predictions have to be systematically tested experimentally. A substantial number of proteins are closely associated with bacterial sRNA function and structure [see Pichon and Felden (2007) for a review]. Among the sRNAs, a growing class acting as protein sequestrators (Babitzke and Romeo, 2007), trapping and therefore controlling the activity of regulatory proteins, is currently only detected by experimental methods. The computational prediction of sRNA–protein interactions is one of the next challenges.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank E. Westhof (IMBC, Strasbourg) and C. Le Bouguénec (Institut Pasteur, Paris) for critical reading of the article.
Funding: ANR grant (ERA-NET Pathogenomics Project to C.P.) Deciphering the intersection of commensal and extra intestinal pathogenic E.coli; ACI (BCMS 136 to B.F.); ANR grant (program MIME to B.F.).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Tracy Knight
Received on July 9, 2008; revised on October 25, 2008; accepted on October 25, 2008
| REFERENCES |
|---|
Alkan C, et al. RNA-RNA interaction prediction and antisense RNA target search. J. Comput. Biol. (2006) 13:267–282.
Altuvia S. Identification of bacterial small non-coding RNAs: experimental approaches. Curr. Opin. Microbiol. (2007) 10:257–261.
Antal M, et al. A small bacterial RNA regulates a putative ABC transporter. J. Biol. Chem. (2005) 280:7901–7908.
Argaman L, et al. Novel small RNA-encoding genes in the intergenic regions of Escherichia coli. Curr. Biol. (2001) 11:941–950.
Axmann I, et al. Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol. (2005) 6:R73.
Babak T, et al. Considerations in the identification of functional RNA structural elements in genomic alignments. BMC Bioinformatics (2007) 8:33.
Babitzke P, Romeo T. CsrB sRNA family: sequestration of RNA-binding regulatory proteins. Curr. Opin. Microbiol. (2007) 10:156–163.
Boisset S, et al. Staphylococcus aureus RNAIII coordinately represses the synthesis of virulence factors and the transcription regulator Rot by an antisense mechanism. Genes Dev (2007) 21:1353–1366.
Carter R, et al. A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res. (2001) 29:3928–3938.
Chen S, et al. A bioinformatics based approach to discover small RNA genes in the Escherichia coli genome. BioSystems (2002) 65:157–177.
Chen S, et al. MicC, second small-RNA regulator of Omp protein expression in Escherichia coli. J. Bacteriol. (2004) 186:6689–6697.
Clote P, et al. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA (2005) 11:578–591.
Coventry A, et al. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc. Natl Acad. Sci. USA (2004) 101:12102–12107.
del Val C, et al. Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics. Mol. Microbiol. (2007) 66:1080–1091.
di Bernardo D, et al. ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics (2003) 19:1606–1611.
Douchin V, et al. Down-regulation of porins by a small RNA bypasses the essentiality of the regulated intramembrane proteolysis protease RseP in Escherichia coli. J. Biol. Chem. (2006) 281:12253–12259.
Fichant GA, Burks C. Identifying potential tRNA genes in genomic DNA sequences. J. Mol. Biol. (1991) 220:659–671.
Griffiths-Jones S, et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. (2005) 33:121–124.
Gruber AR, et al. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res (2007) 35:W335–W338.
Gottesman S, et al. Small RNA regulators and the bacterial response to stress. Cold Spring Harb. Symp. Quant. Biol (2006) 71:1–11.
Hurst LD, Merchant AR. High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc. Biol. Sci. (2001) 268:493–497.
Hüttenhofer A, Vogel J. Experimental approaches to identify non-coding RNAs. Nucleic Acids Res. (2006) 34:635–646.
Kang S, et al. CONSORF: a consensus prediction system for prokaryotic coding sequences. Bioinformatics (2007) 23:3088–3090.
Klein RJ, et al. Noncoding RNA genes identified in AT-rich hyperthermophiles. Proc. Natl Acad. Sci. USA (2002) 99:7542–7547.
Lambert A, et al. The ERPIN server: an interface to profile-based RNA motif identification. Nucleic Acids Res. (2004) 32:W160–W165.
Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. (2004) 32:11–16.
Livny J, et al. sRNAPredict: an integrative computational approach to identify sRNAs in bacterial genomes. Nucleic Acids Res. (2005) 33:4096–4105.
Livny J, Waldor MK. Identification of small RNAs in diverse bacterial species. Curr. Opin. Microbiol. (2007) 10:96–101.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. (1997) 25:955–964.
Macke TJ, et al. RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Res. (2001) 29:4724–4735.
Mandin P, et al. Identification of new noncoding RNAs in Listeria monocytogenes and prediction of mRNA targets. Nucleic Acids Res. (2007) 35:962–974.
Massé E, et al. Effect of RyhB small RNA on global iron use in Escherichia coli. J. Bacteriol. (2005) 187:6962–6971.
Mückstein U, et al. Thermodynamics of RNA-RNA binding. Bioinformatics (2006) 22:1177–1182.
Ostberg Y, et al. The etiological agent of Lyme disease Borrelia burgdorferi appears to contain only a few small RNA molecules. J. Bacteriol. (2004) 186:8472–8477.
Pichon C, Felden B. Intergenic sequence inspector: searching and identifying bacterial RNAs. Bioinformatics (2003) 19:1707–1709.
Pichon C, Felden B. Small RNA genes expressed from Staphylococcus aureus genomic and pathogenicity islands with specific expression among pathogenic strains. Proc. Natl Acad. Sci. USA (2005) 102:14249–14254.
Pichon C, Felden B. Proteins that interact with bacterial small RNA regulators. FEMS Microbiol. Rev. (2007) 31:614–625.
Regalia M, et al. Prediction of signal recognition particle RNA genes. Nucleic Acids Res. (2002) 30:3368–3377.
Rehmsmeier M, et al. Fast and effective prediction of microRNA/target duplexes. RNA (2004) 10:1507–1517.
Rivas E, Eddy SR. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics (2000) 16:583–605.
Rivas E, Eddy S. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics (2001) 2:8.
Rivas E, et al. Computational identification of noncoding RNAs in E. coli by comparative genomics. Curr. Biol. (2001) 11:1369–1373.
Schattner P. Searching for RNA genes using base composition statistics. Nucleic Acids Res. (2002) 30:2076–2082.
Storz G, Gottesman S. Versatile roles of small RNA regulators in bacteria. In: Gesteland RF, Cech TR, Atkins JF, eds. The RNA World. (2006) 3rd edn. USA: Cold Spring Harbor Laboratory Press. 567–594.
Tjaden B, et al. Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays. Nucleic Acids Res. (2002) 30:3732–3738.
Tjaden B, et al. Target prediction for small, noncoding RNAs in bacteria. Nucleic Acids Res. (2006) 34:2791–2802.
Udekwu KI, et al. Hfq-dependent regulation of OmpA synthesis is mediated by an antisense RNA. Genes Dev. (2005) 19:2355–2366.
Uzilov A, et al. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics (2006) 7:173–203.
Vogel J, et al. RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria. Nucleic Acids Res. (2003) 31:6435–6443.
Vogel J, Sharma CM. How to find small non-coding RNAs in bacteria. Biol. Chem. (2005) 386:1219–1238.
Vogel J, Wagner EGH. Target identification of small noncoding RNAs in bacteria. Curr. Opin. Microbiol. (2007) 10:262–270.
Washietl S, et al. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. (2005a) 23:1383–1390.
Washietl S, et al. Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA (2005b) 102:2454–2459.
Wassarman KM, et al. Small RNAs in Escherichia coli. Trends Microbiol. (1999) 7:37–45.
Wassarman K, et al. Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev. (2001) 15:1637–1651.
Yachie N, et al. Prediction of non-coding and antisense RNA genes in Escherichia coli with Gapped Markov Model. Gene (2006) 372:171–181.
Zhang A, et al. Global analysis of small RNA and mRNA targets of Hfq. Mol. Microbiol. (2003) 50:1111–1124.
Zuker M. On finding all suboptimal foldings of an RNA molecule. Science (1989) 244:48–52.


