Bioinformatics Advance Access originally published online on March 29, 2005
Bioinformatics 2005 21(11):2590-2595; doi:10.1093/bioinformatics/bti411
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A bioinformatic screen for novel A–I RNA editing sites reveals recoding editing in BC10

D. R. Clutterbuck*, A. Leroy, M. A. O'Connell and C. A. M. Semple

MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 SYSTEMS AND METHODS
 3 RESULTS
 4 DISCUSSION
 REFERENCES
 

Motivation: Recent studies have demonstrated widespread adenosine–inosine RNA editing in non-coding sequence. However, the extent of editing in coding sequences has remained unknown. For many of the known sites, editing can be observed in multiple species and often occurs in well-conserved sequences. In addition, the known sites often occur within imperfect inverted repeats and in clusters. Here we present a bioinformatic approach to identify novel sites based on these shared features. Mismatches between genomic and expressed sequences were filtered to remove the main sources of false positives, and then prioritized based on these features. This protocol is tailored to identifying specific recoding editing sites, rather than sites in non-coding repeat sequences.

Results: Our protocol is more sensitive for identifying known coding editing sites than any previously published mammalian screen. A novel multiply edited transcript, BC10, was identified and experimentally verified. BC10 is highly conserved across a range of metazoa and has been implicated in two forms of cancer.

Contact: daniel.clutterbuck{at}hgu.mrc.ac.uk

Supplementary information: On journal website.


    1 INTRODUCTION
 
RNA editing is the modification of RNA sequence that is distinct from other RNA processing events such as splicing, capping or 3' end processing (Bass, 2002; Keegan et al., 2001). Editing is a widespread phenomenon that has been identified in numerous species including a range of metazoans, plants, protozoa and bacteria. In mammals, adenosine–inosine (A–I) is the most prevalent type of editing. There are eight known mammalian genes whose amino acid sequences are recoded by A–I conversions: GluR-B, GluR-C, GluR-D, GluR-5, GluR-6, 5HT2cR, ADAR2 and KCNA1 (Hoopengardner et al., 2003; Keegan et al., 2001). Inosine is read by the translational machinery and reverse transcriptase as a guanosine, so A–I edits appear as A–G mismatches between genomic and expressed sequences. We have focused on recoding A–I mRNA editing sites, that is, sites where editing alters the amino acid sequence. Many of the known recoding editing sites are critical to receptor function and are potentially implicated in a number of neurological disorders (Kawahara et al., 2004; Keegan et al., 2001). In addition, a mutation in ADAR1 (Adenosine Deaminase Acting on RNA) has recently been identified as the causative mutation behind a human pigmentation disorder (Miyamura et al., 2003). These observations underline the importance of identifying the remaining editing sites in mammals.
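The distinction between recoding and silent edits can be illustrated with a short sketch. It assumes the standard genetic code and models an A–I edit as an A-to-G substitution, as described above; the function name `edit_effect` and its interface are hypothetical and are not part of the published protocol.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_effect(codon, pos):
    """Return (original_aa, edited_aa, is_recoding) for an A-I edit.

    `codon` is a three-base DNA codon and `pos` is the 0-based position of
    the edited adenosine. Inosine is read as guanosine, so the edit is
    modelled here as an A-to-G substitution (hypothetical helper).
    """
    codon = codon.upper()
    if codon[pos] != "A":
        raise ValueError("A-I editing requires an adenosine at the edited position")
    edited = codon[:pos] + "G" + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    return before, after, before != after
```

For example, `edit_effect("CAG", 1)` reads the glutamine codon CAG as CGG, an arginine codon, which is the kind of change that Q/R site names describe, whereas `edit_effect("AAA", 2)` leaves lysine unchanged and is therefore not recoding.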

Most mammalian recoding sites have been discovered serendipitously, or have been identified through homology to previously discovered edited genes. KCNA1 contains the only experimentally confirmed mammalian site that has been identified through a screen, although indirectly (Hoopengardner et al., 2003). There is some biochemical evidence for additional A–I editing sites based on the amount of inosine in rat brain, compared with the amount expected based on the abundance of editing in the known sites (Paul and Bass, 1998). There have been several computational and experimental screens for mammalian mRNA editing that have identified novel targets, but none of them have managed to identify any of the known recoding sites (Kim et al., 2004; Levanon et al., 2004; Morse et al., 2002). This fact also supports the possibility of additional sites.

RNA editing has been shown to increase protein diversity by introducing altered splicing patterns, frame shifts, alternative start or stop codons, or by directly affecting the protein sequence (Schaub and Keller, 2002). However, recent evidence suggests that the large majority of human A–I editing sites are situated in introns and UTRs, which could either have regulatory functions or could be non-functional (Athanasiadis et al., 2004; Blow et al., 2004; Kikuno et al., 2002; Kim et al., 2004; Levanon et al., 2004). These analyses all used the alignment of expressed sequences to the human genome. Levanon et al. (2004) specifically looked at A–G mismatches in inverted repeats and identified more than 12 723 sites in 1637 genes, with 95% accuracy. Similar results were obtained in other studies, although there will be significant overlap in their results (Athanasiadis et al., 2004; Kim et al., 2004). All five of the above studies showed that ~90% of the novel edits are located in Alu repeat sequences, which are unique to primates. Kim et al. (2004) also showed that the mouse had only 91 cDNAs with significant evidence of editing compared with 2674 in human. These data suggest that this form of editing is either specific to or highly exaggerated in primates. The function of these sites remains to be determined. It is striking that none of these studies identified any of the known recoding edited sites. Indeed, the two unconfirmed recoding sites identified by Levanon are the only novel mammalian recoding sites identified by any of these screens. A more recent paper suggests that 85% of all pre-mRNAs contain edited Alu repeats and that editing of Alu repeats can result in their exonisation and incorporation into coding sequence (Athanasiadis et al., 2004). Although this recodes the predicted proteins, it is unclear whether the inclusion of Alu sequences will necessarily result in functional proteins.

Using a biochemical approach, Morse et al. (2002) developed a screen to identify transcripts containing inosine residues. They identified 10 editing sites in nematode and 19 in human. All the sites are in 3' UTRs, introns or non-coding RNAs. A possible reason for this is that their protocol may have been biased towards transcripts containing multiple inosines.

A number of previous protocols have tried to identify novel editing sites. The most successful screen was performed in the fruit fly. Hoopengardner et al. (2003) used comparative genomics between Drosophila melanogaster and Drosophila pseudoobscura to identify genes containing regions that were conserved almost perfectly over 50 bp or more. This approach successfully identified all the known sites as well as 16 novel recoding RNA editing sites, which were all involved in rapid electrical or chemical neurotransmission (Hoopengardner et al., 2003). Looking for clusters of A–G mismatches in alignments between a Drosophila cDNA collection and the genome also identified one of these sites (Stapleton et al., 2002).

All the known recoding sites are individually functional (Keegan et al., 2001), whereas editing in non-coding repeats (Levanon et al., 2004) is proposed to be involved in more general processes such as RNA stabilisation, RNAi, protection from retrotransposition or destabilisation of viral duplexes (Kim et al., 2004; Levanon et al., 2004). We have focused our analyses on identifying only the recoding editing sites.

It has been established that many of the known recoding A–I editing sites occur in clusters, are well conserved and are found in imperfect inverted repeats termed ECSs (editing site complementary sequences) (Higuchi et al., 1993). This is illustrated in Figure 1 and Table 1. We have reviewed and analysed these features to determine their predictive power for identifying novel RNA editing sites. We have also attempted to identify other features common to these sites including similarities in local secondary structures or motifs.



 
Fig. 1 The 5-Hydroxytryptamine (2c) receptor illustrates some features of recoding A–I editing sites. (A) The predicted structure of the RNA hairpin that the editing enzyme requires for editing to occur. Editing sites are shown by the arrows and Is (inosines). (B) The editing complementary sequence (ECS) and the four sites are shown in relation to the exon structure. Below, the high degree of sequence conservation is illustrated. Edited sites are highlighted. The recoding changes are shown below the alignment.

 

 
Table 1 Sequence conservation and ECS predictions for the known recoding edited regions

 
We have used a combination of seven predictive features to screen a large set of expressed versus genomic sequence mismatches, including suitable filters to remove SNPs, sequencing errors and alignment errors. In contrast to previous screens this method successfully identifies most of the known A–I recoding sites as well as identifying a novel experimentally validated recoding site.


    2 SYSTEMS AND METHODS
 
The analysis presented here is based on mouse and human, as large amounts of expressed and genomic sequence data are required. We have focused on the mouse as most of the mouse data is from a single strain, which reduces noise from single nucleotide polymorphisms. Human data are only used to confirm the putative mouse editing sites. To reduce the computational expense of this project to a manageable level, we required a collection of reference sequences that contained non-redundant sequences for the majority of the known genes for each species. For this purpose, concatenated exon sequences were obtained from Ensembl (Clamp et al., 2003) (based on mouse NCBI version 30 and human NCBI version 33). These included most of the known exonic recoding editing sites, with the exception of GluR-6, which was obtained from GenBank (Benson et al., 1999) using the accession NM_010348. Orthologous mouse–human pairs were obtained from Ensembl. Where multiple homologues were predicted, we analysed each of them for sequence conservation and putatively orthologous mismatches, and then used the best homologous gene in the remaining analyses.

Mismatches were identified using MegaBlast (Altschul et al., 1997) (version 2.2.6) searches of mouse and human Ensembl genes against all publicly available expressed and genomic sequences for their respective species. BLAST matches that were <100 bp long, or had <98% nucleotide identity, were discarded. This threshold removed low quality sequences and matches from homologous genes, but it also removed one edited EST from a known editing site. Unknown edited transcripts may also have been removed. Expressed sequences were obtained from dbEST (Boguski et al., 1993) and GenBank (Benson et al., 1999). Genomic sequences were obtained from the EMBL htg repository for both mouse and human. Mouse shotgun trace repository sequences were obtained from Ensembl (Clamp et al., 2003). All sequences were up to date as of September 2003. Clone and strain data were obtained from GenBank (Benson et al., 1999). Clone identifiers were used to remove redundancy from the set of expressed sequences. An editing region is defined here as an exon with one or more editing sites. A necessary limitation of this protocol is that editing sites would not be observed if they are intronic or do not occur within an Ensembl transcript. This was the case for three of the known RNA editing regions (ADAR2 and GluR-6 Q/R and I/V sites). The remaining known editing sites constitute the positive control set for this protocol.
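The match-quality thresholds can be written down as a simple filter. The following is a minimal sketch only: the `Hit` record and its field names are hypothetical, and it assumes that the alignment length and percentage identity of each match have already been parsed from the search output.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 100       # matches shorter than this were discarded
MIN_IDENTITY_PCT = 98.0   # matches below this nucleotide identity were discarded


@dataclass
class Hit:
    """One match between an Ensembl gene and a database sequence (hypothetical record)."""
    subject_id: str
    length_bp: int
    identity_pct: float


def keep_hit(hit):
    """True if the match passes both the length and the identity threshold."""
    return hit.length_bp >= MIN_LENGTH_BP and hit.identity_pct >= MIN_IDENTITY_PCT


def filter_hits(hits):
    """Return only the matches that pass both thresholds, in input order."""
    return [h for h in hits if keep_hit(h)]
```

Under these thresholds a 100 bp match carrying two A–G mismatches (98% identity) is retained, whereas the same match with three mismatches (97%) is discarded, which is one way a heavily edited short EST could be lost.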

An initial set of 28 992 A–G mismatches found between the genomic and expressed sequences, but not between genomic sequences, was constructed. Every mismatch was then analysed for seven features: (1) Number of putatively edited mouse cDNAs or ESTs with the same mismatch at the same position (Allowed values: 1, 2, >2); (2) Number of non-edited mouse cDNAs or ESTs combined with the number of publicly available genomic sequences for each given mismatch (Allowed values: 1, 2, >2); (3) Where possible the human homologues were aligned using Lagan (Brudno et al., 2003). We considered putative mouse sites to be conserved in human if they were also observed as a putative editing site at the equivalent location in human expressed sequences (Allowed values: Y, N); (4) We calculated the effect of the edit on the amino acid sequence by BLAST (Altschul et al., 1997) searching the Ensembl nucleotide sequence against the equivalent protein sequence, then mapping the putative editing site onto the alignment. This allowed us to distinguish between edits that alter the amino acid sequence and those that do not (Allowed values: Synonymous, Non-synonymous); (5) Sequence conservation was analysed using the same Lagan mouse/human alignments, from which the best conserved 120 bp window overlapping each putative editing site was selected (Continuous variable); (6) Putative mouse ECSs were identified by scanning for inverted repeats using a Smith–Waterman alignment algorithm from EMBOSS (Rice et al., 2000), Water, based on a scoring matrix modified for RNA base pairing specificities (available on request). The alignment was generated between a 70 bp region flanking the editing site and a reversed flanking 4 kb region. This test is unavoidably biased against edits that occur towards the end of inverted repeats (Continuous variable); (7) Clusters of sites were defined by the observation of more than one putative editing site within an exon (Continuous variable).
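The selection of candidate mismatches and the binning used for features (1) and (2) can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the screen: it assumes that the bases observed at one transcript position have already been collected from the alignments, and the function names and the returned dictionary keys are hypothetical.

```python
def count_bin(n):
    """Map a count to the allowed values 1, 2 or >2 used for features (1) and (2)."""
    if n < 1:
        raise ValueError("count must be at least 1")
    return str(n) if n <= 2 else ">2"


def summarise_position(genomic_bases, expressed_bases):
    """Summarise one position, or return None if it is not a candidate.

    A position is kept only if every genomic sequence shows A (so the
    mismatch is not seen between genomic sequences) and at least one
    expressed sequence shows G.
    """
    genomic = [b.upper() for b in genomic_bases]
    expressed = [b.upper() for b in expressed_bases]
    if not genomic or any(b != "A" for b in genomic):
        return None
    edited = sum(b == "G" for b in expressed)
    if edited == 0:
        return None
    # Feature (2) combines unedited expressed sequences with genomic sequences.
    unedited = sum(b == "A" for b in expressed) + len(genomic)
    return {"edited": count_bin(edited), "unedited_or_genomic": count_bin(unedited)}
```

A position where a genomic sequence also carries the G is rejected outright in this sketch, on the reasoning that such a mismatch is more likely to be a polymorphism than an edit.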

The results of these analyses were combined using a relative entropy approach (Lim et al., 2003). For a given feature i with a value $x_i$, we assign a log-odds (LOD) score:

$$\mathrm{LOD}_i(x_i) = \log \frac{f_i(x_i)}{g_i(x_i)}$$

where $f_i(x_i)$ is the proportion of all the positive controls in an interval containing feature value $x_i$ and $g_i(x_i)$ is the proportion of the remaining ~29 000 A–G mismatches in the same interval. The proportions used are smoothed to avoid over-fitting to the limited number of positive controls (see supplementary methods for details of this and the calculation of intervals). The overall score assigned to a putative editing site is the sum of the LOD scores for the seven features. The mismatches were then ranked by their total scores. Methods to reduce sources of false positives, avoid overfitting to the positive controls and the experimental validation techniques are described in the Supplementary methods.
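A minimal sketch of the log-odds scoring is given below. Two choices in it are assumptions made for illustration: the smoothing is a simple additive pseudocount standing in for the smoothing procedure described in the supplementary methods, and the logarithm is taken to base 2. The function names are hypothetical.

```python
import math


def smoothed_proportions(counts, pseudocount=1.0):
    """Per-interval proportions with an additive pseudocount (assumed smoothing)."""
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]


def lod_table(positive_counts, background_counts, pseudocount=1.0):
    """LOD score for each interval of one feature.

    `positive_counts[k]` is the number of positive controls falling in
    interval k; `background_counts[k]` is the number of remaining
    mismatches in the same interval. Base-2 logarithm is an assumption.
    """
    f = smoothed_proportions(positive_counts, pseudocount)
    g = smoothed_proportions(background_counts, pseudocount)
    return [math.log2(fi / gi) for fi, gi in zip(f, g)]


def total_score(interval_indices, lod_tables):
    """Sum the LOD scores of one mismatch over all features."""
    return sum(table[k] for k, table in zip(interval_indices, lod_tables))
```

Intervals enriched for positive controls receive positive scores and depleted intervals receive negative scores, so a mismatch that resembles the known sites on most features accumulates a high total.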


    3 RESULTS
 
3.1 Analysis of known editing sites
Recoding A–I edits tend to be conserved across related species (Emeson and Singh, 2000). Twelve of the thirteen A–I recoding mouse edits shown in Table 1 were supported by A–G mismatches in the public databases, and eight of these were also observed in human expressed sequences (although four of these are from the same cluster in the 5HT2cR gene). The levels of sequence conservation observed between the A–I mouse and human editing sites varied between 99 and 82% identity over 120 bp around the editing site (Table 1). This high level of conservation agrees with previous observations (Hoopengardner et al., 2003) and suggests that sequence conservation is a useful predictor of recoding editing sites.
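The conservation measure, the best conserved 120 bp window overlapping a site, can be sketched as a sliding-window search over a pairwise alignment. The function below is illustrative only: its name is hypothetical, it counts identity over alignment columns, and it treats gap columns as non-identical, which is an assumption about gap handling rather than a detail taken from the protocol.

```python
def best_window_identity(aln_a, aln_b, site, window=120):
    """Highest identity over any window of alignment columns overlapping `site`.

    `aln_a` and `aln_b` are the two rows of a pairwise alignment (gaps as
    "-"); `site` is the 0-based alignment column of the putative edit.
    The window is shortened if the alignment is shorter than `window`.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if not 0 <= site < n:
        raise ValueError("site lies outside the alignment")
    window = min(window, n)
    # prefix[i] = number of identical, non-gap columns among the first i columns
    prefix = [0]
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        prefix.append(prefix[-1] + (a == b and a != "-"))
    first = max(0, site - window + 1)   # leftmost start that still covers the site
    last = min(site, n - window)        # rightmost start that fits in the alignment
    return max((prefix[s + window] - prefix[s]) / window for s in range(first, last + 1))
```

Taking the best window rather than a window centred on the site allows an edit close to an exon boundary to be scored on the conserved side only.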

Several of the known recoding sites have published ECSs in nearby introns (Burns et al., 1997; Dawson et al., 2004; Herb et al., 1996; Higuchi et al., 1993; Lomeli et al., 1994; Sommer et al., 1991). Our novel method for finding mouse ECSs was able to correctly identify four out of seven of these ECSs. Putatively orthologous ECSs could also be observed in human for these four ECSs (data not shown). The remaining ECSs, overlapping the GluR-B, GluR-5 and GluR-6 Q/R sites, were not identified owing to the identification of higher scoring putative ECSs nearby. We also identified putative ECSs for several of the other positive controls, although they were generally weaker and did not tend to occur in the 3' introns as with most previously characterized ECSs (Table 1). We analysed the regions incorporated between the editing sites and their putative ECSs, seeking any common features in the secondary structure or common motifs. Neither analysis produced any convincing results (data not shown).
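The idea behind the ECS search, a local alignment of the region around the site against the reversed flanking sequence using base-pairing scores, can be sketched with a small Smith–Waterman implementation. This is not the EMBOSS Water run used in the screen: the scores for Watson–Crick pairs, G–U wobble pairs, mismatches and gaps are placeholder assumptions (the actual matrix is described only as available on request), and the function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(a, b, wc=2, wobble=1, mismatch=-3):
    """Score for placing base `a` opposite base `b` in a duplex (assumed values)."""
    if (a, b) in WATSON_CRICK:
        return wc
    if (a, b) in WOBBLE:
        return wobble
    return mismatch


def ecs_score(site_flank, search_region, gap=-4):
    """Best local pairing score between `site_flank` and the reversed `search_region`.

    Reversing (not complementing) the search region and scoring by base
    pairing means that a high score corresponds to an inverted repeat.
    """
    query = site_flank.upper().replace("T", "U")
    target = search_region.upper().replace("T", "U")[::-1]
    best = 0
    prev = [0] * (len(target) + 1)
    for a in query:
        cur = [0]
        for j, b in enumerate(target, start=1):
            score = max(0, prev[j - 1] + pair_score(a, b), prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best
```

In this sketch a perfect 8 bp duplex scores 16, and only the best-scoring partner is reported, which mirrors the situation described above in which a published ECS is missed because a higher scoring putative ECS lies nearby.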

One feature of many recoding sites is that in addition to one highly edited adenosine, other nearby adenosines are also edited, although to a lesser extent. We define an editing region as an exon that contains one or more known editing sites. Table 1 demonstrates the usefulness of a cluster analysis as 5 of the 11 A–I regions contain a cluster. Finally, all the A–I editing regions contain at least one recoding edit, many of which have been shown to be functional and some of which have been implicated in disease.

Through simple BLAST searches of the publicly available cDNA and EST databases, we were able to identify edited sequences expressed in the nervous system for all the mammalian A–I recoding edits, except for the KCNA1 site, the ADAR2 site and the GluR-D N/S site. This observation agrees with previous reports on mammalian (Emeson and Singh, 2000) and Drosophila recoding editing sites (Hoopengardner et al., 2003) which suggest that most A–I recoding edits are specific to the brain and associated tissues. Although this could be a powerful predictor of novel editing sites, it would introduce bias against editing in other tissues and therefore, we have not included it in this analysis.

The remaining known sites were not identified, as the ADAR2 site is intronic (Rueter et al., 1999), the GluR-6 edited exons were not included in the Ensembl gene set, and there were no expressed sequences with mismatches to the KCNA1 site. The KCNA1 site emphasizes the limitations of the available expressed sequence data demonstrating that some sites may be missed. This analysis was not applicable to C–U, U–C or U–A editing as the known edited sites could not be identified owing to a lack of expressed sequence data or their absence from Ensembl genes.

3.2 Genome-wide identification of RNA editing sites
To screen all A–G mismatches in the genome, a relative entropy approach (Lim et al., 2003) was used to combine the results of the seven editing site features (see Systems and methods section). The positive control set consisted of the 11 recoding sites (from 8 edited regions) that are included in Ensembl transcripts. To combat overfitting to the positive control set we applied an iterated smoothing operation to the frequency distributions before generating LOD scores. In addition, we used a conservative jack-knife approach which ensured that each positive control was scored using only the non-related positive controls (see Systems and methods section).
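The jack-knife step can be sketched as the construction of a separate training set for each positive control. The sketch assumes that relatedness is captured by a single group label per site (for example the gene or edited region); how related controls were actually defined is given in the supplementary methods, so the labels and the function name here are hypothetical.

```python
def jackknife_training_sets(controls):
    """Map each positive control to the controls allowed to train its score.

    `controls` is a list of (site_id, group) pairs. A site is scored using
    only the controls whose group label differs from its own, so that
    neither the site itself nor its related sites contribute to its score.
    """
    training = {}
    for site_id, group in controls:
        training[site_id] = [s for s, g in controls if g != group]
    return training
```

The frequency distributions for the positive controls would then be rebuilt from each training set before the corresponding site is scored, at the cost of estimating them from even fewer examples.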

This is the first mammalian screen to identify any of the known recoding editing sites. Figure 2 shows the distributions for the three continuous variables (sequence conservation, clustering and ECS score) and the total LOD scores of all A–G mismatches. This figure demonstrates that the three variable features and the total LOD score are useful and efficient for distinguishing the positive controls from the remaining mismatches.



 
Fig. 2 Distributions of results for the three continuous variables and the final combined LOD score. The 8 positive control regions contain 11 recoding edits. The total number of all remaining A–G mismatches is 28 979. The four discrete variables are not shown here. In each of the following graphs, the distribution of the positive controls skews heavily towards the right. (A) The ECS score describes the quality of the best predicted inverted repeat within 2 kb of the mismatch (see Systems and methods section). (B) The cluster size is the number of A–G mismatches in the exon containing the given mismatch. (C) Sequence conservation over 120 bp overlapping the mismatch. (D) The sum of all seven LOD scores for each mismatch.

 
This scoring system identified 7 of 11 positive control edits in the 10 top ranked mismatches (including the GluR-B Q/R, GluR-C R/G, GluR-D R/G, and all 4 5HT2cR edits). Of the four remaining positive controls, KCNA1 was missed as we did not find any matching edited sequences, while the rest were ranked badly owing to poor sequence conservation, poor ECS predictions, the lack of clusters or the lack of orthologous mismatches in human. These results are also relatively robust to modifications in the smoothing and binning of the score distributions (data not shown). Complete results for the 20 top ranked edits are given in the Supplementary data (Table S1).

The 10 highest scoring novel candidate recoding edits were experimentally tested for evidence of editing. We tested mouse brain and heart RT–PCR products for evidence of editing at the predicted sites. The brain was chosen as all the known A–I recoding sites are edited in this tissue, and the heart was chosen as a control. Athanasiadis et al. (2004) tested brain and lung for the same reasons. For 9 of the top 10 novel candidates, there was no evidence of editing. It is possible that these genes are edited in other tissues. However, exhaustive testing of these sites in every tissue and developmental stage was beyond the scope of this work. The top ranking novel edit, BC10, contains a novel editing region, which we have experimentally verified (Fig. 3).



 
Fig. 3 Evidence for editing sites in BC10. The predicted ECS is shown with the putative A–I editing sites indicated by black boxes. The underlined nucleotides show the start of the coding region. The intervening 448 bp are shown by the arrow on the left hand side. The supporting evidence and the coding modification of the putative edits are given above/below the sites. There are four sources of edited sequences.

a The number of intronic brain RT–PCR products edited at this site (out of 50).

b The number of exonic brain RT–PCR products edited at this site (out of 23).

c The number of publicly available mouse brain expressed sequences edited at this site (out of 28).

d The number of publicly available human expressed sequences edited at the orthologous sites (out of 85).

 
We tested 17 other novel candidate editing sites, in addition to the 10 highest scoring novel candidates. These candidates were randomly selected and vary widely in their ranks. None of these candidates showed experimental evidence of editing, which suggests that recoding editing is only common at the top end of the distribution, and confirms the reliability of the scoring system.

3.3 Confirmation of BC10, a novel A–I editing region
The top novel candidate, BC10, shows all the features of the positive controls. It has a very high scoring putative ECS 480 bp 5' of the most frequently edited site; its edited region is 99% identical between mouse and human; it shows orthologous A–G mismatches in human; it is highly edited in brain tissue, affects the amino acid sequence and the editing sites occur in a tight cluster. The three recoding sites are supported by multiple ESTs/cDNAs in both species (Fig. 3). The putative ECS is the second highest scoring ECS from the ~30 000 A–G mismatches. The region containing these sites has been tested and editing has been verified in the lab. Interestingly, most of the editing was observed in the intron across the length of the predicted ECS and was specific to brain. The low number of edited RT–PCR products from the exonic region was partially due to an expressed pseudogene, which was preferentially amplified. BC10 specific primers could not be found. This expressed pseudogene is not found in human and cannot explain the observed editing sites. These data demonstrate that this protocol is able to predict novel A–I editing sites.

BC10 is differentially expressed in bladder cancer (Gromova et al., 2002) and renal cancer (Rae et al., 2000) cell lines and is predicted to be a small globular protein containing two transmembrane helices. All the editing sites are found in either the 5'-UTR or the N-terminal section of the protein, which is predicted to be outside the membrane. The three coding edits are all non-synonymous and predicted to encode exposed residues. A multi-species alignment shows that this region of BC10 is exceptionally well conserved from human down to Caenorhabditis elegans, suggesting that it is a fundamentally important gene (data not shown). Notably, the three recoding edited adenosines are conserved in all the species as well as most of the adjacent bases. This suggests that editing of this region may be conserved across all of these species.


    4 DISCUSSION
 
Our protocol is sensitive, given that it identified 7 of the 11 positive control edits. In contrast, no other screen for mammalian editing sites has identified any of the known recoding editing sites. Our protocol is also specific as five of the top eight genes contain known or experimentally verified sites. One of these is a confirmed novel editing region, BC10, which is an extremely well conserved gene implicated in renal and bladder cancer (Gromova et al., 2002; Rae et al., 2000). These results demonstrate that this is a useful and efficient method for identifying both known and novel A–I editing sites.

Using other computational protocols, two groups have shown that there are over 1500 genes edited in introns or non-coding regions (Kim et al., 2004; Levanon et al., 2004). This shows a clear difference in magnitude between the number of coding and non-coding editing sites. The results of Levanon et al. (2004) suggest that the occurrence of a strong ECS is a very good predictor of editing sites in non-coding sequences. Here, we find that our ECS predictions are only moderate predictors for recoding sites and that a combination of features must be used to identify these sites. A better ECS prediction method could improve this situation. In contrast, we found that sequence conservation and the observation of orthologous mismatches in human are strong indicators of recoding sites. Notably, these features cannot be applied to editing sites in Alu repeats, as they are not conserved in the mouse.

Despite the success of this protocol for identifying many of the known recoding sites, it is possible that there are many more sites to identify in addition to BC10 and the previously known sites. For example, it is possible that some of the genes we experimentally tested may be edited, but that the degree of editing was too low or restricted to particular tissues or developmental stages other than adult brain and heart. The frequency of editing is 100% for only the GluR-B Q/R site, whereas the frequency of editing for other sites is often much lower (Keegan et al., 2001). It can also be developmentally regulated, as with the AMPA receptor R/G sites (GluR-B, C and D), which are poorly edited early in mouse development (Lomeli et al., 1994). Although all the known mammalian recoding sites are edited in adult brain (Keegan et al., 2001), there may be some ascertainment bias towards brain editing. A large proportion of the publicly available expressed sequence data is from brain libraries, adding to this bias. Both the editing enzymes and edited Alu elements appear in a range of tissues (Keegan et al., 2001; Kim et al., 2004; Levanon et al., 2004), supporting the possibility of recoding editing in these tissues. Indeed, the disease phenotype of the ADAR1 null allele in humans is a skin condition, rather than neurological (Miyamura et al., 2003). It is also possible that there is an additional class of recoding sites that do not conform to the features used in this analysis; however, there is no evidence for this.

Environmental factors, such as the disease state of the tissue, may also be important. One form of ADAR1 is known to be interferon inducible (Patterson and Samuel, 1995), suggesting the existence of sites that are edited only during inflammation (Yang et al., 2003). Aberrant A–I editing of the endothelin receptor has been implicated in Hirschsprung disease (Tanoue et al., 2002), while aberrant C–U editing has been shown to induce liver dysplasia and carcinomas (Yamanaka et al., 1995, 1997). It is possible that there are many more sites that could be aberrantly edited, some of which could contribute to disease. When looking for ECSs we identified >1500 mismatches with inverted repeats that scored better than half of the known recoding sites. This suggests that there are a lot of potential hairpins formed between exons and their introns, which the editing enzymes could potentially bind. It is interesting to ask whether aberrant editing of these potential hairpins could be involved in disease.

Our results demonstrate that sequence conservation between mouse and human and the observation of identical orthologous mismatches are powerful predictors of recoding editing sites. As a result, this scoring system will be less useful for identifying putative species-specific sites. However, most of the positive controls have been shown to be widely conserved throughout mammals (Keegan et al., 2001).

In addition to recoding sites, there are many non-coding sites remaining to be discovered or characterized (Kim et al., 2004; Levanon et al., 2004). Understanding the functions of these non-coding sites is vital for a more complete understanding of RNA editing. This work has identified many of the known mammalian recoding editing sites and one novel edited region in BC10 and it is clear that there may be further sites to be identified. However, the present data suggest that recoding editing is a rare phenomenon, both as a proportion of total editing activity and as a proportion of affected exons in the mammalian genome.


    Acknowledgments
 
This work was funded by the Medical Research Council, UK.

Received on January 18, 2005; revised on March 21, 2005; accepted on March 24, 2005

    REFERENCES
 

    Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Athanasiadis, A., et al. (2004) Widespread A-to-I RNA Editing of Alu-Containing mRNAs in the Human Transcriptome. PLoS Biol., 2, 1–15 e391.

    Bass, B.L. (2002) RNA editing by adenosine deaminases that act on RNA. Annu. Rev. Biochem., 71, 817–846.

    Benson, D.A., et al. (1999) GenBank. Nucleic Acids Res., 27, 12–17.

    Blow, M., et al. (2004) A survey of RNA editing in human brain. Genome Res., 14, 2379–2387.

    Boguski, M.S., et al. (1993) dbEST—database for ‘expressed sequence tags’. Nat. Genet., 4, 332–333.

    Brudno, M., et al. (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res., 13, 721–731.

    Burns, C.M., et al. (1997) Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature, 387, 303–308.

    Clamp, M., et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res., 31, 38–42.

    Dawson, T.R., et al. (2004) Structure and sequence determinants required for the RNA editing of ADAR2 substrates. J. Biol. Chem., 279, 4941–4951.

    Emeson, R.B. and Singh, M. (2000) Adenosine-to-inosine RNA editing: substrates and consequences. In Bass, B. (Ed.), RNA Editing. Oxford University Press, Oxford, pp. 108–138.

    Gromova, I., et al. (2002) bc10: A novel human bladder cancer-associated protein with a conserved genomic structure downregulated in invasive cancer. Int. J. Cancer, 98, 539–546.

    Herb, A., et al. (1996) Q/R site editing in kainate receptor GluR5 and GluR6 pre-mRNAs requires distant intronic sequences. Proc. Natl Acad. Sci. USA, 93, 1875–1880.

    Higuchi, M., et al. (1993) RNA editing of AMPA receptor subunit GluR-B: a base-paired intron–exon structure determines position and efficiency. Cell, 75, 1361–1370.

    Hoopengardner, B., et al. (2003) Nervous system targets of RNA editing identified by comparative genomics. Science, 301, 832–836.

    Kawahara, Y., et al. (2004) Glutamate receptors: RNA editing and death of motor neurons. Nature, 427, 801.

    Keegan, L.P., Gallo, A., O’Connell, M.A. (2001) The many roles of an RNA editor. Nat. Rev. Genet., 2, 869–878.

    Kikuno, R., et al. (2002) HUGE: a database for human large proteins identified in the Kazusa cDNA sequencing project. Nucleic Acids Res., 30, 166–168.

    Kim, D.D., et al. (2004) Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res., 14, 1719–1725.

    Levanon, E.Y., et al. (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat. Biotechnol., 22, 1001–1005.

    Lim, L.P., et al. (2003) The microRNAs of Caenorhabditis elegans. Genes Dev., 17, 991–1008.

    Lomeli, H., et al. (1994) Control of kinetic properties of AMPA receptor channels by nuclear RNA editing. Science, 266, 1709–1713.

    Miyamura, Y., et al. (2003) Mutations of the RNA-specific adenosine deaminase gene (DSRAD) are involved in dyschromatosis symmetrica hereditaria. Am. J. Hum. Genet., 73, 693–699.

    Morse, D.P., et al. (2002) RNA hairpins in noncoding regions of human brain and Caenorhabditis elegans mRNA are edited by adenosine deaminases that act on RNA. Proc. Natl Acad. Sci. USA, 99, 7906–7911.

    Patterson, J.B. and Samuel, C.E. (1995) Expression and regulation by interferon of a double-stranded-RNA-specific adenosine deaminase from human cells: evidence for two forms of the deaminase. Mol. Cell Biol., 15, 5376–5388.

    Paul, M.S. and Bass, B.L. (1998) Inosine exists in mRNA at tissue-specific levels and is most abundant in brain mRNA. EMBO J., 17, 1120–1127.

    Rae, F.K., et al. (2000) Novel association of a diverse range of genes with renal cell carcinoma as identified by differential display. Int. J. Cancer, 88, 726–732.

    Rice, P., et al. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

    Rueter, S.M., et al. (1999) Regulation of alternative splicing by RNA editing. Nature, 399, 75–80.

    Schaub, M. and Keller, W. (2002) RNA editing by adenosine deaminases generates RNA and protein diversity. Biochimie, 84, 791–803.

    Sommer, B., et al. (1991) RNA editing in brain controls a determinant of ion flow in glutamate-gated channels. Cell, 67, 11–19.

    Stapleton, M., et al. (2002) A Drosophila full-length cDNA resource. Genome Biol., 3, 1–8 RESEARCH0080.

    Tanoue, A., et al. (2002) Two novel transcripts for human endothelin B receptor produced by RNA editing/alternative splicing from a single gene. J. Biol. Chem., 277, 33205–33212.

    Yamanaka, S., et al. (1995) Apolipoprotein B mRNA-editing protein induces hepatocellular carcinoma and dysplasia in transgenic animals. Proc. Natl Acad. Sci. USA, 92, 8483–8487.

    Yamanaka, S., et al. (1997) A novel translational repressor mRNA is edited extensively in livers containing tumors caused by the transgene expression of the apoB mRNA-editing enzyme. Genes Dev., 11, 321–333.

    Yang, J.H., et al. (2003) Widespread inosine-containing mRNA in lymphocytes regulated by ADAR1 in response to inflammation. Immunology, 109, 15–23.

