Bioinformatics Advance Access originally published online on March 29, 2005
Bioinformatics 2005 21(11):2580-2589; doi:10.1093/bioinformatics/bti400
An algorithm for identification of bacterial selenocysteine insertion sequence elements and selenoprotein genes
Department of Biochemistry, University of Nebraska Lincoln, NE 68588-0664, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Incorporation of selenocysteine (Sec) into proteins in response to UGA codons requires a cis-acting RNA structure, the Sec insertion sequence (SECIS) element. Whereas SECIS elements in Escherichia coli are well characterized, a bacterial SECIS consensus structure is lacking.
Results: We developed a bacterial SECIS consensus model, the key feature of which is a conserved guanosine in a small apical loop of the properly positioned structure. This consensus was used to build a computational tool, bSECISearch, for detection of bacterial SECIS elements and selenoprotein genes in sequence databases. The program identified 96.5% of known selenoprotein genes in completely sequenced bacterial genomes and predicted several new selenoprotein genes. Further analysis revealed that the size of bacterial selenoproteomes varied from 1 to 11 selenoproteins. Formate dehydrogenase was present in most selenoproteomes, often as the only selenoprotein family, whereas the occurrence of other selenoproteins was limited. The availability of the bacterial SECIS consensus and the tool for identification of these structures should help in correct annotation of selenoprotein genes and characterization of bacterial selenoproteomes.
Availability: The web server interface is freely accessible to users at http://genomics.unl.edu/bSECISearch/
Contact: vgladyshev1{at}unl.edu
Supplementary information: http://genomics.unl.edu/bSECISearch/supplement.html (includes detailed Methods and Figures S1–S3).
| INTRODUCTION |
|---|
The 21st naturally occurring amino acid, selenocysteine (Sec), has been identified as the major biological form of selenium in several enzymes and proteins found in bacteria, archaea and eukaryotes (Böck, 2000). The synthesis of Sec and its insertion into nascent polypeptides requires a complex molecular machinery that recodes in-frame UGA codons, which normally function as stop signals, to serve as Sec codons (Hatfield and Gladyshev, 2002). A key feature that instructs ribosomes to recognize UGA as a Sec codon is a selenocysteine insertion sequence (SECIS) element, a stem–loop structure residing within selenoprotein mRNAs (Low and Berry, 1996).
In eukaryotes and archaea, SECIS elements are located in untranslated regions (UTRs) of selenoprotein genes (Böck, 2000). Conserved features of eukaryotic SECIS elements have been well characterized (Low and Berry, 1996; Walczak et al., 1998). The quartet (SECIS core), formed by four non-Watson–Crick base pairs, and two unpaired adenosines in the apical loop are essential for SECIS function (Supplementary information, Figure S1). Predicted primary and secondary structures of archaeal SECIS elements differ from those of their eukaryotic counterparts and display a common motif containing a purine-only GAA ... A internal loop and three consecutive CG or GC base pairs (Supplementary Figure S1; Wilting et al., 1997).
Bacterial SECIS (bSECIS) elements differ from both eukaryotic and archaeal elements with respect to sequence and structure and are located immediately downstream of Sec-encoding UGA codons (Berg et al., 1991; Hüttenhofer and Böck, 1998). However, identification of conserved features in bSECIS elements proved difficult. To date, the best characterized bSECIS elements are in genes encoding formate dehydrogenases H (fdhF), N (fdnG) and O (fdoG) in Escherichia coli (Supplementary Figure S1). A number of structure–function studies have shown that E.coli SECIS elements are composed of two domains: one containing a Sec UGA codon and the other a 17 nt stem–loop separated from UGA by 11 nt. Exposed GU in the apical loop and bulged UU in the upper stem are regarded as a common core of the E.coli SECIS elements (Heider et al., 1992; Hüttenhofer et al., 1996). A fixed distance between the in-frame UGA codon and the apical loop is also important for SECIS function (Chen et al., 1993). However, putative SECIS elements identified in selenoprotein mRNAs in several other bacteria, such as Clostridium sticklandii, Clostridium purinolyticum and Eubacterium acidaminophilum, seem to bear no resemblance to each other or to the E.coli counterparts with respect to loop sequences or lengths of the stems (Heider et al., 1991; Gursinsky et al., 2000). Thus, although the E.coli SECIS elements are well characterized, it is not known if these structures are present in all bacterial selenoprotein genes, and if so, what the common features of bacterial SECIS elements are.
Various bioinformatics algorithms have been developed for detection of eukaryotic SECIS elements. These programs successfully identified new selenoproteins in mammalian and Drosophila genomes and in several expressed sequence tag (EST) databases (Kryukov et al., 1999; Lescure et al., 1999; Castellano et al., 2001; Kryukov et al., 2003). Recently, this method was extended to archaeal SECIS elements (Kryukov and Gladyshev, 2004). In contrast, owing to the lack of bacterial consensus SECIS models, prediction of bacterial selenoproteins in genomic sequences is difficult. Instead, these proteins can be identified through searches for Sec/Cys pairs in homologous sequences (Castellano et al., 2004; Kryukov and Gladyshev, 2004). One deficiency of this approach is its inability to identify selenoproteins that have no Cys-containing homologs. Although only one such protein, glycine reductase selenoprotein A, is known in bacteria, it is possible that additional proteins exist.
In this report, we analyzed sequences downstream of Sec UGA codons in known bacterial selenoprotein genes and built a consensus bSECIS structural model. Based on this model, we developed bSECISearch, an algorithm for prediction of bacterial SECIS elements and selenoprotein genes in genomic databases. We used this approach to screen completely sequenced genomes containing Sec insertion machinery genes and further analyzed selenoproteomes in these organisms.
| SYSTEMS AND METHODS |
|---|
Sequences and resources
Among all completely sequenced bacterial genomes (240 genomes, December 31, 2004) available at the NCBI ftp server (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/), we selected those containing genes involved in Sec biosynthesis and insertion, including Sec synthase (SelA), Sec-specific elongation factor (SelB), tRNASec (SelC) and selenophosphate synthetase (SelD) (Forchhammer et al., 1990; Ehrenreich et al., 1992). A total of 29 Sec-utilizing completely sequenced bacterial genomes were identified and at least one known selenoprotein was found in each of these genomes.
Blast programs were obtained from the NCBI ftp server (ftp://ftp.ncbi.nih.gov/blast/). We used the 2.2.9 version of this program. RNA secondary structures were predicted by RNAfold v.1.4 (available at http://www.tbi.univie.ac.at/~ivo/RNA/). Multiple alignment and phylogenetic tree analyses were performed using ClustalW (http://www.ebi.ac.uk/clustalw/) and visualized with BoxShade and Treeme programs, respectively.
Composition of bSECISearch
The bSECISearch tool is composed of three modules: bSECIScan is responsible for initial identification of bacterial SECIS-like structures in query genomes; bSECISProfile profiles and evaluates candidate bSECIS elements; bSECISFilter filters out false positives by homology searches. A general scheme of the entire algorithm is shown in Figure 1.
Development of a consensus bSECIS structural model
We collected 100 known bacterial selenoprotein sequences from different selenoprotein families and organisms, predicted their optimal secondary structures in regions downstream of Sec UGA codons by RNAfold and compared structures and sequences to identify common features. A consensus bSECIS structural model (Fig. 2) was then developed. The minimum free energy (MFE) cutoff was based on the free energy calculation of known bSECIS elements and was set at −7.5 kcal/mol.
bSECIS element prediction and ORF identification (bSECIScan module)
Since bacterial SECIS elements are located immediately downstream of Sec-encoding UGA codons in the coding regions of selenoprotein genes, bSECIScan searches with a sliding window (39–100 nt) starting from each UGA codon in a query genome and retrieves UGA-containing sequences. The secondary structure of each sequence is predicted by RNAfold and analyzed against the consensus bSECIS model (Fig. 2). For each UGA-containing sequence that satisfies this consensus, regions upstream and downstream of the UGA codon are analyzed for occurrence of open reading frames (ORFs) (Fig. 2). If a stop codon is detected closer to the UGA codon than an appropriate start codon (AUG or GUG) in the same frame of a candidate selenoprotein gene, the UGA-containing sequence is discarded.
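The scanning and ORF steps described above can be illustrated with a minimal Python sketch. It is not the published Perl implementation: the function names `candidate_windows` and `passes_orf_check` are hypothetical, only the forward strand is scanned, each window is simply truncated at 100 nt, and the ORF test is reduced to the one rule stated in the text (an in-frame stop codon must not lie closer to the UGA than an in-frame AUG or GUG). Folding with RNAfold is left out.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # DNA equivalents of AUG and GUG


def candidate_windows(genome, min_len=39, max_len=100):
    """Yield (position, window) for each TGA triplet on the forward strand.

    The window starts at the TGA and is cut at max_len nt; windows shorter
    than min_len (TGA too close to the sequence end) are skipped.
    """
    genome = genome.upper()
    pos = genome.find("TGA")
    while pos != -1:
        window = genome[pos:pos + max_len]
        if len(window) >= min_len:
            yield pos, window
        pos = genome.find("TGA", pos + 1)


def passes_orf_check(genome, tga_pos):
    """Walk upstream codon by codon in the frame of the TGA.

    Returns True if an in-frame start codon is met before any in-frame
    stop codon, and False if a stop comes first or no start is found.
    """
    genome = genome.upper()
    i = tga_pos - 3
    while i >= 0:
        codon = genome[i:i + 3]
        if codon in STOPS:
            return False
        if codon in STARTS:
            return True
        i -= 3
    return False
```

In a fuller pipeline each retained window would then be folded and compared with the consensus model, and the reverse strand would be scanned in the same way after reverse complementation.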
Segment-based bSECIS profiling and statistical evaluation (bSECISProfile module)
To profile candidate bSECIS elements, a training dataset containing 60 SECIS elements in known bacterial selenoprotein genes was prepared. These SECIS elements were derived from various selenoprotein families and used to construct a statistical measure. To avoid biasing the profiling score toward the origin of the samples, sequences were selected such that no pair of SECISes had >90% sequence identity within the bSECIS element region. Secondary structures of bSECIS elements in the training set were divided into basic components of a standard stem–loop structure: apical loop, upper and lower stems, internal loop, etc. A segment-based algorithm, DIALIGN (Morgenstern et al., 1996), was used to separately align the apical loop and the upper stem of the training data, as these regions are known to be most important for SECIS function (Engelberg-Kulka et al., 2001). This procedure allowed detection and correct alignment of short similar regions in long sequences of low overall similarity (e.g. the conserved G in the apical loop in our bSECIS model).
Position-specific scoring matrices (PSSMs; Staden, 1984) for the apical loop and the upper stem were then derived from the alignment. To find optimal bSECIS elements in the candidate set, we developed a quasi-greedy alignment algorithm based on Gotoh's standard dynamic programming algorithm (Gotoh, 1982). We optimized the standard algorithm by adding constraints, including elimination of sequence combinations with negative weight scores or an excessive number of gaps. Moreover, the score was obtained not only from the substitution score, but also from the weight matrices and our bSECIS structural model (see Supplementary information). Optimal bSECIS elements and their predicted ORFs were reported if their weight scores were greater than the cutoff. In this study, the score cutoff was predefined as 28.0 based on the observation that at least 95% of bSECIS elements in the training set scored greater than the cutoff.
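To make the PSSM step concrete, the following Python sketch builds a log-odds matrix from a set of aligned, equal-length loop sequences and scores a candidate against it. It covers only the weight-matrix component; the Gotoh-based alignment, the structural terms and the 28.0 cutoff of bSECISProfile are not reproduced. The pseudocount, the uniform background and the function names `build_pssm` and `score` are assumptions made for illustration.

```python
import math

ALPHABET = "ACGU"


def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM from equal-length aligned RNA sequences.

    Characters outside ALPHABET (for example gaps) are ignored when counting.
    """
    pssm = []
    for col in range(len(aligned[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in aligned:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2(counts[base] / total / background) for base in ALPHABET}
        )
    return pssm


def score(pssm, candidate):
    """Sum the per-position log-odds of a candidate of the same length."""
    return sum(column[base] for column, base in zip(pssm, candidate))
```

With a toy alignment such as `["GUAA", "GUCA", "GGUA", "AGUU"]` (invented sequences, not taken from the training set), a candidate starting with G scores higher than one lacking it, which mirrors the role of the conserved apical-loop G.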
For statistical evaluation, we calculated how often a putative bSECIS element of a given (or greater) score would be occurring under a null model (E-value, Hertz and Stormo, 1999). The following approximate equation was obtained to calculate our E-value:
![]() | (1) |
Analysis of conservation of UGA-flanking regions (bSECISFilter module)
bSECISFilter makes use of the blast search tool (Altschul et al., 1990) to identify homologs of putative bSECIS-containing ORFs in microbial genomes and non-redundant (NR) databases. The key process of the procedure is to analyze the conservation of UGA-flanking regions in each putative bSECIS-containing ORF.
The tblastn program was first used to screen the NCBI microbial genome and nucleotide NR databases with the bSECIS-containing ORFs. Only those hits with E-value ≤ 0.05 and with the percentage of similar residues in the high-scoring segment pair (HSP) ≥ 40% were selected. Genomic sequence hits that were derived from the same query organism were filtered out. The remaining highly significant hits were then screened to assess the residues aligned with Sec in the query sequence. If the following criteria were not satisfied:
The resulting primary candidate set was then divided into homologs of previously known selenoproteins [including experimentally verified and computationally predicted selenoproteins (Kryukov and Gladyshev, 2004)] and candidate selenoproteins. All candidates were manually analyzed for the location of the UGA codon, the occurrence of Sec-containing and Cys-containing homologs in Sec-utilizing or other organisms, and the presence of bSECIS elements in Sec-containing homologs. Selenoproteins were designated as new if they satisfied the following criteria:
- the UGA codon was not present between two different functional domains;
- if additional Sec-containing homologs were available, at least one was present in an evolutionarily distant Sec-utilizing organism;
- at least 50% of Sec-containing homologs in known Sec-utilizing organisms contained bSECIS elements.
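The initial tblastn hit selection described earlier in this subsection can be sketched in Python as a simple filter. The sketch assumes that the E-value threshold of 0.05 is an upper bound and the 40% similarity threshold a lower bound; the `Hit` record and its field names are hypothetical and do not correspond to any BLAST output parser.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    organism: str             # source organism of the hit sequence
    evalue: float             # tblastn E-value of the HSP
    percent_positives: float  # percentage of similar residues in the HSP


def select_hits(hits, query_organism, max_evalue=0.05, min_positives=40.0):
    """Keep significant, sufficiently similar hits from other organisms."""
    return [
        hit
        for hit in hits
        if hit.evalue <= max_evalue
        and hit.percent_positives >= min_positives
        and hit.organism != query_organism
    ]
```

The later steps (inspecting the residue aligned with Sec, and the manual criteria listed above) depend on alignment details and expert judgment and are not modeled here.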
| IMPLEMENTATION |
|---|
The bSECISearch algorithm was implemented mainly in Perl, except for the bSECISProfile module, which was written in ANSI C. The program is completely automated and was successfully tested on a LINUX platform.
| RESULTS |
|---|
Consensus bSECIS structural model
We constructed a structural alignment of 100 predicted SECIS structures present in representative bacterial selenoprotein sequences and developed a consensus bSECIS structural model (Fig. 2). This model described a common stem–loop core in bacterial SECIS elements. However, individual bSECIS elements may have additional functionally important features. For example, a bulged U is present in the stem of the fdhF SECIS element and was shown to be required for Sec insertion (Hüttenhofer et al., 1996), but this feature is absent in most other bSECISes. In our bSECIS model, the common core is composed of a 3–14 nt apical loop, which is small (3–5 nt) in most SECIS elements, and an adjacent 4–16 bp stem. Primary sequences are not conserved except for a single guanosine (G) present among the first two nucleotides in the apical loop. The G is often followed by a U, which was suggested to be important for interaction with SelB (Fourmy et al., 2002), but this nucleotide is not strictly conserved. We observed a minimal spacing of 16 nt and a maximal spacing of 37 nt between the UGA codon and the apical loop, although the spacing for most bacterial SECISes was limited to 18–23 nt. Other features associated with the SECIS structure, such as the number and composition of internal loops, bulges and lower stems, were not obvious from our analysis. These data suggested that an absolute majority of bacterial SECIS elements can be described by a common structural model and that these structures probably occur exclusively in downstream sequences flanking the UGA. A recent study described a Watson–Crick base pair within the apical loop of the Moorella thermoacetica fdhA SECIS element, which probably stabilized the SelB/SECIS interaction (Yoshizawa et al., 2005). Although this base pair is not strictly conserved within bacterial SECIS elements, this feature might in future help to further improve the bSECIS model.
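The numeric constraints of the consensus model can be expressed as a small Python check on a sequence and its dot-bracket secondary structure (the notation produced by RNAfold). This is a sketch under stated assumptions rather than the published model: the stem is counted as the run of uninterrupted stacked pairs adjacent to the loop, the spacing is taken as the number of nucleotides between the end of the UGA and the first loop nucleotide, and the free-energy cutoff and any handling of bulges are omitted. The function names are hypothetical.

```python
def pair_table(structure):
    """Map each paired position to its partner in a balanced dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def matches_consensus(seq, structure):
    """Check a window that starts at the UGA against the consensus limits.

    Limits from the text: 3-14 nt apical loop with a G among its first two
    nucleotides, 4-16 bp adjacent stem, 16-37 nt between UGA and loop.
    """
    pairs = pair_table(structure)
    for i, j in pairs.items():
        inner = structure[i + 1:j]
        if i > j or set(inner) - {"."}:
            continue  # not the closing pair of a hairpin loop
        loop = seq[i + 1:j]
        stem = 1
        a, b = i - 1, j + 1
        while a >= 0 and pairs.get(a) == b:  # extend over stacked pairs only
            stem += 1
            a, b = a - 1, b + 1
        spacing = (i + 1) - 3  # nt between the UGA and the first loop base
        if (3 <= len(loop) <= 14 and "G" in loop[:2]
                and 4 <= stem <= 16 and 16 <= spacing <= 37):
            return True
    return False
```

Because real bSECIS elements may contain bulges in the upper stem, a stricter or looser definition of the stem length could change which hairpins pass; the stacked-pair definition used here is the simplest choice.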
Identification of bSECIS elements and selenoprotein genes in completely sequenced Sec-utilizing genomes
As a first application of our program, we analyzed completely sequenced genomes of Sec-utilizing organisms, i.e. organisms that had SelA, SelB and SelD genes. Among bacterial genomes available at NCBI, 29 genomes possessed these genes (Table 1). These genomes summed up to 98 566 493 nt and contained 3 142 018 TGA triplets on both strands. To identify selenoprotein genes, the program initially tested the occurrence of candidate bSECIS elements downstream of each candidate UGA codon. Primary and secondary structures of candidate bSECIS elements were analyzed against the bSECIS structural model and ORF constraints. 48 472 candidate bSECIS elements (1.5% of the total number of UGA codons) were selected by the bSECIScan module. Thus, this module could quickly filter out most Sec-unrelated UGA codons (98.5%) in bacterial genomes. Subsequent application of bSECISProfile resulted in 28 974 candidate structures, which were further reduced to 291 hits by the bSECISFilter module. These hits were divided into homologs of previously known selenoproteins (83 sequences) and candidate selenoproteins (208 sequences). The latter sequences were manually analyzed for the location of UGA codons, occurrence of Sec-containing and Cys-containing homologs and presence of potential bSECIS elements in Sec-containing homologs. This procedure resulted in four new selenoprotein genes (Table 1).
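The filtering funnel reported above can be checked with a few lines of Python arithmetic. All counts are taken from the text; only the variable and function names are introduced here.

```python
tga_total = 3_142_018    # TGA triplets on both strands of the 29 genomes
after_scan = 48_472      # candidates retained by bSECIScan
after_profile = 28_974   # candidates retained by bSECISProfile
after_filter = 291       # hits retained by bSECISFilter
known, candidates = 83, 208


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


scan_retained = pct(after_scan, tga_total)         # share of TGAs kept by the scan
profile_retained = pct(after_profile, after_scan)  # share of scan hits kept by profiling
filter_retained = pct(after_filter, after_profile) # share of profiled hits kept by filtering
```

The numbers show that the homology filter is by far the most selective step: profiling keeps roughly three of every five scanned candidates, whereas filtering keeps about one in a hundred of those.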
A control genome, Lactococcus lactis, was also analyzed, which did not have Sec insertion machinery and was not expected to possess selenoprotein genes. No hits were found in this genome, suggesting that our algorithm could distinguish, at least among some bacteria, Sec-utilizing from other organisms.
Previously known selenoproteins detected in completely sequenced genomes
The 83 known selenoprotein sequences belonged to 19 selenoprotein families (Table 2). Importantly, this set included glycine reductase selenoprotein A genes, which could not be identified by searching for Sec/Cys pairs in homologous sequences as no Cys homologs are known for this protein (Kryukov and Gladyshev, 2004). Structural alignment of bSECIS sequences present in these selenoproteins highlighted conserved features of the bSECIS model (Fig. 3A). An exhaustive search against Sec-utilizing genomes with all previously known selenoproteins (Kryukov and Gladyshev, 2004) revealed a total of 86 selenoproteins belonging to the same 19 selenoprotein families, but only 45 of them were correctly annotated in genomic sequences (Table 1). The three selenoproteins missed by bSECISearch included two SelD genes (Campylobacter jejuni, Photobacterium profundum) and one selenoprotein A gene (P.profundum). We analyzed these selenoproteins and found that two of them contained unusual bSECIS-like structures that could not be detected by our model (Fig. 3B). It cannot be excluded that secondary structures were incorrectly predicted in these bSECIS elements. It is also possible that additional bSECIS types occur in these genes or that there are sequencing errors within sequences downstream of Sec UGA codons. The third, C.jejuni SelD, was discarded because a UAA stop codon was detected upstream of the in-frame UGA codon within the SelD ORF. Thus, the C.jejuni SelD probably had a sequencing error (or was a pseudogene). In spite of the inability to detect these selenoprotein genes, the program identified 96.5% (true positive rate) of known selenoprotein genes, representing all known selenoprotein families in the 29 Sec-utilizing genomes.
Analysis of distribution of selenoproteins in the genomes of Sec-utilizing organisms showed that most genomes contained one or two selenoproteins. In addition, several selenoprotein-rich bacteria were identified, including Symbiobacterium thermophilum (11 selenoproteins), Desulfotalea psychrophila (11 selenoproteins), Desulfovibrio vulgaris (8 selenoproteins), Geobacter sulfurreducens (8 selenoproteins) and Treponema denticola (6 selenoproteins). A total of 44 selenoproteins in these selenoprotein-rich genomes accounted for 51.2% of detected selenoprotein sequences, suggesting high Sec usage in these organisms.
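The percentages quoted in this subsection follow directly from the reported counts, as the short Python check below shows. The counts come from the text; the variable names are introduced here for illustration.

```python
total_known = 86         # known selenoproteins found by exhaustive search
detected = 83            # of these, detected by bSECISearch
correctly_annotated = 45
rich_genomes = {
    "Symbiobacterium thermophilum": 11,
    "Desulfotalea psychrophila": 11,
    "Desulfovibrio vulgaris": 8,
    "Geobacter sulfurreducens": 8,
    "Treponema denticola": 6,
}

true_positive_rate = round(100.0 * detected / total_known, 1)
annotated_share = round(100.0 * correctly_annotated / total_known, 1)
rich_total = sum(rich_genomes.values())
rich_share = round(100.0 * rich_total / total_known, 1)
```

Note that the 51.2% share is obtained with 86 as the denominator, i.e. relative to all known selenoprotein sequences in the 29 genomes rather than to the 83 detected automatically.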
Of all selenoproteins, the formate dehydrogenase family had a particularly high representation. This selenoprotein was identified in 24 of the 29 bacterial species (Table 3). In many of these organisms, formate dehydrogenase was the only selenoprotein and its gene often flanked Sec insertion genes. Previous smaller scale analyses of prokaryotic genomes for Sec/Cys pairs also revealed that this protein was present in many organisms that utilize Sec, and its occurrence was by far more common than that of any other selenoprotein (Kryukov and Gladyshev, 2004). Phylogenetic analyses provided additional clues on the evolution of this enzyme family (Supplementary Figure S2). We found that most Cys-containing formate dehydrogenases belonged to the fdhF subfamily, whereas most enzymes of the fdoG and fdnG subfamilies were selenoproteins. No definitive conclusions could be made on what was the original form of formate dehydrogenase (i.e. whether it was a Sec-containing or a Cys-containing protein).
Interestingly, a Cys-containing formate dehydrogenase fdnG from Mannheimia succiniciproducens clustered with a Sec-containing homolog from the same organism and both belonged to the fdnG/fdoG subfamily. A bSECIS-like structure was found downstream of its Cys UGU codon, and this structure was similar to the corresponding structure detected in the selenoprotein homolog (Supplementary Figure S3). The data suggest that the Cys-containing fdnG evolved from a Sec-containing ancestor by replacing UGA with UGU. The presence of such fossil SECIS elements was also previously observed in archaea [Methanococcus voltae vhuD protein, (Böck and Rother, 2005)] and eukaryotes [mouse GPx6 (Kryukov et al., 2003)].
In the five Sec-utilizing organisms in which Sec-containing formate dehydrogenases were absent, only Photobacterium profundum possessed a Cys-containing homolog, whereas neither Sec-containing nor Cys-containing formate dehydrogenases could be detected in Clostridium perfringens, Haemophilus ducreyi, Thermoanaerobacter tengcongensis and Treponema denticola. It is possible that adaptations to new living environments resulted in changes in the requirement of these enzymes for anaerobic respiration. Under these new conditions, other selenoproteins (perhaps SelD, as its Sec-containing form is present in all four of these bacteria) might have become responsible for maintaining the Sec utilization trait.
SelD, which is a key component in prokaryotic selenoprotein biosynthesis (Ehrenreich et al., 1992), was the second most abundant selenoprotein family which was detected in 14 Sec-utilizing organisms. In Haemophilus ducreyi, SelD was the only selenoprotein detected. All other selenoproteins had low occurrence, including eight which were represented by single sequences (all present in selenoprotein-rich organisms). The identical pattern of occurrence of selenoproteins A and B was consistent with previous studies that placed these enzymes in the same pathway (Kreimer and Andreesen, 1995).
New selenoproteins detected in completely sequenced genomes
Four new selenoprotein families (Table 2; Fig. 3C) were manually identified among 208 candidate selenoproteins generated by bSECISearch, and all had Cys-containing homologs in other organisms. Multiple alignments of these new selenoprotein families, along with their Cys-containing homologs, revealed sequence conservation of Sec/Cys pairs and their flanking regions (Fig. 4). All four selenoproteins either had a domain of known function or were homologous to protein families with known functions. These new selenoproteins included G.sulfurreducens radical SAM domain protein (COG0535, predicted FeS oxidoreductase family) and three different families of rhodanese-like sulfurtransferases: G.sulfurreducens sulfurtransferase (COG2897, SseA), D.psychrophila sulfurtransferase (COG0607, PspE) and D.psychrophila sulfurtransferase [homologous to a putative rhodanese-related sulfurtransferase in Rubrivivax gelatinosus (ZP_00243227)].
The presence of three selenoprotein sequences representing various families of the rhodanese superfamily is interesting and suggests an advantage that the use of Sec may provide for the sulfurtransferase function. In addition, we recently identified a fourth family of Sec-containing sulfurtransferases in the microbial sequence dataset of the environmental genomes of the Sargasso Sea (Y. Zhang, D.E. Fomenko and V.N. Gladyshev 2005, submitted for publication). Further experimental verification is needed for the newly identified selenoproteins.
One purpose of our bSECISearch algorithm was to test how many selenoproteins exist that do not have Cys-containing homologs. To date, only one such protein, glycine reductase selenoprotein A, is known. However, 4 of 5 selenoprotein A genes present in the 29 Sec-utilizing genomes were detected by our method, suggesting that it can indeed identify selenoproteins with no Cys-containing homologs. Since we did not find additional such selenoproteins in the 29 completely sequenced genomes, it appears that selenoproteins with no Cys homologs are extremely rare.
Finally, we tested if bSECISearch can distinguish the Sec-encoding function of UGA codon from other coding functions. In Mycoplasma, UGA codons designate Trp (Christiansen et al., 1997). We analyzed the genome of Mycoplasma gallisepticum, which contains 44 606 TGA triplets. The use of bSECIScan and bSECISProfile modules of bSECISearch resulted in 42 candidate bSECIS-like structures; however, a subsequent bSECISFilter screening discarded all of these hits. Thus, our method may also be used to distinguish the Sec-encoding function from other recoding function of UGA codons.
| DISCUSSION |
|---|
SECIS elements are essential for recognition of UGA as Sec codons (Thanbichler and Böck, 2002). These structures are well characterized in E.coli (Berg et al., 1991). However, one of the major deficiencies in the field has been the inability to identify bSECIS elements in many other selenoprotein genes as well as the lack of a common model for bacterial SECIS elements. In this study, we addressed these problems by detecting such structures in most bacterial selenoprotein genes, identifying conserved features in them and building a consensus bSECIS structural model. We then used the model to develop a bSECISearch algorithm, which combined three independent approaches to identify bSECIS elements and bacterial selenoprotein genes in genomic sequences.
The algorithm was designed for routine investigations of bacterial genomes. However, the use of the consensus bacterial SECIS model alone is not sufficient to identify bSECIS elements in bacterial genomic databases because of low conservation of this structure. In our study, we intended our consensus bSECIS structural model to be somewhat loose and to focus on the common stemloop core in either simple (standard hairpin) or complex (additional nested or juxtaposed hairpin structures) bSECIS elements, so that it could have a greater tolerance for variations within the bSECIS region. We found that the number of predicted bSECIS-like structures in organisms with high GC content (e.g. Mycobacterium avium) was much higher than in organisms with low GC content (e.g. C.jejuni). This is probably because of the likelihood of finding a G in the apical loop position. A recent study (Sandman et al., 2003) suggested an unexpected tolerance of mutations within the SECIS element, which appears to be consistent with our consensus model. It is possible that distinct classes of SECIS elements exist, which could not be recognized by our model. Further experimental verifications and tests are necessary to examine this possibility.
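The GC-content effect mentioned above can be illustrated with a simple null model in Python. The sketch assumes independent positions and equal frequencies of G and C, so that the probability of a G at any position is half the GC fraction; the GC fractions used in the usage note are illustrative round numbers, not measurements from the genomes discussed.

```python
def p_g_in_first_two(gc_content):
    """Probability that at least one of two independent positions is a G.

    Assumes P(G) = gc_content / 2 at every position (G and C equally frequent).
    """
    p_g = gc_content / 2.0
    return 1.0 - (1.0 - p_g) ** 2


high_gc = p_g_in_first_two(0.70)  # illustrative high-GC genome
low_gc = p_g_in_first_two(0.30)   # illustrative low-GC genome
```

Under these assumptions a random loop satisfies the conserved-G requirement about twice as often in the high-GC case as in the low-GC case. The model ignores the additional effect of GC content on stem stability, which would also be expected to raise the number of predicted structures.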
Unlike most previous methods used in eukaryotic SECIS element prediction, our method introduced a statistical foundation based on the training data and E-value calculation (the bSECISProfile module), as well as homology search (the bSECISFilter module). Our search results are not only consistent with the previous studies (Kryukov and Gladyshev, 2004) that identified selenoprotein genes by searching for Sec/Cys pairs in homologous sequences, but also show an improvement with respect to identification of selenoproteins, for which Cys homologs are not known. Additional computational methods, such as covariance models based on stochastic context-free grammars (Eddy and Durbin, 1994), may further improve accuracy of our algorithm.
Among selenoprotein-containing organisms that were analyzed in our study, most were obligatory or facultative anaerobes that grow optimally at ambient temperatures and neutral pH. Distribution of these organisms did not match the evolutionary history of bacteria. The five selenoprotein-rich bacteria belonged to three evolutionarily distant phyla (Proteobacteria, Firmicutes and Spirochaetes) which also contained selenoprotein-poor organisms. No clear links could be established with respect to the occurrence and number of selenoproteins and the phylogeny of the organisms.
The high abundance of formate dehydrogenase genes was consistent with the idea that this selenoprotein family is largely responsible for maintaining the Sec utilization trait. On the other hand, the absence of Sec-containing formate dehydrogenases in a small number of Sec-utilizing organisms and a scattered occurrence of most of other selenoproteins illustrate a highly dynamic nature of Sec evolution. The analysis of selenoproteins and the compensatory sets of Cys-containing homologs (for example, formate dehydrogenase) provides a model system to analyze origins and evolution of selenoprotein families.
An additional novelty of our study was the identification of four new selenoprotein genes. Among these, G.sulfurreducens radical SAM domain protein (NP_952365) and G.sulfurreducens sulfurtransferase (COG2897, NP_951984) have been annotated as putative selenoproteins in this genome (Methe et al., 2003). Although these annotations were correct, the criteria that were used are not clear, as misannotations of selenoprotein genes are common. In fact, we found that only 45 detected selenoprotein genes are correctly annotated in GenBank (including the two new G.sulfurreducens selenoproteins). We also found that several sequences are incorrectly annotated as selenoproteins. For example, YP_066331, a homolog of 30S ribosomal protein S6 in D.psychrophila, is annotated as a selenoprotein containing two putative Sec residues; however, this sequence has neither Sec-containing nor Cys-containing homologs, nor a bSECIS element. On the other hand, since the two G.sulfurreducens selenoproteins have passed the stringent criteria employed by bSECISearch, these should be viewed as excellent candidates.
In conclusion, we show that most bacterial SECIS elements can be described by a common structural model. Our bSECISearch tool, which was built using this model, can provide significant hints to assist with identification and characterization of bacterial SECIS elements and selenoprotein genes. As the scientific community is faced with an explosion in the amount of sequence data, the ability to identify bacterial SECIS elements can help to correctly interpret selenoprotein sequences. Systematic exploration of bSECIS elements, selenoproteins and selenoproteomes should, in turn, result in a better understanding of recoding processes as well as the role of the trace element selenium in nature.
| Acknowledgments |
|---|
This work was supported by NIH GM061603. This research was completed in part utilizing the Research Computing Facility of the University of Nebraska, Lincoln.
Received on February 19, 2005; revised on March 17, 2005; accepted on March 21, 2005
| REFERENCES |
|---|
Altschul, S.F., et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403-410.
Berg, B.L., et al. (1991) Nitrate-inducible formate dehydrogenase in Escherichia coli K-12. II. Evidence that a mRNA stem-loop structure is essential for decoding opal (UGA) as selenocysteine. J. Biol. Chem., 266, 22386-22391.
Böck, A. (2000) Biosynthesis of selenoproteins - an overview. Biofactors, 11, 77-78.
Böck, A. and Rother, M. (2005) A pseudo-SECIS element in Methanococcus voltae documents evolution of a selenoprotein into a sulphur-containing homologue. Arch. Microbiol., 183, 148-150.
Castellano, S., et al. (2001) In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep., 2, 697-702.
Castellano, S., et al. (2004) Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution. EMBO Rep., 5, 71-77.
Chen, G.F., et al. (1993) Effect of the relative position of the UGA codon to the unique secondary structure in the fdhF mRNA on its decoding by selenocysteinyl tRNA in Escherichia coli. J. Biol. Chem., 268, 23128-23131.
Christiansen, G., et al. (1997) Molecular biology of Mycoplasma. Wien. Klin. Wochenschr., 109, 557-561.
Eddy, S.R. and Durbin, R. (1994) RNA sequence analysis using covariance models. Nucleic Acids Res., 22, 2079-2088.
Ehrenreich, A., et al. (1992) Selenoprotein synthesis in E.coli. Purification and characterisation of the enzyme catalyzing selenium activation. Eur. J. Biochem., 206, 767-773.
Engelberg-Kulka, H., et al. (2001) An extended Escherichia coli selenocysteine insertion sequence (SECIS) as a multifunctional RNA structure. Biofactors, 14, 61-68.
Forchhammer, K., et al. (1990) Purification and biochemical characterization of SELB, a translation factor involved in selenoprotein synthesis. J. Biol. Chem., 265, 9346-9350.
Fourmy, D., et al. (2002) Structure of prokaryotic SECIS mRNA hairpin and its interaction with elongation factor SelB. J. Mol. Biol., 324, 137-150.
Gotoh, O. (1982) An improved algorithm for matching biological sequences. J. Mol. Biol., 162, 705-708.
Gursinsky, T., et al. (2000) A selDABC cluster for selenocysteine incorporation in Eubacterium acidaminophilum. Arch. Microbiol., 174, 200-212.
Hatfield, D.L. and Gladyshev, V.N. (2002) How selenium has altered our understanding of the genetic code. Mol. Cell. Biol., 22, 3565-3576.
Heider, J., et al. (1991) Interspecies compatibility of selenoprotein biosynthesis in Enterobacteriaceae. Arch. Microbiol., 155, 221-228.
Heider, J., et al. (1992) Coding from a distance: dissection of the mRNA determinants required for the incorporation of selenocysteine into protein. EMBO J., 11, 3759-3766.
Hertz, G.Z. and Stormo, G.D. (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 563-577.
Hüttenhofer, A., et al. (1996) Solution structure of mRNA hairpins promoting selenocysteine incorporation in Escherichia coli and their base-specific interaction with special elongation factor SELB. RNA, 2, 354-366.
Hüttenhofer, A. and Böck, A. (1998) The RNA structures involved in selenoprotein synthesis. In Simons, R.W. and Grunberg-Manago, M. (eds), RNA Structure and Function. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 603-639.
Kreimer, S. and Andreesen, J.R. (1995) Glycine reductase of Clostridium litorale. Cloning, sequencing, and molecular analysis of the grdAB operon that contains two in-frame TGA codons for selenium incorporation. Eur. J. Biochem., 234, 192-199.
Kryukov, G.V., et al. (1999) New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements. J. Biol. Chem., 274, 33888-33897.
Kryukov, G.V., et al. (2003) Characterization of mammalian selenoproteomes. Science, 300, 1439-1443.
Kryukov, G.V. and Gladyshev, V.N. (2004) The prokaryotic selenoproteome. EMBO Rep., 5, 538-543.
Lescure, A., et al. (1999) Novel selenoproteins identified in silico and in vivo by using a conserved RNA structural motif. J. Biol. Chem., 274, 38147-38154.
Low, S. and Berry, M. (1996) Knowing when not to stop: selenocysteine incorporation in eukaryotes. Trends Biochem. Sci., 21, 203-208.
Methe, B.A., et al. (2003) Genome of Geobacter sulfurreducens: metal reduction in subsurface environments. Science, 302, 1967-1969.
Morgenstern, B., et al. (1996) Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proc. Natl Acad. Sci. USA, 93, 12098-12103.
Sandman, K.E., et al. (2003) Revised Escherichia coli selenocysteine insertion requirements determined by in vivo screening of combinatorial libraries of SECIS variants. Nucleic Acids Res., 31, 2234-2241.
Staden, R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res., 12, 505-519.
Thanbichler, M. and Böck, A. (2002) The function of SECIS RNA in translational control of gene expression in Escherichia coli. EMBO J., 21, 6925-6934.
Walczak, R., et al. (1998) An essential non-Watson-Crick base pair motif in 3'UTR to mediate selenoprotein translation. RNA, 4, 74-84.
Wilting, R., et al. (1997) Selenoprotein synthesis in archaea: identification of an mRNA element of Methanococcus jannaschii probably directing selenocysteine insertion. J. Mol. Biol., 266, 637-641.
Yoshizawa, S., et al. (2005) Structural basis for mRNA recognition by elongation factor SelB. Nat. Struct. Mol. Biol., 12, 198-203.