Bioinformatics Advance Access originally published online on January 28, 2006
Bioinformatics 2006 22(8):1027-1028; doi:10.1093/bioinformatics/btl026
siRecords: an extensive database of mammalian siRNAs with efficacy ratings


Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455, USA
*To whom correspondence should be addressed.
ABSTRACT
Summary: Short interfering RNAs (siRNAs) have become many researchers' gene knock-down tool of choice because of their clean mode of action, technical simplicity and cost efficiency. We have constructed siRecords, a database of experimentally tested siRNAs with consistent efficacy ratings. This database will help siRNA researchers develop more reliable siRNA design rules; in the meantime, siRecords will benefit experimental researchers directly by providing information about siRNAs that have already been tested against their genes of interest. Currently, siRecords hosts more than 4100 carefully annotated siRNA sequences obtained from more than 1200 published siRNA studies, and it will continue to expand as more experimentally tested siRNAs are published.
Availability: The siRecords database can be accessed at http://siRecords.umn.edu/siRecords/
Contact: toli{at}biocompute.umn.edu
INTRODUCTION
Short interfering RNAs (siRNAs) are double-stranded RNAs, typically 19 to 25 nt long with 2-nt overhangs at their 3' ends, that induce sequence-specific, post-transcriptional degradation of target mRNAs and thereby silence gene activity. siRNA-based knock-down techniques are particularly attractive for gene silencing studies in mammalian cells because, unlike longer double-stranded RNAs, siRNAs are unlikely to trigger interferon responses, which lead to non-specific mRNA degradation.
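To make the duplex geometry concrete, the sketch below builds the two strands of an siRNA from a 19-nt double-stranded core, each strand ending in a 2-nt 3' overhang. It is a minimal illustration under stated assumptions: the function names are hypothetical, the UU overhang is an arbitrary default (the text specifies only the overhang length), and the core is taken to be the sense-strand sequence.

```python
# Minimal sketch: assemble an siRNA duplex with 2-nt 3' overhangs.
# Hypothetical helper names; the "UU" overhang is an assumed default.

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]


def make_duplex(core: str, overhang: str = "UU") -> tuple[str, str]:
    """Return (sense, antisense) strands, both written 5'->3'.

    The sense strand is the core plus the overhang; the antisense strand is
    the reverse complement of the core plus the overhang, so the strands pair
    over the core and each leaves 2 unpaired nt at its 3' end.
    """
    core = core.upper().replace("T", "U")  # accept DNA-style input
    if set(core) - set("ACGU"):
        raise ValueError("core must contain only A, C, G, U (or T)")
    if len(overhang) != 2:
        raise ValueError("overhang must be 2 nt long")
    return core + overhang, reverse_complement_rna(core) + overhang


def is_paired(sense: str, antisense: str, overhang_len: int = 2) -> bool:
    """Check that the strands are complementary outside their 3' overhangs."""
    return sense[:-overhang_len] == reverse_complement_rna(antisense[:-overhang_len])


sense, antisense = make_duplex("GCAUGCCUGCAGGUCGACU")  # toy 19-nt core
print(sense, antisense, is_paired(sense, antisense))
```

With a 19-nt core, each strand is 21 nt long, which falls within the typical length range given above.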
Designing siRNAs with high knock-down activity has been a major challenge: only a fraction of candidate siRNAs are highly effective in silencing their target genes (Holen et al., 2002; Reynolds et al., 2004). Despite numerous efforts (e.g. Chalk et al., 2004; Cui et al., 2004; Elbashir et al., 2002; Hsieh et al., 2004; Khvorova et al., 2003; Reynolds et al., 2004; Saetrom and Snove, 2004; Ui-Tei et al., 2004; Yiu et al., 2005), the production of highly effective siRNAs remains far from satisfactory. Even following current best design practice, <35% of experimentally tested siRNAs produced >90% knock-down, and almost 20% produced <50% knock-down.
We have undertaken the siRecords project to establish a database of mammalian siRNAs that have been experimentally tested and systematically rated for knock-down efficacy. This is an ongoing effort that aims to cover as many experimentally tested mammalian siRNAs as possible. The project is significant in two ways. First, a large collection of siRNAs from diverse sources, experimentally tested and consistently rated and annotated for efficacy, will help siRNA researchers develop more reliable rules for designing highly effective siRNAs. Second, until more reliable design rules are available, the database will help experimental researchers directly by showing which siRNAs other researchers have already tested against their genes of interest and what efficacy levels were achieved.
SYSTEM AND METHODS
siRNA data gathering and efficacy rating
The current siRecords collection is built entirely from data gathered from published siRNA-related studies. To assemble the collection, the PubMed database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed) was queried with the keywords siRNA and RNAi. The matching articles were examined, and irrelevant articles (including articles on non-mammalian studies, reviews and methodological articles that reported no siRNA experiments) were excluded from further checking, as were articles in languages other than English. The remaining articles were carefully scrutinized. For every siRNA experiment reported in these articles, the siRNA sequence, the target gene and key experimental conditions were recorded: the cell line, the method of producing the siRNA (chemically synthesized or vector-based) and the method of testing siRNA efficacy (western blot, real-time PCR or others). The siRNA sequences were aligned with the mRNA sequences of the target genes using bl2seq (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), and the aligned sequences were recorded. The articles' descriptions of the experimental results were examined for information about siRNA efficacy, and efficacy ratings were assigned on the basis of these descriptions. The informative sentences describing siRNA efficacy were copied verbatim into the original_assessment field of the corresponding record. When the text did not describe siRNA efficacy in sufficient detail, efficacy scores were assigned, as far as possible, from the figures (gel images or summary bar graphs) presented in the articles, and the basis of each such assignment was likewise recorded in the original_assessment field.
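To illustrate the alignment step, the sketch below locates an siRNA within its target mRNA. It is a deliberately simplified stand-in for bl2seq, not the procedure used to build siRecords: it reports only perfect matches (a BLAST alignment would also tolerate mismatches and gaps), it checks both the given strand and its reverse complement, and all names and the toy sequence in it are hypothetical.

```python
# Simplified stand-in for the bl2seq alignment step: exact matching only.
# A BLAST alignment would also report mismatches and gaps; this does not.
# Function names are hypothetical.

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case a sequence and write uracil (U) as thymine (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a normalized (DNA-alphabet) sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def find_target_sites(sirna: str, mrna: str) -> list[tuple[int, str]]:
    """Return sorted (1-based start, orientation) pairs for exact matches.

    "sense": the siRNA sequence itself occurs in the mRNA.
    "antisense": its reverse complement occurs, i.e. the given strand is
    complementary to the mRNA (the guide strand).
    """
    s, m = normalize(sirna), normalize(mrna)
    if not s:
        raise ValueError("siRNA sequence is empty")
    hits = []
    for orientation, probe in (("sense", s), ("antisense", reverse_complement(s))):
        start = m.find(probe)
        while start != -1:  # collect every occurrence, including overlapping ones
            hits.append((start + 1, orientation))
            start = m.find(probe, start + 1)
    return sorted(hits)


mrna = "AAAAGCAUGCCUGCAGGUCGACUAAAA"  # toy mRNA fragment, not a real transcript
print(find_target_sites("GCAUGCCUGCAGGUCGACU", mrna))
```

Reporting the orientation matters because published siRNAs are given sometimes as the sense strand and sometimes as the guide (antisense) strand.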
The efficacy rating scheme was designed to balance two competing considerations. A very coarse-grained scheme (e.g. a binary scheme that rates siRNAs as effective or ineffective) would limit the usefulness of the database because of the little information it provides. On the other hand, an overly fine-grained scheme (e.g. one that classifies siRNAs into 10 efficacy categories) would make accurate ratings difficult to obtain and the resulting database less reliable. Balancing these two factors, we chose a four-level rating scheme: the efficacy of an siRNA is rated very high if the gene product is reduced by >90%, high if it is reduced by 70-90%, medium if 50-70% knock-down is achieved, and low if <50% knock-down is obtained.
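The four-level scheme amounts to a simple threshold function. The sketch below is one possible encoding; the text does not say how boundary values were rated, so the tie-handling shown here (exactly 90% and exactly 70% count as high, exactly 50% as medium) is an assumption, as are the function name and the example percentages.

```python
from collections import Counter

# Sketch of the four-level efficacy rating. Boundary handling is an
# assumption: "very high" needs >90%, so exactly 90% is "high"; exactly 70%
# is also "high" and exactly 50% is "medium".


def rate_efficacy(knockdown_percent: float) -> str:
    """Map the percent reduction of the gene product to a rating."""
    if not 0 <= knockdown_percent <= 100:
        raise ValueError("knock-down must be a percentage between 0 and 100")
    if knockdown_percent > 90:
        return "very high"
    if knockdown_percent >= 70:
        return "high"
    if knockdown_percent >= 50:
        return "medium"
    return "low"


observed = [95, 88, 72, 65, 40, 91]  # hypothetical knock-down percentages
print(Counter(rate_efficacy(x) for x in observed))
```

Tallying ratings this way is how summary figures such as the share of siRNAs achieving >90% knock-down could be computed from a rated collection.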
IMPLEMENTATION
The siRecords database is a relational database implemented with MySQL on a Fedora II Linux system running on a P4 computer. The database schema has four major tables: SiRecord, which stores the siRNA sequences, key experimental conditions, the original efficacy assessment (the sentences about efficacy quoted from the original articles) and the efficacy ratings assigned by siRecords curators; Gene, which stores information about the genes targeted by the siRNAs; Correspondent, which stores contact information for the sources of the siRNAs; and Publication, which stores key information about the published studies from which the siRNA data were obtained.
The front-end web interface is implemented in PHP and runs under Apache 2.0.
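The four-table layout can be sketched in SQL. The version below uses Python's built-in sqlite3 module rather than MySQL so that it runs without a database server. Only the four table names and what each table holds come from the text; every column name, the foreign-key links (for example, attaching a Correspondent to a Publication) and the CHECK constraint on ratings are assumptions made for illustration.

```python
import sqlite3

# Sketch of the four-table layout, using SQLite instead of MySQL.
# Table names follow the text; all column names and links are assumptions.
SCHEMA = """
CREATE TABLE Gene (
    gene_id   INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE,  -- GenBank accession number
    gi        INTEGER UNIQUE,        -- GenBank GI number
    species   TEXT
);
CREATE TABLE Correspondent (
    correspondent_id INTEGER PRIMARY KEY,
    name             TEXT,
    email            TEXT            -- contact address for the siRNA source
);
CREATE TABLE Publication (
    publication_id   INTEGER PRIMARY KEY,
    pmid             INTEGER UNIQUE,
    citation         TEXT,
    correspondent_id INTEGER REFERENCES Correspondent(correspondent_id)
);
CREATE TABLE SiRecord (
    record_id           INTEGER PRIMARY KEY,
    gene_id             INTEGER NOT NULL REFERENCES Gene(gene_id),
    publication_id      INTEGER NOT NULL REFERENCES Publication(publication_id),
    sense_sequence      TEXT NOT NULL,
    cell_line           TEXT,
    production_method   TEXT,        -- chemically synthesized or vector-based
    assay_method        TEXT,        -- western blot, real-time PCR, ...
    original_assessment TEXT,        -- sentences quoted from the article
    efficacy            TEXT CHECK (efficacy IN ('very high', 'high', 'medium', 'low'))
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and return a connection with foreign-key checks on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

Leaving efficacy nullable lets incomplete records be stored, while the CHECK constraint rejects any value outside the four rating levels.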
RESULTS AND DISCUSSION
Utility of the siRecords database
The main page of siRecords can be accessed at http://siRecords.umn.edu/siRecords/. The database is intuitive to use. The user queries it by entering the GenBank accession number or GI number of a gene of interest, and the matching records are displayed. When the user selects a record, the record display page presents all relevant information about it: sequences, experimental conditions, efficacy ratings and the sources of the record. Links to all other records targeting the same gene, and to all other records obtained from the same source, are displayed as well. The user can contact the source of a record by clicking the Send Email button at the bottom of the record display page.
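The lookup flow described above (query by accession or GI number, then follow links to records for the same gene or from the same source) can be sketched in memory as follows. This is not the siRecords PHP code: the record fields, the simplified accession pattern, the decision to ignore a version suffix such as ".1" when matching, and all identifiers and example values are hypothetical.

```python
import re
from dataclasses import dataclass

# In-memory sketch of the query flow; all names and patterns are assumptions.

_GI = re.compile(r"\d+")
# Simplified GenBank/RefSeq accession: 1-2 letters, optional "_", digits,
# optional ".version" (e.g. U12345, AB123456, NM_123456.1).
_ACCESSION = re.compile(r"([A-Z]{1,2}_?\d+)(?:\.\d+)?")


@dataclass(frozen=True)
class Gene:
    accession: str  # versioned accession, e.g. "NM_123456.1"
    gi: int


@dataclass(frozen=True)
class Record:
    record_id: int
    accession: str  # accession of the targeted gene
    source: str     # publication the record was taken from
    efficacy: str


def find_records(query: str, genes: list[Gene], records: list[Record]) -> list[Record]:
    """Return records whose target gene matches a GI number or an accession."""
    q = query.strip().upper()
    if _GI.fullmatch(q):
        wanted = {g.accession for g in genes if g.gi == int(q)}
    else:
        m = _ACCESSION.fullmatch(q)
        if m is None:
            raise ValueError(f"not a GenBank accession or GI number: {query!r}")
        base = m.group(1)  # any version suffix in the query is ignored
        wanted = {g.accession for g in genes if g.accession.split(".")[0].upper() == base}
    return [r for r in records if r.accession in wanted]


def related_records(record: Record, records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Other records targeting the same gene, and other records from the same source."""
    others = [r for r in records if r.record_id != record.record_id]
    same_gene = [r for r in others if r.accession == record.accession]
    same_source = [r for r in others if r.source == record.source]
    return same_gene, same_source
```

GI numbers are purely numeric while accessions begin with letters, which is why a single input box can accept either form.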
Current status
The siRecords database currently hosts 4162 siRNA records targeting 1453 different genes, collected from 1325 published articles. Of these, 3486 records are complete; the remaining 676 records (16.2% of the total collection) are incomplete.
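The incomplete-record share quoted above follows directly from the record counts:

```latex
\frac{4162 - 3486}{4162} = \frac{676}{4162} \approx 0.162 = 16.2\,\%
```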
Comparison with similar database projects
Two database projects similar to siRecords, HuSiDa and siRNAdb, have been unveiled recently (Chalk et al., 2005; Truss et al., 2005). Both store siRNA sequences collected from the literature in relational databases. Comparison with these projects reveals three substantial advantages of siRecords. First, siRecords hosts the largest collection of siRNA records of the three: HuSiDa stores about 1150 records and siRNAdb about 1230, whereas siRecords currently hosts more than 4100 siRNA records collected from more than 1200 publications. Second, HuSiDa collects only human siRNAs, and although siRNAdb does not exclude non-human records, its major focus is on human siRNAs. siRecords is the only one of the three that extensively hosts non-human mammalian siRNA records; about 16% of the records in the current collection originate from non-human mammalian studies. Third, HuSiDa provides no efficacy information for the stored siRNA sequences; instead, it includes only siRNAs with at least 50% silencing efficacy. siRNAdb does provide some efficacy assessment of the stored siRNAs, but its assessment criteria are complicated and inconsistent. siRecords is the only project of the three that provides systematically annotated efficacy ratings for the siRNA experiments.
ACKNOWLEDGMENTS
The authors thank S. Li and W. Liu for technical assistance and helpful discussions. They also thank the Supercomputing Institute, University of Minnesota, for providing computing resources. This work was supported in part by the University of Minnesota Graduate School.
Conflict of Interest: none declared.
FOOTNOTES
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Dmitrij Frishman
Received on September 12, 2005; revised on January 24, 2006; accepted on January 24, 2006
REFERENCES
Chalk, A.M., et al. (2004) Improved and automated prediction of effective siRNA. Biochem. Biophys. Res. Commun., 319, 264-274.
Chalk, A.M., et al. (2005) siRNAdb: a database of siRNA sequences. Nucleic Acids Res., 33, D131-D134.
Cui, W., et al. (2004) OptiRNAi, an RNAi design tool. Comput. Methods Programs Biomed., 75, 67-73.
Elbashir, S.M., et al. (2002) Analysis of gene function in somatic mammalian cells using small interfering RNAs. Methods, 26, 199-213.
Holen, T., et al. (2002) Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor. Nucleic Acids Res., 30, 1757-1766.
Hsieh, A.C., et al. (2004) A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens. Nucleic Acids Res., 32, 893-901.
Khvorova, A., Reynolds, A. and Jayasena, S.D. (2003) Functional siRNAs and miRNAs exhibit strand bias. Cell, 115, 209-216. [Erratum (2003) Cell, 115, 505.]
Reynolds, A., et al. (2004) Rational siRNA design for RNA interference. Nat. Biotechnol., 22, 326-330.
Saetrom, P. and Snove, O., Jr. (2004) A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun., 321, 247-253.
Truss, M., et al. (2005) HuSiDa, the human siRNA database: an open-access database for published functional siRNA sequences and technical details of efficient transfer into recipient cells. Nucleic Acids Res., 33, D108-D111.
Ui-Tei, K., et al. (2004) Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucleic Acids Res., 32, 936-948.
Yiu, S.M., et al. (2005) Filtering of ineffective siRNAs and improved siRNA design tool. Bioinformatics, 21, 144-151.


