Bioinformatics Advance Access originally published online on August 26, 2008
Bioinformatics 2008 24(21):2561-2563; doi:10.1093/bioinformatics/btn441
GOfetcher: a database with complex searching facility for gene ontology
1Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS 39406 and 2Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
*To whom correspondence should be addressed.
ABSTRACT
Motivation: An important contribution to the Gene Ontology (GO) project is the development of tools that facilitate the creation, maintenance and use of ontologies. Several tools have been created for accessing and using the GO. However, most of these tools lack a comprehensive search facility. We developed a web application, GOfetcher, with a comprehensive search facility for the GO project and a variety of output formats for the results. GOfetcher offers three levels of searching: Quick Search, Advanced Search and Upload Files. The application includes a unique search option that generates gene information from a nucleotide or protein accession number, which can then be used to generate GO information. Results can be saved in several formats, including spreadsheet, comma-separated values (CSV) and extensible markup language (XML). The database is available at http://mcbc.usm.edu/gofetcher/.
Contact: youping.deng{at}usm.edu or mehdi.pirooznia{at}usm.edu
1 INTRODUCTION
Biologists spend considerable time and effort searching for information about gene functions. The task is further complicated by variations in terminology, which result in redundant information. The Gene Ontology (GO) Consortium (www.geneontology.org) (Ashburner et al., 2000) is a collaborative effort to address the need for consistent descriptions of gene products across databases. Since 1998, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. The GO offers several benefits, including long-term maintenance of non-redundant annotation datasets. Research groups that lack an established database for a given model species, or the time to commit to long-term maintenance of their datasets, can supply annotations to the central GO repository.
The GO project has developed three structured vocabularies, called ontologies, which describe gene products in terms of their associated biological processes, cellular components and molecular functions. An important contribution to the GO project is the development of tools that facilitate the creation, maintenance and use of ontologies, and several such tools already exist. They fall into two categories: consortium tools, developed by the GO Consortium, and non-consortium tools, developed by other groups. Consortium tools include AmiGO (http://amigo.geneontology.org), which provides an interface to search and browse the ontology and annotation data. Non-consortium tools include Blast2GO (Conesa et al., 2005), which obtains GO results from nucleotide sequences, and QuickGO (http://www.ebi.ac.uk/ego/), which finds relationships and definitions for specific GO terms. These tools search either the GO as a whole or its specific databases, and their results are often satisfactory. Some support batch searching, while others accept only single-keyword searches.
However, most of these tools lack a comprehensive search facility. When one is handling several thousand search queries, such as the annotation results of expressed sequence tag (EST) data for an organism, searching becomes cumbersome. A further limitation is that the output of any local or web-based batch BLAST application contains versioned accession numbers of nucleotides or proteins. These accessions are inadequate for subsequent GO tool searches, which require gene symbols to obtain GO information, so an additional query of the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) is necessary.
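The mapping step itself is simple in principle: drop the version suffix from each BLAST hit accession, then look the base accession up in an accession-to-symbol table. The minimal Python sketch below illustrates that idea only; it is not GOfetcher's implementation. It assumes the hits are already plain accessions (not pipe-delimited BLAST identifiers), and the two-entry table is a toy example (NM_000546 and NP_000537 are the RefSeq mRNA and protein records for human TP53); a real table would be built from an NCBI download. The function names are hypothetical.

```python
# Toy lookup table (unversioned accession -> gene symbol). A real table would be
# built from an NCBI RefSeq/Gene download; this one is for illustration only.
ACCESSION_TO_SYMBOL = {
    "NM_000546": "TP53",  # human TP53 mRNA
    "NP_000537": "TP53",  # human p53 protein
}


def strip_version(accession):
    """Drop a trailing '.N' version suffix, e.g. 'NM_000546.6' -> 'NM_000546'."""
    accession = accession.strip()
    base, dot, version = accession.rpartition(".")
    # Only strip when the part after the last dot is a numeric version.
    return base if dot and version.isdigit() else accession


def accessions_to_symbols(accessions, table=ACCESSION_TO_SYMBOL):
    """Map each BLAST hit accession to a gene symbol (None when not in the table)."""
    return {acc: table.get(strip_version(acc)) for acc in accessions}
```

For example, `accessions_to_symbols(["NM_000546.6", "XM_999999.1"])` returns `{'NM_000546.6': 'TP53', 'XM_999999.1': None}`; the recovered symbols could then be submitted to a GO search, and unmapped accessions flagged for manual follow-up.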
Here, we describe a database and web application for extracting GO information. The output contains information fetched from related external databases, such as NCBI, ArabidopsisDB (Poole, 2007), GeneDB (Hertz-Fowler et al., 2004), Saccharomyces Genome Database (Hong et al., 2008), FlybaseDB (Grumbling and Strelets, 2006), Mouse Genome Informatics (Blake et al., 2003), Wormbase (Rogers et al., 2008) and TIGR Annotation (http://www.tigr.org). An additional search option, which generates gene information from a nucleotide or protein accession number, links batch BLAST outcomes with GO searches.
2 IMPLEMENTATION
The GOfetcher web application and search engine are written in the PHP programming language (http://www.php.net), and the data are stored in a local MySQL database (http://www.mysql.com). Because GOfetcher is accessed as a web application, it is platform independent and can be used from any standard machine with a web browser.
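GOfetcher's actual table layout is not described here. As a rough illustration only, the sketch below stores one GO annotation per row, uses the result fields listed in Section 3 as hypothetical column names, and substitutes Python's built-in sqlite3 module for MySQL so that it runs without a database server. The per-species count shows the kind of aggregate a browse-by-species page needs.

```python
import sqlite3

# Hypothetical schema: one row per GO annotation. sqlite3 stands in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    species_id    TEXT NOT NULL,  -- source database abbreviation, e.g. 'FB'
    unique_id     TEXT NOT NULL,  -- database-specific ID, e.g. 'FBgn0015567'
    symbol        TEXT,           -- gene symbol
    go_term       TEXT NOT NULL,  -- GO term ID, e.g. 'GO:0008150'
    go_name       TEXT,           -- GO term name
    category      TEXT,           -- biological process, molecular function or cellular component
    reference     TEXT,           -- literature or database reference
    evidence_code TEXT            -- GO evidence code, e.g. 'IDA'
);
CREATE INDEX IF NOT EXISTS idx_annotation_symbol ON annotation (symbol);
CREATE INDEX IF NOT EXISTS idx_annotation_go_term ON annotation (go_term);
"""

COLUMNS = ("species_id", "unique_id", "symbol", "go_term", "go_name",
           "category", "reference", "evidence_code")


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_annotations(conn, rows):
    """Insert annotation rows given as tuples in COLUMNS order."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"INSERT INTO annotation ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows)
    conn.commit()


def count_by_species(conn):
    """Annotation counts per species, sorted by species ID."""
    return conn.execute(
        "SELECT species_id, COUNT(*) FROM annotation "
        "GROUP BY species_id ORDER BY species_id").fetchall()
```

Indexing the symbol and GO term columns is a guess at the most common lookups; a production schema would be tuned to the actual query mix.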
The search options let users enter simple as well as complex queries. The Advanced Search panel allows users to define specific queries, using Boolean operators to connect conditions on multiple fields. An online tutorial describes the various features of the database with examples.
GOfetcher has three different levels for searching the GO project.
- Quick Search: searches for keywords in any of the following fields: species ID, DB-specific ID, gene symbol, GO term, GO name, category, references and evidence code. Keywords may be separated by commas or by whitespace (space, tab or line break). Users can choose to match Any words, All words or the Exact phrase.
- Advanced Search: searches complex combinations of keywords across the same fields (species ID, DB-specific ID, gene symbol, GO term, GO name, category, references and evidence code). Each keyword can be matched as an exact match, contains, does not contain or starts with.
- Upload Files: users can upload one or more files containing keywords separated, as in Quick Search, by commas or any whitespace. GOfetcher then searches for every word in the files and shows the results.
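The keyword handling shared by Quick Search and Upload Files can be sketched as follows. The sketch assumes that each result record exposes its searchable fields as a single text string and that uploaded files arrive as decoded strings; the function names are hypothetical. Matching is case-insensitive: Any words and All words compare whole words, while Exact phrase requires the words to appear contiguously and on word boundaries.

```python
import re

# Keywords may be separated by commas and/or any whitespace (space, tab, newline).
_SEPARATORS = re.compile(r"[,\s]+")


def split_keywords(text):
    """Split a Quick Search box or an uploaded file into keywords, dropping empties."""
    return [token for token in _SEPARATORS.split(text) if token]


def keywords_from_uploads(contents):
    """Merge keywords from several uploaded files (given as decoded strings),
    keeping first-seen order and dropping duplicates."""
    merged = {}
    for text in contents:
        for token in split_keywords(text):
            merged.setdefault(token, None)
    return list(merged)


def quick_match(record_text, query, mode="any"):
    """Case-insensitive match of one record's searchable text against a query.

    'any'   -> at least one keyword equals a word of the record
    'all'   -> every keyword equals a word of the record
    'exact' -> the whitespace-normalized query appears as a whole-word phrase
    """
    if mode == "exact":
        phrase = " ".join(query.lower().split())
        normalized = " ".join(record_text.lower().split())
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        return bool(phrase) and re.search(pattern, normalized) is not None
    record_words = {word.lower() for word in split_keywords(record_text)}
    keywords = [k.lower() for k in split_keywords(query)]
    if mode == "any":
        return any(k in record_words for k in keywords)
    if mode == "all":
        return bool(keywords) and all(k in record_words for k in keywords)
    raise ValueError(f"unknown mode: {mode}")
```

In a database-backed system the same three modes would more likely be pushed into SQL (for example with a full-text index) rather than evaluated record by record; the in-memory version simply makes the semantics explicit.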
The Browse menu lets users browse GOfetcher by species. The database currently includes 18 model organisms, among them Arabidopsis thaliana, Bacillus anthracis, Caenorhabditis elegans, Campylobacter jejuni, Candida albicans, Drosophila melanogaster, Mus musculus, Oryza sativa, Rattus norvegicus, Saccharomyces cerevisiae and Vibrio cholerae, and it contains 847,510 annotations.
3 RESULTS
Each search returns a result list containing the fields described below (Species ID, Species Unique ID, Symbol, GO Term, Name, Category, References and Evidence Code) and, optionally, a summary of the distinct matching entries with a pie chart of categories:
- Species ID: usually a two- or three-letter abbreviation identifying the source database of a species, for instance FB for FlyBase (Drosophila) and MGI for mouse.
- Species Unique ID: the accession ID of the gene product in its species database, e.g. MGI:1918918 for mouse and FBgn0015567 for FlyBase (Drosophila). Information about the specific organism from related external databases is provided here.
- Symbol: the gene symbol, linked to information in the NCBI Gene database.
- GO Term: the GO term ID, with both a tree view and a graphical view.
- Name: the GO term name, with related information from the GO database (geneontology.org).
- Category: assigns the annotation to one of the three organizing principles of GO: cellular component, biological process or molecular function.
- References: the literature reference(s) or database record(s), from a model organism database or PubMed, that support the annotation.
- Evidence Code: the code indicating the nature of the evidence that supports a particular annotation. See the GO evidence code guide (http://www.geneontology.org/GO.evidence.shtml) for the list of valid evidence codes for GO annotations.
If selected by the user, a summary of the distinct matching entries with a pie chart of categories appears at the top of the search result page (Fig. 1). The summary table gives the number of unique species, unique IDs, symbols, GO terms and non-redundant term names. Clicking on a number displays the corresponding list.
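The summary can be computed directly from the result rows. The sketch below assumes that each hit is a dictionary whose hypothetical keys match the result fields; it counts distinct values per field, tallies categories for the pie chart, and returns the list behind a clicked number. It illustrates the idea and is not GOfetcher's code.

```python
from collections import Counter

# Hypothetical record keys summarized in the table (one count per field).
SUMMARY_FIELDS = ("species_id", "unique_id", "symbol", "go_term", "go_name")


def summarize(records):
    """Number of distinct values per field, plus category tallies for a pie chart."""
    distinct = {field: len({r[field] for r in records}) for field in SUMMARY_FIELDS}
    categories = Counter(r["category"] for r in records)
    return distinct, categories


def pie_shares(categories):
    """Convert category tallies into fractions of the whole (empty if no hits)."""
    total = sum(categories.values())
    return {name: count / total for name, count in categories.items()} if total else {}


def related_list(records, field):
    """Sorted distinct values behind one summary number, e.g. after a click on it."""
    return sorted({r[field] for r in records})
```

Counting distinct values with sets keeps the summary consistent with the "non-redundant" wording: a gene annotated to several GO terms still counts once under Symbol.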
When search results appear, each record contains hyperlinked information fetched from related external databases. GOfetcher extracts information from a variety of databases, including NCBI, ArabidopsisDB, GeneDB, Saccharomyces Genome Database, FlybaseDB, Mouse Genome Informatics, Wormbase and TIGR Annotation; sixteen databases are currently available through the fetching process. Results can be saved in several formats: tabular spreadsheet, Word document, comma-separated values (CSV), extensible markup language (XML) and a printer-friendly format.
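For the CSV and XML exports, Python's standard csv and xml.etree.ElementTree modules are sufficient. The sketch below assumes the same kind of hypothetical record dictionaries as the other examples. The element names `results` and `annotation` are illustrative, and tab-separated output stands in for the spreadsheet format, whose exact form is not specified. Word and printer-friendly output are not covered.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export columns, in display order.
COLUMNS = ("species_id", "unique_id", "symbol", "go_term", "go_name",
           "category", "reference", "evidence_code")


def export_delimited(records, delimiter=","):
    """CSV by default; delimiter='\\t' gives a tab-separated, spreadsheet-friendly file."""
    buffer = io.StringIO()
    # Missing keys become empty cells; unexpected extra keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter=delimiter,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def export_xml(records):
    """One <annotation> element per record, one child element per column."""
    root = ET.Element("results")
    for record in records:
        item = ET.SubElement(root, "annotation")
        for column in COLUMNS:
            value = record.get(column)
            ET.SubElement(item, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

Using the csv module rather than joining strings by hand ensures that fields containing commas or quotes (common in GO term names) are quoted correctly.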
4 PROJECT AVAILABILITY
GOfetcher is written in PHP and uses MySQL as its relational database management system (RDBMS). It will be maintained on the USM server, and we plan to keep it functional. The database will be refreshed at least once a month to incorporate new data released by the GO Consortium. It is available at http://mcbc.usm.edu/gofetcher/.
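The refresh procedure itself is not described. Assume, for illustration, that annotations are reloaded from the GO Consortium's tab-delimited gene association files (GAF): 15 columns in GAF 1.0, 17 in GAF 2.x, with comment lines beginning with "!". The parsing step could then resemble the following sketch, which maps GAF columns onto the result fields from Section 3. GO term names are not part of GAF files and would have to be joined from the ontology file; that step is omitted, and the function names are hypothetical.

```python
# Map the GAF 'Aspect' column (column 9) to the category names used in results.
ASPECTS = {"P": "biological process", "F": "molecular function", "C": "cellular component"}


def parse_gaf_line(line):
    """Parse one GAF line into a result-style record; None for comments/blanks."""
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.split("\t")
    if len(cols) < 15:  # GAF 1.0 has 15 columns, GAF 2.x has 17
        raise ValueError(f"expected at least 15 tab-separated columns, got {len(cols)}")
    return {
        "species_id": cols[0],     # DB, e.g. 'FB'
        "unique_id": cols[1],      # DB Object ID
        "symbol": cols[2],         # DB Object Symbol
        "qualifier": cols[3],      # e.g. 'NOT'; kept so negated annotations stay distinguishable
        "go_term": cols[4],        # GO ID
        "go_name": None,           # not in GAF; would come from the ontology file
        "category": ASPECTS.get(cols[8], cols[8]),
        "reference": cols[5],      # DB:Reference
        "evidence_code": cols[6],  # e.g. 'IDA', 'IEA'
    }


def parse_gaf(lines):
    """Parse an iterable of GAF lines (e.g. an open file), skipping comments."""
    return [record for record in map(parse_gaf_line, lines) if record is not None]
```

A monthly job could run this over freshly downloaded files and replace the annotation table in a single transaction, so that users never see a half-loaded database.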
Funding: Mississippi Functional Genomics Networks and Mississippi Computational Biology Consortium (NSF Grant # EPS-0556308); US Army Environmental Quality Program (contract #W912HZ-05-P-0145).
Conflict of Interest: Permission was granted by the Chief of the US Army Corps of Engineers to publish this information.
FOOTNOTES
Associate Editor: Dmitrij Frishman
Received on March 4, 2008; revised on May 29, 2008; accepted on August 18, 2008
REFERENCES
Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. (2000) 25:25–29.
Blake JA, et al. MGD: the Mouse Genome Database. Nucleic Acids Res. (2003) 31:193–195.
Conesa A, et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (2005) 21:3674–3676.
Grumbling G, Strelets V. FlyBase: anatomical data, images and queries. Nucleic Acids Res. (2006) 34:D484–D488.
Hertz-Fowler C, et al. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res. (2004) 32:D339–D343.
Hong EL, et al. Gene ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. (2008) 36:D577–D581.
Poole RL. The TAIR database. Methods Mol. Biol. (2007) 406:179–212.
Rogers A, et al. WormBase 2007. Nucleic Acids Res. (2008) 36:D612–D617.