Bioinformatics Advance Access originally published online on July 19, 2005
Bioinformatics 2005 21(18):3681-3682; doi:10.1093/bioinformatics/bti587
GeneCruiser: a web service for the annotation of microarray data
Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: GeneCruiser is a web service allowing users to annotate their genomic data by mapping microarray feature identifiers to gene identifiers from databases, such as UniGene, while providing links to web resources, such as the UCSC Genome Browser. It relies on a regularly updated database that retrieves and indexes the mappings between microarray probes and genomic databases. Genes are identified using the Life Sciences Identifier standard.
Availability: GeneCruiser is freely available as a web service and web application at http://www.genecruiser.org. GeneCruiser access has also been integrated into our microarray analysis platform, GenePattern, available at http://www.genepattern.org
Contact: liefeld{at}broad.mit.edu
| 1 INTRODUCTION |
|---|
Researchers who use microarray technologies face a recurring difficulty: to learn about a microarray feature, such as the gene it represents, its chromosomal location or its molecular function, they must assemble the information from a number of publicly available databases. Although vendors have included more of these annotations with their products, this information quickly becomes obsolete.
In addition, researchers often have a list of genes or a genomic category (e.g. tyrosine kinases) and would like to find which microarray identifiers correspond to them.
Although numerous standalone applications exist for annotating microarray identifiers, such as DRAGON and Resourcerer, they are not easily integrated with other tools and applications. GeneCruiser was designed to address both the need for identifier annotation and the desire for easy integration by exposing a public SOAP web service interface defined in the Web Services Description Language (WSDL).
| 2 THE GENECRUISER APPLICATION |
|---|
GeneCruiser is a web service and web application designed to annotate genomic data in several ways. GeneCruiser allows users to map gene identifiers from genomic databases to Affymetrix probes, find information about Affymetrix probes in genomic databases by keyword searches and locate Affymetrix probes in the human genome using web resources, such as the UCSC Genome Browser.
The GeneCruiser web application supports annotation queries through a web browser-based interface. The desired annotations and identifiers are selected via HTML forms. IDs may be entered directly into the form or uploaded as text files.
2.1 Databases and annotations
GeneCruiser currently links Affymetrix probe sets to Gene Ontology terms, IMAGE clone IDs and data available in UniGene, LocusLink, RefSeq, SwissProt, and the TIGR human and mouse gene indices. GeneCruiser also provides links from an identifier to its corresponding information in the UCSC Genome Browser, PubMed, GenBank, GeneCards and the Gene Expression Omnibus (Fig. 1). For optimum performance, these databases are automatically downloaded and indexed on a machine local to the GeneCruiser application on a periodic basis.
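The local indexing described above can be pictured as a pair of lookup tables, one from probe sets to database accessions and one in the reverse direction. The following is only a minimal sketch of that idea, not GeneCruiser's actual schema: the record layout, the function name `build_indexes` and the placeholder values (`probe_A`, `ACC_1`, and so on) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_indexes(records):
    """Build forward and reverse lookup tables.

    records: iterable of (probe_id, database, accession) tuples
    (a hypothetical layout, not the real GeneCruiser schema).
    """
    # probe_id -> database name -> set of accessions
    probe_to_ids = defaultdict(lambda: defaultdict(set))
    # (database name, accession) -> set of probe_ids
    id_to_probes = defaultdict(set)
    for probe_id, database, accession in records:
        probe_to_ids[probe_id][database].add(accession)
        id_to_probes[(database, accession)].add(probe_id)
    return probe_to_ids, id_to_probes


# Placeholder records; the identifiers are not real accessions.
records = [
    ("probe_A", "UniGene", "ACC_1"),
    ("probe_A", "RefSeq", "ACC_2"),
    ("probe_B", "UniGene", "ACC_1"),
]
forward, reverse = build_indexes(records)
print(sorted(reverse[("UniGene", "ACC_1")]))  # ['probe_A', 'probe_B']
```

Keeping both directions in memory is what would make probe-to-gene and gene-to-probe queries equally cheap; rebuilding the tables after each periodic download keeps them current.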
| 3 GENECRUISER WEB SERVICE |
|---|
To facilitate integration, GeneCruiser provides a SOAP web service interface allowing other applications to make use of its functionality.
3.1 LSID identifiers
Any service that deals with IDs from multiple disparate sources must deal with the issue of computationally recognizing the context or scope of an identifier. GeneCruiser uses the Life Sciences Identifier (LSID) (Clark et al., 2004; OMG, 2004) as a generic mechanism to allow the transmission of both the identifier and its context.
The LSID is an Object Management Group specification that represents an identifier together with its context as a single Uniform Resource Name (URN). It uses the following syntax: urn:lsid:<authority>:<lsid_namespace>:<identifier>:<version> where urn declares that this is a URN, lsid is the URN namespace, <authority> is the issuing authority or source, <lsid_namespace> is the context, <identifier> is a string representation of the identifier and <version> is an optional version field.
For example, the Affymetrix probe D10537_s_at on the hu6800 chip is represented by the LSID, urn:lsid:affymetrix.com:probeset.hu6800:D10537_s_at.
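As an illustration of the syntax, the following minimal sketch splits an LSID into its fields and reassembles it. It is not part of GeneCruiser; the names `LSID`, `parse_lsid` and `format_lsid` are hypothetical, and the sketch assumes that no field contains a colon.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    identifier: str
    version: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its fields; raise ValueError if malformed."""
    parts = text.strip().split(":")
    # Expect "urn", "lsid", authority, namespace, identifier, optional version.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty LSID field in: {text!r}")
    version = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], version)


def format_lsid(lsid: LSID) -> str:
    """Join the fields back into a URN, omitting an absent version."""
    fields = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.identifier]
    if lsid.version:
        fields.append(lsid.version)
    return ":".join(fields)


probe = parse_lsid("urn:lsid:affymetrix.com:probeset.hu6800:D10537_s_at")
print(probe.namespace, probe.identifier)  # probeset.hu6800 D10537_s_at
```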
3.2 Web service interface
The primary interface methods of the GeneCruiser SOAP interface are
- annotateProbes(...): retrieve annotations for a list of microarray identifiers.
- idsToProbes(...): retrieve microarray IDs for a known accession (e.g. LocusLink, SwissProt).
- keywordsToProbes(...): retrieve microarray identifiers based on a keyword search.
Identifiers for probe sets passed as query parameters to the annotateProbes method may be written as LSIDs or as strings containing the identifier. Strings used in this method are assumed to be Affymetrix probe set identifiers. For the idsToProbes(...) method, an ID may be in the form of an LSID in order to provide the identifier's context (i.e. to allow the server to distinguish between IDs from GenBank, SwissProt, etc.), or as a string containing the accession identifier itself, in which case heuristics are used to determine the identifiers' context.
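A client that prefers to send explicit context can wrap bare probe set identifiers in LSIDs before calling the service. The sketch below is one way to do this, under two assumptions: the helper name `to_probe_lsid` is hypothetical, and the `affymetrix.com:probeset.<chip>` pattern is generalized from the single hu6800 example above rather than taken from a published list of namespaces.

```python
LSID_PREFIX = "urn:lsid:"


def to_probe_lsid(identifier: str, chip: str) -> str:
    """Return identifier as an LSID, wrapping bare Affymetrix probe set IDs.

    Strings that already look like LSIDs are passed through unchanged.
    The authority/namespace pattern is an assumption based on the
    hu6800 example, not a documented rule.
    """
    identifier = identifier.strip()
    if identifier.lower().startswith(LSID_PREFIX):
        return identifier
    return f"urn:lsid:affymetrix.com:probeset.{chip}:{identifier}"


queries = [
    to_probe_lsid(raw, chip="hu6800")
    for raw in ["D10537_s_at", "urn:lsid:affymetrix.com:probeset.hu6800:D10537_s_at"]
]
print(queries[0] == queries[1])  # True
```

Sending LSIDs avoids relying on server-side heuristics, at the cost of requiring the client to know which chip each probe set belongs to.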
| 4 INTEGRATION WITH OTHER APPLICATIONS |
|---|
Other applications can access GeneCruiser as a web service. For example, we have integrated it with the HeatMapViewer in our microarray analysis platform, GenePattern (see Availability section for URL). As users view the heat map image of their microarray data, they may retrieve information about the probes directly from GeneCruiser, allowing them to view annotations from multiple sources simultaneously with their data.
Applications can integrate GeneCruiser from any programming language that supports the SOAP protocol, including Perl, Java, C, etc. The interface is available as WSDL (see Availability section for URL) to permit applications to generate client bindings for GeneCruiser using their local web service tool set.
In addition, a client side library is available for integration from the Java programming language. Example code using the GeneCruiser web service via this library is shown in Figure 2.
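For environments without generated bindings, a request can also be assembled by hand. The following sketch builds a SOAP 1.1 envelope for annotateProbes using only the Python standard library. The SOAP envelope namespace is the standard one, but the service namespace `urn:example:genecruiser` and the `probe` parameter element are hypothetical placeholders; the real names must be taken from the GeneCruiser WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:genecruiser"  # hypothetical; use the namespace from the WSDL


def build_annotate_probes_request(probe_ids):
    """Return a SOAP 1.1 envelope (as a string) calling annotateProbes."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}annotateProbes")
    for probe_id in probe_ids:
        # "probe" is a hypothetical parameter element name.
        ET.SubElement(call, "probe").text = probe_id
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_annotate_probes_request(
    ["urn:lsid:affymetrix.com:probeset.hu6800:D10537_s_at"]
)
print(request_xml)
```

The resulting string would be sent as the body of an HTTP POST to the service endpoint; in practice, a WSDL-aware toolkit is preferable because it derives the element names and types automatically.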
| Acknowledgments |
|---|
The authors wish to thank the following members of the Cancer Program at the Broad Institute: Todd Golub, Ken Ross, Stefano Monti, Justin Lamb, Jim Lerner and Sridhar Ramaswamy.
Conflict of Interest: none declared.
Received on March 23, 2005; revised on May 16, 2005; accepted on July 14, 2005
| REFERENCES |
|---|
Clark, T., et al. (2004) Globally distributed object identification for biological knowledgebases. Brief. Bioinformatics, 5, 59-70.
OMG. (2004) Life Science Identifiers, OMG Adopted Specification, dtc/04-08-02, August 2004.