Bioinformatics Advance Access originally published online on September 25, 2007
Bioinformatics 2007 23(22):3103-3104; doi:10.1093/bioinformatics/btm462
CellMontage: similar expression profile search server
1Computational Biology Research Center, AIST Waterfront Bio-IT Research Building, 2-42 Aomi Koto-ku, Tokyo 135-0064 and 2Mitsubishi Research Institute, Inc., 3-6, Otemachi 2-chome Chiyoda-ku, Tokyo 100-8141, Japan
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: The establishment and rapid expansion of microarray databases have created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database: 69 000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with similar overall expression to a user-provided expression profile (e.g. a microarray experiment) are computed and displayed, usually within 20 s. The core search engine software is downloadable from the site.
Availability: http://cellmontage.cbrc.jp/
Contact: horton-p{at}aist.go.jp
Supplementary information: http://cellmontage.cbrc.jp/supplementary/
| 1 INTRODUCTION |
|---|
Microarray data repositories such as the Gene Expression Omnibus (GEO) (Barrett et al., 2005, 2007) and ArrayExpress (Parkinson et al., 2005) make it possible to access tens of thousands of microarray profiles, and provide useful initial analysis of the data. Specialized repositories, such as the Oncomine site (Rhodes et al., 2007) for cancer-related gene expression profiles, provide in-depth analysis for particular application areas.
These sites allow users to access gene expression profiles via information about the chip platform (such as manufacturer and model) or the type of experiment performed (the gene experiment series, keyword match on the sample description, etc.).
CellMontage allows users to access expression data in a novel, content-based manner. Query profiles (samples) are searched against a database to find profiles with similar overall gene expression.
The idea of gene expression profile similarity search has been suggested (Basset et al., 1999) and a prototype system, GEST (Hunter et al., 2001), has been implemented in the past. However, we believe that our web site is the first practical, large-scale implementation made available to the public via the internet.
| 2 METHODS |
|---|
2.1 Similarity measure
Our approach is based on Spearman's rank correlation of gene expression. Because this measure depends only on the ordering of expression values within each profile, data normalized in different ways can be directly compared. Profiles with a high rank correlation to a query (and a sufficient number of common genes or probes) are returned to the user as the top hits. More precisely, the hits are ranked by their statistical significance, based on a null model of randomly ordered genes.
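The similarity measure can be illustrated with a minimal, brute-force sketch. It is not the CellMontage implementation: the function names, the `min_common` threshold and the normal approximation used for the significance (under random gene order, rho times the square root of n - 1 is roughly standard normal for large n) are assumptions made here for illustration, and the exact statistic used by the server may differ.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_common(query, target, min_common=3):
    """Spearman's rho over the genes shared by two profiles (dicts gene -> value).

    Returns (rho, n_common); rho is None if too few genes are shared
    or if one profile is constant over the shared genes.
    """
    genes = sorted(set(query) & set(target))
    n = len(genes)
    if n < min_common:
        return None, n
    rq = average_ranks([query[g] for g in genes])
    rt = average_ranks([target[g] for g in genes])
    mean = (n + 1) / 2.0  # mean of ranks 1..n, also valid with average ranks
    cov = sum((a - mean) * (b - mean) for a, b in zip(rq, rt))
    var_q = sum((a - mean) ** 2 for a in rq)
    var_t = sum((b - mean) ** 2 for b in rt)
    if var_q == 0 or var_t == 0:
        return None, n
    return cov / math.sqrt(var_q * var_t), n

def approx_p_value(rho, n):
    """One-sided p-value from a normal approximation (assumed, crude for small n)."""
    z = rho * math.sqrt(n - 1)
    return 0.5 * math.erfc(z / math.sqrt(2))

def rank_hits(query, database, min_common=3, top=10):
    """Rank database profiles (dict name -> profile) by approximate significance."""
    hits = []
    for name, profile in database.items():
        rho, n = spearman_common(query, profile, min_common)
        if rho is not None:
            hits.append((approx_p_value(rho, n), name, rho, n))
    hits.sort()  # smallest p-value first
    return [(name, rho, n, p) for p, name, rho, n in hits[:top]]
```

Ranking by significance rather than by the raw coefficient means that a moderate correlation over many shared genes can outrank a high correlation over only a few.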
2.2 Data source and integration
Our current database contains 69 000 profiles obtained from GEO. The expression values used are the raw ID_REF expression values as provided by GEO. ID_REF ids are mapped to their UniGene names. Comparison at the UniGene (rather than probe) level enables cross-platform comparisons.
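As a rough sketch of the probe-to-gene mapping step, the function below collapses probe-level values to gene-level values. How several probes that map to the same UniGene name are combined is not stated here; averaging is an assumption made for this example, and the probe and gene identifiers in the usage note are hypothetical.

```python
from collections import defaultdict

def collapse_to_unigene(probe_values, probe_to_unigene):
    """Collapse a probe-level profile to a gene-level profile.

    probe_values:     dict probe id -> expression value
    probe_to_unigene: dict probe id -> UniGene name
    Probes without a mapping are dropped; probes sharing a gene are
    averaged (an assumption made for this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for probe, value in probe_values.items():
        gene = probe_to_unigene.get(probe)
        if gene is None or value is None:
            continue  # unmapped probe or missing value
        sums[gene] += value
        counts[gene] += 1
    return {gene: sums[gene] / counts[gene] for gene in sums}
```

For example, with hypothetical probes `p1` and `p2` both mapped to a hypothetical gene `Hs.1`, values 2.0 and 4.0 collapse to a single gene-level value of 3.0. Two profiles from different platforms can then be compared on whatever gene names they share.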
2.3 Search engine
Our search engine uses a customized algorithm for the efficient calculation of the correlation coefficient between profiles measuring different sets of genes (due to different platforms and possibly missing values). The algorithm has been described in detail in Horton et al. (2006). A stand-alone, command line version of the search engine software is downloadable from the server.
| 3 RESULTS |
|---|
3.1 Example search result
Profile similarity search is performed with the CellMontage server by choosing Profile Matching on the top page. The user must select the microarray platform of the profiles to search against and supply the query itself. The query can be the user's original data, or a profile retrieved from GEO through CellMontage's Profile Retrieval function. The currently supported data formats are the CellMontage format shown in Figure 1, and the CHP format for several Affymetrix platforms.
Figure 2 shows the query input screen, loaded with a microarray profile query taken from the left temporal parietal region of human brain. The results of this query (using all genes for comparison) are shown in Table 1. As expected, the query itself is found as the first hit. In this case, all of the top hits come from brain. Screenshots of the search results of this and other queries can be found in the Supplementary Material.
| 4 DISCUSSION |
|---|
4.1 Search accuracy
In general, we find that in same-platform searches the top hits usually come from cell types similar to that of the query. Accurate matching for cross-platform comparisons is more difficult and the results are mixed [more concrete accuracy estimates are given in Horton et al. (2006)]. We are currently investigating weighting genes as a way to increase cross-platform comparison accuracy (manuscript in preparation).
4.2 Applications
CellMontage is a unique server with many possible applications, including:
- Expression motif matching
- Sample validation
- Exploring the relationship between gene expression in different cell types
- Prediction of sample characteristics.
Using the CellMontage format, queries can be constructed to match expression motifs, i.e. find profiles in which a given set of genes is (approximately) expressed in a given order.
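Because a rank-based measure depends only on the ordering of values, a motif query can in principle be built by assigning any strictly decreasing values to the genes in their desired order. The sketch below does this with a plain dictionary; it does not reproduce the CellMontage file format, and the function name and gene names are hypothetical.

```python
def build_motif_query(genes_high_to_low):
    """Build a pseudo-profile encoding a desired expression order.

    genes_high_to_low: gene names, from highest to lowest expected expression.
    Only the order matters to a rank correlation, so the values n, n-1, ..., 1
    are arbitrary placeholders.
    """
    n = len(genes_high_to_low)
    if len(set(genes_high_to_low)) != n:
        raise ValueError("each gene may appear only once in a motif")
    return {gene: float(n - i) for i, gene in enumerate(genes_high_to_low)}

# Hypothetical motif: geneA above geneB above geneC.
motif = build_motif_query(["geneA", "geneB", "geneC"])
```

Database profiles in which these genes appear in approximately the same order would then score a high rank correlation against `motif`.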
Sample validation is simple but pragmatic. Samples taken in the laboratory could be contaminated by neighboring tissues or mislabeled. If a sample's closest CellMontage hits are consistent with each other, but different from what was expected, the sample should be closely inspected for correctness.
The genes used for matching can be restricted to Gene Ontology categories. Using this function, one can investigate the similarity of the query to database profiles within the context of a particular type of gene (e.g. secreted proteins).
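The validation rule described above can be written down as a small decision function. This is only a sketch: the label strings, the `min_agreement` threshold of 0.8 and the returned status names are assumptions for illustration, not part of the server.

```python
from collections import Counter

def validate_sample(expected_label, top_hit_labels, min_agreement=0.8):
    """Compare a sample's expected annotation with the labels of its top hits.

    Returns "consistent" if the hits agree with each other and with the
    expectation, "inspect" if they agree with each other but not with the
    expectation, and "inconclusive" if the hits disagree among themselves.
    """
    if not top_hit_labels:
        return "inconclusive"
    label, count = Counter(top_hit_labels).most_common(1)[0]
    agreement = count / len(top_hit_labels)
    if agreement < min_agreement:
        return "inconclusive"
    return "consistent" if label == expected_label else "inspect"
```

A sample annotated as liver whose top hits are almost all brain profiles would be flagged `"inspect"` under these assumptions.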
Finally, a query may be expected to be similar to its top hits in phenotype, response to treatment, etc. This application is closely analogous to sequence similarity search and will become increasingly powerful as the gene expression database becomes richer and better annotated.
| 5 CONCLUSION |
|---|
We have presented CellMontage, the first web server for gene expression profile similarity search. This unique server has applications in microarray, gene expression and cell biology research.
| ACKNOWLEDGEMENTS |
|---|
L.K. and P.H. were partially supported by a Japanese Ministry of Education, Culture, Sports, Science and Technology Grant-in-Aid for Scientific Research (B).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on July 17, 2007; revised on August 17, 2007; accepted on September 6, 2007
| REFERENCES |
|---|
Barrett T, et al. NCBI GEO: mining millions of expression profiles - database and tools. Nucleic Acids Res. (2005) 33:D562–D566.
Barrett T, et al. NCBI GEO: mining tens of millions of expression profiles. Nucleic Acids Res. (2007) 35:D760–D765.
Basset D Jr, et al. Gene expression informatics – it's all in your mine. Nat. Genet. (1999) 21:51–55.
Horton P, et al. RaPiDS: an algorithm for rapid expression profile database search. Genome Inform. (2006) 17:67–76.
Hunter L, et al. GEST: a gene expression search tool based on a novel Bayesian similarity metric. Bioinformatics (2001) 17:S115–S122.
Parkinson H, et al. ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. (2005) 33:D553–D555.
Rhodes DR, et al. Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia (2007) 9:166–180.