Bioinformatics Advance Access originally published online on September 25, 2007
Bioinformatics 2007 23(22):3103-3104; doi:10.1093/bioinformatics/btm462
A corrigendum to this article has been published.

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

CellMontage: similar expression profile search server

Wataru Fujibuchi 1, Larisa Kiseleva 1, Takeaki Taniguchi 2, Hajime Harada 1 and Paul Horton 1,*

1Computational Biology Research Center, AIST Waterfront Bio-IT Research Building, 2-42 Aomi Koto-ku, Tokyo 135-0064 and 2Mitsubishi Research Institute, Inc., 3-6, Otemachi 2-chome Chiyoda-ku, Tokyo 100-8141, Japan

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Summary: The establishment and rapid expansion of microarray databases have created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database: 69 000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with similar overall expression to a user-provided expression profile (e.g. a microarray experiment) are computed and displayed, usually within 20 s. The core search engine software is downloadable from the site.

Availability: http://cellmontage.cbrc.jp/

Contact: horton-p{at}aist.go.jp

Supplementary information: http://cellmontage.cbrc.jp/supplementary/


    1 INTRODUCTION
 
Microarray data repositories such as the Gene Expression Omnibus (GEO) (Barrett et al., 2005, 2007) and ArrayExpress (Parkinson et al., 2005) make it possible to access tens of thousands of microarray profiles, and provide useful initial analysis of the data. Specialized repositories, such as the Oncomine (Rhodes et al., 2007) site for cancer-related gene expression profiles, provide in-depth analysis for particular application areas.

These sites allow users to access gene expression profiles via information about the chip platform (such as manufacturer and model) or the type of experiment performed (the gene experiment series, keyword match on the sample description, etc.).

CellMontage allows users to access expression data in a novel, content-based manner. Query profiles (samples) are searched against a database to find profiles with similar overall gene expression.

The idea of gene expression profile similarity search has been suggested (Basset et al., 1999) and a prototype system, GEST (Hunter et al., 2001), has been implemented in the past. However, we believe that our web site is the first practical, large-scale implementation made available to the public via the internet.


    2 METHODS
 
2.1 Similarity measure
Our approach is based on Spearman's rank correlation of gene expression, which allows data normalized in different ways to be directly compared. Profiles with a high rank correlation to a query (and a sufficient number of genes or probes in common with it) are returned to the user as the top ‘hits’. More precisely, the hits are ranked by their statistical significance, based on a null model of randomly ordered genes.
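As an illustration of why a rank-based measure tolerates different normalizations, the following minimal sketch computes Spearman's rank correlation in plain Python and shows that a log transform of one profile leaves the result unchanged. The function names and example values are assumptions made for this sketch. The significance function uses the standard large-sample normal approximation for the null distribution of the coefficient; the text does not specify the exact statistic CellMontage uses, so this is only one plausible choice.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def null_p_value(rho, n):
    """One-sided p-value under randomly ordered genes (normal approximation).

    Under the null, rho has mean 0 and variance 1 / (n - 1).
    """
    z = rho * math.sqrt(n - 1)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Made-up expression values for five genes measured in two experiments.
raw = [120.0, 15.5, 980.0, 43.0, 310.0]
logged = [math.log2(v) for v in raw]      # same data, different normalization
other = [200.0, 30.0, 150.0, 10.0, 400.0]

rho_raw = spearman(raw, other)
rho_logged = spearman(logged, other)      # identical, since ranks are unchanged
```

Because the p-value depends on both the coefficient and the number of genes compared, ranking hits by significance rather than by the raw coefficient favors matches supported by more genes.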

2.2 Data source and integration
Our current database contains 69 000 profiles obtained from GEO. The expression values used are the raw ID_REF expression values as provided by GEO. ID_REF ids are mapped to their UniGene names. Comparison at the UniGene (rather than probe) level enables cross-platform comparisons.
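The mapping step can be sketched as follows. This is a minimal illustration, not the server's code: the probe and UniGene identifiers are hypothetical, and averaging the values of probes that share a UniGene name is an assumption, since the text does not say how multiple probes for one gene are combined.

```python
from collections import defaultdict

def collapse_to_unigene(probe_values, probe_to_unigene):
    """Average the values of all probes that map to the same UniGene name.

    Probes without a mapping are dropped. Averaging is an assumption made
    for this sketch; the source does not state how probes are combined.
    """
    grouped = defaultdict(list)
    for probe_id, value in probe_values.items():
        gene = probe_to_unigene.get(probe_id)
        if gene is not None:
            grouped[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in grouped.items()}

# Hypothetical identifiers, for illustration only.
mapping = {"probe_a": "Hs.0001", "probe_b": "Hs.0001", "probe_c": "Hs.0002"}
profile = {"probe_a": 10.0, "probe_b": 30.0, "probe_c": 5.0, "probe_x": 99.0}

gene_level = collapse_to_unigene(profile, mapping)
```

Once two profiles from different platforms are both keyed by UniGene name, they can be compared on the genes they share even though their probe sets differ.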

2.3 Search engine
Our search engine uses a customized algorithm for the efficient calculation of the correlation coefficient between profiles measuring different sets of genes (due to different platforms and possibly missing values). The algorithm has been described in detail in Horton et al. (2006). A stand-alone, command line version of the search engine software is downloadable from the server.
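To make the problem concrete, here is a naive sketch of comparing two profiles that measure different gene sets: restrict both to their common genes, re-rank within that subset, and compute the correlation. This is explicitly not the algorithm of Horton et al. (2006), which is designed to avoid this per-pair cost; the function names and the `min_common` threshold are assumptions for illustration, and the closed-form formula used here assumes no tied values.

```python
def rank_map(profile, genes):
    """Rank the given genes 1..n by expression within one profile."""
    ordered = sorted(genes, key=lambda g: profile[g])
    return {g: r for r, g in enumerate(ordered, start=1)}

def spearman_on_common(query, target, min_common=3):
    """Spearman's rho over genes present in both profiles.

    Returns None when fewer than `min_common` genes are shared.
    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties.
    """
    common = sorted(query.keys() & target.keys())
    n = len(common)
    if n < max(min_common, 2):   # at least 2 genes are needed for the formula
        return None
    rq, rt = rank_map(query, common), rank_map(target, common)
    d2 = sum((rq[g] - rt[g]) ** 2 for g in common)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up profiles keyed by gene name; only B, C and D are shared.
query = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 9.0}
same_order = {"B": 10.0, "C": 20.0, "D": 40.0, "E": 7.0}
reversed_order = {"B": 3.0, "C": 2.0, "D": 1.0}
too_sparse = {"A": 1.0, "Z": 2.0}

rho_same = spearman_on_common(query, same_order)
rho_reversed = spearman_on_common(query, reversed_order)
rho_sparse = spearman_on_common(query, too_sparse)
```

Re-ranking for every query and database profile pair costs a sort per comparison, which is the overhead a customized algorithm would aim to reduce when searching tens of thousands of profiles.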


    3 RESULTS
 
3.1 Example search result
Profile similarity search is performed with the CellMontage server by choosing ‘Profile Matching’ on the top page. The microarray platform of the profiles to search against must be selected and the query itself must be given. For the query, the user's original data can be used, or profiles from GEO can be retrieved to use as a query through CellMontage's Profile Retrieval function. The currently supported data formats are the ‘CellMontage format’ shown in Figure 1, and the CHP format for several Affymetrix platforms.

Figure 2 shows the query input screen, loaded with a microarray profile query taken from the left temporal parietal region of human brain. The results of this query (using all genes for comparison) are shown in Table 1. As expected, the query itself is found as the first hit. In this case, all of the top hits come from brain. Screen shots of the search results of this and other queries can be found in the Supplementary Material.


Table 1. Example search results

 

Fig. 1. An example of a CellMontage format file is shown.

 

Fig. 2. The query input screen of CellMontage. The screen shown was obtained by first following the ‘Get profile’ link to obtain a GEO profile to use as a query. (Although expression values are shown, only the relative rank information is used for matching.)

 

    4 DISCUSSION
 
4.1 Search accuracy
In general, we find that in same-platform searches the top hits usually come from cell types similar to that of the query. Accurate matching for cross-platform comparisons is more difficult and the results are mixed [more concrete accuracy estimates are given in Horton et al. (2006)]. We are currently investigating weighting genes as a way to increase cross-platform comparison accuracy (manuscript in preparation).

4.2 Applications
CellMontage is a unique server with many possible applications, including:

  • Expression motif matching
  • Sample validation
  • Exploring the relationship between gene expression in different cell types
  • Prediction of sample characteristics.

Using the CellMontage format, queries can be constructed to match expression motifs, i.e. find profiles in which a given set of genes is (approximately) expressed in a given order.
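The idea of a motif query can be sketched as below. Since only rank order matters for matching, a motif can be expressed as an artificial profile whose values simply decrease along the desired gene order. The CellMontage file format itself is not reproduced here; the function names and gene labels are hypothetical, and the agreement score is a simple pairwise-concordance measure swapped in for illustration rather than the server's rank-correlation scoring.

```python
from itertools import combinations

def motif_query(genes_high_to_low):
    """Build an artificial profile whose values encode the desired order."""
    n = len(genes_high_to_low)
    return {gene: float(n - i) for i, gene in enumerate(genes_high_to_low)}

def order_agreement(motif, profile):
    """Fraction of motif gene pairs ordered the same way in the profile.

    Only genes present in the profile are considered; returns None when
    fewer than two motif genes are measured.
    """
    genes = [g for g in motif if g in profile]
    pairs = list(combinations(genes, 2))
    if not pairs:
        return None
    concordant = sum(
        1 for a, b in pairs
        if (motif[a] - motif[b]) * (profile[a] - profile[b]) > 0
    )
    return concordant / len(pairs)

# Hypothetical motif: G1 expressed above G2, and G2 above G3.
motif = motif_query(["G1", "G2", "G3"])
candidate = {"G1": 50.0, "G2": 80.0, "G3": 10.0, "G9": 5.0}

score = order_agreement(motif, candidate)
```

A score of 1 means the profile reproduces the motif order exactly; intermediate values correspond to the approximate matches described above.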

Sample validation is simple but pragmatic. Samples taken in the laboratory could be contaminated by neighboring tissues or mislabeled. If a sample's closest CellMontage hits are consistent with one another but differ from what is expected, the sample should be closely inspected for correctness.
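The validation check described here can be sketched as a small decision rule over the annotations of the top hits. The function name, the tissue labels and the 0.8 agreement threshold are all assumptions for illustration; in practice the labels would have to be extracted from the sample descriptions of the returned hits.

```python
from collections import Counter

def validate_sample(expected_label, hit_labels, min_agreement=0.8):
    """Flag a sample whose top hits agree with each other but not its label."""
    if not hit_labels:
        return "no hits"
    label, count = Counter(hit_labels).most_common(1)[0]
    if count / len(hit_labels) < min_agreement:
        return "inconclusive"          # hits are not consistent with each other
    return "consistent" if label == expected_label else "inspect sample"

# Hypothetical labels for the ten closest hits of a sample recorded as liver.
hits = ["kidney"] * 9 + ["liver"]
verdict = validate_sample("liver", hits)
```

The rule only raises a flag; deciding whether a flagged sample is contaminated or mislabeled still requires inspecting the sample itself.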

The genes used for matching can be restricted to Gene Ontology categories. Using this function, one can investigate the similarity of the query to database profiles within the context of a particular type of gene (e.g. secreted proteins).

Finally, a query may be expected to be similar to its top hits in phenotype, response to treatment, etc. This application is closely analogous to sequence similarity search and will become increasingly powerful as the gene expression database becomes richer and better annotated.


    5 CONCLUSION
 
We have presented CellMontage, the first web server for gene expression profile similarity search. This unique server has applications in microarray, gene expression and cell biology research.


    ACKNOWLEDGEMENTS
 
L.K. and P.H. were partially supported by a Grant-in-Aid for Scientific Research (B) from the Japanese Ministry of Education, Culture, Sports, Science and Technology.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on July 17, 2007; revised on August 17, 2007; accepted on September 6, 2007

    REFERENCES
 

    Barrett T, et al. NCBI GEO: mining millions of expression profiles - database and tools. Nucleic Acids Res. (2005) 33:D562–D566.

    Barrett T, et al. NCBI GEO: mining tens of millions of expression profiles. Nucleic Acids Res. (2007) 35:D760–D765.

    Basset D Jr, et al. Gene expression informatics – it's all in your mine. Nat. Genet. (1999) 21:51–55.

    Horton P, et al. RaPiDS: an algorithm for rapid expression profile database search. Genome Inform. (2006) 17:67–76.

    Hunter L, et al. GEST: a gene expression search tool based on a novel Bayesian similarity metric. Bioinformatics (2001) 17:S115–S122.

    Parkinson H, et al. ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. (2005) 33:D553–D555.

    Rhodes DR, et al. Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia (2007) 9:166–180.


This article has been cited by other articles:


J. Caldas, N. Gehlenborg, A. Faisal, A. Brazma, and S. Kaski
Probabilistic retrieval and visualization of biologically relevant microarray experiments
Bioinformatics, June 15, 2009; 25(12): i145 - i153.