

Bioinformatics Advance Access originally published online on February 2, 2006
Bioinformatics 2006 22(8):1024-1026; doi:10.1093/bioinformatics/btl036

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

The LCB Data Warehouse

Adam Ameur 1, Vladimir Yankovski 1, Stefan Enroth 1, Ola Spjuth 2 and Jan Komorowski 1,*

1 The Linnaeus Centre for Bioinformatics, Uppsala University and The Swedish University for Agricultural Sciences, Sweden
2 Department of Pharmaceutical Biosciences, Uppsala University, Sweden

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.

Availability: An open functional version is available at https://dw.lcb.uu.se/index.phtml?i_login=test. User accounts are created upon request. Additional facilities including plug-ins, user documentation and a password protected data storage system are available from http://www.lcb.uu.se/lcbdw.php

Contact: Jan.Komorowski{at}lcb.uu.se


    1 INTRODUCTION
The aim of the LCB-DWH is to facilitate management and analysis of two-channel microarray data, and to help non-experts keep up with developments in the field of data analysis by continuously integrating new tools and new sources of biological information. LCB-DWH differs from most other systems in that it also provides secure and reliable storage according to current international standards.


    2 DESCRIPTION
The LCB-DWH is developed from BASE, a widely used open source platform for comprehensive management and analysis of microarray data (Saal et al., 2002). The main difference in system design is that LCB-DWH, as opposed to BASE, enables storage and analysis of data to be performed on separate hardware. The LCB-DWH benefits from features that are inherited directly from the BASE architecture, such as MIAME-compliant storage (Brazma et al., 2001), data sharing between groups of researchers, separation of projects, publication through the MAGE-ML format (Spellman et al., 2002) and presentation of data analysis in a tree structure.

Development in the LCB-DWH has been focused on integrating a wide collection of data analysis tools, and this work has been facilitated by the BASE plug-in architecture. A wrapper to the programming language R (Ihaka and Gentleman, 1996) has been developed, enabling access to the open source packages within Bioconductor (Gentleman et al., 2004), which contains a wide collection of efficient tools for microarray data analysis and visualization. Within the LCB-DWH system, those tools and several new ones are integrated in a single, easy-to-comprehend framework that allows non-expert users to apply the sophisticated tools to their data in an intuitive manner. Moreover, we have designed and implemented an interactive Gene Ontology (The Gene Ontology Consortium, 2000) tool, which is invoked from within the user interface. It enables users to explore the biological function of a set of genes with a GO browser, or to test for the statistical over- or under-representation of different GO classes in a set of genes with respect to a reference set, e.g. all genes on the array. The problem of multiple testing is handled in two ways. The default option is to calculate the expected number of significant GO terms at some user-specified cut-off level. If it is considerably lower than the number of observed significant GO terms at that level, then the results of the whole test are more likely to be correct. Optionally, the user may select a method for multiple test correction. Furthermore, we have implemented a useful feature for creating customized links from all genes in a dataset to any external web resource of biological knowledge.
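The over-representation test and the default multiple-testing check described above can be illustrated with a minimal sketch. It is not the LCB-DWH implementation: the text does not name the test statistic, so a one-sided hypergeometric test is assumed here, and the expected number of significant terms is approximated as the cut-off times the number of tested terms (which assumes roughly uniform p-values under the null hypothesis). All function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def go_overrepresentation(candidates, reference, annotations, cutoff=0.05):
    """Test every GO term for over-representation among the candidate genes.

    annotations maps a GO term to the set of genes annotated with it.
    Returns (p_values, observed, expected), where expected is the number of
    terms that would pass the cut-off by chance alone (assumed approximation).
    """
    reference = set(reference)
    candidates = set(candidates) & reference
    p_values = {}
    for term, genes in annotations.items():
        in_ref = set(genes) & reference
        if not in_ref:
            continue  # term not represented on the array
        hits = len(in_ref & candidates)
        p_values[term] = hypergeom_upper_tail(
            hits, len(reference), len(in_ref), len(candidates))
    observed = sum(p <= cutoff for p in p_values.values())
    expected = cutoff * len(p_values)
    return p_values, observed, expected


# Toy data: 20 genes on the array, 5 candidate genes, two GO terms.
reference = [f"g{i}" for i in range(20)]
candidates = ["g0", "g1", "g2", "g3", "g4"]
annotations = {
    "term_A": {"g0", "g1", "g2", "g3"},              # all four are candidates
    "term_B": {"g10", "g11", "g12", "g13", "g14"},   # no candidates
}
p_values, observed, expected = go_overrepresentation(
    candidates, reference, annotations)
```

In this toy case one term is observed below the cut-off while about 0.1 would be expected by chance, which is the kind of comparison the default option relies on. Under-representation could be tested analogously with the lower tail.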

Another issue that is addressed in the LCB-DWH is reproduction of data analysis. For this purpose, a facility that enables the user to save all steps of data analysis in a script has been developed. The script may be applied either to a specific path in the data analysis tree or to the complete structure. These scripts form protocols that may be re-used for automating repetitive tasks or by reviewers judging the quality of the analysis.
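The idea of recording analysis steps and re-playing them can be sketched as follows. This is only an illustration under assumptions: the actual script format of the LCB-DWH is not described in the text, and the class `AnalysisScript`, the step functions and the JSON serialization are all hypothetical stand-ins.

```python
import json


class AnalysisScript:
    """Ordered record of analysis steps that can be saved and re-played."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def record(self, name, **params):
        """Append one step: the name of a registered function and its parameters."""
        self.steps.append({"name": name, "params": params})

    def to_json(self):
        """Serialize the recorded steps so the protocol can be stored or shared."""
        return json.dumps(self.steps)

    @classmethod
    def from_json(cls, text):
        """Rebuild a script from its serialized form."""
        return cls(json.loads(text))

    def replay(self, data, registry):
        """Apply every recorded step, in order, to the given data."""
        for step in self.steps:
            data = registry[step["name"]](data, **step["params"])
        return data


# Hypothetical analysis steps, standing in for real plug-ins.
def filter_min(values, threshold):
    return [v for v in values if v >= threshold]


def scale(values, factor):
    return [v * factor for v in values]


REGISTRY = {"filter_min": filter_min, "scale": scale}

script = AnalysisScript()
script.record("filter_min", threshold=1.0)
script.record("scale", factor=2.0)

saved = script.to_json()                       # protocol that could be shared
restored = AnalysisScript.from_json(saved)
result = restored.replay([0.5, 1.5, 3.0], REGISTRY)
```

Because the saved protocol holds only step names and parameters, the same script can be applied to a different dataset, which is what makes it useful for repetitive tasks and for review.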

Security and reliability issues are given high priority. Data are stored on a server with a double RAID solution; in addition, incremental backups are taken daily. Communication with the server is done through encrypted connections for password-protected accounts.


    3 WORKFLOW
LCB-DWH implements the whole dataflow described in the MIAME requirements for microarray experiments. The starting point is data from image quantification software, and data from several arrays are grouped into a single experiment. The experiment may then be shared within a research group, before it is transferred to the data analysis module.

Then follows the pre-processing of data, for which a number of different methods are available for background correction, normalization and filtering. Moreover, spots that are printed on multiple positions on an array may be merged into one single value. At any point the quality of data may be checked with several different data visualization methods such as array plots, PCA plots and plots for control clones. Such plots can prove helpful, e.g. when selecting an appropriate pre-processing method for some specific dataset. The step after pre-processing is usually the detection of candidate genes, i.e. all genes that were targeted by the particular experiment. For example, this can be done using various methods for detecting differentially expressed genes, or by clustering methods. Once a set of candidate genes has been identified, the Gene Ontology tool can be used to analyze biological processes, molecular functions and cellular compartments in which those genes are involved. Sample pictures produced within the LCB-DWH in the data analysis process are shown in Figure 1.
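The pre-processing steps named above (background correction, filtering, normalization and merging of replicate spots) can be sketched for two-channel data as follows. The text does not say which methods the LCB-DWH offers, so the choices here are assumptions made for illustration: background subtraction, removal of spots without positive signal, global median normalization of log2 ratios, and merging replicates by their median. The field names of the spot records are hypothetical.

```python
from math import log2
from statistics import median


def log_ratios(spots):
    """Background-correct both channels and return (gene, log2 ratio) pairs.

    Spots whose corrected signal is not positive in both channels are
    filtered out, since no log ratio can be computed for them.
    """
    result = []
    for spot in spots:
        red = spot["red"] - spot["red_bg"]
        green = spot["green"] - spot["green_bg"]
        if red > 0 and green > 0:
            result.append((spot["gene"], log2(red / green)))
    return result


def median_normalize(ratios):
    """Shift all log ratios so that their median over the array is zero."""
    centre = median(m for _, m in ratios)
    return [(gene, m - centre) for gene, m in ratios]


def merge_replicates(ratios):
    """Merge spots printed at multiple positions into one value per gene."""
    by_gene = {}
    for gene, m in ratios:
        by_gene.setdefault(gene, []).append(m)
    return {gene: median(values) for gene, values in by_gene.items()}


# Toy array: geneA is printed twice, geneD has no signal above background.
spots = [
    {"gene": "geneA", "red": 900, "red_bg": 100, "green": 300, "green_bg": 100},
    {"gene": "geneA", "red": 500, "red_bg": 100, "green": 200, "green_bg": 100},
    {"gene": "geneB", "red": 300, "red_bg": 100, "green": 300, "green_bg": 100},
    {"gene": "geneC", "red": 150, "red_bg": 100, "green": 500, "green_bg": 100},
    {"gene": "geneD", "red": 90, "red_bg": 100, "green": 400, "green_bg": 100},
]
merged = merge_replicates(median_normalize(log_ratios(spots)))
```

The three functions are kept separate so that each step can be swapped for another method, mirroring the way alternative pre-processing methods can be compared with the help of diagnostic plots.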


Fig. 1 Examples of plots produced in the LCB-DWH. (a) Signal distributions over a microarray. (b) Plots of differentially expressed genes, detected by an empirical Bayes method. (c) Visualization of a cluster produced by the k-means algorithm. (d) Functions for a set of differentially expressed genes, viewed with the built-in Gene Ontology browser.

 
Major journals require that expression data be made available in public repositories such as ArrayExpress (Brazma et al., 2003) at EBI. The LCB-DWH enables export of data in MAGE-ML format, which can be uploaded to such repositories. Manual uploading of data is otherwise a tedious and time-consuming process.


    4 CURRENT DEVELOPMENTS
Ongoing development of the LCB-DWH system includes adaptation to new microarray technologies such as ChIP–chip and array-CGH, which require new methodologies both for data storage and analysis.


    Acknowledgments
 
The authors thank Hanna Göransson for helpful discussions and Jakub Orzechowski Westholm for help with some implementations. The LCB-DWH is supported by grants from the Wallenberg Consortium North and from the Knut and Alice Wallenberg Foundation. Funding to pay the Open Access publication charges for this article was provided by the Linnaeus Centre for Bioinformatics.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alfonso Valencia

Received on August 26, 2005; revised on November 14, 2005; accepted on January 31, 2006

    REFERENCES

    Brazma, A., et al. (2003) ArrayExpress—a public repository for microarray gene expression data at the EBI. Nucleic Acids Res., 31, 68–71.

    Brazma, A., et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet., 29, 365–371.

    Gentleman, R.C., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Ihaka, R. and Gentleman, R. (1996) R: a language for data analysis and graphics. J. Comput. Graph. Stat., 5, 299–314.

    Saal, L.H., et al. (2002) Bioarray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol., 3, SOFTWARE0003.

    Spellman, P.T., et al. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol., 3, research0046.1–research0046.9.

    The Gene Ontology Consortium. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.




This article has been cited by other articles:

R. Andersson, C. E. G. Bruder, A. Piotrowski, U. Menzel, H. Nord, J. Sandgren, T. R. Hvidsten, T. Diaz de Stahl, J. P. Dumanski, and J. Komorowski
A segmental maximum a posteriori approach to genome-wide copy number profiling
Bioinformatics, March 15, 2008; 24(6): 751–758.

R. Marsell, T. Krajisnik, H. Goransson, C. Ohlsson, O. Ljunggren, T. E. Larsson, and K. B. Jonsson
Gene expression analysis of kidneys from transgenic mice expressing fibroblast growth factor-23
Nephrol. Dial. Transplant., March 1, 2008; 23(3): 827–833.

C. Bredhult, L. Sahlin, and M. Olovsson
Gene expression analysis of human endometrial endothelial cells exposed to op'-DDT
Mol. Hum. Reprod., February 1, 2008; 14(2): 97–106.

M. Draminski, A. Rada-Iglesias, S. Enroth, C. Wadelius, J. Koronacki, and J. Komorowski
Monte Carlo feature selection for supervised classification
Bioinformatics, January 1, 2008; 24(1): 110–117.

P. U. Magnusson, A. Dimberg, S. Mellberg, A. Lukinius, and L. Claesson-Welsh
FGFR-1 regulates angiogenesis through cytokines interleukin-4 and pleiotrophin
Blood, December 15, 2007; 110(13): 4214–4222.

A. Rada-Iglesias, S. Enroth, A. Ameur, C. M. Koch, G. K. Clelland, P. Respuela-Alonso, S. Wilcox, O. M. Dovey, P. D. Ellis, C. F. Langford, et al.
Butyrate mediates decrease of histone acetylation centered on transcription start sites and down-regulation of associated genes
Genome Res., June 1, 2007; 17(6): 708–719.

