

Bioinformatics Advance Access originally published online on November 17, 2007
Bioinformatics 2008 24(5):738-740; doi:10.1093/bioinformatics/btm559
Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

KUTE-BASE: storing, downloading and exporting MIAME-compliant microarray experiments in minutes rather than hours

Sorin Draghici 1,*, Adi L. Tarca 1,2,3, Longfei Yu 1, Stephen Ethier 3 and Roberto Romero 2

1Department of Computer Science, Wayne State University, 431 State Hall, Detroit, MI 48202, 2Perinatology Research Branch-NIH/NICHD, 4 Brush, 3990 John R and 3Barbara Ann Karmanos Cancer Institute, 110 Warren Avenue Detroit, MI 48201, USA

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: The BioArray Software Environment (BASE) is a very popular MIAME-compliant, web-based microarray data repository. However, in BASE, as in most other microarray data repositories, experiment annotation and raw data upload can be very time-consuming, especially for large microarray experiments.

Results: We developed KUTE (Karmanos Universal daTabase for microarray Experiments) as a plug-in for BASE 2.0 that addresses these issues. KUTE provides an automatic experiment annotation feature and a completely redesigned data work-flow that dramatically reduce the human–computer interaction time. For instance, in BASE 2.0 a typical Affymetrix experiment involving 100 arrays required 4 h 30 min of user interaction time for experiment annotation, and 45 min for data upload/download. In contrast, for the same experiment, KUTE required only 28 min of user interaction time for experiment annotation, and 3.3 min for data upload/download.

Availability: http://vortex.cs.wayne.edu/kute/index.html

Contact: sod{at}cs.wayne.edu


    1 INTRODUCTION
The BioArray Software Environment (BASE) has become a very popular repository for microarray data, as suggested by the large number of installations worldwide (Saal et al., 2002). Our experience in managing medium to large microarray studies revealed that, although BASE is a very flexible data management system, certain aspects of it could still be improved. In our study of BASE 1.x/2.0, we identified a number of issues that can make it inefficient and time-consuming. We addressed these issues with a work-flow redesign, as well as a number of other modifications and additions, engineered together as a plug-in for the existing BASE 2.0 (henceforth BASE). These modifications led to significant improvements in the overall efficiency of the system. The issues described in this article remain pertinent even for the latest release of BASE, version 2.4.


    2 ENHANCEMENTS PROVIDED BY KUTE-BASE
2.1 Automatic experiment annotation
A first area in which improvements can be made is experiment annotation. In BASE 2.0, even assuming an ideal framework in which all protocols, as well as all hardware and software information, are already available in the system, annotating a single-channel Affymetrix experiment involving 50 arrays requires over 2 h of human–computer interaction.

This is because, for every single array, the user has to create and annotate items such as Biosource, Sample, Extract, Labeled Extract, Hybridization, Scan and Raw bioassay. Typical fields that need to be filled in for each of these seven items are: dates, protocols, names of the hardware and software used, etc. Overall, there are approximately 35 fields to be filled in for every array.

The user is required to specify the content for all fields even though there might be a lot of redundancy between items of the same type (e.g. between arrays, samples, etc.). In fact, a sound experiment design would require the researcher to minimize the variability introduced by nuisance factors, such as the variability introduced by using different protocols, in order to maximize the statistical power. Thus, in most experiments, large batches of arrays are very likely to share the same protocols for mRNA extraction, labeling, hybridization, etc. BASE 2.4 takes advantage of this redundancy by having some default values for each experiment. This helps but more can be done.

KUTE-BASE takes advantage of this redundancy by automatically creating most of the necessary annotation items. This is done by: (i) assuming a one-to-one correspondence between the experiment items (i.e. Biosource-1 will be linked by default to Sample-1, Extract-1, etc.) and (ii) using a naming convention. The assumption here is that all items in the microarray processing pipeline (e.g. Samples, Extracts, Labeled extracts, Hybridizations, Scans and Raw bioassays) that are associated with a given experimental unit share the same base name and differ only in their extensions. For example, if the name of a sample is MYO, then, when users annotate experiments by using the KUTE-Express feature, the system will assign the default names MYO.e1 for Extract, MYO.le1 for Labeled extract, MYO.h1 for Hybridization, MYO.s1 for Scan and MYO.rb1 for Raw bioassay. Note that these conventions do not prevent the users from subsequently assigning arbitrary names, thereby preserving the flexibility offered by the original BASE.
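The naming convention can be illustrated with a minimal Python sketch. This is not KUTE's actual code: the function name `default_item_names`, the `SUFFIXES` table and the `index` parameter are hypothetical and only mirror the MYO example above.

```python
# Illustrative only: suffixes are taken from the MYO example in the text;
# the function and its parameters are hypothetical, not part of KUTE.
SUFFIXES = {
    "Extract": "e",
    "Labeled extract": "le",
    "Hybridization": "h",
    "Scan": "s",
    "Raw bioassay": "rb",
}


def default_item_names(sample_name, index=1):
    """Return default names for the items derived from one sample."""
    names = {"Sample": sample_name}
    for item, suffix in SUFFIXES.items():
        # e.g. "MYO" + ".e" + "1" gives "MYO.e1"
        names[item] = f"{sample_name}.{suffix}{index}"
    return names


if __name__ == "__main__":
    for item, name in default_item_names("MYO").items():
        print(f"{item}: {name}")
```

Because every derived name is a pure function of the sample name, the user only needs to type the sample names; the one-to-one links between items follow from the shared base name.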

2.2 Redesign of data upload/download work-flow
Another important factor affecting raw data upload in BASE is the intertwining of human interaction and data transfer. This is especially important for microarray technologies that produce a large raw data file for each array (e.g. Affymetrix). As shown in Figure 1, BASE 2.x requires the user to specify a file name, after which the respective file is uploaded. The upload of such a file may take between tens of seconds and minutes, depending on the system and connection speed. This waiting time is too short for the user to switch to another task during any one file upload. However, when accumulated over an entire data set involving hundreds of arrays, it can add up to several hours.
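A rough back-of-the-envelope estimate shows how short per-file waits accumulate. In the sketch below, the 800 KB/s default is the average transfer speed quoted in the Figure 1 caption, while the file count and file size are purely illustrative assumptions, not measurements from this study.

```python
def total_wait_hours(n_files, file_size_kb, speed_kb_per_s=800):
    """Cumulative transfer time, in hours, when files are uploaded one by one."""
    seconds_per_file = file_size_kb / speed_kb_per_s
    return n_files * seconds_per_file / 3600


if __name__ == "__main__":
    # Assumed values for illustration: 300 arrays, 24,000 KB per raw data file.
    print(total_wait_hours(n_files=300, file_size_kb=24_000))
```

Under these assumed values each file takes 30 s to transfer, and the 300 files together keep the user waiting for 2.5 h.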


Fig. 1. The work-flow redesign in KUTE vs BASE (top) and its effect on the user-interaction time for experiment annotation (bottom-left), and data upload/download (bottom-right). The average transfer speed reported by BASE was 800 KB/s.

 
In KUTE, this work-flow has been redesigned as shown in Figure 1. Here, the user interaction with the system is disentangled from the file transfers. The user interacts with the computer for a few minutes only, providing the file names, after which the tens or hundreds of files can be automatically uploaded into the BASE system without further user intervention. Raw data download may also be needed in order to perform various analyses with other software tools that are not integrated with BASE. Instead of requiring the user to download each data file one by one, KUTE allows all raw data to be downloaded as a single archive (zip) file. This feature is currently implemented for Affymetrix data only, but can be extended to other platforms as well.
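The two ideas in this paragraph, collecting all file names first and transferring them afterwards without user involvement, and bundling many raw data files into one archive, can be sketched in Python as follows. This is only an illustration under stated assumptions and not KUTE's implementation (KUTE is a BASE plug-in): `start_batch_upload`, `bundle_as_zip` and the `upload_one` callback are hypothetical names.

```python
import io
import threading
import zipfile
from queue import Queue


def start_batch_upload(file_names, upload_one):
    """Queue every file name up front, then transfer them in the background.

    `upload_one` is a hypothetical callback that transfers a single file.
    """
    pending = Queue()
    for name in file_names:
        pending.put(name)
    done = []

    def worker():
        # Runs without further user interaction until the queue is drained.
        while not pending.empty():
            name = pending.get()
            upload_one(name)
            done.append(name)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread, done


def bundle_as_zip(files):
    """Pack a {file name: bytes} mapping into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

The user's time is spent only in the call that supplies `file_names`; the transfers themselves proceed in the worker thread, and a single archive replaces one download per file.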


    3 RESULTS
KUTE implements both the automatic experiment annotation and the batch upload/download of raw data files, minimizing the human–computer interaction. KUTE-Express is a feature that allows the user to specify the names of the samples in the Sample section of the GUI. The system then generates all required items (Samples, Extracts, etc.) and annotates them with the default values. If the Affymetrix platform is used, the Affymetrix File Batch Uploader is a better choice, since the user can specify the CEL files to upload after the experiment annotation. The sample names are derived directly from the .CEL file names. Unlike the conventional BASE 2.x work-flow, this process requires only minimal human intervention, which saves a considerable amount of time.
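As a minimal sketch of deriving sample names from file names, the Python function below simply strips the directory and the .CEL extension. The exact rule used by the Affymetrix File Batch Uploader is not described here, so this rule and the name `sample_name_from_cel` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def sample_name_from_cel(path):
    """Derive a sample name from a CEL file path (assumed rule: drop the
    directory and the extension)."""
    file_path = PurePosixPath(path)
    if file_path.suffix.lower() != ".cel":
        raise ValueError(f"not a CEL file: {path}")
    return file_path.stem


if __name__ == "__main__":
    print(sample_name_from_cel("arrays/MYO.CEL"))
```

Under this assumed rule, a file named MYO.CEL yields the sample name MYO, from which the remaining item names follow by the naming convention of Section 2.1.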

The effect of using the KUTE features on the user interaction time is also shown in Figure 1. The user interaction time is defined as the time a user is required to spend in front of the computer. The overall interaction with the database was split into three phases: experiment annotation, data upload and data download. The experiment annotation phase comprises all steps necessary to build the experiment structure (create and annotate samples, extracts, labeled extracts, etc.). The data upload phase includes browsing for the files in the local file system and associating them with the appropriate entries in the database. The data download phase includes the time necessary to navigate the database in order to specify which raw data files one wishes to download from the database to the local machine. Both upload and download of the raw data files associated with the raw bioassays require the same user interaction: a file selection and a confirmation step. Hence, only one value is reported for the upload/download time. The separation of the user interaction from the file upload/download process, together with the automatic experiment annotation, dramatically reduced the human–computer interaction time. For instance, in BASE, an experiment involving 100 arrays processed with the same sample extraction, sample preparation, scanning and hybridization protocols required 4 h 30 min of user interaction time for experiment annotation and 45 min for data upload/download. In contrast, for the same experiment KUTE required only 2.2 min of user interaction time for experiment annotation and 3.5 min for data upload/download.

The substantial differences are explained by the very different work-flows as well as by the addition of the automatic experiment annotation feature. In BASE 2.0, all processing is completed at the end of each phase, but the user is forced to remain in front of the computer for the entire duration (many hours in most cases). In KUTE, the user is required to remain in front of the computer only as long as necessary to provide all required information (minutes in most cases), but not all processing is completed when the user leaves the machine. Even though the computer continues to do a lot of background processing long after the user is gone, the most expensive resource, the highly qualified human, is now available for other tasks.

Conflict of Interest: none declared.


    ACKNOWLEDGEMENTS
This work has been partially supported by the following grants: NSF DBI-0234806 and CCF-0438970, NIH 1R01HG003491, 1U01CA117478, 1R21CA100740, 1R01NS045207, 5R21EB000990 and NCI 2P30 CA022453. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF or NIH. This research was supported, in part, by the Intramural Research Program of the National Institute of Child Health and Human Development, NIH/DHHS.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on October 14, 2007; revised on October 14, 2007; accepted on November 2, 2007

    REFERENCE

    Saal LH, et al. BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol (2002) 3:1465–6914.

