Bioinformatics Advance Access originally published online on August 12, 2004
Bioinformatics 2005 21(3):396-398; doi:10.1093/bioinformatics/bth474
Bioinformatics vol. 21 issue 3 © Oxford University Press 2005; all rights reserved.
Microarray data mining with visual programming
1 Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
2 Jozef Stefan Institute, Ljubljana, Slovenia
3 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
4 Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: Visual programming offers an intuitive means of combining known analysis and visualization methods into powerful applications. The system presented here enables users who are not programmers to manage microarray and genomic data flow and to customize their analyses by combining common data analysis tools to fit their needs.
Availability: http://www.ailab.si/supp/bi-visprog
Contact: blaz.zupan{at}fri.uni-lj.si
Supplementary information: http://www.ailab.si/supp/bi-visprog
| INTRODUCTION |
|---|
Functional genomics often strives to discover relations between gene expression, structure and function, using tools from statistics, visualization and machine learning (Leung and Cavalieri, 2003). Analysis of microarray data is greatly enhanced by including additional information, such as gene annotation, and may provide new insights into the function of biological systems and processes (Troyanskaya et al., 2003). Many programs are available to the microarray data analyst. User-friendly programs that do not require programming skills allow users to select and inspect data using a set of predefined tools (see, for instance, http://geneontology.org for a collection of gene ontology tools, and http://ep.ebi.ac.uk/EP for gene expression tools). More powerful programs provide control over data flow and visualization, but they require substantial expertise in programming (e.g. scripting tools in R, http://bioconductor.org, or in Python, http://biopython.org). This situation limits the ability of most biologists to analyze their own data. We explored the application of visual programming to solve this problem.
| ORANGE GENOMICS WIDGETS |
|---|
We have developed a visual programming environment for functional genomics data analysis. The environment uses the Orange data analysis framework, which allows users to control data flow without knowing how to program (Demsar and Zupan, 2004). The system is publicly available, modular and user friendly. Its basic data processing units are called widgets. Each widget implements a task of data manipulation, analysis, model building or visualization. The advantage of widgets is in their modularity. Widgets can be connected through channels and communicate with each other by sending and receiving data. The output of one widget is used as an input for one or several subsequent widgets. Communication channels are typed (i.e. the data type is determined to be integer, text, table, etc.) and the system establishes the proper type of data connections automatically. This property relieves the user of the need to design data structures, which is one of the greatest obstacles for lay users. A collection of widgets and their communication channels is called a schema, which is essentially a program designed by the user for a specific data analysis task. The programming process, that is, creating a schema with widgets and their connections, is done visually through an easy-to-use graphic interface. Schemas can be saved and compiled into executable scripts for later reuse.
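The idea of widgets joined by typed channels can be illustrated with a minimal Python sketch. It is not the Orange API: the `Widget` and `Schema` classes, their fields and the toy widgets below are hypothetical names introduced only to show how a type check on each channel and the forwarding of one output to several widgets might work.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Widget:
    """Hypothetical processing unit with one typed input and one typed output."""
    name: str
    in_type: type   # type accepted on the input channel (type(None) for a source)
    out_type: type  # type sent on the output channel
    func: Callable[[Any], Any]


@dataclass
class Schema:
    """Hypothetical schema: widgets plus the channels that connect them."""
    links: Dict[str, List[Widget]] = field(default_factory=dict)

    def connect(self, source: Widget, target: Widget) -> None:
        # Refuse a channel whose data types do not match.
        if source.out_type is not target.in_type:
            raise TypeError(f"{source.name} -> {target.name}: incompatible channel types")
        self.links.setdefault(source.name, []).append(target)

    def run(self, widget: Widget, data: Any = None,
            results: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Compute this widget's output, then send it to every connected widget.
        results = {} if results is None else results
        output = widget.func(data)
        results[widget.name] = output
        for target in self.links.get(widget.name, []):
            self.run(target, output, results)
        return results


# Toy widgets: a data source feeding two downstream widgets.
load = Widget("File", type(None), list, lambda _: [[1.0, 2.0], [3.0, 5.0]])
mean = Widget("Row Means", list, list, lambda rows: [sum(r) / len(r) for r in rows])
count = Widget("Row Count", list, int, lambda rows: len(rows))

schema = Schema()
schema.connect(load, mean)
schema.connect(load, count)
results = schema.run(load)
```

In this sketch the output of the `File` widget reaches both downstream widgets, and an attempt to connect `Row Count` (integer output) to `Row Means` (list input) is rejected before any data flows.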
We developed a set of functional genomics widgets that address microarray data analysis, gene mapping and annotation with Gene Ontology (GO). They focus on visualization and can be used in combination with other data mining widgets that are already available in Orange. For a detailed description of widgets and their data interfaces see Supplementary information.
To demonstrate the utility of the system, we used microarray data from Dictyostelium discoideum development (Van Driessche et al., 2002) and Saccharomyces cerevisiae cell cycle microarray data (Spellman et al., 1998). See Supplementary information for details on datasets and description of the two example analyses.
The schema shown in Figure 1a illustrates the utility of the new functional genomics widgets. To use a widget, the user selects it from a toolbar at the top of the screen (data not shown) and places it in a schema. The widget icon illustrates the operation or the output and its name is shown below each icon in the schema. Connections (green lines, Fig. 1a) are made by a click-and-draw mouse operation. Opening (double-clicking) a widget icon invokes a window that allows the user to vary the widget's parameters of operation. The first widget in our schema loads the expression data (File widget) and allows the user to navigate and select data from local resources. Clusters of gene expressions (K-Means Clustering widget) are sent to the Expression Profiles widget for viewing in a line graph form and to the Heat Map widget for color-coded viewing of clustered gene expression patterns (Fig. 1b). The number of genes in a window often exceeds the number of pixels on the screen. The Heat Map window allows the user to determine how many genes should be merged into a single row for a compact view (Fig. 1b).
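Merging several genes into one heat map row can be sketched as follows. The text does not say how the Heat Map widget combines the merged genes, so averaging the expression values of consecutive genes is an assumption here, and `merge_rows` is a hypothetical helper rather than part of the system.

```python
from typing import List


def merge_rows(profiles: List[List[float]], genes_per_row: int) -> List[List[float]]:
    """Average consecutive groups of gene profiles into single display rows.

    Assumption: merged genes are combined by their per-column mean.
    The last group may contain fewer than genes_per_row genes.
    """
    if genes_per_row < 1:
        raise ValueError("genes_per_row must be at least 1")
    merged = []
    for start in range(0, len(profiles), genes_per_row):
        block = profiles[start:start + genes_per_row]
        # zip(*block) iterates over columns (time points or conditions).
        merged.append([sum(column) / len(block) for column in zip(*block)])
    return merged


# Three genes measured at two time points, merged two genes per row.
compact = merge_rows([[1.0, 3.0], [3.0, 5.0], [10.0, 20.0]], genes_per_row=2)
```

With `genes_per_row=1` the function returns every gene on its own row, which corresponds to the finest granularity of the view.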
The combination of widgets in a schema is quite flexible so users may generate any desired data flow simply by connecting widgets in the desired order. An interesting example in Figure 1a is the combination of Heat Map and Heat Map (2) widgets, which provides a magnifying glass effect. The selected subset of genes from the first map can be visualized in the second map at a finer granularity, resulting in an enlarged image. The magnifying glass was not pre-programmed in the schema; it is a result of innovative combination of widgets by the user.
Widgets allow users to focus their attention on a selected subset of data and to switch rapidly between data subsets. In the Heat Map window, the user selected two rows of genes (Fig. 1b). All subsequent analyses, such as the GO Term Finder and Genome Mapping, are done on the selected subset (Fig. 1a) and the user can select which ones to view in separate windows by double-clicking the desired widget icons (data not shown). Selecting other rows in Figure 1b would replace the information content of all subsequent widgets.
The GO Term Finder widget discovers significant GO terms associated with the input genes and displays gene annotation. The user can select genes based on their annotation and further process the data. The Genome Map widget is used to display chromosomal locations of selected genes. That widget also allows selection of genes according to chromosomal location and, in the example, the data are sent to the GO Term Finder (2) to discover significantly common GO terms of proximal genes.
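One common way to judge whether a GO term is significant for a set of selected genes is a one-sided hypergeometric test. The text does not state which test the GO Term Finder widget uses, so the sketch below is an illustration under that assumption; the function name and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(selected_with_term: int, selected_total: int,
                       genome_with_term: int, genome_total: int) -> float:
    """Probability of seeing at least `selected_with_term` annotated genes
    when `selected_total` genes are drawn at random from a genome in which
    `genome_with_term` of `genome_total` genes carry the GO term.

    Assumption: a one-sided hypergeometric test; the widget itself may use
    a different statistic.
    """
    upper = min(selected_total, genome_with_term)
    all_draws = comb(genome_total, selected_total)
    tail = sum(
        comb(genome_with_term, k) * comb(genome_total - genome_with_term, selected_total - k)
        for k in range(selected_with_term, upper + 1)
    )
    return tail / all_draws


# Toy example: 3 of 10 genes carry the term and all 3 selected genes have it.
p = enrichment_p_value(selected_with_term=3, selected_total=3,
                       genome_with_term=3, genome_total=10)
```

A small p-value suggests that the term is over-represented among the selected genes. When many GO terms are tested at once, the p-values would also need a correction for multiple testing, which this sketch omits.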
Visual programming is a well-developed concept in computer science. Our system uses this powerful approach for microarray data analysis and visualization, allowing biologists to explore their microarray data without any knowledge of programming. The software runs on MS Windows, and the versions for Linux and Mac OS X are under development. We are also developing widgets to handle statistical testing.
| Acknowledgments |
|---|
This work was supported in part by a grant from the Slovene Ministry of Education, Science and Sports and by a grant from the National Institute of Child Health and Human Development, P01 HD39691.
Received on March 31, 2004; revised on June 16, 2004; accepted on August 7, 2004
| REFERENCES |
|---|
Demsar, J. and Zupan, B. (2004) Orange: from experimental machine learning to interactive data mining. White Paper (www.ailab.si/orange), Faculty of Computer and Information Science, University of Ljubljana, Slovenia.
Leung, Y.F. and Cavalieri, D. (2003) Fundamentals of cDNA microarray data analysis. Trends Genet., 19, 649–659.
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9, 3273–3297.
Troyanskaya, O.G., Dolinski, K., Owen, A.B., Altman, R.B., Botstein, D. (2003) A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc. Natl Acad. Sci. USA, 100, 8348–8353.
Van Driessche, N., Shaw, C., Katoh, M., Morio, T., Sucgang, R., Ibarra, M., Kuwayama, H., Saito, T., Urushihara, H., Maeda, M., et al. (2002) A transcriptional profile of multicellular development in Dictyostelium discoideum. Development, 129, 1543–1552.