Bioinformatics Vol. 19 no. 1 2003
Pages 79-86
© 2003 Oxford University Press
Mining gene expression databases for association rules
1 Bioinformatics Program
2 Pediatrics and Communicable Diseases,
University of Michigan, Ann Arbor, MI 48109, USA
Received on April 19, 2002; revised on July 1, 2002; accepted on July 10, 2002
Motivation: Global gene expression profiling, both at
the transcript level and at the protein level, can be a
valuable tool in the understanding of genes, biological
networks, and cellular states. As larger and larger gene
expression data sets become available, data mining techniques can
be applied to identify patterns of interest in the data.
Association rules, used widely in the area of market basket
analysis, can be applied to the analysis of expression data as
well. Association rules can reveal biologically relevant
associations between different genes or between environmental
effects and gene expression. An association rule has the form
LHS ⇒ RHS, where LHS and RHS are disjoint sets of items, the RHS set
being likely to occur whenever the LHS set occurs.
Items in gene expression data can include genes that are highly
expressed or repressed, as well as relevant facts describing the
cellular environment of the genes (e.g. the diagnosis of a tumor
sample from which a profile was obtained).
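
To make the rule form concrete, the following minimal Python sketch turns expression profiles into item sets and scores one candidate rule. It is illustrative only and is not the implementation described under Availability. The gene names, the `cutoff` value, and the helper names (`discretize`, `support`, `confidence`) are assumptions introduced for this example. "Support" and "confidence" are the standard market basket measures: confidence is the fraction of profiles containing the LHS that also contain the RHS.

```python
from typing import Dict, FrozenSet, List

Items = FrozenSet[str]


def discretize(profile: Dict[str, float], cutoff: float = 1.0) -> Items:
    """Turn one profile {gene: log ratio} into a set of items.

    The cutoff is a hypothetical threshold chosen for illustration.
    """
    items = set()
    for gene, value in profile.items():
        if value >= cutoff:
            items.add(f"{gene}:up")      # highly expressed
        elif value <= -cutoff:
            items.add(f"{gene}:down")    # repressed
    return frozenset(items)


def support(itemset: Items, transactions: List[Items]) -> int:
    """Count the profiles that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs: Items, rhs: Items, transactions: List[Items]) -> float:
    """Fraction of profiles containing LHS that also contain RHS."""
    if lhs & rhs:
        raise ValueError("LHS and RHS must be disjoint")
    n_lhs = support(lhs, transactions)
    if n_lhs == 0:
        return 0.0
    return support(lhs | rhs, transactions) / n_lhs


# Toy data with hypothetical gene names, not taken from the yeast data set.
profiles = [
    {"geneA": 1.5, "geneB": 1.2, "geneC": -0.1},
    {"geneA": 1.1, "geneB": 1.4, "geneC": -1.3},
    {"geneA": 1.3, "geneB": 0.2, "geneC": 0.0},
    {"geneA": 0.1, "geneB": 1.6, "geneC": 0.4},
]
transactions = [discretize(p) for p in profiles]

lhs = frozenset({"geneA:up"})
rhs = frozenset({"geneB:up"})
rule_support = support(lhs | rhs, transactions)
rule_confidence = confidence(lhs, rhs, transactions)
```

In this toy example the rule {geneA:up} ⇒ {geneB:up} holds in two of the three profiles in which geneA is up. Facts about the cellular environment, such as a diagnosis, could be added to each item set in the same way as the gene items.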
Results: We demonstrate an algorithm for efficiently mining association rules from gene expression data, using the data set from Hughes et al. (2000, Cell, 102, 109-126) of 300 expression profiles for yeast. Using the algorithm, we find numerous rules in the data. A cursory analysis of some of these rules reveals many associations between certain genes; many of these make sense biologically, while others suggest new hypotheses that may warrant further investigation. In a control data set derived from the yeast data set, but with the expression values for each transcript randomly shifted with respect to the experiments, no rules were found, indicating that nearly all of the rules mined from the actual data set are unlikely to have occurred by chance.
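
The randomized control described above can be sketched as follows. This is an assumption-laden illustration, not the authors' procedure. It reads "randomly shifted" as rotating each transcript's values by a random offset across the experiments; a full random permutation per transcript would be another reasonable reading. The function name `shift_per_transcript` and the toy matrix are hypothetical.

```python
import random
from typing import Dict, List, Optional


def shift_per_transcript(
    matrix: Dict[str, List[float]], seed: Optional[int] = None
) -> Dict[str, List[float]]:
    """Rotate each transcript's values by its own random offset.

    Each transcript keeps the same set of values, but the values no longer
    line up with the experiments they were measured in, so co-occurrence
    between transcripts is broken.
    """
    rng = random.Random(seed)
    shifted = {}
    for gene, values in matrix.items():
        values = list(values)            # copy; the input is left unchanged
        if not values:
            shifted[gene] = []
            continue
        offset = rng.randrange(len(values))
        shifted[gene] = values[offset:] + values[:offset]
    return shifted


# Toy matrix: transcript -> values across four experiments (hypothetical).
matrix = {
    "geneA": [1.5, 1.1, 1.3, 0.1],
    "geneB": [1.2, 1.4, 0.2, 1.6],
}
control = shift_per_transcript(matrix, seed=0)
```

Mining the control with the same settings as the real data gives a rough sense of how many rules would arise by chance alone.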
Availability: An implementation of the algorithm using Microsoft SQL Server with Access 2000 is available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/assoc_rules.zip. Our results from mining the yeast data set are available at http://dot.ped.med.umich.edu:2000/pub/assoc_rules/yeast_results.zip.
Contact: ccreight{at}umich.edu
* To whom correspondence should be addressed.
