Bioinformatics Vol. 18 no. 90002 2002
Pages S231-S240
© 2002 Oxford University Press
The mutual information: Detecting and evaluating dependencies between variables
1 University Potsdam, Nonlinear Dynamics Group,
Am Neuen Palais 10, 14469 Potsdam, Germany
2 Max-Planck-Institute for Molecular Plant Physiology,
Am Mühlenberg 1, 14476 Golm, Germany
Received on April 8, 2002; accepted on June 15, 2002
Motivation: Clustering co-expressed genes usually requires the definition of 'distance' or 'similarity' between measured datasets, the most common choices being Pearson correlation or Euclidean distance. With the size of available datasets steadily increasing, it has become feasible to consider other, more general, definitions as well. One alternative, based on information theory, is the mutual information, providing a general measure of dependencies between variables. While the use of mutual information in cluster analysis and visualization of large-scale gene expression data has been suggested previously, the earlier studies did not focus on comparing different algorithms to estimate the mutual information from finite data.
Results: Here we describe and review several approaches to estimate the mutual information from finite datasets. Our findings show that the algorithms used so far can be substantially improved upon. In particular, when dealing with small datasets, finite-sample effects and other sources of potentially misleading results have to be taken into account.
Contact: steuer{at}agnld.uni-potsdam.de
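
To make the two points of the abstract concrete, the following is a minimal sketch of a naive histogram-based ("plug-in") estimate of the mutual information, I(X;Y) = sum over x,y of p(x,y) log[ p(x,y) / (p(x) p(y)) ]. It is not necessarily one of the algorithms compared in the paper. The function names, the choice of 8 equal-width bins, the sample sizes and the toy data are all assumptions made for illustration. The first part shows a nonlinear dependence that Pearson correlation largely misses. The second part shows the finite-sample effect: for independent variables the true mutual information is zero, yet the naive estimate is systematically positive, and more so for small samples.

```python
import numpy as np

def mutual_information_hist(x, y, bins=8):
    """Naive plug-in estimate of I(X;Y) in bits from an equal-width 2-D histogram.

    Illustrative helper (hypothetical name), not the paper's estimator.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()          # joint relative frequencies
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # empty cells contribute 0 (0 * log 0 := 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)

# Part 1: y depends on x through a parabola, so the linear correlation is near zero
# while the mutual information is clearly positive.
x = rng.uniform(-1, 1, 2000)
y = x**2 + 0.05 * rng.normal(size=x.size)
pearson_r = np.corrcoef(x, y)[0, 1]
mi_dependent = mutual_information_hist(x, y)

# Part 2: independent Gaussian variables (true mutual information = 0).
# Averaging the estimate over many trials exposes its positive bias.
def mean_mi_independent(n, trials=200):
    estimates = [
        mutual_information_hist(rng.normal(size=n), rng.normal(size=n))
        for _ in range(trials)
    ]
    return float(np.mean(estimates))

bias_small = mean_mi_independent(50)    # small dataset
bias_large = mean_mi_independent(5000)  # large dataset

print(f"Pearson r (parabola):        {pearson_r:.3f}")
print(f"MI estimate (parabola):      {mi_dependent:.3f} bits")
print(f"Mean MI, independent, N=50:  {bias_small:.3f} bits")
print(f"Mean MI, independent, N=5000:{bias_large:.3f} bits")
```

In this sketch the bias depends on both the number of bins and the number of samples, which is why a raw estimate from a small dataset should not be read as evidence of dependence without a correction or a comparison against shuffled data.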