Bioinformatics Advance Access originally published online on December 10, 2008
Bioinformatics 2009 25(3):401-405; doi:10.1093/bioinformatics/btn634
Matrix correlations for high-dimensional data: the modified RV-coefficient
1Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, 2Heymans Institute, University of Groningen, Groningen and 3TNO Quality of Life, Utrechtseweg 48, 3704 HE Zeist, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: Modern functional genomics generates high-dimensional datasets. It is often convenient to have a single number that comprehensively characterizes the relationship between a pair of such datasets. Matrix correlations are such numbers, and they are appealing because they can be interpreted in the same way as the Pearson correlations familiar to biologists. The high dimensionality of functional genomics data is, however, problematic for existing matrix correlations. The aim of this article is 2-fold: (i) to introduce the idea of matrix correlations to the bioinformatics community and (ii) to present an improvement of the most promising matrix correlation coefficient (the RV-coefficient) that circumvents the problems caused by high-dimensional data.
Results: The modified RV-coefficient can be used in high-dimensional data analysis as a simple measure of the information shared by two datasets. This is shown by theoretical arguments, simulations and applications to two real-life examples from functional genomics: a transcriptomics example and a metabolomics example.
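To make the idea of a matrix correlation concrete, here is a minimal Python sketch. The abstract does not state the formulas, so the sketch rests on two assumptions: the usual definition of the RV-coefficient on the configuration matrices XX' and YY' of column-centered data, and a modification that discards the diagonals of those matrices before correlating them. The function names (`gram`, `center_columns`, `rv`, `rv_modified`) and the toy data are hypothetical and are not taken from the authors' Matlab m-files.

```python
import math


def center_columns(X):
    """Subtract the column mean from every entry (X is a list of rows)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def gram(X):
    """Return the n x n configuration matrix X X^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]


def _matrix_correlation(S, T, drop_diagonal):
    """Correlate two symmetric n x n matrices entry by entry.

    For symmetric S and T, the sum of S[i][j] * T[i][j] equals trace(S T),
    so with drop_diagonal=False this is the classical RV ratio.
    """
    n = len(S)
    num = ss = tt = 0.0
    for i in range(n):
        for j in range(n):
            if drop_diagonal and i == j:
                continue  # assumed modification: ignore diagonal entries
            num += S[i][j] * T[i][j]
            ss += S[i][j] ** 2
            tt += T[i][j] ** 2
    return num / math.sqrt(ss * tt)


def rv(X, Y):
    """Classical RV-coefficient of two datasets measured on the same n samples."""
    S, T = gram(center_columns(X)), gram(center_columns(Y))
    return _matrix_correlation(S, T, drop_diagonal=False)


def rv_modified(X, Y):
    """Modified RV-coefficient (assumed form: diagonals of XX' and YY' removed)."""
    S, T = gram(center_columns(X)), gram(center_columns(Y))
    return _matrix_correlation(S, T, drop_diagonal=True)


# Toy data: 4 samples (rows); X has 3 variables, Y has 2.
X = [[1, 2, 3], [4, 5, 7], [2, 0, 1], [3, 3, 3]]
Y = [[2, 1], [0, 3], [5, 5], [1, 0]]

print("RV          :", rv(X, Y))
print("modified RV :", rv_modified(X, Y))
```

Both functions need only that X and Y share the same samples (rows); the numbers of variables may differ. Under the assumptions above, the classical coefficient lies between 0 and 1, whereas the version without diagonals can also be negative, like a Pearson correlation.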
Availability: The Matlab m-files of the methods presented can be downloaded from http://www.bdagroup.nl.
Contact: a.k.smilde{at}uva.nl
Associate Editor: John Quackenbush
Received on July 21, 2008; revised on October 30, 2008; accepted on December 6, 2008