Bioinformatics Advance Access originally published online on November 16, 2004
Bioinformatics 2005 21(7):1280-1281; doi:10.1093/bioinformatics/bti141
PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order
1 Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, 161 rue Ada, 34392 Montpellier cedex 5, France
2 Ecole Nationale Supérieure Agronomique de Montpellier, 2 Place P. Viala, 34060 Montpellier Cedex 02, France
3 Research School of Biological Sciences, Australian National University, Canberra, Australia
*To whom correspondence should be addressed.
Abstract
Summary: PermutMatrix is a work space designed to graphically explore gene expression data. It relies on the graphical approach introduced by Eisen and also offers several methods for the optimal reorganization of the rows and columns of a numerical dataset. For example, several methods are proposed for optimal reorganization of the leaves of a hierarchical clustering tree, along with several seriation or unidimensional scaling methods that do not require any preliminary hierarchical clustering. This program, developed for MS Windows in MS Visual C++, has a clear and efficient graphical interface. Large datasets can be analyzed thoroughly and quickly.
Availability: http://www.lirmm.fr/~caraux/PermutMatrix/
Contact: caraux{at}lirmm.fr
INTRODUCTION
To analyze DNA microarray data, it is very useful to organize and cluster genes according to similarities in their expression profiles. Standard hierarchical clustering methods are appropriate for this operation, as they provide a tree that symbolizes the structure of similarities and in which clusters can be defined. Simultaneous display of the clustering tree and the colored representation of the data matrix, as proposed by Eisen et al. (1998), is a very useful feature, as it can be readily interpreted by biologists. The simplicity and efficacy of this representation have made it very successful. In PermutMatrix, this approach is supplemented with several optimal linear reordering methods, such as reorganization of the leaves of a clustering tree, unidimensional scaling and seriation.
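In the Eisen display, each cell of the expression matrix is drawn as a colored rectangle, conventionally red for values above the reference, green for values below it and black for no change. The sketch below is a minimal, hypothetical version of that color mapping, assuming log-ratio input and a linear ramp that saturates at an arbitrary threshold (the `saturation` parameter is an assumption); it is not PermutMatrix's rendering code.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]

def eisen_color(value: float, saturation: float = 2.0) -> RGB:
    """Map one log-ratio to an (R, G, B) triple: red if positive, green if negative, black at zero.

    `saturation` is a hypothetical display parameter: the magnitude at which
    the color reaches full intensity.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    level = min(abs(value) / saturation, 1.0)  # linear ramp, clipped at full intensity
    intensity = round(255 * level)
    if value > 0:
        return (intensity, 0, 0)  # above the reference: red
    if value < 0:
        return (0, intensity, 0)  # below the reference: green
    return (0, 0, 0)              # no change: black

def color_matrix(rows: Sequence[Sequence[float]], saturation: float = 2.0) -> List[List[RGB]]:
    """Color every cell; rows are drawn in exactly the order they are given."""
    return [[eisen_color(v, saturation) for v in row] for row in rows]
```

Because `color_matrix` keeps the rows in the order it receives them, the readability of the picture depends entirely on that order, which is what the reordering methods discussed in this note aim to improve.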
Optimal reorganization of the leaves. In the standard hierarchical clustering approach, the gene order is the order in which the leaves of the clustering tree are enumerated. However, this enumeration is not unique: swapping the two subtrees below any internal node does not change the topology of the clustering tree. A binary tree with n leaves has $n-1$ internal nodes, each of which can be flipped independently, so the number of possible enumerations is $2^{n-1}$, where n is the total number of leaves in the tree. The question then arises whether the best possible organization of the leaves of the tree can be chosen, in order to obtain the best graphical display of the data with the Eisen approach. Several methods have been proposed (Bar-Joseph et al., 2001; Degerman, 1982; Gruvaeus and Wainer, 1972) and are available in PermutMatrix (Fig. 1c). They differ with respect to the criterion to be optimized and the optimization algorithm.
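To make the count and the optimization concrete, the sketch below enumerates every leaf order of a small tree obtainable by flipping internal nodes and keeps the one with the smallest sum of dissimilarities between adjacent leaves, a criterion of the kind used by Bar-Joseph et al. (2001). The tree encoding (nested tuples), the dissimilarity matrix `d` and all function names are assumptions for illustration; this brute force is not the algorithm used in PermutMatrix, and published methods avoid full enumeration.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple, Union

# Hypothetical tree encoding: a leaf is an int index, an internal node a (left, right) pair.
Tree = Union[int, Tuple["Tree", "Tree"]]

def leaf_orderings(tree: Tree) -> Iterator[List[int]]:
    """Yield every leaf order reachable by swapping the two children of internal nodes."""
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for lo, ro in product(leaf_orderings(left), leaf_orderings(right)):
        yield lo + ro  # children kept in place
        yield ro + lo  # children swapped

def path_cost(order: Sequence[int], d: Sequence[Sequence[float]]) -> float:
    """Sum of dissimilarities between adjacent leaves (lower is better)."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))

def best_leaf_order(tree: Tree, d: Sequence[Sequence[float]]) -> List[int]:
    """Exhaustive search over the 2**(n-1) orders; only feasible for small trees."""
    return min(leaf_orderings(tree), key=lambda order: path_cost(order, d))

# Toy example: four genes summarized by a 1-D position; d is the absolute difference.
positions = [1, 0, 5, 6]
d = [[abs(p - q) for q in positions] for p in positions]
tree = ((0, 1), (2, 3))
best = best_leaf_order(tree, d)
```

With this toy data the tree admits 8 = 2^3 leaf orders; the default order [0, 1, 2, 3] costs 7, while `best_leaf_order` returns [3, 2, 0, 1] at cost 6, placing genes 0 and 2 side by side.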
Unidimensional scaling and seriation. Hierarchical clustering is not aimed at reordering the rows and columns of a dataset, as clustering is not the same operation as ordering. Other methods are specifically designed for ordering objects, such as unidimensional scaling (Hubert and Arabie, 1986) or seriation (Kendall, 1982). Unidimensional scaling methods place a set of objects along a line so that the distances between points best reflect the dissimilarities between objects. Seriation methods (Fig. 1b) assume that there is an unknown order among the objects and attempt to infer this order. Some of these methods are old, and they have been widely developed and applied in archaeology and psychology; for example, they were successfully used to establish the chronology of appearance of ancient objects (Marquardt, 1978). However, they remain relatively unknown in biology. Five of them are available in PermutMatrix and are described in detail on the program website. The criteria optimized by the seriation methods are the same as those used for optimal reordering of the leaves of a tree. However, the algorithms implemented here are heuristics, because the search space is too large and unstructured: there are n! ways to order a set of n objects.
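Because the n! possible orders cannot be enumerated for realistic n, seriation relies on heuristics. As an illustration only, the sketch below applies a generic heuristic, 2-opt local search (reversing a segment of the current order whenever this lowers the total dissimilarity between neighbours), to an adjacent-dissimilarity criterion assumed here for simplicity. It is not claimed to be any of the five methods implemented in PermutMatrix; the function names and toy data are hypothetical.

```python
from typing import List, Sequence

def path_cost(order: Sequence[int], d: Sequence[Sequence[float]]) -> float:
    """Sum of dissimilarities between adjacent objects (lower is better)."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))

def two_opt_seriation(d: Sequence[Sequence[float]]) -> List[int]:
    """2-opt local search over permutations of range(len(d)).

    Starting from the input order, reverse any segment order[i..j] that strictly
    lowers the path cost, and repeat until no reversal helps (a local optimum).
    """
    n = len(d)
    order = list(range(n))
    best = path_cost(order, d)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                cost = path_cost(candidate, d)
                if cost < best:  # strict decrease guarantees termination
                    order, best = candidate, cost
                    improved = True
    return order

# Toy example: five objects whose hidden 1-D positions are shuffled.
positions = [3, 0, 4, 1, 2]
d = [[abs(p - q) for q in positions] for p in positions]
order = two_opt_seriation(d)
```

On this toy input the search recovers a path of cost 4, which is the minimum possible for positions spanning 0 to 4; in general such a heuristic can stop in a local optimum. The gap in search-space size is large: for n = 10 there are already 10! = 3,628,800 orders, against 2^9 = 512 leaf orders for a fixed tree.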
Identification and reorganization of classes. In PermutMatrix, the clustering tree methods and the seriation or unidimensional scaling methods can be combined. It is also possible to define a class by aggregating, in the clustering tree, the leaves descending from the same node. The tree is then no longer fully resolved (Fig. 1d). This accounts for the fact that some terminal branchings are not significant, so the associated leaves can be reordered without tree constraints. The reorganization then occurs on two levels: the classes are reordered as the leaves of a tree, and the leaves within each class are linearly reordered by seriation or unidimensional scaling.
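The second level of this two-level scheme can be sketched as follows, assuming the class order has already been fixed by the tree step and that classes are small enough for an exhaustive search inside each one. The function name, the left-to-right greedy processing and the adjacent-dissimilarity criterion are all assumptions for illustration, not PermutMatrix's procedure, which uses seriation or unidimensional scaling within classes.

```python
from itertools import permutations
from typing import List, Sequence

def path_cost(order: Sequence[int], d: Sequence[Sequence[float]]) -> float:
    """Sum of dissimilarities between adjacent objects."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))

def reorder_within_classes(classes: Sequence[Sequence[int]],
                           d: Sequence[Sequence[float]]) -> List[int]:
    """Keep the given class order and reorder the leaves inside each class.

    Classes are processed left to right; each one is searched exhaustively
    (small classes only), counting the link to the last leaf already placed
    so that consecutive classes join smoothly.
    """
    result: List[int] = []
    for members in classes:
        best = min(permutations(members),
                   key=lambda p: path_cost(result[-1:] + list(p), d))
        result.extend(best)
    return result

# Toy example: hypothetical class order as read from the class-level tree.
positions = [0, 2, 1, 5, 4]
d = [[abs(p - q) for q in positions] for p in positions]
classes = [[0, 1, 2], [3, 4]]
order = reorder_within_classes(classes, d)
```

Here the result is [0, 2, 1, 4, 3], whose leaves sit at positions 0, 1, 2, 4, 5. Because classes are handled one at a time, the joins between classes are optimized greedily rather than globally.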
Manual operations. The PermutMatrix graphical interface allows several manual operations: inversion, permutation, sorting, etc. These operations can be used to refine or locally explore the optimal solutions obtained by the methods given above.
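The manual operations act on the current row (or column) order rather than on the data values. A minimal sketch with hypothetical helper names, representing the display order as a list of row indices so that inversion, permutation and sorting never modify the matrix itself; this illustrates the operations named above, not PermutMatrix's internal implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def invert(order: Sequence[T], start: int, stop: int) -> List[T]:
    """Reverse the block order[start:stop], e.g. a selected group of rows."""
    order = list(order)
    return order[:start] + order[start:stop][::-1] + order[stop:]

def permute(order: Sequence[T], i: int, j: int) -> List[T]:
    """Swap the rows displayed at positions i and j."""
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return out

def sort_by_column(order: Sequence[int], matrix: Sequence[Sequence[float]],
                   column: int, descending: bool = False) -> List[int]:
    """Sort the displayed rows by their value in one column (stable sort)."""
    return sorted(order, key=lambda r: matrix[r][column], reverse=descending)

# Toy data: three rows, two conditions.
matrix = [[0.5, 1.0], [-1.0, 2.0], [2.0, 0.0]]
rows = [0, 1, 2]
```

Keeping the order separate from the data makes each manual step cheap and reversible, which suits local refinement of an order produced by one of the optimization methods.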
CONCLUSION
PermutMatrix is a user-friendly, exploratory work space in which the graphical Eisen approach can be easily used and extended with optimal reorganization methods, which are less widely used than hierarchical clustering. These methods usually yield different and complementary results, thereby contributing to the understanding and identification of distinct gene expression profiles. PermutMatrix runs under MS Windows and accepts input data either as a standard text file or in Eisen's Cluster format.
Acknowledgments
This program was partly developed during a sabbatical position at the Research School of Biological Sciences of the Australian National University. We are grateful for all support and friendship during this period. We have also been supported by Montpellier L-R Genopole.
Received on June 10, 2004; revised on September 3, 2004; accepted on October 4, 2004
REFERENCES
Bar-Joseph, Z., Gifford, D. and Jaakkola, T.S. (2001) Fast optimal leaf ordering for hierarchical clustering. Bioinformatics, 17, S22–S29.
Degerman, R. (1982) Ordered binary trees constructed through an application of Kendall's tau. Psychometrika, 47, 523–527.
Eisen, M.B., Spellman, P.T., Brown, P.O. and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
Gelfand, A.E. (1971) Rapid seriation methods with archaeological application. In Hodson, F.R., Kendall, D.G. and Tautu, D.G. (eds), Mathematics in the Archaeological and Historical Sciences. Edinburgh University Press, Edinburgh, pp. 186–201.
Gruvaeus, G. and Wainer, H. (1972) Two additions to hierarchical cluster analysis. Br. J. Math. Stat. Psychol., 25, 200–206.
Hubert, L.J. and Arabie, P. (1986) Unidimensional scaling and combinatorial optimization. In de Leeuw, J., Heiser, W., Meulman, J. and Critchley, F. (eds), Multidimensional Data Analysis. DSWO Press, Leiden, The Netherlands, pp. 181–196.
Kendall, D.G. (1982) Seriation. In Kotz, S. and Johnson, N.L. (eds), Encyclopedia of Statistical Sciences, Vol. 8. Wiley-Interscience, New York, NY, pp. 417–424.
Marquardt, W.H. (1978) Advances in archaeological seriation. In Schiffer, M.B. (ed.), Advances in Archaeological Method and Theory, Vol. 1. Academic Press, Orlando, FL, pp. 257–314.