Bioinformatics Advance Access originally published online on May 27, 2004
Bioinformatics 2004 20(16):2759-2766; doi:10.1093/bioinformatics/bth323
Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.
Mining gene expression data based on template theory
School of Engineering and Computer Science, Exeter University, Exeter EX4 4QF, UK
Received on August 21, 2003; revised on November 6, 2003; accepted on November 13, 2003
Advance Access Publication May 27, 2004
Motivation: Clustering genes is known to be useful for extracting knowledge from DNA microarray gene expression data. The extracted knowledge can ultimately be used to annotate the biological functions of novel genes. How efficiently this knowledge is represented is therefore closely related to classification accuracy. However, this issue has not yet received the attention it deserves.
Results: A novel method based on template theory from cognitive psychology and pattern recognition is developed in this study to represent the knowledge extracted from cluster analysis effectively. The basic principle is to represent knowledge according to the relationship between genes and a discovered cluster structure. Using this knowledge representation, a pattern recognition algorithm (the decision tree algorithm C4.5) is then used to construct a classifier for annotating the biological functions of novel genes. Experiments on five published datasets show that this method improves classification performance compared with the conventional method, and statistical tests indicate that the improvement is significant.
Availability: The software package can be obtained upon request from the author.
Contact: Z.R.Yang{at}exeter.ac.uk
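The representation idea summarized above (describing each gene by its relationship to a discovered cluster structure) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or which relationship measure is used, so the k-means step, the Euclidean distance, the toy expression values, and every function name below are assumptions made purely for illustration; this is not the author's software. The C4.5 step is not shown; the resulting feature vectors would simply be passed to a decision tree learner.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(profiles):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def kmeans(profiles, init_centroids, n_iter=20):
    """Plain k-means with caller-supplied starting centroids (assumed clustering step)."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in profiles:
            nearest = min(range(len(centroids)),
                          key=lambda j: euclidean(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster received no profiles.
        new_centroids = [mean_profile(g) if g else c
                         for g, c in zip(groups, centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def template_features(profile, centroids):
    """Represent one gene by its distance to every cluster centroid ("template")."""
    return [euclidean(profile, c) for c in centroids]

# Hypothetical toy data: four genes measured under four conditions.
expression = {
    "gene_a": [1.0, 1.2, 0.9, 1.1],
    "gene_b": [0.9, 1.1, 1.0, 1.0],
    "gene_c": [-1.0, -0.8, -1.1, -0.9],
    "gene_d": [-1.1, -1.0, -0.9, -1.0],
}
profiles = list(expression.values())
centroids = kmeans(profiles, init_centroids=[profiles[0], profiles[2]])
features = {name: template_features(p, centroids)
            for name, p in expression.items()}
```

Each entry of `features` is a vector with one value per cluster, so a gene is described by where it sits relative to the whole cluster structure rather than by its raw expression values alone. A classifier such as a decision tree could then be trained on these vectors together with known functional labels.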