Bioinformatics Advance Access published online on June 26, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp390
A Genetic Programming Approach for Burkholderia Pseudomallei Diagnostic Pattern Discovery
1 School of Biosciences, University of Exeter
2 Department of Clinical Immunology, Khon Kaen University, Thailand
3 Defence Medical & Environmental Research Institute, DSO National Laboratories, 27 Medical Drive, #13-01, Singapore 117510
4 Center for Virus Research, University of California, Irvine, CA, USA
*To whom correspondence should be addressed. Dr. Zheng Rong Yang, E-mail: z.r.yang{at}ex.ac.uk
Abstract
Motivation: Finding diagnostic patterns for infections such as Burkholderia pseudomallei using biomarkers involves two key issues. First, exhaustively searching all subsets of testable biomarkers (antigens in this context) for the best one is computationally infeasible, so an optimisation approach such as evolutionary computation should be investigated. Second, the function of the antigens that serves as the diagnostic pattern, which is usually unknown, must be properly selected, because it is key to diagnostic accuracy and effectiveness in clinical use.
Results: A conversion function is proposed that converts patients' serum antigen test results to binary values, from which Boolean functions are developed as the diagnostic patterns. A genetic programming approach is designed to optimise the diagnostic patterns in terms of accuracy and effectiveness. The optimisation aims to maximise the coverage (the rate of positive response to antigens) in the infected patients and minimise the coverage in the non-infected patients, while using as few testable antigens as possible in the Boolean functions. The final coverage in the infected patients is 96.55% using 17 of 215 (7.4%) antigens, with zero coverage in the non-infected patients. Among these 17 antigens, BPSL2697 is the most frequently selected for the diagnosis of Burkholderia pseudomallei. The approach has been evaluated using both cross-validation and jackknife simulation, with prediction accuracies of 93% and 92%, respectively. A novel approach is also proposed for evaluating a model with binary data using ROC analysis.
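The three optimisation objectives described above (high coverage in infected patients, low coverage in non-infected patients, few antigens) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the threshold-based `binarize` conversion, the `fitness` weighting with its `penalty` parameter, and the toy data are all hypothetical, and the genetic programming search itself is not shown.

```python
from typing import Callable, Sequence

Sample = Sequence[int]            # one patient: binary antigen responses
Pattern = Callable[[Sample], bool]  # a Boolean diagnostic pattern


def binarize(intensity: float, threshold: float) -> int:
    """Hypothetical conversion: 1 if the serum response exceeds a threshold."""
    return 1 if intensity > threshold else 0


def coverage(pattern: Pattern, samples: Sequence[Sample]) -> float:
    """Fraction of samples for which the Boolean pattern evaluates to True."""
    return sum(1 for s in samples if pattern(s)) / len(samples)


def fitness(pattern: Pattern, n_used: int, n_total: int,
            infected: Sequence[Sample], non_infected: Sequence[Sample],
            penalty: float = 0.1) -> float:
    """Assumed scoring: reward coverage in infected patients, penalise
    coverage in non-infected patients and the share of antigens used."""
    return (coverage(pattern, infected)
            - coverage(pattern, non_infected)
            - penalty * n_used / n_total)


# Toy binary data for three hypothetical antigens (not real measurements).
infected = [(1, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 0)]
non_infected = [(0, 0, 1), (0, 1, 0), (0, 0, 0)]


def pattern(s: Sample) -> bool:
    """Example Boolean function: antigen 0 OR (antigen 1 AND antigen 2)."""
    return bool(s[0] or (s[1] and s[2]))


score = fitness(pattern, n_used=3, n_total=3,
                infected=infected, non_infected=non_infected)
```

In this toy case the pattern covers three of the four infected samples and none of the non-infected ones. A genetic programming search would evolve many such Boolean expressions and retain those with the highest score under whatever fitness definition is actually used.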
Associate Editor: Dr. Jonathan Wren
Received on November 13, 2008; revised on May 10, 2009; accepted on June 20, 2009