Bioinformatics Advance Access published online on February 17, 2009
Bioinformatics, doi:10.1093/bioinformatics/btp093
A new ensemble-based algorithm for identifying breath gas marker candidates in liver disease using ion molecule reaction mass spectrometry (IMR-MS)
1 Research Group for Clinical Bioinformatics, Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), A-6060 Hall in Tirol, Austria
2 Division of Gastroenterology and Hepatology, Department of Internal Medicine, Innsbruck Medical University, A-6020 Innsbruck, Austria
3 V&F Medical Development GmbH, A-6067 Absam, Austria
4 Department of Medicine and Center for Alcohol Research, Liver Disease and Nutrition, Salem Medical Center, University of Heidelberg, D-69120 Heidelberg, Germany
*To whom correspondence should be addressed. Prof. Christian Baumgartner, E-mail: christian.baumgartner{at}umit.at
Abstract
Motivation: Alcoholic fatty liver disease (AFLD) and nonalcoholic fatty liver disease (NAFLD) can progress to severe liver diseases such as steatohepatitis, cirrhosis and cancer. Early detection of liver disease is therefore essential; however, minimally invasive diagnostic methods in clinical hepatology still lack specificity.
Results: Ion molecule reaction mass spectrometry (IMR-MS) was applied to a total of 126 human breath gas samples comprising 91 cases (AFLD, NAFLD and cirrhosis) and 35 healthy controls. A new feature selection modality termed Stacked Feature Ranking (SFR) was developed to identify potential liver disease marker candidates in breath gas samples. SFR combines different entropy- and correlation-based feature ranking methods, including statistical hypothesis testing, in a two-level architecture with a suggestion layer and a decision layer. We benchmarked SFR against four single feature selection methods, a wrapper and a recently described ensemble method. The gas compounds selected by SFR showed a significantly higher discriminatory ability, by up to 10-15%, expressed as an area under the ROC curve of AUC = 0.85-0.95. Using this approach, we were able to identify unexpected breath gas marker candidates of high predictive value in liver disease. A literature study further supports an association of the top-ranked markers with liver disease. We propose SFR as a powerful tool for biomarker search in breath gas and other biological samples using mass spectrometry.
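To make the two-level idea concrete, the following is a minimal sketch of a suggestion layer and a decision layer, not the SFR algorithm itself, whose details the abstract does not give. Everything specific here is an assumption made for illustration: the three rankers (absolute Pearson correlation, information gain of a median split, Welch t statistic), the top-k suggestion rule, the majority vote in `decide`, the parameters `k` and `min_votes`, and the toy feature names `m_a`, `m_b`, `m_c`.

```python
import math
from statistics import mean, median, variance


def abs_correlation(x, y):
    """Correlation-based score: |Pearson r| between a feature and the 0/1 label."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h


def info_gain(x, y):
    """Entropy-based score: information gain of splitting the feature at its median."""
    cut = median(x)
    low = [b for a, b in zip(x, y) if a <= cut]
    high = [b for a, b in zip(x, y) if a > cut]
    n = len(y)
    return entropy(y) - len(low) / n * entropy(low) - len(high) / n * entropy(high)


def welch_t(x, y):
    """Hypothesis-test score: |Welch t statistic| between class 0 and class 1."""
    g0 = [a for a, b in zip(x, y) if b == 0]
    g1 = [a for a, b in zip(x, y) if b == 1]
    diff = abs(mean(g1) - mean(g0))
    se = math.sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    if se == 0:
        return math.inf if diff > 0 else 0.0
    return diff / se


# Assumed set of base rankers; the paper's actual choice may differ.
RANKERS = (abs_correlation, info_gain, welch_t)


def suggest(columns, y, k):
    """Suggestion layer (assumed rule): each ranker proposes its k best features."""
    suggestions = []
    for ranker in RANKERS:
        scores = {name: ranker(x, y) for name, x in columns.items()}
        suggestions.append(sorted(scores, key=scores.get, reverse=True)[:k])
    return suggestions


def decide(suggestions, min_votes):
    """Decision layer (assumed rule): keep features proposed by >= min_votes rankers."""
    votes = {}
    for top in suggestions:
        for name in top:
            votes[name] = votes.get(name, 0) + 1
    kept = [name for name, v in votes.items() if v >= min_votes]
    return sorted(kept, key=lambda name: (-votes[name], name))


# Toy data, invented for illustration: 4 controls (0) and 4 cases (1).
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "m_a": [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],  # separates the classes
    "m_b": [0.5, 1.5, 0.7, 1.4, 0.6, 1.6, 0.8, 1.3],  # mostly noise
    "m_c": [3.0, 3.1, 2.9, 3.2, 3.0, 3.1, 2.9, 3.2],  # identical in both classes
}

selected = decide(suggest(columns, y, k=1), min_votes=2)
print(selected)
```

On this toy data all three rankers put `m_a` first, so it is the only feature that passes the vote. A voting rule of this kind favors features on which rankers with different biases agree, which is one plausible reading of why an ensemble might outperform any single ranking method.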
Availability: The SFR algorithm and the IMR-MS datasets are available at http://biomed.umit.at/page.cfm?pageid=526
Contact: michalel.netzer{at}umit.at & christian.baumgartner{at}umit.at
Associate Editor: Dr. Jonathan Wren
Received on December 10, 2008; revised on February 2, 2009; accepted on February 14, 2009