Bioinformatics Advance Access originally published online on January 28, 2009
Bioinformatics 2009 25(6):787-794; doi:10.1093/bioinformatics/btp056
Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index
1Applied Bioinformatics, Plant Research International, 2Centre for BioSystems Genomics (CBSG), Droevendaalsesteeg 1 and 3Laboratory of Bioinformatics, Wageningen University, Dreijenlaan 3, Wageningen, The Netherlands
*To whom correspondence should be addressed.
Abstract
Motivation: Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence for a correct identification of that compound. Data on retention indices are, however, available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-RI model that enables putative identifications to be ranked and candidates whose predicted RI falls outside a predefined window to be filtered out.
Results: We constructed multiple linear regression and support vector regression (SVR) models using a set of descriptors obtained with a genetic algorithm as the variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds, as it covers a large range (360–4100) of RI values and gives better predictions for isomeric compounds. The hit list reduction varied from 41% to 60% and depended on the size of the original hit list. Large hit lists were reduced to a greater extent than small hit lists.
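The window-based filtering described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the candidate names, the scores, the window width of 50 RI units and the ranking rule (smallest RI deviation first, then highest spectral match score) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str            # candidate compound from the library search (hypothetical)
    match_score: float   # spectral match score (hypothetical scale, higher is better)
    predicted_ri: float  # RI predicted for the candidate by some structure-RI model


def filter_hits(hits, observed_ri, window):
    """Keep hits whose predicted RI lies within +/- window of the observed RI.

    Assumed ranking rule: smallest RI deviation first, ties broken by
    the higher spectral match score.
    """
    kept = [h for h in hits if abs(h.predicted_ri - observed_ri) <= window]
    return sorted(
        kept,
        key=lambda h: (abs(h.predicted_ri - observed_ri), -h.match_score),
    )


def reduction_percent(n_before, n_after):
    """Percentage of the original hit list removed by the filter."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Made-up hit list for one unknown compound with an observed RI of 1200.
hits = [
    Hit("candidate A", 0.91, 1210.0),
    Hit("candidate B", 0.88, 1495.0),
    Hit("candidate C", 0.85, 1180.0),
    Hit("candidate D", 0.80, 2050.0),
    Hit("candidate E", 0.79, 1232.0),
]

kept = filter_hits(hits, observed_ri=1200.0, window=50.0)
reduction = reduction_percent(len(hits), len(kept))
```

In this toy example two of the five candidates fall outside the window, so the hit list shrinks by 40%. The appropriate window width in practice would depend on the prediction error of the RI model.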
Availability: http://appliedbioinformatics.wur.nl/GC-MS
Contact: roeland.vanham{at}wur.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on July 21, 2008; revised on December 11, 2008; accepted on January 24, 2009