Bioinformatics Advance Access originally published online on October 28, 2004
Bioinformatics 2005 21(7):860-868; doi:10.1093/bioinformatics/bti102

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data

Roger M. Jarvis and Royston Goodacre *

Department of Chemistry, UMIST PO Box 88, Sackville St, Manchester M60 1QD, UK

*To whom correspondence should be addressed.

Motivation: The major difficulties relating to mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black box nature of the modelling techniques. For the analysis of biological samples the first problem is due to biological, experimental and machine variability which can lead to sample size differences and unavoidable baseline shifts. Consequently, there is often a requirement for mathematical correction(s) to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results since the variables that most contribute to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.

Methods: We used genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).

Results: The GA selects sensible pre-processing steps from a total of ~10^10 possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw data model. GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus inferences can be made as to the biochemical differences that are reflected by these spectral bands.
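
The idea of GA-driven variable selection can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the GA operators, the fitness function or the data, so everything below is an assumption made for illustration. In particular, a simple nearest-centroid classifier stands in for DFA, the "spectra" are synthetic, and the names `make_data`, `error_rate` and `select_variables` are hypothetical.

```python
import random


def make_data(rng, n_per_class=30, n_vars=40, informative=(3, 17, 29)):
    """Synthetic stand-in for spectra: 3 classes differing at a few variables."""
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for j in informative:
                row[j] += 3.0 * label  # class-dependent shift (assumed)
            X.append(row)
            y.append(label)
    return X, y


def error_rate(subset, X_train, y_train, X_test, y_test):
    """Test error of a nearest-centroid classifier (a stand-in for DFA)
    that only sees the variables listed in `subset`."""
    classes = sorted(set(y_train))
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in subset]

    def dist(x, c):
        return sum((x[j] - m) ** 2 for j, m in zip(subset, centroids[c]))

    wrong = sum(min(classes, key=lambda c: dist(x, c)) != lab
                for x, lab in zip(X_test, y_test))
    return wrong / len(y_test)


def select_variables(X_train, y_train, X_test, y_test,
                     k=3, pop_size=30, generations=40, seed=0):
    """Tiny GA: each chromosome is a sorted tuple of k variable indices."""
    rng = random.Random(seed)
    n_vars = len(X_train[0])

    def fitness(s):
        return error_rate(s, X_train, y_train, X_test, y_test)

    pop = [tuple(sorted(rng.sample(range(n_vars), k)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection; best survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))  # crossover: k genes from union
            if rng.random() < 0.3:  # mutation: swap one index for a new one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(
                    [j for j in range(n_vars) if j not in child]))
            children.append(tuple(sorted(child)))
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = make_data(random.Random(1))
    best, err = select_variables(X[::2], y[::2], X[1::2], y[1::2])
    print("selected variables:", best, "held-out error:", err)
```

Because the fitter half of the population is carried over unchanged each generation, the best error never gets worse as the search proceeds. A real application would replace the nearest-centroid score with a cross-validated DFA model and use measured FT-IR spectra.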

Availability: Supplementary information, datasets and scripts are available from the corresponding author.

Contact: roy.goodacre{at}manchester.ac.uk






