Bioinformatics Advance Access published online on October 28, 2004
Bioinformatics, doi:10.1093/bioinformatics/bti102
Bioinformatics © Oxford University Press 2004; all rights reserved
1 Department of Chemistry, UMIST, PO Box 88, Sackville St, Manchester M60 1QD, UK
* To whom correspondence should be addressed.
Revised October 7, 2004
Accepted October 18, 2004
Article
Genetic algorithm optimisation for pre-processing and variable selection of spectroscopic data
Royston Goodacre, E-mail: Roy.Goodacre{at}manchester.ac.uk
Abstract
Motivation: The major difficulties relating to mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black-box nature of the modelling techniques. For the analysis of biological samples, the first problem is due to biological, experimental and machine variability, which can lead to sample size differences and unavoidable baseline shifts. Consequently, there is often a requirement for mathematical correction(s) to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results, since the variables that most contribute to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.

Methods: We use genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We also demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).

Results: The GA selects sensible pre-processing steps from a total of 10^10 possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw data model. GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus, inferences can be made as to the biochemical differences that are reflected by these spectral bands.

Availability: Supplementary information, data sets and scripts are available at [insert URL when known].
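The idea of GA-driven variable selection can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness function below is leave-one-out nearest-centroid accuracy, used as a simple stand-in for DFA, and the names `fitness` and `ga_select`, the population size, mutation rate, size penalty and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fitness(mask, X, y):
    """Leave-one-out nearest-centroid accuracy on the selected columns.
    A simple stand-in for DFA, chosen only to keep the sketch short."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    idx = np.arange(len(y))
    correct = 0
    for i in idx:
        keep = idx != i  # hold out sample i
        centroids = np.array([Xs[keep & (y == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(((centroids - Xs[i]) ** 2).sum(axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)

def ga_select(X, y, n_keep=6, pop_size=20, generations=15, p_mut=0.02, seed=0):
    """Evolve boolean masks over the columns of X (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    pop = rng.random((pop_size, n_vars)) < (n_keep / n_vars)

    def score(mask):
        # Assumed penalty: discourage masks larger than n_keep variables.
        return fitness(mask, X, y) - 0.01 * max(0, int(mask.sum()) - n_keep)

    best_mask, best_score = pop[0].copy(), -np.inf
    for _ in range(generations):
        scores = np.array([score(m) for m in pop])
        i_best = int(scores.argmax())
        if scores[i_best] > best_score:
            best_mask, best_score = pop[i_best].copy(), float(scores[i_best])
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover with a randomly permuted partner.
        partners = parents[rng.permutation(pop_size)]
        take = rng.random((pop_size, n_vars)) < 0.5
        children = np.where(take, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_vars)) < p_mut
        children[0] = best_mask  # elitism: carry the best mask forward
        pop = children
    return best_mask, best_score

# Synthetic three-class data: only columns 5 and 20 carry class information.
data_rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 15)
X = data_rng.normal(size=(45, 40))
X[:, 5] += 3.0 * y
X[:, 20] += 3.0 * (y == 1)

mask, score = ga_select(X, y)
print("selected columns:", np.flatnonzero(mask), "score:", round(score, 3))
```

In a real application the columns of `X` would be the spectral variables and the fitness would be the cross-validated error of the chosen classifier; the same loop could also encode pre-processing choices in the chromosome rather than variable masks.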

