Bioinformatics Advance Access published online on October 28, 2004

Bioinformatics, doi:10.1093/bioinformatics/bti102
Bioinformatics © Oxford University Press 2004; all rights reserved
Received July 13, 2004
Revised October 7, 2004
Accepted October 18, 2004

Article

Genetic algorithm optimisation for pre-processing and variable selection of spectroscopic data

Roger M. Jarvis 1 and Royston Goodacre 1*

1 Department of Chemistry, UMIST, PO Box 88, Sackville St, Manchester M60 1QD, UK

* To whom correspondence should be addressed.
Royston Goodacre, E-mail: Roy.Goodacre{at}manchester.ac.uk


   Abstract

Motivation: The major difficulties in mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black-box nature of the modelling techniques. For the analysis of biological samples, the first problem is due to biological, experimental and machine variability, which can lead to sample size differences and unavoidable baseline shifts. Consequently, mathematical correction(s) often have to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results, since the variables that contribute most to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.

Methods: We use genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We also demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).
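As a rough illustration of how a chain of pre-processing steps might be encoded for a GA, the sketch below represents a pipeline as a list of integers (a chromosome), each indexing a step in a small menu. The menu, the step functions and all names here are hypothetical and chosen only for illustration; the abstract does not state which transformations or encoding the authors used.

```python
# Hypothetical menu of spectral pre-processing steps (illustrative only).

def identity(spectrum):
    """Leave the spectrum unchanged."""
    return list(spectrum)

def baseline_offset(spectrum):
    """Subtract the minimum so the lowest point sits at zero."""
    lowest = min(spectrum)
    return [v - lowest for v in spectrum]

def normalise_total(spectrum):
    """Scale so the absolute values sum to one (guards against a zero total)."""
    total = sum(abs(v) for v in spectrum) or 1.0
    return [v / total for v in spectrum]

def first_difference(spectrum):
    """Crude first derivative: differences between neighbouring points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

STEPS = [identity, baseline_offset, normalise_total, first_difference]

def apply_pipeline(chromosome, spectrum):
    """Decode a chromosome (list of indices into STEPS) and apply it in order."""
    out = list(spectrum)
    for gene in chromosome:
        out = STEPS[gene](out)
    return out

def search_space_size(chromosome_length):
    """Number of distinct pipelines of a given length for this menu."""
    return len(STEPS) ** chromosome_length
```

A GA would then evolve such chromosomes, scoring each one by the error of a model built on the transformed spectra. The size of the search space grows exponentially with chromosome length, which is why an exhaustive search quickly becomes impractical.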

Results: The GA selects sensible pre-processing steps from a total of ~10^10 possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw data model. GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus, inferences can be made as to the biochemical differences that are reflected by these spectral bands.
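The idea of recovering a small subset of discriminatory variables by GA can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' GA-DFA: the fitness function is a simple per-variable ratio of between-class to within-class variance standing in for a DFA model, and the function names, population settings and synthetic data are all hypothetical.

```python
import random

def fisher_score(X, y, subset):
    """Stand-in fitness (NOT DFA): summed between/within-class variance ratio."""
    classes = sorted(set(y))
    score = 0.0
    for j in subset:
        col = [row[j] for row in X]
        grand_mean = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            class_mean = sum(vals) / len(vals)
            between += len(vals) * (class_mean - grand_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in vals)
        score += between / (within + 1e-12)
    return score

def ga_select(X, y, n_select, pop_size=30, generations=50,
              mutation_rate=0.3, seed=0):
    """Evolve fixed-size variable subsets; returns the best subset found."""
    rng = random.Random(seed)
    n_vars = len(X[0])

    def fitness(chromosome):
        return fisher_score(X, y, chromosome)

    # Each chromosome is a sorted tuple of n_select distinct variable indices.
    pop = [tuple(sorted(rng.sample(range(n_vars), n_select)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))      # crossover: draw from the union
            child = set(rng.sample(pool, n_select))
            if rng.random() < mutation_rate:    # mutation: swap one variable
                child.discard(rng.choice(sorted(child)))
                candidates = [j for j in range(n_vars) if j not in child]
                child.add(rng.choice(candidates))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)

def make_demo_data(seed=1):
    """Synthetic 3-class data: 8 variables, only indices 2 and 5 informative."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for _ in range(10):
            row = [rng.gauss(0.0, 1.0) for _ in range(8)]
            row[2] += 5.0 * c
            row[5] -= 5.0 * c
            X.append(row)
            y.append(c)
    return X, y

X_demo, y_demo = make_demo_data()
selected = ga_select(X_demo, y_demo, n_select=2)
```

Fixing the subset size keeps the selected model small enough to interpret, which is the point of the approach described above. In a real analysis the fitness would come from a cross-validated discriminant model rather than this variance ratio.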

Availability: Supplementary information, data sets and scripts are available at [insert URL when known].

