Bioinformatics Advance Access originally published online on February 24, 2005
Bioinformatics 2005 21(10):2309-2314; doi:10.1093/bioinformatics/bti346
Self-organizing and self-correcting classifications of biological data
1 Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA
2 Science Information Systems, American Type Culture Collection, Manassas, VA 20110, USA
*To whom correspondence should be addressed.
Motivation: Rapid, automated means of organizing biological data are required if we hope to keep abreast of the flood of data emanating from sequencing, microarray and similar high-throughput analyses. Faced with the need to validate the annotation of thousands of sequences and to generate biologically meaningful classifications based on the sequence data, we turned to statistical methods to automate these processes.
Results: An algorithm for automated classification based on evolutionary distance data was written in S. The algorithm was tested on a dataset of 1436 small subunit ribosomal RNA sequences. It was able to classify the sequences according to an extant scheme, use statistical measurements of group membership to detect sequences that were misclassified within this scheme, and produce a new classification. We also discuss how the algorithm can be used to address problems in prokaryotic taxonomy.
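The abstract does not specify which group-membership statistic the algorithm uses, so the following is only a minimal Python sketch of the general idea, not the authors' S-Plus implementation. It assumes a symmetric matrix of pairwise evolutionary distances and one group label per sequence from the existing scheme, and substitutes a simple stand-in statistic: a sequence is flagged when its mean distance to some other group is smaller than its mean distance to its own group. The function names `flag_misclassified` and `reclassify` and the toy distance values are hypothetical.

```python
from statistics import mean


def flag_misclassified(dist, labels):
    """Return (index, assigned_label, suggested_label) for suspect items.

    Hypothetical stand-in statistic: an item is flagged when its mean
    distance to another group is strictly smaller than its mean
    distance to the other members of its own group.
    """
    n = len(labels)
    groups = sorted(set(labels))
    flagged = []
    for i in range(n):
        # Mean distance from item i to each group, excluding i itself.
        group_means = {}
        for g in groups:
            members = [j for j in range(n) if labels[j] == g and j != i]
            if members:
                group_means[g] = mean(dist[i][j] for j in members)
        own = labels[i]
        if own not in group_means:
            continue  # singleton group: nothing to compare against
        best = min(group_means, key=group_means.get)
        if best != own and group_means[best] < group_means[own]:
            flagged.append((i, own, best))
    return flagged


def reclassify(dist, labels, max_rounds=10):
    """Move flagged items to their nearest group until no item is flagged.

    max_rounds bounds the loop, since reassigning several items at once
    could in principle make the labels oscillate.
    """
    labels = list(labels)
    for _ in range(max_rounds):
        flagged = flag_misclassified(dist, labels)
        if not flagged:
            break
        for i, _, suggested in flagged:
            labels[i] = suggested
    return labels


# Toy example (invented values): item 2 is labelled "A" but lies close to "B".
example_dist = [
    [0.00, 0.02, 0.20, 0.21, 0.22],
    [0.02, 0.00, 0.19, 0.20, 0.21],
    [0.20, 0.19, 0.00, 0.03, 0.02],
    [0.21, 0.20, 0.03, 0.00, 0.01],
    [0.22, 0.21, 0.02, 0.01, 0.00],
]
example_labels = ["A", "A", "A", "B", "B"]

print(flag_misclassified(example_dist, example_labels))  # → [(2, 'A', 'B')]
print(reclassify(example_dist, example_labels))  # → ['A', 'A', 'B', 'B', 'B']
```

A mean-distance rule is used here only because it needs nothing beyond the distance matrix; a fuller treatment would replace it with a proper statistical test of group membership.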
Availability: S-Plus is available from Insightful, Inc. An S-Plus implementation of the algorithm and the associated data are available at http://taxoweb.mmg.msu.edu/datasets
Contact: garrity{at}msu.edu
