Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets
European Molecular Biology Laboratory, Postfach 10.2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG
Ordination is a powerful method for analysing complex data sets but has been largely ignored in sequence analysis. This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences. The method takes a matrix of Euclidean distances between all pairs of sequences and finds a coordinate space where the distances are exactly preserved. The main problem is to find a measure of distance between aligned sequences that is Euclidean. The simplest distance function is the square root of the percentage difference (as measured by identities) between two sequences, where one ignores any positions in the alignment where there is a gap in any sequence. If one does not ignore positions with a gap, the distances cannot be guaranteed to be Euclidean, but the deleterious effects are trivial. Two examples of using the method are shown. A set of 226 aligned globins was analysed, and the resulting ordination very successfully represents the known patterns of relationship between the sequences. In the other example, a set of 610 aligned 5S rRNA sequences was analysed. Sequence ordinations complement phylogenetic analyses. They should not be viewed as a complete alternative.
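The procedure described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the alignment is given as equal-length strings with `-` as the gap character, and it uses NumPy; the function names `distance_matrix` and `pcoa` are hypothetical. The distance is the square root of the percentage of differing positions, computed only over columns that contain no gap in any sequence, and the ordination is classical principal coordinates analysis (double-centring the squared distances, then an eigendecomposition).

```python
import numpy as np

def distance_matrix(seqs, gap="-"):
    """Square root of the percentage difference between aligned sequences.

    Columns with a gap in any sequence are ignored, as the abstract suggests,
    so that the resulting distances are Euclidean.
    """
    aln = np.array([list(s) for s in seqs])
    keep = ~(aln == gap).any(axis=0)          # columns free of gaps
    aln = aln[:, keep]
    # Fraction of mismatching positions for every pair of sequences.
    # Note: this builds an n x n x L boolean array, fine for small inputs only.
    frac = (aln[:, None, :] != aln[None, :, :]).mean(axis=2)
    return np.sqrt(100.0 * frac)

def pcoa(d, tol=1e-10):
    """Classical principal coordinates analysis of a distance matrix d."""
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    b = -0.5 * centre @ (d ** 2) @ centre     # double-centred inner products
    vals, vecs = np.linalg.eigh(b)            # b is symmetric
    order = np.argsort(vals)[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    pos = vals > tol                          # keep axes with positive variance
    coords = vecs[:, pos] * np.sqrt(vals[pos])
    return coords, vals

# Toy alignment (made-up sequences, for illustration only).
seqs = ["ACGT-A", "ACGTTA", "AGGTTC", "TCGATC"]
d = distance_matrix(seqs)
coords, eigenvalues = pcoa(d)
print(coords[:, :2])                          # first two ordination axes
```

Because this distance is Euclidean, the inter-point distances in the full set of returned coordinates should reproduce the input matrix; plotting only the first two or three axes gives the low-dimensional view, with the eigenvalues indicating how much of the variation each axis carries.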
Received on February 14, 1991; accepted on June 5, 1991