
Bioinformatics Vol. 18 no. 11 2002
Pages 1523-1534
© 2002 Oxford University Press

Euclidian space and grouping of biological objects

Vyacheslav N. Grishin 1 and Nick V. Grishin 1,2,*

1 Department of Biochemistry
2 Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA

Received on February 2, 2002; revised on April 17, 2002; accepted on April 23, 2002

Motivation: Biological objects tend to cluster into discrete groups, and objects within a group typically possess similar properties. It is important to have fast and efficient tools that group objects into biologically meaningful clusters. Protein sequences reflect biological diversity and offer an extraordinary variety of objects for refining clustering strategies. Grouping of sequences should reflect their evolutionary history and their functional properties. Visualization of relationships between sequences is no less important, and tree-building methods are typically used for it. An alternative approach to visualization is a multidimensional sequence space, in which proteins are represented as points and the distances between points reflect the relationships between the proteins. Such a space can also serve as a basis for model-based clustering strategies, which typically produce results that correlate better with the biological properties of proteins.

Results: We developed an approach to the classification of biological objects that combines evolutionary measures of their similarity with a model-based clustering procedure. We apply the methodology to amino acid sequences. In the first step, given a multiple sequence alignment, we estimate evolutionary distances between proteins, measured in expected numbers of amino acid substitutions per site. These distances are additive and are suitable for evolutionary tree reconstruction. In the second step, we find the best-fit approximation of the evolutionary distances by Euclidian distances and thus represent each protein by a point in a multidimensional space. The Euclidian space may be projected onto two or three dimensions, and the projections can be used to visualize relationships between proteins. In the third step, we find a non-parametric estimate of the probability density of the points and place the points that belong to the same local maximum of this density in one group. The number of groups is controlled by a σ-parameter that determines the shape of the density estimate and the number of maxima in it. The grouping procedure outperforms commonly used methods such as UPGMA and single linkage clustering.
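
The second and third steps can be illustrated with a minimal Python sketch. It is not the EESG program, which is written for Mathematica, and it makes several assumptions that are not stated in the abstract: the best-fit Euclidian representation is found by plain gradient descent on the sum of squared differences between target and fitted distances, the density estimate uses Gaussian kernels of width sigma, and each point is assigned to a local maximum by mean-shift hill climbing. The function names (embed, density_modes, group_by_mode) and all default parameter values are hypothetical. The input is taken to be a symmetric matrix of pairwise distances, so the first step (distance estimation from an alignment) is not covered.

```python
import math
import random


def embed(dist, dim=2, steps=2000, lr=0.01, seed=0):
    """Fit points in R^dim whose pairwise Euclidean distances approximate dist.

    Minimizes sum over i<j of (|x_i - x_j| - dist[i][j])^2 by gradient descent.
    This optimizer is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    n = len(dist)
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(pts[i], pts[j])]
                d = math.sqrt(sum(c * c for c in diff)) or 1e-12
                coef = 2.0 * (d - dist[i][j]) / d
                for k in range(dim):
                    grads[i][k] += coef * diff[k]
        for i in range(n):
            for k in range(dim):
                pts[i][k] -= lr * grads[i][k]
    return pts


def density_modes(pts, sigma, iters=200):
    """Move a copy of each point uphill on a Gaussian kernel density estimate.

    Each iteration replaces the position by the kernel-weighted mean of the
    data (mean shift), which climbs toward a local maximum of the density.
    """
    modes = []
    for p in pts:
        x = list(p)
        for _ in range(iters):
            w = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(x, q)) / (2.0 * sigma ** 2))
                for q in pts
            ]
            total = sum(w)
            x = [sum(wi * q[k] for wi, q in zip(w, pts)) / total for k in range(len(x))]
        modes.append(x)
    return modes


def group_by_mode(pts, sigma, tol=None):
    """Label points so that points reaching the same density maximum share a label."""
    tol = sigma / 10.0 if tol is None else tol
    labels, reps = [], []
    for m in density_modes(pts, sigma):
        for idx, r in enumerate(reps):
            if math.dist(m, r) < tol:
                labels.append(idx)
                break
        else:
            reps.append(m)
            labels.append(len(reps) - 1)
    return labels
```

Given a distance matrix dist, a call such as group_by_mode(embed(dist), sigma) returns one integer label per protein. In this sketch a small sigma gives a density estimate with many maxima and therefore many groups, while a large sigma merges them, which mirrors the role the abstract assigns to the σ-parameter.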

Availability: The code of the EESG program for Mathematica 4 (Wolfram Research), as well as the details of the analysis, is freely available at ftp://iole.swmed.edu/pub/EESG/.

Contact: grishin@chop.swmed.edu

* To whom correspondence should be addressed.





