Bioinformatics Vol. 18 no. 7 2002
Pages 899-907
© 2002 Oxford University Press
Selecting targets for structural determination by navigating in a graph of protein families
1 Institute of Computer Sciences
2 Department of Biological Chemistry,
Institute of Life Sciences, The Hebrew University,
Jerusalem 91904, Israel
Received on July 1, 2001; revised on December 21, 2001 and March 4, 2002; accepted on March 11, 2002
Motivation: A major goal in structural genomics is to enrich
the catalogue of proteins whose 3D structures are known. In an
attempt to address this problem we mapped over 10 000 proteins
with solved structures onto a graph of all Swissprot protein
sequences (release 36, ~73 000 proteins) provided by
ProtoMap, with the goal of sorting proteins according to their
likelihood of belonging to new superfamilies. We hypothesized
that proteins within neighbouring clusters tend to share common
structural superfamilies or folds. If true, the likelihood of
finding new superfamilies increases in clusters that are distal
from other solved structures within the graph.
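The idea of a cluster being "distal from other solved structures" can be made concrete as a shortest-path distance in the cluster graph. The following is a minimal sketch, not the authors' implementation: it assumes an undirected, unweighted graph supplied as an adjacency dictionary and uses a multi-source breadth-first search. The names `distance_to_solved`, `neighbours` and `solved` are hypothetical, and any edge weighting present in the real graph is ignored here.

```python
from collections import deque


def distance_to_solved(neighbours, solved):
    """Hop distance from every cluster to the nearest solved cluster.

    neighbours: dict mapping cluster id -> iterable of adjacent cluster ids
                (assumed undirected and unweighted; an illustrative format).
    solved:     iterable of cluster ids that contain a solved structure.
    Clusters unreachable from any solved cluster are absent from the result.
    """
    dist = {c: 0 for c in solved}       # every solved cluster is a BFS source
    queue = deque(dist)
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in dist:         # first visit gives the shortest distance
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


# Toy graph: A - B - C - D, plus an isolated cluster E.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
toy_distances = distance_to_solved(toy_graph, solved=["A"])
print(toy_distances)
```

In this toy graph the isolated cluster E never receives a distance; under the hypothesis above, such clusters would be the most promising places to look for new superfamilies.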
Results: We defined an order relation between unsolved
proteins according to their distance from solved structures in
the graph, and sorted ~48 000 proteins. Our list can be
partitioned into three groups: ~35 000 proteins sharing a
cluster with at least one known structure; ~6500 proteins
in clusters with no solved structure but with neighbouring
clusters containing known structures; and the remaining
~6100 proteins (in 1274 clusters). We tested the quality of the order relation using
thousands of recently solved structures that were not included
when the order was defined. The tests show that our order is
significantly better (P-value ~10^-5) than a
random order. More interestingly, the order within the union of
the second and third groups, and the order within the third
group alone, perform better than random (P-values: 0.0008
and 0.15, respectively) and are better than alternative
orders created using PSI-BLAST. Herein, we present a method for
selecting targets to be used in structural genomics projects.
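The three-group partition described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the published procedure: the functions `group_of` and `rank_targets`, the dictionary formats, and the choice to list the third group first (as the proteins presumed most likely to belong to new superfamilies) are all hypothetical. Ties within a group are broken alphabetically purely to keep the output deterministic.

```python
def group_of(cluster, neighbours, solved):
    """Return 1, 2 or 3 for a cluster.

    1: the cluster itself contains a solved structure;
    2: it does not, but at least one neighbouring cluster does;
    3: neither the cluster nor any neighbour contains a solved structure.
    """
    if cluster in solved:
        return 1
    if any(n in solved for n in neighbours.get(cluster, ())):
        return 2
    return 3


def rank_targets(protein_cluster, neighbours, solved):
    """Order proteins so that group 3 comes first, then group 2, then group 1.

    protein_cluster: dict mapping protein id -> cluster id (assumed format).
    """
    return sorted(
        protein_cluster,
        key=lambda p: (-group_of(protein_cluster[p], neighbours, solved), p),
    )


# Toy data: clusters c1 - c2 - c3 in a chain, with only c1 solved.
toy_neighbours = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2"]}
toy_solved = {"c1"}
toy_proteins = {"P1": "c1", "P2": "c2", "P3": "c3"}
ranked = rank_targets(toy_proteins, toy_neighbours, toy_solved)
print(ranked)
```

A finer ordering inside each group would need an actual distance measure on the graph rather than the coarse group number used here.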
Availability: A list of proteins for target selection, combined with a set of biological filters for narrowing down potential targets, is available at http://www.protarget.cs.huji.ac.il
Contact: michall{at}mail.ls.huji.ac.il