Bioinformatics Vol. 18 no. 7 2002
Pages 922-933
© 2002 Oxford University Press
Target space for structural genomics revisited
1 Department of Pharmacology, Columbia University, 630 West 168th Street,
New York, USA
2 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University,
650 West 168th Street BB217, NY 10032, New York, USA
3 Columbia University Center for Computational Biology and Bioinformatics (C2B2),
Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
Received on June 5, 2001; revised on January 4, 2002; accepted on February 7, 2002
Motivation: Structural genomics ultimately aims to determine structures for all proteins. Initially, however, experimentalists are likely to focus on globular proteins to achieve rapid basic coverage of protein sequence space. How many proteins will structural genomics have to target? How many proteins will be excluded because we already have structural information for them or because they are not globular? We have to answer these questions in the context of our target selection for the North-East Structural Genomics Consortium (NESG).
Results: We estimated that structural information is available for about 6–38% of all proteins: 6% if we require high accuracy in comparative modelling, 38% if we are satisfied with a rough idea of the fold. Excluding all regions that are not globular, we found that structural genomics may have to target about 48% of all proteins. This corresponded to a similar percentage of the residues in entire proteomes (52%). We explored a number of different strategies for clustering protein space in order to find the number of families representing this 48% of structurally unknown proteins. For the subset of all entirely sequenced eukaryotes, we found over 18 000 fragment clusters, each of which may be a suitable target for structural genomics.
Availability: All data are available from the authors; most results are summarized at: http://cubic.bioc.columbia.edu/genomes/RES/2002_bioinformatics/
Contact: rost{at}columbia.edu
* To whom correspondence should be addressed.