Structural genomics meets computational biology
Executive Editors Bioinformatics
A meeting recently organized by the NIH NIGMS Protein Structure Initiative (PSI, http://www.nigms.nih.gov/Initiatives/PSI) made crystal clear how urgent and important it is to develop computational methods for the analysis of protein families, the definition of protein domains and regions for expression, and the annotation of protein function. None of these problems is really new, but all of them have become even more important with the development of the Structural Genomics (SG) projects.
PSI is now in the first year of its production phase (after a 5-year pilot project), with a projected annual cost of 66 million dollars. It is composed of four large production centers: Northeast Structural Genomics Consortium (www.nesg.org), Midwest Center for Structural Genomics (www.mcsg.anl.gov), Joint Center for Structural Genomics (www.jcsg.org) and New York Structural GenomiX Research Consortium (www.nysgxrc.org), together with six technology-based centers, modeling centers, and an associated database (KnowledgeBase) and material repository project to be funded in September 2006. The project organizes its core activities around a number of committees, including the target selection Steering Subcommittee, which includes the Bioinformatics group comprising our colleagues A. Godzik, A. Fiser, C. Orengo and B. Rost. This subcommittee organized the meeting to discuss the best strategies for selecting proteins to enter the protein structure resolution pipeline of the PSI centers. It was also this subcommittee that was lucky enough to choose the four rainiest days in Bethesda in the last 50 years.
Target selection is an important issue, both technically and scientifically. It was discussed under the growing impression that the biological community at large is not well informed of the progress of this project and is apparently unaware of the developments and achievements of its pilot phase. The meeting convinced us that selecting sensible and clearly defined targets will increase the interest of biologists in the developments in structural genomics. The interest generated by a clear description of the targets to be solved will then have to be reinforced, rewarded and maintained by providing access to the right metrics of progress and success.
In this scenario, computational biology is essential not only to set the objectives and select the protein targets as effectively as possible but also, and perhaps more importantly, to make all the information generated by the SG projects accessible to the community.
A number of specific problems related to the selection of targets and the definition of milestones were discussed during the meeting. These problems include the definition of protein families at different levels of granularity, the range of sequences that can be modeled with a given structure, the available strategies for evaluating the quality of the models and the strategies for selecting the most interesting targets in large protein families. The biological and biomedical interest of the targets, and the difficult issue of measuring the coverage of function space, were also considered to be of great interest for the definition of the target selection strategy.
It was rewarding for us to see that all these problems, which are of fundamental importance for the development of SG, fall squarely within the realm of bioinformatics and computational biology. It must also be said that our community has a great responsibility to redouble its efforts to make available additional methods and resources to address these problems. Specific proposals, such as the organization of an annual conference to stimulate work on target selection and analysis within the scientific community outside the SG projects, will certainly be steps in the right direction. Fostering the ongoing efforts in the SG projects to make the results of their experiments openly available, and easily accessible to computational biologists, will also create a very positive flow of research and development. For example, it is urgent to make publicly accessible the biophysical results obtained for the many constructs that the SG centers have tested for expression, solubility and crystallization. This resource could be invaluable for the development of domain boundary prediction methods, which in turn will help to speed up the experimental work with complex eukaryotic proteins. Finally, given the importance of SG for the development of biology and biomedicine, and the fundamental importance of bioinformatics for the organization, analysis and exploitation of the results, it is hard to avoid the conclusion that the effort dedicated to computational analysis in the different international Structural Genomics projects should be increased urgently.