Bioinformatics Advance Access originally published online on October 12, 2004
Bioinformatics 2005 21(5):575-581; doi:10.1093/bioinformatics/bti058
Understanding protein dispensability through machine-learning analysis of high-throughput data

1 UT-ORNL Graduate School of Genome Science and Technology Oak Ridge, TN, USA
2 Digital Biology Laboratory, Computer Science Department 201 Engineering Building West University of Missouri-Columbia Columbia, MO, USA
*To whom correspondence should be addressed.
Motivation: Protein dispensability is fundamental to the understanding of gene function and evolution. Recent advances in generating high-throughput data such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants allow us to investigate protein dispensability systematically at the genome scale.
Results: In our studies, protein dispensability is represented as a fitness score measured by the growth rate of gene-deletion mutants. By analyzing high-throughput data in the yeast Saccharomyces cerevisiae, we found that a protein's dispensability was significantly correlated with its evolutionary rate and duplication rate, as well as with its connectivity in the protein-protein interaction network and the gene-expression correlation network. Neural networks and support vector machines were applied to predict protein dispensability from high-throughput data. Our studies shed some light on global characteristics of protein dispensability and evolution.
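As an illustration of the kind of analysis summarized in the Results, the following minimal sketch computes each protein's connectivity (number of interaction partners) from an edge list and then its rank correlation with a fitness score. It is not the authors' code: the choice of Spearman correlation, the function names and the five-protein toy dataset are all assumptions made for illustration, and the abstract does not state which correlation statistic was used.

```python
from collections import defaultdict


def degrees(edges):
    """Count distinct interaction partners (connectivity) per protein."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not from the paper): an interaction edge list
# and a made-up fitness score for the deletion mutant of each protein.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
fitness = {"A": 0.2, "B": 0.6, "C": 0.7, "D": 0.5, "E": 0.95}

connectivity = degrees(edges)
proteins = sorted(fitness)
rho = spearman(
    [connectivity.get(p, 0) for p in proteins],
    [fitness[p] for p in proteins],
)
print(f"Spearman rho between connectivity and fitness: {rho:.3f}")
```

On real data the same functions could be applied to any per-protein feature (for example evolutionary rate) in place of connectivity; a significance test would also be needed, which this sketch omits.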
Availability: The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/
Contact: xudong{at}missouri.edu