Bioinformatics Advance Access originally published online on May 4, 2006
Bioinformatics 2006 22(13):1641-1647; doi:10.1093/bioinformatics/btl134
Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins
1 Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Washington DC 20057, USA
2 Department of Biological Sciences, University of Maryland at Baltimore County Baltimore, MD 21250, USA
3 Microbiology Department, Pacific Northwest National Laboratory PO Box 999, Mail Stop P7-50, Richland, WA 99352, USA
*To whom correspondence should be addressed.
Motivation: Integrated analysis of global-scale transcriptomic and proteomic data can provide important insights into the metabolic mechanisms underlying complex biological systems. However, because the relationship between protein abundance and mRNA expression level is complicated by many cellular and physical processes, sophisticated statistical models are needed to capture it.
Results: In this study, we describe a novel data-driven statistical model to integrate whole-genome microarray and proteomic data collected from Desulfovibrio vulgaris grown under three different conditions. Based on the Poisson distribution pattern of the proteomic data and the large number of undetected proteins (excess zeros), we proposed zero-inflated Poisson (ZIP)-based models to define the correlation pattern between mRNA and protein abundance. In addition, by assuming a probability mass at zero that represents both unexpressed genes and expressed proteins left undetected owing to technical limitations, we established a Potential ZIP model. This approach introduces two significant improvements: (1) the predicted abundance values for experimentally detected proteins are corrected by taking their mRNA levels into account, and (2) abundance values can be predicted for undetected proteins (in this study, 83% of the proteins in the D.vulgaris genome), enabling better biological interpretation. We demonstrated the use of these statistical models by comparatively analyzing proteomic and microarray results from D.vulgaris grown on lactate-based versus formate-based media. The models correctly predicted increased expression of Ech hydrogenase and decreased expression of Coo hydrogenase in D.vulgaris grown on formate.
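The ZIP idea summarized above can be illustrated with a minimal sketch. This is a sketch under stated assumptions, not the authors' implementation. Each protein's observed abundance count y_i is treated as an extra zero with probability pi, and otherwise as Poisson with mean lam_i = exp(b0 + b1*x_i). Here x_i is assumed to be the log mRNA level of the corresponding gene. The function names (zip_log_pmf, fit_zip, potential_abundance), the constant zero mass pi, the single predictor, and the EM fitting procedure are all hypothetical choices made for illustration. The paper's models may parameterize the zero inflation differently.

```python
import math


def zip_log_pmf(y, lam, pi):
    """Log P(Y = y) under a zero-inflated Poisson with mean lam and zero mass pi."""
    if y == 0:
        # A zero arises either from the extra zero mass or from the Poisson part.
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)


def fit_zip(x, y, iters=100, newton_steps=3):
    """Hypothetical EM fit of log(lam_i) = b0 + b1 * x_i with a constant zero mass pi.

    x: one predictor per gene (assumed here: log mRNA level).
    y: observed protein abundance counts (0 = undetected).
    Assumes x is not constant; otherwise the Newton system is singular.
    """
    n = len(y)
    b0, b1 = math.log(max(sum(y) / n, 1e-8)), 0.0
    pi = 0.5 * sum(1 for v in y if v == 0) / n  # crude starting value
    for _ in range(iters):
        # E-step: posterior probability that each observation is an extra zero.
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        z = [pi / (pi + (1.0 - pi) * math.exp(-li)) if yi == 0 else 0.0
             for yi, li in zip(y, lam)]
        # M-step for pi: average posterior extra-zero membership.
        pi = sum(z) / n
        # M-step for (b0, b1): Newton steps on the (1 - z)-weighted Poisson log-likelihood.
        w = [1.0 - zi for zi in z]
        for _ in range(newton_steps):
            lam = [math.exp(b0 + b1 * xi) for xi in x]
            r = [wi * (yi - li) for wi, yi, li in zip(w, y, lam)]
            g0 = sum(r)
            g1 = sum(ri * xi for ri, xi in zip(r, x))
            wl = [wi * li for wi, li in zip(w, lam)]
            h00 = sum(wl)
            h01 = sum(v * xi for v, xi in zip(wl, x))
            h11 = sum(v * xi * xi for v, xi in zip(wl, x))
            det = h00 * h11 - h01 * h01
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, pi


def potential_abundance(x_i, b0, b1):
    """Hypothetical helper: Poisson-part mean exp(b0 + b1 * x_i) for a given gene."""
    return math.exp(b0 + b1 * x_i)
```

Reading exp(b0 + b1*x_i) as a model-based abundance for an undetected protein, or as an mRNA-corrected value for a detected one, is one plausible interpretation of the two improvements listed above. The Potential ZIP model in the paper may define these quantities differently.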
Contact: Weiwen.Zhang{at}pnl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on December 14, 2005; revised on March 31, 2006; accepted on April 1, 2006


