Bioinformatics Advance Access published online on May 4, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl134
Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: Zero-Inflated Poisson regression models to predict abundance of undetected proteins
Lei Nie 1, Gang Wu 2, Fred J. Brockman 3, and Weiwen Zhang 3 *
1 Department of Biostatistics, Bioinformatics, and Biomathematics, Lombardi National Cancer Center, Georgetown University, Washington DC 20057
2 Department of Biological Sciences, University of Maryland at Baltimore County, Baltimore, MD 21250
3 Microbiology Department, Pacific Northwest National Laboratory, P.O. Box 999, Mail Stop P7-50, Richland, WA 99352
* To whom correspondence should be addressed. Weiwen Zhang, E-mail: Weiwen.Zhang{at}pnl.gov
Received December 14, 2005
Revised March 31, 2006
Accepted April 1, 2006
Abstract
Motivation: Integrated analysis of global-scale transcriptomic and proteomic data can provide important insights into the metabolic mechanisms underlying complex biological systems. However, because the relationship between protein abundance and mRNA expression level is complicated by many cellular and physical processes, sophisticated statistical models need to be developed to capture their relationship.
Results: In this study, we describe a novel data-driven statistical model to integrate whole-genome microarray and proteomic data collected from Desulfovibrio vulgaris grown under three different conditions. Based on the Poisson distribution pattern of proteomic data and the fact that a large number of proteins were undetected (excess zeros), Zero-Inflated Poisson-based models were proposed to define the correlation pattern between mRNA and protein abundance. In addition, by assuming that there is a probability mass at zero representing unexpressed genes and expressed proteins that were undetected due to technical limitations, a Potential Zero-Inflated Poisson model was established. Two significant improvements introduced by this approach are: i) the predicted protein abundance level values for experimentally detected proteins are corrected by considering their mRNA levels; and ii) protein abundance values can be predicted for undetected proteins (in the case of this study, 83% of the proteins in the D. vulgaris genome) for better biological interpretation. We demonstrated the use of these statistical models by comparatively analyzing proteomic and microarray results from D. vulgaris grown on lactate-based versus formate-based media. These models correctly predicted increased expression of Ech hydrogenase and decreased expression of Coo hydrogenase for D. vulgaris grown on formate.
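The zero-inflated Poisson idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log-linear link for the Poisson mean, the logistic link for the probability mass at zero, and the coefficient names `b0`, `b1`, `g0`, `g1` are all assumptions made here for illustration. The sketch only shows how a count model with extra mass at zero can still yield a nonzero expected abundance for a protein whose observed count is zero.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson.

    With probability pi the count is an extra ("inflated") zero;
    otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Expected count under the zero-inflated Poisson."""
    return (1.0 - pi) * lam


def predict_abundance(mrna, b0, b1, g0, g1):
    """Expected protein count given an mRNA level.

    Assumed (hypothetical) parameterization:
      log(lam)  = b0 + b1 * mrna   (Poisson mean)
      logit(pi) = g0 + g1 * mrna   (probability mass at zero)
    """
    lam = math.exp(b0 + b1 * mrna)
    pi = 1.0 / (1.0 + math.exp(-(g0 + g1 * mrna)))
    return zip_mean(lam, pi)
```

In such a sketch the coefficients would be estimated from the detected and undetected proteins together, for example by maximizing the sum of `math.log(zip_pmf(...))` over all genes, after which `predict_abundance` could be evaluated for any gene with a measured mRNA level.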
Associate Editor: Golan Yona
This article has been cited by other articles:
W. Torres-Garcia, W. Zhang, G. C. Runger, R. H. Johnson, and D. R. Meldrum. Integrative analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: a non-linear model to predict abundance of undetected proteins. Bioinformatics, August 1, 2009; 25(15): 1905-1914.
L. Nie, G. Wu, and W. Zhang. Correlation of mRNA Expression and Protein Abundance Affected by Multiple Sequence Features Related to Translational Efficiency in Desulfovibrio vulgaris: A Quantitative Analysis. Genetics, December 1, 2006; 174(4): 2229-2243.

