Bioinformatics Advance Access originally published online on March 24, 2009
Bioinformatics 2009 25(11):1397-1403; doi:10.1093/bioinformatics/btp168
Integrating shotgun proteomics and mRNA expression data to improve protein identification


1Department of Computer Sciences, 1 University Station C0500, 2Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, Center for Systems and Synthetic Biology, 2500 Speedway, The University of Texas at Austin, Austin, TX 78712, 3Pathogen Functional Genomics Resource Center, J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850 and 4Children's Cancer Research Institute, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration.
Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate; in yeast, for example, MSpresso increases the number of proteins identified by 40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19–63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.
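To illustrate the general idea of combining an MS/MS identification probability with an mRNA-informed prior, the following is a minimal sketch and not the MSpresso implementation. It assumes that the MS/MS identification probability was computed under a flat prior, and that a hypothetical logistic function maps mRNA abundance to a prior probability of presence; the function names, the `midpoint` and `slope` parameters, and the flat prior of 0.5 are all illustrative assumptions that do not come from the article.

```python
import math

EPS = 1e-9  # keeps probabilities away from 0 and 1 so odds stay finite


def _odds(p):
    """Convert a probability to odds, clamping to avoid division by zero."""
    p = min(max(p, EPS), 1.0 - EPS)
    return p / (1.0 - p)


def mrna_prior(mrna_abundance, midpoint=1.0, slope=1.0):
    """Hypothetical prior P(present | mRNA): logistic in log10 abundance.

    `midpoint` is the abundance at which the prior equals 0.5 and `slope`
    controls steepness; both are illustrative, not fitted values.
    """
    z = slope * (math.log10(mrna_abundance) - math.log10(midpoint))
    return 1.0 / (1.0 + math.exp(-z))


def reweighted_posterior(p_ms, prior, flat_prior=0.5):
    """Replace an assumed flat prior with an mRNA-informed prior via Bayes' rule.

    posterior odds = (MS/MS odds) * (new prior odds) / (flat prior odds)
    """
    post_odds = _odds(p_ms) * _odds(prior) / _odds(flat_prior)
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    p_ms = 0.6  # example MS/MS identification probability
    for abundance in (0.1, 1.0, 10.0):
        prior = mrna_prior(abundance)
        print(abundance, round(reweighted_posterior(p_ms, prior), 3))
```

Under these assumptions, a protein with a borderline MS/MS probability is promoted when its transcript is abundant and demoted when it is scarce, while a prior of 0.5 leaves the MS/MS probability unchanged.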
Availability and Implementation: Software is available upon request from the authors. Mass spectrometry datasets and supplementary information are available from http://www.marcottelab.org/MSpresso/.
Contact: marcotte{at}icmb.utexas.edu; miranker{at}cs.utexas.edu
Supplementary Information: Supplementary data website: http://www.marcottelab.org/MSpresso/.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Martin Bishop
Received on December 24, 2008; revised on February 19, 2009; accepted on March 18, 2009

