Bioinformatics Advance Access originally published online on November 22, 2007
Bioinformatics 2008 24(1):102-109; doi:10.1093/bioinformatics/btm545
Fast and accurate identification of semi-tryptic peptides in shotgun proteomics
1School of Informatics, 2Department of Chemistry, 3Department of Biology, Center for Genomics and Bioinformatics, 4National Center for Glycomics & Glycoproteomics, Indiana University, Bloomington, IN, USA and 5Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
*To whom correspondence should be addressed.
Abstract
Motivation: One of the major problems in shotgun proteomics is low peptide coverage when analyzing complex protein samples. Identifying more peptides, e.g. non-tryptic peptides, may increase peptide coverage and improve the protein identification and/or quantification that build on the peptide identification results. Searching for all potential non-tryptic peptides is, however, time-consuming for shotgun proteomics data from complex samples and poses a challenge for routine data analysis.
Results: We hypothesize that non-tryptic peptides are mainly created by the truncation of regular tryptic peptides before separation. We introduce the notion of truncatability of a tryptic peptide, i.e. the probability that the peptide is identified in its truncated form, and build a predictor to estimate a peptide's truncatability from its sequence. We show that our predictions achieve useful accuracy, with the area under the ROC curve ranging from 76% to 87%, and can be used to filter the sequence database for identifying truncated peptides. After filtering, only a limited number of tryptic peptides with the highest truncatability are retained for non-tryptic peptide searching. By applying this method to the identification of semi-tryptic peptides, we show that a significant number of such peptides can be identified within a search time comparable to that of tryptic peptide identification.
Contact: predrag{at}indiana.edu; rarnold{at}indiana.edu; hatang{at}indiana.edu
Associate Editor: John Quackenbush
Received on August 7, 2007; revised on October 7, 2007; accepted on October 26, 2007
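The filtering idea summarized under Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `min_len` and `keep_fraction` parameters, and the `score` callable are all hypothetical. The abstract does not describe the actual truncatability predictor, so `score` is only a placeholder for it. The digest uses the common rule that trypsin cleaves after K or R except before P, with no missed cleavages, and a semi-tryptic form is assumed to be a tryptic peptide truncated at exactly one end.

```python
import re
from typing import Callable, Iterable, List


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def semi_tryptic_forms(peptide: str, min_len: int = 2) -> List[str]:
    """Enumerate truncated forms that keep one of the two original termini."""
    n = len(peptide)
    # Residues removed from the N-terminus; the original C-terminus is kept.
    n_truncated = [peptide[i:] for i in range(1, n - min_len + 1)]
    # Residues removed from the C-terminus; the original N-terminus is kept.
    c_truncated = [peptide[:j] for j in range(min_len, n)]
    return n_truncated + c_truncated


def top_truncatable(
    peptides: Iterable[str],
    score: Callable[[str], float],  # hypothetical stand-in for the predictor
    keep_fraction: float = 0.1,
) -> List[str]:
    """Keep the highest-scoring fraction of the distinct tryptic peptides."""
    unique = list(dict.fromkeys(peptides))  # de-duplicate, preserve order
    if not unique:
        return []
    ranked = sorted(unique, key=score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


def semi_tryptic_candidates(
    protein: str,
    score: Callable[[str], float],
    keep_fraction: float = 0.1,
    min_len: int = 2,
) -> List[str]:
    """Expand only the retained tryptic peptides into semi-tryptic candidates."""
    retained = top_truncatable(tryptic_digest(protein), score, keep_fraction)
    return [form for pep in retained for form in semi_tryptic_forms(pep, min_len)]
```

Because only the retained peptides are expanded, the number of semi-tryptic candidates grows with the size of the filtered set rather than with the full digest. This matches the reasoning given in the abstract for why search time can stay comparable to a tryptic-only search.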