Bioinformatics Advance Access published online on May 17, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm267
Probability-Based Pattern Recognition and Statistical Framework for Randomization: Modeling Tandem Mass Spectrum/Peptide Sequence False Match Frequencies
¹Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland, USA
²Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland, USA
*To whom correspondence should be addressed. Bret Cooper, E-mail: cooperb{at}ba.ars.usda.gov
Abstract
Motivation: In proteomics, reverse database searching is used to control the false match frequency for tandem mass spectrum/peptide sequence matches. However, reversal produces sequences that lack the patterns that usually challenge database-search software.
Results: We designed an unsupervised pattern recognition algorithm for detecting patterns of various lengths in large sequence datasets. The patterns found in a protein sequence database were used to create decoy databases with a Monte Carlo sampling algorithm. Searching these decoy databases led to the prediction of false positive rates for spectrum/peptide sequence matches. We show examples where this method, independent of instrumentation, database-search software and samples, estimates false positive identification rates better than a prevailing reverse database searching method. The pattern detection algorithm can also be used to analyze sequences for other purposes in biology or cryptology.
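The abstract does not describe the algorithm in detail, and the software is available only on request. The following is therefore a minimal Python sketch of the general idea, not the authors' method. It rests on three assumptions. First, fixed-length k-residue patterns stand in for the paper's variable-length patterns. Second, the Monte Carlo step draws each decoy residue from the counts of patterns that begin with the preceding k-1 residues, which makes it an order-(k-1) Markov chain. Third, the false positive rate is the common target-decoy ratio: decoy matches divided by target matches above a score threshold. All function names (`build_pattern_model`, `sample_decoy`, `make_decoy_database`, `estimated_false_positive_rate`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_pattern_model(sequences, k):
    """Hypothetical stand-in for pattern detection: tally every length-k pattern
    as a (k-1)-residue context followed by one residue."""
    if k < 1:
        raise ValueError("k must be >= 1")
    starts = Counter()                  # opening contexts of each sequence
    transitions = defaultdict(Counter)  # context -> counts of the next residue
    background = Counter()              # overall residue frequencies (fallback)
    for seq in sequences:
        background.update(seq)
        if len(seq) < k:
            continue
        starts[seq[: k - 1]] += 1
        for i in range(len(seq) - k + 1):
            transitions[seq[i : i + k - 1]][seq[i + k - 1]] += 1
    return starts, transitions, background


def _draw(counter, rng):
    """Draw one key from a Counter with probability proportional to its count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[key] for key in keys])[0]


def sample_decoy(model, length, rng):
    """Monte Carlo sample one decoy sequence that reuses the observed patterns."""
    starts, transitions, background = model
    if not starts:
        raise ValueError("no sequence was long enough to seed the model")
    seq = list(_draw(starts, rng))
    context_len = len(seq)  # equals k - 1
    while len(seq) < length:
        context = "".join(seq[len(seq) - context_len :])
        # If this context never continued in the targets, use background frequencies.
        options = transitions.get(context) or background
        seq.append(_draw(options, rng))
    return "".join(seq[:length])


def make_decoy_database(sequences, k, seed=0):
    """Build one length-matched decoy per target sequence."""
    rng = random.Random(seed)
    model = build_pattern_model(sequences, k)
    return [sample_decoy(model, len(seq), rng) for seq in sequences]


def estimated_false_positive_rate(target_scores, decoy_scores, threshold):
    """Common target-decoy estimate: decoy hits / target hits at or above threshold."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    return decoys / targets if targets else 0.0
```

For example, `make_decoy_database(["MKTAYIAKQR", "MKTLLVAGAR"], k=3, seed=7)` returns two 10-residue decoys built only from patterns seen in the targets. In contrast with simple reversal, this choice keeps the local residue patterns that a search engine may find hard to distinguish from real peptides.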
Availability: On request from the authors.
Supplementary Data: http://bioinformatics.psb.ugent.be/
Associate Editor: Dr. Limsoon Wong
Received on March 6, 2007; revised on April 9, 2007; accepted on May 9, 2007

