Bioinformatics Advance Access originally published online on May 17, 2007
Bioinformatics 2007 23(17):2210-2217; doi:10.1093/bioinformatics/btm267
Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum/peptide sequence false match frequencies
1Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, and 2Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, Maryland, USA
*To whom correspondence should be addressed.
Abstract
Motivation: In proteomics, reverse database searching is used to control the false match frequency for tandem mass spectrum/peptide sequence matches, but reversal creates sequences devoid of patterns that usually challenge database-search software.
Results: We designed an unsupervised pattern recognition algorithm for detecting patterns of various lengths in large sequence datasets. The patterns found in a protein sequence database were used to create decoy databases with a Monte Carlo sampling algorithm. Searching these decoy databases allowed us to predict false positive rates for spectrum/peptide sequence matches. We show examples where this method, which is independent of instrumentation, database-search software and samples, estimates false positive identification rates better than a prevailing reverse database searching method. The pattern detection algorithm can also be used to analyze sequences for other purposes in biology or cryptology.
Availability: On request from the authors.
Contact: Bret.Cooper{at}ars.usda.gov
Supplementary information: http://bioinformatics.psb.ugent.be/
Associate Editor: Limsoon Wong
Received on March 6, 2007; revised on April 9, 2007; accepted on May 9, 2007
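The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a fixed-order Markov model as a stand-in for the variable-length pattern detection, and a simple decoy-to-target ratio as the false match estimate. All function names (`train_markov`, `sample_decoy`, `false_match_rate`) and the toy sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_markov(sequences, order=2):
    """Count which residue follows each length-`order` context (order >= 1).

    Assumption: a fixed-order model stands in for the paper's
    variable-length pattern detection, which the abstract does not specify.
    """
    transitions = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            transitions[seq[i:i + order]][seq[i + order]] += 1
    return transitions


def sample_decoy(seed_seq, transitions, residue_counts, order, rng):
    """Monte Carlo sample a decoy of the same length as `seed_seq`.

    The decoy keeps the first `order` residues of the real sequence, then
    draws each next residue from the learned context frequencies. Unseen
    contexts fall back to overall residue frequencies.
    """
    residues = list(residue_counts)
    weights = list(residue_counts.values())
    decoy = list(seed_seq[:order])
    while len(decoy) < len(seed_seq):
        context = "".join(decoy[-order:])
        followers = transitions.get(context)
        if followers:
            choices, w = zip(*followers.items())
        else:
            choices, w = residues, weights
        decoy.append(rng.choices(choices, weights=w, k=1)[0])
    return "".join(decoy)


def false_match_rate(target_scores, decoy_scores, threshold):
    """Estimate the false match frequency at a score threshold as
    (decoy matches passing) / (target matches passing), capped at 1."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return min(1.0, decoys / targets) if targets else 0.0


# Toy protein "database" (hypothetical sequences, for illustration only).
database = ["MKTAYIAKQR", "MKLVAYIAKQ", "GAYIAKQRMK"]
model = train_markov(database, order=2)
background = Counter("".join(database))
rng = random.Random(0)

pattern_decoys = [sample_decoy(s, model, background, 2, rng) for s in database]
reversed_decoys = [s[::-1] for s in database]  # the reverse-database baseline
```

In this sketch, the sampled decoys retain short residue patterns seen in the real database, whereas the reversed decoys do not, which is the contrast the Motivation paragraph draws. The scores passed to `false_match_rate` would come from whatever database-search software is in use; none is assumed here.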