Bioinformatics Advance Access originally published online on June 1, 2007
Bioinformatics 2007 23(15):2021-2023; doi:10.1093/bioinformatics/btm281
VIPER: an advanced software package to support high-throughput LC-MS peptide identification
Pacific Northwest National Laboratory, Richland, WA 99354, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: The accurate mass and time (AMT) tag approach is used to analyze large-scale experiments by combining information generated over multiple datasets and instrument types. The VIPER software package is one of the key components of the data processing pipeline. It implements automated algorithms to discover LC-MS features, to align and match these features to a database of peptides previously identified in LC-MS/MS analyses, and to identify and quantify pairs of isotopically labeled peptides.
Availability: VIPER may be downloaded free of charge at http://ncrr.pnl.gov/software/
Contact: rds{at}pnl.gov or proteomics{at}pnl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
VIPER (Visual Inspection of Peak/Elution Relationships) is an advanced software package developed initially to support peptide identification in the accurate mass and time (AMT) tag approach to high-throughput proteomics (Zimmer et al., 2006). This approach is similar to shotgun proteomics approaches (Washburn et al., 2001) in that proteins are first enzymatically cleaved into peptide fragments and then analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) to identify peptides using conventional software tools, such as SEQUEST, Mascot or X!Tandem (Craig and Beavis, 2004; Eng et al., 1994; Perkins et al., 1999). However, unlike shotgun approaches, the results from these initial analyses are stored in a reference database in the form of mass and (LC elution) time tags. Each tag serves as a unique 2D marker for subsequent identifications of that particular peptide by high-resolution, high mass accuracy LC-MS (e.g. Fourier transform ion cyclotron resonance). The LC-MS data are first processed using another PNNL-developed software package called Decon2LS (http://ncrr.pnl.gov/software/), which uses a version of the THRASH algorithm (Horn et al., 2000) to detect features (and their monoisotopic masses) in the individual mass spectra. VIPER then processes collections of MS features (e.g. as found using Decon2LS) across elution time to identify unique LC-MS features. It also calibrates elution times, refines the mass calibration and matches the LC-MS features to mass and (elution) time tags in a reference database. The identified LC-MS features and the peptide/protein identifications can then be exported for further analysis using programs such as Microsoft Excel or Microsoft Access.
| 2 CORE FUNCTIONS AND FEATURES |
|---|
VIPER uses a graphical user interface (GUI) to generate 2D plots that display the monoisotopic masses observed in each mass spectrum (Fig. 1) and the LC-MS features discovered when VIPER groups related data points by mass and elution time. However, unlike visualization programs such as Pep3D (Li et al., 2004) and MS vendor-supplied data acquisition software, VIPER maps the observed LC-MS features onto known AMT tags in reference databases to identify peptides. VIPER can also run in an automated mode, loading and processing data based on customizable, user-defined settings. It is primarily intended to work with monoisotopic mass data, as obtained by deisotoping mass spectra from medium- to high-resolution mass spectrometers (e.g. TOF, FTICR or Orbitrap). It can read several file formats, including .CSV, .mzXML and .mzData. Comma-separated value (.CSV) files can be generated by Decon2LS, an open source software package for deisotoping LC-MS data (available free of charge at http://ncrr.pnl.gov/software/), while mzXML and mzData are standard XML-based formats for MS data.
2.1 Data processing algorithms
The first major analysis step performed by VIPER is to discover LC-MS features from MS features present in individual spectra. To do so, VIPER groups similar MS features in adjacent spectra using a single linkage clustering algorithm with a weighted Euclidean distance function to calculate the distance between features (Fig. 1). Each LC-MS feature is assigned a median mass, a central normalized elution time (NET) and an abundance estimate. When an experiment has used isotopic labeling for relative quantitation, VIPER searches for pairs of LC-MS features with similar elution times, and relevant (customizable) mass differences as governed by the labeling method (e.g. 4.0085 Da for 16O/18O labeling or 8.051 Da for ICAT labeling; see Fig. 3 in the Supplementary Material).
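As an illustration of the grouping step described above, the following is a minimal Python sketch, not VIPER's actual implementation. It assumes MS features are given as (scan, monoisotopic mass, abundance) records and that the weighted Euclidean distance combines the mass difference in ppm with the scan difference. The names `MSFeature`, `weighted_distance`, `discover_lcms_features` and `find_pairs`, as well as all weights, thresholds and tolerances, are hypothetical choices made for this example. Scan numbers stand in for NET, and the summed abundance is only one possible abundance estimate.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MSFeature:
    scan: int         # spectrum (scan) number
    mass: float       # monoisotopic mass, Da
    abundance: float  # intensity of the deisotoped peak


def weighted_distance(a, b, ppm_weight=0.1, scan_weight=0.5):
    """Weighted Euclidean distance over mass (ppm) and scan differences.
    The weights are illustrative assumptions, not VIPER defaults."""
    dppm = (a.mass - b.mass) / ((a.mass + b.mass) / 2.0) * 1e6
    dscan = a.scan - b.scan
    return ((ppm_weight * dppm) ** 2 + (scan_weight * dscan) ** 2) ** 0.5


def discover_lcms_features(features, max_distance=1.0):
    """Single linkage: MS features connected by a chain of links, each no
    longer than max_distance, fall into the same cluster (union-find)."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if weighted_distance(features[i], features[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, feature in enumerate(features):
        clusters.setdefault(find(i), []).append(feature)

    # Summarize each cluster: median mass, central scan, summed abundance.
    summaries = [
        {
            "mass": median(f.mass for f in members),
            "scan": median(f.scan for f in members),
            "abundance": sum(f.abundance for f in members),
            "size": len(members),
        }
        for members in clusters.values()
    ]
    return sorted(summaries, key=lambda s: s["mass"])


def find_pairs(lcms_features, delta=4.0085, mass_tol=0.02, scan_tol=2):
    """Find (light, heavy) pairs in a mass-sorted list whose mass difference
    is close to delta (here the 16O/18O spacing) and which co-elute."""
    pairs = []
    for i, light in enumerate(lcms_features):
        for heavy in lcms_features[i + 1:]:
            if (abs(heavy["mass"] - light["mass"] - delta) <= mass_tol
                    and abs(heavy["scan"] - light["scan"]) <= scan_tol):
                pairs.append((light, heavy))
    return pairs
```

The pairwise loop is quadratic in the number of MS features, which is acceptable for a sketch. A production implementation would restrict comparisons to adjacent spectra and nearby masses.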
To identify the LC-MS features, the user must first select an AMT tag database; the reference data can be imported from either a Microsoft SQL Server database or a Microsoft Access database. Once the reference data are loaded, VIPER calibrates the elution times of each LC-MS feature using either a linear alignment function or the recently developed LCMSWARP algorithm (Jaitly et al., 2006). LCMSWARP is a dynamic programming algorithm that scores the similarity between subsections of the LC-MS and LC-MS/MS datasets, then finds a transformation function by discovering the best path between the subsections. It also performs mass recalibration, using an additive model to describe mass errors as a function of elution time and m/z. LC-MS features are then identified by matching their monoisotopic masses and calibrated elution times to the mass and elution time coordinates of peptides in the database. Ambiguity due to multiple matches is resolved using relative probabilities based on the Mahalanobis distance and the relative probability of the peptides being observed (Anderson et al., 2006).
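The matching step can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of Anderson et al. (2006) or VIPER's code: it uses the linear alignment option (LCMSWARP is not reproduced), treats mass and NET errors as independent so that the Mahalanobis distance reduces to a sum of squared standardized errors, and turns distances into relative probabilities with a Gaussian weight multiplied by an optional per-peptide prior. The function names, the dictionary keys (`peptide`, `mass`, `net`, `prior`) and all tolerances and standard deviations are hypothetical.

```python
import math


def linear_net(scan, slope, intercept):
    """Linear alignment: map a scan number to a normalized elution time."""
    return slope * scan + intercept


def match_feature(mass, net, tags, ppm_tol=5.0, net_tol=0.02,
                  sigma_ppm=2.0, sigma_net=0.01):
    """Return candidate AMT tags as (peptide, distance, relative probability),
    most probable first. Tolerances and sigmas are illustrative assumptions."""
    candidates = []
    for tag in tags:
        dppm = (mass - tag["mass"]) / tag["mass"] * 1e6
        dnet = net - tag["net"]
        if abs(dppm) <= ppm_tol and abs(dnet) <= net_tol:
            # Squared Mahalanobis distance with a diagonal covariance matrix.
            d2 = (dppm / sigma_ppm) ** 2 + (dnet / sigma_net) ** 2
            weight = tag.get("prior", 1.0) * math.exp(-0.5 * d2)
            candidates.append((tag["peptide"], math.sqrt(d2), weight))

    total = sum(w for _, _, w in candidates)
    if total == 0.0:
        return []
    ranked = [(p, d, w / total) for p, d, w in candidates]
    return sorted(ranked, key=lambda c: c[2], reverse=True)
```

For example, a feature whose mass and NET fall within tolerance of two tags receives two candidates whose relative probabilities sum to one, with the closer tag ranked first.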
VIPER next generates a list of mass differences between LC-MS features and matching AMT tags. These matches include both true and false identifications. The distribution is separated into normal (true matches) and uniform (random matches) components using an Expectation Maximization algorithm (Fig. 8 in the Supplementary Material). The standard deviation of the NET and mass components is used to compute the appropriate mass and NET tolerances for filtering the matched AMT tags, thereby improving the confidence in the peptide identifications by removing the background false positive matches. Additional details of the data processing steps are available in the Supplementary Material.
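The separation of the error distribution into normal and uniform components can be illustrated with a short Expectation Maximization sketch. It is a generic one-dimensional normal-plus-uniform mixture fit, written under the assumption that errors lie within a known search window `[lo, hi]`; it is not taken from VIPER's source. The function names, the starting values and the synthetic data generator are hypothetical and exist only to make the example runnable.

```python
import math
import random


def fit_normal_uniform(errors, lo, hi, n_iter=200):
    """EM fit of p(x) = w * Normal(x | mu, sigma) + (1 - w) * Uniform(lo, hi).
    Returns (w, mu, sigma); w estimates the fraction of true matches."""
    uniform_density = 1.0 / (hi - lo)
    n = len(errors)
    w, mu, sigma = 0.5, sum(errors) / n, (hi - lo) / 4.0  # rough start values
    for _ in range(n_iter):
        # E-step: probability that each error belongs to the normal component.
        resp = []
        for x in errors:
            g = (w * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                 / (sigma * math.sqrt(2.0 * math.pi)))
            denom = g + (1.0 - w) * uniform_density
            resp.append(g / denom if denom > 0.0 else 0.0)
        # M-step: re-estimate the mixing weight, mean and standard deviation.
        total = sum(resp)
        if total == 0.0:
            break
        w = total / n
        mu = sum(r * x for r, x in zip(resp, errors)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, errors)) / total
        sigma = max(math.sqrt(var), 1e-9)
    return w, mu, sigma


def simulate_errors(n_true=600, n_random=400, seed=0):
    """Synthetic mass errors (ppm): true matches are normal, random matches
    are uniform over the +/-10 ppm window. For demonstration only."""
    rng = random.Random(seed)
    true_errors = [rng.gauss(0.5, 1.0) for _ in range(n_true)]
    random_errors = [rng.uniform(-10.0, 10.0) for _ in range(n_random)]
    return true_errors + random_errors
```

Given the fitted `sigma`, a filtering tolerance could then be set as some multiple of it around `mu`. The multiplier is a user choice and is not specified here.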
2.2 Visualization tools
VIPER's GUI allows users to navigate 2D plots of mass and elution time for a particular dataset. MS features in the main display can be colored to indicate the charge state of the ion detected (Fig. 1), and the user may restrict the points shown using various filters. The LC-MS Feature Browser allows the user to examine each LC-MS feature individually. As the features are browsed, a selected ion chromatogram (abundance versus time) is displayed and the 2D plot is automatically updated to display the region of interest in mass–time space. The Pairs Browser (Fig. 9 in the Supplementary Material) performs a similar function for identified pairs of LC-MS features, plotting the abundance of each feature against scan number to provide a visual indication of the relative abundance of each member of the pair.
| 3 SUMMARY |
|---|
VIPER combines in one software package a host of functions and capabilities that facilitate and standardize the analysis and processing of LC-MS data for peptide quantitation and identification using the AMT tag approach. The software includes an interactive GUI, plus the ability to automatically analyze a series of datasets using a parameter file to guide the analysis. The Supplementary Material includes a list of 23 selected publications for which VIPER was used for the data analysis, including a summary of the results from Hixson et al. (2006).
| ACKNOWLEDGEMENTS |
|---|
Portions of this research were supported by the US Department of Energy Office of Biological and Environmental Research Genomes:GtL Program, the NIH National Center for Research Resources (Grant RR018522) and the National Institute of Allergy and Infectious Diseases (NIH/DHHS through interagency agreement Y1-AI-4894-01). This manuscript has been authored by Battelle Memorial Institute, Pacific Northwest Division, under Contract No. DE-AC05-76RL01830 with the US Department of Energy. By accepting the article for publication, the publisher acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Alfonso Valencia
Received on January 16, 2007; revised on May 17, 2007; accepted on May 18, 2007
| REFERENCES |
|---|
Anderson KK, et al. Estimating probabilities of peptide database identifications to LC-FTICR-MS observations. Proteome Science (2006) 4:1.
Craig R, Beavis RC. TANDEM: matching proteins with mass spectra. Bioinformatics (2004) 20:1466–1467.
Eng JK, et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. (1994) 5:976–989.
Hixson KK, et al. Biomarker candidate identification in Yersinia pestis using organism-wide semiquantitative proteomics. J. Proteome Res. (2006) 5:3008–3017.
Horn DM, et al. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. (2000) 11:320–332.
Jaitly N, et al. Robust algorithm for alignment of liquid chromatography-mass spectrometry analyses in an accurate mass and time tag data analysis pipeline. Anal. Chem. (2006) 78:7397–7409.
Li X-J, et al. A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Anal. Chem. (2004) 76:3856–3860.
Perkins DN, et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis (1999) 20:3551–3567.
Washburn MP, et al. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. (2001) 19:242–247.
Zimmer JSD, et al. Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. (2006) 25:450–482.