

Bioinformatics Advance Access originally published online on June 1, 2007
Bioinformatics 2007 23(15):2021-2023; doi:10.1093/bioinformatics/btm281

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

VIPER: an advanced software package to support high-throughput LC-MS peptide identification

Matthew E. Monroe, Nikola Tolic, Navdeep Jaitly, Jason L. Shaw, Joshua N. Adkins and Richard D. Smith*

Pacific Northwest National Laboratory, Richland, WA 99354, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: The accurate mass and time (AMT) tag approach is used for analysis of large scale experiments by combining information generated over multiple datasets and instrument types. The VIPER software package is one of the key components of the data processing pipeline and implements automated algorithms to discover LC-MS features, align and match these LC-MS features to a database of peptides previously identified in LC-MS/MS analyses, and identify and quantify pairs of isotopically labeled peptides.

Availability: VIPER may be downloaded free of charge at http://ncrr.pnl.gov/software/

Contact: rds@pnl.gov or proteomics@pnl.gov

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
VIPER (Visual Inspection of Peak/Elution Relationships) is an advanced software package developed initially to support high-throughput peptide identification in the accurate mass and time (AMT) tag approach to high-throughput proteomics (Zimmer et al., 2006). This approach is similar to ‘shotgun’ proteomics approaches (Washburn et al., 2001) in that proteins are first enzymatically cleaved into peptide fragments and then analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) to identify peptides using conventional software tools, such as SEQUEST, Mascot or X!Tandem (Craig and Beavis, 2004; Eng et al., 1994; Perkins et al., 1999). However, unlike shotgun approaches, the results from these initial analyses are stored in a reference database in the form of mass and (LC elution) time tags. Each tag serves as a unique 2D marker for subsequent identifications of that particular peptide by high-resolution, high mass accuracy LC-MS (e.g. Fourier transform ion cyclotron resonance). The LC-MS data are first processed using Decon2LS (http://ncrr.pnl.gov/software/), another software package developed at PNNL, which uses a version of the THRASH algorithm (Horn et al., 2000) to detect features (and their monoisotopic masses) in the individual mass spectra. VIPER then processes collections of MS features (e.g. as found using Decon2LS) across elution time to identify unique ‘LC-MS features’. It also calibrates elution times, refines the mass calibration and matches the LC-MS features to mass and (elution) time tags in a reference database. The identified LC-MS features and the peptide/protein identifications can then be exported for further analysis using programs such as Microsoft Excel or Microsoft Access.


    2 CORE FUNCTIONS AND FEATURES
VIPER uses a graphical user interface (GUI) to generate 2D plots that display the monoisotopic masses observed in each mass spectrum (Fig. 1) and the LC-MS features discovered when VIPER groups related data points by mass and elution time. However, unlike visualization programs, such as Pep3D (Li et al., 2004) and MS vendor-supplied data acquisition software, VIPER maps the observed LC-MS features onto known AMT tags in reference databases to identify peptides. VIPER can also run in an automated mode, loading and processing data based on customizable, user-defined settings. It is primarily intended to work with monoisotopic mass data, as obtained by deisotoping mass spectra from medium to high resolution mass spectrometers (e.g. TOF, FTICR or Orbitrap). It can read several file formats including .CSV, .mzXML and .mzData. Comma-separated value (.CSV) files can be generated by Decon2LS, an open source software package for deisotoping LC-MS data (available free of charge at http://ncrr.pnl.gov/software/), while mzXML and mzData are standard XML-based formats for MS data.


 
Fig. 1. The larger graphic shows a plot of monoisotopic masses versus scan number (time), color-coded by charge state. The upper left inset is a total ion chromatogram (TIC) of the entire dataset while the lower right inset shows 68 LC-MS features resolved in both the mass and time dimensions.

 
2.1 Data processing algorithms
The first major analysis step performed by VIPER is to discover LC-MS features from MS features present in individual spectra. To do so, VIPER groups similar MS features in adjacent spectra using a single linkage clustering algorithm with a weighted Euclidean distance function to calculate the distance between features (Fig. 1). Each LC-MS feature is assigned a median mass, a central normalized elution time (NET) and an abundance estimate. When an experiment has used isotopic labeling for relative quantitation, VIPER searches for pairs of LC-MS features with similar elution times, and relevant (customizable) mass differences as governed by the labeling method (e.g. 4.0085 Da for 16O/18O labeling or 8.051 Da for ICAT labeling; see Fig. 3 in the Supplementary Material).
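The grouping step described above can be illustrated with a minimal sketch. It is not VIPER's implementation: the weights, the distance threshold, the use of ppm for the mass term, the use of the apex scan as a stand-in for the central NET, and the summed intensity as the abundance estimate are all assumptions made for illustration. Each MS feature is a hypothetical (monoisotopic mass, scan number, abundance) tuple.

```python
from math import sqrt
from statistics import median

def weighted_distance(a, b, mass_weight=1.0, scan_weight=0.1):
    """Weighted Euclidean distance between two (mass, scan, abundance) points.

    The mass term is expressed in ppm and the time term in scans;
    both weights are illustrative assumptions.
    """
    ppm = (a[0] - b[0]) / a[0] * 1e6
    dscan = a[1] - b[1]
    return sqrt((mass_weight * ppm) ** 2 + (scan_weight * dscan) ** 2)

def single_linkage(points, max_distance=10.0):
    """Single linkage: two points share a cluster whenever a chain of
    neighbours, each within max_distance of the next, connects them."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if weighted_distance(points[i], points[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())

def summarize(cluster):
    """Reduce one cluster to a median mass, an apex scan and a summed abundance."""
    apex = max(cluster, key=lambda p: p[2])
    return {
        "median_mass": median(p[0] for p in cluster),
        "apex_scan": apex[1],
        "abundance": sum(p[2] for p in cluster),
    }

# Hypothetical MS features: (monoisotopic mass in Da, scan number, abundance).
points = [
    (1000.000, 10, 5.0),
    (1000.002, 11, 9.0),
    (1000.001, 12, 4.0),
    (1500.000, 10, 7.0),
    (1500.003, 11, 3.0),
]
lcms_features = [summarize(c) for c in single_linkage(points)]
```

The pairwise loop is quadratic in the number of points, which is acceptable for a sketch; a production tool would presumably restrict comparisons to neighbouring spectra and a narrow mass window.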

To identify the LC-MS features, the user must first select an AMT tag database; the reference data can be imported from either a Microsoft SQL Server database or a Microsoft Access database. Once the reference data are loaded, VIPER calibrates the elution times of each LC-MS feature using either a linear alignment function or the recently developed LCMSWARP algorithm (Jaitly et al., 2006). LCMSWARP is a dynamic programming algorithm that scores the similarity between subsections of the LC-MS and LC-MS/MS datasets, then finds a transformation function by discovering the best path between the subsections. It also performs mass recalibration, using an additive model to describe mass errors as a function of elution time and m/z. LC-MS features are then identified by matching their monoisotopic masses and calibrated elution times to those of peptides in the database. Ambiguity due to multiple matches is resolved using relative probabilities based on the Mahalanobis distance and the relative probability of the peptides being observed (Anderson et al., 2006).
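The matching step can be sketched as a tolerance search in mass and NET space. This is an illustration only: the `AmtTag` class, the peptide names, the tolerance values and the scoring rule are hypothetical. The score scales each error by its tolerance, a simplified stand-in for a Mahalanobis distance with a diagonal covariance; it does not reproduce the probability model of Anderson et al. (2006).

```python
from dataclasses import dataclass

@dataclass
class AmtTag:
    """Hypothetical reference entry: peptide label, monoisotopic mass (Da), NET."""
    peptide: str
    mass: float
    net: float

def match_feature(mass, net, tags, ppm_tol=5.0, net_tol=0.02):
    """Return (score, tag) pairs for tags within both tolerances, best first.

    ppm_tol and net_tol are illustrative defaults, not VIPER settings.
    """
    hits = []
    for tag in tags:
        ppm_err = (mass - tag.mass) / tag.mass * 1e6
        net_err = net - tag.net
        if abs(ppm_err) <= ppm_tol and abs(net_err) <= net_tol:
            # Squared distance with each axis normalized by its tolerance.
            score = (ppm_err / ppm_tol) ** 2 + (net_err / net_tol) ** 2
            hits.append((score, tag))
    hits.sort(key=lambda hit: hit[0])
    return hits

# Hypothetical reference database and one calibrated LC-MS feature.
tags = [
    AmtTag("PEPTIDE_A", 1000.000, 0.300),
    AmtTag("PEPTIDE_B", 1000.004, 0.310),
    AmtTag("PEPTIDE_C", 1200.000, 0.300),
]
hits = match_feature(1000.001, 0.305, tags)
```

In this example two tags fall inside both tolerances, so the feature is ambiguous; the lowest score marks the closest candidate, which is the situation the relative-probability step in the text is meant to resolve.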

VIPER next generates a list of mass differences between LC-MS features and matching AMT tags. This list includes both true and false matches. The distribution is separated into normal (true matches) and uniform (random matches) components using an Expectation Maximization algorithm (Fig. 8 in the Supplementary Material). The standard deviations of the NET and mass components are used to compute the appropriate mass and NET tolerances for filtering the matched AMT tags, thereby improving the confidence in the peptide identifications by removing the background false positive matches. Additional details of the data processing steps are available in the Supplementary Material.
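The separation into normal and uniform components can be illustrated with a small Expectation Maximization sketch for a one-dimensional mixture. The function name, the starting values, the fixed iteration count and the synthetic error values are assumptions for illustration; VIPER's own procedure is described in the Supplementary Material.

```python
import random
from math import exp, pi, sqrt

def fit_normal_plus_uniform(errors, lo, hi, iterations=200):
    """Fit p * Normal(mu, sigma) + (1 - p) * Uniform(lo, hi) to errors by EM.

    Returns (p, mu, sigma), where p is the estimated fraction of true matches.
    """
    n = len(errors)
    mu = sum(errors) / n
    sigma = max(sqrt(sum((e - mu) ** 2 for e in errors) / n), 1e-6)
    p = 0.5
    uniform_density = 1.0 / (hi - lo)
    for _ in range(iterations):
        # E-step: probability that each error belongs to the normal component.
        resp = []
        for e in errors:
            g = p * exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))
            resp.append(g / (g + (1 - p) * uniform_density))
        total = sum(resp)
        if total == 0:
            break
        # M-step: re-estimate the mixing fraction, mean and standard deviation.
        p = total / n
        mu = sum(r * e for r, e in zip(resp, errors)) / total
        var = sum(r * (e - mu) ** 2 for r, e in zip(resp, errors)) / total
        sigma = max(sqrt(var), 1e-6)
    return p, mu, sigma

# Synthetic mass errors in ppm: 300 "true" matches near 0 ppm and
# 200 "random" matches spread evenly over a +/-25 ppm search window.
rng = random.Random(0)
errors = [rng.gauss(0.0, 1.0) for _ in range(300)]
errors += [rng.uniform(-25.0, 25.0) for _ in range(200)]
p, mu, sigma = fit_normal_plus_uniform(errors, -25.0, 25.0)
```

A refined tolerance could then be taken as a multiple of the fitted `sigma` around `mu`; the choice of multiple is left open here because the article does not state one.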

2.2 Visualization tools
VIPER's GUI allows users to navigate 2D plots of mass and elution time for a particular dataset. MS features in the main display can be colored to indicate the charge state of the ion detected (Fig. 1), and the user may restrict the points to be visualized using various filters. The LC-MS Feature Browser allows the user to individually examine each LC-MS feature. As the features are browsed, a selected ion chromatogram (abundance versus time) is displayed and the 2D plot is automatically updated to display the region of interest in mass–time space. The Pairs Browser (Fig. 9 in the Supplementary Material) performs a similar function for pairs of LC-MS features identified, plotting the abundance of each feature against scan number to provide a visual indication of the relative abundance of each member of the pair.


    3 SUMMARY
VIPER combines into one software package a host of useful functions and capabilities that facilitate and standardize analysis and processing of LC-MS data for peptide quantitation and identification using the AMT tag approach. The software includes an interactive GUI, plus the ability to automatically analyze a series of datasets using a parameter file to guide the analysis. The Supplementary Material includes a list of 23 selected publications for which VIPER was used for the data analysis, including a summary of the results from Hixson et al. (2006).


    ACKNOWLEDGEMENTS
Portions of this research were supported by the US Department of Energy Office of Biological and Environmental Research Genomes:GtL Program, the NIH National Center for Research Resources (Grant RR018522) and the National Institute of Allergy and Infectious Diseases (NIH/DHHS through interagency agreement Y1-AI-4894-01). This manuscript has been authored by Battelle Memorial Institute, Pacific Northwest Division, under Contract No. DE-AC05-76RL01830 with the US Department of Energy. By accepting the article for publication, the publisher acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alfonso Valencia

Received on January 16, 2007; revised on May 17, 2007; accepted on May 18, 2007

    REFERENCES

    Anderson KK, et al. Estimating probabilities of peptide database identifications to LC-FTICR-MS observations. Proteome Science (2006) 4:1.

    Craig R, Beavis RC. TANDEM: matching proteins with mass spectra. Bioinformatics (2004) 20:1466–1467.

    Eng JK, et al. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. (1994) 5:976–989.

    Hixson KK, et al. Biomarker candidate identification in Yersinia pestis using organism-wide semiquantitative proteomics. J. Proteome Res. (2006) 5:3008–3017.

    Horn DM, et al. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. (2000) 11:320–332.

    Jaitly N, et al. Robust algorithm for alignment of liquid chromatography-mass spectrometry analyses in an accurate mass and time tag data analysis pipeline. Anal. Chem. (2006) 78:7397–7409.

    Li X-J, et al. A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Anal. Chem. (2004) 76:3856–3860.

    Perkins DN, et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis (1999) 20:3551–3567.

    Washburn MP, et al. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. (2001) 19:242–247.

    Zimmer JSD, et al. Advances in proteomics data analysis and display using an accurate mass and time tag approach. Mass Spectrom. Rev. (2006) 25:450–482.





