
Bioinformatics 2007 23(2):e191-e197; doi:10.1093/bioinformatics/btl299

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Proteomics

TOPP—the OpenMS proteomics pipeline

Oliver Kohlbacher 1,*, Knut Reinert 2, Clemens Gröpl 2, Eva Lange 2, Nico Pfeifer 1, Ole Schulz-Trieglaff 2 and Marc Sturm 1

1 Simulation of Biological Systems, Eberhard Karls University Tübingen Sand 14, 72076 Tübingen, Germany
2 Algorithmic Bioinformatics, Free University Berlin Takustrasse 9, 14195 Berlin, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 TOPP COMPONENTS
 3 DESIGN AND IMPLEMENTATION
 4 EXAMPLE APPLICATIONS
 5 DISCUSSION AND CONCLUSION
 REFERENCES
 

Motivation: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. To facilitate these studies, it would be desirable to have a flexible ‘toolbox’ of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics.

Results: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides computational tools that can be easily combined into analysis pipelines, even by non-experts, and used in proteomics workflows. These applications range from utilities (file format conversion, peak picking) through wrappers for established applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. We describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups.

Availability: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de

Contact: oliver.kohlbacher{at}uni-tuebingen.de


    1 INTRODUCTION
HPLC-MS-based proteomics applications require the management of large amounts of data in quite complex ways. While some key steps in the data analysis pipeline are common to all applications, the arrangement of these steps and the context of the data analysis is highly dependent on the kind of experiment being conducted. Hence, the overall data analysis pipeline is often subjected to large changes, while the underlying analysis algorithms remain mostly identical. We propose a new set of proteomics tools built upon our software framework OpenMS which addresses this problem using a set of predefined tools that can be combined into a proteomics workflow in a simple file-driven way. Adapting a proteomics analysis pipeline to a new experiment or a new data analysis strategy then only requires minor modifications to this workflow. Each tool handles a well-defined functionality in the area of proteomics data analysis. While the individual applications range from very trivial to rather complex tasks (e.g. file format conversion, peak picking, noise reduction, spectrum identification, quantitation, etc.) their combined value arises from the fact that they share a common interface, common formats and common configuration files. They can thus be combined like building blocks to perform more complex analysis tasks, an idea already used in similar toolboxes in bioinformatics, e.g. in EMBOSS (Olson, 2002). The analysis pipeline then defines the wiring of these building blocks by executing small scripts connecting individual building blocks. Manual analyses during the development of a pipeline are supported through a system of log files allowing the reconstruction of every processing step. The debugging output can be turned off as soon as the pipeline works as intended.

Some of the tasks above can also be performed with the vendor software of the mass spectrometer. An example is the commercial software MassLynx (Waters, Inc.), which performs relative quantitation of proteins without the use of labeling methods. However, we believe that OpenMS offers a much higher degree of flexibility, since users can combine the individual components of TOPP according to their individual needs. Furthermore, all algorithmic details are either published or can be found in the documentation. Another advantage of TOPP over most commercial software is that the user is always in full control of the data workflow and does not depend on any manufacturer.

There are also some academic software projects with aims similar to OpenMS. Closest to our idea is the Trans-Proteomic Pipeline (TPP) (Keller et al., 2005) developed at the Institute for Systems Biology (ISB) in Seattle (USA). The TPP makes use of open XML file formats for storage of data at the raw data, peptide and protein levels. It integrates other tools developed at the ISB into a coherent framework. Among these tools are PeptideProphet (Keller et al., 2002), which validates peptides assigned to MS/MS spectra, and XPRESS (Han et al., 2001) and ASAPRatio (Li et al., 2003), which quantify peptides and proteins in differentially labeled samples. Pep3D (Li et al., 2004) visualizes the raw spectral data, and ProteinProphet (Nesvizhskii et al., 2003) infers sample proteins. In its current state, the main emphasis of the TPP is on peptide identification and quantitation. Only limited preprocessing of the data is possible and the software deals with the spectra as they leave the mass spectrometer.

Several groups, such as Katajamaa et al. (2006), Leptos et al. (2006), Li et al. (2005), Palagi et al. (2005), Radulovic et al. (2004) and Samuelsson et al. (2004), have developed other working systems for proteomic data analyses. Most of them focus on a single task such as protein identification or quantitation. However, in some situations, it might be preferable to have more control over the workflow of the data analysis in order to build customized applications or even to implement one's own algorithms using existing data structures. TOPP offers all of these possibilities. In its current version, it can perform the main computational tasks occurring in proteomics experiments such as visualization, protein identification, quantitation, alignment (mapping) of samples and a basic statistical analysis. In addition, it comes with comprehensible documentation and simple interfaces. Moreover, it is possible to develop new applications and to contribute ideas to the OpenMS framework.

By using standardized data exchange formats it is possible to combine TOPP components into complex workflows. In contrast to other academic projects, we not only developed sophisticated algorithms but also aimed for software that is easy to use and, above all, extensible.

In the following section we give a short overview of the main components implemented in TOPP and then demonstrate the versatility of our system by using the components in some pipelines that can be used for complex proteomics analysis.


    2 TOPP COMPONENTS
The individual tools (components) of TOPP can be grouped into several distinct packages: import/export, signal processing, identification, quantitation and analysis (Fig. 1). We will now briefly discuss the major components of each area.


Fig. 1 The TOPP tools are grouped into five packages, each addressing a major area of functionality.

 
2.1 Import/export
File handling is important for all proteomics data analysis tools as there are at least two standard file formats and a lot of proprietary formats. The FileConverter converts several commonly used MS formats into each other. Supported formats are mzData (HUPO-PSI) (Orchard et al., 2006), mzXML (Sashimi project) (Pedrioli et al., 2004), ANDI/MS and several text-based formats.

The FileFilter extracts parts of a file such as specific types of spectra or data intervals. It allows the specification of a set of rules and extracts all data matching these rules from the input file. The filtering can be based on spectrum type, e.g. extract all MS/MS spectra from a combined MS-MS/MS run or on geometric criteria. In the latter case, it allows simple range operations on the data, such as the extraction of rectangles with respect to retention time (RT) and mass-to-charge (m/z) from an HPLC-MS run, or the extraction of a specific m/z region from a set of MS spectra. The FileInfo shows basic information about an MS data file, i.e. m/z range, RT range, intensity range and the type of the spectra in the file.
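The range operations described above can be illustrated with a minimal sketch. It assumes a flat list of peaks held in memory; the `Peak` record and the `filter_peaks` function are hypothetical names chosen for illustration and are not part of the FileFilter tool or of OpenMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    rt: float         # retention time in seconds
    mz: float         # mass-to-charge ratio in Th
    intensity: float  # ion count
    ms_level: int     # 1 for MS, 2 for MS/MS


def filter_peaks(peaks, rt_range=None, mz_range=None, ms_level=None):
    """Keep peaks inside the RT and m/z rectangle and, optionally, of one MS level."""
    kept = []
    for p in peaks:
        if rt_range is not None and not (rt_range[0] <= p.rt <= rt_range[1]):
            continue
        if mz_range is not None and not (mz_range[0] <= p.mz <= mz_range[1]):
            continue
        if ms_level is not None and p.ms_level != ms_level:
            continue
        kept.append(p)
    return kept


# Illustrative data only, not taken from a real run.
peaks = [
    Peak(850.0, 700.2, 1200.0, 1),
    Peak(1000.0, 650.5, 300.0, 1),
    Peak(1200.0, 1100.0, 900.0, 1),
    Peak(1300.0, 800.1, 450.0, 2),
]

# Geometric criterion: a rectangle in RT and m/z.
window = filter_peaks(peaks, rt_range=(900.0, 1600.0), mz_range=(600.0, 1000.0))

# Spectrum-type criterion: MS/MS data only.
msms = filter_peaks(peaks, ms_level=2)
```

Each criterion is optional, so the same function covers both the spectrum-type rule and the rectangle rule mentioned in the text.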

As the TOPP components support file-based operations only, the two auxiliary components DBImporter and DBExporter are provided for database connectivity. DBExporter exports experimental data from an OpenMS database to one or several files. After processing the data, DBImporter is used to store the results in the database again. Database connectivity is especially useful to distribute data for grid and workflow applications.

2.2 Signal processing
MS raw data are always disturbed by baseline fluctuations and two types of noise: chemical (colored) noise and random (white) noise. To improve the reliability of the data for further analysis steps, noise and baseline should be removed.

For noise reduction we implemented two different smoothing filters in the signal processing component NoiseFilter, which are a peak area preserving Gaussian low-pass filter and a Savitzky–Golay low-pass filter recommended for spectrometric data (Press et al., 2002; Savitzky and Golay, 1999).
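To make the idea of an area-preserving Gaussian low-pass filter concrete, the following sketch convolves a uniformly sampled signal with a Gaussian kernel whose weights sum to one. It is an illustration under simplifying assumptions (uniform sampling, zero padding at the borders, a kernel truncated at three standard deviations) and not the NoiseFilter implementation; the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(signal, sigma=1.0):
    """Smooth a uniformly sampled signal; samples outside the signal count as zero."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


# A narrow synthetic peak surrounded by zeros (illustrative values).
signal = [0.0] * 5 + [2.0, 10.0, 3.0] + [0.0] * 5
smoothed = gaussian_smooth(signal, sigma=1.0)
```

Because the kernel is normalized, the summed intensity of a peak that lies at least `radius` samples away from the borders is unchanged, while its height is reduced and its width increased.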

Baseline correction of raw MS data can be performed using the BaselineFilter component. For the baseline in MS experiments no universally accepted analytical expression exists. Therefore, we decided to implement a non-linear filter, known as the top-hat operator in mathematical morphology (Soille, 1999), as it does not depend on the underlying baseline shape.
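The top-hat operator can be sketched in a few lines: the signal is eroded (sliding minimum) and then dilated (sliding maximum) with a flat structuring element, and this morphological opening is subtracted from the signal. The code below is a plain illustration of the operator on a one-dimensional list, not the BaselineFilter code; the function names and the `half_width` parameter are assumptions.

```python
def erode(signal, half_width):
    """Sliding minimum over a window of 2 * half_width + 1 samples."""
    n = len(signal)
    return [min(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def dilate(signal, half_width):
    """Sliding maximum over a window of 2 * half_width + 1 samples."""
    n = len(signal)
    return [max(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def top_hat(signal, half_width):
    """Signal minus its morphological opening (erosion followed by dilation)."""
    opened = dilate(erode(signal, half_width), half_width)
    return [s - o for s, o in zip(signal, opened)]


# A single narrow peak on a constant baseline of 5 (illustrative values).
signal = [5.0] * 4 + [15.0] + [5.0] * 4
corrected = top_hat(signal, half_width=2)
```

The structuring element has to be wider than the peaks of interest; otherwise the opening follows the peaks themselves and they are removed together with the baseline.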

Many of the higher level analysis steps, such as identification, rely on precise information about mass spectrometric peaks. The PeakPicker component extracts this information, which means it converts the ‘raw’ ion count acquired by the mass spectrometer into peak lists. Our peak picking approach (Lange et al., 2006) uses the multi-scale nature of spectrometric data and first detects the mass peaks in the wavelet-transformed signal. Afterwards important peak parameters (centroid, area, height, signal-to-noise ratio, asymmetric peak shape) are extracted by fitting an asymmetric peak function to the raw data. In an optional third step, the resulting fit can be further improved by using techniques from non-linear optimization. In contrast to currently established techniques, our algorithm yields precise peak positions even for data with low resolution and is able to separate overlapping peaks of multiply charged peptides.

2.3 Identification
By sending MS/MS spectra to an identification tool one can determine the peptides present in a sample, which in turn can be used to identify the proteins. TOPP will provide a number of adapters for identification tools. So far MascotAdapter, an adapter for the database-driven search algorithm Mascot (Perkins et al., 1999), is available. Adapters for other popular search engines such as SEQUEST (Tabb et al., 2001) and InSpecT (Tanner et al., 2001) as well as an adapter to the de novo sequencing code LuteFisk (Taylor and Johnson, 2000) are under development.

These adapters facilitate the integration of identification tools by handling both their input and output. They transform a set of MS/MS spectra to the specific formats required for input. After analysis by the identification tool, the resulting output is parsed and converted to analysisXML, a Proteomics Standards Initiative [PSI, Orchard et al. (2003)] compliant format for identification. Since there is often more than one candidate peptide hit for a certain spectrum, we provide the IDFilter component to filter out relevant hits by different filter criteria. One such filter criterion is that the score of the peptide hit exceeds a significance threshold.

To further distinguish between correct and incorrect peptide hits, the retention time can be taken into account (Petritis et al., 2003). This is done by the RTModel and the RTPredict components. RTModel, which uses a support vector machine (Chang and Lin, 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm), is trained using high-quality pairs of peptides and retention times. Afterwards the model can be used in the RTPredict component to predict retention times for peptide hits. These predicted retention times can then be used to filter the peptide hits, since a large difference between measured and predicted retention time suggests a false peptide identification.
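The two filter criteria described in this section, a significance threshold on the score and a bound on the difference between measured and predicted retention time, can be combined as in the following sketch. The `PeptideHit` record, the parameter names and all values are hypothetical; the sketch also assumes that higher scores are better and that predicted retention times have already been computed by some model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    score: float         # search engine score, higher is better (assumption)
    measured_rt: float   # seconds
    predicted_rt: float  # seconds, from a previously trained model


def filter_hits(hits, score_threshold, max_rt_deviation):
    """Keep hits whose score exceeds the threshold and whose RT is plausible."""
    return [
        h for h in hits
        if h.score > score_threshold
        and abs(h.measured_rt - h.predicted_rt) <= max_rt_deviation
    ]


# Illustrative hits only; sequences and numbers are made up.
hits = [
    PeptideHit("PEPTIDEK", 55.0, 1200.0, 1180.0),  # good score, plausible RT
    PeptideHit("SAMPLER", 20.0, 900.0, 910.0),     # score below threshold
    PeptideHit("ANTHERK", 60.0, 1500.0, 700.0),    # RT far from prediction
]
accepted = filter_hits(hits, score_threshold=40.0, max_rt_deviation=60.0)
```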

2.4 Quantitation
HPLC-MS experiments produce a flood of data that is difficult to handle and analyze. It is necessary to reduce the raw instrument data to the essential features therein: the retention time, mass-to-charge ratio and intensity of each peptide (or any other component) eluting from the column. The transformation of raw instrument data to so-called feature maps reduces the data volume and improves the running time of further analysis steps (Gröpl et al., 2005). In addition, it yields valuable secondary information that is not immediately evident from the raw data, such as charge estimates of peptides. The idea of two-dimensional raw data maps and the concept of peptide features are not novel (Leptos et al., 2006; Radulovic et al., 2004) but have only recently started to emerge as a basis for quantitation and visualization of MS data.

Another important step, especially for differential quantitation, is the alignment of two maps. To calculate an alignment, matching features between the two maps are first determined. The list of matching features forms the basis for the computation of a suitable transformation between the two maps. The transformation is then applied to correct the coordinates of one of the maps such that both can be superimposed.

Initially a feature map is created from HPLC raw data by a run of the FeatureFinder component. It identifies the raw data points belonging to a feature and fits a two-dimensional model to the extracted region of the input data. The model is based on a bi-Gaussian elution profile along the retention time axis (or any other appropriate function) and a theoretical isotope pattern along the m/z axis. The output of the feature finder is a map of features, each identified by its RT and m/z coordinates and its intensity. Figure 2 shows a part of the raw data file and the features found in it. Figure 3 shows an example of how a feature model is adjusted to a (small) segment of the input data. The details of the algorithm have been described elsewhere (Gröpl et al., 2005; Mayr et al., 2006). Features can have annotations like the charge state and the quality of the model fit to the data.
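For readers unfamiliar with the term, a bi-Gaussian elution profile is commonly understood as a Gaussian with different widths on the two sides of the apex, which allows it to describe tailing peaks. The sketch below uses this common definition; the exact parameterization used by FeatureFinder is described in the cited papers and may differ, and the function and parameter names are assumptions.

```python
import math


def bi_gaussian(t, apex, height, sigma_left, sigma_right):
    """Asymmetric Gaussian: sigma_left before the apex, sigma_right after it."""
    sigma = sigma_left if t < apex else sigma_right
    return height * math.exp(-0.5 * ((t - apex) / sigma) ** 2)


# Illustrative parameters: apex at RT 1000 s, tailing towards later RT.
apex, height = 1000.0, 500.0
at_apex = bi_gaussian(apex, apex, height, sigma_left=5.0, sigma_right=10.0)
left = bi_gaussian(apex - 5.0, apex, height, sigma_left=5.0, sigma_right=10.0)
right = bi_gaussian(apex + 5.0, apex, height, sigma_left=5.0, sigma_right=10.0)
```

With `sigma_right` larger than `sigma_left`, the intensity five seconds after the apex is higher than five seconds before it, which is the tailing behavior the model is meant to capture.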


Fig. 2 Feature finding from a global perspective. A section of a LC/MS raw data map (left) and the features extracted by the FeatureFinder (right). Visualization was done using TOPPView.

 


Fig. 3 A small part of the raw data (left) and a model adjusted to it by FeatureFinder (right).

 
The UnlabeledMatcher's task is to match the features across maps. It takes two feature maps as input and creates a list of feature pairs. In its simplest form it will find all pairs of matching features according to some user-specified distance criteria.
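The simplest form of matching can be sketched as a brute-force search for all feature pairs that lie within user-specified tolerances in both dimensions. Features are represented here as plain `(rt, mz)` tuples, and the function name and tolerance parameters are hypothetical; this is not the UnlabeledMatcher code.

```python
def match_features(map_a, map_b, rt_tol, mz_tol):
    """Return index pairs (i, j) of features within both tolerances."""
    pairs = []
    for i, (rt_a, mz_a) in enumerate(map_a):
        for j, (rt_b, mz_b) in enumerate(map_b):
            if abs(rt_a - rt_b) <= rt_tol and abs(mz_a - mz_b) <= mz_tol:
                pairs.append((i, j))
    return pairs


# Two small illustrative feature maps given as (RT in s, m/z in Th).
map_a = [(1000.0, 650.30), (1200.0, 800.40)]
map_b = [(1003.0, 650.31), (1500.0, 800.40)]
pairs = match_features(map_a, map_b, rt_tol=5.0, mz_tol=0.05)
```

The running time is proportional to the product of the two map sizes, which is acceptable for a sketch; sorting one map by m/z would allow a faster lookup.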

Owing to calibration issues and changes in the experimental settings both mass-to-charge ratio and HPLC retention time can vary between two experiments. That is why features arising from the same species often need to be shifted along the RT or m/z dimension before matching. Thus we also developed a more sophisticated algorithm based on geometric hashing (Wolfson and Rigoutsos, 1997) to produce possible pairs of features after a suitable translation.

The transformation itself is computed by MapMapper. MapMapper takes as input a bijection between some features in the two maps. It then computes a dewarping function that corrects for coordinate shifts between the two maps. The dewarping function is typically a piecewise linear or quadratic function. Its application to the coordinates of the second map will move all features in this map as close as possible to the coordinates of their corresponding features in the first map.
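As an illustration of the simplest case, the sketch below estimates a single linear dewarping function for one coordinate (for example RT) by ordinary least squares from matched feature pairs and applies it to the second map. A single linear segment is an assumption made for brevity; the text mentions piecewise linear or quadratic functions, and the function names are hypothetical.

```python
def fit_linear(pairs):
    """Least-squares fit of first = slope * second + intercept.

    pairs: list of (coordinate in second map, coordinate in first map).
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two feature pairs are required")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("all coordinates in the second map are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dewarp(values, slope, intercept):
    """Apply the fitted function to coordinates of the second map."""
    return [slope * v + intercept for v in values]


# Illustrative RT pairs: the second map elutes 10 s earlier than the first.
rt_pairs = [(100.0, 110.0), (200.0, 210.0), (300.0, 310.0)]
slope, intercept = fit_linear(rt_pairs)
corrected = dewarp([150.0, 250.0], slope, intercept)
```

Separating the fit from its application mirrors the division of labor between estimating the transformation and applying it to a map.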

The transformation function is then applied to a map by MapDewarper. As described above, we decided to divide the process of MS sample alignment into three stages: computation of potential feature partners, estimation of a transformation of the feature coordinates and application of this transformation (dewarping). The reason for this decision was to achieve a high flexibility. This architecture allows us to use different computational approaches in each step and to improve the individual TOPP components independently from each other.

2.5 Analysis
Having preprocessed the data, one might want to perform further analyses. TOPP offers several components for this purpose. AdditiveSeries can be used to conduct an absolute quantitation of peptides. It uses the feature intensities as computed by the FeatureFinder application and evaluates the data of an additive measurement (Section 4.2).

TOPPView can be used for visual inspection of MS data. It is able to visualize one-dimensional spectra, two-dimensional maps and the results of our peak picking and feature finding algorithms. TOPPView supports all the file formats used by TOPP and can visualize data from OpenMS databases as well.

Finally, the tool MapStatistics computes a five-number summary of a feature map. This summary consists of median, minimum, maximum and the quartiles of the feature intensities and qualities in a map. These values provide a measure of location and spread, and they can be used to estimate the quality of the preprocessing steps. Feature maps with highly unusual statistics might be excluded from the further analysis workflow.
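A five-number summary can be computed with the Python standard library as sketched below. Note that several conventions exist for computing quartiles of small samples; `statistics.quantiles` uses its default "exclusive" method here, which is not necessarily the convention used by MapStatistics. The function name is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Illustrative feature intensities in arbitrary units.
intensities = [5.0, 1.0, 7.0, 3.0, 2.0, 6.0, 4.0]
summary = five_number_summary(intensities)
```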


    3 DESIGN AND IMPLEMENTATION
All the components of TOPP are designed to be versatile. They can be used both individually as command line tools and chained into linear or more complex pipelines. Chaining is done through makefiles or simple shell scripts, or by embedding the tools as components in workflow systems for distributed or GRID environments such as Taverna (Oinn et al., 2004).

To make the TOPP components easy to combine, we use only standard file formats such as mzData and analysisXML. This also facilitates the integration of external tools supporting standard formats. A pipeline-specific control file provides parameters to all components and directs the data flow between them. In the control file, a set of parameters can be provided for each individual invocation of a tool. For tasks which cannot be done with TOPP, wrapper components are provided to integrate commonly used applications. As an example, Mascot can be integrated to perform peptide identification by database search.

One of the design goals is user-friendliness. Hence, all TOPP components share a common base interface and provide a detailed description for all parameters. In addition, a full documentation of all components and examples are available on our website.

TOPP is based on OpenMS, an object-oriented software platform for shotgun proteomics. OpenMS makes extensive use of generic programming techniques in C++ and thereby provides fast execution of programs and portable code. It has been tested on different Linux platforms (e.g. Fedora Core 4, Scientific Linux 4 and SUSE Linux 9.0-10.1) using 32 bit and 64 bit architectures. OpenMS itself is based on several other open-source libraries such as Qt (Trolltech) and the GNU Scientific Library.

The TOPP components use OpenMS data structures and algorithms and provide a coherent interface for them. As TOPP shows, OpenMS can easily be used or extended in order to create new applications in the field of proteomics data analysis. It has already been used in other projects (Lange et al., 2006; Mayr et al., 2006) and its development is ongoing. All future extensions of OpenMS functionality will be turned into new TOPP components, making the pipeline more powerful. Like TOPP, OpenMS comes with Doxygen documentation of all the classes. In addition, a tutorial and several example applications offer a starting point for new users. OpenMS and TOPP are available as open-source software under the GNU Lesser General Public License (LGPL) from www.OpenMS.de.


    4 EXAMPLE APPLICATIONS
Using the TOPP components, one can easily set up simple, yet powerful proteomics workflows. In its simplest form, a TOPP workflow merely consists of a shell script calling the individual components in a well-defined order. The output of each component is passed on to the next component. All components obtain their settings from a common configuration file, which contains individual sections for each component. The settings can be passed through the command line as well, but this method is more error-prone. In this section we will present two examples of analyses that we successfully implemented and ran using TOPP.
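The structure described above, components called in a fixed order, each reading its own section of a shared configuration file and passing its output to the next, can be sketched as follows. Everything in this sketch is a stand-in: the stages are small in-process Python functions rather than TOPP command line tools, the section and key names are hypothetical, and INI syntax is assumed only for illustration because the format of the TOPP configuration file is not shown here.

```python
import configparser

# Hypothetical configuration with one section per pipeline stage.
CONFIG_TEXT = """
[threshold]
min_intensity = 10

[scale]
factor = 0.5
"""


def threshold(data, settings):
    """Stand-in stage: drop values below the configured minimum intensity."""
    cutoff = settings.getfloat("min_intensity")
    return [x for x in data if x >= cutoff]


def scale(data, settings):
    """Stand-in stage: multiply all values by the configured factor."""
    factor = settings.getfloat("factor")
    return [x * factor for x in data]


# The pipeline is an ordered list of (config section, stage function).
PIPELINE = [("threshold", threshold), ("scale", scale)]


def run_pipeline(data, config, pipeline=PIPELINE):
    """Feed the output of each stage into the next, one config section per stage."""
    for section, stage in pipeline:
        data = stage(data, config[section])
    return data


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
result = run_pipeline([4.0, 12.0, 40.0], config)
```

Keeping all settings in one file and the stage order in one list means that adapting the workflow to a new experiment only requires editing those two places.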

4.1 Simple tandem-MS identification pipeline
In this example pipeline the task is to identify all the peptides in an HPLC-MS/MS run. The starting point is the raw data exported from an MS machine (in one of the supported raw data formats). In the first step MS/MS spectra are extracted from the HPLC-MS file, as only these spectra are needed for the identification. In order to improve the quality of the data a noise filter is applied before reducing the raw data to stick spectra with the PeakPicker. Finally, Mascot is used to produce a list of peptide identification candidates for each spectrum. These candidates are then validated using the IDFilter. Unlikely identifications are removed in this step. The output of the pipeline consists of a list of identified peptides and the reliability of each identification in analysisXML format.

The shell script executing the pipeline is given below:

[Shell script listing not recoverable from this copy.]

The result of this particular workflow is an XML document adhering to the standards proposed by the PSI, which can be displayed in every standard web browser using appropriate style sheets. Except for the intermediary raw data files, all files produced in the pipeline are human-readable formats and—as far as possible—adhere to PSI standards.

We tested the pipeline on 1371 MS/MS spectra of a sample with five different proteins having a total of 234 possible tryptic peptides. These 1371 spectra led to 74 identified peptides, the majority of which could be assigned to the five proteins. The whole pipeline took ~12 min on a two processor AMD Opteron 250.

4.2 Absolute Quantitation
The goal of our second example workflow is to determine the concentration of myoglobin, a marker for myocardial infarction, in human serum. The experimental setup has already been described in detail (Mayr et al., 2006), as has the computational approach (Gröpl et al., 2005). In this section, we will therefore merely summarize the key concepts and then describe how we implemented this workflow in TOPP.

Myoglobin is a protein of low-molecular mass present in the cytosol of the cardiac and skeletal muscle. It quickly appears in blood after tissue injury and therefore represents a well-known biochemical marker for myocardial infarction. Nevertheless, results from different analytical procedures for myoglobin quantitation showed significant bias owing to a lack of assay standardization. The aim of the project described in (Mayr et al., 2006) was to develop a reference measurement procedure for myoglobin in human serum. Strong anion-exchange (SAX) chromatography was used to separate highly abundant serum proteins from the myoglobin fraction, which was then trypsinized and analyzed by reversed-phase high-performance liquid chromatography in combination with electrospray ionization mass spectrometry (RP-HPLC-ESI-MS).

To reduce the quantitation error, a constant amount of horse myoglobin was added as internal standard. Furthermore, an additive series was performed by adding known amounts of human myoglobin to aliquots of the sample. The absolute quantitation was conducted by determining the x-intercept of a linear regression using the ratio of the 11th tryptic peptide of human myoglobin (T11hu) and the 10th tryptic peptide of horse myoglobin (T10ho).
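The evaluation of an additive series can be illustrated with a short sketch. The measured ratio (here T11hu to T10ho) is regressed on the added amount of human myoglobin; the fitted line crosses the x-axis at the negative of the endogenous concentration, so the estimate is the intercept divided by the slope. The data below are synthetic and noise-free, chosen so that the result is exactly known; they are not the measurements of the study, and the function name is hypothetical.

```python
def standard_addition_estimate(added, ratios):
    """Estimate the endogenous concentration from an additive series.

    Fits ratio = slope * added + intercept by least squares and returns
    the magnitude of the x-intercept, i.e. intercept / slope.
    """
    n = len(added)
    if n < 2 or n != len(ratios):
        raise ValueError("need at least two matching (added, ratio) points")
    mean_x = sum(added) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    if sxx == 0.0:
        raise ValueError("added amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope
    return -x_intercept


# Synthetic series: true concentration 0.5, ratio = 2 * (0.5 + added).
added = [0.0, 0.25, 0.5, 1.0]
ratios = [1.0, 1.5, 2.0, 3.0]
estimate = standard_addition_estimate(added, ratios)
```

Using a ratio to the internal standard rather than the raw intensity makes the regression insensitive to run-to-run variations that affect both peptides equally.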

To speed up the computational analysis, all raw data files were truncated to a retention time range of 900–1600 s and an m/z range of 600–1000 Th. These ranges were chosen such that all myoglobin peptides were included in the truncated raw data maps. In addition, data points with very low intensities were filtered out using a threshold well below the noise level. No further preprocessing or peak picking was performed. We used our feature finding algorithm as it was described in Section 2.4 to identify and quantify the myoglobin peptides.

In contrast to the procedure described by Gröpl et al. (2005), the workflow was this time implemented in TOPP, which allows a fully automated execution of the analysis. Using the TOPP components described above we were able to perform the myoglobin quantitation by executing a simple shell script. Furthermore, we used the TOPP components to estimate the shift in retention time and m/z between the different LC/MS maps and to map the myoglobin features from different maps directly onto each other (see Section 2.4 for details). This allows a direct comparison of different samples and reduces the likelihood of errors.

The shell script executing the pipeline is given below. First, all maps are truncated and the FeatureFinder is executed on each dataset. In the second loop, all feature maps are mapped on one reference map in a star-like alignment. Each TOPP module reads its parameters from the file AddSeries.ini which holds a separate section for each dataset. The number of the current section is given by parameter -n. Executing this pipeline took <1 h for each map. Our feature detection algorithm found on average 300 features in each map. The size of the raw dataset was 258 MB after the truncation and the feature detection led to a reduction by 90% to 26 MB in total.

[Shell script listing not recoverable from this copy.]

The results of this experiment are shown in Fig. 4. Our quantitation gives an estimate of the absolute myoglobin concentration of 0.417 ng/µl (true value is 0.463 ng/µl) whereas a manual expert analysis yielded a result of 0.382 ng/µl.


Fig. 4 Regression results for the automated analysis of myoglobin in the serum samples.

 

    5 DISCUSSION AND CONCLUSION
We have presented TOPP (The OpenMS Proteomics Pipeline)—a set of practical tools that can easily be combined into proteomics pipelines. The TOPP components are based on OpenMS, the underlying C++ framework. Two standard proteomics workflows (a simple tandem-MS identification pipeline, and an absolute quantitation by an additive series) have been described in detail to show its functionality.

The development of software platforms for mass spectrometry based proteomics is currently a very active field of research. We believe that OpenMS, being an open-source project under the GNU Lesser General Public License, fills a gap between commercial software products supplied by vendors of mass spectrometers and academic software projects, which are often not very user-friendly and offer a much smaller range of functions than TOPP.

TOPP and OpenMS are being developed further in ongoing research projects. For example, we plan to provide a TOPP pipeline for the detection and analysis of feature pairs resulting from labeling techniques like ICAT. Another planned extension is to improve upon the current capabilities of peptide identification algorithms.


    Acknowledgments
 
The authors wish to thank Prof. Dr Christian Huber (Saarland University, Germany) for providing the experimental data used to test the example pipelines.


    REFERENCES

    Chang, C.-C. and Lin, C.-J. LIBSVM: a library for support vector machines, (2001) .

    Gröpl, C., Lange, E., Reinert, K., Kohlbacher, O., Sturm, M., Huber, C.G., Mayr, B.M., Klein, C.L. (2005) In Berthold, M. (Ed.). Algorithms for the automated absolute quantification of diagnostic markers in complex proteomics samples. Procceedings of CompLife 2005, Lecture Notes in BioinformaticsSpringer, Heidelberg, pp. 151–163.

    Han, D.K., et al. (2001) Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol, . 19, 946–951[CrossRef][Web of Science][Medline].

    Katajamaa, M., et al. (2006) MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. BMC Bioinformatics, 6, 634–636.

    Keller, A. (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem, . 74, 5383–5392[Medline].

    Keller, A., et al. (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol., R, . 1, E1–E8.

    Lange, E., Gröpl, C., Reinert, K., Kohlbacher, O., Hildebrandt, A. (2006) High accuracy peak-picking of proteomics data using wavelet techniques. Proceedings of PSB 2006World Scientific, Singapore, pp. 243–254.

    Leptos, K.C., et al. (2006) MapQuant: open-source software for large-scale protein quantification. Proteomics, 6, 1770–1782.

    Li, X.-J., et al. (2004) A tool to visualize and evaluate data obtained by liquid chromatography/electrospray ionization/mass spectrometry. Anal. Chem., 76, 3856–3860.

    Li, X.-J., et al. (2005) A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatography-mass spectrometry. Mol. Cell Proteomics, 4, 1328–1340.

    Li, X.-J., et al. (2003) Automated statistical analysis of protein abundance ratios from data generated by stable isotope dilution and tandem mass spectrometry. Anal. Chem., 75, 6648–6657.

    Mayr, B.M., et al. (2006) Absolute myoglobin quantitation in serum by combining two-dimensional liquid chromatography-electrospray ionization mass spectrometry and novel data analysis algorithms. J. Proteome Res., 5, 414–421.

    Nesvizhskii, A.I., et al. (2003) A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem., 75, 4646–4658.

    Oinn, T., et al. (2004) Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20, 3045–3054.

    Olson, S.A. (2002) EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Brief. Bioinform., 3, 87–91.

    Orchard, S., et al. (2003) The proteomics standards initiative. Proteomics, 3, 1374–1376.

    Orchard, S., et al. (2005) Workshop of the Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) Geneva, September 4–6. Proteomics, 6, 738–741.

    Palagi, P.M., et al. (2005) MSight: an image analysis software for liquid chromatography-mass spectrometry. Proteomics, 5, 2381–2384.

    Pedrioli, P.G., et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol., 22, 1459–1466.

    Perkins, D.N., et al. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20, 3551–3567.

    Petritis, K., et al. (2003) Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal. Chem., 75, 1039–1048.

    Press, W.H., et al. (2002) Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press.

    Radulovic, D., et al. (2004) Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Mol. Cell Proteomics, 3, 984–997.

    Samuelsson, J., et al. (2004) Modular, scriptable and automated analysis tools for high-throughput peptide mass fingerprinting. Bioinformatics, 20, 3628–3635.

    Savitzky, A. and Golay, M.J.E. (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem., 36, 1627–1639.

    Soille, P. (1999) Morphological Image Analysis. Springer-Verlag, Berlin and Heidelberg.

    Tabb, D.L., Eng, J.K., Yates, J.R. (2001) Proteome research: mass spectrometry. In James, P. (Ed.), Protein Identification by SEQUEST, Vol. 1. Springer, New York, pp. 125–142.

    Tanner, S., et al. (2005) InsPecT: fast and accurate identification of post-translationally modified peptides from tandem mass spectra. Anal. Chem., 77, 4626–4639.

    Taylor, J.A. and Johnson, R.S. (2000) Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. Anal. Chem., 73, 2594–2604.

    Wolfson, H.J. and Rigoutsos, I. (1997) Geometric hashing: an overview. IEEE Comput. Sci. Eng., 4, 10–21.

