Bioinformatics Advance Access originally published online on January 10, 2006
Bioinformatics 2006 22(5):634-636; doi:10.1093/bioinformatics/btk039
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data

Mikko Katajamaa 1, Jarkko Miettinen 2 and Matej Oresic 2,*

1Turku Centre for Biotechnology Turku, Finland
2VTT Technical Research Centre of Finland Espoo, Finland

*To whom correspondence should be addressed.


    ABSTRACT

Summary: New methods are presented for processing and visualizing mass spectrometry based molecular profile data, implemented as part of the recently introduced MZmine software. They include support for the mzXML data format, batch processing of large numbers of files, support for parallel processing, new methods for calculating peak areas using a post-alignment peak picking algorithm, and implementations of Sammon's mapping and curvilinear distance analysis for data visualization and exploratory analysis.

Availability: MZmine is available under GNU Public license from http://mzmine.sourceforge.net/

Contact: matej.oresic{at}vtt.fi


    INTRODUCTION
Mass spectrometry coupled to liquid or gas chromatography, or capillary electrophoresis (LC/MS, GC/MS or CE/MS, respectively) is increasingly utilized for differential profiling of biological samples. The applications of such an approach can be found in domains of systems biology, functional genomics and biomarker discovery. One of the ongoing challenges of such molecular profiling approaches is the development of better data processing methods.

We have recently introduced a suite of tools for the processing of mass spectrometry based profile data (Katajamaa and Oresic, 2005). MZmine implements solutions for several stages of data processing, including input file manipulation, spectral filtering, peak detection, chromatographic alignment, normalization, visualization and data export. MZmine (version 0.55) is a stand-alone Java application requiring Java Runtime Environment 5.0 or higher. It is therefore platform-independent, and successful installations have been reported on systems running Linux, Windows and Mac OS X, utilizing the software to process data from a variety of LC/MS and GC/MS instruments.

In this paper we report new developments of the software: solutions for automated processing of large numbers of spectra, an enhanced secondary peak picking method, and an extension of the software to post-processing through two methods for non-linear mapping of high-dimensional profile data into two-dimensional space.


    IMPORT AND PROCESSING OF FILES
MZmine supports import of the NetCDF as well as mzXML (Pedrioli et al., 2004) raw data formats. New tools for manipulating the raw data files are available, including methods for noise reduction by filtering in the chromatographic direction, cropping the raw data range and removing scans by their width.

The stages of spectral data processing are sequential, and once the parameter values for a specific type of platform are known, the process can be automated. MZmine allows data processing to be set up as a batch process, and provides an option to store the processing parameters in template files that can be loaded for future runs on data from the same platform. In addition, the data processing can be set up to run on multiple processors, which is particularly useful for stages that are trivially parallelizable, such as peak picking.
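The batch, template and parallel-processing ideas can be illustrated with a minimal Python sketch. MZmine itself is a Java application and none of the names below come from its code base: `pick_peaks`, `save_template`, `load_template` and `run_batch` are hypothetical, the JSON template layout is an assumption, and the toy local-maximum peak picker merely stands in for a real one.

```python
import json
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def pick_peaks(intensities, min_height):
    """Toy peak picker: indices of local maxima at or above min_height."""
    return [
        i
        for i in range(1, len(intensities) - 1)
        if intensities[i] >= min_height
        and intensities[i - 1] < intensities[i] >= intensities[i + 1]
    ]


def save_template(path, params):
    """Store processing parameters so that a later run can reuse them."""
    with open(path, "w") as fh:
        json.dump(params, fh, indent=2)


def load_template(path):
    """Load processing parameters stored by save_template."""
    with open(path) as fh:
        return json.load(fh)


def run_batch(samples, params, executor_cls=ProcessPoolExecutor, workers=2):
    """Apply the same peak-picking parameters to every sample in parallel.

    samples maps a sample name to its list of intensities. Each sample is
    independent of the others, so the work is trivially parallelizable.
    """
    task = partial(pick_peaks, min_height=params["min_height"])
    names = sorted(samples)
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(task, [samples[name] for name in names]))
    return dict(zip(names, results))
```

With the default `ProcessPoolExecutor` the samples are spread over several processors; passing `executor_cls=ThreadPoolExecutor` runs the same batch with threads, which is convenient for quick checks.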


    ESTIMATION OF AREAS FOR MISSED PEAKS
Following peak detection and subsequent alignment, many of the peaks have no matches, or only a few, in other samples. There are several possible reasons for the misses: the peak may not be present in the sample; peak detection may have failed because of noisy raw data; or inaccurate parameter settings may have been used for the peak detection and chromatographic alignment methods. The gaps caused by missing peaks are often troublesome to handle in subsequent steps of the data analysis, and it is therefore worthwhile to return to the raw data and check again for the presence of corresponding peaks, based on the peaks detected in select samples.

We implemented a gap-filler method which estimates heights and areas for missed peaks. The method first searches for a local intensity maximum within a selected chromatographic region corresponding to the expected location of a missed peak, and uses this maximum as an estimate of the peak height. The peak area estimate is then calculated by moving from the maximum in both directions along the extracted ion chromatogram for as long as the peak curve is monotonically decreasing within pre-specified tolerance limits.
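The two steps of the gap filler can be sketched in Python as follows. This is an illustration under stated assumptions, not MZmine's actual implementation: the function name `fill_gap` is hypothetical, the tolerance is interpreted here as a relative allowance for small intensity increases while walking away from the maximum, and the area is integrated with the trapezoidal rule.

```python
def fill_gap(rt, intensity, rt_min, rt_max, tolerance=0.0):
    """Estimate height and area of a missed peak in an extracted ion chromatogram.

    rt and intensity are parallel lists ordered by retention time.
    rt_min and rt_max bound the region where the missed peak is expected.
    Returns (height, area), or None if no data point falls in the region.
    """
    window = [i for i, t in enumerate(rt) if rt_min <= t <= rt_max]
    if not window:
        return None

    # Step 1: the local intensity maximum in the region estimates the height.
    apex = max(window, key=lambda i: intensity[i])
    height = intensity[apex]

    # Step 2: walk away from the apex in both directions while the curve
    # keeps decreasing (allowing increases up to the relative tolerance).
    left = apex
    while (left > 0 and intensity[left] > 0
           and intensity[left - 1] <= intensity[left] * (1 + tolerance)):
        left -= 1
    right = apex
    while (right < len(rt) - 1 and intensity[right] > 0
           and intensity[right + 1] <= intensity[right] * (1 + tolerance)):
        right += 1

    # Trapezoidal integration between the two boundaries (an assumption).
    area = sum(
        0.5 * (intensity[i] + intensity[i + 1]) * (rt[i + 1] - rt[i])
        for i in range(left, right)
    )
    return height, area
```

A non-zero tolerance lets the walk pass over small noise bumps on the peak flank, at the cost of possibly merging a weak neighbouring peak into the area.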

The gap-filler method increases the number of low-intensity peaks included in data analysis (Fig. 1), and advances our ability to utilize the differential profiling for quantitative measurements of metabolites. As a limitation, the current alignment and gap-filler methods cannot distinguish different molecular species if present at the same retention time and m/z value.


Fig. 1 Comparison of peak heights and areas for two different aligned samples from the analysis on UPLC-MS (QTof Premier from Waters, Inc.). Each dot is a peak with a specific m/z value and retention time. Peaks found in primary peak picking are shown as black dots and those found after gap filling are white circles.

 
Data visualization
While a variety of excellent data analysis tools exist for software packages such as R (http://cran.r-project.org/) or Matlab (MathWorks, Inc), visualization capabilities embedded into MZmine that enable exploration of high-dimensional profile data facilitate quality control and first-pass data exploration.

We incorporated two methods, curvilinear distance analysis (CDA) (Lee et al., 2000) and Sammon's non-linear mapping (NLM) (Sammon Jr., 1969). Both try to preserve the distances between points when mapping from the original N-dimensional space to a lower dimensional projection space of dimension P (P being 2 in our case). Both use an iterative process to find the minimum of their respective error function. In the brief summary of the two methods, $d^{*}_{ij}$ will denote the distance between points i and j in the N-dimensional original space and $d_{ij}$ will denote the distance between the same points in the P-dimensional projection space.

Sammon's non-linear mapping
Sammon's NLM tries to minimize its error function E

$$E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}} \qquad (1)$$

by iterative steepest gradient descent. Its strengths include ease of implementation and use. On the other hand, it generally converges slowly and its error function is biased towards small distances.
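To make Sammon's error function concrete, the following Python sketch computes the stress and minimizes it by first-order gradient descent. It is a simplified illustration, not MZmine's implementation: the function names are hypothetical, the step-halving rule is an assumption added to keep the descent stable, and all input points are assumed to be distinct so that no original-space distance is zero.

```python
import math
import random


def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(orig_d, proj):
    """Sammon's error E for original distances orig_d and projection proj."""
    n = len(proj)
    total, scale = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_star = orig_d[i][j]
            d = math.dist(proj[i], proj[j])
            scale += d_star
            total += (d_star - d) ** 2 / d_star
    return total / scale


def sammon_gradient(orig_d, proj):
    """Gradient of E with respect to every projected coordinate."""
    n, p = len(proj), len(proj[0])
    scale = sum(orig_d[i][j] for i in range(n) for j in range(i + 1, n))
    grad = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = max(math.dist(proj[i], proj[j]), 1e-12)  # avoid division by zero
            coeff = -2.0 / scale * (orig_d[i][j] - d) / (orig_d[i][j] * d)
            for k in range(p):
                grad[i][k] += coeff * (proj[i][k] - proj[j][k])
    return grad


def sammon_map(points, dims=2, iterations=200, step=0.5, seed=0):
    """Project points to dims dimensions; returns (projection, final stress)."""
    rng = random.Random(seed)
    orig_d = pairwise(points)
    proj = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in points]
    stress = sammon_stress(orig_d, proj)
    for _ in range(iterations):
        grad = sammon_gradient(orig_d, proj)
        trial = [
            [y - step * g for y, g in zip(row, grad_row)]
            for row, grad_row in zip(proj, grad)
        ]
        trial_stress = sammon_stress(orig_d, trial)
        if trial_stress < stress:
            proj, stress = trial, trial_stress  # accept the downhill step
        else:
            step *= 0.5  # overshoot: retry with a smaller step
    return proj, stress
```

Each term of the stress is divided by the original distance, which is why errors on small distances weigh more heavily than equal errors on large ones.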

Curvilinear distance analysis
Unlike Sammon's NLM, CDA uses stochastic gradient descent to minimize its error function E

$$E = \sum_{i} \sum_{j} \left(\delta_{ij} - d_{ij}\right)^{2} F\left(d_{ij}, \lambda(k)\right) \qquad (2)$$

where $F(d_{ij}, \lambda(k))$ denotes the weight function and $\lambda(k)$ is the neighborhood radius. The initial parameters are the starting learning rate $\alpha_0$ and the starting neighborhood radius $\lambda_0$. CDA reduces its workload by quantizing the points in N-space to centroids, and then creates a graph in which every centroid is connected to a select number of other centroids. The distances from every centroid to every other centroid, called curvilinear distances and denoted by $\delta_{ij}$, are then calculated using Dijkstra's shortest path algorithm. The distances are thus calculated along the structures in N-dimensional space rather than through them, which makes the curvilinear distance a powerful metric for dimensionality reduction.
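The curvilinear distance computation can be sketched in Python as below. This is a simplified illustration rather than the CDA implementation in MZmine: the function names are hypothetical, the vector quantization step is skipped (the input points are treated as the centroids), and connecting each centroid to its k nearest neighbours is one assumed way of choosing "a select number of centroids".

```python
import heapq
import math


def knn_graph(centroids, k):
    """Undirected graph linking every centroid to its k nearest neighbours."""
    n = len(centroids)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(centroids[i], centroids[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def dijkstra(graph, source):
    """Shortest path length from source to every node of the graph."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def curvilinear_distances(centroids, k=2):
    """Matrix of graph distances between all pairs of centroids."""
    graph = knn_graph(centroids, k)
    n = len(centroids)
    rows = [dijkstra(graph, i) for i in range(n)]
    return [[rows[i][j] for j in range(n)] for i in range(n)]
```

For points laid out along a horseshoe, the two ends are close in Euclidean terms but far apart along the graph, which is the property that lets CDA unfold curved structures. Pairs of centroids in disconnected parts of the graph get an infinite distance in this sketch.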

A screenshot of MZmine that includes a CDA plot is shown in Figure 2.


Fig. 2 Screenshot of MZmine, based on lipidomic profiling of two cell lines (five samples each). Chromatograms of two samples are shown, along with the CDA plot of all 10 samples.

 

    CONCLUSIONS
The development of MZmine has been motivated by the need to create a software platform that enables easy incorporation of new algorithms and applications for data processing of mass spectrometry based molecular profile data.

Our current development areas are the implementation of new normalization algorithms, extension of the software to handle multiple spectra from the same sample (e.g. MS or MSn), and database connectivity.


    Acknowledgments
 
The authors thank Tuulikki Seppänen-Laakso and Tapani Suortti for performing most of the LC/MS analyses utilized during the MZmine development process. M.K. was funded by Academy of Finland SYSBIO Programme. M.O. was partially funded by EU Marie Curie International Reintegration Grant.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on November 25, 2005; revised on December 21, 2005; accepted on January 3, 2006

    REFERENCES

    Katajamaa, M. and Oresic, M. (2005) Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics, 6, 179.

    Lee, J.A., Lendasse, A., Donckers, N. and Verleysen, M. (2000) A robust nonlinear projection method. European Symposium on Artificial Neural Networks ESANN'2000, Bruges, Belgium, pp. 13–20.

    Pedrioli, P.G.A., et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol., 22, 1459–1466.

    Sammon, J.W., Jr. (1969) A nonlinear mapping for data structure analysis. IEEE Trans. Comput., C-18, 401–409.

