Bioinformatics Advance Access originally published online on May 5, 2007
Bioinformatics 2007 23(13):1705-1707; doi:10.1093/bioinformatics/btm132
PQuad—a visual analysis platform for proteomic data exploration of microbial organisms
1Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA and 2Biological Sciences, Pacific Northwest National Laboratory, Richland, WA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Summary: The visual Platform for Proteomics Peptide and Protein data exploration (PQuad) is a multi-resolution environment that visually integrates genomic and proteomic data for prokaryotic systems, overlays categorical annotation and compares differential expression experiments. PQuad requires Java 1.5 and has been tested to run across different operating systems.
Availability: http://ncrr.pnl.gov/software
Contact: bobbie-jo.webb-robertson{at}pnl.gov
| 1 INTRODUCTION |
|---|
Technological advances have been fueling a revolution in biology, enabling analyses of entire systems at a global scale (e.g. whole cells, tumors, or environmental communities). The application of high-throughput (HTP) experimental methodologies to global profiling of proteins provides an essential component for understanding biology at a systems level. Given that approaches such as mass spectrometry (MS) can generate over 400 000 spectra per day, the size of the resulting data sets and their inherent noise make data mining challenging, especially in the traditional spreadsheet type of view.
PQuad is a software platform that enables visual exploration of large and complex proteomic datasets of microbial organisms (Havre et al., 2005) in a genomic context. Linked multi-resolution visualizations offer views of the data from the entire chromosome or plasmid down to the individual nucleotide and amino acid sequences.
| 2 VISUAL CAPABILITIES |
|---|
PQuad offers three key levels of resolution: (1) Genome View, (2) ORF View, and (3) Sequence View. Figure 1 displays these views on a data set for Salmonella typhimurium distributed with the software and other resources, available at http://www.proteomicsresource.org. The Genome View on the far left displays the complete DNA sequence of the chromosome. The DNA sequence is depicted as a single continuous gray line that wraps to fill the display area, with the defined ORFs highlighted in yellow and peptide identifications mapped onto the ORFs in blue. The Sequence View on the far right gives residue-specific information for a selected ORF and the possible six-frame translation. The center view is the ORF View, which depicts the double-stranded DNA as two black lines and represents proteins as bars positioned with respect to the six-frame translation. This view is typically the most interesting to biological users, because observing expression in relation to neighboring proteins in microbial organisms provides biologically relevant information, such as operon position. All three views are linked so that what is selected in one view is automatically propagated to the others.
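The six-frame translation referred to above can be illustrated with a minimal Python sketch. It is not PQuad's own code (PQuad is written in Java); the function names are hypothetical, and the sketch assumes the standard genetic code and an uppercase or lowercase DNA string containing only A, C, G and T (any other codon is rendered as `X`).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the organism's actual translation table is not stated in the text).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For the nine-base sequence `ATGGCCTAA`, frame `+1` reads ATG, GCC, TAA and frame `-1` reads the reverse complement `TTAGGCCAT`.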
Many sources of supplemental information that facilitate biological interpretation come in categorical form, such as sample condition, protein function or microarray expression (up or down regulated). PQuad offers the capability to map colors based on category definitions into the Genome and ORF Views. In the ORF View, for example, this would allow users to quickly identify if genes that are up-regulated in a microarray experiment are also expressed in the corresponding proteomic experiment. One of the most useful applications for categorical data integration is comparative proteomics to evaluate peptide/protein identification information from two or more different experimental conditions. Color is used to differentiate both peptide and protein expression across two experimental conditions.
| 3 DESIGN AND IMPLEMENTATION |
|---|
PQuad is built in Java™ and has been tested to run on several operating systems, including Windows, Mac OS X, Linux and Unix. It includes a data set creation module and standard warning and error messages associated with loading data. For example, when a data set is loaded into PQuad, a message immediately relays how many peptides matched defined ORFs. Additionally, to enable follow-up analysis, peptides can be selected for export to a file.
3.1 Data requirements
The software requires three very simple input files: (1) the chromosome or DNA sequence, (2) the ORF location information, and (3) the peptide identification information (sequence and ORF membership). Categorical data are user-defined and are supplied by appending them to the relevant file. Peptide conditions are limited to two, and are added to the peptide file. The ORF categories form a column in the ORF location file and are limited to 12, a typical number of colors that a human can differentiate.
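A minimal Python sketch of the constraints described above follows. The text does not specify the file layouts, so the sketch works on in-memory records; the record fields, function names and example identifiers are all hypothetical. Only the limits (at most two peptide conditions, at most 12 ORF categories) and the "peptides matched to defined ORFs" count come from the text.

```python
from dataclasses import dataclass

MAX_ORF_CATEGORIES = 12      # limit stated in the text
MAX_PEPTIDE_CONDITIONS = 2   # limit stated in the text


@dataclass(frozen=True)
class Orf:
    # Hypothetical record; the real ORF location file layout is not given.
    name: str
    start: int
    end: int
    strand: str
    category: str = ""


@dataclass(frozen=True)
class Peptide:
    # Hypothetical record: sequence, ORF membership, optional conditions.
    sequence: str
    orf_name: str
    conditions: frozenset = frozenset()


def check_limits(orfs, peptides):
    """Raise ValueError if the category or condition limits are exceeded."""
    categories = {o.category for o in orfs if o.category}
    if len(categories) > MAX_ORF_CATEGORIES:
        raise ValueError(f"{len(categories)} ORF categories; limit is 12")
    conditions = set().union(*(p.conditions for p in peptides))
    if len(conditions) > MAX_PEPTIDE_CONDITIONS:
        raise ValueError(f"{len(conditions)} peptide conditions; limit is 2")
    return categories, conditions


def count_matched_peptides(orfs, peptides):
    """Return (matched, total): peptides whose ORF is among the defined ORFs."""
    defined = {o.name for o in orfs}
    matched = sum(1 for p in peptides if p.orf_name in defined)
    return matched, len(peptides)
```

A loader built along these lines could report the `(matched, total)` pair to the user as soon as a data set is read, in the spirit of the load message described in Section 3.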
On a standard desktop PC (2 GB, 3.2 GHz) PQuad can easily load a large microbial genome, such as E. coli, at 5 Mb, 4500 genes and 100 000 peptides in 30 s. Although PQuad has the computational capability to load larger chromosomes or multiple concatenated chromosomes, splice joints would not be apparent and thus the visualization would be less straightforward to interpret.
3.2 Legends and menus
Each view offers pull-down menus that allow users to tailor the view to their needs. The pull-down menus offer two key capabilities: changing the resolution, and changing the contrast and coloring schemes. The user can modify the Genome and ORF Views to show different resolutions, such as one base-pair per pixel or 10 base-pairs per pixel. Menus also allow the color scheme for comparative studies to be changed by selecting whether peptides are shown in default coloring or colored by condition. The contrast between colors can be adjusted with a slider. Tabs give pedigree information associated with the loaded genome, such as size and resolution. Additionally, a legend that defines the color schemes is provided for the Genome and ORF Views (bottom right of Figure 1).
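The effect of the resolution setting on a wrapped display such as the Genome View can be sketched with simple integer arithmetic. This is an illustration only, assuming a zero-based base position and a fixed row width in pixels; the function names and parameters are hypothetical and do not describe PQuad's actual layout code.

```python
def genome_position_to_pixel(position, bp_per_pixel, row_width_px):
    """Map a zero-based base position to (row, column) in a wrapped line.

    bp_per_pixel is the resolution (e.g. 1 or 10 base-pairs per pixel);
    row_width_px is the assumed width of the display area in pixels.
    """
    if position < 0 or bp_per_pixel < 1 or row_width_px < 1:
        raise ValueError("position must be >= 0 and sizes must be >= 1")
    pixel_index = position // bp_per_pixel
    return divmod(pixel_index, row_width_px)


def rows_needed(genome_length, bp_per_pixel, row_width_px):
    """Number of wrapped rows required to draw the whole sequence."""
    pixels = -(-genome_length // bp_per_pixel)   # ceiling division
    return -(-pixels // row_width_px)
```

Under these assumptions, a 5 Mb sequence drawn at 10 base-pairs per pixel in a 1000-pixel-wide area occupies 500 rows, whereas at one base-pair per pixel it would need 5000.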
| 4 USE CASE SCENARIO |
|---|
We obtained experimental observations for S. typhimurium grown under standard laboratory (rich media) and virulence-inducing (acidic, magnesium-depleted minimal media) conditions (Adkins et al., 2006). Figure 1 illustrates a view of this data set that allows the user to quickly detect proteins expressed in the virulent condition, the non-virulent condition, or both. The peptides associated with the standard condition are colored light blue and the peptides associated with virulence are colored white. Peptides that are expressed in both conditions are colored red. The underlying proteins are then colored for easy identification of proteins expressed in only one condition: blue for the standard condition and green for the virulence condition (circled in white).
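The coloring rule described in this paragraph can be written down as a small Python sketch. The color names are taken from the text; the condition labels `"standard"` and `"virulence"`, the function names, and the choice to return `None` (meaning "leave the default color") for proteins seen in both conditions or in neither are assumptions made for illustration.

```python
# Colors as described in the text for the S. typhimurium example.
PEPTIDE_COLORS = {"standard": "light blue", "virulence": "white", "both": "red"}
PROTEIN_COLORS = {"standard": "blue", "virulence": "green"}


def condition_label(observed_in):
    """Classify a set of condition names as 'standard', 'virulence' or 'both'."""
    has_standard = "standard" in observed_in
    has_virulence = "virulence" in observed_in
    if has_standard and has_virulence:
        return "both"
    if has_standard:
        return "standard"
    if has_virulence:
        return "virulence"
    return None


def protein_color(peptide_condition_sets):
    """Color a protein only when all of its peptides come from one condition.

    Returns None (assumed here to mean default coloring) when the protein
    was observed in both conditions or has no peptides.
    """
    seen = set().union(*peptide_condition_sets)
    return PROTEIN_COLORS.get(condition_label(seen))
```

For example, a protein whose peptides were all identified under the virulence-inducing condition would be colored green, which is how condition-specific proteins stand out in the ORF View.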
Using the documented S. typhimurium genome, we identified a set of proteins present only in the virulent condition. These mapped back to pathogenicity island 2 (PI2), which has previously been linked to virulence in S. typhimurium (Unsworth and Holden, 2000). The increased presence of peptides from the virulence-mimicking preparation is readily evident from the PQuad visualization, especially for three proteins linked to type III secretion, a key process in the ability of S. typhimurium to survive in hostile environments.
| 5 CONCLUSIONS |
|---|
PQuad is a new visual analysis tool for proteomics that facilitates analysis of complex mixtures of proteins in multiple conditions for prokaryotic systems. Additionally, PQuad offers basic data integration capabilities by mapping categorical information onto the peptide and protein expression data. Development as an object-oriented application allows new visualizations to be added relatively easily.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank the laboratory of Richard Smith at the Pacific Northwest National Laboratory (PNNL) who provided the dataset herein generated through interagency agreement Y1-AI-4894-01 from the National Institute of Allergy and Infectious Diseases (NIH/DHHS). This work was supported through Laboratory Directed Research and Development at PNNL. PNNL is a multi-program national laboratory operated by Battelle for the U.S. Department of Energy under contract DE-AC05-76L01830.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Thomas Lengauer
Received on November 16, 2006; revised on March 29, 2007; accepted on March 30, 2007
| REFERENCES |
|---|
Adkins JN, et al. Analysis of the Salmonella typhimurium proteome through environmental response toward infectious conditions. Mol. Cell Proteomics (2006) 5:1450–1461.
Havre SL, et al. Enabling proteomics discovery through visual analysis. IEEE Eng. Med. Biol. Mag. (2005) 24:50–57.
Unsworth K, Holden D. Identification and analysis of bacterial virulence genes in vivo. Philos. Trans. R. Soc. Lond. B. Biol. Sci. (2000) 355:613–622.
