Bioinformatics Advance Access originally published online on April 20, 2008
Bioinformatics 2008 24(12):1426-1432; doi:10.1093/bioinformatics/btn197

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Comparative conservation analysis of the human mitotic phosphoproteome

Rainer Malik *, Erich A. Nigg and Roman Körner

Department of Cell Biology, Max Planck Institute of Biochemistry, Martinsried D-82152, Germany

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: A key challenge in phosphoproteomic studies is to distinguish functionally relevant phosphorylation sites from potentially ‘silent’ phosphorylation. Considering that relevant phosphorylation sites are expected to be better conserved during evolution than Serine, Threonine and Tyrosine (S/T/Y) residues overall, we asked whether this can be directly demonstrated through statistical analysis of a large experimental dataset.

Results: Analyzing phosphoproteomic data derived from the human mitotic spindle apparatus, we found that 95.2% of 1744 phosphorylation sites are conserved in at least one of six other vertebrate species. Using a new score, termed conservation Z-score (CZ-score), we demonstrate that phosphorylation sites are significantly better conserved than other S/T/Y sites, a conclusion validated for several kinase consensus motifs. Most importantly, phosphorylation sites with experimentally verified biological functions were significantly better conserved than other phosphorylation sites, indicating that analysis of evolutionary conservation may constitute a powerful basis for the development of improved phosphorylation site predictors.

Contact: malik@biochem.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 
Phosphorylation plays essential roles in nearly every cellular process (Pawson and Scott, 2005), and this is reflected by the presence of some 510 protein kinases and 150 protein phosphatases encoded in the human genome (Manning et al., 2002; Moorhead et al., 2007; Park et al., 2005). Up to 40% of all proteins are thought to be regulated by phosphorylation at Serine, Threonine or Tyrosine (S/T/Y) residues (Kreegipuu et al., 1999). Phosphorylation events are frequently dependent on each other; for example, priming phosphorylation events may be necessary to target a certain protein for phosphorylation by another kinase (Elia et al., 2003), and the activities of kinases are frequently regulated through phosphorylation by other kinases (Nolen et al., 2004). Furthermore, a single phosphorylation site may be targeted by several kinases and phosphatases. Thus, phosphorylation networks are highly complex and cannot be treated as simple three-component systems of kinases, substrates and phosphatases. Our knowledge about the function of protein regulation through phosphorylation has long been limited by experimental difficulties in detecting phosphorylation sites. Recently, however, this has changed dramatically with the adaptation of mass spectrometry (MS)-based high-throughput techniques for the analysis of phosphorylation sites (Aebersold and Mann, 2003).

Generally, techniques for the mapping of post-translational modifications are based on tandem mass spectrometry (MS/MS). This method, first introduced in the 1980s (Biemann, 1988), has been further developed in proteomics for the analysis of complex mixtures, thus allowing the identification of proteins in complex samples (Aebersold and Goodlett, 2001). For the large-scale analysis of phosphorylation sites, phosphopeptides are usually enriched by specific binding of the negatively charged phosphate group to positively charged immobilized metal (iron or gallium) affinity chromatography (IMAC) (Andersson and Porath, 1986) or titanium dioxide (TiO2) (Larsen et al., 2005; Pinkse et al., 2004) resins. Alternatively, strong cation exchange chromatography (SCX) may be used to enrich for phosphopeptides (Beausoleil et al., 2004).

As a consequence of the high interest in the elucidation of phosphorylation-based protein regulation and the improved analytical techniques, the number of identified phosphorylation sites has dramatically increased during the past few years and presently exceeds 10 000 sites (Beausoleil et al., 2004; Nousiainen et al., 2006; Olsen et al., 2006). However, the functional relevance of most of these modification sites remains elusive. For some proteins, dozens of phosphorylation sites have been detected (Molina et al., 2007; Nousiainen et al., 2006; Olsen et al., 2006), raising the question of whether all these phosphorylation sites relate to regulatory events. In fact, the possibility has long been considered that some phosphorylation sites may be ‘non-functional’ (or ‘silent’), essentially representing ‘noise in the system’ (Cohen, 1982). Even though this question is difficult to answer definitively, because many phosphorylation sites may be functional only under specific, not readily analyzable conditions, the existence of ‘silent’ phosphorylation events represents a serious possibility. Some support for this view comes from the observation that many experimentally determined phosphorylation sites are not well conserved across species. Functionally relevant sites would be expected to be highly conserved since their mutation (to non-phosphorylable amino acids) would constitute a handicap in evolutionary selection (Budovskaya et al., 2005). Therefore, the degree of conservation is a priori expected to reflect the functional relevance of any given phosphorylation site. This in turn raises the question of whether, and to what degree, experimentally identified phosphorylated sites can be demonstrated by statistical analysis to be better conserved than non-phosphorylated Serine, Threonine and Tyrosine residues. If so, the degree of conservation might be then used as a predictive tool to discriminate between sites with functional relevance and ‘silent’ sites. 

Large-scale investigations of the conservation of experimentally detected phosphorylation sites, however, are scarce. Recently, a study focused mainly on structural aspects of phosphorylation sites came to the unexpected conclusion that these are not significantly better conserved than other Serine, Threonine and Tyrosine residues (Jimenez et al., 2007). In contrast, two other studies pointed to a higher conservation of phosphorylated residues (Gnad et al., 2007; Macek et al., 2007).

Here we present an in-depth conservation analysis performed on an extensive dataset of phosphorylation sites identified in mitotic spindles isolated from cultured human cells (X. Li, manuscript in preparation; Nousiainen et al., 2006). Importantly, this analysis includes a novel statistical approach. In particular, since sequences flanking the phosphorylated amino acid are known to play a key role in substrate recognition by kinases (Pinna and Ruzzene, 1996), we have included consensus sequences in the calculation of conservation scores. Furthermore, we use a two-step alignment procedure and introduce the CZ-score, which reflects the relative conservation of phosphorylation sites in a statistically sound manner. Thus, we are able to perform comparative conservation analyses in a quantitative manner and thereby demonstrate a higher conservation of phosphorylation sites.


    2 METHODS
 
As stated above, the phosphorylation sites found on proteins of the human mitotic spindle (X. Li, manuscript in preparation; Nousiainen et al., 2006) formed the basis of our analysis. Before MS/MS, proteins were digested using proteases (e.g. trypsin), resulting in peptides, normally between 8 and 20 residues long. The full-length protein sequence and the phosphorylated residue were then automatically assigned to the identified peptide by the MS/MS software Mascot (Perkins et al., 1999). As the automatic assignment of phosphorylated residues is not always perfect, phosphorylation sites were checked manually to ensure the correct position of the phosphorylated residue. Detailed methods regarding spindle isolation and MS were published previously (Nousiainen et al., 2006).

Conservation of the human phosphorylation sites identified was analyzed in six other species, namely rat, mouse, cow, chicken, zebrafish and Xenopus (only vertebrate species were chosen to minimize ambiguity in the protein alignments). The sequence databases used were obtained freely, either via the International Protein Index (IPI) site (Kersey et al., 2004) or directly via NCBI. The following IPI versions were used: Human v3.31, Rat v3.31, Mouse v3.31, Cow v3.17, Chicken v3.25 and Zebrafish v3.30. For the Xenopus proteome, the keywords ‘Xenopus laevis [organism]’ were queried in the NCBI databases and all protein entries were retrieved. The following analysis of the data is heavily based on sequence comparison methods and statistical analysis tools.

2.1 BLAST
The Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) is probably the most widely used program in bioinformatics. In our analysis, we use the program for two distinct tasks.

First, a BLAST search was performed with the full-length human protein [as assigned by Mascot (Perkins et al., 1999), also see Section 2] against the proteome sequence database of the other organisms. In order to find conservation only in closely related proteins, we used standard BLAST parameters (e-value cutoff: 1e–5). The resulting proteins from the other organisms were used to build a novel database to produce the second, ‘local’ alignment. In this local alignment, the peptide which was originally found by MS/MS was used to query against the database produced from the global alignment. BLAST parameters were changed accordingly to make an alignment for a ‘short’ input sequence (e-value cutoff 100, wordsize 2, gap open –9, gap extend –1). A hit was deemed positive if the central phosphorylated residue from the original peptide was conserved. The alignment procedure is illustrated in Supplementary Figure S1.
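The acceptance criterion for a local hit (conservation of the central phosphorylated residue) can be illustrated with a minimal Python sketch. It assumes the local alignment is already available as two equal-length rows with `-` marking gaps; the function name, its arguments and the option to count Serine–Threonine exchanges as conserved are illustrative assumptions, not part of the published pipeline.

```python
def site_is_conserved(aligned_query, aligned_subject, site_index,
                      allow_st_exchange=True):
    """Check whether the residue aligned to the query phosphosite is conserved.

    aligned_query, aligned_subject: alignment rows of equal length ('-' = gap).
    site_index: 0-based position of the phosphorylated residue in the
                ungapped query peptide.
    allow_st_exchange: hypothetical switch; treats S<->T as conserved.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("alignment rows must have equal length")
    ungapped = -1  # index of the current residue in the ungapped query
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-":
            continue  # gap in the query: no query residue in this column
        ungapped += 1
        if ungapped == site_index:
            if s == q:
                return True
            return allow_st_exchange and {q, s} == {"S", "T"}
    return False  # the site lies outside the aligned region


# Example: the phosphoserine at query position 2 aligns to a threonine.
print(site_is_conserved("AK-SPEK", "AKRTPEK", 2))
```

Setting `allow_st_exchange=False` restricts the check to strictly identical residues.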

2.2 Alignment Z-score
Z-scores (also called standard scores) are well-established statistical tools. They are calculated by subtracting the mean of a population from a score and then dividing the difference by the SD of the population. Z represents the distance between the raw score and the population mean in units of the SD. Z is negative when the raw score is below the mean, positive when above.


$$Z = \frac{x - \mu}{\sigma} \qquad (1)$$

where x is the score to be standardized, µ is the population mean and σ is the SD of the population.

Booth et al. (2004) have published a program that automatically calculates the Z-score for any given alignment. It measures the distance (in SDs) between the score of the given alignment and the mean score of all other alignments that can be obtained by permutation of either sequence. The Z-score obtained can then be used as a false-positive filter to remove alignments which could have occurred by chance, and was thus used in our ‘local’ alignment search, as described in the previous section. Local alignments that pass this filter can therefore be regarded as significant rather than as chance matches. In other publications (Bacro and Comet, 2001), a Z-score cutoff of 7 has been described as being very significant; hence, this threshold was used to define very significant alignments. In addition, a second threshold of 4 was used to define fairly significant alignments.
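The idea behind a permutation-based alignment Z-score can be sketched in a few lines of Python. This is not the program of Booth et al. (2004): the scoring function (`identity_score`, a count of identical residues at matching positions), the number of shuffles and the fixed random seed are all stand-in assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def identity_score(a, b):
    """Stand-in alignment score: identical residues at matching positions."""
    return sum(1 for x, y in zip(a, b) if x == y)


def permutation_z_score(query, subject, n_shuffles=1000, seed=0,
                        score=identity_score):
    """Z-score of the observed score against scores of shuffled subjects."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = score(query, subject)
    residues = list(subject)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # permute the subject sequence in place
        shuffled_scores.append(score(query, "".join(residues)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.pstdev(shuffled_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (observed - mu) / sigma


z = permutation_z_score("MKSPEKRLSPTD", "MKSPEKRLSPTD")
print(z >= 7, z >= 4)  # compare against the two thresholds used in the text
```

A real implementation would re-align each shuffled sequence with the same scoring scheme as the original alignment rather than compare positions directly.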

2.3 Conservation score
To compare the conservation of different phosphorylation sites, we developed a new score that reflects conservation in a biologically meaningful way. Kinases interact with a recognition motif on the substrate that normally consists of 3–8 amino acids (Kobe et al., 2005); this motif should therefore also be reflected in the conservation score (C-score). To reflect the substrate binding properties of many kinases, we developed a position-specific scoring matrix (PSSM) that assigns scores to each position of the conserved peptide in the other species. Only peptides which were found below the e-value threshold of the local alignment and for which the central phosphorylated residue was found to be conserved in the local alignment procedure were taken into account.

Conserved residues at the +1, +2, +3, +4 and the –1, –2, –3 and –4 positions were assigned bonuses (Fig. 1) when the amino acids were either identical or similar to the ones in the query. Similar residues were grouped as follows: R-K, D-E, V-A-L-I and F-Y-W. This approach is similar to the substitution matrix BLOSUM62, except amino acids were only deemed similar if they also had similar charge states. A penalty (score of –3) was given if a gap was opened in the alignment [Equation (2)].


$$C = \sum_{i} s_i \qquad (2)$$

where C represents the resulting C-score, and s_i represents the score at position i according to the PSSM described above. Penalties were not assigned to Serine–Threonine exchanges, as functional in vivo differences between Serine and Threonine phosphorylation have not been demonstrated so far. By virtue of comprising the PSSM, the C-score reflects the overall conservation of the phosphorylation site and its surrounding amino acid stretch. Importantly, it can be compared for each phosphorylated or non-phosphorylated S/T/Y residue.
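The C-score calculation can be sketched as follows. The similarity groups and the gap penalty of –3 are taken from the text; the per-position weights in `HYPOTHETICAL_WEIGHTS` are placeholders, since the actual values of Figure 1 are not reproduced here. Two further simplifications are assumed: identical and similar residues receive the same bonus, and the penalty is applied to every gapped flanking position rather than once per gap opening.

```python
# Similarity groups as listed in the text.
SIMILAR_GROUPS = [set("RK"), set("DE"), set("VALI"), set("FYW")]
GAP_PENALTY = -3  # from the text

# Placeholder weights for positions -4..-1 and +1..+4 (NOT the Fig. 1 values).
HYPOTHETICAL_WEIGHTS = {-4: 1, -3: 2, -2: 3, -1: 4, 1: 4, 2: 3, 3: 2, 4: 1}


def residues_match(a, b):
    """True if two residues are identical or fall into the same group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def c_score(query, subject, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the position scores s_i over the flanking positions.

    query, subject: aligned 9-residue windows centred on the S/T/Y site
    (index 4), with '-' marking a gap.
    """
    if len(query) != 9 or len(subject) != 9:
        raise ValueError("windows must contain exactly 9 aligned positions")
    total = 0
    for offset, weight in weights.items():
        q, s = query[4 + offset], subject[4 + offset]
        if q == "-" or s == "-":
            total += GAP_PENALTY  # simplification: penalty per gapped position
        elif residues_match(q, s):
            total += weight
    return total


print(c_score("RRKLSPEKA", "KRKLTPDKA"))  # similar residues score like identical ones
```

The central residue itself is not scored in this sketch, so a Serine–Threonine exchange at the site carries no penalty, in line with the text.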


 
Fig. 1. Position-specific scoring matrix. Positive scores are given to the +1 to +4 and –1 to –4 positions, respectively. Gaps are reflected by a negative score (–3).

 
2.4 Conservation Z-score
Next, we aimed at assessing the relative conservation level of phosphorylation sites compared to non-phosphorylated S/T/Y residues of the same protein. To this end, we defined the conservation Z-score (CZ-score) which is based on the above described C- and Z-scores.


$$CZ = \frac{C - \mu(C_i)}{\sigma(C_i)} \qquad (3)$$

where CZ represents the resulting CZ-score for a specific S, T or Y residue, C represents the C-score to be normalized, C_i represents the C-scores of all of the respective amino acids (S, T or Y) of the same protein, µ is the population mean, and σ is the SD of the population. CZ-scores were calculated for all phosphorylated and non-phosphorylated S/T/Y residues in our proteomics dataset based on the local alignments. The mean and SDs were calculated from the population of all C-scores corresponding to the selected residue; for example, if the residue was a Serine, the C-scores from all conserved Serines were taken into account. CZ-scores were then calculated separately for all conserved phosphorylated and non-phosphorylated sites and plotted as CZ-score distributions (see Section 3).
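A minimal Python sketch of this per-protein normalization is given below. The input layout (tuples of protein identifier, residue type, C-score and phosphorylation flag) and the decision to skip groups with zero SD are assumptions made for illustration; the population SD is used because the text refers to the SD of the population.

```python
import statistics
from collections import defaultdict


def cz_scores(sites):
    """Normalize C-scores per protein and residue type.

    sites: iterable of (protein_id, residue, c_score, is_phospho) tuples.
    Returns a list of (protein_id, residue, is_phospho, cz) tuples.
    """
    sites = list(sites)
    groups = defaultdict(list)
    for protein, residue, c, _ in sites:
        groups[(protein, residue)].append(c)  # e.g. all Serines of one protein

    results = []
    for protein, residue, c, is_phospho in sites:
        population = groups[(protein, residue)]
        mu = statistics.mean(population)
        sigma = statistics.pstdev(population)
        if sigma == 0:
            continue  # CZ is undefined when all C-scores are equal
        results.append((protein, residue, is_phospho, (c - mu) / sigma))
    return results


example = [
    ("P1", "S", 10, False),
    ("P1", "S", 14, False),
    ("P1", "S", 18, True),  # the phosphorylated Serine
]
for row in cz_scores(example):
    print(row)
```

Because each group is standardized separately, the CZ-scores within one protein and residue type always sum to zero.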

By definition, the CZ-score distribution of all S/T/Y residues of each individual protein is centered on a CZ-score of 0, regardless of the level of conservation of this specific protein and the analyzed species. Hence, differences in the level of conservation of phosphorylated compared to non-phosphorylated S/T/Y residues can be detected at the level of individual proteins. The results therefore take into account that a well-conserved phosphorylation site in an almost perfectly conserved protein is less significant than a phosphorylation site that is much better conserved than the protein as a whole. Furthermore, by definition, CZ-scores of the closest homolog from different species are all centered on a CZ-score of 0, thus allowing for a direct comparison of these conservation scores from different species.

Regarding the set of non-phosphorylated S/T/Y residues used for comparison, one might be concerned that some of these residues constitute phosphorylation sites that escaped detection by MS and potentially falsify the results. Therefore, the maximal error introduced by such missed phosphorylation sites was estimated. There are ~100 000 phosphorylation sites in the human proteome (Zhang et al., 2002), compared to 2 300 000 S/T/Y residues (IPI human database). The error introduced by undetected phosphorylation sites therefore cannot be significant (< 2.3%), which supports the validity of our approach.

In summary, we conclude that using the Z-, C- and CZ-scores, we were able to assess the conservation of phosphorylated sites in a precise statistical way that up to now has not been presented.


    3 RESULTS
 
3.1 Alignment statistics
Out of the 1744 phosphorylation sites within the human mitotic phosphoproteome analyzed, 1660 (95.2%) were found to be conserved in at least one other species (rat, mouse, cow, chicken, zebrafish or Xenopus). In the remainder of the article, a site is called ‘conserved’ if it is conserved in at least one of the species analyzed. The conservation statistics of phosphorylation sites for each organism are shown in Table 1, together with the conservation of the respective non-phosphorylated S/T/Y residues. As expected, the highest degree of conservation of the human phosphorylation sites was found in mouse and rat, where a very high percentage of human phosphorylation sites is conserved. We first asked whether the alignment approach was legitimate. The alignment Z-score indicates whether a ‘local’ alignment could have been produced by chance. By applying strict Z-score filtering, potential false-positive matches in the local alignment can be removed, thus eliminating matches that are unlikely to be related by evolution. About 13% of local alignments had a Z-score of ≥7 and can thus be considered very significant, while 56% of local alignments fell into the category of fairly significant alignments. We therefore conclude that our alignment approach is legitimate. For the CZ-score distributions discussed below, only alignments with a Z-score of 4 or above were considered, resulting in a total of 1203 phosphorylated sites.



 
Table 1. Percentage of conserved phosphorylated sites between human and the indicated organism

 
3.2 CZ-score distributions
As the CZ-score is the most crucial parameter in our analysis, our highest priority was to investigate the relationship between phosphorylated residues and conserved non-phosphorylated S/T/Y residues in an alignment. First, a direct comparison between phosphorylated and non-phosphorylated residues was performed by binning the calculated CZ-scores (X-axis) and plotting them against the number of observations (Y-axis). This resulted in a distribution of all scores, which is depicted in Figure 2. The CZ-scores from non-phosphorylated S/T/Y residues form a normal-like distribution with a maximum around the zero-point of the plot, as expected. In contrast, the CZ-scores of phosphorylated residues form an almost normal distribution with the maximum shifted to the right, into the positive area of the plot. To test for statistical significance, a Student's t-test (unpaired, two-tailed) as well as a Mann–Whitney U-test were performed. Both tests showed a significant difference between the two distributions (t-test: P = 2.0 × 10^–176, U-test: P = 0.000158). Additionally, the median CZ-scores are –0.0893 for the non-phosphorylated and +0.1144 for the phosphorylated residues. From these data we conclude that phosphorylated residues found by MS analysis are significantly better conserved than non-phosphorylated S/T/Y amino acids. Throughout this article, a P-value <0.001 will be considered significant.
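The comparison of two CZ-score samples with a Mann–Whitney U-test can be sketched in pure Python. The version below is an illustrative assumption rather than the software used in the study: it uses the normal approximation for the P-value and applies no tie correction to the variance, so it is only appropriate for reasonably large samples. The sample values are made up.

```python
from statistics import NormalDist, median


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U-test (normal approximation, no tie correction).

    Returns (U statistic of sample_a, approximate two-sided P-value).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is a run of tied values
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Made-up CZ-scores for illustration only.
non_phospho = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.5]
phospho = [-0.2, 0.1, 0.3, 0.6, 0.9, 1.2]
u, p = mann_whitney_u(non_phospho, phospho)
print(median(non_phospho), median(phospho), u, p < 0.001)
```

The final comparison mirrors the significance threshold of P < 0.001 used throughout the article.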


 
Fig. 2. CZ-score distributions of phosphorylated versus non-phosphorylated residues in all organisms. (A) shows the distribution of CZ-scores of all non-phosphorylated residues. A normal distribution can be seen. (B) shows the distribution of CZ-scores of phosphorylated residues. A shift of the maximum into the positive region can be observed. Note that only the best hit for every phosphorylated residue was considered in both statistics.

 
In order to evaluate whether the observed higher conservation of phosphorylation sites is specific to our dataset or constitutes a general feature of phosphorylation sites, we calculated the CZ-score distributions for three other combined phosphoproteome datasets (Beausoleil et al., 2004; Rush et al., 2005; Zheng et al., 2005). Again, the median CZ-score is clearly shifted towards higher conservation (median CZ-score +0.1436), suggesting that the better conservation of S/T/Y sites in the human spindle phosphoproteome is no exception, but represents a common trend. The difference between the mitotic dataset (Fig. 2B) and the above-mentioned datasets is not significant at our P-value threshold (two-tailed t-test: P = 0.0027, U-test: P = 0.00561), although the median of the combined datasets is slightly higher, possibly due to the composition of the other datasets.

We also asked whether there were differences between Serine, Threonine and Tyrosine (S/T/Y) residues. The distributions for S/T/Y sites can be seen in Supplementary Figure S2. A significant difference in the distribution cannot be observed, thus the conservation of S and T residues can be regarded as the same. The sample size for Tyrosine (Y) residues, however, was too small (15 Tyrosine phosphorylation sites) in this study to draw sound conclusions.

Next, we examined whether the CZ-score distributions are significantly influenced by the high frequency of Proline at the +1 position of phosphorylation sites. In fact, pSP and pTP sites are the most common phosphorylation motifs in mitotic samples, often phosphorylated by cyclin-dependent kinase 1 (Cdk1), one of the key regulators of mitosis. A Proline residue at the +1 position has the potential to initiate a structural change in the protein, often exerting a disruptive influence on secondary structure elements. For this reason, Prolines might be better conserved than other amino acids in the course of evolution, and hence we considered the possibility that the C-score and CZ-score calculations might be skewed by this sequence feature. Thus, we calculated the CZ-score distributions for pSP and pTP. An increased degree of conservation of those sites compared to other motifs would result in a shift in the CZ-score to a different value. The PSSM was altered so that only a score of +5 was given for the +1 position if the residue was identical. Analyzing these results (data not shown), we did not notice a significant change in conservation, so we conclude that the high frequency of Prolines at the +1 position does not bias our approach.

3.3 Kinase consensus patterns
For most kinases, a kinase consensus motif can be defined. Adequate motifs are available for the most important mitotic kinases, namely Cdk1 (cyclin-dependent kinase 1), Plk1 (polo-like kinase 1), Aurora A and Aurora B. In addition to the analysis of the overall conservation, we analyzed the conservation of the subsets of phosphorylation sites carrying the corresponding consensus motifs. The motifs used in our search are summarized in Supplementary Table 1. As kinase consensus motifs can usually not be determined unambiguously, we chose to search with motifs using both ‘loose’ and ‘strict’ definitions. A ‘strict’ motif covers a larger sequence area, thus being more restrictive in the search. Interestingly, there was no visible difference in the CZ-score distributions between the loose and the strict motifs (data not shown). For Plk1, Aurora Kinase and Cdk1, we found that the vast majority of kinase phosphorylation patterns are conserved as well. Furthermore, the Polobox binding motif (Lowery et al., 2004) exhibits a high degree of conservation. This polo-like kinase specific domain facilitates interactions with phospho-epitopes, many of which are created by Cdk1, and targets Plk1 to substrates. An overview of the results of the conserved kinase consensus patterns is shown in Table 2.



 
Table 2. Percentages of conserved phosphorylation sites for selected kinase consensus patterns, given for loose and strict consensus motifs

 
Additionally, we plotted the CZ-score distributions calculated for conserved phosphorylation sites for the above-mentioned kinase consensus patterns. These can be seen in Supplementary Figure S3. Since there are no obvious differences in the distributions, we conclude that all kinase consensus patterns behave the same. Note that in this analysis only conserved kinase consensus patterns have been taken into account.

3.4 CZ-score distributions in different organisms
To extend our analysis of phosphorylation sites, we calculated the CZ-scores separately for the selected organisms in comparison to human. We would expect that closely related organisms show a more positive CZ-score distribution than more distantly related species, due to a better conservation of phosphorylation-mediated regulation. Indeed, the maxima of the CZ-values for zebrafish and also (to a lesser extent) for Xenopus and chicken were found to be clearly shifted towards the zero value in comparison to the maxima of the mammalian CZ-values. Figure 3 shows a comparison between zebrafish and rat. The median CZ-scores for rat, chicken, zebrafish and Xenopus are +0.130 (similar values for cow and mouse), +0.097, +0.078 and +0.075, respectively. Therefore, it seems reasonable to conclude that the conservation of most mitotic regulation mechanisms in mammals results in a higher conservation of phosphorylation sites.


 
Fig. 3. Distributions of CZ-scores for zebrafish and rat representing the highest and lowest CZ-score distribution-maxima. The CZ-score distribution for rat is shifted to positive values compared to zebrafish.

 
3.5 Validating the PSSM
Even though the PSSM adequately reflects the consensus patterns for many kinases, it cannot be equally well adapted to all different kinase specificities. Therefore, we asked to what extent the CZ-score distributions are qualitatively affected by alterations of the PSSM. To this end, we calculated CZ-scores for an inverse PSSM, defined by higher scores given to the ‘outer’ residues and lower scores to the ‘inner’ ones. As expected (Supplementary Fig. S4), the maximum of the CZ-score distribution is shifted slightly in comparison to the data obtained using the original PSSM (Fig. 2), but the overall picture remains the same. The non-phosphorylated S/T-residues show a normal distribution whereas the phosphorylated ones are still clearly shifted towards positive CZ-values (median CZ-score phosphorylated: +0.110, median CZ-score non-phosphorylated: –0.048; t-test: P = 1.1 × 10^–41, U-test: P = 4.6 × 10^–6). Hence, we can conclude that the PSSM does not bias the CZ-score distribution and that the CZ-score distribution is quite robust, regardless of the specific PSSM used for analysis.

3.6 Biological significance
Using high-throughput techniques, a large number of phosphorylation sites can be identified in a relatively short time. However, in most instances the biological function of the majority of these sites remains unclear. To test whether our approach can detect statistical differences in conservation between functionally validated phosphorylation sites and other phosphorylation sites, we derived a dataset comprising 139 functionally relevant phosphorylation sites from our laboratory and the PhosphoELM database (Diella et al., 2004). We then compared the CZ-scores of these functionally verified sites to those from largely uncharacterized MS-derived phosphorylation sites. Remarkably, the sites with established functional significance are clearly shifted towards higher CZ-scores (Fig. 4), resulting in median CZ-scores for the biologically significant sites of +0.1803, as compared to +0.1143 for all phosphorylated residues. These differences were statistically significant, as demonstrated by both t-test and Mann–Whitney U-test analyses (t-test: P = 6.4 × 10^–5, U-test: P = 1.5 × 10^–6). We conclude from these data that biologically significant residues are indeed better conserved than other sites, and that this can be demonstrated in a statistically sound manner.


 
Fig. 4. CZ-score distributions of functionally validated versus all detected phosphorylation sites. Biologically significant phosphorylation sites are slightly shifted towards positive CZ-values.

 
The above conclusion implies that datasets of experimentally determined phosphorylation sites almost certainly contain a significant fraction of ‘silent’ sites. If one considers the dataset analyzed here as a mixture of biologically significant and non-significant sites, the determined CZ-score distributions enable a rough estimate of the ratio of significant sites. Specifically, a combination of ~63% of biologically significant sites with 37% of ‘silent’ sites would yield the calculated CZ-score median (see Supplementary Methods and Discussion).
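The quoted ratio can be checked approximately with a back-of-the-envelope calculation. The actual procedure is described in the Supplementary Methods; the version here rests on two simplifying assumptions that are not stated in the main text: that ‘silent’ sites have a median CZ-score of 0 (like the per-protein background), and that the median of the mixture is roughly the weighted average of the component medians. With f denoting the fraction of biologically significant sites:

$$f \cdot 0.1803 + (1 - f) \cdot 0 \approx 0.1143 \quad\Rightarrow\quad f \approx \frac{0.1143}{0.1803} \approx 0.63$$

Under these assumptions, about 63% of the sites would be biologically significant and about 37% ‘silent’, in line with the figures given above.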


    4 DISCUSSION
 
We have presented an approach to efficiently assess and evaluate the conservation of phosphorylation sites from human mitotic spindle proteins and shown that these are considerably better conserved than non-phosphorylated S/T/Y residues. This contradicts the conclusion of Jimenez et al. (2007) concerning the conservation of phosphorylation sites from mouse and human. However, this former study was mainly focused on structural aspects and did not include an analysis of the conservation of sequences flanking the phosphorylated residue (reflecting kinase consensus patterns). Differences may therefore be explained by our improved sequence alignment methods and the use of the PSSM. Our findings are in agreement with recent publications on bacterial (Macek et al., 2007) and eukaryotic (Gnad et al., 2007) phosphoproteomes. Using a different dataset and statistical approach, the authors report that both phosphorylated proteins and phosphorylated residues of several species are better conserved than non-phosphorylated proteins/residues. The observation that phosphorylation sites are, on average, better conserved than non-phosphorylated Serine, Threonine and Tyrosine residues also has important implications for future research. In particular, our study suggests that a conservation analysis may improve the quality of existing phosphorylation site prediction algorithms. In addition, assuming that one can define PSSMs for additional kinase consensus motifs, it may become possible to investigate the behavior of many more kinases in comparison to the overall kinome.

Interestingly, our results also suggest that ‘biologically significant’ phosphorylation sites may be more conserved than uncharacterized ones. Taking advantage of the distinct CZ-score distributions of biologically significant and experimentally determined phosphorylation sites, we propose a strategy to estimate the proportion of biologically functional phosphorylation sites in phosphoproteomes. The non-functionality of a specific phosphorylation site cannot be proven experimentally, because it can never be excluded that the site is functional under conditions other than those tested. Indirect approaches, such as the one proposed here, may therefore be the only means to address how many of the observed phosphorylation sites constitute non-functional ‘noise’ due to the limited substrate specificities of kinases.

The precision of this strategy may currently still be limited by the relatively low number of high-quality functional in vivo studies of phosphorylation sites. Moreover, the PhosphoELM database used as a dataset for relevant phosphorylation sites in this study might still be ‘contaminated’ by non-significant sites. Finally, experimentalists may preferentially select well-conserved phosphorylation sites for functional biological analyses, thus biasing the statistical dataset. However, due to the central biological role of phosphorylation-based signaling, knowledge of the biological functions of individual phosphorylation sites is increasing rapidly, and so we believe that the precision of our proposed strategy can be strongly improved in future studies.


    ACKNOWLEDGEMENTS
 
We thank Thomas Gaitanos for proof-reading the manuscript.

Funding: This work was supported by ENFIN (contract number LSHG-CT-2005–518254), funded by the European Commission within its FP6 Program, under the thematic area ‘Life sciences, genomics and biotechnology for health’.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on February 6, 2008; revised on April 17, 2008; accepted on April 17, 2008

    REFERENCES
 

    Aebersold R, Goodlett DR. Mass spectrometry in proteomics. Chem. Rev (2001) 101:269–295.

    Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature (2003) 422:198–207.

    Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol (1990) 215:403–410.

    Andersson L, Porath J. Isolation of phosphoproteins by immobilized metal (Fe3+) affinity chromatography. Anal. Biochem (1986) 154:250–254.

    Bacro JN, Comet JP. Sequence alignment: an approximation law for the Z-value with applications to databank scanning. Comput. Chem (2001) 25:401–410.

    Beausoleil SA, et al. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl Acad. Sci. USA (2004) 101:12130–12135.

    Biemann K. Contributions of mass spectrometry to peptide and protein structure. Biomed. Environ. Mass Spectrom (1988) 16:99–111.

    Booth HS, et al. An efficient Z-score algorithm for assessing sequence alignments. J. Comput. Biol (2004) 11:616–625.

    Budovskaya YV, et al. An evolutionary proteomics approach identifies substrates of the cAMP-dependent protein kinase. Proc. Natl Acad. Sci. USA (2005) 102:13933–13938.

    Cheeseman IM, et al. Phospho-regulation of kinetochore-microtubule attachments by the Aurora kinase Ipl1p. Cell (2002) 111:163–172.

    Cohen P. The role of protein phosphorylation in neural and hormonal control of cellular activity. Nature (1982) 296:613–620.

    Diella F, et al. Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics (2004) 5:79.

    Elia AE, et al. The molecular basis for phosphodependent substrate targeting and regulation of Plks by the Polo-box domain. Cell (2003) 115:83–95.

    Endicott JA, et al. Cyclin-dependent kinases: inhibition and substrate recognition. Curr. Opin. Struct. Biol (1999) 9:738–744.

    Ferrari S, et al. Aurora-A site specificity: a study with synthetic peptide substrates. Biochem. J (2005) 390:293–302.

    Gnad F, et al. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol (2007) 8:R250.

    Honda R, et al. Exploring the functional interactions between Aurora B, INCENP, and survivin in mitosis. Mol. Biol. Cell (2003) 14:3325–3341.

    Jimenez JL, et al. A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database. Genome Biol (2007) 8:R90.

    Kersey PJ, et al. The International Protein Index: an integrated database for proteomics experiments. Proteomics (2004) 4:1985–1988.

    Kobe B, et al. Substrate specificity of protein kinases and computational prediction of substrates. Biochim. Biophys. Acta (2005) 1754:200–209.

    Kreegipuu A, et al. PhosphoBase, a database of phosphorylation sites: release 2.0. Nucleic Acids Res (1999) 27:237–239.

    Larsen MR, et al. Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol. Cell Proteomics (2005) 4:873–886.

    Lowery DM, et al. The Polo-box domain: a molecular integrator of mitotic kinase cascades and Polo-like kinase function. Cell Cycle (2004) 3:128–131.

    Macek B, et al. Phosphoproteome analysis of E. coli reveals evolutionary conservation of bacterial Ser/Thr/Tyr phosphorylation. Mol. Cell Proteomics (2008) 7:299–307.

    Manning G, et al. The protein kinase complement of the human genome. Science (2002) 298:1912–1934.

    Molina H, et al. Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry. Proc. Natl Acad. Sci. USA (2007) 104:2199–2204.

    Moorhead GB, et al. Emerging roles of nuclear protein phosphatases. Nat. Rev. Mol. Cell Biol (2007) 8:234–244.

    Nakajima H, et al. Identification of a consensus motif for Plk (Polo-like kinase) phosphorylation reveals Myt1 as a Plk1 substrate. J. Biol. Chem (2003) 278:25277–25280.

    Nolen B, et al. Regulation of protein kinases; controlling activity through activation segment conformation. Mol. Cell (2004) 15:661–675.

    Nousiainen M, et al. Phosphoproteome analysis of the human mitotic spindle. Proc. Natl Acad. Sci. USA (2006) 103:5391–5396.

    Olsen JV, et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell (2006) 127:635–648.

    Park J, et al. Building a human kinase gene repository: bioinformatics, molecular cloning, and functional validation. Proc. Natl Acad. Sci. USA (2005) 102:8114–8119.

    Pawson T, Scott JD. Protein phosphorylation in signaling: 50 years and counting. Trends Biochem. Sci (2005) 30:286–290.

    Perkins DN, et al. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis (1999) 20:3551–3567.

    Pinkse MW, et al. Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal. Chem (2004) 76:3935–3943.

    Pinna LA, Ruzzene M. How do protein kinases recognize their substrates? Biochim. Biophys. Acta (1996) 1314:191–225.

    Rush J, et al. Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat. Biotechnol (2005) 23:94–101.

    Songyang Z, et al. Use of an oriented peptide library to determine the optimal substrates of protein kinases. Curr. Biol (1994) 4:973–982.

    Zhang H, et al. Phosphoprotein analysis using antibodies broadly reactive against phosphorylated motifs. J. Biol. Chem (2002) 277:39379–39387.

    Zheng H. Phosphotyrosine proteomic study of interferon alpha signaling pathway using a combination of immunoprecipitation and immobilized metal affinity chromatography. Mol. Cell Proteomics (2005) 4:721–730.

