Bioinformatics Advance Access originally published online on January 23, 2008
Bioinformatics 2008 24(4):477-483; doi:10.1093/bioinformatics/btm616

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Precise score for the prediction of peptides cleaved by the proteasome

Ido Ginodi, Tal Vider-Shalit, Lea Tsaban and Yoram Louzoun*

Department of Mathematics and Statistics, Bar-Ilan University, Ramat-Gan, Israel, 52900

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: An 8–10mer can become a cytotoxic T lymphocyte epitope only if it is cleaved by the proteasome, transported by TAP and presented by MHC-I molecules. Thus most of the epitopes presented to cytotoxic T cells in the context of MHC-I molecules are products of intracellular proteasomal cleavage. These products are not random, as peptide production is a function of the precise sequence of the proteins processed by the proteasome.

Results: We have developed a score for the probability that a given peptide results from proteasomal cleavage. High scoring peptides are those that are cleaved in their extremities and not in their center, while low scoring peptides are either cleaved in their centers or not cleaved in their extremities. The current work differs from most previous works, in that it determines the production probability of an entire peptide, rather than trying to predict specific cleavage sites. We further present different score functions for the constitutive and the immunoproteasome. Our results were validated to have low error levels against multiple epitope databases. We provide here a novel computational tool and a website to use it—http://peptibase.cs.biu.ac.il/PepCleave_II/ to assess the probability that a given peptide indeed results from proteasomal cleavage.

Contact: louzouy{at}math.biu.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
Proteins are degraded into short peptides during cell activity. These peptides play numerous roles, including presentation to cytotoxic T lymphocytes (CTL) (Rock and Goldberg, 1999). Two major paths of intracellular protein degradation exist within eukaryotic cells: the lysosomal pathway and the ubiquitin-proteasome pathway (Hershko and Ciechanover, 1992). Exogenous proteins (e.g. those entering the cell via endocytosis) as well as membrane proteins are usually degraded through the lysosomal pathway, while most of the proteins in the cytoplasm are degraded by the ubiquitin-proteasome system. In mammalian cells, 80–90% of the proteins are subject to proteasomal degradation (Gronostajski et al., 1985; Lee and Goldberg, 1998). The proteasome processes aged, damaged or no longer needed cytosolic proteins, breaking them into peptides of 3–20 amino acids (AA). In most cases, the degraded proteins are first tagged with ubiquitin (Ciechanover, 1998; Coux et al., 1996). Many of the resulting peptides are further degraded by aminopeptidases and other cellular mechanisms, such as nibbling (Kisselev et al., 1999), mainly at the N-terminus (Beninga et al., 1998). Some of these short peptides (mainly those 8–10 AA long) are subsequently translocated by the transporter associated with antigen processing (TAP) channels to the endoplasmic reticulum (ER), where a small fraction of them is loaded on major histocompatibility complex class-I molecules (MHC-I) and presented to CTL (Rammensee et al., 1993). Proteasomal proteolysis seems to take place in the cell nucleus as well (Rockel et al., 2005).

The proteasome is constructed as a complex of a core enzymatic chamber, the 20s proteasome, and a ‘cap’ of regulatory particles (RP). The 20s proteasome in its basic form (sometimes referred to as a ‘constitutive’ proteasome) is a 700 kDa complex composed of two copies of 14 subunits, organized in four stacked heptameric rings. Out of the 14 subunits, only 3 have an active site. Following secretion of interferon-γ (IFN-γ) or tumor necrosis factor-α (TNF-α), the active subunits of the 20s proteasome (β1 [δ, Y], β2 [MC14, LMP9, Z] and β5 [MB1, X]) are replaced by their immune counterparts (β1i [LMP2], β2i [MECL-1] and β5i [LMP7], respectively). IFN-γ may also induce another structural change: the increased expression of the 11s regulator (also known as PA28, REG) (Groettrup et al., 1995; Rock and Goldberg, 1999; Tanaka and Kasahara, 1998). During an immune response (e.g. an anti-viral or anti-bacterial response), the majority of the cytosolic proteasomes are transformed into their immune form (Khan et al., 2001). 20s proteasomes endowed with one or two RP are called 26s proteasomes (Coux et al., 1996).

Different types of proteasomes are known to produce different cleavage repertoires (Beninga et al., 1998; Chen et al., 2001). The immunoproteasome has been shown to be more specific than the constitutive form. This specificity fits its known involvement in the generation of antigenic peptides presented to CTL (Kesmir et al., 2003). 9mers originating from proteasomal cleavage are the main source for epitopes presented to the immune system in the context of MHC class-I (for a review see Koopmann et al., 1997). The remaining epitopes are either proteasomal cleavage products that are further trimmed, or epitopes resulting from the cross talk between the lysosomal and proteasome pathways (Monu and Trombetta, 2007). In order to predict which peptides can eventually become CTL epitopes, and to understand the specificity of the proteasomal cleavage process, we built a score for the probability that a peptide is indeed a cleavage product. Many cleavage prediction algorithms have already been developed (e.g. Kesmir et al., 2002; Nussbaum, 2001; Peters et al., 2002; for a comparative review see Saxova et al., 2003). These algorithms focused on the detection of cleavage motifs within a given protein. Their basic working assumption is the existence of such cleavage sites. The prediction is based on the combination of the two AA directly flanking each suspected cleavage site. However, proteasomal cleavage is a rate dependent process (Peters et al., 2002), and the production of a peptide is not solely determined by ‘cleavage sites’. For example, assume three consecutive cleavage sites: A, B and C. Often the peptide A-B is produced, as well as the peptide A-C, but not the peptide B-C. If one were to predict peptides based only on cleavage sites, the peptides A-B and B-C would be predicted and the peptide A-C would not. Thus, in order to detect MHC-I epitope candidates, the cleavage process has to be treated as stochastic. Furthermore, the probability that a given peptide is produced should be based on its length and the configuration of AAs within the chain, and not on each pair of AA or the region surrounding a cleavage site as proposed in the FragPredict algorithm (Holzhutter et al., 1999). We define a peptide as a cleavage product if, with a high probability, it is cleaved at its extremities and not cleaved in its center. Conversely, a peptide is not a cleavage product if it is either cleaved in its center or not cleaved at one or both of its extremities. Note that since the cleavage process is rate dependent, even peptides with a low production probability may appear in practice, but those should be rare.

We introduce a linear score function, assuming a linear contribution of each AA in the peptide and in its two flanking regions to the total score (that is, an exponential contribution to the production probability, as is further explained in the results section). Altuvia and Margalit (2000) have shown that only the first flanking positions of the cleavage site make a significant contribution to the cleavage probability. We have therefore only used the first flanking AA. Numerical values for the parameters are estimated using a Simulated Annealing (SA) process. The different specificities of each proteasome yield different scores for the constitutive proteasome and the immunoproteasome. The proposed cleavage score can be accessed at http://peptibase.cs.biu.ac.il/PepCleave_II/.


    2 METHODS
2.1 Score function
We denote each peptide by $F_N - P_1 P_2 \ldots P_n - F_C$, where $F_N$, $F_C$ are the flanking positions to its N, C termini, respectively, and the $P_i$ are the internal residues. Assuming that each position has an independent contribution, we can roughly write:


$$P(\text{pep}) \approx \prod_{x} p_x$$

where the positions $x$ are $F_N, P_1, \ldots, P_n$ and $F_C$. We can write

$$p_x = e^{S_x}$$

and obtain

$$\log P(\text{pep}) \approx \sum_{x} S_x$$

Thus a linear score can represent a combination of independent processes. Given a peptide, we thus define a linear score as:


$$S_{cleave}(\text{pep}) = S_1(F_N) + S_2(P_1) + \sum_{i=2}^{n-1} S_3(P_i) + S_4(P_n) + S_5(F_C)$$

A peptide with a high $S_{cleave}(\text{pep})$ has a high probability of being produced, while a peptide with a low $S_{cleave}(\text{pep})$ has a low production probability. This score can be roughly treated as the logarithm of the production probability of a given peptide.
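
The linear score described above can be sketched in a few lines of Python. This is only an illustration: the weight tables are random placeholders (the learnt values are in the Supplementary Tables), the names `cleavage_score` and `S` are hypothetical, and skipping a flanking term when the flanking residue is unknown is an assumption rather than a documented behaviour of the published tool.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder weight tables, one value per amino acid for each of the five
# components S1..S5. These are NOT the learnt values from the paper.
_rng = random.Random(0)
S = {k: {aa: _rng.uniform(-0.5, 0.5) for aa in AMINO_ACIDS} for k in range(1, 6)}

def cleavage_score(peptide, flank_n=None, flank_c=None, weights=S):
    """Linear score: S1(FN) + S2(P1) + sum of S3(Pi) + S4(Pn) + S5(FC)."""
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    # First and last residues use their own components (S2 and S4).
    score = weights[2][peptide[0]] + weights[4][peptide[-1]]
    # All residues strictly inside the peptide share the component S3.
    score += sum(weights[3][aa] for aa in peptide[1:-1])
    # Assumption: a flanking term is skipped when the residue is unknown.
    if flank_n is not None:
        score += weights[1][flank_n]
    if flank_c is not None:
        score += weights[5][flank_c]
    return score
```

Because the score is a sum of per-position terms, it behaves like a log-probability: adding a residue adds one term, which corresponds to multiplying the production probability by one factor.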

The values of the $S_i$ were learnt from in vitro cleavage experiments, rather than from presented CTL epitopes. We allowed different values for S1, S2, S4 and S5 to account for conformation effects. It seems reasonable not to assume identical cleavage behavior in the center and at the extremities of a peptide, since proteasomal cleavage was shown to be a length-position-rate dependent process.

2.2 Learning set
Products of proteasomal cleavage were extracted from reports on in vitro studies (mainly from Boes et al., 1994; Eggers et al., 1995; Emmerich et al., 2000; Groettrup et al., 1995; Leibovitz et al., 1994; Nandi et al., 1997; Niedermann et al., 1997; Niedermann et al., 1996; Rivett, 1985; Tenzer et al., 2004; Toes et al., 2001; Wenzel et al., 1994). Duplicates were removed. The data were separated according to the cleaving proteasome type (if reported). The resulting peptides constitute the positive-data learning sets.

The negative learning sets include peptides that were randomized according to the PROWL AA distribution (PROWL). Random AAs were generated using a generic randomization algorithm (Matsumoto and Kurita, 1994). Peptide lengths were distributed uniformly between 6 and 13 AAs. A description of the data sets is available in Supplementary Table 1.
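
A minimal sketch of how such a negative set could be generated is given below. The background frequencies are a uniform placeholder, since the PROWL distribution values are not reproduced in this text, and the generator is Python's built-in `random` module rather than the randomization algorithm cited above. The names `random_peptides` and `BACKGROUND` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder background frequencies (uniform). The paper draws residues
# from the PROWL amino-acid distribution instead.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

def random_peptides(n, min_len=6, max_len=13, freqs=BACKGROUND, seed=0):
    """Draw n peptides whose lengths are uniform on [min_len, max_len]."""
    rng = random.Random(seed)
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    peptides = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)  # both bounds are inclusive
        peptides.append("".join(rng.choices(residues, weights=weights, k=length)))
    return peptides
```

Passing a different `freqs` dictionary is enough to switch to a measured amino-acid distribution.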

2.3 Learning algorithm for the values of S1–S5
Optimal values for S1–S5 were learnt using a Simulated Annealing process (Kirkpatrick et al., 1983). The initial configuration of the learnt variables (20 AA × 5 components = 100 variables) was randomized with initial values between –0.5 and 0.5. The initial temperature was set to $T_0 = 15$, and was decreased exponentially ($T_n = \lambda^n T_0$, $\lambda = 0.9$) in constant intervals. Each mutation was a random change of a magnitude of up to 0.01. The probability of an uphill movement at temperature T (i.e. accepting a new configuration that is not better than the current one) was set as:


Formula

The score was normalized after each iteration by a member-wise division by a factor of Formula

The evaluation function for the process was chosen to be,


Formula

where pos+ and neg− are the numbers of peptides from the positive and negative training sets, respectively, which the algorithm has managed to learn. The coefficients G and B (both are non-negative integers) enable one to overweight one group or the other. The ratio [G : B] was varied in different learning processes.

The process was executed several times to ensure that the minimum attained could not be significantly improved. We have executed the learning process separately for the constitutive proteasome and for the immunoproteasome, according to the considerations introduced above. The characteristics of the processes are shown in Supplementary Table 2. Finally, we reduced the parameter values to the lowest absolute values that do not affect the success rate over the training sets.
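
The learning procedure can be illustrated with the following sketch. It reuses the stated settings (100 variables initialized in [–0.5, 0.5], T0 = 15, λ = 0.9, mutations of up to 0.01), but two ingredients are assumptions, because the corresponding equations are not available in this text: uphill moves are accepted with the standard Metropolis probability exp(–Δ/T), and the evaluation function is taken to be a G- and B-weighted count of misclassified peptides. The normalization step and the final parameter reduction are omitted, flanking terms are left out for brevity, and all function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score(weights, pep):
    # Internal terms only (S2, S3, S4); flanking terms omitted for brevity.
    return (weights[(2, pep[0])] + weights[(4, pep[-1])]
            + sum(weights[(3, aa)] for aa in pep[1:-1]))

def cost(weights, positives, negatives, G=1, B=1):
    # Assumed objective: weighted number of misclassified peptides, where a
    # peptide is predicted "produced" when its score is positive.
    missed_pos = sum(score(weights, p) <= 0 for p in positives)
    missed_neg = sum(score(weights, p) > 0 for p in negatives)
    return G * missed_pos + B * missed_neg

def anneal(positives, negatives, steps=2000, interval=100,
           t0=15.0, lam=0.9, max_step=0.01, seed=0):
    rng = random.Random(seed)
    keys = [(k, aa) for k in range(1, 6) for aa in AMINO_ACIDS]  # 100 variables
    current = {key: rng.uniform(-0.5, 0.5) for key in keys}
    current_cost = initial_cost = cost(current, positives, negatives)
    best, best_cost = dict(current), current_cost
    for step in range(steps):
        temperature = t0 * lam ** (step // interval)  # T_n = lambda^n * T_0
        key = rng.choice(keys)
        delta = rng.uniform(-max_step, max_step)
        current[key] += delta
        new_cost = cost(current, positives, negatives)
        worse_by = new_cost - current_cost
        # Assumed Metropolis rule: always accept improvements, accept
        # uphill moves with probability exp(-worse_by / temperature).
        if worse_by <= 0 or rng.random() < math.exp(-worse_by / temperature):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[key] -= delta  # rejected: undo the mutation
    return best, best_cost, initial_cost
```

Keeping the best configuration seen so far, rather than the last one, guarantees that the returned cost is never worse than that of the random starting point.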

2.4 Validation
Five validation sets were used (Supplementary Table 1, lower section). The positive validation sets contain peptides from the SYFPEITHI (Rammensee et al., 1999) and IEDB (Peters et al., 2005; Sette et al., 2005) databases. The first two positive validation sets were all naturally processed epitopes within the SYFPEITHI database, and MHC eluted epitopes in the IEDB database. In order to produce the third positive validation set, we ran BLAST with the SYFPEITHI human epitopes against all the genes in the ENSEMBL human genome to locate their origin, and chose only the epitopes with a unique alignment. We extracted the flanking positions of these epitopes, and used the set of epitopes together with their flanking regions as a validation for the flanking scores. Note that most MHC eluted epitopes in the IEDB database had no clear curated source. This database may thus be less reliable.
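
The flank-extraction step for the third positive validation set can be sketched as follows. Exact substring matching is used here as a simple stand-in for the BLAST search described above, and the function name `find_flanks` and the dictionary layout of `proteome` are hypothetical.

```python
def find_flanks(epitope, proteome):
    """Return (flank_n, flank_c) if the epitope occurs exactly once, else None.

    proteome maps a protein identifier to its amino-acid sequence. A flank
    that falls outside the protein (epitope at a terminus) is given as None.
    """
    hits = []
    for name, seq in proteome.items():
        start = seq.find(epitope)
        while start != -1:
            hits.append((name, start))
            start = seq.find(epitope, start + 1)
    if len(hits) != 1:
        return None  # epitope absent, or its origin is ambiguous
    name, start = hits[0]
    seq = proteome[name]
    end = start + len(epitope)
    flank_n = seq[start - 1] if start > 0 else None
    flank_c = seq[end] if end < len(seq) else None
    return flank_n, flank_c
```

Discarding epitopes with more than one match mirrors the requirement of a unique alignment: when the source protein is ambiguous, the flanking residues are ambiguous too.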

The negative validation set was composed of 10^6 peptides randomized like those of the negative learning sets. The lengths of these peptides were uniformly distributed between 6 and 13 AAs. The last negative validation set was established by randomizing 10^6 peptides (with the same AA distribution), restricting lengths to 8–9 AA (as in the SYFPEITHI set). Peptides from the validation sets were not included in the training sets. For each validation set, we assessed each score using measures of sensitivity (representing a score's ability to indicate cleavage), specificity (representing the ability to indicate non-cleaved peptides) and a correlation coefficient (CC) calculated as in Saxova et al. (2003):


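$$CC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FN)(TN + FP)(TP + FP)(TN + FN)}}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.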

The different validation sets can be accessed in the Pep_Cleave website peptibase.cs.biu.ac.il:64080/PepCleave_II/Validation_sets
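
The three validation measures can be computed from the four confusion-matrix counts, as in the sketch below. It assumes that the correlation score is the Matthews correlation coefficient; the function name `validation_metrics` is hypothetical.

```python
import math

def validation_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, correlation coefficient)."""
    sensitivity = tp / (tp + fn)   # fraction of cleaved peptides recognized
    specificity = tn / (tn + fp)   # fraction of non-cleaved peptides rejected
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumed form: Matthews correlation coefficient; defined as 0 when a
    # row or column of the confusion matrix is empty.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc
```

Unlike sensitivity or specificity alone, the correlation coefficient cannot be pushed to its maximum by predicting every peptide as cleaved or every peptide as non-cleaved.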


    3 RESULTS
In order to calculate the probability that a given peptide is indeed the result of proteasomal cleavage, one must take into account the internal part of the peptide and the residues flanking the cleavage site. Given an n AA long peptide, we define the positions within the peptide as $P_1 \ldots P_n$, and the flanking positions as $F_N$, $F_C$. A peptide is produced if the bonds $F_N$–$P_1$ and $P_n$–$F_C$ are cleaved and none of the internal bonds are cleaved. If the cleavage probability of each position in the proteasome were simply site dependent (as is assumed by existing algorithms), the probability to produce a peptide would be:


$$P(\text{pep}) = w(F_N, P_1)\, w(P_n, F_C) \prod_{i=1}^{n-1} \left[1 - w(P_i, P_{i+1})\right]$$

where w(x, y) is the probability of having a cleavage site between the residues placed in positions x and y. Such a score would predict that the majority of peptides would be very short (since the product of the non-cleavage terms, $\prod_{i=1}^{n-1} [1 - w(P_i, P_{i+1})]$, decreases with peptide length). Moreover, given three neighboring ‘cleavage sites’ (A, B, C), cleavage experiments often show that the peptides A-C and A-B are formed, but not the peptide B-C (Tenzer et al., 2004), although B-C would be predicted if only cleavage points were taken into account. In fact, one often observes even more complex situations with four cleavage sites where peptides A-C and B-D are formed but not A-B, B-C or C-D (Nussbaum et al., 1998).
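
The length bias of a purely site-based model can be made concrete with a small numerical sketch. The cleavage probability of 0.2 per bond is an illustrative value only, not a measured one, and the function name is hypothetical.

```python
def site_based_probability(w_n, w_internal, w_c):
    """Both end bonds cleaved, and every internal bond left uncleaved."""
    prob = w_n * w_c
    for w in w_internal:
        prob *= 1.0 - w
    return prob

# Illustrative value only: every bond is cleaved with probability 0.2.
w = 0.2
by_length = {n: site_based_probability(w, [w] * (n - 1), w)
             for n in (3, 6, 9, 12)}
```

With a constant per-bond probability, each additional residue multiplies the production probability by the same factor (1 − w), so the predicted yield falls geometrically with peptide length.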

The opposite, more precise description would require a probability score specifically adapted to each position: Formula , where each wi is different from the others. This would require a huge learning set. We here propose a middle way of a score depending differently on the internal residues and the C and N termini. We define a score for the probability that a given peptide P shall be produced during cleavage, Scleave (FN – P – FC), as a linear combination of the effect of the AAs in the different positions:


$$S_{cleave}(F_N - P - F_C) = S_1(F_N) + S_2(P_1) + \sum_{i=2}^{n-1} S_3(P_i) + S_4(P_n) + S_5(F_C)$$

We ran an SA algorithm in order to learn optimal values of S1–S5 in the score function. We first learnt the most basic configuration ([BSC], basic) from all peptides measured to be cleaved. We then extended the analysis to four more types of scores for specific proteasomes (e.g. the immunoproteasome). The score was learnt on two learning sets. The positive training set consisted of cleavage products taken from in vitro experiments (Boes et al., 1994; Eggers et al., 1995; Emmerich et al., 2000; Groettrup et al., 1995; Leibovitz et al., 1994; Nandi et al., 1997; Niedermann et al., 1997; Niedermann et al., 1996; Rivett, 1985; Tenzer et al., 2004; Toes et al., 2001; Wenzel et al., 1994). For some experiments, the proteasome type was specified. These results were used for specific training sets, namely a set of constitutive proteasome cleavage products and a set of immunoproteasome cleavage products.

The negative training set was composed of random peptides with lengths uniformly distributed between 6 and 13 AA and a typical AA distribution (PROWL). We used different negative training sets (for the different scores) to match the length distribution of the peptides in the positive learning set and the fraction of peptides with flanking positions. In both the negative and positive learning sets, 83.3–96.1% of the peptides had flanking regions.

The basic score represents the probability that a peptide would be produced, independently of the specific proteasome used by the cell. This score actually represents the ‘average effect’ of all proteasome types. Given the fact that in a typical experimental setup, the precise proteasome type used by the cell is unknown, such an average score is needed. Furthermore, multiple proteasome types may be used simultaneously by the cell. Thus such a score can actually represent a biological reality (Brooks et al., 2000). The SA produces a better predictor (on both the learning and validation sets) than any existing cleavage prediction algorithm. In contrast with existing algorithms, we do not claim to precisely predict all possible cleavage sites; rather, we claim to predict the existence of cleavage products, which we do predict precisely. In order to test the prediction of the algorithm, we used five different validation sets. Each set tested a different aspect of the score quality:

  • The first validation set contained 10^6 random peptides with AA and length distributions similar to the ones used in the negative learning set. This set was used to check that the learning process can be extended to similar random sets.
  • The second and third sets consisted of reported naturally processed epitopes from the SYFPEITHI and IEDB databases. These epitopes are supposed to be cleaved before presentation. These sets are used as a positive validation of the internal positions of $S_{cleave}(F_N - P - F_C)$ (i.e. $S_2(P_1)$, $S_3(P_i)$, $S_4(P_n)$), since these epitopes do not contain flanking AAs.
  • The fourth set is composed of naturally processed SYFPEITHI human epitopes for which a single copy was found in the human genome. The origin of these epitopes in the human genome was located, and the flanking residues were extracted. This validation set allows us to test the score for the flanking regions. This dataset is much smaller than the previous one, and was only established to show that the inclusion of the flanking regions of the peptide improves the prediction capacity of the score.
  • The fifth set is composed of SYFPEITHI-like random peptides. These peptides have an AA usage similar to that of the first negative validation set, but have the lengths of the SYFPEITHI epitopes and no flanking regions. It was used to check that the score performance on the real SYFPEITHI set is not merely a result of the peptide structure.
Before comparing our results with CTL epitopes, one must note that our score does not directly predict CTL epitopes, since biochemical mechanisms beyond proteasomal cleavage are involved in the epitope production process. An example of this is nibbling, which is known to affect the N-terminus rather than the C-terminus of a peptide. Generally, post-proteasomal trimming of the N-terminus has been shown to be particularly relevant in the immunoproteasome case (Beninga et al., 1998). Our score is still a good approximation for the process leading to epitope production, as can be seen from the validation results (Table 1). We are currently developing a trimming algorithm to bridge the gap between the cleavage prediction and MHC presentation.


Table 1. Absolute performance of the scores developed

 
The [BSC] cleavage score performs almost as well in the validation sets as in the learning sets (Table 1, second column). Its false positive and false negative levels are, on average, 24 and 27%, respectively. It performs better with the flanking residues than without them (24.5% error versus 27%), but the difference is not large. Note also that we count any random peptide predicted to be produced as a false positive. In practice, a large number of those can be real epitopes: in in vitro experiments, at least 4–5% of the peptides are measured to be produced. The current scores were learnt using a negative learning set composed of random peptides. We also attempted to learn the scores using peptides from cleavage experiments that were not observed to be cleaved. However, the validation results following such training were not as good as those obtained from training with random peptides.

The [BOW] score has approximately 15% false positives, not very far from the actual fraction of peptides produced from a random sequence, while maintaining approximately 60–65% properly predicted peptides in the positive validation set (with and without flanking regions; Table 1, fourth column). The opposite goal, minimizing the false negatives, proved harder to reach. Even when increasing the penalty for false negatives, we could not reduce their fraction below 20%. This can probably be attributed to the fact that many epitopes in the positive validation sets are not direct cleavage products, and cannot be directly learnt from in vitro cleavage experiments.

When the specific proteasome used is known, specific scores may also be needed. Proteasomes can be divided into core proteasome (20s) without the 19s cap, or with the 19s cap (26s). The constitutive proteasome itself can change some of its active subunits in the presence of IFN-γ to produce the immunoproteasome. IFN-γ can also induce the addition of a PA28 cap. This division is further complicated by the presence of hybrid proteasomes (Bose et al., 2001; Brooks et al., 2000; Rivett et al., 2001).

Limited by the size of the data sets, we did not compute a score for each possible type of proteasome (IFN-γ induced, PA28, 26s versus 20s, etc.). We computed two generalized scores: one for the immunoproteasome and one for the proteasome produced in the absence of IFN-γ. Our scores can thus be defined as follows:

  1. Constitutive proteasome score [CON]. This score predicts cleavage products of the constitutive proteasome.
  2. Immunoproteasome score [IMM]. This score is to be used when cleavage products are predicted in the presence of IFN-γ.
The absolute and relative performances, sensitivities and specificities of all five scores over both training and validation sets are displayed in Table 1 and Figure 1. These results are significantly better than all previously reported scores (Saxova et al., 2003). The sensitivity of the current algorithm equals that of top algorithms such as NetChop2 (Saxova et al., 2003) or NetChop3, and its specificity is almost twice as large. The values of the functions S1–S5 for the five different scores are available in the appendix (Supplementary Tables 3–7). Scaled images of the components of [BSC], [CON] and [IMM] are shown in Figure 2.


Fig. 1. Specificity, sensitivity and CC of the scores developed in the current study, compared with existing scores (Saxova et al., 2003). Our score performs significantly better than existing scores.

 

Fig. 2. Scaled images of the scores [BSC], [CON] and [IMM]. All three images were scaled according to the same color map. Cysteine has a clear negative contribution in all positions of [CON] and in most of the positions of [BSC].

 
The two partial scores ([CON] and [IMM]) have lower false positives than their merged counterpart, but much larger false negatives. This can be understood intuitively. The different scores proposed here aggregate different cleavage processes (i.e. cleavage by different proteasomes) under one roof. An aggregation of a larger number of processes allows us to explain more results (thus reducing the false negatives). This comes with the price of a larger number of false positives.

To determine the common prediction capacities of [CON] and [IMM], one could treat them as two parts of the same score function. The combined score [CON][IMM] predicts a peptide to be produced if at least one of its two components is positive. The combined score achieves a CC of 0.47, higher than those of [CON] and [IMM] separately.
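
The combination rule amounts to a logical OR over the two component predictions, as the following sketch shows. The function name and the explicit `threshold` parameter are hypothetical; the text only states that a component must be positive.

```python
def combined_prediction(score_con, score_imm, threshold=0.0):
    """Predict 'produced' if at least one component score is positive."""
    return score_con > threshold or score_imm > threshold
```

An OR rule can only add positive predictions relative to each component, which is consistent with the trade-off described above: fewer false negatives at the price of more false positives.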

As our score functions appear to provide a good prediction of proteasomal cleavage probabilities, we can derive conclusions about biological aspects of cleavage. We here provide a few such examples. In all scores except [IMM], cysteine makes a significant negative contribution in practically all positions. The presence of cysteine either inside a peptide or in the first position of its flanking regions greatly diminishes its production probability (Fig. 2). To establish this result we counted the appearances of cysteine in the SYFPEITHI database and found only 6, yielding a frequency of ~0.14% versus a natural frequency of 1.82% (MHC epitopes have 13 times fewer cysteines than random peptides). This phenomenon may be accounted for by the disulfide bonds that cysteine forms. Another clear correlation is between the volume of an AA and its probability of being located at the right hand side of a cleavage site (Fig. 3). These examples highlight the fact that the cleavage scores, although only learnt on the sequences of cleaved peptides, manage to reproduce some of the structural aspects related to cleavage.
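
The cysteine depletion factor quoted above follows directly from the two frequencies, as this short check shows. The implied total residue count is a derived estimate, not a figure reported in the paper.

```python
observed_freq = 0.14   # percent, cysteine among SYFPEITHI epitope residues
natural_freq = 1.82    # percent, natural cysteine frequency
cysteine_count = 6     # cysteines counted in the database

# Fold depletion of cysteine in epitopes relative to the natural frequency.
depletion = natural_freq / observed_freq

# Derived estimate only: number of residues implied by 6 cysteines at ~0.14%.
implied_residues = cysteine_count / (observed_freq / 100)
```

The ratio comes out at 13, and 6 cysteines at roughly 0.14% correspond to about 4300 epitope residues in total.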


Fig. 3. Correlation between cleavage probability and residue volume for the basic score. The correlation existed only in the positions S1 and S4 (P < 0.05). The correlation coefficient for the position S1: r = 0.7; S4: r = 0.46.

 

    4 DISCUSSION
The prediction of proteasomal cleavage products has many applications in immunology. Previous algorithms have tried to specify cleavage sites within a given protein, and define peptides as regions between cleavage sites (Kesmir et al., 2002; Nussbaum, 2001; Peters et al., 2002). Such algorithms cannot predict two overlapping peptides, although overlapping products often occur. Moreover, these site specific methods fail to provide good predictive capacities for the probability that a given peptide is produced. We have here presented a score for the probability that a whole sub-peptide is produced by proteasomal cleavage, rather than trying to locate specific cleavage motifs. Our algorithm assumes a linear contribution of each AA within the peptide and its flanking region to the score. This method yields better performance for the general probability that a given peptide is the result of proteasomal cleavage. Moreover, it predicts quite precisely the production of CTL epitope precursors. As in any other prediction method, the current score has both false positives and false negatives (Type I and II errors). We have attempted to balance the need to minimize both error types. However, in many situations, one needs to minimize either one or the other. We have thus devised appropriate scores, where either the false positives or the false negatives are minimized. While it was possible to significantly reduce the fraction of false positives, the reduction was less significant for false negatives. This is probably due to inherent errors in the databases used, or to the fact that the presented epitopes may undergo further processing not represented in in vitro cleavage experiments.

We cannot directly compare our results with other state of the art algorithms, since the goals of the different algorithms are slightly different. In contrast with most algorithms, we do not predict cleavage sites, but a score for the entire peptide. If one uses the minimal description of an epitope as a peptide with no cleavage sites in its center, then our algorithm outperforms even the most up to date algorithms. NetChop 3, for example, has a CC score of 0.05 on our validation sets, compared to 0.47 for our basic algorithm. This comparison should be taken with a grain of salt, as cleavage prediction algorithms are more focused on the prediction of cleavage sites and not peptide products.

We have not developed separate scores for each possible proteasome in this study. Even if enough data were available to learn such scores, it would still be very problematic to apply them to different situations in vivo, since it is rarely known which combination of proteasomes operates in a given situation.

The current scores are available at http://peptibase.cs.biu.ac.il:64080/PepCleave_II/. The results are incorporated into a server presenting all predicted epitopes in a given virus or host. The beta version of this server is open for general access at peptibase.cs.biu.ac.il/peptibase/.


    ACKNOWLEDGEMENTS
The work of Y.L., I.G. and T.V. was covered by NIH grant: 1 R01 AI61062-01. The work of T.V. was also covered by a scholarship from the Yeshaia Horowitz foundation.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Burkhard Rost

Received on August 1, 2007; revised on December 8, 2007; accepted on December 11, 2007

    REFERENCES

    Altuvia Y, Margalit H. Sequence signals for generation of antigenic peptides by the proteasome: implications for proteasomal cleavage mechanism. J. Mol. Biol. (2000) 295:879–890.

    Beninga J, et al. Interferon-gamma can stimulate post-proteasomal trimming of the N terminus of an antigenic peptide by inducing leucine aminopeptidase. J. Biol. Chem. (1998) 273:18734–18742.

    Boes B, et al. Interferon gamma stimulation modulates the proteolytic activity and cleavage site preference of 20S mouse proteasomes. J. Exp. Med. (1994) 179:901–909.

    Bose S, et al. gamma-Interferon decreases the level of 26 S proteasomes and changes the pattern of phosphorylation. Biochem. J. (2001) 353:291–297.

    Brooks P, et al. Subcellular localization of proteasomes and their regulatory complexes in mammalian cells. Biochem. J. (2000) 346:155–161.

    Chen W, et al. Immunoproteasomes shape immunodominance hierarchies of antiviral CD8(+) T cells at the levels of T cell repertoire and presentation of viral antigens. J. Exp. Med. (2001) 193:1319–1326.

    Ciechanover A. The ubiquitin–proteasome pathway: on protein death and cell life. EMBO J. (1998) 17:7151–7160.

    Coux O, et al. Structure and functions of the 20S and 26S proteasomes. Annu. Rev. Biochem. (1996) 65:801.

    Eggers M, et al. The cleavage preference of the proteasome governs the yield of antigenic peptides. J. Exp. Med. (1995) 182:1865–1870.

    Emmerich NPN, et al. The human 26 S and 20 S proteasomes generate overlapping but different sets of peptide fragments from a model protein substrate. J. Biol. Chem. (2000) 275:21140–21148.

    Groettrup M, et al. The interferon-gamma-inducible 11 S regulator (PA28) and the LMP2/LMP7 subunits govern the peptide production by the 20 S proteasome in vitro. J. Biol. Chem. (1995) 270:23808–23815.

    Gronostajski RM, et al. The ATP dependence of the degradation of short- and long-lived proteins in growing fibroblasts. J. Biol. Chem. (1985) 260:3344–3349.

    Hershko A, Ciechanover A. The ubiquitin system for protein degradation. Annu. Rev. Biochem. (1992) 61:761–807.

    Holzhutter HG, et al. A theoretical approach towards the identification of cleavage-determining amino acid motifs of the 20 S proteasome. J. Mol. Biol (1999) 286:1251–1265.[CrossRef][Web of Science][Medline]

    Kesmir C, et al. Bioinformatic analysis of functional differences between the immunoproteasome and the constitutive proteasome. Immunogenetics (2003) 55:437–449.[CrossRef][Web of Science][Medline]

    Kesmir C, et al. Prediction of proteasome cleavage motifs by neural networks. Prot. Eng (2002) 15:287–296.[Abstract/Free Full Text]

    Khan S, et al. Immunoproteasomes largely replace constitutive proteasomes during an antiviral and antibacterial immune response in the liver. J. Immunol (2001) 167:6859–6868.[Abstract/Free Full Text]

    Kirkpatrick S, et al. Optimization by simulated annealing. Science (1983) 220:671.[Abstract/Free Full Text]

    Kisselev AF, et al. The sizes of peptides generated from protein by mammalian 26 an 20S proteasomes: implications for understanding the degradative mechanism and antigen presentation. J. Biol. Chem (1999) 274:3363.[Abstract/Free Full Text]

    Koopmann JO, et al. Generation, intracellular transport and loading of peptides associated with MHC class-I mulecules. Curr. Opin. Immunol (1997) 9:80–88.[CrossRef][Web of Science][Medline]

    Lee DH, Goldberg AL. Proteasome inhibitors: valuable new tools for cell biologists. Trends Cell Biol (1998) 8:397–403.[CrossRef][Web of Science][Medline]

    Leibovitz D, et al. Sequential degradation of the neuropeptide gonadotropin-releasing hormone by the 20 S granulosa cell proteasomes. FEBS Lett (1994) 346:203–206.[CrossRef][Web of Science][Medline]

    Matsumoto M, Kurita Y. Twisted GFSR generators II. ACM Trans. Model. Comput. Simul (1994) 4:254–266.[CrossRef]

    Monu N, Trombetta ES. Cross-talk between the endocytic pathway and the endoplasmic reticulum in cross-presentation by MHC class I molecules. Curr. Opin. Immunol (2007) 19:66–72.[CrossRef][Web of Science][Medline]

    Nandi D, et al. Intermediates in the formation of mouse 20S proteasomes: implications for the assembly of precursor beta subunits. EMBO J (1997) 16:5363–5375.[CrossRef][Web of Science][Medline]

    Niedermann G, et al. Potential immunocompetence of proteolytic fragments produced by proteasomes before evolution of the vertebrate immune system. J. Exp. Med (1997) 186:209–220.[Abstract/Free Full Text]

    Niedermann G, et al. The proteolytic fragments generated by vertebrate proteasomes: structural relationships to major histocompatibility complex class I binding peptides. Proc. Natl Acad. Sci. USA (1996) 93:8572–8577.[Abstract/Free Full Text]

    Nussbaum AK. Prediction Algorithm for Proteasomal Cleavages. (2001) Tuebingen, Germany: University of Tuebingen.

    Nussbaum AK, et al. Cleavage motifs of the yeast 20S proteasome beta subunits deduced from digests of enolase 1. Proc. Natl Acad. Sci. USA (1998) 95:12504–12509.[Abstract/Free Full Text]

    Peters B, et al. Assessment of proteasomal cleavage probabilities from kinetic analysis of time-dependent product formation. J. Mol. Biol (2002) 318:847–862.[CrossRef][Web of Science][Medline]

    Peters B, et al. The Immune Epitope Database and Analysis Resource: From Vision to Blueprint. PLoS Biol (2005) 3:e91.[CrossRef][Medline]

    PROWL PROWL-Amino acid properties. http://prowl.rockefeller.edu/aainfo/contents.htm. (Accessed on January, 2007).

    Rammensee H, et al. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics (1999) 50:213–219.[CrossRef][Web of Science][Medline]

    Rammensee HG, et al. Peptides naturally presented by MHC class 1 mulecules. Annu. Rev. Immunol (1993) 11:213.[CrossRef][Web of Science][Medline]

    Rivett AJ. Purification of a liver alkaline protease which degrades oxidatively modified glutamine synthetase. Characterization as a high molecular weight cysteine proteinase. J. Biol. Chem (1985) 260:12600–12606.[Abstract/Free Full Text]

    Rivett AJ, et al. Regulation of proteasome complexes by gamma-interferon and phosphorylation. Biochimie (2001) 83:363–366.[Medline]

    Rock KL, Goldberg AL. Degradation of cell proteins and the generation of MHC class1-presented peptides. Annu. Rev. Immunol (1999) 17:739–779.[CrossRef][Web of Science][Medline]

    Rockel TD, et al. Proteasomes degrade proteins in focal subdomains of the human cell nucleus. J. Cell. Sci (2005) 118:5231–5242.[Abstract/Free Full Text]

    Saxova P, et al. Predicting proteasomal cleavage sites: a comparison of available methods. Int. Immunol (2003) 15:781–787.[Abstract/Free Full Text]

    Sette A, et al. A roadmap for the immunomics of category A-C pathogens. Immunity (2005) 22:155–161.[CrossRef][Medline]

    Tanaka K, Kashara M. The MHC class I ligand-generating system: roles of immunoproteasomes and the interferon-gamma inducible proteasome activator PA28. Immunol. Rev (1998) 163:161–176.[CrossRef][Web of Science][Medline]

    Tenzer S, et al. Quantitative Analysis of prion-protein degradation by constitutive and immuno-20S proteasomes indicates differences correlated with disease susceptibility. J. Immunol (2004) 172:1083–1091.[Abstract/Free Full Text]

    Toes REM, et al. Discrete cleavage motifs of constitutive and immunoproteasomes revealed by quantitative analysis of cleavage products. J. Exp. Med (2001) 194:1–12.[Abstract/Free Full Text]

    Wenzel T, et al. Existence of a molecular ruler in proteasomes suggested by analysis of degradation products. FEBS Lett (1994) 349:205–209.[CrossRef][Web of Science][Medline]

