Bioinformatics Advance Access originally published online on June 28, 2007
Bioinformatics 2007 23(23):3241-3243; doi:10.1093/bioinformatics/btm334

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

CASVM: web server for SVM-based prediction of caspase substrates cleavage sites

Lawrence J.K. Wee 1, Tin Wee Tan 1 and Shoba Ranganathan 2,1,*

1 Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore and 2 Department of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Caspases belong to a unique class of cysteine proteases that function as critical effectors of apoptosis, inflammation and other important cellular processes. Caspases cleave substrates at specific tetrapeptide sites, immediately after a highly conserved aspartic acid residue. Prediction of such cleavage sites will complement structural and functional studies of substrate cleavage and aid the discovery of new substrates. We have recently developed a support vector machine (SVM) method to address this problem. Our algorithm achieved an accuracy ranging from 81.25 to 97.92%, making it one of the best methods currently available. CASVM is the web server implementation of our SVM algorithms, written in Perl and hosted on a Linux platform. The server also supports the prediction of non-canonical caspase substrate cleavage sites. In addition, we have included a relational database of experimentally verified caspase substrates, retrievable by accession ID, keyword or sequence similarity.

Availability: http://www.casbase.org/casvm/index.html

Contact: shoba.ranganathan{at}mq.edu.au

Supplementary information: http://www.casbase.org/casvm/help/index.html


    1 INTRODUCTION
Caspases belong to a unique class of cysteine proteases that function as critical effectors of apoptosis, inflammation and other important cellular processes, such as cell proliferation, differentiation, migration and receptor internalization (Algeciras-Schimnich et al., 2002; Launay et al., 2005; Los et al., 2002). Caspases contain a cysteine residue at the active site and cleave substrates at specific tetrapeptide sites (denoted P4-P3-P2-P1) with a highly conserved aspartate (D) at the P1 position (Talanian et al., 1997). To date, at least 14 mammalian caspases have been discovered; they can be grouped into three classes based on their preferred tetrapeptide specificities (Thornberry et al., 1997). Group I caspases (-1, -4 and -5) recognize the sequence (W/L)EHD; Group II caspases (-2, -3 and -7) prefer DEXD; and Group III caspases (-6, -8, -9 and -10) cleave proteins at (L/V)E(T/H)D.
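As a rough illustration of what the group consensus motifs above mean in practice, the sketch below scans a sequence for each motif with Python regular expressions and reports the 1-based P1 position. It is a toy consensus matcher, not the CASVM method; the function name `find_consensus_sites`, the test sequence and the use of "." for X are illustrative assumptions.

```python
import re

# Consensus tetrapeptide motifs (P4-P3-P2-P1) for the three caspase groups
# (Thornberry et al., 1997); "." stands for any residue (X).
GROUP_MOTIFS = {
    "I": "[WL]EHD",       # caspases-1, -4 and -5
    "II": "DE.D",         # caspases-2, -3 and -7
    "III": "[LV]E[TH]D",  # caspases-6, -8, -9 and -10
}


def find_consensus_sites(sequence):
    """Return (group, tetrapeptide, P1 position) for every consensus match.

    The P1 position is 1-based, i.e. the residue number of the P1 aspartate.
    A zero-width lookahead lets overlapping matches be reported.
    """
    seq = sequence.upper()
    hits = []
    for group, motif in GROUP_MOTIFS.items():
        for match in re.finditer(f"(?=({motif}))", seq):
            # 0-based index of P4, plus 3 to reach P1, plus 1 for 1-based numbering
            p1_position = match.start() + 4
            hits.append((group, match.group(1), p1_position))
    # Order by P1 position, then by group name
    return sorted(hits, key=lambda hit: (hit[2], hit[0]))


print(find_consensus_sites("MADEVDGLEHDK"))
# [('II', 'DEVD', 6), ('I', 'LEHD', 11), ('III', 'LEHD', 11)]
```

Note that LEHD satisfies both the Group I and Group III patterns, so the motif groups overlap; a scan of this kind also misses every site that deviates from the consensus, which is the limitation that motivates the machine-learning approaches described in this note.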

As reviewed by Earnshaw et al. (1999) and Fischer et al. (2003), caspase substrates span a wide range of protein classes, including structural elements of the cytoplasm and nucleus, components of the DNA repair machinery, protein kinases, GTPases and viral structural proteins. Although more than 280 caspase substrates have been discovered to date, more probably remain undetected. Identifying and characterizing caspase substrates is critical to deepening our understanding of the roles of these enzymes in cellular pathways. However, accurately detecting caspase cleavage sites in target proteins requires complex and time-consuming in vivo and in vitro experiments. Given the sequence data readily available in public databases, in silico screening of proteins for potential cleavage sites offers a useful alternative.

At present, a number of tools are available for the computational prediction of caspase substrate cleavage sites. PeptideCutter (Gasteiger et al., 2005), a general protease cleavage prediction server, predicts sites from the known cleavage preferences of the various caspases. Lohmuller et al. (2003) developed the peptidase substrate prediction tool (PEPS), based on position-specific scoring matrices (PSSMs), for caspase-3 substrate cleavage. Garay-Malpartida et al. (2005) developed the CaSPredictor software, whose algorithm analyzes amino acid substitutions and amino acid frequencies at cleavage sites, as well as the presence of ‘PEST’ sequences in the vicinity of the cleavage site (sequences rich in proline (P), glutamate (E), aspartate (D), serine (S) and threonine (T); Rechsteiner and Rogers, 1996; Rogers et al., 1986). The GraBCas software, created by Backes et al. (2005), provides an improved PSSM-based method that additionally accounts for the P1'-P2' residues. Yang (2005) applied several neural network architectures to cleavage site prediction, such as single-layer perceptrons, multi-layer perceptrons and Bayesian bio-basis function neural networks.
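The PSSM-based tools above score a candidate site by summing per-position log-odds values. The following is a generic sketch of that idea, assuming a uniform 1/20 background frequency and an add-one pseudocount; it does not reproduce the specific matrices, training data or scoring rules of PEPS or GraBCas, and the toy sites in the usage lines are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0):
    """Build a log-odds position-specific scoring matrix from aligned sites.

    All sites must have the same length. A uniform background frequency of
    1/20 per amino acid and an additive pseudocount are assumed.
    Returns one {residue: log2 odds} dictionary per position.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, window):
    """Sum the per-position log-odds scores of a window (higher = more site-like).

    Residues outside the 20 standard amino acids raise KeyError.
    """
    if len(window) != len(pssm):
        raise ValueError("window length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, window.upper()))


# Usage with hypothetical toy sites (not a real training set):
pssm = build_pssm(["DEVD", "DEVD", "DEHD", "DQTD"])
print(score(pssm, "DEVD") > score(pssm, "AAAA"))  # True
```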

We have applied the support vector machine (SVM) algorithm to this problem and shown that it achieves an accuracy ranging from 81.25 to 97.92%, making it one of the best methods currently available (Wee et al., 2006). Here, we report CASVM, the web server implementation of our SVM method, for the automated prediction of caspase cleavage sites in protein sequences. We have also constructed a relational database for easy retrieval of experimentally verified caspase substrates and made the datasets used to develop our method available on the website.
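The accuracy values quoted here are presumably the conventional fraction of correctly classified test examples. This note does not restate the formula, so the definition below is an assumption, with TP and TN the correctly predicted cleavage and non-cleavage sites, and FP and FN the misclassified ones.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```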


    2 DESCRIPTION OF CASVM SERVER
Details of the SVM algorithms implemented in the server have been described previously (Wee et al., 2006). Briefly, an SVM classifier (termed the P4P1-trained classifier) was trained on a dataset of unique caspase cleavage sites from experimentally verified caspase substrates, together with an equal number of ‘non-cleavage’ sites (random tetrapeptide sequences extracted from elsewhere on the same substrates). To account for the influence of adjacent residues on substrate cleavage, we constructed two further SVM classifiers: one trained on the tetrapeptide cleavage site plus the two residues downstream of it, P1' and P2' (P4P2'-trained classifier), and one trained on the tetrapeptide plus 10 upstream residues (up to P14) and 10 downstream residues (up to P10') (P14P10'-trained classifier). To minimize overtraining caused by the high frequency of aspartic acid at the P1 position, and to enable the prediction of cleavage sites with residues other than aspartic acid at P1, all classifiers were trained on sequences with the P1 residue removed. These three SVM classifiers are used in CASVM and achieved accuracies of 81.25, 89.58 and 93.75%, respectively, when tested on datasets not used in training.
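The windowing step described above can be sketched as follows. The window extents (P4P1, P4P2', P14P10') and the removal of P1 follow the text; everything else is an assumption for illustration: the helper names, the padding symbol "X" for positions past either end of the sequence, and the 21-letter one-hot (sparse orthogonal) encoding, which is a common choice for SVM inputs but is not stated in this note. The final helper writes an example in the plain-text input format read by LIBSVM, the package the server uses.

```python
# Residues kept on each side of P1 for the three windows:
# (upstream positions P(n)..P2, downstream positions P1'..P(m)').
WINDOWS = {
    "P4P1": (3, 0),       # P4 P3 P2 [P1 removed]
    "P4P2'": (3, 2),      # P4 P3 P2 [P1 removed] P1' P2'
    "P14P10'": (13, 10),  # P14 ... P2 [P1 removed] P1' ... P10'
}
PAD = "X"  # hypothetical filler for window positions beyond the sequence ends
ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + PAD


def window_without_p1(sequence, p1_position, window="P14P10'"):
    """Return the residues flanking P1 (1-based position), with P1 itself removed."""
    seq = sequence.upper()
    upstream, downstream = WINDOWS[window]
    i = p1_position - 1  # 0-based index of the P1 residue
    indices = list(range(i - upstream, i)) + list(range(i + 1, i + 1 + downstream))
    return "".join(seq[j] if 0 <= j < len(seq) else PAD for j in indices)


def one_hot(segment):
    """Sparse orthogonal encoding: 21 binary features per position.

    A letter outside ALPHABET yields an all-zero block for that position.
    """
    features = []
    for residue in segment:
        features.extend(1 if residue == letter else 0 for letter in ALPHABET)
    return features


def to_libsvm_line(label, features):
    """Format one example in LIBSVM's text format: '<label> <index>:<value> ...'.

    Indices are 1-based and ascending; zero-valued features are omitted.
    """
    pairs = " ".join(f"{k}:{v}" for k, v in enumerate(features, start=1) if v != 0)
    return f"{label:+d} {pairs}".rstrip()


seq = "MADEVDGLEHDK"  # hypothetical toy sequence
print(window_without_p1(seq, 6, "P4P2'"))  # DEVGL
print(to_libsvm_line(1, one_hot("AC")))    # +1 1:1 23:1
```

Under these assumptions the three windows contain 3, 5 and 23 residues, i.e. 63, 105 and 483 binary features. Because P1 is dropped, a candidate with glutamic acid at P1 produces a vector of exactly the same form as one with aspartic acid, which is what allows the classifiers to score non-canonical sites.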

The server is written in Perl, is hosted on a Linux platform and uses the LIBSVM software package to implement the SVM algorithm (Chang and Lin, 2001). The server homepage provides an intuitive interface for user input. Users can paste a raw or FASTA-formatted protein sequence and select a number of prediction options. On submission, the entire input sequence is scanned using the window size selected by the user. Three scanning window sizes are available, P4P1, P4P2' and P14P10', each determining which SVM classifier is used for prediction; for example, selecting the P14P10' window invokes the P14P10'-trained SVM classifier. The P14P10'-trained classifier, which achieved the highest accuracy in our experiments, is the default. To account for possible non-canonical cleavage sites, users can also choose the P1 residue to screen: aspartic acid only (default), or both aspartic acid and glutamic acid. As the sequence is scanned, each segment containing the specified P1 residue is extracted and classified by the selected SVM classifier as a cleavage site or not. The CASVM output displays the name of the input sequence (optional), its length, an abbreviated version of the sequence, a list of potential cleavage sites (all tetrapeptides in the input sequence with the specified P1 residue) and the CASVM-predicted cleavage sites. All cleavage sites are labeled by the position of their P1 residue.
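A minimal sketch of the scanning loop described above follows. The default window (13 residues upstream and 10 downstream of P1, i.e. P14P10' with P1 removed) and the choice of D or D/E at P1 follow the text; the requirement that a full P4-P1 tetrapeptide fit inside the sequence, the padding symbol and the `classify` callable are assumptions. In particular, `toy_classifier` is a hypothetical placeholder rule standing in for the trained SVM, not CASVM's model.

```python
from typing import Callable, List, Tuple

Site = Tuple[int, str]  # (1-based P1 position, tetrapeptide P4-P1)


def scan(sequence: str,
         classify: Callable[[str], bool],
         p1_residues: str = "D",
         upstream: int = 13,
         downstream: int = 10,
         pad: str = "X") -> Tuple[List[Site], List[Site]]:
    """Scan a protein sequence for candidate and predicted cleavage sites.

    Each occurrence of a residue in `p1_residues` ("D" by default, or "DE")
    that has a full P4-P1 tetrapeptide is a candidate. Its flanking window,
    with P1 removed, is passed to `classify`, a stand-in for a trained SVM.
    Returns (candidates, predicted).
    """
    seq = sequence.upper()
    candidates: List[Site] = []
    predicted: List[Site] = []
    for i, residue in enumerate(seq):
        if residue not in p1_residues or i < 3:
            continue  # skip non-P1 residues and sites lacking P4..P2
        site = (i + 1, seq[i - 3:i + 1])
        flank = list(range(i - upstream, i)) + list(range(i + 1, i + 1 + downstream))
        window = "".join(seq[j] if 0 <= j < len(seq) else pad for j in flank)
        candidates.append(site)
        if classify(window):
            predicted.append(site)
    return candidates, predicted


def toy_classifier(window: str) -> bool:
    """Hypothetical rule: predict cleavage when P2 is valine.

    With upstream=13, window[12] holds the P2 residue.
    """
    return window[12] == "V"


candidates, predicted = scan("MADEVDGLEHDK", toy_classifier, p1_residues="DE")
print(candidates)  # [(4, 'MADE'), (6, 'DEVD'), (9, 'DGLE'), (11, 'LEHD')]
print(predicted)   # [(6, 'DEVD')]
```

Passing `p1_residues="D"` reproduces the default aspartate-only screen, which for this toy sequence yields only the DEVD and LEHD candidates.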


    3 DISCUSSION
CASVM is a web server for the SVM-based prediction of caspase substrate cleavage sites. Because the substrates used to develop the method are derived from a variety of organisms (human, mouse, rat, fruit fly, cow, chicken, frog, worm and viruses) and are cleaved by various caspases (caspases -1, -3, -6, -7, -8, -9, -12, -13 and -14), the server is applicable to substrates from many organisms and is not specific to any one caspase. In our analyses, including residues in the immediate vicinity of the cleavage site increased prediction accuracy (Wee et al., 2006). These findings are consistent with those of Backes et al. (2005) and Garay-Malpartida et al. (2005), who reported the roles of the P1'-P2' residues and of neighboring PEST sequences in substrate cleavage, respectively. Accordingly, CASVM includes SVM classifiers that account for limited and extended sequences flanking the tetrapeptide cleavage site, namely the P4P2'-trained and P14P10'-trained classifiers.

With the range of caspase substrates now known, a large variety of cleavage sites is represented, many of which differ markedly from the consensus tetrapeptide specificities reported by Thornberry et al. (1997). Although caspases are thought to be selective for aspartic acid at the P1 position, a notable number of substrates are cleaved at tetrapeptide sites bearing glutamic acid at P1. Existing methods for caspase cleavage site prediction (other than ours) are largely limited to cleavage sites with aspartic acid at P1, as their algorithms assume the consensus XXXD motif. Strict adherence to aspartic acid at P1 may limit the sensitivity of these tools, since further substrates with glutamic acid at P1 are likely to be discovered. We have therefore included the option to screen for and predict tetrapeptide sites with either aspartic acid or glutamic acid at the P1 position.

CASVM also contains a relational database of over 270 experimentally verified caspase substrates. The database, accessible at http://www.casbase.org/casvm/squery/index.html, can be queried by UniProt accession ID (UniProt Consortium, 2007), by keyword or with the integrated BLASTP program (Altschul et al., 1990). The dataset of 219 unique caspase cleavage sites used to develop our SVM method can be downloaded from the website (Supplementary Material).


    ACKNOWLEDGEMENTS
LJKW gratefully acknowledges the award of a research scholarship from the National University of Singapore.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on April 27, 2007; revised on June 12, 2007; accepted on June 17, 2007

    REFERENCES

    Algeciras-Schimnich A, et al. Apoptosis-independent functions of killer caspases. Curr. Opin. Cell Biol. (2002) 14:721–726.

    Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.

    Backes C, et al. GraBCas: a bioinformatics tool for score-based prediction of Caspase and Granzyme B-cleavage sites in protein sequences. Nucleic Acids Res. (2005) 33:208–213.

    Chang CC, Lin CJ. LIBSVM: a library for support vector machines. (2001) http://www.csie.ntu.edu.tw/~cjlin/libsvm.

    Earnshaw WC, et al. Mammalian caspases: structure, activation, substrates, and functions during apoptosis. Annu. Rev. Biochem. (1999) 68:383–424.

    Fischer U, et al. Many cuts to ruin: a comprehensive update of caspase substrates. Cell Death Differ. (2003) 10:76–100.

    Garay-Malpartida HM, et al. CaSPredictor: a new computer-based tool for caspase substrate prediction. Bioinformatics (2005) 21(Suppl. 1):i169–i176.

    Gasteiger E, et al. Protein identification and analysis tools on the ExPASy server. In: Walker JM, ed. The Proteomics Protocols Handbook. Totowa, USA: Humana Press (2005) 571–607.

    Launay S, et al. Vital functions for lethal caspases. Oncogene (2005) 24:5137–5148.

    Lohmuller T, et al. Toward computer-based cleavage site prediction of cysteine endopeptidases. Biol. Chem. (2003) 384:899–909.

    Los M, et al. Caspases: more than just killers? Trends Immunol. (2002) 22:31–34.

    Rechsteiner M, Rogers S. PEST sequences and regulation by proteolysis. Trends Biochem. Sci. (1996) 21:267–271.

    Rogers S, et al. Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science (1986) 234:364–368.

    Talanian RV, et al. Substrate specificities of caspase family proteases. J. Biol. Chem. (1997) 272:9677–9682.

    Thornberry NA, et al. A combinatorial approach defines specificities of members of the caspase family and granzyme B. Functional relationships established for key mediators of apoptosis. J. Biol. Chem. (1997) 272:17907–17911.

    UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res. (2007) 35:D193–D197.

    Wee LJK, et al. SVM-based prediction of caspase substrate cleavage sites. BMC Bioinformatics (2006) 7(Suppl. 5):S14.

    Yang ZR. Prediction of caspase cleavage sites using Bayesian bio-basis function neural networks. Bioinformatics (2005) 21:1831–1837.

