Bioinformatics Advance Access published online on March 29, 2005

Bioinformatics, doi:10.1093/bioinformatics/bti404
© The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Received November 14, 2004
Revised February 7, 2005
Accepted March 22, 2005

Article

Mining SARS-CoV protease cleavage data using non-orthogonal decision trees, a novel method for decisive template selection

Zheng Rong Yang 1

1 Department of Computer Science, Exeter University, UK


Abstract

Motivation: Although the outbreak of severe acute respiratory syndrome (SARS) is currently over, the disease is expected to return. A critical challenge for scientists across disciplines worldwide is to study the specificity of the cleavage activity of SARS-related coronavirus (SARS-CoV) and to use the knowledge obtained for effective inhibitor design to fight the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. Suppose a sub-sequence is denoted as P2-P1-P1'-P2'; a conventional inductive programming method may produce a rule such as "if P1 = Q, then the sub-sequence is cleaved, otherwise non-cleaved". If the site P1 is not orthogonal to the others (for instance, P2, P1', and P2'), the predictive power of this kind of rule may be limited. This study is therefore motivated to develop a novel method for constructing non-orthogonal decision trees for mining protease data.
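
To make the limitation concrete, the following minimal Python sketch implements a single-site rule of the kind quoted above. The function name, the position index, and the example 4-mers are hypothetical illustrations, not data or code from the study.

```python
def single_site_rule(kmer, p1_index=1, residue="Q"):
    """Predict cleavage from the P1 position alone, ignoring all other sites."""
    return kmer[p1_index] == residue


# Hypothetical 4-mers laid out as P2-P1-P1'-P2' (illustration only).
for kmer in ("LQSG", "LQPP", "LASG"):
    print(kmer, single_site_rule(kmer))
```

The rule returns the same answer for "LQSG" and "LQPP" because it never inspects P2, P1', or P2'. If cleavage depends on those neighbouring sites as well, a rule of this form cannot capture it.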

Result: Eighteen coronavirus polyprotein sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov). Among these sequences, 252 cleavage sites have been experimentally determined. The sequences are scanned using a sliding window of size k to generate about 50,000 k-mer sub-sequences (k-mers for short). The value of k varies from 4 to 12 in steps of 2. The bio-basis function proposed in (Thomson et al., 2003) is used to transform the k-mers into a high-dimensional numerical space, on which an inductive programming method is applied to derive a decision tree for decision-making. This transformation is referred to as bio-mapping. The constructed decision trees select about ten of the 50,000 k-mers. This small set of selected k-mers is regarded as a set of decisive templates. Non-orthogonal decision trees are thus constructed using the selected templates, and the prediction accuracy is significantly improved.
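
The windowing and mapping steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's program: the abstract does not give the formula, so the bio-basis form used here, exp(gamma * (s(x, t) - s(t, t)) / s(t, t)), is assumed; `toy_score` is a hypothetical match/mismatch score standing in for a real amino-acid substitution matrix; and the sequence and templates are made up.

```python
import math


def kmers(sequence, k):
    """Return every length-k window of the sequence, scanning left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def toy_score(a, b):
    """Hypothetical residue similarity; a real substitution matrix would go here."""
    return 2 if a == b else -1


def similarity(x, y):
    """Sum of position-wise residue scores between two equal-length k-mers."""
    return sum(toy_score(a, b) for a, b in zip(x, y))


def bio_basis(x, template, gamma=1.0):
    """Assumed bio-basis form: exp(gamma * (s(x, t) - s(t, t)) / s(t, t))."""
    s_tt = similarity(template, template)
    return math.exp(gamma * (similarity(x, template) - s_tt) / s_tt)


def bio_map(x, templates, gamma=1.0):
    """Map one k-mer to a numeric vector with one coordinate per template."""
    return [bio_basis(x, t, gamma) for t in templates]


sequence = "AVLQSGFRKM"  # hypothetical sequence, not from the study
for k in range(4, 13, 2):  # k = 4, 6, 8, 10, 12
    print(k, len(kmers(sequence, k)))

templates = ["LQSG", "AAAA"]  # hypothetical templates
print(bio_map("LQSG", templates))
```

Each k-mer becomes a vector whose coordinates measure its similarity to whole templates rather than the identity of a single site, so a tree split on one coordinate depends on every position of the k-mer at once.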

Availability: The program for bio-mapping can be obtained on request from the author.






