Bioinformatics Advance Access published online on February 15, 2006
Bioinformatics, doi:10.1093/bioinformatics/btl051
1 Dept of Computer Science, University of Wales Aberystwyth, SY23 3DB, UK
* To whom correspondence should be addressed.
Motivation: Arabidopsis thaliana has the best understood plant genome, yet approximately one third of its genes still have no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. DMP uses a hybrid machine-learning/data-mining approach to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile, and expression data.

Results: We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.

Availability: Rulesets are available at http://www.aber.ac.uk/compsci/Research/bio/dss/arabpreds/ and predictions are available at http://www.genepredictions.org/.
Received August 19, 2005
Revised January 9, 2006
Accepted February 7, 2006
Article
Functional bioinformatics for Arabidopsis thaliana
A. Clare 1 *, A. Karwath 2, and R. D. King 1
2 Institute for Computer Science, Albert-Ludwigs-University Freiburg, D-79110 Freiburg, Germany
A. Clare, E-mail: afc{at}aber.ac.uk
Associate Editor: Nikolaus Rajewsky
This article has been cited by other articles:
I. V. Tetko, I. V. Rodchenkov, M. C. Walter, T. Rattei, and H.-W. Mewes. Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information. Bioinformatics, March 1, 2008; 24(5): 621-628.