Bioinformatics Vol. 16 no. 5 2000
Pages 412-424
© 2000 Oxford University Press
Assessing the accuracy of prediction algorithms for classification: an overview
1 Department of Information and Computer Science, University of California, Irvine, CA 92697, USA
2 Center for Biological Sequence Analysis, The Technical University of Denmark, DK-2800 Lyngby, Denmark
3 Net-ID, Inc., San Francisco, CA 94107, USA
Received on October 11, 1999; revised on February 18, 2000; accepted on February 23, 2000
We provide a unified overview of methods that are currently widely used to assess the accuracy of prediction algorithms, ranging from raw percentages, quadratic error measures and other distances, and correlation coefficients to information-theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating the sensitivity and specificity of optimal systems. While the principles are general, we illustrate their applicability on specific problems such as protein secondary structure and signal peptide prediction.
Contact: pfbaldi{at}ics.uci.edu
4 Also at the Department of Biological Sciences, University of California, Irvine, USA, to whom all correspondence should be addressed.
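
As a rough illustration of some of the measures named in the abstract, the sketch below computes the raw percentage correct, sensitivity, specificity, and the correlation coefficient for a two-class problem from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical, and the formulas are the common textbook definitions (the correlation coefficient in its Matthews form). The article's own definitions are not shown in this excerpt and may differ, in particular for specificity.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return common accuracy measures for a two-class prediction.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives. These are textbook
    definitions, assumed here for illustration only.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one example is required")

    # Raw fraction of correct predictions.
    accuracy = (tp + tn) / total

    # Fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")

    # Fraction of actual negatives that were predicted negative
    # (assumed convention; some authors use tp / (tp + fp) instead).
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Correlation coefficient between predicted and observed classes
    # (Matthews form); set to 0.0 when a marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "correlation": correlation,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the article.
    for name, value in binary_metrics(tp=40, tn=50, fp=5, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Unlike the raw percentage, the correlation coefficient uses all four counts, so it stays informative when one class is much more frequent than the other.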