Bioinformatics Advance Access originally published online on August 2, 2005
Bioinformatics 2005 21(18):3596-3603; doi:10.1093/bioinformatics/bti609
JIGSAW: integration of multiple sources of evidence for gene prediction
1Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
2Department of Computer Science, University of Maryland Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
3Department of Computer Science, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
*To whom correspondence should be addressed.
Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.
Results: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods.
Availability: JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw
Contact: jeallen{at}umiacs.umd.edu
Received on June 20, 2005; revised on July 28, 2005; accepted on July 29, 2005
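Illustration: the Results paragraph describes two steps, weighting each line of evidence using statistics from a training set and then combining the evidence with dynamic programming. The Python sketch below shows one simple way such a scheme could look. It is not JIGSAW's actual algorithm: the per-source accuracy weight, the summed interval score, the weighted-interval-scheduling recurrence, and all function names and example data are assumptions made for illustration only.

```python
from bisect import bisect_left
from collections import defaultdict


def source_weights(training):
    """Estimate a weight per evidence source as its accuracy on a training set.

    training: iterable of (source_name, was_correct) pairs (hypothetical format).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for source, was_correct in training:
        totals[source] += 1
        hits[source] += int(was_correct)
    return {source: hits[source] / totals[source] for source in totals}


def score_candidates(predictions, weights):
    """Score each candidate interval by summing the weights of its supporting sources.

    predictions: iterable of (source_name, start, end), coordinates inclusive.
    """
    scores = defaultdict(float)
    for source, start, end in predictions:
        scores[(start, end)] += weights.get(source, 0.0)
    return dict(scores)


def best_chain(scores):
    """Dynamic programming (weighted interval scheduling).

    Returns the highest-scoring set of non-overlapping intervals and its total score.
    """
    intervals = sorted(scores, key=lambda iv: iv[1])  # order by end coordinate
    ends = [end for _, end in intervals]
    n = len(intervals)
    best = [0.0] * (n + 1)   # best[i]: optimum using the first i intervals
    prev = [-1] * (n + 1)    # prev[i]: predecessor index if interval i is taken, else -1
    for i, (start, end) in enumerate(intervals, 1):
        # j = number of earlier intervals that end strictly before this one starts
        j = bisect_left(ends, start, 0, i - 1)
        take = best[j] + scores[(start, end)]
        if take > best[i - 1]:
            best[i], prev[i] = take, j
        else:
            best[i] = best[i - 1]
    chain, i = [], n
    while i > 0:
        if prev[i] == -1:
            i -= 1
        else:
            chain.append(intervals[i - 1])
            i = prev[i]
    return chain[::-1], best[n]


# Made-up training outcomes and predictions for two hypothetical evidence sources.
training = [("est", True), ("est", True), ("est", True), ("est", False),
            ("abinitio", True), ("abinitio", False)]
predictions = [("est", 100, 200), ("abinitio", 100, 200),
               ("abinitio", 150, 300), ("est", 250, 400)]

weights = source_weights(training)
chain, total = best_chain(score_candidates(predictions, weights))
print(weights, chain, total)
```

In this toy setup, an interval supported by two sources outscores an overlapping interval supported by one, and the dynamic program keeps the compatible pair of intervals. A real gene finder would additionally have to model reading frame, splice sites, and start and stop signals, none of which this sketch attempts.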