Bioinformatics Advance Access originally published online on January 29, 2004
Bioinformatics 20(5) © Oxford University Press 2004; all rights reserved.
Using functional and organizational information to improve genome-wide computational prediction of transcription units on pathway-genome databases
Bioinformatics Research Group, Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 950151, USA
Received on August 1, 2003; revised on October 7, 2003; accepted on October 9, 2003
Advance Access Publication January 29, 2004
Motivation: The prediction of transcription units (TUs, which are similar to operons) is an important problem that has been tackled using many different approaches. The availability of complete microbial genomes has made genome-wide TU prediction possible. Pathway-genome databases (PGDBs) add metabolic and other organizational information (e.g. protein complexes) to the annotated genome, and can also capture TU organization. These characteristics make PGDBs a suitable framework for developing and implementing TU predictors.
Results: We implemented a TU predictor that uses only intergenic distance and functional classification of genes to predict TU boundaries, and applied it to EcoCyc, our PGDB of Escherichia coli. To this original predictor, we added information on metabolic pathways, protein complexes and transporters, all readily available in EcoCyc, in order to generate an enhanced predictor. The enhanced predictor correctly predicted 80% of the known E.coli TUs (69% of the known operons), a moderate improvement over the original predictor's performance (75% of TUs and 65% of operons correctly predicted), demonstrating that the extra information available in the PGDB does indeed improve prediction performance. We tested this E.coli-based predictor on a genome other than that of E.coli using BsubCyc, our computationally generated PGDB for Bacillus subtilis, for which a set of 100 known operons is available. Prediction accuracy decreased substantially (46% of the known operons correctly predicted). This decrease was due in part to missing information in BsubCyc, which prevented full use of the predictor's features. The enhanced predictor has been implemented as part of our Pathway Tools software suite, and can be used to populate a PGDB with predicted TUs.
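As a rough illustration of the idea summarized in the Results, the following Python sketch groups adjacent genes on the same strand into TUs using intergenic distance, and relaxes the distance limit when two genes share a functional class, a pathway or a protein complex. It is a toy rule under stated assumptions, not the Pathway Tools implementation: the `Gene` record, the function names and the 50 bp and 200 bp thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    name: str
    start: int            # leftmost coordinate (bp)
    end: int              # rightmost coordinate (bp)
    strand: str           # "+" or "-"
    func_class: str = ""  # functional classification, if any
    pathways: frozenset = frozenset()
    complexes: frozenset = frozenset()


def shares_annotation(a: Gene, b: Gene) -> bool:
    """True if two genes share a functional class, pathway or complex."""
    return bool(
        (a.func_class and a.func_class == b.func_class)
        or (a.pathways & b.pathways)
        or (a.complexes & b.complexes)
    )


def same_tu(a: Gene, b: Gene, max_gap: int = 50, relaxed_gap: int = 200) -> bool:
    """Toy decision for two adjacent genes (a upstream of b by coordinate).

    The thresholds are assumptions for illustration, not published values.
    """
    if a.strand != b.strand:
        return False
    gap = b.start - a.end - 1  # intergenic distance in bp (negative if overlapping)
    if gap <= max_gap:
        return True
    # Shared annotation lets a somewhat larger gap still count as one TU.
    return gap <= relaxed_gap and shares_annotation(a, b)


def predict_tus(genes, max_gap: int = 50, relaxed_gap: int = 200):
    """Group genes, sorted by start coordinate, into predicted TUs."""
    tus = []
    for gene in sorted(genes, key=lambda g: g.start):
        if tus and same_tu(tus[-1][-1], gene, max_gap, relaxed_gap):
            tus[-1].append(gene)
        else:
            tus.append([gene])
    return [[g.name for g in tu] for tu in tus]


example_genes = [
    Gene("a", 1, 100, "+", func_class="transport"),
    Gene("b", 120, 300, "+", func_class="metabolism", pathways=frozenset({"P1"})),
    Gene("c", 450, 600, "+", pathways=frozenset({"P1"})),
    Gene("d", 700, 900, "-"),
    Gene("e", 1200, 1400, "-"),
]
predicted = predict_tus(example_genes)
```

In this made-up example, genes a and b are joined by distance alone, c is joined to b only because the two share a pathway, and d and e each start a new TU (a strand change and a large gap, respectively). Dropping the shared-annotation clause corresponds loosely to a predictor that has no pathway or complex information to draw on, which is the situation the abstract reports for parts of BsubCyc.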
Availability: The TU predictor is included in version 7.0 of the Pathway Tools software suite. Pathway Tools 7.0 is available free of charge to academic institutions and for a fee to commercial enterprises. It runs on Sun Solaris 8, Linux and Windows. TUs predicted on the Caulobacter crescentus and Mycobacterium tuberculosis (H37Rv) genomes can be found in our CauloCyc and MtbrvCyc databases at the BioCyc web site (http://biocyc.org). To obtain version 7.0 of Pathway Tools, follow the directions on our web site, http://biocyc.org/download.shtml.
Contact: pkarp{at}ai.sri.com
* To whom correspondence should be addressed.