Bioinformatics Advance Access published online on March 6, 2007
Bioinformatics, doi:10.1093/bioinformatics/btm078
Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure
Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, United Kingdom
*To whom correspondence should be addressed: Ms. Saskia de Groot. E-mail: saskia.degroot{at}chch.ox.ac.uk
Abstract
Motivation: Detecting genes in viral genomes is a complex task. Because their genomes are necessarily constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleotides, up to three genes may be encoded simultaneously in one direction. Conventional HMM-based gene-finding algorithms typically have difficulty identifying multiple coding regions, since in general their topologies do not allow for overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests of whether a candidate region is single or double coding, using the fact that the constraints imposed on multiple-coding nucleotides result in atypical sequence evolution. Exploiting these same constraints, we present a hidden Markov model-based gene-finding program that allows for coding in unidirectional nested and overlapping reading frames and annotates two homologous aligned viral genomes. Our method does not require gene structure to be conserved between the two sequences, making it applicable to the pairwise comparison of more distantly related sequences.
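To make the notion of coding in several forward reading frames concrete, the following minimal sketch labels each nucleotide with the set of frames in which it is coding. It is an illustration only and not the authors' program: the gene coordinates, the function names, and the convention that a gene's frame is its start position modulo 3 are all assumptions introduced here.

```python
from itertools import product

# Hypothetical gene coordinates (0-based, end-exclusive), invented for this
# illustration; they do not come from any real viral genome.
GENES = [(0, 12), (7, 19), (14, 23)]


def frame_of(start):
    """Forward reading frame (0, 1 or 2) of a gene starting at `start`."""
    return start % 3


def coding_frames(position, genes):
    """Set of forward frames in which `position` lies inside a gene."""
    return {frame_of(start) for start, end in genes if start <= position < end}


def annotate(length, genes):
    """Per-nucleotide label: sorted tuple of the frames that are coding there."""
    return [tuple(sorted(coding_frames(i, genes))) for i in range(length)]


# Each of the three forward frames is either coding or not at a position,
# so there are 2**3 = 8 possible labels per nucleotide.
ALL_LABELS = [
    tuple(frame for frame, on in zip(range(3), bits) if on)
    for bits in product([False, True], repeat=3)
]

labels = annotate(23, GENES)
# Positions that are coding in more than one frame at once.
multi = [i for i, label in enumerate(labels) if len(label) > 1]
```

With these made-up coordinates, `multi` lists the positions where two genes overlap. A gene finder that handles overlapping genes has to distinguish such multiple-coding labels from single-coding and non-coding ones, which a topology with only "coding" and "non-coding" states cannot express.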
Results: We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of about 84–89% and specificity of about 97–99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining about 87% sensitivity and 98.5% specificity. We subsequently incorporate prior knowledge by taking the gene structure of one sequence as known and annotating the other conditional on it. This boosts accuracy to nearly perfect, demonstrating that conservation of gene structure, on top of nucleotide sequence, is a valuable source of information, especially in distantly related genomes.
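As a rough illustration of how sensitivity and specificity figures can be computed for an annotation with overlapping genes, the sketch below scores every (position, frame) pair separately. The abstract does not state the exact definitions used, so the formulas here are the standard statistical ones, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and are an assumption; some gene-finding papers instead report TP / (TP + FP) as specificity. The example labels are invented.

```python
def confusion(true_labels, predicted_labels, frames=(0, 1, 2)):
    """Count TP, FN, TN, FP over every (position, frame) pair.

    Each label is a tuple of the forward frames that are coding at that
    position, e.g. (0, 1) for a nucleotide that is coding in frames 0 and 1.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        for frame in frames:
            is_coding = frame in truth
            called_coding = frame in pred
            if is_coding and called_coding:
                tp += 1
            elif is_coding:
                fn += 1
            elif called_coding:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Fraction of truly coding (position, frame) pairs that were found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly non-coding (position, frame) pairs left unannotated."""
    return tn / (tn + fp)


# Invented toy annotation of six nucleotides (assumption, not real data).
truth = [(0,), (0,), (0, 1), (0, 1), (1,), ()]
pred = [(0,), (0,), (0,), (0, 1), (1,), (2,)]

tp, fn, tn, fp = confusion(truth, pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Scoring per frame rather than per position means that a nucleotide correctly called in one frame but missed in an overlapping one counts as one true positive and one false negative.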
Availability: The Java code is available from the authors.
Associate Editor: Prof. Thomas Lengauer
Received on August 15, 2006; revised on February 27, 2007; accepted on February 27, 2007