Gene structure prediction using information on homologous protein sequence
Istituto di Tecnologie Biomediche Avanzate, Consiglio Nazionale delle Ricerche, via Ampere 56, 20131 Milan, Italy
1Institute of Cytology and Genetics, Russian Academy of Sciences, pr. ak. Lavrentyeva 10, Novosibirsk 630090, Russia
* To whom correspondence should be addressed. E-mail: milanesi{at}itba.mi.cnr.it
In this paper a new approach for the prediction of protein-coding gene structures is described. The principal scheme of prediction is as follows. First, the exons with the best potential are predicted in a sequence of unknown function, and a list of the potential amino acid fragments encoded by these exons is formed. Second, each amino acid fragment from the list is tested for homology against the proteins in the SWISS-PROT database of amino acid sequences, and the one protein with the best homology is chosen from among all the homologous sequences. Third, the exon-intron structure is reconstructed on the basis of its homology with the chosen protein sequence. The method was tested on an independent control set of 20 genes, with the following results: 21% of real exons were missed and 3% of non-real exons were found. This system can be used to refine the results of gene prediction systems, especially when highly homologous proteins are found in the amino acid sequence database.
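The three-step scheme can be illustrated with a toy sketch. This is not the authors' implementation. The `Exon` type, the k-mer similarity score, the exact-substring matching, and the two-protein `DATABASE` are all hypothetical stand-ins, chosen only to make the control flow concrete. A real system would use a proper exon predictor and a proper homology search against SWISS-PROT.

```python
from typing import Dict, List, NamedTuple


class Exon(NamedTuple):
    """A candidate exon from step 1 (hypothetical representation)."""
    start: int    # genomic start, inclusive
    end: int      # genomic end, exclusive
    peptide: str  # amino acid fragment encoded by the candidate exon


def shared_kmers(peptide: str, protein: str, k: int = 3) -> int:
    """Crude similarity score: distinct peptide k-mers that occur in the protein."""
    kmers = {peptide[i:i + k] for i in range(len(peptide) - k + 1)}
    return sum(1 for kmer in kmers if kmer in protein)


def best_protein(exons: List[Exon], database: Dict[str, str]) -> str:
    """Step 2: choose the protein with the best single-fragment score (an assumption)."""
    return max(
        database,
        key=lambda name: max(shared_kmers(e.peptide, database[name]) for e in exons),
    )


def reconstruct(exons: List[Exon], protein: str) -> List[Exon]:
    """Step 3: keep the colinear chain of exons covering the most protein residues.

    Toy rule: an exon "matches" only if its peptide is an exact substring.
    """
    hits = []
    for exon in sorted(exons, key=lambda e: e.start):
        pos = protein.find(exon.peptide)
        if pos >= 0:
            hits.append((exon, pos))

    best = []  # best[i] = (residues covered, chain ending at hits[i])
    for i, (exon, pos) in enumerate(hits):
        candidate = (len(exon.peptide), [exon])
        for j in range(i):
            prev, prev_pos = hits[j]
            # Require increasing, non-overlapping order in genome and protein.
            if prev.end <= exon.start and prev_pos + len(prev.peptide) <= pos:
                total = best[j][0] + len(exon.peptide)
                if total > candidate[0]:
                    candidate = (total, best[j][1] + [exon])
        best.append(candidate)
    return max(best, key=lambda c: c[0])[1] if best else []


# Made-up data: two candidates match protein P_A, one is a false exon.
DATABASE = {
    "P_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "P_B": "MGLSDGEWQLVLNVWGKVEAD",
}
CANDIDATES = [
    Exon(100, 130, "MKTAYIAKQR"),
    Exon(200, 215, "GGGPW"),
    Exon(300, 336, "QISFVKSHFSRQ"),
]

chosen = best_protein(CANDIDATES, DATABASE)
gene_model = reconstruct(CANDIDATES, DATABASE[chosen])
```

In this made-up example the middle candidate has no support in the chosen protein and is dropped from the gene model, which mirrors how homology can filter out falsely predicted exons.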
Received on June 16, 1995; revised on March 30, 1996; accepted on March 30, 1996
