Bioinformatics 20(Suppl. 1) © Oxford University Press 2004; all rights reserved.
Splice site identification by idlBNs
Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Centre de Regulació Genòmica, Psg. Marítim 3749, 08003 Barcelona, Spain
Received on January 15, 2004; accepted on March 1, 2004
Motivation: Computational identification of functional sites in nucleotide sequences is at the core of many algorithms for the analysis of genomic data. This identification relies on statistical parameters estimated from a training set. Because the number of parameters is often very large, it is difficult to obtain consistent estimators. To simplify the estimation problem, one imposes independence assumptions between the nucleotides along the site. However, such assumptions can potentially limit how small the estimation error can become.
Results: In this paper, we introduce a method, novel in the context of identifying functional sites, that finds a reasonable set of independence assumptions among the nucleotides that is supported by the data, and uses it to identify sites by their likelihood ratio. More importantly, in many practical situations its performance improves as the training sample size increases. We apply the method to the identification of splice sites and further evaluate its effect within the context of exon and gene prediction.
Supplementary information: The datasets built specifically for this paper as well as the full set of results are available at http://genome.imim.es/datasets/splidlbns2004
Contact: rcastelo{at}imim.es
* To whom correspondence should be addressed.
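As an illustration of the likelihood-ratio idea mentioned in the abstract, the following minimal sketch scores a candidate site under the simplest possible assumption, namely that every position along the site is independent of the others. This is not the idlBN method described in the paper, which selects its independence assumptions from the data. The function names, the pseudocount, and the toy sequences are all assumptions made up for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_independent_model(sequences, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities.

    Assumes every position is independent of the others (hypothetical
    baseline, not the paper's method). A pseudocount avoids zero
    probabilities for nucleotides unseen at a position.
    """
    length = len(sequences[0])
    total = len(sequences) + pseudocount * len(ALPHABET)
    model = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model


def log_likelihood(model, seq):
    """Sum of per-position log-probabilities under the independence model."""
    return sum(math.log(model[pos][base]) for pos, base in enumerate(seq))


def log_likelihood_ratio(site_model, background_model, seq):
    """Positive values favour the site model over the background model."""
    return log_likelihood(site_model, seq) - log_likelihood(background_model, seq)


# Toy training data, invented for illustration only.
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]
false_sites = ["ACGTTCGA", "TTGTCCAA", "GAGTTTCC", "CTGTACGT"]

site_model = train_independent_model(true_sites)
background_model = train_independent_model(false_sites)

score = log_likelihood_ratio(site_model, background_model, "AGGTAAGT")
decoy_score = log_likelihood_ratio(site_model, background_model, "TTGTCCAA")
```

A candidate would be called a site when its score exceeds a chosen threshold. Relaxing the full-independence assumption, for instance by conditioning each position on selected other positions, adds parameters and therefore needs more training data, which is the trade-off the abstract refers to.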


