Bioinformatics Vol. 19 Suppl. 2 2003
pages ii36-ii41
© 2003 Oxford University Press

HMM sampling and applications to gene finding and alternative splicing

Simon L. Cawley 1 and Lior Pachter 2,*

1 Affymetrix, 6550 Vallejo St Suite 100, Emeryville, CA 94608, USA
2 Department of Mathematics, U.C. Berkeley, CA 94720, USA

Received on March 17, 2003; accepted on June 9, 2003

The standard method of applying hidden Markov models (HMMs) to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains the given observations to a dynamic programming problem on a corresponding directed acyclic graph. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute the probabilities that predicted exons and gene structures are correct under the assumed model. Finally, we describe a new memory-efficient sampling algorithm for certain classes of HMMs, which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context-free grammars and RNA structure prediction.
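The abstract describes the standard sampler only as a variant of the forward-backward and backtrack algorithms. As an illustration, a minimal Python sketch of one common formulation follows: fill the forward table, then backtrack stochastically, choosing each earlier state in proportion to its forward value times the transition weight into the state already chosen. The function names (`forward`, `sample_path`), the list-of-lists parameter layout, and the toy two-state model are assumptions made for this sketch. It is not the memory-efficient algorithm of the paper, and it omits the scaling or log-space arithmetic that long sequences would need.

```python
import random


def forward(obs, init, trans, emit):
    """Forward table: f[t][k] = P(obs[0..t], state at position t is k)."""
    n = len(init)
    f = [[init[k] * emit[k][obs[0]] for k in range(n)]]
    for t in range(1, len(obs)):
        prev = f[-1]
        f.append([
            emit[k][obs[t]] * sum(prev[j] * trans[j][k] for j in range(n))
            for k in range(n)
        ])
    return f


def sample_path(obs, init, trans, emit, rng):
    """Draw one state path from P(path | obs) by stochastic backtracking."""
    f = forward(obs, init, trans, emit)
    n = len(init)
    last = len(obs) - 1
    # Final state: drawn in proportion to the last forward column.
    state = rng.choices(range(n), weights=f[last])[0]
    path = [state]
    for t in range(last - 1, -1, -1):
        # Earlier state j: proportional to f[t][j] * trans[j][state].
        weights = [f[t][j] * trans[j][state] for j in range(n)]
        state = rng.choices(range(n), weights=weights)[0]
        path.append(state)
    path.reverse()
    return path


def state_frequency(paths, position, state):
    """Fraction of sampled paths that are in `state` at `position`."""
    return sum(1 for p in paths if p[position] == state) / len(paths)


# Hypothetical two-state toy model (0 = "background", 1 = "exon-like").
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.2, 0.8]]  # EMIT[state][symbol]

if __name__ == "__main__":
    rng = random.Random(0)
    obs = [0, 1, 1, 0, 1]
    paths = [sample_path(obs, INIT, TRANS, EMIT, rng) for _ in range(1000)]
    # Monte Carlo estimate of the posterior probability of state 1 at position 2.
    print(state_frequency(paths, 2, 1))
```

Under these assumptions, the fraction of sampled paths that contain a given feature is a Monte Carlo estimate of that feature's posterior probability under the model, which is the sense in which sampling yields probabilities for predicted exons and gene structures. Distinct paths that recur among the samples are candidate suboptimal parses.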

Key words: suboptimal parses, sampling, hidden Markov model, conserved alternative splicing

Contact: lpachter{at}math.berkeley.edu

* To whom correspondence should be addressed.




