Bioinformatics Advance Access originally published online on March 16, 2009
Bioinformatics 2009 25(9):1105-1111; doi:10.1093/bioinformatics/btp120

© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

TopHat: discovering splice junctions with RNA-Seq

Cole Trapnell 1,*, Lior Pachter 2 and Steven L. Salzberg 1

1 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 and 2 Department of Mathematics, University of California, Berkeley, CA 94720, USA

*To whom correspondence should be addressed.


Abstract

Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.
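To make the problem concrete, the following toy sketch shows why a read that spans an exon-exon boundary cannot be placed by contiguous matching, and how such a read could in principle be split across an intron. It is not the TopHat algorithm: the sequences, the `find_junctions` helper, the exact-match search and the GT...AG intron filter are all illustrative assumptions.

```python
# Toy illustration only; NOT the TopHat algorithm.
# All sequences and the helper below are hypothetical.

def find_junctions(genome, read, min_anchor=5):
    """Return sorted (donor, acceptor) pairs at which `read` can be split
    into two exact matches separated by an intron that starts with GT and
    ends with AG. `donor` is the index of the first intron base and
    `acceptor` is the index of the first base of the downstream exon."""
    found = set()
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:k], read[k:]
        i = genome.find(left)
        while i != -1:
            donor = i + k
            j = genome.find(right, donor + 1)
            while j != -1:
                intron = genome[donor:j]
                if intron.startswith("GT") and intron.endswith("AG"):
                    found.add((donor, j))
                j = genome.find(right, j + 1)
            i = genome.find(left, i + 1)
    return sorted(found)


# Hypothetical two-exon gene.
exon1 = "ACCTGACTTCAGGATCCA"
intron = "GTAAGTCCTTTGGGCATTTCTCTCCCACAG"
exon2 = "CTTGACGGAATCGTTAGC"
genome = exon1 + intron + exon2

# A read sequenced from the spliced transcript, spanning the junction.
read = exon1[-10:] + exon2[:10]

print("contiguous match:", read in genome)
print("candidate junctions:", find_junctions(genome, read))
```

The read is absent from the genome as a contiguous string, yet its two halves match on either side of the intron. Exhaustive search of this kind does not scale to millions of reads and a mammalian genome, which is why an efficient dedicated algorithm is needed.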

Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.
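The throughput claim can be checked with simple arithmetic. The sketch below uses the reported rate of 2.2 million reads per CPU hour; the experiment size of 40 million reads is a hypothetical figure chosen for illustration, not a number taken from the study, and the calculation assumes a single CPU.

```python
# Back-of-the-envelope runtime estimate from the reported throughput.
READS_PER_CPU_HOUR = 2.2e6  # "nearly 2.2 million reads per CPU hour"


def cpu_hours(total_reads, reads_per_cpu_hour=READS_PER_CPU_HOUR):
    """Estimate the CPU hours needed to map `total_reads` reads."""
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    return total_reads / reads_per_cpu_hour


# Hypothetical experiment size (assumption, not from the paper).
hypothetical_reads = 40_000_000
hours = cpu_hours(hypothetical_reads)
print(f"{hypothetical_reads:,} reads: about {hours:.1f} CPU hours")
print("fits within one day on a single CPU:", hours < 24)
```

Under these assumptions, any experiment of up to roughly 52 million reads (2.2 million × 24) would finish within a day on one CPU.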

Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu

Contact: cole@cs.umd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Associate Editor: Ivo Hofacker


Received on October 23, 2008; revised on February 24, 2009; accepted on February 26, 2009





