Bioinformatics Advance Access originally published online on July 26, 2005
Bioinformatics 2005 21(18):3667-3668; doi:10.1093/bioinformatics/bti598

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

BlastXtract—a new way of exploring translated searches

Marcus J. Claesson * and Douwe van Sinderen

Alimentary Pharmabiotic Centre and Department of Microbiology, National University of Ireland Cork, Ireland

*To whom correspondence should be addressed.


    Abstract
 

Summary: Searching translated, unannotated genomic DNA sequences against protein databases is a useful early-stage method for discovering protein homologues encoded by the sequence, but it generates huge amounts of output data that quickly become unmanageable. BlastXtract is a web-based tool for managing and visualizing results from large translated BLAST and FastA searches. It combines the speed and storage benefits of relational database management systems with an easy-to-use graphical navigation map, and greatly facilitates the early exploration of genomic sequence.

Availability: BlastXtract can be downloaded from http://bioinfo.ucc.ie/blastxtract/

Contact: mclaesson{at}bioinfo.ucc.ie


    INTRODUCTION
 
The ever increasing amount of genomic data being produced brings the need for tools that can easily manage and visualize results from sequence similarity programs. The most widely used tool for searching genomic sequence for homologues is BLAST (Altschul et al., 1997), and performing translated searches, such as BLASTX, against protein databases can predict the presence of many putative genes (Robison et al., 1994). This is a relatively quick and simple method to get an overview of the contents of a genome, without running gene prediction programs and subsequent sequence analysis of putative proteins. However, searches of large genomic query sequences generate vast amounts of output data, which quickly become unmanageable and impossible to browse through. Hence, a need exists to devise a bioinformatics tool that reliably stores all produced data, and is also capable of performing fast searches and orderly retrieval of particular results, visualized by a user-friendly graphical interface.

Another important issue during assembly and early annotation of genomic DNA sequences is the identification of sequencing and assembly errors, which are particularly frequent in low-quality or draft genome sequences and can cause frameshifts. Translated BLASTX searches produce two high scoring pair (HSP) alignments around a frameshift site, and these can be visualized graphically to aid detection.
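
The frameshift signature described above can be sketched in a few lines of Python. This is only an illustration, not BlastXtract's own code (the tool is written in Perl): the `HSP` record, the `candidate_frameshifts` function and the `max_gap` cut-off of 50 nucleotides are all hypothetical names and values chosen for the example. It assumes that query coordinates are normalized so that `q_start <= q_end`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """One high scoring pair from a translated search (hypothetical record)."""
    hit_id: str    # identifier of the database protein
    frame: int     # reading frame: +1..+3 or -1..-3
    q_start: int   # query start in nucleotides, q_start <= q_end
    q_end: int     # query end in nucleotides


def candidate_frameshifts(hsps, max_gap=50):
    """Return (first, second) HSP pairs that suggest a frameshift.

    Two HSPs are reported when they belong to the same hit, lie on the
    same strand, use different frames, and are adjacent on the query
    (gap or overlap of at most max_gap nucleotides; an assumed cut-off).
    """
    by_hit = {}
    for hsp in hsps:
        by_hit.setdefault(hsp.hit_id, []).append(hsp)

    pairs = []
    for group in by_hit.values():
        ordered = sorted(group, key=lambda h: h.q_start)
        for a, b in zip(ordered, ordered[1:]):
            same_strand = (a.frame > 0) == (b.frame > 0)
            frame_changed = a.frame != b.frame
            gap = b.q_start - a.q_end  # negative when the HSPs overlap
            if same_strand and frame_changed and abs(gap) <= max_gap:
                pairs.append((a, b))
    return pairs
```

A pair returned by this function is only a candidate: it marks a region worth inspecting in the alignments, not a confirmed sequencing error.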

BlastXtract was therefore developed to combine the benefits of an easy-to-use graphical interface with the speed and high storage capacity of relational database management systems (RDBMS). This novel web-based tool allows translated BLAST results to be uploaded into a database, where they can be further queried and visualized through an intuitive and clickable navigation map.

In addition to managing BLAST outputs, BlastXtract is also able to include translated results from the FastA program (Pearson and Lipman, 1988), which utilizes a slower but more sensitive pairwise alignment algorithm. Unlike BLAST, FastA produces only one alignment per hit, and frameshifts are consequently colour coded to aid detection. Displaying BLAST and FastA searches with their alignments side-by-side is more informative and increases the likelihood of making correct gene predictions.


    IMPLEMENTATION AND METHODS
 
A new BlastXtract session commences with the user uploading a standard output file of either BLASTX or FastX/FastY results into the database. The choice of parameters of the initial search is left to the user, but for a high level of detail and statistical significance it is recommended to allow as many relevant hits as possible and to divide longer query sequences into smaller ones. In the case of the latter, BlastXtract's query offset function allows the pieces to be ‘stitched’ back together to mimic a search of the full sequence. Once the result file has been uploaded, with an optional data description, into a database table it can be browsed and explored further. In addition to the translated DNA-to-protein searches, DNA-to-DNA output data can also be processed. However, the translated searches are more useful for annotation purposes, since protein sequences have a higher degree of conservation compared with DNA and their database entries usually are more informative.
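
The idea behind the query offset can be illustrated with a short Python sketch. It is not taken from BlastXtract; `split_query` and `apply_offset` are hypothetical helper names, and the sketch assumes non-overlapping pieces of fixed length and 1-based HSP coordinates within each piece. How the pieces are actually cut is left to the user.

```python
def split_query(seq, chunk_len):
    """Yield (offset, piece) pairs for a long query sequence.

    offset is the number of nucleotides that precede the piece in the
    full sequence, which is what has to be added back later.
    """
    for offset in range(0, len(seq), chunk_len):
        yield offset, seq[offset:offset + chunk_len]


def apply_offset(q_start, q_end, offset):
    """Map HSP coordinates from a piece back onto the full sequence."""
    return q_start + offset, q_end + offset
```

For example, an HSP reported at positions 3 to 9 of a piece that starts after 4 nucleotides of the full sequence corresponds to positions 7 to 13 of that sequence. With non-overlapping pieces a gene spanning a cut point is split across two searches, which is one reason a user might prefer overlapping pieces.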

The user can choose to look at all hits within a specified query sequence range, or limit the search to hits with certain words or accession numbers in their descriptions. Overlapping hits can be filtered out, which is a useful way to get an overview of only the best hits for each position. E-value thresholds can also be set. Hits can be displayed either graphically or non-graphically. The non-graphical display shows all the values of the HSPs in a table and gives the option of showing the full protein alignment for each one. The accession numbers in the hit description are hyperlinked to the Sanger SRS and NCBI web servers. Every HSP also has two bar graphs, which illustrate its relative position within the total query sequence and within the chosen range. They also indicate where in the sequence the start and stop would be if the alignment were complete. The graphical display gives an overview of the requested hits in the specified sequence range and shows the values of a hit when the pointer is moved over it. HSPs that belong to the same hit are connected by dotted lines. This view also works as a navigation map: clicking on a hit brings up the same detailed information as in the non-graphical display (Fig. 1). A colour coding scheme indicates the frame in which each HSP is found and how high its score is.
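
The selection options described above (sequence range, description keyword, E-value threshold and removal of overlapping hits) can be sketched as follows. This is a minimal illustration under stated assumptions, not BlastXtract's implementation: the `Hit` record and `select_hits` function are hypothetical, and the overlap filter is assumed to be a simple greedy rule that keeps the highest-scoring hit at each position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Summary of one database hit (hypothetical record)."""
    accession: str
    description: str
    q_start: int   # query start, q_start <= q_end
    q_end: int
    score: float
    evalue: float


def overlaps(a, b):
    """True if two hits share at least one query position."""
    return a.q_start <= b.q_end and b.q_start <= a.q_end


def select_hits(hits, range_start, range_end,
                max_evalue=None, keyword=None, drop_overlaps=False):
    """Filter hits by range, E-value and keyword; optionally drop overlaps."""
    chosen = [h for h in hits
              if h.q_start <= range_end and h.q_end >= range_start]
    if max_evalue is not None:
        chosen = [h for h in chosen if h.evalue <= max_evalue]
    if keyword:
        kw = keyword.lower()
        chosen = [h for h in chosen
                  if kw in h.description.lower() or kw in h.accession.lower()]
    if drop_overlaps:
        # Greedy rule (an assumption): visit hits from best to worst score
        # and keep a hit only if it overlaps nothing already kept.
        kept = []
        for h in sorted(chosen, key=lambda h: h.score, reverse=True):
            if not any(overlaps(h, k) for k in kept):
                kept.append(h)
        chosen = kept
    return sorted(chosen, key=lambda h: h.q_start)
```

In a database-backed tool the range, keyword and E-value conditions would more naturally be expressed in the SQL query itself; they are written out in plain Python here only to keep the sketch self-contained.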



 
Fig. 1 Screenshot of the BlastXtract browsing function with a detailed view of one hit and the clickable navigation map below. The results are generated from a BLASTX search of the B. breve UCC2003 draft sequence against the UniProt database. The frameshift around the two HSPs in the first 1.5 kb is clearly visualized.

 
BlastXtract has been an important tool in the annotation process of two prokaryotic genomes, Lactobacillus salivarius UCC118 (low-GC content and high sequence read coverage) and Bifidobacterium breve UCC2003 (high-GC content and low coverage), where the detection of ~400 frameshifts was greatly facilitated.

BlastXtract was written in Perl and uses Bioperl modules for parsing the output data and producing the graphical objects. The web interface runs under the Apache web server and is built using the Perl CGI module and JavaScript. The Perl DBI module enables communication with an RDBMS, which can be either MySQL or PostgreSQL. On the server side, BlastXtract needs to be installed on a Linux/UNIX system with the required Perl modules and database systems, but on the client side only a standard web browser is needed.
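
To make the role of the database concrete, the sketch below stores HSPs in a relational table and retrieves those that fall in a query range. It deliberately swaps in Python's built-in `sqlite3` module for the Perl DBI with MySQL or PostgreSQL setup described above, so that it runs without a database server. The table name `hsp`, its columns, and the functions `create_store` and `hsps_in_range` are assumptions made for illustration; the actual BlastXtract schema is not described in this article.

```python
import sqlite3


def create_store():
    """Create an in-memory database with a hypothetical HSP table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hsp (
               id          INTEGER PRIMARY KEY,
               accession   TEXT NOT NULL,
               description TEXT,
               frame       INTEGER,
               q_start     INTEGER NOT NULL,
               q_end       INTEGER NOT NULL,
               score       REAL,
               evalue      REAL
           )"""
    )
    # An index on the query coordinates keeps range lookups fast.
    conn.execute("CREATE INDEX hsp_range ON hsp (q_start, q_end)")
    return conn


def hsps_in_range(conn, start, end, max_evalue=10.0):
    """Return HSPs overlapping [start, end] with evalue <= max_evalue."""
    cur = conn.execute(
        "SELECT accession, frame, q_start, q_end, score FROM hsp "
        "WHERE q_start <= ? AND q_end >= ? AND evalue <= ? "
        "ORDER BY q_start",
        (end, start, max_evalue),
    )
    return cur.fetchall()
```

Rows can be loaded with an ordinary parameterized `INSERT` after parsing the search output. Keeping the range and E-value conditions in SQL lets the database do the filtering, which is the benefit the relational back end is meant to provide for very large result sets.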


    Acknowledgments
 
We thank Rory Mullane for his help with the web design, and the UCC Bioinformatics laboratory for their supportive suggestions. This work was supported by a Food Institutional Research Measure grant (01/R&D/C/159) through the Department of Agriculture and Food under the National Development Plan 2000–2006 and the Science Foundation Ireland funded Alimentary Pharmabiotic Centre.

Conflict of Interest: none declared.

Received on April 28, 2005; revised on June 28, 2005; accepted on July 24, 2005

    REFERENCES
 

    Altschul, S., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

    Robison, K., et al. (1994) Large scale bacterial gene discovery by similarity search. Nat. Genet., 7, 205–214.

