

Bioinformatics Advance Access originally published online on June 22, 2007
Bioinformatics 2007 23(17):2334-2336; doi:10.1093/bioinformatics/btm331

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets

Wenjie Deng, David C. Nickle, Gerald H. Learn, Brandon Maust and James I. Mullins*

Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: ViroBLAST is a stand-alone BLAST web interface for nucleotide and amino acid sequence similarity searches. It extends the utility of BLAST to query against multiple sequence databases and user sequence datasets, and provides a friendly output to easily parse and navigate BLAST results. ViroBLAST is readily useful for all research areas that require BLAST functions and is available online and as a downloadable archive for independent installation.

Availability: http://indra.mullins.microbiol.washington.edu/blast/viroblast.php

Contact: jmullins{at}u.washington.edu


    1 INTRODUCTION
 
The alignment of nucleotide or amino acid sequences has become one of the most important tools for researchers studying genomics or proteomics. Needleman and Wunsch were the first to attempt sequence alignment with an objective, computer-programmable criterion (Needleman and Wunsch, 1970). However, as the amount of genetic information expanded, a need for faster algorithms emerged, and the basic local alignment search tool (BLAST) (Altschul et al., 1990) now predominates as the fastest and most widely used tool. The BLAST algorithm works by using an underlying mutational model to compare segments from each of two sequences and find all high-scoring segment pairs (HSPs) whose alignment scores cannot be improved by extension or trimming (Sellers, 1984; Smith and Waterman, 1981). Here, we extend the utility of BLAST by providing a web-based interface to search against not only public sequence databases but also local sequence databases, and a friendly output interface to easily parse and navigate BLAST results. This tool, referred to as ViroBLAST, was created for use in our laboratory to facilitate verification of the origins of human immunodeficiency virus (HIV) sequences derived from various experiments (Learn et al., 1996). The ViroBLAST application is also available for download, installation and customization for a variety of specific projects.
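The idea of a high-scoring segment pair described above can be illustrated with a small sketch. The code below is not the NCBI implementation: it assumes a simple match/mismatch scoring scheme (+1/-1) in place of a real substitution matrix, takes the seed position as given, and scans to the end of both sequences instead of using BLAST's drop-off heuristic. The function name `extend_seed` and its parameters are hypothetical. It extends an exact seed match in both directions without gaps and keeps the extent with the highest score, so that lengthening the segment pair further would not raise its score.

```python
def extend_seed(a, b, i, j, k, match=1, mismatch=-1):
    """Extend the exact seed a[i:i+k] == b[j:j+k] in both directions, ungapped.

    Returns (score, (a_start, a_end), (b_start, b_end)) using half-open ranges.
    Scoring is a toy match/mismatch scheme, assumed for illustration only.
    """
    if a[i:i + k] != b[j:j + k]:
        raise ValueError("seed positions do not mark an exact match")

    def best_extension(pairs):
        # Walk outward from the seed, remembering the best running score
        # and how many positions were consumed to reach it.
        best_score = running = 0
        best_len = 0
        for n, (x, y) in enumerate(pairs, start=1):
            running += match if x == y else mismatch
            if running > best_score:
                best_score, best_len = running, n
        return best_score, best_len

    right_score, right_len = best_extension(zip(a[i + k:], b[j + k:]))
    left_score, left_len = best_extension(zip(reversed(a[:i]), reversed(b[:j])))

    score = k * match + left_score + right_score
    return (score,
            (i - left_len, i + k + right_len),
            (j - left_len, j + k + right_len))


if __name__ == "__main__":
    seq_a = "TTACGTACGGA"
    seq_b = "GGACGTACGTT"
    # The seed "GTA" starts at index 4 in both toy sequences.
    print(extend_seed(seq_a, seq_b, 4, 4, 3))
```

In this toy example the three-letter seed grows into the seven-letter shared segment `ACGTACG`; the mismatching flanks are excluded because including them would lower the score.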

The National Center for Biotechnology Information (NCBI) BLAST server is widely used for sequence similarity searches (McGinnis and Madden, 2004), and it suits the general purpose of searches against the public sequence databases curated by the Center. The stand-alone executable blast and wwwblast from the NCBI BLAST site (ftp://ftp.ncbi.nih.gov/blast/executables/) provide easy ways for a user to perform BLAST searches via the command line or a web server. However, stand-alone blast and wwwblast require significant effort for use in specialized research. The stand-alone version requires each user to install and configure the program and customize the databases; wwwblast provides a web interface but cannot search multiple databases and does not allow users to upload their own sequence set to search against. In our research area, Los Alamos National Laboratory (LANL) provides the HIV-BLAST server (http://www.hiv.lanl.gov/content/hiv-db/BASIC_BLAST/basic_blast.html) to search HIV sequences in their database. In a recent implementation, HIV-BLAST at LANL has begun to let users upload their sequences as a database to be queried. However, it is limited in scope and can process only one query sequence at a time. The BioAfrica BLAST server (http://bioafrica.mrc.ac.za/blast/index.html) allows users to upload sequences to be queried. However, neither BLAST server (LANL or BioAfrica) allows users to select and query multiple databases simultaneously. Finally, none of these servers provides an organized interface for parsing and navigating BLAST output for further analysis; results are presented only as rows of BLAST results. We therefore identified a need for a single interface that would allow users to perform batch sequence searches against public, local and user-customized sequence databases, and to easily parse and navigate BLAST output, leading us to develop ViroBLAST.


    2 IMPLEMENTATION
 
ViroBLAST offers an interface to choose from a user-definable set of target databases, including user-uploaded sequence datasets, expands the query functionality provided by existing BLAST implementations, and provides a friendly and organized BLAST output interface. ViroBLAST is available in two freely accessible forms. It can be downloaded and installed as a web server, and it is accessible as a web server at http://indra.mullins.microbiol.washington.edu/blast/viroblast.php (Fig. 1). Users may input their query sequences directly by pasting them into the query box, or by uploading sequences as FASTA files from a local computer. Users can then select which sequence databases to search, including local databases, user-uploaded datasets in FASTA format, downloaded public databases from NCBI or other sites, or any combination of the above options. ViroBLAST provides advanced users the option to vary BLAST parameters to glean more specific information. It also provides the option for a user to receive results via email, a useful feature for large, time-consuming batch searches.
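As a rough illustration of how a batch query against several databases might be handed to the stand-alone blastall program, the sketch below assembles a command line. It is not taken from the ViroBLAST source: the function name, database names and file names are hypothetical, and it relies only on the standard blastall options `-p` (program), `-d` (database list), `-i` (query file), `-o` (output file), `-e` (expectation cutoff) and `-m 8` (tabular output). Several databases are passed to `-d` as one space-separated argument.

```python
import shlex


def build_blastall_command(program, databases, query_path, out_path,
                           evalue=10.0, tabular=False):
    """Return an argument list for the legacy NCBI blastall program.

    `databases` is a list of formatted database names; blastall accepts
    several names in a single space-separated -d argument.
    """
    if not databases:
        raise ValueError("at least one database is required")
    cmd = [
        "blastall",
        "-p", program,               # e.g. blastn, blastp, blastx
        "-d", " ".join(databases),   # one argument naming every database
        "-i", query_path,            # FASTA file holding one or more queries
        "-o", out_path,
        "-e", str(evalue),
    ]
    if tabular:
        cmd += ["-m", "8"]           # tabular output instead of pair-wise
    return cmd


if __name__ == "__main__":
    # Hypothetical database and file names, for illustration only.
    cmd = build_blastall_command("blastn", ["viral", "hiv", "user_upload"],
                                 "query.fasta", "result.txt", evalue=1e-5)
    # shlex.join shows how the list would look if typed in a shell.
    print(shlex.join(cmd))
    # To execute on a machine where blastall is installed:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list, not as a single shell string, keeps user-supplied file names from being interpreted by a shell. A user-uploaded FASTA dataset would first need to be formatted as a BLAST database (for example with the legacy `formatdb` tool) before its name could appear in the `-d` list.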


Fig. 1. Features and options in the ViroBLAST browser. Users can run batch sequence searches against multiple databases, including public or local sequence databases and users' own sequence datasets. Advanced searches are possible via configurable BLAST parameters.

 
The results of the ViroBLAST sequence alignment are presented in a summary table with query sequence name, subject sequence name (linked to the GenBank sequence report, if applicable), subject source database, bit score (linked to the pair-wise alignment result), identity percentage and E-value (http://www.ncbi.nlm.nih.gov/blast/blast_FAQs.shtml#Expect) if the user chooses the default pair-wise output (Fig. 2). The summary table extracts essential information from the BLAST result to create an extensive picture of the results for further analysis. This has proved very useful for analyses such as sequence contamination checking. From this table, users can reparse and filter results by entering a threshold identity percentage or bit score, and select sequences to download. Users can also directly access the pair-wise alignment output. Large query results can be difficult to handle due to the size of the output. Therefore, ViroBLAST parses the result file into smaller, more manageable files that can more easily be downloaded and navigated.
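The filtering step described above can be sketched as follows. This is a minimal illustration and not ViroBLAST's own parser: it assumes BLAST tabular output (the 12-column format produced by `blastall -m 8`, in which column 3 is the identity percentage, column 11 the E-value and column 12 the bit score), whereas ViroBLAST builds its table from the pair-wise output. The names `Hit`, `parse_tabular` and `filter_hits`, and the sample rows, are invented for the example. The subject source database is not part of the tabular format and is left out.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column BLAST tabular lines into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        float(fields[10]), float(fields[11])))
    return hits


def filter_hits(hits: Iterable[Hit], min_identity: float = 0.0,
                min_bit_score: float = 0.0) -> List[Hit]:
    """Keep hits that meet both the identity and the bit-score threshold."""
    return [h for h in hits
            if h.identity >= min_identity and h.bit_score >= min_bit_score]


if __name__ == "__main__":
    # Made-up rows in tabular layout, for illustration only.
    sample = [
        "q1\tsubjA\t99.50\t200\t1\t0\t1\t200\t5\t204\t1e-100\t380.0",
        "q1\tsubjB\t85.00\t180\t27\t0\t1\t180\t9\t188\t2e-40\t160.0",
        "q2\tsubjC\t92.10\t150\t12\t0\t3\t152\t1\t150\t3e-55\t210.0",
    ]
    for hit in filter_hits(parse_tabular(sample), min_identity=90.0):
        print(hit.query, hit.subject, hit.identity, hit.bit_score)
```

For a contamination check, a high identity threshold would leave only the near-identical query and subject pairs that deserve a closer look.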


Fig. 2. The summarized output interface of a ViroBLAST search. Users can parse and filter the results, download aligned sequences and fetch a modified BLAST pair-wise alignment output that is organized for easy downloading and navigation.

 
ViroBLAST runs on the Apache web server and the MySQL database server. It uses the PHP and Perl programming languages (http://www.php.net, http://www.perl.com) and applies the standalone blastall program downloaded from the NCBI (ftp://ftp.ncbi.nih.gov/blast). For our specific uses, public sequence databases are curated and provided by NCBI and LANL. The nucleotide sequence databases include viral and HIV GenBank databases (NCBI; http://www.ncbi.nlm.nih.gov/Genbank/), the non-redundant vector database (NCBI; ftp://ftp.ncbi.nih.gov/pub/UniVec), and the highly curated LANL-HIV sequence databases (LANL; http://www.hiv.lanl.gov/content/hiv-db). The protein databases consist of the non-redundant protein database, the SwissProt protein database and the RefSeq protein database (NCBI; ftp://ftp.ncbi.nih.gov/blast/db/), as well as HIV and viral protein databases (NCBI; http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=protein). Users can also upload their own sequence datasets to query against. Uploaded sequence data are stored for the duration of the query in a secured directory not accessible via the web server, and then automatically deleted. The resulting alignments, which may contain uploaded sequences, remain accessible on the server for 24 h via a randomized URL known to the user who started the query.
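The handling of uploaded data and result URLs described above can be sketched in a few lines. ViroBLAST itself is written in PHP and Perl, so the Python below is only an illustration of the general pattern, under stated assumptions: every function name is hypothetical, the temporary directory stands in for the secured directory outside the web root, and the 24 h lifetime is the only value taken from the text.

```python
import os
import secrets
import tempfile
import time

RESULT_TTL_SECONDS = 24 * 60 * 60  # results stay reachable for 24 h


def run_with_temporary_upload(fasta_text, run_query):
    """Store an upload only for the duration of a query, then delete it.

    `run_query` is a caller-supplied function that receives the path of the
    stored FASTA file and returns the query result.
    """
    with tempfile.TemporaryDirectory(prefix="upload_") as workdir:
        path = os.path.join(workdir, "uploaded.fasta")
        with open(path, "w") as handle:
            handle.write(fasta_text)
        return run_query(path)
    # Leaving the with-block removes the directory and the uploaded file.


def new_result_token():
    """Return an unguessable token to embed in the result URL."""
    return secrets.token_urlsafe(16)


def is_expired(created_at, now=None, ttl=RESULT_TTL_SECONDS):
    """True once a result older than `ttl` seconds should no longer be served."""
    if now is None:
        now = time.time()
    return now - created_at > ttl


if __name__ == "__main__":
    def count_records(path):
        # Stand-in for a real BLAST run: count FASTA header lines.
        with open(path) as handle:
            return sum(1 for line in handle if line.startswith(">"))

    print(run_with_temporary_upload(">seq1\nACGT\n>seq2\nGGCC\n", count_records))
    print("result token:", new_result_token())
```

The token comes from the `secrets` module because a result URL acts as the only access control in this scheme, so it has to be hard to guess.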

For maximum utility in other research fields, we provide a stand-alone ViroBLAST package that has been tested on Linux, Mac OS X and Solaris. Users can download the package from the ViroBLAST website, install it on their own computers, and configure and customize the sequence databases they need. Users can then manage their own ViroBLAST web server tailored to meet their specific research purpose.


    ACKNOWLEDGEMENTS
 
We thank Mark Jensen, Yi Liu, Geoffrey Gottlieb and Tulio de Oliveira for valuable comments and suggestions. This work was supported by funds from the Advanced Technology Initiative in Infectious Diseases at the University of Washington and US Public Health Service grants to J.I.M. (AI047734, AI57005, AI058894) as well as the University of Washington Center for AIDS Research (AI27757).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Thomas Lengauer

Received on April 7, 2007; revised on June 7, 2007; accepted on June 16, 2007

    REFERENCES
 

    Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.

    Learn GH Jr, et al. Maintaining the integrity of human immunodeficiency virus sequence databases. J. Virol. (1996) 70:5720–5730.

    McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. (2004) 32:W20–W25.

    Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. (1970) 48:443–453.

    Sellers PH. Pattern recognition in genetic sequences by mismatch density. Bull. Math. Biol. (1984) 46:501–514.

    Smith TF, Waterman MS. Identification of common molecular subsequences. J. Mol. Biol. (1981) 147:195–197.




