Bioinformatics Advance Access originally published online on August 2, 2005
Bioinformatics 2005 21(18):3665-3666; doi:10.1093/bioinformatics/bti601
WebACT: an online companion for the Artemis Comparison Tool

1Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College London, London SW7 2AZ, UK
2Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, UK
3Pathogen Sequencing Unit, Sanger Institute, Cambridge CB10 1SA, UK
*To whom correspondence should be addressed.
Abstract
Summary: WebACT is an online resource that rapidly provides simultaneous BLAST comparisons between up to five genomic sequences, in a format suitable for visualization with the well-known Artemis Comparison Tool (ACT). Comparisons can be generated on-the-fly from sequences retrieved directly via EMBL database queries, or from sequences entered or uploaded by the user. In addition, pre-computed comparisons are available between all publicly available, completed prokaryotic genomes and plasmids currently in the Genome Reviews database (372 sequences, representing 175 different species). The system is designed to minimize the volume of downloaded data and to maximize performance. Genome sequences, annotation and pre-computed comparisons are stored in a relational database, allowing flexible querying of user-defined sequence regions, from a whole genome down to a defined region flanking a specified gene. Comparison and sequence files, whether computed online or retrieved from the database of pre-computed genome comparisons, can be viewed online using ACT and are available for download.
Availability: Freely accessible at http://www.webact.org
Contact: admin{at}webact.org
Supplementary information: User guide and worked examples are available at http://www.webact.org/WebACT/docs
INTRODUCTION
The Artemis Comparison Tool (ACT) is a graphical DNA sequence comparison viewer that displays the results of BLASTN or TBLASTX searches between sequences of interest, alongside their available annotation (Carver et al., 2005). At present, users are responsible for generating suitable comparison files and loading them into ACT. ACT requires pre-generated comparison files, in either the tab-delimited output format of BLAST (Altschul et al., 1997) or the MSPCrunch format (Sonnhammer and Durbin, 1994), together with the original sequences and their annotations in EMBL or GenBank format. For the uninitiated bench scientist, obtaining the necessary data and computational resources, and the experience needed to generate these files efficiently, is a significant obstacle to using ACT.
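The tab-delimited BLAST format accepted by ACT has twelve columns per hit: query and subject identifiers, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E-value and bit score. A minimal Python sketch of reading it follows; the `BlastHit` and `parse_tabular_blast` names are hypothetical, and this is not WebACT's own parser.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BlastHit:
    """One row of 12-column tab-delimited BLAST output."""
    query: str
    subject: str
    pct_identity: float
    align_len: int
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular_blast(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Yield one BlastHit per data row; blank lines and '#' comments are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}: {line!r}")
        yield BlastHit(
            query=f[0], subject=f[1], pct_identity=float(f[2]),
            align_len=int(f[3]), mismatches=int(f[4]), gap_opens=int(f[5]),
            q_start=int(f[6]), q_end=int(f[7]),
            s_start=int(f[8]), s_end=int(f[9]),
            evalue=float(f[10]), bit_score=float(f[11]),
        )


# Usage: a single hypothetical hit in which the subject runs backwards.
row = "seqA\tseqB\t98.50\t1000\t15\t0\t1\t1000\t5001\t4002\t0.0\t1800\n"
hit = next(parse_tabular_blast([row]))
print(hit.s_start > hit.s_end)  # True: the match is on the reverse strand
```

In BLASTN tabular output the query co-ordinates ascend, so a subject start greater than the subject end marks a match on the reverse strand.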
WebACT is an online resource that provides BLAST comparison and sequence files in formats suitable for ACT. Comparisons can be generated from sequences in the EMBL database (Kanz et al., 2005) or from user-submitted sequences, or selected from a database of pre-computed comparisons. Offering pre-computed comparisons in this way gives a significantly faster turnaround for prokaryotic sequence queries. Worked examples demonstrating the use of WebACT are available alongside the documentation on the WebACT site.
PRE-BUILT SEQUENCE COMPARISONS
Sequences for pre-computed comparisons are sourced from the Genome Reviews database (Kersey et al., 2005). This database was used in preference to the original genome entries in the EMBL/GenBank database as it contains only completed sequences, which have more consistent annotation in a format appropriate to the requirements of ACT.
Pre-computed comparisons were generated using NCBI BLAST, with the results obtained in tab-delimited format. BLAST comparisons were carried out using a word size of nine and soft DUST masking. Each sequence was formatted as a BLAST database and was also split into 100 kb segments, overlapping by 1 kb, for use as query sequences. This approach avoids some of the problems inherent in running BLAST on long, gene-rich sequences (e.g. Schwartz et al., 2000; Korf et al., 2003). An all-against-all set of pairwise BLASTN comparisons (including self-comparisons) was generated such that each pair of sequences was compared only once. All results were parsed and stored in the WebACT database.
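The segmenting and pairing steps can be sketched in Python. This is a minimal illustration under stated assumptions: a new segment starts every 99 kb so that consecutive 100 kb segments share 1 kb, BLAST reports hit co-ordinates 1-based relative to the segment, and all helper names are hypothetical.

```python
from itertools import combinations_with_replacement

CHUNK_SIZE = 100_000  # 100 kb query segments
OVERLAP = 1_000       # consecutive segments share 1 kb


def chunk_sequence(seq: str, size: int = CHUNK_SIZE,
                   overlap: int = OVERLAP) -> list[tuple[int, str]]:
    """Split seq into overlapping segments as (0-based offset, segment) pairs."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the segment size")
    if not seq:
        return []
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + size]))
        if start + size >= len(seq):  # this segment reaches the end
            break
        start += step
    return chunks


def to_genome_coords(offset: int, start: int, end: int) -> tuple[int, int]:
    """Map 1-based segment-relative hit co-ordinates onto the full sequence."""
    return offset + start, offset + end


def comparison_pairs(names: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of sequences, self-pairs included, exactly once."""
    return list(combinations_with_replacement(sorted(names), 2))


offsets = [off for off, _ in chunk_sequence("A" * 250_000)]
print(offsets)                                  # [0, 99000, 198000]
print(len(comparison_pairs(["a", "b", "c"])))   # 6
```

Under this reading, n sequences need n(n + 1)/2 comparisons, so a set of 372 sequences would give 372 × 373 / 2 = 69,378. Hits falling inside a 1 kb overlap can be reported by two adjacent segments; the sketch does not de-duplicate them.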
Up to five sequences can be selected from the database for comparative display. WebACT allows selection of the full-length sequence (i.e. an entire bacterial chromosome or plasmid), a defined set of base co-ordinates, or a named gene plus a specified length of flanking sequence in the corresponding genome(s). Gene names can either be entered manually or chosen from a list of known genes present in the annotated genomes. The genomic region to be displayed can be selected by applying the same criteria to every sequence or by defining each region individually. The comparison data can be filtered by BLAST E-value before loading into ACT.
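Gene-plus-flank selection and E-value screening can be sketched as below. The sketch assumes 1-based inclusive co-ordinates, keeps only hits lying entirely inside both selected regions, and re-bases kept hits so that each extracted region starts at position 1, which is what a viewer loading only the extracted sub-sequences would need. WebACT's actual rules are not described in the text, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass
class Hit:
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float


def flanked_region(gene: Gene, flank: int, seq_length: int) -> tuple[int, int]:
    """The gene plus `flank` bases on each side, clamped to the sequence ends."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return max(1, gene.start - flank), min(seq_length, gene.end + flank)


def _inside(a: int, b: int, region: tuple[int, int]) -> bool:
    # min/max so that reverse-strand hits (start > end) are handled too
    return region[0] <= min(a, b) and max(a, b) <= region[1]


def hits_for_regions(hits: list[Hit], q_region: tuple[int, int],
                     s_region: tuple[int, int], max_evalue: float) -> list[Hit]:
    """Keep hits passing the E-value cutoff that lie wholly inside both regions,
    re-based so that the first base of each region is position 1."""
    q0, s0 = q_region[0] - 1, s_region[0] - 1
    kept = []
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if not (_inside(h.q_start, h.q_end, q_region)
                and _inside(h.s_start, h.s_end, s_region)):
            continue
        kept.append(Hit(h.q_start - q0, h.q_end - q0,
                        h.s_start - s0, h.s_end - s0, h.evalue))
    return kept


print(flanked_region(Gene("geneX", 500, 1500), 1_000, 1_000_000))  # (1, 2500)
```

Dropping hits that cross a region boundary is the simplest choice; clipping them to the boundary would be an alternative.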
The sequences stored in the WebACT database are checked monthly against those made available by the EBI, and the sequence, annotation and comparisons are updated automatically as appropriate. Newly released sequences are also incorporated into the database at this time. Sequence records are parsed using Bioperl (Stajich et al., 2002), stripped of features and sequence data, and stored in the database as serialized objects. The sequence itself is stored in 100 kb chunks, and each sequence feature is stored as a serialized object along with its genome co-ordinates, permitting rapid assembly of a sequence record representing the requested region. Full sequence records are also stored as compressed flat files to optimize performance when such records are requested.
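Rebuilding a requested region from stored chunks reduces to integer arithmetic on chunk indices. The sketch assumes non-overlapping 100 kb storage chunks keyed by 0-based index (an in-memory `dict` stands in for the database table), 1-based inclusive region co-ordinates, and that only features lying wholly inside the region are returned; none of these details is specified in the text, and the function names are hypothetical.

```python
from dataclasses import dataclass

CHUNK = 100_000  # assumed size of each stored, non-overlapping sequence chunk


def chunk_indices(start: int, end: int, chunk: int = CHUNK) -> range:
    """0-based indices of the chunks covering 1-based inclusive [start, end]."""
    return range((start - 1) // chunk, (end - 1) // chunk + 1)


def assemble_region(chunks: dict[int, str], start: int, end: int,
                    chunk: int = CHUNK) -> str:
    """Rebuild [start, end] by fetching only the chunks it touches."""
    idx = chunk_indices(start, end, chunk)
    joined = "".join(chunks[i] for i in idx)
    base = idx.start * chunk  # 0-based genome offset of the first fetched chunk
    return joined[start - 1 - base:end - base]


@dataclass
class Feature:
    key: str
    start: int  # 1-based genome co-ordinate
    end: int


def features_in_region(features: list[Feature], start: int,
                       end: int) -> list[Feature]:
    """Features lying wholly inside [start, end], re-based to start at 1."""
    shift = start - 1
    return [Feature(f.key, f.start - shift, f.end - shift)
            for f in features if f.start >= start and f.end <= end]


print(list(chunk_indices(99_990, 100_010)))  # [0, 1]
```

In a relational database the feature filter would become a range query on the stored start and end co-ordinates, so only the features for the requested region need to be deserialized.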
EMBL QUERIES AND USER-DEFINED SEQUENCES
Comparisons can also be generated on-the-fly from user-defined sequences. These may be uploaded in EMBL, GenBank or FASTA format, or selected by EMBL accession number. For accession numbers in the Contig division of the EMBL database, each constituent record is retrieved and automatically assembled into a single record containing the full set of CDS features for the selected sequence. BLAST comparisons between the chosen sequences are then carried out pairwise. Because generating large sequence comparisons is computationally expensive, users can choose to be notified by email when the comparisons are complete.
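Assembling a Contig-division entry amounts to concatenating its constituent records in order and shifting each constituent's CDS co-ordinates by the length of everything placed before it. The sketch is simplified and hypothetical: constituents are used whole and in forward orientation, gaps are filled with `n`, and parsing of the EMBL `CO` line and retrieval of the records are omitted. The accessions `PIECE1` and `PIECE2` are placeholders.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Piece:
    """A constituent record used whole, in forward orientation (simplification)."""
    accession: str
    sequence: str
    cds: list[tuple[int, int]] = field(default_factory=list)  # 1-based, inclusive


@dataclass
class Gap:
    length: int


def assemble_contig(parts: list[Union[Piece, Gap]]
                    ) -> tuple[str, list[tuple[str, int, int]]]:
    """Concatenate pieces and gaps; return the sequence and shifted CDS spans."""
    seq_parts: list[str] = []
    features: list[tuple[str, int, int]] = []
    offset = 0  # bases already placed before the current part
    for part in parts:
        if isinstance(part, Gap):
            seq_parts.append("n" * part.length)
            offset += part.length
            continue
        seq_parts.append(part.sequence)
        for start, end in part.cds:
            features.append((part.accession, start + offset, end + offset))
        offset += len(part.sequence)
    return "".join(seq_parts), features


seq, cds = assemble_contig([Piece("PIECE1", "ATGAAATAA", [(1, 9)]),
                            Gap(5),
                            Piece("PIECE2", "ATGCCCTGA", [(1, 9)])])
print(len(seq), cds)  # 23 [('PIECE1', 1, 9), ('PIECE2', 15, 23)]
```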
VISUALIZATION AND DATA DOWNLOAD
Once a comparison has been generated, ACT can be launched directly from the user's web browser using Java Web Start. All results are retained on the server for 7 days; however, a WebACT session can be downloaded as a single file and kept by the user for offline use. The downloaded file can be reloaded into WebACT at any time, and the comparison viewed again without time-consuming regeneration. A further advantage of the pre-computed comparisons is that a small file (of the order of 2 kb) defining the user's sequence selections can be downloaded instead; WebACT can use this file to reconstruct the comparison at a later date, so the user need not download large quantities of data. Sequence data downloaded from the WebACT database can also be reloaded so that a comparison can be repeated using a different algorithm or set of parameters.
IMPLEMENTATION
WebACT is a mod_perl application built using the CGI::Application framework. Sequences, annotation and BLAST hits are stored in a MySQL relational database. Extensive use is made of the Bioperl modules (Stajich et al., 2002) for handling sequence data. User-submitted queries are managed through the Sun Grid Engine 6.0 job scheduler. Pre-computed comparisons were generated on an AMD Opteron-based cluster running Red Hat Enterprise Linux 3.0, also via Sun Grid Engine. WebACT has been tested using the Internet Explorer, Firefox, Opera and Safari browsers running on Windows, Linux and Mac OS X. The only client-side requirements are a supported, JavaScript-enabled web browser and a Java 1.3 or newer installation; Java Web Start is also required to launch ACT directly from WebACT.
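For illustration, a comparison job of the kind described could be handed to Grid Engine as a job script with embedded `#$` directives, where `-m e` and `-M` request an email when the job ends (the notification option mentioned for user-defined comparisons). The script generator is hypothetical, and the `blastall` arguments (legacy NCBI BLAST: `-W 9` for a word size of nine, `-F "m D"` for soft DUST masking, `-m 8` for tabular output) are an assumption mapped from the parameters reported above, not WebACT's recorded invocation. File names are placeholders.

```python
import shlex
from typing import Optional


def sge_job_script(job_name: str, command: list[str],
                   email: Optional[str] = None) -> str:
    """Build a Grid Engine job script; '#$' lines are qsub directives."""
    lines = [
        "#!/bin/sh",
        "#$ -S /bin/sh",       # shell used to run the job
        f"#$ -N {job_name}",   # job name shown by qstat
        "#$ -cwd",             # run in the submission directory
    ]
    if email:
        lines.append("#$ -m e")          # send mail when the job ends
        lines.append(f"#$ -M {email}")   # address to notify
    lines.append(shlex.join(command))    # quotes arguments such as "m D"
    return "\n".join(lines) + "\n"


# Hypothetical file names; flags assume legacy NCBI blastall.
blast = ["blastall", "-p", "blastn", "-d", "genomeB.fna", "-i", "genomeA_part0.fna",
         "-W", "9", "-F", "m D", "-m", "8", "-o", "genomeA_vs_genomeB.tab"]
print(sge_job_script("webact_cmp", blast, email="user@example.org"))
```

The resulting text would be written to a file and submitted with `qsub`; quoting with `shlex.join` keeps the two-word filter string `m D` as a single argument.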
OUTLOOK
Plans for future development of WebACT include the addition of a facility to permit a comparison selected from the database to be re-run with user-defined parameters, and the incorporation of BLASTZ (Schwartz et al., 2003) as an additional comparison algorithm.
Acknowledgments
The authors are grateful to the London E-Science Centre (http://www.lesc.imperial.ac.uk) for access to high performance computing resources. This work was supported by the Faculties of Life Sciences and Medicine, Imperial College London and the Wellcome Trust.
Conflict of Interest: none declared.
Footnotes
Present address: Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
Received on May 23, 2005; revised on July 27, 2005; accepted on July 27, 2005
REFERENCES
Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
Carver, T.J., et al. (2005) ACT: the Artemis Comparison Tool. Bioinformatics, in press.
Kanz, C., et al. (2005) The EMBL Nucleotide Sequence Database. Nucleic Acids Res., 33, D29-D33.
Kersey, P., et al. (2005) Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res., 33, 297-302.
Korf, I., et al. (2003) BLAST. O'Reilly and Associates, Sebastopol.
Sonnhammer, E.L.L. and Durbin, R. (1994) A workbench for large scale sequence homology analysis. Comput. Appl. Biosci., 10, 301-307.
Stajich, J.E., et al. (2002) The Bioperl Toolkit: Perl modules for the life sciences. Genome Res., 12, 1611-1618.
Schwartz, S., et al. (2000) PipMaker: a web server for aligning two genomic DNA sequences. Genome Res., 10, 577-586.
Schwartz, S., et al. (2003) Human-mouse alignments with BLASTZ. Genome Res., 13, 103-107.