Bioinformatics Advance Access originally published online on April 6, 2005
Bioinformatics 2005 21(11):2791-2793; doi:10.1093/bioinformatics/bti403

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

VariScan: Analysis of evolutionary patterns from large-scale DNA sequence polymorphism data

Albert J. Vilella, Angel Blanco-Garcia, Stephan Hutter† and Julio Rozas*

Departament de Genètica, Facultat de Biologia, Universitat de Barcelona Diagonal 645, 08028 Barcelona, Spain

*To whom correspondence should be addressed.


    Abstract

Summary: VariScan is a software package for the analysis of DNA sequence polymorphisms at the whole-genome scale. Among other features, the software (1) can conduct many population genetic analyses; (2) incorporates a multiresolution wavelet transform-based method that captures relevant information from DNA polymorphism data; and (3) facilitates the visualization of the results in the most commonly used genome browsers.

Availability: The software with documentation is available under the GNU GPL software license from: http://www.ub.es/softevol/variscan

Contact: jrozas{at}ub.edu


    INTRODUCTION
The analysis of DNA sequence polymorphisms and of single nucleotide polymorphisms (SNPs) is a powerful approach to understanding the evolutionary forces underlying nucleotide variation and to mapping disease genes. The detection of Darwinian positive selection (Sabeti et al., 2002; Olson, 2002) is currently receiving considerable interest since, for instance, knowledge of the specific gene or genomic region under selection can help pharmaceutical research in vaccine and drug development, or in vaccination strategies. This detection is nevertheless not easy, since demographic processes can mimic the footprint of selection. The distinctive signature of natural selection can nonetheless be detected by analysing the spatial distribution of polymorphisms across broad regions of the genome (Sabeti et al., 2002; Quesada et al., 2003) using coalescent-based methods (Kingman, 1982; Hudson, 1990; Rosenberg and Nordborg, 2002).

The sliding window (SW) method has been extensively used for exploratory analysis of DNA polymorphism data (Rozas and Rozas, 1995). Unfortunately, the SW approach has a number of limitations, such as the choice of an appropriate window size and the problem of multiple comparisons, that are critical in genome-wide analyses. Here, we describe the VariScan software, which has been designed for the analysis of DNA sequence polymorphisms at the whole-genome scale. Among other features, VariScan incorporates a wavelet transform (WT)-based analysis for capturing relevant information from DNA polymorphism data (Liò, 2003). WT separates a signal into low- and high-frequency components and could therefore be useful in capturing both global and local features of DNA polymorphism data, such as conserved regions, peaks and valleys of nucleotide diversity, and linkage disequilibrium (LD) clusters. The software therefore has the data handling and analysis capabilities needed for genome-wide resequencing projects, which could ultimately lead to the detection of the imprint of natural selection.


    SYSTEMS AND METHODS
The VariScan software is written in ANSI C and has been tested on Linux, Mac OS X and Win32 platforms. It has been optimized for the high-speed processing of large DNA sequence data files (~100 sequences of ~100 Mb each). Indeed, the algorithms have been implemented to run in O(n) time, where n is the length of the DNA sequence. The input files are multiply aligned DNA sequence data in one of several formats, such as MAF, MGA, XMFA, PHYLIP or the HapMap genotype format. Although VariScan can conduct some analyses using unphased data (in general, genotypic data are phase-unknown), gametic phase information is needed by some of the implemented methods (e.g. LD, haplotype diversity); therefore, the gametic phase should be determined before using these methods in VariScan.
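To illustrate why LD-based statistics require phased data, the following minimal Python sketch computes the standard disequilibrium coefficient D and r² between two biallelic sites. It is not VariScan code (VariScan itself is written in C); the function name `ld_r_squared` and the toy haplotypes are hypothetical. The calculation needs the frequency of the two-site haplotype, which cannot be counted directly from unphased genotypes.

```python
def ld_r_squared(haplotypes, i, j):
    """Return (D, r^2) between alignment columns i and j (0-based).

    `haplotypes` is a list of equal-length phased sequences, one string per
    chromosome. Assumption for this sketch: both sites are biallelic and
    contain no gaps or missing data.
    """
    n = len(haplotypes)
    alleles_i = sorted({h[i] for h in haplotypes})
    alleles_j = sorted({h[j] for h in haplotypes})
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        raise ValueError("both sites must be biallelic")
    a, b = alleles_i[0], alleles_j[0]            # reference allele at each site
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    # Haplotype frequency: this is the quantity that requires known phase.
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Toy data: four phased chromosomes, two sites in complete association.
linked = ["AC", "AC", "GT", "GT"]
# Same allele frequencies, but all four haplotypes are equally frequent.
unlinked = ["AC", "AT", "GC", "GT"]
print(ld_r_squared(linked, 0, 1))
print(ld_r_squared(unlinked, 0, 1))
```

Both toy samples have identical allele frequencies at each site; only the haplotype counts distinguish them, which is the information lost in phase-unknown data.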


    IMPLEMENTATION
Molecular population genetics parameters
VariScan implements a number of population genetics parameters, including coalescent-based statistics (Rozas et al., 2003). In particular, VariScan estimates (1) summary statistics of nucleotide and haplotype polymorphism levels, (2) linkage disequilibrium-based statistics and (3) several coalescent-based tests of neutrality. VariScan can estimate these parameters on a specified number of sequences, and under different options for treating gaps and missing data. All of these analyses can be conducted using the sliding window (SW) method which, in turn, can be used to obtain a graphical representation of the results.
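As an illustration of the SW method, the sketch below estimates nucleotide diversity (the average number of pairwise differences per site) in windows along an alignment. It is a simplified Python example, not the VariScan implementation; the function names, the toy alignment and the rule for gaps (non-ACGT characters are simply ignored at each site) are assumptions made here. Per-site values are computed from allele counts and accumulated in a prefix sum, so the whole scan is linear in the sequence length.

```python
from collections import Counter


def site_diversity(column):
    """Average pairwise difference at one alignment column.

    Assumption of this sketch: characters other than A, C, G, T (gaps,
    N, etc.) are ignored; columns with fewer than two valid bases score 0.
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    n = sum(counts.values())
    if n < 2:
        return 0.0
    identical_pairs = sum(k * (k - 1) for k in counts.values())
    return 1.0 - identical_pairs / (n * (n - 1))


def sliding_window_pi(alignment, window, step):
    """Yield (start, pi) per full window; `start` is a 0-based position."""
    length = len(alignment[0])
    prefix = [0.0]                       # prefix[k] = sum of first k site values
    for pos in range(length):
        column = "".join(seq[pos] for seq in alignment)
        prefix.append(prefix[-1] + site_diversity(column))
    for start in range(0, length - window + 1, step):
        yield start, (prefix[start + window] - prefix[start]) / window


# Toy alignment of four sequences (hypothetical data).
toy = ["AAAA", "AAAT", "AATT", "AATT"]
for start, pi in sliding_window_pi(toy, window=2, step=2):
    print(start, round(pi, 4))
```

The window and step sizes are the free parameters whose choice the text identifies as a limitation of the SW approach.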

Wavelet transform-based analysis
VariScan can also conduct a signal decomposition analysis by means of WT methods. Unlike SW, WT-based results are nearly independent of the chosen window length and are therefore more suitable for the separate detection of features of variable length, independently of their genomic background. The signal, which is the raw profile of the population genetic parameter estimated along the DNA sequence, is analysed using the LastWave v2.0 software (E. Bacry, http://www.cmap.polytechnique.fr/~bacry/LastWave/). We chose Daubechies' D4 as the default wavelet filter (Daubechies, 1992) since it is adequate for locating features such as peaks and valleys in a signal, with a minimum degree of smoothness (Liò and Vanucci, 2000); this filter can nevertheless be changed by the user. The signal is then decomposed at all analysing levels (multiresolution analysis, MRA; Mallat, 1999) using the orthogonal wavelet decomposition method; the orthogonality of Daubechies wavelets allows the subsequent reconstruction of the signal. The outcome, which is the set of reconstructed wavelet-transform profiles of the population genetic parameter along the sequence, can be used to identify genomic features at multiple resolution levels (i.e. at global and local scales); for instance, features located in backgrounds of differing nucleotide diversity.
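The decomposition and reconstruction steps can be sketched as follows. This is a self-contained Python illustration of an orthogonal D4 multiresolution analysis, not a call into LastWave and not VariScan's code; the function names, the periodic treatment of the signal boundaries and the toy profile are assumptions made for this example. Each returned component is the signal rebuilt from a single band, and because the transform is orthogonal the components sum back to the original profile.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Daubechies D4 low-pass filter and its quadrature-mirror high-pass filter.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
G = [H[3], -H[2], H[1], -H[0]]


def dwt_step(x):
    """One analysis step with periodic wrap-around; len(x) must be even."""
    n = len(x)
    approx, detail = [], []
    for i in range(n // 2):
        window = [x[(2 * i + k) % n] for k in range(4)]
        approx.append(sum(h * v for h, v in zip(H, window)))
        detail.append(sum(g * v for g, v in zip(G, window)))
    return approx, detail


def idwt_step(approx, detail):
    """Inverse of dwt_step (transpose of the orthogonal transform)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * a + G[k] * d
    return x


def mra(signal, levels):
    """Return `levels` detail components (finest first) plus the smooth one."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)

    def rebuild(level, coeffs, is_detail):
        # Invert `level` steps, feeding zeros for every band except `coeffs`.
        x = coeffs
        for step in range(level, 0, -1):
            zeros = [0.0] * len(x)
            if is_detail and step == level:
                x = idwt_step(zeros, x)
            else:
                x = idwt_step(x, zeros)
        return x

    components = [rebuild(j + 1, details[j], True) for j in range(levels)]
    components.append(rebuild(levels, approx, False))
    return components


# Hypothetical diversity profile with a valley in the middle.
profile = [0.9, 0.8, 0.9, 1.0, 0.9, 0.2, 0.1, 0.1,
           0.2, 0.1, 0.8, 0.9, 1.0, 0.9, 0.8, 0.9]
parts = mra(profile, levels=3)
rebuilt = [sum(values) for values in zip(*parts)]
print(max(abs(a - b) for a, b in zip(profile, rebuilt)))  # close to zero
```

Summing only a subset of the components (for example, the coarser levels) gives a profile restricted to those scales, which is the kind of band-limited reconstruction shown in Fig. 1.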

Results visualization
VariScan permits the visualization of the results through available genome browsers (Fig. 1). For instance, VariScan can write the outcome in custom annotation track formats, such as the WIG format used in the Genome Browser at UCSC (Kent et al., 2002) or the xyplot format in GBrowse (Stein et al., 2002), providing a visual representation of the wavelet-transform profile integrated with current annotation tracks for the genome of interest. As a result, it is possible to relate statistic profiles (of nucleotide diversity, LD, etc.) to annotated genomic features (specific genes, intergenic regions, haplotype information, etc.) from available genome projects.
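As a rough illustration of this export step, the sketch below writes an equally spaced profile as a fixedStep WIG track. The helper name `write_wig`, the track name and the numerical values are hypothetical, and the exact track lines that VariScan emits are not described in this article; only the general layout of the UCSC wiggle format (a `track` line, a `fixedStep` declaration with 1-based `start`, then one value per line) is assumed.

```python
import io


def write_wig(values, chrom, start, step, name, handle):
    """Write equally spaced values as a fixedStep wiggle track.

    `start` is the 1-based coordinate of the first window; each value is
    taken to cover `step` bases (span == step), an assumption of this sketch.
    """
    handle.write(f'track type=wiggle_0 name="{name}"\n')
    handle.write(f"fixedStep chrom={chrom} start={start} step={step} span={step}\n")
    for value in values:
        handle.write(f"{value:.6g}\n")


# Hypothetical per-window diversity values for a region of chromosome 21.
buffer = io.StringIO()
write_wig([0.0012, 0.0008, 0.0], "chr21", 33500001, 1000, "pi_profile", buffer)
print(buffer.getvalue())
```

Writing to a file handle rather than a fixed path keeps the helper usable both for files uploaded as custom tracks and for in-memory checks.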



Fig. 1 Application of the MRA analysis to the Patil et al. (2001) data. Lower panel: MRA analysis of the nucleotide diversity profile along the ~28 Mb region encompassing most of chromosome arm 21q. (A) Signal reconstruction of the high-frequency bands, with information from MRA levels 5 to 9. (B) Signal reconstruction from MRA levels 10 to 14. Levels 1 to 4 are not shown. Upper panel: small-scale map showing a chromosomal region (positions 33 500 000–34 500 000) with reduced levels of nucleotide diversity. This 1 Mb region includes a number of genes, such as the receptors for interferon alpha/beta/gamma and interleukin-10, and it is enclosed within one of the target regions selected in the ENCODE project for their biological interest (ENm005). The regions of low variation (track A) fit well with the larger haplotypes (Perlegen haplotype track), obtained independently by Patil et al. (2001).

 


    Acknowledgments
 
We are very grateful to M. Aguadé, J.M. Aroca, B. Audit, M. Casas, J. Castresana, S.O. Kolokotronis and C. Segarra for their valuable comments and suggestions. This work was supported by grant BMC2001-2906 from the Dirección General de Investigación Científica y Técnica, Spain, conferred on M. Aguadé, and by grant TXT98-1802 from the Dirección General de Enseñanza Superior e Investigación Científica, Spain, conferred on J.R.


    Footnotes
 
†Present address: Department Biology II, Evolutionary Biology, University of Munich, Munich, Germany

Received on January 19, 2005; revised on March 16, 2005; accepted on March 21, 2005

    REFERENCES
    Daubechies, I. (1992) Ten Lectures on Wavelets. SIAM, Philadelphia.

    Hudson, R.R. (1990) Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol., 7, 1–44.

    Kent, W.J., et al. (2002) The Human Genome Browser at UCSC. Genome Res., 12, 996–1006.

    Kingman, J.F.C. (1982) On the genealogy of large populations. J. Appl. Prob., 19A, 27–43.

    Liò, P. (2003) Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics, 19, 2–9.

    Liò, P. and Vanucci, M. (2000) Finding pathogenicity islands and gene transfer events in genome data. Bioinformatics, 16, 932–940.

    Mallat, S. (1999) A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego.

    Olson, S. (2002) Seeking the signs of selection. Science, 298, 1324–1325.

    Patil, N., et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science, 294, 1719–1723.

    Quesada, H., et al. (2003) Large-scale adaptive hitchhiking upon high recombination. Genetics, 165, 895–900.

    Rosenberg, N.A. and Nordborg, M. (2002) Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms. Nat. Rev. Genet., 3, 380–390.

    Rozas, J. and Rozas, R. (1995) DnaSP, DNA sequence polymorphism: an interactive program for estimating population genetics parameters from DNA sequence data. Comput. Appl. Biosci., 11, 621–625.

    Rozas, J., et al. (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics, 19, 2496–2497.

    Sabeti, P.C., et al. (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832–837.

    Stein, L.D., et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.

