Bioinformatics Advance Access originally published online on September 7, 2004
Bioinformatics 2005 21(3):402-404; doi:10.1093/bioinformatics/bti003

Bioinformatics vol. 21 issue 3 © Oxford University Press 2005; all rights reserved.

SNAP: workbench management tool for evolutionary population genetic analysis

Eric W. Price 1,2 and Ignazio Carbone 1,*

1 Center for Integrated Fungal Research, Department of Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
2 Department of Computer Sciences, North Carolina State University, Raleigh, NC 27695, USA

*To whom correspondence should be addressed.


    Abstract

Summary: The reconstruction of population processes from DNA sequence variation requires the coordinated implementation of several coalescent-based methods, each bound by specific assumptions and limitations. In practice, the application of these coalescent-based methods for parameter estimation is difficult because they make strict assumptions that must be verified a priori and their parameter-rich nature makes the estimation of all model parameters very complex and computationally intensive. A further complication is their distribution as console applications that require the user to navigate through console menus or specify complex command-line arguments. To facilitate the implementation of these coalescent-based tools we developed SNAP Workbench, a Java program that manages and coordinates a series of programs. The workbench enhances population parameter estimation by ensuring that the assumptions and program limitations of each method are met and by providing a step-by-step methodology for examining population processes that integrates both summary-statistic methods and coalescent-based population genetic models.

Availability: SNAP Workbench is freely available at http://snap.cifr.ncsu.edu. The workbench and tools can be downloaded for Mac, Windows and Unix operating systems. Each package includes installation instructions, program documentation and a sample dataset.

Contact: ignazio_carbone{at}ncsu.edu

Supplementary information: A description of system requirements and installation instructions can be found at http://snap.cifr.ncsu.edu


    INTRODUCTION
 
In recent years, rapid advances in DNA sequencing technology and population genetic theory have resulted in a plethora of new approaches for making inferences on population processes from DNA sequence variation. Among these are tools for estimating population mutation and migration rates, such as MIGRATE (Beerli and Felsenstein, 1999, 2001), MDIV (Nielsen and Wakeley, 2001) and Genetree (Bahlo and Griffiths, 2000); recombination rates, such as Recom58 (Griffiths and Marjoram, 1996), Infs and Fins (Fearnhead and Donnelly, 2001) and Recombine (Kuhner et al., 2000); migration and recombination, such as LAMARC (Beerli and Felsenstein, 2001; Kuhner et al., 2000); and selection (Neuhauser and Krone, 1997). These approaches were built on Wright–Fisher population genetic models by incorporating probability-based coalescent methods, which take full advantage of the data and their inherent stochastic properties (Kingman, 1982a,b,c).

In practice, there are three main limitations for using these coalescent-based methods: (1) they make strict assumptions that must be verified a priori; (2) their parameter-rich nature makes the estimation of all model parameters very complex and computationally intensive; and (3) they are distributed as console applications written in C and require the user to navigate through console menus or specify complex command-line arguments. Although many tools and techniques are being developed for analyzing population-based DNA sequence variation, very few provide step-by-step methodologies for integrating multiple analysis methods into a readily accessible, user-friendly package.

The development of an integrated software environment would eliminate incompatibilities due to the strict data format requirements of different programs and allow data input and output to flow seamlessly between different analysis modules. This approach has been the goal of several emerging program suites, such as Mesquite (Maddison and Maddison, 2002, http://mesquiteproject.org) and EMBOSS (Rice et al., 2000), which present the user with an arsenal of molecular tools and analysis methods. Two major limitations of these suites are that they provide little or no guidance on how to perform a specific analysis and that existing C program modules require source code modification before they can be included.


    SYSTEMS AND METHODS
 
We have developed a workbench program that can manage and coordinate a suite of nucleotide analysis programs (SNAP). These include the alignment program ClustalW (Thompson et al., 1994); the phylogenetic analysis programs PHYLIP (Felsenstein, 2004) and PAUP* 4.0 (Swofford, 1998); the non-parametric permutation analysis programs Seqtomatrix and Permtest (Hudson et al., 1992); the recombination detection programs RecMin (Myers and Griffiths, 2003) and RecPars (Hein, 1990); the coalescent-based programs Genetree, Recom58, MDIV and MIGRATE; and additional programs [SNAP Combine, Map, Clade and Matrix; Fig. 1 and (Carbone et al., 2004)]. The workbench was designed to facilitate the incorporation of new tools as they become available, thereby serving as a bridge between theoretical and applied population genetic analysis. Although our workbench was designed to target population genetics, its approach to workflow integration is flexible enough to be applied in other fields as well. Our goals in developing the workbench were to:

  1. eliminate the need to use the command line,
  2. integrate a wide array of approaches for analyzing population genetic data based on both traditional summary-statistic methods and the newer coalescent-based population genetic models,
  3. ensure that the assumptions and program requirements of each method are not violated, and
  4. provide interactive tutorials for teaching and training.
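
The third goal can be pictured as a gate that is consulted before a program is launched. The following Java sketch is illustrative only and is not taken from the SNAP Workbench source: the class `AssumptionGateSketch`, its methods and the tool and assumption names are hypothetical placeholders, and the actual assumptions of each program must be taken from that program's documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical gate: a tool may run only if all of its declared assumptions were verified. */
final class AssumptionGateSketch {

    // Tool name -> assumptions that earlier steps must have verified (all names are placeholders).
    private final Map<String, List<String>> requirements = new LinkedHashMap<>();

    /** Declares the assumptions a tool depends on. */
    void require(String tool, String... assumptions) {
        requirements.put(tool, Arrays.asList(assumptions));
    }

    /** Returns the declared assumptions of a tool that have not been verified yet. */
    List<String> unmet(String tool, Set<String> verified) {
        List<String> missing = new ArrayList<>();
        for (String assumption : requirements.getOrDefault(tool, Collections.emptyList())) {
            if (!verified.contains(assumption)) {
                missing.add(assumption);
            }
        }
        return missing;
    }

    /** A tool with no unmet assumptions (or none declared) is allowed to run. */
    boolean mayRun(String tool, Set<String> verified) {
        return unmet(tool, verified).isEmpty();
    }
}
```

In such a design the interface could list the result of `unmet(...)` to tell the user which preliminary analyses are still required, instead of letting the program run on data that violate its model.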



Fig. 1 Flowchart illustrating the sequence of steps and tools implemented in SNAP Workbench. The names of software tools are italicized; Hudson refers to the programs Seqtomatrix and Permtest developed by R. Hudson. The programs pars and consense are from the PHYLIP package (Felsenstein, 2004).

 

    Framework
 
The workbench was programmed in Java to preserve platform independence across multiple operating systems. The program modules integrated in the workbench are written in C or Java and can be readily compiled on a variety of computing platforms. SNAP Workbench allows the user to customize the interface for available program modules without requiring computer programming or shell scripting skills. This is accomplished using the template design feature of the workbench. Templates allow the user to create and organize drop-down menus in the interface. Each menu option is further divided into submenus. Submenus define a set of programs or options within a single program that are executed sequentially to complete a particular analysis. For example, the submenu option for performing a ‘Nonparametric test for population subdivision’ under the ‘Migration’ menu requires the user to execute the programs SNAP Map, Seqtomatrix and Permtest sequentially. Because there is usually more than one way to perform these analyses, the workbench supports, via multithreading, simultaneous program executions to allow the user to explore different scenarios. Multiple files are displayed in a tabbed format for easy access and line wrapping has been disabled to facilitate viewing of long DNA sequences. Files may be edited and saved directly in the workbench using basic text-editing functions.
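
The relationship between submenus, sequential program execution and multithreading can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions, not the SNAP Workbench implementation: `SubmenuSketch`, `Step`, `then`, `external`, `runSequentially` and `runConcurrently` are hypothetical names. Only `ProcessBuilder` and the `java.util.concurrent` classes are standard Java APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of one submenu: a titled, ordered list of program steps. */
final class SubmenuSketch {

    /** One step of an analysis; a real step would launch a console program. */
    interface Step {
        void run() throws Exception;
    }

    private final String title;
    private final List<String> programs = new ArrayList<>();
    private final List<Step> steps = new ArrayList<>();

    SubmenuSketch(String title) {
        this.title = title;
    }

    /** Appends a step; steps run in the order in which they were added. */
    SubmenuSketch then(String program, Step step) {
        programs.add(program);
        steps.add(step);
        return this;
    }

    /** Builds a step that starts an external command and fails on a non-zero exit code. */
    static Step external(String... command) {
        return () -> {
            int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
            if (exit != 0) {
                throw new IllegalStateException(command[0] + " exited with code " + exit);
            }
        };
    }

    /** Runs every step in order and returns the names of the programs that completed. */
    List<String> runSequentially() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            try {
                steps.get(i).run();
            } catch (Exception e) {
                throw new IllegalStateException(title + ": step '" + programs.get(i) + "' failed", e);
            }
            completed.add(programs.get(i));
        }
        return completed;
    }

    /** Runs several submenus at once, one thread each; steps inside each submenu stay ordered. */
    static List<List<String>> runConcurrently(List<SubmenuSketch> scenarios) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, scenarios.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (SubmenuSketch scenario : scenarios) {
                Callable<List<String>> task = scenario::runSequentially;
                pending.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for scenarios", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a scenario failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Under this sketch, the 'Nonparametric test for population subdivision' submenu would be a `SubmenuSketch` with three steps added in the order SNAP Map, Seqtomatrix, Permtest. A step that launches an installed program could be built with the hypothetical `external(...)` helper, passing whatever command line that program expects.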


    IMPLEMENTATION
 
SNAP Workbench has recently been used to examine recombination and migration in Cryphonectria hypovirus 1 (CHV-1) (Carbone et al., 2004). The multithreaded capability of our workbench was particularly important in this study because many independent coalescent runs were necessary to ensure convergence of the programs MIGRATE, MDIV, Recom58 and Genetree. A flowchart showing all the programs and analysis paths for inferring migration and recombination processes in CHV-1 is shown in Figure 1. The template design feature of the workbench allowed us to create menus consisting of a defined set of programs, assumptions and parameter settings for following a particular path in the flowchart (e.g. see paths 1 and 2 in Fig. 1). Currently, the workbench is designed to operate on a single machine. Future versions of SNAP Workbench will be able to use distributed parallel processing on Linux clusters and supercomputers for performing computationally intensive simulations and will integrate tutorials, providing comprehensive hands-on training, for the different analysis methods in the workbench.
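
The pattern of launching many independent runs and comparing their results can be sketched as follows. This is a hedged illustration, not the procedure used in the CHV-1 study: `ReplicateRunsSketch` and its methods are hypothetical, the `run` function stands in for one launch of a coalescent program with a given random seed, and the spread-based agreement check is a simple placeholder rather than a recommended convergence diagnostic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongToDoubleFunction;

/** Hypothetical helper: independent replicate runs executed in a fixed-size thread pool. */
final class ReplicateRunsSketch {

    /** Runs one replicate per seed and returns the estimates in the same order as the seeds. */
    static double[] runReplicates(long[] seeds, int threads, LongToDoubleFunction run) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (long seed : seeds) {
                Callable<Double> task = () -> run.applyAsDouble(seed);
                pending.add(pool.submit(task));
            }
            double[] estimates = new double[seeds.length];
            for (int i = 0; i < estimates.length; i++) {
                estimates[i] = pending.get(i).get();
            }
            return estimates;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replicates", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a replicate run failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Difference between the largest and smallest estimate. */
    static double spread(double[] estimates) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double estimate : estimates) {
            min = Math.min(min, estimate);
            max = Math.max(max, estimate);
        }
        return max - min;
    }

    /** Placeholder check: replicates "agree" if their spread is within the given tolerance. */
    static boolean agree(double[] estimates, double tolerance) {
        return estimates.length > 0 && spread(estimates) <= tolerance;
    }
}
```

A fixed-size pool keeps the number of simultaneous runs bounded on a single machine, which matches the current single-machine design; the same task list could in principle be handed to a different executor if distributed processing becomes available.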


    Acknowledgments
 
The authors thank Doug Brown, Judy Jakobek and two anonymous reviewers for providing valuable comments.

Received on June 18, 2004; revised on August 5, 2004; accepted on August 24, 2004

    REFERENCES

    Bahlo, M. and Griffiths, R.C. (2000) Inference from gene trees in a subdivided population. Theoret. Popul. Biol., 57, 79–95.

    Beerli, P. and Felsenstein, J. (1999) Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics, 152, 763–773.

    Beerli, P. and Felsenstein, J. (2001) Maximum-likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl Acad. Sci. USA, 98, 4563–4568.

    Carbone, I., Liu, Y., Hillman, B.I., Milgroom, M.G. (2004) Recombination and migration of Cryphonectria hypovirus 1 as inferred from gene genealogies and the coalescent. Genetics, 166, 1611–1629.

    Fearnhead, P. and Donnelly, P. (2001) Estimating recombination rates from population genetic data. Genetics, 159, 1299–1318.

    Felsenstein, J. (2004) PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genomic Sciences, University of Washington, Seattle, WA.

    Griffiths, R. and Marjoram, P. (1996) Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol., 3, 479–502.

    Hein, J. (1990) Reconstructing evolution of sequences subject to recombination using parsimony. Math. Biosci., 98, 185–200.

    Hudson, R.R., Boos, D.D., Kaplan, N.L. (1992) A statistical test for detecting geographic subdivision. Mol. Biol. Evol., 9, 138–151.

    Kingman, J.F.C. (1982a) On the genealogy of large populations. J. Appl. Probab., 19, 27–43.

    Kingman, J.F.C. (1982b) Exchangeability and the evolution of large populations. In Koch, G. and Spizzichino, F. (Eds.). Exchangeability in Probability and Statistics. North-Holland, Amsterdam, pp. 97–112.

    Kingman, J.F.C. (1982c) The coalescent. Stochastic Processes and their Applications, 13, 235–248.

    Kuhner, M.K., Yamato, J., Felsenstein, J. (2000) Maximum likelihood estimation of recombination rates from population data. Genetics, 156, 1393–1401.

    Maddison, W.P. and Maddison, D.R. (2002) Mesquite: a modular system for evolutionary analysis. Version 0.992.

    Myers, S.R. and Griffiths, R.C. (2003) Bounds on the minimum number of recombination events in a sample history. Genetics, 163, 375–394.

    Neuhauser, C. and Krone, S.M. (1997) The genealogy of samples in models with selection. Genetics, 145, 519–534.

    Nielsen, R. and Wakeley, J. (2001) Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics, 158, 885–896.

    Rice, P., Longden, I., Bleasby, A. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

    Swofford, D.L. (1998) PAUP*. Phylogenetic Analysis Using Parsimony (* and Other Methods). Version 4.0. Sinauer Associates, Sunderland, MA.

    Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.




