Bioinformatics Advance Access originally published online on April 4, 2006
Bioinformatics 2006 22(11):1402-1403; doi:10.1093/bioinformatics/btl131

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ALTree: association detection and localization of susceptibility sites using haplotype phylogenetic trees

Claire Bardel 1,*, Vincent Danjean 2 and Emmanuelle Génin 1

1 Unité de recherche en Génétique Épidémiologique et Structure des Populations Humaines, INSERM U535 Villejuif, France
2 Laboratoire ID-IMAG, UMR 5132 Grenoble, France

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 THE MAIN PROGRAM:...
 3 TWO UTILITIES
 4 REQUIREMENTS
 5 CONCLUSION
 REFERENCES
 

Summary: Finding the genes involved in susceptibility to complex diseases and, among those genes, localizing the variant sites that explain this susceptibility is a major goal of genetic epidemiology. In this context, haplotypic methods that use the joint information on several markers may be of particular interest. When the number of haplotypes is large, they may need to be grouped. Phylogenetic trees allow haplotypes to be grouped according to their evolutionary history and may help in the detection and localization of disease susceptibility sites. In this paper, we present new software to perform phylogeny-based association and localization analysis.

Availability: The software package, including all documentation and example files, is freely available at http://claire.bardel.free.fr/software.html. It is distributed under the GPL license.

Contact: bardel{at}vjf.inserm.fr


    1 INTRODUCTION
 
When looking for an association between a candidate gene and a disease, genetic markers can either be studied one at a time or be combined with the other markers located on the same chromosome to form haplotypes. In the last few years, haplotypic methods have been shown to be powerful for detecting association (Akey et al., 2001; Zaykin et al., 2002). However, when the number of haplotypes increases, the power of the association test decreases, owing to the increase in the degrees of freedom and to the decrease in the sample size per haplotype. To reduce these problems, it has been proposed to group haplotypes according to their evolutionary history (Templeton et al., 1987; Seltman et al., 2003; Durrant et al., 2004; Bardel et al., 2005). Moreover, such groupings make it possible to formulate hypotheses about the markers responsible for the susceptibility to the disease: in the evolutionary tree, mutations defining a group that contains more haplotypes carried by cases than by controls are putative susceptibility sites for the disease.

In this paper, we present new software to perform a phylogeny-based association and localization analysis based on the method described in Bardel et al. (2005). The software deals with SNP haplotype data. It is written in Perl, with one package in C. It is composed of three programs: the main analysis program, altree, which performs either the association or the localization analysis depending on the option selected by the user, and two utilities, altree-add-S and altree-convert, which help users prepare their data files before running altree.


    2 THE MAIN PROGRAM: ALTREE
 
To run altree, two input files are required: first, a file containing, for each haplotype, the number of case and control individuals carrying it, and second, a file containing the phylogenetic tree and the list of character state changes on the tree. The second file is obtained by running one of the following phylogeny reconstruction programs: paup (Swofford, 2002; http://paup.csit.fsu.edu/), phylip (Felsenstein, 2004; http://evolution.genetics.washington.edu/phylip.html) or paml (Yang, 1997; http://abacus.gene.ucl.ac.uk/software/paml.html).

2.1 Association detection
The method performs a series of nested case/control homogeneity tests in the different clades defined on the phylogenetic tree. A P-value is computed at each level of the tree, and a global P-value corrected for multiple testing is obtained with a permutation procedure (Ge et al., 2003; Becker and Knapp, 2004) that takes into account the non-independence of the nested tests. The association test requires a rooted tree because the nested analysis begins at the root of the tree. Consequently, if the tree was not rooted during the phylogenetic reconstruction process, the user must provide an outgroup so that altree can root the tree.

The output file contains the phylogenetic tree, with the number of case and control haplotypes in each branch, and the P-value of the association test.
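The idea of a permutation-corrected global P-value over nested tests can be illustrated with a minimal Python sketch. It is not the altree implementation: the function names, the data layout (one haplotype and one status per individual, one haplotype-to-clade mapping per tree level) and the use of a Pearson chi-square statistic with a min-P correction are assumptions made for illustration only.

```python
import random


def chi2_stat(table):
    """Pearson chi-square statistic for a list of (cases, controls) rows."""
    total_cases = sum(cases for cases, _ in table)
    total_ctrls = sum(ctrls for _, ctrls in table)
    n = total_cases + total_ctrls
    stat = 0.0
    for cases, ctrls in table:
        row = cases + ctrls
        for obs, col in ((cases, total_cases), (ctrls, total_ctrls)):
            expected = row * col / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat


def level_stats(haplos, statuses, levels):
    """One homogeneity statistic per tree level (status 1 = case, 0 = control)."""
    stats = []
    for grouping in levels:
        counts = {}
        for hap, status in zip(haplos, statuses):
            cell = counts.setdefault(grouping[hap], [0, 0])
            cell[0 if status == 1 else 1] += 1
        stats.append(chi2_stat(list(counts.values())))
    return stats


def minp_permutation_test(haplos, statuses, levels, n_perm=1000, seed=0):
    """Per-level empirical P-values and a min-P corrected global P-value."""
    rng = random.Random(seed)
    observed = level_stats(haplos, statuses, levels)
    shuffled = list(statuses)
    perm = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case/control labels
        perm.append(level_stats(haplos, shuffled, levels))

    def pvals(stats):
        # Fraction of permuted statistics at least as large, level by level.
        return [sum(p[k] >= stats[k] for p in perm) / n_perm
                for k in range(len(levels))]

    obs_p = pvals(observed)
    perm_min = [min(pvals(p)) for p in perm]
    # The same permutations are reused for every level, so the dependence
    # between nested tests is preserved in the null distribution of min P.
    global_p = (1 + sum(m <= min(obs_p) for m in perm_min)) / (n_perm + 1)
    return obs_p, global_p


# Hypothetical toy data: four haplotypes, two tree levels.
haplos = ["h1"] * 35 + ["h2"] * 30 + ["h3"] * 35 + ["h4"] * 30
statuses = ([1] * 30 + [0] * 5 + [1] * 25 + [0] * 5
            + [1] * 5 + [0] * 30 + [1] * 5 + [0] * 25)
levels = [
    {"h1": "A", "h2": "A", "h3": "B", "h4": "B"},      # level near the root
    {"h1": "h1", "h2": "h2", "h3": "h3", "h4": "h4"},  # tips
]
per_level_p, global_p = minp_permutation_test(haplos, statuses, levels,
                                              n_perm=200, seed=1)
```

Because each permutation yields one statistic per level, the cost grows linearly with the number of permutations for the label shuffling and the statistics; the quadratic ranking step in this sketch is a simplification and would need a sorted lookup for large permutation counts.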

2.2 Localization of the susceptibility site(s)
The localization analysis relies on a new character called ‘S’ that represents the disease status. For each haplotype, the state of this character depends on the proportion of cases carrying the haplotype (state 1 for a large proportion of cases and 0 otherwise). The character state changes are optimized on the tree for each character (including S), and a correlated evolution index is calculated between each change of each site and the changes of the character S. The sites whose evolution is most correlated with the character S are the most probable susceptibility sites.

For the localization test, the character states can be reconstructed on the tree either by paup (parsimony method) or by paml (maximum likelihood method), but not by phylip because it leaves too many ambiguities in the ancestral character reconstructions. Contrary to the association analysis, the tree does not need to be rooted. When several trees are available (e.g. several equiparsimonious trees), the user can specify how many trees should be included in the analysis. They are then picked at random from the total sample of parsimonious trees. Two kinds of correlated evolution index can be calculated. The first is the one described in Bardel et al. (2005): only transitions from 0 to 1 for the character S are taken into account. A second index, which seems more appropriate, is now available: it takes into account both transitions of S (from 0 to 1 and from 1 to 0). The correlated evolution index can be chosen using the option --coevo.
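The difference between the two indices can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the index formula used here (observed co-occurrences of a site change with an S change on the same branch, compared with the count expected if changes were placed independently) is an assumption and may differ from the index defined in Bardel et al. (2005); the function name, the branch representation and the pairing of the reverse site change with the 1 to 0 transition of S are all hypothetical.

```python
from math import sqrt


def coevolution_index(branches, site, change, both_directions=False):
    """Illustrative correlated evolution index for one site transition.

    branches: list of dicts, one per branch, mapping a character name to
              the (from_state, to_state) change it undergoes on that branch.
    change:   the site transition of interest, e.g. (0, 1).
    """
    n_branches = len(branches)
    # Pairs (site transition, S transition) treated as co-evolving.
    pairs = [(change, (0, 1))]
    if both_directions:
        # ASSUMPTION: the reverse site change is paired with S going 1 -> 0.
        pairs.append(((change[1], change[0]), (1, 0)))

    observed = 0
    expected = 0.0
    for site_tr, s_tr in pairs:
        n_site = sum(b.get(site) == site_tr for b in branches)
        n_s = sum(b.get("S") == s_tr for b in branches)
        observed += sum(b.get(site) == site_tr and b.get("S") == s_tr
                        for b in branches)
        expected += n_site * n_s / n_branches
    if expected == 0:
        return 0.0
    return (observed - expected) / sqrt(expected)


# Hypothetical character changes on a six-branch tree.
branches = [
    {"snp3": (0, 1), "S": (0, 1)},
    {"snp3": (0, 1), "S": (0, 1)},
    {"snp1": (0, 1)},
    {"snp3": (1, 0), "S": (1, 0)},
    {},
    {"snp1": (1, 0), "S": (0, 1)},
]
simple = coevolution_index(branches, "snp3", (0, 1))
double = coevolution_index(branches, "snp3", (0, 1), both_directions=True)
unlinked = coevolution_index(branches, "snp1", (0, 1))
```

In this toy tree, the second index additionally credits snp3 for the branch on which it reverts together with S, which is the kind of information the first index ignores.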

The output file of the program consists of a list of the correlated evolution indices for all transitions of all sites, ranked in decreasing order. The phylogenetic trees can be included in the output file using the option --print-tree.


    3 TWO UTILITIES
 
3.1 A file converter: altree-convert
Our software analyzes haplotypic data. Such data are generally not directly available and must be obtained by using haplotype reconstruction programs. altree-convert converts output files from two haplotype reconstruction programs, phase (Stephens et al., 2001; Stephens and Donnelly, 2003) and FamHap [Becker and Knapp (2004), only for haplotypes reconstructed on family data], into input files for the phylogeny reconstruction programs (paup or phylip file format). When the paup file format is chosen, a list of commands is also written in the file so that users who are not familiar with paup can run it with only a few modifications to this file.

3.2 Definition of the character S: altree-add-S
To define the state of the character S, users can choose their own criterion and add the character S manually to the paup input file. Otherwise, they can use altree-add-S. This program takes as input a paup or a phylip input file and a list of the number of cases and controls carrying each haplotype, and returns a new paup or phylip input file in which the character S has been added according to the following criterion: the state of S is ‘0’, ‘1’ or ‘?’ depending on the proportion (ph) of cases carrying the haplotype h compared to the proportion p0 of cases in the whole sample.

  • if [formula missing], S is coded ‘0’ (high number of controls);
  • if [formula missing], S is coded ‘1’ (high number of cases);
  • else, S is coded ‘?’ (missing data).
with nh being the number of individuals carrying the haplotype h and ε a number chosen by the user.
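A minimal Python sketch of such a three-state coding is given below. The exact inequalities are not reproduced in the text above, so the threshold used here, p0 ± ε·sqrt(p0(1 − p0)/nh), is an assumption chosen only because it involves the quantities the text names (ph, p0, nh and ε); the function name code_s is hypothetical and is not part of altree-add-S.

```python
from math import sqrt


def code_s(cases_h, controls_h, p0, epsilon=1.0):
    """Return '0', '1' or '?' for a haplotype carried by cases_h cases
    and controls_h controls, given the overall proportion of cases p0."""
    n_h = cases_h + controls_h
    if n_h == 0:
        return "?"  # no carrier: nothing can be said
    p_h = cases_h / n_h
    # ASSUMED margin: epsilon standard errors of a proportion p0
    # estimated on n_h individuals. The published formula may differ.
    margin = epsilon * sqrt(p0 * (1 - p0) / n_h)
    if p_h < p0 - margin:
        return "0"  # high number of controls
    if p_h > p0 + margin:
        return "1"  # high number of cases
    return "?"      # treated as missing data


# Hypothetical counts: (cases, controls) per haplotype.
counts = {"h1": (30, 5), "h2": (5, 30), "h3": (10, 10)}
total_cases = sum(c for c, _ in counts.values())
total = sum(c + t for c, t in counts.values())
p0 = total_cases / total
states = {h: code_s(c, t, p0) for h, (c, t) in counts.items()}
```

Under this assumed form, a larger ε widens the band around p0 and therefore codes more haplotypes as ‘?’, while rare haplotypes (small nh) are also more likely to be left undetermined.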


    4 REQUIREMENTS
 
The software requires Perl 5.8, a C compiler and a phylogeny reconstruction program: paup, phylip or paml.

The processing time was measured on a Pentium III (930 MHz, 512 MB of RAM). For a dataset of 363 individuals genotyped for 7 SNPs (33 different haplotypes, 6 levels in the tree), the association test runs in ~24 h (P-value evaluated by 100 000 permutations, the complexity of the program being linear with respect to the number of permutations) and the localization test runs in ~10 s (2000 equiparsimonious trees analyzed, the complexity of the program being linear with respect to the number of analyzed trees).
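Since both running times are stated to be linear, the figures above translate into approximate per-unit costs. The last line is an extrapolation to a hypothetical run of 10 000 permutations on the same machine and dataset, assuming that fixed start-up costs are negligible:

```latex
\begin{align*}
t_{\text{perm}} &\approx \frac{24 \times 3600\ \text{s}}{100\,000} \approx 0.86\ \text{s per permutation},\\
t_{\text{tree}} &\approx \frac{10\ \text{s}}{2000} = 5\ \text{ms per tree},\\
T_{\text{assoc}}(10\,000\ \text{permutations}) &\approx 10\,000 \times 0.86\ \text{s} \approx 2.4\ \text{h}.
\end{align*}
```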


    5 CONCLUSION
 
Our software groups haplotypes based on their phylogenetic relationships to perform both association and localization analyses. With the two utilities, it is easy to use, even for users who are not accustomed to phylogeny reconstruction programs. The power of altree to detect an association and its efficiency in detecting susceptibility loci were evaluated by simulations. We showed that it is especially useful when more than one susceptibility site is involved in the disease. We also showed that altree can successfully identify the three variant sites in CARD15 that are involved in Crohn disease (Bardel et al., 2005).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Charlie Hodgman

Received on January 13, 2006; revised on March 17, 2006; accepted on March 31, 2006

    REFERENCES
 

    Akey, J., et al. (2001) Haplotypes vs single marker linkage disequilibrium tests: what do we gain? Eur. J. Hum. Genet., 9, 291–300.

    Bardel, C., et al. (2005) On the use of haplotype phylogeny to detect disease susceptibility loci. BMC Genet., 6, 24.

    Becker, T. and Knapp, M. (2004) A powerful strategy to account for multiple testing in the context of haplotype analysis. Am. J. Hum. Genet., 75, 561–570.

    Durrant, C., et al. (2004) Linkage disequilibrium mapping via cladistic analysis of single-nucleotide polymorphism haplotypes. Am. J. Hum. Genet., 75, 35–43.

    Felsenstein, J. (2004) Phylip (phylogeny inference package) version 3.6.

    Ge, Y., et al. (2003) Resampling-based multiple testing for microarray data analysis. Test, 12, 1–77.

    Seltman, H., et al. (2003) Evolutionary-based association using haplotype data. Genet. Epidemiol., 25, 48–58.

    Stephens, M., et al. (2001) A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet., 68, 978–989.

    Stephens, M. and Donnelly, P. (2003) A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet., 73, 1162–1169.

    Swofford, D.L. (2002) paup: phylogenetic analysis using parsimony, version 4.0b10. Sinauer Associates, Sunderland, MA.

    Templeton, A.R., et al. (1987) A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping. I. Basic theory and an analysis of alcohol dehydrogenase activity in Drosophila. Genetics, 117, 343–351.

    Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci., 13, 555–556.

    Zaykin, D.V., et al. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered., 53, 79–91.

