Bioinformatics Advance Access originally published online on November 2, 2005
Bioinformatics 2005 21(24):4423-4424; doi:10.1093/bioinformatics/bti744
© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

mILD: a tool for constructing and analyzing matrices of pairwise phylogenetic character incongruence tests

Paul J. Planet 1 and Indra Neil Sarkar 1,2,*

1Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA
2Division of Library Services, American Museum of Natural History, New York, NY, USA

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Pairwise comparisons of disagreement in phylogenetic datasets offer a powerful tool for isolating historical incongruence for closer analysis. Statistically significant phylogenetic character incongruence may reflect important differences in evolutionary history, such as horizontal gene transfer. Such testing can also be used to specify possible combinations of datasets for further phylogenetic analysis. The process of comparing multiple datasets can be very time consuming, and it is sometimes unclear how to combine data partitions given the observed patterns of incongruence. Here we present an application that automates the process of making pairwise comparisons between large numbers of phylogenetic datasets using the Incongruence Length Difference (ILD) test. The application also implements strategies for data combination based on the patterns of incongruence observed in pairwise comparisons.

Availability: The application is freely available as a Perl script that interacts with the command-line version of PAUP*.

Contact: sarkar@amnh.org

Supplementary information: http://www.GenomeCurator.org/mILD/


    1 INTRODUCTION
In phylogenetic analyses, disagreement (incongruence) among datasets, or even among partitions of the same dataset, can be explained either by real differences in evolutionary history or by misleading phylogenetic signal (noise or homoplasy), which may overwhelm valid signal as a result of inadequate sampling, experimental error or the effects of separating data into partitions. Identifying significant incongruence is a crucial first step in genomic phylogenetic analysis, especially in organisms that exchange genes with distantly related organisms (i.e. almost all microorganisms).

A wide array of statistical tests has been developed to identify historically significant phylogenetic character incongruence. One of the most widely used and tested measures of character incongruence is the Incongruence Length Difference (ILD) (Farris et al., 1994). The ILD test statistic is

$$\delta_{ILD} = L_c - \sum_{i=1}^{n} L_i$$

where $L_c$ is the number of steps in the most parsimonious tree found when all datasets (partitions) are analyzed in a combined analysis, and $L_i$ is the number of steps of the most parsimonious tree found for data partition $i$ out of a total of $n$ partitions. $\delta_{ILD}$ is given a P-value by comparing it with a distribution of randomly generated $\delta_{ILD}$ values that are calculated from partitions, equal in size to the originals, generated by random resampling of characters (columns in a phylogenetic matrix) from all partitions.
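The arithmetic behind the statistic and its P-value can be sketched in a few lines of Python. This is only an illustration: the tree lengths would come from parsimony searches in a program such as PAUP*, which is not reproduced here. The function names are hypothetical, and the counting convention in `ild_p_value` (the observed value is included in the distribution) is an assumption rather than a description of PAUP* or mILD.

```python
import random
from typing import List, Sequence


def ild_statistic(combined_length: int, partition_lengths: Sequence[int]) -> int:
    """delta_ILD = L_c - sum(L_i): extra steps required by the combined analysis."""
    return combined_length - sum(partition_lengths)


def random_partitions(n_chars: int, sizes: Sequence[int],
                      rng: random.Random) -> List[List[int]]:
    """Shuffle column indices and cut them into partitions of the original sizes."""
    if sum(sizes) != n_chars:
        raise ValueError("partition sizes must sum to the number of characters")
    columns = list(range(n_chars))
    rng.shuffle(columns)
    parts, start = [], 0
    for size in sizes:
        parts.append(sorted(columns[start:start + size]))
        start += size
    return parts


def ild_p_value(observed: int, null_values: Sequence[int]) -> float:
    """Fraction of values (observed included) at least as large as the observed one.

    Assumed convention: (k + 1) / (N + 1), where k counts random replicates
    whose delta_ILD is >= the observed delta_ILD and N is the replicate count.
    """
    as_extreme = sum(1 for value in null_values if value >= observed)
    return (as_extreme + 1) / (len(null_values) + 1)
```

In use, each list of column indices returned by `random_partitions` would be analyzed separately to obtain one random $\delta_{ILD}$ value, and the collected values would then be passed to `ild_p_value`.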

The ILD test assesses whether the incongruence observed among datasets (partitions) is simply an effect of partitioning them for separate analysis: a non-significant result indicates that the conflict between datasets is not significantly greater than the conflict within each dataset.

When more than two datasets (partitions) exist, as is often the case in genome-scale datasets, the ILD can be computed as a global test of overall character incongruence among all partitions, but this calculation cannot identify which specific partitions are incongruent. Incongruent partitions can be identified by sequentially testing each partition against a combination of all others (Baker et al., 1997; Brown et al., 2002; Escobar-Paramo et al., 2004). However, this technique may also fail to identify incongruence when validly discordant phylogenetic signal is distributed over more than one partition. A more attractive, but laborious, solution is to test each partition against every other partition, with the goal of combining data based on the patterns of congruence in a matrix of all pairwise comparisons (Baker et al., 1997; Lecointre et al., 1998, 2005; Planet et al., 2003). The two major impediments to this procedure are that (1) the number of tests grows quadratically as partitions are added (n(n − 1)/2 tests for n partitions), making manual implementation tedious when datasets are large and prohibitive for genome-scale datasets, and (2) in practice, multiple pairwise comparisons often result in asymmetries that confound unambiguous combination of partitions (Baker et al., 1997; Planet et al., 2003). To be symmetrical, all partitions should be congruent with all other partitions in the combination. For example, if partition A is congruent with B, and B is congruent with C, but A is not congruent with C, then the combination ABC is not symmetrical, and it is difficult to choose between the overlapping combinations AB and BC.
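The size of the pairwise matrix and the symmetry condition can be made concrete with a short Python sketch. The function names and the representation of congruent pairs as a set of `frozenset` objects are illustrative assumptions, not part of mILD.

```python
from itertools import combinations


def n_pairwise_tests(n_partitions: int) -> int:
    """Number of unordered pairs among n partitions: n * (n - 1) / 2."""
    return n_partitions * (n_partitions - 1) // 2


def is_symmetric(combination, congruent_pairs) -> bool:
    """True if every pair of partitions in the combination is congruent."""
    return all(frozenset(pair) in congruent_pairs
               for pair in combinations(combination, 2))


# The example from the text: A-B and B-C are congruent, A-C is not.
congruent_pairs = {frozenset("AB"), frozenset("BC")}
abc_is_symmetric = is_symmetric("ABC", congruent_pairs)
ab_is_symmetric = is_symmetric("AB", congruent_pairs)
```

Here `ABC` fails the check because the pair A-C is missing from `congruent_pairs`, whereas `AB` and `BC` each pass, which is the ambiguity described above.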

Two solutions have been proposed to account for the patterns of incongruence in pairwise character incongruence matrices. Lecointre et al. (2005) suggested eliminating incongruence by deleting individual sequences that cause incongruence in each partition. This is accomplished by sequentially removing (jackknifing) taxa from each partition until partitions are found to be congruent. Offending sequences are then deleted from a final ‘careful’ simultaneous analysis of all partitions. We refer to this as the taxon jackknife strategy. A second solution is to perform multiple rounds of pairwise tests, choosing combinations of symmetrically congruent partitions for inclusion as single partitions in the next round (Planet et al., 2003). This process, which we refer to as the ‘snowball’ technique, can be repeated until all partitions have been combined or are found to be incongruent.

With many partitions, a large amount of incongruence and a high degree of asymmetry, both the snowball and taxon jackknife techniques are time intensive. We present an application that automates the procedure of making and analyzing matrices of ILD comparisons (mILD). mILD takes individual dataset/partition files and performs all pairwise comparisons by interacting with the UNIX version of PAUP*. Users can then choose to implement either the taxon jackknife or the snowball strategy for data combination. The goal of this application is to make data combination tools widely available and tractable for large genome-scale analysis.


    2 METHODS AND DESCRIPTION OF ALGORITHMS
Each dataset (partition) is input into mILD as a single file of aligned sequences in FASTA format. mILD then concatenates the sequences in each file by matching taxon names, and outputs a combined NEXUS file in which each input file is represented as a partition. Interacting with the UNIX version of PAUP*, mILD carries out all pairwise comparisons between partitions using PAUP*'s version of the ILD test (the partition homogeneity test) and outputs a matrix of ILD P-values for each comparison.
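The concatenation step might look roughly like the following Python sketch. It is not the mILD Perl code: the function names are hypothetical, and padding taxa that are absent from a partition with `?` is an assumption, since the text does not say how mILD treats missing taxa. The output uses the standard NEXUS `data` and `sets` blocks, with one `charset` per input file.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Read FASTA text into {taxon name: aligned sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # taxon name = first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(chunks) for taxon, chunks in records.items()}


def concatenate(partitions: Dict[str, Dict[str, str]]
                ) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Join partitions by taxon name; return sequences and 1-based charset ranges."""
    taxa = sorted({t for seqs in partitions.values() for t in seqs})
    combined = {t: "" for t in taxa}
    charsets: Dict[str, Tuple[int, int]] = {}
    start = 1
    for part_name, seqs in partitions.items():
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{part_name}: sequences are not aligned")
        length = lengths.pop()
        for t in taxa:
            # Assumption: a taxon missing from this partition is padded with '?'.
            combined[t] += seqs.get(t, "?" * length)
        charsets[part_name] = (start, start + length - 1)
        start += length
    return combined, charsets


def to_nexus(combined: Dict[str, str],
             charsets: Dict[str, Tuple[int, int]],
             datatype: str = "dna") -> str:
    """Write a NEXUS data block plus a sets block with one charset per partition."""
    nchar = len(next(iter(combined.values())))
    width = max(len(t) for t in combined)
    lines = ["#NEXUS", "begin data;",
             f"  dimensions ntax={len(combined)} nchar={nchar};",
             f"  format datatype={datatype} missing=? gap=-;",
             "  matrix"]
    lines += [f"    {t.ljust(width)}  {seq}" for t, seq in combined.items()]
    lines += ["  ;", "end;", "begin sets;"]
    lines += [f"  charset {name} = {a}-{b};" for name, (a, b) in charsets.items()]
    lines.append("end;")
    return "\n".join(lines)
```

Each `charset` can then be referred to by name when the pairwise tests are set up in the phylogenetic program.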

The user can choose between the snowball and taxon jackknife strategies to further test the combinability of partitions. For the snowball strategy, the user can also select a heuristic or an exhaustive variant. The algorithm for the heuristic variant is as follows:

  1. Based on the values in the incongruence matrix find all symmetrically congruent combinations of partitions.
  2. Choose one of the symmetric combinations. Currently, mILD allows the user to select either a random combination or the combination with (a) the most characters or (b) the most partitions.
  3. The combination chosen in step 2 is then represented as a single partition in another round of all pairwise tests. The process then returns to step 1.

This process repeats until all partitions are either combined or incongruent.
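The loop of choosing one symmetric combination per round can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the mILD implementation: `ild_test` is a hypothetical callable standing in for a call to the partition homogeneity test, partitions with P-values at or above `alpha` are treated as congruent, symmetric combinations are found by brute force, and the random selection rule is omitted.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

Group = FrozenSet[str]  # a (possibly merged) partition, as a set of original names


def symmetric_combinations(groups: Sequence[Group], congruent) -> List[tuple]:
    """Step 1: every set of >= 2 groups in which all pairs are congruent."""
    found = []
    for size in range(2, len(groups) + 1):
        for combo in combinations(groups, size):
            if all(congruent(a, b) for a, b in combinations(combo, 2)):
                found.append(combo)
    return found


def heuristic_snowball(names: Sequence[str], n_chars: Dict[str, int],
                       ild_test: Callable[[Group, Group], float],
                       alpha: float = 0.01, rule: str = "characters") -> List[Group]:
    groups: List[Group] = [frozenset([n]) for n in names]
    while True:
        # One round of all pairwise tests on the current partitions.
        p = {frozenset([a, b]): ild_test(a, b) for a, b in combinations(groups, 2)}
        congruent = lambda a, b: p[frozenset([a, b])] >= alpha
        combos = symmetric_combinations(groups, congruent)
        if not combos:
            return groups  # everything is either combined or incongruent
        # Step 2: pick one combination (most characters or most partitions).
        if rule == "characters":
            key = lambda combo: sum(n_chars[n] for g in combo for n in g)
        else:
            key = lambda combo: sum(len(g) for g in combo)
        chosen = max(combos, key=key)
        # Step 3: the chosen combination becomes a single partition.
        merged = frozenset().union(*chosen)
        groups = [g for g in groups if g not in chosen] + [merged]


# Toy stand-in for the ILD test: two groups are congruent only if every
# cross pair of original partitions is listed as congruent.
OK_PAIRS = {frozenset("AB"), frozenset("BC")}


def toy_ild_test(a: Group, b: Group) -> float:
    agree = all(frozenset([x, y]) in OK_PAIRS for x in a for y in b)
    return 0.5 if agree else 0.001


sizes = {"A": 100, "B": 200, "C": 300, "D": 50}
by_characters = heuristic_snowball("ABCD", sizes, toy_ild_test)
by_partitions = heuristic_snowball("ABCD", sizes, toy_ild_test, rule="partitions")
```

In the toy data, AB and BC are both symmetric but ABC is not, so the selection rule decides the outcome: the most-characters rule merges B with C, while the most-partitions rule finds a tie and takes the first combination found.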

In the exhaustive snowball strategy, all symmetrically congruent combinations are included as single partitions (along with all the original partitions) in the next round of testing. The exhaustive snowball technique ends when no new combinations are identified.

The user can also choose the taxon jackknife technique. Based on the preliminary incongruence matrix, for each incongruent pair of partitions mILD performs multiple rounds of ILD testing, excluding each taxon in turn. Each taxon whose exclusion results in a loss of incongruence during the ILD test is flagged and reported.
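For a single incongruent pair from the preliminary matrix, the jackknife loop can be sketched as below. The callable `ild_test_without` is a hypothetical placeholder for rerunning the ILD test with one taxon excluded, and treating a P-value at or above `alpha` as a loss of incongruence is an assumption.

```python
from typing import Callable, Iterable, List


def taxon_jackknife(taxa: Iterable[str],
                    ild_test_without: Callable[[str], float],
                    alpha: float = 0.01) -> List[str]:
    """Flag each taxon whose exclusion removes the incongruence of the pair."""
    flagged = []
    for taxon in taxa:
        p_value = ild_test_without(taxon)  # ILD test rerun without this taxon
        if p_value >= alpha:               # incongruence is no longer significant
            flagged.append(taxon)
    return flagged


# Toy P-values for the test rerun with each taxon removed in turn.
toy_p_values = {"t1": 0.001, "t2": 0.30, "t3": 0.004}
flagged_taxa = taxon_jackknife(["t1", "t2", "t3"], toy_p_values.get)
```

With these toy values only `t2` is flagged, because its removal is the only one that raises the P-value above the threshold.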

For both data combination strategies, the P-value threshold for determining incongruence can be set by the user. In previous analyses, we have found a threshold of 0.01 to be conservative but reliable. In addition, the user can choose to apply a strict Bonferroni correction.
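The effect of a Bonferroni correction on the threshold can be illustrated with the short Python sketch below. It assumes that the correction divides the chosen threshold by the number of pairwise tests in the matrix; the text does not state how mILD counts tests, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def bonferroni_threshold(alpha: float, n_partitions: int) -> float:
    """Divide alpha by the number of pairwise tests, n * (n - 1) / 2."""
    n_tests = n_partitions * (n_partitions - 1) // 2
    if n_tests < 1:
        raise ValueError("at least two partitions are needed")
    return alpha / n_tests


def incongruent_pairs(p_values: Dict[Tuple[str, str], float], alpha: float,
                      n_partitions: int, bonferroni: bool = False
                      ) -> List[Tuple[str, str]]:
    """Pairs whose P-value falls below the (optionally corrected) threshold."""
    threshold = bonferroni_threshold(alpha, n_partitions) if bonferroni else alpha
    return sorted(pair for pair, p in p_values.items() if p < threshold)


toy_matrix = {("A", "B"): 0.03, ("A", "C"): 0.001, ("B", "C"): 0.4}
uncorrected = incongruent_pairs(toy_matrix, alpha=0.05, n_partitions=3)
corrected = incongruent_pairs(toy_matrix, alpha=0.05, n_partitions=3, bonferroni=True)
```

In this toy matrix the pair A-B is called incongruent at an uncorrected threshold of 0.05 but not after correction, which shows how the stricter threshold reduces the number of pairs flagged.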


    3 CONSIDERATIONS AND EXPANSIONS
Recent critiques of the ILD test suggest that it performs poorly when partitions differ substantially in level of noise, overall evolutionary rate, rate heterogeneity among sites, number of informative sites or size (Dolphin et al., 2000; Yoder et al., 2001; Barker et al., 2002; Darlu et al., 2002; Dowton et al., 2002). However, the ILD remains a powerful test when incongruence is caused by differing evolutionary histories under certain conditions (i.e. large numbers of informative sites and low to intermediate among-site rate variation) (Darlu et al., 2002). Most studies have suggested that the ILD test is generally more susceptible to type I errors (false rejection of congruence) than to type II errors (failure to reject congruence when datasets are incongruent) (Hipp et al., 2004). Thus, in the right circumstances, the ILD test can be a sensitive starting point for identifying potentially incongruent partitions for further analysis (Hipp et al., 2004). We suggest that results from mILD analyses be carefully scrutinized and checked with follow-up tests of incongruence. Although our implementation of mILD interacts with PAUP*, it is designed to be easily modified to use other phylogenetic applications. Future versions will include other incongruence metrics, and tests of the validity of ILD P-values such as those proposed by Dolphin et al. (2000).


    4 CONCLUSION
As genomic datasets become increasingly available, new tools are required that combine rigorous phylogenetic analysis with high-throughput, automated data curation. mILD allows large numbers of phylogenetic datasets (e.g. gene alignments) to be tested for phylogenetic congruence in a statistical framework, and then automates techniques for combining the data based on patterns of incongruence. We intend to expand this application to include other tests of incongruence and strategies for data combination.


    Acknowledgments
 
We thank David H. Figurski, Rob DeSalle and E. Kurt Lienau for helpful discussion. P.J.P. is supported by NIH grant 5R01 GM062351-02; I.N.S. by NSF grants IIS-0241229 & DBI-0421604 and the Lewis B. & Dorothy Cullman Program for Molecular Systematics.

Conflict of Interest: none declared.

Received on August 2, 2005; revised on October 7, 2005; accepted on October 24, 2005

    REFERENCES

    Baker, R.H., et al. (1997) Multiple sources of character information and the phylogeny of Hawaiian drosophilids. Syst. Biol., 46, 654–673.

    Barker, F.K., et al. (2002) The utility of the incongruence length difference test. Syst. Biol., 51, 625–637.

    Brown, E.W., et al. (2002) Detection of recombination among Salmonella enterica strains using the incongruence length difference test. Mol. Phylogenet. Evol., 24, 102–120.

    Darlu, P., et al. (2002) When does the incongruence length difference test fail? Mol. Biol. Evol., 19, 432–437.

    Dolphin, K., et al. (2000) Noise and incongruence: interpreting results of the incongruence length difference test. Mol. Phylogenet. Evol., 17, 401–406.

    Dowton, M., et al. (2002) Increased congruence does not necessarily indicate increased phylogenetic accuracy: the behavior of the incongruence length difference test in mixed-model analyses. Syst. Biol., 51, 19–31.

    Escobar-Paramo, P., et al. (2004) Decreasing the effects of horizontal gene transfer on bacterial phylogeny: the Escherichia coli case study. Mol. Phylogenet. Evol., 30, 243–250.

    Farris, J.S., et al. (1994) Testing the Significance of Incongruence. Cladistics, 10, 315–319.

    Hipp, A.L., et al. (2004) Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test. Syst. Biol., 53, 81–89.

    Lecointre, G., et al. (1998) Escherichia coli molecular phylogeny using the incongruence length difference test. Mol. Biol. Evol., 15, 1685–1695.

    Lecointre, G., et al. (2005) Total Evidence requires exclusion of phylogenetically misleading data. Zoologica Scripta, 34, 101–117.

    Planet, P.J., et al. (2003) The Widespread Colonization Island of Actinobacillus actinomycetemcomitans. Nat. Genet., 34, 193–198.

    Yoder, A.D., et al. (2001) Failure of the ILD to determine data combinability for slow loris phylogeny. Syst. Biol., 50, 408–424.

