Bioinformatics Advance Access originally published online on August 7, 2006
Bioinformatics 2006 22(20):2493-2499; doi:10.1093/bioinformatics/btl427
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Robust inference of positive selection from recombining coding sequences
Computational Biology Group, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Private Bag, Rondebosch 7701, South Africa
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Accurate detection of positive Darwinian selection can provide important insights to researchers investigating the evolution of pathogens. However, many pathogens (particularly viruses) undergo frequent recombination and the phylogenetic methods commonly applied to detect positive selection have been shown to give misleading results when applied to recombining sequences. We propose a method that makes maximum likelihood inference of positive selection robust to the presence of recombination. This is achieved by allowing tree topologies and branch lengths to change across detected recombination breakpoints. Further improvements are obtained by allowing synonymous substitution rates to vary across sites.
Results: Using simulation we show that, even for extreme cases where recombination causes standard methods to reach false positive rates >90%, the proposed method decreases the false positive rate to acceptable levels while retaining high power. We applied the method to two HIV-1 datasets for which we have previously found that inference of positive selection is invalid owing to high rates of recombination. In one of these (env gene) we still detected positive selection using the proposed method, while in the other (gag gene) we found no significant evidence of positive selection.
Availability: A HyPhy batch language implementation of the proposed methods and the HIV-1 datasets analysed are available at http://www.cbio.uct.ac.za/pub_support/bioinf06. The HyPhy package is available at http://www.hyphy.org, and it is planned that the proposed methods will be included in the next distribution. RDP2 is available at http://darwin.uvigo.es/rdp/rdp.html.
Contact: konrad{at}cbio.uct.ac.za, cathal{at}science.uct.ac.za
| 1 INTRODUCTION |
|---|
The standard phylogenetic approach to inferring positive Darwinian selection in protein-coding sequences is based on the codon models first proposed by Muse and Gaut (1994) and Goldman and Yang (1994), which have since been developed into a set of robust methods that detect positive selection while allowing for selective pressure to vary across sites (Nielsen and Yang, 1998; Yang et al., 2000; Wong et al., 2004). These methods, however, assume that the phylogenetic tree topology and branch lengths are constant across all sites in the sequence, an assumption which is invalid when the sequences have been affected by recombination. Indeed, it has been shown (Anisimova et al., 2003; Shriner et al., 2003) that the presence of recombination can cause these methods to fail with type I (false positive) error rates as high as 90%. In a recent study (Scheffler and Seoighe, manuscript submitted), we quantified the percentage of false positive inferences as a function of recombination rate and demonstrated that inferred positive selection on two example HIV datasets is invalidated by the presence of recombination.
Recombination can contribute to false inference of positive selection by causing the branch lengths (Fig. 1a) and tree topologies (Fig. 1b) to differ between sites. In order to devise a robust method of inferring positive selection we investigated the impact of allowing tree topology and branch length parameters to change across recombination breakpoints. In a real analysis we anticipate that a subset of recombination breakpoints might be undetected. In order to improve the performance of our method in the presence of a subset of undetected recombination breakpoints, we included a variable synonymous substitution rate in our models, which allows the total tree length to vary from site to site. Sequences can evolve under a variable synonymous substitution rate owing to mutation rate variation or owing to site-specific selection acting on synonymous changes, but synonymous rate variation could also be detected as a result of recombination events that alter branch lengths. Incorporating synonymous rate variation in the model can therefore account for some of the misestimated branch lengths that result from recombination events that alter branch lengths but not tree topology. In general, we expect these recombination events to be more difficult to detect than those that cause a substantial change in tree topology. We evaluated the performance of the method by simulation and applied it to investigate whether the HIV datasets mentioned above can be inferred to be evolving under positive selection when recombination is taken into account.
| 2 MATERIALS AND METHODS |
|---|
We generated a number of datasets using the Codonrecsim program written by Rasmus Nielsen (Anisimova et al., 2003) that simulates recombined coding sequence alignments. It does this by simulating under a phylogenetic model of evolution using the discrete model (M3) of site-to-site rate variation proposed by Yang et al. (2000), but with the evolution taking place along genealogies simulated under the coalescent model with recombination (Hudson, 1983). This means that sites that have a recombination breakpoint between them do not evolve along the same phylogenetic tree. Barton and Etheridge (2004) have shown that selection has little effect on genealogies, which justifies neglecting selection when simulating genealogies under the coalescent model with recombination.
We performed two suites of simulation experiments, one using 10-taxon and one using 30-taxon datasets (Table 1). In each suite we simulated neutrally evolving datasets (i.e. ω = 1, mimicking pseudogene evolution) to estimate false positive rates and datasets evolving with site-to-site rate variation and positive selection [using the parameters inferred by Anisimova et al. (2003) on their hepatitis D antigen dataset under the 3-component discrete model (Yang et al., 2000)] to estimate power. Each simulated alignment was 500 codons long, and each dataset consisted of 100 replicates. The transition/transversion rate ratio (κ) and the codon equilibrium frequencies were set to the values estimated for the hepatitis D antigen dataset. We chose mutation and recombination rate parameters that produced high false positive rates when using the standard method (see below) to infer positive selection on the neutral datasets. For the 30-taxon datasets the population-scaled recombination rate (ρ) was 0.01 and the population-scaled mutation rate (θ) was 0.36, resulting in an average of 43.2 recombination events in the entire genealogy and an expected number of 1.43 mutations per codon. For the 10-taxon datasets ρ was 0.05 and θ was 3.6, resulting in an average of 247.11 recombination events in the entire genealogy and an expected number of 10.18 mutations per codon (the very high values for the 10-taxon datasets serve to illustrate that the method works well even in extreme cases). To verify that the proposed method does not have an adverse effect when used on unrecombined data, we also simulated datasets with exactly the same parameters but with zero recombination rate.
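The expected numbers of mutations per codon quoted above (1.43 and 10.18) agree with the standard neutral coalescent expectation, in which a sample of n sequences accumulates on average the mutation rate multiplied by the sum of 1/i for i = 1, ..., n-1. The sketch below checks that arithmetic. It assumes the population-scaled mutation rate is expressed per codon and that the figures were derived in this way, which the text does not state explicitly; the function name is illustrative only.

```python
def expected_mutations_per_codon(theta: float, n_taxa: int) -> float:
    """Expected number of mutations per codon on a neutral coalescent genealogy.

    Assumption: theta is the population-scaled mutation rate per codon, so the
    expectation is theta * sum_{i=1}^{n-1} 1/i (Watterson's harmonic factor).
    """
    harmonic = sum(1.0 / i for i in range(1, n_taxa))
    return theta * harmonic


if __name__ == "__main__":
    # (number of taxa, population-scaled mutation rate) for the two suites
    for n_taxa, theta in [(30, 0.36), (10, 3.6)]:
        value = expected_mutations_per_codon(theta, n_taxa)
        print(f"{n_taxa} taxa: {value:.2f} mutations per codon")
```

Under these assumptions the two suites give 1.43 and 10.18 respectively, matching the values reported for the 30-taxon and 10-taxon datasets.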
Finally, we analysed the HIV-1 subtype C env and gag data of our recent study (Scheffler and Seoighe, manuscript submitted). These datasets contain 10 taxa each, with accession numbers AY118165-AY118166, AF286227, AY158533-AY158535, AF411967, AF391234-AF391235 and AF391238 for the env sequences (1053 codons in length) and AY118165-AY118166, AF286227, AY158533-AY158535, AF411967, AY162223-AY162224 and AF391254 for the gag sequences (590 codons in length).
| 3 ALGORITHM |
|---|
In this study we report results for four methods of detecting positive selection, using different combinations of the two strategies investigated:
- Standard: This is the baseline method, which assumes that topology, relative branch lengths and total tree length are constant over all sites.
- Synonymous rate variation: This method assumes that topology and relative branch length are constant over all sites, but allows total tree length to vary from site to site.
- Partitioning: This method uses recombination breakpoints (either detected or the actual simulated breakpoints) to divide the alignment into partitions, each of which is assumed to include no further recombination breakpoints. Topology, relative branch lengths and total tree length are forced to be constant over all sites within a partition, but allowed to vary between partitions.
- Synonymous rate variation with partitioning: This method combines the previous two methods: topology and relative branch lengths are assumed constant over all sites within a partition, but allowed to vary between partitions. Total tree length is allowed to vary from site to site irrespective of partitioning.
3.1 Baseline (standard) method
We detected positive selection by comparing the discrete nearly neutral and selection models M1a and M2a of Wong et al. (2004). We used the PAUP* program (Swofford, 2002) to estimate the maximum likelihood topologies under the HKY85 model (Hasegawa et al., 1985). To save computation time, we did not estimate the branch lengths separately for each model, but instead used the branch lengths estimated under the M0 (single rate) model (Yang et al., 2000). We report a sequence as being under positive selection at the 5 or 1% level if model M2a provides a significantly better fit than model M1a as measured by a likelihood ratio test with the appropriate significance level.
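The likelihood ratio test described above can be sketched as follows. This is a minimal illustration, not the HyPhy implementation: it assumes the test statistic is referred to a chi-square distribution with 2 degrees of freedom (a common choice for M1a versus M2a, since M2a adds two free parameters), which the text does not specify. The function name and the log-likelihood values in the usage example are hypothetical.

```python
import math


def lrt_m1a_vs_m2a(lnl_m1a: float, lnl_m2a: float):
    """Likelihood ratio test of M2a (selection) against M1a (nearly neutral).

    Assumption: the statistic 2 * (lnL_M2a - lnL_M1a) is compared with a
    chi-square distribution with 2 degrees of freedom. For 2 degrees of
    freedom the survival function is exactly exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_m2a - lnl_m1a))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical maximised log-likelihoods for one alignment
    statistic, p_value = lrt_m1a_vs_m2a(lnl_m1a=-1000.0, lnl_m2a=-996.0)
    print(f"2*deltaLnL = {statistic:.2f}, p = {p_value:.4f}")
    print("significant at 5%:", p_value < 0.05)
    print("significant at 1%:", p_value < 0.01)
```

With these illustrative values the statistic is 8.0, which is significant at the 5% level but not at the 1% level.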
3.2 Allowing synonymous rate variation
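In the methods that model synonymous rate variation we added a synonymous substitution rate parameter to the baseline method. We treat the synonymous rate s as belonging to one of a number of discrete rate classes, similar to the treatment of the non-synonymous/synonymous rate ratio ω, so that the expression for the instantaneous substitution rate from codon i to codon j at site h becomes:
$$
q_{ij}(h) =
\begin{cases}
0 & \text{if } i \text{ and } j \text{ differ at more than one nucleotide position} \\
s(h)\,\pi_j & \text{for a synonymous transversion} \\
s(h)\,\kappa\,\pi_j & \text{for a synonymous transition} \\
s(h)\,\omega(h)\,\pi_j & \text{for a non-synonymous transversion} \\
s(h)\,\omega(h)\,\kappa\,\pi_j & \text{for a non-synonymous transition}
\end{cases}
\qquad (1)
$$
where κ is the transition/transversion rate ratio and π_j is the codon equilibrium frequency of codon j. ω(h) and s(h) denote, respectively, the non-synonymous/synonymous rate ratio and synonymous rate at site h.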
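As an illustration of this parameterisation, the following sketch computes the instantaneous rate between two codons, with the synonymous rate s(h) scaling every permitted substitution, κ applied to transitions only and ω(h) applied to non-synonymous changes only. It assumes the standard genetic code and assigns zero rate to any change involving a stop codon, as is usual in codon models; these details and all names are assumptions made for the example and are not taken from the HyPhy implementation.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def substitution_rate(codon_i, codon_j, kappa, omega_h, s_h, pi_j):
    """Instantaneous rate from codon_i to codon_j at a site h.

    kappa:   transition/transversion rate ratio
    omega_h: non-synonymous/synonymous rate ratio at the site
    s_h:     synonymous rate at the site (scales all substitutions)
    pi_j:    equilibrium frequency of the target codon
    """
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons or more than one nucleotide change
    aa_i, aa_j = GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]
    if "*" in (aa_i, aa_j):
        return 0.0  # assumption: stop codons are excluded from the model
    rate = s_h * pi_j
    if frozenset(diffs[0]) in TRANSITIONS:
        rate *= kappa
    if aa_i != aa_j:
        rate *= omega_h
    return rate


if __name__ == "__main__":
    params = dict(kappa=2.0, omega_h=0.5, s_h=1.5, pi_j=0.02)
    for target in ["TTC", "TTA", "CTT", "CCC"]:
        print("TTT ->", target, substitution_rate("TTT", target, **params))
```

Because s(h) multiplies every non-zero entry, changing it rescales the whole rate matrix for the site, which is what makes it act as a per-site tree scale.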
The synonymous rate is drawn from a discrete distribution with three rate categories (we obtained no noticeable difference in results when using four categories, data not shown), with rates scaled such that the average synonymous rate over all sites is 1. This distribution is identical to that used for the ω parameter in the discrete model M3 of Yang et al. (2000), except that the latter is unscaled. Thus each site, in addition to belonging to one of the ω categories, also belongs to a synonymous rate category. This can also be viewed as providing three different tree scales: the evolution at each site is modelled as following the same tree topology and relative branch lengths, but the tree may be scaled differently for different sites.
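The scaling constraint on the synonymous rate categories can be made concrete with a short sketch. It takes unscaled category rates and their proportions and divides each rate by the weighted mean, so that the average rate over sites is 1. The numerical values and the function name are hypothetical and serve only to illustrate the constraint.

```python
def scale_to_unit_mean(rates, proportions):
    """Rescale discrete category rates so their weighted mean equals 1.

    rates:       unscaled rate for each category
    proportions: fraction of sites in each category (must sum to 1)
    """
    if len(rates) != len(proportions):
        raise ValueError("rates and proportions must have the same length")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    mean = sum(r * p for r, p in zip(rates, proportions))
    if mean <= 0:
        raise ValueError("weighted mean rate must be positive")
    return [r / mean for r in rates]


if __name__ == "__main__":
    # Hypothetical three-category distribution of synonymous rates
    proportions = [0.5, 0.3, 0.2]
    scaled = scale_to_unit_mean([0.5, 1.0, 4.0], proportions)
    print("tree scales per category:", [round(r, 3) for r in scaled])
    print("mean rate:", sum(r * p for r, p in zip(scaled, proportions)))
```

Each scaled rate can be read as one of the three tree scales mentioned above: the relative rates between categories are preserved, and only the overall scale is fixed.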
Note that our parameterisation of site-to-site rate variation is different from that used by Kosakovsky Pond and Muse (2005), which uses the synonymous rate only for synonymous changes and hence is not a direct measure of total tree length (s(h) is absent from the expression for the instantaneous rate of non-synonymous transitions and transversions). Whereas Kosakovsky Pond and Muse (2005) apply parametric models to the distribution of synonymous and of non-synonymous rates, our parameterisation applies the same parametric models to the distribution of synonymous rates and of selective strengths.
3.3 Detecting recombination breakpoints
For the methods using partitioning by detected recombination we estimated the positions of recombination breakpoints using the non-parametric RDP (Martin and Rybicki, 2000), GENECONV (Padidam et al., 1999) and MAXIMUM CHI SQUARED (Maynard Smith, 1992) methods as implemented in RDP2 (Martin et al., 2005). See Poke et al. (2006) for a description of how these methods work. Default program settings were used throughout except that a Bonferroni corrected P-value cutoff of 0.01 was used to minimize the probability of falsely inferring recombination. All breakpoints detected by any of the three methods were taken into consideration.
3.4 Allowing different tree topologies for different sequence fragments
Once the recombination breakpoints have been detected, we use them to partition the alignment into separate segments (Fig. 2). When the number of segments exceeds a preset maximum N (20 in this study), we use only the N longest unbroken segments and discard the remaining data. The rationale behind this is that when the segments between recombination breakpoints are very short, they contain very little phylogenetic information and therefore the tree topology and branch length parameters cannot be estimated accurately for the partitions. Moreover, such small partitions contribute very little information so that discarding them should be less costly than introducing additional uncertainty resulting from estimating additional branch length and topology parameters for the partition. In the present study, data were discarded only for the simulated data, which had very high rates of recombination. The number of breakpoints detected in the real datasets we examined was lower than the maximum in both cases.
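The partitioning step can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: breakpoints are taken to be codon positions, segments are half-open intervals [start, end), and the function name is hypothetical. When more than N segments result, only the N longest are retained and the remaining codons are discarded.

```python
def partition_alignment(n_codons, breakpoints, max_segments=20):
    """Split an alignment of n_codons into segments at the given breakpoints.

    Returns a list of (start, end) half-open codon intervals in alignment
    order. If there are more than max_segments segments, only the longest
    max_segments are kept and the rest of the data are dropped.
    """
    # Merge breakpoints from all detection methods and ignore invalid positions
    cuts = sorted({b for b in breakpoints if 0 < b < n_codons})
    bounds = [0] + cuts + [n_codons]
    segments = list(zip(bounds[:-1], bounds[1:]))
    if len(segments) > max_segments:
        longest = sorted(segments, key=lambda seg: seg[1] - seg[0], reverse=True)
        segments = sorted(longest[:max_segments])
    return segments


if __name__ == "__main__":
    # Hypothetical breakpoints in a 100-codon alignment, keeping 2 segments
    print(partition_alignment(100, [10, 15, 60], max_segments=2))
    # 12 hypothetical breakpoints in a 1053-codon alignment give 13 segments
    print(len(partition_alignment(1053, range(80, 1040, 80))))
```

Passing the union of breakpoints from all detection methods as a single collection matches the choice, described in Section 3.3, of taking every detected breakpoint into consideration.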
Next, topologies and branch lengths are estimated as in the baseline method, except that a separate topology and set of branch lengths is used for each segment. The remaining model parameters, however, are shared across all segments. In particular, the parameters of models M1a and M2a describing the rate categories are estimated only once for all segments.
To allow fairer comparison with the unpartitioned methods, we present the results for the simulation experiments not only for the full unpartitioned sequence (Fig. 2, top), but also for an unpartitioned analysis of the sites in the largest unrecombined segments only (Fig. 2, middle). This latter result provides a more direct comparison with the partitioned analysis (Fig. 2, bottom) which uses the same subset of the codons.
| 4 RESULTS AND DISCUSSION |
|---|
4.1 Simulation experiments
We investigated power and false positive rates using the simulated datasets (summarized in Table 1). For each dataset we first considered the effect of allowing the synonymous rate to vary across sites and of separating the tree topology and branch length parameters between the segments defined by recombination breakpoints, given that the locations of the recombination breakpoints are known. This was done by retrieving the recombination breakpoints used in the simulations. We then present the power and false positive rates for the more realistic case in which the breakpoint locations are not known, but are instead inferred using a set of breakpoint detection algorithms (Martin et al., 2005).
4.1.1 True breakpoints
The neutral simulations provide a worst case (but nevertheless realistic) scenario with which to investigate false positive rates. We found (Table 2) that allowing the synonymous substitution rate to vary across sites brought about a large decrease in false positives relative to the standard method, but still left the false positive rate unacceptably high. Partitioning according to the true breakpoints (Table 2), on the other hand, brought false positive levels down to close to the desired rate. In this case, synonymous rate variation with partitioning did not give further improvement over partitioning alone. The decrease in false positives when partitioning has two causes. First, the fact that some data are discarded inevitably leads to a reduction in power: this can be seen by comparing the full sequence results with the largest unrecombined segments (LUS) results for the unpartitioned methods. Second, the partitioning itself causes a further reduction, which is the desired effect: the magnitude of this effect can be seen by comparing the results for the partitioning methods with the LUS results of the corresponding unpartitioned methods. Therefore, in order to see the effect of partitioning the phylogeny parameters between unrecombined segments or allowing the synonymous rate to vary on the false positive rates, the results obtained using these methods should be compared with those obtained by applying the standard method to the LUS.
The positive selection simulations provide a means to investigate power (Table 3). Again, some caution is required here because positive results could be artefacts of recombination rather than instances where the method detected the signal of positive selection. Nevertheless, when the false positive rate obtained on the corresponding set of neutral simulations is low, we can conclude that the result obtained on the positive selection simulations is a good indication of power.
For the case in which we assume that the true recombination breakpoints are known, the power was higher on the large dataset than on the small dataset. This was partly because the recombination levels were so high in the small dataset that the average segment length (for the 20 largest unrecombined segments) was <8 codons. In fact, given that tree topologies and branch lengths were inferred on such short segments, it is surprising that the method retains any power to discriminate between datasets with and without positive selection (as demonstrated by the higher rate of positives in the positive selection datasets than in the neutral datasets). This shows that, even when recombination creates what might appear to be a hopelessly fragmented evolutionary history, it can still be possible to perform reasonable inferences provided recombination is taken into account.
Inferring trees and branch lengths on very short segments for the partitioning method caused a large decrease in power for the small datasets, and possibly also a small increase in false positives. This is particularly noticeable for the partitioning method (without synonymous rate variation) applied to the small positive selection dataset, on which we obtained only 6% power at the 5% significance level. To confirm that this severe drop in power was caused by misestimation of tree topologies and branch lengths on the short segments we repeated the analysis, but with the topology and branch lengths for each segment fixed to the true (simulated) values. This resulted in 99% power (at both the 5 and 1% significance levels), which is, as expected, identical to the result obtained for the corresponding unrecombined simulations. When the true topology was fixed but the branch lengths were estimated as usual, the power was 14% (10%) at the 5% (1%) significance level. Thus the decrease in power can be attributed mainly to inaccurate estimation of the branch lengths, which appears to become particularly acute when the segments are this short. We caution that extremely short segments (e.g. resulting from extremely high recombination rates such as in this simulation) may cause the proposed method to lack power.
4.1.2 Detected breakpoints
In real data, the true breakpoints are unknown and have to be detected by a recombination detection method. This has the disadvantage that there may be inaccuracy in the breakpoints detected, but may also have advantages in that recombination events that have little or no effect (for instance because they occur between closely related taxa and do not change the tree topology, as in Fig. 1a) will remain undetected, and thus will not have any negative effect on the power of the method. This could explain the results in Tables 4 and 5 where we found that using the detected breakpoints resulted in better performance (both a lower rate of false positives and higher power) than using the true breakpoints. In particular, the average segment lengths for the small datasets were longer, owing to the suppression of many presumably unimportant (and difficult to detect) recombination breakpoints. The longer segment lengths yielded improvements of the results obtained by methods using partitioning on these datasets.
Using the detected breakpoints, the power obtained using partitioning with synonymous rate variation on the small dataset was even higher than that obtained on the large dataset. This can be explained by the fact that the diversity in this dataset was much higher so that, once the false signal caused by recombination has been compensated for, the dataset contains more information that can be used to obtain inferences about selective pressure.
It is reassuring that modelling synonymous rate variation had very little effect on the recombination-free sequences: false positives were essentially unchanged while power decreased slightly. Partitioning had no effect: trivially, when the true breakpoints were used, there were no breakpoints to take into account so that the partitioning methods were identical to the corresponding unpartitioned methods. Recombination detection resulted in only a few falsely detected breakpoints (in 3 and 8 of the 100 replicates for the small neutral and small positive selection datasets respectively, and in none of the large datasets), but the inference of positive selection after partitioning gave a different result from that obtained without partitioning only for one replicate in the small positive selection dataset, and only at the higher of the two significance levels listed. Hence the proposed methods do not have negative effects when applied to unrecombined data.
4.2 Analysis of viral datasets
Next, we used the four methods to analyse the HIV-1 subtype C datasets for which we have previously shown (Scheffler and Seoighe, manuscript submitted) that the recombination levels are high enough to cause false inference of positive selection. Indeed, the standard method inferred positive selection on both data sets at very high levels of significance.
For the env data (Table 6) we detected 12 recombination breakpoints. We found that both modelling synonymous rate variation and partitioning (using 13 segments and discarding no data) caused reductions both in the significance level of the result and in the magnitude of positive selection inferred under the M2a model (as seen from the value of the ω2 parameter), but that even when using both synonymous rate variation and partitioning we still detected positive selection at a highly significant level. We conclude that these sequences are likely to have evolved under both recombination and positive selection.
For the gag data (Table 7) we detected only four recombination breakpoints. This time, although partitioning (using five segments and discarding no data) without modelling synonymous rate variation did not remove the evidence of positive selection, the result was no longer significant when the synonymous rate was allowed to vary and even less so when synonymous rate variation and partitioning were combined. We conclude that, when recombination is taken into account, there is no convincing evidence that these sequences have evolved under positive selection.
| 5 CONCLUSIONS |
|---|
Our simulation results reveal that modelling synonymous rate variation tends to make inference of positive selection more conservative: both false positives and power go down. However, the levels of false positives observed in these simulations were still unacceptably high despite being much lower than when constant synonymous rates were assumed.
Using tree topology and branch lengths inferred separately for segments defined by detected recombination breakpoints caused a dramatic reduction in the false positive rate. For example, in the 10-taxon dataset we obtained an improvement from 94% false positives on the neutral simulations and 73% power on the positive selection simulations to 11% false positives on the neutral simulations and 91% power on the positive selection simulations. By combining partitioning with synonymous rate variation the false positive rate dropped further to an acceptable 2%, albeit at the cost of some reduction in power. The final power of 83% was nevertheless higher than the original power of 73%.
One of the most encouraging aspects of the simulation results was the performance of the partitioning methods using the detected recombination breakpoints. In the current set of simulations these methods performed better than the methods that used the simulated breakpoints, most likely because of the small segment lengths obtained when all of the recombination breakpoints were used. These results imply that the method we propose is not highly susceptible to inaccuracy in the detected breakpoints and that the majority of the benefit derived from partitioning appears to be obtained from the subset of most easily detectable recombination breakpoints.
We have not investigated the accuracy of site-specific selection detection using the proposed methods. In their simulation studies, Anisimova et al. (2003) and Shriner et al. (2003) found that site-specific analyses using standard phylogenetic methods are much more robust to recombination than whole-sequence analyses. This is consistent with our preliminary investigations (data not shown), in which we failed to find high levels of site-specific false positive inference using standard methods. More recently, Kosakovsky Pond et al. (2006) have found that under some conditions site-specific inference using a fixed effects likelihood method can also give highly misleading results in the presence of recombination. These authors found that the effects of recombination on site specific inference can be alleviated by analysing unrecombined segments separately and we therefore recommend that the method presented here should also be used for site-specific inference of positive selection when recombination is suspected.
Our results indicate that the proposed methods are able to filter out false inferences of positive selection on recombined sequences, but also have the power required to infer positive selection in such sequences when the signal of positive selection does exist. Furthermore we show that there is no evidence of a disadvantage of applying partitioning to sequences when the sequences have not in fact undergone recombination. In such cases few, if any, recombination breakpoints were detected and inferring the tree topology and branch length parameters separately for the resulting large unrecombined segments appeared to have no effect on the power or false positive rates. We therefore recommend that a method such as the one we describe, which includes a screen for recombination and separation of phylogeny parameters between recombination breakpoints, be applied routinely when phylogenetic methods are used to infer positive selection in sequences for which recombination is possible.
| Acknowledgments |
|---|
The authors thank Rasmus Nielsen for making his Codonrecsim program available to them, Fourie Joubert and David Posada for use of the Linux clusters at the University of Pretoria, South Africa and the University of Vigo, Spain, and Sergei Kosakovsky Pond for help with the HyPhy package and offering to incorporate the proposed methods into future distributions of HyPhy. This study was supported by the South African National Bioinformatics Network and by the National Institute of Allergy and Infectious Disease and the National Institutes of Health through the Centre for the AIDS Programme of Research in South Africa (grant no. 1U19AI51794). Funding to pay the Open Access publication charges was provided by the South African National Bioinformatics Network.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Keith A Crandall
Received on June 26, 2006; revised on July 31, 2006; accepted on August 1, 2006
| REFERENCES |
|---|
Anisimova, M., et al. (2003) Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics, 164, 1229-1236.
Barton, N.H. and Etheridge, A.M. (2004) The effect of selection on genealogies. Genetics, 166, 1115-1131.
Goldman, N. and Yang, Z. (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol., 11, 725-736.
Hasegawa, M., et al. (1985) Dating the human-ape split by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160-174.
Hudson, R. (1983) Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol., 23, 183-201.
Kosakovsky Pond, S.L. and Muse, S.V. (2005) Site-to-site variation of synonymous substitution rates. Mol. Biol. Evol., 22, 2375-2385.
Kosakovsky Pond, S.L., et al. (2005) HyPhy: hypothesis testing using phylogenies. Bioinformatics, 21, 676-679.
Kosakovsky Pond, S.L., et al. (2006) Automated phylogenetic detection of recombination using a genetic algorithm. Mol. Biol. Evol., msl051.
Martin, D.P. and Rybicki, E. (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics, 16, 562-563.
Martin, D.P., et al. (2005) RDP2: recombination detection and analysis from sequence alignments. Bioinformatics, 21, 260-262.
Maynard Smith, J. (1992) Analysing the mosaic structure of genes. J. Mol. Evol., 34, 126-129.
Muse, S. and Gaut, B. (1994) A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol., 11, 715-724.
Nielsen, R. and Yang, Z. (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics, 148, 929-936.
Padidam, M., et al. (1999) Possible emergence of new geminiviruses by frequent recombination. Virology, 265, 218-225.
Poke, F., et al. (2006) The impact of intragenic recombination on phylogenetic reconstruction at the sectional level in Eucalyptus when using a single copy nuclear gene (cinnamoyl CoA reductase). Mol. Phylogenet. Evol., 39, 160-170.
Shriner, D., et al. (2003) Potential impact of recombination on sitewise approaches for detecting positive natural selection. Genet. Res., 81, 115-121.
Swofford, D. (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
Wong, W.S.W., et al. (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics, 168, 1041-1051.
Yang, Z., et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics, 155, 431-449.