Bioinformatics Advance Access originally published online on September 6, 2005
Bioinformatics 2005 21(21):3943-3950; doi:10.1093/bioinformatics/bti654

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oxfordjournals.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions{at}oxfordjournals.org

Computational methods for the design of effective therapies against drug resistant HIV strains

Niko Beerenwinkel 1,*, Tobias Sing 2, Thomas Lengauer 2, Jörg Rahnenführer 2, Kirsten Roomp 2, Igor Savenkov 2, Roman Fischer 3, Daniel Hoffmann 3, Joachim Selbig 4, Klaus Korn 5, Hauke Walter 5, Thomas Berg 6, Patrick Braun 7, Gerd Fätkenheuer 8, Mark Oette 10, Jürgen Rockstroh 11, Bernd Kupfer 12, Rolf Kaiser 9 and Martin Däumer 9

1 Department of Mathematics, University of California, Berkeley, CA, USA
2 Max Planck Institute for Informatics, Saarbrücken, Germany
3 Center of Advanced European Studies and Research, Bonn, Germany
4 Max Planck Institute of Molecular Plant Physiology and University of Potsdam, Germany
5 Institute of Clinical and Molecular Virology, University of Erlangen-Nürnberg, Erlangen, Germany
6 Medical Laboratory Berlin, Germany
7 PZB Aachen, Germany
8 Department of Internal Medicine, University of Cologne, Germany
9 Institute of Virology, University of Cologne, Germany
10 Department of Gastroenterology, University of Düsseldorf, Germany
11 Department of Internal Medicine, University of Bonn, Germany
12 Institute of Medical Microbiology and Immunology, University of Bonn, Germany

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 DATA MANAGEMENT
 3 FROM GENOTYPE TO...
 4 EVOLUTIONARY PATHWAYS
 5 THERAPY OPTIMIZATION
 6 CONCLUSIONS
 REFERENCES
 

Summary: The development of drug resistance is a major obstacle to successful treatment of HIV infection. The extraordinary replication dynamics of HIV facilitates its escape from selective pressure exerted by the human immune system and by combination drug therapy. We have developed several computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genomic data.

Contact: niko{at}math.berkeley.edu


    1 INTRODUCTION
 
Persons infected with human immunodeficiency virus type 1 (HIV-1) are at high risk of developing the acquired immunodeficiency syndrome (AIDS), a major global threat to human health. HIV-1 is a retrovirus with a 9.2 kb genome coding for 15 viral proteins. Currently, 19 drugs targeting three distinct steps in the viral replication cycle are available for antiretroviral therapy. These drugs can be grouped into four different classes, according to their target and mechanism of action. Nucleoside and nucleotide analogs act as chain terminators in reverse transcription of RNA to DNA. Non-nucleoside reverse transcriptase inhibitors bind to and inhibit reverse transcriptase (RT), a viral enzyme that catalyzes reverse transcription. Protease inhibitors target the HIV protease, which is involved in maturation of released viral particles by cleaving precursor proteins. Finally, entry inhibitors block the penetration of HIV virions into their target cells.

Cell entry is a complex process mediated by sequential interactions of the viral proteins gp120 (envelope) and gp41 (transmembrane) with the cellular CD4 receptor and a co-receptor, usually CCR5 or CXCR4, depending on the individual virion. Consequently, different types of entry inhibitors have been proposed: fusion inhibitors prevent merging of viral and host cell membranes by binding to the transmembrane protein gp41. In contrast, co-receptor antagonists bind to the host protein prior to membrane fusion.

The available antiretroviral agents are applied in combination therapies—so-called highly active antiretroviral therapy (HAART), typically comprising two nucleoside analogs and either a protease inhibitor or a non-nucleoside RT inhibitor. However, therapeutic success, even of HAART, is limited. Antiretroviral therapy is not able to eradicate HIV, and durable suppression of virus replication below detectable limits is achieved in only a fraction of patients. Drug resistance can be the cause of treatment failure and is almost always a consequence of it (Clavel and Hance, 2004; DeGruttola et al., 2000).

1.1 Drug resistance
The intrapatient virus population is a highly dynamic system, characterized by high virus production and turnover rates and a high mutation rate. These evolutionary dynamics are the basis for a large and diversified virus population that harbors or quickly generates resistance mutations. In a replicating population, escape mutants with a selective advantage under therapy become dominant and lead to increased virus production and eventually to therapy failure. A number of mutations in protease, RT and gp41 have been associated with resistance to different antiviral agents (Shafer et al., 2000). Each drug has its own characteristic resistance profile reflecting its chemical properties and mechanism of action. Nevertheless, cross-resistance (i.e. resistance against an unused drug) is common between drugs from the same class. Therefore, HAART advocates the use of two different drug classes in order to reduce the likelihood that a mutant resists all drugs in the combination and to suppress viral replication more effectively (Jordan et al., 2002).

After treatment failure, the shifted population may be hit with a new drug combination, but finding such a potent regimen is challenging. Cross-resistance severely limits the remaining treatment options and the success of subsequent regimens is further impaired. The interplay between development of drug resistance and insufficient suppression of virus replication can eventually lead to situations in which the currently available drugs can no longer control replication at all. In the United States, as many as 50% of patients receiving HAART carry a virus that is resistant to at least one of the approved drugs (Richman et al., 2004). Furthermore, transmission of drug-resistant viruses is estimated to occur in ~15% of persons newly diagnosed with HIV infection in the United States (Bennett et al., 2005).

Since cross-resistance is frequent, treatment changes cannot be based on the assumption that the virus will remain susceptible to the unused drugs. Therefore, resistance testing has become an important diagnostic tool in the management of HIV infections (Perrin and Telenti, 1998). Resistance testing can be performed either by measuring viral activity in the presence and absence of a drug (phenotypic resistance testing), or by sequencing the viral genes coding for the drug targets (genotypic resistance testing). Genotypic assays are much faster and cheaper, but sequence data provide only indirect evidence of resistance.

The Arevir project is a collaborative effort between clinicians, virologists and computational biologists to exploit genotype data from genotypic resistance tests for the individual selection of optimal drug combinations. We have developed several computational methods for the analysis of integrated genotypic, phenotypic and clinical data. Our goal was to provide tools for supporting personalized genotype-driven treatment decisions.

1.2 Challenges
We have addressed the following four challenges; our approaches to each are described in the subsequent sections.

  1. Data integration. A prerequisite for any attempt to use genotypic data in a clinical setting is to provide this information at the right time and place. Resistance testing is often performed in specialized virological laboratories separated from the clinical department. Furthermore, most clinical data management systems are not prepared to handle sequence data. Thus, our first task is to collect, organize, and integrate all relevant patient data.
  2. Phenotype prediction from genotypes. The first step in interpreting genotypic data is to understand the effect of single mutations and to relate mutational patterns to the in vitro phenotype. We have addressed predicting phenotypic drug resistance from the viral drug targets as well as prediction of co-receptor usage from gp120. Both models can augment the cheaper and faster genotypic test with a prediction of the phenotype, namely the susceptibility to each of the drugs and the co-receptor in use. This piece of information is important for the choice of therapy.
  3. Evolution of drug resistance. Understanding the mutational pathways that lead to resistant strains is important for two reasons. First, this knowledge allows for estimating the distance of a virus population to escape from drug pressure, a quantity referred to as the genetic barrier. Second, the prediction of mutational pathways makes it possible to design sequences of therapies rather than one regimen at a time. We have addressed the problem of estimating evolutionary pathways from sequence data.
  4. Therapy optimization. Our ultimate goal is to determine optimal drug combinations on the basis of genotypic information. For this task, we need to estimate the in vivo effect of a drug combination on a given viral genotype and to identify the regimen that maximizes clinical response. In addressing these problems we make use of both the in vitro phenotype predictions and the estimated evolutionary pathways.
For each of the four challenges we present computational approaches and indicate the biological or clinical impact. We show how the developed tools can be linked together in order to support the selection of effective therapies against drug-resistant HIV strains.


    2 DATA MANAGEMENT
 
In order to meet the data integration and management challenge we have developed the Arevir database, a secure electronic platform for collaborative research aimed at optimizing anti-HIV therapies. This system is designed to facilitate data exchange, improve diagnostics, support medical decisions and provide the basis for data analysis.

2.1 Database schema
In managing HIV-infected patients a number of different types of data arise, including personal patient data, therapy histories, numerous virologic, immunologic and other clinical test results derived from patient samples from different tissues, and sequence data, e.g. from genotypic resistance tests. Our database schema captures these data types in different modules, consisting of a few tables each (Beerenwinkel, 2004).

There is an important relationship between sequences and therapies via the drug targets. The compounds making up a combination therapy target specific viral proteins. DNA segments coding for these proteins are sequenced in order to gain information on the level of resistance that has been developed by the virus. Thus, given the values of clinical markers the data model allows for asking for outcomes of therapy types versus mutational patterns within the drug targets. This is the central question of the Arevir project. It will be revisited in a later section.
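
The link between therapies and sequences can be sketched as a join between two tables. The following is a minimal illustration only: the table and column names, the single toy record, and the use of Python's built-in sqlite3 module (rather than MySQL) are all assumptions made for this sketch and do not reproduce the actual Arevir schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the Arevir data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient  (patient_id INTEGER PRIMARY KEY);
CREATE TABLE therapy  (therapy_id INTEGER PRIMARY KEY,
                       patient_id INTEGER REFERENCES patient(patient_id),
                       start_date TEXT, drugs TEXT);
CREATE TABLE sequence (sequence_id INTEGER PRIMARY KEY,
                       patient_id INTEGER REFERENCES patient(patient_id),
                       sample_date TEXT, target TEXT, mutations TEXT);
""")

# One made-up patient with one therapy and one genotype (toy values).
conn.execute("INSERT INTO patient VALUES (1)")
conn.execute("INSERT INTO therapy VALUES (1, 1, '2004-03-01', 'ZDV+3TC+EFV')")
conn.execute("INSERT INTO sequence VALUES (1, 1, '2004-02-20', 'RT', 'M184V')")

# Pair each therapy with genotypes sampled on or before its start date.
# ISO-formatted dates compare correctly as text.
rows = conn.execute("""
SELECT t.drugs, s.target, s.mutations
FROM therapy AS t
JOIN sequence AS s ON s.patient_id = t.patient_id
WHERE s.sample_date <= t.start_date
""").fetchall()
print(rows)
```

In a fuller model, a further join against laboratory results taken after the therapy start would supply the clinical outcome for each therapy and genotype pair.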

2.2 Implementation
The data model has been implemented in the open source relational database management system MySQL. A secured client/server architecture allows for remote access to the centralized database. Since sensitive patient data are involved, this setting needs to meet the security demands imposed by state and national law. In addition, we have developed a web interface to the database for clinicians and virologists. For these users the appropriate view of the data is through a single patient or a single patient sample. Thus, treating physicians as well as lab personnel get access to an integrated view of all relevant data for one patient. For example, they can evaluate a genotypic resistance test result in the context of the patient's medical history and current immunologic status. Moreover, applying the developed computational tools yields phenotypic interpretations of the genotypes. As of 2005, the Arevir database comprises 5720 patients, 9685 therapies, 5065 DNA sequences, and 146 539 laboratory test results from seven different institutions including three clinical centers and two virological laboratories.

2.3 Public databases
In addition to our pooled cohort data, public datasets can also provide valuable information on sequence variation and response to therapy. A major resource for clinical trials data is the AIDS Clinical Trials Group (http://aactg.org). The Los Alamos National Laboratories maintain databases of annotated HIV sequence data, drug-resistance mutations, HIV epitopes and vaccine trials results (http://www.hiv.lanl.gov). The Stanford HIV Drug Resistance Database contains sequences coding for the drug targets of antiretroviral therapy, drug susceptibility data and therapy histories where publicly available (http://hivdb.stanford.edu).


    3 FROM GENOTYPE TO PHENOTYPE
 
Genotype–phenotype relations are much easier to study if the phenotype is determined by a well-defined lab experiment than for in vivo phenotypes that depend on many factors, which can confound the analysis. Therefore, predicting in vitro phenotypes from HIV genotypes is a good starting point for sequence interpretation.

3.1 Drug susceptibility
Prediction of phenotypic drug resistance from genotypes is based on matched genotype–phenotype pairs derived from patients failing antiretroviral therapy. For each drug, phenotypic resistance is determined in a recombinant virus assay (Kellam and Larder, 1994; Walter et al., 1999). In this experiment the replication capacity of the virus is measured as a function of drug concentration. The drug–response relationship is summarized by the resistance factor (or the fold-change in susceptibility), defined as the ratio between the amount of drug necessary to inhibit replication of the virus by 50% and the corresponding value for a standardized wild-type virus. Coefficients of variation between 10 and 60% have been reported for the resistance factor (Walter et al., 1999). By contrast, determination of genotypes by cycle-sequencing is highly reproducible, although the common population sequencing strategy detects only those variants that are present in at least 20% of viruses in the population. For drug-resistance testing, the full protease (99 amino acids), the 5' part of the RT (typically the first 250–300 residues), and possibly parts of gp41 and gp120 are sequenced. Predicting the resistance phenotype from the genotype means solving, for each drug, a regression problem in which the predictors are the sequence positions of the drug target and the response is the resistance factor. Alternatively, we may consider the related binary classification problem induced by choosing a drug-specific cutoff to define a susceptible and a resistant class of viruses.
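
The definition of the resistance factor and the induced classification problem can be written down in a few lines. This is a minimal sketch: the function names, the example concentrations and the cutoff value are hypothetical and chosen only for illustration.

```python
def resistance_factor(ic50_sample, ic50_wild_type):
    """Fold-change in susceptibility: 50% inhibitory concentration of the
    sample virus divided by that of the standardized wild-type virus."""
    if ic50_sample <= 0 or ic50_wild_type <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_sample / ic50_wild_type


def classify(rf, cutoff):
    """Binary label induced by a drug-specific cutoff on the resistance factor."""
    return "resistant" if rf >= cutoff else "susceptible"


# Made-up concentrations in arbitrary but identical units.
rf = resistance_factor(ic50_sample=4.0, ic50_wild_type=0.25)
label = classify(rf, cutoff=8.5)  # the cutoff 8.5 is a hypothetical value
print(rf, label)
```

The regression variant keeps `rf` (often on a logarithmic scale) as the response, whereas the classification variant keeps only `label`.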

A number of machine learning approaches to resistance phenotype prediction from genotypes have been proposed including neural networks (Draghici and Potter, 2003; Wang and Larder, 2003), recursive partitioning (Sevin et al., 2000), linear stepwise regression (Wang et al., 2004), and more elaborate statistical models (Foulkes and DeGruttola, 2002; DiRienzo et al., 2003). We discuss in more detail support vector machine (SVM) regression (Beerenwinkel, 2001) and decision tree classification (Beerenwinkel, 2002), which serve as the engine for a widely used web-based prediction tool (cf. Section 5.2).

For SVM regression, sequences are mapped into a Euclidean vector space by introducing 20 indicator variables for each amino acid position of the multiple sequence alignment. The SVM learning strategy is suitable for this type of high-dimensional noisy data. Table 1 summarizes the performance of the regression models on a set of 650 genotype–phenotype pairs (Beerenwinkel et al., 2002).
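
The indicator encoding can be illustrated as follows. This is a sketch under stated assumptions: the ordering of the amino acid alphabet and the all-zero treatment of gaps or ambiguous symbols are choices made here, not details taken from the cited work.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode(aligned_sequence):
    """Map an aligned amino acid sequence to 20 indicator variables per position."""
    vector = []
    for residue in aligned_sequence:
        indicators = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # assumption: gaps and ambiguous symbols stay all-zero
            indicators[index] = 1
        vector.extend(indicators)
    return vector


x = encode("PQIT")  # a four-residue toy fragment
print(len(x), sum(x))
```

A full protease sequence of 99 residues would yield a vector of 99 × 20 = 1980 binary features, which is the kind of high-dimensional input the SVM is trained on.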


Table 1 SVM regression models

 
SVMs are among the best performing machine-learning methods in terms of prediction accuracy. However, other methods are advantageous if interpretation of the learned model is intended. We have applied decision trees to the classification problem described above in order to elucidate the effect of mutational patterns on the resistance phenotype. This analysis has revealed concise models incorporating only 4–7 sequence positions as compared with some 10–20 positions that are associated with resistance (Johnson et al., 2004). Moreover, decision trees can model the effect of a mutation in the context of other mutations. In particular, some decision trees display resensitization or hypersusceptibility effects. For example, zidovudine resistance induced by mutation T215Y in the RT may be reverted by mutations L74V/I and M184I/V. The latter substitution can also resensitize tenofovir resistant strains (Wolf et al., 2003). Likewise, mutation N88S in the protease gene has been found to increase susceptibility to amprenavir.

3.2 Co-receptor usage
The effective use of co-receptor antagonists that target a particular co-receptor depends on the ability to determine, prior to drug application, the type of co-receptor used by the virus for cell entry. In fact, careful monitoring of viral co-receptor usage is mandatory during such treatment, because few mutations in the envelope protein gp120 of HIV are sufficient for switching to another co-receptor. In addition, a switch from CCR5 to CXCR4 has been associated with accelerated progression towards AIDS. Since experimental determination of co-receptor usage is costly, the availability of sequence-based methods would be advantageous for routine clinical practice with upcoming CCR5 antagonists.

We have analyzed this genotype–phenotype relation in 1100 sequences of the third hypervariable (V3) region of gp120 for which co-receptor usage had been determined experimentally. To accommodate the extraordinary genetic variability within this region, sequences were aligned to a fixed reference multiple alignment containing representatives of all HIV-1 subtypes. We compared decision trees, SVMs (Pillai et al., 2003), neural networks (Resch et al., 2001), position-specific scoring matrices (Jensen et al., 2003) and a classical rule based on charge of amino acids at positions 11 and 25 in the V3 loop (Fouchier et al., 1995). Using ROCR (Sing et al., 2005b), a comprehensive tool for evaluating classifier performance, we found SVMs to outperform the other methods. In an effort to attain this current gold standard in performance with a model that lends itself more readily to interpretation, we have suggested mixtures of localized rules (Sing et al., 2004), a novel weighted voting strategy for rule-based classifiers. Rules, describing specific mutational patterns, are localized in the sense that their associated weights are modulated in an instance-dependent manner based on the genetic background in which the pattern occurs. This method significantly outperformed classical decision tree building, thus representing an alternative for knowledge extraction.
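
For comparison with the learned models, the classical charge rule is simple enough to state in code. The sketch below assumes the common formulation in which a positively charged residue (arginine or lysine) at position 11 or 25 of the V3 loop predicts CXCR4 usage; the function name is hypothetical and the two 35-residue strings are illustrative inputs, not data from the study.

```python
POSITIVE = {"R", "K"}  # positively charged residues considered by the rule


def predict_coreceptor_11_25(v3):
    """Classical 11/25 rule on a V3 sequence numbered from 1 without gaps."""
    if len(v3) < 25:
        raise ValueError("expected at least 25 V3 positions")
    pos11, pos25 = v3[10], v3[24]  # 1-based positions 11 and 25
    return "CXCR4" if (pos11 in POSITIVE or pos25 in POSITIVE) else "CCR5"


example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"   # S at 11, E at 25
variant = example[:10] + "R" + example[11:]       # same string with R at 11

print(predict_coreceptor_11_25(example), predict_coreceptor_11_25(variant))
```

The rule inspects only two positions, which makes it easy to interpret but blind to the genetic background that the localized rules described above take into account.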


    4 EVOLUTIONARY PATHWAYS
 
Under suboptimal therapy the virus population continuously replicates and acquires new resistance mutations. This process occurs in a non-uniform, stochastic fashion and gives rise to co-existing evolutionary pathways. Understanding this evolutionary process is important for estimating the proximity of a virus to escape from drug pressure. We use mutagenetic trees, a family of probabilistic graphical models, to estimate rate and order of occurrence of resistance-associated mutations in the viral drug targets.

4.1 Mutagenetic trees
We consider a set of n specific amino acid changes (mutations) that develop under drug treatment. A mutagenetic tree for these n mutations is a connected branching on {0,...,n} rooted at 0 (Fig. 1). Each vertex $v \neq 0$ represents the binary random variable $X_v$ that indicates the occurrence of mutation v. We associate probability parameters $\theta_v$ with the tree edges to obtain a directed acyclic graphical model with conditional probability matrices

\[
\theta^{v} = \Bigl( \Pr\bigl(X_v = b \mid X_{pa(v)} = a\bigr) \Bigr)_{a,b \in \{0,1\}}
= \begin{pmatrix} 1 & 0 \\ 1 - \theta_v & \theta_v \end{pmatrix},
\]

where pa(v) denotes the parent of v in the tree. The first row of this matrix imposes the constraint that a mutation can occur only if all of its ancestor mutations have already occurred. A mutagenetic tree defines a probability distribution on the set of all possible mutational patterns. In particular, this model family includes linear path models (chains) and the model of complete independence given by the star topology. It is possible to characterize the complete family of mutagenetic tree models by their algebraic invariants, which turn out to have a simple combinatorial structure (Beerenwinkel et al., 2005). Mutagenetic trees can be reconstructed from observed cross-sectional data by Edmonds' maximum weight branching algorithm involving only pair-wise probabilities (Desper et al., 1999).
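
The distribution defined by a mutagenetic tree can be evaluated directly from the tree structure. The following sketch assumes the usual reading of the model: a mutation whose parent is present occurs with its edge probability, and a mutation whose parent is absent cannot occur. The three-mutation tree and its edge probabilities are hypothetical.

```python
from itertools import combinations


def pattern_probability(parent, theta, pattern):
    """Probability of a mutational pattern (set of present mutations).

    parent[v] is the parent of mutation v, with 0 denoting the root;
    theta[v] is the probability of v given that its parent is present.
    """
    prob = 1.0
    for v in parent:
        parent_present = parent[v] == 0 or parent[v] in pattern
        if v in pattern:
            prob *= theta[v] if parent_present else 0.0
        elif parent_present:
            prob *= 1.0 - theta[v]
        # parent absent and v absent contributes a factor of 1
    return prob


# Hypothetical tree on mutations 1, 2, 3 with edges 0 -> 1 -> 2 and 0 -> 3.
parent = {1: 0, 2: 1, 3: 0}
theta = {1: 0.5, 2: 0.4, 3: 0.2}

p_12 = pattern_probability(parent, theta, {1, 2})  # 0.5 * 0.4 * (1 - 0.2)
p_2 = pattern_probability(parent, theta, {2})      # incompatible with the tree

all_patterns = [set(c) for k in range(len(parent) + 1)
                for c in combinations(parent, k)]
total = sum(pattern_probability(parent, theta, s) for s in all_patterns)
print(p_12, p_2, total)
```

Patterns that violate the tree order receive probability zero in a single tree, which is one motivation for adding a star component in the mixture models described in this section.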



Fig. 1 Mutagenetic tree for the development of zidovudine resistance. Vertices denote amino acid changes from the wild-type, edges are labeled with conditional probabilities (A) and expected waiting times in days (B), respectively.

 
We have extended the single tree model to mixture models of mutagenetic trees that combine several weighted trees (Beerenwinkel et al., 2005c). The first tree component is a star with uniform probabilities that models the spontaneous and independent occurrence of mutations. All other components represent dependencies between mutations and are estimated from the data. The mixture model is learned by an Expectation–Maximization Algorithm that iteratively estimates the expected values of the missing data (i.e. the association of samples to the trees) and the structure and parameters of the trees. For model selection (choosing the number of tree components) we either use cross-validation or a modified Bayesian Information Criterion that includes an estimate of the structural redundancy between tree components (J. Yin, N. Beerenwinkel, J. Rahnenführer and T. Lengauer submitted for publication). Mtreemix, a software package for statistical inference with mutagenetic trees and mixtures of these, is described in Beerenwinkel et al. (2005a).

Assuming independent Poisson processes for the occurrence of mutations and for the observed sampling times (i.e. the time on therapy) with rates $\lambda_v$ and $\lambda_S$, respectively, we find

\[
\theta_v = \Pr\bigl(X_v = 1 \mid X_{pa(v)} = 1\bigr) = \frac{\lambda_v}{\lambda_v + \lambda_S}.
\]

This relation allows for translating the estimated conditional probabilities between mutations into the expected waiting time for the mutation to occur. Furthermore, the probabilities of occurrence of any mutational pattern can be computed for any fixed mean waiting time. Hence, using these timed mutagenetic trees we can compare models that have initially been estimated from datasets sampled after different mean waiting times.
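
The conversion between conditional probabilities and waiting times can be made concrete. The sketch assumes the relation that follows from competing exponential waiting times, namely that the conditional probability of a mutation equals its rate divided by the sum of its rate and the sampling rate; the function names and numbers are illustrative.

```python
def expected_waiting_time(theta, mean_sampling_time):
    """Expected waiting time 1/lambda_v, given theta = lambda_v / (lambda_v + lambda_S)
    and mean sampling time 1/lambda_S (both times in the same unit)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return mean_sampling_time * (1.0 - theta) / theta


def rescale_theta(theta, old_mean_sampling_time, new_mean_sampling_time):
    """Conditional probability of the same mutation under a different mean sampling time."""
    lam_v = theta / ((1.0 - theta) * old_mean_sampling_time)
    lam_s = 1.0 / new_mean_sampling_time
    return lam_v / (lam_v + lam_s)


wait = expected_waiting_time(theta=0.5, mean_sampling_time=100.0)
theta_300 = rescale_theta(0.5, old_mean_sampling_time=100.0,
                          new_mean_sampling_time=300.0)
print(wait, theta_300)
```

Rescaling every edge of a tree in this way puts models estimated from datasets with different mean times on therapy on a common time scale.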

Figure 1 shows a mutagenetic tree and the corresponding timed mutagenetic tree for the development of drug resistance in the HIV RT under therapy with zidovudine, the first anti-HIV drug approved. The tree has been estimated from 364 genotypes derived from previously untreated patients under zidovudine mono-therapy (Beerenwinkel et al., 2005b). This dataset is publicly available at the Stanford HIV Drug Resistance Database (Rhee et al., 2003). The model displays two characteristic pathways, namely the 70-219 and the 215-41 pathways (cf. Boucher et al., 1992).

4.2 Genetic barrier
Suppose we have estimated a mutagenetic tree model for the development of resistance to a certain drug. In particular, this model can be used to compute transition probabilities between mutational patterns. As described in the previous section we can predict the resistance phenotype from the genotype. Using a classifier restricted to the set of n mutations we predict each mutational pattern to be either susceptible or resistant. Now, for a given virus we may ask what the transition probability to any resistant state is. In fact, this question is crucial for minimizing the risk of resistance development with the next regimen. We define the genetic barrier as the probability of not reaching any resistant state after a fixed time period under therapy. This quantity can be calculated as the sum of the probabilities of all mutational patterns predicted as susceptible. Thus, a higher genetic barrier indicates that the virus is less likely to become resistant.
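
For a wild-type virus, the genetic barrier can be sketched as a sum over all mutational patterns. The example below is an illustration only: the tree, its edge probabilities (assumed to be already scaled to the time period of interest) and the toy phenotype classifier are hypothetical, and the case of a virus that already carries mutations is not covered.

```python
from itertools import combinations


def pattern_probability(parent, theta, pattern):
    """Probability of a mutational pattern under a single mutagenetic tree."""
    prob = 1.0
    for v in parent:
        parent_present = parent[v] == 0 or parent[v] in pattern
        if v in pattern:
            prob *= theta[v] if parent_present else 0.0
        elif parent_present:
            prob *= 1.0 - theta[v]
    return prob


def genetic_barrier(parent, theta, is_resistant):
    """Sum of the probabilities of all patterns predicted as susceptible."""
    barrier = 0.0
    for k in range(len(parent) + 1):
        for combo in combinations(parent, k):
            pattern = set(combo)
            if not is_resistant(pattern):
                barrier += pattern_probability(parent, theta, pattern)
    return barrier


def toy_classifier(pattern):
    """Hypothetical phenotype predictor: resistant if mutation 2 is present."""
    return 2 in pattern


# Hypothetical tree 0 -> 1 -> 2 and 0 -> 3 with edge probabilities at a fixed time.
parent = {1: 0, 2: 1, 3: 0}
theta = {1: 0.5, 2: 0.4, 3: 0.2}

barrier = genetic_barrier(parent, theta, toy_classifier)
print(barrier)
```

Enumerating all 2^n patterns is feasible only for the small mutation sets considered per drug; in practice the classifier would be one of the learned phenotype predictors rather than a single-mutation rule.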

For example, Table 2 shows the genetic barriers to both low level and high level zidovudine resistance of the wild type virus under three different regimens, namely zidovudine monotherapy, double therapy with zidovudine plus lamivudine, and double therapy with zidovudine plus didanosine. The underlying mutagenetic tree model is the tree displayed in Figure 1 scaled to a mean sampling time of 96 weeks. As expected, the genetic barrier to zidovudine is always higher under the combination of zidovudine plus lamivudine than under zidovudine alone, because these drugs do not share any resistance mutations. More surprisingly, we find that zidovudine resistance appears to develop faster under zidovudine plus didanosine than under zidovudine monotherapy. This effect may be explained by the stronger selective pressure exerted by the double therapy and the cross-resistance profile of zidovudine and didanosine (Beerenwinkel et al., 2005b; Brun-Vezinet et al., 1997). Thus, the genetic barrier is a useful concept for designing effective treatment strategies.


Table 2 Genetic barriers of the wild-type virus to resistance to zidovudine (ZDV) under the three regimens zidovudine monotherapy, zidovudine + lamivudine (3TC) double therapy, and zidovudine + didanosine (ddI) double therapy

 

    5 THERAPY OPTIMIZATION
 
The computational task of identifying optimal antiretroviral drug combinations with respect to a given viral genotype is a typical bioinformatics problem (such as sequence alignment) in the sense that the objective function of the optimization problem is not known. In fact, we need to know the in vivo effect of any drug combination on any mutational pattern in order to find the best regimen. Typical clinical parameters of interest are the virus load (the amount of plasma HIV RNA) and the number of CD4+ cells (T-lymphocytes). Estimating these response functions is much more challenging than actually selecting the optimal drug therapy. Indeed, the number of drug combinations is only on the order of thousands, and hence they can be enumerated. By contrast, HIV's high genetic diversity induces a much higher number of mutational patterns over all drug targets. Furthermore, clinical response is influenced by several factors other than resistance, including patient adherence, immunological status and baseline virus load.
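
The statement that drug combinations can be enumerated is easy to check with a short calculation. The sketch assumes, purely for illustration, that a regimen consists of three or four of the 19 available drugs and that no further clinical constraints (such as class restrictions or incompatible drug pairs) apply.

```python
from math import comb

n_drugs = 19
regimen_sizes = (3, 4)  # assumption: only three- and four-drug regimens are counted

counts = {k: comb(n_drugs, k) for k in regimen_sizes}
n_regimens = sum(counts.values())
print(counts, n_regimens)
```

Under these assumptions there are a few thousand candidate regimens, so exhaustive scoring of every regimen against a given genotype is computationally trivial once a response function is available.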

One way to estimate the activity of a therapeutic regimen against a viral strain is to learn this effect from an observational clinical database such as the Arevir database. This is straightforward, if we fix a combination therapy or a narrowly defined type of therapy. In this case, machine-learning approaches similar to those presented in a previous section can be used to predict clinical response. However, if the drug combination is not fixed, direct learning from cohort data is limited by the amount of data necessary to derive useful models, because now the complexity of the problem depends on both mutational patterns and drug combinations (DiRienzo and DeGruttola, 2002). Furthermore, the distribution of drug combinations in clinical databases is heavily skewed, reflecting approval times and treatment strategies over time (Beerenwinkel, 2004). Thus, training on such datasets is likely to result in models that capture the features of only a few frequently observed combinations, but are not appropriate to explore the product space of all mutational patterns and drug combinations.

5.1 Scoring functions
An alternative approach to general response prediction is to score drug combinations on the basis of single drug effects. This implies assuming a functional dependency of the effect of a drug combination on the single drug effects. The simplest way of doing this is to use a classifier for resistance phenotype prediction and to count the number of active drugs in a combination, i.e. the number of drugs for which the virus is predicted susceptible. For example, De Luca et al. (2003) have separated 332 patients according to viral genotype and therapy. SVM based phenotype predictions were used to define one group of patients with two or fewer drugs predicted as active and another group with three or more active drugs. Using a Cox proportional hazards model they show that patients in the group with at most two active drugs are at significantly higher risk of virological failure (Fig. 2). As compared with 11 other interpretation systems that are based on expert rules, only this data-driven approach yielded significant predictions of virological response.
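
The simplest scoring scheme, counting the drugs predicted as active, can be sketched as follows. The drug abbreviations are standard, but the predicted classes are made-up inputs standing in for the output of a phenotype classifier, and the function names are hypothetical.

```python
def count_active_drugs(regimen, predicted_class):
    """Number of drugs in the regimen for which the virus is predicted susceptible."""
    return sum(1 for drug in regimen if predicted_class[drug] == "susceptible")


def risk_group(regimen, predicted_class, threshold=3):
    """Split used in the text: fewer than three active drugs versus three or more."""
    n_active = count_active_drugs(regimen, predicted_class)
    return "three or more active" if n_active >= threshold else "two or fewer active"


# Made-up predictions for one viral genotype.
predicted_class = {"ZDV": "resistant", "3TC": "resistant",
                   "ddI": "susceptible", "EFV": "susceptible", "LPV": "susceptible"}

group_a = risk_group(["ZDV", "3TC", "EFV"], predicted_class)
group_b = risk_group(["ddI", "EFV", "LPV"], predicted_class)
print(group_a, group_b)
```

The group label can then serve as a covariate in a survival analysis such as the Cox proportional hazards model mentioned above.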



Fig. 2 Risk of virological failure (two consecutive virus load values of >500 cps/ml after 24 weeks of therapy) as a function of the number of weeks on therapy. Two patient groups are distinguished according to whether the number of drugs scored as active is <3 or not. The two groups experience a significantly different risk of virological failure. (Data kindly provided by Andrea De Luca, Catholic University, Rome.)

 
A natural refinement of this scoring scheme is to sum over the real-valued predicted resistance factors instead of the binary resistance predictions. However, the dynamic range of resistance factors varies by as much as two orders of magnitude between different drugs. In order to normalize these values, we estimate their distribution over a large random sample of 2000 genotypes. Since bimodality is a common feature for all drugs, we model this density by a two-component Gaussian mixture model whose parameters can be estimated by the Expectation–Maximization algorithm. This two-state model provides a data-derived definition of susceptible and resistant. By linearizing the log-likelihood ratio between these two classes, we obtain the activity score, which approximates the conditional probability of membership in the susceptible class given the viral genotype (Beerenwinkel et al., 2003a). Thus, the activity score provides a normalized and comparable measure of resistance, and we can extend it to multi-drug therapies by summing over all drugs in the combination.
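The normalization step can be illustrated with a small, self-contained sketch. It fits a two-component Gaussian mixture to synthetic log-scale resistance factors by EM and scores a value by the exact posterior probability of the low-resistance component. This is a stand-in under stated assumptions: the activity score described above is derived from a linearized log-likelihood ratio, which is not reproduced here, and the function names, the initialization, the synthetic data and the drug labels `drugA` and `drugB` are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(xs, n_iter=100):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] ordered by increasing mean,
    so the first component plays the role of the susceptible class.
    """
    xs = sorted(xs)
    half = len(xs) // 2
    # Crude initialization (an assumption): lower and upper half of the sorted data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sd = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each data point.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            total = (p0 + p1) or 1e-300
            resp.append((p0 / total, p1 / total))
        # M-step: re-estimate weight, mean and standard deviation per component.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    order = sorted((0, 1), key=lambda k: mu[k])
    return [(w[k], mu[k], sd[k]) for k in order]


def activity(x, mixture):
    """Posterior probability that x belongs to the susceptible (low-mean) component."""
    susceptible, resistant = mixture
    ps = susceptible[0] * normal_pdf(x, *susceptible[1:])
    pr = resistant[0] * normal_pdf(x, *resistant[1:])
    return ps / (ps + pr)


def regimen_activity(predicted, mixtures):
    """Sum of per-drug activities over all drugs in a combination."""
    return sum(activity(x, mixtures[drug]) for drug, x in predicted.items())


# Synthetic bimodal sample of 2000 log10 resistance factors (illustrative only).
rng = random.Random(0)
sample = [rng.gauss(0.0, 0.3) for _ in range(1200)] + [rng.gauss(2.0, 0.4) for _ in range(800)]
mixture = fit_two_component_mixture(sample)

# For brevity the same fitted mixture is reused for two hypothetical drugs.
mixtures = {"drugA": mixture, "drugB": mixture}
score = regimen_activity({"drugA": 0.1, "drugB": 1.9}, mixtures)
print(mixture, score)
```

Because each per-drug value lies between 0 and 1 regardless of that drug's dynamic range, the summed score can be read roughly as an expected number of active drugs in the combination.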

Similarly, we can use the genetic barrier of the virus to resistance to each of the compounds of the regimen (Fig. 3). Summing these values provides an estimate of how easy it is for the virus to escape from the selective pressure of the combination therapy. As demonstrated in Section 4.2, this genetic barrier score can differ from the genetic barrier of the drug combination. We confine ourselves to this approximation, because estimating the genetic barrier for all drug combinations would again require, for each combination, many samples derived from patients under the respective regimen. Despite these simplifications, both the activity score and the genetic barrier score are predictive of virological response. Figure 4 shows their performance in classifying genotype–therapy pairs on a special and instructive dataset consisting of 64 sequences, each paired with one successful and one failing regimen. The genotype alone does not provide any useful information for classifying these pairs. Similarly, by randomizing the genotype data, we see that the therapy data alone do not give rise to a competitive classifier either. The best performance is clearly obtained on the combined genotype–therapy data. Thus, the learned concept is specific for the combined effect of drug combination and mutational pattern. The genetic barrier score, which makes use of three different types of datasets (Fig. 3), performs best.
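The logic of this paired evaluation and its randomization control can be sketched as follows. The decision rule used here (call the higher-scoring regimen the successful one) is an assumption, since the classifier itself is not specified above, and the genotype identifiers, drug labels, per-drug scores and function names are hypothetical toy values. Randomizing therapies would work analogously by shuffling the regimen pairs instead of the genotypes.

```python
import random

# Hypothetical per-drug scores (e.g. activities) for four genotypes and four drugs.
scores = {
    "g1": {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2},
    "g2": {"A": 0.1, "B": 0.2, "C": 0.9, "D": 0.7},
    "g3": {"A": 0.8, "B": 0.1, "C": 0.2, "D": 0.9},
    "g4": {"A": 0.2, "B": 0.9, "C": 0.8, "D": 0.1},
}

# Each genotype is paired with one successful and one failing regimen.
pairs = [
    ("g1", ("A", "B"), ("C", "D")),
    ("g2", ("C", "D"), ("A", "B")),
    ("g3", ("A", "D"), ("B", "C")),
    ("g4", ("B", "C"), ("A", "D")),
]


def regimen_score(genotype, regimen, per_drug_score):
    """Sum the single-drug scores over all drugs in the regimen."""
    return sum(per_drug_score[genotype][drug] for drug in regimen)


def pairwise_error(pairs, per_drug_score):
    """Fraction of genotypes whose failing regimen scores at least as high
    as the successful one."""
    errors = sum(
        regimen_score(g, ok, per_drug_score) <= regimen_score(g, bad, per_drug_score)
        for g, ok, bad in pairs
    )
    return errors / len(pairs)


def randomize_genotypes(pairs, rng):
    """Control: reassign genotypes to regimen pairs at random, which removes
    the genotype-specific information while keeping the therapy data."""
    genotypes = [g for g, _, _ in pairs]
    rng.shuffle(genotypes)
    return [(g, ok, bad) for g, (_, ok, bad) in zip(genotypes, pairs)]


error_true = pairwise_error(pairs, scores)
error_randomized = pairwise_error(randomize_genotypes(pairs, random.Random(1)), scores)
print(error_true, error_randomized)
```

If the score captured only properties of the therapy, the error on the randomized data would stay close to the error on the true pairing.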



Fig. 3 Data flow. White boxes indicate different types of datasets, shaded boxes symbolize computational models inferred from the data (implemented tools in italics).

 


Fig. 4 Error rates for different scoring functions on a set of 128 genotype–therapy pairs in which each genotype occurs exactly twice, once with a drug combination resulting in a successful therapy (defined as undetectable virus load), and once with another drug combination resulting in therapy failure (defined as virus load >1000 cps/ml). From left to right: activity scores (act), with sequences randomized (act_rs), with therapies randomized (act_rt), genetic barrier scores (bar), with sequences randomized (bar_rs), with therapies randomized (bar_rt).

 
In a related approach we have estimated the proximity of the virus to an escape state more conservatively. Applying a heuristic greedy search, we explore the mutational neighborhood of the viral sequence by successively introducing point mutations and following the in silico mutants that reduce the activity of the regimen most. The estimated ‘worst case’ activities were used in a regression model to predict the expected drop in virus load (Beerenwinkel et al., 2003b).
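A greedy neighborhood search of this kind might look like the sketch below. It is a simplified illustration rather than the published procedure: it follows only the single most damaging mutant at each step, and the sequence, alphabet, penalty table and the function `toy_activity` are hypothetical placeholders for a real genotype and a real regimen activity predictor.

```python
def greedy_worst_case(seq, activity, alphabet, max_steps):
    """Greedily introduce point mutations that lower the regimen activity most.

    Returns the list of (sequence, activity) states visited, starting with the
    input sequence; the last entry is the estimated worst case.
    """
    current, current_act = seq, activity(seq)
    path = [(current, current_act)]
    for _ in range(max_steps):
        best, best_act = None, current_act
        # Enumerate all single point mutants of the current sequence.
        for i, old in enumerate(current):
            for new in alphabet:
                if new == old:
                    continue
                mutant = current[:i] + new + current[i + 1:]
                act = activity(mutant)
                if act < best_act:
                    best, best_act = mutant, act
        if best is None:
            break  # no single mutation reduces the activity any further
        current, current_act = best, best_act
        path.append((current, current_act))
    return path


# Toy activity model (an assumption): each listed substitution lowers the
# summed activity of a three-drug regimen by a fixed amount.
PENALTY = {(1, "V"): 1.0, (3, "Y"): 0.5}


def toy_activity(seq):
    return 3.0 - sum(p for (i, aa), p in PENALTY.items() if seq[i] == aa)


path = greedy_worst_case("KMDT", toy_activity, alphabet="KMDTVY", max_steps=3)
worst_sequence, worst_activity = path[-1]
print(path)
```

The worst-case activity at the end of the path is the kind of quantity that would then enter a regression model as a predictor.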

5.2 Geno2pheno
We have implemented the web server geno2pheno (http://www.genafor.org), which provides interpretations of genotypic test results in terms of phenotype predictions (Beerenwinkel et al., 2003a; Sing et al., 2005a). The system predicts co-receptor usage from submitted HIV-1 V3 loop sequences as well as phenotypic resistance to 17 antiretroviral agents from protease and RT sequences. The output also includes activity scores, which render predictions comparable between drugs. An additional software tool, theo, for selecting and evaluating drug combinations on the basis of the different scoring functions discussed above is currently being validated and tested by virologists and clinicians. Since December 2000, geno2pheno has made 35 000 online resistance predictions and, since June 2004, more than 1000 co-receptor predictions. The system is used worldwide by virologists performing genotypic resistance tests as well as by clinicians seeking effective drug combinations.


    6 CONCLUSIONS
 
In order to support clinical decision making on the basis of viral genomic data, we have developed and applied several computational methods and tools. Specifically, we have addressed data integration and management (Arevir database), prediction of drug resistance and co-receptor usage from genotypes (geno2pheno), modeling of the evolution of drug resistance and the genetic barrier by mutagenetic trees (mtreemix), and selection of optimal drug combinations (theo). The integration of various types of genomic, phenotypic and clinical data as well as the coupling of different computational models yields predictive models of therapy outcome that may support the design of combination therapies.

6.1 Future work
Further factors of therapeutic outcome, involving pharmacological, viral and host factors, need to be accounted for in future work. For example, pharmacokinetic properties of drugs and their specific realization in different patients (pharmacogenomics) are important predictors. The amount of drug actually present in infected cells may yield more accurate predictions of the development of resistance. Besides resistance, replication capacity (fitness) is another viral property currently under investigation. It depends on phenotypic properties of many viral proteins, such as protease cleavage rate or RT error rate. A fitness estimate based on integrating these predictions into a model of the viral replication cycle may be expected to lead to improved predictions. Finally, there is strong evidence that viral evolution is, in part, also host-dependent. In particular, we have started to study the impact of the host HLA genotype on the development of viral escape mutations (Roomp et al., 2005).


    Acknowledgments
 
The Arevir project, including Open Access publication charges for this article, has been funded by Deutsche Forschungsgemeinschaft (DFG) under Grant No. HO 1582/1-3 and KA 1569/1-3. N.B. acknowledges funding from DFG under Grant No. BE 3217/1-1. J.R. has been funded by BMBF under Grant No. 01GR0453. The work at the Max-Planck Institute of Computer Science has been performed partly in the context of the BioSapiens Network of Excellence (EU Grant No. LSHG-CT-2003-503265).

Conflict of Interest: none declared.

Received on June 8, 2005; revised on July 27, 2005; accepted on August 30, 2005

    REFERENCES
 

    Beerenwinkel, N. (2004) Computational Analysis of HIV Drug Resistance Data. Shaker, Aachen, Germany.

    Beerenwinkel, N. and Drton, M. (2005) Mutagenetic tree models. In Pachter, L. and Sturmfels, B. (eds), Algebraic Statistics for Computational Biology. Oxford University Press, Oxford, UK, pp. 278–290.

    Beerenwinkel, N., et al. (2001) Geno2pheno: interpreting genotypic HIV drug resistance tests. IEEE Intell. Syst., 16, 35–41.

    Beerenwinkel, N., et al. (2002) Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype. Proc. Natl Acad. Sci. USA, 99, 8271–8276.

    Beerenwinkel, N., et al. (2003a) Geno2pheno: estimating phenotypic drug resistance from HIV-1 genotypes. Nucleic Acids Res., 31, 3850–3855.

    Beerenwinkel, N., et al. (2003b) Methods for optimizing antiviral combination therapies. Bioinformatics, 19, i16–i25.

    Beerenwinkel, N., et al. (2005a) Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics, 21, 2106–2107.

    Beerenwinkel, N., et al. (2005b) Estimating HIV evolutionary pathways and the genetic barrier to drug resistance. J. Infect. Dis., 191, 1953–1960.

    Beerenwinkel, N., et al. (2005c) Learning multiple evolutionary pathways from cross-sectional data. J. Comput. Biol., 12, 584–598.

    Bennett, D., McCormick, L., Kline, R., Wheeler, W., Hemmen, M., Smith, A., Zaidi, I., Dondero, T., The HIV Drug Resistance ARVDRT/VARHS Surveillance Group (2005) U.S. surveillance of HIV drug resistance at diagnosis using HIV diagnostic sera [abstract 674]. 12th Conference on Retroviruses and Opportunistic Infections, Boston, MA, p. 309.

    Boucher, C., et al. (1992) Ordered appearance of zidovudine resistance mutations during treatment of 18 human immunodeficiency virus-positive subjects. J. Infect. Dis., 165, 105–110.

    Brun-Vezinet, F., et al. (1997) HIV-1 viral load, phenotype, and resistance in a subset of drug-naïve participants from the Delta trial. Lancet, 350, 983–990.

    Clavel, F. and Hance, A.J. (2004) HIV drug resistance. N. Engl. J. Med., 350, 1023–1035.

    DeGruttola, V., et al. (2000) The relation between baseline HIV drug resistance and response to antiretroviral therapy: re-analysis of retrospective and prospective studies using a standardized data analysis plan. Antivir. Ther., 5, 41–48.

    De Luca, A., et al. (2003) The prognostic value to predict virological outcomes of 14 distinct systems used to interpret the results of genotypic HIV-1 drug resistance testing in untreated patients starting their first HAART. HIV Med., 4, 20.

    Desper, R., et al. (1999) Inferring tree models for oncogenesis from comparative genome hybridization data. J. Comput. Biol., 6, 37–51.

    DiRienzo, G. and DeGruttola, V. (2002) Collaborative HIV resistance-response database initiatives: sample size for detection of relationships between HIV-1 genotype and HIV-1 RNA response using a non-parametric approach. Antivir. Ther., 7, S71.

    DiRienzo, A.G., et al. (2003) Nonparametric methods to predict HIV drug susceptibility phenotype from genotype. Stat. Med., 22, 2785–2798.

    Draghici, S. and Potter, R. (2003) Predicting HIV drug resistance with neural networks. Bioinformatics, 19, 98–107.

    Fouchier, R.A., et al. (1995) Simple determination of human immunodeficiency virus type 1 syncytium-inducing V3 genotype by PCR. J. Clin. Microbiol., 33, 906–911.

    Foulkes, A.S. and DeGruttola, V. (2002) Characterizing the relationship between HIV-1 genotype and phenotype: prediction based classification. Biometrics, 58, 145–156.

    Jensen, M.A., et al. (2003) Improved coreceptor usage prediction and genotypic monitoring of R5-to-X4 transition by motif analysis of human immunodeficiency virus type 1 env V3 loop sequences. J. Virol., 77, 13376–13388.

    Johnson, V.A., et al. (2004) Update of the drug resistance mutations in HIV-1: 2004. Top. HIV Med., 12, 2004.

    Jordan, R., et al. (2002) Systematic review and meta-analysis of evidence for increasing numbers of drugs in antiretroviral combination therapy. Br. Med. J., 324, 1–10.

    Kellam, P. and Larder, B. (1994) Recombinant virus assay: a rapid, phenotypic assay for assessment of drug susceptibility of human immunodeficiency virus type 1 isolates. Antimicrob. Agents Chemother., 38, 23–30.

    Perrin, L. and Telenti, A. (1998) HIV treatment failure: testing for HIV resistance in clinical practice. Science, 280, 1871–1873.

    Pillai, S., et al. (2003) A new perspective on V3 phenotype prediction. AIDS Res. Hum. Retroviruses, 19, 145–149.

    Resch, W., et al. (2001) Improved success of phenotype prediction of the human immunodeficiency virus type 1 from envelope variable loop 3 sequence using neural networks. Virology, 288, 51–62.

    Rhee, S.-Y., et al. (2003) Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res., 31, 298–303.

    Richman, D.D., et al. (2004) The prevalence of antiretroviral drug resistance in the United States. AIDS, 18, 1393–1401.

    Roomp, K., Ahlenstiel, G., Beerenwinkel, N., Rockstroh, J., Däumer, M., Spengler, U., Lengauer, T. (2005) HLA profiles predict known and novel HIV-1 escape mutations at a population level. 2nd International Immunoinformatics Symposium, Boston, MA, pp. 12–13.

    Sevin, A.D., et al. (2000) Methods for investigation of the relationship between drug-susceptibility phenotype and human immunodeficiency virus type 1 genotype with applications to AIDS clinical trials group 333. J. Infect. Dis., 182, 59–67.

    Shafer, R.W., et al. (2000) The genetic basis of HIV-1 resistance to reverse transcriptase and protease inhibitors. AIDS Rev., 2, 211–228.

    Sing, T., Beerenwinkel, N., Lengauer, T. (2004) Learning mixtures of localized rules by maximizing the area under the ROC curve. Proceedings of the 1st International Workshop on ROC Analysis in Artificial Intelligence, August 22, Valencia, Spain, pp. 89–96.

    Sing, T., Beerenwinkel, N., Kaiser, R., Hoffmann, D., Däumer, M., Lengauer, T. (2005a) Geno2pheno[coreceptor]: a tool for predicting coreceptor usage from genotype and for monitoring coreceptor-associated sequence alterations. 3rd European HIV Drug Resistance Workshop, Athens, Greece.

    Sing, T., et al. (2005b) ROCR: Visualizing classifier performance. Bioinformatics, (in press).

    Wang, D. and Larder, B. (2003) Enhanced prediction of lopinavir resistance from genotype by use of artificial neural networks. J. Infect. Dis., 188, 653–660.

    Wang, K., et al. (2004) Simple linear model provides highly accurate genotypic predictions of HIV-1 drug resistance. Antivir. Ther., 9, 343–352.

    Walter, H., et al. (1999) Rapid, phenotypic HIV-1 drug sensitivity assay for protease and reverse transcriptase inhibitors. J. Clin. Virol., 13, 71–80.

    Wolf, K., et al. (2003) Tenofovir resistance and resensitization. Antimicrob. Agents Chemother., 47, 3478–3484.





