

Bioinformatics Advance Access originally published online on October 8, 2007
Bioinformatics 2008 24(3):389-395; doi:10.1093/bioinformatics/btm447

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Successful anti-cancer drug targets able to pass FDA review demonstrate the identifiable signature distinct from the signatures of random genes and initially proposed targets

Anatoly L. Mayburd 1,*, Inna Golovchikova 2 and James L. Mulshine 3,*

1CPA Global, LLC, 1725 Duke Street, Alexandria, VA 22314, 2IGMZ consulting, 456 Washington Avenue, Belleville, NJ 07109 and 3Rush University Medical Center, Chicago, IL 60612, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 4 DISCUSSION
 5 CONCLUSIONS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Motivation: New efforts to guide and prioritize the selection of cancer drug targets are urgently needed, as is evident by the slow development of novel anti-cancer agents and the narrow therapeutic index of existing drugs. Given these limitations, the current study was conducted to explore the classification features defining the therapeutic success that can result from targeting a particular gene.

Results: Classification was based on extracting features specific to known successful anti-cancer targets and combining them in a linear classifier, resulting in calculation of an enrichment score for each gene. Extended description, the search tool used in this study, enriched existing drug target candidates by up to 10-fold at an ~50% recall rate, covering ~24 000 genes or ~80% of the genome. More importantly, the target category with a high attrition rate was distinguished from the target category with a low attrition rate, allowing drug development portfolios to be refined. The biological relevance of the parameters comprising the enrichment score was explored. Enrichment in cancer-specific effects was independently demonstrated by literature analysis. Imposing these enrichment scores on existing structural, pathway and phenotype-based procedures for prospective target selection may enhance the efficiency and accuracy of target identification and accelerate drug design.

Availability: The software used in this work is available upon request.

Contact: amayburd{at}cpaglobal.com, James_L_Mulshine{at}rush.edu

Supplementary information: Supplementary data are available at www.mayburd.com; http://www.rush.edu/rumc/page-1120170920643.html


    1 INTRODUCTION
 
Which of the ~25 000 genes in the human genome (Stein, 2004) can be successful targets of anti-cancer therapy?

Age-adjusted total US cancer mortality rates have remained almost unchanged since 1969, after peaking in 1992 (http://canques.seer.cancer.gov). Despite many advances, the cost of launching a new anti-cancer drug ranges from $800 million to $2 billion (DiMasi et al., 2003, http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html#execsummary). A recent FDA review shows that the influx of new candidates has diminished significantly over the last 5 years (http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html#execsummary). Cancer is known to be a target-rich pathology with thousands of genes involved in the disease process (Futreal et al., 2004; Higgins et al., 2007; Moffat et al., 2006). We assumed, however, that being a mechanism-related gene and being a successful anti-cancer target may require non-identical sets of criteria and that bona fide anti-cancer targets form only a small fraction of cancer-related genes. With the level of target attrition approaching 95% (Cockett et al., 2000), an algorithm is urgently needed to evaluate a priori the chance of a gene product becoming a successful drug target and to make cancer research more cost efficient. Effective prioritization of the current and future target pool (Chen et al., 2007), rather than the discovery of new potential targets (Chen et al., 2001; Han et al., 2007; Li et al., 2006; Nettles et al., 2006; Paul et al., 2004), might lead to more economical, rapid and effective drug design. In this work, we propose an independent method of target identification and prioritization.

In general, drug activity via a successful target must result in minimal side effects and maximum intended effect. Drug targets must be essential participants in the mechanism of disease, which ideally should be demonstrated in phenotypic or correlative studies (Kurzrock et al., 2003; Lennerz et al., 2005; Ong et al., 2006; van Es and Arts, 2005; Zambrowicz and Sands, 2003). Targets that exhibit cancer-specific mutations constitute a preferential category, as their structural dissimilarity with the normal form of a gene product allows disease-specific binding (Cascón et al., 2006; Chalandon and Schwaller, 2005; Kalnina et al., 2005; Saglio et al., 2004). A related category—this one represented by heat shock protein (Hsp)-90—produces an anti-cancer effect because of the different extent of associations of its multi-subunit functional complex (Kamal et al., 2003). In a final category represented by tubulin, the ‘silver bullet’ effect is the result of different functional consequences of the target binding in normal and malignantly transformed cells (Ganansia-Leymarie et al., 2003).

Additional fundamental requirements may be combined in a single category of uniqueness (Hasan et al., 2006; Oh et al., 2004; Zheng et al., 2006). Targets representing unique signal pathways are unlikely to have many functional collaterals. Such collaterals may compensate for the loss of function after blocking of the active site with a drug. In addition, functionally unique pathways are often mediated by structurally distinct protein domains, which are less prone to non-specific ligand binding. Other criteria, such as consistency of differential expression across multiple and diverse datasets, are also valid (Rhodes et al., 2004). We found that combining an individual gene's properties with its systemic interactions leads to a predictive feature space in which capturing a successful target signature becomes possible.

In this study, analysis of uniqueness was shown to produce features preferentially selecting FDA-approved targets versus random genes. This finding was supported by double cross-validation and quantitative review of links to published literature. After a sufficient number of such features was derived, the separation of the two classes became very significant.

The authors hope that the ideas set forth in this study may spark the interest of managers of drug development programs and researchers using microarrays, leading to improved ROI and accelerated design of cancer therapeutics. See also RMD, pp. 3–15.


    2 METHODS
 
2.1 Supporting data
Supplementary Material (SM) and Raw Data (RD) can be accessed at www.mayburd.com. The Excel files of RD are designated with the letter S and an order number; their names also specify which part of the work they support. The files are briefly described in the online introduction ‘Contents and brief description of this site’, and the descriptions are duplicated within the files.

The MS Word files of SM comprise:

  1. Rationale, Methodology and Discussion (RMD), supporting the up-front manuscript. This material expands on the methodology and provides in-depth discussion; its sections mirror the subsections of this manuscript and follow the same order, reflected in the content list;
  2. Example Portfolio (EP), illustrating the type of targets favored by the algorithm;
  3. Appendix A, containing the identities and description of training set components as well as datasets used in the study;
  4. Appendix B, containing the data path for extraction of differential expression and clustering data;
  5. Appendix C, describing software used for aggregation and processing of datasets in this report.

2.2 Data aggregation
Following the general approach of Shannon (Shannon, 1948), resolution is improved by aggregating multiple individual microarray projects into a single meta-project. See RMD: p. 15 for more details.

2.3 Datasets A, B and C
Dataset A contains mainly differential expression signatures, Dataset B contains clustering data and Dataset C contains initial data for conversion into correlative profiles.

2.4 Training and testing sets
Selection of training and testing sets was based on 58 FDA-approved successful targets. Information regarding a target's FDA approval status is available in the Therapeutic Target Database (TTD; Chen et al., 2002). Details of selection and links to the constituents of the training and testing sets are described in Appendix A and RD. The training and testing components for scoring system 1 are given in files S1A and S2A. The training and testing components for scoring system 2 are given in S6D, column C. The training and testing set components for the combined system (1 + 2) are given in S9B, column AB.

2.5 Classes of genes
The current study differentiates between bona fide successful targets and random genes. Because of the varying proportion of bona fide targets, three secondary sub-classes were defined by searching TTD (Chen et al., 2002).

  1. Successful: drug targets approved by the FDA (in total, 58 FDA-approved anti-cancer drug targets were extracted).
  2. Developing: targets that have not been approved by the FDA, but for which high affinity ligands have been developed.
  3. Proposed: targets in the research stage that reflect the prior criteria of candidate selection, ligands not developed yet.

Each of the secondary classes represented a combination of primary classes, with the successful targets being the most enriched in bona fide targets and random genes the least enriched. The identities of the constituents and details of selection can be viewed in Appendix A and RD (files S1A and S2B).

2.6 Classification feature selection
The crucial subject of feature selection is presented in the online version of the up-front manuscript and in the Supplementary Material (RMD, pp. 18–34 and RD, folders S3–S8).

Briefly, successful targets differ from random genes in:

  1. Differential Expression Consistency (DEXCON), measured across a panel of projects where differential expression is monitored. Successful targets are more consistently perturbed in disease.
  2. Inter-probe Variance (INTVAR), measuring the tendency of multiple Affymetrix probe-sets comprising the same gene to occupy similar dendrogram positions across a testing panel of diverse clustering methods applied to a plurality of datasets. Successful targets demonstrate such similarity and display low INTVAR.
  3. Survival coefficient (SURV), measuring the tendency of a gene to be a member of small unique clusters, or large robust clusters. Anti-cancer successful targets tend to be excluded from small clusters, but other target classes show the contrary trend.

DEXCON, INTVAR and SURV were defined as Scoring system 1.

In addition, successful targets differ from random genes in:

  1. Uniqueness and Fisher-Uniqueness test (U-test and FU-test), measuring the tendency of a tested expression profile (gene) to display a different pattern of correlative interactions with the sub-panels of successful targets and random genes (represented by their expression profiles). The tested profile is correlated with the profiles comprising the testing panel. The correlation coefficients on the successful side are compared with those on the random side. If the tested gene is a successful target itself, the P-value of t- or F-tests will be significant in this setting.
  2. Relative Variance (RELVAR), measuring the ratio of the variance of the gene expression values across the samples in the same experiment to the averaged expression level in the said experiment. Successful targets appear to display more relative variability in this setting.
  3. Kurtosis and Skew (KURT and SKEW), measuring deviation of the expression level distribution from normal. KURT and SKEW were greater for successful targets as opposed to random genes.
  4. Copy Number, measuring relative strength of detected signal, averaged across all samples for successful targets and random genes. The average Copy Number may be 3-fold higher for the successful target class versus random genes.
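The comparison underlying the U-test and FU-test can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: Pearson correlation, Welch's t statistic and a plain variance ratio are assumed choices, and the function names are hypothetical. Only the test statistics are computed; converting them to P-values would need a t or F distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def uniqueness_statistics(tested, successful_panel, random_panel):
    """Compare correlations of `tested` with the two sub-panels.

    Returns (t, f): Welch's t statistic for the difference in mean
    correlation, and the ratio of the two sample variances.
    Each panel needs at least two profiles, and the correlations
    within a panel must not all be identical.
    """
    r_succ = [pearson(tested, p) for p in successful_panel]
    r_rand = [pearson(tested, p) for p in random_panel]
    v_succ, v_rand = variance(r_succ), variance(r_rand)
    t = (mean(r_succ) - mean(r_rand)) / sqrt(
        v_succ / len(r_succ) + v_rand / len(r_rand)
    )
    f = v_succ / v_rand
    return t, f
```

A profile that tracks the successful sub-panel more closely than the random one yields a positive t statistic in this sketch.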

U-test, FU-test, RELVAR, Copy Number, KURT and SKEW were defined as Scoring system 2.
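The distribution-shape features of Scoring system 2 can be illustrated with standard moment formulas. The sketch below assumes population (not sample) moments and excess kurtosis; the text does not state which estimators were used, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def relvar(values):
    """Variance of expression across samples divided by mean expression."""
    return pvariance(values) / mean(values)


def skew(values):
    """Third standardized moment (0 for a symmetric distribution)."""
    m, n = mean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def kurt(values):
    """Excess kurtosis: fourth standardized moment minus 3."""
    m, n = mean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0
```

For a symmetric profile such as `[1, 2, 3, 4, 5]`, `skew` returns 0, while a profile with one high outlier gives a positive value.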

2.7 Indirect (bonus-penalty) scoring
The values of features were converted into bonus-penalty coefficients, following the results of univariate resolution in the training set, using these metrics individually. The regions of parameter ranking were assigned either bonus or penalty points, depending on either target enrichment or depletion, respectively. Regions that did not contribute to resolution were assigned a score of zero. The cut-offs determining the value of bonus and penalty points were established empirically, as a result of optimized training. Such a system improved the signal-to-noise ratio. The values of bonus/penalty points and their matches with original scores are provided in RD, files S6–S11, designated as B/P. See RMD: pp. 34–37 for more details.
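A minimal sketch of such a bonus-penalty conversion is given below. The cut-offs and point values are hypothetical placeholders; the values actually used are those in RD, files S6–S11, and are not reproduced here.

```python
def bonus_penalty(rank_fraction, regions):
    """Map a feature's rank fraction (0..1) to bonus/penalty points.

    `regions` is a list of (lower, upper, points) tuples. A value that
    falls outside every listed region contributes zero, like a region
    that does not contribute to resolution.
    """
    for lower, upper, points in regions:
        if lower <= rank_fraction < upper:
            return points
    return 0


# Hypothetical cut-offs for one feature: penalize the bottom 30% of
# ranks, reward the top 20%, and ignore the middle of the range.
EXAMPLE_REGIONS = [(0.0, 0.3, -1), (0.8, 1.01, 2)]
```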

2.8 Linear classification
A linear classifier was chosen because linear classification is robust and interpretable (Duda, 2001a). In addition, excluding non-linearity created a situation of under-training, which limited concerns about over-fitting. The details of classification are given in RMD: pp. 37–40.
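As an illustration only, a linear composite score can be formed as a weighted sum of per-feature bonus/penalty points and then converted to a rank fraction. The weights, the rank normalization and the function names below are assumptions; the actual procedure is described in RMD: pp. 37–40.

```python
def composite_scores(points_by_gene, weights):
    """Weighted sum of per-feature bonus/penalty points for each gene."""
    return {
        gene: sum(weights[f] * pts for f, pts in feats.items())
        for gene, feats in points_by_gene.items()
    }


def rank_fractions(scores):
    """Convert raw scores to rank fractions in (0, 1]; higher is better.

    Ties keep their input order, which is arbitrary but deterministic.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}
```

With unit weights, a gene holding the most bonus points across features receives a rank fraction of 1.0.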

2.9 Estimation of false positive rate using Bonferroni correction
Ranking of multiple genes by a score carries a probability that some high scores arise randomly.


[Formula 1: equation image not reproduced in this text version]   (1)

where FP is the false positive rate (0 < FP < 1) in the high-score population, N is the total number of genes in the ranking and p is the t-test P-value of separation between the high-score and general populations.
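Formula 1 is not available in this text, so the sketch below uses the textbook Bonferroni correction (the P-value multiplied by the number of comparisons, capped at 1) as an assumed stand-in; it may differ from the authors' exact expression. The function name is hypothetical.

```python
def bonferroni_fp(p_value, n_genes):
    """Textbook Bonferroni correction: p times the number of comparisons, capped at 1."""
    return min(1.0, p_value * n_genes)
```

Under this assumption, a separation P-value of 1e-52 over 24 000 ranked genes remains far below 1 after correction.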

2.10 Cross-validation
Validation approaches in this work are described in RMD: pp. 40–41 and in RD: S9.

2.11 Exclusion of bias
Bias may arise if the successful and random classes have an unequal proportion of genes with an incomplete set of classifying features (i.e. if INTVAR or SURV is missing). Zero DEXCON was not counted as missing, as genes may not be differentially expressed. To negate this bias, genes with an incomplete profile of classifying metrics were excluded from analysis. More details are in RD: S10, RMD: pp. 41–43.
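The exclusion rule can be sketched as a simple filter. The dictionary layout and the function name are hypothetical; the point illustrated is that a zero DEXCON is a valid value, whereas an absent feature disqualifies the gene.

```python
REQUIRED = ("DEXCON", "INTVAR", "SURV")


def complete_profiles(genes, required=REQUIRED):
    """Keep genes whose required features are all present.

    A feature counts as missing only when it is absent or None; a
    DEXCON of zero is kept, because the gene may simply not be
    differentially expressed.
    """
    return {
        name: feats
        for name, feats in genes.items()
        if all(feats.get(f) is not None for f in required)
    }
```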

2.12 Analysis of resolution

  1. Empirical distributions as a function of score were computed for random gene and successful target populations.
  2. ROC curves (Receiver Operating Characteristic Curves) were plotted. Different relative frequencies (REL = T/R) of successful targets (T) versus random genes (R) were observed at different recall rates (REC). Resolution increment coefficient (RI) was defined as


[Formula 2: equation image not reproduced in this text version]   (2)

More details in RD: S11, RMD: pp. 43–44.

2.13 Validation by published literature
The NCBI Gene website (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) was searched with the aliases of the genes of interest, selected in the regions of high and low composite score. Individual gene pages were accessed and, on each page, the number of original articles was counted, as well as the number of articles specifically implicating the given gene in cancer causation. The counts were tabulated and compared for different levels of score. More details in RMD: pp. 44–47, RD: S12.


    3 RESULTS
 
3.1 The successful targets and random genes form distinct classes
Key findings of the current study are described in Figures 1–3 and Table 1, and in RMD, pp. 47–50.


 
Fig. 1. (A) Empirical statistics of multivariate class separation using the composite vector score, system 1. Black bars = successful targets (training set); bars with horizontal stripes = successful targets (testing set); bars with tilted stripes = developing targets; gray bars = proposed targets; white bars = random gene scores. The identities of training and testing set components are in RD, S9B, columns A, S, T, AB. The target class labels are in column AC. (B) Using data for the combined scoring system 1 and 2, a Receiver Operating Characteristic (ROC) curve was obtained by plotting true positives (the fraction of known successful targets, ordinate) as a function of false positives (random genes co-segregating with the successful targets, abscissa). Based on ROC data, the RI (relative increment) coefficient was estimated. Squares = successful targets; triangles = proposed targets.

 

 
Fig. 2. Cross-validation errors plotted for the successful target class, using combined scoring system (1 + 2). A stringent double cross-validation approach was used to test the objective character of the score and its independence of a particular training set composition. The bars represent computed relative cross-validation errors (disparities) in the score ranks. The sizes of training and testing sets were approximately equal. The identities of training and testing set components are in RD, S9B, columns A, S, T, AB.

 

 
Fig. 3. Resolution between Low Attrition target class (FDA approved) and High Attrition target class (current pipe-line, proposed targets). Squares = successful targets; diamonds = proposed targets; triangles = random genes.

 

 
Table 1. Resolution parameters of the current method, individual classification features and multivariate classifiers

 
Figure 1A presents the empirical statistics of a multivariate application of extended description, the search tool used in this study. In Figure 1A, 8 features: INTVAR, DEXCON, U-test, FU-test, RELVAR, Copy Number, KURT and SKEW were included in the linear classifier, applied to >24 000 genes (95% genome coverage, Stein, 2004). The expanded set of features combines scoring systems 1 and 2.

The progressive increase in the ratio of known successful targets versus potentially competing random genes was observed as a function of the score. For the ranked scores >0.9, recall reached ~60% at an enrichment factor of ~6.0. In contrast, enrichment reached only 0.2 in the score region between 0 and 0.4, which also contained 5% of known targets; this finding indicates that the upper and lower regions of the score range may differ about 30-fold in terms of target enrichment. To retrace this conclusion, compare the ratios of successful targets versus random genes in different regions of the score. Compared with class resolution by DEXCON alone, the multivariate score leads to a 2-fold greater recall rate and far better class separation, at the same investment of resources and time. The visual difference between the empirical distributions was confirmed by a t-test comparing the successful and random gene populations (Table 1).
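The ~30-fold difference between score regions can be retraced with a few lines of arithmetic. This is only a sketch: the enrichment factor is taken to be the fraction of all successful targets in a score region divided by the fraction of all random genes in the same region, and the random-gene fractions (0.10 and 0.25) are back-calculated from the reported enrichment values rather than read from the data.

```python
def enrichment(target_fraction, random_fraction):
    """Fraction of all targets in a score region over the fraction of all random genes there."""
    return target_fraction / random_fraction


high = enrichment(0.60, 0.10)   # ranked scores > 0.9
low = enrichment(0.05, 0.25)    # scores between 0 and 0.4
fold_difference = high / low    # ratio of the two enrichment factors
```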

The distance between the populations corresponds to a P-value of ~10^-52 and can be increased further with more data aggregation.

Figure 1B displays ROC curve data for the multivariate classifiers using combined system 1 and 2. Based on ROC plotting we attempted to estimate (Table 1) the expected acceleration of successful target discovery rate, if the current method was applied


[Formula 3: equation image not reproduced in this text version]   (3)

where RI is the projected increment of discovery rate, REC is the recall rate of the method (0 < REC < 1), defined as the fraction of the total bona fide targets discovered in the defined rank of the score, and REL is the relative enrichment in the defined rank of the score (1 < REL < ∞). REL can be assessed from the ROC curves as an ordinate-to-abscissa ratio.
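Because the equation itself is not available in this text, the product form used below is an assumption rather than the authors' stated formula. It is chosen because it agrees with the definitions of REC and REL and with the values reported in this section (a recall of ~60% at an enrichment of ~6.0 for ranked scores >0.9, and RI ~3.6 for the combined system). The function name is hypothetical.

```python
def resolution_increment(rec, rel):
    """Assumed form RI = REC * REL (not confirmed by the source text)."""
    if not 0.0 < rec <= 1.0:
        raise ValueError("REC must lie in (0, 1]")
    return rec * rel


# Reported values for ranked scores > 0.9, combined system (1 + 2).
ri_combined = resolution_increment(0.60, 6.0)

# Hypothetical single-feature case: modest enrichment but poor recall,
# giving RI < 1, i.e. worse than no scoring at all.
ri_single = resolution_increment(0.30, 2.0)
```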

RI evaluates a hypothetical scenario in which research and development resources are directed to the top-scoring (top 10% by rank) genes and all the targets that occur in this subset. The remaining 90% of the total gene population is discarded, including the remainder of the target pool in this score range. Discarding the population with a low successful-to-random gene ratio should lead to increased ROI (Return on Investment), but as Table 1 and further discussion suggest, this step has to be applied judiciously.

In the selected top 10%, REL is measured as the ratio of the fraction of all known successful targets to the fraction of all random genes (REL > 1). The gain due to REL > 1 can be offset by exclusion of too many potential target candidates if REC << 1. In this regard, attempts to use single features (DEXCON or INTVAR) would lead to a worse result than no scoring at all. However, multivariate classifiers lead to RI ~1.6 and 3.6 for scoring system 1 and the (1 + 2) combination, respectively, under the conditions of stronger target recovery in the high-score region. Since single filters are never used in target characterization, preservation of diversity of candidate structures and pathways is essential for approaching the final candidate through a number of filters.

3.2 How many successful targets might exist in genome?
The successful targets were compared with random genes and the predominant structural and functional families were traced in both categories. The leading (most enriched) categories of successful anti-cancer targets include receptors, cytoskeleton components, oxygenases, reductases and dehydrogenases (see RD: S7). These were followed by kinases, helicases, proteinases and deaminases. Affiliation of a gene with the categories historically known to produce successful targets was considered during a sample target portfolio selection (see Example Portfolio). In the analysis of >24 000 genes in RD: S15, the success scores were superimposed on pathway and structural information. The highest ~1600 ranking genes (co-clustering with ~50% of highest ranking successful targets) contained ~500 druggable candidates, and of those ~250 matched the pathway categories described above. Thus, a potential pool of ~250 higher quality targets is conceivable, while the currently available pool is ~60–70 targets (see TTD). Although large, this number is manageable with current experimental technologies. Focusing the resources of a large-scale siRNA gene silencing (or mouse knockout) project on these candidates is likely to produce a very significant impact on the state of the art.

3.3 Elimination of artifacts and biases
The next step was to determine whether population separation could have been caused by an artifact. Such artifacts may include multiple comparison artifact, overfit of the classifier training process, dependence of result on the particular composition of the compared classes and gap-related bias.

Inspection of FP values in Table 1 ruled out multiple comparison artifact for all classifying approaches, since all Bonferroni corrected P-values were <<1.

Another test addressed the overfit artifact by comparing the P-value of separation between the classes in the training set and at the global scale. This test showed that the P-value in both cases was small and comparable (RD: file S13), suggesting low generalization error (Duda, 2001b). Furthermore, only 3–8 independent features were used. At such samples-to-features ratios (48:8), overfit has not been shown to occur often (Ambroise and McLachlan, 2002). Based on this evidence, overfit was ruled out.

Figure 2 addresses the validation issues. The training and testing sets were used in a double cross-validation procedure (Pain et al., 2006) and the resulting sets of scores were also reproducible. The testing set in Figure 1A displayed the same score as the training set. See RMD: pp. 41–43, RD: S9 for more details.

Successful targets are better studied than random genes, thus being described by a more complete list of parameters than are random genes, where such parameters may be missing. Such a situation might lead to significant bias and detection of non-existent differences between classes. To rule out this possibility, genes with an incomplete set of classifying parameters were excluded from consideration for scoring system 1, leaving 7000 genes for analysis.

The combined (1 + 2) system dataset contains 20% incomplete profiles, and we assumed that the extent of bias introduced in this case is tolerable, since genome coverage was extended to 24 000 genes.

A hypothesis was proposed that low INTVAR arises as enhanced self-clustering of differentially expressed genes caused by a greater dynamic range of such profiles. If true, INTVAR does not represent an independent discovery tool, but is subordinate to DEXCON, thus diminishing the scope of the study's findings. To rule out this possibility, DEXCON and INTVAR were correlated and no significant correlation was found, pointing to independence of these features. The treatment of this problem is given in RMD: pp. 48–50.

An independent external validation of the proposed methodology was conducted. If the current score truly reflects the biology of the compared gene classes, the cancer-causative behavior would be detected experimentally and reported in publications. Indeed, the high score bracket demonstrated ~3-fold more frequent literature references per gene implying a causative or facilitating role in cancer, even after the known targets (successful, developing and proposed) were all excluded to eliminate bias. The total number of publications per gene (all topics) was also ~1.5-fold higher for the highest score bracket. Even after normalization by the total number of publications, the higher score bracket demonstrates an ~2-fold preference for cancer-related effects, pointing to objective differences at the biological level. See RD: S12 for more details.
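The normalization step can be retraced as follows. The article counts are hypothetical and chosen only to reproduce the reported ratios (~3-fold, ~1.5-fold and ~2-fold); the actual counts are in RD: S12. The function name is also hypothetical.

```python
def per_gene_rates(cancer_articles, total_articles, n_genes):
    """Return (cancer articles per gene, total articles per gene, cancer share)."""
    return (
        cancer_articles / n_genes,
        total_articles / n_genes,
        cancer_articles / total_articles,
    )


# Hypothetical counts chosen only to reproduce the reported ratios.
high = per_gene_rates(cancer_articles=300, total_articles=1500, n_genes=10)
low = per_gene_rates(cancer_articles=100, total_articles=1000, n_genes=10)

cancer_fold = high[0] / low[0]       # cancer references per gene
total_fold = high[1] / low[1]        # all references per gene
normalized_fold = high[2] / low[2]   # cancer share after normalization
```

Dividing the cancer-reference ratio by the total-reference ratio gives the normalized preference, which is how a 3-fold raw difference becomes a 2-fold difference once the higher overall publication rate is taken into account.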

A conclusion was made that the observed separation of successful targets and other classes is unlikely to result from a random or systematic error, but follows a natural trend.

3.4 Significance of introducing a success score for the drug development process
As mentioned above, proposed targets were those researched for future development according to TTD. The status of progress varied broadly, from initial exploration to FDA-submitted target candidates. The parameters of average proposed targets were assumed to be equal to the parameters of average pipeline targets currently used in the pharmaceutical industry.

Such targets provide the nearest negative control for the FDA approved targets, since 95% of proposed targets are subject to attrition (Cockett et al., 2000). In this context, FDA-approved targets were re-defined as Low Attrition Class, while proposed targets were re-defined as High Attrition Class. The linear classifier was re-trained for optimal distinction between these categories, see RD: S11B, columns G and H.

As shown in Figure 3, the current algorithm classifies targets based on their likely attrition rate and thus can contribute to development of higher quality portfolios, potentially exerting a significant impact upon therapy design process.

3.5 Application to non-cancer disease
The folder S14 of RD presents the scoring for Alzheimer's disease targets and non-cancer targets based on the parameters developed in cancer research. The resolution corresponding to 4-fold enrichment was observed. However, it remains inferior to resolution for the specific cancer-related dataset (see ROC). More aggregated cancer-related data are needed to improve resolution. The possibility of expanding the method's scope would justify this effort.


    4 DISCUSSION
 
An important objective of this study was to investigate whether a rapid predictive method could point to the future therapeutic success of a target candidate. Because of accumulating evidence stressing the role of regulatory and structural uniqueness of target candidates, novel classification features capable of detecting such uniqueness were proposed. Technically, the analysis benefited from using an aggregated microarray dataset aiming to suppress random noise.

Passing of FDA controls requires special systemic interaction properties to be displayed by the intended targets of the considered therapies. Indeed, selecting the initial candidates from the score category comparable to that of the successful target class would focus the resources of society on the most promising subset.

Figures 1A and B present the results of the application of composite scores. The proportion of random genes per bona fide target varied 10- to 15-fold between the high and low ranges of this score, depending on the scoring system. This difference in relative enrichment relates to the cost efficiency of target development, as the probability of encountering a bona fide initial target lead is lower in the population dominated by random genes. Figure 1 shows that a high proportion of the total target pool (~50%) competes with too great a number of random genes. To assess this, compare the ratios of successful targets versus random genes at scores above 0.9 and below 0.4. Attempts to extract this fraction of the target pool might cause a disproportionate rise in development costs. Such a trend may be observed under the current conditions of intensified efforts to test multiple novel target candidates, because of pressure to accelerate anti-cancer therapy design.

Attrition rates are great for ligands and even greater for targets (~80% and ~95%, respectively; http://www.fda.gov/oc/initiatives/criticalpath/whitepaper.html#execsummary, Cockett et al., 2000). In this context, the estimate in the current study that target selection efficiency can be increased ~3-fold does not seem to contradict well known facts, is actually conservative and is confirmed by an independent method. Figure 3 also demonstrates that low and high attrition targets can be effectively classified by the given method, providing a predictive estimate of clinical outcome via the use of readily available in vitro metrics. Use of the current method for large groups of genes would ensure minimization of group-averaged attrition rates, even if some individual components may not benefit.

Can the present method be used as a ‘stand-alone’ tool? The answer is unequivocally ‘no’, because the regulatory, structural, and metabolic uniqueness of a target candidate cannot constitute the sole basis of decision making. Other criteria must all be included in the decision-making process: structural amenability (‘druggability’, Chen et al., 2007; Dechering, 2006), cellular localization, relation to functional families that have yielded most of the known targets historically, importance of specific biochemical and signal pathways affected by the blockade, availability of phenotype screening data (Kurzrock et al., 2003; Lennerz et al., 2005; van Es and Arts, 2005; Zambrowicz and Sands, 2003), issues of bioavailability and toxicity for projected target ligands, and market niche.

One application of the method presented in the current study would be to select, by previously applied criteria, a portfolio that exceeds production capabilities. The portfolio size could then be reduced in an objective manner using the success score. Alternatively, an abbreviated, enriched list of target candidates could be selected based on the current success score. Subsequent target candidate selections would adhere to this list in a disciplined fashion, using all other previously defined criteria. In such formats, the current method only complements earlier approaches and does not supplant them. As mentioned above, the benefits of the current method would be more apparent on a statistically representative scale. Consequently, larger and more diversified development programs, especially company partnerships, would benefit more from this tool (Lengauer et al., 2005).
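
The first format described above (an oversized portfolio trimmed by success score) can be sketched as follows. This is only an illustration: the function reduce_portfolio, the gene labels and the scores are hypothetical, and the capacity cut-off stands in for whatever production limit a program actually has.

```python
from typing import Dict, List


def reduce_portfolio(candidates: Dict[str, float], capacity: int) -> List[str]:
    """Keep the `capacity` highest-scoring candidates (ties broken by name)."""
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _score in ranked[:capacity]]


# Portfolio already pre-selected by other criteria (druggability, localization, ...).
# Labels and success scores are made up for illustration.
portfolio = {
    "GENE_A": 0.91,
    "GENE_B": 0.42,
    "GENE_C": 0.77,
    "GENE_D": 0.35,
    "GENE_E": 0.88,
}

shortlist = reduce_portfolio(portfolio, capacity=3)
print(shortlist)
```

Because the score is applied only after the other criteria have produced the initial portfolio, it complements those criteria rather than replacing them.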

Drug development is facilitated, but not exclusively determined, by optimized selection of targets. Ligands for even the best of targets can adopt multiple conformations and orientations, a flexibility that allows interaction with multiple sites (themselves flexible), even if a particular binding event is predominant. In this regard, the total phenotypic effect of a ligand is exerted through the entire set of affected targets. Analysis of these systemic effects often correctly predicts ligand efficiency (Butcher, 2005). In view of such complexity, target analysis must be paralleled by a machine-learning ligand assessment; fortunately, such tools are now becoming available (Fliri et al., 2005; Gunther, 2003), as is the Equbits methodology (http://www.equbits.com/corp/press0103.htm).

The overall a priori success score of a target-ligand pair in this case is likely to be the product of both contributions. See additional discussion in RMD, pp. 52–54.
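
Read literally, the product rule stated above can be written as follows. The symbols S_pair, S_target and S_ligand are notation introduced here for illustration and do not appear in the original; the sketch further assumes that both scores lie between 0 and 1 and contribute independently, and the numerical values are arbitrary.

```latex
S_{\text{pair}} \approx S_{\text{target}} \times S_{\text{ligand}},
\qquad \text{e.g. } 0.8 \times 0.5 = 0.4
```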


    5 CONCLUSIONS
 
A novel method of target discovery was developed that relates the systemic interaction properties of a gene to the likelihood of its eventual satisfaction of FDA acceptance criteria.


    ACKNOWLEDGEMENTS
 
The authors extend warm gratitude to the Fellows Editorial Board of the National Institutes of Health (NIH) and to Drs J.N. Weinstein, J. Voss, D. Krilov and P. Lorenzi for critical comments. This project was supported by the Intramural Research Program of the NIH, National Cancer Institute.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Jonathan Wren

Received on April 12, 2007; revised on July 30, 2007; accepted on August 25, 2007

    REFERENCES
 

    Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA (2002) 99:6562–6566.

    Butcher EC. Can cell systems biology rescue drug discovery? Nat. Rev. Drug Discov (2005) 4:461–467.

    Cascón A, et al. Gross SDHB deletions in patients with paraganglioma detected by multiplex PCR: a possible hot spot? Genes Chromosomes Cancer (2006) 45:213–219.

    Chalandon Y, Schwaller J. Targeting mutated protein tyrosine kinases and their signaling pathways in hematologic malignancies. Haematologica (2005) 90:949–968.

    Chen X, et al. TTD: Therapeutic Target Database. Nucleic Acids Res (2002) 30:412–415.

    Chen X, et al. Does drug-target have a likeness? Methods Inf. Med (2007) 46:360–366.

    Chen Y, et al. Ligand-protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins (2001) 43:217–226.

    Chen Y, et al. Support vector machines approach for predicting druggable proteins: recent progress in its exploration and investigation of its usefulness. Drug Discov. Today (2007) 12:303–313.

    Cockett M, et al. Applied genomics: integration of the technology within pharmaceutical research and development. Curr. Opin. Biotechnol (2000) 11:602–609.

    Dechering KJ. The transcriptome's drugable frequenters. Drug Discov. Today (2006) 10:857–864.

    DiMasi JA, et al. The price of innovation: new estimates of drug development costs. J. Health Econ (2003) 22:151–185.

    Duda RO, et al. Linear discriminant function. In: Pattern Classification. (2001a) Chapter 5. New York: Wiley. ISBN 0-471-05669-3.

    Duda RO, et al. Stopped training. In: Pattern Classification. (2001b) Chapter 6. New York: Wiley. paragraph 6.8.14. ISBN 0-471-05669-3.

    Fliri AF, et al. Biological spectra analysis: linking biological activity profiles to molecular structure. Proc. Natl Acad. Sci. USA (2005) 102:261–266.

    Futreal PA, et al. A census of human cancer genes. Nat. Rev. Cancer (2004) 4:177–183.

    Ganansia-Leymarie V, et al. Signal transduction pathways of taxanes-induced apoptosis. Curr. Med. Chem. Anticancer Agents (2003) 3:291–306.

    Gunther EC. Prediction of clinical drug efficacy by classification of drug-induced genomic expression profiles in vitro. Proc. Natl Acad. Sci. USA (2003) 100:9608–9613.

    Han L, et al. Support vector machines approach for predicting druggable proteins: recent progress in its exploration and investigation of its usefulness. Drug Discov. Today (2007) 12:304–313.

    Hasan S, et al. Prioritizing genomic drug targets in pathogens: application to Mycobacterium tuberculosis. PLoS Comput. Biol (2006) 2:61.

    Higgins M, et al. Cancer genes: a gene selection resource for cancer genome projects. Nucleic Acids Res (2007) 35(Database issue):D721–D726.

    Kalnina Z, et al. Alterations of pre-mRNA splicing in cancer. Genes Chromosomes Cancer (2005) 42:342–357.

    Kamal A, et al. A high-affinity conformation of Hsp90 confers tumour selectivity on Hsp90 inhibitors. Nature (2003) 425:407–410.

    Kurzrock R, et al. Philadelphia chromosome-positive leukemias: from basic mechanisms to molecular therapeutics. Ann. Intern. Med (2003) 138:819–830.

    Lengauer C, et al. Cancer drug discovery through collaboration. Nat. Rev. Drug Discov (2005) 4:375–380.

    Lennerz V, et al. The response of autologous T cells to a human melanoma is dominated by mutated neoantigens. Proc. Natl Acad. Sci. USA (2005) 102:16013–16018.

    Li H, et al. TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res (2006) 34:W219–W224.

    Moffat J, et al. A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell (2006) 124:1283–1298.

    Nettles J, et al. Bridging chemical and biological spaces: "target fishing" using 2D and 3D molecular descriptors. J. Med. Chem (2006) 49:6802–6810.

    Oh P, et al. Subtractive proteomic mapping of the endothelial surface in lung and solid tumours for tissue-specific therapy. Nature (2004) 429:629–635.

    Ong HT, et al. Oncolytic measles virus targets high CD46 expression on multiple myeloma cells. Exp. Hematol (2006) 34:713–720.

    Pain S, et al. Customised birthweight: coefficients for an Australian population and validation of the model. Aust. N. Z. J. Obstet. Gynaecol (2006) 46:388–394.

    Paul N, et al. Recovering the true targets of specific ligands by virtual screening of the protein data bank. Proteins (2004) 54:671–680.

    Rhodes DR, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl Acad. Sci. USA (2004) 101:9309–9314.

    Saglio G, et al. Glivec and CML: a lucky date. J. Biol. Regul. Homeost Agents (2004) 18:246–251.

    Shannon CE. A mathematical theory of communication. Bell Syst. Tech. J (1948) 27:379–423, 623–656.

    Stein LD. Human genome: end of the beginning. Nature (2004) 431:915–916.

    van Es HH, Arts GJ. Biology calls the targets: combining RNAi and disease biology. Drug Discov. Today (2005) 10:1385–1391.

    Zambrowicz BP, Sands AT. Knockouts model the 100 best-selling drugs–will they model the next 100? Nat. Rev. Drug Discov (2003) 2:38–51.

    Zheng C, et al. Progress and problems in the exploration of therapeutic targets. Drug Discov. Today (2006) 11:412–420.

    Zheng CJ, et al. Therapeutic targets: progress of their exploration and investigation of their characteristics. Pharmacol. Rev (2006) 58:259–279.

