Bioinformatics Advance Access originally published online on January 29, 2006
Bioinformatics 2006 22(8):924-933; doi:10.1093/bioinformatics/btl018
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Capturing expert knowledge with argumentation: a case study in bioinformatics
1 Biochemistry Building, Division of Molecular Biosciences, Imperial College London SW7 2AZ, UK
2 Department of Computing, Imperial College London 180 Queens Gate, London SW7 2BZ, UK
3 Cancer Research UK, Advanced Computation Laboratory, London Research Institute, Lincolns Inn Fields Laboratories 44 Lincolns Inn Fields, London WC2A 3PX, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: The output of a bioinformatic tool such as BLAST must usually be interpreted by an expert before reliable conclusions can be drawn. This may be based upon the experts experience, additional data and statistical analysis. Often the process is laborious, goes unrecorded and may be biased. Argumentation is an established technique for reasoning about situations where absolute truth or precise probability is impossible to determine.
Results: We demonstrate the application of argumentation to 3D-PSSM, a protein structure prediction tool. The experts interpretation of results is represented as an argumentation framework. Given a 3D-PSSM result, an automated procedure constructs arguments for and against the conclusion that the result is a good predictor of protein structure. In addition to capturing the unique expertise of the author of 3D-PSSM for distribution to users, an improvement in recall of 510 percentage points is achieved. This technique can be applied to a wide range of bioinformatic tools.
Availability: Example public server and benchmarking data are available at http://www.sbg.bio.ic.ac.uk/~brj03/argumentation/paper/. Source code available on request.
Contact: m.sternberg{at}imperial.ac.uk
| 1 INTRODUCTION |
|---|
|
|
|---|
It is common when using bioinformatic tools for researchers to interpret the output using their own expert knowledge to assess the significance of the results and to make further predictions. For example, the results from sequence search tools such as BLAST (Altschul et al., 1990), protein function prediction algorithms and microarray analysis tools are not usually read uncritically. The user has expectations and knowledge beyond that applied by the tool, and brings this together with all of the output data to come to an informed conclusion.
There are many problems with this process. Often bioinformatic tools output more information than can be easily and efficiently interpreted by a researcher, and the output may be confusingly presented. Different researchers interpret data in different ways, and even the same researcher may make inconsistent interpretations, adding an unreliable and non-uniform element to data processing. Once the data has been interpreted, the reasoning behind an interpretation may be lost or only vaguely recalled. When a researcher leaves a research group, the method used to interpret data goes with them and is lost. Finally, the researchers interpretation may be biased towards getting a preconceived result.
This paper presents a method for applying argumentation theory (Dung, 1995) (Modgil and Fox, 2004) to this problem. The expert knowledge used by a researcher to interpret output is captured in a structure called an argumentation framework. This representation can be used to automate much of the final interpretation stage, making the process faster, more reliable, repeatable and unbiased. The results of the analysis may be presented in a variety of intuitive formats which make explicit the reasoning employed and the underlying assumptions. This can be sufficient in itself to help the user come to an informed and reliable conclusion. In many cases it is also possible to automate the conclusion itself, picking out good results and rejecting bad ones in a dataset. The paper demonstrates how these techniques can be applied to 3D-PSSM (Kelley et al., 2000), a tool for predicting the structure of a protein based on its sequence. It shows how the tools recall (or sensitivity) can be improved, and presents a method for further enhancing the accuracy of the argumentation process by applying a learning algorithm to refine the original expert-derived argumentation framework from a training set of examples and counterexamples.
Argumentation provides an approach to reasoning about situations where absolute truth or precise probability is impossible to determine, where a variety of often subjective factors have to be considered and compared, and where it is difficult to specify in advance which of these factors will carry most weight in any given case. It formalizes the idea that decisions in these circumstances are made on the basis of arguments for and against a claim, and that sometimes there may be no clear-cut decision. Argumentation theory has its origins in legal discourse, where careful, structured debate based upon a massive body of legal literature is necessary, usually requiring an experts time and experience. An early technique in legal argumentation, for example, is the Toulmin diagram (Toulmin, 1958), which gives a well-defined structure to an argument: a conclusion is drawn from a claim, given a warrant, or reason, for coming to the conclusion. The warrant may have backing evidence, and there may be a rebuttal of the conclusion. This form helps to standardise and record legal literature in order to enable automated searching and other tasks. In recent years, formal theories of argumentation have received much attention and have also been applied in many areas outside legal discourse, notably in medical decision making (Upshur and Colak, 2003; Fox et al., 2001) and risk assessment (Krause et al., 1998). In medical argumentation, for example, the conclusion of an argument may be the diagnosis of a condition or the prescription of a particular drug; the argumentation framework describes in an explicit and compact manner the various factors that have to be considered.
We do not use Toulmin diagrams in this work. Our technique is based upon a simpler, more general and more abstract argumentation model devised by Dung (1995) which is the starting point for many of the recent developments in this field. In Dungs model, an argumentation structure consists solely of claims and attacks, where attacks are similar to rebuttals in the Toulmin model. We add to Dungs model a method for drawing a simple conclusion from the argumentation, for broad categorization of results, and a novel method for refining the argumentation structure based upon a set of training data with known conclusions. Readers familiar with the Dung model will note that, in this work, we diverge from Dungs terminology for the sake of clarity.
The general area of application of this paper are biological search engines such as BLAST. The task here is to determine which matches are relevant or good (positive), and which are irrelevant or bad (negative). Traditionally this has been done by assigning an E-value to each match, with a cut-off applied to limit the number of results displayed. An E-value estimates the number of matches of similar quality expected to occur by chance with a given database: lower E-values indicate more statistically relevant results. Usually an additional arbitrary cut-off will be chosen by the researcher, above which results are dismissed as negative. In BLAST, the E-value cut-offs are used to distinguish homologous database entries from randomly matched, non-homologous entries. In order to assess a search result, however, a researcher will normally look beyond the E-value alone. They will take into account other features of the match (such as the complexity of the matched sequences or the level of sequence identity between the two), and perhaps also the output from other complementary bioinformatic tools, depending on application.
The specific example of this paper is 3D-PSSM (Kelley et al., 2000), an established protein fold recognition server which characterizes the structure of an unannotated protein sequence based upon sequence homology and likely secondary structure. Many protein sequences have been determined which have not yet been annotated structurally or functionally, and 3D-PSSM helps to close this gap. It uses the same E-value cut-off technique as BLASTordering the matches according to their E-value, displaying the top 20 and highlighting likely candidates based on a cut-off derived experimentally to produce
95% precision. A variety of other measures are also reported for each match, such as sequence identity, hydrophobic regions, secondary structure features, and so on.
Argumentation begins by identifying a subset of claims which apply to each search result. Claims are derived from expert knowledge and are often a matter of opinion. An example of a claim might be that the result is a good indicator of protein stucture because the E-value is very low. Each claim either supports or rejects the hypothesis that the match is positivein the case of 3D-PSSM, that the match is a good predictor of protein structure. So, for example, a long match would usually be good and a short match would be bad. Additionally, each claim may attack others. If a match is long (claim A), but the identity between the matched sequence and the query sequence is low (claim B), then claim B might attack claim A. If these two claims are the only ones present, then B would win the argument and the match would be rejected as negative (a bad predictor of protein structure). If a third claim C attacks B, then Bs attack on A is defeated, and now A together with C win the argument and the conclusion is that the match is positive.
With many claims and many attacks, each claim must be checked to see if it is defeated by other claims. In Dungs model, sets of undefeated claims are called preferred extensions of the argumentation framework. There are several algorithms for finding preferred extensions: we use the one presented in Cayrol et al. (2003). Every preferred extension is internally consistent and coherent, and is capable of defending itself against all attacks. The structure of the argumentation framework may occasionally be such that more than one preferred extension exists. We take the union of all preferred extensions for a given result to produce a single argument, consisting of all undefeated claims in all preferred extensions.
The claims in this argument may then be aggregated into an overall conclusion using a variety of techniques (Krause et al., 1995). We use a simple heuristic: if all the claims support the result, we conclude the result is positive, or a good predictor of protein structure. If all the claims oppose the result, then it is negative, or a bad predictor of protein structure. If there is a mixture of claims, the argumentation is undecided: there are reasons to believe that the result is a good predictor of protein structure, and reasons to believe it is a bad predictor, and no way of choosing between them.
The argument constructed for each search result serves two purposes. First, it summarizes each result to help the user interpret and reason about its significance, based upon expert knowledge. The argument may be presented graphically, as illustrated below, or textually (Reed and Grasso, 2001) (Modgil and Fox, 2004). Second, it can be interpreted as a filter on the result, identifying good and bad matches. As shown below, filtering achieves a significant improvement in predictive accuracy of the 3D-PSSM tool over basic E-value cut-off methods.
| 2 SYSTEMS AND METHODS |
|---|
|
|
|---|
2.1 Overview
There are four main steps in applying our argumentation method to study search results. These are illustrated in Figure 1.
- A model argumentation framework must be designed. This has three components.
- All the factors judged to be relevant when evaluating a result from the search engine are expressed in terms of claims. Each claim either supports or rejects the conclusion that the result is positive.
- For each claim, we specify the criteria for determining whether it applies to a given search result. We call these conditions the claim criteria. Claim criteria are boolean: a claim either applies to a search result, or it does not. For each result from the search engine, therefore, we are able to determine the set of claims that are applicable to it.
- The third component is a set of attacks between claims. A claim may attack another if it is contradictory, or if it undermines the basis of the other claim. The formulation of suitable attacks is a key part of representing the expert knowledge. All attacks have equal strength.
- All the factors judged to be relevant when evaluating a result from the search engine are expressed in terms of claims. Each claim either supports or rejects the conclusion that the result is positive.
- The model framework is applied to a set of search results as follows. Each search result is analysed to determine which claim criteria it satisfies and is thereby reduced to a set of applicable claims. Attacks between the applicable claims are taken from the model framework constructed at Step 1. This produces a specific Dung argumentation framework (Dung, 1995) for reasoning about a single search result.
- We now determine for each search result which of the applicable claims are undefeated by attacks from other claims (Dung, 1995) in the specific argumentation framework for that search result. There are several methods for doing this. We employ the algorithm in Cayrol et al. (2003) for computing the preferred extensions of an argumentation framework. We thus obtain one or more preferred extensions; we take the union of these preferred extensions as the set of undefeated claims for that search result (readers familiar with argumentation theory will recognise that we thus adopt the so-called credulous acceptance of the preferred extensions). The process is illustrated informally for a simple example in Figure 1.
- We can further reduce the set of undefeated claims to a simple good, bad or undecided verdict. If the undefeated claims all support the result, it is accepted as a good result. If all oppose the search result, then the result is rejected as bad. If some undefeated claims support and some oppose the result, then the verdict is undecided. For some purposes, the undefeated claims, together with an explanation of how they defend themselves against attacks, may already provide a useful analysis of the significance of a search result, even in the case where the verdict is undecided. We can also choose to regard good and bad verdicts as a solid prediction, in which case we can quantitatively compare the predictive accuracy of this method with others.
|
There is an optional optimization stage which we describe separately later.
2.2 Step 1: framework design
This initial step captures the knowledge (and opinion) an expert applies when interpreting search results.
This will produce a model argumentation framework: a general template for constructing argumentations for interpreting results from a particular search engine. A model argumentation framework for reasoning about 3D-PSSM results is shown in Figure 2.
|
Internally in the implementation, a claim is represented by a simple identifier (examples are given in Table 1), an indication as to whether the claim supports or opposes the result, and a short phrase or sentence for presentation to the user. An attack from one claim to another is represented as a pair of claim identifiers, attacker first and victim second.
|
The applicability criteria for each claim specify when it applies to a given search result. Some of these are simple threshold values, while others may be based upon properties of the entire search result set. For illustration, the claims from Figure 2 and their applicability criteria are described in Table 1. The nature of the claims, and the specific values and thresholds are initially chosen by the designer to reflect his or her own reasoning and preferences. The values, and the general structure of the argumentation framework, may be altered later to improve the predictions made by the system, to reflect better the preferences of a different user, or to customize the system to a particular problem. For example, a researcher working specifically with small proteins might want to lower the threshold for Short template to something more appropriate to their requirements. The normal case, however, is that the same model argumentation framework and claim criteria are used for all searches, and they are designed to be generally applicable.
Determining which claims attack which others is not easy, and is a decision for the designer and users of a particular application. A claim might attack another if it undermines the basis of that claim. For example, Very high sequence identity indicates good match might attack Short template indicates bad match, because a short template may still be significant if its sequence is very similar to the query sequence. An attack might be even looser, signifying a general superiority of one claim over another. It is likely that the specification of attacks will be adjusted and refined after the implementation of the system, in order to improve results being obtained.
In an earlier version of the system, we allowed experts to specify varying strengths of claims and attacks. This is easily implemented but we found that it added very little value to compensate for the additional complexity.
2.3 Steps 24: framework application
Steps 24 apply the expert knowledge to the search engine results. Step 2 creates a specific argumentation framework, based upon the model from Step 1, for each search result. This framework consists of claims which apply to that result, and any attacks between them. Step 3 determines the set of undefeated claims for each search result, using the algorithm in Cayrol et al. (2003) as illustrated in Figure 1. Figure 3 shows an example framework for a specific 3D-PSSM search result, and informally how the undefeated claims are obtained.
|
The specific argumentation framework and/or undefeated claims may already be useful as an evaluation of each search result. They may be presented graphically, textually or otherwise. Additionally, the undefeated claims may be used to come to a final conclusion, as a means of highlighting results. This is commonly done with other search engines using an E-value or other quality score cut-off. The argumentation system may be undecidedunable to come to a conclusion. In this case the user may fall back to the E-value cut-off, or study the argumentation in detail in order to come to a conclusion.
The application would typically be a single program which takes as input a complete argumentation framework (e.g. Fig. 2), a set of criteria (e.g. Table 1), and a set of search results, and outputs an argumentation framework, undefeated arguments, and a final conclusion for each search result. The outputs may be presented to the user in any sensible manner: an example web-based application for 3D-PSSM is available (3D-PSSM Argumentation Serverhttp://www.sbg.bio.ic.ac.uk/~brj03/argumentation/paper/). Here the implementation is a single web page script which performs the argumentation and outputs web pages to the researchers web browser. The script and supporting libraries are written in Python, a language similar to Perl and C++. The system was originally prototyped in Prolog, which lends itself to direct and compact application of logical algorithms. However it is easy to re-implement in any other language, in order to make deployment on a web server simpler.
2.4 Benchmarking
We benchmarked the improvement in recall of the conclusions drawn from an argumentation framework, over predictions based upon E-value alone. We determined what recall could be achieved when the argumentation framework is used to make a prediction about each search result. We compared this with the current naive method of calculating an E-value and predicting that everything below a cut-off point is positive and everything above is negative. The standard measures for recall and precision were used, as well as a simpler measure of accuracy:
- tp = number of true positives
- (positive results predicted to be positive)
- fp = number of false positives
- (negative results predicted to be positive)
- tn = number of true negatives
- (negative results predicted to be negative)
- fn = number of false negatives
- (positive results predicted to be negative)
- (positive results predicted to be positive)
![]() |
![]() |
(proportion of results predicted to be positive which actually are positive)
![]() |
(proportion of positive results which are predicted to be positive)
First, a model argumentation framework for the target search engine must be designed and implemented as described above: the author of 3D-PSSM designed our initial framework. If the database searched by the target search engine has many good matches for all possible queries, precision and recall become less meaningful measures as all matches are likely to be positive. In this case, it may be necessary to cut down the database so that it gives a mix of positive and negative results, effectively testing the design. We created a version of the 3D-PSSM database containing only sequences from the 30% identity subset of the ASTRAL SCOP (Brenner et al., 2000) database.
In order to measure the predictive accuracy of argumentation, a set of queries for the search engine must be devised for which the property being predicted is known. For 3D-PSSM, we found a set of queries for which we knew the SCOP superfamilies (Murzin et al., 1995). If two proteins have the same superfamily, then they are homologous. Therefore if a protein is predicted to be homologous to another, and they share the same superfamily, then the prediction is correct. In order to make each benchmark search useful, the queries can be picked to ensure that it is possible to find matching results in the database. We only picked queries where there were five or more potential matches in the database. The queries must also be picked so that matches are not trivial to find, in order to test the search algorithm. We ensured no query sequence had >30% identity with any of the search engine database templates, this representing the twilight zone of homologues which are hard to find.
The queries must be submitted to the (possibly modified) database. For each result returned, predictions should be made based on the conventional technique (e.g. E-value cut-off) and using the designed argumentation framework. Predictive accuracy may be measured directly as described above. However this measure understates improvements in recall alone, since there are likely to be fewer positive matches than negative ones. For example, 3D-PSSM always returns 20 matches, of which as few as five may be positive. Therefore we would also like to study the improvement in recall alone.
In order to compare the recall of an E-value cut-off prediction with argumentation, we adjusted the E-value threshold to make the precision of each technique equal. This is necessary owing to a characteristic of argumentation frameworks. The traditional measures of precision and recall locate the performance of a predictive technique at a single point within a two dimensional space. Most techniques have some parameter which may be varied, trading recall against precision, producing a curve (a precisionrecall curve) which characterizes the predictive abilities of the algorithm. When using an E-value cut-off, the threshold itself may be altered to vary precision and recall in this way. Two techniques may be compared by measuring the difference in area under their respective precisionrecall curves. However, argumentation has no such single variable parameter. A particular argumentation system has a fixed precision and recall for a given input dataset, and a precisionrecall curve cannot be produced. The precision of the E-value cut-off technique needs to be fixed, reducing it to a single point. The recall of the two techniques can then be compared in isolation. Different argumentation frameworks must be compared with different parts of the E-value precisionrecall curve, since they will have different precision values.
Given the results from a particular argumentation framework, we must find an E-value threshold which gives a precision as close as possible to that of the framework. This is done with the standard binary search algorithm. Then the recall values may be fairly compared directly.
2.5 Optimization
The initial expert design may be adjusted in order to maximize the predictive accuracy of the technique. This may be done by the expert, but this would be a slow process owing to the number of variables. Alternatively, adjustment may be automated using any optimization algorithm, which explores the possible frameworks and criteria for establishing which claims apply to a search result. This optimization stage is a novel feature in the application and development of argumentation frameworks.
We optimized the framework using the established iterative random walk technique to improve the accuracy incrementally and automatically, using our benchmark dataset as training data. In each iteration, a random change is made to either the framework or the claim criteria. The change is either retained or discarded, depending upon the nature of the change and the changes effect on the total number of correctly classified search results. The pattern of retention is designed to minimize the argumentation framework whilst maximizing accuracy, to give the most parsimonious framework possible.
The criteria for claim checking are changed by changing the claims parameters, which are usually thresholdse.g. a threshold length above which sequences are considered to be long. The standard simulated annealing process is used: parameters are changed randomly within a fixed range in jumps scaled according to a temperature parameter, which reduces each time a change results in an improvement to predictive accuracy.
Automatic optimization has the disadvantage that the resulting frameworks might not reflect an experts reasoning process, and so are less useful in intuitively summarizing the search results for the user. However, they perhaps better reflect the reality of the dataset, and abandon possibly faulty reasoning by the researcher who designed the original framework. The random nature of the optimization algorithm results in many different frameworks. The nature of the search space has not been explored in this work.
2.6 Statistical analysis
We used the exact Sign Test (Bland, 1989) to calculate P-values for the significance of the improvement in predictions of argumentation over the cut-off method. This non-parametric test makes no assumption about the distribution of predictions. For plus, we count cases where argumentation predicts correctly and E-value cut-off predicts wrongly. For minus, we count cases where argumentation is wrong and cut-off is correct. For each cross-validation, we pooled all of the predictions made for each of the test subsets and calculated a single P-value. Since we repeated the cross-validation five times, we calculated five P-values and reported a mean and maximum.
| 3 ALGORITHMS |
|---|
|
|
|---|
The pseudo-code algorithm for classifying a set of search results into positive (good), negative (bad) and undecided categories is given in Figure 4.
|
| 4 RESULTS |
|---|
|
|
|---|
4.1 Benchmarking
The results for the initial expert-designed argumentation framework and criteria are given in Table 2. Argumentation gave a precision of 92.8%. The E-value threshold was adjusted so that the cut-off technique gave a closest precision of 92.7%. At this precision level, the increase in recall is 5.2 percentage points. Overall accuracy increases slightly by 1.2 percentage points from 89.0 to 90.2%. The difference between these increases reflects the relatively small number of positive matches, as opposed to negative onesa ratio of around 1:3. The improvement of argumentation over E-value cut-off was found to be statistically significant, with a probability of 0.0042 that an equal or greater improvement might be achieved by chance.
|
We have not assessed the ability of argumentation to summarize and present search results to the researcher, which is difficult to measure. It may be achieved through user surveys on the 3D-PSSM argumentation server, which is beyond the scope of this work.
4.2 Optimization
We performed 5-fold cross-validation of the argumentation and optimization system, optimizing on 80% of the benchmark dataset and measuring performance on the remaining 20%. We produced five random replicates of the cross-validation. The results are given in Table 2. The mean precision of the optimized argumentation frameworks was 86.4%. The E-value threshold was adjusted in each case so that the cut-off technique gave a closest mean precision of 86.5%. At this precision level, the mean improvement in recall over the E-value cut-off algorithm was 10.2 percentage points, with a SD of 2.4. Therefore, optimization on average boosted the recall of the expert-defined argumentation framework by an extra 5.0 percentage points. Overall accuracy increases slightly by 2.1 percentage points from 89.2 to 91.3%. As with the unoptimized argumentation, the difference between these increases reflects the relatively small number of positive matches, as opposed to negative ones. The improvement of argumentation over E-value cut-off was found to be statistically significant for each of the optimization runs, with a probability of at most 0.0030 that an equal or greater improvement might be acheived by chance.
Bear in mind that each optimization run has different input data, and produces a slightly different argumentation framework. As a result, the precision is different in each case. A different E-value threshold is calculated for each run in order to compare recall values. In a sense, the automated optimization is exploring the multidimensional search space of possible argumentation frameworks. This is similar to adjusting the single E-value threshold to produce a precisionrecall curve. However, because we have so many variables in argumentation, their adjustment does not produce a curve: there are many possible recall values for a given precision. We have looked at two argumentation precision levels (92.8 and 86.4%) and shown that argumentation can out-perform the standard technique at both, to varying degrees.
We also performed a longer optimization run based upon the entire training set. The complete framework resulting from this optimization is given in Figure 2. Since this is based on as much training data as possible, it would be appropriate for use on a public server where the search database is sparse, like the database used to produce the training data.
| 5 DISCUSSION |
|---|
|
|
|---|
We designed an argumentation framework for reasoning about the output from a protein structure prediction tool. Then the framework was applied to the search results from the 3D-PSSM server. In common with other tools, the output from this server is complex and difficult to study consistently and as a whole. The argumentation framework effectively captured the reasoning used by the author of 3D-PSSM when studying results from his own search engine. This can now be made available to users as guidance as to how to use the tool effectively.
The method was benchmarked with respect to its improvement in recall when used to identify positive search results from 3D-PSSM. The initial argumentation framework for 3D-PSSM based upon expert knowledge produces a modest increase in recall of
5 percentage points with our benchmark dataset.
Finally, we determined the improvement achieved when the expert-designed argumentation framework was optimized for accuracy. This optimization doubles the increase in recall to
10 percentage points, albeit at a different precision level. It is likely that continued work on the optimization process will give further improvements in recall for any given precision level.
The framework and the criteria form a complex multi-dimensional search space which would benefit from more advanced algorithms. Longer optimization runs may also yield further improvements. As a proof of the basic concept, this work shows that argumentation has an immediate application as a tool for categorizing positive and negative matches in search results from complex search engines.
Once the framework has been refined, it is less reflective of the experts reasoning which went into the original design. This may indicate faults in the original design. For example, the optimized framework may have attacks between claims which agree on the search result. Whilst this appears counter-intuitive, it may indicate that the basis of one claim is incompatible with that of the other claim. Therefore the optimized framework retains some usefulness as an interpretative tool, even if the attack structure is difficult to comprehend. A systematic study of the agreement between the reasoning derived from optimization and expert reasoning is left for future work.
More directed and informed refinements by an expert may achieve the same or even better results than automated optimization, with the advantage that the framework still reflects an individuals reasoning. In addition, the argumentation framework need not be fixed for all users of a particular bioinformatic tool. Different frameworks might be appropriate in different circumstances, and it is important to allow researchers to adjust the system to their preferences, in order to fulfill their personal expectations. A suitable user interface for this refinement must be developed to make this refinement and customization as easy as possible. The framework should be seen as a valuable starting point of generalized expert knowledge, rather than an end in itself.
5.1 Comparison with other methods
Decision trees are a simple technique for categorization based on a set of yes/no questions of exactly the same kind as the claim criteria described in earlier sections. They are widely used for prediction in biology and medicine. For example, proteins in an archeon genome have been categorized into soluble and insoluble categories using decision trees grown from examples and counterexamples (Christendat et al., 2000). Cellular migration in response to various conditions has been modelled using a simple decision tree (Hautaniemi et al., 2005). In both of these examples, a small number of discrete and continuous properties are brought together into a simple logical structure which can be comprehended easily. At each level of questioning, the complexity of the problem is reduced, until a conclusion is reached. Because the questions may be arbitrarily complex, no assumptions are made regarding the distribution of input data: it is reduced to a simple binary response. Decision trees may be easily constructed automatically using established algorithms, or by hand. These features make decision trees a compelling solution for analysis of certain datasets.
Argumentation frameworks provide a much higher-level representation than decision trees. They place much less emphasis on the process of reaching a decision, focusing rather on making explicit the factors that contribute to a decision. Argumentation is a dialectic technique at heart; a mechanism for coming to a conclusion has been added to it. A decision tree can be constructed to produce exactly the same outcomes as the argumentation methods described in this paper. We devised an algorithm to do this, so we could see what the corresponding decision trees would look like. A decision tree equivalent to the example framework in Figure 2 is shown in Figure 5. It has a total of 34 binary decision points. Assuming a uniform distribution across all possible claims, the mean number of questions asked before coming to a conclusion is just over four, the maximum is nine and the minimum is two.
|
Whilst the process associated with coming to a conclusion is generally simpler with the decision tree, the tree itself is fairly complex. The same question appears multiple times in different branches, making it a less succinct representation than argumentation. More importantly, decision trees focus on coming to a conclusion efficiently; argumentation frameworks, as used in this paper, have the same predictive power as decision trees, but have the additional advantage of more closely and concisely reflecting the reasoning and the underlying justifications than the reductionist decision tree approach. However, argumentation frameworks are more difficult to construct, partly as they contain more information than decision trees. We have taken the approach of designing one by hand, which is then refined by automated optimization using a random walk algorithm.
Bayesian networks (or Bayes nets) are more complex than argumentation frameworks. They make decisions based on networks of Bayesian inferences. They have been employed successfully in algorithms for predicting various features of proteins and in other biological applications. For example, Drawid and Gerstein (2000) developed a network to predict the normal subcellular compartment occupied by proteins in yeast, giving
75% accuracy on a test dataset. Proteinprotein interactions can also be predicted with
65% accuracy using Bayes nets (Jansen et al., 2003). These applications operate directly on a large amount of biological data, which is inherently a difficult and subtle task, and they model and make predictions about biological systems which are themselves probabilistic. Therefore, a probability-based approach is apt. Moreover, some features may have only a subtle effect on the property being predicted, and this may be modelled with smaller probabilities contributing to the final conclusion. When predicting proteinprotein interactions in yeast, for example, there are a small number of positive examples compared with the total number of possibilities. Fewer than 100 000 of the possible 18 million interactions actually occur, a positive rate of
0.5% (Jansen et al., 2003). This requires a technique which can discriminate on the basis of many features, each with a variable contribution to the prediction.
Our situation is quite different. We are evaluating the output of a tool which has already made predictions, and reduced the possible search space to 20 candidates. From this, we are trying to pick at least one good match, assuming a good match is present. The positive rate is at least 5%, and the search space is tiny by comparison. We are not trying to model the tool, or the biological process behind the predictions that it makes. Our aim is to model the reasoning process of the researcher who uses the tool and evaluates its output. Whilst human reasoning of this kind accommodates a degree of uncertainty, it does not use a complex probabilistic model. Bayesian nets and argumentation frameworks are thus not direct competitors. Argumentation theories have been developed as an approach to reasoning about uncertainty where even rough estimates of probabilities are not available or meaningful.
In summary, decision trees potentially have the same predictive power as argumentation frameworks. They are easier to create, but are also lower-level representations that focus on the process of making a decision rather than its justification. Bayesian networks lend themselves to modelling problems with an underlying probabilistic, quantitative basis, where the contribution of each parameter may be very subtle. Argumentation frameworks are appropriate where we wish to model the reasoning process such that it can be used to make automatic predictions, and justify these predictions to a person. An argumentation framework represents a miniature debate about each search result. The focus of this work has been on summarizing this as a simple conclusion derived from the framework. The content of a framework is richer than this, and may be presented in various ways so that the user may come to their own conclusion. Natural language may be generated (Reed and Grasso, 2001) to explain the status of the result. For example, The length of the match would normally imply a good result, however the sequence identity is low might represent a framework with a Low sequence identity claim attacking a Long match claim. The framework may be presented visually as in Figures 2 and 3. Such a presentation may allow the user to interact with the framework, e.g. removing arguments to see what effect this has on the undefeated claims and final conclusion.
5.2 Conclusion
Argumentation captures expert knowledge as a simple model, so it can be automatically applied to data to aid interpretation. It has potential applications in all areas of bioinformatics, where large and complex datasets are commonplace. Given the ever-increasing number of tools available to the biologist, a technique which helps to make sense of their output is timely. Argumentation can also help in pooling evidence from multiple tools and coming to a consensus decision, a technique successfully applied to protein structure prediction which should prove useful in other biological problems.
| Acknowledgments |
|---|
Funding to pay the Open Access publication charges for this article was provided by The Wellcome Trust.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Anna Tramontan
Received on June 29, 2005; revised on January 12, 2006; accepted on January 22, 2006
| REFERENCES |
|---|
|
|
|---|
Altschul, S.F., et al. (1990) Basic local alignment search tool. J. Mol. Biol, . 215, 403410[CrossRef][Web of Science][Medline].
Bland, M. An Introduction to Medical Statistics, (1989) , Oxford, UK Oxford University Press, pp. 149151.
Brenner, S.E., et al. (2000) The ASTRAL compendium for sequence and structure analysis. Nucleic Acids Res, . 28, 254256
Cayrol, C., Doutre, S., Mengin, J. (2003) On decision problems related to the preferred semantics for argumentation frameworks. J. Logic. Comput, . 13, 377403
Christendat, D., et al. (2000) Structural proteomics of an archaeon. Nat. Struct. Biol, . 7, 903909[CrossRef][Web of Science][Medline].
Drawid, A. and Gerstein, M. (2000) A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome. J. Mol. Biol, . 301, 10591075[CrossRef][Web of Science][Medline].
Dung, P.M. (1995) On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell, . 77, 321357.
Fox, J. (2003) Probability, logic and the cognitive foundations of rational belief. J. Appl. Logic, 1, 197224[CrossRef].
Fox, J., Glasspool, D., Bury, J. (2001) Quantitative and qualitative approaches to reasoning under uncertainty in medical decision making. Proceedings of the AIME 2001Cascais, Portugal, 2101 , pp. 272282.
Hautaniemi, S., et al. (2005) Modeling of signal response cascades using decision tree analysis. Bioinformatics, 21, 20272035
Jansen, R., et al. (2003) A Bayesian networks approach for predicting proteinprotein interactions from genomic data. Science, 302, 449453
Kelley, L.A., et al. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol, . 299, 499520[Web of Science][Medline].
Krause, P.J., et al. (1995) A logic of argumentation for reasoning under uncertainty. Comp. Intell, . 11, 113131.
Krause, P., et al. (1998) Qualitative risk assessment fulfils a need. Lect. Notes Comput. Sci, . 1455, 138156.
Marshall, C.C. (1989) Representing the structure of a legal argument. Proceedings of the 2nd International Conference on Artificial Intelligence and Law , pp. 121127.
(2004) Review of argumentation technology: state of the art, technical and user requirements. Proceedings of the ASPIC Consortium.
Murzin, A.G., et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol, . 247, 536540[CrossRef][Web of Science][Medline].
Reed, C. and Grasso, F. (2001) Computational models of natural language argument. Proceedings of the ICCS 2001 Springer-Verlag 2073, , pp. 9991008.
Toulmin, S.E. The Uses of Argument, (1958) , Cambridge, UK Cambridge University Press.
Upshur, R.E.G. and Colak, E. (2003) Argumentation and evidence. Theor. Med. Bioeth, . 24, 283299[CrossRef][Web of Science][Medline].
This article has been cited by other articles:
![]() |
M. R. Aniba, S. Siguenza, A. Friedrich, F. Plewniak, O. Poch, A. Marchler-Bauer, and J. D. Thompson Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis Brief Bioinform, January 1, 2009; 10(1): 11 - 23. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Kramer, W. Hanel, F. Shen, N. Isik, J. P. Malone, A. Maitra, W. Sigurdson, D. Swart, J. Tocker, T. Jin, et al. Cutting Edge: Identification of a Pre-Ligand Assembly Domain (PLAD) and Ligand Binding Site in the IL-17 Receptor J. Immunol., November 15, 2007; 179(10): 6379 - 6383. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||









