

Bioinformatics Advance Access originally published online on January 20, 2005
Bioinformatics 2005 21(9):1964-1970; doi:10.1093/bioinformatics/bti287

Published by Oxford University Press 2005.

Small, fuzzy and interpretable gene expression based classifiers

Staal A. Vinterbo *, Eun-Young Kim and Lucila Ohno-Machado

Decision Systems Group, Brigham and Women's Hospital, and Division of Health Sciences and Technology, Harvard Medical School/Massachusetts Institute of Technology, Boston, MA, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODS
 3 EXPERIMENTS
 4 RESULTS
 5 DISCUSSION
 REFERENCES
 

Motivation: Interpretation of classification models derived from gene-expression data is usually not simple, yet it is an important aspect in the analytical process. We investigate the performance of small rule-based classifiers based on fuzzy logic in five datasets that are different in size, laboratory origin and biomedical domain.

Results: The classifiers resulted in rules that can be readily examined by biomedical researchers. The fuzzy-logic-based classifiers compare favorably with logistic regression in all datasets.

Availability: Prototype available upon request.

Contact: staal{at}dsg.harvard.edu


    1 INTRODUCTION
 
Complex data mining algorithms, such as support vector machines, neural networks and logistic regression, have been used in the classification of gene-expression data. Usually, they produce models that are not easily interpretable by biologists and biomedical researchers, given the high number of variables and parameters. If simple but accurate rules could be induced from relatively small training samples, interpretation of the models would be greatly facilitated. As humans are better able to interpret simple rules such as ‘if A is upregulated and B is downregulated, then the probability of disease X is high’ than equations involving several coefficients, interaction terms and constants, some authors have proposed the use of rule-based systems for the analysis of data originating from microarrays. Some of these rely on experts or the literature to directly determine rules to be used to construct gene networks or the constraints that need to be considered (Ressom et al., 2003; Woolf and Wang, 2000; Kauffman et al., 2003). Given that expert knowledge is still limited to a small number of genes, data mining approaches have also been utilized. There is a broad range of methods used to induce rules from training data. Unsupervised rule learning was reported by Creighton and Hanash (2003) and Becquet et al. (2002). There have also been reports of supervised rule induction. Some authors take advantage of tree-like algorithms that are well developed in the literature and have several different variants (Soinov et al., 2003; Li et al., 2003). Others use different formulations of association rule induction algorithms (Hvidsten et al., 2003; Gamberger et al., 2004). In this work, we present a rule induction and filtering strategy based on fuzzy sets and compare it with benchmark logistic regression models.

Several classification algorithms rely on discretization of values into a small number of categories. Discretization is also useful because it makes data simpler to interpret. The precision of values that are obtained from microarrays looks high. However, the meaning of this apparent precision is debatable. The values are read from scanners that measure the degree of fluorescence in different color channels. The process is prone to saturation and other artifacts, and biologists often rely on the experiments to get an idea of the degree of mRNA abundance, rather than to obtain absolute measures of gene expression. Given that humans interpret the values in terms of broad categories such as upregulation, neutral or downregulation, it may be possible (and desirable) to simplify the numerical scale into an ordinal one. Furthermore, it may be possible to use this simplification to produce simple rules that can be easily interpreted by humans, instead of real numbers that get multiplied by certain coefficients to generate a classifier (as is the case in other modeling approaches). Two types of algorithms are necessary for the induction of these rules from the data: algorithms for categorization of continuous values, and algorithms for rule discovery and filtering that result in small, interpretable rules. ‘Crisp’ discretization (Butterworth et al., 2004; Jaroszewicz et al., 2004), as opposed to fuzzy discretization, does not take into account that values at the borderline between value categories may be very similar. By associating values of membership in different categories, it is possible, for example, to describe a sample that is at the high end of the ‘low’ category. Note that fuzzy memberships do not correspond to probabilities, and their operators also differ from those used in the probabilistic framework. Fuzzy set based calculations have been referred to as ‘computing with words’ rather than computing with numeric values.

The algorithms we present in this paper combine fuzzy discretization (assignment of degrees of membership to the discrete categories of values such as low, medium and high, or benign and malignant) and fuzzy operators with rule induction and filtering algorithms that are specifically developed to produce a small number of short rules that are useful for model interpretation. In a way, this work extends that of Ohno-Machado et al. (2002) in that the framework for fuzzy discretization is generalized, the induction and filtering of rules are automated, and a larger number of datasets is subject to comparison.


    2 METHODS
 
The method applied has four main parts. In order of application, they are gene preselection, learning of fuzzy memberships, rule synthesis and rule filtering.

For each preselected gene (Section 2.1), ordered qualitative levels that are easy to understand, such as low (for downregulated genes), medium (for neutral) and high (for upregulated), are chosen. Next, the membership functions for each of these levels are determined, and each value in a copy of the dataset is substituted with the label of the level corresponding to maximal membership (Section 2.3 and Table 1). The resulting discretized dataset is used to synthesize propositional rules (Section 2.4 and Example 1). The resulting rule set is applied in turn to each element in the dataset by using Equation (1), creating the matrix [Equation (3)] needed for the final step, the removal of redundant rules (Section 2.5).


Table 1 Example data table where each expression measurement g(x) is replaced with L(g, x)

 
The final set of rules can be used to determine the class of any new, unseen element represented by the measurements needed, by applying Equation (2).

2.1 Gene preselection
When solving a classification problem with microarray data, a preselection process is important for three reasons: the measurements are known to contain a high level of noise, computational costs are high when complex classifiers are involved, and the low ratio of samples to measurements increases the probability of finding spurious relations in the data. Cluster analysis is a popular approach in microarray data analysis, but it focuses on group similarity, not on the differential expression of each gene (Thomas et al., 2001). Many other methods, such as the t-test, the Wilcoxon rank sum test and regression modeling, have been applied (Golub et al., 1999; Thomas et al., 2001; Troyanskaya et al., 2002). Microarray data are usually not normally distributed (Hunter et al., 2001), even after log-transformation, so non-parametric methods may be applicable. Troyanskaya et al. (2002) suggested that the Wilcoxon rank sum test is a conservative and robust method, so we applied it as a preselection method to build our classifiers with a small number of significant genes.
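The preselection step can be illustrated with a minimal sketch. It assumes two classes, uses the normal approximation to the rank sum statistic without a tie correction, and ranks genes by the absolute z score. The function names, the data layout (a dictionary from gene symbol to a list of values) and the cut-off `k` are hypothetical and are not taken from the authors' prototype.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(group_a, group_b):
    """z score of the rank sum of group_a (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])
    mean = n_a * (n_a + n_b + 1) / 2
    sd = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean) / sd

def preselect(expression, labels, positive, k):
    """Keep the k genes whose two class groups differ most by |z|."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == positive]
        b = [v for v, y in zip(values, labels) if y != positive]
        scores[gene] = abs(rank_sum_z(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In practice a statistics library would normally supply the test and its p-value; the hand-written version here only keeps the sketch free of dependencies.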

2.2 Fuzzy rules
Let $U$ be a set of tissue samples, let $G = \{g_j\}_j$ be a set of gene symbols, let $C$ be a set of class labels and let $c : U \to C$ be a partial function that assigns class labels to elements in $U' \subseteq U$.

Let $g(x)$ denote the value of expression of the gene $g$ in tissue sample $x$. Following the assumptions above, we consider genes to be regulated according to a qualitative level such as up (u), neutral (n) or down (d). Let $L$ be the set of such levels. With each gene and qualitative level we associate a fuzzy set, and we let $\mu(l, g, x)$ be the membership of $x$ in the fuzzy set associated with gene $g$ and level $l$.

Let the set of descriptors over $G$ be $D = G \times L$. Traditionally, these descriptors are viewed in a propositional context, and the semantics of a descriptor $d = (g, l)$ with respect to $U$ is defined as $\{x \in U \mid g(x) = l\}$. This means that $d$ can be viewed as a characteristic function for the set of elements in $U$ for which $g(x) = l$. We extend this view to a membership function and let the descriptor $d$ serve as the membership of $g(x)$ in $l$. In other words, for a descriptor $d = (g, l)$, we have $d(x) = \mu(l, g, x)$. This allows the direct extension of the classical crisp set conjunction (and disjunction) as min (and max) of the two characteristic functions to the standard fuzzy case. We define a rule to be an element of $R = 2^D \times C$. For a rule $r = (D', c)$, we denote the antecedent $D'$ by ant($r$) and the consequent $c$ of $r$ by cons($r$). Further, an application $r(x)$ of a rule $r = (D', c)$ to an element $x \in U$ is defined as

$$r(x) = \min_{d \in D'} d(x). \qquad (1)$$
We view $r(x)$ as being the membership of $x$ in class $c$ according to $r$. We extend this notion of membership to a set of rules $R$ and use it to classify as follows. Given a class $c$, the membership of $x$ in $c$ according to $R$ is

$$R_c(x) = \max\{\, r(x) \mid r \in R,\ \operatorname{cons}(r) = c \,\}.$$

We can now assign a class to x by choosing the one with maximal membership.
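A minimal sketch of rule application follows. It assumes that a rule is applied by taking the min over the memberships of the descriptors in its antecedent (the fuzzy conjunction described above) and that rules sharing a consequent are combined with max (the fuzzy disjunction). The representation of a rule as an `(antecedent, consequent)` pair, the function names and the numeric values are hypothetical.

```python
def apply_rule(rule, memberships):
    """Membership of a sample in the rule's class: min over antecedent descriptors.

    memberships maps a descriptor (gene, level) to mu(level, gene, x) for one sample x.
    """
    antecedent, _consequent = rule
    return min(memberships[d] for d in antecedent)

def class_membership(rules, label, memberships):
    """Max of r(x) over all rules whose consequent is `label` (0.0 if there are none)."""
    values = (apply_rule(r, memberships) for r in rules if r[1] == label)
    return max(values, default=0.0)

# Hypothetical memberships for a single sample and two genes.
sample = {("g1", "up"): 0.8, ("g1", "down"): 0.2,
          ("g2", "up"): 0.3, ("g2", "down"): 0.7}
rules = [
    (frozenset({("g1", "up"), ("g2", "down")}), "c1"),
    (frozenset({("g1", "down")}), "c2"),
    (frozenset({("g2", "up")}), "c2"),
]
by_class = {c: class_membership(rules, c, sample) for c in ("c1", "c2")}
predicted = max(by_class, key=by_class.get)
```

Here the single rule for `c1` yields min(0.8, 0.7), while the two rules for `c2` yield max(0.2, 0.3), so `c1` has the larger membership.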

It is sometimes useful to be able to reject a classification when it is not possible to be sufficiently certain that a case belongs to any of the classes. We incorporate this notion as follows. Let

Also associate with each class label $c \in C$ a threshold $t_c$. The threshold $t_c$ is the membership value below which we reject a classification into class $c$. We can now formally define what we mean by classification. Given the associated thresholds $t_c$, we define the classification of $x$ according to $R$ to be the set of classes that share the maximal membership according to $R$. Formally,

(2)
We choose to reject the classification if $\mathrm{class}_R(x)$ contains more than one element. This happens when either all classes were rejected or several classes share the same maximal membership for $x$.
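The rejection behaviour can be sketched as follows. This is one reading that is consistent with the text and not necessarily the authors' exact definition: a class whose membership falls below its threshold is assumed to be treated as having membership 0, so that when every class is rejected all classes tie and the classification is rejected. The function name and the use of `None` to signal rejection are hypothetical.

```python
def classify(class_memberships, thresholds):
    """Return the single winning class label, or None when the classification is rejected."""
    # Assumption: memberships below the class threshold are replaced by 0.
    kept = {c: (m if m >= thresholds[c] else 0.0)
            for c, m in class_memberships.items()}
    best = max(kept.values())
    winners = [c for c, m in kept.items() if m == best]
    # Reject when several classes share the maximum (including the all-rejected case).
    return winners[0] if len(winners) == 1 else None
```

For example, with thresholds 0.5 for `c1` and 0.2 for `c2`, memberships of 0.4 and 0.1 are both below their thresholds, both classes tie at 0, and the function returns `None`.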

2.3 Learning membership functions and rejection thresholds
In order to apply a set of rules, we need to know the membership functions corresponding to the descriptors in the rules, and the rejection thresholds for each class label occurring in the rules.

We propose here a simple scheme such that, given a set $U' \subseteq U$, we can learn both the membership functions and the rejection thresholds.

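Let $v < w$ and let the ramp function $\rho$ be defined as

$$\rho(x; v, w) = \begin{cases} 0 & \text{if } x \le v, \\ \dfrac{x - v}{w - v} & \text{if } v < x < w, \\ 1 & \text{if } x \ge w. \end{cases}$$

Let $s = v_0, v_1, \ldots, v_{n-1}$ be an increasing sequence of real numbers of size $n$. We associate $n$ functions with $s$ in the following way:

$$\mu_i(x) = \begin{cases} 1 - \rho(x; v_0, v_1) & \text{if } i = 0, \\ \min\bigl(\rho(x; v_{i-1}, v_i),\ 1 - \rho(x; v_i, v_{i+1})\bigr) & \text{if } 0 < i < n - 1, \\ \rho(x; v_{n-2}, v_{n-1}) & \text{if } i = n - 1. \end{cases}$$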

A sequence of length three results in the triangular functions shown in Figure 1. We now order our set of labels $L$ such that the meaning of label $l_i$ is qualitatively less than the meaning of label $l_j$ for $0 \le i < j \le n - 1$. An example would be $L = \{l_0, l_1\}$, where $l_0$ = ‘down’ and $l_1$ = ‘up’. Associate with each gene $g$ an increasing sequence of real numbers $s_g$ of length $n = |L|$. We now let $\mu(l_i, g, x) = \mu_i(g(x))$, where $\mu_i$ is defined as above for the sequence $s_g$.



Fig. 1 Functions for sequence of length three.

 
We propose the use of quantiles over the observed expression values for gene $g$ in $U'$ as the sequence $s_g$ if nothing else is dictated by expert knowledge. We used $n = 2$, $v_0 = \min(g(U'))$ and $v_1 = \max(g(U'))$ in our experiments.
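The construction of membership functions from a breakpoint sequence can be sketched as below. The sketch assumes a ramp that rises linearly from 0 at `v` to 1 at `w`, shoulder-shaped functions for the first and last labels, and triangular functions in between. These shapes are an assumption consistent with the description of Figure 1, and the function names are hypothetical.

```python
def ramp(x, v, w):
    """Rise linearly from 0 at v to 1 at w (requires v < w)."""
    if x <= v:
        return 0.0
    if x >= w:
        return 1.0
    return (x - v) / (w - v)

def membership_functions(seq):
    """Return one membership function per breakpoint of the increasing sequence seq."""
    n = len(seq)

    def make(i):
        def mu(x):
            # Rising edge from the previous breakpoint (absent for the first label).
            rising = ramp(x, seq[i - 1], seq[i]) if i > 0 else 1.0
            # Falling edge towards the next breakpoint (absent for the last label).
            falling = 1.0 - ramp(x, seq[i], seq[i + 1]) if i < n - 1 else 1.0
            return min(rising, falling)
        return mu

    return [make(i) for i in range(n)]

# n = 2 as in the experiments: breakpoints are the observed minimum and maximum.
observed = [2.0, 4.0, 10.0]  # hypothetical expression values for one gene
down, up = membership_functions([min(observed), max(observed)])
```

With these breakpoints, a value of 4.0 lies a quarter of the way from the minimum to the maximum, so `up(4.0)` is 0.25 and `down(4.0)` is 0.75.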

Having determined the membership functions, we propose the following rejection threshold for class c:

The threshold tc is then the minimal non-zero membership assigned over U'. We assume that careful selection of the training set U', preferably where the partial function c is defined, is beneficial.

2.4 Learning rule descriptors
We apply the minimum description length principle (Rissanen, 1984) in a way similar to what is standard in Rough Set theory (Pawlak, 1982). The idea is to find a set of rules that discerns between the classes and needs a minimum of information. We interpret this somewhat vague statement as finding, for each object in the available learning dataset, a minimal length rule that classifies this element according to the labels that exist in this dataset.

Let $U' \subseteq U$ be such that $c$ is defined on $U'$, i.e. the class labels for elements in $U'$ are known, and such that the expression levels $g_j(x)$ for all $g_j \in G$ are known for $U'$, making $U'$ a feasible training set.

Let $L(g, x)$ denote the set of labels in $L$ for which the expression value $g(x)$ has maximal membership. Formally,

$$L(g, x) = \{\, l \in L \mid \mu(l, g, x) = \max_{l' \in L} \mu(l', g, x) \,\}.$$

Let $\delta : G \times U^2 \to \{0, 1\}$ be defined such that $\delta(g_i, x_j, x_k) = 1$ if and only if gene $g_i$ can be used to discern between tissue samples $x_j$ and $x_k$. We will call $\delta$ the discernibility predicate. We define $\delta$ to return 1 if and only if $L(g_i, x_j) \neq L(g_i, x_k)$.

Akin to Skowron and Rauszer (1992), we define $m_\delta : U^2 \times 2^G \to 2^G$ to be

$$m_\delta(x_i, x_j, G) = \{\, g \in G \mid \delta(g, x_i, x_j) = 1 \,\}.$$

Given a set of genes $G$, the discernibility predicate $\delta$, and $x_i, x_j \in U$, $m_\delta(x_i, x_j, G)$ is the set of genes from $G$ such that each discerns between $x_i$ and $x_j$ according to $\delta$. If a rule is to discern between $x_i$ and $x_j$ using genes from $G$, it will have to contain at least one descriptor containing some $g \in m_\delta(x_i, x_j, G)$. Extending this, if the rule is to discern between $x_i$ and $x_j$ for all $j \neq i$, it needs to contain at least the descriptors corresponding to the genes in a set $G_i'$ such that $G_i' \cap m_\delta(x_i, x_j, G) \neq \emptyset$ for each $j \neq i$. As we are interested in discerning between the classes only, we can ignore those $x_j$ for which the class is the same as for $x_i$, including only $j \in I_i$, where

$$I_i = \{\, j \mid c(x_j) \neq c(x_i) \,\}.$$

Furthermore, as we are looking for a minimal length rule, we wish to minimize $|G_i'|$. The resulting optimization problem corresponds exactly to the Minimum Hitting Set (MHS) problem (Garey and Johnson, 1979) for the collection $\{m_\delta(x_i, x_j, G)\}_{j \in I_i}$. The MHS problem is known to be NP-hard, but approximable within $1 + \ln m$, where $m$ is the number of sets in the collection [for references see Ausiello et al. (1999)].
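The collection of gene sets that a rule for one sample has to hit can be sketched as follows. The data are hypothetical and do not reproduce Table 1; the function names and the layout (a dictionary from gene to one label set per sample) are likewise assumptions.

```python
def max_labels(memberships):
    """Set of levels with maximal membership; memberships maps level -> value."""
    best = max(memberships.values())
    return frozenset(level for level, m in memberships.items() if m == best)

def hitting_set_collection(labels, classes, i):
    """For sample i, list the discerning gene sets against every sample of another class.

    labels maps gene -> list of label sets, one per sample.
    A gene discerns two samples when their label sets differ.
    """
    collection = []
    for j in range(len(classes)):
        if classes[j] != classes[i]:
            genes = frozenset(g for g in labels if labels[g][i] != labels[g][j])
            collection.append(genes)
    return collection

# Hypothetical discretized data: four samples, three genes, levels 'u' and 'd'.
u, d = frozenset({"u"}), frozenset({"d"})
labels = {"g1": [u, u, d, d], "g2": [u, d, u, d], "g3": [d, d, d, u]}
classes = ["a", "a", "b", "b"]
collection_for_first = hitting_set_collection(labels, classes, 0)
```

For the first sample, only `g1` separates it from the third sample, while all three genes separate it from the fourth. An empty set in the collection would mean that two samples of different classes cannot be discerned with the available genes.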

Once a set $G_i'$ has been found, a set of rules $R_i$ is constructed as follows. First, a set of $|G_i'|$ tuples $\{(g_l, L(g_l, x_i)) \mid g_l \in G_i'\}$, which we will call an antecedent, is created. It is possible that a tuple in the antecedent contains more than one label. We deal with this ambiguity as follows. Let the collection $T_i$ contain only this antecedent (set of tuples). If an antecedent in $T_i$ contains a tuple $(g_l, \{l_1, l_2\})$ with two elements in the second coordinate, essentially representing a disjunction, make a copy of that antecedent, set the original antecedent tuple to be $(g_l, \{l_1\})$ and set the corresponding copy antecedent tuple to be $(g_l, \{l_2\})$. Repeat this step until no further changes can be made. This essentially uses the distributive (and commutative) laws to transform a Boolean expression from conjunctive normal form into disjunctive normal form. For example,

Now we have a collection of antecedents, each of which contains tuples with exactly one label.

Transform each tuple $(g, \{l\})$ of an antecedent to be $(g, l)$, i.e. replace each singleton label set with the label it contains. The final step is to construct $R_i$ by taking each antecedent $D'$ in $T_i$ and forming a rule $(D', c(x_i))$ in $R_i$. Usually $R_i$ will contain one rule, as the disjunction above is not all that common.
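The expansion of an antecedent with ambiguous labels into single-label rules can be sketched with a Cartesian product over the label sets. This is an illustrative shortcut for the copy-and-split procedure described above, not the authors' code; the names and data are hypothetical.

```python
from itertools import product

def rules_for_sample(genes, labels, i, classes):
    """Build one rule per way of picking a single label for each selected gene.

    labels maps gene -> list of label sets, one per sample; genes is the hitting set.
    """
    # Each gene contributes its candidate descriptors (gene, label) for sample i.
    options = [[(g, label) for label in sorted(labels[g][i])] for g in genes]
    return [(frozenset(combo), classes[i]) for combo in product(*options)]

# Hypothetical case: g2 has two labels of equal maximal membership for the first sample.
labels = {"g1": [frozenset({"u"})], "g2": [frozenset({"d", "u"})]}
rules = rules_for_sample(["g1", "g2"], labels, 0, ["a"])
```

The ambiguous tuple for `g2` yields two rules with consequent `a`, one with the descriptor `("g2", "d")` and one with `("g2", "u")`. When every label set is a singleton, exactly one rule results.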

EXAMPLE 1. Let Table 1 be a data table ‘discretized’ using the labels of maximal membership. Let the rows correspond to the elements $U' = \{x_1, x_2, x_3, x_4\}$, let $G = \{g_1, g_2, g_3\}$, and let $\delta$ be the discernibility predicate. We then have that

Further we have that

Since $\delta$ is symmetric, so is $m_\delta$. Combining these, we have that for each element, the collections for which we have to solve the MHS problem are:

The corresponding sets of solutions are:

If we choose the solutions marked with an asterisk (*), we get the following unique rules:

2.5 Rule filtering
A rule represents a projection of a set of data items onto the subdimensions found in the descriptors of its antecedent. As there are many equivalent projections for each data item that can be used to construct a particular rule, and the choice is made somewhat arbitrarily, as in Example 1 above, an implementation will sometimes give the user the option of generating rules from multiple projections and combining the results using a voting scheme. The result is a larger coverage of the input space. An implementation of this is found in the ROSETTA system (Komorowski et al., 2002). One of the main drawbacks of this approach is the large number of rules it produces. In order to reduce the number of rules, one can filter them empirically, i.e. keep the ones that seem to work best on selected data. Ågotnes (1999) presents and discusses several ways of filtering propositional rules. We employ a simple set-cover-based filtering approach that is efficient with respect to both computational resources and efficacy. To the best of our knowledge, this approach is novel.

According to the way we present rule application above, each rule can be interpreted as modeling a membership function in the set corresponding to its consequent. Let $U' \subseteq U$ be the set we wish to filter the rules $R$ on. Recall that $c$ is a partial function on $U$ returning the class label of elements in its domain. We assume that $c$ is defined on $U'$. Let $m_{ij}$, corresponding to the application of rule $r_i$ to element $x_j$, be defined as

$$m_{ij} = \begin{cases} r_i(x_j) & \text{if } \operatorname{cons}(r_i) = c(x_j), \\ -r_i(x_j) & \text{otherwise.} \end{cases} \qquad (3)$$
This means that $m_{ij}$ contains the membership value that rule $r_i$ assigns to its consequent for element $x_j$ if $r_i$ is ‘correct’. Otherwise $m_{ij}$ is assigned the negative of that membership. We note that, according to how we defined the application of fuzzy rules, given a set $R' \subseteq R$ and an element $x_j \in U'$, $R'$ correctly classifies $x_j$ if the absolute value of the most negative element of $\{m_{ij} \mid r_i \in R'\}$ is smaller than the value of the most positive element of the same set. We use this to formulate the filtering problem as finding the smallest subset $R'$ such that it correctly classifies a maximum of elements in $U'$, essentially a weighted set cover problem.
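The signed membership matrix and the correctness check can be sketched as follows, assuming rule application by min over the antecedent as described above. When a subset contains no 'incorrect' rule for a sample, the most negative value is taken to be 0, which is an assumption of this sketch; function names and data are hypothetical.

```python
def signed_memberships(rules, samples, classes):
    """m[i][j] is r_i(x_j) when the consequent of r_i equals c(x_j), else -r_i(x_j).

    samples holds one descriptor -> membership dictionary per sample.
    """
    matrix = []
    for antecedent, consequent in rules:
        row = []
        for memberships, label in zip(samples, classes):
            value = min(memberships[d] for d in antecedent)
            row.append(value if consequent == label else -value)
        matrix.append(row)
    return matrix

def correctly_classified(matrix, subset, j):
    """True when the strongest correct rule in `subset` outweighs the strongest wrong one."""
    column = [matrix[i][j] for i in subset]
    return abs(min(min(column), 0.0)) < max(column)

# Hypothetical data: two samples, one gene, one rule per class.
samples = [{("g1", "u"): 0.9, ("g1", "d"): 0.1},
           {("g1", "u"): 0.2, ("g1", "d"): 0.8}]
classes = ["a", "b"]
rules = [(frozenset({("g1", "u")}), "a"), (frozenset({("g1", "d")}), "b")]
m = signed_memberships(rules, samples, classes)
```

With both rules, each sample is classified correctly. With only the second rule, the first sample has no positive entry in its column and is therefore not classified correctly.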

2.6 Implementation issues
Let $\mathcal{S}$ be a collection of sets, and let $n$ be the cardinality of the union $G$ of the elements in $\mathcal{S}$. Then, the number of minimal hitting sets of $\mathcal{S}$ is bounded by the size of an anti-chain in the lattice $2^G$ under set inclusion. This bound is exponential in the size of $G$. The MHS problem is to find a minimal hitting set with minimum cardinality. The algorithm we use to find an approximate solution to the MHS problem is a greedy algorithm based on Johnson's greedy algorithm (Johnson, 1974) for the Minimum Set Cover problem. Starting with an empty set as a candidate solution, it greedily selects an element with the maximal number of occurrences in $\mathcal{S}$, adds this to the candidate solution, and deletes all sets in $\mathcal{S}$ containing this element. This is repeated until $\mathcal{S}$ is empty, and the candidate solution set is returned as the solution. This method is guaranteed to return a minimal hitting set (but not necessarily one of minimum cardinality). A similar weighted algorithm was used to find an approximate solution to the problem of empirical rule filtering.
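The greedy procedure described above can be sketched as follows. The sketch follows the description literally and adds no separate pruning pass. The alphabetical tie-breaking and the error raised for an empty set are assumptions made to keep the example deterministic and well defined, and the function name is hypothetical.

```python
from collections import Counter

def greedy_hitting_set(collection):
    """Approximate a minimum hitting set by repeatedly picking the most frequent element."""
    remaining = [set(s) for s in collection]
    if any(not s for s in remaining):
        raise ValueError("an empty set cannot be hit")
    solution = set()
    while remaining:
        counts = Counter(e for s in remaining for e in s)
        # Assumption: ties are broken by sorted order of the elements.
        best = max(sorted(counts), key=counts.get)
        solution.add(best)
        # Drop every set that the chosen element already hits.
        remaining = [s for s in remaining if best not in s]
    return solution
```

For the collection `[{"g1", "g2"}, {"g2", "g3"}, {"g3", "g4"}]`, the first pick is `g2` (two occurrences, first in sorted order among ties), which leaves only `{"g3", "g4"}`, and the second pick is `g3`.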


    3 EXPERIMENTS
 
In order to validate our approach, we compare our methods with logistic regression, the de facto standard classification method in the biomedical field. The comparison is made in terms of classification performance as measured by the C-index or, equivalently, the area under the Receiver Operating Characteristic Curve (AUC) (Harrell et al., 1982; Hanley and McNeil, 1982). In order to make a robust assessment of differences, we applied the 5 × 2 cross-validation test proposed by Alpaydin (1999), an improvement over the test proposed by Dietterich (1998).
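The two evaluation quantities can be sketched as follows. The AUC is computed as the fraction of (positive, negative) score pairs that are ordered correctly, counting ties as one half. The combined 5 × 2 cross-validation statistic follows the form given by Alpaydin (1999), which is compared against an F distribution with 10 and 5 degrees of freedom. Applying it to differences in AUC rather than error rates is an assumption of this sketch, and the function names are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Probability that a positive case scores higher than a negative one (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def combined_5x2cv_f(diffs):
    """Combined 5 x 2 CV F statistic.

    diffs holds five pairs (d1, d2): the performance difference between the two
    classifiers on each fold of each of the five 2-fold replications.
    """
    numerator = sum(d1 * d1 + d2 * d2 for d1, d2 in diffs)
    variance_sum = 0.0
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variance_sum += (d1 - mean) ** 2 + (d2 - mean) ** 2
    return numerator / (2 * variance_sum)
```

A p-value would be obtained from the upper tail of the F(10, 5) distribution, for which a statistics library is normally used.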

3.1 Datasets
We used five published cancer datasets (Bhattacharjee et al., 2001; Febbo et al., 2002; Golub et al., 1999; Ramaswamy et al., 2001; NCBI-NLM, 2004, http://www.ncbi.nlm.nih.gov/geo/). After preselection, all datasets except the sdata dataset contained measurements for 200 genes. The sdata dataset contained 130 such measurements. The SAGE dataset sdata was compiled from datasets taken from the National Center for Biotechnology Information's Gene Expression Omnibus. The dataset included samples with over 10 000 total tags. In each sample, tags with a count of 1 were removed because they were considered noise (Lash et al., 2000). The expression value unit used was tags per million (TPM), computed as (1 000 000 × count)/(total tags).
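The TPM conversion can be sketched as follows. Whether the total tag count is taken before or after removing singleton tags is not stated in the text; this sketch assumes it is the total over all tags in the sample before filtering. The function name and the tag sequences are hypothetical.

```python
def tags_per_million(tag_counts):
    """Drop tags seen only once and scale the remaining counts to tags per million."""
    # Assumption: the total is computed over all tags, before singletons are removed.
    total = sum(tag_counts.values())
    return {tag: 1_000_000 * count / total
            for tag, count in tag_counts.items() if count > 1}

tpm = tags_per_million({"AAA": 3, "CCC": 1, "GGG": 6})
```

With a total of 10 tags, the tag counted 3 times maps to 300 000 TPM, and the singleton tag is dropped.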


    4 RESULTS
 
The results indicate that the fuzzy classifiers performed better than the logistic regression classifiers using a comparable number of genes. However, the performance difference between the paired results was not statistically significant. The results across all datasets are summarized in Table 3, and the results broken down per dataset are summarized in Table 4.


Table 3 Overall results

 

Table 4 Per dataset results

 
As an example of what a fuzzy rule classifier looks like, we chose the smallest generated fuzzy classifier for the well-known leukemia dataset. The four rules comprising this classifier, which use five genes in total, are shown in Table 5. The genes used in this set of rules and in the corresponding logistic regression classifier can be seen in Table 6. The performance of the fuzzy rules measured by the AUC was 0.962 with a standard error of 0.03, computed as proposed by Hanley and McNeil (1982). The corresponding values for the logistic regression were 0.878 with a standard error of 0.05. Figures 2 and 3 contain heat maps and dendrograms of the hierarchical complete linkage clustering of the data with Euclidean distance, using the genes in the rules and the genes selected by SAS for logistic regression, respectively.
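The standard error of an AUC estimate according to Hanley and McNeil (1982) can be sketched as follows. The function name is hypothetical, and the group sizes in the usage lines are illustrative values, not the sizes of the leukemia dataset.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC for n_pos positive and n_neg negative cases."""
    q1 = auc / (2 - auc)              # P(two random positives both outrank a negative)
    q2 = 2 * auc ** 2 / (1 + auc)     # P(one positive outranks two random negatives)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

se_chance = hanley_mcneil_se(0.5, 10, 10)
se_perfect = hanley_mcneil_se(1.0, 10, 10)
```

The estimate shrinks as the AUC approaches 1 and as the number of cases in either group grows.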


Table 5 The fuzzy rules for the leukemia dataset example

 

Table 6 Selected genes for the leukemia data set example

 


Fig. 2 Heat map and clustering of the leukemia data using genes found in the fuzzy rule classifier.

 


Fig. 3 Heat map and clustering of the leukemia data using genes selected by the SAS system for the logistic regression classifier.

 

    5 DISCUSSION
 
We present a fuzzy set framework for implementation of classifiers. We illustrate our implementation using five different gene expression datasets drawn from different laboratories and measurement technologies. Our models result in short rules that are easy to interpret and exhibit classification performances that compare favorably with those of logistic regression models, one of the standard classification methodologies applied in the biomedical domain.

This study has important limitations. The sample sizes were extremely small, and there may be an optimistic bias reflected in the results even with the use of a 5 x 2 cross-validation scheme. We compared our classifiers with those derived using multivariate logistic regression models. The latter have theoretical limitations that, although difficult to verify in practice, may come into play in one or more datasets that we utilized. If, for example, the data are not linearly separable, then more flexible models such as logistic regression models with interaction terms, artificial neural networks, or support vector machines with non-linear kernels might be more appropriate.

More importantly, although our rules were cross-validated in previously unseen samples, their biological significance needs to be verified. It may be the case that the rules have statistical significance but that their translation into biological significance is meaningless. If this is the case, our argument would not be true in practice. Extensive investigation involving domain experts and other measurement instruments would be necessary to demonstrate that the approach we propose is superior to the current status quo.


Table 2 The five datasets summarized

 


    Acknowledgments
 
This work was supported in part by grant 1 R01 LM07273 from NIH.

Received on August 13, 2004; revised on November 18, 2004; accepted on December 17, 2004

    REFERENCES
 

    Ågotnes, T. (1999) Filtering large propositional rule sets while retaining classifier performance. Master's thesis, Norwegian University of Science and Technology.

    Alpaydin, E. (1999) Combined 5 × 2 CV F test for comparing supervised classification learning algorithms. Neural Comput., 11, 1885–1892.

    Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M. (1999) Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer-Verlag, Berlin, Heidelberg.

    Becquet, C., Blachon, S., Jeudy, B., Boulicaut, J.-F., Gandrillon, O. (2002) Strong-association-rule mining gene-data analysis: a case study on human SAGE data. Genome Biol., 3, RESEARCH0067.

    Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA, 98, 13790–13795.

    Butterworth, R., Simovici, D., Santos, G.S., Ohno-Machado, L. (2004) A greedy algorithm for supervised discretization. J. Biomed. Inform., 37, 285–292.

    Creighton, C. and Hanash, S. (2003) Mining gene expression databases for association rules. Bioinformatics, 19, 79–86.

    Dietterich, T.G. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput., 10, 1895–1923.

    Febbo, P.G., Singh, D., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.

    Gamberger, D., Lavrac, N., Zelezny, F., Tolar, J. (2004) Induction of comprehensive models for gene expression datasets by the subgroup discovery methodology. J. Biomed. Inform., 37, 269–284.

    Garey, M.R. and Johnson, D.S. (1979) Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, New York, NY.

    Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.

    Hanley, J.A. and McNeil, B.J. (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.

    Harrell, F.E., Jr, Califf, R.M., Pryor, D.B., Lee, K.L., Rosati, R.A. (1982) Evaluating the yield of medical tests. J. Am. Med. Assoc., 247, 2543–2546.

    Hunter, L., Taylor, R.C., Leach, S.M., Simon, R. (2001) GEST: a gene expression search tool based on a novel Bayesian similarity metric. Bioinformatics, 17, Suppl., S115–S122.

    Hvidsten, T.R., Lægreid, A., Komorowski, J. (2003) Learning rule-based models of biological process from gene expression time profiles using gene ontology. Bioinformatics, 19, 1116–1123.

    Jaroszewicz, S., Simovici, D., Kuo, W., Ohno-Machado, L. (2004) A metric on partitions derived from the Goodman–Kruskal association index and its applications in genetic diagnosis of cancer. IEEE Trans. Biomed. Eng., 51, 1095–1102.

    Johnson, D.S. (1974) Approximation algorithms for combinatorial problems. J. Comput. Syst. Sci., 9, 256–278.

    Kauffman, S., Peterson, C., Samuelsson, B., Troein, C. (2003) Random Boolean network models and the yeast transcriptional network. Proc. Natl Acad. Sci. USA, 100, 14796–14799.

    Komorowski, J., Øhrn, A., Skowron, A. (2002) The ROSETTA rough set software system. In Klösgen, W. and Zytkow, J. (eds), Handbook of Data Mining and Knowledge Discovery. Oxford University Press, Oxford, UK, ISBN 0-19-511831-6, pp. 554–559.

    Lash, A.E., Tolstoshev, C.M., Wagner, L., Schuler, G.D., Strausberg, R.L., Riggins, G.J., Altschul, S.F. (2000) SAGEmap: a public gene expression resource. Genome Res., 10, 1051–1060.

    Li, J., Liu, H., Downing, J.R., Yeoh, A.E.-J., Wong, L. (2003) Simple rules underlying gene expression profiles of more than six subtypes of acute lymphoblastic leukemia (ALL) patients. Bioinformatics, 19, 71–78.

    NCBI-NLM (2004) NCBI Gene Expression Omnibus.

    Ohno-Machado, L., Vinterbo, S.A., Weber, G. (2002) Classification of gene expression data using fuzzy logic. J. Intell. Fuzzy Syst., 12, 19–24.

    Pawlak, Z. (1982) Rough sets. Int. J. Inf. Comput. Sci., 11, 341–356.

    Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. USA, 98, 15149–15154.

    Ressom, H., Reynolds, R., Varghese, R.S. (2003) Increasing the efficiency of fuzzy logic-based gene expression data analysis. Physiol. Genomics, 13, 107–117.

    Rissanen, J. (1984) Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory, IT-30, 629–636.

    Skowron, A. and Rauszer, C. (1992) The discernibility matrices and functions in information systems. Series D: System Theory, Knowledge Engineering and Problem Solving, Vol. 111. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 331–362.

    Soinov, L.A., Krestyaninova, M.A., Brazma, A. (2003) Towards reconstruction of gene networks from expression data by supervised learning. Genome Biol., 4, R6.

    Thomas, J.G., Olson, J.M., Tapscott, S., Zhao, L.P. (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res., 11, 1227–1236.

    Troyanskaya, O.G., Garber, M.E., Brown, P.O., Botstein, D., Altman, R.B. (2002) Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics, 18, 1454–1461.

    Woolf, P.J. and Wang, Y. (2000) A fuzzy logic approach to analyzing gene expression data. Physiol. Genomics, 3, 9–15.

