Bioinformatics Advance Access originally published online on September 20, 2005
Bioinformatics 2005 21(22):4148-4154; doi:10.1093/bioinformatics/bti681
Classification of microarrays to nearest centroids
Department of Biostatistics, University of Washington Seattle 98195, USA
| ABSTRACT |
|---|
Motivation: Classification of biological samples by microarrays is a topic of much interest. A number of methods have been proposed and successfully applied to this problem. It has recently been shown that classification by nearest centroids provides an accurate predictor that may outperform much more complicated methods. The Prediction Analysis of Microarrays (PAM) approach is one such example, which the authors strongly motivate by its simplicity and interpretability. In this spirit, I seek to assess the performance of classifiers simpler than even PAM.
Results: Surprisingly, I show that the modified t-statistics and shrunken centroids employed by PAM tend to increase misclassification error when compared with their simpler counterparts. Based on these observations, I propose a classification method called Classification to Nearest Centroids (ClaNC). ClaNC ranks genes by standard t-statistics, does not shrink centroids and uses a class-specific gene-selection procedure. Because of these modifications, ClaNC is arguably simpler and easier to interpret than PAM, and it can be viewed as a traditional nearest centroid classifier that uses specially selected genes. I demonstrate that ClaNC error rates tend to be significantly lower than those for PAM for a given number of active genes.
Availability: Point-and-click software is freely available at http://students.washington.edu/adabney/clanc
Contact: adabney{at}u.washington.edu
Supplementary Information: http://students.washington.edu/adabney/clanc/supplement.pdf
| INTRODUCTION |
|---|
Gene-expression microarrays (Schena et al., 1995) can be used to discriminate between multiple clinical or biological classes. A classifier is built from training data, consisting of expression profiles for samples of known class. Many methods exist for building classifiers, but the overall goal is the same: (1) find characteristics that define each class, and (2) build a function that compares the expression profile of an unknown sample with each class on the basis of these defining characteristics. The unknown sample is assigned to the class to which it is most similar.
Candidate classifiers are evaluated on the basis of (1) accuracy, (2) interpretability, and (3) practicality. High accuracy corresponds to low misclassification error. Interpretability may or may not be important, depending on the application. Often, it is desirable to learn something about the underlying processes at play in addition to classifying accurately. It may be very difficult to characterize the contribution of a single gene to the overall classifier (in turn making it difficult to learn about the biological functionality of that gene) if a complicated method is used. A complicated classifier may then be rejected for a simpler alternative, even if the simpler alternative does not perform as well. Finally, it must be practical to implement the classifier. Because of the resources required to form a microarray, it is desirable to base the classifier on the fewest genes possible.
Linear Discriminant Analysis (LDA) (Mardia et al., 1979) is a classical method of prediction that is simple to understand and has been shown to perform well with microarrays (Dudoit et al., 2002; Lee et al., 2005). Each class is characterized by its vector of means or centroid. An unknown sample is evaluated by computing the scaled distance between its expression profile and each class centroid. The unknown is assigned to the class to which it is nearest. Thus, LDA can be thought of as a nearest centroid classifier.
To identify the fewest genes necessary for discrimination, a feature-selection step can be added. Note that the goal of this step will depend on the setting. For example, with unlimited resources, we may wish to use all relevant genes in the classifier; although, we may be able to find a subset of genes that classify just as well as (or even better than) the complete set. In other settings, it may be necessary to make tradeoffs between accuracy and practicality.
An obvious way to incorporate feature-selection into LDA is to compute test statistics for each gene that measure that gene's ability to distinguish the classes, rank those statistics and choose only the top genes from this list to base the classifier on. In early work, Dudoit et al. (2000, 2002) used F-statistics for ranking genes. The Prediction Analysis of Microarrays (PAM) method (Tibshirani et al., 2002) uses what are essentially t-statistics. However, instead of the usual t-statistic, PAM adds a fudge-factor to each statistic's denominator. This prevents genes with large t-statistics but small mean differences (numerators) from being selected. A further level of complexity is added to PAM by shrinking the class centroids toward their overall mean. Thus, PAM can be thought of as a nearest shrunken centroid classifier.
I present here an alternative LDA-based classifier that I call ClaNC, for Classification to Nearest Centroids. ClaNC (1) does not shrink centroids, (2) uses unmodified t-statistics to select genes, (3) carries out class-specific feature selection, and (4) allows each gene to be active in at most one class. I first individually evaluate the four proposed changes on simulated datasets (Table 1). Shrinkage, fudge factors and class-nonspecific gene selection can all increase misclassification error, depending on the situation. The error rates for ClaNC are substantially lower than those for PAM at all considered numbers of active genes. Furthermore, ClaNC error estimates are consistently less variable than their PAM analogs. I then compare PAM and ClaNC on four previously published datasets (Tables 2–5 and Fig. 1). I again evaluate each of the four proposed changes individually. Overall, shrinkage, fudge factors and class-nonspecific gene selection tend to increase error in the real examples. There is some evidence that repeat representations of a single gene can also increase error. ClaNC error rates again tend to be smaller than those for PAM. Surprisingly, then, LDA-based classifiers that are even simpler than PAM can perform very well.
I note briefly some of the many other classification methods besides those based on LDA. These include neural networks (Khan et al., 2001), support vector machines (Ramaswamy et al., 2001), CART (Breiman et al., 1984), random forests (Breiman, 2001), and methods based on generalized linear models (Zhu and Hastie, 2004; Nguyen and Rocke, 2004). LDA-based methods have been shown to perform well when compared with more complicated classifiers (Dudoit et al., 2002; Lee et al., 2005). Furthermore, the simplicity of LDA-based methods arguably makes them preferable to more complicated alternatives in settings where interpretation of the classifier is important.
| SYSTEM AND METHODS |
|---|
Linear Discriminant Analysis (LDA)
We would like to classify unknown samples into one of $K$ classes. To build a classifier, we obtain $n_k$ training samples per class, $k = 1, 2, \ldots, K$, with $m$ genes on each microarray. For each training sample, we observe class membership $Y$ and expression profile $X$. For simplicity, I will represent the classes by the numbers $1, 2, \ldots, K$. Note that each expression profile is a vector of length $m$. We assume that expression profiles from class $k$ are distributed as $N(\mu_k, \Sigma)$, the multivariate normal distribution with mean vector $\mu_k$ and covariance matrix $\Sigma$. Call $L(\cdot\,; \mu_k, \Sigma)$ the corresponding probability density function. Finally, we agree upon prior probabilities $\pi_k$ that an unknown sample comes from class $k$, $k = 1, 2, \ldots, K$.
Bayes' theorem states that the probability that a sample comes from class $k$, given that sample's expression profile, is proportional to the product of the class density and prior probability:
$$\Pr(Y = k \mid X = x) \propto L(x; \mu_k, \Sigma)\,\pi_k. \tag{1}$$
The LDA rule assigns a sample $x$ to the class with the largest posterior probability,
$$C(x) = \arg\max_{k}\; L(x; \mu_k, \Sigma)\,\pi_k, \tag{2}$$
and this rule minimizes misclassification error (Mardia et al., 1979).
The innards of the right side of Equation (2) are proportional to
$$|\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k) + \log \pi_k\right\}. \tag{3}$$
Since $\Sigma$ is the same for all classes, only the exponential component of Equation (3) is relevant to classification. We can then rewrite Equation (2) as
$$C(x) = \arg\min_{k}\; \left\{\|x - \mu_k\|^2 - 2\log \pi_k\right\}, \tag{4}$$
where $\|x - \mu\|^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$ is the square of the Mahalanobis distance between $x$ and $\mu$.
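We can further simplify the problem by assuming independence between genes. This allows us to simplify the LDA classification rule (4) to
$$C(x) = \arg\min_{k}\; \left\{\sum_{i=1}^{m} \frac{(x_i - \mu_{ik})^2}{\sigma_i^2} - 2\log \pi_k\right\}, \tag{5}$$
where $\mu_{ik}$ is the $i$-th component of $\mu_k$ and $\sigma_i^2$ is the $i$-th diagonal element of $\Sigma$. In practice, $\mu_{ik}$ is estimated by the class mean $\bar{x}_{ik}$ and $\sigma_i$ by the pooled within-class standard deviation $s_i$. Genes can be ranked by their F-statistics,
$$F_i = \frac{\sum_{k=1}^{K} n_k (\bar{x}_{ik} - \bar{x}_i)^2 / (K - 1)}{s_i^2}, \tag{6}$$
where $\bar{x}_i$ is the overall mean of gene $i$. To select $\tilde{m} < m$ genes, class centroids are formed using only the genes corresponding to the largest $\tilde{m}$ F-statistics. All other genes are discarded.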
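The diagonal rule can be made concrete with a short Python sketch. It assumes rule (5) takes the usual diagonal-LDA form, assigning a sample to the class k that minimizes the sum over genes of (x_i − µ_ik)²/σ_i² minus 2 log π_k, with centroids and pooled variances estimated from the training data. The function names and the toy data are hypothetical and are not taken from the ClaNC or PAM software.

```python
import math

def class_centroids(samples, labels):
    """Mean expression profile (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}

def pooled_variances(samples, labels, centroids):
    """Pooled within-class variance of each gene: squared deviations / (n - K)."""
    n, m, K = len(samples), len(samples[0]), len(centroids)
    ss = [0.0] * m
    for x, y in zip(samples, labels):
        for i in range(m):
            ss[i] += (x[i] - centroids[y][i]) ** 2
    return [v / (n - K) for v in ss]

def classify(x, centroids, variances, priors):
    """Assign x to the class with the smallest scaled distance minus 2 log prior."""
    def score(k):
        dist = sum((xi - c) ** 2 / v
                   for xi, c, v in zip(x, centroids[k], variances))
        return dist - 2.0 * math.log(priors[k])
    return min(centroids, key=score)

# Toy data: two genes, two classes, three training samples per class.
train = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9],
         [3.0, 5.0], [3.2, 5.2], [2.8, 4.8]]
labels = [1, 1, 1, 2, 2, 2]
cents = class_centroids(train, labels)
var = pooled_variances(train, labels, cents)
print(classify([1.1, 5.0], cents, var, {1: 0.5, 2: 0.5}))
```

In this toy example only the first gene separates the classes; the second gene contributes almost equally to both distances, which is the situation that motivates gene selection.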
Prediction Analysis of Microarrays
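The current standard in LDA-based classification for microarrays is PAM (Tibshirani et al., 2002). Instead of F-statistics, PAM uses the statistics
$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k (s_i + s_0)}, \tag{7}$$
where $\bar{x}_{ik}$ is the mean of gene $i$ in class $k$, $\bar{x}_i$ is its overall mean, $s_i$ is its pooled within-class standard deviation, $m_k$ is a constant depending on the class sample size that scales $s_i$ to the standard error of the numerator, and $s_0$ is a positive constant, the fudge factor.
The shrinkage approach used by PAM is soft-thresholding. For a particular choice of shrinkage parameter $\Delta$, the shrunken statistic is
$$d'_{ik} = \mathrm{sign}(d_{ik})\,(|d_{ik}| - \Delta)_+, \tag{8}$$
where $(\cdot)_+$ denotes the positive part. Statistics smaller than $\Delta$ in absolute value are shrunken to zero, and the rest are shrunken to somewhere between zero and their original values. The shrinkage of the remaining statistics toward zero is intended as a de-noising step. We can then rewrite Equation (7) with the shrunken statistics to produce corresponding shrunken centroids
$$\bar{x}'_{ik} = \bar{x}_i + m_k (s_i + s_0)\, d'_{ik}. \tag{9}$$
Genes for which the $d'_{ik}$, $k = 1, 2, \ldots, K$, all equal zero have shrunken centroid components that equal the corresponding components of the overall centroid. When distances from a new sample to the shrunken class centroids are computed in Equation (5), the components for these inactivated genes are identical for each class. Hence, they do not contribute to the classification and do not need to be measured in a new sample. We call the genes with at least one shrunken centroid component different from the corresponding overall centroid component the active genes and the rest the inactive genes.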
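The soft-thresholding step can be illustrated for a single gene and class. The sketch below assumes the statistic has the form (class mean − overall mean) / (m_k (s_i + s_0)) and that m_k = sqrt(1/n_k − 1/n), the value that makes m_k s_i the standard error of the numerator; that scaling, the function names and the numbers are assumptions made for illustration.

```python
import math

def soft_threshold(d, delta):
    """Move d toward zero by delta; anything with |d| <= delta becomes zero."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def shrunken_centroid_component(class_mean, overall_mean, s_i, s0, n_k, n, delta):
    """Return (shrunken centroid component, statistic, shrunken statistic)."""
    m_k = math.sqrt(1.0 / n_k - 1.0 / n)   # assumed sample-size scaling
    scale = m_k * (s_i + s0)               # denominator of the statistic
    d = (class_mean - overall_mean) / scale
    d_shrunk = soft_threshold(d, delta)
    return overall_mean + scale * d_shrunk, d, d_shrunk

# A gene whose class mean (2.0) lies above its overall mean (1.0).
for delta in (0.0, 1.0, 5.0):
    comp, d, d_shrunk = shrunken_centroid_component(
        2.0, 1.0, s_i=0.5, s0=0.5, n_k=10, n=40, delta=delta)
    print(delta, round(d, 3), round(d_shrunk, 3), round(comp, 3))
```

With no shrinkage the component equals the class mean, and once the threshold exceeds the statistic the component collapses to the overall mean, so the gene is inactive for that class.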
A new sample is classified by comparing its expression profile with each shrunken centroid, over the $\tilde{m}$ genes that remain active after shrinkage. Distances are assessed as in Equation (5), with the shrunken centroid component $\bar{x}'_{ik}$ replacing $\mu_{ik}$ and $s_i + s_0$ replacing $\sigma_i$. Any prior information on class prevalences can be included in $\pi_k$. One simple choice is $\pi_k = n_k/n$, placing prior weights on each class in proportion to its sample prevalence; another is $\pi_k = 1/K$, placing equal prior weights on each class.
In a typical application of PAM, the shrinkage parameter $\Delta$ is allowed to vary over a wide range. For each $\Delta$, a classifier is built and its error rate is estimated by cross-validation. The value of $\Delta$ to use in the final classifier is chosen from a plot of $\Delta$ against error rate, where the number of unique active genes corresponding to each $\Delta$ is also displayed. I argue that $\Delta$ does not mean as much as the number of active genes. The decision of how much error to tolerate will be made with cost and convenience in mind, and these are best gauged by the number of active genes.
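Because the number of active genes is the quantity of practical interest, it can help to translate between a threshold and a gene count. The following is a minimal sketch, assuming a gene is active when at least one of its statistics exceeds the threshold in absolute value; the helper names and the small matrix of statistics are hypothetical.

```python
def active_gene_count(d, delta):
    """Number of genes with at least one |d_ik| larger than delta.

    d[i][k] is the statistic of gene i for class k.
    """
    return sum(any(abs(v) > delta for v in row) for row in d)

def delta_for_gene_budget(d, budget):
    """Smallest candidate threshold that leaves at most `budget` active genes."""
    candidates = sorted({abs(v) for row in d for v in row} | {0.0})
    for delta in candidates:
        if active_gene_count(d, delta) <= budget:
            return delta
    return candidates[-1]

# Four genes, two classes.
d = [[5.0, 0.5], [4.0, -0.2], [-3.5, 1.0], [0.1, -1.5]]
delta = delta_for_gene_budget(d, budget=2)
print(delta, active_gene_count(d, delta))
```

Only the observed absolute statistics (and zero) need to be tried as candidate thresholds, since the active set changes only at those values.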
In short, a PAM classifier selects $\tilde{m}$ genes by (1) soft thresholding the modified t-statistics $d_k$, then (2) using the chosen, shrunken statistics to update the class centroids. The classifier can be represented by the shrunken centroid components and pooled standard deviations of the active genes, since these are the only components needed in the distance function (5). Each centroid is now of length $\tilde{m}$, with i-th component somewhere between that gene's class and overall means. Nothing is known about the distribution of active genes across classes. The selected genes are interpreted as simultaneously distinguishing all classes from each other.
Classification to Nearest Centroids
Shrinkage and fudge factors are intended to denoise the data, stabilizing the statistics used. Although shrinkage has been shown to produce more accurate mean estimates in noisy data (Donoho and Johnstone, 1994), it is not clear that this should translate to increased prediction accuracy. Furthermore, PAM's shrinkage procedure makes the class centroids look more similar to each other. It is unclear why this should be expected to make it easier to distinguish the classes from each other; this issue is discussed further below. For consistency with the preceding discussion, we can formulate the gene selection process without shrinkage as
$$d'_{ik} = d_{ik}\, I(|d_{ik}| > \Delta), \tag{10}$$
where $I(\cdot)$ is the indicator function. That is, statistics exceeding $\Delta$ in absolute value are kept unchanged and the rest are set to zero.
The stated justification for fudge factors is that they guard against the possibility of large test statistics arising by chance from genes with low expression values (Tibshirani et al., 2002). However, an extreme t-statistic indicates a significant mean difference, regardless of whether its numerator is large or small. Mean differences that arise by chance will tend to have small t-statistics. In practice, then, fudge factors may remove from consideration informative genes with small mean differences and thus actually increase error.
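A small numerical illustration shows how a fudge factor can reorder genes. The sketch assumes a statistic of the form mean difference / (m_k (s_i + s_0)); the two genes, the value of m_k and the value of s_0 are hypothetical numbers chosen only to make the effect visible.

```python
def modified_t(mean_diff, s_i, m_k, s0=0.0):
    """t-like statistic; s0 = 0 gives the unmodified version."""
    return mean_diff / (m_k * (s_i + s0))

m_k = 0.25  # hypothetical sample-size constant

# Gene A: small mean difference, very small standard deviation.
# Gene B: large mean difference, larger standard deviation.
genes = {"A": (0.2, 0.05), "B": (1.0, 0.5)}

for s0 in (0.0, 0.5):
    stats = {g: modified_t(diff, s, m_k, s0) for g, (diff, s) in genes.items()}
    ranking = sorted(stats, key=stats.get, reverse=True)
    print("s0 =", s0, "ranking:", ranking)
```

Without the fudge factor gene A ranks first because its difference is large relative to its noise; with the fudge factor gene B overtakes it, so a tight gene budget would drop gene A.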
Typically, genes are selected in a class-nonspecific manner by applying a common threshold to all test statistics. Suppose that class $k_0$ is more heterogeneous than the others and hence more difficult to characterize; alternatively, $k_0$ may simply have fewer training samples than the rest of the classes. Then the components of the vector $d_{k_0}$ will tend to be closer to zero than their counterparts in the other classes, and class-nonspecific selection may not choose any class $k_0$ genes. With this in mind, I consider class-specific selection
$$d'_{ik} = d_{ik}\, I(|d_{ik}| > \Delta_k), \tag{11}$$
where separate thresholds $\Delta_k$ are chosen for each class. In practice, there is still a single tuning parameter: the number of active genes per class.
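The difference between the two selection schemes can be seen on a tiny matrix of statistics. The sketch below contrasts a common cut-off, implemented as keeping the largest absolute statistics overall, with keeping a fixed number of statistics in every class; the function names and the numbers are hypothetical.

```python
def select_common(d, total):
    """Class-nonspecific: keep the `total` largest |d_ik| over all genes and classes."""
    entries = sorted(
        ((abs(v), i, k) for i, row in enumerate(d) for k, v in enumerate(row)),
        reverse=True,
    )
    return {(i, k) for _, i, k in entries[:total]}

def select_per_class(d, per_class):
    """Class-specific: keep the `per_class` largest |d_ik| within every class."""
    chosen = set()
    for k in range(len(d[0])):
        ranked = sorted(range(len(d)), key=lambda i: abs(d[i][k]), reverse=True)
        chosen.update((i, k) for i in ranked[:per_class])
    return chosen

# Four genes, two classes; the second class has weaker statistics throughout.
d = [[5.0, 0.5], [4.0, -0.2], [-3.5, 1.0], [0.1, -1.5]]
print(sorted(select_common(d, 2)))     # (gene, class) pairs
print(sorted(select_per_class(d, 1)))
```

With a budget of two statistics, the common rule spends both on the first class, whereas the class-specific rule gives each class one characteristic gene.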
I note that a subsequent PAM publication proposed something similar to this (Tibshirani et al., 2003). Class-specific thresholds were chosen in an adaptive manner by including another scale parameter in the denominator of Equation (7), as in
$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k\, \theta_k\, (s_i + s_0)}, \tag{12}$$
where the scale parameters $\theta_k$ are restricted to $\min(\theta_k) = 1$. In classes for which prediction error is highest, the scale parameter is decreased to allow more active genes for those classes, and the $\theta_k$ are rescaled so that $\min(\theta_k) = 1$ (see the comments in the Discussion section).
It is possible that a single gene will have more than one extreme t-statistic. In this case, if we eliminate all but the top $\tilde{m}$ statistics, the number of (unique) active genes is less than $\tilde{m}$. It may be that the multiple extreme statistics for a class are redundant, and hence it would be inefficient to include more than one. Furthermore, it may be intuitively appealing for each selected gene to characterize a single class. With this in mind, I consider allowing each gene to be active in at most one class. If a gene has more than one extreme t-statistic, only the largest in absolute value is chosen. The others are set to zero. Note that this does not exclude genes with more than one extreme t-statistic. Such genes will be included in the classifier, but they will only be active in one class.
Combining all of the above, I propose an alternative LDA-based classifier called ClaNC, for Classification to Nearest Centroids. ClaNC (1) does not shrink centroids, (2) does not use fudge factors, (3) carries out class-specific gene selection, and (4) allows each gene to be active in at most one class. Thus, each active gene characterizes exactly one class. Its centroid component for that class equals its class-specific mean. Its centroid components in all other classes equal its overall mean. A ClaNC classifier can be represented by the centroid components and pooled standard deviations of the active genes. Out of $\tilde{m}$ active genes, there are $\tilde{m}/K$ genes that characterize each class by default. Note, however, that ClaNC allows one to easily choose a different distribution of genes across classes. For example, it is straightforward to form a ClaNC classifier with more than $\tilde{m}/K$ characteristic genes in one class and fewer in another. The selected genes are interpreted as uniquely characterizing each class.
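The selection and centroid construction described above can be sketched as follows. This is an illustrative reading of the description, not the ClaNC software: it assumes that each gene is first tied to the class of its most extreme t-statistic and that the top genes are then taken within each class. The function names and toy values are hypothetical.

```python
def clanc_select(t, per_class):
    """Map each selected gene to the single class it characterizes.

    t[i][k] is the t-statistic of gene i for class k.
    """
    n_classes = len(t[0])
    # For every gene keep only its most extreme statistic; the others count as zero.
    best = [max(range(n_classes), key=lambda k: abs(row[k])) for row in t]
    active = {}
    for k in range(n_classes):
        candidates = [i for i in range(len(t)) if best[i] == k]
        candidates.sort(key=lambda i: abs(t[i][k]), reverse=True)
        for i in candidates[:per_class]:
            active[i] = k
    return active

def clanc_centroids(active, class_means, overall_means):
    """Centroid component: class mean in the gene's own class, overall mean elsewhere."""
    n_classes = len(class_means[0])
    return {
        i: [class_means[i][c] if c == k else overall_means[i]
            for c in range(n_classes)]
        for i, k in active.items()
    }

# Four genes, three classes; gene 0 is extreme in two classes.
t = [[5.0, 4.5, 0.1], [0.2, 3.0, 0.3], [1.0, 0.5, -2.0], [4.0, 0.1, 0.2]]
class_means = [[2.0, 1.8, 1.0], [0.0, 1.0, 0.1], [0.5, 0.4, -1.0], [3.0, 1.0, 1.1]]
overall_means = [1.5, 0.4, 0.0, 1.7]

active = clanc_select(t, per_class=1)
print(active)
print(clanc_centroids(active, class_means, overall_means))
```

Gene 0 has extreme statistics in the first two classes but ends up active only in the first, so the second class is characterized by a different gene.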
The role of shrinkage in classification
As mentioned earlier, the LDA classification rule minimizes misclassification error. This is because the LDA rule equals the Bayes' rule (Mardia et al., 1979). For simplicity, assume all genes are independent with variance one, and that each class has equal prior probability. Then the Bayes' rule is to classify a new sample $x^*$ to the class for which $\|x^* - \mu_k\|^2$ is smallest. However, we must estimate the centroids $\mu_k$ in practice, using $\hat{\mu}_k$ in their place. Suppose $x^*$ comes from class $k_0$. Then, expanding the squared distance between $x^*$ and the estimated class $k_0$ centroid and taking expectations, we have
$$E\|x^* - \hat{\mu}_{k_0}\|^2 = E\|x^* - \mu_{k_0}\|^2 + E\|\hat{\mu}_{k_0} - \mu_{k_0}\|^2 = E\|x^* - \mu_{k_0}\|^2 + \mathrm{MSE}(\hat{\mu}_{k_0}), \tag{13}$$
where the cross term vanishes because $x^*$ is independent of the training data and has mean $\mu_{k_0}$.
Reducing the MSE of $\hat{\mu}_{k_0}$ will bring us closer to the Bayes' rule. According to the Stein Paradox of statistics (Stein, 1956), we can reduce the MSE of $\hat{\mu}_{k_0}$ by shrinking toward zero (or any other constant). In our setting, this suggests shrinking each centroid across its $m$ components. Note, however, that PAM shrinks each gene toward its overall mean. In other words, PAM shrinks each gene across its $K$ classes. Thus, although shrinkage could in principle improve prediction accuracy, PAM apparently shrinks in the wrong direction. Furthermore, by shrinking across classes, all class centroids are made more similar to each other. It is unclear why this would be expected to improve classification accuracy. It is true, as one reviewer maintained, that shrinkage can be carried out across classes. However, owing to the much higher number of genes than classes, one would expect the greatest gain from shrinking across genes.
Even when shrinking each centroid across genes, it is not necessarily clear that prediction accuracy will be improved. If, for example, all class centroid estimates have their MSEs reduced equally, then there may be no change in the relative relationships of the centroid estimates. Preliminary investigation suggests that this is the case. However, I intend to perform a more thorough investigation of shrinkage for classification in future work.
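The MSE reduction from shrinking a centroid across its components can be checked with a small simulation. The sketch below uses the James-Stein estimator for a unit-variance normal vector, shrinking toward zero, as one concrete instance of Stein-type shrinkage; the dimension, the number of replications and the seed are arbitrary choices, and the code does not address whether classification error improves.

```python
import random

def james_stein(x):
    """James-Stein estimate for a unit-variance normal mean vector, shrinking toward zero."""
    m = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (m - 2) / norm2
    return [factor * v for v in x]

def compare_mse(m=50, reps=200, seed=0):
    """Average squared error of the raw estimate and of its shrunken version."""
    rng = random.Random(seed)
    mu = [rng.gauss(0.0, 1.0) for _ in range(m)]      # true centroid
    raw_err = js_err = 0.0
    for _ in range(reps):
        x = [rng.gauss(c, 1.0) for c in mu]           # noisy centroid estimate
        shrunk = james_stein(x)
        raw_err += sum((a - c) ** 2 for a, c in zip(x, mu))
        js_err += sum((a - c) ** 2 for a, c in zip(shrunk, mu))
    return raw_err / reps, js_err / reps

raw, js = compare_mse()
print("raw MSE:", round(raw, 2), "shrunken MSE:", round(js, 2))
```

The shrunken estimate has the smaller average squared error summed over the m components, which is the sense in which shrinking across genes can reduce the MSE term in Equation (13).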
| RESULTS |
|---|
I first consider three simulated set-ups, with 35 simulations per set-up. There are $K = 4$ classes, with $n_k = 30$ samples in each class and $m = 5000$ genes under consideration. Half of the genes provide no discriminatory information. In the other half, the class means differ randomly from each other. The first simulation allows for noisy data, where there is significant heterogeneity between genes. Overall means $\mu_{i0}$, $i = 1, 2, \ldots, m$, were generated from the $N(0, 1)$ distribution. Each class mean $\mu_{ik}$, $i = 1, 2, \ldots, m$, $k = 1, 2, \ldots, K$, was the overall mean plus a draw from the $U(-1, 1)$ distribution. Observations were then generated from the $N(\mu_{ik}, \sigma_i)$ distributions, where the squares of the $\sigma_i$ were drawn from the [...] distribution. The second simulation has one class being easier to distinguish than the rest. Overall means were generated as above. The class means for class one were the overall means plus draws from the $U(-2, 2)$ distribution. Those for the other three classes were the overall means plus draws from the $U(-1, 1)$ distribution. Observations were then generated from the $N(\mu_{ik}, 1)$ distributions. The third simulation is similar, now with one class being more difficult to distinguish than the rest. Class one means were overall means plus draws from the $U(-1, 1)$ distribution, while those for the other three classes were the overall means plus draws from the $U(-2, 2)$ distribution.

I compared six classifiers: PAM, PAM without shrinkage (I), PAM without fudge factors (II), PAM with class-specific gene selection (III), PAM with unique features (IV) and ClaNC. The first 15 samples in each class were used for training the classifiers, and the remaining 15 samples were used as test data. For each of 35 simulations within each simulation set-up, misclassification error was computed using the 15 test samples. Table 1 shows the average test error over the 35 simulations, together with standard error estimates. In all three simulations, all classifiers considered attain zero error when using all 2500 relevant genes. The most apparent differences occur at low numbers of active genes, and so we only report these. Note that, although ClaNC is formulated in terms of the number of active genes per class, all results are presented in terms of the total number of active genes.
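For readers who want to experiment, the second set-up (one class easier to distinguish, unit variances) can be generated along the following lines. This is a sketch under assumptions: the offsets are taken to be uniform on (−2, 2) for class one and (−1, 1) for the other classes, the informative genes are placed in the first half of the gene list, and the function name and seed handling are hypothetical.

```python
import random

def simulate_two(m=5000, n_classes=4, n_k=30, seed=0):
    """Generate expression data in which class one is easier to distinguish.

    Returns (data, labels, means), where means[i][k] is the mean of gene i
    in class k (0-based index) and labels run from 1 to n_classes.
    """
    rng = random.Random(seed)
    overall = [rng.gauss(0.0, 1.0) for _ in range(m)]
    informative = m // 2              # assumed: first half of the genes
    means = []
    for i in range(m):
        row = []
        for k in range(n_classes):
            if i < informative:
                half_width = 2.0 if k == 0 else 1.0
                row.append(overall[i] + rng.uniform(-half_width, half_width))
            else:
                row.append(overall[i])  # no discriminatory information
        means.append(row)
    data, labels = [], []
    for k in range(n_classes):
        for _ in range(n_k):
            data.append([rng.gauss(means[i][k], 1.0) for i in range(m)])
            labels.append(k + 1)
    return data, labels, means

# A scaled-down example: 20 genes and 4 samples per class.
data, labels, means = simulate_two(m=20, n_k=4, seed=1)
print(len(data), len(data[0]), labels)
```

Splitting each class into its first and second half of samples then gives training and test sets in the manner described above.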
In simulation one, fudge factors increase error. The heterogeneity across genes in variance creates many large t-statistics corresponding to relatively small mean differences. However, an extreme t-statistic indicates a significant mean difference, regardless of whether its numerator is large or small. It is not surprising then that fudge factors increase error here, since many informative genes have been excluded. In simulation two, class-nonspecific gene selection increases error. This is because the t-statistics for class one tend to be more extreme than those for the other classes. As a result, more genes are selected that characterize class one than are selected for the other classes, leading to poor classification in the other classes. In simulation three, class-nonspecific selection and shrinkage increase error. This is because the t-statistics for class one tend to be less extreme than those for the other classes. As a result, fewer genes are selected that characterize class one than are selected for the other classes, leading to poor classification in class one. Note finally that the PAM classifiers on these simulations tend to have greater variability than their ClaNC analogs, probably as a result of the increased simplicity of ClaNC.
I now compare PAM and ClaNC on four previously published cDNA microarray experiments. In each analysis, any missing values were imputed using k-nearest neighbors (Troyanskaya et al., 2001) with k = 10. I compare the methods on the basis of error rates from 5-fold cross-validation. I avoid gene-selection bias by completely rebuilding classifiers to identical specifications in each cross-validation iteration (Ambroise and McLachlan, 2002). Cross-validated error rates are nearly unbiased, being slightly conservative (Ambroise and McLachlan, 2002; Hastie et al., 2001), and they are thus sufficient for comparing classifiers. Figure 1 and Tables 2–5 compare the performance of PAM and ClaNC on the examples; nominal error rates will change with example, being the expected error for non-informative data. I have listed the top few genes selected by ClaNC in each example in Supplementary Tables 1–4. There is very good agreement between the genes chosen by ClaNC and those chosen by previously published classifiers.
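The point about rebuilding the classifier inside every cross-validation iteration can be expressed as a small generic routine. In this sketch `build` and `predict` are hypothetical callables supplied by the user; `build` is expected to perform gene selection and centroid estimation using only the training part of each fold. The folds are formed by a simple shuffle and are not stratified by class.

```python
import random

def cross_validated_error(samples, labels, build, predict, folds=5, seed=0):
    """Misclassification rate from k-fold cross-validation.

    The classifier, including any gene selection, is rebuilt from scratch
    on the training part of every fold.
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_x = [samples[i] for i in idx if i not in held_out]
        train_y = [labels[i] for i in idx if i not in held_out]
        model = build(train_x, train_y)   # selection happens here, per fold
        errors += sum(predict(model, samples[i]) != labels[i] for i in test_idx)
    return errors / len(samples)

# Toy one-gene example: the "model" is the midpoint between the two class means.
def build(xs, ys):
    mean1 = sum(x[0] for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean2 = sum(x[0] for x, y in zip(xs, ys) if y == 2) / ys.count(2)
    return (mean1 + mean2) / 2

def predict(midpoint, x):
    return 2 if x[0] > midpoint else 1

samples = [[float(v)] for v in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
labels = [1] * 5 + [2] * 5
print(cross_validated_error(samples, labels, build, predict))
```

Selecting genes once on the full dataset and cross-validating only the final classification step would leak information from the held-out samples, which is the selection bias the per-fold rebuild avoids.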
Small round blue cell tumors
The first example involves small round blue cell tumors (SRBCT) of childhood (Khan et al., 2001). Expression measurements were made on 2307 genes in 83 SRBCT samples. The tumors were classified as Burkitt lymphoma, Ewing sarcoma, neuroblastoma or rhabdomyosarcoma. There are 11, 29, 18 and 25 samples in each respective class. As seen in Figure 1 and Table 2, PAM requires 20 genes to drop the cross-validation error below 10%, whereas ClaNC needs only 8. Note that the PAM misclassification error estimate for 40 active genes (0.05) is slightly lower than that for ClaNC (0.06). However, the differences are not statistically significant. The standard errors are 0.05 and 0.025, with a two-sample t-test giving a P-value of 0.58. Shrinkage and fudge factors increase error in this example.
Lymphoma
In the second example, expression measurements were made on 4026 genes in 58 lymphoma patients (Alizadeh et al., 2000). The tumors were classified as diffuse large B-cell lymphoma and leukemia, follicular lymphoma, and chronic lymphocytic leukemia. There are 42, 6 and 10 samples in each respective class. As seen in Figure 1 and Table 3, PAM error rates are >10% with even 45 genes. Meanwhile, ClaNC requires only three genes for 10% error. Each of shrinkage, fudge factors, class-nonspecific selection and repeat representations of a gene increases error in this example.
NCI cancer cell lines
The third example involves the cell lines used in the National Cancer Institute's screen for anti-cancer drugs (Ross et al., 2000; Scherf et al., 2000). Expression measurements were made on 6830 genes in 60 cell tumors. There are representative cell lines for each of lung cancer, prostate cancer, CNS, colon cancer, leukemia, melanoma, NSCLC, ovarian cancer, renal cancer and one unknown sample. I filtered out 988 genes for which 20% or more of the tumors had missing values. I also excluded samples from prostate cancer (there being only two samples) and the one unknown sample. There are 9, 6, 7, 6, 8, 7, 6 and 8 samples in each remaining respective class. Classification is more difficult in this example, at least partly owing to there being so many classes and few samples per class. PAM requires 80 genes for <45% error, whereas ClaNC needs only 16. The PAM misclassification error estimate for 102 genes (0.30) is less than that for ClaNC (0.35). Again, the differences are not statistically significant. The standard errors are 0.07 and 0.045, with a two-sample t-test giving a P-value of 0.67. Class-nonspecific selection and shrinkage increase error in this example. There is weak evidence that fudge factors decrease error.
Leukemia
The fourth example involves acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) (Golub et al., 1999). The public version of the training data used in the original analysis includes expression measurements on 3857 genes in 38 leukemia patients. There are 11 and 27 samples in each respective class. PAM requires 20 genes for 5% error, whereas ClaNC needs only 10. Shrinkage increases error in this example. Fudge factors, class-nonspecific gene selection and repeat representations of a gene increase error for very small numbers of active genes and decrease error somewhat for larger numbers of active genes.
| DISCUSSION |
|---|
LDA-based methods have been successful in classifying microarrays. Surprisingly, I have found that shrinkage and fudge factors tend to actually increase misclassification error. Also, selecting genes by class appears to offer improvements in performance. Finally, I have suggested the selection of genes that uniquely characterize each class. Based on these observations, I have proposed a new LDA-based classifier called ClaNC. The classifier ClaNC does not use shrinkage or fudge factors and hence is very simple. Also, selected genes in a ClaNC classifier are naturally interpreted as uniquely characterizing a single class. I have demonstrated that ClaNC error rates tend to be substantially lower than their PAM counterparts in each of several examples, both simulated and real. Finally, I have provided freely available point-and-click software for ClaNC.
Tibshirani et al. (2003) demonstrated that adaptive thresholds sometimes improve PAM performance. Applying the adaptive thresholds to the examples listed above, I found that ClaNC errors are still lower than those for PAM. Furthermore, the adaptive thresholds are not generally available to users, as they are not implemented in the PAM point-and-click software.
Other aspects of LDA-based classification for microarrays could be further investigated. For example, all LDA-based classifiers use univariate statistics to select genes. The top genes from this list will be those that individually best distinguish the classes. However, we would ideally like to select the best collection of genes. It may be inefficient to select genes from the list of t-statistics if some of the top genes act in similar ways. I am currently working on an LDA-based classifier that would choose optimal collections of genes.
| Acknowledgments |
|---|
I greatly appreciate the helpful comments of John D. Storey on Stein's Paradox in the context of classification. I am also thankful to the reviewers for many helpful comments and suggestions. This research was supported in part by the Cancer-Epidemiology and Biostatistics Training Grant 5T32CA009168-29, NIH grant 1 U54 GM2119-03, and a traineeship at the Los Alamos National Laboratory.
Conflict of Interest: none declared.
Received on August 15, 2005; revised on September 15, 2005; accepted on September 17, 2005
| REFERENCES |
|---|
Alizadeh, A., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.
Ambroise, C. and McLachlan, G.J. (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA, 99, 6562–6566.
Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.
Breiman, L., Friedman, J.H., Olshen, R. and Stone, C.J. (1984) Classification and Regression Trees. Wadsworth, Belmont, CA.
Donoho, D. and Johnstone, I. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
Dudoit, S., et al. (2000) Comparison of discriminant methods for the classification of tumors using gene expression data. Technical Report #576, University of California, Berkeley.
Dudoit, S., et al. (2002) Comparison of discriminant methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc., 97, 77–87.
Golub, T., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–536.
Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
Khan, J., et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med., 7, 673–679.
Lee, J.W., et al. (2005) An extensive comparison of recent classification tools applied to microarray data. Comput. Stat. Data Anal., 48, 869–885.
Mardia, K., Kent, J. and Bibby, J. (1979) Multivariate Analysis. Academic Press, London.
Nguyen, D.V. and Rocke, D.M. (2004) On partial least squares dimension reduction for microarray-based classification: a simulation study. Comput. Stat. Data Anal., 46, 407–425.
Ramaswamy, S., et al. (2001) Multiclass cancer diagnosis using tumor gene expression signature. Proc. Natl Acad. Sci. USA, 98, 15149–15154.
Ross, D., et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet., 24, 227–235.
Schena, M., et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.
Scherf, U., et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nat. Genet., 24, 236–244.
Stein, C. (1956) Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist. Prob., 1, 197–206.
Tibshirani, R., et al. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl Acad. Sci. USA, 99, 6567–6572.
Tibshirani, R., et al. (2003) Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Stat. Sci., 18, 104–117.
Troyanskaya, O., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525.
Zhu, J. and Hastie, T. (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics, 5, 427–443.