Bioinformatics Advance Access originally published online on November 14, 2007
Bioinformatics 2008 24(1):135-136; doi:10.1093/bioinformatics/btm535
vbmp: Variational Bayesian Multinomial Probit Regression for multi-class classification in R
¹Medical Statistics Unit, Department of Medicine and Public Health, Second University of Napoli, Italy and ²Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, UK
*To whom correspondence should be addressed.
ABSTRACT
Summary: vbmp is an R package for Gaussian Process classification of data over multiple classes. It features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities using fast variational approximations to the full posterior. The software also incorporates feature weighting by means of Automatic Relevance Determination. With a single main function and reasonable default values for its optional parameters, vbmp combines flexibility with ease of use, as demonstrated on a breast cancer microarray study.
Availability: The R library vbmp implementing this method is part of Bioconductor and can be downloaded from http://www.dcs.gla.ac.uk/~girolami
Contact: nicola.lama@unina2.it
Supplementary information: Supplementary data are available at http://www.dcs.gla.ac.uk/~girolami
1 INTRODUCTION
Multi-class classification, in which the data consist of more than two classes, is rapidly gaining attention in the literature. Typical bioinformatics applications include gene expression profiling-based diagnosis of tumours (e.g. Kote-Jarai et al. (2006)) and protein structure and fold recognition (e.g. Ding and Dubchak (2001)). Among the different approaches to classification, methods based on Gaussian Processes (GPs) have become very popular since the influential papers by Neal (1998) and Williams and Barber (1998), which motivated the development of posterior approximations that are computationally appealing alternatives to the Markov Chain Monte Carlo approach. As such, GP-based models have been widely adopted in both regression and classification applications, often being considered as alternatives to kernel machines within the Bayesian inference framework (MacKay, 2003). Recently, Girolami and Rogers (2006) proposed a variational approximation to exact Bayesian inference for multi-class GP classification. The present implementation of the vbmp package provides the multinomial probit regression model with GP priors, adopting the Variational Bayesian (VB) methodology to estimate the required predictive distributions over classes. Example results are presented on a synthetic dataset and on microarray data from the breast cancer study by Kote-Jarai et al. (2006).
2 IMPLEMENTATION
The vbmp package implements both training and testing of the multi-class classifier in a function of the same name for the R software (R Development Core Team, 2005). Several types of kernel functions are available: Gaussian, Cauchy, Laplace, polynomial, homogeneous polynomial, thin-plate spline, linear spline and inner-product kernels. An arbitrary polynomial order can be specified by appending the degree to the polynomial kernel identifier.
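To make some of the kernel families listed above concrete, the following minimal Python sketch writes a few of them in their common textbook forms. It is an illustration only: the function names, the single scale parameter `theta` and the exact parametrization are assumptions made here and are not taken from the vbmp source, which may scale or parametrize its kernels differently.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def inner_product_kernel(x, y):
    """Inner-product (linear) kernel."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, theta=1.0):
    """Gaussian kernel; theta is an assumed inverse-scale parameter."""
    return math.exp(-theta * sq_dist(x, y))


def cauchy_kernel(x, y, theta=1.0):
    """Cauchy kernel in the same assumed parametrization."""
    return 1.0 / (1.0 + theta * sq_dist(x, y))


def laplace_kernel(x, y, theta=1.0):
    """Laplace kernel: exponential decay in the (unsquared) distance."""
    return math.exp(-theta * math.sqrt(sq_dist(x, y)))


def polynomial_kernel(x, y, degree=2):
    """Inhomogeneous polynomial kernel of a given degree."""
    return (1.0 + inner_product_kernel(x, y)) ** degree


def homogeneous_polynomial_kernel(x, y, degree=2):
    """Homogeneous polynomial kernel (no constant offset)."""
    return inner_product_kernel(x, y) ** degree


def gram_matrix(points, kernel):
    """Covariance (Gram) matrix of a kernel over a list of points."""
    return [[kernel(p, q) for q in points] for p in points]
```

A GP prior over each class's latent function would then use a matrix such as `gram_matrix(points, gaussian_kernel)` as its covariance.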
The length-scale parameters of the kernel function can either be provided as an argument or be inferred by the method during training. In the latter case, the method employs importance sampling to obtain posterior mean estimates of the kernel parameters. The location and scale parameters of the Gamma prior over the covariance parameters default to $10^{-6}$, but can be made more informative by the user when deemed appropriate.
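The idea of estimating a posterior mean by importance sampling can be sketched generically as follows. This is not the vbmp implementation: it assumes the Gamma prior itself is used as the proposal, replaces the model's variational bound with a hypothetical toy likelihood (zero-mean Gaussian data with unknown precision `theta`), and uses an informative Gamma(2, 1) prior so that the toy run is numerically well behaved. All function names are illustrative.

```python
import math
import random


def log_likelihood(theta, data):
    """Toy stand-in: log-likelihood (up to a constant) of N(0, 1/theta) data."""
    n = len(data)
    ss = sum(y * y for y in data)
    return 0.5 * n * math.log(theta) - 0.5 * theta * ss


def importance_posterior_mean(data, shape, rate, n_samples=20000, seed=0):
    """Self-normalized importance sampling estimate of E[theta | data].

    Samples are drawn from the Gamma(shape, rate) prior, so each
    importance weight is proportional to the likelihood alone.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), with scale = 1 / rate.
    thetas = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)]
    log_w = [log_likelihood(th, data) for th in thetas]
    # Subtract the maximum log-weight before exponentiating for stability.
    max_lw = max(log_w)
    weights = [math.exp(lw - max_lw) for lw in log_w]
    total = sum(weights)
    return sum(w * th for w, th in zip(weights, thetas)) / total
```

Because this toy model is conjugate, the estimate can be checked against the exact posterior mean, (shape + n/2) / (rate + sum of squares / 2), which is a convenient sanity check that the real model does not offer.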
The package provides accessor methods to retrieve the main properties of the object returned by the vbmp function. In particular, these methods return estimates of the class-specific probabilities, the lower bound on the model marginal likelihood, the predictive likelihood and the out-of-sample prediction error. Model convergence can be assessed during training by enabling plots that track the evolution of these quantities at each iteration.
3 EXAMPLES
Some illustrative experiments are provided to demonstrate the potential of the vbmp package. The R code for these examples is available in the package manual pages and in the vignette, an interactive document containing code snippets that gives a more task-oriented description of the package functionality.
3.1 Synthetic multi-class dataset example
Following Girolami and Rogers (2006), a sample of 500 uniformly distributed 2D data points $[x_1, x_2]$ was drawn from three non-linearly separable classes: $t_1$, with $0.1 < x_1^2 + x_2^2 < 0.5$; $t_2$, with $0.5 < x_1^2 + x_2^2 < 1$; and $t_3$, associated with $x_1^2 + x_2^2 < 0.1$, with $[x_1, x_2]$ being bivariate Gaussian with mean 0 and covariance equal to the identity matrix scaled to 0.01 on the main diagonal. This dataset takes the form of two annular rings and one zero-centered Gaussian. An additional eight non-informative variables drawn from a standard normal distribution were added to the dataset. A second sample of the same size was drawn from the same distribution for testing purposes.
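A dataset of this shape can be generated along the following lines. The sketch is written in Python rather than R and is not the code shipped with the package; the balanced split of points across the three classes, the rejection-sampling scheme and the function names are assumptions made for illustration.

```python
import random


def sample_annulus(rng, r2_min, r2_max):
    """Uniform point in the ring r2_min < x1^2 + x2^2 < r2_max (rejection sampling)."""
    while True:
        x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if r2_min < x1 * x1 + x2 * x2 < r2_max:
            return [x1, x2]


def sample_centre(rng, sd=0.1, r2_max=0.1):
    """Zero-mean Gaussian point (variance sd^2 = 0.01) kept inside x1^2 + x2^2 < r2_max."""
    while True:
        x1, x2 = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        if x1 * x1 + x2 * x2 < r2_max:
            return [x1, x2]


def make_dataset(n=500, n_noise=8, seed=0):
    """Return (X, t): n points with 2 informative and n_noise noise covariates."""
    rng = random.Random(seed)
    X, t = [], []
    for i in range(n):
        label = i % 3 + 1  # assumption: classes are roughly balanced
        if label == 1:
            point = sample_annulus(rng, 0.1, 0.5)
        elif label == 2:
            point = sample_annulus(rng, 0.5, 1.0)
        else:
            point = sample_centre(rng)
        # Append non-informative standard normal covariates.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(point + noise)
        t.append(label)
    return X, t
```

Calling `make_dataset` twice with different seeds gives a training sample and an independent test sample from the same distribution.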
A Gaussian kernel was adopted with inverse scale parameters inferred by the method using vague hyperpriors (length-scale hyperparameters set to $10^{-6}$).
Figure 1 shows the results and convergence status plotted by the vbmp method at each iteration. The graphs trace the predictive performance of the method, which achieves an error rate of 0.022 on the test dataset after 50 iterations. Figure 1 also demonstrates the Automatic Relevance Determination (ARD) process (Neal, 1998), which forces the inverse-scaling of the uninformative covariates to large values, and hence to small values of relevance.
3.2 Breast cancer dataset example
In this example, the data are from the study by Kote-Jarai et al. (2006) where the differential gene expression changes following radiation-induced DNA damage in healthy cells from BRCA1/BRCA2 mutation carriers were compared with controls using high-density microarray technology. The dataset consists of 8080 cDNA clones of fibroblast cultures from 10 control samples, 10 BRCA1 and 10 BRCA2 mutation carriers.
The code available for this example reproduces the leave-one-out cross-validation (LOOCV) prediction performance obtained with the vbmp method on this dataset. Using a common inner-product (linear) covariance kernel and 25 training iterations, the vbmp multi-class classifier achieved 100% LOOCV accuracy. The vbmp method outperformed the SVM classifier adopted by Kote-Jarai et al. (2006) to distinguish the different types of samples studied in this experiment.
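The LOOCV protocol itself is independent of the classifier and can be sketched in a few lines of Python. In the sketch below a simple nearest-centroid rule stands in for the vbmp classifier, purely so that the example is self-contained; the function names and the `fit_predict` callback signature are illustrative assumptions.

```python
def nearest_centroid_fit_predict(train_X, train_t, test_x):
    """Stand-in classifier: predict the class whose mean vector is closest."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_t):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        # Squared distance from test_x to the centroid of this class.
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], test_x))

    return min(sums, key=dist2)


def loocv_accuracy(X, t, fit_predict):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_t = t[:i] + t[i + 1:]
        if fit_predict(train_X, train_t, X[i]) == t[i]:
            correct += 1
    return correct / len(X)
```

With 30 samples, as in the breast cancer dataset, this loop trains 30 models, each on the remaining 29 samples.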
4 DISCUSSION
The vbmp package implements a VB approach to the classification of multi-class datasets. This non-parametric approach is developed within a probabilistic framework for Bayesian inference, which yields efficient sparse approximations by optimizing a strict lower bound on the marginal likelihood of a multinomial probit regression model.
Compared with the multinomial logit approach, this method is appealing since it provides a means of developing a Gibbs sampler and subsequent computationally efficient approximations for the GP random variables. Girolami and Rogers (2006) showed how the multi-class probit GP could be made sparse through Informative Vector Machine updates (Lawrence et al., 2005).
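The multinomial probit model discussed above has a convenient auxiliary-variable form: each class k has a latent value y_k drawn from a unit-variance Gaussian centred on its latent function value m_k, and the observed class is the one with the largest y_k. The Python sketch below estimates the resulting class probabilities by simple Monte Carlo for fixed, hypothetical latent means; it illustrates the likelihood only and is not the variational scheme implemented in the package.

```python
import random


def probit_class_probs(means, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(class = k) under the multinomial probit model.

    means: latent function values m_k, one per class (hypothetical inputs).
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_samples):
        # Auxiliary variables: y_k ~ N(m_k, 1), independently for each class.
        y = [m + rng.gauss(0.0, 1.0) for m in means]
        # The sampled class is the index of the largest auxiliary variable.
        counts[y.index(max(y))] += 1
    return [c / n_samples for c in counts]
```

Equal latent means give roughly equal class probabilities, while raising one mean shifts probability mass towards that class.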
Girolami and Zhong (2007) compared the VB approximation implemented in this package to the Expectation Propagation (EP) approximation (Minka, 2001) empirically and showed that both these approaches performed as well as a Gibbs sampler and consistently outperformed the Laplace approximation.
This software also implements a feature-weighting method by exploiting the ARD approach to emphasize the most relevant input parameters while reducing the impact of those that do not contribute significantly.
The package does not implement any procedure for extensive ad hoc tuning of the solution (as required, for example, by SVM classifiers), since the method does not need it. It is noteworthy that the vbmp method provides confidence measures for the class assignments, which could be very useful in clinical practice.
ACKNOWLEDGEMENTS
N.L. acknowledges research fellowship support from Associazione Italiana per la Ricerca sul Cancro (AIRC). M.G. is an EPSRC Advanced Research Fellow EP/E052029/1.
Conflict of Interest: none declared.
FOOTNOTES
Associate Editor: Trey Ideker
Received on August 24, 2007; revised on August 24, 2007; accepted on October 16, 2007
REFERENCES
Ding C, Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics (2001) 17:349–358.
Girolami M, Rogers S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Comput (2006) 18:1790–1817.
Girolami M, Zhong M. Data integration for classification problems employing Gaussian process priors. In: Scholkopf B, et al., eds. Advances in Neural Information Processing Systems (2007) 19. Cambridge: MIT Press.
Kote-Jarai Z, et al. Accurate prediction of BRCA1 and BRCA2 heterozygous genotype using expression profiling after induced DNA damage. Clin Cancer Res (2006) 12:3896–3901.
Lawrence ND, et al. Extensions of the informative vector machine. In: Winkler J, et al., eds. Deterministic and Statistical Methods in Machine Learning (2005) Berlin Heidelberg: Springer-Verlag.
MacKay D. Information Theory, Inference, and Learning Algorithms. (2003) Cambridge: Cambridge University Press.
Minka T. A family of algorithms for approximate Bayesian inference. Doctoral dissertation. (2001) Cambridge, Mass: MIT.
Neal R. Regression and classification using Gaussian process priors. In: Dawid AP, et al., eds. Bayesian Statistics (1998) Vol. 6. Oxford: Oxford University Press. 475–501.
R Development Core Team. R: a language and environment for statistical computing. (2005) Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Williams CKI, Barber D. Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell (1998) 20:1342–1352.
