Bioinformatics Advance Access originally published online on November 14, 2007
Bioinformatics 2008 24(1):135-136; doi:10.1093/bioinformatics/btm535
© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

vbmp: Variational Bayesian Multinomial Probit Regression for multi-class classification in R

Nicola Lama 1,* and Mark Girolami 2

1Medical Statistics Unit, Department of Medicine and Public Health, Second University of Napoli, Italy and 2Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, UK

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: vbmp is an R package for Gaussian Process classification of data over multiple classes. It features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities using fast variational approximations to the full posterior. The software also incorporates feature weighting by means of Automatic Relevance Determination. With a single main function and reasonable default values for optional parameters, vbmp combines flexibility with ease of use, as demonstrated on a breast cancer microarray study.

Availability: The R library vbmp implementing this method is part of Bioconductor and can be downloaded from http://www.dcs.gla.ac.uk/~girolami

Contact: nicola.lama{at}unina2.it

Supplementary information: Supplementary data are available at http://www.dcs.gla.ac.uk/~girolami


    1 INTRODUCTION
 
Multi-class classification, in which the data consist of more than two classes, is rapidly gaining attention in the literature. Typical bioinformatics applications include gene expression profiling-based diagnosis of tumours (e.g. Kote-Jarai et al., 2006) and protein structure and fold recognition (e.g. Ding and Dubchak, 2001). Among the different approaches to classification, methods based on Gaussian Processes (GPs) have become very popular since the influential papers by Neal (1998) and Williams and Barber (1998), which motivated the development of posterior approximations that are computationally appealing alternatives to the Markov Chain Monte Carlo approach. As such, GP-based models have been widely adopted in both regression and classification applications, often being considered as alternatives, within the Bayesian inference framework (MacKay, 2003), to kernel machines. Recently, Girolami and Rogers (2006) proposed a variational approximation to exact Bayesian inference for multi-class GP classification. The present implementation of the vbmp package provides the multinomial probit regression model with GP priors, adopting the Variational Bayesian (VB) methodology to obtain an estimate of the required predictive distributions over classes. Example results are presented on a synthetic dataset and on microarray data from the breast cancer study by Kote-Jarai et al. (2006).


    2 IMPLEMENTATION
 
The vbmp package implements both training and testing of the multi-class classifier in the function of the same name for the R software (R Development Core Team, 2005). Several types of kernel functions are available: Gaussian, Cauchy, Laplace, polynomial, homogeneous polynomial, thin-plate spline, linear spline and inner-product kernels. An arbitrary polynomial order can be specified by appending the degree to the polynomial kernel identifier.
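As an illustration of some of the kernel families listed above, the following Python sketch writes out common textbook forms. It is not the vbmp code: the function names, the per-feature weights `theta` and the exact parametrization are assumptions made for this example, and the spline kernels are omitted.

```python
import math


def sq_dist(x, y, theta):
    """Weighted squared distance: sum_d theta[d] * (x[d] - y[d])**2."""
    return sum(t * (a - b) ** 2 for t, a, b in zip(theta, x, y))


def inner(x, y):
    """Plain inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


# Textbook kernel forms (assumed); the parametrization inside vbmp may differ.
KERNELS = {
    "gaussian": lambda x, y, theta: math.exp(-sq_dist(x, y, theta)),
    "cauchy": lambda x, y, theta: 1.0 / (1.0 + sq_dist(x, y, theta)),
    "laplace": lambda x, y, theta: math.exp(-math.sqrt(sq_dist(x, y, theta))),
    "inner_product": lambda x, y, theta: inner(x, y),
}


def polynomial(x, y, degree, homogeneous=False):
    """(1 + <x, y>)**degree, or <x, y>**degree when homogeneous is True."""
    base = inner(x, y) if homogeneous else 1.0 + inner(x, y)
    return base ** degree


def gram(X, kernel, theta):
    """Gram matrix K[i][j] = kernel(X[i], X[j], theta) as nested lists."""
    return [[kernel(a, b, theta) for b in X] for a in X]
```

For instance, `gram(X, KERNELS["gaussian"], [1.0] * len(X[0]))` builds the covariance matrix of a GP prior over the rows of `X`; setting an entry of `theta` to zero removes the corresponding feature from the distance computation.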

The length-scale parameters of the kernel function can either be provided as an argument or be inferred by the method during training. In the latter case, the method employs importance sampling to obtain posterior mean estimates of the kernel parameters. The location and scale parameters of the Gamma prior over the covariance parameters default to $10^{-6}$, but can be made more informative by the user when deemed appropriate.
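The idea of estimating a posterior mean by importance sampling can be sketched generically as follows. This is a minimal Python illustration, not the vbmp implementation: the function name is hypothetical, and it is assumed that the candidate parameter values come from some proposal distribution (for example the prior) and that an unnormalized log weight is available for each of them.

```python
import math


def importance_posterior_mean(samples, log_weights):
    """Self-normalized importance-sampling estimate of a posterior mean.

    samples      -- parameter values drawn from the proposal (e.g. the prior)
    log_weights  -- unnormalized log importance weights, one per sample
    """
    if len(samples) != len(log_weights) or not samples:
        raise ValueError("need one log-weight per sample")
    # Subtract the maximum before exponentiating to avoid overflow/underflow.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total
```

Working with log weights keeps the estimate stable when the weights span many orders of magnitude, which is common when they are derived from likelihood values.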

The package provides accessor methods to retrieve the main properties from the object returned by the vbmp function. In particular, these methods return estimates of the class-specific probabilities, the lower bound on the model marginal likelihood, the predictive likelihood and the out-of-sample prediction error. Convergence can be assessed during training by enabling a monitor that plots the evolution of these quantities at each iteration.


    3 EXAMPLES
 
Some illustrative experiments are provided to demonstrate the potential of the vbmp package. The R code for these examples is available in the package manual pages and in the vignette, an interactive document containing code snippets that gives a more task-oriented description of package functionality.

3.1 Synthetic multi-class dataset example
Following Girolami and Rogers (2006), a sample of 500 uniformly distributed 2D data points $(x_1, x_2)$ was drawn from three non-linearly separable classes: $t_1$ with $0.1 < x_1^2 + x_2^2 < 0.5$, $t_2$ with $0.5 \le x_1^2 + x_2^2 < 1$, and $t_3$ associated with $x_1^2 + x_2^2 < 0.1$, with $[x_1, x_2]$ being a bivariate Gaussian with mean 0 and covariance $0.01\,I$ (the identity matrix scaled to 0.01 on the main diagonal). This dataset takes the form of two annular rings and one zero-centered Gaussian. An additional eight non-informative variables drawn from a standard normal distribution were added to the dataset. A second sample of the same size was drawn from the same distribution for testing purposes.
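A dataset of this shape can be generated along the following lines. This Python sketch is only an approximation of the description above: the function names, the nearly balanced class sizes, the rejection sampling used for the rings and the truncation of the Gaussian class to $x_1^2 + x_2^2 < 0.1$ are all assumptions, not details taken from the package.

```python
import random


def sample_ring(rng, r2_low, r2_high):
    """Rejection-sample a point uniformly from r2_low <= x1^2 + x2^2 < r2_high."""
    while True:
        x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if r2_low <= x1 * x1 + x2 * x2 < r2_high:
            return [x1, x2]


def sample_blob(rng, std=0.1, r2_max=0.1):
    """Zero-mean Gaussian (variance std**2 per axis), kept inside x1^2 + x2^2 < r2_max."""
    while True:
        x1, x2 = rng.gauss(0.0, std), rng.gauss(0.0, std)
        if x1 * x1 + x2 * x2 < r2_max:
            return [x1, x2]


def make_dataset(n=500, n_noise=8, seed=0):
    """Return (X, t): n rows of 2 informative + n_noise noise features, labels 1..3."""
    rng = random.Random(seed)
    X, t = [], []
    for i in range(n):
        label = i % 3 + 1  # assumed: classes are (nearly) balanced
        if label == 1:
            point = sample_ring(rng, 0.1, 0.5)
        elif label == 2:
            point = sample_ring(rng, 0.5, 1.0)
        else:
            point = sample_blob(rng)
        # Append the non-informative standard-normal covariates.
        point += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(point)
        t.append(label)
    return X, t
```

Calling `make_dataset(seed=0)` and `make_dataset(seed=1)` gives two independent samples of the same size, which can play the roles of the training and test sets.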

A Gaussian kernel was adopted, with inverse scale parameters inferred by the method using vague hyperpriors (length-scale hyperparameters set to $10^{-6}$).

Figure 1 shows the results and convergence status plotted by the vbmp method at each iteration. The graphs highlight the predictive evolution of the method, which achieves an error rate of 0.022 on the test dataset after 50 iterations. Figure 1 also demonstrates the Automatic Relevance Determination (ARD) process (Neal, 1998), which forces the inverse-scaling of the uninformative covariates to large values, hence small values of relevance.


 
Fig. 1. Monitor evolution diagnostics for the estimates of: posterior means for the Gaussian kernel scale parameters (top-left), lower bound on the marginal likelihood (top-right), predictive likelihood (bottom-left) and out-of-sample accuracy (0/1-error loss; bottom-right) on the synthetic dataset of Section 3.1.

 
3.2 Breast cancer dataset example
In this example, the data are from the study by Kote-Jarai et al. (2006), in which gene expression changes following radiation-induced DNA damage in healthy cells from BRCA1/BRCA2 mutation carriers were compared with those of controls using high-density microarray technology. The dataset consists of 8080 cDNA clones of fibroblast cultures from 10 control samples, 10 BRCA1 and 10 BRCA2 mutation carriers.

The code available for this example reproduces the leave-one-out cross-validation (LOOCV) prediction performance obtained from the vbmp method on this dataset. Using a common inner-product (linear) covariance kernel and 25 training iterations, the vbmp multi-class classifier achieved 100% LOOCV accuracy. The vbmp method outperformed the SVM classifier adopted by Kote-Jarai et al. (2006) to distinguish the different types of samples studied in this experiment.
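The LOOCV procedure itself is independent of the classifier and can be sketched in a few lines of Python. In the sketch below the nearest-centroid classifier is only a stand-in so that the example is self-contained; it is not the vbmp method, and all function names are hypothetical.

```python
def loocv_accuracy(X, t, fit, predict):
    """Leave-one-out accuracy: train on all samples but one, test on the held-out one."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        t_train = t[:i] + t[i + 1:]
        model = fit(X_train, t_train)
        hits += predict(model, X[i]) == t[i]
    return hits / len(X)


def fit_centroids(X, t):
    """Stand-in classifier (not vbmp): per-class mean vectors."""
    sums, counts = {}, {}
    for row, label in zip(X, t):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))
```

With 30 samples, as in the breast cancer dataset, this loop trains 30 models on 29 samples each; any classifier exposing a fit step and a predict step could be substituted for the stand-in.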


    4 DISCUSSION
 
The vbmp package implements a VB approach to classification of multi-class datasets. This non-parametric approach is developed within a probabilistic framework for Bayesian inference, which yields efficient sparse approximations by optimizing a strict lower bound on the marginal likelihood of a multinomial probit regression model.
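For readers unfamiliar with the multinomial probit model, its class probabilities can be written, in the standard auxiliary-variable construction with unit-variance Gaussian noise, as $P(t = i \mid m) = E_{u \sim N(0,1)}[\prod_{j \ne i} \Phi(u + m_i - m_j)]$, where $m$ holds the latent function values for the classes. The Python sketch below evaluates this expectation by simple numerical integration; the function names and the trapezoidal rule are choices made for this illustration and are not taken from the package.

```python
import math


def std_normal_pdf(u):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative distribution function Phi of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_class_probs(m, n_grid=2001, u_max=8.0):
    """P(t = i | m) = E_u[ prod_{j != i} Phi(u + m[i] - m[j]) ], u ~ N(0, 1).

    The expectation is approximated with the trapezoidal rule on [-u_max, u_max].
    """
    step = 2.0 * u_max / (n_grid - 1)
    probs = []
    for i in range(len(m)):
        total = 0.0
        for k in range(n_grid):
            u = -u_max + k * step
            value = std_normal_pdf(u)
            for j in range(len(m)):
                if j != i:
                    value *= std_normal_cdf(u + m[i] - m[j])
            # End points of the grid get half weight in the trapezoidal rule.
            weight = 0.5 if k in (0, n_grid - 1) else 1.0
            total += weight * value * step
        probs.append(total)
    return probs
```

The returned values sum to one (up to integration error) and can be read as the per-class confidence attached to an assignment; with two classes the expression reduces to $\Phi((m_1 - m_2)/\sqrt{2})$.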

Compared with the multinomial logit approach, this method is appealing since it provides a means of developing a Gibbs sampler and subsequent computationally efficient approximations for the GP random variables. Girolami and Rogers (2006) showed how the multi-class probit GP could be made sparse through Informative Vector Machine updates (Lawrence et al., 2005).

Girolami and Zhong (2007) compared the VB approximation implemented in this package to the Expectation Propagation (EP) approximation (Minka, 2001) empirically and showed that both these approaches performed as well as a Gibbs sampler and consistently outperformed the Laplace approximation.

This software also implements a feature-weighting method by exploiting the ARD approach to emphasize the most relevant input parameters while reducing the impact of those that do not contribute significantly.

The package does not implement any procedure for extensive ad hoc tuning of the solution (as is required, for example, by SVM classifiers) since this is not needed by the method. It is noteworthy that the vbmp method provides confidence measures for the class assignment, which could be very useful in clinical practice.


    ACKNOWLEDGEMENTS
 
N.L. acknowledges research fellowship support from Associazione Italiana per la Ricerca sul Cancro (AIRC). M.G. is an EPSRC Advanced Research Fellow (EP/E052029/1).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Trey Ideker

Received on August 24, 2007; revised on August 24, 2007; accepted on October 16, 2007

    REFERENCES
 

    Ding C, Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics (2001) 17:349–358.

    Girolami M, Rogers S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Comput (2006) 18:1790–1817.

    Girolami M, Zhong M. Data integration for classification problems employing Gaussian process priors. In: Advances in Neural Information Processing Systems 19. Schölkopf B, et al., eds. (2007) Cambridge: MIT Press.

    Kote-Jarai Z, et al. Accurate prediction of BRCA1 and BRCA2 heterozygous genotype using expression profiling after induced DNA damage. Clin Cancer Res (2006) 12:3896–3901.

    Lawrence ND, et al. Extensions of the informative vector machine. In: Deterministic and Statistical Methods in Machine Learning. Winkler J, et al., eds. (2005) Berlin Heidelberg: Springer-Verlag.

    MacKay D. Information Theory, Inference, and Learning Algorithms. (2003) Cambridge: Cambridge University Press.

    Minka T. A family of algorithms for approximate Bayesian inference. Doctoral dissertation. (2001) Cambridge, MA: MIT.

    Neal R. Regression and classification using Gaussian process priors. In: Bayesian Statistics, Vol. 6. Dawid AP, et al., eds. (1998) Oxford: Oxford University Press. 475–501.

    R Development Core Team. R: a language and environment for statistical computing. (2005) Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

    Williams CKI, Barber D. Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell (1998) 20:1342–1352.

