Bioinformatics Advance Access originally published online on November 30, 2006
Bioinformatics 2007 23(3):390-391; doi:10.1093/bioinformatics/btl602

© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Prophet, a web-based tool for class prediction using microarray data

Ignacio Medina 1, David Montaner 1,2, Joaquín Tárraga 1,2 and Joaquín Dopazo 1,2,*

1 Department of Bioinformatics Centro de Investigación Príncipe Felipe (CIPF), Valencia, E46013, Spain
2 Functional Genomics Node, (INB) Centro de Investigación Príncipe Felipe (CIPF), Valencia, E46013, Spain

*To whom correspondence should be addressed.


    ABSTRACT

Sample classification and class prediction are the aims of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows them to be used for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found.

Availability: Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/

Contact: jdopazo{at}cipf.es

Supplementary information: http://gepas.bioinfo.cipf.es/tutorial/prophet.html


    BACKGROUND
 
One of the crucial factors behind the success of DNA microarray technologies has been their application to the definition of predictors of clinical outcomes (van 't Veer et al., 2002). Although not free from criticism (Simon, 2005), the practical implications of this particular goal have definitely fuelled the use of microarrays. Common errors in the early proposals of predictors, such as selection bias (Ambroise and McLachlan, 2002; Simon et al., 2003), which causes unrealistic, downward-biased error estimates, are behind the above-mentioned criticism. Recently, proper strategies for unbiased cross-validation have been proposed. The estimation of the classification error must take into account the gene selection step as well as any other parallel step taken, such as the optimization of the number of selected genes or the selection among various classifiers. However, it is still common to find publications in which this important fact has not been taken into account (Ambroise and McLachlan, 2002). At the root of this widespread conceptual error is, probably, the lack of easy-to-use, accurate and freely available solutions that allow end users to carry out such analyses.

Prophet aims to meet the demand for a simple but powerful tool for prediction purposes in the microarray context. Since web-based solutions are gaining acceptance in the microarray community for data analysis purposes (see for example: http://bioinformatics.ubc.ca/resources/links_directory/index.php?subcategory_id=101), Prophet was conceived to be accessible over the web. To our knowledge, the only other equivalent web-based tool available is M@CBETH (Pochet et al., 2005). However, this program can only handle two-class problems. Moreover, since principal component analysis (PCA) is used to reduce the dimension of the data, the identity of the genes is lost and only the principal components can be retrieved from the predictor. Finally, only support vector machines (SVMs) and Fisher's discriminant analysis can be used as classification algorithms.


    BUILDING THE PREDICTOR AND PREDICTING
 
Prophet has two main options: ‘train’ and ‘predict’. The first (the training step) is used to build the predictor; the second applies the predictor found to predict class membership for new samples.

Prophet builds a prediction rule based on genes. There are several options for defining the set of genes to be used for training the predictors. Prophet accepts user-defined selections of genes or, alternatively, it can find the optimal subset within the whole set of genes. For the second option, known as the ‘filter approach’ in the machine learning literature, Prophet pre-selects the genes most likely to improve the accuracy of the predictor. Two ways of ranking genes for subsequent selection have been implemented: the F-ratio (Dudoit et al., 2002) and the Wilcoxon statistic, a non-parametric test for differences between two classes. These can be used in combination with any of the class-prediction algorithms implemented in Prophet, which have been shown to perform very efficiently with microarray data (Dudoit et al., 2002; Romualdi et al., 2003; Wessels et al., 2005). The methods are: SVM (Vapnik, 1999), k-nearest neighbor (KNN), diagonal linear discriminant analysis (DLDA), self-organizing maps (SOM) (Kohonen, 1997) and shrunken centroids (PAM) (Tibshirani et al., 2002).
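As an illustration of the filter approach, the following minimal Python sketch ranks genes by the ratio of between-class to within-class sums of squares, which is how the F-ratio is commonly defined following Dudoit et al. (2002). It is not Prophet's code: the function names f_ratio and top_genes, the genes-in-rows matrix layout and the small constant used to avoid division by zero are assumptions made for this example.

```python
import numpy as np

def f_ratio(X, y):
    """Between-class / within-class sum-of-squares ratio for each gene.

    X: (n_genes, n_samples) expression matrix (genes in rows).
    y: class label of each sample, length n_samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall = X.mean(axis=1, keepdims=True)      # per-gene mean over all samples
    bss = np.zeros(X.shape[0])                   # between-class sum of squares
    wss = np.zeros(X.shape[0])                   # within-class sum of squares
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)      # per-gene mean within class c
        bss += Xc.shape[1] * (mc - overall).ravel() ** 2
        wss += ((Xc - mc) ** 2).sum(axis=1)
    # Assumption: a tiny floor avoids dividing by zero for constant genes.
    return bss / np.maximum(wss, 1e-12)

def top_genes(X, y, n):
    """Row indices of the n genes with the largest F-ratio, best first."""
    scores = f_ratio(X, y)
    return np.argsort(-scores, kind="stable")[:n]
```

Calling top_genes(X, y, 10) returns the row indices of the ten top-ranked genes, which can then be passed to any of the classifiers listed above.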

The ‘train’ option of the Prophet form implements the strategy for finding the best predictor with the optimal number of genes. A leave-one-out (LOO) cross-validation strategy is implemented here to return the cross-validated error rate of the complete process of building several predictors and then choosing the one with the smallest error rate. The procedure is as follows: a LOO sample is drawn from the training dataset. Genes are ranked by one of the methods mentioned above (F-ratio or Wilcoxon statistic) and, using the n top genes (n = 2, 5, 10, 20, 35, 50, 75 and 100, by default), a predictor is built with each of the selected classification methods (KNN, DLDA, SVM, PAM, SOM or a sub-selection of them). Then, the LOO error is calculated for each method and each value of n. Finally, the smallest set of n genes in combination with the method that results in the smallest CV error is reported. The results include a plot of the CV error across the range of sets of n genes for all the classification methods tried, along with the corresponding confusion matrices (very useful for detecting asymmetries in the determination of classes). In addition, the prediction for each LOO sample is provided, which is quite useful for detecting outliers or anomalous misassignments. More detailed information and examples are available on the tutorial page at http://gepas.bioinfo.cipf.es/tutorial/prophet.html, which also contains all the supplementary information.
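The key point of the procedure above is that gene ranking is repeated inside every LOO fold, so the left-out sample never influences which genes are selected. The Python sketch below illustrates this under stated assumptions; it is not Prophet's implementation. Only two simplified classifiers (a basic DLDA and a KNN with k = 3) are included, the function names are hypothetical, and the outer loop that would also cross-validate the final choice of method and n is omitted for brevity.

```python
import numpy as np

def f_ratio(X, y):
    """Between/within sum-of-squares ratio per gene; X is genes x samples."""
    overall = X.mean(axis=1, keepdims=True)
    bss, wss = np.zeros(X.shape[0]), np.zeros(X.shape[0])
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        bss += Xc.shape[1] * (mc - overall).ravel() ** 2
        wss += ((Xc - mc) ** 2).sum(axis=1)
    return bss / np.maximum(wss, 1e-12)

def dlda_predict(Xtr, ytr, x):
    """Simplified DLDA: nearest class mean, genes scaled by pooled variance."""
    classes = np.unique(ytr)
    means = np.array([Xtr[:, ytr == c].mean(axis=1) for c in classes])
    resid = np.concatenate(
        [Xtr[:, ytr == c] - Xtr[:, ytr == c].mean(axis=1, keepdims=True)
         for c in classes], axis=1)
    var = (resid ** 2).sum(axis=1) / max(Xtr.shape[1] - len(classes), 1)
    dist = (((x - means) ** 2) / np.maximum(var, 1e-12)).sum(axis=1)
    return classes[np.argmin(dist)]

def knn_predict(Xtr, ytr, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    dist = ((Xtr - x[:, None]) ** 2).sum(axis=0)
    nearest = ytr[np.argsort(dist, kind="stable")[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

def loo_error_grid(X, y, sizes=(2, 5, 10, 20, 35, 50, 75, 100), methods=None):
    """LOO error rate for every (method, n) pair, ranking genes in each fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    methods = methods or {"DLDA": dlda_predict, "KNN": knn_predict}
    n_samples = X.shape[1]
    errors = {(m, n): 0 for m in methods for n in sizes}
    for i in range(n_samples):
        train = np.arange(n_samples) != i
        Xtr, ytr = X[:, train], y[train]
        # Ranking uses the training samples only: this avoids selection bias.
        order = np.argsort(-f_ratio(Xtr, ytr), kind="stable")
        for n in sizes:
            genes = order[:n]
            for name, predict in methods.items():
                pred = predict(Xtr[genes], ytr, X[genes, i])
                errors[(name, n)] += int(pred != y[i])
    return {key: count / n_samples for key, count in errors.items()}

def best_combination(errors):
    """Smallest error first, then fewest genes; method name breaks ties."""
    return min(errors, key=lambda k: (errors[k], k[1], k[0]))
```

The dictionary returned by loo_error_grid corresponds to the error curves plotted per method across n, and best_combination picks the pair that would be reported. Because the reported error is described as that of the complete process, a faithful implementation would presumably wrap this selection in a further cross-validation loop.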

Once the optimal predictor (combination of a set of n genes and a classification method) has been found, it can be saved. Then, in the ‘predict’ option of the form, the predictor can be retrieved and applied to new samples and a class membership prediction will be obtained for them.
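The save-and-reuse cycle can be pictured with the short Python sketch below. The source does not describe how Prophet stores a predictor, so everything here is an assumption made for illustration: the JSON layout, the function names and the use of a simplified DLDA as the stored method. The point it shows is that a predictor is the pair (selected genes, fitted classifier), and that new samples are matched to it by gene identifier.

```python
import json
import numpy as np

def fit_dlda(X, y, gene_ids, selected):
    """Fit a simplified DLDA on the selected genes; X is genes x samples."""
    y = np.asarray(y)
    rows = [gene_ids.index(g) for g in selected]
    Xs = np.asarray(X, dtype=float)[rows]
    classes = sorted(set(y.tolist()))
    means = [Xs[:, y == c].mean(axis=1) for c in classes]
    resid = np.concatenate(
        [Xs[:, y == c] - m[:, None] for c, m in zip(classes, means)], axis=1)
    var = (resid ** 2).sum(axis=1) / max(Xs.shape[1] - len(classes), 1)
    # Hypothetical storage layout: plain lists so it can be written as JSON.
    return {"method": "DLDA", "genes": list(selected), "classes": classes,
            "means": [m.tolist() for m in means], "var": var.tolist()}

def save_predictor(predictor, path):
    """Write the predictor to a JSON file."""
    with open(path, "w") as fh:
        json.dump(predictor, fh)

def load_predictor(path):
    """Read a predictor previously written by save_predictor."""
    with open(path) as fh:
        return json.load(fh)

def predict(predictor, X_new, gene_ids_new):
    """Predict a class for each column of X_new, matching genes by identifier."""
    rows = [gene_ids_new.index(g) for g in predictor["genes"]]
    Xs = np.asarray(X_new, dtype=float)[rows]            # selected genes x samples
    means = np.array(predictor["means"])                 # classes x selected genes
    var = np.maximum(np.array(predictor["var"]), 1e-12)
    diff = Xs.T[:, None, :] - means[None, :, :]          # samples x classes x genes
    dist = ((diff ** 2) / var).sum(axis=2)
    return [predictor["classes"][j] for j in dist.argmin(axis=1)]
```

Matching by identifier rather than by row position means the new file may list genes in a different order, or contain additional genes, as long as every gene in the predictor is present.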

The input file format is quite simple: a tab-delimited text file with genes in rows and experiments in columns. The first column corresponds to the gene identifiers. Individual experiment identifiers as well as class identifiers can be provided in a separate file or within the main file with the corresponding labels (#NAME and #CLASS, respectively, see tutorial and Supplementary information for details).
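A reader for this kind of file might look like the Python sketch below. The exact layout of the label lines is documented in the tutorial, not here, so the sketch assumes that #NAME and #CLASS each start a tab-delimited line whose remaining fields give one value per experiment, and that any other line starting with # is a comment. The function name read_expression is hypothetical.

```python
import io
import numpy as np

def read_expression(handle):
    """Parse a tab-delimited expression file with genes in rows.

    Returns (gene_ids, sample_names, classes, X), where X is a
    genes x experiments array. sample_names and classes are None when
    the corresponding label line is absent (e.g. given in a separate file).
    """
    names, classes, gene_ids, rows = None, None, [], []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue                         # skip blank lines
        fields = line.split("\t")
        if fields[0] == "#NAME":             # assumed label-line layout
            names = fields[1:]
        elif fields[0] == "#CLASS":          # assumed label-line layout
            classes = fields[1:]
        elif fields[0].startswith("#"):
            continue                         # assumption: other # lines are comments
        else:
            gene_ids.append(fields[0])       # first column is the gene identifier
            rows.append([float(v) for v in fields[1:]])
    return gene_ids, names, classes, np.array(rows, dtype=float)

# Usage with an in-memory file; a real file would be passed via open(path).
example = (
    "#NAME\ts1\ts2\ts3\n"
    "#CLASS\tA\tA\tB\n"
    "g1\t1.0\t2.0\t3.0\n"
    "g2\t0.5\t0.1\t-1.2\n"
)
gene_ids, names, classes, X = read_expression(io.StringIO(example))
```

The returned matrix keeps genes in rows, matching the file layout, so it can be transposed by the caller if a samples-in-rows convention is preferred.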

Prophet is integrated within the GEPAS (Herrero et al., 2003; Montaner et al., 2006) environment, so a complete analysis of the microarray data, from the first steps of normalization and preprocessing, can be performed without switching among different programs with different input/output formats. Another unique feature is the possibility of obtaining a functional interpretation of the genes included in the predictor. This is achieved through tools such as FatiGO+ (Al-Shahrour et al., 2004) and others included in the Babelomics package (Al-Shahrour et al., 2005, 2006), to which Prophet is also linked.

In addition to the web interface, Prophet can be invoked as a web service.

To summarize, Prophet provides an accurate, conceptually correct and easy-to-use framework for building predictors based on microarray gene expression data that can be later used to predict class membership for new samples. Moreover, this is the only web-based tool that builds predictors based on genes and allows a further functional interpretation of the results.


    Acknowledgments
 
This work is supported by grants from NRC Canada-SEPOCT Spain, project BIO 2005-01078 from the MEC and INDIGO EU project. The Functional Genomics node (INB) is supported by Genoma España. Funding to pay the Open Access publication charges for this article was provided by Genoma España.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Chris Stoeckert

Received on October 23, 2006; revised on November 19, 2006; accepted on November 20, 2006

    REFERENCES

    Al-Shahrour, F., et al. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20, 578–580.

    Al-Shahrour, F., et al. (2005) BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res., 33, W460–W464.

    Al-Shahrour, F., et al. (2006) BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res., 34, W472–W476.

    Ambroise, C. and McLachlan, G.J. (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA, 99, 6562–6566.

    Dudoit, S., et al. (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc., 97, 77–87.

    Herrero, J., et al. (2003) GEPAS: a web-based resource for microarray gene expression data analysis. Nucleic Acids Res., 31, 3461–3467.

    Kohonen, T. (1997) Self-organizing Maps. Springer-Verlag, Berlin.

    Montaner, D., et al. (2006) Next station in microarray data analysis: GEPAS. Nucleic Acids Res., 34, W486–W491.

    Pochet, N.L., et al. (2005) M@CBETH: a microarray classification benchmarking tool. Bioinformatics, 21, 3185–3186.

    Romualdi, C., et al. (2003) Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum. Mol. Genet., 12, 823–836.

    Simon, R. (2005) Roadmap for developing and validating therapeutically relevant genomic classifiers. J. Clin. Oncol., 23, 7332–7341.

    Simon, R., et al. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl Cancer Inst., 95, 14–18.

    Tibshirani, R., et al. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl Acad. Sci. USA, 99, 6567–6572.

    van 't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.

    Vapnik, V. (1999) Statistical Learning Theory. John Wiley and Sons, New York.

    Wessels, L.F., et al. (2005) A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics, 21, 3755–3762.

