Bioinformatics Advance Access originally published online on November 16, 2007
Bioinformatics 2008 24(2):258-264; doi:10.1093/bioinformatics/btm550
Algebraic stability indicators for ranked lists in molecular profiling
1FBK, via Sommarive 18, I-38100 Povo (Trento), 2DISI, University of Genova, via Dodecaneso 35, I-16146 Genova and 3DIT, University of Trento, via Sommarive 14, I-38100 Povo (Trento), Italy
*To whom correspondence should be addressed.
Abstract
Motivation: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the differences between lists may be bewildering, partly because of the presence of modules of functionally related genes. Methods for assessing stability help to understand the dependency of the markers on the data or on the predictor's type, and help in selecting solutions.
Results: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and an aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The addressed issues are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis.
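As an illustration of two of the notions named above, the following is a minimal sketch, not the authors' software. It assumes complete lists over the same features, each given as a vector of 1-based ranks (entry i is the rank of feature i). The Canberra distance is taken here as the sum over features of |a_i - b_i| / (a_i + b_i), and the Borda aggregation is simplified to ordering features by their total rank across lists. The function names are hypothetical, and partial lists are not handled.

```python
from itertools import combinations


def canberra_distance(ranks_a, ranks_b):
    """Canberra distance between two complete rankings (1-based rank vectors)."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must cover the same number of features")
    # Disagreements near the top of the lists (small ranks) weigh more,
    # because the denominator a + b is smaller there.
    return sum(abs(a - b) / (a + b) for a, b in zip(ranks_a, ranks_b))


def mean_pairwise_canberra(rankings):
    """Average Canberra distance over all pairs of lists (assumed stability indicator)."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two rankings")
    return sum(canberra_distance(a, b) for a, b in pairs) / len(pairs)


def borda_aggregate(rankings):
    """Order feature indices by total rank across lists (lower total is better)."""
    n_features = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n_features)]
    # Ties are broken by feature index to keep the result deterministic.
    return sorted(range(n_features), key=lambda i: (totals[i], i))


if __name__ == "__main__":
    lists = [[2, 1, 3], [3, 1, 2], [1, 2, 3]]
    print(mean_pairwise_canberra(lists))
    print(borda_aggregate(lists))
```

A lower mean pairwise distance indicates that the lists produced by different resampling runs agree more closely, especially in their top positions.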
Availability: Supplementary Material and software are available at the address http://biodcv.fbk.eu/listspy.html
Contact: furlan{at}fbk.eu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
Received on May 7, 2007; revised on October 12, 2007; accepted on October 31, 2007