Bioinformatics Advance Access originally published online on May 31, 2007
Bioinformatics 2007 23(16):2063-2072; doi:10.1093/bioinformatics/btm289
Statistical assessment of functional categories of genes deregulated in pathological conditions by using microarray data
1Istituto di Studi sui Sistemi Intelligenti per l'Automazione, CNR, Via Amendola 122/D-I, 70126 Bari, 2Unità Operativa di Gastroenterologia, IRCCS, Casa Sollievo della Sofferenza-Ospedale, Viale Cappuccini, 71013 San Giovanni Rotondo (FG), 3Istituto di Tecnologie Biomediche-Sezione di Bari, CNR, Via Amendola 122/D, 70126 Bari, 4Servizio di Genetica Medica, IRCCS, Casa Sollievo della Sofferenza-Ospedale, Viale Cappuccini, 71013 San Giovanni Rotondo (FG) and 5Dipartimento di Biochimica e Biologia Molecolare - Università di Bari, Via E. Orabona 4, 70126 Bari, Italy
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: A major challenge in current biomedical research is the identification of cellular processes deregulated in a given pathology through the analysis of gene expression profiles. To this end, predefined lists of genes, coding specific functions, are compared with a list of genes ordered according to their values of differential expression measured by suitable univariate statistics.
Results: We propose a statistically well-founded method for measuring the relevance of predefined lists of genes and for assessing their statistical significance, starting from their raw expression levels as recorded on the microarray. We use prediction accuracy as the measure of relevance of a list. The rationale is that a functional category, coded through a list of genes, is perturbed in a given pathology if the occurrence of the disease in new subjects can be correctly predicted from the expression levels of only the genes belonging to the list. The accuracy is estimated with a multiple random validation strategy, and its statistical significance is assessed against two null hypotheses by using two independent permutation tests. The utility of the proposed methodology is illustrated by analyzing the relevance of Gene Ontology terms belonging to the biological process category in colon and prostate cancer, using three different microarray data sets, and by comparing it with current approaches.
Availability: Source code for the algorithms is available from the authors upon request.
Contact: ancona{at}ba.issia.cnr.it
Supplementary information: Colon cancer data set and a complete description of experimental results are available at: ftp://bioftp:76bioftpxxx@marx.ba.issia.cnr.it/supp-info.htm
Received on April 5, 2007; revised on May 14, 2007; accepted on May 21, 2007
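The procedure outlined in the Results paragraph (prediction accuracy of a gene list estimated over multiple random train/test splits, then tested by permutation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract names neither the classifier nor the two null hypotheses, so the sketch assumes a nearest-centroid classifier, one test that permutes the class labels, and one test that compares the list against random gene sets of the same size. All function names and parameters are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fraction of test samples assigned to the class with the nearest centroid.

    Assumption: nearest-centroid stands in for whatever classifier the paper uses.
    """
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in zip(X_test, y_test))


def random_validation_accuracy(X, y, genes, n_splits=20, test_fraction=0.3, rng=None):
    """Mean test accuracy over repeated random splits, using only `genes` (column indices)."""
    rng = rng or random.Random(0)
    Xg = [[row[g] for g in genes] for row in X]
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    accuracies = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        accuracies.append(
            nearest_centroid_accuracy(
                [Xg[i] for i in train], [y[i] for i in train],
                [Xg[i] for i in test], [y[i] for i in test],
            )
        )
    return mean(accuracies)


def label_permutation_pvalue(X, y, genes, n_perm=100, seed=0):
    """Assumed null 1: expression of the listed genes is unrelated to the class labels."""
    rng = random.Random(seed)
    observed = random_validation_accuracy(X, y, genes, rng=rng)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if random_validation_accuracy(X, y_perm, genes, rng=rng) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def random_gene_set_pvalue(X, y, genes, n_perm=100, seed=0):
    """Assumed null 2: the list predicts no better than a random gene set of equal size."""
    rng = random.Random(seed)
    observed = random_validation_accuracy(X, y, genes, rng=rng)
    n_genes = len(X[0])
    hits = 0
    for _ in range(n_perm):
        random_genes = rng.sample(range(n_genes), len(genes))
        if random_validation_accuracy(X, y, random_genes, rng=rng) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `X` is a list of samples, each a list of expression values, `y` holds the class labels, and `genes` holds the column indices of the genes in the functional category. Under these assumptions, a category would be called relevant only if both p-values are small.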
