Bioinformatics Advance Access originally published online on July 14, 2006
Bioinformatics 2006 22(18):2315-2316; doi:10.1093/bioinformatics/btl385

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

OrderedList—a bioconductor package for detecting similarity in ordered gene lists

Claudio Lottaz 1,*, Xinan Yang 1,2, Stefanie Scheid 1 and Rainer Spang 1

1 Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, D-14195 Berlin, Germany
2 State Key Laboratory of Bioelectronics, Southeast University, 210096 Nanjing, People's Republic of China

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 IMPLEMENTATION
 3 EXEMPLARY ANALYSIS
 REFERENCES
 

Summary: OrderedList is a Bioconductor compliant package for meta-analysis based on ordered gene lists like those resulting from differential gene expression analysis. Our package quantifies the similarity between gene lists. The significance of the similarity score is estimated from random scores computed on perturbed data. OrderedList illustrates list similarity in intuitive plots and determines the score-driving genes for further analysis.

Availability: http://www.bioconductor.org

Contact: claudio.lottaz{at}molgen.mpg.de

Supplementary information: Please visit our webpage on http://compdiag.molgen.mpg.de/software


    1 INTRODUCTION
Motivation: In microarray studies, researchers often compare gene expression profiles from two different conditions to generate lists of induced genes ordered according to a measure of upregulation and downregulation. For comparing results generated in different studies, we can search for similarities between ordered gene lists. The OrderedList package is dedicated to this task. The underlying algorithm is described in detail in Yang et al. (2006).

Range of applications: We can compare independent microarray studies addressing the same research question to confirm findings. More interestingly, we can compare studies from different but related contexts, e.g. survival in different types of cancer. Here gene list comparison can discover common markers. Moreover, two studies that do not reach statistical significance for differential gene expression on their own may still show significant similarity between the corresponding gene lists. Comparisons are also feasible across technological platforms, for instance between studies performed on different microarrays. In fact, ordered lists can also be derived from heterogeneous data sources, for example, protein activities measured with immunoprecipitation arrays, allele frequencies determined in SNP studies and brain activity per voxel determined by functional magnetic resonance imaging (fMRI) (Loring et al., 2002). Although the method described in Yang et al. (2006) focuses on the comparison of microarray expression studies holding many profiles, the OrderedList package additionally implements a method based purely on lists. This further broadens its field of application.

Similarity score: Yang et al. define a similarity score to quantify list similarity. To compute the score, OrderedList determines the number of shared elements Sn in the first n elements of the lists for each n. The final score is a weighted sum over Sn where the ends of the lists receive larger weights, thus ensuring that the more strongly induced genes dominate the score.
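The score described above can be sketched in a few lines of Python. This is an illustration only, not the package's code: the exponentially decaying weights exp(-alpha * n), the parameter name alpha, and the function names are assumptions made for this example; the text only states that the ends of the lists receive larger weights. Each list is assumed to contain every gene at most once.

```python
import math

def overlap_counts(list_a, list_b, max_rank):
    """Return [S_1, ..., S_max_rank], where S_n is the number of
    elements shared by the first n entries of both lists."""
    seen_a, seen_b = set(), set()
    shared = 0
    counts = []
    for n in range(max_rank):
        a, b = list_a[n], list_b[n]
        if a == b:
            shared += 1          # same gene enters both prefixes at once
        else:
            if a in seen_b:      # a was already in the prefix of list_b
                shared += 1
            if b in seen_a:      # b was already in the prefix of list_a
                shared += 1
        seen_a.add(a)
        seen_b.add(b)
        counts.append(shared)
    return counts

def similarity_score(list_a, list_b, alpha=0.1, max_rank=None):
    """Weighted sum over S_n, evaluated from both ends of the lists.
    The weight exp(-alpha * n) is an assumed choice for illustration."""
    if max_rank is None:
        max_rank = min(len(list_a), len(list_b))
    top = overlap_counts(list_a, list_b, max_rank)
    # Reversing the lists puts the most downregulated genes first.
    bottom = overlap_counts(list_a[::-1], list_b[::-1], max_rank)
    return sum(math.exp(-alpha * n) * (t + b)
               for n, (t, b) in enumerate(zip(top, bottom), start=1))

genes_a = ["g1", "g2", "g3", "g4", "g5"]
genes_b = ["g2", "g1", "g3", "g5", "g4"]
print(overlap_counts(genes_a, genes_b, 5))   # [0, 2, 3, 3, 5]
print(similarity_score(genes_a, genes_b))
```

Because the weights shrink with n, agreement near either end of the lists contributes more to the score than agreement in the middle.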

Significance Analysis: To estimate the significance of detected list similarities, OrderedList randomly perturbs the input data to compute null distributions of the similarity score. Here we distinguish two modes of operation: the first needs complete sets of gene expression profiles whereas the second works with simple ordered lists. In the first case OrderedList perturbs the input data by subsampling from the profiles and reordering the genes (Yang et al., 2006). When only single ordered lists are provided, shuffling is used to generate random lists. In both cases, scoring the perturbed lists generates null distributions for similarity scores, from which empirical P-values are deduced. When sufficient data are available, however, the first method is preferable, since it prevents constantly expressed genes from obtaining prominent ranks in random lists. This is desirable, because otherwise both the random scores and the empirical P-values are underestimated.
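The shuffling mode can be illustrated with a small permutation test. The sketch below is a simplified stand-in rather than the package's implementation: it scores only the top end of the lists, uses assumed weights exp(-alpha * n), and applies the common (hits + 1) / (n_perm + 1) estimator for the empirical P-value, which the text does not specify. All function and parameter names are hypothetical.

```python
import math
import random

def score(list_a, list_b, alpha=0.1):
    """Weighted sum of prefix overlaps (top end only, for brevity)."""
    total = 0.0
    for n in range(1, min(len(list_a), len(list_b)) + 1):
        shared = len(set(list_a[:n]) & set(list_b[:n]))
        total += math.exp(-alpha * n) * shared
    return total

def shuffle_p_value(list_a, list_b, n_perm=1000, seed=0):
    """Empirical P-value: share of shuffled lists scoring at least
    as high as the observed pair of lists."""
    rng = random.Random(seed)
    observed = score(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # random list under the null
        if score(list_a, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts avoids reporting a P-value of exactly zero.
    return (hits + 1) / (n_perm + 1)

genes = ["g%d" % i for i in range(30)]
print(shuffle_p_value(genes, genes, n_perm=200))
```

Shuffling treats every gene as equally likely to land at any rank, which is exactly why constantly expressed genes can reach prominent ranks in the random lists of this mode.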

Results: An analysis by OrderedList yields a significance estimate for the similarity of gene lists. In addition, the package detects how far into the lists striking similarities occur. Finally, our algorithm determines the genes that drive the observed similarity score, i.e. genes with prominent ranks in all compared lists. These genes are most promising for further analysis and interpretation.
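One way to make "score-driving genes" concrete is to split a weighted-sum score into per-gene contributions: a shared gene starts counting at the first n that covers its rank in both lists, so it collects all weights from that n onward. The following sketch rests on that decomposition and on assumed weights exp(-alpha * n); it considers the top end of the lists only, and the selection rule and function names are hypothetical, not taken from the package.

```python
import math

def gene_contributions(list_a, list_b, alpha=0.1):
    """Contribution of each shared gene to the weighted sum over S_n."""
    depth = min(len(list_a), len(list_b))
    rank_b = {g: i + 1 for i, g in enumerate(list_b)}
    # suffix[n] = w_n + w_(n+1) + ... + w_depth
    suffix = [0.0] * (depth + 2)
    for n in range(depth, 0, -1):
        suffix[n] = suffix[n + 1] + math.exp(-alpha * n)
    contrib = {}
    for i, g in enumerate(list_a):
        if g in rank_b:
            entry = max(i + 1, rank_b[g])   # first n at which g is shared
            if entry <= depth:
                contrib[g] = suffix[entry]
    return contrib

def driving_genes(list_a, list_b, fraction=0.95, alpha=0.1):
    """Smallest set of top contributors covering `fraction` of the score."""
    contrib = gene_contributions(list_a, list_b, alpha)
    total = sum(contrib.values())
    selected, running = [], 0.0
    for g, c in sorted(contrib.items(), key=lambda kv: -kv[1]):
        if running >= fraction * total:
            break
        selected.append(g)
        running += c
    return selected

genes_a = ["g1", "g2", "g3", "g4", "g5"]
genes_b = ["g2", "g1", "g3", "g5", "g4"]
print(driving_genes(genes_a, genes_b, fraction=0.8))
```

Genes that rank high in both lists enter the overlap early and therefore collect the largest share of the score.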


    2 IMPLEMENTATION
Availability: The OrderedList software package is written in the language R developed within the R Project for Statistical Computing (R Development Core Team, 2004). It is part of release 1.8 of the Bioconductor suite of packages related to life science applications (Gentleman et al., 2004), free for use under the GNU General Public License and easy to install on various UNIX and Windows systems.

Data formats: OrderedList accepts data in two different formats. For the subsampling mode, expression data including several profiles per condition must be provided in a Bioconductor-specific format. In addition to the expression levels, the data must contain class labels for each profile. For the shuffling mode, OrderedList expects ordered vectors of character strings, each element identifying one gene. By default, OrderedList considers measurements (in expression data) or ranks (in ordered lists) as being related to the same gene when they carry the same name. The user can provide mappings, however, to indicate pairs of differing identifiers relating to the same gene. Thus ordered lists generated on different platforms can be compared.
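The role of such a mapping can be shown with plain ordered lists. The sketch below is not the package's interface: the identifiers, the dictionary-based mapping, and both helper functions are hypothetical, and restricting the lists to their common genes is an assumption of this example.

```python
def harmonize(ordered_ids, mapping):
    """Translate identifiers into the other list's namespace.
    Identifiers without a mapping entry are kept unchanged."""
    return [mapping.get(g, g) for g in ordered_ids]

def restrict_to_common(list_a, list_b):
    """Keep only genes present in both lists, preserving each order."""
    common = set(list_a) & set(list_b)
    return ([g for g in list_a if g in common],
            [g for g in list_b if g in common])

# Hypothetical identifiers from two different platforms.
platform_a = ["geneA", "geneB", "geneC"]
platform_b = ["probe3", "probe1", "probe2"]
probe_to_gene = {"probe1": "geneA", "probe2": "geneB"}

translated = harmonize(platform_b, probe_to_gene)
print(restrict_to_common(platform_a, translated))
```

After translation, two entries count as the same gene exactly when their names match, which mirrors the default behaviour described above.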

Output: OrderedList determines empirical P-values of similarity scores. It graphically illustrates the list comparison analysis as shown in Figure 1. Here the number of shared genes in the lists up to rank n is related to the number of shared elements expected in randomly shuffled lists: in addition to the observed Sn, OrderedList draws its expectation and the 95% confidence intervals, either according to the empirical distribution obtained from subsampling (in subsampling mode) or according to a hypergeometric distribution (in shuffling mode). For further interpretation, OrderedList determines the genes that dominate the similarity score and returns their identifiers.
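For the shuffling mode, the hypergeometric reference can be worked out directly. The sketch assumes that both lists rank the same set of genes and that, under the null, the top n of one list is a random subset of that set; the function names and the use of a central interval are choices made for this illustration, not details confirmed by the text.

```python
from math import comb

def hypergeom_pmf(k, total, n):
    """P(overlap = k) for two random top-n sets drawn from `total` genes."""
    return comb(n, k) * comb(total - n, n - k) / comb(total, n)

def expected_overlap_band(total, n, level=0.95):
    """Expected overlap and a central interval at the given level."""
    mean = n * n / total
    tail = (1.0 - level) / 2.0
    cumulative, lower, upper = 0.0, None, None
    for k in range(n + 1):
        cumulative += hypergeom_pmf(k, total, n)
        if lower is None and cumulative >= tail:
            lower = k                    # lower quantile reached
        if upper is None and cumulative >= 1.0 - tail:
            upper = k                    # upper quantile reached
            break
    return mean, lower, upper

# 100 genes in total, overlap of the two top-10 sets.
print(expected_overlap_band(100, 10))    # (1.0, 0, 3)
```

An observed Sn well above the upper bound at a given n indicates more agreement among the first n ranks than random ordering would produce.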


Fig. 1 Graphical output of package OrderedList. Displayed are the numbers of shared top-ranking genes in the two lists. Top ranks correspond to upregulated genes and bottom ranks to downregulated genes. In addition, the expected overlap and 95% confidence intervals derived from the empirical distribution obtained from subsampling are shown.

 

    3 EXEMPLARY ANALYSIS
Example data: We illustrate the functionality of our package by comparing the following two gene expression studies: the breast cancer study by Huang et al. (2003) characterizing differentially expressed genes in patients at high risk versus patients at low risk for relapse, and the prostate cancer study by Singh et al. (2002) relating first diagnosis expression profiles from relapsed patients to those of cured patients. Both datasets were measured on Affymetrix GeneChip® HG-U95av2 arrays.

Results: Within each comparison, OrderedList derived rankings using regularized t-scores. We observed a significant similarity of the two gene lists (P = 0.0470). In Figure 1 we show one graphical output provided by the package. Displayed is the observed number of shared genes for all ranks and the corresponding expectation with 95% confidence intervals. In addition to the P-value of the similarity score, the plot supports the significance of the overlap.

Within the first 1000 top and bottom ranks, we found 102 genes contributing 95% to the total similarity score. In Table 1 we show the top-scorers of the prostate cancer comparison with their corresponding ranks in the breast cancer comparison. Some genes, such as AZGP1, were found at high ranks in both comparisons, whereas other genes, such as MAFF, are far down the list of the breast cancer comparison. This finding shows that OrderedList does not aim for the most significantly induced genes but for an overlap of two independent expression studies that differs substantially from what is expected at random. Among the top-ranking overlaps we found many genes connected to various kinds of cancer, i.e. AZGP1, MAFF, ODC1, FMOD, JUNB, BTG2 and FOS [see OMIM™ (Online Mendelian Inheritance in Man, 2000, http://www.ncbi.nlm.nih.gov/omim/)]. This shows that OrderedList is able to pinpoint genes relevant to both compared studies.


Table 1 Subset of overlapping genes upregulated in low-risk breast and non-relapsed prostate cancers.

 

    Acknowledgments
 
This research has been supported by BMBF grants 01GS0445 and 01GR0455 of the German Federal Ministry of Education and Research. In addition X.Y. was supported by a DAAD-Fellowship. Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Joaquin Dopazo

Received on April 28, 2006; revised on June 26, 2006; accepted on July 6, 2006

    REFERENCES

    Gentleman, R., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol., 5, R80.

    Huang, E., et al. (2003) Gene expression predictors of breast cancer outcomes. Lancet, 361, 1590–1596.

    Loring, D., et al. (2002) Now you see it, now you don't: statistical and methodological considerations in fMRI. Epilepsy Behav., 3, 539–547.

    Online Mendelian Inheritance in Man. (2000) McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD).

    R Development Core Team. (2004) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

    Singh, D., et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209.

    Yang, X., et al. (2006) Similarities of ordered gene lists. J. Bioinf. Comp. Biol., in press.



