Bioinformatics Advance Access originally published online on August 18, 2005
Bioinformatics 2005 21(20):3933-3934; doi:10.1093/bioinformatics/bti637

Published by Oxford University Press 2005

ORIOGEN: order restricted inference for ordered gene expression data

S. Peddada 1,*, S. Harris 2, J. Zajd 2 and E. Harvey 2

1Biostatistics Branch, National Institute of Environmental Health Sciences, Durham, NC 27713, USA
2Constella Group, LLC, Durham, NC 27713, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 ALGORITHM
 3 IMPLEMENTATION
 4 PROGRAM OVERVIEW
 5 LIMITATIONS
 APPENDIX
 REFERENCES
 

Summary: ORIOGEN is a user-friendly Java-based software package for selecting and clustering genes according to their profiles across various treatment groups. In particular, ORIOGEN is useful for analyzing data obtained from time-course or dose–response type experiments.

Availability: The ORIOGEN software can be downloaded freely from http://dir.niehs.nih.gov/dirbb/oriogen/index.cfm

Contact: peddada{at}niehs.nih.gov (for statistical questions) and oriogen{at}constellagroup.com (for software support)

Supplementary information: ORIOGEN has a full set of help files. Also, an example input file is provided with the download.


    1 INTRODUCTION
Researchers are often interested in how gene expression patterns vary across treatment groups. For example, in time-course or dose–response experiments, the goal is frequently to understand the pattern of gene expression over time or dose. Peddada et al. (2003) developed a methodology based on order-restricted inference to analyze such gene expression data. We have developed a user-friendly software package called ORIOGEN that implements this methodology, with some modifications. ORIOGEN selects statistically significant genes and clusters genes that have similar profiles across treatment groups.


    2 ALGORITHM
Once the user provides a list of candidate profiles of interest, such as increasing, decreasing, umbrella or cyclical profiles, ORIOGEN computes a goodness-of-fit statistic for each gene under each profile. For each gene, it then identifies the profile with the largest goodness-of-fit statistic and computes the test statistic (1) provided in the Appendix. As in Peddada et al. (2003), the null hypothesis of no change in the mean expression of gene g across treatments is tested using bootstrap methodology. The alternative hypothesis is the union of all profiles provided by the researcher. Note that the alternative hypothesis allows equality between some of the means, but not between all of them (Peddada et al., 2003).
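The select-then-test procedure described above can be illustrated with a minimal Python sketch. It is not ORIOGEN's code and makes several simplifying assumptions: every group has the same number of replicates, only the increasing and decreasing profiles are covered, the goodness-of-fit value is taken to be the range of the order-restricted fitted means, and the null distribution is approximated by resampling the pooled observations. It does not reproduce test statistic (1), and all function names are hypothetical.

```python
import random
from statistics import mean

def pava_increasing(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit, equal weights."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the newest block mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)
    return fit

def fit_profile(means, profile):
    """Order-restricted fit of the group means under one profile."""
    if profile == "increasing":
        return pava_increasing(means)
    if profile == "decreasing":
        # a non-increasing fit is the negated non-decreasing fit of the negated data
        return [-v for v in pava_increasing([-m for m in means])]
    raise ValueError(f"unsupported profile: {profile}")

def goodness_of_fit(means, profile):
    """Assumed goodness-of-fit: range of the fitted means (0 if the fit is flat)."""
    fit = fit_profile(means, profile)
    return max(fit) - min(fit)

def best_profile(groups, profiles):
    """Return the profile with the largest goodness-of-fit and its value."""
    means = [mean(g) for g in groups]
    scores = {p: goodness_of_fit(means, p) for p in profiles}
    best = max(scores, key=scores.get)
    return best, scores[best]

def bootstrap_p_value(groups, profiles, n_boot=500, seed=0):
    """Approximate the null of equal means by resampling the pooled data."""
    rng = random.Random(seed)
    _, observed = best_profile(groups, profiles)
    pooled = [x for g in groups for x in g]
    hits = 0
    for _ in range(n_boot):
        resampled = [[rng.choice(pooled) for _x in g] for g in groups]
        _, stat = best_profile(resampled, profiles)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

For a gene with replicate values `[[1, 1.2], [2, 2.1], [3, 3.3], [4, 4.1]]`, `best_profile(..., ["increasing", "decreasing"])` selects the increasing profile, and `bootstrap_p_value` returns a small P-value.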

A gene that is declared significant by the above procedure is initially assigned to the profile with the largest goodness-of-fit statistic. ORIOGEN then refines this initial cluster assignment to distinguish between profiles based on strict inequalities and those that allow equality. As a result of this refinement, a gene that was originally classified as having an umbrella-shaped profile may be reclassified as having an increasing or decreasing profile. Similarly, a gene initially classified as having a cyclical profile may be reclassified to an umbrella, increasing or decreasing profile. Conversely, a gene that was initially classified as having an increasing or decreasing profile may be reclassified as having a cyclical profile. A detailed description of this reclassification procedure is available in the ORIOGEN software.


    3 IMPLEMENTATION
ORIOGEN is a Java-based program and can be run on a variety of operating system platforms. The user must first install the Java Run-Time Environment, version 1.4.2 or a later version (http://java.sun.com/j2se/1.4.2/download.html).

Once ORIOGEN is downloaded, the user is advised to read the ‘Readme.txt’ file, which provides instructions on how to get started. When the user double-clicks the ORIOGEN.jar file, the first pop-up window presents an expanded version of this document that describes various assumptions and limitations of the methodology. For user convenience, help buttons are provided in all pop-up windows.


    4 PROGRAM OVERVIEW
Inputting data. When users first start ORIOGEN, they are presented with a screen showing the various inputs to the algorithm. Users specify the names and locations of the input and output files, along with the number of dose/time points, replicates, bootstraps and the significance level. Some suggestions are provided in the help files. If available, users may supply the name and location of a reference ontology file to be used to annotate the results.

Inputting candidate profiles. Users input profiles by clicking on the appropriate radio buttons. These profiles can be increasing, decreasing, umbrella or cyclic (one peak and one valley). As an example, for four time/dose points, there are eight possible profiles to select (increasing, decreasing, four umbrella and two cyclic). Graphs of each selected profile are provided.
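The count of eight profiles for four points can be checked by enumeration. The sketch below assumes that the four umbrella profiles are a peak or a valley at each of the two interior points, and that a cyclic profile places one peak and one valley at distinct interior points. That reading reproduces the count quoted above, but how it extends to other numbers of points is an assumption rather than something stated in the text. The function name and labels are hypothetical.

```python
def candidate_profiles(n_points):
    """List profile shapes for n_points ordered groups (positions are 1-based)."""
    interior = range(2, n_points)  # positions strictly between first and last
    profiles = ["increasing", "decreasing"]
    for k in interior:
        profiles.append(f"umbrella, peak at {k}")
        profiles.append(f"inverted umbrella, valley at {k}")
    for peak in interior:
        for valley in interior:
            if peak != valley:
                profiles.append(f"cyclic, peak at {peak}, valley at {valley}")
    return profiles

for name in candidate_profiles(4):
    print(name)
```

Under these assumptions, `len(candidate_profiles(4))` is 8: one increasing, one decreasing, four umbrella and two cyclic.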

Processing. When users are satisfied with their profile selections, they click the button to start the processing. Using the algorithm described earlier, ORIOGEN reads each gene from the input file and determines whether it meets the user-specified significance criterion. If it does, the gene is classified into one of the user-selected profiles and the results are stored in the user-specified output file.

Viewing results. Once the significant genes from the input file have been identified and classified according to their profiles, ORIOGEN can be used to view the results in graphical form. Both the raw mean values and the fitted mean values are stored in the output file, and either can be viewed on the graph. Each graph type can be saved in JPG format.

The ontology data or any other descriptive data can be provided as a column in the input file, or it can be supplied in a separate reference file. If a reference file is used, it must be in a format similar to that provided by the TIGR ftp site (ftp://ftp.tigr.org/pub/data/tgi/Resourcerer/). In the output file and graphs (e.g. Fig. 1), ORIOGEN links this information to each selected gene. In addition to P-values, ORIOGEN also estimates q-values (Storey, 2002).
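As a rough illustration of how q-values relate to P-values, the sketch below follows the general approach of Storey (2002) with a fixed tuning parameter lambda = 0.5. The text does not say how ORIOGEN chooses its tuning parameter or implements the estimate, so the function name, the default value and the details here are assumptions.

```python
def q_values(p_values, lam=0.5):
    """Estimate q-values from P-values using a fixed lambda (a simplification)."""
    m = len(p_values)
    # Estimated proportion of true nulls, capped at 1. If no P-value exceeds
    # lam this estimate is 0, and a different lam would be needed in practice.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q
```

With `lam` chosen so that the estimated proportion of true nulls is 1, the result reduces to the familiar Benjamini-Hochberg adjusted P-values.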



Fig. 1 ORIOGEN results screen showing the genes selected for a particular profile. The gene information is shown when the user clicks on a line on the graph.

 

    5 LIMITATIONS
ORIOGEN has certain limitations and makes some assumptions regarding the input data. Some are listed below.
Normalization: ORIOGEN does not normalize the data.
Univariate approach: Genes are analyzed one at a time. Potential correlation between genes is not exploited.
Independence: All samples are assumed to be independent, both within and across time/dose points.
Homoscedasticity: For a given gene, the variability in expression is assumed to be the same across time/dose points. However, because genes are analyzed one at a time, different genes are not required to have the same variance.
Long time-series data: For an experiment with a large number of time points, it might be preferable to use a parametric model (e.g. Liu et al., 2004), where the parameters of the model may carry important information regarding the underlying biology.


    APPENDIX
The following is the test statistic used to determine whether a particular gene is significant or not:

(1)
Here, for the gene g and the best fitting profile r: is the goodness-of-fit value described in Peddada et al. (2003), sg is the pooled sample standard deviation for all time points/dose groups, and is the sample mean of the ith time point/dose group.
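The pooled sample standard deviation mentioned above can be computed as in the following sketch, which uses the standard textbook definition (within-group sums of squares divided by the total within-group degrees of freedom). The text does not spell out the exact formula ORIOGEN uses, so this is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean

def pooled_sd(groups):
    """Pooled standard deviation for one gene, assuming a common variance."""
    sum_sq = 0.0
    dof = 0
    for g in groups:
        m = mean(g)
        sum_sq += sum((x - m) ** 2 for x in g)  # within-group sum of squares
        dof += len(g) - 1                       # within-group degrees of freedom
    return sqrt(sum_sq / dof)
```

Each group needs at least two replicates for the within-group degrees of freedom to be positive.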


    Acknowledgments
 
The authors thank Drs Leping Li, David M. Umbach and Clarice Weinberg, Biostatistics Branch, NIEHS, for numerous discussions and feedback during the preparation of this software, and for carefully reading this manuscript and making suggestions that improved its presentation.

Conflict of Interest: none declared.

Received on June 1, 2005; revised on August 15, 2005; accepted on August 16, 2005

    REFERENCES

    Liu, D., et al. (2004) A random-periods model for expression of cell-cycle genes. Proc. Natl Acad. Sci. USA, 101, 7240–7245.

    Peddada, S., et al. (2003) Gene selection and clustering for time-course and dose–response microarray experiments using order-restricted inference. Bioinformatics, 19, 834–841.

    Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B, 64, 479–498.

