Bioinformatics Advance Access originally published online on August 18, 2005
Bioinformatics 2005 21(20):3933-3934; doi:10.1093/bioinformatics/bti637
Published by Oxford University Press 2005
ORIOGEN: order restricted inference for ordered gene expression data
1Biostatistics Branch, National Institute of Environmental Health Sciences Durham, NC 27713, USA
2Constella Group, LLC Durham, NC 27713, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Summary: ORIOGEN is a user-friendly Java-based software package for selecting and clustering genes according to their profiles across various treatment groups. In particular, ORIOGEN is useful for analyzing data obtained from time-course or dose–response type experiments.
Availability: The ORIOGEN software can be downloaded freely from http://dir.niehs.nih.gov/dirbb/oriogen/index.cfm
Contact: peddada{at}niehs.nih.gov (for statistical questions) and oriogen{at}constellagroup.com (for software support)
Supplementary information: ORIOGEN has a full set of help files. Also, an example input file is provided with the download.
| 1 INTRODUCTION |
|---|
Researchers have considerable interest in studying gene expression patterns across various treatment groups. For example, in time-course/dose–response experiments, researchers often want to understand the pattern of gene expression over time/dose. Recently, Peddada et al. (2003) developed a methodology based on order-restricted inference to analyze such gene expression data. We have developed a user-friendly software package called ORIOGEN to implement this methodology, with some modifications. ORIOGEN selects statistically significant genes and clusters genes that have similar profiles across treatment groups.
| 2 ALGORITHM |
|---|
Once the user provides a list of candidate profiles of interest, such as increasing, decreasing, umbrella or cyclical profiles, ORIOGEN computes a goodness-of-fit statistic for each gene under each profile. For each gene, it then identifies the profile with the largest goodness-of-fit statistic and computes the test statistic (1) provided in the Appendix. As in Peddada et al. (2003), the null hypothesis of no change in the mean expression of gene g across treatments is tested using bootstrap methodology. The alternative hypothesis is the union of all profiles provided by the researcher. Note that the alternative hypothesis allows equality between some of the means, but not all of them (Peddada et al., 2003).
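As a rough illustration of the select-then-test idea described above, the following Python sketch fits only the increasing and decreasing profiles by pooling adjacent violators, uses the largest fitted difference as a stand-in goodness-of-fit value, and resamples the pooled observations to approximate the null distribution. The function names, the stand-in statistic and the resampling scheme are assumptions made for illustration; they are not taken from ORIOGEN or from Peddada et al. (2003), and the statistic (1) is not reproduced here.

```python
import random
from statistics import mean


def pava_increasing(y, w):
    """Weighted least-squares fit of y under a non-decreasing constraint."""
    blocks = []  # each block holds [fitted value, total weight, number of points]
    for value, weight in zip(y, w):
        blocks.append([value, weight, 1])
        # Merge neighbouring blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for value, _, count in blocks:
        fit.extend([value] * count)
    return fit


def fit_stat(groups, profile):
    """Stand-in goodness of fit (an assumption): largest fitted difference."""
    means = [mean(g) for g in groups]
    sizes = [len(g) for g in groups]
    sign = 1.0 if profile == "increasing" else -1.0
    fit = pava_increasing([sign * m for m in means], sizes)
    return fit[-1] - fit[0]


def best_profile(groups, profiles=("increasing", "decreasing")):
    """Return the candidate profile with the largest stand-in statistic."""
    stats = {p: fit_stat(groups, p) for p in profiles}
    best = max(stats, key=stats.get)
    return best, stats[best]


def bootstrap_p_value(groups, n_boot=1000, seed=0):
    """Approximate the null of equal means by resampling pooled observations."""
    rng = random.Random(seed)
    _, observed = best_profile(groups)
    pooled = [x for g in groups for x in g]
    exceed = 0
    for _ in range(n_boot):
        resampled = [[rng.choice(pooled) for _ in g] for g in groups]
        _, stat = best_profile(resampled)
        if stat >= observed:
            exceed += 1
    return exceed / n_boot
```

Here `groups` is a list with one list of replicate expression values per time/dose point for a single gene. Umbrella and cyclical profiles would need additional constrained fits, which this sketch omits.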
A gene that is declared significant by the above procedure is initially assigned to the profile with the largest goodness-of-fit statistic. ORIOGEN then refines this initial cluster assignment to distinguish between profiles based on strict inequalities and those that allow equality. As a result of this refinement, a gene originally classified as having an umbrella-shaped profile may be reclassified as having an increasing or decreasing profile. Similarly, a gene initially classified as having a cyclical profile may be reclassified as having an umbrella-shaped, increasing or decreasing profile. Conversely, a gene initially classified as having an increasing or decreasing profile may be reclassified as having a cyclical profile. A detailed description of this reclassification procedure is available in the ORIOGEN software.
| 3 IMPLEMENTATION |
|---|
ORIOGEN is a Java-based program and can be run on a variety of operating system platforms. The user must first install the Java Run-Time Environment, version 1.4.2 or a later version (http://java.sun.com/j2se/1.4.2/download.html).
Once ORIOGEN is downloaded, the user is advised to read the Readme.txt file, which provides instructions on how to get started. Double-clicking the ORIOGEN.jar file opens a first pop-up window containing an expanded version of this document that describes various assumptions and limitations of the methodology. For user convenience, help buttons are provided in all pop-up windows.
| 4 PROGRAM OVERVIEW |
|---|
Inputting data. When users first start ORIOGEN, they are presented with a screen showing the various inputs to the algorithm. Users specify the names and locations of the input and output files, along with the number of dose/time points, replicates, bootstraps and the significance level. Some suggestions are provided in the help files. If available, users may supply the name and location of a reference ontology file to be used to annotate the results.
Inputting candidate profiles. Users select profiles by clicking on the appropriate radio buttons. These profiles can be increasing, decreasing, umbrella or cyclic (one peak and one valley). For example, with four time/dose points there are eight possible profiles to select from (increasing, decreasing, four umbrella and two cyclic). A graph of each selected profile is displayed.
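The count of eight profiles for four time/dose points can be checked with a short enumeration. The Python sketch below assumes that the four umbrella profiles comprise a peak or a valley at either interior point, and that a cyclic profile has one interior peak and one interior valley in either order; the function name and the profile labels are hypothetical, and the extension to other numbers of points is an assumption rather than a description of ORIOGEN.

```python
from itertools import combinations


def candidate_profiles(n_points):
    """List profile shapes as (label, turning points), using 1-based indices."""
    interior = range(2, n_points)  # positions where a peak or valley can occur
    profiles = [("increasing", ()), ("decreasing", ())]
    for i in interior:
        profiles.append(("umbrella (peak)", (i,)))
        profiles.append(("umbrella (valley)", (i,)))
    # A cyclic profile turns twice: once at a peak and once at a valley.
    for first, second in combinations(interior, 2):
        profiles.append(("cyclic (peak then valley)", (first, second)))
        profiles.append(("cyclic (valley then peak)", (first, second)))
    return profiles


if __name__ == "__main__":
    for label, turns in candidate_profiles(4):
        print(label, turns)
```

Under these assumptions, four points give two monotone, four umbrella and two cyclic profiles, matching the eight stated above.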
Processing. When users are satisfied with their profile selections, they click the button to start processing. Using the algorithm described earlier, ORIOGEN reads each gene from the input file and determines whether it meets the user-specified significance criterion. If it does, the gene is classified into one of the user-selected profiles and the results are stored in the user-specified output file.
Viewing results. Once the significant genes from the input file have been identified and classified according to their profiles, ORIOGEN can be used to view the results in graphical form. Both the raw mean values and the fitted mean values are stored in the output file, and either can be viewed on the graph. Each graph type can be saved in JPG format.
The ontology data or any other descriptive data can be provided as a column in the input file or supplied in a separate reference file. If a reference file is used, it must be in a format similar to that provided by the TIGR ftp site (ftp://ftp.tigr.org/pub/data/tgi/Resourcerer/). In the output file and graphs (e.g. Fig. 1), ORIOGEN links such information to each selected gene. In addition to the P-values, ORIOGEN also estimates the q-values (Storey, 2002).
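To give a sense of how q-values relate to P-values, the following Python sketch follows the general approach of Storey (2002) with a fixed tuning parameter. The function name and the choice `lam=0.5` are assumptions for illustration; this note does not state which settings ORIOGEN uses.

```python
def q_values(p_values, lam=0.5):
    """Estimate q-values from P-values using a fixed tuning parameter lam."""
    m = len(p_values)
    # Estimate the proportion of true null hypotheses from P-values above lam.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values never decrease with p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q
```

The returned list is in the same order as the input. If no P-value exceeds `lam`, the estimated null proportion is zero and every q-value is zero, so in practice the tuning parameter should be chosen with care.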
| 5 LIMITATIONS |
|---|
ORIOGEN has certain limitations and makes some assumptions regarding the input data. Some of these are listed below.
- Normalization: ORIOGEN does not normalize the data.
- Univariate approach: One gene at a time is analyzed. Potential correlation between genes is not exploited.
- Independence: All samples are assumed to be independent, both within and across time/dose points.
- Homoscedasticity: For a given gene, the variability in expression is assumed to be the same across time/dose points. However, because genes are analyzed one at a time, ORIOGEN does not require all genes to have the same variance.
- Long time-series data: For an experiment with a large number of time points, it might be preferable to use a parametric model (e.g. Liu et al., 2004), where the parameters of the model may carry important information about the underlying biology.
| APPENDIX |
|---|
The following test statistic is used to determine whether a particular gene is significant:
[Equation (1): image not available in the source] (1)
The terms in (1) are the goodness-of-fit value described in Peddada et al. (2003); s_g, the pooled sample standard deviation for all time points/dose groups; and the sample mean of the ith time point/dose group.
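For readers unfamiliar with the pooled standard deviation mentioned above, the Python sketch below computes it for one gene using the usual textbook definition under a common-variance assumption. The function name is hypothetical, and whether ORIOGEN computes s_g in exactly this way is an assumption.

```python
from math import sqrt
from statistics import variance


def pooled_sd(groups):
    """Pooled sample standard deviation across groups with a common variance."""
    # Each group contributes its sample variance weighted by its degrees of freedom.
    df = sum(len(g) - 1 for g in groups)
    ss = sum((len(g) - 1) * variance(g) for g in groups)
    return sqrt(ss / df)
```

Each group is the list of replicate values at one time/dose point and needs at least two replicates, since a sample variance is undefined for a single observation.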
| Acknowledgments |
|---|
The authors thank Drs Leping Li, David M. Umbach and Clarice Weinberg, Biostatistics Branch, NIEHS, for numerous discussions and feedback during the preparation of this software, and for carefully reading this manuscript and making suggestions that improved its presentation.
Conflict of Interest: none declared.
Received on June 1, 2005; revised on August 15, 2005; accepted on August 16, 2005
| REFERENCES |
|---|
Liu, D., et al. (2004) A random-periods model for expression of cell-cycle genes. Proc. Natl Acad. Sci. USA, 101, 7240–7245.
Peddada, S., et al. (2003) Gene selection and clustering for time-course and dose–response microarray experiments using order-restricted inference. Bioinformatics, 19, 834–841.
Storey, J.D. (2002) A direct approach to false discovery rates. J. R. Statist. Soc. B, 64, 479–498.

