Bioinformatics Advance Access originally published online on November 18, 2004
Bioinformatics 2005 21(7):1129-1137; doi:10.1093/bioinformatics/bti149

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression

Yoram Ben-Shaul 1,2,3, Hagai Bergman 2,3 and Hermona Soreq 1,2,*

1Department of Biological Chemistry, The Life Sciences Institute Jerusalem, 91904, Israel
2Center for Computational Neuroscience and the Eric Roland Center of Neurodegenerative Diseases Jerusalem, 91904, Israel
3The Department of Physiology of Hadassah Medical School, The Hebrew University of Jerusalem Jerusalem, 91904, Israel

*To whom correspondence should be addressed.


    Abstract

Motivation: Analysis of large-scale expression data is greatly facilitated by the availability of gene ontologies (GOs). Many current methods test whether sets of transcripts annotated with specific ontology terms contain an excess of ‘changed’ transcripts. This approach suffers from two main limitations. First, since gene expression is continuous rather than discrete, designating a gene as changed or unchanged is arbitrary and oblivious to the actual magnitude of the change. Second, by considering only the number of changed genes, finer changes in expression patterns associated with the category may be ignored. Since genes generally participate in multiple networks, widespread and subtle modifications in expression patterns are at least as important as extreme increases/decreases of a few genes.

Results: Numerical simulations confirm that incorporating continuous measures of gene expression for all measured transcripts yields detection of considerably more subtle changes. Applying continuous measures to microarray data from brains of mice injected with the Parkinsonian neurotoxin, MPTP, enables detection of changes in various biologically relevant GO terms, many of which are overlooked by discrete approaches.

Availability: Software (MATLAB) is available upon request from the authors.

Contact: soreq{at}cc.huji.ac.il

Supplementary information: www.icnc.huji.ac.il/?GOdisv_supp_info


    1 INTRODUCTION
 
One of the most promising approaches in genome science is large-scale analysis of gene expression patterns. However, the huge amounts of raw data produced in a typical experiment can easily overwhelm the investigator. Analysis is greatly facilitated by a standard and concise system of gene annotation. One commonly used system is provided by the Gene Ontology Consortium (GOC) (Ashburner et al., 2000) comprising ontologies for Molecular Function (MF), Biological Process (BP) and Cellular Components (CC). Each of these classes gives rise to branches and sub-branches representing ever finer definitions. Genes are then annotated with one or more ontology terms. Such annotations can be downloaded from various databases (Liu et al., 2003; Bono et al., 2002; Ashburner et al., 2000).

Availability of large-scale expression data annotated with GO terms enables investigation of whether specific ontology terms are overrepresented in the set of transcripts with significant expression changes across treatments. Most commonly, this is addressed by determining whether the observed number of ‘changed’ transcripts associated with specific GO terms exceeds the expected number, given the entire set of transcripts. This can be tested with either a χ²-test or, more appropriately, an exact test employing the hypergeometric distribution (Hosack et al., 2003). Various software tools for this analysis, designated henceforth as the ‘discrete’ approach, are available online (Beissbarth and Speed, 2004; Al-Shahrour et al., 2004; Volinia et al., 2004; Smid and Dorssers, 2004; Zeeberg et al., 2003). Although useful for investigating the concerted changes that follow various experimental manipulations, the discrete approach is based on two crucial assumptions that are not always justified. The first concerns the notion of a ‘changed’ gene. In reality, genes are not simply increased, decreased or unchanged when compared across conditions. Reducing gene expression to a discrete variable ignores important information about its actual expression level. The second assumption relates to the nature of changes governing subsets of genes with related functions. Highlighting the overrepresentation of changed transcripts within a population implies that biologically relevant changes involve an excessive number of transcripts undergoing profound expression changes. However, gene ensembles may be modified concertedly, yet without any member undergoing extreme modulations.
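The exact test underlying the discrete approach can be illustrated with a short sketch. It is written in Python using only the standard library and is not taken from any of the cited tools; the function name and argument names are hypothetical, and the numbers in the usage line are made up for illustration.

```python
from math import comb

def hypergeom_upper_tail(total, total_changed, category_size, category_changed):
    """Probability of seeing `category_changed` or more changed transcripts
    in a category of `category_size` transcripts, when `total_changed` of
    the `total` transcripts on the array are changed."""
    upper = min(total_changed, category_size)
    favourable = sum(
        comb(total_changed, k) * comb(total - total_changed, category_size - k)
        for k in range(category_changed, upper + 1)
    )
    return favourable / comb(total, category_size)

# Illustrative numbers only: 10 000 transcripts, 500 of them called changed,
# and a category of 100 transcripts of which 12 are changed.
p_value = hypergeom_upper_tail(10_000, 500, 100, 12)
print(f"P(X >= 12) = {p_value:.4g}")
```

Note that the result depends only on counts, so two categories with the same number of changed transcripts receive the same P-value regardless of how large the individual changes are.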

It is becoming increasingly clear that single gene products are associated with multiple processes, functions or cellular components (Bomsztyk et al., 2004; Soreq and Seidman, 2001; Rosenstein and Krum, 2004), as reflected by the frequency with which multiple GO terms are assigned to individual genes. Thus, it may be more appropriate to think of genes not as solo actors dedicated to a single mission, but rather as members of multiple teams. This implies that adaptation of biological networks to various stimuli should involve subtle adjustments in a large set of elements, rather than extreme changes in a few selected members. Testing merely the number of changed genes within a category addresses the tails of the distribution but ignores additional, potentially important distributional characteristics of expression patterns. These may include changes in location, dispersion or other characteristics, which may be either not reflected in the number of changed genes, or reflected in non-trivial ways.

The output of the commonly used methods includes a set of terms containing more changed transcripts than expected based on the entire set of transcripts on the array. However, it is often interesting to know whether a certain category follows a different distribution not from the entire set of transcripts, but rather from a closely related subset. When employing hierarchical annotation schemes, as provided by the GOC, it is worthwhile to ask whether genes within a category display a different distribution than that of its parent term(s). Although this issue can in principle be resolved using the discrete approach, to the best of our knowledge this has not yet been done.

This work presents a method aimed at overcoming these shortcomings. Although recent publications presented methods incorporating the entire set of measured genes (Volinia et al., 2004; Smid and Dorssers, 2004; Mootha et al., 2003), none of these explicitly employed the actual distributions of the measured transcripts for comparison. We begin by presenting our approach, denoted as the continuous approach, implemented by the MATLAB® program GOdist. Using numerical simulations we demonstrate several essential differences between this and the discrete approach. Finally, we apply both approaches to biological data: expression patterns in mouse prefrontal cortex (PFC) following injection of the Parkinsonian neurotoxin, MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine), using Affymetrix® arrays and ontologies provided by the GOC.


    2 SYSTEMS AND METHODS
 
2.1 Description of the procedure
GOdist requires three classes of inputs (Fig. 1):

  1. Transcript codes and associated values. Typically these are expression ratios across treatments, but can be any values associated with individual transcripts. It is assumed that basic preprocessing of the data (normalization, filtration, averaging) has been performed.
  2. Ontology files of the GOC (Ashburner et al., 2000).
  3. Lists of ontology terms and UniGene codes associated with each transcript.
GOdist creates lists of transcripts annotated with each term of the BP, MF and CC ontologies (a transcript is considered annotated with a term if it is annotated with it directly or with any of its child terms). Then, the distribution of values (e.g. fold expression changes) of transcripts associated with each term is compared to a reference distribution. Initially, the reference distribution comprises all transcripts excluding those associated with the currently tested term. The comparison is performed using a two-sample Kolmogorov–Smirnov (KS) test for sample distributions (Sokal and Rohlf, 1994), a non-parametric test sensitive to any difference between the distributions. A test is conducted for each tail of the distribution. Although the two one-tailed KS tests do not actually detect only increases or decreases, we employ these terms below for simplicity. For completeness, GOdist also implements the discrete approach, using the hypergeometric distribution to calculate the probability of obtaining the observed or a larger number of changed (increased/decreased) transcripts given the total number of transcripts, the total number of changed transcripts, the number of transcripts within the specific category and the number of changed transcripts within the category, as determined by a user-specified threshold. Outputs are provided as tab-delimited text files containing a list of GO terms, P-values for the KS tests and for the discrete comparisons (i.e. increases or decreases) and the number of transcripts associated with the specific GO term. Owing to limitations of the KS test, only terms containing at least two elements are analyzed.
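As an illustration of the continuous comparison, the following Python sketch computes the two one-tailed two-sample KS statistics for a target category against a reference set. It is not the GOdist code (which is written in MATLAB); the function name is hypothetical, and the P-values use the standard large-sample approximation exp(-2 D² nm/(n+m)), which is an assumption made here and may be inaccurate for very small categories.

```python
from bisect import bisect_left, bisect_right
from math import exp

def ks_one_tailed(target, reference):
    """Return one-tailed KS statistics and approximate P-values.

    'decrease': the target CDF lies above the reference CDF somewhere
                (target values tend to be smaller).
    'increase': the target CDF lies below the reference CDF somewhere
                (target values tend to be larger).
    """
    t, r = sorted(target), sorted(reference)
    n, m = len(t), len(r)
    # The extreme CDF differences occur at (or just before) target values.
    d_above = max(bisect_right(t, x) / n - bisect_right(r, x) / m for x in t)
    d_below = max(bisect_left(r, x) / m - bisect_left(t, x) / n for x in t)
    scale = -2.0 * n * m / (n + m)
    return {
        "decrease": (d_above, exp(scale * d_above ** 2)),
        "increase": (d_below, exp(scale * d_below ** 2)),
    }

# Toy log2 ratios: a small category shifted upward relative to the rest.
category = [0.4, 0.6, 0.7, 0.9, 1.1]
rest = [-0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.5, 0.8]
for direction, (d, p) in ks_one_tailed(category, rest).items():
    print(f"{direction}: D = {d:.3f}, approximate P = {p:.3f}")
```

Because every value in the category enters the statistic, no threshold for calling a transcript changed is needed.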



Fig. 1 Overview of GOdist and examples using simulated data. (A) Inputs, basic data conversions and outputs of GOdist. (B) Histograms of expression ratios of the entire set of transcripts (continuous line) and three subgroups of transcripts (broken lines) associated with specific GO terms. Insets show changed transcripts in the global category and two of the subgroup categories. The discrete approach considers the number of increased or decreased transcripts within a specific category, and compares it to the expected number based on the number of changed transcripts on the entire array. The continuous approach compares the distribution of expression ratios of all transcripts within a category with those of the global distribution, or the parent categories (data not shown for simplicity). (C) Cumulative distribution functions of the categories shown in (B).

 
Microarrays often contain multiple probes for individual UniGene clusters and considering these as independent is unjustified. Hence, GOdist allows several options for selecting transcripts from the same UniGene cluster. Since here we employed duplicates of both baseline and experimental treatments, when multiple probes were available, we considered the one with the smallest coefficient of variation (CV) averaged across the two baseline and the two experimental samples. Additionally, any transcript for which CV > 1 was excluded from analysis. As expression data, we considered log2(mean(E1,E2)/mean(B1,B2)), where E1 and E2 are duplicates of the experimental samples (MPTP) and B1 and B2 are duplicates of the baseline samples (naive).
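The preprocessing described above can be sketched as follows, in Python rather than the authors' MATLAB. Several details are assumptions made for illustration: the CV is computed with the sample standard deviation, a probe is excluded when the CV of either group exceeds 1, and the function names and UniGene identifiers are hypothetical.

```python
from math import log2
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation (sample standard deviation / mean)."""
    return stdev(values) / mean(values)

def select_per_cluster(probes, max_cv=1.0):
    """probes: iterable of (unigene_id, e1, e2, b1, b2) signal values.

    Returns {unigene_id: log2 ratio} using, for each cluster, the probe
    with the smallest CV averaged over the two treatment groups.
    """
    best = {}
    for unigene, e1, e2, b1, b2 in probes:
        cv_e, cv_b = cv([e1, e2]), cv([b1, b2])
        if cv_e > max_cv or cv_b > max_cv:  # assumed exclusion rule
            continue
        avg_cv = (cv_e + cv_b) / 2
        ratio = log2(mean([e1, e2]) / mean([b1, b2]))
        if unigene not in best or avg_cv < best[unigene][1]:
            best[unigene] = (ratio, avg_cv)
    return {unigene: ratio for unigene, (ratio, _) in best.items()}

# Made-up signal values; "Mm.1" has two probes, "Mm.3" is too variable.
probes = [
    ("Mm.1", 200, 200, 100, 100),
    ("Mm.1", 400, 100, 100, 100),
    ("Mm.2", 100, 100, 100, 100),
    ("Mm.3", 1000, 10, 100, 100),
]
print(select_per_cluster(probes))
```

With only two replicates per group, the sample CV cannot exceed the square root of 2, so a cutoff of 1 removes only probes whose duplicates disagree strongly.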

GOdist can also conduct additional analyses for specific terms including:

  1. Non-parametric ANOVA (KW, Kruskal–Wallis test) and a test on the variances of the target and global populations.
  2. Comparison of the selected term against transcripts associated with its parent term(s) but not with itself.
  3. Plots of the estimated (Kaplan–Meier) cumulative density functions (CDFs) of expression ratios of the transcripts associated with each term under comparison.
All programs used for analyses and simulations were written in MATLAB.
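The propagation of annotations to parent terms, and the construction of the reference set for a child-versus-parent comparison, can be sketched in Python as follows. This is an illustration only: the data structures, function names and transcript identifiers are hypothetical and do not reflect the internals of GOdist.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct, parents):
    """Map each term to the transcripts annotated with it directly
    or with any of its descendant terms."""
    members = {}
    for transcript, terms in direct.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                members.setdefault(t, set()).add(transcript)
    return members

def parent_reference(term, parent, members):
    """Transcripts associated with the parent term but not with the term."""
    return members[parent] - members[term]

# Toy ontology fragment: one term with two parents (transcript ids made up).
parents = {
    "protein amino acid phosphorylation": ["protein modification",
                                           "phosphorylation"],
}
direct = {
    "t1": ["protein amino acid phosphorylation"],
    "t2": ["phosphorylation"],
    "t3": ["protein modification"],
}
members = propagate(direct, parents)
print(parent_reference("protein amino acid phosphorylation",
                       "phosphorylation", members))
```

The values of the returned transcripts would then serve as the reference distribution in place of the global one.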

2.2 Experimental procedures
Adult male FVB/N mice (Harlan, Jerusalem, Israel) were injected with a total of 60 mg/kg MPTP, given as four injections of 15 mg/kg at 2 h intervals. Mice were anesthetized and decapitated 72 h following injections. The PFC was dissected and stored in RNA-later solution (www.ambion.com). RNA was extracted using the Qiagen RNeasy mini kit (Qiagen, Hilden, Germany). As controls, we used PFCs from eight untreated female mice, grouped into two pools of four. For the MPTP group, PFCs from six males were grouped into two pools of three. Pools were hybridized at the Weizmann Institute Service Unit to Affymetrix M430A-2.0 arrays (Affymetrix, Santa Clara, CA) according to the manufacturer's instructions. Arrays were scanned by the GeneArray scanner 3000 (Affymetrix), visually inspected for hybridization imperfections and analyzed using MAS-5 software (Affymetrix) by scaling to an average intensity of 150. Experiments were conducted in accordance with the Animal Care and Use Committee of the Hebrew University of Jerusalem.


    3 RESULTS
 
3.1 Demonstration of the continuous approach using numerical simulations
We employed numerical calculations to demonstrate the validity of the continuous approach. To render this analysis independent of any specific ontology structure, the problem was simplified by considering only two transcript sets: those annotated with the term of interest (target population), and all others (global population). The global population comprises 10 000 transcripts with expression values obeying a standard normal distribution (µ = 0, σ = 1). The target population also follows a normal distribution, but here µ, σ and N (population size) are manipulated. Populations are repeatedly sampled (1000 times) given the specified parameters, and the resulting distributions are compared using both the discrete and the continuous approaches. Finally, we compared the percentage of cases in which a change was detected by each of the methods (P ≤ 0.05). Parameters for all simulations are provided in Table 1.
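A scaled-down version of such a simulation might look like the Python sketch below. It is not the authors' MATLAB code. The assumptions are: a category counts as detected when either of the two one-tailed tests gives P ≤ 0.05; the KS P-value uses the large-sample approximation; the threshold of 1.65 is the value stated later in Section 3.3; and the default number of repetitions is far smaller than the 1000 used in the paper so that the example runs quickly. All function names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from math import comb, exp

def ks_min_p(target, reference):
    """Smaller of the two one-tailed KS P-values (asymptotic approximation)."""
    t, r = sorted(target), sorted(reference)
    n, m = len(t), len(r)
    d_above = max(bisect_right(t, x) / n - bisect_right(r, x) / m for x in t)
    d_below = max(bisect_left(r, x) / m - bisect_left(t, x) / n for x in t)
    d = max(d_above, d_below)
    return exp(-2.0 * d * d * n * m / (n + m))

def hypergeom_tail(total, total_changed, size, changed):
    """P(X >= changed) under the hypergeometric distribution."""
    top = min(total_changed, size)
    hits = sum(comb(total_changed, k) * comb(total - total_changed, size - k)
               for k in range(changed, top + 1))
    return hits / comb(total, size)

def detection_rates(mu, sigma, n_target=100, n_global=10_000,
                    threshold=1.65, reps=20, alpha=0.05, seed=0):
    """Fraction of repetitions in which each approach reports P <= alpha.

    Returns (continuous_rate, discrete_rate).
    """
    rng = random.Random(seed)
    continuous = discrete = 0
    for _ in range(reps):
        glob = [rng.gauss(0.0, 1.0) for _ in range(n_global)]
        targ = [rng.gauss(mu, sigma) for _ in range(n_target)]
        if ks_min_p(targ, glob) <= alpha:
            continuous += 1
        # Discrete approach: test increases and decreases separately.
        p_values = []
        for sign in (1, -1):
            g_changed = sum(sign * x > threshold for x in glob)
            t_changed = sum(sign * x > threshold for x in targ)
            p_values.append(hypergeom_tail(n_global + n_target,
                                           g_changed + t_changed,
                                           n_target, t_changed))
        if min(p_values) <= alpha:
            discrete += 1
    return continuous / reps, discrete / reps

print(detection_rates(mu=0.5, sigma=1.0, reps=5))
```

Sweeping `mu` and `sigma` over a grid and plotting the two rates would give a picture analogous in spirit to Figure 2A, although the exact values depend on the assumptions listed above.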


Table 1 Parameters used in the simulations (Fig. 2)

 
We first tested the effect of changing µ and σ of the target population with N = 100 transcripts. Figure 2A shows that the continuous method fails when µ approaches 0 and σ approaches 1. This is expected, since these are the parameters of the global population. However, even small deviations from either µ or σ are invariably detected. This example illustrates that the continuous approach is sensitive not only to changes in location (e.g. mean) but other characteristics of the distribution as well (e.g. variance). Figure 2A also shows that the discrete method fails throughout most of the parameter space. Changes are detected reliably only when µ and σ are both high. For the entire parameter space examined here, the continuous method is at least as successful as the discrete method in detecting changes.



Fig. 2 Results of numerical calculations. (A) Effect of changing µ and σ of the target category on the continuous and discrete methods. Colorbar denotes percentage of correct detections. (B) Effect of fraction of modified transcripts on detection. (C) Effect of detection threshold. (D) Effect of changing the target category size. See Table 1 for a summary of parameters used in each panel.

 
3.2 Detection of partially modified categories
It may occur that only a fraction of transcripts annotated with a given term will undergo expression changes. For example, transcripts associated with a broadly defined term may comprise several functionally distinct subsets. Here, we explore the performance of both approaches under this scenario. Specifically, we consider a category of size 1000, and vary the fraction of transcripts subject to changes from 0.1 to 1 (abscissa in Fig. 2B). A fraction of 1 corresponds to the analysis in Section 3.1. In all simulations, σ = 1, but µ varies from 0.1 to 1 as represented by different colored traces. Figure 2B shows that for large µ (1) and a low fraction of changed transcripts (0.1), the discrete method slightly outperforms the continuous. This is expected, since this situation represents a small number of transcripts undergoing relatively large changes. However, with decreasing µ or an increasing fraction of modulated genes, the continuous method becomes superior. The differences are particularly clear when the changed fraction is increased, demonstrating the sensitivity of the continuous method to widespread changes. Moreover, for µ < 0.8, the continuous method outperforms the discrete also when only a small fraction of the transcripts is modified.

3.3 Effect of the threshold for calling a change
The threshold for calling changed genes is a crucial parameter for the discrete approach. In the previous examples, a value of 1.65, corresponding to the 5% tails of a standard normal distribution, was employed. Figure 2C shows that, for the discrete method, the number of detected categories decreases as the threshold increases, whereas the threshold plays no role in the continuous approach. The effect of the threshold has two causes. The first and obvious cause is that altering the threshold directly influences the number of transcripts defined as changed within both the global and the target populations. The second, subtler cause is that, typically, the target and global distributions differ in their shapes around the region of the threshold (e.g. Fig. 1B). Consequently, changing the threshold affects the fraction of changed genes in the global and target populations non-proportionally. Since the expected number of changed transcripts in a given subset (and hence the result of the statistical test) is determined by the frequency of changes in the global population, changing the threshold alters the sensitivity of the discrete method to overrepresentation of changed transcripts.
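The non-proportional effect of the threshold can be checked with a small calculation. The Python sketch below computes, for a standard normal global population and a target population with µ = 0.5 and σ = 1 (values chosen for illustration), the expected fraction of transcripts exceeding several thresholds. The function name is hypothetical.

```python
from math import erfc, sqrt

def upper_tail(threshold, mu=0.0, sigma=1.0):
    """P(X > threshold) for X ~ Normal(mu, sigma)."""
    return 0.5 * erfc((threshold - mu) / (sigma * sqrt(2.0)))

for thr in (0.5, 1.0, 1.65, 2.0):
    frac_global = upper_tail(thr)
    frac_target = upper_tail(thr, mu=0.5)
    print(f"threshold={thr:4.2f}  global={frac_global:.3f}  "
          f"target={frac_target:.3f}  ratio={frac_target / frac_global:.2f}")
```

The ratio of the two fractions is not constant across thresholds, while the absolute number of changed transcripts in a category of fixed size shrinks as the threshold rises; both factors enter the hypergeometric test.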

3.4 Effect of target population size
To study the role of the target population size, we maintained fixed values for µ and σ (at 0.5 and 1, respectively, values at which the discrete approach also performs satisfactorily), and varied the target population size (N) in increments from 2 to 500. Figure 2D shows that both methods detect changes for large target populations. As the population size decreases, however, the continuous approach remains sensitive, whereas the performance of the discrete approach deteriorates. This implies that the discrete method is especially likely to overlook changes in the distribution when the target population is small.

3.5 Comparison of the methods on biological data
We analyzed expression ratios between PFC of MPTP injected male and naive female mice. The PFC is a target of the midbrain dopaminergic system, which is damaged in Parkinson's disease or following MPTP treatment (Cohen et al., 2002). Transcripts included in the BP and MF GOC ontologies were analyzed using the continuous and discrete approaches. For the discrete method, a transcript was defined as increased/decreased if the log2 expression ratio was >1 or <–1. For both methods, a P-value of 0.05 was used to call a significant change in the distribution.

The results of this analysis are shown in Figure 3A as a function of the log2 detection threshold (affecting only the discrete approach). More changed categories are detected by the continuous approach for any threshold employed. Although for the discrete approach, lower thresholds generally yield more detections, this trend is not strictly monotonic. This is due to the complex structure of GO terms, the transcripts and their expression values. Although the trends in Figure 3A are specific for the dataset analyzed here, the general conclusions are likely to be relevant for many other comparisons.



Fig. 3 Comparison of the methods on biological data. (A) Number of detected categories as a function of detection threshold for increases and decreases in BP and MF ontologies. (B) Simplified scheme of MPTP modes of action.

 
3.6 Biological relevance of the detected GO terms
Examination of the outputs of both methods reveals several classes of detected terms (Supplementary information). One class includes general terms that hardly provide any insight (e.g. cellular physiological process, comprising 2759 transcripts). A second class most probably reflects the fact that MPTP-injected mice were males whereas untreated controls were females (e.g. genital morphogenesis). The most relevant class here comprises terms related to MPTP exposure. Figure 3B shows a simplified scheme with some key processes believed to follow MPTP exposure (Speciale, 2002; Dauer and Przedborski, 2003). Table 2 lists relevant terms detected by either or both methods. Results for the discrete approach are shown for thresholds of both 1 (absolute value of log2 ratio) and 0.5. A log2 threshold of 1 is commonly employed, but a value of 0.5 yielded the most detections for our data (Fig. 3A). The comparison shows that various relevant categories were detected by both methods. However, the continuous approach yields more expected categories than the discrete approach, even when the latter uses a threshold of 0.5. Moreover, although the lower threshold allows the discrete method to detect predictable changes unnoticed with a threshold of 1 (e.g. ubiquitin cycle), in some cases, predictable terms detected with the higher threshold are not detected with the lower threshold (e.g. oxidoreductase activity). Thus, reducing the threshold for the discrete method may not necessarily lead to enhanced identification of biologically relevant terms (recall that a category is detected as changed by the discrete method if it contains more changed transcripts than expected, and that this expectation is based on the same threshold applied to all transcripts).


Table 2 Examples of ontology terms detected with the continuous approach and two thresholds for the discrete approach

 
3.7 Taking a closer look at changed terms
Figure 4 provides examples of more detailed analyses of selected terms. Figure 4A–C shows this analysis for protein amino acid phosphorylation (PAAP). Figure 4A, showing the comparison against the global distribution, reveals that the CDF is shifted to the right compared to that of the global distribution. This is reflected by the high KS statistic (P < 0.001) as well as by the KW test (P < 0.001) showing that indeed these two distributions differ in their location. In addition, the variance test using the F-statistic shows that the variance of PAAP is smaller than that of the global distribution (P < 0.001), suggesting that this category is tightly regulated. Figure 4B and C show the comparison of PAAP relative to its two parent terms. The parent term protein modification is statistically indistinguishable from PAAP according to the KS, KW and variance tests, consistent with the highly overlapping CDFs. In contrast, compared to the parent term phosphorylation, there is a clear rightward shift.
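For readers who wish to reproduce the location comparison, the Kruskal–Wallis statistic for two groups can be computed as in the Python sketch below. This is an illustration under stated assumptions rather than the GOdist implementation: ties receive average ranks with the usual tie correction, the P-value uses the chi-square approximation with one degree of freedom, and the function name is hypothetical. The variance test is not covered here.

```python
from math import erfc, sqrt

def kruskal_wallis_two_groups(a, b):
    """Return (H, approximate P-value) for two samples.

    Assumes the pooled sample contains at least two distinct values.
    """
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    n_total = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * (
        rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b)
    ) - 3.0 * (n_total + 1)
    h /= 1.0 - tie_term / (n_total ** 3 - n_total)
    h = max(h, 0.0)  # guard against tiny negative rounding errors
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return h, erfc(sqrt(h / 2.0))

h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
print(f"H = {h:.3f}, approximate P = {p:.3f}")
```

A category can differ from its reference in dispersion without differing in location, in which case this test remains non-significant while the KS tests do not.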



Fig. 4 Examples of categories detected by the continuous approach in data from MPTP-injected mouse brains. Cumulative distribution functions (CDFs) of the reference (continuous lines) and target (broken lines) categories. Increased and decreased transcripts within the target category are depicted by vertical lines, plotted at their corresponding log ratios. Text within the panels shows the results of the one-tailed Kolmogorov–Smirnov tests (high-KS, low-KS), the Kruskal–Wallis non-parametric ANOVA (KW), and the variance test (—: not significant; *: P < 0.05; **: P < 0.01; ***: P < 0.001). Legends show the name of the categories and the number of transcripts included in parentheses. (A–C) Comparison of Protein amino acid phosphorylation with respect to the global distribution (A), and its two parent categories (B and C). (D–F) Signal transducer activity and receptor activity compared to the global distribution (D and E), and to each other (F). In all panels, the ordinate spans the interval [0,1] and the abscissa spans the interval [–3.5, 3.5]. Insets show close-ups of the CDFs within the corresponding panels for log ratios [–1, 1]

 
The MF term signal transducer activity (STA) is analyzed in Figure 4D–F. Figure 4D shows that the distribution of STA transcripts is more widespread than for the global population. This is reflected by both one-tailed KS tests (P < 0.001) and the variance test (P < 0.001). However, the KW test does not reveal a change, indicating that the populations significantly differ in dispersion rather than location. Figure 4E shows the comparison of the closely related term receptor activity against the global distribution, yielding essentially identical results and thus suggesting that the two terms are indistinguishable. Indeed, Figure 4F, comparing receptor activity with its immediate parent term STA, shows that the distributions of transcripts in these populations are indistinguishable. This example highlights the potential dependence among GO terms, and the importance of considering the hierarchical structure of ontology terms before reaching conclusions about the biological significance of the results.


    4 DISCUSSION
 
We have shown that the continuous approach reliably detects subtle changes in simulated data, and changes in relevant categories in biological data. Many of these changes evade the more commonly used discrete approach. The essence of the differences is not merely technical, but rather addresses the nature of biologically relevant changes. While clearly sometimes selected genes undergo extreme changes, subtler modulations in individual transcripts, and therefore in the populations they belong to, may be the rule rather than the exception. Our results also show that certain changes are detected by the discrete approach, yet not by the continuous approach. We therefore believe that neither of these approaches is categorically ‘correct’, but rather that they are complementary, and provide information about different (and occasionally overlapping) aspects of changes in gene expression patterns. Given that changes in gene expression patterns can assume various forms, it is reasonable that complementary statistical measures are required for a comprehensive analysis.

Two recent publications also advocated the use of continuous measures of gene expression, rather than merely the ‘changed’ status of genes (Volinia et al., 2004; Smid and Dorssers, 2004). The main difference between these approaches and ours is that we explicitly compare distributions of expression patterns, rather than statistics derived from the entire set of genes, or significance scores associated with each of the transcripts. Moreover, we present a systematic comparison between the two approaches on simulated and biological data. Finally, the continuous approach provides a natural solution for comparing hierarchically related (i.e. child–parent) terms with each other. Another approach (GSEA) compares the rank ordering of expression level changes of all transcripts on an array using a Kolmogorov–Smirnov statistic (Mootha et al., 2003). The main limitation of that approach is the strong dependence of detection sensitivity on sample size (Damian and Gorfine, 2004). Although the authors rightly claim that by definition, statistical significance is a function of sample size, the GSEA approach cannot detect changes of even infinite magnitudes in small categories. As we have shown in Section 3.4 (Figure 2D), the continuous method can detect changes in categories containing as few as ~4 transcripts. The rationale for developing the continuous approach was that gene expression changes are continuous rather than discrete, and that categories may be overall modified without any member undergoing extreme changes. The results obtained from biological data suggest that these arguments hold for the specific dataset analyzed. Specific detailed examples demonstrating this are given as supplementary information.

Current microarray technology necessitates replicate experiments to overcome technical and biological variability. In the present context, two approaches may be adopted. First, replicate experiments can be used to obtain more reliable estimates of changes in individual transcripts, which are then fed to the ontology-level analysis. Alternatively, each experiment can be subjected to the ontology analysis separately, with results combined (i.e. averaged or intersected) across experiments only afterwards. We have followed the first approach by averaging expression data across replicates of each treatment group. We also discarded from analysis transcripts with a CV > 1 in each treatment group. Finally, for multiple transcripts from the same UniGene cluster, the single transcript with the lowest average CV across both treatment groups was selected (this is one of the options of GOdist to deal with multiple transcripts from the same UniGene cluster).

While we have applied the continuous approach to GOC ontologies and to comparative expression data from Affymetrix microarrays, the method is not restricted to these specific inputs. Any gene classification scheme, including structural elements, sequence motifs, tissue localization, etc., may be used. Also, one is not limited to comparing expression signals across two treatments. Thus, when multiple arrays are available, it is possible to use as an input for each gene the variance over replicates, potentially revealing categories that are tightly or loosely regulated. In conclusion, we believe that incorporating the continuous approach in analyses of large-scale gene expression data will aid in uncovering biologically relevant processes, including those characterized by subtle adaptations.


    Acknowledgments
 
We thank Dr. Eran Meshorer for an introduction to microarray experiments and for performing the initial experiments. This research was supported by grants from ISF (618/02-1) and the European Union (H.S.). Y.B.S. has been an incumbent of a post-doctoral fellowship from the Interdisciplinary Center for Computational Neuroscience.

Received on November 10, 2004; revised on November 11, 2004; accepted on November 18, 2004

    REFERENCES
 

    Al-Shahrour, F., Diaz-Uriarte, R., Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20, 578–580.

    Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

    Beissbarth, T. and Speed, T.P. (2004) GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics, 20, 1464–1465.

    Bomsztyk, K., Denisenko, O., Ostrowski, J. (2004) hnRNP K: one protein multiple processes. Bioessays, 26, 629–638.

    Bono, H., Kasukawa, T., Furuno, M., Hayashizaki, Y., Okazaki, Y. (2002) FANTOM DB: database of functional annotation of RIKEN mouse cDNA clones. Nucleic Acids Res., 30, 116–118.

    Cohen, J.D., Braver, T.S., Brown, J.W. (2002) Computational perspectives on dopamine function in prefrontal cortex. Curr. Opin. Neurobiol., 12, 223–229.

    Damian, D. and Gorfine, M. (2004) Statistical concerns about the GSEA procedure. Nat. Genet., 36, 663.

    Dauer, W. and Przedborski, S. (2003) Parkinson's disease: mechanisms and models. Neuron, 39, 889–909.

    Hosack, D.A., Dennis, G., Jr, Sherman, B.T., Lane, H.C., Lempicki, R.A. (2003) Identifying biological themes within lists of genes with EASE. Genome Biol., 4, R70.

    Liu, G., Loraine, A.E., Shigeta, R., Cline, M., Cheng, J., Valmeekam, V., Sun, S., Kulp, D., Siani-Rose, M.A. (2003) NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res., 31, 82–86.

    Mootha, V.K., Lindgren, C.M., Eriksson, K.F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet., 34, 267–273.

    Rosenstein, J.M. and Krum, J.M. (2004) New roles for VEGF in nervous tissue—beyond blood vessels. Exp. Neurol., 187, 246–253[CrossRef][Web of Science][Medline].

    Smid, M. and Dorssers, L.C. (2004) GO-Mapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms. Bioinformatics, 20, 2618–2625[Abstract/Free Full Text].

    Sokal, R.R. and Rohlf, F.J. Biometry, (1994) , NY W.H. Freeman & Company.

    Soreq, H. and Seidman, S. (2001) Acetylcholinesterase—new roles for an old actor. Nat. Rev. Neurosci., 2, , pp. 294–302[CrossRef][Web of Science][Medline].

    Speciale, S.G. (2002) MPTP: insights into parkinsonian neurodegeneration. Neurotoxicol. Teratol., 24, 607–620[CrossRef][Web of Science][Medline].

    Volinia, S., Evangelisti, R., Francioso, F., Arcelli, D., Carella, M., Gasparini, P. (2004) GOAL: automated Gene Ontology analysis of expression profiles. Nucleic Acids Res., 32, W492–W499[Abstract/Free Full Text].

    Zeeberg, B.R., Feng, W., Wang, G., Wang, M.D., Fojo, A.T., Sunshine, M., Narasimhan, S., Kane, D.W., Reinhold, W.C., Lababidi, S., et al. (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol., 4, R28[CrossRef][Medline].





