Bioinformatics Advance Access originally published online on September 23, 2004
Bioinformatics 2005 21(5):650-659; doi:10.1093/bioinformatics/bti042
Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification
1 Department of Molecular Genetics, Weizmann Institute of Science 76100 Rehovot, Israel
2 Department of Physics of Complex Systems, Weizmann Institute of Science 76100 Rehovot, Israel
3 Department of Biological Services, Weizmann Institute of Science 76100 Rehovot, Israel
*To whom correspondence should be addressed.
Abstract
Motivation: Genes are often characterized dichotomously as either housekeeping or single-tissue specific. We conjectured that crucial functional information resides in genes with midrange profiles of expression.
Results: To obtain such novel information genome-wide, we have determined the mRNA expression levels for one of the largest sets analyzed to date, 62 839 probesets, in 12 representative normal human tissues. Indeed, when using a newly defined graded tissue specificity index τ, valued between 0 for housekeeping genes and 1 for tissue-specific genes, genes with midrange profiles having 0.15 < τ < 0.85 were found to constitute >50% of all expression patterns. We developed a binary classification, indicating for every gene the I_B tissues in which it is overly expressed and the 12 − I_B tissues in which it shows low expression. The 85 dominant midrange patterns with I_B = 2–11 were found to be bimodally distributed and to contribute most significantly to the definition of tissue specification dendrograms. Our analyses provide a novel route to infer expression profiles for presumed ancestral nodes in the tissue dendrogram. Such inference has uncovered an unsuspected correlation, whereby de novo enhancement and diminution of gene expression go hand in hand. These findings highlight the importance of gene suppression events, with implications for the course of tissue specification in ontogeny and phylogeny.
Availability: All data and analyses are publicly available at the GeneNote website, http://genecards.weizmann.ac.il/genenote/, and at GEO (accession GSE803).
Contact: doron.lancet{at}weizmann.ac.il
Supplementary information: Four tables available at the above site.
INTRODUCTION
The ontogeny of complex multicellular organisms is enabled by the differential expression of genes across various cell types. Expression profiling with DNA arrays offers the opportunity to identify such patterns systematically (Halfon and Michelson, 2002; Slonim, 2002). Housekeeping genes are expressed in all cell types, whereas other genes are expressed in a more restricted selection of tissues. Previous research on the tissue specificity of genes has focused mainly on the extremes: one-tissue-specific genes (Hsiao et al., 2001, http://www.humangenes.org; Su et al., 2002) and housekeeping genes (Eisenberg and Levanon, 2003; Lercher et al., 2002; Warrington et al., 2000). However, many genes may show midrange patterns of expression, that is, expression at a high level in a subset of tissues and at a much lower level, or not at all, in the other tissues. The term refers to the cross-tissue breadth of gene expression rather than to high or low overall expression intensity. Here, we investigate the occurrence and potential significance of midrange patterns of expression, noting that important information about a given tissue may be harbored not only in tissue-specific enhancement of expression but also in tissue-specific suppression.
Several recent high-throughput DNA array studies of gene expression have aimed to characterize transcription patterns in healthy tissues. One of these examined the transcription profiles of 28 normal human tissues and 45 mouse tissues, using 12 000 oligonucleotide probesets (Su et al., 2002). cDNA arrays have also been used to examine gene expression across normal human tissues (Saito-Hisaminato et al., 2002). These, as well as other surveys of normal tissues (Haverty et al., 2002; Hsiao et al., 2001), were limited to the better-characterized genes and did not afford a genome-wide view. Studies on a more complete gene set focused on comparisons between diseased and non-diseased states (Bakay et al., 2002; Iacobuzio-Donahue et al., 2002; Mariani et al., 2002). In a recent report (Shmueli et al., 2003), as well as in the current work, we queried 12 normal human tissues with a complete gamut of 62 839 probesets, representing 23 271 identifiable human genes. This is one of the largest sets employed to date, and it includes nearly 12 000 genes whose tissue expression had not been examined in earlier studies. Most recently, Su et al. (2004) extended their expression atlas to encompass 79 human and 61 mouse tissues.
The resulting genome-wide view of gene expression patterns is used here to reveal relationships among healthy human tissues and to generate new genome annotation tools. Specifically, our data shed new light on genes with midrange profiles of expression, with implications for the fine balance of gene expression and suppression that underlies tissue specification.
SYSTEMS AND METHODS
Expression data preprocessing
The expression intensity of mRNA was assayed across five microarrays (Affymetrix GeneChips U95A–E), containing a total of 62 839 probesets, each in duplicate. Poly(A)+ RNA samples from the human tissues were purchased from Clontech (Palo Alto, CA; details in Table S1 in the Supplementary Material). This collection of major human tissues includes bone marrow, brain, heart, kidney, liver, lung, pancreas, prostate, skeletal muscle, spinal cord, spleen and thymus. These RNA samples have relatively coarsely defined tissue delineations, but in this respect they are comparable to those used in other studies of transcription patterns in groups of normal human tissues (Su et al., 2002, 2004; Saito-Hisaminato et al., 2002). Each RNA sample was typically a pool from 10–25 individuals. While such commercial pooled samples from anonymous donors are demographically ill-defined, they have the advantage of enabling others to reproduce the experiments.
Replicate experiments were performed independently, mostly from RNA of identical lot numbers; the exceptions were kidney, pancreas and prostate. Aliquots of each sample (12 µg cRNA in 200 µl hybridization mix) were hybridized to a GeneChip Human Genome U95A–E array set (Affymetrix, Santa Clara, CA). Preparation and hybridization of cRNA were performed according to the manufacturer's instructions (Affymetrix, 2001, http://.com/support/technical/manuals.affx).
The expression value for each gene was determined using the MicroArray Suite version 5.0 (MAS 5.0) software (Hubbell et al., 2002; Liu et al., 2003) with default parameters, but without the MAS 5.0 scaling and normalization procedures. The quantilization procedure used here (see below) encapsulates some features of another preprocessing method, RMA normalization (Irizarry et al., 2003). Affymetrix MAS 5.0 signal values were normalized by taking the log10 of all values (substituting 1 for zero intensities), then subtracting the mean of the particular array and adding the total experimental mean (Shmueli et al., 2003). Finally, intensities below log10(30) were set to log10(30) to eliminate perturbation by the noise present at low intensities; variations in this threshold resulted in no significant changes. The resulting MAS 5.0 intensities, ranging on a decimal logarithmic scale from log10(30) to roughly 4, were converted into a quantile scale. The expression data, averaged over the two replicates, were divided into 11 bins: 10 equal-density quantiles spanned the values above log10(30), and an eleventh, zero bin held the remaining low-intensity values. All subsequent analyses used these quantiled profiles.
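A minimal Python sketch of the normalization and quantilization steps follows. It is not the authors' Matlab implementation: it assumes that each array is a list of MAS 5.0 signal values for the same probesets, that the "total experimental mean" is the mean of all log-transformed values across arrays, and that the ten quantile cut points are computed on the pooled, replicate-averaged values above the floor. The function names (`log_normalize`, `average_replicates`, `quantile_bins`) are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean

FLOOR = math.log10(30)  # noise floor from the text: values below log10(30) are raised to it


def log_normalize(arrays):
    """Log10-transform each array (log10(1) = 0 is used for zero signals), subtract
    that array's mean, add the grand mean over all arrays, then apply the floor."""
    logged = [[math.log10(v) if v > 0 else 0.0 for v in arr] for arr in arrays]
    grand = mean(v for arr in logged for v in arr)
    normalized = []
    for arr in logged:
        shift = grand - mean(arr)
        normalized.append([max(v + shift, FLOOR) for v in arr])
    return normalized


def average_replicates(rep1, rep2):
    """Element-wise average of two replicate vectors."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


def quantile_bins(values, n_bins=10):
    """Map values to 11 bins: 0 for values at the floor, 1..n_bins for
    equal-density quantiles of the values above the floor (assumed pooled)."""
    above = sorted(v for v in values if v > FLOOR)
    if not above:
        return [0] * len(values)
    # Cut points at the 10%, 20%, ..., 90% positions of the above-floor values.
    cuts = [above[len(above) * k // n_bins] for k in range(1, n_bins)]
    return [0 if v <= FLOOR else 1 + bisect_right(cuts, v) for v in values]
```

In this sketch `quantile_bins` would be applied to the replicate-averaged values of all probesets and tissues at once, so that every bin above zero holds roughly one-tenth of the expressed measurements.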
Statistical analysis of differential expression
Single-classification ANOVA with equal sample sizes was applied to the preprocessed 24-element expression vector composed of 12 tissues in two replicates. For each profile, the sum of squares of the differences between replicates was compared with the sum of squares of the differences between the tissue expression averages. To account for the multiple-comparison problem inherent in calculating P-values for all 62 839 probesets, we controlled the false discovery rate (Benjamini and Hochberg, 1995). A false discovery rate of 1% gave a P-value cutoff of 0.0036, and the 22 936 profiles passing this cutoff were defined as differentially expressed. The remaining profiles were further divided into not-expressed profiles, defined as having all 12 values in the zero bin; housekeeping profiles, whose expression is non-zero in all tissues with similar intensities (SD smaller than 1 quantile unit); and uncategorized profiles, comprising the rest. The algorithms described below were applied to the 22 936 differentially expressed profiles.
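The per-probeset test and the false discovery rate cutoff can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the F-distribution tail is computed through the regularized incomplete beta function with the continued-fraction (modified Lentz) method described in Numerical Recipes, swapped in here only because Python's standard library has no F distribution. In practice one would more likely call `scipy.stats.f_oneway` and `statsmodels.stats.multitest.multipletests(..., method='fdr_bh')`. For 12 tissues in duplicate the test has 11 and 12 degrees of freedom. All function names are hypothetical.

```python
import math
from statistics import mean


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def anova_pvalue(groups):
    """One-way ANOVA P-value; `groups` holds the replicate values of each tissue."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b, df_w = k - 1, n - k
    if ss_within == 0:
        return 0.0 if ss_between > 0 else 1.0
    f = (ss_between / df_b) / (ss_within / df_w)
    # Upper tail of F(df_b, df_w): P(F > f) = I_{df_w / (df_w + df_b f)}(df_w / 2, df_b / 2)
    return betainc_reg(df_w / 2, df_b / 2, df_w / (df_w + df_b * f))


def bh_cutoff(pvalues, q=0.01):
    """Benjamini-Hochberg: largest sorted p_(i) with p_(i) <= i * q / m, or None."""
    m = len(pvalues)
    cutoff = None
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * q / m:
            cutoff = p
    return cutoff
```

Profiles with a P-value at or below the returned cutoff would be called differentially expressed.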
Probesets to genes analysis
The association of probesets to genes was performed using the GeneAnnot algorithm (Chalifa-Caspi et al., 2003, 2004). GeneAnnot comprehensively identifies relationships between oligonucleotide array probesets and annotated genes in GeneCards (Safran et al., 2002) by performing pairwise alignments between the probe sequences and gene transcripts, and assigning sensitivity and specificity scores to them. A further step of probeset annotation, conducted by GeneTide (Shklar et al., 2004), was to assign annotation based upon the transcript from which the probeset was derived. This was carried out by an integration of transcript annotation data from several resources such as UniGene (Wheeler et al., 2003) and AceView (http://www.humangenes.org). Furthermore, these target sequences were aligned against the human genome using BLAT (Kent, 2002), and assigned a gene according to their genomic location using GeneLoc (Rosen et al., 2003).
ALGORITHMS
Tissue specificity index
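The index τ is defined as:

$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1},$$

where N is the number of tissues and x_i is the expression profile component normalized by the maximal component value. … τ = 0.95. Other definitions, for example, based on entropy or geometric considerations, were pursued but found less robust in terms of sensitivity to extreme profile component values.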
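For a profile with components e_1 to e_N, the index is τ = Σ_i (1 − x_i)/(N − 1), with x_i = e_i / max_j e_j. A direct Python transcription is sketched below. The function names are hypothetical, and the input is assumed to be a quantiled profile with at least one non-zero value (τ is undefined for the all-zero, not-expressed profiles).

```python
def tissue_specificity(profile):
    """tau = sum_i (1 - x_i) / (N - 1), where x_i = profile[i] / max(profile)."""
    n = len(profile)
    top = max(profile)
    if n < 2 or top <= 0:
        raise ValueError("tau needs >= 2 tissues and at least one positive value")
    return sum(1 - v / top for v in profile) / (n - 1)


def is_midrange(tau, low=0.15, high=0.85):
    """Midrange specificity window used in the text: 0.15 < tau < 0.85."""
    return low < tau < high
```

For instance, equal expression in all 12 tissues gives τ = 0, expression in a single tissue gives τ = 1, high expression in 6 of 12 tissues and none elsewhere gives τ = 6/11 ≈ 0.55 (midrange), and two expressing tissues give τ = 10/11 ≈ 0.91.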
Binary patterns
We first defined the gap index of each expression profile as the maximal difference between two neighboring values in the sorted quantile vector. When the maximal gap occurred more than once in a profile, the first one, that is, the one between the smallest neighboring values, was taken. The gap was used to convert expression profiles into binary form. For the 8224 differentially expressed profiles with a gap of at least 3 quantile units, expression above the gap was interpreted as overexpressed (1) and the rest as underexpressed (0). These 8224 probeset profiles form our mingap set. The remaining 14 712 differentially expressed profiles were assigned to the best-matching binary patterns detected by the gap procedure, as follows. The Euclidean distance was calculated between each of the 14 712 profiles and the mean expression profile of each binary pattern, and the pattern with the smallest distance was selected as the profile's matching binary pattern. The binary index, I_B, of each binary pattern is defined as the number of 1s in the pattern.
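The gap-based binarization and the nearest-pattern assignment can be sketched as follows. Assumptions: profiles are quantiled vectors (values 0–10), a pattern's mean profile is computed over its mingap members only, and ties in distance are broken arbitrarily by `min`. The helper names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def gap_binarize(profile, min_gap=3):
    """Return (pattern, gap); pattern is None when the gap is below min_gap."""
    s = sorted(profile)
    diffs = [hi - lo for lo, hi in zip(s, s[1:])]
    gap = max(diffs)
    i = diffs.index(gap)  # first occurrence: the gap between the smallest values
    # Values above the gap (strictly greater than s[i]) are called overexpressed.
    pattern = tuple(1 if v > s[i] else 0 for v in profile)
    return (pattern if gap >= min_gap else None), gap


def pattern_means(profiles, min_gap=3):
    """Mean quantiled profile of each binary pattern, over the mingap profiles."""
    groups = defaultdict(list)
    for prof in profiles:
        pattern, _ = gap_binarize(prof, min_gap)
        if pattern is not None:
            groups[pattern].append(prof)
    return {p: [mean(col) for col in zip(*members)] for p, members in groups.items()}


def nearest_pattern(profile, means):
    """Assign a profile to the pattern whose mean profile is closest (Euclidean)."""
    return min(means, key=lambda p: math.dist(profile, means[p]))


def binary_index(pattern):
    """I_B: the number of tissues marked as overexpressed."""
    return sum(pattern)
```

For example, the profile [0, 0, 1, 8, 9, 10] has its largest gap (7) between 1 and 8, giving the pattern (0, 0, 0, 1, 1, 1) with I_B = 3.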
Unsupervised clustering
The Superparamagnetic clustering (SPC) algorithm (Blatt et al., 1996) was applied to the same set of profiles used in the binary pattern analysis. Before clustering, each profile was centered and normalized so that its mean was zero and its norm was one [as described by Kannan et al. (2001)]. The SPC parameters used are detailed in Table S2 (Supplementary Material).
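The centering and normalization applied before SPC are sketched below; SPC itself is not reproduced here. The function name is hypothetical.

```python
import math


def center_and_normalize(profile):
    """Shift a profile to zero mean, then scale it to unit Euclidean norm."""
    mu = sum(profile) / len(profile)
    centered = [v - mu for v in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("a flat profile cannot be scaled to unit norm")
    return [c / norm for c in centered]
```

One consequence worth noting: for two profiles transformed this way, the squared Euclidean distance equals 2(1 − r), where r is the Pearson correlation of the original profiles, so a distance-based method then groups profiles by shape rather than by overall expression level.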
Ancestral tissue reconstructions
Given two binary tissue expression profiles, an ancestor profile was inferred by first assuming that instances of agreement (both 1s or both 0s) are unaltered in the ancestor. In cases of disagreement (1 and 0, or 0 and 1), maximum parsimony is applied, with a majority call of expression in the remaining tissues. To infer the profiles of all nodes in a dendrogram, including the deep internal nodes, we followed the linkages of the hierarchically clustered tree and inferred each node in turn.
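A minimal sketch of the ancestor inference is shown below. The text does not specify how "the remaining tissues" are chosen for a deep node or how ties are broken, so this sketch assumes the remaining tissues are the leaf tissues outside the merged clade and resolves ties (including at the root, where no tissues remain) to 0. The `novelties` helper counts de novo expression and de novo suppression of a child relative to its inferred ancestor, the quantities compared in Figure 7C. All names and the toy data are hypothetical.

```python
def infer_ancestor(child_a, child_b, others):
    """Infer an ancestral binary profile (one 0/1 value per gene) from two children.

    Agreements are copied; disagreements take the majority state of that gene in
    `others`. Ties, and an empty `others`, resolve to 0 (an assumption).
    """
    ancestor = []
    for g, (a, b) in enumerate(zip(child_a, child_b)):
        if a == b:
            ancestor.append(a)
        else:
            ones = sum(t[g] for t in others)
            ancestor.append(1 if 2 * ones > len(others) else 0)
    return ancestor


def reconstruct(leaves, merges):
    """leaves: dict tissue -> binary gene vector.
    merges: (child1, child2, new_node) tuples in linkage order."""
    nodes = dict(leaves)
    clade = {name: {name} for name in leaves}
    for c1, c2, new in merges:
        clade[new] = clade[c1] | clade[c2]
        # Assumed "remaining tissues": leaf tissues outside the new clade.
        others = [vec for name, vec in leaves.items() if name not in clade[new]]
        nodes[new] = infer_ancestor(nodes[c1], nodes[c2], others)
    return nodes


def novelties(child, ancestor):
    """(de novo expressions, de novo suppressions) of a child relative to its ancestor."""
    gained = sum(1 for c, a in zip(child, ancestor) if c == 1 and a == 0)
    lost = sum(1 for c, a in zip(child, ancestor) if c == 0 and a == 1)
    return gained, lost
```

Usage on toy data: with brain = [1, 0, 1, 0] and spinal cord = [0, 1, 1, 0], and gene 2 expressed in every remaining tissue, the inferred ancestor is [0, 1, 1, 0]; brain then shows one de novo expression and one de novo suppression, and spinal cord shows neither.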
Availability
All analyses were implemented in Matlab (www.mathworks.com). Scripts and intermediate data are all available upon request.
IMPLEMENTATION
Expression profile categorization
Expression profiles were generated for a set of 12 representative normal human tissues (Fig. 1). This was done with a total of 62 839 oligonucleotide probesets, of which nearly 75% corresponded to annotated human genes, encompassing 23 271 GeneCards entries (Safran et al., 2002), while the rest could not be associated with currently known gene-related sequences (Table 1). The 50 214 probesets on the four less commonly used arrays, U95B–E, provided novel expression information on 11 418 GeneCards genes. This genome-wide view of human tissue expression patterns is available in the GeneNote database (Shmueli et al., 2003, http://genecards.weizmann.ac.il/genenote/). The expression profiles were classified into four categories: differentially expressed, housekeeping, not expressed and uncategorized (Fig. 1 and Table 1). A majority (~90%) of the probesets in the first two categories are related to known genes, while most of the unannotated probesets fall into the last two categories, as expected.
Distribution of tissue specificities
To examine the complete diversity of expression patterns, we developed a tissue specificity index, τ, a quantitative, graded scalar measure of the specificity of an expression profile. τ values span the entire range between 0 for housekeeping genes and 1 for strictly one-tissue-specific genes. As seen in Figure 2A, τ values near 0 and 1 are more probable than intermediate values, generating a U-shaped distribution. However, as many as 57% of all profiles have intermediate specificities, 0.15 < τ < 0.85, constituting the largest group, larger than the housekeeping and one-tissue-specific sets combined.
To evaluate the robustness of the shape of the τ distribution to additional tissues and organisms, we calculated the same distributions for a previously published set (Su et al., 2002) in which 27 human and 45 mouse normal tissues with replicates were analyzed, covering one-fifth of the gene representations examined here. The shape of the τ distributions was largely similar in all three datasets. Indeed, nearly identical percentages of profiles with intermediate specificity (0.15 < τ < 0.85) were detected: 56% for mouse and 57% for human.
Do our tissue-specificity (τ) estimates from 12 tissues scale up when a more comprehensive set of tissues is examined? A recently published study (Su et al., 2004) provides human gene expression profiles across 74 non-cancerous human tissues. We found a high correlation (R = 0.85) between the τ indices of differentially expressed genes across the two datasets (Fig. 3). Two clusters of τ values differ markedly: genes with low τ in GeneNote and high τ in the new study, and vice versa. The former correspond to genes specific to tissues not present in GeneNote, and the latter to genes specific to spleen, which is not present in the more recent study. The congruence between tissue specificity values based upon 12 and 74 tissues supports the robustness of our newly defined tissue specificity index and suggests that our choice of tissues is fairly representative of the complete tissue-set transcriptome. The distribution of τ values for the new dataset (Fig. 2A, dotted line) shows a relatively high preponderance (>60%) of intermediate τ values, likely stemming from the use of subtissues such as different brain regions.
Binary expression patterns
The one-dimensional tissue specificity index is limited in its capacity to identify and categorize specific classes of expression patterns. To overcome this, we developed a procedure that converts an arbitrary expression profile into a binary pattern. The quantiled expression profiles are thereby mapped from a very large set of cardinality 11^12 (more than 3.1 × 10^12) to a reduced set of 2^12 = 4096 possible patterns. This analysis was initially performed on a subset of 8224 probeset expression profiles that fulfilled a specific intensity gap criterion (the mingap set; see the Algorithms section).
Of the 4094 possible binary patterns (excluding the all-0 and all-1 patterns), 859 were actually observed in this set. The probesets of the first microarray (U95A) detected only 498 of these patterns, while the remaining 42% of the patterns were found only on the four additional arrays (U95B–E). Further, the four additional arrays strengthened 127 patterns that were populated by only one profile in the first array. Subsequently, the differentially expressed profiles not included in the mingap set were binarized by matching each one to its closest binary counterpart.
The results of the binarization are shown in Figure 4. Panels 4.i (i = 1–12) show profiles (drawn from the 8224 gene representations of the differentially expressed genes and 4216 housekeeping genes) with high expression in i tissues and underexpression in the remaining 12 − i tissues. Panel 4.12 contains the strictly defined housekeeping genes. In panel 4.1 (single-tissue specificity), brain, bone marrow, pancreas, skeletal muscle and liver are highly represented, while spinal cord, kidney, heart and spleen have relatively few profiles. In panel 4.2, the prevalent two-tissue-specific patterns are brain and spinal cord, heart and skeletal muscle, bone marrow and thymus, and kidney and liver. Bone marrow, spleen and thymus define a major three-tissue pattern in panel 4.3. Panels 4.9–4.11 depict profiles with expression in all but 3, 2 or 1 tissue(s), respectively. Notably, the same five tissues with the most single-tissue-specific profiles (brain, bone marrow, pancreas, muscle and liver) also have the greatest number of single-tissue-suppressed profiles.
Of the individual profiles appearing in Figure 4, 5220 are not well annotated to any known gene and therefore merit particular attention. For these implied novel genes, function can be tentatively inferred from their expression profiles. Table S3 (Supplementary Material) lists each expression profile along with the identifier of the sequence from which the probeset was derived.
We subsequently defined the 99 most populated binary patterns (>25 probeset profiles each) among the 22 936 differentially expressed profiles, including the housekeeping (all 1s) and null (all 0s) patterns (Fig. 2B). The number of populated binary patterns at each binary index, I_B, shows a clear bimodal distribution (Fig. 2C), with peaks at binary index values of 2 and 10. This behavior reflects the same bimodal trend seen for τ values in Figure 2A. Whereas all 12 one-tissue-specific patterns are included, only about one-third of the two-tissue expressed patterns (I_B = 2) and about one-quarter of the two-tissue repressed patterns (I_B = 10) are included in this set, suggesting biases toward specific oligo-tissue combinations. The peak at high I_B values in Figure 2C corresponds to profiles with low expression in 1–3 tissues and high expression in the others. We use the term suppression to describe instances where genes are expressed at lower levels in a few tissues. This does not necessarily imply an active process in which the expression of a gene is specifically turned off. It could equally be due to a loss of activation or to a dilution of mRNA levels in one tissue relative to another owing to a different cellular composition.
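Counting populated patterns per binary index, as in Figure 2C, amounts to a frequency count. The sketch below assumes that each differentially expressed profile has already been assigned a binary pattern (a tuple of 0/1 per tissue); `populated_by_binary_index` keeps patterns with more than 25 profiles, and `coverage` relates each count to the C(12, i) possible patterns with I_B = i (for example, C(12, 2) = 66 two-tissue patterns). Both names are hypothetical.

```python
import math
from collections import Counter


def populated_by_binary_index(assignments, min_members=26):
    """Histogram over I_B of patterns holding at least `min_members` profiles
    (26 encodes the '>25 probeset profiles' threshold in the text)."""
    sizes = Counter(assignments)
    return Counter(sum(pattern) for pattern, n in sizes.items() if n >= min_members)


def coverage(hist, n_tissues=12):
    """Fraction of the comb(n_tissues, i) possible patterns populated, per I_B = i."""
    return {i: hist[i] / math.comb(n_tissues, i) for i in sorted(hist)}
```

As a worked check on the proportions quoted above, about 22 populated two-tissue patterns would correspond to one-third of the 66 possible ones.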
To test the validity of the supervised binary clustering, we also applied unsupervised SPC (Blatt et al., 1996; Getz et al., 2000) to our data (Fig. 5A). SPC is suitable for the clustering of gene expression profiles due to its stability against noise and the inherent measure of cluster stability (Getz et al., 2000). The identified 70 SPC clusters showed a strong correlation with the 97 binary clusters (Fig. 5B). Some binary patterns are represented by multiple SPC clusters, thus serving to further refine the relevant binary patterns (Fig. 5C). The high level of overall agreement between the two clustering methods lends additional credence to the binary classification proposed here.
Tissue relationships based upon the expression repertoire
Inter-tissue distances were calculated between pairs of tissue vectors, each containing the 22 936 expression values of the set of differentially expressed profiles. The resulting tissue dendrogram (Fig. 6A) shows a specific set of groupings relating to different degrees of inter-tissue similarities. The dendrogram reveals a set of tissue relationships that is consistent with previous knowledge (Hsiao et al., 2001). The immunological tissues, bone marrow, thymus and spleen, along with the lung, cluster together. Pairs of related tissues coupled in the dendrogram are: heart and skeletal muscle, kidney and liver, brain and spinal cord, and prostate and pancreas.
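The text does not state the distance measure or linkage rule used for the tissue dendrogram. As an illustrative assumption only, the sketch below builds a tree by average linkage (UPGMA) on Euclidean distances between tissue vectors; with 12 tissues the naive search over all cluster pairs at each merge is inexpensive. A library routine such as `scipy.cluster.hierarchy.linkage(..., method='average')` would normally be used instead. The function name is hypothetical.

```python
import math


def average_linkage(vectors):
    """UPGMA clustering of tissue vectors (dict tissue -> values in a shared gene order).

    Returns merges as (cluster_a, cluster_b, distance), clusters being frozensets.
    """
    clusters = [frozenset([t]) for t in vectors]

    def distance(c1, c2):
        # Average of all pairwise Euclidean distances between member tissues.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        d, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] | clusters[j]
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Each tissue vector here would hold the 22 936 quantiled values of the differentially expressed profiles; restricting the rows to profiles with a given range of I_B yields the alternative trees discussed next.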
To isolate the profiles that specify the underlying relationships among tissues, we split the differentially expressed profiles into two groups: those with I_B = 1 and those with I_B = 2–11. The tree based on the second group, with midrange profiles (Fig. 6B), recovers the most important features of the dendrogram based on the entire set: a united nervous system, juxtaposed muscle tissues and a mutually coherent immune system. In contrast, the dendrogram based on the I_B = 1 group (Fig. 6C) is very different and appears much further removed from known tissue relationships; for example, the spinal cord is closest to heart and very distant from brain. One could argue that the visible non-zero off-diagonal values in the I_B = 1 patterns (panel 4.1 of Fig. 4) would contribute sufficient information to generate a more biologically realistic tissue dendrogram, but this is not the case.
Inferring ancestral tissue profiles
The availability of genome-wide expression profiles for each of the tissues provides a unique opportunity to obtain additional information regarding mutual relationships among different tissues. Specifically, it is possible to derive from the tissue dendrogram an inferred gene expression profile for each of the ancestral tissues represented by the internal nodes (Fig. 7A). As an example, Figure 7B shows that in most cases of expression in brain but not in spinal cord, underexpression is inferred for the ancestral tissue, reflecting de novo specificities for brain. On the other hand, most of the high expressions found in spinal cord but not in brain are also positive in the inferred ancestral tissue, suggesting that the difference corresponds to brain-specific suppressions. More generally, we found that for most ancestral tissues, including all but one of the most closely related doublets, the tissue with more genes showing novel expression also exhibits more genes with novel suppression (Fig. 7C). This phenomenon can also be seen by visual inspection of panels 1 and 11 of Figure 4, as described above.
DISCUSSION
This paper proposes a set of novel genome-wide annotation tools. First, each of the 23 271 genes targeted by 46 185 probesets (Table 1) has one or more tissue expression profiles, documented in GeneNote. A set of tools has been developed to generate a consensus expression pattern for each of these genes, excluding outliers. Second, every gene is assigned a specific value of τ, which places it on a graded tissue specificity scale between extreme tissue specificity and a complete absence of such specificity. Third, each gene with a differentially expressed profile is related to a binary pattern indicating the combination of tissues in which it is more highly expressed and those in which it is suppressed. We believe that these binary patterns are more amenable to intuitive scientific interpretation than classifications based on standard clustering algorithms. It is reassuring, though, that the two systems are highly correlated. All the above information provides tools for assigning potential function to novel and hitherto unannotated genes. As the annotation tools presented here are easily generalized, we believe they can be fruitfully applied to a wide spectrum of datasets, for example, to sets with tumor and non-tumor samples. The binary pattern analysis is particularly useful in revealing expression profiles that constitute unusual tissue combinations. For example, pattern number 36 in Figure 2B denotes high expression in bone marrow, pancreas and liver, and pattern number 47 denotes high expression in heart, prostate and spinal cord. In general, among the 98 binary patterns of Figure 2B that show expression in at least one tissue, 1 pattern corresponds to the housekeeping expression profile and another 12 denote single-tissue-specific profiles. The remaining 85 patterns are defined here as denoting midrange profiles of expression. Of these, at most 33 patterns may be considered consistent with tissue clustering as defined by the dendrogram of Figure 6A, as they correspond to the groups of tissues defined by the terminal and internal nodes of the dendrogram. Thus, a majority of the dominant binary patterns corresponding to midrange profiles may be viewed as unexpected.
Such patterns are difficult to explain in terms of tissue similarities, including the sharing of common cell types among disparate tissues. Alternatively, there may be as yet undiscovered transcription control mechanisms that future research could discern. Some of these unexpected expression patterns may reflect a neutral mode of expression (Khaitovich et al., 2004; Yanai et al., 2004).
The approach explored here focuses on midrange profiles of transcription, with elevated expression or suppression in specific tissue combinations and intermediate values of the tissue specificity index τ. Our analysis has revealed that midrange profiles constitute a majority of the tissue specificity expression patterns. Despite its ubiquity, this category has received remarkably little attention relative to its housekeeping and tissue-specific counterparts. Of the nearly 100 most populated binary patterns, more than 80% are midrange patterns. A recent expression study in maize has also shown that a relatively small portion of genes tend to be organ specific, while the remainder show diverse expression (Cho et al., 2002).
Most focused arrays used by various authors, carrying specific subsets of genes, contain mostly tissue-specific genes whose expression is elevated in a single tissue. Such arrays may be too narrowly focused. Our results, which point to the importance of genes with midrange expression profiles, could have important implications for array design and experimentation.
A dominant property of midrange profiles is the surprising preponderance of patterns with tissue-specific gene suppression (I_B = 9–11), which are almost as populated as oligo-expression patterns (I_B = 2–4). The most underrepresented midrange profiles are those with I_B = 5–7. Our results also indicate that in the evolution of a tissue, de novo expression and de novo suppression go hand in hand.
It thus appears that gene suppression plays a major role in tissue evolution and is tightly coupled with novel expression in the origin of distinct tissues. Such tissue-specific gene suppression may be mediated by specific pathways of transcription control (Hsia and McGinnis, 2003), as well as by other cellular mechanisms, including those mediated by RNA interference (Cerutti, 2003). One practical conclusion related to tissue-specific arrays is that these should preferably contain, in addition to single-tissue-specific genes, genes that manifest more complex patterns of expression–suppression.
CONCLUSION
Understanding the signaling and control pathways that govern organ development during ontogeny constitutes a fundamental problem of developmental biology (Burgess et al., 2002). Studies in model organisms such as Drosophila have demonstrated the complex interplay of signaling molecules that underlie developmental events associated with the embryonic maturation of tissues and cell types (St Johnston, 2002). The exact spatial and temporal expression of genes, and the interaction of their protein products elicit a developmental code of organ commitment and early patterning. This code likely manifests itself in the pattern of gene expression in each of the tissues. Furthermore, when new tissues are formed in ontogeny or phylogeny, their ancestral precursors should have their own expression patterns, complexly related to those of the more highly differentiated derived tissues. To validate this concept in the future, direct experimental testing of expression patterns at early stages in embryogenesis will be required. Our analysis of ancestral tissue expression, which points to a correlation between novel tissue expression and suppression, and the availability of a tissue dendrogram relating to the full gamut of genes of the human genome, can serve as a valuable tool for such studies.
Acknowledgments
I.Y. is a Koshland Scholar, D.L. is the incumbent of the Ralph and Lois Silver Chair in Human Genomics, and E.D. is the incumbent of the Henry J. Leir Professorial Chair. This work was made possible by the generosity of the Abraham and Judith Goldwasser Foundation. It was further supported by the Crown Human Genome Center and by the Koshland Center for Basic Research.
Received on April 14, 2004; revised on June 14, 2004; accepted on September 19, 2004
REFERENCES
Affymetrix. (2001) Microarray Suite User Guide, Version 5.
Bakay, M., Zhao, P., Chen, J., Hoffman, E.P. (2002) A web-accessible complete transcriptome of normal human and DMD muscle. Neuromuscul. Disord., 12, (Suppl. 1), S125S141.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B, 57, 289300.
Blatt, M., Wiseman, S., Domany, E. (1996) Superparamagnetic clustering of data. Phys. Rev. Lett., 76, 32513254[CrossRef][ISI][Medline].
Burgess, R., Lunyak, V., Rosenfeld, M. (2002) Signaling and transcriptional control of pituitary development. Curr. Opin. Genet. Dev., 12, 534539[CrossRef][ISI][Medline].
Cerutti, H. (2003) RNA interference: traveling in the cell and gaining functions?. Trends Genet., 19, 3946[CrossRef][ISI][Medline].
Chalifa-Caspi, V., Shmueli, O., Benjamin-Rodrig, H., Rosen, N., Shmoish, M., Yanai, I., Ophir, R., Kats, P., Safran, M., Lancet, D. (2003) GeneAnnot: interfacing GeneCards with high throughput gene expression compendia. Brief. Bioinformatics, 4, 349360
Chalifa-Caspi, V., Yanai, I., Ophir, R., Rosen, N., Shmoish, M., Benjamin-Rodrig, H., Shklar, M., Stein, T.I., Shmueli, O., Safran, M., et al. (2004) GeneAnnot: comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes. Bioinformatics, 20, 14571458
Cho, Y., Fernandes, J., Kim, S.H., Walbot, V. (2002) Gene-expression profile comparisons distinguish seven organs of maize. Genome Biol., 3, research0045[Medline].
Eisenberg, E. and Levanon, E.Y. (2003) Human housekeeping genes are compact. Trends Genet., 19, 362365[CrossRef][ISI][Medline].
Getz, G., Levine, E., Domany, E. (2000) Coupled two-way clustering analysis of gene microarray data. Proc. Natl Acad. Sci. USA, 97, 12079–12084.
Halfon, M.S. and Michelson, A.M. (2002) Exploring genetic regulatory networks in metazoan development: methods and models. Physiol. Genomics, 10, 131–143.
Haverty, P.M., Weng, Z., Best, N.L., Auerbach, K.R., Hsiao, L.L., Jensen, R.V., Gullans, S.R. (2002) HugeIndex: a database with visualization tools for high-density oligonucleotide array data from normal human tissues. Nucleic Acids Res., 30, 214–217.
Hsia, C.C. and McGinnis, W. (2003) Evolution of transcription factor function. Curr. Opin. Genet. Dev., 13, 199–206.
Hsiao, L.L., Dangond, F., Yoshida, T., Hong, R., Jensen, R.V., Misra, J., Dillon, W., Lee, K.F., Clark, K.E., Haverty, P., et al. (2001) A compendium of gene expression in normal human tissues. Physiol. Genomics, 7, 97–104.
Hubbell, E., Liu, W.M., Mei, R. (2002) Robust estimators for expression analysis. Bioinformatics, 18, 1585–1592.
Iacobuzio-Donahue, C.A., Maitra, A., Shen-Ong, G.L., van Heek, T., Ashfaq, R., Meyer, R., Walter, K., Berg, K., Hollingsworth, M.A., Cameron, J.L., et al. (2002) Discovery of novel tumor markers of pancreatic cancer using global gene expression technology. Am. J. Pathol., 160, 1239–1249.
Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4, 249–264.
Kannan, K., Kaminski, N., Rechavi, G., Jakob-Hirsch, J., Amariglio, N., Givol, D. (2001) DNA microarray analysis of genes involved in p53 mediated apoptosis: activation of Apaf-1. Oncogene, 20, 3449–3455.
Kent, W.J. (2002) BLAT – the BLAST-like alignment tool. Genome Res., 12, 656–664.
Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W., Pääbo, S. (2004) A neutral model of transcriptome evolution. PLoS Biol., 2, E132.
Lercher, M.J., Urrutia, A.O., Hurst, L.D. (2002) Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat. Genet., 31, 180–183.
Liu, G., Loraine, A.E., Shigeta, R., Cline, M., Cheng, J., Valmeekam, V., Sun, S., Kulp, D., Siani-Rose, M.A. (2003) NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res., 31, 82–86.
Mariani, T.J., Budhraja, V., Mecham, B.H., Gu, C.C., Watson, M.A., Sadovsky, Y. (2002) A variable fold-change threshold determines significance for expression microarrays. FASEB J., 17, 321–323.
Rosen, N., Chalifa-Caspi, V., Shmueli, O., Adato, A., Lapidot, M., Stampnitzky, J., Safran, M., Lancet, D. (2003) GeneLoc: exon-based integration of human genome maps. Bioinformatics, 19 (Suppl. 1), I222–I224.
Safran, M., Solomon, I., Shmueli, O., Lapidot, M., Shen-Orr, S., Adato, A., Ben-Dor, U., Esterman, N., Rosen, N., Peter, I., et al. (2002) GeneCards (2002): towards a complete, object-oriented, human gene compendium. Bioinformatics, 18, 1542–1543.
Saito-Hisaminato, A., Katagiri, T., Kakiuchi, S., Nakamura, T., Tsunoda, T., Nakamura, Y. (2002) Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray. DNA Res., 9, 35–45.
Shklar, M., et al. (2004) GeneTide: Terra Incognita Discovery Endeavor Mining ESTs and Expression Data to Elucidate Known and De-Novo GeneCards Genes. In Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB2004), pp. 478–479.
Shmueli, O., Horn-Saban, S., Chalifa-Caspi, V., Shmoish, M., Ophir, R., Benjamin-Rodrig, R., Safran, M., Domany, E., Lancet, D. (2003) GeneNote: whole genome expression profiles in normal human tissues. C. R. Biologies, 326, 1067–1072.
Slonim, D.K. (2002) From patterns to pathways: gene expression data analysis comes of age. Nat. Genet., 32 (Suppl.), 502–508.
St Johnston, D. (2002) The art and design of genetic screens: Drosophila melanogaster. Nat. Rev. Genet., 3, 176–188.
Su, A.I., Cooke, M.P., Ching, K.A., Hakak, Y., Walker, J.R., Wiltshire, T., Orth, A.P., Vega, R.G., Sapinoso, L.M., Moqrich, A., et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.
Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
Warrington, J.A., Nair, A., Mahadevappa, M., Tsyganskaya, M. (2000) Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol. Genomics, 2, 143–147.
Wheeler, D.L., Church, D.M., Federhen, S., Lash, A.E., Madden, T.L., Pontius, J.U., Schuler, G.D., Schriml, L.M., Sequeira, E., Tatusova, T.A., et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.
Yanai, I., Graur, D., Ophir, R. (2004) Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. OMICS, 8, 15–24.
This article has been cited by other articles:
N. D. Singh, A. M. Larracuente, and A. G. Clark. Contrasting the Efficacy of Selection on the X and Autosomes in Drosophila. Mol. Biol. Evol., February 1, 2008; 25(2): 454–467.
Z. Bar-Joseph, Z. Siegfried, M. Brandeis, B. Brors, Y. Lu, R. Eils, B. D. Dynlacht, and I. Simon. Genome-wide transcriptional analysis of the human cell cycle identifies genes differentially regulated in normal and cancer cells. PNAS, January 22, 2008; 105(3): 955–960.
A. E. Vinogradov and O. V. Anatskaya. Organismal complexity, cell differentiation and gene expression: human over mouse. Nucleic Acids Res., October 8, 2007; 35(19): 6350–6356.
J. B. Axelsen, J. Lotem, L. Sachs, and E. Domany. Genes overexpressed in different human solid cancers exhibit different tissue-specific expression profiles. PNAS, August 7, 2007; 104(32): 13122–13127.
J. Jarry, M. F. Rioux, V. Bolduc, Y. Robitaille, V. Khoury, I. Thiffault, M. Tetreault, L. Loisel, J. P. Bouchard, and B. Brais. A novel autosomal recessive limb-girdle muscular dystrophy with quadriceps atrophy maps to 11p13-p12. Brain, February 1, 2007; 130(2): 368–380.
B.-Y. Liao, N. M. Scott, and J. Zhang. Impacts of Gene Essentiality, Expression Pattern, and Gene Compactness on the Evolutionary Rate of Mammalian Proteins. Mol. Biol. Evol., November 1, 2006; 23(11): 2072–2080.
B.-Y. Liao and J. Zhang. Low Rates of Expression Profile Divergence in Highly Expressed Genes and Tissue-Specific Genes During Mammalian Evolution. Mol. Biol. Evol., June 1, 2006; 23(6): 1119–1128.
S. J. Kim, D. J. Dix, K. E. Thompson, R. N. Murrell, J. E. Schmid, J. E. Gallagher, and J. C. Rockett. Gene expression in head hair follicles plucked from men and women. Ann. Clin. Lab. Sci., March 1, 2006; 36(2): 115–126.