Bioinformatics Advance Access originally published online on June 28, 2007
Bioinformatics 2007 23(17):2273-2280; doi:10.1093/bioinformatics/btm340
Identification of aberrant chromosomal regions from gene expression microarray studies applied to human breast cancer
1German Cancer Research Center (DKFZ), Department of Molecular Genome Analysis, 69120 Heidelberg, 2Institute of Medical Informatics, Biometrics, and Epidemiology (IBE), LMU, 81377 Munich and 3Institute for Medical Biometry Epidemiology and Informatics (IMBEI), 55131 Mainz, Germany
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: In cancer, chromosomal imbalances like amplifications and deletions, or changes in epigenetic mechanisms like DNA methylation influence the transcriptional activity. These alterations are often not limited to a single gene but affect several genes of the genomic region and may be relevant for the disease status. For example, the ERBB2 amplicon (17q21) in breast cancer is associated with poor patient prognosis. We present a general, unsupervised method for genome-wide gene expression data to systematically detect tumor patients with chromosomal regions of distinct transcriptional activity. The method aims to find expression patterns of adjacent genes with a consistently decreased or increased level of gene expression in tumor samples. Such patterns have been found to be associated with chromosomal aberrations and clinical parameters like tumor grading and thus can be useful for risk stratification or therapy.
Results: Our approach was applied to 12 independent human breast cancer microarray studies comprising 1422 tumor samples. We prioritized chromosomal regions and genes predominantly found across all studies. The result highlighted not only regions which are well known to be amplified like 17q21 and 11q13, but also others like 8q24 (distal to MYC) and 17q24-q25 which may harbor novel putative oncogenes. Since our approach can be applied to any microarray study it may become a valuable tool for the exploration of transcriptional changes in diverse disease types.
Availability: The R source code implementing the method, together with an example analysis, is available at http://www.dkfz.de/mga2/people/buness/CTP/.
Contact: a.buness{at}gmx.de
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
In breast and pancreatic cancer (Heidenblad et al., 2005; Pollack et al., 2002), a strong influence of DNA copy number on gene expression is well known. The expression level of approximately 60% of the genes within the highly amplified region is found to be at least moderately elevated. For breast cancer cell lines an impact on the expression level for 44% of the genes in highly amplified regions was reported (Hyman et al., 2002). In the light of these observations, it is likely that expression patterns of genes in distinct chromosomal regions originate from amplifications or deletions. Comparative genome hybridization (CGH) has been widely used to detect genomic imbalances. The Human Cancer Genome Project, which aims to catalog genetic changes associated with cancer, underlines the recent interest in genomic alterations (Check, 2005). However, from genomic array data alone, it is not possible to identify genes in the aberrant regions whose expression is associated with the disease.
The increasing amount of microarray gene expression data represents an important resource for biomedical research. Many efforts have been made to make these data publicly accessible via centralized and standardized databases. Nevertheless, this resource has not yet been fully exploited. Attempts like the Oncomine project (Rhodes et al., 2005) highlight only certain aspects of the data, mainly differential gene expression between phenotypically well-known subgroups. Exploratory approaches like two-way- or bi-clustering of gene expression patterns (Prelic et al., 2006; Tanay et al., 2002) are important alternatives to identify patient subgroups with unknown or only presumed common characteristics. Additionally, chromosomal localization has been integrated in several methods which aim to detect differential gene expression of adjacent genes. One method predicts regions of DNA amplification for each individual tumor sample based on its standardized expression profile, whereas other methods compare defined specimen collectives, like healthy versus cancerous tissues, to identify regional gene expression alterations (Callegaro et al., 2006; Caron et al., 2001; Crawley and Furge, 2002; Levin et al., 2005; Myers et al., 2004; Staub et al., 2006; Toedling et al., 2005; Zhou et al., 2003). If exceptionally high or low expression is present in only a subset of one group, e.g. the group of tumor specimens, tailored statistical methods have to be applied, such as those proposed very recently (Tibshirani and Hastie, 2007).
We present an unsupervised approach using genome-wide microarray gene expression data to systematically detect patients showing distinct levels of either consistently higher or lower gene expression in a specific region of the genome. To this end, we tested whether any subset of patients shows a significantly altered expression level in a sliding window spanning a fixed number of genes across each chromosome. Adjacent genes differ in absolute signal intensity and also differ in change if they are collectively deregulated in a tumor specimen subgroup since the length and base composition of the represented cDNA or oligo sequences on the microarray, their hybridization efficacy and the incorporation rates of modified nucleotides vary. To account for these limitations, the scores of the windows used to systematically test the separability of subsets of samples are based on the ranks of the samples. The sample ranks are derived separately for each gene with respect to the gene expression values. Our approach provides a tool for the large scale exploration of genome-wide expression studies of various disease types. While identifying and characterizing groups of patients with distinct regional gene expression patterns, the ultimate goal is to find associations with their clinical parameters. In human breast cancer, several clinical parameters like histological grading, lymph node status, recurrence, survival and therapy response were already reported to be associated with chromosomal imbalances (Callagy et al., 2005; Jain et al., 2001; Nessling et al., 2005; Rennstam et al., 2003). We were able to systematically unveil chromosomal aberrations. This may lead to improved patient stratification and point to genes beneficial for diagnosis and therapy. We found an association between tumor grade and the identified chromosomal regions. Additionally, the results obtained by supervised approaches confirmed our approach.
| 2 METHOD, META-ANALYSIS AND DATA SETS |
|---|
2.1 Method
We present an intuitive method to score chromosomal regions on the basis of microarray gene expression data. Let $G$ be the set of genes under consideration, and $M$ the set of samples. The expression value of gene $g \in G$ in sample $m \in M$ is denoted by $x_{mg}$. Thus the complete data set is an $M \times G$ matrix $(x_{mg})_{m \in M,\, g \in G}$. As a notational convention, for $G' \subseteq G$ a subset of genes and $M' \subseteq M$ a subset of samples, denote by $x_{M'G'}$ the $M' \times G'$ submatrix $(x_{m'g'})_{m' \in M',\, g' \in G'}$. In particular, for each gene $g \in G$ the vector $x_{M\{g\}}$ consists of the expression values of gene $g$ for all samples $m \in M$. From the expression value matrix we derive the rank matrix $R = (r_{mg})_{m \in M,\, g \in G}$, in which $r_{mg}$ is the rank of $x_{mg}$ in the descendingly sorted vector $x_{M\{g\}}$. Thus for a given gene $g \in G$ and a sample $m \in M$, the value $r_{mg}$ is the rank of the expression of gene $g$ in sample $m$ among the expression values of gene $g$ across all samples. For any vector or matrix $x$, let $|x|$ be the sum of its entries.
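The rank transformation can be illustrated with a minimal Python sketch. It is not the published R implementation; the function name and the nested-list layout (rows are samples, columns are genes) are assumptions made for this example, and ties are broken by sample order because the text does not specify tie handling.

```python
def descending_rank_matrix(x):
    """Return r with r[m][g] = 1-based rank of x[m][g] among all samples
    for gene g, where rank 1 is the highest expression value.
    Ties are broken by sample order (an assumption of this sketch)."""
    n_samples, n_genes = len(x), len(x[0])
    r = [[0] * n_genes for _ in range(n_samples)]
    for g in range(n_genes):
        # sample indices ordered from highest to lowest expression of gene g
        order = sorted(range(n_samples), key=lambda m: -x[m][g])
        for rank, m in enumerate(order, start=1):
            r[m][g] = rank
    return r


# three samples (rows), two genes (columns); values are made up
x = [
    [5.0, 1.0],
    [7.0, 3.0],
    [6.0, 2.0],
]
ranks = descending_rank_matrix(x)
print(ranks)  # [[3, 3], [1, 1], [2, 2]]
```

Because only the within-gene ordering of samples enters the rank matrix, genes with very different absolute signal intensities contribute on the same scale.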
We aim to check whether a sample subset $M'$ of size $k$ shows significantly higher expression levels in a set of $l$ consecutive genes $G'$ ($k$ and $l$ fixed). If this is the case, the ranks of the $k \times l$ matrix $r_{M'G'}$ should be significantly smaller than in an arbitrary submatrix of $R$ of the same size. Hence, $|r_{M'G'}|$ is a fairly natural measure for increased gene expression in $x_{M'G'}$. Since we are primarily interested in finding gene loci of increased expression rather than sample subsets, the statistic we use is the extreme value statistic

$$\min_{M' \subseteq M,\ \#M' = k} |r_{M'G'}|,$$

which can be calculated by summing up the smallest $k$ values of the row sums $|r_{\{m\}G'}|$, $m \in M$. To minimize the impact of any correlation structure at the lower end of the expression levels on the statistic while aiming to identify up-regulated subsets, we did not use the rank matrix $R$ directly in our analysis. For example, consistently decreased expression levels in some samples cause such a correlation structure and affect the distribution of the (remaining) ranks among all other samples. Specifically, this affects the distribution of the ranks of the subset of $k$ samples on which our statistic is based. Hence, we focused on the ranks of the $k$ highest expression levels instead and modified the rank matrix by replacing all higher ranks by the constant value $k$. The method is illustrated in Figure 1.
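The window statistic with truncated ranks can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the function name, the argument layout and the toy rank matrix are hypothetical, and ranks are assumed to be 1-based with rank 1 denoting the highest expression.

```python
def window_statistic(r, window, k):
    """Sum of the k smallest row sums of the truncated rank submatrix.

    r      : rank matrix, r[m][g], rank 1 = highest expression
    window : column indices of the l consecutive genes
    k      : size of the sample subset
    Returns (statistic, sample indices of the k rows that attain it).
    """
    row_sums = []
    for m, row in enumerate(r):
        # every rank above k is replaced by the constant k
        row_sums.append((sum(min(row[g], k) for g in window), m))
    row_sums.sort()
    best = row_sums[:k]
    return sum(s for s, _ in best), [m for _, m in best]


# four samples, three adjacent genes; samples 0 and 1 are consistently on top
r = [
    [1, 2, 1],
    [2, 1, 2],
    [3, 4, 4],
    [4, 3, 3],
]
stat, subset = window_statistic(r, window=[0, 1, 2], k=2)
print(stat, subset)  # 9 [0, 1]
```

In this toy case the statistic reaches its smallest possible value, l * k * (k + 1) / 2 = 9, because the same two samples hold ranks 1 and 2 for every gene in the window.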
A bootstrap approach is applied to obtain the null distribution of these statistics. For $t = 1, \ldots, T = 10^6$, draw a bootstrap sample
[equations not preserved in this copy]
This defines another statistic, which we use to score the significance of each window $G'$. The windows $G'$ can then be ranked according to their score $s_{G'}$. This identifies the regions which are most likely to display differential up-regulation in some sample subgroup. The check for differential down-regulation is performed in an analogous manner, with the only change that $-x_{mg}$ is used instead of $x_{mg}$. The minimum of the $s_{G'}$ for down- and up-regulation not only defines an order of relevance for the windows $G'$, it also provides the size of the most significant subset and its sample composition, as well as the direction of the expression change, i.e. up or down.
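The exact resampling scheme is not recoverable from the text above, so the following Python sketch substitutes a simple permutation-style null: the ranks of each gene in the window are shuffled independently across samples, which removes any coordination between adjacent genes. The shuffling scheme, the function names, the small number of draws and the add-one p-value estimate are all assumptions of this sketch and should not be read as the published procedure.

```python
import random


def window_stat(r, window, k):
    # sum of the k smallest row sums, with ranks above k replaced by k
    sums = sorted(sum(min(row[g], k) for g in window) for row in r)
    return sum(sums[:k])


def null_p_value(r, window, k, n_draws=200, seed=0):
    """Estimate how often a shuffled window scores at least as low as the
    observed one (assumed null: genes are independent across samples)."""
    rng = random.Random(seed)
    observed = window_stat(r, window, k)
    n_samples = len(r)
    hits = 0
    for _ in range(n_draws):
        cols = []
        for g in window:
            col = [r[m][g] for m in range(n_samples)]
            rng.shuffle(col)  # shuffle this gene's ranks across samples
            cols.append(col)
        shuffled = [[col[m] for col in cols] for m in range(n_samples)]
        if window_stat(shuffled, range(len(window)), k) <= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


# ten samples, six genes; sample m has rank m + 1 for every gene (toy data)
r = [[m + 1] * 6 for m in range(10)]
p = null_p_value(r, window=list(range(6)), k=2)
print(p)
```

Under this assumed scheme, down-regulation could be screened by building the rank matrix from negated expression values, mirroring the description in the text.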
2.2 Meta-analysis
We systematically screened 12 independent breast cancer studies comprising 1422 patients (Table 1) for gene expression patterns in distinct chromosomal regions in parallel. The terms chromosomal region and genomic region are defined here as a number of consecutive genes on the chromosome. Since the distances of genes along chromosomes are not even, this does not correspond to a physical region of fixed length on the chromosome. Moreover, the microarray experiments on various platforms cover different sets of genes, which implies that a window of length l starting with the same gene may contain different subsets of genes. Restriction to the set of genes common to all studies was not possible, because it was too small to conduct a meaningful analysis, therefore leaving us with scores which were not directly comparable across the studies. The choice of the appropriate window length l depends on various factors, like the total number of localized genes in the data set as well as the extension of the region one aims to detect. We screened windows of length l = 10 and l = 20 for distinct expression patterns and ranked all windows G' for each l on the basis of their score, i.e. the minimum of sG' for up- and down-regulated subsets. Then all highest ranking windows were selected which formed 10 non-overlapping chromosomal regions. To assess the influence of the window length on the results, we compared the resulting most significant regions found for l = 10 and l = 20. The overlap of regions was on average 65 % (Supplementary Table 1), indicating a good overall agreement of the results.
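One possible reading of the selection step is a greedy procedure: windows are visited from the best score downwards, overlapping windows on the same chromosome are merged, and the procedure stops before an additional region would exceed the target count. The Python sketch below follows that reading; the stopping rule, the tuple layout and the toy scores are assumptions, and ties in the score are not treated specially here.

```python
def merge_intervals(intervals):
    """Merge intervals of gene indices that share at least one gene."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def top_regions(scored_windows, n_regions):
    """scored_windows: (score, chromosome, first_gene_index, last_gene_index),
    with smaller scores being more significant."""
    chosen, regions = {}, []
    for score, chrom, start, end in sorted(scored_windows):
        trial = {c: list(v) for c, v in chosen.items()}
        trial.setdefault(chrom, []).append((start, end))
        merged = [(c, s, e) for c in sorted(trial)
                  for s, e in merge_intervals(trial[c])]
        if len(merged) > n_regions:
            break  # the next window would open one region too many
        chosen, regions = trial, merged
    return regions


windows = [
    (0.001, "17", 10, 19),
    (0.001, "17", 11, 20),
    (0.002, "8", 5, 14),
    (0.010, "1", 0, 9),
]
regions = top_regions(windows, n_regions=2)
print(regions)
```

Gene indices are used instead of base-pair positions because the windows are defined by a number of consecutive genes rather than by a fixed physical length.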
For each breast cancer study, we determined the 10 most significant chromosomal regions. In several studies (1, 5, 9, 10 and 12), we found more than 10 regions supported by windows of the same minimal possible significance score (Supplementary Table 1). The subsequent step highlighted regions which were identified in a significant fraction of all studies. To this end, we merged all regions found in the individual studies such that a list of non-overlapping continuous chromosomal regions was obtained and sorted by relevance. The relevance of such a meta-region was defined by the number of independent studies having at least one region covered by the meta-region. Only those meta-regions were listed which were found in at least three independent breast cancer studies. The distribution of all meta-regions with respect to the number of contributing studies is shown in Supplementary Figure 1. All selected meta-regions were of the type up meaning that the gene expression of a subset of tumor samples was consistently up-regulated compared to all others in this particular chromosomal region. Only a small fraction of regions (6%) in the individual studies contributing to meta-regions belonged to the type down (data not shown).
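The merging of per-study regions into meta-regions, ranked by the number of contributing studies, can be sketched in Python as below. The data layout, the function name and the coordinates in the example are hypothetical; only the rule of keeping meta-regions supported by at least three studies is taken from the text.

```python
def meta_regions(study_regions, min_studies=3):
    """study_regions: dict study_id -> list of (chromosome, start_bp, end_bp).
    Returns (chromosome, start, end, n_studies) for merged regions supported
    by at least min_studies studies, most widely supported first."""
    tagged = sorted((chrom, start, end, study)
                    for study, regions in study_regions.items()
                    for chrom, start, end in regions)
    merged = []  # entries: [chromosome, start, end, set of study ids]
    for chrom, start, end, study in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current meta-region
            last[3].add(study)
        else:
            merged.append([chrom, start, end, {study}])
    kept = [(c, s, e, len(studies)) for c, s, e, studies in merged
            if len(studies) >= min_studies]
    return sorted(kept, key=lambda region: -region[3])


# made-up coordinates for three studies
study_regions = {
    1: [("17", 35_000_000, 36_000_000), ("8", 140_000_000, 146_000_000)],
    2: [("17", 35_500_000, 37_000_000)],
    3: [("17", 34_800_000, 35_200_000), ("8", 144_000_000, 146_000_000)],
}
result = meta_regions(study_regions)
print(result)
```

In the toy input the chromosome 8 region is found in only two studies and is therefore dropped, while the chromosome 17 regions merge into one meta-region supported by all three.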
A chromosomal region was characterized by its approximate start and extension on the chromosome (Location, Length) and the frequency of tumor samples showing consistently increased expression (Frequency). The median values across all contributing studies are shown in Table 2. Additionally, 20 representative genes were listed for each region in Supplementary Table 2. We selected those genes which were most frequently found in the contributing regions. This was done irrespective of their effective frequency in the data sets. In case of equally frequent genes, only the most centrally located genes were listed. The heterogeneity of the data sets did not allow us to identify a small, common set of genes. However, a defined, small set of representative genes for each region and each data set is useful for the visualization and for any subsequent analysis which utilizes the stratification imposed by the gene expression pattern. For each region and each data set, we calculated the Spearman rank correlation matrix of all given gene pairs and excluded the gene with the lowest average correlation. This was repeated until a prespecified number of genes was obtained. We defined these genes as best representatives and regarded their expression pattern to be characteristic for the region. Here, we determined five best representative genes separately for each region and each data set.
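The iterative elimination of the least correlated gene can be illustrated with a small, self-contained Python sketch. Gene names and expression values are invented, the helper names are hypothetical, and tied values receive average ranks, which is a common convention but an assumption here.

```python
def ranks(values):
    """1-based ranks in ascending order; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    # undefined (division by zero) if one of the vectors is constant
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(ranks(a), ranks(b))


def best_representatives(expr, n_keep):
    """expr: dict gene -> expression values across the same samples."""
    genes = list(expr)
    while len(genes) > n_keep:
        avg_corr = {
            g: sum(spearman(expr[g], expr[h]) for h in genes if h != g)
            / (len(genes) - 1)
            for g in genes
        }
        genes.remove(min(genes, key=avg_corr.get))  # drop the weakest gene
    return genes


expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 4, 5, 7],
    "C": [1, 3, 2, 5, 4],
    "D": [5, 1, 4, 2, 3],
}
print(best_representatives(expr, 2))  # ['A', 'B']
```

The average correlations are recomputed after every removal, so a gene that only correlated with an already discarded gene loses its support in later rounds.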
For the clustering, we chose agglomerative hierarchical clustering with Euclidean distance which was applied to gene-wise standardized expression values. Standardization was achieved as follows: to reduce the impact of outliers gene expression values deviating more than three median absolute deviations from the median were set to this threshold. Then z-scores were calculated by centering and rescaling.
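The gene-wise standardization can be sketched in Python as follows. Whether the median absolute deviation was rescaled by a consistency constant and whether the sample or population standard deviation was used is not stated in the text; this sketch assumes the raw MAD and the sample standard deviation, and the function name is hypothetical.

```python
from statistics import mean, median, stdev


def standardize(values, n_mads=3.0):
    """Clip values to median +/- n_mads * MAD, then convert to z-scores."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # raw MAD (assumption)
    lo, hi = med - n_mads * mad, med + n_mads * mad
    clipped = [min(max(v, lo), hi) for v in values]
    mu, sd = mean(clipped), stdev(clipped)  # sample SD (assumption)
    return [(v - mu) / sd for v in clipped]


# one gene across five samples; the last value is an outlier and gets clipped
z = standardize([1.0, 2.0, 3.0, 4.0, 100.0])
print([round(v, 2) for v in z])
```

Clipping before centering and rescaling keeps a single extreme value from compressing the z-scores of all other samples, which would otherwise dominate the Euclidean distances used for clustering.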
To independently assess and confirm the validity of the identified regions and the underlying genes, we tested their association with the histological grading of the tumors. We chose a t-statistic to order the genes up-regulated in grade 3 tumors when compared to the pooled group of grade 1 and 2 tumors. Fisher's exact test was used to test for an overrepresentation of the genes identified by our approach in the top ranked up-regulated genes.
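The one-sided Fisher's exact test for overrepresentation is equivalent to an upper-tail hypergeometric probability, which can be computed directly. The Python sketch below uses made-up counts; the function and parameter names are hypothetical.

```python
from math import comb


def overrepresentation_p(n_total, n_top, n_marked, n_overlap):
    """P(X >= n_overlap) when n_top genes are drawn without replacement from
    n_total genes, of which n_marked are marked (e.g. representative genes)."""
    upper = min(n_top, n_marked)
    tail = sum(comb(n_marked, i) * comb(n_total - n_marked, n_top - i)
               for i in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_top)


# toy numbers: 20 genes, top 5 ranked, 4 marked genes, 3 of them in the top 5
p = overrepresentation_p(n_total=20, n_top=5, n_marked=4, n_overlap=3)
print(round(p, 4))  # 0.032
```

In the setting described above, the marked genes would be the representative genes of the identified regions and the drawn genes the top-ranked genes up-regulated in grade 3 tumors.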
To further validate our results, we chose the two supervised approaches MACAT and LAP (Callegaro et al., 2006; Toedling et al., 2005). For MACAT, we used k-nearest-neighbors, prior parameter optimization and 1000 permutations of the sample labels. A q-value threshold of <0.0001 was applied to the result of LAP which was run with the default settings. Starting with the outcome of the supervised analysis, we tested whether the resulting genes showed an association with the scores used to rank the relevance of the windows G'. To this end, each window score was assigned to its central gene. Scores of the genes obtained by the supervised approach were compared to the scores of the remaining genes in the data set. We applied a one-sided Kolmogorov–Smirnov test for the comparison of the scores.
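The one-sided two-sample Kolmogorov–Smirnov comparison can be sketched as below. The statistic is the largest amount by which the empirical distribution function of the first sample exceeds that of the second; the p-value uses only the leading term of the large-sample approximation and is therefore rough for small samples. Function name and example scores are hypothetical.

```python
from bisect import bisect_right
from math import exp


def ks_one_sided(a, b):
    """Test whether values in a tend to be smaller than values in b.
    Returns (D_plus, approximate p-value)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # largest positive gap between the two empirical distribution functions
    d_plus = max(bisect_right(a, x) / n - bisect_right(b, x) / m
                 for x in a + b)
    d_plus = max(d_plus, 0.0)
    # leading-order asymptotic approximation (assumption of this sketch)
    p_approx = exp(-2.0 * n * m / (n + m) * d_plus ** 2)
    return d_plus, p_approx


# window scores of genes flagged by a supervised method versus all others
flagged = [0.001, 0.002, 0.003]
others = [0.1, 0.2, 0.3, 0.4]
d, p = ks_one_sided(flagged, others)
print(d, round(p, 4))
```

Because smaller window scores are more significant, the flagged genes are expected to have the distribution function that rises first, which is the direction tested here.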
All analyses were performed using R (Ihaka and Gentleman, 1996) and Bioconductor (Gentleman et al., 2004).
2.3 Data sets
Publicly available data sets of 12 breast cancer microarray studies were compiled and prepared for the analysis (Table 1).
To determine the chromosomal localization of the measured transcripts in base pairs, i.e. their corresponding probes on the microarrays, we used Bioconductor (Gentleman et al., 2004) metadata packages for the Affymetrix Genechip based studies. Alternatively, we mapped the identifiers provided in the data set to an Entrez gene identifier (http://www.ncbi.nlm.nih.gov/entrez/) via the Unigene database (ftp://ftp.ncbi.nih.gov/repository/UniGene/). All localizations were based on the genome build hg17. To avoid ambiguities only probes for which a unique Entrez gene identifier could be determined were used in the subsequent analysis. If multiple probes matched the same gene identifier, only that with the highest average expression was used. This mapping strategy reduced the data sets substantially.
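The probe filtering rule (unique gene identifier required, highest average expression wins) can be sketched in Python. Probe names, gene identifiers and values are invented for the example, and `None` is used here to mark a probe without a unique gene identifier.

```python
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """probe_expr   : dict probe -> expression values across samples
    probe_to_gene: dict probe -> gene identifier, or None if not unique
    Returns dict gene -> the retained probe."""
    best = {}
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # ambiguous or unmapped probes are discarded
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe)
    return {gene: probe for gene, (avg, probe) in best.items()}


probe_expr = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 9.0, 10.0],
    "p3": [4.0, 4.5, 5.0],
    "p4": [9.0, 9.0, 9.0],
}
probe_to_gene = {"p1": "geneA", "p2": "geneA", "p3": "geneB", "p4": None}
selected = collapse_probes(probe_expr, probe_to_gene)
print(selected)  # {'geneA': 'p2', 'geneB': 'p3'}
```

After this step each gene is represented by exactly one expression vector, so the ordering of genes along a chromosome is unambiguous.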
The four cell lines (MCF7, T47D, SKBR3 and BT474) present in the data set 6 (Pollack et al., 2002) were excluded from the joint analysis covering all breast cancer studies of tumor biopsies. From the study 7 (Sorlie et al., 2003), we selected the tumor samples which were measured on the most comprehensive microarray spanning the largest gene collection. These samples corresponded to the patient group whose expression data had not been published previously. Missing values had to be imputed for this study as well as for the data sets 9 and 12 (Sorlie et al., 2003; van't Veer et al., 2002; Zhao et al., 2004). Ten nearest-neighbor averaging was used for the imputation. In total, five samples with exceedingly high numbers of missing values were removed.
The method turned out to be sensitive to outlier samples as well as to technical artifacts like spotting batches. We advise removing single outliers or performing separate analyses on appropriate sample subsets. Tumor samples were considered outliers if, on average, they were less highly correlated with all other samples in the study than the other samples were among themselves (Spearman rank correlation between the expression profiles). In total, we excluded eight samples which appeared to deviate from all others.
| 3 RESULTS |
|---|
3.1 Chromosomal regions with distinct gene expression patterns
We systematically screened 12 independent breast cancer studies (Table 1) for gene expression patterns in distinct chromosomal regions. The meta-analysis enabled us to prioritize chromosomal regions predominantly found across a collection of 1422 tumor samples (Table 2). Estimates of the frequencies of aberrations were provided. The identified chromosomal regions included well-known, frequent amplification sites in breast cancer like 1q21, 8p11-p12, 11q12-q13, 12q13-q15 and 17q12-q21 (Chin et al., 2006; Courjal and Theillet, 1997; Kauraniemi et al., 2001; Nessling et al., 2005; Rennstam et al., 2003). In addition, known genes associated with breast cancer were uncovered, including MUC1 in 1q21, FGFR1 in 8p11-p12, RELA in 11q13, MDM2 in 12q15 and ERBB2 in 17q21 (Supplementary Table 2). For example, the well-described ERBB2 amplicon on chromosome 17q21 was found in all but one study. The region as it was identified in data set 9 by our sliding window approach is shown as an example in Figure 2. The clustering of the standardized gene expression values reveals a consistent and clear separation of a subset of tumor samples in this specific chromosomal region.
The top five scoring regions (Table 2) were detected in at least nine data sets (75%) and are displayed for the three largest data sets which contain the tumor grade annotation of their patients (1, 5, 9; Fig. 3). The five best representative genes were automatically determined separately for each region and each data set as described. For example, the ERBB2 amplicon on chromosome 17q21 is visualized by PPARBP, STARD3, ERBB2, GRB7, PSMD3 and THRAP4. These genes cover a compact region of 1.1 MB, whereas the median extension was estimated to be 2.6 MB. Our findings correlate with a detailed mapping of the amplicon (Kauraniemi et al., 2003). The frequency of patients with up-regulated ERBB2 was approximately 13% in data sets 5 and 9 and 22% in data set 1.
The chromosomal arm 17q was very prominent in our outcome, including two further distinct regions among the top five ranked regions, i.e. 17q11-q12 and 17q24-q25. TRAF4 (MLN62) in 17q11-q12 was described to be amplified in breast cancer independently of ERBB2 (Bièche et al., 1996). However, TRAF4 appeared to be associated with the expression of ERBB2 (Fig. 3A and C). The genes in 17q24-q25 were found to be up-regulated in approximately 11% of breast cancer patients. We identified GRB2 and ARHGIA as potentially involved in tumor progression. The activation of ERBB2 by heregulin or its overexpression requires GRB2 to stimulate the Akt pathway to propagate mitogenic signals (Lim et al., 2000). ARHGIA was described as an estrogen-responsive gene (Ise et al., 2005), and may be associated with cell growth in breast cancer.
The third region 8q24 was mainly associated with MYC as oncogene in breast cancer (Yao et al., 2006). Our result pointed to genes like CYC1, SIAHBP and SCRIB located approximately 17 MB distal to MYC. In particular, SIAHBP (FIR) has been shown to be involved in a complex regulating MYC gene expression (Liu et al., 2006). The fourth region includes MHC class I and II genes (HLA genes) which were identified to be deregulated in approximately 14% of the breast cancer patients. A total loss of MHC class I genes was described to be an independent indicator of good prognosis in breast cancer (Madjd et al., 2005).
Overall, we identified regions which are well known to be amplified but also others which may harbor novel putative oncogenes.
3.2 Gene expression associates with tumor grade
We assumed that the subgroups of samples having increased gene expression in the identified chromosomal regions associate with tumor grade since disease progression correlates with an accumulation of genetic alterations like DNA amplification. Indeed, the clustering in Figure 3 suggests an association between tumor grade and gene expression profile. Lower grade tumors appeared to align with low gene expression (blue), whereas higher grades correlated with high expression (red).
We used a t-statistic to rank the genes in each of the three largest data sets which contain the tumor grade of their samples (1, 5, 9). Grade 3 tumors were compared with pooled grade 1 and 2 tumors. The t-statistic prioritizes genes which consistently separate both groups. We then tested whether the genes representative for the 32 identified regions (Table 2) were enriched in the ranked gene list. We chose the top 1000 ranked genes in each data set which were up-regulated in grade 3 tumors and used Fisher's exact test to test for overrepresentation. We obtained P-values < $10^{-5}$ when taking the 160 best representative genes (five per region). It should be mentioned that these genes were obtained without any knowledge of the sample tumor grade. In summary, these findings related the result of our meta-analysis to the grade of the tumor and confirmed its validity.
Supervised approaches allow the identification of chromosomal regions while utilizing clinical parameters like tumor grade. The sample groups defined by tumor grade 3 versus grade 1 and 2 in three data sets (1, 5, 9) served as input for the methods LAP and MACAT (Callegaro et al., 2006; Toedling et al., 2005). LAP (MACAT) identified 299 (373), 687 (101) and 100 (136) genes in the data sets 1, 5 and 9, respectively. In contrast to the predefined two-group setting, our unsupervised approach resulted in clustered genes significantly expressed between unspecified sample subsets, irrespective of clinical data. Hence, to further validate our results we tested whether the scores of the windows related to the genes identified by LAP and MACAT are more significant than all other scores. To this end, we applied a one-sided Kolmogorov–Smirnov test to compare the scores of the identified genes with the remaining scores. For each of the two window lengths l = 10, 20 and each of the three data sets the P-value was < $10^{-11}$ for MACAT. In the case of LAP the P-values were < $10^{-8}$, except for data set 5, which was not significant. Thus, the scores obtained for genes identified by the supervised approaches are in general more significant than the scores of genes not identified, further confirming the validity of our approach. Interestingly, for each of the supervised methods all genes in the intersection of the genes found across at least two of the three data sets mapped either to the region 17q25 or 16q21-q25. Both chromosomal regions are covered in the result of our unsupervised approach (Supplementary Table 2).
| 4 CONCLUSIONS |
|---|
We established an approach to identify gene expression patterns in chromosomal regions. We applied the method to 12 independent human breast cancer studies comprising 1422 samples and detected regions which are well known to be amplified as well as novel regions. Our meta-analysis prioritized and characterized these regions (Table 2). We focused on those regions which were commonly found across many microarray studies and which may represent major chromosomal aberrations that are essential for tumorigenesis and disease progression. The meta-analysis overcame potential shortcomings of an individual data set, e.g. biases caused by experimental conditions, specific microarray platforms or histologically preselected biopsies, and unveiled striking concordance and significance of chromosomal regions found across the studies. The result included chromosomal regions like 1q, 8p, 11q, 12q and 17q, which correlated well to common amplification sites in breast tumors, as well as further less described chromosomal regions like 17q24-q25 (GRB2), 8q24 (distal of MYC) or 16p13 (NME4), which were found with a frequency >10%.
Gene expression patterns of consistently highly or lowly expressed genes in distinct chromosomal regions found in single tumor samples may originate from DNA aberrations, mainly amplifications. This analysis not only highlighted aberrant regions but also focused on the transcriptional level, which more directly translates into phenotype, and proposed candidate genes for tumorigenesis. A strong correlation with increased mRNA levels was experimentally observed for highly amplified regions (Heidenblad et al., 2005; Hyman et al., 2002; Pollack et al., 2002). Therefore, we do not expect to detect chromosomal DNA aberrations which only show a weak correlation with gene expression and thus may have no effect on phenotype. When assuming a positive correlation or even a linear relationship between the copy number of a gene and its mRNA expression level, deletions are presumably less likely to be detected since amplifications may occur in large numbers, whereas gene losses affect two alleles at most. Additionally, the signal-to-noise ratio at the lower range of intensities might be lower, such that it is less likely to find regions with a clear consistent separation in microarray gene expression data. Similarly, any relationship might be blurred at the lower end of measured intensities. These considerations may explain the apparent lack of deleted regions, i.e. regions in which gene expression is consistently decreased.
Distinct regional gene expression patterns may also result from specific genetic or epigenetic factors like common transcriptional regulation or DNA methylation. For example, the MHC class I and II genes in 6p21 identified to be deregulated in approximately 14% of the breast tumors were already associated with epigenetic control mechanisms. The MHC class I genes were described to be deregulated by hypermethylation in melanoma and squamous cell carcinoma and could be relevant for immune surveillance and immune escape of cancer cells (Fonsatti et al., 2003; Nie et al., 2001).
Data sets with unequal coverage of the genome, combined with variable spatial resolution, impose limitations on the analysis. However, an integrative approach based on several independent data sets minimizes such limitations. In addition, expression data generated by whole genome tiling arrays would suit our method. Shortcomings of the microarray technology like cross-hybridizations of DNA belonging to gene families may also induce an artificial ordering structure of the genes across the samples. In particular, if the between sample variation exceeds the within variation, we may detect a false-positive region. This may hold true solely for the identified regions 16q11-q13 (metallothionein genes) and 6p21-p22 (H2B histone gene family, Supplementary Table 2). However, regarding the metallothionein genes it should be noted that the expression of the isoforms MT-1F and MT-2A has been reported to be associated with higher histological grade in breast cancer (Jin et al., 2004).
Gene expression signatures are about to enter clinical practice for diagnosis and prognosis of disease. A major issue on the way towards their derivation is dimension reduction, an appropriate preselection of genes associated with the disease process. We established an algorithm highlighting heterogeneities in transcriptional activity among the samples. A collection of structurally informative chromosomal regions may help to guide and improve the preselection step. Moreover, the accumulation of genetic alterations during tumor progression can be used for the assessment of tumor status. For the modeling of dependences like their cooccurrence, evolutionary tree models were used to improve survival prognosis (Rahnenführer et al., 2005). Similarly, the stratification imposed by the gene expression in each region identified by our approach can be used to assess and model dependencies between regions for an improved disease prognosis.
The identification of the most relevant genetic events in cancer progression is challenging since the inherent genetic instability of cancer cells results in many, not necessarily relevant, chromosomal changes. Our analysis prioritized chromosomal regions and particularly pointed to ERBB2 on chromosome 17q21, which is an important drug target in breast cancer therapy. Gene expression was shown to associate with tumor grade while supervised approaches further supported the validity of our results. Given the vast amount of publicly available gene expression data, not only are other types of cancer amenable to this analysis, but data collections of several cancer types can also be tackled in a single, comprehensive meta-analysis. Together this represents a fruitful resource which may lead to the identification of clinically relevant features, to improved gene expression signatures and a better understanding of complex diseases.
| ACKNOWLEDGEMENTS |
|---|
We would like to thank Alexander Mehrle for the database access and support in mapping the probes to their chromosomal location, Michael Stojanov whose request initiated the project and Tim Beißbarth. This research project was supported by the National Genome Research Program (NGFN, grant numbers 01GR0418 and 01GR0459).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
Received on March 30, 2007; revised on June 4, 2007; accepted on June 21, 2007
| REFERENCES |
|---|
Bièche I, et al. Two distinct amplified regions at 17q11-q21 involved in human primary breast cancer. Cancer Res (1996) 56:3886–3890.
Callagy G, et al. Identification and validation of prognostic markers in breast cancer with the complementary use of array-CGH and tissue microarrays. J. Pathol (2005) 205:388–396.
Callegaro A, et al. A locally adaptive statistical procedure (lap) to identify differentially expressed chromosomal regions. Bioinformatics (2006) 22:2658–2666.
Caron H, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science (2001) 291:1289–1292.
Chang HY, et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl Acad. Sci. USA (2005) 102:3738–3743.
Check E. Big money for cancer genomics. Nature (2005) 438:894.
Chin K, et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell (2006) 10:529–541.
Courjal F, Theillet C. Comparative genomic hybridization analysis of breast tumors with predetermined profiles of DNA amplification. Cancer Res (1997) 57:4368–4377.
Crawley JJ, Furge KA. Identification of frequent cytogenetic aberrations in hepatocellular carcinoma using gene-expression microarray data. Genome Biol (2002) 3. RESEARCH0075.1–0075.8, http://genomebiology.com/2002/3/12/research/0075.
Farmer P, et al. Identification of molecular apocrine breast tumours by microarray analysis. Oncogene (2005) 24:4660–4671.
Fonsatti E, et al. Methylation-regulated expression of HLA class I antigens in melanoma. Int. J. Cancer (2003) 105:430–431; author reply 432–433.
Gentleman RC, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol (2004) 5:R80.
Heidenblad M, et al. Microarray analyses reveal strong influence of DNA copy number alterations on the transcriptional patterns in pancreatic cancer: implications for the interpretation of genomic amplifications. Oncogene (2005) 24:1794–1801.
Huang E, et al. Gene expression predictors of breast cancer outcomes. Lancet (2003) 361:1590–1596.
Hyman E, et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res (2002) 62:6240–6245.
Ihaka R, Gentleman R. A language for data analysis and graphics. J. Comput. Graph. Stat (1996) 5:299–314.
Ise R, et al. Expression profiling of the estrogen responsive genes in response to phytoestrogens using a customized DNA microarray. FEBS Lett (2005) 579:1732–1740.
Jain AN, et al. Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. Proc. Natl Acad. Sci. USA (2001) 98:7952–7957.
Jin R, et al. Clinicopathological significance of metallothioneins in breast cancer. Pathol. Oncol. Res (2004) 10:74–79.[Web of Science][Medline]
Kauraniemi P, et al. New amplified and highly expressed genes discovered in the ERBB2 amplicon in breast cancer by cDNA microarrays. Cancer Res (2001) 61:8235–8240.
Kauraniemi P, et al. Amplification of a 280-kilobase core region at the ERBB2 locus leads to activation of two hypothetical proteins in breast cancer. Am. J. Pathol (2003) 163:1979–1984.
Lahiri SN. Resampling Methods for Dependent Data. (2003) Berlin: Springer.
Levin AM, et al. A model-based scan statistic for identifying extreme chromosomal regions of gene expression in human tumors. Bioinformatics (2005) 21:2867–2874.
Lim SJ, et al. Grb2 downregulation leads to Akt inactivation in heregulin-stimulated and ErbB2-overexpressing breast cancer cells. Oncogene (2000) 19:6271–6276.[CrossRef][Web of Science][Medline]
Liu J, et al. The FUSE/FBP/FIR/TFIIH system is a molecular machine programming a pulse of c-myc expression. EMBO J (2006) 25:2119–2130.[CrossRef][Web of Science][Medline]
Ma X-J, et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell (2004) 5:607–616.[CrossRef][Web of Science][Medline]
Madjd Z, et al. Total loss of MHC class I is an independent indicator of good prognosis in breast cancer. Int. J. Cancer (2005) 117:248–255.[CrossRef][Web of Science][Medline]
Miller LD, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc. Natl Acad. Sci. USA (2005) 102:13550–13555.
Myers CL, et al. Accurate detection of aneuploidies in array cgh and gene expression microarray data. Bioinformatics (2004) 20:3533–3543.
Nessling M, et al. Candidate genes in breast cancer revealed by microarray-based comparative genomic hybridization of archived tissue. Cancer Res (2005) 65:439–447.
Nie Y, et al. DNA hypermethylation is a mechanism for loss of expression of the HLA class I genes in human esophageal squamous cell carcinomas. Carcinogenesis (2001) 22:1615–1623.
Pollack JR, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl Acad. Sci. USA (2002) 99:12963–12968.
Prelic A, et al. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics (2006) 22:1122–1129.
Rahnenführer J, et al. Estimating cancer survival and clinical outcome based on genetic tumor progression scores. Bioinformatics (2005) 21:2438–2446.
Rennstam K, et al. Patterns of chromosomal imbalances defines subgroups of breast cancer with distinct clinical features and prognosis. A study of 305 tumors by comparative genomic hybridization. Cancer Res (2003) 63:8861–8868.
Rhodes DR, et al. Mining for regulatory programs in the cancer transcriptome. Nat. Genet (2005) 37:579–583.[CrossRef][Web of Science][Medline]
Sorlie T, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. USA (2003) 100:8418–8423.
Sotiriou C, et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc. Natl Acad. Sci. USA (2003) 100:10393–10398.
Staub E, et al. A genome-wide map of aberrantly expressed chromosomal islands in colorectal cancer. Mol. Cancer (2006) 5:37.[CrossRef][Medline]
Tanay A, et al. Discovering statistically significant biclusters in gene expression data. Bioinformatics (2002) 18(Suppl. 1):S136–S144.[Abstract]
Tibshirani R, Hastie T. Outlier sums for differential gene expression analysis. Biostatistics (2007) 8:2–8.
Toedling J, et al. Macat–microarray chromosome analysis tool. Bioinformatics (2005) 21:2112–2113.
van't Veer LJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature (2002) 415:530–536.[CrossRef][Medline]
Wang Y, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet (2005) 365:671–679.[Web of Science][Medline]
West M, et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl Acad. Sci. USA (2001) 98:11462–11467.
Yao J, et al. Combined cDNA array comparative genomic hybridization and serial analysis of gene expression analysis of breast tumor progression. Cancer Res (2006) 66:4065–4078.
Zhao H, et al. Different gene expression patterns in invasive lobular and ductal carcinomas of the breast. Mol. Biol. Cell (2004) 15:2523–2536.
Zhou Y, et al. Genome-wide identification of chromosomal regions of increased tumor expression by transcriptome analysis. Cancer Res (2003) 63:5781–5784.