Bioinformatics Advance Access originally published online on February 5, 2008
Bioinformatics 2008 24(6):768-774; doi:10.1093/bioinformatics/btn048
ITALICS: an algorithm for normalization and DNA copy number calling for Affymetrix SNP arrays


¹Institut Curie, Service de Bioinformatique, ²INSERM, U900, ³CNRS UMR144, ⁴Institut Curie, Translational Research Department, 26 rue d'Ulm, Paris F-75248 and ⁵Ecole des Mines de Paris, ParisTech, Fontainebleau, F-77300 France
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Affymetrix SNP arrays can be used to measure the DNA copy number of 11 000–500 000 SNPs along the genome. Their high density facilitates the precise localization of genomic alterations and makes them a powerful tool for studies of cancers and copy number polymorphism. Like other microarray technologies, they are influenced by non-relevant sources of variation that require correction. Moreover, the amplitude of variation induced by non-relevant effects is similar to or greater than that of the biologically relevant effect (i.e. true copy number), making it difficult to estimate non-relevant effects accurately without including the biologically relevant effect.
Results: We addressed this problem by developing ITALICS, a normalization method that estimates both biological and non-relevant effects in an alternate, iterative manner, accurately eliminating irrelevant effects. We compared our normalization method with other existing and available methods, and found that ITALICS outperformed these methods for several in-house datasets and one public dataset. These results were validated biologically by quantitative PCR.
Availability: The R package ITALICS (ITerative and Alternative normaLIzation and Copy number calling for affymetrix Snp arrays) has been submitted to Bioconductor.
Contact: italics{at}curie.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
The development of high-throughput technologies, and of microarrays in particular, has made it possible to analyze DNA copy number throughout the entire genome, with ever-increasing resolution. Various techniques for detecting DNA copy number alterations are available (for a review, see Ylstra et al., 2006). Affymetrix SNP arrays, such as the Affymetrix GeneChip Human Mapping 100K Set (Kennedy et al., 2003), seem to be among the most widely used tools. These chips can be used for simultaneous genotyping and copy number determination of single nucleotide polymorphisms (SNPs) at high resolution. This technology has various uses, including studies of copy number variation in populations and the identification of genomic alterations in developmental genetics or cancer (for a review, see Pinkel and Albertson, 2005). In cancer studies, Affymetrix SNP arrays provide new insight into the mechanisms of tumor progression; they can be used to pinpoint new candidate tumor-suppressor genes (Liu et al., 2007) and oncogenes (expected to lie in regions of loss and gain, respectively), and to classify tumors, improving diagnosis for new patients and the evaluation of prognosis.
Like all microarrays, Affymetrix SNP arrays are affected by systematic non-relevant sources of experimental variation. To extract the biologically relevant effect accurately (i.e. the true DNA copy number of each SNP in the genome, corresponding to the biological signal), the raw data must be corrected for these effects. We present here a normalization algorithm for this purpose, which simultaneously corrects for several sources of experimental variation and estimates the biological signal when inferring DNA copy number.
Several methods have already been developed for correcting non-relevant sources of variation, including CNAG (Nannya et al., 2005), GIM (Komura et al., 2006) and CARAT (Huang et al., 2006). However, none of these methods takes into account the fact that the range of variation due to the non-relevant effects is similar to or larger than that of the biologically relevant effect. The impacts of the biologically relevant effect and of the non-relevant effects may therefore easily be confused, and the correct estimation of the non-relevant effects depends on the correct estimation of copy number. We therefore propose an alternating, iterative method for estimating the biologically relevant effect and the non-relevant effects, to improve biological signal estimation. We begin by briefly presenting Affymetrix SNP arrays. We then describe our algorithm (ITerative and Alternative normaLIzation and Copy number calling for affymetrix Snp arrays: ITALICS) for data normalization in detail, and compare the results obtained with this algorithm with those obtained with other algorithms. Finally, we discuss the advantages of ITALICS and possible improvements to this method.
| 2 MATERIALS AND METHODS |
|---|
2.1 Affymetrix SNP arrays
Technology: Affymetrix SNP arrays can be used to detect DNA copy number alterations at a resolution of 6–210 kb, using around 11 000–500 000 human SNPs. The Affymetrix GeneChip Human Mapping 100K and 500K Sets each consist of two arrays, each based on a specific restriction enzyme: XbaI and HindIII for the 100K set and StyI and NspI for the 500K set. The Affymetrix 50K XbaI and HindIII arrays have no SNPs in common, and together they provide the DNA copy numbers of more than 115 000 SNPs.
Each allele of each SNP is represented by ni perfect match (PM) probes and ni mismatch (MM) probes. Reverse or forward probes may be used and these probes may be centered on the SNP position or offset by –4 to +4 base pairs. Thus, all the PM probes of an SNP allele have different DNA sequences. Probes are grouped into probe quartets of four probes: one PM and one MM probe for each of alleles A and B. All four probes have the same orientation and offset.
The Affymetrix SNP arrays assay is carried out as follows. Genomic DNA is digested with a restriction endonuclease. Adaptors are ligated to all fragments. These fragments are amplified by PCR and then fragmented, labeled with biotin and hybridized to the chip. The chip is then washed and scanned to generate the cell intensity file (.CEL), which is used as input to the proposed algorithm.
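Hereafter, Yij denotes the signal of the jth QuartetPM (the PM probes of the jth probe quartet) of SNP i, and the raw signal Yi. of SNP i is the average over its ni QuartetsPM:

$$Y_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$$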
Non-relevant sources of variation: ITALICS deals with known systematic sources of variation, such as the GC-content of the QuartetsPM (QGCij), the length of the PCR-amplified fragment (FLi) and the GC-content of the fragment amplified by PCR (FGCi) (Nannya et al., 2005; Komura et al., 2006). It also takes into account the QuartetPM effect (Qij), resulting from the systematically low intensity of some QuartetsPM and the systematically high intensity of others.
We also found that some Affymetrix SNP arrays suffer from spatial artifacts, as reported by Neuvial et al. (2006) for CGH array data. A spatial artifact is illustrated in Figure 1A: neighboring QuartetsPM on the chip present abnormal intensities. The corresponding SNPs appear as outliers in the genomic profile (Fig. 1C, D and E) and should be removed. We addressed this issue with a filtering criterion that discards bad probes, as described below.
2.2 The ITALICS algorithm
Overview: In Affymetrix SNP arrays, non-relevant sources of variation (NonRelij) have a comparable or greater influence on raw signal variability than the biological signal (CopyNbi) (see Section 3.2, which compares the type III sums of squares of the different effects in a multiple linear model). We therefore propose an iterative, alternating normalization method that estimates both the biological signal and the non-relevant effects and can therefore eliminate most of the non-relevant effects while preserving most of the biological information. During each iteration, ITALICS:
- estimates the biological signal CopyNbi using the GLAD algorithm (Hupé et al., 2004);
- assuming the biological signal to be known, estimates the non-relevant effects NonRelij from the raw data by multiple linear regression.
After the last iteration, the QuartetsPM for which multiple linear regression predicts the signal poorly are flagged. They correspond to QuartetsPM with abnormal values and are excluded from the final step, in which ITALICS uses GLAD to estimate the biological effect CopyNbi on the remaining normalized QuartetsPM. The algorithm is presented in more detail below.
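The alternation described above can be illustrated with a deliberately small sketch. It is not ITALICS itself: GLAD is replaced by a hypothetical helper, `segment_medians`, that takes known breakpoints and returns the median of each segment (mirroring the smoothing value GLAD assigns to a region), and the NonRel step is reduced to a simple linear regression on a single covariate. The toy data are built so that the covariate is strongly correlated with the copy number change, which is the situation in which a single pass confuses the two effects.

```python
from statistics import median


def segment_medians(values, breakpoints):
    """Hypothetical stand-in for GLAD: breakpoints are given, not inferred.

    Every SNP receives the median of the segment it belongs to.
    """
    bounds = [0, *breakpoints, len(values)]
    fitted = []
    for start, end in zip(bounds, bounds[1:]):
        level = median(values[start:end])
        fitted.extend([level] * (end - start))
    return fitted


def simple_ols(x, y):
    """Least-squares intercept and slope of y on a single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def alternate(raw, covariate, breakpoints, n_iter=2):
    """Alternate a CopyNb-like step and a NonRel-like step n_iter times."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    normalized = list(raw)  # the first copy number estimate uses the raw signal
    for _ in range(n_iter):
        copy_nb = segment_medians(normalized, breakpoints)
        # The non-relevant effect is always re-estimated on the raw data,
        # treating the current copy number estimate as known.
        residual = [y - c for y, c in zip(raw, copy_nb)]
        intercept, slope = simple_ols(covariate, residual)
        normalized = [y - (intercept + slope * x) for y, x in zip(raw, covariate)]
    return copy_nb, slope, normalized


x = list(range(8))  # toy covariate
raw = [0.0, 0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 4.5]  # step of +1 after SNP 4, plus 0.5 * x
copy_nb, slope, normalized = alternate(raw, x, breakpoints=[4], n_iter=50)
```

With these toy values, a single pass (`n_iter=1`) estimates the slope as 5/42 ≈ 0.12 and the copy number step as 3.0, because the covariate effect is absorbed into the segment levels; repeating the alternation drives the slope towards 0.5 and the step towards 1. Convergence is slow here only because the toy covariate is almost collinear with the step; Section 3.1 examines how many iterations are needed on real data.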
Biological signal estimation (CopyNb_step): ITALICS applies the GLAD algorithm to Yi. values to estimate the biological signal. The GLAD algorithm segments the genomic profile, defining regions of homogeneous DNA copy number. For each of these regions, it provides a smoothing value and a status (gain, normal or loss). The smoothing value is the median of the Yi. values within the region concerned, and corresponds to the inferred copy number CopyNbi.
Non-relevant effect estimation (NonRel_step): After estimating the biological effect CopyNbi, ITALICS infers the non-relevant effects by multiple linear regression. The model used is as follows:
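$$Y_{ij} = \mathrm{CopyNb}_i + \mathrm{NonRel}_{ij} + \varepsilon_{ij},$$
$$\mathrm{NonRel}_{ij} = P_1(\mathrm{QGC}_{ij}) + P_2(\mathrm{FL}_i) + P_3(\mathrm{FGC}_i) + P_4(Q_{ij}),$$
where the Pk are polynomial functions and εij is the residual error.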
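The multiple linear regression can also be expressed in classical matrix notation:

$$Y = X\beta + \varepsilon,$$

where Y is the response vector, X the design matrix and β the parameter vector. The parameter β is estimated using the ordinary least-squares method, $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}Y$. The degrees of the polynomial functions Pk were chosen using the Bayesian information criterion (BIC; Schwarz, 1978) on a training dataset of 128 reference diploid chips (Matsuzaki et al., 2004).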
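For a single non-relevant effect, the polynomial regression and the BIC-based choice of degree can be sketched as follows. This is an illustrative sketch, not the ITALICS code: it assumes one covariate, solves the normal equations (XᵀX)β = XᵀY with a small hand-written Gaussian elimination so that only the standard library is needed, and uses the common Gaussian-likelihood form BIC = n ln(RSS/n) + p ln n, where p is the number of fitted coefficients. A production implementation would put all effects in one design matrix and prefer a QR-based least-squares solver for numerical stability.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_design(x, degree):
    """Design matrix with columns 1, x, x**2, ..., x**degree."""
    return [[xi ** k for k in range(degree + 1)] for xi in x]


def ols(X, y):
    """Ordinary least-squares estimate via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def bic(x, y, degree):
    """BIC of a polynomial fit of the given degree (Gaussian errors assumed)."""
    X = poly_design(x, degree)
    beta = ols(X, y)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    n = len(y)
    return n * math.log(rss / n) + (degree + 1) * math.log(n)


def choose_degree(x, y, max_degree=4):
    """Degree with the lowest BIC among 1..max_degree."""
    return min(range(1, max_degree + 1), key=lambda d: bic(x, y, d))
```

On a toy quadratic trend with small alternating noise, `choose_degree` selects degree 2. Note that in the paper the degrees were chosen on a training set of reference chips rather than separately for each chip.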
The QuartetPM effect is dealt with by calculating Qij as the mean of each QuartetPM on the 64 female chips of the same Affymetrix reference data set (Matsuzaki et al., 2004).
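Once the non-relevant effects have been estimated, the Yij values are corrected as follows:

$$Y^{\mathrm{norm}}_{ij} = Y_{ij} - \widehat{\mathrm{NonRel}}_{ij},$$

where $\widehat{\mathrm{NonRel}}_{ij}$ is the value of the non-relevant effects predicted by the regression.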
Because ITALICS uses GLAD, we investigated whether the normalization was influenced by the choice of GLAD parameters. In the Supplementary information, we give guidelines for choosing these parameters and present a sensitivity analysis showing that ITALICS is highly robust to parameter settings.
Elimination of poorly predicted QuartetsPM: After the last iteration, QuartetsPM whose Yij values are poorly predicted by the multiple linear regression are flagged. This is achieved by calculating the 95% prediction interval: all Yij outside this interval are flagged. SNPs with fewer than three non-flagged QuartetsPM (out of a total of ni) are then discarded. For the remaining SNPs, Yi. is recalculated from the non-flagged QuartetsPM only:

$$Y_{i\cdot} = \frac{1}{|J_i|}\sum_{j \in J_i} Y_{ij},$$

where Ji is the set of non-flagged QuartetsPM of SNP i.
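The flagging and summarization rule can be sketched as below. As a simplifying assumption (not necessarily what ITALICS does), the 95% prediction interval is approximated by ±1.96 times the standard deviation of the regression residuals, ignoring the extra width due to parameter uncertainty; an exact interval would use a t quantile and the leverage of each observation. The function name and its inputs (SNP identifiers, observed and predicted values) are hypothetical.

```python
from statistics import mean, pstdev


def flag_and_summarize(snp_ids, observed, predicted, z=1.96, min_kept=3):
    """Flag poorly predicted QuartetsPM and re-average each SNP.

    Returns (keep, summary): keep[k] is False for flagged QuartetsPM, and
    summary maps each retained SNP to the mean of its non-flagged values.
    SNPs with fewer than min_kept non-flagged QuartetsPM are discarded.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    half_width = z * pstdev(residuals)  # approximate 95% prediction half-width
    keep = [abs(r) <= half_width for r in residuals]
    kept_values = {}
    for snp, value, ok in zip(snp_ids, observed, keep):
        if ok:
            kept_values.setdefault(snp, []).append(value)
    summary = {snp: mean(vals) for snp, vals in kept_values.items() if len(vals) >= min_kept}
    return keep, summary
```

For example, with four SNPs of four QuartetsPM each, a SNP with one aberrant QuartetPM keeps the mean of its three remaining values, whereas a SNP with two aberrant QuartetsPM drops below the three-QuartetPM threshold and is discarded.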
Data scaling: The data are scaled to allow between-chip comparison. After the first GLAD step, the biological signal is subtracted and the standard deviation s of (Yi. – CopyNbi) is calculated for each chip using all SNPs i of the chip. The data are then scaled as follows:

$$Y_{ij} \leftarrow \frac{Y_{ij}}{s}$$
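A minimal sketch of the scaling step, assuming the sample standard deviation (the text does not specify the estimator) and assuming the same factor s is applied to every QuartetPM value of the chip; the function and argument names are hypothetical.

```python
from statistics import stdev


def scale_chip(snp_signal, copy_nb, quartet_values):
    """Divide all values of one chip by the SD of (Yi. - CopyNbi).

    snp_signal and copy_nb are per-SNP lists for the whole chip;
    quartet_values are the QuartetPM-level values to rescale.
    """
    residuals = [y - c for y, c in zip(snp_signal, copy_nb)]
    s = stdev(residuals)  # sample standard deviation (n - 1 denominator)
    return s, [v / s for v in quartet_values]


s, scaled = scale_chip([1.0, 4.0, 6.0], [1.0, 2.0, 2.0], [2.0, 4.0])
```

After this division, the noise level measured in this way is 1 on every chip, which is what makes profiles from different chips comparable.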
The ITALICS procedure is summarized in Table 1.
2.3 Comparison with other methods
Other methods: Several other methods have already been developed. Most use linear regression to estimate and correct for non-relevant effects. They differ in the effects taken into account and in their pre- and post-processing steps.
CNAG: Copy Number Analysis for GeneChip (Nannya et al., 2005). CNAG corrects the raw signal intensity of a sample by introducing the notion of an averaged best fit, a pseudochip constructed from the five reference samples most similar to the sample being analyzed. CNAG subtracts this averaged best fit from the raw signal and then corrects for the length of the PCR-amplified fragment and GC-content effects by linear regression. This method is available within CNAG 2.0 and is also used in CNAT 4.0 (Copy Number Analysis Tool, see below).
CNAT 3.0: Chromosome Copy Number Analysis Tool 3.0. Affymetrix developed this method for the extraction of DNA copy number. No specific step for the correction of non-relevant effects is included. This method uses samples with varying chromosome X copy number for intensity calibration and transforms SNP intensity into copy number values.
CNAT 4.0: Chromosome Copy Number Analysis Tool 4.0. This tool uses CNAG to normalize the data and then smooths the data over a user-defined window. This step artificially reduces the variance of the data and visually improves the apparent quality of the profile.
CARAT: Copy Number Analysis with Regression And Tree (Huang et al., 2006). CARAT uses a reference data set to select probes showing a high-allelic response and to remove those with no such response. For each new sample, it first standardizes the probe signal, based on mismatch probe information. It then corrects for probe GC-content and PCR fragment length effects, by linear regression. Finally, each SNP intensity is regressed against the average intensity of the reference samples with the same genotype.
GIM: Genomic Imbalance Map (Komura et al., 2006). GIM roughly estimates the biological effect and subtracts it from the raw signal, using a simpler version of ChARM (Myers et al., 2004). It removes defective probes with a high local GC-content and then re-estimates the biological effect without using the defective probes and subtracts this effect from the raw signal. It takes into account probe GC-content, the length of the PCR-amplified fragment and its GC-content, and mean SNP intensity for the reference dataset, by linear regression. GIM is implemented in Matlab and is freely available.
We compared ITALICS with CNAG, CNAT 3.0 and GIM. We did not compare ITALICS with CARAT, because no software was available for CARAT at the time of the study, or with CNAT 4.0, which presents no improvement over CNAG. For the CNAG, CNAT 3.0 and GIM genomic profiles, copy number and the status of the genomic regions were inferred with the GLAD algorithm, using the same parameters as for the ITALICS algorithm.
Quality criteria: As described by Neuvial et al. (2006), we used several quality criteria to compare the various normalization algorithms.
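As defined by Neuvial et al. (2006), the dyn criterion estimates the dynamics of the DNA copy number signal: it is computed from the gain, normal and loss regions of the profile and serves here as a measure of the signal-to-noise ratio (SNR), higher values being better.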
The criterion out is the number of outliers detected by GLAD. GLAD defines regions of homogeneous DNA copy number and outliers are SNPs with values different from those of other SNPs in the same region. These abnormal values may be accounted for by point mutations in the genome. However, a large number of such changes is unlikely, so the total number of outliers should be relatively low and the out parameter close to zero.
The criterion flag is the number of flagged SNPs. We introduced this criterion to compare methods that remove SNPs, such as GIM and ITALICS. These methods may artificially improve the quality of the signal (as measured by dyn and out) by removing SNPs with abnormal behavior. The number of flagged SNPs should, therefore, not be too high. When choosing between two methods with equal SNR, the method with the lower flag value should be preferred.
Comparison of two normalization methods: These three criteria can be used to determine which of two normalization methods gives better results for a given array. In this pairwise comparison, dyn must be calculated with the same definition of gain, normal and loss regions for both normalized arrays. We therefore define the consensus gain, normal and loss regions of an array processed with two different normalization methods as the intersection of the corresponding gain, normal and loss regions obtained with each method [see also Neuvial et al. (2006) for details].
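With per-SNP status calls, the consensus regions amount to keeping a SNP in a consensus gain, normal or loss region only when both normalizations assign it the same status. The sketch below assumes that status labels are available per SNP (for example from GLAD output) and computes the mean signal of each consensus level for one method; the helper names are hypothetical, and it does not implement the dyn criterion itself, whose definition is given by Neuvial et al. (2006).

```python
from statistics import mean


def consensus_status(status_a, status_b):
    """Per-SNP intersection of the regions called by two methods (None = no consensus)."""
    return [sa if sa == sb else None for sa, sb in zip(status_a, status_b)]


def level_means(signal, consensus):
    """Mean normalized signal of one method within each consensus level."""
    groups = {}
    for value, status in zip(signal, consensus):
        if status is not None:
            groups.setdefault(status, []).append(value)
    return {status: mean(values) for status, values in groups.items()}


calls_a = ["loss", "loss", "normal", "normal", "gain", "gain"]
calls_b = ["loss", "normal", "normal", "normal", "normal", "gain"]
cons = consensus_status(calls_a, calls_b)
means = level_means([-1.0, -0.5, 0.0, 0.2, 0.6, 1.0], cons)
```

Because the same consensus labels are applied to both normalized versions of the array, differences in the resulting level means (and in any criterion built on them) reflect the normalization rather than differences in segmentation.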
For the comparison of two methods, a and b, in terms of a given criterion cri, we calculate the relative performance RPcri(a,b), expressed as a percentage, with negative values indicating that method a performs worse than method b.
2.4 Datasets
We carried out our study on two public datasets: a dataset for 128 reference diploid chips (Matsuzaki et al., 2004) and a glioma dataset corresponding to 356 chips (Kotliarov et al., 2006). We also used datasets produced by the Affymetrix platform of the Institut Curie obtained with 22 uveal melanoma samples, 40 ovarian cancer samples and 26 breast cancer samples.
| 3 RESULTS |
|---|
3.1 Choosing the number of iterations
We assessed the extent to which each iteration of the ITALICS algorithm improved the SNR by calculating the dyn criterion for different values of itermax (0, 1, 2, 3 and 5) for each chip of the 356-chip glioma dataset. The percentage improvement RPdyn for itermax = 1, 2, 3 and 5 with respect to no iteration was then calculated (Fig. 2). One iteration gave a 53.8% improvement, two gave 56.1%, and three and five both gave 56.3%. As the third and subsequent iterations gave only a very slight improvement, we set itermax to two in the ITALICS algorithm.
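The choice of itermax can be restated as a small calculation on the reported mean RPdyn values. The 0.5-point threshold below is a hypothetical stopping rule introduced for illustration; the text states only that later iterations gave a very slight improvement.

```python
# Mean RPdyn (%) relative to itermax = 0, as reported in the text.
RP_DYN = {1: 53.8, 2: 56.1, 3: 56.3, 5: 56.3}


def marginal_gains(rp_by_iter):
    """Extra improvement (percentage points) over the previous tested itermax."""
    gains, previous = {}, 0.0
    for itermax in sorted(rp_by_iter):
        gains[itermax] = round(rp_by_iter[itermax] - previous, 1)
        previous = rp_by_iter[itermax]
    return gains


def smallest_sufficient(rp_by_iter, tol=0.5):
    """Smallest itermax after which the next tested value adds at most tol points."""
    iters = sorted(rp_by_iter)
    for current, following in zip(iters, iters[1:]):
        if rp_by_iter[following] - rp_by_iter[current] <= tol:
            return current
    return iters[-1]


print(marginal_gains(RP_DYN))       # {1: 53.8, 2: 2.3, 3: 0.2, 5: 0.0}
print(smallest_sufficient(RP_DYN))  # 2
```

Almost all of the gain comes from the first iteration, the second adds 2.3 points and the third only 0.2, which matches the decision to stop at two iterations.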
3.2 Importance of each effect on the signal
For each chip of the glioma dataset, we calculated the type III sum of squares for each effect in our multiple linear regression model. A low type III sum of squares indicates that the difference between the full model and the model excluding the studied effect is very small. The QuartetPM effect gave the highest type III sum of squares, with a mean of 550 × 10³, versus 10.4 × 10³, 16 × 10³ and 14 × 10³ for QuartetPM GC-content, fragment length and fragment GC-content, respectively. The biological effect was the second most important effect, with a mean of 24 × 10³.
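Following the description above, the type III sum of squares of an effect is the increase in residual sum of squares when all of that effect's terms are removed from the full model while the other effects are kept. The sketch below computes it for toy data with two orthogonal covariates, so the expected values can be checked by hand. The helper names are hypothetical, the normal equations are solved with a small Gaussian elimination to stay within the standard library, and a polynomial effect would simply contribute several columns.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a @ x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of the least-squares fit of y on the given columns."""
    rows = [list(r) for r in zip(*columns)]
    p = len(columns)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def type3_ss(effects, y):
    """effects maps an effect name to its list of columns (one per polynomial term)."""
    intercept = [1.0] * len(y)
    full = rss([intercept] + [c for cols in effects.values() for c in cols], y)
    result = {}
    for name in effects:
        reduced = [intercept] + [c for other, cols in effects.items() if other != name for c in cols]
        result[name] = rss(reduced, y) - full
    return result


# Toy data: y = 3 + 2*x1 + 0.5*x2 + e, with x1, x2 and e mutually orthogonal.
x1 = [-1.0, 1.0, -1.0, 1.0]
x2 = [-1.0, -1.0, 1.0, 1.0]
y = [0.6, 4.4, 1.4, 5.6]
print(type3_ss({"x1": [x1], "x2": [x2]}, y))  # approximately {'x1': 16.0, 'x2': 1.0}
```

The effect with the larger coefficient yields the larger type III sum of squares, which is how the ranking of effects reported above should be read.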
3.3 ITALICS outperformed the other methods
We calculated dyn and out with ITALICS, GIM, CNAT 3.0 and CNAG, using three different cancer datasets: two in-house datasets of 22 uveal melanoma chips and 40 ovarian cancer chips, and one public dataset of 356 glioma chips. All methods were used with their default parameters.
We calculated the percentage improvement (RP) for CNAT 3.0, CNAG and GIM, in terms of dyn and out, with respect to ITALICS (Fig. 3). For each of the three competitors, we calculated RPcri(competitor, ITALICS) and performed t-tests to assess the significance of the difference. ITALICS outperformed CNAT 3.0, CNAG and GIM in terms of both dyn and out, with t-test P-values below 10⁻⁵ for all three datasets. RPdyn ranged from –10.9% to –6.5% for GIM, from –23.9% to –16.0% for CNAG and from –33.4% to –26.0% for CNAT 3.0. RPout ranged from –98.1% to –89.0% for all three methods. Chip data normalized with ITALICS therefore had a significantly better SNR, with fewer outliers, than data normalized with CNAT 3.0, CNAG and GIM.
Both ITALICS and GIM flag certain SNPs for elimination. The improvement in SNR obtained with these methods may therefore be partly due to the mechanical effect of this removal. We compared the number of SNPs flagged by GIM and ITALICS and found that ITALICS flagged significantly fewer SNPs than GIM, with a mean of 300 SNPs per chip for ITALICS versus 3000 for GIM, giving RPflag(GIM,ITALICS) = –90%.
3.4 Spatial artifact correction
Some Affymetrix SNP arrays suffer from spatial artifacts. The step flagging poorly predicted QuartetsPM removes most of the QuartetsPM with abnormal intensities detected by visual inspection, as shown in Figure 1A and B. To our knowledge, ITALICS is the only method capable of doing this. Moreover, the removal of these abnormal QuartetsPM improves signal quality by removing many outliers from the genomic profile: 1661, 1818 and 2331 outliers were detected with CNAT 3.0, CNAG and GIM, respectively (Fig. 1C, D and E), whereas ITALICS gave only 88 outliers (Fig. 1F), and only 13 of the 56 000 SNPs were removed because they had fewer than three non-flagged QuartetsPM.
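The text does not reproduce the formula for RP. As a hedged reading only, the form RP(a, b) = 100 × (cri(b) – cri(a)) / cri(a) for lower-is-better criteria such as out and flag reproduces the reported RPflag(GIM, ITALICS) = –90% from the mean flag counts (3000 versus 300), so it is used here to express the single-chip outlier counts of the spatial-artifact example in the same terms. The function name is hypothetical and the authoritative definition may differ.

```python
def rp_lower_is_better(cri_a, cri_b):
    """Assumed relative performance (%) of method a versus method b.

    Negative when method a has more outliers/flags than method b.
    """
    return 100.0 * (cri_b - cri_a) / cri_a


print(round(rp_lower_is_better(3000, 300), 1))  # -90.0 (mean flags, GIM vs ITALICS)

# Single-chip outlier counts from the spatial-artifact example (ITALICS: 88).
outliers = {"CNAT 3.0": 1661, "CNAG": 1818, "GIM": 2331}
for method, count in outliers.items():
    print(method, round(rp_lower_is_better(count, 88), 1))
# prints: CNAT 3.0 -94.7, CNAG -95.2, GIM -96.2 (one per line)
```

These single-chip values fall within the –98.1% to –89.0% range reported for RPout across datasets, which is consistent with, but does not prove, the assumed form.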
3.5 Biological validation
Quantitative PCR validation: We used QPCR (see Supplementary Material for details) to validate our method with a different technology. As a test case, we used a set of paired breast cancer samples (primary tumor and relapse; Bollet et al., 2008) and tried to identify a breakpoint on chromosome 20. We compared the QPCR results with those obtained with ITALICS, CNAG, GIM and CNAT 3.0, for the XbaI and HindIII arrays. We also carried out QPCR on two breast tumors, each with a normal chromosome 20 (white and striped bars in Fig. 4), to assess QPCR noise and to validate the significance of the copy number change. As shown in Figure 4, ITALICS copy number estimates were closer to the QPCR estimates than those of CNAG, GIM and CNAT 3.0. All four methods detected copy number changes in this region of chromosome 20. However, ITALICS breakpoints were closer to QPCR breakpoints than CNAT 3.0 breakpoints (Fig. 4A, C and D) and than CNAG and GIM breakpoints (Fig. 4A). In Figure 4A, QPCR and ITALICS breakpoints are found at identical positions (between P14 and P15). In Figure 4C and D, CNAG, GIM and ITALICS detect a copy number change between P12 and P13, close to that detected by QPCR between P14 and P15, whereas CNAT 3.0 detects this breakpoint further away, between P06 and P07 in Figure 4C and between P08 and P09 in Figure 4D. In Figure 4B, QPCR, CNAT 3.0, GIM, CNAG and ITALICS found the same breakpoint.
Patients with breast cancer relapses: The problem was to determine whether the second cancer was a true recurrence of the first cancer or a new primary tumor, based on the two Affymetrix SNP array profiles (Bollet et al., 2008). We tried to identify common breakpoints between the chips for the two tumors. The breakpoints detected after CNAT 3.0 or ITALICS normalization are shown in Figure 5A and B for chromosomes 6 and 9, respectively, for one patient. GIM and CNAG results are similar to those of ITALICS for chromosome 6 and to those of CNAT 3.0 for chromosome 9 (data not shown). ITALICS identified breakpoints at identical locations in both cancers, for both chromosomes shown in Figure 5A and B; this was not possible with CNAT 3.0, CNAG or GIM. The precise match between the breakpoints mapped in the two cancers with ITALICS suggests that the second cancer is a true recurrence, whereas the opposite conclusion would have been drawn with CNAT 3.0. As CNAG and GIM detect less precise matches, they lead to the same conclusion as ITALICS, but with weaker evidence. Expert assessment based on clinical data also indicated that this was a true recurrence, consistent with the results obtained with ITALICS. Similar conclusions were drawn for the rest of the dataset (13 pairs of first and second cancers). Thus, ITALICS improves the classification of true recurrences and new primary tumors.
| 4 DISCUSSION AND PERSPECTIVES |
|---|
We present here a new method for normalizing Affymetrix SNP arrays: ITALICS. This method outperforms other normalization methods, such as CNAT 3.0, CNAG and GIM, in terms of SNR, and gives a more accurate localization of breakpoints, as validated by QPCR. This improvement may be due to several features of the algorithm. First, ITALICS estimates the non-relevant and biologically relevant effects alternately and iteratively. Because the non-relevant effects induce ranges of variation similar to or larger than that of the biologically relevant effect, the correct estimation of the non-relevant effects depends on the correct estimation of the biological signal, and vice versa. By estimating both in an iterative manner, we avoid overestimating the non-relevant effects and losing biological signal. The first estimation on raw data is necessarily rough, but improves the subsequent estimation of the non-relevant effects, and each new estimation of either type of effect leads to a better estimation of the other. In practice we iterate the algorithm twice, as additional iterations led to no significant improvement in the SNR.
Second, the algorithm includes a flagging step that removes aberrant QuartetsPM. Some PM intensity values are subject to spatial artifacts; the intensities of the affected QuartetsPM are therefore abnormal, poorly predicted by the regression model and flagged. Discarding poorly predicted QuartetsPM does not necessarily lead to discarding the corresponding SNP, provided that at least three of its QuartetsPM remain. As a result, very few SNPs are removed from the final genomic profile. This filtering step detects spatial artifacts only indirectly, but nevertheless gives good results in practice.
Methods for the precise detection of spatial artifacts and the removal of all probes within them have already been developed (Neuvial et al., 2006). However, they cannot be applied directly to SNP chips because of the very high density of these chips (more than 2 million probes per chip). Computing the QuartetPM effect from an in-house reference dataset would probably further improve the normalization. Nevertheless, the QuartetPM effect is the largest effect (Section 3.2), and ignoring it would decrease the efficiency of the normalization.
We normalized XbaI and HindIII chips separately. The same major changes were detected with both chips. However, it is difficult to merge XbaI and HindIII data because the signal amplitude for the same alteration differs between the two chips. Merging the XbaI and HindIII genomic profiles would give a higher resolution profile, but also a lower SNR. The ITALICS algorithm could be improved by taking the enzyme effect (XbaI or HindIII) into account to overcome this problem.
Technically, the ITALICS algorithm could be applied to higher density chips, such as the Affymetrix GeneChip Human Mapping 500K Set, and even to the Genome-Wide SNP Arrays 5.0 and 6.0, which have no MM probes, because ITALICS uses only PM probes. We would, of course, have to check whether the non-relevant effects in our model are also observed with these higher density chips, and we would need a reference dataset for calculating the quartet effect.
| 5 CONCLUSION |
|---|
We developed ITALICS, a new normalization algorithm for Affymetrix SNP arrays. This method was designed for the normalization and analysis of DNA copy number; it significantly outperformed other methods, such as CNAT 3.0, CNAG and GIM, in terms of SNR, and can also be used to correct for experimental artifacts due to spatial effects. It was validated by QPCR and accurately detected the breakpoints in genomic profiles. It could therefore be used to improve the characterization of samples in genomic studies.
| ACKNOWLEDGEMENTS |
|---|
This work was supported by the Institut Curie and the Centre National de la Recherche Scientifique. We thank Sophie Piperno-Neumann and Simon Saule, Jean-Paul Thiery and Marc Bollet, who were kind enough to provide us with access to their uveal melanoma, ovarian cancer and breast cancer datasets, respectively. We thank Marc Bollet, Nicolas Servant and Pierre Neuvial for fruitful discussions. We thank Audrey Rapinat and David Gentien for performing the Affymetrix Genechip experiments.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Chris Stoeckert
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Received on August 21, 2007; revised on January 29, 2008; accepted on January 29, 2008
| REFERENCES |
|---|
Bollet M, et al. High resolution mapping of breakpoints to define true recurrences among ipsilateral breast tumor recurrences. J. Natl Cancer Inst (2008) 100:48–58.
Huang J, et al. CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics (2006) 7:83.
Hupé P, et al. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics (2004) 20:3413–3422.
Kennedy GC, et al. Large-scale genotyping of complex DNA. Nat. Biotechnol (2003) 21:1233–1237.
Komura D, et al. Noise reduction from genotyping microarrays using probe level information. In Silico Biol (2006) 6:79–92.
Kotliarov Y, et al. High resolution global genomic survey of 178 gliomas reveals novel regions of copy number alteration and allelic imbalances. Cancer Res (2006) 66:9428–9436.
La Rosa P, et al. VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles. Bioinformatics (2006) 22:2066–2073.
Liu W, et al. Deletion of a small consensus region at 6q15, including the MAP3K7 gene, is significantly associated with high-grade prostate cancers. Clin. Cancer Res (2007) 13:5028–5033.
Matsuzaki H, et al. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat. Methods (2004) 1:109–111.
Myers CL, et al. Accurate detection of aneuploidies in array CGH and gene expression microarray data. Bioinformatics (2004) 20:3533–3543.
Nannya Y, et al. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res (2005) 65:6071–6079.
Neuvial P, et al. Spatial normalization of array-CGH data. BMC Bioinformatics (2006) 7:264.
Pinkel D, Albertson DG. Comparative genomic hybridization. Annu Rev. Genomics Hum. Genet (2005) 6:331–354.
Schwarz G. Estimating the dimension of a model. Ann. Stat (1978) 6:461–464.
Ylstra B, et al. BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH). Nucl. Acids Res (2006) 34:445–450.