Skip Navigation


Bioinformatics Advance Access originally published online on December 16, 2007
Bioinformatics 2008 24(3):433-434; doi:10.1093/bioinformatics/btm614
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/3/433    most recent
btm614v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Alberts, R.
Right arrow Articles by Jansen, R. C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Alberts, R.
Right arrow Articles by Jansen, R. C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

affyGG: computational protocols for genetical genomics with Affymetrix arrays

Rudi Alberts , Gonzalo Vera and Ritsert C. Jansen *

Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Haren, The Netherlands

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROTOCOL
 REFERENCES
 

Motivation: Affymetrix arrays use multiple probes per gene to measure mRNA abundances. Standard software takes averages over probes. Important information may be lost if polymorphisms in the mRNA affect the hybridization of individual probes.

Results: We present custom software to analyze genetical genomics experiments in human, mouse and other organisms: (i) an R package providing functions for QTL analysis at the individual probe level and (ii) Perl scripts providing custom tracks in the UCSC Genome Browser to check for sequence polymorphisms in probe regions.

Availability: http://gbic.biol.rug.nl/supplementary

Contact: r.c.jansen{at}rug.nl


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROTOCOL
 REFERENCES
 
In genetical genomics (Jansen and Nap, 2001), gene expression profiles of genetically different individuals are combined with molecular markers in the DNA to reveal expression quantitative trait loci (QTL). Especially the Affymetrix technology, which is a popular platform for profiling gene expression, poses many challenges in data analysis since it uses multiple 25 mer probes per gene to measure its mRNA abundance. Alberts et al. (2005) proposed a statistical multi-probe model: log(y ij) = m + Pj + Gi + PGij + ei + eij, where yij is the signal of the jth probe of the ith individual, Pj is the probe effect and Gi is the genotype effect at the marker under study. PGij is the probe-specific genotype effect, which may be caused by (unassayed) single nucleotide polymorphisms (SNPs) in probe regions leading to a difference in hybridization, and not in gene expression (Alberts et al., 2007a). The model is computed at all marker positions to find the position (QTL) with the most significant Gi; the significance of the corresponding PGij quantifies the probe specificity of the QTL (QTL x probe). Using parametric bootstrap the significance is calculated from data drawn from the no-QTL model log(yij) = m + ei + eij. See Alberts et al. (2005) and Alberts et al. (2007a) for applications in human (association analysis of cell lines) and mouse (linkage analysis of recombinant inbred lines). The model is also applicable to backcross, intercross and other experimental designs (optionally with additive QTL effects). The model can also include a batch factor to remove unwanted batch effects.


    2 PROTOCOL
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROTOCOL
 REFERENCES
 
Figure 1 shows the workflow from the raw Affymetrix data in the form of .CEL files to QTL visualization. In the pre-processing part, the .CEL files are background corrected and normalized using the RMA method (Irizarry et al., 2003). In the processing part, QTL analysis is performed as described in Alberts et al. (2005), and deviating probes are detected using a procedure for backward elimination as described in Alberts et al. (2007a). Finally, a custom track in the UCSC Genome Browser is created, visualizing individual Affymetrix probes and known sequence polymorphisms (SNPs, splicing variants) in probe regions. The affyGG software accepts missing marker genotypes, but excludes them from the QTL analysis. In crosses, most likely genotypes can be imputed in advance using R/QTL (www.rqtl.org/manual/html/argmax.geno.html).


Figure 1
View larger version (18K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Pre-processing and processing of Affymetrix array data. Squared boxes are R functions, rounded boxes are data.

 
2.1 Pre-processing
  1. Create and load in R a comma separated values (CSV) file containing the genotype data (e.g. genotypes.csv), with format:{molecular marker names} x {individuals}, where the cells contain the genotype labels (e.g. 1 or 2; AA, AC or CC; U for missing data).> genotypes <- read.csv(’genotypes.csv’)
  2. Create and load in R a CSV file (e.g. markerpositions.csv) containing the positions of the markers, with format:{molecular marker names} x {chromosome number (chr), position in basepairs (bp)}> markersPos <- read.csv(’markerpositions.csv’)
  3. Create a vector containing the names of the .CEL files to be used, and specify the directory where the .CEL files are located:> celfiles <- c("bxd1a.cel", "bxd2a.cel", "bxd5a.cel", "bxd6a.cel", ...)> directory <- "C: /myproject/celfiles"
  4. Run the pre-processing function by typing:> probesignals <- rma.preprocessing(directory, celfiles, filename = "probesignals.csv")Custom CDF files, as provided by Alberts et al. (2007b), can be used as follows:>probesignals <- rma.preprocessing(directory, celfiles, filename = "probesignals.csv", cdffile = cdffile, cdfpackage = cdfpackage)

2.2 QTL analysis

(5) Specify in which batch each individual was processed:>batch <- c(2, 2, 2, 1, 1, 2, 1, 1, 2, 3, 2, 2, 2, 2, 3, 3, 3, 3, 2, 3, 3, 1, 1, 1, 2, 3, 1, 2, 1, 1)
(6) Select the probe level data for one probe set:>traits <- probesignals[probesignals$probeset = = ’96254_f_at’,]
(7) Perform QTL analysis on probe level:>qtlmap <- qtlMap.xProbe(genotypes = genotypes, traits = traits, batch = batch)>qtlProbe <- cbind(qtlmap[,1:3], minlog10pmarker =–log10(qtlmap[,4]))
(8) Perform QTL analysis on probe set level:>qtlmapProbeset <- qtlMap.xProbeSet(genotypes = genotypes, traits = traits, batch = batch)>qtlProbeset <- –log10(qtlmapProbeset$pmarker)>intProbeset <- –log10(qtlmapProbeset$pinteraction)
(9) Calculate the genome-wide significance threshold:>qtlThres <- qtlThresholds.sma(genotypes = genotypes, batch = batch, nProbes = 16, nIndiv = 30, n.simulations = 1000, filename = "qtlThres.csv")

2.3 Visualizations

(10) Create plot of the probe signals for a given probe set (Figure 2):Specify the interrogation positions of the probes on the mRNA:>pos <- c(1893, 1894, 1897, 1904, 1906, 1911, 1912, 1913, 1916, 1925, 1929, ...)>probePlot(traits = traits, probesetName = "96243_f_at", probesPos = pos, alleleColors = mycolors)
(11) Create QTL plots for each of the probes of one probe set:Collect the starting positions of each chromosome (in basepairs):>chrOffsets <- c(0.000000, 197.842934, 379.178330, 540.742912, 693.473822, ...)>qtlPlot.xProbe(probesetName = "96243_f_at", markersPos = markersPos, probeQtlProfiles = qtlProbe, qtlThres = 3.72, chrOffsets = chrOffsets, filename = "out1.png")
(12) Create QTL plots on probe set level (Figure 3):>qtlPlot.xProbeSet(probesetName = "96243_f_at", markersPos = markersPos, probesetQtlProfile = qtlProbeset, interactionProfile = intProbeset, qtlThres = 3.65, interactionThres = 3.75, chrOffsets = chrOffsets, filename = "out2.png")


Figure 2
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. Affymetrix probe level data for 30 mice. The mice with black (grey) profile carry the B6 (D2) allele at marker D15mit158. The probes have been designed for B6. Two unassayed and previously unknown SNPs in D2 (one in probes 4–9, the other in probes 11–15) explain the lower signals of the grey profiles.

 

Figure 3
View larger version (11K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. QTL plot on probe set 96243_f_at. The black (gray) curve corresponds to the significance of QTL (QTL x probe).

 
2.4 Eliminate and check deviating probes
  1. (13) Eliminate deviating probes using the statistical method developed in Alberts et al. (2007a):>pe <- probeElimination(probesetName = "160371_at", markerName = "D7Mit100", genotypes = genotypes, traits = traits, batch = batch)
  2. (14) Users can download our Perl scripts to check probes in the UCSC Genome Browser (http://genome.ucsc.edu) for known SNPs not used in the genotyping of the mapping population.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on July 4, 2007; revised on November 22, 2007; accepted on December 10, 2007

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 PROTOCOL
 REFERENCES
 

    Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet (2001) 17:388–391.[CrossRef][Web of Science][Medline]

    Irizarry RA, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (2003) 4:249–264.[Abstract]

    Alberts R, et al. A statistical multiprobe model for analyzing cis and trans genes in genetical genomics experiments with short-oligonucleotide arrays. Genetics (2005) 171:1437–1439.[Abstract/Free Full Text]

    Alberts R, et al. Sequence polymorphisms cause many false cis eQTLs. PLoS ONE (2007a) 2:e262.[CrossRef]

    Alberts R, et al. A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat. BMC Bioinformatics (2007b) 8:132.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
24/3/433    most recent
btm614v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Alberts, R.
Right arrow Articles by Jansen, R. C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Alberts, R.
Right arrow Articles by Jansen, R. C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?