Bioinformatics Advance Access originally published online on July 27, 2007
Bioinformatics 2007 23(19):2643-2644; doi:10.1093/bioinformatics/btm376

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

AssociationDB: web-based exploration of genomic association

Dominik Seelow 1,*, Katrin Hoffmann 1,2 and Tom H. Lindner 3

1 Institute for Medical Genetics, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, D-13353 Berlin, 2 Max Planck Institute for Molecular Genetics, Ihnestrasse 73, D-14195 Berlin and 3 Department of Nephrology and Hypertension, Medical Clinic 4, University of Erlangen-Nuremberg, Breslauer Strasse 201, D-90471 Nürnberg, Germany

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Genome-wide association studies use hundreds of thousands of markers, which makes it challenging to present and interpret the results. We developed a graphical, web-based solution for the interactive exploration of the results of case-control studies, with a tight integration of related gene information and tissue-specific expression data. Association results are presented as vertical bars at the physical positions of the markers, with known genes included as horizontal bars at their respective physical positions. The interface allows the specification of filtering criteria for the association data and highlights potentially interesting genes that have user-specified terms in their reports or show relevant expression patterns. Pop-up windows and hyperlinks provide drill-down capabilities and quick access to relevant data. AssociationDB can be used either as a stand-alone solution or as a front-end joining association results obtained by other software with genomic information.

Availability: http://genetik.charite.de/AssociationDB

Contact: dominik.seelow{at}charite.de

Supplementary information: The source code, a web-based demo, a step-by-step manual, and an installation guide are available at http://genetik.charite.de/AssociationDB.

Whole-genome association studies generate so much data that a quick review of the results is nearly impossible. A conventional approach would start by sorting the results by P-value. However, the genomic context and the information provided by nearby markers are lost this way. In addition, the sheer amount of data makes the results difficult to visualize, since standard spreadsheet applications are not designed for this task. Further, a hit remains anonymous as long as no candidate gene can be assigned to it.

We developed the web-based, interactive, open-source database AssociationDB to address these analysis bottlenecks. The database is primarily intended to provide a user-friendly and fast overview of the results of either genome-wide or locus-specific association studies with a case-control setting. The integration of gene information, gene expression data and, where available, hyperlinks to WWW resources puts the results straight into a genomic and functional context. Owing to the client-server architecture, no additional software has to be installed on client computers. The web-based interface lets users query and visualize association results and genomic data quickly. It also allows data from different research groups and projects to be kept on the same server. Access to ongoing projects can be granted to the respective data owners and denied to others, while published data can be made completely public.

The backbone of the gene information table was taken from NCBI Entrez Gene; further information comprises NCBI and OMIM reports (Hamosh et al., 2005). Gene and marker positions were taken from the NCBI genetic map, build 36.1. Other integrated data sets are known microRNAs (Griffiths-Jones et al., 2006), selection data (Voight et al., 2006) and GeneAtlas expression data (Su et al., 2004). Adding one's own association data is a two-stage process. First, the analysis groups must be defined and their genotypes imported. Afterwards, basic statistics such as χ² tests for Hardy–Weinberg equilibrium (HWE) and for genotypic and allelic association can be computed with functions on the database level. To provide reasonable access times, the database stores aggregated genotypes and pre-analysed results instead of re-calculating them for every new query. This restricts reports to predefined cases and controls but permits the easy integration of results obtained by further external statistical tests. We provide Perl scripts for an easy export of genotypes and import of results. The general workflow and typical import, analysis and access times are presented on our website.
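The basic statistics mentioned above can be illustrated with a short sketch. It is not the code used in AssociationDB, which performs these tests with functions on the database level; it only assumes that aggregated genotype counts (AA, AB, BB) are available per group. All function names are hypothetical, and the P-value helper is valid for one degree of freedom only.

```python
import math


def chi2_p_1df(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with an expected count of zero (monomorphic marker).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def allelic_chi2(cases, controls):
    """Chi-square statistic of the 2x2 allele-count table (cases vs controls).

    cases and controls are (n_AA, n_AB, n_BB) genotype counts.
    """
    a = 2 * cases[0] + cases[1]        # A alleles in cases
    b = 2 * cases[2] + cases[1]        # B alleles in cases
    c = 2 * controls[0] + controls[1]  # A alleles in controls
    d = 2 * controls[2] + controls[1]  # B alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0
```

Working from aggregated counts rather than individual genotypes mirrors the design choice described above: the counts are computed once at import time, and the tests are cheap to evaluate on them.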

The main purpose of AssociationDB is a fast and intuitive interpretation of association results in the context of the respective genomic data, and hence a graphical representation was chosen (Fig. 1). The display comprises three different bar charts: allelic association, genotypic association, and an aggregation of the allelic association of nearby markers. For the aggregation, each SNP is scored for the significance of the association (weak, modest, high); the scores of nearby markers are divided by their distance in SNPs and added. This procedure is repeated for every comparison included. It aggregates vicinity information as well as different controls, making the scoring relatively robust against false positives due to genotyping errors. On top of the window, genes are represented as horizontal bars reflecting their position and size. Genes with words of interest in their gene information or OMIM reports (Hamosh et al., 2005), or fulfilling certain expression criteria, are highlighted. The gene description is presented in pop-up windows; clicking on a gene also opens a pop-up providing direct links to gene-specific information in our database, ENSEMBL, NCBI Entrez Gene and GeneCards (Rebhan et al., 1997). In the case of our own database, the information comprises relevant OMIM reports, expression data and NCBI GeneRIFs. To add further decision criteria, the location of microsatellite markers and LOD scores obtained in previous linkage analyses can be included as well. The database design allows the storage of multiple maps and hence permits an easy update of the positions as well as the use of older builds if necessary.
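The aggregation of nearby markers can be sketched as follows. The text does not give the significance thresholds, the size of the neighbourhood, or how a SNP's own score enters the sum, so the thresholds (0.05, 0.01, 0.001), the `window` parameter and the undivided own score are assumptions made for illustration; the function names are hypothetical as well.

```python
def significance_score(p, thresholds=(0.05, 0.01, 0.001)):
    """Map a P-value to 0 (none), 1 (weak), 2 (modest) or 3 (high).

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return sum(p < t for t in thresholds)


def aggregated_scores(p_values_by_comparison, window=5):
    """Sum distance-weighted significance scores over neighbours and comparisons.

    p_values_by_comparison: list of lists; each inner list holds the P-values
    of one comparison for the same SNPs, ordered by physical position.
    window: number of neighbouring SNPs considered on each side (assumption).
    """
    n_snps = len(p_values_by_comparison[0])
    totals = [0.0] * n_snps
    for p_values in p_values_by_comparison:
        scores = [significance_score(p) for p in p_values]
        for i in range(n_snps):
            totals[i] += scores[i]  # the SNP's own score, undivided (assumption)
            for d in range(1, window + 1):
                for j in (i - d, i + d):
                    if 0 <= j < n_snps:
                        # Neighbour score divided by the distance in SNPs.
                        totals[i] += scores[j] / d
    return totals
```

Under these assumptions, a single strongly associated SNP raises the totals of its neighbours with a weight that falls off as 1/distance, while an isolated hit in only one comparison contributes less than a hit supported by its neighbourhood and by several comparisons.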


Fig. 1. This screenshot shows the results of a comparison of the HapMap CEU population (Utah residents with European ancestry) against the YRI (Yoruba from Nigeria), CHB (Han Chinese from Beijing) and JPT (Japanese from Tokyo) populations, respectively. A 4 Mbp region around the SLC24A5 gene was retrieved from the database. SLC24A5 is involved in melanin pigmentation and known to show significant differences in allele frequencies among different populations (Lamason et al., 2005). On top of the window, genes are shown as horizontal bars. The bar charts show the P-values of genotypic (top) and allelic (bottom) association on a logarithmic scale at each genomic position. P-values from the three studies are plotted in different colours; the P-value supported by all is shown as a black bar. When both allelic and genotypic association are significant in all studies, the black is replaced by red as a signal colour. On the left side, a condensed view of the P-values supported by all comparisons in the whole region is presented and the colour codes are described. Although there is very high ‘association’ throughout the whole region, there is a clear peak at the SLC24A5 locus, confirming the association with skin pigmentation. Further information on genes or SNPs is provided in pop-up windows. Clicking on a gene will open a new page giving more detailed information such as the exact position, its OMIM record and expression in various tissues. The data presented here can be explored interactively on our website.

 
For a validation of the results, up to three association studies (e.g. cases versus three different control groups, or studies in different populations) can be displayed. Cases can be tested against up to two other control populations. In the graphical representation, P-values shared among different groups are displayed in darker colours. P-values smaller than a user-defined significance threshold are indicated as red bars. Deviation from HWE in controls, which may point to genotyping problems, is indicated as well, unless the user decides to remove those markers from the output completely.
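The display logic described above can be approximated in a few lines. This is only a sketch: it assumes that the P-value "supported by all" studies is the largest (least significant) P-value among them, and the `Marker` class, the `summarise` function and the default thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    position: int          # physical position in bp
    p_values: list         # one association P-value per displayed study
    hwe_p_controls: float  # P-value of the HWE test in the controls


def summarise(markers, alpha=0.001, hwe_alpha=0.01, drop_hwe_failures=False):
    """Return one display row per marker.

    alpha and hwe_alpha stand in for user-defined thresholds (assumed defaults).
    """
    rows = []
    for m in markers:
        hwe_deviation = m.hwe_p_controls < hwe_alpha
        if hwe_deviation and drop_hwe_failures:
            continue  # the user chose to remove such markers from the output
        # Assumption: the shared P-value is the least significant one.
        shared_p = max(m.p_values)
        rows.append({
            "name": m.name,
            "position": m.position,
            "shared_p": shared_p,
            "significant": shared_p < alpha,  # would be drawn as a red bar
            "hwe_deviation": hwe_deviation,   # flagged, not hidden, by default
        })
    return rows
```

Taking the maximum means that a marker counts as significant only if every displayed study reaches the threshold, which matches the idea of using additional control groups for validation.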

AssociationDB differs from existing data analysis solutions such as PLINK (http://pngu.mgh.harvard.edu/~purcell/plink), Stata (Dufouil et al., 2004), Genomizer (Franke et al., 2006), Scout (Epstein et al., 2005) or R modules (Zhao and Tan, 2006), which offer an exhaustive set of analysis methods but lack the integration of genomic data other than marker positions and are often difficult to use for non-statisticians. On the other hand, AssociationDB is not intended to be a mere data repository such as the Genetic Association Database (Becker et al., 2004). Its aim is to fill the gap between sophisticated data analysis tools and integrated visualization approaches by providing intuitive access to genomic data. The data analysis capabilities of AssociationDB are limited: it can neither generate haplotypes nor perform extended statistical analyses. However, the results of such analyses carried out by other tools can easily be integrated and explored in their genomic context.


    ACKNOWLEDGEMENTS
K.H. is supported by Deutsche Forschungsgemeinschaft grant DFG, SFB 577, project A4, and is a recipient of a Rahel Hirsch Fellowship, provided by the Charité Medical Faculty. T.H.L. is supported by grants from the Deutsche Forschungsgemeinschaft (DFG; LiDFG768/4-1/4-2/6-1).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on December 11, 2006; revised on May 31, 2007; accepted on July 14, 2007

    REFERENCES

    Becker KG, et al. The genetic association database. Nat. Genet. (2004) 36:431–432.

    Dufouil C, et al. Analysis of longitudinal studies with death and drop-out: a case study. Stat. Med. (2004) 23:2215–2226.

    Epstein MP, et al. Genetic association analysis using data from triads and unrelated subjects. Am. J. Hum. Genet. (2005) 76:592–608.

    Franke A, et al. GENOMIZER: an integrated analysis system for genome-wide association data. Hum. Mutat. (2006) 27:583–588.

    Griffiths-Jones S, et al. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. (2006) 34:D140–D144.

    Hamosh A, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. (2005) 33:D514–D517.

    Lamason RL, et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science (2005) 310:1782–1786.

    Rebhan M, et al. GeneCards: integrating information about genes, proteins and diseases. Trends Genet. (1997) 13:163.

    Su AI, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl Acad. Sci. USA (2004) 101:6062–6067.

    Voight BF, et al. A map of recent positive selection in the human genome. PLoS Biol. (2006) 4:e72.

    Zhao JH, Tan Q. Integrated analysis of genetic data with R. Hum. Genomics (2006) 2:258–265.

