Bioinformatics Advance Access originally published online on February 17, 2006
Bioinformatics 2006 22(8):1013-1014; doi:10.1093/bioinformatics/btl058
Pedigree-drawing with R and graphviz
MRC Epidemiology Unit, Strangeways Research Laboratory Cambridge CB1 8RN, UK
| ABSTRACT |
|---|
Summary: Two functions for pedigree-drawing available in R (http://www.r-project.org): plot.pedigree in kinship and pedtodot in gap are described. The latter requires graphviz (http://www.graphviz.org). They can produce many pedigree diagrams quickly into a single file, serving as alternatives to programs that only offer interactive use.
Availability: Packages kinship and gap are available from http://cran.r-project.org.
Contact: jinghua.zhao{at}mrc-epid.cam.ac.uk
Graphical display of pedigree data is of interest in family studies, and many computer programs are available [see Dudbridge et al. (2004) and http://linkage.rockefeller.edu]. For example, Madeline (http://eyegene.ophthy.med.umich.edu) is a flexible yet sophisticated package released under the GNU General Public License (http://www.gnu.org/copyleft/gpl.html). In view of the rising interest in R (Zhao and Tan, 2006), two R functions for drawing pedigree diagrams are introduced. The first, plot.pedigree, is in package kinship, originally written in S-Plus by Terry Therneau and Beth Atkinson; the second, pedtodot, is in package gap and was motivated by a gawk script by David Duffy.
Before they are described in full, it is necessary to know the way in which pedigree information is organized and represented. It would also be helpful to have some understanding of the algorithmic aspects (Tores and Barillot, 2001).
For example, information for pedigree numbered 10081 from Genetic Analysis Workshop 14 (http://www.gaworkshop.org) is shown as follows.
[Table: records for pedigree 10081, with columns as described below.]
The first three columns represent individual and parent IDs, changed to integers for clarity. The pedigree ID allows multiple pedigrees to be maintained in a single database. Each individual's gender (e.g. 1 = male, 2 = female) is included as auxiliary information. Here the variable aff indicates whether an individual is alcoholic (1 = no, 2 = yes). Parent IDs for individuals whose parents are not in the pedigree are set to zero. The last two columns are the genotypes for markers GABRB1 and D4S1645.
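To make this layout concrete, the following minimal sketch reads a hypothetical three-member family into R. The values are illustrative and are not the Genetic Analysis Workshop 14 data; the column name pid and the column order are assumptions, and the genotype columns are omitted.

```r
# Hypothetical nuclear family; values are illustrative, not GAW14 data.
# id, fid, mid, sex and aff follow the variable names used in this note;
# 'pid' is an assumed name for the pedigree ID column.
txt <- "
pid id fid mid sex aff
1   1  0   0   1   1
1   2  0   0   2   2
1   3  1   2   1   2
"
toy <- read.table(text = txt, header = TRUE)

# Founders are the members whose parents are both coded as zero.
founders <- toy$id[toy$fid == 0 & toy$mid == 0]
print(founders)
```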
In a typical pedigree diagram males and females are shown in squares and circles, respectively. Spouses can form marriage nodes from which nodes for children are derived. It is also customary to draw pedigree diagrams top down, so that children at a given generation could have children of their own in the next generation.
This implies that a conceptually simple algorithm for pedigree drawing would involve sorting members of a pedigree by generation and aligning members of the same generation horizontally and those of different generations vertically. In other words, the family is drawn as a graph with members as nodes, ordered by their generation numbers. The algorithm could be more involved if there are marriage loops in the family, i.e. overlapping generations, or if the pedigree is too large to fit on a single page. Pedigree information is therefore maintained in a database such that each record corresponds to a node in the pedigree graph.
Now suppose the example pedigree above is kept in an ASCII text file called 10081.pre. We use pre <- read.table("10081.pre", header=T) to read it into object pre, and can then call plot.pedigree as follows.
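The generation-sorting step can be sketched in a few lines of R. This is only an illustration under simple assumptions (absent parents coded as zero, no loops in the data), not the algorithm used by plot.pedigree or pedtodot; the function name generation is hypothetical.

```r
# Assign generation numbers: founders get 1, and any other member gets
# 1 + the larger of the two parents' generations.
# Assumes absent parents are coded 0 and the data contain no cycles.
generation <- function(id, fid, mid) {
  gen <- rep(NA_integer_, length(id))
  gen[fid == 0 & mid == 0] <- 1L
  repeat {
    f <- gen[match(fid, id)]  # father's generation, NA if not yet known
    m <- gen[match(mid, id)]  # mother's generation, NA if not yet known
    f[fid == 0] <- 0L         # a parent outside the pedigree counts as 0
    m[mid == 0] <- 0L
    ready <- is.na(gen) & !is.na(f) & !is.na(m)
    if (!any(ready)) break
    gen[ready] <- pmax(f[ready], m[ready]) + 1L
  }
  gen
}

# Hypothetical three-generation family: 1 and 2 are founders, 3 is their
# child, 4 marries in, and 5 is the child of 3 and 4.
id  <- 1:5
fid <- c(0, 0, 1, 0, 3)
mid <- c(0, 0, 2, 0, 4)
gen <- generation(id, fid, mid)
rows <- split(id, gen)  # members grouped by generation, top row first
```

In this sketch a member who marries in is treated as a founder, so a real drawing routine would still need to move such a member to the row of his or her spouse.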
- library(kinship)
- attach(pre)
- ped <- pedigree(id,fid,mid,sex,aff)
- par(xpd = T)
- plot.pedigree(ped)
- detach(pre)
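Since the aim is to produce many pedigree diagrams in a single file, one possible pattern is to open a single PDF device and draw one pedigree per page. The sketch below uses only base R; the helper names draw_all and label_page, the column name pid and the data are hypothetical, and label_page merely labels a blank page where a real drawing call would go.

```r
# Open one PDF device, draw each pedigree on its own page, then close it.
draw_all <- function(ped_list, file, draw_one) {
  pdf(file)
  on.exit(dev.off())
  for (p in ped_list) draw_one(p)
  invisible(length(ped_list))
}

# Hypothetical data: two small pedigrees told apart by an assumed 'pid' column.
toy <- data.frame(
  pid = c(1, 1, 1, 2, 2, 2),
  id  = c(1, 2, 3, 1, 2, 3),
  fid = c(0, 0, 1, 0, 0, 1),
  mid = c(0, 0, 2, 0, 0, 2),
  sex = c(1, 2, 1, 1, 2, 2),
  aff = c(1, 2, 2, 1, 1, 2)
)

# Stand-in drawing function so that the sketch runs without kinship.
# With kinship loaded, one might instead pass something like
#   function(d) plot.pedigree(pedigree(d$id, d$fid, d$mid, d$sex, d$aff))
label_page <- function(d) {
  plot.new()
  title(main = paste("Pedigree", d$pid[1]))
}

out <- file.path(tempdir(), "pedigrees.pdf")
n_pages <- draw_all(split(toy, toy$pid), out, label_page)
```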
Package kinship was developed for linear mixed and mixed-effects Cox models of family data (Zhao, 2005), and package gap for population and family-based genetic analyses in general. Documentation for both can be obtained within R through library(help=kinship) and ?plot.pedigree, or library(help=gap) and ?pedtodot, while help.start() presents this information in a web browser. Furthermore, path.diagram in the R package sem can generate a dot file to be used by graphviz, as can the Bioconductor (http://www.bioconductor.org) package Rgraphviz. It is therefore desirable that graphviz will eventually be part of the R system.
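As a rough illustration of what a dot file for a pedigree might look like, the sketch below builds one from id, fid, mid and sex using base R. It is not the code used by pedtodot in gap; the function name ped_to_dot, the couple-node naming scheme and the data are hypothetical.

```r
# Sketch only: NOT the implementation of pedtodot in package gap.
# Males (sex == 1) become boxes, all others circles; each distinct couple
# gets a small point-shaped marriage node from which children descend.
ped_to_dot <- function(id, fid, mid, sex) {
  shape <- ifelse(sex == 1, "box", "circle")
  lines <- sprintf('  "%s" [shape=%s];', id, shape)
  has_parents <- fid != 0 & mid != 0
  couple <- paste0(fid, "x", mid)  # marriage node name, e.g. "1x2"
  for (k in unique(couple[has_parents])) {
    i <- which(couple == k & has_parents)[1]
    lines <- c(lines,
      sprintf('  "%s" [shape=point];', k),
      sprintf('  "%s" -> "%s" [dir=none];', fid[i], k),
      sprintf('  "%s" -> "%s" [dir=none];', mid[i], k))
  }
  # Link every child to the marriage node of his or her parents.
  lines <- c(lines,
    sprintf('  "%s" -> "%s" [dir=none];',
            couple[has_parents], id[has_parents]))
  c("digraph pedigree {", lines, "}")
}

# Hypothetical nuclear family: father 1, mother 2, son 3.
dot <- ped_to_dot(id = c(1, 2, 3), fid = c(0, 0, 1),
                  mid = c(0, 0, 2), sex = c(1, 2, 1))
writeLines(dot)
```

Saving the output to a file such as toy.dot and running dot -Tpdf toy.dot -o toy.pdf from the command line would then render it with graphviz, whose default layout runs top down.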
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Martin Bishop
Received on December 13, 2005; revised on February 13, 2006; accepted on February 13, 2006
| REFERENCES |
|---|
Dudbridge, F., et al. (2004) Pelican: pedigree editor for linkage computer analysis. Bioinformatics, 20, 2327-2328.
Tores, F. and Barillot, E. (2001) The art of pedigree drawing: algorithmic aspects. Bioinformatics, 17, 174-179.
Zhao, J.H. (2005) Mixed-effects Cox models of alcohol dependence in extended pedigrees. BMC Genetics, 6, Suppl 1, S127.
Zhao, J.H. and Tan, Q. (2006) Integrated analysis of genetic data with R. Hum. Genomics, 2, 258-265.