

Bioinformatics Advance Access originally published online on October 23, 2006
Bioinformatics 2007 23(1):64-70; doi:10.1093/bioinformatics/btl539

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Linkage analysis using sex-specific recombination fractions with GENEHUNTER-MODSCORE

Johannes Dietter 1,*, Manuel Mattheisen 2, Robert Fürst 2, Franz Rüschendorf 3, Thomas F. Wienker 2 and Konstantin Strauch 1

1 Institute of Medical Biometry and Epidemiology, Philipps University Marburg 35032 Marburg, Germany
2 Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn 53105 Bonn, Germany
3 Gene Mapping Center, Max-Delbrück-Center for Molecular Medicine 13092 Berlin, Germany

*To whom correspondence should be addressed.


    ABSTRACT

Motivation: Sex-specific marker maps have become increasingly available. We have implemented the use of sex-specific recombination frequencies in GENEHUNTER-MODSCORE, a program that performs multipoint linkage analysis. Furthermore, we have devised a consistent method to choose the combinations of male and female genetic positions at which linkage scores should be calculated. Marker coordinates can be read automatically from publicly available genetic maps.

Results: In a MOD-score analysis of the COGA dataset provided for Genetic Analysis Workshop 14, the highest linkage peak on chromosome 1 further increases when using sex-specific maps, while some smaller peaks are decreased. Simulations confirm that the MOD score can be biased when a sex-averaged instead of the correct sex-specific map is employed. This shows that an adequate modeling of the female:male ratio of genetic distances is important, especially for complex traits.

Availability: The new version of GENEHUNTER-MODSCORE can be downloaded from the following website: http://www.staff.uni-marburg.de/~strauchk/software.html

Contact: dietter{at}med.uni-marburg.de


    INTRODUCTION
The use of sex-specific recombination fractions in linkage analysis has been discussed extensively. Previous work ranges from methods for detecting genetic regions with sex-specific recombination fractions to methods for linkage analysis with sex-specific recombination rates and applications of these methods to real datasets. Wu et al. (2002) developed an algorithm to estimate sex-specific recombination rates for a set of molecular markers that segregate in a full-sib family with two heterozygous parents. Feenstra et al. (2004) used the calculation of ELODs to estimate the minimum sample size and the optimal proportion of maternally versus paternally informative matings needed to detect sex-specific differences in recombination rates. Wu et al. (2005) developed a linkage method for affected sib pairs that relies on alleles shared identical by descent and also takes imprinting and sex-specific maps into account. Mukhopadhyay and Weeks (2003) analyzed the Framingham Heart Study data with linkage-based methods, examining the data for imprinting in combination with sex-averaged or sex-specific maps. One of their results was that evidence for imprinting in certain chromosomal regions disappeared when sex-specific recombination frequencies were used. Daw et al. (2000) systematically investigated the effects of calculating parametric LOD scores with sex-specific maps. They found that any misspecification of the inter-marker distances results in a loss of power, albeit a modest one. Even more important was an increase in the type-I error when the map was misspecified by using sex-averaged maps. Using simulations, Fingerlin et al. (2006) showed that non-parametric test statistics are biased when sex-averaged maps are used instead of sex-specific maps and parents of only one sex are genotyped.

The aforementioned work is especially important because several maps with sex-specific genetic distances have been constructed in recent years (Duffy, 2005, http://www2.qimr.edu.au/davidD/Duffy_unifiedmap2005.html; Kong et al., 2002, 2004; Nievergelt et al., 2004; Broman et al., 1998). In addition to the work listed in the first paragraph, some standard linkage software is able to deal with the sex-specific marker positions supplied by these maps. These programs are based on the Elston-Stewart algorithm (Elston and Stewart, 1971), which is restricted to the simultaneous analysis of only a few markers. A program that is able to handle considerably more markers is Superlink (Fishelson and Geiger, 2002), which applies the technique of Bayesian networks (Pearl, 1988) to the problem of calculating the LOD score. Current software based on the Lander-Green algorithm (Lander and Green, 1987) is far less limited in the number of markers used. Such programs are MERLIN (Abecasis et al., 2002), ALLEGRO (Gudbjartsson et al., 2000, 2005) and GENEHUNTER (Kruglyak et al., 1996), which are routinely employed for genome-wide and chromosome-wide studies. While MERLIN (Fingerlin et al., 2006) and ALLEGRO have been extended to use sex-specific recombination frequencies, the widely used GENEHUNTER program is still confined to sex-averaged recombination frequencies. Idury and Elston (1997) described an extension to the Lander-Green algorithm that allowed faster computations and also incorporated the possibility of using sex-specific recombination frequencies. This algorithm was employed in GENEHUNTER v1.2; however, the sex-specific option was not included. MERLIN and ALLEGRO are able to perform linkage analysis with sex-specific maps, but they do not offer a parametric linkage analysis with imprinting and sex-specific maps, which is clearly a relevant situation. In addition, neither ALLEGRO nor MERLIN has functionality to optimize the parameters of the disease model, which is needed for a MOD-score analysis.

In their editorial review ‘MERLIN ... and the Geneticists Stone?’ Nicolae and Cox (2002) argue in favor of ‘sophisticated tools that allow for sex-specific genetic maps and imprinting’. Likewise, Nievergelt et al. (2004) state that many of the most widely used programs for human linkage analysis do not use sex-specific maps. Here, we have extended GENEHUNTER-MODSCORE (Strauch, 2003; Strauch et al., 2005), which is based on the original GENEHUNTER version 2.1r6 and includes all of its functionality, to perform multipoint linkage analysis with sex-specific recombination frequencies. GENEHUNTER-MODSCORE allows for an automatic MOD-score analysis in which the LOD score is maximized with respect to the trait-model parameters (i.e. penetrances and disease allele frequency). The program is a further development of GENEHUNTER-IMPRINTING (Strauch et al., 2000), which is capable of taking imprinting into account in a parametric linkage analysis. With the new version of GENEHUNTER-MODSCORE presented here, it is possible to perform parametric and non-parametric multipoint linkage analysis, including MOD-score calculations with and without imprinting, using sex-specific recombination frequencies. In conjunction with the new version, we provide a Perl script, GH_modview, that uses the GNUPLOT graphics package (Williams and Kelley, 2004) to create plots of the LOD or MOD score broken down by the contributions of the individual families. This type of diagram is useful for both Mendelian and complex traits, since it identifies families with a positive versus a negative contribution to the linkage signal at a particular genetic position.


    METHODS
Modified transfer matrix for sex-specific recombination fractions
This section details the modifications to the Lander-Green algorithm as implemented in GENEHUNTER (Kruglyak et al., 1996) that are necessary to calculate LOD, MOD and NPL scores with sex-specific recombination fractions. Kruglyak and Lander (1998) already indicated how the corresponding formulas change when extended to sex-specific recombination fractions, even though the extension is not implemented in GENEHUNTER. Furthermore, since the introduction of GENEHUNTER version 2.1 (Markianos et al., 2001), which includes significant algorithmic improvements, the extension to sex-specific recombination fractions has become less obvious, both algorithmically and with regard to its implementation in the source code.

Here, we first outline the general framework of linkage score calculation in GENEHUNTER, which is described in detail by Kruglyak et al. (1996). It is divided into two parts:

  1. Extraction of information about the inheritance pattern in a pedigree, which depends only on the markers.
  2. Definition of a statistic or score to assess linkage, for a given inheritance pattern, which depends only on the trait information on all pedigree members.

The inheritance pattern is coded in a so-called inheritance vector. This is a binary vector that indicates for every meiosis whether a maternally or paternally inherited allele has been transmitted (bit 1 or 0, respectively). According to this definition, the length of an inheritance vector equals the number of meioses in the pedigree. For a graphical illustration of the concept of an inheritance vector, see Figure 1a of Kruglyak et al. (1996). The above-mentioned score is defined as a function that measures to what degree an inheritance vector w indicates the presence of a disease gene at a given position, in consideration of the trait phenotypes $\phi$. Since, in general, more than one inheritance vector will be compatible with the information supplied by the marker data, one has to calculate the probability distribution P[v(x) = w] over the set V of all possible inheritance vectors. Here, v(x) denotes the inheritance vector at genetic position x of the putative disease locus relative to the marker group used. Now one can define the averaged scoring function $\bar{S}$ (Kruglyak et al., 1996):

$$\bar{S}(x, \phi) = \sum_{w \in V} S(w, \phi)\, P[v(x) = w] \qquad (1)$$
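The averaging step in Equation (1) can be illustrated with a minimal Python sketch. It is not taken from the GENEHUNTER source code: the function name `averaged_score`, the toy score and the uniform toy distribution are hypothetical, and the brute-force enumeration of all inheritance vectors is only feasible for very small pedigrees.

```python
from itertools import product


def averaged_score(score, prob, n_meioses):
    """Sum score(w) * prob(w) over all binary inheritance vectors w.

    score and prob are callables taking a tuple of bits; prob is assumed
    to be a normalized distribution over all 2**n_meioses vectors.
    """
    total = 0.0
    for w in product((0, 1), repeat=n_meioses):
        total += score(w) * prob(w)
    return total


def uniform_prob(w):
    # Hypothetical toy distribution: all four vectors of length 2 equally likely.
    return 0.25


def count_ones(w):
    # Hypothetical toy score: number of bits set to 1.
    return float(sum(w))


toy_value = averaged_score(count_ones, uniform_prob, 2)
```

In a real analysis the distribution would be the marker-conditioned inheritance distribution, and the score would be a LOD or NPL-type function of the trait phenotypes.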
In order to introduce sex-specific recombination fractions in GENEHUNTER, one has to focus on the probability distribution P[v(x) = w]. Strictly speaking, P[v(x) = w] is the inheritance distribution at the genetic location x given the information of all marker genotypes:

$$P[v(x) = w] = P[v(x) = w \mid M_1, M_2, \ldots, M_n] \qquad (2)$$
Here, M1, M2, M3, ... , Mn denote the marker data at loci 1, 2, 3, ... , n for each person in the pedigree, and n is the number of genotyped markers. The probability distribution in Equation (2) can be calculated explicitly if the corresponding stochastic process is formulated as a hidden Markov process, as described in detail by Kruglyak et al. (1995). This procedure yields an expression for Equation (2) that depends on the so-called transfer matrix. This is a matrix of state transition probabilities, with one state being the inheritance vector at marker j and the other state the inheritance vector at marker j + 1. If the inheritance vectors at the two successive markers are not the same, one or more recombinations have occurred between markers j and j + 1. The number of recombinations equals the number of bits in which the inheritance vectors at the two markers differ. Because a recombination between the two markers takes place with probability $\theta_{j,j+1}$, the state transition probability between two consecutive markers is:

$$P(v_{j+1} \mid v_j) = \theta_{j,j+1}^{H(v_j, v_{j+1})}\, (1 - \theta_{j,j+1})^{s - H(v_j, v_{j+1})} \qquad (3)$$

$v_j$: inheritance vector at marker j

s: number of meioses in the pedigree

$\theta_{j,j+1}$: recombination fraction between marker j and j + 1

$H(v_j, v_{j+1})$: Hamming distance between the inheritance vector at marker j and the inheritance vector at marker j + 1. The Hamming distance between two inheritance vectors is simply the number of bits in which these vectors differ.

Fig. 1 This plot shows the rectangles (gray) at which the linkage score can be meaningfully calculated in a multipoint situation. M1, ... , M4 indicate the marker positions on the male map and on the female map.

When considering the state transition probability as being built up of individual factors for the meioses of the two sexes, one realizes that the extension to sex-specific recombination fractions is as follows:

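$$P(v_{j+1} \mid v_j) = (\theta^{m}_{j,j+1})^{H(v_j, v_{j+1})_m}\, (1 - \theta^{m}_{j,j+1})^{s_m - H(v_j, v_{j+1})_m}\, (\theta^{f}_{j,j+1})^{H(v_j, v_{j+1})_f}\, (1 - \theta^{f}_{j,j+1})^{s_f - H(v_j, v_{j+1})_f} \qquad (4)$$

$\theta^{m}_{j,j+1}$, $\theta^{f}_{j,j+1}$: recombination fraction for male and female meioses between marker j and j + 1, respectively.

$s_m$, $s_f$: number of meioses in males and females, respectively.

$H(v_j, v_{j+1})_m$: Hamming distance between the inheritance vector at marker j and the inheritance vector at marker j + 1, counting only bits that relate to male meioses.

$H(v_j, v_{j+1})_f$: Hamming distance between the inheritance vector at marker j and the inheritance vector at marker j + 1, counting only bits that relate to female meioses.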
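As an illustration of the sex-specific transition probability, the following Python sketch evaluates the product of male and female factors for two explicitly given inheritance vectors. The function names and the boolean list `is_male`, which marks the bits that belong to male meioses, are hypothetical; the sketch works on full inheritance vectors and does not reproduce the fixed/variable-bit optimization of the actual program.

```python
def hamming(a, b, positions):
    """Number of the given bit positions at which vectors a and b differ."""
    return sum(1 for i in positions if a[i] != b[i])


def transition_prob(v_j, v_next, is_male, theta_m, theta_f):
    """Transition probability between inheritance vectors at adjacent markers.

    is_male[i] is True if bit i corresponds to a male meiosis (an assumed
    bookkeeping convention for this sketch).
    """
    male = [i for i, m in enumerate(is_male) if m]
    female = [i for i, m in enumerate(is_male) if not m]
    h_m = hamming(v_j, v_next, male)
    h_f = hamming(v_j, v_next, female)
    return (theta_m ** h_m * (1.0 - theta_m) ** (len(male) - h_m)
            * theta_f ** h_f * (1.0 - theta_f) ** (len(female) - h_f))


# Toy pedigree with two male and two female meioses (made-up values).
example = transition_prob((0, 0, 0, 0), (1, 0, 1, 0),
                          (True, True, False, False), 0.1, 0.3)
```

Setting `theta_m` equal to `theta_f` recovers the sex-averaged form, in which only the total Hamming distance and the total number of meioses enter.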

We have implemented this extended transfer matrix in the GENEHUNTER-MODSCORE source code in accordance with the concept of fixed and variable bits, which was introduced by Markianos et al. (2001). It serves as a means to reduce the inheritance space by omitting inheritance vectors that are not compatible with the marker genotype information. In more detail, there are three levels of representation of inheritance vectors in the GENEHUNTER source code. The first level contains the information, for every meiosis in the pedigree, whether a paternal or maternal allele has been transmitted. The next level is the so-called ‘founder symmetry’ level. This level takes into account that the parental origin of haplotypes is not known for founders. One can therefore fix the first meiosis of each founder in the pedigree by definition without losing information, while the outcomes of the remaining meioses of this founder are then fixed relative to the first one (Kruglyak et al., 1996). One more level is introduced by separating the inheritance vectors into fixed and variable bits. This concept takes into account that the outcomes of certain meioses are predefined (fixed) by the available marker genotype information, leading to a fixed value of 0 or 1 for the corresponding bit. Likewise, if the outcome of a meiosis cannot be determined unambiguously from the marker data, the respective bit is considered variable. The computational cost decreases strongly with the number of fixed bits, as described by Markianos et al. (2001). Here, in order to introduce sex-specific recombination fractions, we divided the fixed and variable bits into fixed male and fixed female as well as variable male and variable female bits. The explicit formulas are given in Appendix A.

Altogether, parametric and non-parametric linkage statistics (LOD, MOD and NPL scores), the expected number of meioses and the haplotypes via the ‘MaxProb’ method (Kruglyak et al., 1996) are now evaluated with sex-specific recombination fractions.

For what combinations of male and female recombination values between disease locus and its flanking markers should linkage scores be calculated?
When using sex-specific recombination frequencies, the linkage score is a function of both the female and the male recombination frequency. Thus, the linkage score is no longer a 1D random process but a 2D random field. One might vary both recombination frequencies independently of one another, as described by Ott (1999). As a consequence of the additional parameter, the LOD-3 criterion has to be raised to 3.5 in the case of a parametric analysis (Lander and Lincoln, 1988). However, in the multipoint situation, the possible combinations of female and male map positions are mutually restricted, and their graphic representation is a series of rectangles defined by the set of markers (cf. Fig. 1). As more markers are used, these rectangles become smaller. In an idealized situation the marker set is infinitely dense, and the allowed combinations of female and male map positions therefore lie on a curve defined by the marker set. In this situation one can distinguish between two kinds of loci: loci that coincide with genotyped markers, and additional loci that are not genotyped but only serve to define the curve of allowed combinations of male and female recombination fractions. The positions of these additional loci can be taken from a specified genetic map. In view of these remarks, the biologically defined relation is adequately modeled if the set of female/male genetic positions for which the LOD, MOD or NPL score should be calculated lies on the polygon defined by these markers. Since the positions are now again described by a single parameter (the position on the polygon), there is no additional free parameter. Hence, the criterion for significance does not have to be raised further to account for maximization over additional parameter space, which would not correspond to biologically possible combinations.
Please note that, in the context of a MOD-score analysis, significance criteria that apply in a traditional LOD-score analysis under a single trait model do need to be raised due to the additional maximization over the trait-model parameters [cf. Mattheisen and Strauch (2005); please also refer to the Discussion sections of Strauch et al. (2000, 2003)]. However, as outlined above, using sex-specific instead of sex-averaged recombination fractions should not lead to an additional inflation of MOD scores, just as in the case of LOD or NPL scores.

Once the set of markers to be used has been chosen, the female and male positions are determined as follows. The user specifies the number of points, between two typed markers, at which the LOD, MOD or NPL score is to be calculated. As already stated these points are located on the polygon defined by the male and female coordinates of the genotyped markers and the additional loci, which are taken from a map. In addition to that, the average of the male and female position of a point has to be equal to the same coordinate as if the sex-averaged map had been used. With these two requirements the positions at which the linkage score is to be calculated are uniquely determined. A more detailed description of how to determine the set of positions at which the linkage score is calculated is given in Appendix B which can be found in the Supplementary material.
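The two requirements can be sketched in Python as follows. This is only an illustration under stated assumptions and not the procedure of Appendix B: the polygon is taken to be piecewise linear between consecutive map loci in the (male, female) plane, the evaluation points between two typed markers are assumed to be equally spaced on the sex-averaged scale, and all function names and coordinates are hypothetical.

```python
def point_on_polygon(male, female, avg_target):
    """Return the (male, female) position on the polygon whose mean is avg_target.

    male and female hold the strictly increasing coordinates (in cM) of
    all map loci that define the polygon.
    """
    for coords in (male, female):
        if any(b <= a for a, b in zip(coords, coords[1:])):
            raise ValueError("coordinates must be strictly increasing")
    avg = [(m + f) / 2.0 for m, f in zip(male, female)]
    if not avg[0] <= avg_target <= avg[-1]:
        raise ValueError("target lies outside the map")
    for k in range(len(avg) - 1):
        if avg[k] <= avg_target <= avg[k + 1]:
            # Linear interpolation along segment k of the polygon.
            t = (avg_target - avg[k]) / (avg[k + 1] - avg[k])
            m = male[k] + t * (male[k + 1] - male[k])
            f = female[k] + t * (female[k + 1] - female[k])
            return m, f


def scan_positions(male, female, left, right, n_points):
    """n_points positions between the loci with indices left and right.

    Equal spacing on the sex-averaged scale is an assumption of this sketch.
    """
    avg_left = (male[left] + female[left]) / 2.0
    avg_right = (male[right] + female[right]) / 2.0
    step = (avg_right - avg_left) / (n_points + 1)
    return [point_on_polygon(male, female, avg_left + (i + 1) * step)
            for i in range(n_points)]


# Made-up map with three loci; coordinates in cM.
male_map = [0.0, 2.5, 10.0]
female_map = [0.0, 7.5, 12.0]
points = scan_positions(male_map, female_map, 0, 2, 3)
```

Because both coordinates increase strictly along the polygon, the sex-averaged coordinate also increases strictly, so each sex-averaged target corresponds to exactly one point on the polygon.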

We would like to mention that we have adopted the common practice of treating the differences between the sexes at the level of the genetic distances only, and not via sex-specific mapping functions between genetic distances and recombination frequencies. This means that the chiasma incidence, which is the link between the physical distance and the genetic distance, is allowed to be sex-specific, which in turn leads to sex-specific genetic distances. The phenomena described by the mapping function are chiasma interference and possibly chromatid interference (Bailey, 1961; Whitehouse, 1973). Because we use the same mapping function for males and females, chiasma and chromatid interference are assumed to be sex-independent. This approach is illustrated in Figure 2.
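A minimal Python sketch of this convention is given below: the same mapping function is applied to the male and to the female genetic distance. Haldane's mapping function is used purely for illustration, since the text does not state which function the program applies, and the function name `haldane_theta` is hypothetical. The distances are the 2.5 cM and 7.5 cM values used in the simulations.

```python
import math


def haldane_theta(distance_cm):
    """Recombination fraction for a genetic distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


# One mapping function for both sexes; only the genetic distances differ.
theta_male = haldane_theta(2.5)
theta_female = haldane_theta(7.5)
```

Any other mapping function could be substituted, as long as it is the same for both sexes.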


Fig. 2 Illustration of the relationship between physical distance (Mb = megabases) and genetic distance (cM = centiMorgans) and the relationship between genetic distance and recombination frequencies. Also shown is the different modelling of the biological phenomena under the aspect of sex differences.
In order to reduce the work caused by handling the recombination frequencies manually, we have implemented an option to read the coordinates of all markers of a certain map from a specified file. This functionality can read data from the Marshfield (Broman et al., 1998), the deCODE (Kong et al., 2002), the Nievergelt-Schork (Nievergelt et al., 2004), the Rutgers (Kong et al., 2004) or the Duffy map (Duffy, 2005). By inserting user-specified markers, it is possible to build one's own customized map. However, the markers read from a map must fulfill the condition that the genetic positions (female, male and average) of consecutive markers are consistent with their order in the map, i.e. if marker B is listed after marker A, then the female and male positions of marker B have to be greater than (and, for technical reasons, not equal to) the corresponding positions of marker A. All of the aforementioned maps contain markers whose positions are not consistent in this sense, since neighboring markers often have identical coordinates. In some of the maps, there are even markers for which the male or female genetic coordinate is smaller than the coordinate of the preceding marker. For this reason we have provided a functionality to correct the marker coordinates by adding a small value $\varepsilon$ in such cases. If the genetic coordinate for one sex happens to be smaller than or equal to the coordinate of the preceding marker in the map file used, the current coordinate is set to the previous coordinate plus $\varepsilon$. One should keep in mind that the resulting marker coordinates will be slightly different from those in the original map; however, these differences are marginal.
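The correction rule can be sketched in Python as follows. The function name, the default value of the increment and the example coordinates are hypothetical, and the sketch assumes that each coordinate is compared with the already corrected coordinate of the preceding marker, which the text does not state explicitly.

```python
def enforce_strictly_increasing(coords, eps=1e-4):
    """Return a copy of coords in which every value exceeds its predecessor.

    A coordinate that is smaller than or equal to the preceding (already
    corrected) coordinate is replaced by that coordinate plus eps.
    """
    fixed = []
    for c in coords:
        if fixed and c <= fixed[-1]:
            c = fixed[-1] + eps
        fixed.append(c)
    return fixed


# Made-up male and female coordinates (cM) with ties and one inversion.
male_raw = [0.0, 1.0, 1.0, 0.5, 2.0]
female_raw = [0.0, 3.0, 3.0, 3.2, 4.0]
male_fixed = enforce_strictly_increasing(male_raw, eps=0.01)
female_fixed = enforce_strictly_increasing(female_raw, eps=0.01)
```

The rule is applied separately to the male and female coordinates, so a marker may be shifted on one map and left unchanged on the other.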

Simulations
Simulations were carried out in order to investigate the differences between MOD scores calculated with sex-specific recombination frequencies and MOD scores obtained with sex-averaged recombination frequencies. We simulated two markers, either with a disease locus half-way between them (H1: alternative hypothesis) or without a disease locus (H0: null hypothesis of no linkage). The genetic distance between the two markers was defined as 2.5 cM for males and 7.5 cM for females (average 5 cM), resulting in a female:male ratio of 3. To study the effect of the average map distance, we also used 5 and 15 cM as male and female distances, respectively, giving an average of 10 cM. Each marker has four alleles with equal frequencies. Following the simulation design of Fingerlin et al. (2006), the disease allele frequency was fixed at 0.2. The penetrances were 0.05, 0.175 and 0.3 for 0, 1 and 2 copies of the disease allele, respectively. We simulated 10 000 replicates with the SIMULATE program (Terwilliger and Ott, 1993) under the null hypothesis and 5000 replicates with the FASTSLINK program (Cottingham et al., 1993) under the alternative. Each replicate consisted of 500 affected sib pairs. For the replicates simulated under the null hypothesis, we evaluated the percentage with a MOD score >1, >1.5, >2, >2.5 and >3 at the position half-way between the two markers. Under the alternative we computed the percentage of replicates with a maximum MOD score >1, >1.5, >2, >2.5 and >3 over positions defined by 20 equidistant intervals between the two markers, as well as the expected maximum MOD score (EMMOD). While all replicates were generated using the sex-specific map, the aforementioned quantities were calculated under the assumption of sex-specific distances and, for comparison, under the assumption of sex-averaged distances. Like Fingerlin et al. (2006), we also investigated the effect of missing parental genotype data and performed simulations as described above, but with either no maternal and complete paternal genotype data or no paternal and complete maternal genotype data.
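The summary statistics described above can be sketched in Python as follows. The function names and the toy scores are hypothetical and are not results of this study; the sketch only shows how exceedance percentages and a mean maximum score might be computed from per-replicate values.

```python
def exceedance_percentages(scores, thresholds=(1.0, 1.5, 2.0, 2.5, 3.0)):
    """Percentage of replicates whose score strictly exceeds each threshold."""
    n = len(scores)
    return {t: 100.0 * sum(1 for s in scores if s > t) / n for t in thresholds}


def expected_maximum_score(max_scores):
    """Mean of the per-replicate maximum scores (an EMMOD-type estimate)."""
    return sum(max_scores) / len(max_scores)


# Made-up maximum MOD scores of four replicates, for illustration only.
toy_scores = [0.2, 1.2, 1.7, 3.4]
toy_percentages = exceedance_percentages(toy_scores)
toy_emmod = expected_maximum_score(toy_scores)
```

Under the null hypothesis the same percentage function would be applied to the scores at the position half-way between the two markers, rather than to per-replicate maxima.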


    RESULTS
Application of GENEHUNTER-MODSCORE to the GAW 14 data
We have applied GENEHUNTER-MODSCORE with sex-specific recombination frequencies to the Collaborative Study on the Genetics of Alcoholism (COGA) dataset provided for Genetic Analysis Workshop 14. For a detailed description of the data and the analysis, and a more comprehensive discussion of the results not specific to the use of sex-specific recombination frequencies, see Strauch et al. (2005). Figure 3 displays the MOD score for chromosome 1 as calculated with sex-averaged (dark gray curve) and sex-specific recombination frequencies (light gray curve). Even though the curves are reasonably similar, there are some differences. Peak positions and heights are nearly preserved (160 cM), more pronounced (120 cM) or strongly reduced (57 cM).


Fig. 3 Comparison of the MOD score for chromosome 1 of the GAW14 data calculated with sex-averaged (dark gray curve) and sex-specific recombination frequencies (light gray curve). The position indicated on the x-axis corresponds to the average of the male and female positions in case of the calculation with the sex-specific map.
The computation time was ~32 h with an AMD Opteron 2 GHz processor for both the sex-specific and sex-averaged map; please see Appendix C of the Supplementary material for details regarding computation time issues.

Simulations
The main results of our simulations are summarized in Table 1. The complete results can be found in Appendix D of the Supplementary material. We found no noteworthy difference between the analysis with a sex-specific map and the analysis with a sex-averaged map when both parents are genotyped. When only mothers are genotyped, the probability under the null hypothesis that the MOD score exceeds 1 at the disease locus position is inflated with the sex-averaged map compared to the sex-specific map. When only fathers are genotyped, this probability is deflated. The same trend is seen for the maximum MOD score and the EMMOD calculated under the alternative hypothesis. As can be seen from Appendix D, the simulations with a 10 cM marker distance do not show qualitatively different results.


Table 1 Percent of replicates with MOD score >1 at the disease locus in case of the simulation under null hypothesis and percent of replicates with maximum MOD score >1 under the alternative hypothesis

 

    DISCUSSION
We have presented an algorithmic extension to GENEHUNTER-MODSCORE with which it is possible to calculate LOD, MOD or NPL scores with sex-specific maps and hence sex-specific recombination frequencies. This extension retains the concept of fixed and variable bits (Markianos et al., 2001), so that the computational efficiency achieved by this technique is preserved. In the parameter space of possible values for male and female genetic positions, the linkage scores are determined along a polygon that is defined by the positions of all markers in the map used, some of which are genotyped in the study under analysis and others not. The coordinates of both the genotyped and the ungenotyped markers, which are taken from a publicly available map, serve to define the ratio of male and female genetic distances. Because the positions of ungenotyped markers are used as well, the sex ratio of genetic distances is allowed to vary between different segments even if they are located within the same interval between two genotyped markers. With this method, the positions at which the scores are evaluated follow the biologically meaningful path. The method also avoids the need to raise significance criteria for linkage statistics, since female and male positions do not have to be varied independently of one another, and an additional parameter is thus avoided. The use of sex-specific recombination frequencies, in combination with the method for constructing the set of disease locus positions at which the linkage scores are determined, defines a consistent framework for using the recombination data of the markers, which results in better genetic modeling. At the same time, the functionality to read in the marker data automatically simplifies the practical handling of the data. To our knowledge, no other program offers this complete handling and modeling of marker locations.

Our simulations do not show any difference in type-I error or power between the use of sex-specific and sex-averaged maps when both parents are genotyped, but there are noticeable differences when either only mothers or only fathers are genotyped. These findings are in accordance with the results of Fingerlin et al. (2006). However, neither our results nor the findings of Fingerlin et al. agree with the results obtained by Daw et al. (2000). This is due to the fact that Daw et al. calculated simple parametric LOD scores that, at a given genetic position in the multipoint context, were not maximized over any free parameter. This is in contrast to both the non-parametric test statistics used by Fingerlin et al. and the MOD score employed here.

Williamson and Amos (1995) observe that misspecification of inter-marker distances interacts with a usually unavoidable misspecification of the disease model so as to lead to false positive linkage results. Consequently, it is particularly relevant to apply sex-specific recombination frequencies when examining complex diseases. Since sex-specific genetic maps are publicly available, we should proceed to use them, in order to exploit all available linkage information in the best possible way. This will aid in a more reliable discrimination between true and false linkage peaks.

In conclusion, with the algorithmic extension presented here, which has been implemented in the latest version of GENEHUNTER-MODSCORE, it is now possible to carry out parametric and non-parametric multipoint linkage analysis, including MOD-score calculations with and without imprinting, using sex-specific recombination frequencies. This results in a more adequate formulation of the genetic models underlying linkage analysis. This increased efficiency comes without an inferential penalty, and the critical values of the linkage test statistic defined for the case of a sex-averaged map need not be raised when a sex-specific map is employed. In conjunction with the use of sex-specific recombination frequencies, we have introduced a consistent method to determine the coordinates at which the linkage scores are to be calculated. Furthermore, we offer the user-friendly possibility of reading in the marker coordinates automatically from a publicly available genetic map instead of manually inserting the marker positions in the respective input files. At the same time, our new program version offers researchers access to the complete functionality of the GENEHUNTER program as well as to the extended capabilities of GENEHUNTER-MODSCORE.


    Appendix A: implementation of sex-specific recombination frequencies
 
A key step in the hidden Markov formalism employed in the calculation of Equation (2) is to multiply the transition matrix [Equation (3) or Equation (4), respectively] with a probability distribution over inheritance vectors. The details of this procedure are covered in Kruglyak and Lander (1998). Markianos et al. (2001) have demonstrated that the product of the transition matrix with the distribution over inheritance vectors can be reformulated by making use of fixed and variable bits. If the transition takes place between marker M1 and marker M2, then the k1 meioses with known outcome at M1 correspond to k1 fixed bits, and the k2 meioses with known outcome at M2 correspond to k2 fixed bits. The bits of an inheritance vector w at M2 are then reordered according to the following scheme. We first reorder the bits of the inheritance vectors at M1 so that the k1 fixed bits precede the remaining s − k1 variable bits, where s is the total number of meioses. The relative order within the fixed bits and within the variable bits is not changed. The same reordering is then applied to the inheritance vectors at M2, i.e. the bits at M2 are reordered according to which bits are fixed at M1, not according to which bits are fixed at M2. Accordingly, an inheritance vector w at M2 can be written as w = (y, x), where y comprises those bits of w that are at the positions of the fixed bits of the inheritance vectors at M1, and x comprises the bits of w that are at the positions of the variable bits at M1. With y0 denoting the fixed bits at M1 and θ the recombination frequency between markers M1 and M2, the aforementioned matrix-vector product can be written as follows (Markianos et al., 2001):

[Formula 5, not reproduced in this text version] (5)
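The reordering into fixed and variable bits can be illustrated with a minimal Python sketch. It assumes that the first two terms of Equation (5) have the usual per-meiosis form θ^d (1 − θ)^(k1 − d), where d is the number of fixed bits in which y differs from y0; the function names, the mask `fixed_at_m1` and the example values are hypothetical and are not taken from GENEHUNTER-MODSCORE.

```python
def split_fixed_variable(w, fixed_at_m1):
    """Reorder w so that the bits fixed at M1 come first; return (y, x).

    The relative order within the fixed bits and within the variable
    bits is preserved, as described in the text.
    """
    y = tuple(b for b, fixed in zip(w, fixed_at_m1) if fixed)
    x = tuple(b for b, fixed in zip(w, fixed_at_m1) if not fixed)
    return y, x


def fixed_bit_factor(y, y0, theta):
    """Assumed form of the fixed-bit terms: theta^d * (1 - theta)^(k1 - d)."""
    d = sum(a != b for a, b in zip(y, y0))  # recombinations among fixed bits
    return theta ** d * (1 - theta) ** (len(y0) - d)


# Hypothetical example: s = 5 meioses; positions 1 and 3 are fixed at M1.
fixed_at_m1 = (False, True, False, True, False)
w = (1, 0, 1, 1, 0)                      # inheritance vector at M2
y, x = split_fixed_variable(w, fixed_at_m1)
factor = fixed_bit_factor(y, y0=(0, 0), theta=0.1)
```

In this example k1 = 2, so y holds two bits and x holds the remaining s − k1 = 3 bits.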
The extension of the first two terms in Equation (5) to sex-specific recombination frequencies is completely analogous to the extension of the transition matrix in Equation (4). One only has to replace the total number of meioses s by the number of fixed bits k1, and the complete inheritance vectors v_j, v_(j+1) by the inheritance vectors consisting of the fixed bits y0 and y:

[Formula 6, not reproduced in this text version] (6)
θ_m and θ_f are the male and female recombination fractions, respectively. The third term in Equation (5), T^(s−k1) P_y0, is the matrix-vector product of the transition matrix with the distribution over inheritance vectors, taking only the s − k1 variable bits into account. This matrix-vector product can be calculated with the fast Fourier transform technique described by Kruglyak and Lander (1998). However, the resulting expressions are now extended owing to the use of sex-specific recombination frequencies. In particular, the Fourier transform of the transition matrix can be written as:

[Formula 7, not reproduced in this text version] (7)
w = (w_1, ..., w_(s−k1)): inheritance vector consisting of the s − k1 variable bits.

wgt_m(w): number of nonzero bits (weight) of w among the bits referring to male meioses.

wgt_f(w): number of nonzero bits (weight) of w among the bits referring to female meioses.
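How the two weights enter a Fourier transform can be checked numerically with a small sketch. It assumes that each male meiosis recombines independently with probability θ_m and each female meiosis with probability θ_f, and it uses an unnormalized Walsh-Hadamard transform over bit vectors. Under these assumptions the transform of the recombination kernel is (1 − 2θ_m)^wgt_m(w) · (1 − 2θ_f)^wgt_f(w). Whether this matches the normalization of Equation (7) cannot be confirmed from the text here, and all names below are hypothetical.

```python
from itertools import product


def kernel(u, sexes, theta_m, theta_f):
    """Probability of the recombination pattern u (1 = recombination)."""
    p = 1.0
    for bit, sex in zip(u, sexes):
        theta = theta_m if sex == "m" else theta_f
        p *= theta if bit else 1.0 - theta
    return p


def fourier_brute_force(w, sexes, theta_m, theta_f):
    """Unnormalized Walsh-Hadamard transform: sum_u (-1)^(u.w) * kernel(u)."""
    total = 0.0
    for u in product((0, 1), repeat=len(sexes)):
        sign = -1 if sum(a & b for a, b in zip(u, w)) % 2 else 1
        total += sign * kernel(u, sexes, theta_m, theta_f)
    return total


def fourier_closed_form(w, sexes, theta_m, theta_f):
    """(1 - 2*theta_m)^wgt_m(w) * (1 - 2*theta_f)^wgt_f(w)."""
    wgt_m = sum(b for b, s in zip(w, sexes) if s == "m")
    wgt_f = sum(b for b, s in zip(w, sexes) if s == "f")
    return (1 - 2 * theta_m) ** wgt_m * (1 - 2 * theta_f) ** wgt_f


# Hypothetical example with four variable bits: two male, two female meioses.
sexes = ("m", "f", "f", "m")
largest_gap = max(
    abs(fourier_brute_force(w, sexes, 0.1, 0.25)
        - fourier_closed_form(w, sexes, 0.1, 0.25))
    for w in product((0, 1), repeat=len(sexes))
)
```

The brute-force sum grows exponentially with the number of bits and serves only as a check; the closed form needs just the two weights.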

With these expressions we can calculate sex-specific linkage scores by retaining the concept of fixed and variable bits.
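As a consistency check on this factorization, the following sketch compares the full matrix-vector product with the product of a fixed-bit term and a variable-bit term for sex-specific recombination fractions. It assumes that the bits are already ordered with the k1 fixed bits first, that the distribution at M1 is nonzero only for vectors whose fixed bits equal y0, and that meioses recombine independently. The variable-bit term is summed directly rather than by a Fourier transform, and all names and numbers are hypothetical.

```python
from itertools import product


def transition(v, w, sexes, theta_m, theta_f):
    """Probability of moving from inheritance vector v to w, bit by bit."""
    p = 1.0
    for a, b, sex in zip(v, w, sexes):
        theta = theta_m if sex == "m" else theta_f
        p *= theta if a != b else 1.0 - theta
    return p


def full_product(w, prior, sexes, theta_m, theta_f):
    """(T P)(w) summed over all inheritance vectors with nonzero prior."""
    return sum(transition(v, w, sexes, theta_m, theta_f) * p
               for v, p in prior.items())


def factorized_product(w, prior_variable, y0, sexes, theta_m, theta_f):
    """Fixed-bit term times the product restricted to the variable bits."""
    k1 = len(y0)
    y, x = w[:k1], w[k1:]
    fixed_term = transition(y0, y, sexes[:k1], theta_m, theta_f)
    variable_term = sum(
        transition(xp, x, sexes[k1:], theta_m, theta_f) * p
        for xp, p in prior_variable.items()
    )
    return fixed_term * variable_term


# Hypothetical example: s = 4 meioses, k1 = 2 fixed bits with y0 = (1, 0).
sexes = ("m", "f", "m", "f")
y0 = (1, 0)
prior_variable = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
prior = {y0 + xp: p for xp, p in prior_variable.items()}

max_diff = max(
    abs(full_product(w, prior, sexes, 0.1, 0.25)
        - factorized_product(w, prior_variable, y0, sexes, 0.1, 0.25))
    for w in product((0, 1), repeat=len(sexes))
)
```

The two computations agree up to rounding error because the transition probability is a product over meioses, so the fixed bits contribute a factor that depends only on y and y0.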


    Acknowledgments
 
This work was supported by grant Str643/1-3 (project D2 of FOR423) of the German Research Foundation (D.F.G.). The MOD-score calculations have been performed on the MARC Cluster of the Philipps University (Marburg, Germany) and on the Sun Fire SMP Cluster of Aachen University (Aachen, Germany).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on May 12, 2006; revised on October 13, 2006; accepted on October 16, 2006

    REFERENCES
 

    Abecasis, G.R., et al. (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97–101.

    Bailey, N.T.J. (1961) Introduction to the Mathematical Theory of Genetic Linkage. Oxford University Press, UK.

    Broman, K.W., et al. (1998) Comprehensive human genetic maps: individual and sex-specific recombination. Am. J. Hum. Genet., 63, 861–869.

    Cottingham, R.W., Idury, R.M. and Schaffer, A.A. (1993) Fast sequential genetic linkage computation. Am. J. Hum. Genet., 53, 252–263.

    Daw, E.W., et al. (2000) Bias in multipoint linkage analysis from map misspecification. Genet. Epidemiol., 19, 366–380.

    Duffy. (2005) .

    Elston, R.C. and Stewart, J. (1971) A general method for the genetic analysis of pedigree data. Hum. Hered., 21, 523–542.

    Feenstra, B., et al. (2004) Using LOD scores to detect sex differences in male-female recombination fractions. Hum. Hered., 57, 100–108.

    Fishelson, M. and Geiger, D. (2002) Exact genetic linkage computations for general pedigrees. Bioinformatics, 18, 189–198.

    Fingerlin, T.E., et al. (2006) Using sex-averaged genetic maps in multipoint linkage analysis when identity-by-descent status is incompletely known. Genet. Epidemiol., 30, 384–396.

    Gudbjartsson, D.F., et al. (2000) Allegro, a new computer program for multipoint linkage analysis. Nat. Genet., 25, 12–13.

    Gudbjartsson, D.F., et al. (2005) Allegro version 2. Nat. Genet., 37, 1015–1016.

    Idury, R.M. and Elston, R.C. (1997) A faster and more general hidden Markov model algorithm for multipoint likelihood calculations. Hum. Hered., 47, 197–202.

    Nicolae, D.L. and Cox, N.J. (2002) MERLIN...and the geneticist's stone? Nat. Genet., 30, 3–4.

    Kong, A., et al. (2002) A high-resolution recombination map of the human genome. Nat. Genet., 31, 241–247.

    Kong, X., et al. (2004) A combined linkage-physical map of the human genome. Am. J. Hum. Genet., 75, 1143–1148.

    Kruglyak, L., et al. (1995) Rapid multipoint linkage analysis of recessive traits in nuclear families, including homozygosity mapping. Am. J. Hum. Genet., 56, 519–527.

    Kruglyak, L., et al. (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58, 1347–1363.

    Kruglyak, L. and Lander, E.S. (1998) Faster multipoint linkage analysis using Fourier transforms. J. Comput. Biol., 5, 1–7.

    Lander, E.S. and Green, P. (1987) Construction of multilocus genetic linkage maps in humans. Proc. Natl Acad. Sci. USA, 84, 2363–2367.

    Lander, E.S. and Lincoln, S.E. (1988) The appropriate threshold for declaring linkage when allowing sex-specific recombination rates. Am. J. Hum. Genet., 43, 396–400.

    Markianos, K., et al. (2001) Efficient multipoint linkage analysis through reduction of inheritance space. Am. J. Hum. Genet., 68, 963–977.

    Mattheisen, M. and Strauch, K. (2005) Distribution of MOD scores under no linkage. Genet. Epidemiol., 29, 268.

    Mukhopadhyay, N. and Weeks, D.E. (2003) Linkage analysis of adult height with parent of origin effects in the Framingham Heart Study. BMC Genet., 31, S76.

    Nievergelt, C.M., et al. (2004) Large-scale integration of human genetic and physical maps. Genome Res., 14, 1199–1205.

    Ott, J. (1999) Analysis of Human Genetic Linkage, 3rd edn. Johns Hopkins University Press, Baltimore, Maryland.

    Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.

    Strauch, K., et al. (2000) Parametric and nonparametric linkage analysis with imprinting and two-locus-trait models: application to mite sensitization. Am. J. Hum. Genet., 66, 1945–1957.

    Strauch, K., et al. (2003) How to model a complex trait. 1. General considerations and suggestions. Hum. Hered., 55, 202–210.

    Strauch, K. (2003) Parametric linkage analysis with automatic optimization of the disease model parameters. Am. J. Hum. Genet., 73, A2624.

    Strauch, K., et al. (2005) Linkage analysis of alcohol dependence using MOD scores. BMC Genet., 6, 162.

    Terwilliger, J.D., et al. (1993) Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet. Epidemiol., 10, 217–224.

    Whitehouse, H.L.K. (1973) The Chiasma Type Theory: Towards an Understanding of the Mechanism of Heredity. St. Martins Press, NY.

    Williams, T. and Kelley, C. (2004) Gnuplot: an interactive plotting program. Manual, version 4.0.

    Williamson, J.A. and Amos, C. (1995) Guess LOD approach: sufficient conditions for robustness. Genet. Epidemiol., 12, 163–176.

    Wu, R., et al. (2002) Linkage mapping of sex-specific differences. Genet. Res., 79, 85–96.

    Wu, C.C., et al. (2005) Linkage analysis of affected sib pairs allowing for parent-of-origin effects. Ann. Hum. Genet., 69, 113–126.

