Bioinformatics Advance Access originally published online on February 22, 2005
Bioinformatics 2005 21(10):2447-2455; doi:10.1093/bioinformatics/bti342
Mapping genome-genome epistasis: a high-dimensional model
Department of Statistics, University of Florida, Gainesville, FL 32611, USA
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: The proper development of any organ or tissue requires the coordinated expression of its underlying genes that can be located on different genomes present in an organism. For instance, each step in the development of seed for a higher plant is the consequence of gene interactions from the maternal, embryo and endosperm genomes.
Results: We present a multivariate statistical model for mapping quantitative trait loci (QTL) by incorporating two important aspects of seed development in plants: QTL interactions derived from different genomes (the maternal, embryo and endosperm) and genetic correlations among phenotypic traits expressed in different genome-specific tissues. This model, which has a high dimensionality, is constructed within the maximum-likelihood context based on a finite mixture model. The implementation of the expectation-maximization algorithm allows for the efficient estimation of QTL positions, their action and interaction effects and pleiotropic effects. The application of this high-dimensional model to a real rice dataset has validated its usefulness.
Conclusions: Our model was derived for self-pollinated plants, but it can be extended to cross-pollinated plants and to animals. With the burgeoning of genetic and genomic data, this high-dimensional model will have many implications for agricultural and evolutionary genetic research.
Availability: A package of software will be provided from the corresponding author upon request.
Contact: rwu{at}stat.ufl.edu
| INTRODUCTION |
|---|
Studies of genome-wide scans for quantitative trait loci (QTL) that determine phenotypic traits have received considerable attention in the past 15 years (Lander and Botstein, 1989; Zeng, 1994; Wu et al., 2002a). The aim of these studies was to understand the genetic architecture of quantitative variation for complex traits of agricultural, evolutionary and biomedical interest (reviewed in Mackay, 2001). The genetic principle behind these studies is the occurrence of recombination events between genetic loci when gametes are formed and transmitted from parents to offspring. Although statistical methods for QTL mapping were proposed originally on the basis of a bivariate approach that associates one gene with one trait, considerable attempts have been made to develop multivariate approaches for mapping multiple interacting QTL (reviewed in Carlborg and Haley, 2004) and multiple correlated phenotypic traits (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000). In this work, we address problems associated with intergenomic epistasis and multiple correlated traits that have not previously been addressed.
First, the genetic interaction between different genes, or epistasis, provides important fuel for creating novel quantitative genetic variation when an organism is forced to adapt to a new environment (Whitlock et al., 1995). The conventional concept of epistasis implies that the effect of an allele at one gene is affected by an allele at another gene on the same genome or individual (Falconer and Mackay, 1996). However, there is also another type of epistasis that occurs between different genes, each from a different genome or individual (Wolf, 2000; Wolf et al., 1998). Such genome-genome or individual-individual epistasis is believed to be an important force in maintaining genetic variation in fluctuating environments (Wolf et al., 2002) and to help select optimal life history strategies (Wolf, 2003). An excellent example of genome-genome epistasis is the coordinated regulation among the maternal, embryo and endosperm tissues in a developing seed (Walbot and Evans, 2003). The genetic mapping of genome-genome epistasis based on molecular markers is in its infancy. Cui et al. (2004) recently published a series of statistical models for detecting epistatic effects on embryo- or endosperm-specific traits between different QTL derived from the maternal, embryo and endosperm genomes in seed plants. These models take into account the genetic and developmental mechanisms for seed development and can be of great significance in the study of genetic control of seed traits aimed at improving grain production in crops with the aid of molecular biotechnologies.
Second, correlations between different biological traits are ubiquitous, with the pattern and degree of trait correlations thought to be the consequence of natural selection and evolution (Scheiner, 1993). Traditional correlation analysis deals with different traits from the same individual. But it is common for two different traits each from a different individual to be correlated. For example, maternal preferences for oviposition sites affect the survival rate and development of offspring in birds (Lloyd and Martin, 2004). In plants, the level of hormones released by endosperm is thought to guide embryo development (Chaudhury et al., 2001). Genetic mapping approaches for multiple traits capitalize on the information about interrelationships among different traits measured and, therefore, can affect the statistical power of QTL detection. Although a joint analysis of many traits does not necessarily lead to a higher power of detection due to an increased number of parameters being estimated, it has been shown that the statistical power to detect a QTL can be increased by including a few correlated traits. Such an increase in power has been demonstrated using regression methods (Knott and Haley, 2000), a maximum-likelihood method (Korol et al., 1995; Jiang and Zeng, 1995), and variance component models (Almasy et al., 1997). It is particularly favorable to utilize the correlated information when mapping QTL for low heritability traits that are correlated to a trait of higher heritability. Lund et al. (2003) documented several advantages of multitrait QTL mapping over a single trait analysis.
With the burgeoning recognition of the importance of genome-genome epistasis and genetic correlations between individual-specific traits, it is appealing to develop a multivariate statistical model for mapping QTL interactions that affect multiple correlated traits expressed on different individuals or genomes. This motivation stimulates us to develop a high-dimensional model for estimating and testing the gene action and interaction effects on individual-specific traits between the QTL from different genomes. This high-dimensional model was derived from a mixture-based likelihood model and implemented with the expectation (E)-maximization (M) algorithm (Dempster et al., 1977). Monte Carlo simulations under different sampling strategies were used to investigate the statistical behavior of our multivariate model. The successful detection of interactive QTL in a rice example validates the usefulness of this model.
| EXPERIMENTAL DESIGN |
|---|
Our model will be developed for a simple backcross, but can be extended to an F2 or other designs. Consider two homozygous inbred lines which are crossed to generate the heterozygous F1. Crossing the F1 to one of the two parents (say the homozygous recessive) leads to two different genotypes at each locus in the backcross. The progeny of the backcross can be obtained through self-pollination for autogamous species, such as rice and soybean or through outcrossing pollination for allogamous species, such as maize and animals.
The backcross is genotyped for a set of molecular markers to construct a genetic linkage map. As shown in Wu et al. (2002a), genotyping the diploid progeny of the backcross with the same set of markers can increase the power to map the QTL that are expressed in the progeny generation, such as the embryo and endosperm of the seed. Here, we suppose that the markers from both the backcross and its diploid progeny are available to characterize interactions between multiple QTL from different genomes. For animals, a genome-genome interaction may occur as a maternal-offspring interaction. For plants, the progeny (seeds) develop within the maternal sporophyte tissue after double fertilization of the gametophyte; hence there are potentially extensive genome-genome interactions. Double fertilization forms the diploid embryo by fusing the haploid egg with one of the sperm cells and the triploid endosperm by fusing the maternal homodiploid central cell with a second sperm cell (Chaudhury et al., 2001). Proper seed development requires the coordinated expression of the maternal, embryo and endosperm tissues (van Hengel et al., 1998; Opsahl-Ferstad et al., 1997). There has been a wealth of evidence for the genetic control of different genes from these three genomes over seed development (Chaudhury et al., 2001; Evans and Kermicle, 2001; Dilkes et al., 2002; Walbot and Evans, 2003). Therefore, for plants, the genome-genome interaction should include three possible types: maternal-embryo, maternal-endosperm and embryo-endosperm. For this reason, the models to characterize genome-genome interactions developed for plant systems will also cover those for animal systems.
If the backcross is assumed to be at generation t, then its progeny obtained through outcrossing pollination is viewed as generation t + 1. Let Pt and Qt+1 be QTL from the maternal (generation t) and embryo (generation t + 1) genomes, respectively, and let a third QTL come from the endosperm genome (generation t + 1). In generation t, there are two QTL genotypes at Pt, expressed as Ptpt and ptpt, whereas, in generation t+1, there are three QTL genotypes at Qt+1 in the embryo, expressed as Qt+1Qt+1, Qt+1qt+1 and qt+1qt+1, and four QTL genotypes at the triploid endosperm QTL, carrying three, two, one and zero capital alleles, respectively.
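The genotype counts above can be checked with a minimal single-locus sketch. It assumes standard double fertilization under self-pollination: the egg and the two central-cell nuclei carry the same maternal allele, and the two sperm cells of one pollen grain are identical. The function name `selfed_seed_genotypes` and the allele labels are illustrative only and do not come from the original model.

```python
from collections import Counter
from itertools import product

def selfed_seed_genotypes(maternal=("Q", "q")):
    """Enumerate joint (embryo, endosperm) genotypes at one locus when a
    diploid mother is self-pollinated, with all gametes equally likely."""
    counts = Counter()
    # The egg and the central cell derive from the same megaspore, so they
    # carry the same allele; both sperm derive from one pollen grain.
    for egg, sperm in product(maternal, repeat=2):
        embryo = "".join(sorted(egg + sperm))          # egg + sperm (diploid)
        endosperm = "".join(sorted(2 * egg + sperm))   # 2 maternal copies + sperm
        counts[(embryo, endosperm)] += 1
    total = sum(counts.values())
    return {genotype: c / total for genotype, c in counts.items()}

if __name__ == "__main__":
    for genotype, prob in sorted(selfed_seed_genotypes().items()):
        print(genotype, prob)
```

For a heterozygous mother this enumerates three embryo genotypes and four endosperm genotypes, and it also shows that, at a single locus, the embryo and endosperm genotypes of the same seed are not independent.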
The maternal-embryo interaction model
The two QTL from the maternal (Pt) and embryo genomes (Qt+1) form six across-generation QTL genotypes. Their genotypic values for a quantitative trait, denoted by µjtjt+1, where jtjt+1 stands for the genome-specific QTL genotypes in terms of different numbers of capital QTL alleles, are assigned as follows:
|
| (1) |
We treat the genetic map location of the QTL as missing data, to be inferred from known markers by the EM algorithm. The marker information provided differently by the backcross and its offspring will be combined for our mapping model. Assume that the maternal Pt is bracketed by two flanking markers genotyped from the backcross, and the offspring Qt+1 is bracketed by two flanking markers genotyped from the offspring. Let r, r1 and r2 be the recombination fractions between the two maternal markers, between the first maternal marker and the maternal QTL, and between the maternal QTL and the second maternal marker, respectively. The corresponding recombination fractions are denoted as s, s1 and s2 for the offspring markers and QTL. The conditional probabilities of maternal QTL genotypes given the maternal marker interval in the backcross can be expressed in terms of r, r1 and r2. Depending on the pollination type, we can also derive the conditional probabilities of embryo QTL genotypes in terms of s, s1 and s2, given the across-generation marker interval. Wu et al. (2002) and Cui et al. (2004) provided such conditional probabilities for self-pollinated plants. Similar procedures can be used to derive these conditional probabilities for cross-pollinated plants.
If the maternal marker interval is different from the offspring marker interval, the conditional probabilities of across-generation QTL genotypes given across-generation marker genotypes can be calculated as the product of QTL-specific conditional probabilities. If these two intervals are the same, i.e. the maternal and offspring QTL are located on the same interval, then the conditional probabilities of across-generation QTL genotypes should be derived independently (Cui et al., 2004). These conditional probabilities will be used for the test and estimation of the positions of the two interacting QTL.
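As a minimal sketch of the backcross part of this calculation, the code below gives the textbook conditional probabilities of the maternal QTL genotype given its two flanking markers, assuming no crossover interference. The 0/1 marker coding and the function names are illustrative assumptions. The offspring-generation probabilities, which depend on the pollination type, are not derived here and are simply passed in as a dictionary.

```python
def backcross_qtl_probs(r1, r2):
    """P(QTL genotype | flanking marker genotypes) in a backcross.

    Markers are coded 1 (heterozygous) or 0 (homozygous recessive).
    Returns {(left, right): (P(heterozygous QTL), P(homozygous QTL))}.
    Assumes no interference, so r = r1 + r2 - 2*r1*r2.
    """
    r = r1 + r2 - 2 * r1 * r2
    return {
        (1, 1): ((1 - r1) * (1 - r2) / (1 - r), r1 * r2 / (1 - r)),
        (1, 0): ((1 - r1) * r2 / r, r1 * (1 - r2) / r),
        (0, 1): (r1 * (1 - r2) / r, (1 - r1) * r2 / r),
        (0, 0): (r1 * r2 / (1 - r), (1 - r1) * (1 - r2) / (1 - r)),
    }

def joint_probs_different_intervals(maternal, offspring):
    """Product rule for QTL in different intervals.

    Both arguments map a QTL genotype label to its conditional probability.
    """
    return {(jm, jo): pm * po
            for jm, pm in maternal.items()
            for jo, po in offspring.items()}

if __name__ == "__main__":
    table = backcross_qtl_probs(0.1, 0.1)
    p_het, p_hom = table[(1, 1)]
    maternal = {"Pp": p_het, "pp": p_hom}
    # Hypothetical offspring probabilities, for illustration only.
    offspring = {"QQ": 0.25, "Qq": 0.5, "qq": 0.25}
    print(joint_probs_different_intervals(maternal, offspring))
```

The product rule applies only when the two QTL lie in different marker intervals; the same-interval case needs its own derivation, as noted above.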
The maternal-endosperm interaction model
Across-generation QTL genotypes for the maternal (Pt) and endosperm genomes include eight combinations between two maternal genotypes and four endosperm genotypes. The genotypic values of the maternal-endosperm QTL genotypes, µjtjt+1, can be assigned as follows:
|
| (2) |
where the endosperm QTL has an additive effect and two dominance effects, d(t+1)1 and d(t+1)2, due to the intra-locus interaction between QQ and q and between Q and qq, respectively; I' is the cross-generation maternal-additive x endosperm-additive epistatic effect; and the two remaining parameters are the across-generation maternal-additive x endosperm-dominant epistatic effects for d(t+1)1 and d(t+1)2, respectively.
Assume that a pair of flanking markers is used to map the endosperm QTL. Let s', s'1 and s'2 be the recombination fractions between the two markers, between the first marker and the QTL, and between the QTL and the second marker, respectively. The conditional probabilities of endosperm QTL genotypes given the across-generation maternal-embryo marker genotypes can be derived in terms of s', s'1 and s'2, depending on the type of pollination. These conditional probabilities for self-pollinated plants have been derived by some groups. The conditional probabilities for cross-pollinated plants can be similarly derived.
The embryo-endosperm interaction model
For the embryo (Qt+1) and endosperm QTL at the same generation t + 1, we have 12 joint QTL genotypes whose values, µjtjt+1, are expressed as
|
| (3) |
where the epistatic effects between embryo Qt+1 and the endosperm QTL comprise the embryo-additive x endosperm-additive and embryo-dominant x endosperm-additive effects, the embryo-additive x endosperm-dominant effects for d(t+1)1 and d(t+1)2, respectively, and the embryo-dominant x endosperm-dominant effects for d(t+1)1 and d(t+1)2, respectively. Similarly, the conditional probabilities of embryo-endosperm QTL genotypes given across-generation marker genotypes can be derived separately for two different cases in which the two QTL are located in the same interval or in different intervals. Such derivations will be different for self- and cross-pollinated systems.
As shown in Cui et al. (2004), the genetic effect parameter vector h1=(µ,at,at+1,dt+1,I,J) for the maternal-embryo interaction model, and the corresponding vectors for the maternal-endosperm and embryo-endosperm interaction models, can be estimated from the corresponding genotypic values, µjtjt+1, by solving a group of regular linear equations as contained in matrices (1)-(3). As can be seen below, we derive a closed-form solution for the EM algorithm to obtain the maximum-likelihood estimates (MLEs) of the genotypic values. Thus, the MLEs of the genetic effect parameters can be estimated accordingly.
| STATISTICAL METHOD |
|---|
Statistical model for multiple traits
Let us suppose there are three quantitative traits, one expressed in the maternal tissue (denoted by x), the second expressed in the embryo tissue (denoted by y) and the third expressed in the endosperm tissue (denoted by z). The three QTL from different genomes, Pt, Qt+1 and the endosperm QTL, interact through coordinated pathways to affect each of these three traits. The statistical models for the phenotypic values of the three traits affected by the hypothetical epistatic QTL are formulated for each of the three types of genome-genome interactions.
For the maternal-embryo interaction model, the bivariate phenotypes (xi,yi) for seed i in the backcross population can be expressed in terms of genotypic values as
![]() | (4) |
where the indicator variable for seed i and QTL genotype jtjt+1 is defined as 1 if seed i carries the maternal-embryo QTL genotype jtjt+1 and 0 otherwise; each QTL genotype jtjt+1 has one genotypic value for trait x and one for trait y; and the residual errors of the two traits follow a bivariate normal distribution with means zero and covariance matrix
![]() |
Equation (4) can be written, in matrix notation, as
![]() | (5) |
where mjtjt+1 is the vector of genotypic values of a joint maternal-embryo QTL genotype for the two traits and the residual term is the vector of residual effects of seed i.
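A minimal sketch of the bivariate model described above, written out under assumed notation: the indicator symbol $\xi$, the residual symbols $e^x_i$ and $e^y_i$, and the variance and correlation symbols $\sigma_x^2$, $\sigma_y^2$ and $\rho$ are chosen here for illustration and may differ from the authors' own. The indices $j_t \in \{0,1\}$ and $j_{t+1} \in \{0,1,2\}$ count capital alleles at the maternal and embryo QTL.

```latex
\begin{aligned}
x_i &= \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2}
       \xi_{i j_t j_{t+1}}\,\mu^{x}_{j_t j_{t+1}} + e^{x}_{i},\\
y_i &= \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2}
       \xi_{i j_t j_{t+1}}\,\mu^{y}_{j_t j_{t+1}} + e^{y}_{i},\\
\begin{pmatrix} e^{x}_{i}\\ e^{y}_{i}\end{pmatrix}
    &\sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\;
      \Sigma=\begin{pmatrix}\sigma_x^{2} & \rho\,\sigma_x\sigma_y\\
                            \rho\,\sigma_x\sigma_y & \sigma_y^{2}\end{pmatrix}\right),\\
\mathbf{u}_i &= \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2}
       \xi_{i j_t j_{t+1}}\,\mathbf{m}_{j_t j_{t+1}} + \mathbf{e}_i,
\qquad
\mathbf{u}_i=\begin{pmatrix}x_i\\ y_i\end{pmatrix},\;
\mathbf{m}_{j_t j_{t+1}}=\begin{pmatrix}\mu^{x}_{j_t j_{t+1}}\\ \mu^{y}_{j_t j_{t+1}}\end{pmatrix}.
\end{aligned}
```

The last line is the matrix form of the first two; it stacks the two traits into one vector per seed so that a single bivariate normal density can be used for each QTL genotype.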
For self-pollinated plants, the maternal parent receives no genes from other sources to generate its progeny. Thus, the gene segregation in the progeny would not lead to the variation of the maternal trait. To reflect this characteristic, the maternal-embryo interaction that occurs across generations should be modeled with the constraints
![]() | (6) |
[see Matrix (1)].
Similarly, we can formulate a statistical model for the maternal-endosperm interaction, except that there are four triploid QTL genotypes at the endosperm QTL. But the embryo-endosperm interaction model will be different because such an interaction occurs within the same generation, in which the embryo (y) and endosperm (z) traits are each also affected by a QTL from the opposite genome. The bivariate model for phenotypic traits (y,z) can be expressed as
![]() | (7) |
where the model involves the indicator for the embryo-endosperm QTL genotype, the vector of genotypic values of a joint embryo-endosperm QTL genotype and the vector of residual effects of seed i.
Bivariate mixture model
Finite mixture models are a type of density model that comprises a number of component functions, usually Gaussian. These component functions are combined to provide a multimodal density. Gaussian mixture models can be employed to model genotypic segregation of specific genetic factors that determine quantitative traits. According to mixture models, each observation is assumed to have arisen from one of a known or unknown number of components (QTL genotypes), each component being modeled by a multivariate normal distribution density. Under the maternal-embryo epistasis model, the bivariate likelihood function of phenotypic traits (u) and marker data based on mixture models is expressed as
![]() | (8) |
where the mixture proportions are the conditional (or prior) probabilities of maternal-embryo QTL genotype jtjt+1 given a particular across-generation marker genotype for seed i and m = {mjtjt+1} is the vector of genotypic means for two traits that follow a bivariate normal distribution with mean mjtjt+1 and the residual covariance matrix. With the knowledge about conditional probabilities and genotypic values, we can construct similar mixture-based likelihood functions for the maternal-endosperm and embryo-endosperm interaction models. We provide a procedure for estimating the parameters contained in the likelihood functions.
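For orientation, a finite-mixture likelihood of the kind described here can be sketched as follows. The symbols $\Omega$ (all unknown parameters), $\pi_{j_t j_{t+1}\mid i}$ (prior genotype probability for seed $i$ given its markers), $\mathbf{M}$ (marker data) and $\Sigma$ (residual covariance matrix) are assumed notation for this sketch, and $n$ is the number of seeds.

```latex
\begin{aligned}
L(\Omega \mid \mathbf{u}, \mathbf{M})
  &= \prod_{i=1}^{n}\;\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2}
     \pi_{j_t j_{t+1}\mid i}\; f_{j_t j_{t+1}}(\mathbf{u}_i),\\
f_{j_t j_{t+1}}(\mathbf{u}_i)
  &= \frac{1}{2\pi\,|\Sigma|^{1/2}}
     \exp\!\left[-\tfrac12\,
     (\mathbf{u}_i-\mathbf{m}_{j_t j_{t+1}})^{\mathsf T}\,
     \Sigma^{-1}\,
     (\mathbf{u}_i-\mathbf{m}_{j_t j_{t+1}})\right],\\
\log L(\Omega \mid \mathbf{u}, \mathbf{M})
  &= \sum_{i=1}^{n}\log\!\left[\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2}
     \pi_{j_t j_{t+1}\mid i}\; f_{j_t j_{t+1}}(\mathbf{u}_i)\right].
\end{aligned}
```

Each seed contributes a weighted sum over the six maternal-embryo QTL genotypes, with weights fixed by its marker genotype and the hypothesized QTL positions.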
The EM algorithm
Conditional probabilities are a function of the recombination fractions between QTL and their flanking markers and therefore can provide the information about QTL locations. Mean vectors and the covariance matrix are quantitative genetic parameters associated with the genetic effects of QTL. Let the unknown parameters comprise the conditional probabilities, the mean vectors m and the covariance matrix. We implement the EM algorithm to obtain their MLEs. The log-likelihood function of Equation (8) for the maternal-embryo interaction model is given by
![]() | (9) |

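with derivatives with respect to each unknown parameter that can be written in terms of
![]() | (10) |
which can be interpreted as the posterior probability that seed i carries QTL genotype jtjt+1. The EM algorithm is implemented with the parameter set expanded to include these posterior probabilities. Conditional on the posterior probabilities, we solve for the zeros of the derivative of the log-likelihood with respect to the unknown parameters to get our estimates of the parameters.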
In the E-step, the prior conditional probabilities of the QTL genotypes given the marker genotypes and the normal distribution function are used to calculate the matrix of posterior probabilities. In the M-step, the calculated posterior probabilities are used to solve the unknown parameters using
![]() | (11) |
![]() | (12) |
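The E- and M-steps can be illustrated with a minimal sketch of one EM iteration for a bivariate normal mixture with known, seed-specific prior probabilities and a common covariance matrix. This is the standard textbook update, not necessarily the authors' exact formulas: it does not impose the constraints in (6), and the names `bvn_pdf`, `em_step` and `log_lik` are illustrative.

```python
import math

def bvn_pdf(u, m, S):
    """Bivariate normal density at u=(x, y), mean m, covariance S=(sxx, sxy, syy)."""
    sxx, sxy, syy = S
    det = sxx * syy - sxy * sxy
    dx, dy = u[0] - m[0], u[1] - m[1]
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def log_lik(data, priors, means, S):
    """Mixture log-likelihood; priors[i][j] = P(genotype j | markers of seed i)."""
    return sum(
        math.log(sum(p[j] * bvn_pdf(u, means[j], S) for j in range(len(means))))
        for u, p in zip(data, priors))

def em_step(data, priors, means, S):
    """One EM iteration; returns (new means, new covariance, posteriors)."""
    n, k = len(data), len(means)
    # E-step: posterior probability of each QTL genotype for each seed.
    post = []
    for u, pri in zip(data, priors):
        w = [pri[j] * bvn_pdf(u, means[j], S) for j in range(k)]
        tot = sum(w)
        post.append([wj / tot for wj in w])
    # M-step: posterior-weighted genotypic means for both traits.
    new_means = []
    for j in range(k):
        wsum = sum(post[i][j] for i in range(n))
        new_means.append(tuple(
            sum(post[i][j] * data[i][d] for i in range(n)) / wsum
            for d in (0, 1)))
    # M-step: pooled residual covariance matrix around the new means.
    sxx = sxy = syy = 0.0
    for i, u in enumerate(data):
        for j in range(k):
            dx, dy = u[0] - new_means[j][0], u[1] - new_means[j][1]
            sxx += post[i][j] * dx * dx
            sxy += post[i][j] * dx * dy
            syy += post[i][j] * dy * dy
    return new_means, (sxx / n, sxy / n, syy / n), post
```

In use, `em_step` would be called repeatedly at a fixed pair of hypothesized QTL positions until `log_lik` stops increasing. A genotype whose prior probability is zero for every seed would need special handling, which this sketch omits.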
In the procedure described above for the EM algorithm, we treated the positions of QTL as known parameters, although their MLEs can also be obtained through iterative steps. We can use a grid approach to estimate the QTL positions. By hypothesizing a pair of embryo and endosperm QTL every 2 cM at marker intervals, we can draw the landscape of log-likelihood test statistics throughout the entire genome. The positions corresponding to the peak of the landscape across a linkage group are the MLEs of the QTL positions.
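The grid approach can be sketched as follows. The function `llr` stands in for whatever routine fits the model at a hypothesized pair of positions and returns the test statistic; it is a placeholder supplied by the caller, and the choice to skip the marker positions themselves is an assumption of this sketch.

```python
def grid_positions(markers_cM, step=2.0):
    """Test positions every `step` cM strictly inside each marker interval.

    Returns a list of (interval index, position in cM).
    """
    grid = []
    for k in range(len(markers_cM) - 1):
        left, right = markers_cM[k], markers_cM[k + 1]
        pos = left + step
        while pos < right - 1e-9:
            grid.append((k, pos))
            pos += step
    return grid

def scan_pair(markers_cM, llr, step=2.0):
    """Evaluate llr(pos1, pos2) over all grid pairs and return the peak
    as (max LLR, position of first QTL, position of second QTL)."""
    grid = grid_positions(markers_cM, step)
    best = None
    for _, p1 in grid:
        for _, p2 in grid:
            value = llr(p1, p2)
            if best is None or value > best[0]:
                best = (value, p1, p2)
    return best

if __name__ == "__main__":
    # Toy profile peaking at 12 cM and 46 cM, for illustration only.
    toy = lambda p1, p2: -((p1 - 12) ** 2 + (p2 - 46) ** 2)
    print(scan_pair([0, 20, 40, 60, 80], toy))
```

A two-dimensional scan costs the square of the number of grid points per linkage group, so in practice the step size trades resolution against computing time.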
The MLEs of the QTL positions and effects under the maternal-endosperm and embryo-endosperm epistasis models can be similarly derived. The QTL effects are specified differently among these three models, depending on the dosage of QTL alleles (Table 1). As in general QTL mapping models, the proportion of the total variance explained by each QTL from a different genome can be calculated for each trait.
Hypothesis testing
A number of statistical hypothesis tests can be performed for the underlying parameters of interest. The presence of the QTL from different genomes with joint effects on two quantitative traits expressed in different tissues can be tested by a log-likelihood ratio (LLR) test statistic calculated under the full model (assuming that there are such QTL) and the reduced model (assuming that there is no QTL). Under standard theory, the LLR is asymptotically χ²-distributed with degrees of freedom equal to the number of unknown parameters being tested. For a mixture model such as ours, this asymptotic result may not hold because the usual regularity conditions can fail (McLachlan and Peel, 2000). The critical threshold value for declaring the existence of the tested QTL is therefore calculated empirically on the basis of permutation tests (Churchill and Doerge, 1994). After the existence of QTL from different genomes is tested, we can test the additive and dominant QTL effect from a particular genome and additive x additive, additive x dominant, dominant x additive and dominant x dominant epistatic effects derived from two different genomes. Our model allows for testing the effects of specific QTL on individual traits, although, for our experimental design, different genome-genome interaction models characterize different types of genetic effects. All these effect-specific tests are performed by implementing the EM algorithm and the critical value for declaring significance can be obtained empirically through simulation studies.
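A minimal sketch of the LLR statistic and of a permutation threshold in the spirit of Churchill and Doerge (1994) follows. The callback `max_llr` is a placeholder for a full genome scan that returns the maximum LLR for a given pairing of phenotypes and marker data; its name and signature, like the other names here, are assumptions of this sketch.

```python
import math
import random

def llr_statistic(loglik_reduced, loglik_full):
    """Log-likelihood ratio test statistic, -2 * (log L0 - log L1)."""
    return -2.0 * (loglik_reduced - loglik_full)

def permutation_threshold(phenotypes, markers, max_llr,
                          n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from permuted phenotypes.

    Phenotype records are shuffled relative to the marker records, which
    breaks any marker-trait association while keeping the correlation
    between traits measured on the same seed.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_llr(shuffled, markers))
    null_max.sort()
    # Index of the empirical (1 - alpha) quantile of the null maxima.
    idx = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return null_max[idx]
```

An observed genome-wide maximum LLR above the returned value is declared significant at level `alpha`. Because every permutation repeats the whole scan, the number of permutations dominates the computing cost.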
| A WORKED EXAMPLE |
|---|
The newly developed model was used to analyze published data on the endosperm in rice (Tan et al., 1999). The F1 heterozygote between two rice inbred lines, ZS97 and MH63, was self-crossed for 9 generations to produce 241 recombinant inbred lines (RILs) for high-resolution genetic mapping of genes influencing endosperm traits. These RILs that are homozygous for the alternative alleles were genotyped for 221 polymorphic markers distributed throughout the genome to construct a molecular linkage map composed of 12 rice chromosomes. These RILs as the female parent were backcrossed toward one original inbred line, ZS97, as the male parent to generate a backcross population containing 241 plants. All the backcross plants were evaluated for gel consistency in their endosperm tissues in two successive years (1999 and 2000) to determine any major QTL segregating in this material.
Because of the nature of this pedigree, we make some modifications to our general embryo-endosperm model to identify interacting QTL on embryo and endosperm tissues. First, the conditional probabilities that suit this pedigree are derived to predict the embryo-endosperm QTL genotypes based on the markers collected in the embryo. Second, in this design, the number of embryo-endosperm QTL genotypes is reduced to 4 and, thus, the genetic effects that can be estimated are the additive effect of embryo Qt+1 (at+1), the additive effect of the endosperm QTL and the additive x additive epistatic effect between these two QTL. Third, our model was originally developed to analyze the phenotypes expressed in the embryo and endosperm, but the data for this design were collected from the endosperm in two different years. According to Falconer (1952), the same trait measured in different years can be viewed as different traits.
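With four joint genotypes and four effects (overall mean, two additive effects and one additive x additive effect), the genetic effects follow from the genotypic values by solving a small linear system. The sketch below assumes a symmetric +1/-1 coding of the two genotypes at each QTL; this coding and the function names are illustrative assumptions, and the paper's own parameterization may differ.

```python
def values_from_effects(mu, a_embryo, a_endo, epistasis):
    """Genotypic values (mu11, mu10, mu01, mu00) under a +1/-1 coding,
    where the first index refers to the embryo QTL and the second to
    the endosperm QTL."""
    return (mu + a_embryo + a_endo + epistasis,
            mu + a_embryo - a_endo - epistasis,
            mu - a_embryo + a_endo - epistasis,
            mu - a_embryo - a_endo + epistasis)

def effects_from_values(mu11, mu10, mu01, mu00):
    """Invert the coding above: returns (mu, a_embryo, a_endo, epistasis)."""
    mu = (mu11 + mu10 + mu01 + mu00) / 4
    a_embryo = (mu11 + mu10 - mu01 - mu00) / 4
    a_endo = (mu11 - mu10 + mu01 - mu00) / 4
    epistasis = (mu11 - mu10 - mu01 + mu00) / 4
    return mu, a_embryo, a_endo, epistasis

if __name__ == "__main__":
    values = values_from_effects(10.0, 2.0, 1.0, 0.5)
    print(values, effects_from_values(*values))
```

Because the mapping is linear and invertible, plugging the MLEs of the genotypic values into `effects_from_values` gives the MLEs of the effects, separately for each trait.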
The phenotypic correlation between endosperm gel consistency measured in two different years is 0.68, suggesting that some common genetic basis is shared over years. A genome-wide scan was performed to detect the existence and distribution of interacting QTL throughout the entire genome. Significant joint genetic effects were detected between two QTL on chromosomes 6 and 8. The maximum LLR value throughout the genome is 270.9, markedly larger than the genome-wide critical threshold of 30.5, empirically obtained from permutation tests at the 0.005 significance level. One of the detected significant QTL is located at 12.0 cM from the first marker on chromosome 6 of the embryo genome, whereas the second QTL is located at 29.4 cM from the first marker on chromosome 8 of the endosperm genome. The embryo QTL is located at a candidate gene, Waxy, that is associated with a critical step of amylose biosynthesis (Okagaki and Wessler, 1988), which lends support to our model.
We estimated the additive effect, at+1, of the embryo QTL, the additive effect of the endosperm QTL and their epistatic effect on gel consistency in two different years (Table 1). Further hypothesis tests were performed to assess the significance of the additive and epistatic genetic effects. The LLRs for testing the significance of these effect parameters suggest that the additive effect of the embryo QTL is highly significant, whereas the additive effect of the endosperm QTL and the additive x additive effect between the two QTL are significant, but at lower levels.
In this example, we can use our model to test how genetic effects are expressed differently from year to year. If the genetic effect of a QTL is year-dependent, then this QTL is thought to display a significant genotype x year interaction. Figure 1 illustrates the non-parallel changes of the four joint embryo-endosperm QTL genotypes across different years for gel consistency in the endosperm. The LLR test for the year-dependent non-parallel response suggests that there are significant QTL x year interactions (P < 0.0001). Further tests indicate that the additive effects of the QTL from the two genomes are expressed differently between the two years studied (P = 6.42 x 10^-12 for the embryo QTL and P = 1.98 x 10^-6 for the endosperm QTL; Table 1). The additive x additive epistatic effect between the embryo and endosperm QTL is also different between the two years (P = 0.005). These pieces of information obtained from data analyses by our model are fundamental to the design of crop breeding aimed at improving high-quality starch in rice.
| MONTE CARLO SIMULATION |
|---|
We carried out a series of simulation studies to examine the statistical properties of our genome-genome models by focusing on the epistatic model from the embryo and endosperm genomes. A similar statistical behavior should hold for the other two epistatic models, maternal-embryo and maternal-endosperm. Our simulation studies aim to examine the model performance under different situations when heritability, sample size and QTL location change. Five equidistant markers are simulated from the embryo population and are ordered on a linkage group with a length of 80 cM. The Haldane map function was used to convert the map distance into the recombination fraction. For simplicity, we use two traits to achieve our goals. Three different combinations of heritability between two traits, (0.1, 0.1), (0.1, 0.4) and (0.4, 0.4), and two different sample sizes (200, 400) were used.
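The Haldane map function used for this conversion is r = (1 - exp(-2d))/2, with d in Morgans. A small sketch follows; the function names are illustrative.

```python
import math

def haldane_r(distance_cM):
    """Recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))

def haldane_cM(r):
    """Inverse map function: map distance in centimorgans for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)

if __name__ == "__main__":
    # Five equidistant markers on 80 cM give four 20 cM intervals.
    print(round(haldane_r(20.0), 4))
```

With five equidistant markers on 80 cM, adjacent markers are 20 cM apart, which corresponds to a recombination fraction of about 0.165 under this function.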
Suppose there are two different putative QTL on the embryo and endosperm genomes. Both the embryo (Qt+1) and endosperm QTL are assumed to pleiotropically affect two traits, one expressed in the embryo (y) and the other expressed in the endosperm (z). The two QTL could be either linked together and located on the same marker interval or located on different marker intervals. The phenotypic values for each seed were simulated according to a bivariate normal distribution with different joint QTL genotypic values, determined by the effect parameters: the overall mean (µ), the additive effect of Qt+1 (at+1), the additive effect of the endosperm QTL and the additive x additive epistatic effect between the two QTL for each trait, y and z, together with the residual variances and correlation.
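One way to simulate such phenotypes is sketched below. It assumes that heritability is defined as genetic variance over total (genetic plus residual) variance, and it draws correlated residuals from two independent standard normals. The function names and arguments are illustrative, not the authors' simulation code.

```python
import math
import random

def residual_sd(genetic_var, h2):
    """Residual standard deviation implied by a heritability h2,
    assuming h2 = genetic_var / (genetic_var + residual_var)."""
    return math.sqrt(genetic_var * (1.0 - h2) / h2)

def simulate_pair(mean_y, mean_z, sd_y, sd_z, rho, rng):
    """Draw one (y, z) phenotype pair around the genotypic values
    (mean_y, mean_z) with residual correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    e_y = sd_y * z1
    e_z = sd_z * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return mean_y + e_y, mean_z + e_z

if __name__ == "__main__":
    rng = random.Random(42)
    sd = residual_sd(genetic_var=1.0, h2=0.4)
    print([simulate_pair(10.0, 5.0, sd, sd, 0.5, rng) for _ in range(3)])
```

In a full simulation, the genotypic values passed as `mean_y` and `mean_z` would be chosen per seed according to its simulated joint QTL genotype.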
Tables 2 and 3 give the hypothesized values and MLEs of the QTL effect parameters for each trait, as well as the square roots of the mean squared errors used to evaluate the precision and accuracy of the parameter estimation, under different simulation schemes. In general, our model can provide reasonable estimates of the parameters, with estimation precision increasing with heritability and sample size. The position estimates for QTL located in the same interval (Table 3) were less precise than those for QTL located in different intervals (Table 2). This problem can be mitigated by increasing the density of mapped markers, which reduces the probability that two QTL are located in the same interval.
Our model has an excellent capacity to detect epistatically interacting embryo and endosperm QTL effects. In all cases of different sample sizes and heritabilities, the maximum values of the LLR landscapes from 100 simulation replicates are all beyond the critical thresholds at the 0.001 significance level determined from 1000 permutation tests for the simulated data. Furthermore, there is reasonable estimation precision for the additive x additive genetic effects even when the heritability is at a modest level.
| DISCUSSION |
|---|
We have proposed a general statistical framework for simultaneously mapping multiple correlated traits expressed in different genome-specific tissues. Different from previous multitrait QTL mapping (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000; Evans, 2002; Lund et al., 2003), our model framework implements interactions between multiple QTL located on different genomes. It has been well recognized that the coordinated expression of genes from different genomes is essential for the proper development of organs. For example, in higher plants, support and nourishment of embryo and endosperm tissues by the maternal tissue is fundamental to proper seed development (Chaudhury et al., 2001; Evans and Kermicle, 2001; Dilkes et al., 2002; Walbot and Evans, 2003).
The current literature has well established the belief that multiple correlated traits can add information to each other and, therefore, multitrait linkage analysis can give rise to more precise inferences about the position and effects of pleiotropic QTL affecting multiple traits, as compared to single-trait analyses (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000; Evans, 2002; Wu et al., 2002c; Lund et al., 2003). Somewhat equivalent to the role of repeated measurements, information from correlated traits can reduce the effect of error variance, thus making it easier (more powerful) to detect QTL. Not only is the power of QTL detection increased, but also the estimation of the QTL map position is more precise. The model proposed in this paper deals with a different type of trait correlation that occurs between different individuals connected through coherent pathways. The best example is the impact of the growth vigor of a plant on its seed development by supplying adequate nutrients. In light of the consideration of the coordinated expression of traits owing to genes and development, our model, which can be viewed as high-dimensional, should be able to produce results that are closer to biological realism than those without such a solid developmental basis of phenotypic traits.
The statistical behavior of our high-dimensional model has been carefully investigated through computer simulation. The model has been found to provide reasonable power and estimation of interactive QTL from the embryo and endosperm genomes in a range of trait heritabilities and sample sizes. Nevertheless, the best validation for our model may be the successful detection of significant QTL that exert considerable effects on an endosperm trait measured in two consecutive years. These two annual measurements can be viewed as two different traits (Falconer, 1952). Previous approaches for endosperm mapping are purely based on the triploid inheritance of the endosperm (Wu et al., 2002a; Wu et al., 2002b; Xu et al., 2003; Kao, 2004). Our model has the power to identify interactive QTL from the embryo and endosperm genomes. Using our high-dimensional model, both the embryo and endosperm genomes were detected to harbor QTL for gel consistency in rice, with the embryo QTL located almost at the same position as the Waxy gene on the short arm of chromosome 6 (Terada et al., 2002). The Waxy gene is known to influence a major step in amylose synthesis in the endosperm for many grasses including maize and rice. Our bivariate mapping model also has the power to discern how genetic effects of the embryo and endosperm QTL are different across years. Whereas the embryo QTL triggers a large effect on gel consistency, a significant additive effect x interaction year of the endosperm QTL suggests that this QTL can modify the endosperm trait to make seed development better adapted to a year-to-year environmental change. Beyond traditional single trait mapping, our high-dimensional mapping model can detect the interaction for gel consistency between the additive x additive epistatic effect and year of interaction. Further functional analysis of these detected embryo and endosperm QTL will accelerate their usefulness to improve the quality and quantity of rice grains.
Our model was derived for a plant system that reproduces by self-pollination, and it can be extended in several ways. First, by incorporating the appropriate gene segregation patterns into the mixture-based likelihood function, the model can be modified to map genome–genome interactive QTL in cross-pollinated systems. Such a modified model will also be useful for animals, in which birth weight is influenced by the uterine environment through the coordinated expression of maternal and offspring QTL. Second, a mature cereal plant contains three sets of genomes: the maternal, embryo and endosperm. The current model allows for interactions between any two of these genomes. It is crucial to extend it to triple-genome interactions among all three. With such a triple-interaction model, we can better understand the network of gene expression and regulation during seed development.
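The first extension can be pictured with a stripped-down mixture likelihood. The sketch below is illustrative and much simpler than the model in this paper: it uses one trait, one QTL, and fixed genotype frequencies, whereas an interval-mapping model would use frequencies conditional on each individual's flanking markers. It shows only that the segregation pattern enters the likelihood through the mixing proportions, so a different mating design changes those proportions and leaves the rest of the function untouched. All names and numerical values are hypothetical.

```python
import math

# Illustrative sketch only; a univariate, single-QTL simplification.
# Assumption: genotype frequencies are fixed constants here, whereas a
# real mapping model would condition them on marker genotypes.

F2_SELFING = (0.25, 0.5, 0.25)  # QQ, Qq, qq after selfing an F1
BACKCROSS = (0.5, 0.5)          # Qq, qq in a backcross


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(phenotypes, genotype_probs, genotype_means, sd):
    """Log-likelihood of a finite normal mixture over QTL genotypes."""
    if len(genotype_probs) != len(genotype_means):
        raise ValueError("one mean is required per genotype")
    if abs(sum(genotype_probs) - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    total = 0.0
    for y in phenotypes:
        density = sum(
            p * normal_pdf(y, m, sd)
            for p, m in zip(genotype_probs, genotype_means)
        )
        total += math.log(density)
    return total


if __name__ == "__main__":
    data = [-0.8, 0.1, 0.3, 1.4]  # hypothetical phenotypes
    print(mixture_loglik(data, F2_SELFING, (1.0, 0.0, -1.0), 1.0))
    print(mixture_loglik(data, BACKCROSS, (0.5, -0.5), 1.0))
```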
| Acknowledgments |
|---|
We thank the two anonymous referees for their constructive comments on this manuscript. This work is supported by an Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259) and a University of South Florida Biodefense grant (7222061-12) to R.W. We thank Drs Jianguo Wu and Chunhai Shi at Zhejiang University for providing molecular marker and phenotypic data. The publication of this manuscript was approved as Journal Series No. R-10584 by the Florida Agricultural Experiment Station.
Received on December 20, 2004; revised on January 25, 2005; accepted on February 17, 2005
| REFERENCES |
|---|
|
|
|---|
Almasy, L., et al. (1997) Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet. Epidemiol., 14, 953–958.
Chaudhury, A.M., et al. (2001) Control of early seed development. Ann. Rev. Cell Dev. Biol., 17, 677–699.
Churchill, G.A. and Doerge, R.W. (1994) Empirical threshold values for quantitative trait mapping. Genetics, 138, 963–971.
Cui, Y.H., et al. (2004) Mapping quantitative trait locus interactions from the maternal and offspring genomes. Genetics, 167, 1017–1026.
Dempster, A.P., et al. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B, 39, 1–38.
Dilkes, B.P., et al. (2002) Genetic analyses of endoreduplication in Zea mays endosperm: evidence of sporophytic and zygotic maternal control. Genetics, 160, 1163–1177.
Evans, D.M. (2002) The power of multivariate quantitative-trait loci linkage analysis is influenced by the correlation between the variables. Am. J. Hum. Genet., 70, 1599–1602.
Evans, M.M.S. and Kermicle, J.L. (2001) Interaction between maternal effect and zygotic effect mutations during maize seed development. Genetics, 159, 303–315.
Falconer, D.S. (1952) The problem of environment and selection. Am. Nat., 86, 293–298.
Falconer, D.S. and Mackay, T.F.C. (1996) Introduction to Quantitative Genetics, 4th edn. Longmans Green, Harlow, Essex, UK.
Jiang, C. and Zeng, Z.-B. (1995) Multiple trait analysis of genetic mapping of quantitative trait loci. Genetics, 140, 1111–1127.
Kao, C.-H. (2004) Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics, 167, 1987–2002.
Knott, S.A. and Haley, C.S. (2000) Multitrait least squares for quantitative trait loci detection. Genetics, 156, 899–911.
Korol, A.B., et al. (1995) Interval mapping of quantitative trait loci employing correlated trait complexes. Theor. Appl. Genet., 92, 998–1002.
Lander, E.S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185–199.
Lloyd, D.J. and Martin, T.E. (2004) Nest-site preference and maternal effects on offspring growth. Behavioral Ecology, 15, 816–823.
Lund, M.S., et al. (2003) Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics, 163, 405–410.
Mackay, T.F.C. (2001) Quantitative trait loci in Drosophila. Nat. Rev. Genet., 2, 11–20.
McLachlan, G.J. and Peel, D. (2000) Finite Mixture Models. Wiley, New York.
Okagaki, R.J. and Wessler, S.R. (1988) Comparison of non-mutant and mutant waxy genes in rice and maize. Genetics, 120, 1137–1143.
Opsahl-Ferstad, H.G., et al. (1997) ZmEsr, a novel endosperm-specific gene expressed in a restricted region around the maize embryo. Plant J., 12, 235–246.
Scheiner, S.M. (1993) Genetics and evolution of phenotypic plasticity. Ann. Rev. Ecol. Syst., 24, 25–68.
Tan, Y.F., et al. (1999) The three important traits for cooking and eating quality of rice grains are controlled by a single locus in an elite rice hybrid, Shanyou 63. Theor. Appl. Genet., 99, 642–648.
Terada, R., et al. (2002) Efficient gene targeting by homologous recombination in rice. Nat. Biotech., 20, 1030–1034.
van Hengel, A.J., et al. (1998) Expression pattern of the carrot EP3 endochitinase genes in suspension cultures and in developing seeds. Plant Phys., 117, 43–53.
Walbot, W. and Evans, N.M.S. (2003) Unique features of the plant life cycle and their consequences. Nat. Rev. Genet., 4, 369–379.
Whitlock, M.C., et al. (1995) Multiple fitness peaks and epistasis. Ann. Rev. Ecol. Syst., 26, 601–629.
Wolf, J.B. (2000) Gene interactions from maternal effects. Evolution, 54, 1882–1898.
Wolf, J.B. (2003) Genetic architecture and evolutionary constraint when the environment contains genes. Proc. Natl Acad. Sci. USA, 100, 4655–4660.
Wolf, J.B., et al. (1998) Evolutionary consequences of indirect genetic effects. Trends Ecol. Evol., 13, 64–69.
Wolf, J.B., et al. (2002) Contribution of maternal effect QTL to genetic architecture of early growth in mice. Heredity, 89, 300–310.
Wu, R.L., et al. (2002a) Statistical methods for dissecting triploid endosperm traits using molecular markers: an autogamous model. Genetics, 162, 875–892.
Wu, R.L., et al. (2002b) An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. Proc. Natl Acad. Sci. USA, 99, 11281–11286.
Wu, R.L., et al. (2002c) A statistical model for the genetic origin of allometric scaling laws in biology. J. Theor. Biol., 217, 275–287.
Xu, C., et al. (2003) Mapping quantitative trait loci underlying triploid endosperm traits. Heredity, 90, 228–235.
Zeng, Z.-B. (1994) Precision mapping of quantitative trait loci. Genetics, 136, 1457–1468.
QTL and their additive x additive epistatic interaction effect (