Bioinformatics Advance Access originally published online on February 22, 2005
Bioinformatics 2005 21(10):2447-2455; doi:10.1093/bioinformatics/bti342

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

Mapping genome–genome epistasis: a high-dimensional model

Yuehua Cui and Rongling Wu *

Department of Statistics, University of Florida, Gainesville, FL 32611, USA

*To whom correspondence should be addressed.


    Abstract

Motivation: The proper development of any organ or tissue requires the coordinated expression of its underlying genes, which can be located in different genomes within an organism. For instance, each step of seed development in higher plants results from interactions among genes from the maternal, embryo and endosperm genomes.

Results: We present a multivariate statistical model for mapping quantitative trait loci (QTL) that incorporates two important aspects of seed development in plants: QTL interactions between different genomes (maternal, embryo and endosperm) and genetic correlations among phenotypic traits expressed in different genome-specific tissues. This high-dimensional model is constructed within a maximum-likelihood framework based on a finite mixture model. Implementation of the expectation–maximization (EM) algorithm allows efficient estimation of QTL positions, their action and interaction effects and their pleiotropic effects. Application of the model to a real rice dataset demonstrates its usefulness.

Conclusions: Our model was derived for self-pollinated plants, but it can be extended to cross-pollinated plants and to animals. With the burgeoning of genetic and genomic data, this high-dimensional model will have many implications for agricultural and evolutionary genetic research.

Availability: Software is available from the corresponding author upon request.

Contact: rwu{at}stat.ufl.edu


    INTRODUCTION
Genome-wide scans for quantitative trait loci (QTL) that determine phenotypic traits have received considerable attention in the past 15 years (Lander and Botstein, 1989; Zeng, 1994; Wu et al., 2002a). The aim of these studies is to understand the genetic architecture of quantitative variation in complex traits of agricultural, evolutionary and biomedical interest (reviewed in Mackay, 2001). The genetic principle behind these studies is the occurrence of recombination between loci when gametes are formed and transmitted from parents to offspring. Although statistical methods for QTL mapping were originally proposed as a bivariate approach that associates one gene with one trait, considerable effort has since been made to develop multivariate approaches for mapping multiple interacting QTL (reviewed in Carlborg and Haley, 2004) and multiple correlated phenotypic traits (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000). In this work, we address problems associated with intergenomic epistasis and multiple correlated traits that have not previously been addressed.

First, epistasis, the genetic interaction between different genes, provides important fuel for creating novel quantitative genetic variation when an organism is forced to adapt to a new environment (Whitlock et al., 1995). The conventional concept of epistasis refers to the modification of the effect of an allele at one gene by an allele at another gene in the same genome or individual (Falconer and Mackay, 1996). However, another type of epistasis occurs between genes from different genomes or individuals (Wolf, 2000; Wolf et al., 1998). Such genome–genome, or individual–individual, epistasis is believed to be an important force in maintaining genetic variation in fluctuating environments (Wolf et al., 2002) and in helping to select optimal life-history strategies (Wolf, 2003). An excellent example of genome–genome epistasis is the coordinated regulation among the maternal, embryo and endosperm tissues of a developing seed (Walbot and Evans, 2003). Genetic mapping of genome–genome epistasis based on molecular markers is still in its infancy. Cui et al. (2004) recently published a series of statistical models for detecting epistatic effects on embryo- or endosperm-specific traits between QTL derived from the maternal, embryo and endosperm genomes in seed plants. These models take into account the genetic and developmental mechanisms of seed development and may be of particular significance for studying the genetic control of seed traits, with the aim of improving grain production in crops with the aid of molecular biotechnologies.

Second, correlations between different biological traits are ubiquitous, and the pattern and degree of trait correlations are thought to result from natural selection and evolution (Scheiner, 1993). Traditional correlation analysis deals with different traits measured on the same individual, but two traits measured on different individuals are often correlated. For example, maternal preferences for oviposition sites affect the survival and development of offspring in birds (Lloyd and Martin, 2004). In plants, the level of hormones released by the endosperm is thought to guide embryo development (Chaudhury et al., 2001). Genetic mapping approaches for multiple traits capitalize on information about the interrelationships among the measured traits and can therefore change the statistical power of QTL detection. Although a joint analysis of many traits does not necessarily increase detection power, because more parameters must be estimated, the power to detect a QTL has been shown to increase when a few correlated traits are included. Such an increase in power has been demonstrated using regression methods (Knott and Haley, 2000), maximum-likelihood methods (Korol et al., 1995; Jiang and Zeng, 1995) and variance component models (Almasy et al., 1997). Exploiting correlated information is particularly advantageous when mapping QTL for a low-heritability trait that is correlated with a trait of higher heritability. Lund et al. (2003) documented several advantages of multitrait QTL mapping over single-trait analysis.

With the growing recognition of the importance of genome–genome epistasis and of genetic correlations between individual-specific traits, it is appealing to develop a multivariate statistical model for mapping QTL interactions that affect multiple correlated traits expressed in different individuals or genomes. This motivated us to develop a high-dimensional model for estimating and testing the gene action and interaction effects of QTL from different genomes on individual-specific traits. The model is derived from a mixture-based likelihood and implemented with the expectation–maximization (EM) algorithm (Dempster et al., 1977). Monte Carlo simulations under different sampling strategies are used to investigate the statistical behavior of the model, and the successful detection of interacting QTL in a rice example validates its usefulness.


    EXPERIMENTAL DESIGN
Our model is developed for a simple backcross but can be extended to an F2 or other designs. Consider two homozygous inbred lines crossed to generate a heterozygous F1. Crossing the F1 to one of the two parents (say, the homozygous recessive) produces two genotypes at each locus in the backcross. The progeny of the backcross can be obtained through self-pollination in autogamous species, such as rice and soybean, or through cross-pollination in allogamous species, such as maize, and in animals.

The backcross is genotyped for a set of molecular markers to construct a genetic linkage map. As shown in Wu et al. (2002a), genotyping the diploid progeny of the backcross with the same set of markers can increase the power to map QTL that are expressed in the progeny generation, such as those affecting the embryo and endosperm of the seed. Here, we suppose that markers from both the backcross and its diploid progeny are available to characterize interactions between multiple QTL from different genomes. In animals, a genome–genome interaction may occur as a maternal–offspring interaction. In plants, the progeny (seeds) develop within the maternal sporophyte tissue after double fertilization of the gametophyte, so there are potentially extensive genome–genome interactions. Double fertilization forms the diploid embryo by fusing the haploid egg with one sperm cell, and the triploid endosperm by fusing the maternal homodiploid central cell with a second sperm cell (Chaudhury et al., 2001). Proper seed development requires the coordinated expression of the maternal, embryo and endosperm tissues (van Hengel et al., 1998; Opsahl-Ferstad et al., 1997). There is a wealth of evidence that genes from these three genomes control seed development (Chaudhury et al., 2001; Evans and Kermicle, 2001; Dilkes et al., 2002; Walbot and Evans, 2003). Therefore, in plants, genome–genome interactions can be of three types: maternal–embryo, maternal–endosperm and embryo–endosperm. For this reason, the models of genome–genome interaction developed for plant systems also cover those for animal systems.

If the backcross is assumed to be generation t, then its progeny obtained through outcrossing pollination is generation t + 1. Let Pt and Qt+1 denote QTL from the maternal (generation t) and embryo (generation t + 1) genomes, respectively, and consider a third QTL from the endosperm genome (generation t + 1). In generation t, there are two QTL genotypes at Pt, Ptpt and ptpt, whereas in generation t + 1 there are three QTL genotypes at Qt+1 in the embryo, Qt+1Qt+1, Qt+1qt+1 and qt+1qt+1, and four triploid genotypes at the endosperm QTL, carrying three, two, one or zero capital alleles.

The maternal–embryo interaction model
The two QTL from the maternal (Pt) and embryo genomes (Qt+1) form six across-generation QTL genotypes. Their genotypic values for a quantitative trait, denoted by µjtjt+1, where jtjt+1 stands for the genome-specific QTL genotypes in terms of different numbers of capital QTL alleles, are assigned as follows:


[Equation (1): the six maternal–embryo genotypic values µjtjt+1 in terms of µ, at, at+1, dt+1, I and J]
where µ is the overall mean, at and at+1 are the additive effects of maternal QTL Pt and embryo QTL Qt+1, respectively, dt+1 is the dominance effect of embryo QTL Qt+1, and I and J are the across-generation maternal-additive × embryo-additive and maternal-additive × embryo-dominant epistatic effects between the two QTL, respectively.

We treat the QTL genotypes, which depend on the unknown QTL map locations, as missing data to be inferred from known markers by the EM algorithm. The marker information provided separately by the backcross and its offspring is combined in our mapping model. Assume that maternal QTL Pt is bracketed by two flanking markers genotyped in the backcross, and that offspring QTL Qt+1 is bracketed by two flanking markers genotyped in the offspring. Let r, r1 and r2 be the recombination fractions between the two maternal markers, between the left maternal marker and Pt, and between Pt and the right maternal marker, respectively. The corresponding recombination fractions for the offspring markers and QTL are denoted s, s1 and s2. The conditional probabilities of maternal QTL genotypes given the maternal marker interval genotype in the backcross can be expressed in terms of r, r1 and r2. Depending on the pollination type, the conditional probabilities of embryo QTL genotypes given the across-generation marker interval can likewise be derived in terms of s, s1 and s2. Wu et al. (2002) and Cui et al. (2004) provided such conditional probabilities for self-pollinated plants; similar procedures can be used to derive them for cross-pollinated plants.

If the maternal marker interval differs from the offspring marker interval, the conditional probabilities of across-generation QTL genotypes given across-generation marker genotypes can be calculated as the product of the QTL-specific conditional probabilities. If the two intervals are the same, i.e. the maternal and offspring QTL are located in the same interval, the conditional probabilities of across-generation QTL genotypes must be derived specifically for this case (Cui et al., 2004). These conditional probabilities are used to test for and estimate the positions of the two interacting QTL.
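As a minimal illustration of the product rule, the Python sketch below computes backcross conditional probabilities of a maternal QTL genotype given its two flanking markers and combines them with embryo-QTL probabilities through an outer product. It assumes the standard backcross interval-mapping formulas with no crossover interference (r = r1 + r2 − 2r1r2) and codes Ptpt as 1 and ptpt as 0. The function name `backcross_qtl_probs` is hypothetical, and the embryo probabilities are placeholder numbers, because their actual form depends on the pollination type and the derivations cited above. Only the different-interval case is covered.

```python
import numpy as np


def backcross_qtl_probs(r1, r2):
    """Conditional probabilities of a backcross QTL genotype given its flanking markers.

    Genotypes are coded 1 (heterozygote, Ptpt) and 0 (homozygote, ptpt).
    Assumes no crossover interference, so r = r1 + r2 - 2*r1*r2.
    Returns {(left, right): array([P(QTL = 1), P(QTL = 0)])}.
    """
    r = r1 + r2 - 2.0 * r1 * r2
    return {
        (1, 1): np.array([(1 - r1) * (1 - r2), r1 * r2]) / (1 - r),
        (1, 0): np.array([(1 - r1) * r2, r1 * (1 - r2)]) / r,
        (0, 1): np.array([r1 * (1 - r2), (1 - r1) * r2]) / r,
        (0, 0): np.array([r1 * r2, (1 - r1) * (1 - r2)]) / (1 - r),
    }


# Illustrative values: r1 = 0.05, r2 = 0.10; the mother's marker genotypes are (1, 0).
p_maternal = backcross_qtl_probs(0.05, 0.10)[(1, 0)]

# Placeholder embryo-QTL probabilities for genotypes (2, 1, 0); real values come
# from the pollination-specific derivations cited in the text.
p_embryo = np.array([0.05, 0.90, 0.05])

# Different intervals: joint probabilities are the product of the two marginals.
# Rows index the maternal genotype j_t, columns the embryo genotype j_{t+1}.
joint = np.outer(p_maternal, p_embryo)
```

When both QTL share an interval, this factorization no longer holds and the joint probabilities have to be derived directly, as the text notes.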

The maternal–endosperm interaction model
Across-generation QTL genotypes for the maternal (Pt) and endosperm genomes include eight combinations between two maternal genotypes and four endosperm genotypes. The genotypic values of the maternal–endosperm QTL genotypes, µjtjt+1, can be assigned as follows:


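[Equation (2): the eight maternal–endosperm genotypic values µjtjt+1 in terms of µ, at and the endosperm and epistatic effects defined below]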
where the effects include the additive effect of the endosperm QTL; the dominance effects d(t+1)1 and d(t+1)2, due to the intra-locus interactions between QQ and q and between Q and qq, respectively; the across-generation maternal-additive × endosperm-additive epistatic effect I′; and two across-generation maternal-additive × endosperm-dominant epistatic effects, associated with d(t+1)1 and d(t+1)2, respectively.

Assume that the endosperm QTL is bracketed by a pair of flanking markers. Let s′ be the recombination fraction between these two markers, with the recombination fractions between the left marker and the QTL and between the QTL and the right marker defined analogously to s1 and s2. The conditional probabilities of endosperm QTL genotypes given the across-generation maternal–embryo marker genotypes can be derived in terms of these three recombination fractions, depending on the type of pollination. These conditional probabilities have been derived previously for self-pollinated plants, and those for cross-pollinated plants can be derived similarly.

The embryo–endosperm interaction model
For the embryo (Qt+1) and endosperm QTL in the same generation t + 1, there are 12 (3 × 4) joint QTL genotypes, whose values µjtjt+1 are expressed as


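[Equation (3): the 12 embryo–endosperm genotypic values in terms of the embryo, endosperm and epistatic effects defined below]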
where the epistatic effects between embryo QTL Qt+1 and the endosperm QTL comprise an embryo-additive × endosperm-additive effect, an embryo-dominant × endosperm-additive effect, two embryo-additive × endosperm-dominant effects (one for each of d(t+1)1 and d(t+1)2) and two embryo-dominant × endosperm-dominant effects (one for each of d(t+1)1 and d(t+1)2).

Similarly, the conditional probabilities of embryo–endosperm QTL genotypes given across-generation marker genotypes can be derived separately for two different cases in which the two QTL are located in the same interval or in different intervals. Such derivations will be different for self- and cross-pollinated systems.

As shown in Cui et al. (2004), the genetic effect parameter vector h1 = (µ, at, at+1, dt+1, I, J) for the maternal–embryo interaction model, and the corresponding effect vectors for the maternal–endosperm and embryo–endosperm interaction models, can be estimated from the genotypic values µjtjt+1 by solving the systems of linear equations defined by matrices (1)–(3). As shown below, we derive closed-form EM updates for the maximum-likelihood estimates (MLEs) of the genotypic values; the MLEs of the genetic effect parameters then follow accordingly.
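Because each genotypic value is a linear combination of the genetic effects, recovering the effects amounts to solving a square linear system. The sketch below shows this for a reduced four-genotype case (two embryo by two endosperm genotypes, where only the mean, two additive effects and an additive × additive effect are estimable). The ±1 coding matrix `D` and the function name are assumptions for illustration, not the coding used in matrices (1)–(3); a different coding rescales the effects but leaves the procedure unchanged.

```python
import numpy as np

# Hypothetical coding (an assumption, not the paper's matrices): embryo and
# endosperm genotypes each coded +1 or -1, giving four joint genotypes.
CODES = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
# Rows: joint genotypes; columns: mu, embryo additive, endosperm additive,
# additive x additive epistasis.
D = np.array([[1.0, ce, cn, ce * cn] for ce, cn in CODES])


def effects_from_genotypic_values(mu_g):
    """Solve the linear system D @ h = mu_g for the genetic-effect vector h."""
    return np.linalg.solve(D, np.asarray(mu_g, dtype=float))


h_true = np.array([50.0, 4.0, -2.0, 1.0])   # illustrative effects
mu_g = D @ h_true                             # implied genotypic values
h_hat = effects_from_genotypic_values(mu_g)   # recovers h_true
```

Since the mapping is linear and invertible, plugging in MLEs of the genotypic values yields MLEs of the effects by the invariance property of maximum likelihood.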


    STATISTICAL METHOD
Statistical model for multiple traits
Let us suppose there are three quantitative traits: one expressed in the maternal tissue (denoted by x), one in the embryo tissue (denoted by y) and one in the endosperm tissue (denoted by z). The three QTL from different genomes, Pt, Qt+1 and the endosperm QTL, interact through coordinated pathways to affect each of these traits. Statistical models for the phenotypic values of the three traits affected by these hypothetical epistatic QTL are formulated for each of the three types of genome–genome interaction.

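For the maternal–embryo interaction model, the bivariate phenotypes (xi, yi) of seed i in the backcross population can be expressed in terms of genotypic values as

$$x_i = \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \xi_{i j_t j_{t+1}}\,\mu^{x}_{j_t j_{t+1}} + e^{x}_{i}, \qquad y_i = \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \xi_{i j_t j_{t+1}}\,\mu^{y}_{j_t j_{t+1}} + e^{y}_{i} \tag{4}$$

where $\xi_{i j_t j_{t+1}}$ is an indicator variable equal to 1 if seed i carries maternal–embryo QTL genotype jtjt+1 and 0 otherwise; $\mu^{x}_{j_t j_{t+1}}$ and $\mu^{y}_{j_t j_{t+1}}$ are the values of QTL genotype jtjt+1 for traits x and y, respectively; and $e^{x}_{i}$ and $e^{y}_{i}$ are residual errors that follow a bivariate normal distribution with zero means and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_x^{2} & \rho\,\sigma_x\sigma_y \\ \rho\,\sigma_x\sigma_y & \sigma_y^{2} \end{pmatrix}$$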

Note that we use the superscript or subscript x and y to distinguish between the two traits in genotypic values, genetic effects and residual effects and variances.

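Equation (4) can be written in matrix notation as

$$\mathbf{u}_i = \sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \xi_{i j_t j_{t+1}}\,\mathbf{m}_{j_t j_{t+1}} + \mathbf{e}_i \tag{5}$$

where ui = (xi, yi)T is the vector of phenotypic values of the maternal and embryo traits for seed i, $\mathbf{m}_{j_t j_{t+1}}$ is the vector of genotypic values of joint maternal–embryo QTL genotype jtjt+1 for traits x and y, and $\mathbf{e}_i$ is the vector of residual effects of seed i.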

For self-pollinated plants, the maternal parent receives no genes from other sources when generating its progeny, so gene segregation in the progeny does not contribute to variation in the maternal trait. To reflect this, the maternal–embryo interaction, which occurs across generations, should be modeled with the constraints

$$\mu^{x}_{j_t 2} = \mu^{x}_{j_t 1} = \mu^{x}_{j_t 0}, \qquad j_t = 0, 1 \tag{6}$$

which imply that embryo QTL Qt+1 has no genetic effect on trait x, i.e. the embryo-related effects at+1, dt+1, I and J are all zero for trait x [see Matrix (1)].

Similarly, we can formulate a statistical model for the maternal–endosperm interaction, except that the endosperm QTL has four triploid genotypes. The embryo–endosperm interaction model differs, however, because this interaction occurs within the same generation, in which the embryo (y) and endosperm (z) traits are each also affected by a QTL from the other genome. The bivariate model for phenotypic traits (y, z) can be expressed as

$$\mathbf{w}_i = \sum_{j_{t+1}=0}^{2}\sum_{k_{t+1}=0}^{3} \xi_{i j_{t+1} k_{t+1}}\,\mathbf{m}_{j_{t+1} k_{t+1}} + \mathbf{e}_i \tag{7}$$

where wi = (yi, zi)T is the vector of phenotypic values of the embryo and endosperm traits for seed i, $\xi_{i j_{t+1} k_{t+1}}$ is the indicator for the joint embryo–endosperm QTL genotype (embryo genotype $j_{t+1}$ with 2, 1 or 0 capital alleles and endosperm genotype $k_{t+1}$ with 3, 2, 1 or 0 capital alleles), $\mathbf{m}_{j_{t+1} k_{t+1}}$ is the vector of genotypic values of this joint genotype for traits y and z, and $\mathbf{e}_i$ is the vector of residual effects of seed i.

Bivariate mixture model
Finite mixture models are density models composed of a number of component functions, usually Gaussian, which are combined to provide a multimodal density. Gaussian mixture models can be used to model the genotypic segregation of the genetic factors that determine quantitative traits. In a mixture model, each observation is assumed to arise from one of a known or unknown number of components (here, QTL genotypes), each modeled by a multivariate normal density. Under the maternal–embryo epistasis model, the bivariate likelihood of the phenotypic traits (u) and marker data, based on a mixture model, is expressed as

$$L(\Omega) = \prod_{i=1}^{n}\left[\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \varpi_{j_t j_{t+1}\mid i}\, f_{j_t j_{t+1}}(\mathbf{u}_i)\right] \tag{8}$$

where n is the number of seeds, ϖ = {ϖjtjt+1|i} is the vector of conditional (or prior) probabilities of maternal–embryo QTL genotype jtjt+1 given the across-generation marker genotype of seed i, and $f_{j_t j_{t+1}}(\mathbf{u}_i)$ is the density of the bivariate normal distribution N(mjtjt+1, Σ), with m = {mjtjt+1} the vector of genotypic means of the two traits.

Given the conditional probabilities and genotypic values, we can construct similar mixture-based likelihood functions for the maternal–endosperm and embryo–endosperm interaction models. The parameters of these likelihood functions are estimated as follows.

The EM algorithm
Conditional probabilities are functions of the recombination fractions between the QTL and their flanking markers and therefore carry information about QTL locations. Mean vectors and the covariance matrix are quantitative genetic parameters associated with the genetic effects of the QTL. Let Ω = (ϖ, m, Σ) denote the unknown parameters. We implement the EM algorithm to obtain the MLE of Ω. The log-likelihood function of Equation (8) for the maternal–embryo interaction model is

$$\log L(\Omega) = \sum_{i=1}^{n} \log\left[\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \varpi_{j_t j_{t+1}\mid i}\, f_{j_t j_{t+1}}(\mathbf{u}_i)\right] \tag{9}$$

with derivative, for an unknown parameter Ωλ,

$$\frac{\partial}{\partial \Omega_\lambda}\log L(\Omega) = \sum_{i=1}^{n}\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \Pi_{j_t j_{t+1}\mid i}\,\frac{\partial}{\partial \Omega_\lambda}\log\left[\varpi_{j_t j_{t+1}\mid i}\, f_{j_t j_{t+1}}(\mathbf{u}_i)\right]$$

where we define

$$\Pi_{j_t j_{t+1}\mid i} = \frac{\varpi_{j_t j_{t+1}\mid i}\, f_{j_t j_{t+1}}(\mathbf{u}_i)}{\sum_{j_t'=0}^{1}\sum_{j_{t+1}'=0}^{2} \varpi_{j_t' j_{t+1}'\mid i}\, f_{j_t' j_{t+1}'}(\mathbf{u}_i)} \tag{10}$$

which can be thought of as the posterior probability that seed i has joint maternal–embryo QTL genotype jtjt+1. We then implement the EM algorithm with the expanded parameter set {Ω, Π}, where Π = {Πjtjt+1|i}. Conditional on Π, we solve for the zeros of (∂/∂Ωλ) log L(Ω) to obtain estimates of Ω.

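In the E-step, the prior conditional probabilities of the QTL genotypes given the marker genotypes, together with the normal density function, are used to calculate the matrix of posterior probabilities Πjtjt+1|i. In the M-step, these posterior probabilities are used to update the unknown parameters:

$$\mathbf{m}_{j_t j_{t+1}} = \frac{\sum_{i=1}^{n} \Pi_{j_t j_{t+1}\mid i}\,\mathbf{u}_i}{\sum_{i=1}^{n} \Pi_{j_t j_{t+1}\mid i}} \tag{11}$$

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}\sum_{j_t=0}^{1}\sum_{j_{t+1}=0}^{2} \Pi_{j_t j_{t+1}\mid i}\,(\mathbf{u}_i - \mathbf{m}_{j_t j_{t+1}})(\mathbf{u}_i - \mathbf{m}_{j_t j_{t+1}})^{\mathsf T} \tag{12}$$

Using sample estimates as initial values, we iterate between the E-step (Equation 10) and the M-step (Equations 11 and 12) until the specified convergence criteria are satisfied. The values at convergence are regarded as the MLEs. The MLEs of the genotypic values m can then be used to solve for the MLEs of the genetic effects h.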

In the EM procedure described above, we treated the QTL positions as known, although their MLEs can also be obtained iteratively. Alternatively, a grid approach can be used to estimate the QTL positions: by hypothesizing a pair of embryo and endosperm QTL every 2 cM within marker intervals, we can draw the landscape of log-likelihood ratio test statistics across the entire genome. The positions corresponding to the peak of this landscape within a linkage group are the MLEs of the QTL positions.
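For the grid approach, each candidate position must be converted into recombination fractions with its flanking markers before the conditional probabilities can be computed. The sketch below assumes the Haldane map function (which is also used in the simulation study) and a 2-cM step. The function names are hypothetical, and `llr_at` is a hypothetical callable standing in for a full EM fit that returns the LLR at a given position.

```python
import numpy as np


def haldane_r(d_cm):
    """Haldane map function: map distance in cM -> recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(d_cm, dtype=float) / 100.0))


def grid_positions(interval_cm, step_cm=2.0):
    """Candidate QTL positions every step_cm within one marker interval, with
    the recombination fractions r1 (to the left marker) and r2 (to the right)."""
    x = np.arange(0.0, interval_cm + 1e-9, step_cm)
    return x, haldane_r(x), haldane_r(interval_cm - x)


def scan_peak(positions, llr_at):
    """Evaluate llr_at at every candidate position and return the peak position."""
    llr = np.array([llr_at(p) for p in positions])
    return positions[int(np.argmax(llr))], llr


# A 20-cM interval, e.g. between adjacent equidistant markers on an 80-cM group.
x, r1, r2 = grid_positions(20.0)
# Toy stand-in for "run EM at this position and return the LLR".
peak, llr = scan_peak(x, lambda pos: 50.0 - (pos - 12.0) ** 2)
```

Under Haldane's function the three fractions satisfy r = r1 + r2 − 2r1r2 exactly, consistent with the assumption of no crossover interference.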

The MLEs of the QTL positions and effects under the maternal–endosperm and embryo–endosperm epistasis models can be derived similarly. The QTL effects are specified differently among the three models, depending on the dosage of QTL alleles (Table 1). As in standard QTL mapping models, the proportion of the total phenotypic variance explained by each QTL from a different genome can be calculated for each trait.


Table 1 The MLEs of the additive genetic effects of the embryo QTL (at+1) and the endosperm QTL and of their additive × additive epistatic effect on endosperm gel consistency, measured in two different years in a backcross derived from two inbred lines in rice

 
Hypothesis testing
A number of statistical hypothesis tests can be performed for the parameters of interest. The presence of QTL from different genomes with joint effects on two quantitative traits expressed in different tissues can be tested with a log-likelihood ratio (LLR) test statistic calculated under the full model (assuming such QTL exist) and the reduced model (assuming no QTL). The LLR is usually taken to be asymptotically χ²-distributed, with degrees of freedom equal to the difference in the number of unknown parameters between the two models. For a mixture model like ours, however, this approximation may fail because regularity conditions are violated (McLachlan and Peel, 2000). The critical threshold for declaring the existence of the tested QTL is therefore calculated empirically from permutation tests (Churchill and Doerge, 1994).
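A permutation threshold can be sketched as follows: phenotype rows are shuffled relative to the marker data, the genome scan is repeated, and the (1 − α) quantile of the maximum statistics is taken. To keep the example self-contained, the scan statistic `max_marker_llr` is a simplified stand-in (a single-marker bivariate normal LLR, n log(|S0|/|S1|)), not the mixture-model LLR of this paper; both function names are hypothetical.

```python
import numpy as np


def max_marker_llr(u, markers):
    """Stand-in scan statistic: for each 0/1 marker, the bivariate normal LLR of
    marker-group means versus a common mean, n * log(|S0| / |S1|); returns the max."""
    n = u.shape[0]
    s0 = np.cov(u, rowvar=False, bias=True)          # MLE covariance, common mean
    best = 0.0
    for col in markers.T:
        resid = u.astype(float).copy()
        for g in (0, 1):
            idx = col == g
            resid[idx] -= u[idx].mean(axis=0)        # centre each marker group
        s1 = resid.T @ resid / n                     # MLE covariance, group means
        best = max(best, n * np.log(np.linalg.det(s0) / np.linalg.det(s1)))
    return best


def permutation_threshold(phenotypes, scan_stat, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold in the spirit of Churchill and Doerge (1994).

    Whole phenotype rows are shuffled, which breaks the marker-trait association
    while keeping the correlation between traits within each individual.
    """
    rng = np.random.default_rng(seed)
    max_stats = np.array([
        scan_stat(phenotypes[rng.permutation(len(phenotypes))]) for _ in range(n_perm)
    ])
    return float(np.quantile(max_stats, 1.0 - alpha))


rng = np.random.default_rng(1)
markers = rng.integers(0, 2, size=(200, 10))   # 200 individuals, 10 markers
u = rng.normal(size=(200, 2))                   # two traits, no QTL
threshold = permutation_threshold(u, lambda ph: max_marker_llr(ph, markers), n_perm=200)
observed = max_marker_llr(u, markers)
```

Taking the maximum over all scanned positions in each permutation is what makes the threshold genome-wide rather than pointwise.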

After testing for the existence of QTL from different genomes, we can test the additive and dominance effects of the QTL from a particular genome, as well as the additive × additive, additive × dominance, dominance × additive and dominance × dominance epistatic effects between QTL from two different genomes. Our model allows the effects of specific QTL on individual traits to be tested, although, for our experimental design, different genome–genome interaction models characterize different types of genetic effects. All these effect-specific tests are performed by implementing the EM algorithm, and the critical value for declaring significance can be obtained empirically through simulation studies.


    A WORKED EXAMPLE
The newly developed model was used to analyze published data on the rice endosperm (Tan et al., 1999). The F1 hybrid between two rice inbred lines, ZS97 and MH63, was selfed for nine generations to produce 241 recombinant inbred lines (RILs) for high-resolution mapping of genes influencing endosperm traits. The RILs, each homozygous for one of the two alternative alleles at each locus, were genotyped for 221 polymorphic markers distributed throughout the genome to construct a molecular linkage map covering the 12 rice chromosomes. Each RIL, used as the female parent, was backcrossed to ZS97, one of the original inbred lines, as the male parent, generating a backcross population of 241 plants. All backcross plants were evaluated for gel consistency of their endosperm in two successive years (1999 and 2000) to identify major QTL segregating in this material.

Because of the nature of this pedigree, we made some modifications to our general embryo–endosperm model to identify interacting QTL in embryo and endosperm tissues. First, conditional probabilities suited to this pedigree were derived to predict the embryo–endosperm QTL genotypes from the markers collected in the embryo. Second, in this design, the number of embryo–endosperm QTL genotypes is reduced to four; thus, the genetic effects that can be estimated are the additive effect of embryo QTL Qt+1 (at+1), the additive effect of the endosperm QTL and the additive × additive epistatic effect between these two QTL. Third, our model was originally developed to analyze phenotypes expressed in the embryo and endosperm, but the data for this design were collected from the endosperm in two different years. Following Falconer (1952), the same trait measured in different years can be treated as different traits.

The phenotypic correlation between endosperm gel consistency measured in the two years is 0.68, suggesting a partly shared genetic basis across years. A genome-wide scan was performed to detect the existence and distribution of interacting QTL throughout the genome. Significant joint genetic effects were detected between two QTL on chromosomes 6 and 8. The maximum LLR value throughout the genome is 270.9, markedly larger than the genome-wide critical threshold of 30.5 obtained empirically from permutation tests at the 0.005 significance level. One significant QTL is located 12.0 cM from the first marker on chromosome 6 of the embryo genome, whereas the second is located 29.4 cM from the first marker on chromosome 8 of the endosperm genome. The embryo QTL coincides with a candidate gene, Waxy, which is associated with a critical step of amylose biosynthesis (Okagaki and Wessler, 1988); this agreement supports the validity of our model.

We estimated the additive effect at+1 of the embryo QTL, the additive effect of the endosperm QTL and their additive × additive epistatic effect on gel consistency in the two years (Table 1). Further hypothesis tests were performed for the significance of the additive and epistatic effects. The LLRs for these tests suggest that the additive effect of the embryo QTL is highly significant, whereas the additive effect of the endosperm QTL and the additive × additive effect between the two QTL are significant at lower levels.

In this example, our model can also test whether genetic effects are expressed differently from year to year. If the genetic effect of a QTL is year-dependent, the QTL is considered to display a significant genotype × year interaction. Figure 1 illustrates the non-parallel changes of the four joint embryo–endosperm QTL genotypes across years for endosperm gel consistency. The LLR test for a year-dependent non-parallel response indicates significant QTL × year interactions (P < 0.0001). Further tests indicate that the additive effects of the QTL from the two genomes differ between the two years (P = 6.42 × 10⁻¹² for the embryo QTL and P = 1.98 × 10⁻⁶ for the endosperm QTL; Table 1). The additive × additive epistatic effect between the embryo and endosperm QTL also differs between the two years (P = 0.005). These results are fundamental to designing crop breeding programs aimed at producing high-quality starch in rice.



Fig. 1 Four joint genotypic values at the embryo (Qt+1) and endosperm QTL for endosperm-specific gel consistency (mm) measured in rice in two different years. Data from Tan et al. (1999).

 

    MONTE CARLO SIMULATION
We carried out a series of simulation studies to examine the statistical properties of our genome–genome models, focusing on the epistatic model for the embryo and endosperm genomes. Similar statistical behavior is expected for the other two epistatic models, maternal–embryo and maternal–endosperm. The simulations examine model performance as heritability, sample size and QTL location vary. Five equidistant markers were simulated for the embryo population and ordered on a linkage group 80 cM long. The Haldane map function was used to convert map distances into recombination fractions. For simplicity, two traits were used. Three combinations of heritabilities for the two traits, (0.1, 0.1), (0.1, 0.4) and (0.4, 0.4), and two sample sizes, 200 and 400, were considered.

Suppose there are two putative QTL, one in the embryo genome and one in the endosperm genome. Both the embryo QTL (Qt+1) and the endosperm QTL are assumed to pleiotropically affect two traits, one expressed in the embryo (y) and the other in the endosperm (z). The two QTL may be linked and located in the same marker interval or located in different marker intervals. The phenotypic values of each seed were simulated from a bivariate normal distribution whose means are the joint QTL genotypic values, determined for each trait, y and z, by the effect parameters: the overall mean (µ), the additive effect of Qt+1 (at+1), the additive effect of the endosperm QTL and the additive × additive epistatic effect between the two QTL. The residual variances (σ²) and residual correlation (ρ) complete the specification.
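The simulation scheme can be sketched in Python as below, assuming heritability is defined for each trait as h² = VG/(VG + σ²), so that the residual variance needed to reach a target heritability is σ² = VG(1 − h²)/h². The genotype labels are drawn uniformly for brevity rather than from a marker-based segregation model, and the genotypic values and residual correlation are illustrative assumptions, not the values behind Tables 2 and 3. All function names are hypothetical; `rmse` computes the square root of the mean squared error over replicates.

```python
import numpy as np


def residual_variance_for_h2(genetic_values, h2):
    """Residual variance sigma^2 such that h2 = V_G / (V_G + sigma^2)."""
    vg = np.var(genetic_values)
    return vg * (1.0 - h2) / h2


def simulate_bivariate(genotype, mu_y, mu_z, h2=(0.1, 0.4), rho=0.5, rng=None):
    """Simulate (y, z) for seeds with joint QTL genotype labels `genotype`.

    mu_y, mu_z: genotypic values of each joint genotype for traits y and z.
    Residuals are bivariate normal with correlation rho, and their variances are
    chosen so that each trait reaches the requested heritability.
    """
    rng = np.random.default_rng() if rng is None else rng
    gy, gz = mu_y[genotype], mu_z[genotype]
    s2y = residual_variance_for_h2(gy, h2[0])
    s2z = residual_variance_for_h2(gz, h2[1])
    cov = rho * np.sqrt(s2y * s2z)
    sigma = np.array([[s2y, cov], [cov, s2z]])
    e = rng.multivariate_normal(np.zeros(2), sigma, size=len(genotype))
    return np.column_stack([gy, gz]) + e, sigma


def rmse(estimates, true_value):
    """Square root of the mean squared error of estimates across replicates."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))


rng = np.random.default_rng(3)
genotype = rng.integers(0, 4, size=20000)          # joint genotype label per seed
mu_y = np.array([10.0, 12.0, 11.0, 14.0])          # illustrative values, trait y
mu_z = np.array([20.0, 19.0, 23.0, 21.0])          # illustrative values, trait z
u, sigma = simulate_bivariate(genotype, mu_y, mu_z, h2=(0.1, 0.4), rho=0.5, rng=rng)
```

Each simulated dataset would then be analyzed with the mapping model, and `rmse` applied to the estimates from all replicates of a given scheme.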

Tables 2 and 3 give the hypothesized values and MLEs of the QTL position and effect parameters for each trait under different simulation schemes, together with the square roots of the mean squared errors (RMSEs), which measure the precision and accuracy of the estimates. In general, the model provides reasonable parameter estimates, and estimation precision increases with heritability and sample size. QTL positions were estimated less precisely when the two QTL were located in the same interval (Table 3) than when they were in different intervals (Table 2). This problem can be alleviated by increasing marker density, which reduces the probability that two QTL fall in the same interval.
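The RMSE summary can be written in a few lines. This is a sketch assuming NumPy; `estimation_summary` is a hypothetical helper, and the replicate estimates in the usage line are invented rather than taken from Table 2 or 3. The RMSE combines bias (accuracy) and sampling variance (precision) because MSE = bias² + variance.

```python
import numpy as np


def estimation_summary(estimates, true_value):
    """Mean and root mean squared error (RMSE) of replicate estimates.

    RMSE = sqrt(mean((theta_hat - theta)^2)); its square decomposes into
    bias^2 (accuracy) plus the variance across replicates (precision).
    """
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return mean, rmse


# Five hypothetical replicate estimates of a parameter whose true value is 1.0.
mean_hat, rmse = estimation_summary([0.9, 1.1, 1.0, 1.2, 0.8], true_value=1.0)
print(f"mean = {mean_hat:.3f}, RMSE = {rmse:.4f}")  # mean = 1.000, RMSE = 0.1414
```

In a study like this one, the function would be applied separately to each parameter across the 100 replicates of a scheme.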


Table 2 MLEs of the position and effect parameters of an embryo QTL and an endosperm QTL located in different marker intervals, for a backcross of size 400 under different heritability combinations and residual variances, estimated from 100 simulation replicates

 

Table 3 MLEs of the position and effect parameters of an embryo QTL and an endosperm QTL located in the same marker interval, for a backcross of size 400 under different heritability combinations and residual variances, estimated from 100 simulation replicates

 
Our model has high power to detect epistatically interacting embryo and endosperm QTL. For every combination of sample size and heritability, the maximum of the log-likelihood ratio (LLR) profile exceeded the critical threshold at the $\alpha = 0.001$ level, determined from 1000 permutations of the simulated data, in all 100 simulation replicates. Furthermore, the additive × additive epistatic effects were estimated with reasonable precision even at modest heritability.
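The permutation thresholds can be illustrated with a generic procedure in the style of Churchill and Doerge (1994): permute phenotypes relative to genotypes, record the genome-wide maximum test statistic for each permutation, and take its (1 − α) quantile. The sketch below assumes NumPy and, to stay short, replaces the paper's mixture-model LLR with a hypothetical stand-in: a bivariate Gaussian regression of (y, z) on one marker at a time, whose likelihood-ratio statistic is n·log(det S0 / det S1). The data in the usage lines are simulated with made-up effects.

```python
import numpy as np


def scan_lr(Y, G):
    """LR statistic (2 x log-likelihood ratio) at each marker.

    Y : (n, 2) phenotypes; G : (n, m) marker genotype codes.
    Stand-in for the mixture-model LLR: bivariate Gaussian regression
    of both traits on a single marker versus an intercept-only model.
    """
    n = Y.shape[0]
    R0 = Y - Y.mean(axis=0)
    logdet0 = np.linalg.slogdet(R0.T @ R0 / n)[1]
    stats = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([np.ones(n), G[:, j]])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        R1 = Y - X @ beta
        stats[j] = n * (logdet0 - np.linalg.slogdet(R1.T @ R1 / n)[1])
    return stats


def permutation_threshold(Y, G, n_perm=1000, alpha=0.001, seed=0):
    """(1 - alpha) quantile of the maximum statistic over markers
    computed on data with phenotype rows randomly permuted."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(Y.shape[0])  # both traits of a seed move together
        max_stats[k] = scan_lr(Y[perm], G).max()
    return np.quantile(max_stats, 1.0 - alpha)


# Hypothetical data: 200 seeds, 5 markers, one pleiotropic QTL at marker index 2.
rng = np.random.default_rng(42)
n, m = 200, 5
G = rng.choice([-1.0, 1.0], size=(n, m))
Y = rng.standard_normal((n, 2))
Y[:, 0] += 0.8 * G[:, 2]
Y[:, 1] += 0.5 * G[:, 2]
observed = scan_lr(Y, G)
threshold = permutation_threshold(Y, G, n_perm=1000, alpha=0.001)
print(observed.argmax(), observed.max() > threshold)
```

Permuting whole phenotype rows breaks the genotype–phenotype association while preserving the correlation between y and z. With 1000 permutations and α = 0.001, the threshold lies essentially at the largest of the permutation maxima, so resolving smaller α would require more permutations.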


    DISCUSSION
 
We have proposed a general statistical framework for simultaneously mapping multiple correlated traits expressed in different genome-specific tissues. Unlike previous multitrait QTL mapping approaches (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000; Evans, 2002; Lund et al., 2003), our framework models interactions between multiple QTL located on different genomes. It is well recognized that the coordinated expression of genes from different genomes is essential for the proper development of organs. For example, in higher plants, support and nourishment of the embryo and endosperm by maternal tissue are fundamental to proper seed development (Chaudhury et al., 2001; Evans and Kermicle, 2001; Dilkes et al., 2002; Walbot and Evans, 2003).

It is well established that correlated traits carry information about one another, so multitrait linkage analysis can give more precise inferences about the positions and effects of pleiotropic QTL than single-trait analyses (Jiang and Zeng, 1995; Korol et al., 1995; Knott and Haley, 2000; Evans, 2002; Wu et al., 2002c; Lund et al., 2003). Much as repeated measurements do, information from correlated traits reduces the effect of error variance, which increases the power to detect QTL and improves the precision of QTL position estimates. The model proposed in this paper deals with a different type of trait correlation, one that arises between different individuals connected through coherent developmental pathways. The best example is the effect of a plant's growth vigor on the development of its seeds through the supply of adequate nutrients. Because it accounts for the coordinated expression of traits arising from genes and development, our model, which can be viewed as ‘high-dimensional’, should produce results closer to biological reality than models that lack such a developmental basis for phenotypic traits.
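One informal way to see how a correlated trait reduces the effect of error variance, assuming the residuals of y and z are bivariate normal with variances $\sigma_y^2$, $\sigma_z^2$ and correlation $\rho$ (an illustration, not a derivation from this paper's likelihood):

```latex
\operatorname{Var}(e_y \mid e_z)
  = \sigma_y^2 - \frac{(\rho\,\sigma_y\sigma_z)^2}{\sigma_z^2}
  = \sigma_y^2\,(1 - \rho^2) \;\le\; \sigma_y^2
```

For example, $\rho = 0.5$ leaves 75% of the residual variance of y once z is taken into account, so a QTL signal is easier to separate from noise; the gain vanishes when $\rho = 0$.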

The statistical behavior of our high-dimensional model was investigated through computer simulation. The model provides reasonable power and estimation precision for interacting QTL from the embryo and endosperm genomes across a range of trait heritabilities and sample sizes. Nevertheless, the best validation of our model may be its detection of significant QTL with considerable effects on an endosperm trait measured in two consecutive years. These two annual measurements can be viewed as two different traits (Falconer, 1952). Previous approaches to endosperm mapping were based purely on the triploid inheritance of the endosperm (Wu et al., 2002a; Wu et al., 2002b; Xu et al., 2003; Kao, 2004), whereas our model can also identify interacting QTL from the embryo and endosperm genomes. Using our high-dimensional model, we detected QTL for gel consistency in rice on both the embryo and endosperm genomes, with the embryo QTL located at almost the same position as the Waxy gene on the short arm of chromosome 6 (Terada et al., 2002). The Waxy gene is known to influence a major step in amylose synthesis in the endosperm of many grasses, including maize and rice. Our bivariate mapping model can also discern how the genetic effects of the embryo and endosperm QTL differ across years. Whereas the embryo QTL has a large effect on gel consistency, a significant additive effect × year interaction for the endosperm QTL suggests that this QTL can modify the endosperm trait so that seed development is better adapted to year-to-year environmental change. Going beyond traditional single-trait mapping, our high-dimensional model can also detect an interaction between the additive × additive epistatic effect and year for gel consistency. Further functional analysis of the detected embryo and endosperm QTL should help make them useful for improving the quality and quantity of rice grains.

Our model was derived for a plant system that reproduces by self-pollination. It can be extended in several directions. First, by incorporating the distinct segregation patterns of genes into the mixture-based likelihood function, the model can be modified to map interacting genome–genome QTL in cross-pollinated systems. Such a modified model would also be useful for animals in which birth weight is influenced by the uterine environment through the coordinated expression of maternal and offspring QTL. Second, a mature cereal plant contains three genomes: maternal, embryo and endosperm. The current model handles interactions between any two of these genomes; it is crucial to extend it to triple-genome interactions among the three tissues. Such a triple-interaction model would give a better understanding of the network of gene expression and regulation during seed development.


    Acknowledgments
 
We thank the two anonymous referees for their constructive comments on this manuscript. This work is supported by an Outstanding Young Investigator Award of the National Natural Science Foundation of China (30128017), a University of Florida Research Opportunity Fund (02050259) and a University of South Florida Biodefense grant (7222061-12) to R.W. We thank Drs Jianguo Wu and Chunhai Shi at Zhejiang University for providing molecular marker and phenotypic data. The publication of this manuscript was approved as Journal Series No. R-10584 by the Florida Agricultural Experiment Station.

Received on December 20, 2004; revised on January 25, 2005; accepted on February 17, 2005

    REFERENCES
 

    Almasy, L., et al. (1997) Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet. Epidemiol., 14, 953–958.

    Chaudhury, A.M., et al. (2001) Control of early seed development. Ann. Rev. Cell Dev. Biol., 17, 677–699.

    Churchill, G.A. and Doerge, R.W. (1994) Empirical threshold values for quantitative trait mapping. Genetics, 138, 963–971.

    Cui, Y.H., et al. (2004) Mapping quantitative trait locus interactions from the maternal and offspring genomes. Genetics, 167, 1017–1026.

    Dempster, A.P., et al. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B, 39, 1–38.

    Dilkes, B.P., et al. (2002) Genetic analyses of endoreduplication in Zea mays endosperm: evidence of sporophytic and zygotic maternal control. Genetics, 160, 1163–1177.

    Evans, D.M. (2002) The power of multivariate quantitative-trait loci linkage analysis is influenced by the correlation between the variables. Am. J. Hum. Genet., 70, 1599–1602.

    Evans, M.M.S. and Kermicle, J.L. (2001) Interaction between maternal effect and zygotic effect mutations during maize seed development. Genetics, 159, 303–315.

    Falconer, D.S. (1952) The problem of environment and selection. Am. Nat., 86, 293–298.

    Falconer, D.S. and Mackay, T.F.C. (1996) Introduction to Quantitative Genetics, 4th edn. Longmans Green, Harlow, Essex, UK.

    Jiang, C. and Zeng, Z.-B. (1995) Multiple trait analysis of genetic mapping of quantitative trait loci. Genetics, 140, 1111–1127.

    Kao, C.-H. (2004) Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics, 167, 1987–2002.

    Knott, S.A. and Haley, C.S. (2000) Multitrait least squares for quantitative trait loci detection. Genetics, 156, 899–911.

    Korol, A.B., et al. (1995) Interval mapping of quantitative trait loci employing correlated trait complexes. Theor. Appl. Genet., 92, 998–1002.

    Lander, E.S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185–199.

    Lloyd, D.J. and Martin, T.E. (2004) Nest-site preference and maternal effects on offspring growth. Behav. Ecol., 15, 816–823.

    Lund, M.S., et al. (2003) Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics, 163, 405–410.

    Mackay, T.F.C. (2001) Quantitative trait loci in Drosophila. Nat. Rev. Genet., 2, 11–20.

    McLachlan, G.J. and Peel, D. (2000) Finite Mixture Models. Wiley, New York.

    Okagaki, R.J. and Wessler, S.R. (1988) Comparison of non-mutant and mutant waxy genes in rice and maize. Genetics, 120, 1137–1143.

    Opsahl-Ferstad, H.G., et al. (1997) ZmEsr, a novel endosperm-specific gene expressed in a restricted region around the maize embryo. Plant J., 12, 235–246.

    Scheiner, S.M. (1993) Genetics and evolution of phenotypic plasticity. Ann. Rev. Ecol. Syst., 24, 25–68.

    Tan, Y.F., et al. (1999) The three important traits for cooking and eating quality of rice grains are controlled by a single locus in an elite rice hybrid, Shanyou 63. Theor. Appl. Genet., 99, 642–648.

    Terada, R., et al. (2002) Efficient gene targeting by homologous recombination in rice. Nat. Biotech., 20, 1030–1034.

    van Hengel, A.J., et al. (1998) Expression pattern of the carrot EP3 endochitinase genes in suspension cultures and in developing seeds. Plant Phys., 117, 43–53.

    Walbot, V. and Evans, M.M.S. (2003) Unique features of the plant life cycle and their consequences. Nat. Rev. Genet., 4, 369–379.

    Whitlock, M.C., et al. (1995) Multiple fitness peaks and epistasis. Ann. Rev. Ecol. Syst., 26, 601–629.

    Wolf, J.B. (2000) Gene interactions from maternal effects. Evolution, 54, 1882–1898.

    Wolf, J.B. (2003) Genetic architecture and evolutionary constraint when the environment contains genes. Proc. Natl Acad. Sci. USA, 100, 4655–4660.

    Wolf, J.B., et al. (1998) Evolutionary consequences of indirect genetic effects. Trends Ecol. Evol., 13, 64–69.

    Wolf, J.B., et al. (2002) Contribution of maternal effect QTL to genetic architecture of early growth in mice. Heredity, 89, 300–310.

    Wu, R.L., et al. (2002a) Statistical methods for dissecting triploid endosperm traits using molecular markers: an autogamous model. Genetics, 162, 875–892.

    Wu, R.L., et al. (2002b) An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. Proc. Natl Acad. Sci. USA, 99, 11281–11286.

    Wu, R.L., et al. (2002c) A statistical model for the genetic origin of allometric scaling laws in biology. J. Theor. Biol., 217, 275–287.

    Xu, C., et al. (2003) Mapping quantitative trait loci underlying triploid endosperm traits. Heredity, 90, 228–235.

    Zeng, Z.-B. (1994) Precision mapping of quantitative trait loci. Genetics, 136, 1457–1468.

