Skip Navigation


Bioinformatics Advance Access originally published online on March 15, 2005
Bioinformatics 2005 21(10):2469-2477; doi:10.1093/bioinformatics/bti382
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2469    most recent
bti382v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (12)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Zhao, W.
Right arrow Articles by Wu, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

A non-stationary model for functional mapping of complex traits

Wei Zhao 1, Ying Q. Chen 1, George Casella 1, James M. Cheverud 2 and Rongling Wu 1,*

1Department of Statistics, University of Florida Gainesville, FL 32611, USA
2Department of Anatomy and Neurobiology, Washington University Medical School St. Louis, MO 631110, USA

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 

Summary: Understanding the genetic control of growth is fundamental to agricultural, evolutionary and biomedical genetic research. In this article, we present a statistical model for mapping quantitative trait loci (QTL) that are responsible for genetic differences in growth trajectories during ontogenetic development. This model is derived within the maximum likelihood context, implemented with the expectation–maximization algorithm. We incorporate mathematical aspects of growth processes to model the mean vector and structured antedependence models to approximate time-dependent covariance matrices for longitudinal traits. Our model has been employed to map QTL that affect body mass growth trajectories in both male and female mice of an F2 population derived from the Large and Small mouse strains. The results from this model are compared with those from the autoregressive-based functional mapping approach. Based on results from computer simulation studies, we suggest that these two models are alternative to one another and should be used simultaneously for the same dataset.

Contact: rwu{at}mail.ifas.ufl.edu


    1 INTRODUCTION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
The traits whose phenotypes change with time or any other independent variable are important in agriculture, biological and medical research. For this reason, the genetic analysis of these so-called longitudinal traits has been a focus of a number of statistical and genetic studies aimed at predicting the dynamic change of genetic control at the genotype (Kirkpatrick and Heckman, 1989; Pletcher and Jaffrézic, 2002; Jaffrézic et al., 2003) or individual quantitative trait locus (QTL) levels (Wu et al., 2002).

More recently, a collection of statistical methods implemented with growth model theories have been proposed to map QTL that govern growth trajectories using molecular linkage maps (Wu et al., 2002, 2004a, b; Ma et al., 2002). The basic principle of this method, called functional mapping, is to express the genotypic means of a QTL at different time points in terms of a continuous growth function with respect to time t. Under this principle, the parameters describing the shape of growth curves, rather than the genotypic means as expected in traditional mapping strategies, are estimated within a maximum likelihood framework. Also unlike traditional mapping strategies, functional mapping estimates the parameters that model the structure of the (co)variance matrix among multiple different time points and, therefore, largely reduces the number of parameters being estimated for variances and covariances, especially when the number of time points is large.

Functional mapping is, in spirit, a statistical problem of jointly modelling mean–covariance structures in longitudinal studies, an area that has recently received considerable interest in the statistical literature (Pourahmadi, 1999, 2000; Pan and Mackenzie, 2003; Daniels and Pourahmadi, 2002; Wu and Pourahmadi, 2003). However, in contrast to general longitudinal modelling, functional mapping integrates the estimation and test process of its underlying parameters within a mixture-based likelihood framework. Each mixture component in the likelihood model is given a particular biological rationale. For a finite mixture model, each observation is assumed to have arisen from one of a known or unknown number of components, each component being modelled by a density from the parametric family. Assuming that there are J QTL genotypes contributing to a longitudinal trait measured at {tau} time points (denoted by y), this mixture model is expressed as

(1)
where {varpi} = ({varpi}1,...,{varpi}J) are the mixture proportions (i.e. QTL genotype frequencies) which are constrained to be non-negative and sum to unity, mj is a vector that contains the parameters specific to component (or QTL genotype) j, and {Sigma} includes the parameters common to all components (residual variances and covariances). We use the multivariate normal distribution to model each density, and for individual i it is expressed as

(2)
where is a vector of observation measured at {tau} time points and is a vector of expected values for QTL genotype j at different points. At a particular time t, the relationship between the observation and expected mean can be described by a regression model,

(3)
where {xi}ij is the indicator variable denoted as 1 if a QTL genotype j is considered for individual i and 0 otherwise; ei(t) is the residual error (i.e., the accumulative effect of polygenes and errors) that is iid (independently and identically distributed) normal with mean zero and variance {sigma}2(t). The errors at two different time points, t1 and t2, are correlated with covariance {sigma}(t1,t2).

The mixture proportions, {varpi}1,...,{varpi}J, in Equation (1) can be viewed as the prior probabilities of QTL genotypes in the mapping population. When known markers that are co-segregating with the putative QTL are incorporated into the mixture model, these mixture proportions are substituted by the conditional probabilities of QTL genotypes given the marker genotypes. Because a given individual i has known marker genotype, these conditional probabilities can be simply denoted as {varpi}1|i, ..., {varpi}J|i. These conditional probabilities can be derived, with expression depending on the type of the population. For a structured pedigree, they are expressed in terms of the recombination fractions, whereas for a natural population, they can be expressed in terms of linkage disequilibria. The derivations of the conditional probabilities of QTL genotypes are given in general QTL mapping literature (Wu and Casella, 2005).

To estimate the parameters of the likelihood function implemented with growth data measured at multiple time points, one can extend the traditional interval mapping approach to accommodate the multivariate nature of time-dependent traits. However, this extension is limited in three aspects: (1) Individual expected means of different QTL genotypes at all points and all elements in the matrix {Sigma} need to be estimated, resulting in substantial computational difficulties when the vector and matrix dimensions are large. (2) The result from this approach may not be biologically meaningful because the underlying biological principle for growth is not incorporated. (3) This approach cannot be well deployed in a practical scheme because of (2). Thus, some biologically interesting questions cannot be asked and answered.

A new statistical framework has been developed to detect QTL affecting growth trajectories (Wu et al., 2002, 2004a; b; Ma et al., 2002). This framework included two tasks:

(1) Model the time-dependent expected means of QTL genotype j using a growth equation,

(4)
which is specified by a set of curve parameters arrayed in {Omega}mj. All living entities are characterized by growth defined as the irreversible increase of size with time. A number of mathematical models, such as logistic or S-shaped curves, have been proposed to describe growth trajectories. The fundamental biological aspects of mathematical modelling of growth curves have been founded by earlier mathematical biologists, e.g. von Bertalanffy et al. (1957), and recently revisited by West et al. (2001).

The overall form of the growth curve of QTL genotype j is determined by the set of curve parameters contained in {Omega}mj. If different genotypes at a putative QTL have different combinations of these parameters, this implies that this QTL plays a role in governing the differentiation of growth trajectories. Thus, by testing for the difference of {Omega}mj among different genotypes, we can determine whether there exists a specific QTL that confers an effect on growth curves.

The time-dependent growth curves for different genotypes, µj(t), can be used to estimate the dynamic changes of various genetic effects, including the additive, dominant and epistatic effects as well as the interaction between these effects and environmental factors. For example, an F2 population containing three genotypes at a QTL, coded as 2 for QQ, 1 for Qq and 0 for qq, allows for the estimates of the time-dependent additive effect, , and dominant effects, , during growth trajectories. The time-dependent epistatic and genotype x environment effects can be characterized in a similar way if a multi-QTL model is assumed or if an experimental design with multiple environments is used.

(2) Model the structure of the within-subject (co)variance matrix using the first-order autoregressive [AR(1)] model (Diggle et al., 2002), expressed as

for the variance, and

for the covariance between any two time intervals t1 and t2, where 0 < {rho} < 1 is the proportion parameter with which the correlation decays with time lag. The parameters that model the structure of the (co)variance matrix are arrayed in {Omega}v.

We implemented the expectation–maximization (EM) algorithm, originally proposed by Dempster et al. (1977), to obtain the maximum likelihood estimates (MLEs) of three groups of unknown parameters in a QTL mapping model, that is, the QTL-segregating parameters ({Omega}{ell}) that specify the co-segregation patterns of QTL and markers in the population, the curve parameters ({Omega}mj) that model the mean vector and the parameters ({Omega}v) that model the structure of the (co)variance matrix (Ma et al., 2002; Wu et al., 2004a; b). These unknowns, denoted by {Omega} = ({Omega}{ell},{Omega}mj,{Omega}v), are contained within the mixture model described by Equation (1). A detailed description of the EM algorithm was given in Wu et al. (2002) and Ma et al. (2002).

After the MLEs of the parameters are obtained, the existence of a QTL affecting an overall growth curve should be tested by formulating two alternative hypotheses,

(5)
where H0 corresponds to the reduced model, in which the data can be fit by a single growth curve, and H1 corresponds to the full model, in which there exist different growth curves to fit the data. The test statistic for testing the hypotheses in Equation (5) is calculated as the log-likelihood (LR) ratio of the reduced to the full model:

(6)
where and denote the MLEs of the unknown parameters under H0 and H1, respectively. After the existence of QTL is tested, a number of biologically meaningful hypotheses regarding the interplay between gene action and development can be formulated (Wu et al., 2004a).

To remove the heteroscedastic problem of the residual variance, which violates a basic assumption of the simple AR(1) model, two approaches can be used. The first approach is to model the residual variance by a parametric function of time, as originally proposed by Pletcher and Geyer (1999). But this approach needs to implement additional parameters for characterizing the age-dependent change of the variance. The second approach is to embed Carroll Rupert's (1984) transform-both-sides (TBS) model into the growth-incorporated finite mixture model (Wu et al., 2004b), which does not need any more parameters. Both empirical analyses with real examples and computer simulations suggest that the TBS-based model can increase the precision of parameter estimation and computational efficiency. Furthermore, the TBS model preserves original biological means of the curve parameters although statistical analyses are based on transformed data.

The TBS-based model displays the potential to relax the assumption of variance stationarity, but the covariance stationarity issue remains unsolved. Zimmerman and Núñez-Antón (1997) proposed a so-called structured antedependence (SAD) model to model the age-specific change of correlation in the analysis of longitudinal traits. The SAD model has been employed in several studies and displays many favorable properties (Zimmerman and Núñez-Antón, 2001).

In this article, we will incorporate the SAD model within the mixture model for functional mapping to take advantages of it in adequately modeling the (co)variance structure. In Section 2, we describe the statistical model for deriving the SAD model. Section 3 provides a discussion of model selection for different orders. The method is illustrated in Section 4 using growth data of body mass measured at 10 different time points in an F2 progeny derived from the Large and Small mouse strains. The implications and extensions of the model are discussed in Section 5.


    2 THE STRUCTURED ANTEDEPENDENCE MODEL
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
The antedependence model was originally proposed by Gabriel (1962). It states that an observation at a particular time t depends on the previous ones, with the degree of dependence decaying with time lag. If an observation at time t is independent of all observations before tr, this antedependence model is thought to be of order r. The antedependence model is extended to fit the structure of time-dependent variance and correlation, leading to the SAD model (Núñez-Antón and Zimmerman, 2000). The SAD model can be incorporated to the QTL mapping of longitudinal growth traits.

Let us consider an F2 design of n progeny derived from two contrasting homozygous inbred lines, in which there are three QTL genotypes (J = 3). For this F2, a genetic linkage map is constructed with molecular markers and a growth trait (y) is measured for a finite set of times, 1,...,{tau}. y(1),...,y({tau}) are rth-order antedependent if the conditional distribution of y(t), given y(t–1),...,y(1), depends on y(t–1),...,y(tr), for all t ≥ r (Gabriel, 1962). This concept is equivalent to y(1),...,y({tau}) having a Markovian dependence of order r. The order r serves as a memory gauge, where r = 0 corresponds to independence and r = {tau} – 1 to arbitrary multivariate dependence. A parametrically specified definition for individual i is given on the basis of Equation (3), which can be expressed as

(7)
where r* = min(r,t – 1), {varphi}t,tt's are unrestricted antedependence parameters, and independent normal random variable ei(t) may have time-dependent variances, {nu}2(t), termed innovation variances.

Like AR models, this model allows for serial correlation within subjects, but unlike AR models, it does not assume that the variances are constant nor that correlations between measurements equidistant in time are equal. The antedependence model (7) is called the unstructured antedependence model of order r (UAD(r)) because (r + 1)(2{tau}r)/2 parameters, including the variances {sigma}2(t) and covariances {sigma}(t,tt'), among measurements at {tau} different time points are not expressed as a function of a smaller set of parameters (Núñez-Antón and Zimmerman, 2000).

To make the UAD(r) model more parsimonious, Núñez-Antón (1997) and Núñez-Antón and Zimmerman (2000) proposed the so-called SAD models. One useful class models the autoregressive coefficients with the Box–Cox power law and models the innovation variances with a parametric function, i.e.

(8)
where Tt and Ttt' are measurement times, w(T;{lambda}) equals (T{lambda} – 1)/{lambda} if {lambda} != 0 and equals log T if {lambda} = 0, and v(·) is a function of relatively few parameters (e.g. a low-order polynomial). Thus, different from Gabriel's (1962) original treatment, we only need to estimate three parameters to model the innovation variances if a quadratic polynomial is used, regardless of the number of time points.

As a simplified example with the SAD(1) model in which innovation variances are constant over time points, Jaffrézic et al. (2003) derived the analytical forms for variance and covariance functions among time-dependent measurements, expressed, respectively, as

(9)

(10)
for equally spaced repeated measurements. It can be seen that although constant innovation variances are assumed, the residual variance can change with time (Jaffrézic et al., 2003). Also, for the simplest SAD model, the correlation function is non-stationary because the correlation does not depend only on the time interval t2t1 but also depends on the start and end points of the interval t1 and t2.


    3 MODEL SELECTION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
Jaffrézic et al. (2003) proposed an ad hoc approach for model selection. Their strategy is to increase the antedependence order until the additional antedependence coefficient is close to zero.

Núñez-Antón and Zimmerman (2000) proposed using the AIC information criterion to select the best model. Hurvich and Tsai (1989) showed that AIC can drastically underestimate the expected Kullback–Leibler information when only few repeated measurements are available. Instead, they derived a corrected AIC, expressed as

(11)
where is the white noise variance, {tau} is the number of repeated measurements and r is the order of the model. The number of parameters is heavily penalized so that models selected by AICC are typically much more parsimonious than those selected by AIC (Hurvich and Tsai, 1989). An alternative criterion, BIC, that selects a model with a maximum posterior probability (Schwarz, 1978) can also be used to determine the best antedependence order.


    4 APPLICATION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
4.1 Mouse data
We used the joint statistical model to map sex-specific QTL that affect growth trajectories in an animal model system—mouse. Vaughn et al. (1999) constructed a linkage map with 96 microsatellite markers for 502 F2 mice (259 males and 243 females) derived from two strains, the Large (LG/J) and Small (SM/J). This map has a total map distance of ~1780 cM (in Haldane's units) and an average interval length of ~23 cM. The F2 progeny were measured for their body mass at 10 weekly intervals starting at age 7 days. The raw weights were corrected for the effects of each covariate due to dam, litter size at birth and parity, but not for the effect due to sex (Vaughn et al., 1999).

4.2 The partitioning of sex-specific genotypic values
The F2 population used here displayed a marked difference between the two sexes, with the male mice growing faster than female mice (Fig. 1). Our model allows for the tests of the genetic, sex and their interaction effects on growth trajectories. The statistical model for specifying the growth phenotype for individual i that contains the sex effect is expressed as

where yik(t) is the time-dependent growth for individual i that has sex k (k = 1 for the male and 0 for the female), {xi}ijk is the indicator variable representing QTL genotype j for individual i with sex k, µjk(t) is the time-dependent genotypic value for QTL genotype j having sex k, and eik(t) is the time-dependent sex-specific residual error.



View larger version (38K):
[in this window]
[in a new window]
 
Fig. 1 Plots of body mass versus ages for 259 male (A) and 243 female (B) mice in an F2 progeny derived from LG/J and SM/J strains (Vaughn et al., 1999). To display sex-specific differences, body mass was not corrected for the sex effect.

 
Based on quantitative genetic theory, we partition time-dependent sex-specific genotypic value, µjk(t), into the additive and dominant effects due to the QTL and the interaction effects between these two genetic effects and sex. Let a(t) and d(t) be the time-dependent additive and dominant effects of the QTL, s(t) be the time-dependent sex effect and Ias(t) and Ids(t) be the time-dependent additive x sex and dominant x sex interaction effects, respectively. We tabulate time-dependent genotypic value µjk(t) for each QTL genotype j (j = 2,1,0) with sex k (k = 1,2) in terms of its genetic compositions as follows:


QTL Sex
genotype Male Female

QQ
Qq
qq

Here µ(t) is the time-dependent overall mean. The dynamic changes of these different effects can be derived, which are expressed as

(12)
for the sex effect;

(13)
for the additive genetic effect of the QTL;

(14)
for the dominant genetic effect of the QTL;

(15)
for the additive x sex interaction effect; and

(16)
for the dominant x sex interaction effect. All these effects can be estimated and tested.

4.3 The growth law
The sigmoidal (or logistic) growth function is regarded as being nearly universal in living systems to capture age-specific change in growth (West et al., 2001). The logistic growth curve as a biological law can be mathematically described by

(17)
where {alpha} is the asymptotic or limiting value of g when t -> {infty}, {alpha}/(1 + ß) is the initial value of g when t = 0 and {gamma} is the relative rate of growth (von Bertalanffy et al., 1957). If different genotypes at a putative QTL have different combinations of these parameters, this implies that this QTL plays a role in governing the difference of growth trajectories.

By plotting body mass growth against time (Fig. 1), it is observed that each of the mapped F2 mice follows the S-shaped (logistic) growth curve. A non-linear least squares approach was used to fit the growth trajectory of body mass with the logistic curve of Equation (17) for each mouse. Based on statistical tests, all the mice can be well fit by a logistic curve (r2 > 0.95). Therefore, we use Equation (17) to fit the mean vector for QTL genotype j with sex k, with {Omega}mjk = ({alpha}jkjk,{gamma}jk), in our mapping model.

4.4 Computational algorithm
We adopt our previous algorithm developed on the basis of Equation (1) (Ma et al., 2002; Wu et al., 2004a; b) to estimate the QTL position ({Omega}{ell}), the curve parameters modelling the mean vector ({Omega}mjk) and the parameters modeling the (co)variance matrix ({Omega}v). In practical computations, the QTL position parameter can be viewed as a fixed parameter because a putative QTL can be searched at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. The amount of support for a QTL at a particular map position is often displayed graphically through the use of likelihood maps or profiles, which plot the likelihood ratio test statistic as a function of the map position of the putative QTL.

The Nelder–Mead simplex algorithm, originally proposed by Nelder and Mead (1965), can be used to estimate {Omega}mjk and {Omega}v contained in Equations (1) and (2) (Zhao et al., 2004b). It is a direct search method for non-linear unconstrained optimization. It attempts to minimize a scalar-valued non-linear function using only function values, without any derivative information (explicit or implicit). The algorithm uses linear adjustment of the parameters until some convergence criterion is met. The term ‘simplex’ arises because the feasible solutions for the parameters may be represented by a polytope figure called a ‘simplex’. The simplex is a line in one dimension, triangle in two dimensions and tetrahedron in three dimensions, respectively. To increase the computation efficiency, we derived the closed forms of the determinant and inverse of the residual variance matrix [in Equation (2)] fitted by the SAD(1) model and incorporated these forms to estimate the mixture model (1).

After the point estimates of parameters are obtained by the EM algorithm, we derive the approximate variance–covariance matrix and evaluate the sampling errors of the estimates (,,). The techniques for so doing involve the calculation of the incomplete-data information matrix which is approximated by the negative second-order derivative of the incomplete-data log-likelihood. The incomplete-data information can be calculated by extracting the information for the missing data from the information for the complete data (Louis, 1982). A different so-called supplemented EM algorithm or SEM algorithm was proposed by Meng and Rubin (1991) to estimate the asymptotic variance–covariance matrices, which can also be used to calculate approximate sampling errors for the MLEs of the parameters ({Omega}{ell},{Omega}mjk,{Omega}v) in our mixture model setting.

4.5 Results
To determine the best order of the SAD models for the detection of QTL hidden in this mouse dataset, we should estimate the residual variances under the full model of hypothesis (5) and further calculate the AICC values using Equation (11) for different orders. However, this would be computationally expensive because at each antedependence order, we need to estimate numerous parameters. Here, we instead based our order determination on the reduced model of hypothesis (5) in which there is only one mean curve that explains the F2 growth data. In fact, from a small simulation study, the result about order determination was found to be consistent based on the full and reduced models (data not shown).

To simplify the statistical analysis, we assume that antedependence coefficients between measurements equidistant in time are equal. Table 1 lists the AICC values from models SAD(1) to SAD(6), which are found to increase with order. Also, for the SAD model of higher orders, the first antedependence coefficient is markedly higher than the rest coefficients. Thus, we think that the SAD(1) model should be reasonable to fit the F2 mouse data in this example. Along with the constant innovation variance, this model was incorporated to approximate the structure of the (co)variance matrix for growth trajectories in the F2 mouse progeny.


View this table:
[in this window]
[in a new window]
 
Table 1 AICC information criteria for female mice dataa

 
The profile of the LR of the full versus reduced model across the entire genome estimated from the SAD-based model has three clear peaks on chromosomes 6, 7 and 10 (Fig. 2). These peaks correspond to the locations of the detected QTL, with the LR values 64, 60 and 40, respectively, well beyond the genomewide critical threshold, 32, at the significance level P = 0.05. The critical value for claiming the existence of QTL can be determined on the basis of the Bonferroni argument for the sparse-map case (Lander and Botstein, 1989) or by permutation tests proposed by Churchill and Doerge (1994). In this example, the empirical estimate of the critical value is obtained from 100 permutation tests.



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 2 The profiles of the LR between the full and reduced (no QTL) model estimated from the SAD(1) (solid) and AR(1) models (dot) for body mass growth trajectories across the entire genome using the linkage map constructed from microsatellite markers (Vaughn et al., 1999). The genomic positions corresponding to the peaks of the curves are the MLEs of the QTL positions. The genome-wide threshold values for claiming the existence of QTL are given as the horizonal lines. Tick marks on the x-axis represent the positions of markers on the linkage group, whose names are given in Vaughn et al. (1999).

 
This SAD-based model provided estimates of the curve parameters as well as the innovation variance and antedependent coefficient that model the structure of the (co)variance matrix for each sex of the F2 progeny (Table 2). Estimated small sampling errors suggest that our model provides precise estimates of all the model parameters. Figure 3 illustrates the growth curves of three genotypes at each of the detected QTL separately for two different sexes drawn from the estimates of curve parameters in Table 2. Our model can test the additive and dominant effect of the QTL as well as their interaction effects with sex. All these age-dependent changes of genetic effects estimated by Equations (12)(16) are illustrated in Figure 4. The sex effect, s(t), increases rapidly with age, but the other effects appear to change with age at a much lesser extent. As expected, the LG/J strain contributes the body-increasing allele to the F2 progeny at all the three detected QTL. The additive effects, a(t), at these QTL increase with age (Fig. 4). The dominant effects of the QTL, d(t) and its interaction effects with the sex, Ias(t) and Ids(t), appear to change with age at lesser extents. Also, they display different dynamic patterns, depending on the QTL detected.


View this table:
[in this window]
[in a new window]
 
Table 2 The MLEs of the model parameters and their asymptotic standard errors in the parentheses under the SAD(1) model for different sexes of an F2 mouse progeny (composed of 259 males and 243 females)a

 


View larger version (13K):
[in this window]
[in a new window]
 
Fig. 3 Three growth curves each presenting a group of genotype, QQ (solid curves), Qq (dot curves) and qq (broken curves), in the male (red) and female (blue) mice at the QTL, detected by our SAD-based model, on chromosomes 6, 7 and 10.

 


View larger version (14K):
[in this window]
[in a new window]
 
Fig. 4 Dynamic changes of the sex effect, s(t), the additive, a(t), and dominant effect, d(t), of the QTL and the interaction effects, Ias and Ids, between the QTL and sex, detected by our SAD-based model, on chromosomes 6, 7 and 10.

 

View this table:
[in this window]
[in a new window]
 
Table 4 Reciprocal comparisons between the MLEs and sampling errors (in parentheses) of the parameters ({Omega}{ell},{Omega}mjk,{Omega}v) from the SAD(1)- and AR(1)-based model

 

    5 METHODOLOGICAL COMPARISONS: A SIMULATION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
The statistical behavior of our new model described in this article can be well investigated by comparing it with the AR(1)-based functional mapping model. The statistical properties of the AR(1)-based model have been studied through simulation and empirically (Ma et al., 2002; Wu et al., 2004a; b). Here, we perform a series of simulation studies to compare these two models.

Consider an F2 population with which a 40-cM long linkage group composed of three equidistant markers is constructed. A QTL that affects growth curves is located at 10 cM from the first marker on the linkage group. Assume that there are 500 F2 progeny, each measured at 10 equally spaced time points. Our simulation was based on four different patterns of covariance structures (Table 3). Patterns A and B conform to the AR(1) and SAD(1) model, respectively, whereas Pattern C does not follow either of these two models. The phenotypic and marker data simulated under each of these patterns were analyzed by both the AR(1)- and SAD(1)-based models, with the MLEs of unknown parameters ({Omega}l,{Omega}mjk,{Omega}v) and their sampling errors based on 100 simulation replicates given in Table 4.


View this table:
[in this window]
[in a new window]
 
Table 3 Three different patterns of the covariance matrix used to simulate the multivariate phenotypic data

 
The QTL location, growth curve parameters and covariance-structuring parameters can be equally precisely estimated for the data simulated under the AR(1) model by the SAD(1) and AR(1) models (Table 4). However, if the data structure follows the SAD(1) model, the estimation precision of the parameters is much better from the SAD(1) than from the AR(1) model. These comparisons suggest that the SAD(1)-based functional mapping model is statistically more robust than the AR(1) model; i.e. the results from the SAD(1) model are less data-dependent than the AR(1) model. It is interesting to note that both the models display similar estimation precisions for the data that follow either the SAD(1) model or the AR(1) model. This implies that for a practical dataset, whose covariance structure is unknown, both the models should be tested and that the growth QTL detected from the two models, if they are different, should be considered to be reasonable.


    6 DISCUSSION
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 
We have developed a new statistical model for functional mapping of QTL that affect growth curves. Functional mapping based on a longitudinal model has proven to be more powerful for QTL detection compared to a simple univariate analysis (Ma et al., 2002). In this article, functional mapping is incorporated by the SAD model (Núñez-Antón and Zimmerman, 2000; Pourahmadi, 1999) that specifies the non-stationarity of the (co)variances for longitudinal traits. It is intriguing to compare this SAD-based model with our previous AR(1)-based model under the assumptions of stationary variance and covariance (Wu et al., 2002; Ma et al., 2002).

Zhao et al. (2004a) reported the results of QTL mapping for growth trajectories in the same F2 mouse progeny as used in this study based on the functional mapping model integrated by the AR(1) structure. To compare the results from the two models, we use dot curves to draw the LR profile for Zhao et al.'s AR(1) model (Fig. 2). The AR(1) model discovered four QTL, as opposed to three detected by the SAD(1) model. Both the models identified two QTL on chromosomes 6 and 7, although the estimated locations of the QTL on chromosome 6 are different in the two models. The third QTL detected by the SAD(1) model is located on chromosome 10, whereas the other two QTL detected by the AR(1) model are located on chromosomes 11 and 15. Such pronounced discrepancies may be due to the difference in the nature of QTL control. For instance, if a time-increasing total variance is explained by a time-increasing differentiation triggered by a QTL, with the residual variance constant across time, then the AR model would work better than the SAD model. On the other hand, if the effects of QTL considered are not the sole source contributing to increased overall variances with time, the SAD model has more power to detect such QTL. We have used simulation studies to explain their differences. Although the SAD model is more robust than the AR model, they should can be viewed as alternative to one another. In practice, these two models should be used simultaneously to monitor the existence of growth QTL for the same dataset.

One of the major advantages of our model is that it can discern the changes in genetic actions and interactions of QTL for complex traits during ontogeny and has the potential to integrate developmental biology into quantitative genetics theory and methodology. Used in an F2 mouse population, our model has discovered the age-related change of genetic influence on body size during development. Similar findings in mice have been obtained by Atchley and Zhu (1997). Our model can be generalized to a dynamic mixture model to handle longitudinal data measured at uneven time intervals and with different measurement patterns among different individuals. It can be anticipated that the model proposed in this article and its extensions will handle various complexities of longitudinal data. It will have great implications for the design of an efficient early selection program in plant and animal breeding and for asking and addressing biological questions at the interface of genetics, development and evolution.


    Acknowledgments
 
We thank three anonymous referees for their constructive comments on this manuscript. This work was partially supported by a NIH grant DK52514 to J.M.C., and by the National Science Foundation of China to R.W. (09 95671). The publication of this manuscript is approved as journal series No.-10587 by the Florida Agricultural Experimental Station.

Received on October 6, 2004; revised on February 23, 2005; accepted on March 7, 2005

    REFERENCES
 TOP
 Abstract
 1 INTRODUCTION
 2 THE STRUCTURED ANTEDEPENDENCE...
 3 MODEL SELECTION
 4 APPLICATION
 5 METHODOLOGICAL COMPARISONS: A...
 6 DISCUSSION
 REFERENCES
 

    Atchley, W.R. and Zhu, J. (1997) Developmental quantitative genetics, conditional epigenetic variability and growth in mice. Genetics, 147, 765–776[Abstract].

    Carroll, R.J. and Ruppert, D. (1984) Power-transformations when fitting theoretical models to data. J. Am. Statist. Assoc., 79, 321–328[CrossRef].

    Churchill, G.A. and Doerge, R.W. (1994) Empirical threshold values for quantitative trait mapping. Genetics, 138, 963–971[Abstract].

    Daniels, M.J. and Pourahmadi, M. (2002) Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika, 89, 553–566[Abstract/Free Full Text].

    Dempster, A.P., et al. (1977) Maximum likelihood from incomplet data via the EM algorithm. J. R. Statist. Soc. B, 39, 1–38.

    Diggle, P.J., Heagerty, P., Liang, K.Y., Zeger, S.L. Analysis of Longitudinal Data, (2002) , Oxford, UK Oxford University Press.

    Gabriel, K.R. (1962) Ante-dependence analysis of an ordered set of variables. Ann. Math. Statist., 33, 201–212.

    Hurvich, C.M. and Tsai, C.-L. (1989) Regression and time series model selection in small samples. Biometrika, 76, 297–307[Abstract/Free Full Text].

    Jaffrézic, F., et al. (2003) Structured antedependence models for genetic analysis of repeated measures on multiple quantitative traits. Genet. Res., 82, 55–65[CrossRef][ISI][Medline].

    Kirkpatrick, M. and Heckman, N. (1989) A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol., 27, 429–450[CrossRef][ISI][Medline].

    Lander, E.S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185–199[Abstract/Free Full Text].

    Louis, T.A. (1982) Finding the observed information matrix when using the EM algorithm. J. R. Statist. Soc. B, 44, 226–233.

    Ma, C.-X., et al. (2002) Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics, 161, 1751–1762[Abstract/Free Full Text].

    Meng, X.-L. and Rubin, D.B. (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J. Am. Statist. Assoc., 86, 899–909[CrossRef].

    Nelder, J.A. and Mead, R. (1965) A simplex method for function minimization. Comput. J., 7, 308–313.

    Núñez-Antón, V. (1997) Longitudinal data analysis: non-stationary error structures and antedependence models. Appl. Stochast. Models Data Anal., 13, 279–287[CrossRef].

    Núñez-Antón, V. and Zimmerman, D.L. (2000) Modeling nonstationary longitudinal data. Biometrics, 56, 699–705[CrossRef][ISI][Medline].

    Pan, J.X. and Mackenzie, G. (2003) On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239–244[Abstract/Free Full Text].

    Pletcher, S.D. and Geyer, C.J. (1999) The genetic analysis of age-dependent traits: modeling the character process. Genetics, 153, 825–835[Abstract/Free Full Text].

    Pletcher, S.D. and Jaffrézic, F. (2002) Generalized character process models: estimating the genetic basis of traits that cannot be observed and that change with age or environmental conditions. Biometrics, 58, 157–162[CrossRef][ISI][Medline].

    Pourahmadi, M. (1999) Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677–690[Abstract/Free Full Text].

    Pourahmadi, M. (2000) Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika, 87, 425–435[Abstract/Free Full Text].

    Schwarz, G. (1978) Estimating the dimension of a model. Ann. Statist., 6, 461–464.

    Vaughn, T.T., et al. (1999) Mapping quantitative trait loci for murine growth: a closer look at genetic architecture. Genet. Res., 74, 313–322[CrossRef][ISI][Medline].

    von Bertalanffy, L. (1957) Quantitative laws in metabolism and growth. Q. Rev. Biol., 32, 217–231[CrossRef][Medline].

    West, G.B., et al. (2001) A general model for ontogenetic growth. Nature, 413, 628–631[CrossRef][Medline].

    Wu, R.L., et al. (2002) A logistic mixture model for characterizing genetic determinants causing differentiation in growth trajectories. Genet. Res., 19, 235–245.

    Wu, R.L., et al. (2004a) A general framework for analyzing the genetic architecture of developmental characteristics. Genetics, 166, 1541–1551[Abstract/Free Full Text].

    Wu, R.L., et al. (2004b) Functional mapping of quantitative trait loci underlying growth trajectories using a transform-both-sides logistic model. Biometrics, 60, 729–738[CrossRef][ISI][Medline].

    Wu, W.B. and Pourahmadi, M. (2003) Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90, 831–844[Abstract/Free Full Text].

    Zhao, W., et al. (2004a) A unifying statistical model for QTL mapping of genotype-sex interaction for developmental trajectories. Physiol. Genom., 19, 218–227[Abstract/Free Full Text].

    Zhao, W., et al. (2004b) A fast algorithm for functional mapping of complex traits. Genetics, 167, 2133–2137[Abstract/Free Full Text].

    Zimmerman, D.L. and Núñez-Antón, V. (1997) Structured antedependence models for longitudinal data. In Gregoire, T.G., Brillinger, D.R., Diggle, P.J., Russek-Cohen, E., Warren, W.G., Wolfinger, R. (Eds.). Modelling Longitudinal and Spatially Correlated Data. Methods, Applications, and Future Directions, , New York Springer-Verlag, pp. 63–76.

    Zimmerman, D.L. and Núñez-Antón, V. (2001) Parametric modeling of growth curve data: an overview (with discussion). Test, 10, 1–73.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
GeneticsHome page
R. Yang, H. Gao, X. Wang, J. Zhang, Z.-B. Zeng, and R. Wu
A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits
Genetics, November 1, 2007; 177(3): 1859 - 1870.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Wu, J. Yang, and R. Wu
Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics
Bioinformatics, July 1, 2007; 23(13): i569 - i576.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
W. Zhao, H. Li, W. Hou, and R. Wu
Wavelet-Based Parametric Functional Mapping of Developmental Trajectories With High-Dimensional Data
Genetics, July 1, 2007; 176(3): 1879 - 1892.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M. Lin, H. Li, W. Hou, J. A. Johnson, and R. Wu
Modeling sequence sequence interactions for drug response
Bioinformatics, May 15, 2007; 23(10): 1251 - 1257.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
Y. Cui, J. Zhu, and R. Wu
Functional mapping for genetic control of programmed cell death
Physiol Genomics, May 16, 2006; 25(3): 458 - 469.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
21/10/2469    most recent
bti382v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in: