Bioinformatics Advance Access originally published online on June 5, 2008
Bioinformatics 2008 24(15):1737-1739; doi:10.1093/bioinformatics/btn258
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

LOT: a tool for linkage analysis of ordinal traits for pedigree data

Meizhuo Zhang 1, Rui Feng 2, Xiang Chen 1, Buqu Hu 1 and Heping Zhang 1,*

1 Yale University School of Medicine, New Haven, CT 06520-8034 and 2 University of Alabama at Birmingham, Birmingham, Alabama 35294, USA

*To whom correspondence should be addressed.


    ABSTRACT
 

Summary: Existing linkage-analysis methods address binary or quantitative traits. However, many complex diseases and human conditions, particularly behavioral disorders, are rated on ordinal scales. Here, we introduce LOT, a tool that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait, and it uses a likelihood-ratio test to assess the evidence for linkage.

Availability: The LOT program is available for download at http://c2s2.yale.edu/software/LOT/

Contact: heping.zhang{at}yale.edu


    1 INTRODUCTION
 
Linkage analysis has proven useful in mapping genes for human diseases such as breast cancer (Claus et al., 1990; Easton et al., 1993; Hall et al., 1990). Many human disease phenotypes are rated on discrete, ordinal scales. Typically, ordinal phenotypes are dichotomized into binary traits before they can be analyzed with standard linkage-analysis programs such as GENEHUNTER (Kruglyak et al., 1996). Dichotomizing ordinal traits has been reported to reduce the power of linkage analysis (Corbett et al., 2004; Feng et al., 2004). Although association studies have gained momentum in genetic analysis, numerous valuable datasets, such as COGA and the Framingham Heart Study (Atwood et al., 2002), have been accumulated from linkage studies. Hence it remains very important to develop effective methods and software to analyze data from linkage studies.

We have developed a tool, LOT, for linkage analysis of ordinal traits in pedigree data, based on the work of Feng et al. (2004) with some modifications and improvements. LOT detects linkage between a marker and an ordinal trait locus by using a latent-variable proportional-odds logistic model to examine whether the inheritance pattern of the marker, which can be inferred from the pedigree data, is associated with the trait.


    2 METHODS AND IMPLEMENTATION
 
2.1 Model
LOT first infers the inheritance pattern of a pedigree by means of inheritance vectors, v. The derivation of the inheritance vectors is independent of the type (continuous or categorical) of the trait; LOT implements the same method used by Kruglyak et al. (1996). In the next step, LOT uses a proportional-odds logistic model, augmented with two types of latent random variables, to detect association between a marker and a disease locus. The two types of latent variables, U1 and U2, represent, respectively, (1) the common genetic or environmental factors in a family that are not captured by the covariates and (2) the genetic susceptibility introduced by the family founders and transmitted to their offspring. Conditional on all of the latent variables and inheritance vectors, the traits of all non-founders within the i-th family are independent. Let superscript i index families and subscript j index non-founders within a family. Given a trait Y taking an ordinal value k = 0, 1, ..., K (K ≥ 1), the trait of the j-th non-founder in the i-th family follows the distribution:


Formula

where x is the vector of covariates available for each study subject, v^i is the inheritance vector at the disease gene locus for the i-th family, β is the vector of parameters reflecting the covariate effects on the trait, α_k is the trait-level-dependent intercept and γ = (γ1, γ2)' captures the familial and genetic contributions to the trait. We refer to Feng et al. (2004) for more details. The EM algorithm (Dempster et al., 1977) is used to obtain the maximum-likelihood estimates (MLEs) of the parameters. The MLEs are then used in a likelihood-ratio test (LRT) to determine the significance of linkage. The null hypothesis is that no disease gene is linked to the marker, i.e. H0: γ2 = 0. Thus, the numerator and denominator of the LRT are, respectively, the maximum likelihood in the presence of a major disease gene linked to the current marker or intermarker locus and the maximum likelihood in the absence of linkage.
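To make the proportional-odds structure concrete, the sketch below converts cumulative logits into category probabilities for a single non-founder. It assumes the common convention logit P(Y ≤ k) = α_k + η, with a linear predictor η = x'β + γ1·U1 + γ2·U2; that sign convention, and the way U2 would be built from the inheritance vector, are assumptions rather than a transcription of the model equation above. The function name is hypothetical.

```python
import math
from typing import Sequence


def category_probabilities(alphas: Sequence[float], eta: float) -> list[float]:
    """P(Y = k), k = 0..K, under an assumed cumulative-logit (proportional-odds) model.

    alphas: cut-point intercepts alpha_0 < ... < alpha_{K-1}; K + 1 categories in total.
    eta:    linear predictor, assumed to be x'beta + gamma1*U1 + gamma2*U2 for one
            non-founder (U2 would depend on the inheritance vector).
    Convention assumed here: logit P(Y <= k) = alpha_k + eta.
    """
    if len(alphas) < 1:
        raise ValueError("need at least one intercept (K >= 1)")
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("intercepts must be strictly increasing")
    # Cumulative probabilities P(Y <= k); the top category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


# Three levels (K = 2), as for the hoarding trait described later.
print(category_probabilities([-1.0, 1.0], eta=0.0))
```

With symmetric intercepts and η = 0 this gives roughly [0.269, 0.462, 0.269]. Under this convention a larger η raises every P(Y ≤ k) and so shifts probability toward the lower trait levels; the proportional-odds assumption is that η shifts all K cumulative logits by the same amount.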

2.2 LOT and GENEHUNTER
LOT and GENEHUNTER (parametric analysis) have equivalent parametrizations when the trait is binary. For clarity, let us assume no residual familial and genetic effects and no covariates (i.e. no U1 and x). For the parametric analysis in GENEHUNTER, the likelihood at a location t can be written as


Formula

where N is the number of families, V^i is the set of all possible inheritance vectors for the i-th family, f = (f0, f1, f2) denotes the fixed penetrance parameters that must be specified beforehand,


Formula

and D_j^i is the number of disease alleles carried by the j-th individual in the i-th family; θ2 corresponds to the disease allele frequency. In LOT, for any given θ2, the parameters α and γ2 control the penetrance of the binary trait as follows:


Formula

where n_i is the number of non-founders in the i-th family. Thus, exp(α0)/[1 + exp(α0)], exp(γ1 + α0)/[1 + exp(γ1 + α0)] and exp(2γ1 + α0)/[1 + exp(2γ1 + α0)] represent the parameterization of the penetrance in LOT that is equivalent to the one used in GENEHUNTER.
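The correspondence can be checked numerically. The sketch below (function names hypothetical) maps (α0, γ) to the three penetrances through the logistic expressions above, where γ is the per-allele coefficient multiplying the number of disease alleles d, and then inverts the map. One consequence of the algebra is worth noting: because this binary form has only two parameters, a penetrance vector (f0, f1, f2) is reproduced exactly only when its logits are equally spaced.

```python
import math


def penetrances(alpha0: float, gamma: float) -> tuple[float, ...]:
    """Penetrance for d = 0, 1, 2 disease alleles: exp(d*gamma + alpha0) / (1 + exp(...))."""
    return tuple(1.0 / (1.0 + math.exp(-(d * gamma + alpha0))) for d in range(3))


def logit(p: float) -> float:
    if not 0.0 < p < 1.0:
        raise ValueError("penetrance must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def to_logistic(f0: float, f1: float, f2: float, tol: float = 1e-9) -> tuple[float, float]:
    """Map GENEHUNTER-style penetrances (f0, f1, f2) to (alpha0, gamma).

    Exact only when logit(f2) - logit(f1) equals logit(f1) - logit(f0), i.e. the
    disease alleles act additively on the logit scale.
    """
    alpha0 = logit(f0)
    gamma = logit(f1) - alpha0
    if abs((logit(f2) - logit(f1)) - gamma) > tol:
        raise ValueError("penetrances are not equally spaced on the logit scale")
    return alpha0, gamma


f = penetrances(-2.0, 1.5)
print(f, to_logistic(*f))  # round trip recovers alpha0 = -2.0, gamma = 1.5
```

A penetrance vector such as (0.1, 0.5, 0.6) raises an error here, because its logit spacing is unequal and no single (α0, γ) pair reproduces it.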

2.3 Ascertainment
Families are not always ascertained at random; often they are ascertained through members who have certain health conditions. For example, in the hoarding study presented below, all families included at least two siblings with Gilles de la Tourette syndrome. Non-random ascertainment may over-sample affected subjects relative to the original population. In this circumstance, parameter estimates may be biased, and proper adjustment for ascertainment should be considered, as discussed in Wang and Zhang (2007). However, there are many ascertainment schemes, and in many cases the relationship between the ascertainment scheme and the trait of interest is poorly characterized. For these and other reasons, LOT, like other linkage analysis programs, does not correct for ascertainment, although in theory a well-characterized ascertainment scheme can be incorporated into the likelihood and hence accommodated in LOT. Users are advised to document the ascertainment scheme carefully and to scrutinize their analysis, for example, by simulation. We refer to Feng and Zhang (2006) for details.
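A small simulation shows the over-sampling effect described above. The sketch mimics an ascertainment rule of requiring at least two affected siblings, but the sibship size, risk and number of families are made up, and siblings are treated as independently affected, so any excess in the ascertained sample comes from selection alone. It illustrates selection bias only; it is not LOT's ascertainment model or the procedure of Feng and Zhang (2006).

```python
import random


def simulate_ascertainment(n_families: int, sibship_size: int, risk: float,
                           min_affected: int, seed: int = 1) -> tuple[float, float]:
    """Return (population affected rate, affected rate among ascertained families).

    Simplifying assumption: each sibling is affected independently with
    probability `risk`; a family is ascertained if it has >= min_affected cases.
    """
    rng = random.Random(seed)
    total = affected = asc_total = asc_affected = 0
    for _ in range(n_families):
        n_aff = sum(rng.random() < risk for _ in range(sibship_size))
        total += sibship_size
        affected += n_aff
        if n_aff >= min_affected:  # family enters the study
            asc_total += sibship_size
            asc_affected += n_aff
    asc_rate = asc_affected / asc_total if asc_total else float("nan")
    return affected / total, asc_rate


pop_rate, asc_rate = simulate_ascertainment(20000, 3, 0.10, 2)
print(f"population: {pop_rate:.3f}, ascertained: {asc_rate:.3f}")
```

With sibships of three and a 10% risk, every ascertained family has at least two of three members affected, so the affected fraction in the study sample is at least 2/3, far above the population rate. Replacing the independence assumption with a genetic model is where simulation becomes informative about linkage estimates.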

2.4 Implementation
LOT, implemented in C and Java, comes with a user-friendly graphical user interface (GUI) on Windows and Linux. It can also be run from the command line on Windows, Linux and Mac OS X.


    3 EXAMPLE USAGE
 
LOT accepts input files in a format similar to the standard GENEHUNTER format. Two input files are required: a locus data file and a pedigree file. The locus file contains the genetic distances between markers, the number of alleles at each locus and the allele frequencies. The pedigree file describes the structure of each pedigree and gives, for each individual, the value of the ordinal trait, the genotype at each marker and the values of the covariates, if any. For the file formats and detailed instructions, please refer to the Supplementary information website.
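To illustrate what the pedigree file has to convey, the sketch below parses a LINKAGE-style, whitespace-delimited layout (family, individual, father, mother, sex, ordinal trait, then one allele pair per marker) and separates founders from non-founders, whose traits are the ones modelled. The column order, the use of 0 for a parent absent from the pedigree and the omission of covariate columns are assumptions, not LOT's documented format; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    family: str
    person: str
    father: str
    mother: str
    sex: int
    trait: int                        # ordinal level 0..K (assumed encoding)
    genotypes: list[tuple[str, str]]  # one allele pair per marker

    @property
    def is_founder(self) -> bool:
        # Assumption: "0" marks a parent who is not in the pedigree.
        return self.father == "0" and self.mother == "0"


def parse_pedigree(text: str) -> list[Person]:
    """Parse a whitespace-delimited, LINKAGE-like pedigree table (hypothetical layout)."""
    people = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        fam, pid, fa, mo, sex, trait, *alleles = fields
        if len(alleles) % 2:
            raise ValueError(f"odd number of allele columns for {fam}/{pid}")
        pairs = list(zip(alleles[0::2], alleles[1::2]))
        people.append(Person(fam, pid, fa, mo, int(sex), int(trait), pairs))
    return people


# Toy nuclear family with one marker (made-up data).
example = """\
1 1 0 0 1 0 3 5
1 2 0 0 2 1 2 4
1 3 1 2 1 2 3 4
1 4 1 2 2 1 5 2
"""
people = parse_pedigree(example)
non_founders = [p.person for p in people if not p.is_founder]
print(non_founders)
```

In this toy family, individuals 3 and 4 are the non-founders; their marker alleles are what an inheritance-vector calculation would trace back to the two founders.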

LOT produces two types of output: a table and a diagram. The first two columns of the table contain the names of the markers and the map positions of the markers and intermarker locations, respectively. The next three columns contain, for each marker and intermarker location, the complete (natural) log-likelihood without the latent variables (‘Without Us’), the log-likelihood with U1 only (‘With U1’) and the log-likelihood with both U1 and U2 (‘With U1 & U2’). This table is automatically saved as a tab-delimited plain-text file. The graphical output displays the significance level of linkage at each location based on the likelihood estimates, and users can save the diagram as a PNG image. Currently, the graphical output is available only in the GUI versions of LOT. In addition to the final output, LOT reports the progress of the computation interactively in the main window.
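The three log-likelihood columns are enough to recompute the linkage statistics. The sketch below reads the tab-delimited table and, for each location, computes the gain Δ = logL(With U1 & U2) − logL(With U1), the LRT statistic 2Δ and the equivalent LOD score Δ/ln 10. The headers ‘Marker’ and ‘Position’, and the exact header text of the log-likelihood columns, are assumptions about the file layout; the toy numbers are made up.

```python
import csv
import io
import math

# Assumed headers: "Marker" and "Position" are hypothetical names for the first two
# columns; the log-likelihood headers follow the labels described in the text.
FULL, U1_ONLY = "With U1 & U2", "With U1"


def linkage_scores(table_text: str) -> list[tuple[str, float, float, float, float]]:
    """Return (marker, position, delta, LRT, LOD) for each row of the table.

    delta is the natural-log likelihood gain from adding U2 to the U1-only model;
    LRT = 2 * delta and LOD = delta / ln(10) (the base-10 log likelihood ratio).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    scores = []
    for row in reader:
        delta = float(row[FULL]) - float(row[U1_ONLY])
        scores.append((row["Marker"], float(row["Position"]),
                       delta, 2.0 * delta, delta / math.log(10)))
    return scores


# Toy table with made-up values (not LOT output).
toy = ("Marker\tPosition\tWithout Us\tWith U1\tWith U1 & U2\n"
       "M1\t0.0\t-250.0\t-240.0\t-238.0\n"
       "M2\t5.0\t-251.0\t-241.0\t-240.5\n")
for marker, pos, delta, lrt, lod in linkage_scores(toy):
    print(f"{marker} at {pos}: delta={delta:.2f}, LRT={lrt:.2f}, LOD={lod:.2f}")
```

Because the table stores natural-log likelihoods, dividing by ln 10 is the only step needed to express the same evidence on the LOD scale familiar from other linkage programs.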

Figure 1 displays the graphical output produced by LOT for a hoarding study dataset (Feng et al., 2004). The response in this study is an ordinal trait that takes the value 0, 1 or 2 based on a patient's hoarding symptoms: 0 was recorded if both hoarding items on the Yale–Brown Obsessive-Compulsive Scale symptom checklist were rated as present, 1 if only one item was present and 2 if both items were absent. The figure shows the results for the markers on chromosome 5. The horizontal axis indicates map locations on the chromosome, and the vertical axis shows the difference in log-likelihood between the model with U1 only and the model with both U1 and U2. The green curve is thus the gain in log-likelihood from adding U2 to the model that already includes the shared familial factors (U1). The blue and red lines indicate the thresholds for suggestive and significant linkage, respectively. These thresholds follow the definitions of suggestive and significant linkage proposed by Lander and Kruglyak (1995) and assume that a genome-wide linkage scan uses about 400 markers, which is typical of microsatellite marker panels. They are provided as a reference; users are encouraged to recalculate the thresholds for their own study settings. As shown in Figure 1, wherever the green curve exceeds the suggestive threshold, the name of the marker is printed on the graph in black; where it exceeds the significant threshold, the marker name is printed in bold.
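The labelling rule can be stated compactly. In this sketch the two thresholds are inputs rather than the values LOT uses, since the text recommends recomputing them for each study; the function name and the "plain"/"bold" labels are illustrative.

```python
from typing import Iterable


def label_markers(gains: Iterable[tuple[str, float]], suggestive: float,
                  significant: float) -> dict[str, str]:
    """Apply the labelling rule described for Figure 1.

    gains: (marker, log-likelihood gain) pairs, on the same scale as the thresholds.
    Returns marker -> "plain" (above suggestive) or "bold" (above significant);
    markers at or below the suggestive threshold are not labelled.
    """
    if suggestive >= significant:
        raise ValueError("the suggestive threshold should be below the significant one")
    labels = {}
    for marker, gain in gains:
        if gain > significant:
            labels[marker] = "bold"
        elif gain > suggestive:
            labels[marker] = "plain"
    return labels


# Hypothetical thresholds and gains; recompute thresholds for your own study design.
print(label_markers([("A", 0.5), ("B", 2.5), ("C", 4.0)], suggestive=2.0, significant=3.5))
```

Keeping the thresholds as parameters makes it easy to swap in values derived for a different marker density or from permutation, without touching the rule itself.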


Figure 1
 
Fig. 1. Graphical output from LOT for a hoarding study dataset. The blue and red lines indicate, respectively, the thresholds of significant and suggestive evidence for linkage between a marker and the trait locus. The thresholds were computed empirically by generating data under the null hypothesis using permutation and 370 microsatellite markers from the hoarding study.

 
The computational time of LOT grows linearly with the number of markers. The time needed to compute the inheritance vectors grows exponentially with the number of non-founders within a pedigree and, when all pedigrees have the same structure, linearly with the number of pedigrees. The time for the remaining part of the program grows quadratically with the number of samples, and this part is the computational bottleneck. In practice, therefore, the running time of LOT grows quadratically with the number of samples. In the example above, 223 samples and 24 markers were analyzed on a desktop workstation with a 3.20 GHz Intel Pentium D processor and 3.50 GB of RAM; the computation took 211 s. Another analysis, with 3074 samples and 32 markers, took 49 357 s on the same machine.
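The scaling statements can be checked against the two reported timings. The sketch below is a back-of-the-envelope model, not a profile of LOT: it scales the 211 s reference run quadratically in samples and linearly in markers, and separately counts inheritance vectors, of which a pedigree with n non-founders has 2^(2n) per locus because each non-founder contributes two meioses (Kruglyak et al., 1996). The function names are hypothetical.

```python
def inheritance_vector_count(n_nonfounders: int) -> int:
    """Each non-founder has two meioses (one bit each), giving 2 ** (2n) vectors."""
    return 2 ** (2 * n_nonfounders)


def extrapolate_runtime(t_ref: float, samples_ref: int, markers_ref: int,
                        samples: int, markers: int) -> float:
    """Scale a reference run time: quadratic in samples, linear in markers."""
    return t_ref * (samples / samples_ref) ** 2 * (markers / markers_ref)


# Reference run from the text: 223 samples, 24 markers, 211 s.
predicted = extrapolate_runtime(211, 223, 24, 3074, 32)
print(f"predicted {predicted:,.0f} s vs. reported 49,357 s")
print([inheritance_vector_count(n) for n in range(1, 6)])
```

The extrapolation gives roughly 53 000 s, about 8% above the reported 49 357 s, which is consistent with quadratic growth in the number of samples. The inheritance-vector count shows why large pedigrees, rather than many small ones, drive the exponential part of the cost.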


    4 SIGNIFICANCE AND CONCLUSION
 
LOT provides a new means of performing linkage analysis of pedigree data when the target phenotype is ordinal. Because the severity of many diseases is rated on ordinal scales, LOT can be employed to study the genetic basis of such complex traits. It implements a latent-variable proportional-odds logistic model that analyzes ordinal traits directly, instead of dichotomizing them into binary traits for analysis with standard linkage analysis software. Analyzing ordinal traits directly avoids the loss of information, and the consequent loss of power, caused by dichotomization. When applied to a binary trait, LOT produces results comparable to those of GENEHUNTER. LOT also presents its results intuitively by visualizing the significance level of linkage between the markers and the disease trait.


    ACKNOWLEDGEMENTS
 
We thank the ‘Yale University Biomedical High Performance Computing Center’ (NIH grant: RR19895) for computational resources.

Funding: This research is supported in part by grants K02DA017713 and R01DA016750 from the National Institute on Drug Abuse.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alex Bateman

Received on February 27, 2008; revised on June 2, 2008; accepted on June 4, 2008

    REFERENCES
 

    Atwood LD, et al. Genomewide linkage analysis of body mass index across 28 years of the Framingham Heart Study. Am. J. Hum. Genet. (2002) 70:1044–1050.

    Claus EB, et al. Age at onset as an indicator of familial risk of breast cancer. Am. J. Epidemiol. (1990) 131:961–972.

    Corbett J, et al. Power loss for linkage analysis due to the dichotomization of trichotomous phenotypes. Hum. Heredity (2004) 57:21–27.

    Dempster AP, et al. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B (1977) 39:1–38.

    Easton DF, et al. Genetic linkage analysis in familial breast and ovarian cancer: results from 214 families. Am. J. Hum. Genet. (1993) 52:678–701.

    Feng R, et al. Linkage analysis of ordinal traits for pedigree data. Proc. Natl Acad. Sci. USA (2004) 101:16739–16744.

    Feng R, Zhang HP. Ascertainment adjustment in genetic studies of ordinal traits. Hum. Genet. (2006) 119:429–435.

    Hall JM, et al. Linkage of early-onset familial breast cancer to chromosome. Science (1990) 250:1684–1689.

    Kruglyak L, et al. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. (1996) 58:1347–1363.

    Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. (1995) 11:241–247.

    Wang X, Zhang H. Ascertainment in genetics studies. In: Encyclopedia of Life Sciences (2007). Chichester: John Wiley & Sons, Ltd. Available at http://www.els.net/ (last accessed September 28, 2007).

