Bioinformatics Advance Access originally published online on December 20, 2005
Bioinformatics 2006 22(4):512-513; doi:10.1093/bioinformatics/btk012

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

FLOSS: flexible ordered subset analysis for linkage mapping of complex traits

B. L. Browning

Genetic Data Sciences, GlaxoSmithKline, Five Moore Drive, P.O. Box 13398, Research Triangle Park, NC 27709, USA


    ABSTRACT

Summary: The FLOSS software package is a flexible framework for ordered subset analysis. FLOSS is specifically designed for use with the Merlin linkage analysis package, but FLOSS can be used with any linkage analysis software package that reports NPL Z-scores for each locus and family. When FLOSS is used with the Merlin linkage analysis package, one can use either non-parametric Z-scores or Kong and Cox linear allele sharing model LOD scores. Monte Carlo P-values are calculated using a permutation test with an efficient Besag–Clifford sequential stopping rule. FLOSS also has a flexible tool for assigning family covariate scores from Merlin input files. FLOSS includes user documentation and is written in Java for easy portability. The FLOSS source code is documented and designed to be extensible.

Availability: http://www.stat.auckland.ac.nz/~browning/floss/floss.htm

Contact: brian_browning1{at}yahoo.com


    INTRODUCTION
 
Ordered subset analysis is a powerful tool for linkage analysis of traits characterized by genetic heterogeneity (Hauser et al., 2004). When there is genetic heterogeneity underlying a trait, a genetic marker with disease-predisposing variants will not show linkage in families in which the marker is not involved in the disease aetiology. Ordered subset analysis uses covariate information to identify a more homogeneous subset of families that gives increased evidence for linkage. The significance of the increased evidence of linkage is assessed under the null hypothesis that the family linkage scores at each locus and the family covariate information are independent. The covariates can be any ordinal or continuous variables related to the trait of interest.

Ordered subset analysis can give increased evidence for linkage in a region and more precise location estimates of the trait-predisposing locus. Ordered subset analysis can also be used to identify candidate genes in pathways connecting the covariate with the trait, and it can be used to identify an endophenotype for case–control linkage disequilibrium analysis.

Before performing an ordered subset analysis, the family members' covariate variables are used to assign a covariate score to each family. Families are then ordered by their covariate score, and linkage analysis is performed on all subsets of families with the k smallest or k largest covariate scores. For example, if there are N families ordered by their family covariate scores, linkage analysis would be performed using families 1, 2,...,k and families k, k + 1,...,N for 1 ≤ k ≤ N. The maximum linkage score is identified from the 2N – 1 linkage analyses. If several families share the same family covariate score, families are assigned to ranks so that families with identical covariate scores have the same rank. Linkage analysis is then performed on all families in the k smallest ranks and on all families in the k largest ranks.
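
The enumeration of ordered subsets can be sketched in a few lines of Python. This is an illustrative sketch only, not FLOSS source code: the function name `ordered_subsets` and the dictionary input format are assumptions made for the example.

```python
from itertools import groupby


def ordered_subsets(cov_scores):
    """Return the low-end and high-end ordered subsets of family IDs.

    cov_scores: dict mapping family ID -> family covariate score
    (hypothetical input format). Families with identical scores share
    a rank and always enter a subset together.
    """
    # Group families into ranks by ascending covariate score.
    items = sorted(cov_scores.items(), key=lambda kv: kv[1])
    ranks = [[fam for fam, _ in grp]
             for _, grp in groupby(items, key=lambda kv: kv[1])]

    subsets = []
    # Families in the k smallest ranks, k = 1..R.
    for k in range(1, len(ranks) + 1):
        subsets.append([fam for rank in ranks[:k] for fam in rank])
    # Families in the k largest ranks, k = 1..R-1
    # (k = R would repeat the full set already listed above).
    for k in range(1, len(ranks)):
        subsets.append([fam for rank in ranks[-k:] for fam in rank])
    return subsets
```

With N families that all have distinct scores, the function returns 2N – 1 subsets. When some families are tied, there are fewer ranks than families and correspondingly fewer subsets.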

Since ordered subset analysis reports the maximum of 2N – 1 linkage scores, one cannot immediately assign a P-value to the maximum linkage score under the null hypothesis that the family linkage scores at each locus are independent of the family covariate scores. Instead, a permutation test is used to determine a Monte Carlo P-value. The permutation test compares the maximum ordered subset linkage score for the covariate score ordering of the families with the maximum ordered subset linkage scores obtained for random orderings of the families.
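
A minimal Python sketch of such a permutation test follows. It rests on several assumptions that are not taken from FLOSS itself: the score of a subset of k families is taken to be the sum of the family Z-scores divided by the square root of k, maximized over loci; covariate scores are assumed to be distinct; and a fixed number of permutations is used rather than a sequential stopping rule. The function names are hypothetical.

```python
import math
import random


def max_subset_score(z_by_family, order):
    """Maximum score over the low-end and high-end subsets of `order`.

    z_by_family: dict family ID -> list of per-locus Z-scores.
    order: family IDs sorted by covariate score (assumed distinct).
    """
    n_loci = len(next(iter(z_by_family.values())))
    best = -math.inf
    for seq in (order, order[::-1]):  # low end first, then high end
        sums = [0.0] * n_loci
        for k, fam in enumerate(seq, start=1):
            for j, z in enumerate(z_by_family[fam]):
                sums[j] += z
            # Assumed subset score: sum of Z-scores / sqrt(k), max over loci.
            best = max(best, max(sums) / math.sqrt(k))
    return best


def permutation_p_value(z_by_family, order, n_perm=999, seed=0):
    """Monte Carlo P-value from a fixed number of random orderings."""
    rng = random.Random(seed)
    observed = max_subset_score(z_by_family, order)
    shuffled = list(order)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_subset_score(z_by_family, shuffled) >= observed:
            exceed += 1
    # The observed ordering is counted as one of the n_perm + 1 orderings.
    return (exceed + 1) / (n_perm + 1)
```

Because only the ordering of the families is shuffled, the per-family linkage scores are computed once and reused for every permutation.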

There is also an alternate form of ordered subset analysis that finds the maximum linkage score maximized over all subsets of families with consecutive covariate scores. For example, if there are N families ordered by their family covariate scores, linkage analysis is performed using families i, i + 1,...,j for 1 ≤ i ≤ j ≤ N. We include this option in FLOSS, but we think this option is less useful in practice since the increased number of ordered subsets makes it more difficult to detect disease loci associated with unusually low or high covariate values and requires substantially more computing time.
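
The alternative form can be sketched as a search over all windows i,...,j of the ordered families. The sketch below is illustrative only: it handles a single locus, assumes distinct covariate scores, and assumes that a window's score is the sum of its family Z-scores divided by the square root of the window size. The function name is hypothetical.

```python
import math
from itertools import accumulate


def max_window_score(z_in_order):
    """Best consecutive window of families at a single locus.

    z_in_order: family Z-scores, already sorted by covariate score.
    Returns (score, (i, j)) with 1-based inclusive window bounds.
    """
    # Prefix sums let each window sum be computed in constant time.
    prefix = [0.0] + list(accumulate(z_in_order))
    n = len(z_in_order)
    best_score, best_window = -math.inf, None
    for i in range(n):
        for j in range(i, n):
            # Assumed window score: sum of Z-scores / sqrt(window size).
            score = (prefix[j + 1] - prefix[i]) / math.sqrt(j - i + 1)
            if score > best_score:
                best_score, best_window = score, (i + 1, j + 1)
    return best_score, best_window
```

For N families the double loop visits N(N + 1)/2 windows, which is why this option examines many more subsets than the low-end and high-end enumeration.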


    FEATURES
 
The only other ordered subset analysis program of which we are aware is the OSA program (Hauser et al., 2004). FLOSS can be used with larger pedigrees than OSA because FLOSS uses output from the Merlin linkage analysis package (Abecasis et al., 2002), whereas OSA is restricted to pedigrees that Genehunter Plus (Kong and Cox, 1997) can analyze.

FLOSS requires two input files: a covariate file and a linkage score file. The covariate file contains the family covariate scores for all families and all covariates. FLOSS analyzes all covariates in the covariate file. The linkage score file is a Merlin ‘.lod’ output file created using Merlin's --npl and --perFamily options (Abecasis et al., 2002).

FLOSS can also be used with other linkage analysis programs by transforming the linkage scores for each family and locus to Merlin .lod file format. FLOSS uses three columns of the Merlin .lod file (family, location and Z-score). The remaining five columns in the Merlin .lod file can contain arbitrary entries.

FLOSS includes a program, called cov, that automates the creation of the covariate input file. The cov program uses the covariate data for each family member contained in Merlin data and pedigree files to assign a covariate score to each family.

The cov program applies a statistic to a subset of the pedigree. The statistic can be the minimum, maximum or mean. For the subset, users can choose to use (1) all family members, or (2) affected family members, or (3) unaffected family members with a first-degree affected relative. The third choice is intended for cases where treatment for the disease may affect the covariate values of affected family members: it uses the unaffected first-degree relatives of affected members in place of the affected members themselves.

Users of the cov program can also specify a minimum size for the subset of family members used to calculate the family covariate score. If the number of family members with non-missing covariate data that are used to calculate the family covariate score is smaller than the minimum subset size, then the family covariate score is reported as missing and the family is not used in the ordered subset analysis for that covariate. This allows the user to exclude a family from the analysis of a covariate without removing it from the FLOSS covariate file or the Merlin pedigree file.
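
The scoring rule described for the cov program can be illustrated with a short Python sketch. It is not the cov program itself: the function name, the use of `None` for missing values and the argument names are assumptions, and selecting the subset of family members (all, affected, or unaffected first-degree relatives) is left to the caller.

```python
from statistics import mean

# The three statistics named in the text.
STATISTICS = {"min": min, "max": max, "mean": mean}


def family_covariate_score(values, statistic="mean", min_size=1):
    """Family covariate score for one family and one covariate.

    values: covariate values of the chosen subset of family members,
    with None marking missing data (hypothetical convention).
    Returns None (score reported as missing) when fewer than
    `min_size` non-missing values are available.
    """
    observed = [v for v in values if v is not None]
    if not observed or len(observed) < min_size:
        return None
    return STATISTICS[statistic](observed)
```

A family whose score comes back as missing would simply be left out of the ordered subset analysis for that covariate, while remaining in the input files.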

When used with Merlin, FLOSS can calculate linkage scores using either non-parametric (NPL) Z-scores (Kruglyak et al., 1996) or LOD scores from the Kong and Cox (1997) linear allele sharing model (ASM). The family NPL Z-scores in the Merlin ‘.lod’ file and the pedigree structure in the Merlin pedigree file are used to calculate the linear ASM LOD scores (Kong and Cox, 1997).

The permutation P-value is calculated using the efficient Besag–Clifford sequential stopping rule (Besag and Clifford, 1991) so that more permutations are used to estimate small P-values than are used to estimate large P-values. Typically, the permutation test will stop after 20 random orderings are found that give ordered subset linkage scores greater than or equal to the score found using the covariate ordering of the families. The user may set parameters to specify the minimum number of permutations (default = 100) or the maximum number of permutations (default = 10 000) used. When using the default settings for maximum and minimum number of permutations, the expected number of permutations is 192 under the null hypothesis of independence between the family covariate scores and the family linkage scores at each locus (Besag and Clifford, 1991).
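
A sequential stopping rule of this kind can be sketched in Python as follows. The threshold of 20 exceedances and the default minimum and maximum numbers of permutations are taken from the description above. How FLOSS combines the minimum number of permutations with the stopping rule is not specified here, so that part of the sketch is an assumption, as are the function and parameter names.

```python
def sequential_p_value(observed, simulate, h=20, min_perm=100, max_perm=10000):
    """Sequential Monte Carlo P-value with early stopping.

    observed: statistic for the covariate ordering of the families.
    simulate: zero-argument callable returning the statistic for one
              random ordering (hypothetical interface).
    Returns (p_value, number_of_permutations_used).
    """
    exceed = 0
    for n in range(1, max_perm + 1):
        if simulate() >= observed:
            exceed += 1
        # Stop once h exceedances have been seen. Continuing until
        # min_perm permutations have been drawn is an assumption.
        if exceed >= h and n >= min_perm:
            return exceed / n, n
    # The cap was reached with too few exceedances: count the observed
    # ordering as one of max_perm + 1 orderings.
    return (exceed + 1) / (max_perm + 1), max_perm
```

Large P-values are resolved after few permutations because exceedances accumulate quickly, while small P-values run closer to the maximum, which is where the extra precision is needed.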

The primary output file gives summary information for each covariate and includes the change in linkage score between the entire set and ordered subset with the highest linkage score, the maximum linkage score for this ordered subset, the optimal interval of family covariate scores and the Monte Carlo P-values with 95% confidence intervals. The summary file is designed to be easily interpreted and is self-documented with the documentation included at the end of the file.

Other output files list the families in the optimal covariate-ordered subset that gives the maximum linkage score, and the per-locus linkage scores for both this optimal subset and the complete set of families. These files are formatted so that they can be easily read by spreadsheet programs or statistical software packages.

A log file gives the maximum linkage score for each ordered subset considered.


    PERFORMANCE
 
When examining all ordered subsets with the k smallest or k largest covariate scores, the running time for FLOSS is

  • independent of the size and structure of the pedigrees
  • linear in the number of markers
  • linear in the number of families
  • linear in the number of permutations.

When using FLOSS with ASM LOD scores on a sample of 626 families with 211 loci, the running time was 45 min for 200 permutations (13.5 s per permutation) when using a 1.2 GHz processor on a Sun E1280. If faster performance is needed, then NPL Z-scores can be used. The running time for FLOSS using NPL Z-scores is significantly less than the running time using ASM LOD scores.

If there are N families with distinct family covariate scores and if FLOSS is used with the option that examines all ordered subsets with consecutive covariate scores instead of all ordered subsets with the k smallest or largest scores, the running time for FLOSS will increase by a factor of approximately N/2.


    IMPLEMENTATION
 
The FLOSS program is written in Java, enabling easy portability, and FLOSS includes user documentation and source code documentation. The program is designed to be extensible.


    Acknowledgments
 
The author thanks Meg Ehm and GlaxoSmithKline for their support, and also thanks Michael Wagner, Brian Reck, Rodney Winkler, Xiaobin Li and Achamma Philip for their helpful comments and feedback.

Conflict of Interest: none declared.


    FOOTNOTES
 
Present Address: University of Auckland, Nutrigenomics Program, Private Bag 92019, Auckland, New Zealand

Associate Editor: Frank Dudbridge

Received on December 14, 2005; accepted on December 15, 2005

    REFERENCES

    Abecasis, G.R., et al. (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97–101.

    Besag, J. and Clifford, P. (1991) Sequential Monte Carlo P-values. Biometrika, 78, 301–304.

    Hauser, E.R., et al. (2004) Ordered subset analysis in genetic linkage mapping of complex traits. Genet. Epidemiol., 27, 53–63.

    Kong, A. and Cox, N.J. (1997) Allele-sharing models: LOD scores and accurate linkage tests. Am. J. Hum. Genet., 61, 1179–1188.

    Kruglyak, L., et al. (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58, 1347–1363.

