

Bioinformatics Advance Access originally published online on January 19, 2006
Bioinformatics 2006 22(6):773-774; doi:10.1093/bioinformatics/btk031

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

SUSPECTS: enabling fast and effective prioritization of positional candidates

E. A. Adie*, R. R. Adams, K. L. Evans, D. J. Porteous and B. S. Pickard

Medical Genetics Section, School of Molecular and Clinical Medicine, University of Edinburgh EH4 2XU UK

*To whom correspondence should be addressed at MMC, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK


    ABSTRACT

Summary: SUSPECTS is a web server that combines annotation- and sequence-based approaches to prioritize disease candidate genes in large regions of interest. It uses multiple lines of evidence to rank genes quickly and effectively while limiting the effect of annotation bias, which significantly improves performance.

Availability: SUSPECTS is freely available at http://www.genetics.med.ed.ac.uk/suspects/

Contact: euan.adie{at}ed.ac.uk

Supplementary information: A quick-start guide in Macromedia Flash format is available at http://www.genetics.med.ed.ac.uk/suspects/help.shtml and Excel spreadsheets detailing the comparative performance of the software are included as Supplementary material.


    INTRODUCTION
In the search for the genetic basis of disease, the regions of interest identified through complex-trait linkage studies regularly exceed 30 cM in size and can contain hundreds of genes (McCarthy et al., 2003). Existing tools that help researchers prioritize candidates for further study fall into two distinct classes: those based on functional annotation (Perez-Iratxeta et al., 2002; Freudenberg and Propping, 2002; Van Driel et al., 2003; Turner et al., 2003; Tiffin et al., 2005) and those based on sequence features (Adie et al., 2005; Lopez-Bigas and Ouzounis, 2004).

Methods based on functional annotation can suffer from annotation bias because they cannot properly assess genes that lack sufficiently detailed annotation. Sequence-based methods use intrinsic characteristics of genes, such as length, homology to genes in other species and base composition. Because these characteristics can be computed directly from sequence, they avoid the problem of annotation bias. However, sequence-based methods prioritize genes on the basis of their potential for involvement in disease in general, rather than in the specific disease of interest to the user.

SUSPECTS is a novel, consolidated approach that combines the increased precision of annotation-based methods with the better recall of sequence-based methods, avoiding the problems outlined above. Given a set of existing candidate genes for a particular complex or oligogenic disease, it effectively automates further candidate gene selection from large regions on the principle that genes involved in that disease will tend to share the same or similar annotation, reflecting common biological pathways.


    PRIORITIZING CANDIDATES WITH SUSPECTS
Users of SUSPECTS can define a region of interest by specifying flanking markers, chromosomal coordinates or cytogenetic bands. Alternatively, they can enter a single marker, and the software will automatically examine a region of interest centred on it.

Users then enter the name of the disease to be considered, and the software automatically retrieves genes implicated in that disorder from OMIM (Hamosh et al., 2002), HGMD (Cooper et al., 1998) and GAD (Becker et al., 2004). Alternatively, users can manually enter a list of genes thought to be involved in the pathogenesis of the disease. These genes form the ‘training set’.

Each positional candidate gene is then scored automatically (see Methodology). Higher scores represent better candidates. The user is presented with a graphical overview of the region of interest (Fig. 1). The graphical overview is a hyperlinked image map that can be used to obtain more detailed information about each candidate gene and the reasoning behind its score. The list of candidate genes ranked by score is presented as a table underneath the graphical overview.


Figure 1
Fig. 1 Graphical overview of results produced by SUSPECTS. Each gene in the region of interest is represented as a coloured 3D shape. The height, width and colour of each shape represent, respectively, the gene's score, the number of lines of evidence contributing to that score and the literature-based relevance of the gene. Literature-based relevance is determined by searching PubMed: a shape is blue if the gene it represents is mentioned in abstracts containing the name of the disease under study, and orange otherwise. Each shape is a hyperlink to more detailed information about that gene further down the results page.

 

    METHODOLOGY
Each gene in the region of interest is scored on its suitability as a candidate for further study using four lines of evidence. First, Prospectr (Adie et al., 2005) scores it on the basis of its sequence features. Second, the extent of its coexpression with the training set is assessed using GNF expression data (Su et al., 2002). Third, the number of rare InterPro domains (found in <5% of all proteins) that it shares with the training set is counted. Finally, the level of semantic similarity (Lord et al., 2003) between the GO terms assigned to it and the GO terms assigned to genes in the training set is measured.
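
The text does not spell out how the domain evidence is computed. The following is a minimal sketch, assuming that a domain counts as "rare" when it occurs in fewer than 5% of the proteins in a reference proteome, and that the score is simply the number of such domains the candidate shares with any training-set gene. The function names, the dictionary layout and the toy data are hypothetical, not part of SUSPECTS.

```python
from collections import Counter


def rare_domains(protein_domains, threshold=0.05):
    """Return the domains present in fewer than `threshold` of all proteins.

    `protein_domains` maps a protein/gene ID to the domain IDs it contains
    (hypothetical layout; real InterPro data would need to be loaded first).
    """
    n_proteins = len(protein_domains)
    # Count each domain at most once per protein, ignoring repeated copies.
    counts = Counter(d for doms in protein_domains.values() for d in set(doms))
    return {d for d, c in counts.items() if c / n_proteins < threshold}


def shared_rare_domain_count(candidate, training_set, protein_domains, threshold=0.05):
    """Count the rare domains in `candidate` that occur in any training-set gene."""
    rare = rare_domains(protein_domains, threshold)
    training_domains = set()
    for gene in training_set:
        training_domains |= set(protein_domains.get(gene, ()))
    candidate_domains = set(protein_domains.get(candidate, ()))
    return len(candidate_domains & training_domains & rare)


# Hypothetical toy proteome: all 100 proteins carry "IPR_common"; only 3 carry "IPR_rare".
proteome = {f"P{i}": {"IPR_common"} for i in range(100)}
for p in ("P0", "P1", "P2"):
    proteome[p].add("IPR_rare")

example_count = shared_rare_domain_count("P0", ["P1", "P5"], proteome)
print(example_count)  # 1: "IPR_rare" is shared and rare; "IPR_common" is shared but not rare
```

Filtering to rare domains keeps ubiquitous domains, which almost every candidate would share with the training set, from inflating the score.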

The four scores are then combined. Each score is weighted according to the amount of information available for its line of evidence; if little or no information is available, the importance of that score is reduced accordingly. This ensures that genes lacking sufficiently detailed GO terms or expression profiles are not penalized by annotation bias. The final score ranges from 0 to 100, where 100 represents a perfect match between the candidate gene and all genes in the training set.
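
The exact weighting formula is not given here. One way to realize "weight each score by the information available" is a weighted mean scaled to 0–100, sketched below under that assumption. The evidence names, the per-evidence scores in [0, 1] and the weights are all hypothetical illustrations, not values used by SUSPECTS.

```python
def combine_scores(evidence):
    """Combine per-evidence scores into one 0-100 score (hypothetical scheme).

    `evidence` maps an evidence name to (score, weight): `score` lies in [0, 1],
    where 1 means a perfect match with the training set, and `weight` lies in
    [0, 1] and reflects how much information was available for that evidence.
    """
    total_weight = sum(w for _, w in evidence.values())
    if total_weight == 0:
        return 0.0  # no usable evidence at all
    weighted_sum = sum(s * w for s, w in evidence.values())
    # Dividing by the total weight means evidence with little or no data is
    # down-weighted rather than treated as a zero score.
    return 100.0 * weighted_sum / total_weight


# A gene with no GO annotation: the GO evidence receives zero weight.
gene_evidence = {
    "sequence": (0.6, 1.0),    # always computable from sequence
    "expression": (0.8, 0.5),  # partial expression data
    "domains": (0.0, 1.0),     # domain data available, but nothing rare shared
    "go": (0.0, 0.0),          # no GO terms, so no information
}
combined = combine_scores(gene_evidence)
print(combined)  # 40.0
```

If the missing GO evidence were instead counted as a zero score with full weight, the same gene would fall to about 28.6 (100 × 1.0 / 3.5). That is the kind of annotation-bias penalty the weighting is meant to avoid.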


    COMPARATIVE PERFORMANCE
Approaches based on functional annotation rely on good-quality information being available for each possible candidate gene. In contrast, SUSPECTS can prioritize all genes, including those that lack detailed GO, domain or expression data, although when these lines of evidence are available they contribute favourably to overall performance.

The performance of SUSPECTS was tested on a set of oligogenic and complex disorders, including Alzheimer's disease, hypertension, autism and systemic lupus erythematosus. The set was derived from that used by Turner et al. (2003) to test POCUS, an annotation-based classifier.

At least three implicated genes were available for each disease. For each implicated gene, a region of interest was created containing the implicated gene itself (the ‘target gene’) and every gene within 7.5 Mb on either side. On average, each region of interest contained 155 genes. An associated training set was then created containing the remaining implicated genes for that disorder.
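
The benchmark design above is a leave-one-out scheme. Here is a minimal sketch of it, assuming genes are described by chromosome and base-pair coordinates and that "within 7.5 Mb" means overlapping the interval that extends 7.5 Mb beyond each end of the target gene. The `Gene` type, the function name and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 7_500_000  # 7.5 Mb on either side of the target gene


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # base-pair coordinates
    end: int


def build_test_cases(implicated, genome, window=WINDOW_BP):
    """Yield (target, region_genes, training_set) for each implicated gene.

    The region holds every gene on the target's chromosome that overlaps the
    interval extending `window` bp either side of the target; the training set
    holds the remaining implicated genes (leave one out).
    """
    for target in implicated:
        lo, hi = target.start - window, target.end + window
        region = [g for g in genome
                  if g.chrom == target.chrom and g.end >= lo and g.start <= hi]
        training = [g for g in implicated if g != target]
        yield target, region, training


# Hypothetical toy genome with three implicated genes (A, C and D).
genome = [
    Gene("A", "1", 1_000_000, 1_010_000),
    Gene("B", "1", 5_000_000, 5_020_000),
    Gene("C", "1", 20_000_000, 20_050_000),
    Gene("D", "2", 3_000_000, 3_010_000),
]
implicated = [genome[0], genome[2], genome[3]]
cases = list(build_test_cases(implicated, genome))
for target, region, training in cases:
    print(target.name, [g.name for g in region], [g.name for g in training])
# A ['A', 'B'] ['C', 'D']
# C ['C'] ['A', 'D']
# D ['D'] ['A', 'C']
```

Because each disease has at least three implicated genes, every training set built this way contains at least two genes.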

We first ranked each region of interest using a classifier based on sequence features alone (Prospectr). On average, the target gene was in the top 31.23% of the resulting ranked list of candidates, and it was in the top 5% of the list 20 times out of 155 (13%).

In comparison, the target gene was on average in the top 12.93% of the ranked list produced by SUSPECTS, which took both the region of interest and the training set as input in each case. The target gene was in the top 5% of the ranked list 87 times out of 155 (56%). The test results for both the sequence-features classifier and SUSPECTS are available as Supplementary material.
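
The two performance measures can be made concrete with a short sketch. It assumes that "in the top X%" means the target's 1-based rank divided by the length of the ranked list, expressed as a percentage, and that a case counts as a top-5% hit when that value is at most 5. The function names and the toy list are hypothetical, and tied scores are not handled.

```python
def rank_percentile(ranked_genes, target):
    """Position of `target` in a best-first ranked list, as a percentage.

    A value of 5.0 means the target sits within the top 5% of the list.
    """
    position = ranked_genes.index(target) + 1  # 1-based rank
    return 100.0 * position / len(ranked_genes)


def summarize(cases, cutoff=5.0):
    """Mean rank percentile and top-`cutoff`% hit count over (ranked, target) cases."""
    percentiles = [rank_percentile(ranked, target) for ranked, target in cases]
    hits = sum(p <= cutoff for p in percentiles)
    return sum(percentiles) / len(percentiles), hits


# Hypothetical example: 20 genes ranked best first; the target is 1st in one case, 10th in another.
genes = [f"G{i}" for i in range(1, 21)]
mean_pct, hits = summarize([(genes, "G1"), (genes, "G10")])
print(mean_pct, hits)  # 27.5 1
```

Applied to the counts reported above, the hit rates are 20/155 ≈ 12.9% for Prospectr and 87/155 ≈ 56.1% for SUSPECTS, consistent with the rounded 13% and 56%.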

In conclusion, SUSPECTS significantly improves on the performance of candidate prioritization methods which use annotation or sequence data alone and is of value to researchers faced with large regions of interest. It is fast, easy to use and freely available on the World Wide Web at http://www.genetics.med.ed.ac.uk/suspects/

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Satoru Miyano

Received on October 26, 2005; revised on December 12, 2005; accepted on December 28, 2005

    REFERENCES

    Adie, E.A., et al. (2005) Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics, 6, 55.

    Becker, K.G., et al. (2004) The Genetic Association Database. Nat. Genet., 36, 431–432.

    Cooper, D.N., et al. (1998) The human gene mutation database. Nucleic Acids Res., 26, 285–287.

    Freudenberg, J. and Propping, P. (2002) A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics, 18, S110–S115.

    Hamosh, A., et al. (2002) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res., 30, 52–55.

    Lopez-Bigas, N. and Ouzounis, C.A. (2004) Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res., 32, 3108–3114.

    Lord, P.W., et al. (2003) Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics, 19, 1275–1283.

    McCarthy, M., et al. (2003) New methods for finding disease-susceptibility genes: impact and potential. Genome Biol., 4, 119.

    Perez-Iratxeta, C., et al. (2002) Association of genes to genetically inherited diseases using data mining. Nat. Genet., 31, 316–319.

    Su, A., et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.

    Tiffin, N., et al. (2005) Integration of text- and data-mining using ontologies successfully selects disease gene candidates. Nucleic Acids Res., 33, 1544–1552.

    Turner, F., et al. (2003) POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol., 4, R75.

    Van Driel, M.A., et al. (2003) A new web-based data mining tool for the identification of candidate genes for human genetic disorders. Eur. J. Hum. Genet., 11, 57–63.




This article has been cited by other articles:

J. Sun, P. Jia, A. H. Fanous, B. T. Webb, E. J. C. G. van den Oord, X. Chen, J. Bukszar, K. S. Kendler, and Z. Zhao
A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases-schizophrenia as a case
Bioinformatics, October 1, 2009; 25(19): 2595 - 2602.

J. Chen, E. E. Bardes, B. J. Aronow, and A. G. Jegga
ToppGene Suite for gene list enrichment analysis and candidate gene prioritization
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W305 - W311.

Y. Yoshida, Y. Makita, N. Heida, S. Asano, A. Matsushima, M. Ishii, Y. Mochizuki, H. Masuya, S. Wakana, N. Kobayashi, et al.
PosMed (Positional Medline): prioritizing genes with an artificial neural network comprising medical documents to accelerate positional cloning
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W147 - W152.

Y. Makita, N. Kobayashi, Y. Mochizuki, Y. Yoshida, S. Asano, N. Heida, M. Deshpande, R. Bhatia, A. Matsushima, M. Ishii, et al.
PosMed-plus: An Intelligent Search Engine that Inferentially Integrates Cross-Species Information Resources for Molecular Breeding of Plants
Plant Cell Physiol., July 1, 2009; 50(7): 1249 - 1259.

C. Ortutay and M. Vihinen
Identification of candidate disease genes by integrating Gene Ontologies and protein-interaction networks: case study of primary immunodeficiencies
Nucleic Acids Res., February 1, 2009; 37(2): 622 - 628.

S. Yilmaz, P. Jonveaux, C. Bicep, L. Pierron, M. Smail-Tabbone, and M.D. Devignes
Gene-disease relationship discovery based on model-driven data integration and database view definition
Bioinformatics, January 15, 2009; 25(2): 230 - 236.

N. Tiffin, I. Okpechi, C. Perez-Iratxeta, M. A. Andrade-Navarro, and R. Ramesar
Prioritization of candidate disease genes for metabolic syndrome by computational analysis of its defining phenotypes
Physiol Genomics, September 17, 2008; 35(1): 55 - 64.

S. F. Saccone, N. L. Saccone, G. E. Swan, P. A. F. Madden, A. M. Goate, J. P. Rice, and L. J. Bierut
Systematic biological prioritization after a genome-wide association study: an application to nicotine dependence
Bioinformatics, August 15, 2008; 24(16): 1805 - 1811.

S. Yu, S. Van Vooren, L.-C. Tranchevent, B. De Moor, and Y. Moreau
Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining
Bioinformatics, August 15, 2008; 24(16): i119 - i125.

L.-C. Tranchevent, R. Barriot, S. Yu, S. Van Vooren, P. Van Loo, B. Coessens, B. De Moor, S. Aerts, and Y. Moreau
ENDEAVOUR update: a web resource for gene prioritization in multiple species
Nucleic Acids Res., July 1, 2008; 36(suppl_2): W377 - W384.

Q. Xiong, Y. Qiu, and W. Gu
PGMapper: a web-based tool linking phenotype to genes
Bioinformatics, April 1, 2008; 24(7): 1011 - 1013.

D. Shriner, T. M. Baye, M. A. Padilla, S. Zhang, L. K. Vaughan, and A. E. Loraine
Commonality of functional annotation: a method for prioritization of candidate genes from genome-wide linkage studies
Nucleic Acids Res., March 27, 2008; 36(4): e26.

A. Schlicker and M. Albrecht
FunSimMat: a comprehensive functional similarity database
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D434 - D439.

S. Sookoian and C. J. Pirola
Review: Genetics of the cardiometabolic syndrome: new insights and therapeutic implications
Therapeutic Advances in Cardiovascular Disease, October 1, 2007; 1(1): 37 - 47.

M. G. Kann
Protein interactions and disease: computational approaches to uncover the etiology of diseases
Brief Bioinform, September 1, 2007; 8(5): 333 - 346.

C. Perez-Iratxeta, P. Bork, and M. A. Andrade-Navarro
Update of the G2D tool for prioritization of gene candidates to inherited diseases
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W212 - W216.

K. J. Gaulton, K. L. Mohlke, and T. J. Vision
A computational system to select candidate genes for complex human traits
Bioinformatics, May 1, 2007; 23(9): 1132 - 1140.

R. A. George, J. Y. Liu, L. L. Feng, R. J. Bryson-Richardson, D. Fatkin, and M. A. Wouters
Analysis of protein sequence and interaction data for candidate disease gene prediction
Nucleic Acids Res., November 14, 2006; 34(19): e130.

N. Tiffin, E. Adie, F. Turner, H. G. Brunner, M. A. van Driel, M. Oti, N. Lopez-Bigas, C. Ouzounis, C. Perez-Iratxeta, M. A. Andrade-Navarro, et al.
Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes
Nucleic Acids Res., June 6, 2006; 34(10): 3067 - 3081.