Bioinformatics Advance Access originally published online on December 6, 2006
Bioinformatics 2007 23(4):504-506; doi:10.1093/bioinformatics/btl621
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins

Zong Hong Zhang*, Judice L. Y. Koh, Guang Lan Zhang, Khar Heng Choo, Martti T. Tammi 1 and Joo Chuan Tong

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
1 Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543

*To whom correspondence should be addressed.


    ABSTRACT

Summary: Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins and provide no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens. The analysis tools include a graphical representation of allergen cross-reactivity information; a local sequence comparison tool that displays information on known cross-reactive allergens; a sequence similarity search tool for assessment of cross-reactivity in accordance with FAO/WHO Codex alimentarius guidelines; and a prediction method based on support vector machines (SVM). Ten-fold cross-validation showed that the area under the receiver operating characteristic curve (AROC) of the SVM models is 0.90, with 86.00% sensitivity (SE) at a specificity (SP) of 86.00%.

Availability: AllerTool is freely available at http://research.i2r.a-star.edu.sg/AllerTool/

Contact: zhzhang{at}i2r.a-star.edu.sg


    INTRODUCTION
Atopic allergy and other hypersensitivity reactions are major causes of chronic ill health in affluent industrial nations, affecting up to 25% of the general population (Mekori, 1996; Nieuwenhuizen and Lopata, 2005). Allergy is caused by an adverse immunological reaction to causative agents, known as allergens, that are otherwise innocuous. The acute symptoms of allergy are usually due to the release of inflammatory mediators when an allergen cross-links immunoglobulin E (IgE) antibodies on mast cells or basophils (Sutton and Gould, 1993). This may be followed by a late-phase reaction characterized by the influx of T-cells, eosinophils and monocytes (Gould et al., 2003). Atopic individuals may have one or more manifestations of the disease, including asthma, conjunctivitis, dermatitis (eczema) and rhinitis (hay fever), and some experience severe, life-threatening anaphylaxis.

Methods for assessing potential allergenicity are essential whenever new proteins are brought into contact with humans, whether through food or other modes of exposure. The current joint recommendation by the World Health Organization (WHO) and the Food and Agriculture Organization (FAO) is a decision-tree scheme that compares the local sequence similarity of a query protein against known allergenic proteins (FAO/WHO, 2003). Two decision criteria have been proposed for the assessment of allergenic potential: identity of six or more contiguous amino acids, or more than 35% sequence identity over a window of 80 amino acids. Several research groups, including Gendel (1998, 2002), Stadler and Stadler (2003) and Fiers et al. (2004), have developed computational tools to scan for sequences that satisfy these criteria. While these tools are useful for standardized prediction of potential allergenicity according to the current recommendations of the FAO/WHO Expert Consultation, more complex techniques are needed, because the six amino acid rule is non-specific and the 35% threshold is too stringent to find most true allergens (Li et al., 2004; Hileman et al., 2002; Stadler and Stadler, 2003; Silvanovich et al., 2006).

More sophisticated bioinformatic tools for detecting motifs among allergenic sequences have recently been described. Zorzet et al. (2002) combined the FASTA3 algorithm with a k-nearest-neighbour (kNN) classifier to assess potential food protein allergenicity. Soeria-Atmadja et al. (2004) extended the study to a larger set of allergens using a combination of a kNN classifier, a Bayesian linear Gaussian classifier and a Bayesian quadratic Gaussian classifier. Li et al. (2004) demonstrated the use of the wavelet transform to predict potential allergens. Björklund et al. (2005) introduced the use of allergen-representative peptides for detection of potentially allergenic proteins. Cui et al. (2006) as well as Saha and Raghava (2006) reported the use of support vector machines (SVM) for the prediction of novel allergen proteins.

In this paper, we present AllerTool, a web server providing essential tools for assessing predicted as well as published allergic cross-reactivity patterns of clinically relevant protein allergens. Three different programs are available for assessing the potential allergenicity of protein sequences: (1) a sequence similarity search tool for assessment of allergenicity in accordance with FAO/WHO Codex alimentarius guidelines; (2) an SVM-based method for predicting the cross-reactivity of proteins with little or no similarity to known allergens; and (3) a modification of BLAST that displays cross-reactive allergens. In addition, AllerTool provides potential cross-reactivity information for a query sequence through a graphical representation of the cross-reactivity network of similar proteins. The main purpose of AllerTool is to support molecular studies of allergens and the assessment of allergic responses and allergic cross-reactivity.


    SYSTEM DESCRIPTION
Data
Allergen data were extracted from the International Union of Immunological Societies (IUIS) Allergens website (http://www.ALLERGEN.org) and stored in the ALLERDB database (Zhang et al., manuscript in preparation; http://antigen.i2r.a-star.edu.sg/Templar/DB/Allergen/). The dataset comprises all IUIS allergens and isoallergens that have protein sequences available in the public sequence databases or publication references: 373 allergens, 260 isoallergens and 128 instances of reported cross-reactivity collected from the literature and verified using the text-mining tool ABK (Miotto et al., 2005).

Analysis tools
AllerTool and its web interface are written in C/C++ and Perl and run on a SunOS 5.9 UNIX system with an Apache web server. It comprises four integrated tools for assessing the potential allergenicity of protein sequences: XR-BLAST, XR-Graph, ALR-SCAN and ALR-SVM.

XR-BLAST (Koh et al., 2004) is a local sequence comparison tool based on BLAST2.2.3 (Altschul et al., 1997) that outputs information on allergens that have reported cross-reactivity with the individual matches. A sample output of XR-BLAST is given in Figure 1.


Fig. 1 An example of XR-BLAST output.
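
The annotation step that XR-BLAST adds on top of a similarity search can be pictured with a short sketch. This is not the authors' implementation (AllerTool is written in C/C++ and Perl); it only assumes that each search hit names an allergen and that reported cross-reactivity is stored as unordered allergen pairs. The function names and the records below are hypothetical placeholders.

```python
from collections import defaultdict


def build_partner_index(cross_reactivity_pairs):
    """Map each allergen to the set of allergens it is reported to cross-react with."""
    partners = defaultdict(set)
    for a, b in cross_reactivity_pairs:
        # Reported cross-reactivity is treated as symmetric (an assumption).
        partners[a].add(b)
        partners[b].add(a)
    return partners


def annotate_hits(hits, partners):
    """Attach reported cross-reactive allergens to each (allergen, e_value) hit."""
    annotated = []
    for allergen, e_value in hits:
        annotated.append({
            "allergen": allergen,
            "e_value": e_value,
            "cross_reactive_with": sorted(partners.get(allergen, ())),
        })
    return annotated


# Hypothetical placeholder records, not taken from ALLERDB.
pairs = [("allergen_A", "allergen_B"), ("allergen_A", "allergen_C")]
hits = [("allergen_A", 1e-40), ("allergen_B", 1e-35), ("allergen_X", 0.5)]
annotated = annotate_hits(hits, build_partner_index(pairs))

for row in annotated:
    print(row["allergen"], row["e_value"], row["cross_reactive_with"])
```

A hit with no reported partners simply carries an empty list, so the search result itself is never dropped.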

 
XR-Graph (http://antigen.i2r.a-star.edu.sg/Templar/DB/Allergen) is a visualization tool for graphical representation of allergen cross-reactivity information. Each graph displays allergens (boxes) that are related by reported cross-reactivity (links). The graph allows users to identify possible allergen cross-reactivity relationships that have not previously been reported, and it has potential uses in the development of novel allergy diagnostic approaches. A sample output of XR-Graph is shown in Figure 2.


Fig. 2 An example of XR-Graph output for Aln g 1. Reported cross-reactivity patterns are represented by links. Possible cross-reactivity relationships may be inferred from missing links.
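
One way to read "inferred from missing links" is sketched below: allergens are nodes, reported cross-reactivity is an undirected edge, and two allergens that are not linked but share at least one reported partner are listed as candidates for follow-up. The shared-partner heuristic is an assumption made for illustration; the paper does not state how such relationships should be inferred, and the node names are placeholders.

```python
from itertools import combinations


def build_graph(pairs):
    """Undirected adjacency sets from reported cross-reactivity pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def candidate_missing_links(graph):
    """Unlinked allergen pairs that share at least one reported partner."""
    candidates = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already reported, so not a missing link
        shared = graph[a] & graph[b]
        if shared:
            candidates.append((a, b, sorted(shared)))
    return candidates


# Placeholder network: A-B, B-C and C-D are the reported links.
reported = [("A", "B"), ("B", "C"), ("C", "D")]
graph = build_graph(reported)
candidates = candidate_missing_links(graph)

for a, b, shared in candidates:
    print(f"{a} and {b} are unlinked but both cross-react with {shared}")
```

A candidate produced this way is only a hypothesis to be checked experimentally, not a reported cross-reactivity.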

 
ALR-SCAN (Koh et al., 2004) is a sequence similarity search tool that reports sequence similarity in accordance with the current FAO/WHO recommendation for the assessment of allergenicity. Both the rule of six contiguous identical amino acids and the rule of >35% identity over a stretch of 80 amino acids are implemented. Users submit the protein of interest to ALR-SCAN, which returns the list of matches that satisfy either rule. A sample query and its output are shown in Figure 3.


Fig. 3 An example of ALR-SCAN output.
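
The two FAO/WHO rules can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not ALR-SCAN itself: the 80-residue windows are compared position by position without gaps, whereas tools of this kind typically align each window against the allergen database. The function names and the toy sequences are hypothetical.

```python
def shared_kmers(query, allergen, k=6):
    """Return the k-mers that occur, with exact identity, in both sequences."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sorted(query_kmers & allergen_kmers)


def best_window_identity(query, allergen, window=80):
    """Highest ungapped percent identity between any two window-long stretches.

    Returns 0.0 when either sequence is shorter than the window.
    """
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a = allergen[j:j + window]
            matches = sum(x == y for x, y in zip(q, a))
            best = max(best, 100.0 * matches / window)
    return best


def scan(query, allergen, k=6, window=80, threshold=35.0):
    """Apply both rules; a match on either one flags the query."""
    kmers = shared_kmers(query, allergen, k)
    identity = best_window_identity(query, allergen, window)
    return {
        "six_mer_rule": bool(kmers),
        "window_rule": identity > threshold,
        "flagged": bool(kmers) or identity > threshold,
        "shared_kmers": kmers,
        "best_identity": identity,
    }


# Toy sequences built from the 20 standard residues, for illustration only.
allergen = "ACDEFGHIKLMNPQRSTVWY" * 4   # 80 residues
identical = scan(allergen, allergen)
unrelated = scan("G" * 80, allergen)
short_hit = scan("WWWACDEFGWWW", allergen)
print(identical["flagged"], unrelated["flagged"], short_hit["shared_kmers"])
```

The short query shows why the six amino acid rule is often called non-specific: a single shared hexapeptide is enough to flag a sequence that is otherwise unrelated.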

 
ALR-SVM predicts protein allergenicity from a global description of the amino acid sequence, using SVM as the prediction engine (Cui et al., 2006; Fig. 4). The training dataset consists of 460 allergens and 560 non-allergens, while the testing dataset includes 114 allergens and 140 non-allergens derived from http://www.slv.se/templates/SLV_Page.aspx?id=9343 (Björklund et al., 2005) and selected using a debiasing strategy based on sequence similarity of protein sequences commonly found in consumed food with no records in existing allergen databases (Saha and Raghava, 2006). Allergens make up ~45% of the testing dataset and non-allergens the remaining ~55%. Different kernel functions (linear, polynomial, radial and sigmoid) were explored to improve the prediction accuracy of the SVM models. ALR-SVM uses a third-degree polynomial kernel function, with sequences encoded as descriptors derived from amino acid composition. The AROC value is 0.90 (http://antigen.i2r.a-star.edu.sg/Templar/DB/AllerTool/Algorithms.html). With amino acid composition as input for training and testing, ALR-SVM predicts allergenic proteins with a sensitivity (SE) of 86.00% and a specificity (SP) of 86.00%. These values are comparable to those of the SVM approach of Saha and Raghava (2006) (SE = 85.02%, SP = 84.00%) and of allergen-representative peptides (SE = 81.00%, SP = 90.00%; Björklund et al., 2005), and they outperform the motif-based approach using MEME/MAST software (SE = 93.94%, SP = 33.34%; Saha and Raghava, 2006). For sequences predicted to be allergenic, a list of high-similarity IUIS allergen sequences and reported cross-reactivity information is provided.
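
The two ingredients named above, amino acid composition descriptors and a third-degree polynomial kernel, can be sketched as follows. Only the kernel degree comes from the text; the `gamma` and `coef0` values, the handling of non-standard residues, and the support vectors in the usage lines are assumptions for illustration, and no trained model is reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """20-dimensional vector of residue fractions (non-standard residues ignored)."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree; gamma and coef0 are assumed."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def decision_value(x, support_vectors, dual_coefs, bias):
    """Standard SVM decision function: sum_i coef_i * K(sv_i, x) + bias."""
    return sum(c * polynomial_kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical support vectors and coefficients, not a trained ALR-SVM model.
sv_pos = amino_acid_composition("AAAC")
sv_neg = amino_acid_composition("GGGG")
query = amino_acid_composition("AACC")
score = decision_value(query, [sv_pos, sv_neg], [1.0, -1.0], bias=0.0)
print(round(score, 4))
```

Because the composition vector discards residue order, two sequences with the same residue counts are indistinguishable to this encoding, which is the price of a global description that does not depend on alignment.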


Fig. 4 An example of ALR-SVM output.
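
The evaluation figures quoted for ALR-SVM (SE, SP and AROC) follow standard definitions, which the sketch below spells out. The labels and scores are made-up toy values, not AllerTool results, and the 0.45 decision threshold is an arbitrary assumption. The AROC is computed as the probability that a randomly chosen allergen scores higher than a randomly chosen non-allergen, which is equivalent to the area under the ROC curve.

```python
def sensitivity_specificity(labels, predictions):
    """SE = TP / (TP + FN), SP = TN / (TN + FP); assumes both classes are present."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def area_under_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = allergen, 0 = non-allergen.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
predictions = [1 if s >= 0.45 else 0 for s in scores]

se, sp = sensitivity_specificity(labels, predictions)
auc = area_under_roc(labels, scores)
print(f"SE={se:.2f} SP={sp:.2f} AROC={auc:.2f}")
```

Moving the threshold trades SE against SP, whereas the AROC summarizes performance over all thresholds; this is why a single AROC value is reported alongside one SE/SP operating point.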

 

    CONCLUSION
With the advent of genetically modified proteins in foods, therapeutics and biopharmaceuticals (Saha and Raghava, 2006), AllerTool provides a new service for the assessment of predicted as well as published cross-reactivity patterns of novel proteins. ALR-SCAN is useful for assessing the allergenicity of proteins according to the FAO/WHO Codex alimentarius guidelines. However, concerns have been raised about the validity of the current FAO/WHO guidelines (Li et al., 2004; Hileman et al., 2002). Various groups, including Silvanovich et al. (2006) and Stadler and Stadler (2003), reported that matches found by short sequence searches of six contiguous amino acids are a product of chance and add little value to allergy assessments of newly expressed proteins. More sophisticated techniques are therefore needed for screening proteins for allergenicity. ALR-SVM has been developed to capture non-linear characteristics that may be encapsulated within allergenic protein sequences. Future work will focus on the development of supplementary methods to support and refine the prediction of cross-reactivity patterns.


    Acknowledgments
 
The authors thank Prof. Vladimir Brusic (UQ, Australia) for critically reading the manuscript.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: John Quackenbush

Received on August 21, 2006; revised on November 23, 2006; accepted on December 5, 2006

    REFERENCES

    Altschul, S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    Björklund, A.K., et al. (2005) Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics, 21, 39–50.

    Cui, J., et al. (2006) Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol. Immunol., in press.

    FAO/WHO (2003) Codex Principles and Guidelines on Foods Derived from Biotechnology. Joint FAO/WHO Food Standards Programme, Rome, Italy.

    Fiers, M.W., et al. (2004) Allermatch, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinformatics, 5, 133–138.

    Gendel, S.M. (1998) The use of amino acid sequence alignments to assess potential allergenicity of proteins used in genetically modified foods. Adv. Food Nutr. Res., 42, 45–62.

    Gendel, S.M. (2002) Sequence analysis for assessing potential allergenicity. Ann. N. Y. Acad. Sci., 964, 87–98.

    Hileman, R.E., et al. (2002) Bioinformatic methods for allergenicity assessment using a comprehensive allergen database. Int. Arch. Allergy Immunol., 128, 280–291.

    Koh, J.L.Y., et al. (2004) BioWare: a framework for bioinformatics data retrieval, annotation and publishing. In ACM SIGIR Workshop on Search and Discovery in Bioinformatics (SIGIRBIO), July 2004, Sheffield, UK.

    Li, G.B., et al. (2004) Predicting allergenic proteins using wavelet transform. Bioinformatics, 20, 2572–2578.

    Mekori, Y.A. (1996) Introduction to allergic diseases. Crit. Rev. Food Sci. Nutr., 36, S1–S18.

    Miotto, O., et al. (2005) Supporting the curation of biological databases with reusable text mining. Genome Inform., 16, 32–44.

    Nieuwenhuizen, N.E. and Lopata, A.L. (2005) Fighting food allergy: current approaches. Ann. N. Y. Acad. Sci., 1056, 30–45.

    Saha, S. and Raghava, G.P.S. (2006) AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res., 34, W202–W209.

    Silvanovich, A., et al. (2006) The value of short amino acid sequence matches for prediction of protein allergenicity. Toxicol. Sci., 90, 252–258.

    Soeria-Atmadja, D., et al. (2004) Statistical evaluation of local alignment features predicting allergenicity using supervised classification algorithms. Int. Arch. Allergy Immunol., 133, 101–112.

    Stadler, M.B. and Stadler, B.M. (2003) Allergenicity prediction by protein sequence. FASEB J., 17, 1141–1143.

    Stothard, P. (2000) The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques, 28, 1102–1104.

    Sutton, B.J. and Gould, H.J. (1993) The human IgE network. Nature, 366, 421–428.

    Zorzet, A., et al. (2002) Prediction of food protein allergenicity: a bioinformatic learning systems approach. In Silico Biol., 2, 525–534.

