Skip Navigation


Bioinformatics Advance Access originally published online on March 4, 2009
Bioinformatics 2009 25(9):1185-1186; doi:10.1093/bioinformatics/btp121
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
25/9/1185    most recent
btp121v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Kuznetsov, I. B.
Right arrow Articles by Moslehi, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kuznetsov, I. B.
Right arrow Articles by Moslehi, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

A web server for inferring the human N-acetyltransferase-2 (NAT2) enzymatic phenotype from NAT2 genotype

Igor B. Kuznetsov *, Michael McDuffie and Roxana Moslehi *

Gen*NY*Sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, University at Albany, One Discovery Drive, Rensselaer, NY 12144, USA

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 

Summary:N-acetyltransferase-2 (NAT2) is an important enzyme that catalyzes the acetylation of aromatic and heterocyclic amine carcinogens. Individuals in human populations are divided into three NAT2 acetylator phenotypes: slow, rapid and intermediate. NAT2PRED is a web server that implements a supervised pattern recognition method to infer NAT2 phenotype from SNPs found in NAT2 gene positions 282, 341, 481, 590, 803 and 857. The web server can be used for a fast determination of NAT2 phenotypes in genetic screens.

Availability:Freely available at http://nat2pred.rit.albany.edu

Contact:ikuznetsov{at}albany.edu; rmoslehi{at}albany.edu

Supplementary information:Supplementary data are available at Bioinformatics online.


    1 INTRODUCTION
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
N-acetyltransferase-2 (NAT2) is an important enzyme that catalyzes the acetylation of aromatic and heterocyclic amine carcinogens (Blum et al., 1990). Based on the level of NAT2 acetylator activity, individuals in human populations are divided into three enzymatic phenotypes: rapid (normal activity), intermediate and slow (reduced activity) (Hein et al., 2000). Single nucleotide polymorphisms (SNPs) within NAT2 determine the NAT2 acetylator phenotype. A consensus has been reached on association between NAT2 genotype and acetylator phenotype (Hein, 2006). Recently, we showed that individuals with NAT2 SNP variants associated with the slow phenotype were more susceptible to the effects of tobacco smoking with respect to the risk of developing an advanced colorectal adenoma (Moslehi et al., 2006). Several other studies have also linked NAT2 gene variants and acetylator phenotypes to the risk of several malignant and pre-malignant conditions (Brockton et al., 2000; Hein, 2006; Potter et al., 1999; Tiemersma et al., 2004). The identification of at-risk individuals is an important component of cancer prevention. Current genotyping technologies are able to determine which alleles are present at each locus, but do not provide information about the phase of the alleles at different loci (i.e. do not provide information about which alleles at adjacent loci occur on the same chromosome). In order to assign an acetylator phenotype to a particular individual, the NAT2 haplotypes for this individual need to be determined by inferring the phase of the alleles. After phasing, the acetylator phenotype is assigned manually based on haplotypes (Supplementary Fig. 1). Phasing of alleles (i.e. haplotype determination) is laborious. Experimental methods exist, but are time-consuming and expensive. In most studies, computational statistical methods are used, such as the algorithm implemented in PHASE (Stephens et al., 2001). However, methods for statistical determination of phase are computer intensive and require specific data formatting steps. The goal of the present work was to develop a web server that implements a supervised pattern recognition approach to infer NAT2 acetylator phenotype (slow, intermediate or rapid) directly from the observed combinations of NAT2 SNPs, without taking the extra step of determining the haplotypes for each individual.


    2 METHODS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
The dataset used in this work was obtained from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial of the National Cancer Institute (see Moslehi et al., 2006 for details). Genotyping for six NAT2 SNPs (C282T, T341C, C481T, G590A, A803G and G857A) was performed using the TaqMan® (Applied Biosystems Inc., Carlsbad, CA, USA) kit. The acetylator phenotypes were assigned in our previous study based on the haplotypes determined from SNP genotyping data for each subject (Moslehi et al., 2006). The dataset consists of 1377 subjects (see Supplementary Table 1 for details and ethnic makeup). Prediction of the acetylator phenotype from combinations of SNPs, as defined here, is a three-class classification problem that can be addressed using a supervised pattern recognition method. We used Support Vector Machine (SVM) as a method of choice (Vapnik, 1998). We constructed a three-class SVM predictor using the one-against-one approach which was shown to perform better than other approaches in multi-class SVMs (Hsu and Lin, 2002). We used SVM implemented in the LIBSVM package (Chang and Lin, (2003) with the linear kernel. Each NAT2 SNP was encoded using a set of three mutually orthogonal binary vectors: homozygote for the most frequent allele (1,0,0), heterozygote for the most frequent allele (0,1,0) and homozygote for the least frequent allele (0,0,1). For a given subject corresponding vectors describing each of the six observed SNPs were concatenated together, resulting in a final binary feature vector of dimension 18. Thus, the SNP combination of each subject was described by 18 binary variables. We used a 7-fold cross-validation to test the SVM predictor of the acetylator phenotype. In this approach, the dataset is randomly partitioned into seven groups, each containing 1/7 of the dataset. At each cross-validation run, one group is removed and the predictor is trained on the remaining observations and tested on the removed group. The process is repeated seven times, so that each group is used for testing once. In order to assess different aspects of classification quality, we used the following performance measures: overall accuracy (ACC), sensitivity (SN) for class i (SNi) and specificity (SP) for class i (SPi) (Baldi et al., 2000):


Formula 1

(1)


Formula 2

(2)


Formula 3

(3)
where Z is a 3 x 3 confusion (contingency) matrix, in which an element z[i,j] represents the number of times objects from class i are predicted to be in class j; N is the total number of objects (N=1377 in this work).


View this table:
[in this window]
[in a new window]

 
Table 1. The performance of NAT2PRED server

 

    3 RESULTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
The results of the cross-validation are shown in Table 1. If all six SNPs are used, the predictor of the NAT2 acetylator phenotype achieves a nearly perfect accuracy of 99.9% (Equation 1) and nearly perfect class-specific sensitivities and specificities (Equation 2) between 99.6 and 100%. Such a well-balanced performance is observed despite the highly unbalanced nature of the dataset, meaning that the number of subjects with the slow phenotype is almost an order of magnitude larger than that of subjects with the rapid phenotype. Importantly, individuals with the slow phenotype, who are at increased risk of developing tumors, are identified with 100% SN, meaning that no at-risk individuals are missed. If data on two synonymous SNPs (C282T and C481T) are removed and only non-synonymous SNPs are used, the accuracy of the prediction drops from 99.9 to 93.2%, with similar declines in SN and SP (Table 1). We therefore conclude that all six SNPs used in the present study are required to reliably assign the acetylator phenotype.

The web server implementation of the SVM predictor of the NAT2 acetylator phenotype was trained using the data on all 1377 subjects. It has a simple intuitive user interface.The user is asked to select a genotype for each of the six SNP loci using radio buttons (Supplementary Fig. 2). There are three possible genotypes for each SNP locus, which corresponds to three radio buttons per locus. After the genotype is selected, the user can click ‘Submit’ button and immediately obtain an inferred NAT2 acetylator phenotype. The output page displays the selected genotype and the probabilities of each of the three acetylator phenotypes (slow, intermediate and rapid) for these genotypes (Supplementary Fig. 3). The final prediction is the phenotype with the highest probability. There is also an option for a batch submission of genotypes for multiple individuals. Detailed instructions and information about the methodology and output format can be found by clicking the corresponding help hyperlink located on the input page. To the best of the authors' knowledge, NAT2PRED is the only existing web server for inferring NAT2 acetylator phenotypes from genotyping data. NAT2PRED was developed on a dataset where majority of subjects are Caucasian (94%). However, the prediction model utilizes generally observed linkage disequilibrium between the six NAT2 SNPs and can be applied to individuals from any ethnicity. The web server is publicly available at http://nat2pred.rit.albany.edu.


    ACKNOWLEDGEMENTS
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 
The authors thank Dr R. B. Hayes, Division of Cancer Epidemiology and Genetics and Drs C. Berg and P. Prorok, Division of Cancer Prevention, NCI, NIH, DHHS, the Screening Center investigators and staff of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, Mr T. Riley and staff, Information Management Services, Inc., Ms B. O'Brien and staff, Westat, Inc. and Drs B. Kopp, W. Shao and staff, SAIC-Frederick. Most importantly, we acknowledge the PLCO study participants for their contributions to making this study possible.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Alfonso Valencia

Received on November 26, 2008; revised on February 5, 2009; accepted on February 26, 2009

    REFERENCES
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS
 3 RESULTS
 ACKNOWLEDGEMENTS
 REFERENCES
 

    Baldi P, et al. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics (2000) 16:412–424.[Abstract/Free Full Text]

    Blum M, et al. Human arylamine N-acetyltransferase genes: isolation, chromosomal localization, and functional expression. DNA Cell Biol. (1990) 9:193–203.[Web of Science][Medline]

    Brockton N, et al. N-acetyltransferase polymorphisms and colorectal cancer: a HuGE Review. Am. J. Epidemiol. (2000) 151:846–861.[Abstract/Free Full Text]

    Chang CC, Lin CJ. LIBSVM: a library for support vector machines. (2003) Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm (last accessed date April 6, 2007).

    Hein DW, et al. Molecular genetics and epidemiology of the NAT1 and NAT2 acetylation polymorphisms. Cancer Epidemiol. Biomarkers Prev. (2000) 9:29–42.[Abstract/Free Full Text]

    Hein DW. N-acetyltransferase 2 genetic polymorphism: effects of carcinogen and haplotype on urinary bladder cancer risk. Oncogene (2006) 25:1649–1658.[CrossRef][Web of Science][Medline]

    Hsu CW, Lin CJ. A comparison of methods for multi-class support vector machines. IEEE Trans. Neural Netw. (2002) 13:415–425.[CrossRef][Web of Science][Medline]

    Moslehi R, et al. Cigarette smoking, N-acetyltransferase genes and the risk of advanced colorectal adenoma. Pharmacogenomics (2006) 7:819–829.[CrossRef][Web of Science][Medline]

    Potter J, et al. Colorectal adenamatous and hyperplastic polyps: smoking and N-acetyltransferase 2 polymorphisms. Cancer Epidemiol. Biomarkers Prev. (1999) 8:69–75.[Abstract/Free Full Text]

    Stephens M, et al. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet (2001) 68:978–989.[CrossRef][Web of Science][Medline]

    Tiemersma E, et al. Effect of SULT1A1 and NAT2 genetic polymorphism on the association between cigarette smoking and colorectal adenomas. Int. J. Cancer (2004) 108:97–103.[CrossRef][Web of Science][Medline]

    Vapnik VN. Statistical Learning Theory. (1998) New York: John Wiley & Sons.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrowOA All Versions of this Article:
25/9/1185    most recent
btp121v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Kuznetsov, I. B.
Right arrow Articles by Moslehi, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Kuznetsov, I. B.
Right arrow Articles by Moslehi, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?