Bioinformatics Advance Access originally published online on November 11, 2004
Bioinformatics 2005 21(7):1269-1270; doi:10.1093/bioinformatics/bti130
© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions{at}oupjournals.org

NetAcet: prediction of N-terminal acetylation sites

Lars Kiemer, Jannick Dyrløv Bendtsen and Nikolaj Blom*

Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark

*To whom correspondence should be addressed.


    Abstract
 TOP
 Abstract
 1 INTRODUCTION
 2 METHODOLOGY
 3 RESULTS AND DISCUSSION
 4 CONCLUSION
 REFERENCES
 

Summary: We present a neural network-based method for predicting N-terminal acetylation, by far the most abundant post-translational modification in eukaryotes. The method was developed on a yeast dataset for N-acetyltransferase A (NatA) acetylation, the type of N-acetylation for which the most examples are known and for which orthologs have been found in several eukaryotes. We obtain correlation coefficients close to 0.7 on yeast data and a sensitivity of up to 74% on mammalian data, suggesting that the method is valid for eukaryotic NatA orthologs.

Availability: The NetAcet prediction method is available as a public web server at http://www.cbs.dtu.dk/services/NetAcet/

Contact: nikob{at}cbs.dtu.dk

Supplementary information: http://www.cbs.dtu.dk/services/NetAcet/


    1 INTRODUCTION
 
Most proteins undergo post-translational modifications (PTMs), for example the addition of chemical groups, as in acetylation or glycosylation, or the removal of amino acids by maturation or signal peptide cleavage. N-terminal acetylation is one of the most common modifications in eukaryotes and is also found, although less frequently, in archaea and bacteria. It occurs co-translationally on eukaryotic cytoplasmic proteins, and its prevalence is estimated at ~80–90% in mammals and ~50% in yeast (Polevoda and Sherman, 2000, 2003).

N-terminal acetylation is a common PTM for which prediction has been extremely difficult to approach, owing to the lack of both data and a clear consensus motif (Polevoda and Sherman, 2003). Almost 20 years ago, an attempt was made to predict N-terminal acetylation in general, using predicted protein secondary structure as input to a linear neural network (Augen and Wold, 1986). The performance of that model cannot be evaluated, however, because it was constructed and tested on the same dataset. Because yeast is one of the most thoroughly studied eukaryotes, enough data have now accumulated to train a prediction method for NatA acetylation. Yeast is sufficiently important in itself to warrant the construction of a prediction server; in addition, the predictor seems to obtain comparable performance on mammalian proteins. This supports the idea that the N-terminal acetylation systems are similar in all eukaryotes.

The method presented here deals only with NatA Nα-terminal acetylation and not with acetylation of the ε-amino group of internal lysine residues by other acetyltransferases.


    2 METHODOLOGY
 
All training data were extracted from Table 2 in Polevoda and Sherman (2003) and joined with data from the Yeast Protein Map (YPM) resource (Perrot et al., 1999). Any inconsistencies between the two datasets were removed to obtain the highest quality data possible. Furthermore, we extracted only substrates reported to be acetylated by NatA, as this is the only transferase on which a sufficient amount of data has been accumulated so far. This resulted in 61 positive and 76 negative training sequences (Fig. 1).



 
Fig. 1 Shannon information (Shannon, 1948) sequence logo of 57 yeast NatA acetylation sites (upper) and a Kullback–Leibler (Cover and Thomas, 1991) logo (lower) constructed from both the 57 known sites and from 55 known non-acetylated N-terminal sites containing S/T/A/G in the first position (used as background distribution). Acetylation is reported on position 1 in the logos. Data were extracted as stated in the Methodology section. The height of the columns of letters reflects the degree of sequence conservation for the positive examples in the upper logo. In the lower logo the column height reflects the discrepancy between the positive and negative examples. Note that the bit scale differs in the two logos. Sequence logos were constructed as described by Schneider and Stephens (1990).
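
The two quantities plotted in Fig. 1 can be illustrated with a minimal sketch. It assumes a 20-letter amino acid alphabet, omits the small-sample correction used in sequence logos, and adds a pseudocount to the background so that the Kullback–Leibler term is always defined. The function names and the toy sequences are hypothetical and are not taken from the NetAcet data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(sequences, position, pseudocount=0.0):
    """Relative amino acid frequencies at a 0-based position."""
    counts = Counter(seq[position] for seq in sequences if len(seq) > position)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def shannon_information(freqs):
    """Column height in bits: log2(20) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def kullback_leibler(foreground, background):
    """Column height in bits: divergence of foreground from background."""
    return sum(
        p * math.log2(p / background[a]) for a, p in foreground.items() if p > 0
    )


# Hypothetical toy data: acetylated and non-acetylated N-termini.
positives = ["SEKA", "SDKT", "AEKL", "SEQV"]
negatives = ["SKLP", "TPRG", "GKVD", "APLE"]

fg = column_frequencies(positives, 0)
bg = column_frequencies(negatives, 0, pseudocount=1.0)
print(shannon_information(fg), kullback_leibler(fg, bg))
```

A fully conserved column reaches the maximum Shannon height of log2(20) bits, whereas the Kullback–Leibler height is zero whenever the positive and background distributions coincide, which is why the two logos use different bit scales.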

 
Sequences were truncated to their N-terminal 40 residues and subsequently homology reduced by visual inspection of a neighbour-joining tree generated from a ClustalW multiple alignment (Thompson et al., 1994). Four sequences from the positive training data and four from the negative training data were removed due to homology. After this reduction, the two closest homologues were 52% identical, although the average sequence identity was much lower (data files and trees are available as Supplementary Material from http://www.cbs.dtu.dk/services/NetAcet/).

An artificial neural network was trained using three-fold cross-validation. All positions in the dataset, except those known to be acetylated, were used as negative examples. For evaluation purposes, however, only negative examples with serine, threonine, alanine or glycine in the first position of the network window were used, as the other residue types were trivial to classify. The neural network was of the standard feed-forward type, and sparse encoding was used to translate the amino acids into network input, as described previously (Blom et al., 1996; Nielsen et al., 1997).
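
As an illustration of the data preparation described above, the following sketch shows sparse (one-hot) encoding of a sequence window and a simple three-fold split. It is not the authors' implementation: the 20-letter alphabet, the all-zero encoding of unknown residues, the interleaved fold assignment and all function names are assumptions made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(window):
    """One-hot encode a peptide window: 20 inputs per residue, one set to 1.0.

    Residues outside the 20-letter alphabet become all zeros (an assumption).
    """
    vector = []
    for residue in window:
        unit = [0.0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            unit[index] = 1.0
        vector.extend(unit)
    return vector


def windows(sequence, size):
    """Yield (start, window) for every full-length window in a sequence."""
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]


def three_fold_splits(items):
    """Yield (train, test) lists so that each item is tested exactly once."""
    folds = [items[i::3] for i in range(3)]
    for k in range(3):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, folds[k]


# Hypothetical N-terminal fragment; the window length is a free parameter.
fragment = "SEKAGTVLDN"
for start, window in windows(fragment, 7):
    print(start, window, len(sparse_encode(window)))
```

With this encoding a window of n residues yields 20n network inputs, and each cross-validation round trains on two thirds of the examples and tests on the remaining third.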


    3 RESULTS AND DISCUSSION
 
Using a network window size of seven amino acids (corresponding to positions 1–7 in Fig. 1) and eight hidden neurons, we obtained a Matthews correlation coefficient (Matthews, 1975) of 0.69 at a threshold of 0.5. This correlation coefficient reflects a sensitivity of 75% and a specificity of 92% (Fig. 2). A smaller or larger window size gave lower specificity. As expected, specificity on negative examples with a serine residue at position 1 is lower (60%) than average, reflecting that these sequences are more difficult to classify owing to the bias of the positive examples (Fig. 1). Although a linear neural network (i.e. without hidden neurons) obtained a comparable correlation coefficient, the network with eight hidden neurons was preferred because it performed far better on negative examples containing a serine residue at position 1 (the number of false-positive predictions on serine residues dropped from 12 to 6).
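
The performance measures reported above can be computed from thresholded network outputs as in the following sketch. It assumes the conventional definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), which the text does not spell out, and it treats an output at or above the threshold as a positive prediction. The scores and labels are hypothetical toy values, not NetAcet results.

```python
import math


def confusion_counts(scores, labels, threshold=0.5):
    """Count TP, FP, TN, FN; an output >= threshold is a positive prediction."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true sites that are predicted (assumed definition)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-sites that are rejected (assumed definition)."""
    return tn / (tn + fp)


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal count is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical network outputs and true labels (1 = acetylated).
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.5)
print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, fp, tn, fn))
```

Repeating the calculation over a range of thresholds gives curves of the kind shown in Fig. 2.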



 
Fig. 2 Method performance, showing specificity, sensitivity and Matthews correlation coefficient on yeast test data. Values were plotted versus neural network output threshold.

 
On an independent test set of mammalian N-acetylated proteins extracted from UniProt (Apweiler et al., 2004), we obtained a sensitivity of 74% on acetylated serines (71 of 96 sites were found). While the figures for serine acetylation prediction are comparable in yeast and mammalian data, we obtained lower performance on other types of substrates, which we attribute to the relatively few examples of such sites in the yeast training data. Nevertheless, yeast NatA and mammalian NatA orthologs do seem to share properties in their substrate specificity.


    4 CONCLUSION
 
The method presented here predicts NatA acetylation sites in yeast with high performance and, to a certain extent, those in mammals. We believe that the method will be highly useful to researchers working with acetylation and will facilitate the ongoing work on proteome annotation.

The prediction server and additional information are available at http://www.cbs.dtu.dk/services/NetAcet/. We plan to develop this website further to include prediction methods for other types of acetylation as sufficient data become available.


    Acknowledgments
 
This work was supported by grants from the Danish National Research Foundation, the Danish Natural Science Research Council, the Danish Center for Scientific Computing, the European Union BioSapiens Network of Excellence (to J.D.B.), and NeuroSearch A/S (to L.K.).

Received on September 3, 2004; revised on October 28, 2004; accepted on October 28, 2004

    REFERENCES
 

    Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res., 32, D115–D119.

    Augen, J. and Wold, F. (1986) How much sequence information is needed for the regulation of amino-terminal acetylation of eukaryotic proteins? Trends Biochem. Sci., 11, 494–497.

    Blom, N., Hansen, J., Blaas, D., Brunak, S. (1996) Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Protein Sci., 5, 2203–2216.

    Cover, T.M. and Thomas, J.A. (1991) Elements of Information Theory. John Wiley and Sons, Inc., New York.

    Matthews, B.W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405, 442–451.

    Nielsen, H., Engelbrecht, J., Brunak, S., von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1–6.

    Perrot, M., Sagliocco, F., Mini, T., Monribot, C., Schneider, U., Shevchenko, A., Mann, M., Jeno, P., Boucherie, H. (1999) Two-dimensional gel protein database of Saccharomyces cerevisiae (update 1999). Electrophoresis, 20, 2280–2298.

    Polevoda, B. and Sherman, F. (2000) Nα-terminal acetylation of eukaryotic proteins. J. Biol. Chem., 275, 36479–36482.

    Polevoda, B. and Sherman, F. (2003) N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol., 325, 595–622.

    Schneider, T.D. and Stephens, R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.

    Shannon, C.E. (1948) A mathematical theory of communication. Bell System Tech. J., 27, 379–423, 623–656.

    Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.




This article has been cited by other articles:

R. L. Houtz, R. Magnani, N. R. Nayak, and L. M. A. Dirk
Co- and post-translational modifications in Rubisco: unanswered questions
J. Exp. Bot., May 1, 2008; 59(7): 1635 - 1645.

P. I. Olason
Integrating protein annotation resources through the Distributed Annotation System
Nucleic Acids Res., July 1, 2005; 33(suppl_2): W468 - W470.
