Skip Navigation


Bioinformatics Advance Access originally published online on March 1, 2007
Bioinformatics 2007 23(9):1159-1160; doi:10.1093/bioinformatics/btm066
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/9/1159    most recent
btm066v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (10)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Brameier, M.
Right arrow Articles by MacCallum, R. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Brameier, M.
Right arrow Articles by MacCallum, R. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

NucPred—Predicting nuclear localization of proteins

Markus Brameier 1,*, Andrea Krings 2 and Robert M. MacCallum 3,*

1Bioinformatics Research Center (BiRC), University of Aarhus, 8000 Aarhus C, Denmark, 2Stockholm Bioinformatics Center (SBC), Stockholm University, 106 91 Stockholm, Sweden and 3Division of Cell & Molecular Biology, Imperial College London, South Kensington Campus, London, UK

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 METHOD AND APPLICATIONS
 REFERENCES
 

Summary: NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results.

Availability: The web interface to the NucPred tool is provided at http://www.sbc.su.se/~maccallr/nucpred. In addition, the Perl code is made freely available under the GNU Public Licence (GPL) for simple incorporation into other tools and web servers.

Contact: brameier{at}birc.au.dk, r.maccallum{at}imperial.ac.uk


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 METHOD AND APPLICATIONS
 REFERENCES
 
Subcellular location provides at least some information about protein function and contributes to understanding protein interactions and signaling pathways in the cell. It is important for the identification of drug targets and may serve as an indicator of several diseases, like cancer and Alzheimer. Experimental determination of subcellular locations is often expensive and time-consuming. Instead, computational methods can make fast and accurate predictions. In recent years, several bioinformatics tools have been developed to identify different kinds of subcellular compartment(s) (Emanuelsson et al., 2000; Nair and Rost, 2005; Nakai and Horton, 1999; Reinhardt and Hubbard, 1998).

An important compartment in eukaryotic cells is the nucleus, where many essential biological processes take place. Because proteins are manufactured outside the nucleus, they must get back into the nucleus if they are needed there. This import is mediated by short binding sites on the protein sequence, called nuclear localization signals (NLSs). NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete.

Only few tools for predicting nuclear localization of proteins make use of sequence motifs to match certain sequence patterns. Cokol et al. (2000) identify relevant substructures of known NLSs and use these for building more general motifs. Our approach to nuclear localization applies genetic programming (GP), a machine-learning technique that automatically develops computer programs in an artificial evolutionary process (Koza, 1992). Our evolved predictors incorporate multiple regular expressions which are matched against the input (amino acid) sequence. These sequence motifs are evolved together with the actual classification rules. One important advantage of NucPred is that it is not restricted to a predefined set of NLS patterns and, thus, has the potential to discover new, unknown NLSs. In fact, the program classifiers were trained without using knowledge about known NLSs, but only about the nuclear/non-nuclear location of protein sequences.

Other approaches train predictors with global information, like the amino acid composition (Nair and Rost, 2005; Reinhardt and Hubbard, 1998) or provide this in addition to the local sequence motifs (Nakai and Horton, 1999).

The method behind NucPred has earlier been reported to be competitive with existing methods for nuclear localization of proteins (Heddad et al., 2004), including PredictNLS (Cokol et al., 2000) and PSORT II (Nakai and Horton, 1999). Hwang et al. (2006) used all three computational approaches for identifying the nuclear proteome in human T leukemia cells, together with experimental methods. The NucPred predictor has also been incorporated into other bioinformatics web services, such as the POGs/PlantRBP database (http://plantrbp.uoregon.edu).

Table 1 compares NucPred with state-of-the-art approaches to nuclear localization, including more recent methods like LOCtree (Nair and Rost, 2005) and BaCelLo (Pierleoni et al., 2006). The test dataset used here contains all human protein sequences—not predicted to be transmembrane by TMHMM—for which there is at least one TAS evidenced Gene Ontology (GO) annotation for cellular component in UniProt GOA. Positive test cases have GO term ‘nucleus’, ‘chromosome’, or one of their child terms assigned. The performance of NucPred, PredictNLS and PSORT II are roughly equivalent, but at different sensitivities. Conjunction of predictions from different methods, e.g. NucPred and PredictNLS, gives higher specificity, while disjunction increases sensitivity.


View this table:
[in this window]
[in a new window]

 
Table 1. Comparison of different nuclear localization methods

 
Second generation predictors, like LOCtree and BaCelLo, achieve a higher sensitivity by a hierarchical architecture of different support vector machines (SVM) classifiers which mimics biological pathways. Each such classifier discerns another class of subcellular compartment. Both methods make use of evolutionary information from sequence profiles, besides sequence composition. LOCtree further improved coverage by training on additional proteins whose subcellular location was predicted by sequence homology or keyword similarity.


    METHOD AND APPLICATIONS
 TOP
 ABSTRACT
 INTRODUCTION
 METHOD AND APPLICATIONS
 REFERENCES
 
The NucPred web server provides three major functionalities for predicting nuclear localization. A likelihood score is calculated either (1) for up to 1000 input sequences (in batch mode), (2) for each sequence in a multiple alignment or (3) over whole proteomes.

The NucPred core is an ensemble (or jury) of 100 sequence-based predictors. Each makes a Boolean (yes/no) decision to whether a protein (sequence) has a nuclear role or not. To improve accuracy, the individual predictions are combined by a majority voting scheme. The consensus score is basically the fraction of predictors which vote ‘nuclear’. For a discrete prediction, the user has to decide on a score threshold (preferably 0.8). Sequences which score greater than or equal to this threshold are predicted to spend time in the nucleus.

For single sequences, NucPred calculates a per-residue scoring. Each amino acid location is colored according to its influence on a ‘nuclear’ classification (see example in Fig. 1). The closer the color of a subsequence lies at the red (blue) end of the color spectrum, the more positive (negative) is its effect. Unlike black-box predictors, such as neural networks (Reinhardt and Hubbard, 1998) or support vector machines (SVMs, Nair and Rost, 2005), the classifying programs in NucPred may be interpreted in terms of the regular expressions contained. For each such sequence motif, a fixed—positive or negative—degree of influence has been pre-calculated over all training sequences (selected from UniProt). The per-residue coloring reflects the sum of the influences of all regular expressions that match at the particular position in the sequence, highlighting potential regions for experimental verification or manipulation.


Figure 1
View larger version (41K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. Protein sequence of human gene AUTS2 (UniProt ID Q8WXX7) colored with per-residue NucPred scores (monochrome version). The overall sequence score is 1. First 500 out of 1259 residues shown.

 
NucPred offers the possibility to project the per-residue coloring onto the 3D protein structure found in the protein data bank (PDB http://www.rcsb.org/pdb). The colored structure (modified PDB file) may be viewed using Rasmol (http://www.openrasmol.org) or another PDB viewer. Such an analysis may reveal the spatial arrangement of putative or known NLSs.

Another NucPred option allows the upload of multiple sequences from the same protein family to be aligned by the ClustalW algorithm. Each sequence is colored (individually) according to the per-residue NucPred scores. This enables the comparative analysis of homologs from the same or different species, and may be useful for the study of splice variants.

NucPred also provides pre-calculated sequence scores over full proteomes of different eukaryotes—including human, mouse, rat, fish, fly, worm and yeast—and for all 98 716 eukaryotic proteins in Uniprot (release 8.5 http://www.ebi.uniprot.org). To increase confidence, the NucPred predictions may be compared to and combined with the discrete predictions from PredictNLS and PSORT II. TMHMM (Krogh et al., 2001) predictions of transmembrane proteins are provided to potentially rule out a nuclear location.

In addition, Uniprot annotations for subcellular location and GO annotations for cellular compartment are given if available. This information allows, for instance, the discovery of proteins which are associated with human diseases and are confidently predicted to have a nuclear function, but are not currently known to be nuclear. The protein from Fig. 1 is believed to increase the susceptibility for Autism and is predicted to be nuclear by all three methods. A further interesting example of 94 human proteins meeting these criteria is oncogene ETC2 (UniProt ID Q9H8V3).

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Dmitrij Frishman

Received on November 30, 2006; revised on February 9, 2007; accepted on February 19, 2007

    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 METHOD AND APPLICATIONS
 REFERENCES
 

    Cokol M, et al. Finding nuclear localization signals. EMBO Rep (2000) 1:411–415.[CrossRef][Web of Science][Medline]

    Emanuelsson O, et al. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol (2000) 300:1005–1016.[CrossRef][Web of Science][Medline]

    Heddad A, et al. Evolving regular expression-based sequence classifiers for protein nuclear localization. Proc. of EvoBIO 2004 Conf. Lect. Notes Comput. Sci (2004) vol. 3005:31–40.

    Hwang SI, et al. Systematic characterization of nuclear proteome during apoptosis: A quantitative proteomic study by differential extraction and stable isotope labeling. Mol. Cell Proteomics (2006) 5:1131–1145.[Abstract/Free Full Text]

    Koza JR. Genetic Programming: On the Programming of Computer Programs by Natural Selection (1992) Cambridge, MA: MIT Press.

    Krogh A, et al. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol (2001) 305:567–580.[CrossRef][Web of Science][Medline]

    Nair R, Rost B. Mimicking cellular sorting improves prediction of subcellular localization. J. Mol. Biol (2005) 348:85–100.[CrossRef][Web of Science][Medline]

    Nakai K, Horton P. PSORT: A program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem. Sci (1999) 24:34–36.[CrossRef][Web of Science][Medline]

    Pierleoni A, et al. BaCelLo: A balanced subcellular localization predictor. Bioinformatics (2006) 22:408–416.[CrossRef]

    Reinhardt A, Hubbard T. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res (1998) 26:2230–2236.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Brief Funct Genomic ProteomicHome page
R. Casadio, P. L. Martelli, and A. Pierleoni
The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation
Brief Funct Genomic Proteomic, February 18, 2008; (2008) eln003v1.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow All Versions of this Article:
23/9/1159    most recent
btm066v1
Right arrow Comments: Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when Comments are posted
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (10)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Brameier, M.
Right arrow Articles by MacCallum, R. M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Brameier, M.
Right arrow Articles by MacCallum, R. M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?