Bioinformatics Vol. 18 no. 90001 2002
Pages S78-S86
© 2002 Oxford University Press
Inferring sub-cellular localization through automated lexical analysis
1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University,
650 West 168th Street BB217, New York, NY 10032, USA
2 Department of Physics, Columbia University, 538 West 120th Street, New York, NY 10027, USA
3 Columbia University Center for Computational Biology and Bioinformatics (C2B2),
Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
Received on January 24, 2002; revised on March 29, 2002; accepted on March 29, 2002
Motivation: The SWISS-PROT sequence database contains keywords that describe the functional annotation of many proteins. In contrast, information about sub-cellular localization is available for only a few proteins. Experts can often infer localization from the keywords that describe protein function. We developed LOCkey, a fully automated method that assigns sub-cellular localization through lexical analysis of SWISS-PROT keywords. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing the functional information that is already available automatically.
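The abstract does not spell out how LOCkey scores keywords. As a rough illustration of the general idea (transferring a known localization from annotated proteins whose SWISS-PROT keywords resemble those of the query), here is a minimal Python sketch. The toy training entries, the Jaccard similarity, the k-nearest-neighbour vote and the names `TRAINING`, `jaccard` and `infer_localization` are all assumptions made for illustration; this is not LOCkey's actual algorithm.

```python
from collections import Counter

# Hypothetical toy training data: SWISS-PROT-style keyword sets paired with a
# known sub-cellular localization. These entries are invented for illustration.
TRAINING = [
    ({"DNA-binding", "Transcription regulation", "Zinc-finger"}, "nuclear"),
    ({"Transcription regulation", "Activator"}, "nuclear"),
    ({"Glycoprotein", "Signal", "Hormone"}, "extracellular"),
    ({"Signal", "Glycoprotein", "Protease"}, "extracellular"),
    ({"Glycolysis", "Kinase", "Transferase"}, "cytoplasmic"),
    ({"Electron transport", "Transit peptide", "Respiratory chain"}, "mitochondrial"),
]


def jaccard(a, b):
    """Shared keywords divided by all keywords in either set (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_localization(keywords, training=TRAINING, k=3):
    """Assign a localization by majority vote among the k most similar
    annotated proteins; return None when no training protein shares a keyword."""
    scored = [(jaccard(keywords, kw), loc) for kw, loc in training]
    scored = [s for s in scored if s[0] > 0]
    if not scored:
        return None  # no lexical evidence: abstain rather than guess
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(loc for _, loc in scored[:k])
    return votes.most_common(1)[0][0]
```

With this toy data, `infer_localization({"Zinc-finger", "DNA-binding"})` returns `"nuclear"`, while a keyword set that shares nothing with the training proteins returns `None`. Abstaining in that case reflects the point made in the abstract that keyword-based inference only reaches proteins that carry informative annotations.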
Results: The method reached more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five completely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.
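A cross-validation estimate of this kind can be sketched as leave-one-out: each annotated protein is hidden in turn and predicted from the remaining ones. The sketch below also reports coverage, the fraction of proteins for which any prediction is made at all, because a keyword-based method cannot cover proteins without informative keywords. The nearest-neighbour rule, the toy `DATA` and the function names are hypothetical stand-ins; the protocol used in the paper may differ.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

# A protein is represented by its keyword set and its known localization.
Entry = Tuple[FrozenSet[str], str]

# Hypothetical toy data set; the entries are invented for illustration.
DATA: List[Entry] = [
    (frozenset({"DNA-binding", "Zinc-finger"}), "nuclear"),
    (frozenset({"DNA-binding", "Transcription regulation"}), "nuclear"),
    (frozenset({"Signal", "Glycoprotein"}), "extracellular"),
    (frozenset({"Signal", "Hormone"}), "extracellular"),
    (frozenset({"Glycolysis"}), "cytoplasmic"),
    (frozenset({"Signal", "Transmembrane"}), "plasma membrane"),
]


def nearest_neighbour(query: FrozenSet[str], training: List[Entry]) -> Optional[str]:
    """Hypothetical classifier: copy the localization of the first training
    protein sharing the most keywords with the query; None if none share any."""
    best_loc: Optional[str] = None
    best_shared = 0
    for keywords, loc in training:
        shared = len(query & keywords)
        if shared > best_shared:
            best_loc, best_shared = loc, shared
    return best_loc


def leave_one_out(
    data: List[Entry],
    classify: Callable[[FrozenSet[str], List[Entry]], Optional[str]],
) -> Tuple[float, float]:
    """Hold out each protein in turn and predict it from the others.
    Returns (accuracy over predicted proteins, coverage = fraction predicted)."""
    predicted = correct = 0
    for i, (keywords, true_loc) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        guess = classify(keywords, rest)
        if guess is not None:
            predicted += 1
            if guess == true_loc:
                correct += 1
    accuracy = correct / predicted if predicted else 0.0
    return accuracy, predicted / len(data)


accuracy, coverage = leave_one_out(DATA, nearest_neighbour)
```

On this toy data the call yields an accuracy of 0.8 (four of five predictions correct) and a coverage of 5/6, because the single `Glycolysis` protein shares no keyword with any other entry. Separating accuracy from coverage makes it explicit that a high accuracy can coexist with predictions for only part of a database.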
Availability: Annotations of localization for eukaryotes are available at: http://cubic.bioc.columbia.edu/services/LOCkey
Contact: nair{at}cubic.bioc.columbia.edu rost{at}columbia.edu
Keywords: genome sequence analysis; predicting sub-cellular localization; protein function; lexical analysis.
* To whom correspondence should be addressed.