Bioinformatics Vol. 15 no. 12 1999
Pages 1000-1011
© 1999 Oxford University Press
IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices
1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
2 Department of Biology, Texas A&M University, Biological Sciences Building West, College Station, TX 77843, USA
Present address: MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK.
Motivation: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST).
Results: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search a database of PSSMs for protein folds, and one for protein domains involved in signal transduction. IMPALA's sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees the output of the optimal local alignment by using the rigorous Smith-Waterman algorithm. Also, it is considerably faster when run with a large database of PSSMs than is BLAST or PSI-BLAST when run against the complete non-redundant protein database.
Availability: The IMPALA source code, the wolf1187 database, and the aravind105 database are freely available from the NCBI ftp site ncbi.nlm.nih.gov. The databases may be found in the subdirectory ftp://ncbi.nlm.nih.gov/pub/impala. The source code is in ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools. Some IMPALA executables for different implementations of UNIX are in ftp://ncbi.nlm.nih.gov/blast/executables. IMPALA has been added as a search option on the Blocks Database Server (http://blocks.fhcrc.org/blocks/impala.html) using a library of PSSMs derived from the BLOCKS database.
Contact: schaffer{at}helix.nih.gov
Received on March 19, 1999; revised on July 28, 1999; accepted on August 4, 1999
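The abstract states that IMPALA guarantees the optimal local alignment by applying the Smith-Waterman algorithm to a query sequence and a PSSM. The following is a minimal sketch of that idea, not IMPALA's actual code. It assumes a toy PSSM stored as a list of per-column score dictionaries, a linear gap penalty in place of the affine gap costs a production tool would normally use, and a fixed default score for residues missing from a column. The function name `smith_waterman_pssm` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def smith_waterman_pssm(
    query: str,
    pssm: List[Dict[str, int]],
    gap: int = 4,
    default: int = -4,
) -> Tuple[int, int, int]:
    """Score the best local alignment of `query` against `pssm`.

    Assumptions (illustrative only): `pssm[j]` maps a residue letter to
    its score at profile column j, residues absent from a column score
    `default`, and every gapped position costs `gap` (linear penalty).

    Returns (best_score, query_end, pssm_end), with 1-based end positions.
    """
    n, m = len(query), len(pssm)
    prev = [0] * (m + 1)  # DP row for query position i - 1
    best = (0, 0, 0)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            # Align residue i with profile column j using the column's score.
            match = prev[j - 1] + pssm[j - 1].get(query[i - 1], default)
            # Leave residue i unaligned (gap in the profile).
            skip_residue = prev[j] - gap
            # Leave column j unaligned (gap in the query).
            skip_column = cur[j - 1] - gap
            # Flooring at zero is what makes the alignment local.
            cur[j] = max(0, match, skip_residue, skip_column)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


# Toy three-column profile that favours A, then C, then D.
toy_pssm = [{"A": 5}, {"C": 5}, {"D": 5}]
print(smith_waterman_pssm("GGACDGG", toy_pssm))
```

Because every cell of the dynamic-programming table is filled in, the reported score is the true optimum under the chosen scoring scheme, in contrast to a heuristic search that examines only part of the table. A database search in this style would repeat the call once per PSSM and then assess the significance of each score, a step this sketch omits.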