Bioinformatics Advance Access originally published online on September 16, 2004
Bioinformatics 2005 21(4):483-491; doi:10.1093/bioinformatics/bti028
Bioinformatics vol. 21 issue 4 © Oxford University Press 2005; all rights reserved.
SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks



G.N. Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
*To whom correspondence should be addressed.
| Abstract |
|---|
Motivation: The adhesion of microbial pathogens to host cells is mediated by adhesins. Experimental methods used for characterizing adhesins are time-consuming and demand large resources. The availability of specialized software can rapidly aid experimenters in simplifying this problem. We have employed 105 compositional properties and artificial neural networks to develop SPAAN, which predicts the probability of a protein being an adhesin (P ad).
Results: SPAAN had optimal sensitivity of 89% and specificity of 100% on a defined test set and could identify 97.4% of known adhesins at high P ad value from a wide range of bacteria. Furthermore, SPAAN facilitated improved annotation of several proteins as adhesins. Novel adhesins were identified in 17 pathogenic organisms causing diseases in humans and plants. In the severe acute respiratory syndrome (SARS) associated human coronavirus, the spike glycoprotein and nsps (nsp2, nsp5, nsp6 and nsp7) were identified as having adhesin-like characteristics. These results offer new leads for rapid experimental testing.
Availability: SPAAN is freely available through ftp://203.195.151.45
Contact: ramu{at}igib.res.in
| INTRODUCTION |
|---|
Microbial pathogens encode adhesins that mediate their adherence to host cell surface receptors, membranes or the extracellular matrix for successful colonization. Investigations into this primary event of host-pathogen interaction have revealed a wide array of adhesins in a variety of pathogenic microbes (Finlay and Falkow, 1997). New approaches to vaccine development focus on targeting adhesins to abrogate the colonization process (Wizemann et al., 1999). However, the specific roles of particular adhesins in several pathogens remain to be elucidated.
One of the best-understood mechanisms of bacterial adherence is the attachment mediated by pili or fimbriae. The well-studied adhesins in this category are FimH and PapG adhesins of Escherichia coli (Hahn et al., 2002) and the Type IV pili adhesins in Pseudomonas aeruginosa, Neisseria, Moraxella, enteropathogenic E.coli and Vibrio cholerae (Strom and Lory, 1993). Adhesins from other commonly known bacterial pathogens include MrkD protein of Klebsiella pneumoniae (Gerlach et al., 1989), Hia of Haemophilus influenzae (Barenkamp and St Geme, 1996) and many others (for further details see http://www.igib.res.in/data/seepath/spaan_data.html).
Several vaccine formulations either currently approved or being evaluated use adhesins as immunizing agents. Examples include filamentous hemagglutinin and pertactin proteins against Bordetella pertussis (Halperin et al., 2003), FimH against pathogenic E.coli (Langermann et al., 2000), PsaA against pneumococcal disease (Rapola et al., 2003), outer membrane vesicle preparations including BabA adhesin against Helicobacter pylori infections (Prinz et al., 2003) and a synthetic peptide anti-adhesin vaccine against P.aeruginosa infections (Cachia and Hodges, 2003).
Experimental identification of adhesins is an arduous task. Computational methods such as homology search can aid in identification, but this procedure is limited when the homologues themselves are not characterized. Sequence analysis based on compositional properties offers a way around this limitation. Amino acid composition is a fundamental attribute of a protein and it correlates significantly with its location, function, folding type, shape and in vivo stability (Nakashima and Nishikawa, 1994; Nandi et al., 2003). Compositional properties have been applied to problems as diverse as the prediction of functional roles (Hobohm and Sander, 1995), protein secondary structures (Rost and Sander, 1993), secretory proteins and apicoplast-targeted proteins in Plasmodium falciparum (Schneider, 1999; Zuegge et al., 2001).
We report a non-homology method using 105 compositional properties combined with artificial neural networks (ANNs) to identify adhesins and adhesin-like proteins in species belonging to a wide phylogenetic spectrum.
| SYSTEMS AND METHODS |
|---|
The five attributes
Amino acid frequencies
Amino acid frequency f i = (counts of i-th amino acid in the sequence)/l, where i = 1, ..., 20 and l is the length of the protein.
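As an illustration of the definition above, the following minimal Python sketch computes the 20 amino acid frequencies of a sequence. The function name and the ordering of the residues are illustrative choices and are not taken from the SPAAN source code.

```python
from collections import Counter

# The 20 standard amino acids in single-letter code (ordering is arbitrary here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_frequencies(sequence):
    """Return f_i = (count of i-th amino acid) / l for the 20 standard residues."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    # Counter returns 0 for residues that do not occur in the sequence.
    return {aa: counts[aa] / length for aa in AMINO_ACIDS}

freqs = amino_acid_frequencies("MKKTAIAIAV")
print(freqs["A"], freqs["K"])
```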
Multiplet frequencies
Multiplets are defined as homopolymeric stretches (X) n, where X is the amino acid and n is an integer ≥ 2 (Brendel et al., 1992). After identifying all the multiplets, the frequencies of the amino acids in the multiplets were computed as follows:
$$f_i(m) = \frac{\text{counts of the } i\text{-th amino acid occurring in multiplets}}{l}$$
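A possible way to compute multiplet frequencies is sketched below in Python. It assumes that a multiplet is a run of at least two identical residues, that every residue inside a run is counted, and that the count is normalized by the sequence length l; these normalization details are assumptions, and the function name is hypothetical.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def multiplet_frequencies(sequence):
    """Frequency of each amino acid occurring inside homopolymeric runs (n >= 2)."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence is empty")
    in_multiplets = Counter()
    # (.)\1+ matches one character followed by one or more copies of itself,
    # i.e. a run of at least two identical residues.
    for match in re.finditer(r"(.)\1+", sequence):
        run = match.group(0)
        in_multiplets[run[0]] += len(run)
    return {aa: in_multiplets[aa] / length for aa in AMINO_ACIDS}

mult = multiplet_frequencies("AAKTTTGA")
print(mult["A"], mult["T"])
```

In this example the final, isolated A is not part of any multiplet and therefore does not contribute to the multiplet frequency of A.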
Dipeptide frequencies
The frequency of a dipeptide (i, j) is f ij = (counts of ij-th dipeptide)/(total dipeptide counts), where i, j = 1, ..., 20.
The theoretical number of possible dipeptides is 400. The recommended ratio of the number of input vectors to the number of weight connections is 2 to avoid overfitting (Andrea and Kalayeh, 1991). Therefore, we used the top 20 dipeptides (ranked in ascending order of the P-values from a t-test) whose frequencies in the adhesin dataset were significantly different from those in the non-adhesin dataset (single-letter code): NG, RE, TN, NT, GT, TT, DE, ER, RR, RK, RI, AT, TS, IV, SG, GS, TG, GN, VI and HR.
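The dipeptide inputs can be sketched as follows in Python. The list of 20 dipeptides is taken from the text; counting overlapping dipeptides (l - 1 windows per sequence) is an assumption, and the function name is illustrative.

```python
from collections import Counter

# The 20 selected dipeptides, in the order listed in the text.
SELECTED_DIPEPTIDES = [
    "NG", "RE", "TN", "NT", "GT", "TT", "DE", "ER", "RR", "RK",
    "RI", "AT", "TS", "IV", "SG", "GS", "TG", "GN", "VI", "HR",
]

def dipeptide_frequencies(sequence, selected=SELECTED_DIPEPTIDES):
    """Return f_ij = count(ij) / (total dipeptide count) for the selected dipeptides."""
    sequence = sequence.upper()
    total = len(sequence) - 1  # number of overlapping dipeptide windows (assumption)
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = Counter(sequence[k:k + 2] for k in range(total))
    return [counts[dp] / total for dp in selected]

dp = dipeptide_frequencies("NGTTT")
print(dp[0], dp[4], dp[5])  # frequencies of NG, GT and TT
```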
Charge composition
The frequency of charged amino acids (R, K, E and D, considering the ionization properties of the side chains at pH 7.2) is given by f c = (counts of charged amino acids)/l. Furthermore, information on the distribution of the charged amino acids in a given protein sequence was obtained by computing the moments of the positions at which they occur.
The general expression for the moment of a given order, say r, is
$$M_r = \frac{1}{N}\sum_{i=1}^{N} (X_i - X_m)^r$$
where X m is the mean of the positions of all charged amino acids; X i is the position of the i-th charged amino acid; and N is the number of charged amino acids in the sequence.
The frequency of charged amino acids, the length of the protein and the moments of order 2 to 19 were used to train the ANN, constituting a total of 20 inputs. Moments of order >19 did not further enhance the performance.
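The 20 charge-composition inputs (one frequency, the length and 18 moments) might be assembled as in the Python sketch below. It assumes central moments about the mean position with 1/N normalization, 1-based residue positions, no rescaling of the raw moments, and zero moments when a sequence has no charged residues; none of these details is confirmed by the text, and the function names are hypothetical.

```python
CHARGED = set("RKED")

def central_moment(positions, order):
    """r-th moment of positions about their mean, normalized by the count N."""
    n = len(positions)
    mean = sum(positions) / n
    return sum((x - mean) ** order for x in positions) / n

def charge_inputs(sequence):
    """Return [f_c, length, M_2, ..., M_19]: 20 values in total."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence is empty")
    # 1-based positions of charged residues (assumption).
    positions = [k + 1 for k, aa in enumerate(sequence) if aa in CHARGED]
    f_c = len(positions) / length
    if positions:
        moments = [central_moment(positions, r) for r in range(2, 20)]
    else:
        moments = [0.0] * 18  # assumption: no charged residues gives zero moments
    return [f_c, float(length)] + moments

x = charge_inputs("AKAAE")
print(len(x), x[0], x[2])
```

High-order moments grow very quickly with sequence length, so a practical implementation would probably rescale them before feeding them to a network; how this was handled in SPAAN is not stated.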
Hydrophobic composition
The amino acids were classified into five groups based on their hydrophobicity scores: (-8 for K, E, D and R), (-4 for S, T, N and Q), (-2 for P and H), (+1 for A, G, Y, C and W) and (+2 for L, V, I, F and M) (Brendel et al., 1992).
The inputs fed into the neural network for each group are as follows:
- f i = (counts of i-th group)/(total counts in the protein), where i = 1, ..., 5.
- m ji = j-th order moment of positions of amino acids in i-th group, where j = 2, ..., 5.
A total of 25 inputs representing the hydrophobic composition of a protein were fed to the neural network.
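A minimal Python sketch of the 25 hydrophobic-composition inputs follows: for each of the five groups, one frequency and the moments of order 2 to 5. As with the charge module, central moments with 1/N normalization, 1-based positions and zero moments for an empty group are assumptions, and the "total counts" are taken to be the sequence length.

```python
# Five hydrophobicity groups as listed in the text (scores -8, -4, -2, +1, +2).
HYDROPHOBICITY_GROUPS = [
    set("KEDR"),
    set("STNQ"),
    set("PH"),
    set("AGYCW"),
    set("LVIFM"),
]

def hydrophobic_inputs(sequence):
    """Return 5 x (frequency + moments of order 2..5) = 25 values."""
    sequence = sequence.upper()
    total = len(sequence)
    if total == 0:
        raise ValueError("sequence is empty")
    inputs = []
    for group in HYDROPHOBICITY_GROUPS:
        positions = [k + 1 for k, aa in enumerate(sequence) if aa in group]
        inputs.append(len(positions) / total)
        for order in range(2, 6):
            if positions:
                mean = sum(positions) / len(positions)
                moment = sum((x - mean) ** order for x in positions) / len(positions)
            else:
                moment = 0.0  # assumption: empty group gives zero moments
            inputs.append(moment)
    return inputs

h = hydrophobic_inputs("KAK")
print(len(h), h[:5])
```

Together with the 20 amino acid, 20 multiplet, 20 dipeptide and 20 charge inputs, this gives the 105 compositional properties.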
Taken together, a total of 105 compositional properties in the five modules were used to predict the adhesin-like characteristics of a given protein sequence.
Database construction
Adhesins
Protein sequences were retrieved from http://www.ncbi.nlm.nih.gov using the keyword adhesin. Furthermore, proteins containing the following keywords were removed from the primary retrieval: transport, pyrophosphatase, peroxidase, myosin, chaperone, hydrolase, gene product, accessory, regulatory, patent, permease, hypothetical, keratin, agrobacterium, intimin, ORFA, ATP binding, tRNA, deiminase, metalloproteinase, cofactor, amylase, methylase, unknown, ribosomal, alternative start, submitter believes and phospholipase. The remaining sequences in the adhesin database were manually curated to generate a set of well-annotated proteins many of which have been verified experimentally.
Non-adhesins
The rationale used here was to collect sequences of enzymes and other proteins that function within the cell. Such proteins have only a remote possibility of functioning as adhesins and are expected to differ from adhesins in compositional characteristics (Nakashima and Nishikawa, 1994). The keywords used were dehydratase, dehydrogenase, ribosomal protein, kinase, polymerase, acyl-CoA synthase, decarboxylase and hydrolase. Since effective implementation of the algorithm requires the two databases to be of similar size, we selected sequences from Methanococcus jannaschii, E.coli and Saccharomyces cerevisiae as representatives of the three primary kingdoms of life: Archaea, Eubacteria and Eukarya. This selection offers a diverse set for obtaining a broad range of limits for the detection of non-adhesins. In the subsequent step, hypothetical, transport, unknown and membrane protein sequences were removed.
Eliminating redundant entries
We used CLUSTALW (Thompson et al., 1994) to analyze sequence similarities in pairwise comparisons. Only one sequence entry was retained among pairs that had a CLUSTALW score of 100. Partial sequence entries were also removed. The final dataset contained 469 adhesins and 703 non-adhesins (282 from E.coli, 162 from M.jannaschii and 259 from S.cerevisiae).
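The redundancy filter can be pictured with the following Python sketch. It assumes that pairwise CLUSTALW scores have already been computed and stored in a dictionary keyed by unordered pairs of identifiers; the data structure, the greedy keep-first strategy and the function name are all illustrative assumptions, not the procedure actually used.

```python
def remove_redundant(ids, pair_scores, cutoff=100):
    """Greedily keep an entry unless it scores >= cutoff against one already kept.

    pair_scores maps frozenset({id_a, id_b}) to a pairwise similarity score;
    missing pairs are treated as score 0.
    """
    kept = []
    for seq_id in ids:
        is_duplicate = any(
            pair_scores.get(frozenset((seq_id, other)), 0) >= cutoff
            for other in kept
        )
        if not is_duplicate:
            kept.append(seq_id)
    return kept

# Hypothetical scores: "a" and "b" are identical, "c" is distinct from both.
scores = {
    frozenset(("a", "b")): 100,
    frozenset(("a", "c")): 40,
    frozenset(("b", "c")): 35,
}
print(remove_redundant(["a", "b", "c"], scores))
```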
Neural network
The feed forward error back propagation neural network algorithm was used. The program was downloaded from the website http://www.cs.colostate.edu/~anderson and was a gift from Charles W. Anderson (Department of Computer Science, Colorado State University, Fort Collins, CO, e-mail: anderson{at}cs.colostate.edu).
| ALGORITHM |
|---|
Neural network architecture
The neural network used here has a multilayer feed forward topology. It consists of an input layer, a hidden layer and an output layer. This is a fully-connected neural network where each neuron i is connected to each neuron j of the next layer (Fig. 1). The weight connections are denoted by w ij . The state I i of each neuron in the input layer is assigned directly from the input data, whereas the states of hidden layer neurons are computed from the states of input layer neurons using the sigmoid function,
$$h_j = \frac{1}{1 + \exp\left[-\left(w_{j0} + \sum_i w_{ij} I_i\right)\right]}$$
where w j0 is the bias weight. The back propagation algorithm was used to minimize the differences between the computed output and the target value. The target value for adhesins was set as 1 and for non-adhesins it was set as 0.
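To make the architecture concrete, the Python sketch below runs a forward pass through a fully connected network with one hidden layer. The weight layout (bias weight first in each row) and the use of a sigmoid on the single output neuron are assumptions made for illustration; the text specifies the sigmoid only for the hidden layer, and this is not the program by C. W. Anderson that was actually used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(inputs, hidden_weights, output_weights):
    """Forward pass through one hidden layer.

    hidden_weights[j] = [w_j0, w_j1, ..., w_jn], with the bias weight w_j0 first.
    output_weights    = [v_0, v_1, ..., v_h], with the bias weight v_0 first.
    """
    hidden = [
        sigmoid(w[0] + sum(w_i * x_i for w_i, x_i in zip(w[1:], inputs)))
        for w in hidden_weights
    ]
    # Assumption: the output neuron is also a sigmoid unit, so the result lies in (0, 1).
    return sigmoid(output_weights[0] + sum(v * h for v, h in zip(output_weights[1:], hidden)))

# Two inputs, two hidden neurons, arbitrary illustrative weights.
p = forward([0.2, 0.7], [[0.1, 0.5, -0.3], [-0.2, 0.8, 0.4]], [0.05, 1.2, -0.7])
print(p)
```

During training, back propagation would adjust these weights so that the output approaches 1 for adhesins and 0 for non-adhesins.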
In the initial optimization experiments, a training set and a validate set were used. The training set had 367 adhesins and 580 non-adhesins. The validate set had 102 adhesins and 123 non-adhesins. A total of 10 000 cycles (epochs) of training iterations were performed. Subsequently, the best epoch with minimum error on the validate set was identified and the corresponding weight matrix was used for the prediction.
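The selection of the best epoch can be illustrated with a short Python sketch. It assumes that the error on the validate set has been recorded after every training epoch; the function name and the list-based bookkeeping are hypothetical.

```python
def select_best_epoch(validation_errors):
    """Return (epoch_index, error) for the lowest validation error.

    Epochs are numbered from 0; ties are resolved in favour of the earliest epoch.
    """
    if not validation_errors:
        raise ValueError("no validation errors recorded")
    best = min(range(len(validation_errors)), key=lambda k: validation_errors[k])
    return best, validation_errors[best]

# Illustrative error curve: the error falls, then rises as the network overfits.
epoch, error = select_best_epoch([0.90, 0.40, 0.30, 0.35, 0.50])
print(epoch, error)
```

The weight matrix saved at the returned epoch would then be the one used for prediction.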
Five networks were prepared, one for each attribute (Fig. 1). The number of neurons in the input layer was equal to the number of input data points for each attribute. The optimal number of neurons in the hidden layer was determined through experimentation for minimizing the error at the best epoch for each network individually. An upper limit for the total number of weight connections was set to half of the total number of input vectors to avoid overfitting as suggested previously (Andrea and Kalayeh, 1991). The final number of neurons in the hidden layer for each module was Amino acid frequencies: 30, Multiplets frequencies: 28, Dipeptide frequencies: 28, Charge composition: 30 and Hydrophobic composition: 30. During predictions, the network is fed with new data from the sequences that were not part of the training set.
Probability of being an adhesin, the P ad value
Query proteins were processed modularly through the networks trained for each attribute. Thus, five probability outputs for each sequence were obtained. Final prediction was computed using the following expression, which is a weighted linear sum of the probabilities from five modules:
$$P_{ad} = \frac{\sum_{i} fc_i \, P_i}{\sum_{i} fc_i}, \qquad i \in \{A, C, D, H, M\}$$
where fc i is the fraction of correlation of i-th module of the trained neural network, where i = A (Amino acid frequencies), C (Charge composition), D (Dipeptide frequencies), H (Hydrophobic composition) or M (Multiplet frequencies). The fractions of correlation fc i represent the fractions of total entries that were predicted correctly (P i,adhesin > 0.5 and P i,non-adhesin < 0.5) by the trained network on the validate set (Charles W. Anderson, http://www.cs.colostate.edu/~anderson). fcA = 0.84, fcC = 0.71, fcD = 0.84, fcH = 0.79, fcM = 0.83.
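The combination of the five module outputs can be sketched in Python as below. The fc values are those given in the text; normalizing the weighted sum by the sum of the weights, so that the result stays between 0 and 1, is an assumption, and the module probabilities in the example are invented for illustration.

```python
# Fractions of correlation reported in the text for the five modules.
FC = {"A": 0.84, "C": 0.71, "D": 0.84, "H": 0.79, "M": 0.83}

def p_ad(module_probs, fc=FC):
    """Weighted combination of the five module outputs.

    Assumption: the weighted sum is divided by the sum of the weights.
    """
    weighted = sum(fc[key] * module_probs[key] for key in fc)
    return weighted / sum(fc.values())

# Hypothetical outputs of the five trained networks for one query protein.
example = {"A": 0.9, "C": 0.6, "D": 0.8, "H": 0.7, "M": 0.85}
score = p_ad(example)
print(round(score, 3))
```

A module that classified the validate set more accurately thus contributes more to the final score than a weaker one.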
Matthews correlation coefficient for assessing the performance of SPAAN
The Matthews correlation coefficient (Mcc) (Matthews, 1975) is defined as follows:
$$Mcc = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP stands for true positives, TN the true negatives, FP the false positives and FN the false negatives.
Here TPs are adhesins and TNs are non-adhesins. Adhesins with P ad value above a chosen threshold are TPs, whereas known non-adhesins with P ad value below the chosen threshold are TNs. The sensitivity, Sn, is given by (TP/(TP+FN)) and specificity, Sp, is given by (TP/(TP+FP)).
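The three performance measures can be computed as in the following Python sketch. The counts in the example are invented for illustration and are not results from the paper. Returning 0 when the Mcc denominator vanishes is a common convention adopted here as an assumption.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: undefined cases are reported as 0
    return (tp * tn - fp * fn) / denominator

def sensitivity(tp, fn):
    """Sn = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined in the text (elsewhere often called precision)."""
    return tp / (tp + fp)

# Hypothetical counts at some chosen P_ad threshold.
tp, tn, fp, fn = 8, 9, 1, 2
print(mcc(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tp, fp))
```

Scanning a range of thresholds and recomputing these values at each one gives the threshold at which the Mcc is highest.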
| SPECIFICATIONS |
|---|
Computer programs used to compute individual compositional attributes were written in C and executed on a PC with operating system Red Hat Linux version 7.3 or 8.0.
Sequence inputs
SPAAN accepts input sequence files in the FASTA format. Multiple sequences can be present in one file. Protein sequences with ambiguous amino acids and/or of length <50 amino acids were filtered out. Amino acids must use the single-letter code according to the IUPAC-IUB nomenclature system.
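The input handling described above can be sketched in Python as follows. The parser and the filter are illustrative and do not reproduce the SPAAN C programs; treating any letter outside the 20 standard amino acids as ambiguous is an assumption.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def is_usable(sequence, min_length=50):
    """Keep sequences of at least min_length residues with no ambiguous letters."""
    return len(sequence) >= min_length and set(sequence) <= STANDARD

lines = [
    ">p1", "A" * 30, "C" * 30,   # 60 residues, all standard: kept
    ">p2", "ACDEFG",             # too short: filtered out
    ">p3", "ACD" * 20 + "X",     # contains an ambiguous residue: filtered out
]
kept = [header for header, seq in parse_fasta(lines) if is_usable(seq)]
print(kept)
```

To read from a file, the open file object can be passed directly to `parse_fasta`, since it iterates over lines.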
| RESULTS AND DISCUSSION |
|---|
Sensitivity, specificity and correlation coefficient
In designing SPAAN, we developed a non-homology, compositional property based method to predict adhesins and adhesin-like proteins solely from the sequence data. To assess the performance of SPAAN, we prepared a test set of 37 well-annotated adhesins and 37 non-adhesins that were not part of the training set. The results are shown in Figure 2. SPAAN identified 89% of known adhesins with 100% specificity when examined at P ad ≥ 0.51. At P ad ≥ 0.51, the Mcc (Matthews, 1975) was highest (0.94). We observed that the combination of five modules provided the best results.
Assessment of the individual modules showed that each performed worse than the combination of modules. The performances of individual modules were as follows: Charge composition (P C = 0.55, Mcc = 0.658, Sn = 0.756 and Sp = 0.848), Dipeptide frequencies (P D = 0.54, Mcc = 0.84, Sn = 0.86 and Sp = 0.94), Hydrophobic composition (P H = 0.61, Mcc = 0.63, Sn = 0.54 and Sp = 0.9) and Multiplet frequencies (P M = 0.58, Mcc = 0.77, Sn = 0.81 and Sp = 0.9). Performance of the Amino acid frequencies module could not be assessed unambiguously because the Mcc was nearly flat over a broad range. These observations suggest that it is fruitful to include multiple modules for obtaining high-quality predictions and are consistent with the experience of Hobohm and Sander (1995).
SPAAN predicts experimentally characterized adhesins with high P ad value
Considering the small size of the test set, we examined the general applicability of SPAAN by analyzing several well-characterized adhesins from a wide range of pathogens causing a variety of diseases. The results on 194 adhesins with binding specificity to a wide range of host receptors are displayed in Table 1 (for further details see http://www.igib.res.in/data/seepath/spaan_data.html). Except for two FimH proteins of E.coli, pertactin of B.pertussis, protein F of Streptococcus pyogenes and PspC of Streptococcus pneumoniae, the remaining 189 adhesins had P ad ≥ 0.51, indicating an overall sensitivity of 97.4% (189/194). These results demonstrate the general applicability of SPAAN.
SPAAN is a non-homology method based on sequence properties
To examine the non-homology character of SPAAN, we prepared a dataset of 130 adhesins that did not have any protein pairs with a CLUSTALW score of 100. An equal number of non-adhesins was selected with the same criterion. A histogram plot of adhesins and non-adhesins in the various ranges of P ad values is displayed in Figure 3a. SPAAN segregates the adhesins and non-adhesins into two distinct cohesive groups. Most of the adhesins (96%) have P ad ≥ 0.51, whereas all the non-adhesins (100%) have P ad < 0.51.
We computed the pairwise sequence similarities using CLUSTALW (Thompson et al., 1994) and compared them with the differences between the P ad values in each pair (denoted by ΔP ad). The relationships for adhesins and non-adhesins are shown in Figures 3b and c, respectively. In both adhesins and non-adhesins, ΔP ad values are uniformly low and appear nearly independent of sequence relationship. Furthermore, among the protein pairs with score <20, 82% of adhesin pairs and 86% of non-adhesin pairs had ΔP ad < 0.2. These data reinforce the non-homology character of SPAAN.
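The pairwise comparison can be sketched in Python as follows. The sketch assumes that P ad values and pairwise CLUSTALW scores are already available as dictionaries; the identifiers, values and function name are hypothetical and serve only to show how the fraction of dissimilar pairs with a small difference in P ad could be obtained.

```python
def fraction_low_delta(p_ad, pair_scores, max_score=20, max_delta=0.2):
    """Among pairs with similarity score < max_score, return the fraction
    whose absolute P_ad difference is below max_delta (None if no such pairs)."""
    deltas = [
        abs(p_ad[a] - p_ad[b])
        for (a, b), score in pair_scores.items()
        if score < max_score
    ]
    if not deltas:
        return None
    return sum(d < max_delta for d in deltas) / len(deltas)

# Invented values for three proteins and their pairwise similarity scores.
p_ad_values = {"x": 0.90, "y": 0.80, "z": 0.55}
scores = {("x", "y"): 10, ("x", "z"): 15, ("y", "z"): 60}
print(fraction_low_delta(p_ad_values, scores))
```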
Application of SPAAN to whole genomes
The results of the genome scan for selected pathogens of humans and plants are displayed in Table 2 (for detailed data description, see online Supplementary Table at http://www.igib.res.in/data/seepath/spaan_data.html). We used a stringent criterion of P ad > 0.7 on the basis of the results shown in Figure 3a to reduce the detection of false positives. Subsequently, we restricted our analysis to a maximum of 50 top-scoring proteins. This serves as a good starting point to examine the performance of SPAAN and to identify top-scoring novel adhesins with high confidence. The experimentally characterized adhesins from a wide range of pathogens top the list in genome scans. Several of the predicted adhesins are supported by complementary evidence from Conserved Domain Database search (RPS-BLASTP), BLASTP and the beta helix predictor BETAWRAP (Marchler-Bauer et al., 2002; Altschul et al., 1990; Bradley et al., 2001). About 30–78% of these predicted adhesins also contain the beta helix motif. The beta helix motif was found to be associated with several adhesins, toxins, virulence factors and surface proteins (Bradley et al., 2001).
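The selection step of the genome scan amounts to thresholding and ranking, as in the Python sketch below. The threshold of 0.7 and the limit of 50 proteins come from the text; the protein names, scores and function name are invented for illustration.

```python
def top_candidates(scores, threshold=0.7, limit=50):
    """Return up to `limit` (name, P_ad) pairs with P_ad > threshold, best first."""
    hits = [(name, p) for name, p in scores.items() if p > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:limit]

# Hypothetical P_ad values for four proteins of one genome.
genome_scores = {"protA": 0.95, "protB": 0.71, "protC": 0.70, "protD": 0.30}
print(top_candidates(genome_scores))
```

Because the criterion is strict (P ad > 0.7), a protein scoring exactly 0.70 is not retained in this sketch.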
In addition, SPAAN guided the improved annotation of a number of adhesins by suggesting re-examination of these proteins using the commonly used software listed above. The well-known adhesins in these organisms also top the list of predictions using SPAAN (Table 2). Several proteins with high P ad values were identified using SPAAN for which either limited or no complementary evidence exists. We have classified these proteins as adhesin-like. Interestingly, several mycobacterial proteins, namely, 35 PE_PGRS proteins and 12 PPE proteins, were identified with high P ad value. SPAAN could identify these putative mycobacterial adhesins even though our training dataset was devoid of mycobacterial proteins. Indeed, experimental analysis has demonstrated that some of these proteins could mediate host-pathogen interactions (Brennan et al., 2001). These results demonstrate that SPAAN can overcome taxonomic limits and can serve as a general-purpose tool.
Although SPAAN was primarily trained on bacterial adhesins, we examined its ability to predict putative adhesins from eukaryotic systems. The criterion was relaxed by using the base threshold value of P ad ≥ 0.51 to scan the genome of SARS-associated human coronavirus. The spike glycoprotein, nsp2, nsp5, nsp6 and nsp7 were identified as adhesins. Spike glycoprotein has been implicated in viral entry and pathogenesis (Gallagher and Buchmeier, 2001). The role of nsp proteins in viral pathogenesis is not clear. Since SARS is an important public health problem, these results could rapidly aid experiments that characterize host-pathogen interactions.
A few false positives do appear in the list. A judicious approach for experimental characterization could be developed by considering the total number of proteins to be analyzed and prioritizing proteins with other complementary evidence, while keeping the number of false positives as low as possible. In summary, SPAAN could serve as a useful guide for experimental characterization of proteins for adhesin-like properties.
| Acknowledgments |
|---|
We thank Prof. G. Padmanaban, Prof. Samir K. Brahmachari, Dr R. Sonti and A. Maharana for their useful discussions. S.R. thanks Council of Scientific and Industrial Research for a grant under the New Millennium Indian Technology Leadership Initiative (NMITLI) program.
| Footnotes |
|---|
The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.
Received on February 24, 2004; revised on August 20, 2004; accepted on August 31, 2004
| REFERENCES |
|---|
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
Andrea, T.A. and Kalayeh, H. (1991) Applications of neural networks in quantitative structure–activity relationships of dihydrofolate reductase inhibitors. J. Med. Chem., 34, 2824–2836.
Barenkamp, S.J. and St Geme, J.W., III. (1996) Identification of a second family of high-molecular-weight adhesion proteins expressed by non-typable Haemophilus influenzae. Mol. Microbiol., 19, 1215–1223.
Bock, K., Breimer, M.E., Brignole, A., Hansson, G.C., Karlsson, K.A., Larson, G., Leffler, H., Samuelsson, B.E., Stromberg, N., Eden, C.S., et al. (1985) Specificity of binding of a strain of uropathogenic Escherichia coli to Gal alpha 1–4Gal-containing glycosphingolipids. J. Biol. Chem., 260, 8545–8551.
Bradley, P., Cowen, L., Menke, M., King, J., Berger, B. (2001) BETAWRAP: successful prediction of parallel beta-helices from primary sequence reveals an association with many microbial pathogens. Proc. Natl Acad. Sci. USA, 98, 14819–14824.
Brendel, V., Bucher, P., Nourbakhsh, I.R., Blaisdell, B.E., Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl Acad. Sci. USA, 89, 2002–2006.
Brennan, M.J. and Shahin, R.D. (1996) Pertussis antigens that abrogate bacterial adherence and elicit immunity. Am. J. Respir. Crit. Care Med., 154, S145–S149.
Brennan, M.J., Delogu, G., Chen, Y., Bardarov, S., Kriakov, J., Alavi, M., Jacobs, W.R. (2001) Evidence that Mycobacterial PE_PGRS proteins are cell surface constituents that influence interactions with other cells. Infect. Immun., 69, 7326–7333.
Cachia, P.J. and Hodges, R.S. (2003) Synthetic peptide vaccine and antibody therapeutic development: prevention and treatment of Pseudomonas aeruginosa. Biopolymers, 71, 141–168.
Finlay, B.B. and Falkow, S. (1997) Common themes in microbial pathogenicity revisited. Microbiol. Mol. Biol. Rev., 61, 136–169.
Gallagher, T.M. and Buchmeier, M.J. (2001) Coronavirus spike proteins in viral entry and pathogenesis. Virology, 279, 371–374.
Gerlach, G.F., Clegg, S., Allen, B.L. (1989) Identification and characterization of the genes encoding the type 3 and type 1 fimbrial adhesins of Klebsiella pneumoniae. J. Bacteriol., 171, 1262–1270.
Hahn, E., Wild, P., Hermanns, U., Sebbel, P., Glockshuber, R., Haner, M., Taschner, N., Burkhard, P., Aebi, U., Muller, S.A. (2002) Exploring the 3D molecular architecture of Escherichia coli type 1 pili. J. Mol. Biol., 323, 845–857.
Halperin, S.A., Scheifele, D., Mills, E., Guasparini, R., Humphreys, G., Barreto, L., Smith, B. (2003) Nature, evolution, and appraisal of adverse events and antibody response associated with the fifth consecutive dose of a five-component acellular pertussis-based combination vaccine. Vaccine, 21, 2298–2306.
Hobohm, U. and Sander, C. (1995) A sequence property approach to searching protein databases. J. Mol. Biol., 251, 390–399.
Johnson, J.R., Russo, T.A., Scheutz, F., Brown, J.J., Zhang, L., Palin, K., Rode, C., Bloch, C., Marrs, C.F., Foxman, B. (1997) Discovery of disseminated J96-like strains of uropathogenic Escherichia coli O4:H5 containing genes for both PapG (J96) (class I) and PrsG (J96) (class III) Gal(alpha1–4)Gal-binding adhesins. J. Infect. Dis., 175, 983–988.
Langermann, S., Mollby, R., Burlein, J.E., Palaszynski, S.R., Auguste, C.G., DeFusco, A., Strouse, R., Schenerman, M.A., Hultgren, S.J., Pinkner, J.S., et al. (2000) Vaccination with FimH adhesin protects cynomolgus monkeys from colonization and infection by uropathogenic Escherichia coli. J. Infect. Dis., 181, 774–778.
Marchler-Bauer, A., Panchenko, A.R., Shoemaker, B.A., Thiessen, P.A., Geer, L.Y., Bryant, S.H. (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res., 30, 281–283.
Matthews, B.W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405, 442–451.
Moch, T., Hoschutzky, H., Hacker, J., Kroncke, K.D., Jann, K. (1987) Isolation and characterization of the alpha-sialyl-beta-2,3-galactosyl-specific adhesin. Proc. Natl Acad. Sci. USA, 84, 3462–3466.
Nakashima, H. and Nishikawa, K. (1994) Discrimination of intracellular and extracellular proteins using amino acid composition and residue pair frequencies. J. Mol. Biol., 238, 54–61.
Nandi, T., Dash, D., Ghai, R., B-Rao, C., Kannan, K., Brahmachari, S.K., Ramakrishnan, C., Ramachandran, S. (2003) A novel complexity measure for comparative analysis of protein sequences from complete genomes. J. Biomol. Struct. Dyn., 20, 657–668.
Prinz, C., Hafsi, N., Voland, P. (2003) Helicobacter pylori virulence factors and the host immune response: implications for therapeutic vaccination. Trends Microbiol., 11, 134–138.
Rapola, S., Jäntti, V., Eerola, M., Mäkelä, P.H., Käyhty, H., Kilpi, T. (2003) Anti-PsaA and the risk of pneumococcal AOM and carriage. Vaccine, 21, 3608–3613.
Rosenshine, I., Ruschkowski, S., Stein, M., Reinscheid, D.J., Mills, S.D., Finlay, B.B. (1996) A pathogenic bacterium triggers epithelial signals to form a functional bacterial receptor that mediates actin pseudopod formation. EMBO J., 15, 2613–2624.
Rost, B. and Sander, C. (1993) Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl Acad. Sci. USA, 90, 7558–7562.
Schneider, G. (1999) How many potentially secreted proteins are contained in a bacterial genome? Gene, 237, 113–121.
Schulze-Koops, H., Burkhardt, H., Heesemann, J., Kirsch, T., Swoboda, B., Bull, C., Goodman, S., Emmrich, F. (1993) Outer membrane protein YadA of enteropathogenic yersiniae mediates specific binding to cellular but not plasma fibronectin. Infect. Immun., 61, 2513–2519.
Strom, M.S. and Lory, S. (1993) Structure–function and biogenesis of the type IV pili. Annu. Rev. Microbiol., 47, 565–596.
St Geme, J.W., III. (1996) Progress towards a vaccine for nontypable Haemophilus influenzae. Ann. Med., 28, 31–37.
Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
Wizemann, T.M., Adamou, J.E., Langermann, S. (1999) Adhesins as targets for vaccine development. Emerg. Infect. Dis., 5, 395–403.
Zuegge, J., Ralph, S., Schmuker, M., McFadden, G.I., Schneider, G. (2001) Deciphering apicoplast targeting signals – feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene, 280, 19–26.