Bioinformatics Advance Access originally published online on May 3, 2006
Bioinformatics 2006 22(14):1717-1722; doi:10.1093/bioinformatics/btl170
Ensemble classifier for protein fold pattern recognition
1 Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University Shanghai 200030, China
2 Gordon Life Science Institute San Diego, CA 92130, USA
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
Motivation: Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, with each trained in different parameter systems, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through a weighted voting to give a final determination for classifying a query protein. The recognition was to find the true fold among the 27 possible patterns.
Results: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6–21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics.
Availability: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
Contact: lifesci-sjtu{at}san.rr.com
Supplementary information: Supplementary data are available on Bioinformatics online.
| INTRODUCTION |
|---|
The avalanche of protein sequences generated in the post-genomic era has challenged us to develop computational methods by which structural information can be extracted from sequence databases in a timely manner. Although the direct prediction of the three-dimensional (3D) structure of a protein from its sequence based on the least free energy principle is scientifically quite sound, and some encouraging results have already been obtained in elucidating the handedness problems and packing arrangements in proteins (see e.g. Chou and Carlacci, 1991; Chou et al., 1982, 1984, 1990), it is very difficult to predict the overall fold owing to the notorious local minimum problem. Also, although the homology modeling approach is quite successful in predicting the 3D structure of a protein (Chou, 2004; Holm and Sander, 1999), a hurdle exists when the query protein has no homologous protein of known structure in the existing databases. Facing this kind of situation, can we find a different approach to predict the fold of a protein? In this paper, we resort to the taxonomic approach, which rests on the assumption that the number of protein folds is limited (Chou and Zhang, 1995; Dubchak et al., 1999; Finkelstein and Ptitsyn, 1987; Murzin et al., 1995). Accordingly, predicting the 3D structure of a protein may first be converted to a problem of classification, i.e. identifying which fold pattern it belongs to. The present study was initiated in an attempt to introduce a novel approach, the ensemble classifier, to recognize the fold pattern of a query protein.
| MATERIALS AND METHODS |
|---|
The working (training and testing) datasets studied here were taken from Ding and Dubchak (2001). The original training and testing datasets contain 313 and 385 proteins, respectively. Of these proteins, however, two (2SCMC and 2GPS) in the training dataset and two (2YHX_1 and 2YHX_2) in the testing dataset do not have sequence records. These four proteins were excluded from further consideration because they lack sequence information. Accordingly, we have 311 proteins in the training dataset and 383 proteins in the testing dataset. The names of the training and testing proteins and their sequences are given in Online Supplementary Materials AI and AII, respectively. None of the proteins in the testing dataset has >35% sequence identity to those in the training dataset (Ding and Dubchak, 2001).

According to the SCOP database (Andreeva et al., 2004; Murzin et al., 1995), the proteins in the training and testing datasets (Online Supplementary Materials A) were further classified into the following 27 fold types (Ding and Dubchak, 2001; Dubchak et al., 1995, 1999): (1) globin-like, (2) cytochrome c, (3) DNA-binding 3-helical bundle, (4) 4-helical up-and-down bundle, (5) 4-helical cytokines, (6) EF-hand, (7) immunoglobulin-like, (8) cupredoxins, (9) viral coat and capsid proteins, (10) conA-like lectin/glucanases, (11) SH3-like barrel, (12) OB-fold, (13) beta-trefoil, (14) trypsin-like serine proteases, (15) lipocalins, (16) TIM-barrel, (17) FAD (also NAD)-binding motif, (18) flavodoxin-like, (19) NAD(P)-binding Rossmann-fold, (20) P-loop, (21) thioredoxin-like, (22) ribonuclease H-like motif, (23) hydrolases, (24) periplasmic binding protein-like, (25) β-grasp, (26) ferredoxin-like and (27) small inhibitors, toxins, lectins. Of the above 27 fold types, types 1–6 belong to the all-α structural class, types 7–15 to the all-β class, types 16–24 to the α/β class and types 25–27 to the α+β class. Therefore, the classification of 27 folds is one level deeper than that of 4 structural classes (Cai, 2001; Chou and Zhang, 1995; Zhou, 1998; Zhou and Assa-Munt, 2001). Naturally, it is more challenging and difficult to conduct prediction among the 27 fold types than among the 4 structural classes (Chou, 1995; Chou and Maggiora, 1998).

To deal with the problem, Ding and Dubchak (2001) extracted the following six features from protein sequences: (1) amino acid composition, (2) predicted secondary structure, (3) hydrophobicity, (4) normalized van der Waals volume, (5) polarity and (6) polarizability. Of these six features, the amino acid composition contains 20 components, each representing the occurrence frequency of one of the 20 native amino acids in a given protein (Chou and Zhang, 1994; Zhou and Doctor, 2003). Each of the other five features contains 3 + 3 + 5 × 3 = 21 components, as detailed in Ding and Dubchak (2001) and Dubchak et al. (1999). Based on these multiple parameter sets and a majority voting rule trained on the proteins in the training dataset, an overall success rate of 56% was reported (Ding and Dubchak, 2001) in predicting the fold type of the proteins in the testing dataset.
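As a concrete illustration of the 20-component amino acid composition described above, the following minimal sketch computes the occurrence frequencies for a sequence given in one-letter code. The function name `amino_acid_composition`, the alphabetical ordering of the 20 residues and the example sequence are assumptions made for illustration; they are not taken from Ding and Dubchak (2001).

```python
from collections import Counter

# The 20 native amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the 20 occurrence frequencies of the standard residues."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Non-standard symbols are ignored, so the 20 frequencies sum to 1.
    return [counts[a] / total for a in AMINO_ACIDS]

# Hypothetical 10-residue sequence, used only to exercise the function.
composition = amino_acid_composition("MKVLAAGLLK")
```

The result is a 20D vector whose components sum to one, which is the form that the pseudo-amino acid composition introduced next extends.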
In the present study, in order not to ignore the sequence-order effects completely, the pseudo-amino acid composition (Chou, 2001) was used to replace the conventional amino acid composition (Chou and Zhang, 1993; Nakashima et al., 1986) used by Ding and Dubchak (2001). However, rather than using a combined correlation function (Chou, 2001), here the alternate correlation function between hydrophobicity and hydrophilicity (Chou, 2005; Chou and Cai, 2005) is adopted to reflect the sequence-order effects. For the reader's convenience, a brief introduction to the amphiphilic pseudo-amino acid composition (PseAA) is given below.
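Suppose a protein P with a sequence of L amino acid residues:

$$ \mathbf{P} = R_1 R_2 R_3 R_4 R_5 R_6 R_7 \cdots R_L \tag{1} $$

where $R_1$ represents the residue at chain position 1, $R_2$ the residue at position 2, and so forth. The sequence-order effect along the chain is approximately reflected by a set of correlation factors:

$$
\begin{aligned}
\tau_1 &= \frac{1}{L-1}\sum_{i=1}^{L-1} H^1_{i,i+1}, &
\tau_2 &= \frac{1}{L-1}\sum_{i=1}^{L-1} H^2_{i,i+1},\\
\tau_3 &= \frac{1}{L-2}\sum_{i=1}^{L-2} H^1_{i,i+2}, &
\tau_4 &= \frac{1}{L-2}\sum_{i=1}^{L-2} H^2_{i,i+2},\\
&\;\;\vdots & &\;\;\vdots\\
\tau_{2\lambda-1} &= \frac{1}{L-\lambda}\sum_{i=1}^{L-\lambda} H^1_{i,i+\lambda}, &
\tau_{2\lambda} &= \frac{1}{L-\lambda}\sum_{i=1}^{L-\lambda} H^2_{i,i+\lambda},
\qquad (\lambda < L)
\end{aligned}
\tag{2}
$$

where $H^1_{i,j}$ and $H^2_{i,j}$ are the hydrophobicity and hydrophilicity correlation functions given by

$$ H^1_{i,j} = h^1(R_i)\cdot h^1(R_j), \qquad H^2_{i,j} = h^2(R_i)\cdot h^2(R_j) \tag{3} $$

where $h^1(R_i)$ and $h^2(R_i)$ are the hydrophobicity and hydrophilicity values of residue $R_i$, respectively. In Equation (2), $\tau_1$ and $\tau_2$ are called the 1st-tier correlation factors that reflect the sequence-order correlation between all the most contiguous residues along a protein chain through hydrophobicity and hydrophilicity, respectively [Figure 1(a1), (a2)], $\tau_3$ and $\tau_4$ are the corresponding 2nd-tier correlation factors that reflect the sequence-order correlation between all the 2nd most contiguous residues [Figure 1(b1), (b2)], and so forth. Note that before the values of hydrophobicity and hydrophilicity were substituted into Equation (3), they were all subjected to a standard conversion as described by the following equation:

$$ h^1(R_i) = \frac{h^1_0(R_i) - \langle h^1_0 \rangle}{\mathrm{SD}(h^1_0)}, \qquad h^2(R_i) = \frac{h^2_0(R_i) - \langle h^2_0 \rangle}{\mathrm{SD}(h^2_0)} \tag{4} $$

where $h^1_0(R_i)$ and $h^2_0(R_i)$ represent the original hydrophobicity value (Tanford, 1962) and hydrophilicity value (Hopp and Woods, 1981) for amino acid $R_i$, respectively (Table 1); $\langle h^1_0 \rangle$ and $\langle h^2_0 \rangle$ their means over the 20 native amino acids; and $\mathrm{SD}(h^1_0)$ and $\mathrm{SD}(h^2_0)$ their standard deviations. The converted hydrophobicity and hydrophilicity values obtained by Equation (4) have a zero mean over the 20 native amino acids and remain unchanged if they go through the same conversion procedure again. As we can see from Equations (1–4) as well as Figure 1, a considerable amount of sequence-order information has been incorporated into the $2\lambda$ correlation factors through the hydrophobic and hydrophilic values of the amino acid residues along a protein chain. By fusing the $2\lambda$ amphiphilic correlation factors into the classical amino acid composition, we have the following augmented discrete form to represent a protein sample P:

$$ \mathbf{P} = \left[\, p_1, \ldots, p_{20}, p_{20+1}, \ldots, p_{20+\lambda}, p_{20+\lambda+1}, \ldots, p_{20+2\lambda} \,\right]^{\mathrm{T}} \tag{5} $$

where

$$
p_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{2\lambda} \tau_j}, & 1 \le u \le 20,\\[2ex]
\dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w \sum_{j=1}^{2\lambda} \tau_j}, & 20+1 \le u \le 20+2\lambda,
\end{cases}
\tag{6}
$$

where $f_i$ (i = 1, 2, ..., 20) are the normalized occurrence frequencies of the 20 native amino acids in the protein P, $\tau_j$ the sequence-correlation factor computed according to Equation (2) and $w$ the weight factor. In the current study, we chose $w = 0.5$ to keep the results of Equation (6) within a range that is easy to handle ($w$ can of course be assigned other values, but this would not have a large impact on the final results). Therefore, the first 20 numbers in Equation (5) represent the classic amino acid composition, and the next $2\lambda$ discrete numbers reflect the amphiphilic sequence correlation along a protein chain. Such a protein representation is called the amphiphilic pseudo-amino acid composition, which has the same form as the conventional amino acid composition but contains much more information. It is through the $2\lambda$ pseudo-amino acid components that the sequence order of a protein chain and the distribution of the hydrophobic and hydrophilic amino acids along the chain are indirectly and partially reflected. It should be pointed out that, according to the definition of the classical amino acid composition, all its components must be ≥ 0; this is not always true, however, for the pseudo-amino acid composition: the components corresponding to the sequence correlation factors may also be < 0.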
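The construction referred to as Equations (1)–(6) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that each correlation function is the product of the standardized property values of the two residues, that each tier is averaged over the L − tier residue pairs, and that the population standard deviation over the 20 amino acids is used in the conversion. The names `standardize` and `amphiphilic_pseaa` are hypothetical, and the two scales below are placeholder numbers, not the Table 1 values of Tanford (1962) or Hopp and Woods (1981).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def standardize(scale):
    """Standard conversion: zero mean and unit SD over the 20 amino acids."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / 20
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / 20)
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}

def amphiphilic_pseaa(sequence, hydrophobicity, hydrophilicity, lam, w=0.5):
    """Return the (20 + 2*lam)-D amphiphilic pseudo-amino acid composition."""
    length = len(sequence)
    if not 0 < lam < length:
        raise ValueError("lam must satisfy 0 < lam < len(sequence)")
    h1 = standardize(hydrophobicity)
    h2 = standardize(hydrophilicity)
    taus = []
    for tier in range(1, lam + 1):
        # Odd-numbered factors use hydrophobicity, even-numbered hydrophilicity.
        for h in (h1, h2):
            total = sum(h[sequence[i]] * h[sequence[i + tier]]
                        for i in range(length - tier))
            taus.append(total / (length - tier))
    freqs = [sequence.count(a) / length for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]

# Placeholder scales (NOT the published values), for demonstration only.
toy_hydrophobicity = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
toy_hydrophilicity = {a: float((7 * i) % 20) for i, a in enumerate(AMINO_ACIDS)}

vec = amphiphilic_pseaa("ACDEFGHIKLMNPQRSTVWYACDE",
                        toy_hydrophobicity, toy_hydrophilicity, lam=4)
```

With `lam=4` the vector has 28 components. By construction all components sum to one, while the last eight may be negative, as noted in the text.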
In this study, the OET-KNN (optimized evidence-theoretic k-nearest neighbors) algorithm is adopted as the operation engine of a classifier (Shen and Chou, 2005). For the reader's convenience, a brief introduction to the OET-KNN classifier and its key equations is given in Appendix A. However, quite different from the case of Shen and Chou (2005), we now have many different input types, such as the $(20+2\lambda)$D PseAA, 21D predicted secondary structure, 21D hydrophobicity, 21D normalized van der Waals volume, 21D polarity and 21D polarizability (Ding and Dubchak, 2001). Since a basic classifier is defined by one operation engine and one input type, one way to use the information from the multiple input types is to combine the above six input types into one and use a $[(21\times5)+(20+2\lambda)]$D vector to represent it. However, doing so would introduce too many parameters into the input, thereby reducing the cluster-tolerant capacity (Chou, 1999) and the cross-validation success rate. Furthermore, the PseAA with a different value of $\lambda$ becomes a different input type. In the present study, $\lambda$ was assigned the values 1, 4, 14 and 30. Therefore, we are actually facing 5 + 4 = 9 different input types (Table 2), and have 9 basic classifiers. To deal with this situation, we introduce an ensemble classifier, by which not only the other five features described in Ding and Dubchak (2001) but also the pseudo-amino acid compositions with a set of different $\lambda$ values can be automatically fused into one prediction system.
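The dimension bookkeeping in the paragraph above can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (20 composition components plus two correlation factors per tier, five 21D features, and the four tier counts 1, 4, 14 and 30); the variable and function names are illustrative.

```python
def pseaa_dimension(lam):
    """Dimension of the amphiphilic PseAA with lam correlation tiers."""
    return 20 + 2 * lam

LAMBDAS = (1, 4, 14, 30)

# One PseAA input type per tier count.
pseaa_dims = [pseaa_dimension(lam) for lam in LAMBDAS]

# Size of the single concatenated vector the text argues against using.
concatenated_dims = [21 * 5 + d for d in pseaa_dims]

# Five 21D features plus four PseAA variants give the nine input types.
n_input_types = 5 + len(LAMBDAS)
```

The four PseAA inputs therefore have 22, 28, 48 and 80 dimensions, and a single concatenated input would have between 127 and 185 components.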
The framework of the ensemble classifier system was established by combining several basic classifiers in order to reduce the variance caused by the peculiarities of a single training set, and hence to learn a more expressive concept in classification than a single classifier can. Figure 2 illustrates the basic framework for an ensemble classifier that consists of nine basic classifiers. The final output of the ensemble is the weighted fusion of the outputs produced by the nine individual classifiers, as formulated below.
Suppose the ensemble classifier $\mathbb{C}$ is expressed by

$$ \mathbb{C} = \mathbb{C}_1 \,\forall\, \mathbb{C}_2 \,\forall\, \cdots \,\forall\, \mathbb{C}_9 \tag{7} $$

where $\mathbb{C}_1$, $\mathbb{C}_2$, ..., $\mathbb{C}_9$ represent the nine basic OET-KNN classifiers (Appendix A), each operating on the input derived from one of the nine features listed in Table 2; i.e. classifier $\mathbb{C}_1$ operates on the 22D PseAA, $\mathbb{C}_2$ on the 28D PseAA, $\mathbb{C}_3$ on the 48D PseAA, $\mathbb{C}_4$ on the 80D PseAA, $\mathbb{C}_5$ on the 21D predicted secondary structure, $\mathbb{C}_6$ on the 21D hydrophobicity, $\mathbb{C}_7$ on the 21D normalized van der Waals volume, $\mathbb{C}_8$ on the 21D polarity, and $\mathbb{C}_9$ on the 21D polarizability. In Equation (7) the symbol $\forall$ denotes the fusing operator. For the reader's convenience, the values of the nine input parameter systems (cf. Table 2) for each of the proteins in the training and testing datasets are given in the Online Supplementary Materials BI and BII, respectively.
Thus, the process by which the ensemble classifier $\mathbb{C}$ fuses the nine basic classifiers $\mathbb{C}_i$ (i = 1, 2, ..., 9) can be formulated as follows. Suppose

$$ Y_j = \sum_{i=1}^{9} w_i \, B_i(\mathbf{P}, S_j), \qquad (j = 1, 2, \ldots, 27) \tag{8} $$

where $B_i(\mathbf{P}, S_j)$ is the belief function or supporting degree for P belonging to $S_j$ obtained by the ith basic classifier as defined by Equation (A5) in Appendix A; and $w_i$ is the weight factor, which was assigned in this study the value of the success rate obtained by the ith single basic classifier $\mathbb{C}_i$, as will be further discussed below.
Thus the query protein P is predicted to belong to the fold type for which its score of Equation (8) is the highest; i.e. suppose

$$ Y_\mu = \max\{Y_1, Y_2, \ldots, Y_{27}\} \tag{9} $$

then P is predicted to belong to fold type $S_\mu$.
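The weighted fusion and the final decision described for Equations (8) and (9) amount to a weighted sum of per-classifier belief values followed by an arg-max. The following minimal sketch assumes the belief values are already available as a list of rows, one row per basic classifier; the function name `fuse`, the three-classifier, three-fold toy input and the 0-based fold index are illustrative choices, and the belief values are made up.

```python
def fuse(beliefs, weights):
    """Weighted fusion of basic classifiers.

    beliefs[i][j] is the support that classifier i gives to fold j;
    weights[i] is the weight of classifier i (its success rate in the text).
    Returns the fused scores and the 0-based index of the winning fold.
    """
    n_folds = len(beliefs[0])
    scores = [sum(w * row[j] for w, row in zip(weights, beliefs))
              for j in range(n_folds)]
    predicted = max(range(n_folds), key=scores.__getitem__)
    return scores, predicted

# Toy input: three classifiers, three folds, invented belief values.
toy_beliefs = [[0.6, 0.3, 0.1],
               [0.2, 0.7, 0.1],
               [0.5, 0.4, 0.1]]
toy_weights = [0.40, 0.44, 0.29]

scores, predicted = fuse(toy_beliefs, toy_weights)
```

In this toy case two of the three classifiers individually prefer the first fold, but the second fold wins after fusion because the classifier that supports it is both more confident and more heavily weighted.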
| RESULTS AND DISCUSSION |
|---|
To demonstrate the power of the ensemble classifier, predictions were conducted on the same training and testing datasets used by previous investigators (Chung and Huang, 2003; Ding and Dubchak, 2001). None of the proteins in these datasets has >35% sequence identity to any other, and most of the proteins in the testing dataset have <25% sequence identity with those in the training dataset (Ding and Dubchak, 2001). The overall success rate of the ensemble classifier in recognizing the fold among the 27 fold types for the 383 proteins in the independent testing dataset is given in Table 3, where, to facilitate comparison, the success rates of the other approaches are also listed. As can be seen from Table 3, the ensemble classifier, which was formed by fusing nine basic classifiers, outperformed the other approaches.
It is instructive to note that if each of the nine basic classifiers $\mathbb{C}_1$, $\mathbb{C}_2$, $\mathbb{C}_3$, $\mathbb{C}_4$, $\mathbb{C}_5$, $\mathbb{C}_6$, $\mathbb{C}_7$, $\mathbb{C}_8$, $\mathbb{C}_9$ were used alone for the same prediction, the success rates would be 0.40, 0.44, 0.40, 0.29, 0.42, 0.37, 0.32, 0.29 and 0.24, respectively. All of them are significantly lower than the rate of 0.62 = 62% obtained by the ensemble classifier (Table 3), indicating that a strong classifier can be generated by fusing many weak classifiers. As mentioned above, these single-classifier rates were used as the weights $w_i$ (i = 1, 2, ..., 9) in Equation (8) to form the ensemble classifier.
| CONCLUSIONS |
|---|
An ensemble classifier is formed by a set of basic classifiers, whose individual outcomes are combined in some way, typically through weighted voting, to give a final determination in classifying a query sample. The current ensemble classifier consists of nine basic individual classifiers. Their operation engine was the OET-KNN algorithm, but each was trained on one of nine different parameter systems extracted from the training dataset; i.e. 22D PseAA, 28D PseAA, 48D PseAA, 80D PseAA, 21D predicted secondary structure, 21D hydrophobicity, 21D normalized van der Waals volume, 21D polarity and 21D polarizability.
It is instructive to note that although the operation engine adopted here for the basic classifiers is the OET-KNN algorithm, others, such as the covariant discriminant algorithm and the SVM algorithm, can also be used in place of OET-KNN to form different ensemble classifiers. Moreover, the constituent basic classifiers can be driven by completely different operation engines as well, and an ensemble classifier thus formed would have a mixture of operation engines. Similarly, we can also design an ensemble classifier by fusing both different input types and different operation engines. The present study shows that the ensemble classifier formed by fusing different input types, particularly different dimensions of pseudo-amino acid composition [cf. Equation (5)], is very promising for enhancing the success rate in recognizing the fold type of proteins.
| APPENDIX A |
|---|
The optimized evidence-theoretic k-nearest neighbors (OET-KNN) classifier
For the reader's convenience, a brief introduction to the OET-KNN classifier is given below. For further explanation, refer to Shen and Chou (2005). Let us consider a problem of classifying N entities into 27 classes (fold types), which can be formulated as

$$ \mathbb{S} = \{S_1, S_2, \ldots, S_{27}\} \tag{A1} $$

Suppose the training dataset is

$$ \mathbb{T} = \{(\mathbf{P}_1, \ell_1), (\mathbf{P}_2, \ell_2), \ldots, (\mathbf{P}_N, \ell_N)\} \tag{A2} $$

where the class labels $\ell_i$ (i = 1, 2, ..., N) take values in $\mathbb{S}$ of Equation (A1). According to the KNN (k-nearest neighbors) rule (Cover and Hart, 1967), an unclassified entity P is assigned to the class represented by a majority of its k nearest neighbors. Owing to its good performance and simplicity, the KNN rule, also called the voting KNN rule, is quite popular in the pattern recognition community. The ET-KNN (evidence-theoretic k-nearest neighbors) rule is a pattern classification method based on the Dempster–Shafer theory of belief functions (Denoeux, 1995). In the classification process, each neighbor of a pattern to be classified is considered as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset concerned. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using Dempster's rule of combination (Shafer, 1976). A decision is made by assigning a pattern to the class with the maximum credibility.
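For comparison with the evidence-theoretic variant, the plain voting KNN rule mentioned above can be sketched as follows. This is a generic illustration rather than code from the paper: the function names, the use of squared Euclidean distance and the toy 2D training points are assumptions, and ties are broken in favour of the label whose neighbor is encountered first (nearest first).

```python
from collections import Counter

def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def knn_vote(query, training, k):
    """Assign the query to the majority label among its k nearest neighbors.

    training is a list of (vector, label) pairs.
    """
    neighbors = sorted(training,
                       key=lambda item: squared_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated classes.
toy_training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
                ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]

label = knn_vote((0.2, 0.1), toy_training, k=3)
```

Every neighbor counts equally here regardless of how far away it is; the ET-KNN rule described next instead lets the strength of each neighbor's evidence decay with distance.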
Suppose P is a query protein to be classified, and $\Phi_k$ is the set of its k nearest neighbors in the training dataset $\mathbb{T}$ of Equation (A2). Thus, for any $\mathbf{P}_i \in \Phi_k$, the knowledge that $\mathbf{P}_i$ belongs to class $S_\mu$ can be considered as a piece of evidence that increases our belief that P also belongs to $S_\mu$. According to the basic belief assignment mapping theory (Shafer, 1976), this item of evidence can be formulated by

$$ m(S_\mu \mid \mathbf{P}_i) = \alpha_0 \exp\!\left[-\gamma_\mu^2 \, D^2(\mathbf{P}_i, \mathbf{P})\right], \qquad m(\mathbb{S} \mid \mathbf{P}_i) = 1 - m(S_\mu \mid \mathbf{P}_i) \tag{A3} $$

where $\alpha_0$ is a fixed parameter, $\gamma_\mu$ is a parameter associated with class $S_\mu$ and $D^2(\mathbf{P}_i, \mathbf{P})$ is the squared Euclidean distance between P and $\mathbf{P}_i$. The original ET-KNN rule did not address how to select these parameters optimally. Zouhal and Denoeux (1998) proposed a procedure that determines optimal or near-optimal parameter values from the data by minimizing an error function. It was observed that the OET-KNN rule obtained through such an optimization treatment leads to a substantial improvement in classification accuracy. The optimal value thus obtained for $\alpha_0$ of Equation (A3) is 0.95, and those for $\gamma_\mu$ are given in Table A1.
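The mass assignment of Equation (A3) can be illustrated with a short sketch. The exact functional form is an assumption here: the code uses the exponential-decay form common in the evidence-theoretic KNN literature, with the class parameter entering squared, and assigns the remaining mass to the whole set of classes. The function name `neighbor_mass` and the value used for the class parameter are hypothetical; only the value 0.95 for the fixed parameter comes from the text.

```python
import math

def neighbor_mass(sq_dist, gamma_mu, alpha0=0.95):
    """Return (mass on the neighbor's class, mass left on the whole frame).

    sq_dist  : squared Euclidean distance between query and neighbor
    gamma_mu : parameter of the neighbor's class (Table A1 in the paper;
               the value used below is a placeholder)
    alpha0   : fixed parameter, 0.95 in the text
    """
    m_class = alpha0 * math.exp(-(gamma_mu ** 2) * sq_dist)
    return m_class, 1.0 - m_class

# A coincident neighbor gives the strongest possible evidence, alpha0.
m_near, rest_near = neighbor_mass(0.0, gamma_mu=0.5)

# A distant neighbor contributes almost nothing.
m_far, rest_far = neighbor_mass(100.0, gamma_mu=0.5)
```

Because the fixed parameter is below one, even a neighbor identical to the query leaves some mass uncommitted, so a single neighbor can never make the classifier fully certain.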
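The belief function of P belonging to class $S_\mu$ is a combination of the evidence from its k nearest neighbors, and can be formulated as

$$ \mathrm{Bel}(S_\mu) = m(S_\mu \mid \mathbf{P}_1) \oplus m(S_\mu \mid \mathbf{P}_2) \oplus \cdots \oplus m(S_\mu \mid \mathbf{P}_k) \tag{A4} $$

where $\oplus$ is called the orthogonal sum, which is commutative and associative. According to Dempster's rule (Shafer, 1976), the belief function of Equation (A4) can be expressed as

$$ \mathrm{Bel}(S_\mu) = \frac{\displaystyle\sum_{A_1 \cap A_2 \cap \cdots \cap A_k = S_\mu} \; \prod_{i=1}^{k} m(A_i \mid \mathbf{P}_i)}{1 - \displaystyle\sum_{A_1 \cap A_2 \cap \cdots \cap A_k = \emptyset} \; \prod_{i=1}^{k} m(A_i \mid \mathbf{P}_i)}, \qquad A_i \subseteq \mathbb{S} \tag{A5} $$

where $A_i$ is a possible subset of $\mathbb{S}$, and $\subseteq$, $\cap$ and $\emptyset$ are the symbols in set theory representing contained in, intersection, and the empty set, respectively.
A decision is made by assigning the query protein P to the class for which the belief or credibility function of Equation (A5) has the maximum value; i.e. if

$$ \mathrm{Bel}(S_\mu) = \max\{\mathrm{Bel}(S_1), \mathrm{Bel}(S_2), \ldots, \mathrm{Bel}(S_{27})\} \tag{A6} $$

then $S_\mu$ is the class predicted for the query protein.

Conflict of Interest: none declared.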
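The combination and decision steps of Appendix A, Equations (A4)–(A6), can be sketched for the special structure that arises in ET-KNN, where each neighbor puts mass on a single class and leaves the rest on the whole set of classes. The sketch applies Dempster's rule one neighbor at a time, which is valid because the orthogonal sum is commutative and associative. The function names `combine_evidence` and `decide`, the class labels and the mass values are illustrative assumptions, not taken from the paper.

```python
def combine_evidence(evidence, classes):
    """Combine neighbor evidence with Dempster's rule.

    evidence is a list of (label, mass) pairs, one per neighbor, where mass
    is the belief mass placed on that neighbor's class (strictly below 1).
    Returns the combined mass on each single class and on the whole frame.
    """
    singleton = {c: 0.0 for c in classes}
    frame = 1.0  # start from the vacuous mass function
    for label, mass in evidence:
        # Conflict: the neighbor supports one class, earlier mass another.
        conflict = mass * sum(v for c, v in singleton.items() if c != label)
        updated = {c: v * (1.0 - mass) for c, v in singleton.items()}
        updated[label] = singleton[label] + frame * mass
        norm = 1.0 - conflict
        singleton = {c: v / norm for c, v in updated.items()}
        frame = frame * (1.0 - mass) / norm
    return singleton, frame

def decide(evidence, classes):
    """Pick the class with the largest combined belief."""
    singleton, _ = combine_evidence(evidence, classes)
    return max(singleton, key=singleton.get)

# Toy example: two neighbors support class "a", one supports class "b".
toy_evidence = [("a", 0.8), ("a", 0.5), ("b", 0.6)]
beliefs, uncommitted = combine_evidence(toy_evidence, ["a", "b"])
winner = decide(toy_evidence, ["a", "b"])
```

For single classes the belief equals the combined mass, so the arg-max over `beliefs` corresponds to the decision rule of Equation (A6). In the toy example the combined belief in "a" is 0.36/0.46, roughly 0.78.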
| FOOTNOTES |
|---|
Associate Editor: Keith A Crandall
Received on March 31, 2006; revised on April 26, 2006; accepted on April 27, 2006
| REFERENCES |
|---|
Andreeva, A., et al. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226–D229.
Cai, Y.D. (2001) Is it a paradox or misinterpretation. Proteins, 43, 336–338.
Chou, J.J. and Zhang, C.T. (1993) A joint prediction of the folding types of 1490 human proteins from their genetic codons. J. Theor. Biol., 161, 251–262.
Chou, K.C. (1995) A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins, 21, 319–344.
Chou, K.C. (1999) A key driving force in determination of protein structural classes. Biochem. Biophys. Res. Commun., 264, 216–224.
Chou, K.C. (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 43, 246–255 (Erratum: ibid., 2001, Vol. 44, 60).
Chou, K.C. (2004) Review: structural bioinformatics and its impact to biomedical science. Curr. Med. Chem., 11, 2105–2134.
Chou, K.C. (2005) Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics, 21, 10–19.
Chou, K.C. and Cai, Y.D. (2005) Prediction of membrane protein types by incorporating amphipathic effects. J. Chem. Inform. Modeling, 45, 407–413.
Chou, K.C. and Carlacci, L. (1991) Energetic approach to the folding of alpha/beta barrels. Proteins, 9, 280–295.
Chou, K.C. and Maggiora, G.M. (1998) Domain structural class prediction. Protein Eng., 11, 523–538.
Chou, K.C., et al. (1984) Energetic approach to packing of α-helices: 2. General treatment of nonequivalent and nonregular helices. J. Am. Chem. Soc., 106, 3161–3170.
Chou, K.C., et al. (1990) Review: energetics of interactions of regular structural elements in proteins. Accounts Chem. Res., 23, 134–141.
Chou, K.C., et al. (1982) Structure of beta-sheets: origin of the right-handed twist and of the increased stability of antiparallel over parallel sheets. J. Mol. Biol., 162, 89–112.
Chou, K.C. and Zhang, C.T. (1994) Predicting protein folding types by distance functions that make allowances for amino acid interactions. J. Biol. Chem., 269, 22014–22020.
Chou, K.C. and Zhang, C.T. (1995) Review: prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol., 30, 275–349.
Chou, K.C., et al. (1997) Disposition of amphiphilic helices in heteropolar environments. Proteins, 28, 99–108.
Chung, I.F. and Huang, C.D. (2003) Recognition of structure classification of protein folding by NN and SVM hierarchical learning architecture. In Kaynak, O., Alpaydin, E., Oja, E. and Xu, L. (eds), Lecture Notes in Computer Sciences, Vol. 2714. Springer, Istanbul, Turkey, pp. 1159–1167.
Cover, T.M. and Hart, P.E. (1967) Nearest neighbour pattern classification. IEEE Trans. Inform. Theory, IT-13, 21–27.
Denoeux, T. (1995) A k-nearest neighbor classification rule based on Dempster–Shafer theory. IEEE Trans. Syst. Man Cybern., 25, 804–813.
Ding, C.H. and Dubchak, I. (2001) Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17, 349–358.
Dubchak, I., et al. (1995) Prediction of protein folding class using global description of amino acid sequence. Proc. Natl Acad. Sci. USA, 92, 8700–8704.
Dubchak, I., et al. (1999) Recognition of a protein fold in the context of the structural classification of proteins (SCOP) classification. Proteins, 35, 401–407.
Finkelstein, A.V. and Ptitsyn, O.B. (1987) Why do globular proteins fit the limited set of folding patterns? Prog. Biophys. Mol. Biol., 50, 171–190.
Holm, L. and Sander, C. (1999) Protein folds and families: sequence and structure alignments. Nucleic Acids Res., 27, 244–247.
Hopp, T.P. and Woods, K.R. (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl Acad. Sci. USA, 78, 3824–3828.
Murzin, A.G., et al. (1995) SCOP: a structural classification of protein database for the investigation of sequence and structures. J. Mol. Biol., 247, 536–540.
Nakashima, H., et al. (1986) The folding type of a protein is relevant to the amino acid composition. J. Biochem., 99, 152–162.
Shafer, G. (1976) A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
Shen, H.B. and Chou, K.C. (2005) Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. Biochem. Biophys. Res. Commun., 334, 288–292.
Tanford, C. (1962) Contribution of hydrophobic interactions to the stability of the globular conformation of proteins. J. Am. Chem. Soc., 84, 4240–4274.
Zhou, G.P. (1998) An intriguing controversy over protein structural class prediction. Journal of Protein Chemistry, 17, 729–738.
Zhou, G.P. and Assa-Munt, N. (2001) Some insights into protein structural class prediction. Proteins, 44, 57–59.
Zhou, G.P. and Doctor, K. (2003) Subcellular location prediction of apoptosis proteins. Proteins, 50, 44–48.
Zouhal, L.M. and Denoeux, T. (1998) An evidence-theoretic K-NN rule with parameter optimization. IEEE Trans. Syst. Man Cybern., 28, 263–271.








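[Figure 2. Framework of the ensemble classifier $\mathbb{C}$, formed by fusing nine basic individual classifiers: $\mathbb{C}_1$, $\mathbb{C}_2$, ..., and $\mathbb{C}_9$. A colour version of this figure appears in the Supplementary data.]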











