Bioinformatics Advance Access originally published online on March 31, 2008
Bioinformatics 2008 24(10):1264-1270; doi:10.1093/bioinformatics/btn112
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection
Department of Computing Science, University of Glasgow, S. A. W. Building, G12 8QQ, UK
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, such as global characteristics like the amino-acid composition (C), predicted secondary structure (S), hydrophobicity (H), van der Waals volume (V), polarity (P), polarizability (Z), as well as attributes derived from local sequence alignment such as the Smith–Waterman scores. This raises the need for a classification method that is able to assess the contribution of these potentially heterogeneous object descriptors while utilizing such information to improve predictive performance. To that end, we offer a single multi-class kernel machine that informatively combines the available feature groups and, as is demonstrated in this article, is able to provide the state-of-the-art in performance accuracy on the fold recognition problem. Furthermore, the proposed approach provides some insight by assessing the significance of recently introduced protein features and string kernels. The proposed method is well-founded within a Bayesian hierarchical framework and a variational Bayes approximation is derived which allows for efficient CPU processing times.
Results: The best performance which we report on the SCOP PDB-40D benchmark data-set is a 70% accuracy by combining all the available feature groups from global protein characteristics but also including sequence-alignment features. We offer an 8% improvement on the best reported performance that combines multi-class k-nn classifiers while at the same time reducing computational costs and assessing the predictive power of the various available features. Furthermore, we examine the performance of our methodology on the SCOP 1.53 benchmark data-set that simulates remote homology detection and examine the combination of various state-of-the-art string kernels that have recently been proposed.
Contact: theo{at}dcs.gla.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
Much effort has been directed to the prediction of the 3D structures of proteins for which no experimental structures are available (Baker and Sali, 2001). Where there is sequence similarity to proteins of known structure, a comparative matching procedure is often adopted. However, where no such sequence similarity exists, the prediction problem is formidable, not least because the overall structure may be unlike that of any protein, the structure of which has been determined.
In this context, one approach, known as the taxonomic approach (Ding and Dubchak, 2001; Shen and Chou, 2006), has been to divide the problem of determining the overall 3D structure into that of determining its fold. The term fold is used to denote a particular arrangement of a specific number of secondary structure components (usually alpha-helices and beta-strands) that is the basis of the overall structure of several different proteins which may have little or no amino acid sequence similarity. The appearances of some of these arrangements have given rise to names like barrel, bundle, sandwich and propeller, although these tend to encompass several more specific folds e.g. the TIM beta/alpha barrel and the 5-bladed beta-propeller. Hence, protein fold prediction can be seen as a challenging multiclass recognition problem where proteins are classified into folds based on their characteristics and available measurements.
Past work on the problem of predicting protein folds has employed artificial neural networks (ANNs), support vector machines (SVMs), Bayesian networks, Hidden Markov Models and k-nn classifiers (Chou and Zhang, 1995; Dubchak et al., 1995; Jaakkola et al., 1999; Raval et al., 2002) with varying success. In Ding and Dubchak (2001) an extensive study on a publicly available data-set, consisting of 27 SCOP folds (Andreeva et al., 2004; Lo Conte et al., 2000), was conducted exploring the use of various multi-class adaptations of the well-known binary SVM classifier methodology. In that work, the best methodology for combining binary SVMs was identified for the particular problem giving an accuracy of 56%, and furthermore, via an extensive experimental procedure the most predictive protein characteristics were selected from the initial group considered. These were found to be the amino-acid composition (C), the secondary structure (S) and the hydrophobicity (H).
Recently, Shen and Chou (2006) proposed two modifications to the method of Ding and Dubchak (2001) that raised the best performance accuracy from 56 to 62.1%. Firstly, they proposed a somewhat ad hoc ensemble learning approach where multi-class k-nn classifiers individually trained on each feature space (such as C or S) were later combined and secondly, they proposed the use of four additional feature groups to replace the amino-acid composition. These were pseudo-amino acid compositions (PseAA) (Chou, 2005) designed to capture sequence-order effects by using a correlation function between hydrophobicity and hydrophilicity in different intervals of the protein sequence.
In the present work, we concentrate on the same benchmark dataset of Ding and Dubchak (2001) with the extra groups of features proposed by Shen and Chou (2006), and also including sequence-alignment1 features via a pairwise kernel (Liao and Noble, 2003), which essentially describes the sequence based similarity of the proteins. We offer a single multi-class kernel machine able to operate on all of these groups of features simultaneously and instructively combine them. This offers a new and efficient way of incorporating multiple feature characteristics of the proteins without an increase in the number of required classifiers. In addition, we assess the importance and predictive power of the PseAA compositions proposed by Shen and Chou (2006) together with all the other available characteristics and gain insight on the protein fold recognition problem.
Furthermore, we demonstrate the generality of our methodology in a practical setting by addressing the remote homology problem on the SCOP 1.53 data-set as previously studied and described by a large number of works, see Leslie et al. (2004); Liao and Noble (2003); Lingner and Meinicke (2006); Saigo et al. (2004) and references within, where a variety of string kernels in conjunction with a discriminative SVM methodology have been proposed. Following our approach we select four of these state-of-the-art string kernels and combine them into an overall composite kernel where the multinomial probit kernel machine operates.
Related methodologies on kernel machines and multiple kernel learning (MKL) include the work by Lanckriet et al. (2004a, b); Lewis et al. (2006a, b); Sonnenburg et al. (2006) and references within, where semidefinite programming (SDP) or semi-infinite linear programming techniques are employed in order to minimize a loss function with respect to the kernel combination. These approaches build upon the SVM methodology and formulate the kernel combination problem as a further optimization procedure. The SDP approach suffers from large requirements in memory and CPU time in the order O(S3N2), where S is the number of sources and N is the number of covariates. These methods also carry the inherent drawback of SVM methodologies, namely their problematic scaling for multiclass problems as they are based on binary classifiers by nature.
An explicit multi-class classifier within the Gaussian Process methodology was introduced recently by Girolami and Zhong (2007) which enables data integration by combining the covariance functions instead of kernels. Their methodology has the drawback of employing a first-order approximation for the inverse of the covariance functions. Finally, recent work by Melvin et al. (2007) in the context of protein classification employed adaptive codes to handle the multiclass prediction problem. However their methodology is based on learning a weighting of binary classifiers which, we argue, is not an efficient strategy especially for multiple feature space problems such as the one considered in this study.
| 2 APPROACH |
|---|
|
|
|---|
The approach adopted is based on the motivation to reduce the number of classifiers needed for such challenging multi-class recognition problems where multiple feature sets are available, while improving performance. Combining binary classifiers as in the work by Ding and Dubchak (2001) increases heavily the computational resources needed since, e.g. for the best performing all-vs-all method, we need to deploy S x C(C-1)/2=2106 classifiers, where S is the number of feature spaces or sources (only six in their work) and C the number of classes.
Furthermore, even when employing multi-class classifiers in an ensemble learning framework such as the one proposed by Shen and Chou (2006), we still need as many classifiers as there are available feature spaces. Considering the nature of the protein fold prediction problem, where the fold type of a protein can depend on a large number of protein characteristics and also noting that even in the taxonomic approach the number of fold types already approaches the 1000 boundary, it is straightforward to see the need for a methodological framework that can cope with a large number of classes and can incorporate as many as there are available feature spaces while assesing their informational content.
The proposed approach, as can be seen from Figure 1, is based on the ability to embed each object description via the kernel trick (Shawe-Taylor and Cristianini, 2004) into a kernel space (Hilbert space). This produces a similarity measure between proteins in every feature space and then, having a common measure, we can combine informatively these similarities onto a composite kernel space. Hence now, a single multi-class kernel machine can operate on that composite space effectively disregarding the number of feature spaces used. Inference by Bayes theorem on our hierarchical multiclass model enables us to learn the significance of each source/feature space and their predictive power by the corresponding kernel weights β, to learn the regressors and the kernel parameters without resorting to ad-hoc ensemble learning, combination of binary classifiers or parameter tuning.
|
| 3 MATERIALS AND METHODS |
|---|
|
|
|---|
3.1 Fold recognition
The original dataset from Ding and Dubchak (2001) (based on SCOP PDB-40D) consists of 313 proteins for training and 385 proteins for testing with < 35% sequence identity between any two proteins in the train and the test set. Furthermore, the extensions proposed by Shen and Chou (2006) exclude four proteins from the original dataset, namely proteins 2SCMC and 2GPS from the training set plus 2YHX_1 and 2YHX_2 from the test set, due to lack of sequence records.
The 27 SCOP fold types (Dubchak et al., 1995) together with the original feature spaces in Ding and Dubchak (2001), the four proposed by Shen and Chou (2006) which describe PseAA compositions estimated on different intervals of the protein sequence, and the two local alignment Smith–Waterman (SW) based feature spaces, with different scoring matrices, are described in Tables 1 and 2 in the Online Supplementary Materials (OSM hereafter).
|
|
3.2 Remote homology detection (RHD)
The SCOP 1.53 benchmark data-set2 as described in Liao and Noble (2003) is employed to simulate the RHD problem. It consists of 4 352 proteins belonging to one of 54 families and the positive training is performed on low-homologs while the positive testing on members of the same family. We consider four state-of-the-art string kernels, namely a local alignment (LA) kernel (Saigo et al., 2004), a mismatch (MM) kernel (Leslie et al., 2004), an oligomer kernel (Mono) (Lingner and Meinicke, 2004) and a pairwise (PW) kernel (Liao and Noble, 2003), taking the best performing case from each string kernel category as a separate informational source. We follow the above past works within the kernel machine paradigm by adding a class-dependent regularization parameter to the diagonal of the kernels to improve performance on this highly imbalanced problem.
3.3 Methodology
Consider now S feature spaces or sources of information; From each one we have input variables
as object descriptors such as strings or Ds – dimensional vectors for s = 1, ... , S and corresponding multinomial target variables tn
{1, ... , C} for n = 1, ... , N where N is the number of observations and C the number of classes. By applying the kernel trick on the individual feature spaces created by the S sources we can define the N x N composite kernel as
|
|
|
|
is an S x Ds matrix, describing the Ds – dimensional kernel parameters
s of all the base kernels Ks, which intuitively corresponds to the level of smoothing within each kernel. Now as we can see the mean composite kernel is a weighted summation of the base kernels, where each one describes a similarity measure between proteins based on specific features, with βs as the corresponding weight for each one.
Following the standard approach for the multinomial probit by Albert and Chib (1993), we introduce auxiliary variables Y
CxN and define the relationship between the auxiliary variable ycn and the target variable tn as
|
| (1) |
Cx N, where wcn is the weight with which data point n votes for class c, and assuming a standardized normal noise model as in Albert and Chib (1993) and Girolami and Rogers (2006) is given by
|
| (2) |
. Note that |
| (3) |
is taken with respect to the standardized normal distribution
The proposed graphical model as depicted in Figure 2 is completed by considering prior distributions on the model variables and, following a hierarchical approach, hyper-prior distributions on the parameters of the first. We place a product of zero mean Gaussian distributions on the regressors
with variance
cn (described by the variable Z in the graph) and a gamma distribution on each scale with hyper-hyper-parameters
,
, reflecting our lack of prior knowledge and taking advantage of the conjugacy of these distributions.
|
Furthermore, we place a gamma distribution with associated hyper-hyper-parameters
,
on each kernel parameter since
sd
+. In the case of the mean composite kernel, a Dirichlet distribution with parameters
is placed on the combinatorial weights in order to satisfy the constraints imposed on the possible values which are defined on a simplex. A further gamma distribution is placed on each
s with associated hyper-hyper-parameters µ,
. The hyper-hyper-parameters
= {
,
,
,
, µ,
} can be set by type-II maximum likelihood or set to uninformative values and the hyper and first level parameters
= {Y, W, β,
,
, Z} are sampled accordingly. It is now straightforward to see that a Gibbs sampler can be readily constructed and standard MCMC approaches (Andrieu, 2003) can be employed for Bayesian inference in our model. In this article though we offer a variational Bayes approximation in order to achieve efficient computational processing times without loss of predictive performance.
Hence, we bound the model evidence by using an ensemble of factored posteriors to approximate the joint parameter posterior distribution. The joint likelihood of the model is defined as p(t,
|X,
) = p(t|Y) p (Y|W, β,
) p (W|Z) p (Z|
,
) p (β|
) p (
|
,
) p (
|µ,
) and the factorable ensemble approximation of the required posterior is p(
|
, X, t)
Q(
) = Q(Y) Q (W) Q (β) Q (
)Q(Z) Q (
). We can bound the model evidence using Jensen's inequality
|
| (4) |
i)
exp (
Q(
-i) {log p(t,
|
)}), where Q(
–i) is the factorable ensemble with the ith component removed.
The resulting posterior distributions for the approximation are given below with full details of the derivations in OSM. First, the approximate posterior over the auxiliary variables is given by
|
| (5) |
|
| (6) |
|
| (7) |
is the standardized cumulative distribution function (CDF) and |
| (8) |
|
| (9) |
The approximate posterior for the variances Z is an updated product of inverse-gamma distributions and the posterior mean is given in the OSM or Denison et al. (2002). Finally, the approximate posteriors for the kernel parameters Q(
), the combinatorial weights Q(β) and the associated hyper-prior parameters Q(
) can be obtained by importance sampling (Andrieu, 2003) in a similar manner to Girolami and Rogers (2006) since no tractable analytical solution can be offered.
Having described the approximate posterior distributions of the parameters and hence obtained the posterior expectations we turn back to our original task of making class predictions t* for Ntest new proteins X* that are represented by S different information sources Xs* embedded into Hilbert spaces as base kernels K*s
s,βs and combined into a composite test kernel K*
,β. The predictive distribution for a single new protein x* is given by
which ends up, see OSM, as
|
| (10) |
, β for clarity.
In Algorithm 1 we summarize the VB approximation in a pseudo-algorithmic fashion.
| 4 RESULTS AND DISCUSSION |
|---|
|
|
|---|
Reported results are averaged over 20 (fold recognition) and 10 (RHD) randomly initialized trials in order to obtain statistical measures of accuracy and precision. We monitor convergence via the lower bound to the marginal likelihood and convergence is assumed when there is <0.01% increase of the lower bound progression or when a maximum of 100 (fold recognition) and 20 (RHD) iterations have been completed. Throughout this study we have employed second-order polynomial kernels for the global characteristics and inner product kernels for the local characteristics (SW) as they were found to provide a better embeding of the feature spaces. CPU times reported are for a 2 GHz Intel based PC with 2 Gb RAM running Matlab codes.
4.1 Fold recognition
First we examine the performance from individual feature spaces to gain an overall understanding of their predictive abilities. This however does not draw the complete picture as complementary information, may be shared across sources achieving low performances. In Table 1 we present the mean percentage accuracy with std. from our method (VBKC) together with the best ones reported by Ding and Dubchak (2001) on the original dataset.
Regarding the original features employed by Ding and Dubchak (2001), we are in agreement with their observations as the best performing feature space, seems to be the amino-acid composition (C). The
= 1 and
= 4 PseAA achieve the second best global individual performance and as the step
increases further, the individual performances decrease. Although according to Shen and Chou (2006), the PseAA composition has the same form as the conventional amino-acid composition, but contains much more information it seems at this stage that none of the PseAA is as predictive as the conventional amino-acid composition. Furthermore, the local characteristics (SW) surprisingly outperform every global one and SW1 achieves a higher accuracy than the best SVM-combinations proposed by Ding and Dubchak (2001). This is because although most of the proteins have < 35% sequence similarity, this seems to be an adequate similarity level to achieve a good accuracy.
In Table 2 we report the effect of sequentially adding the feature spaces in the order of Ding and Dubchak (2001), extending that to the addition of the PseAA compositions and finally adding the sequence similarity based features. We compare against the best performing SVM combination methodology as reported in Ding and Dubchak (2001) and the ensemble method of Shen and Chou (2006). As we can see in all the steps the proposed method outperforms the best reported accuracies and offers the current state-of-the-art in this data-set.
The best performances can be seen in Table 3 in comparison with the best ones reported in the cited past work. We achieve an improvement over both past methods while we employ a single multiclass kernel machine without resorting to ensemble learning techniques or combining multiple binary classifiers.
|
When we consider a weighted combination of the base kernels, with
|
As we can observe, the amino-acid composition and the secondary structure are judged as more important, followed by the PseAA
= 1. However, it is worth noting that by taking out the amino-acid composition we have only a small loss in performance as we have seen in Table 2. These two observations suggest that the original amino-acid (C) and the pseudo-ones (
i) carry redundant information. Furthermore, despite the individual accuracies of the SW features, they are not heavily weighted. This is because they depend solely on the sequence similarity between proteins and their quality of discriminative information is strongly related to which end of the 0–35% sequence similarity the two proteins will belong. In reality, for the real twilight-zone of low-homology proteins (much <35% similarity) such features have little effect by definition.
In Figure 4 the confusion matrix for a single run is depicted. The values on the matrix are normalized according to Rij=Pj/Ni, where Ni is the total number of proteins belonging in class i and Pj is the number of these Ni proteins that were predicted to belong to class j. For example when all of the proteins in class c were predicted correctly, then Rcc = 1 and Rcj = 0
j
{1, C}.
|
First, it is worth noting that there are two areas where consistent misclassification occurs. The first one is when proteins of class 10–13 (conA-like barrel, SH3-like barrel, OB, beta-trefoil) are classified as Class 7 (fold: immunoglobulin like), and the second one is when proteins of class 19–20 and 24 (Rossmann fold, P-loop, periplasmic binding protein-like) are classified as class 16 (fold: TIM-barrel). Noting that folds 7 and 16 are represented by the top two largest numbers in the training set (30 and 29 proteins, respectively), this seems to imply that these classes are over-represented in comparison with other folds (mean size of 10 proteins) and that features such as (pseudo- or not) amino-acid composition and secondary structure offer little discriminative power on the distinction problem in these two areas.
Furthermore, besides the proteins in the fifth class (fold: 4-helical cytokines) that are correctly classified as expected by previous observations by Ding and Dubchak (2001), now the first class (fold: globin-like) is also achieving a 100% accuracy together with three more classes (7, 16, 27) (folds: immunoglobulin-like, TIM-barrel, small inhibitors) above the 90% level.
4.2 Remote homology detection
As a generalization of the proposed methodology to other related problem-domains, we consider the simulated remote homology problem (RHD) as described in the works of Liao and Noble (2003); Lingner and Meinicke (2006); Leslie et al. (2004); Saigo et al. (2004). The results from the combination of the string kernels are depicted in Table 4 together with the best previously reported results within the SVM methodology. We achieve a state-of-the-art performance via the combination of the kernels and match the overall best performing SVM method outperforming other string kernels. In Figure 5 the number of families that achieve certain ROC scores is depicted in comparison with some of the best performing methods reported in the literature.
|
|
Furthermore, by employing the weighted combination, we infer the contribution of each string kernel and as it can be seen from Figure 6 the Monomer (Mono) and the LA kernel are weighted most heavily as expected from Table 4 and previously reported results.
|
| 5 CONCLUSION |
|---|
|
|
|---|
In this article we offer a single probabilistic multi-class multi-kernel machine that is able to operate simultaneously in multiple feature spaces via a kernel combination methodology. Furthermore, we illustrate the capabilities of our method in a well-benchmarked dataset by Ding and Dubchak (2001) in which recent studies (Shen and Chou, 2006) have improved predictive performance by the introduction of additional pseudo-amino acid composition feature spaces. We show that the additional feature spaces although overall improve performance by a factor of 1–2% in reality carry non-complementary information with the original amino-acid composition. The need for such information to tackle the two misclassification patterns can also be seen in the work of (Shahbaba and Neal, 2007) where even when the problem is treated as a hierarchical classification with parent classes, the performance is not improving beyond 61.4%.
Furthermore, our methodology offers a significant reduction in computational resources as it is based on a single classifier operating over a composite space which retains the dimensionality (N x N) of any of the individual contributing feature spaces (N x N). This, in contrast with the past work of employing thousands of binary classifiers or an ensemble of individually trained classifiers is a significant improvement. We provide, the state-of-the-art on the problem under consideration with a best performance of 70% accuracy without resorting to ad-hoc approaches but employing a solid Baysian formalism which enables us to infer the informative content of the feature spaces.
Finally, we extend our approach to the remote homology problem and demonstrate the generality of our approach in a practical setting by achieving a state-of-the-art performance via a combination of string kernels.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
The authors would like to acknowledge insightful discussions with Dr David Leader and Dr Rainer Breitling, and helpful suggestions by the anonymous reviewers. The first author would like to acknowledge the support received from the Advanced Technology & Research Group within the NCR Financial Solutions Group Ltd company and especially the help and support of Dr Gary Ross and Dr Chao He.
Funding: NCR Financial Solutions Group Ltd. Scholarship to T.D; EPSRC Advanced Research Fellowship (EP/E052029/1) to M.A.G.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Thomas Lengauer
1 Despite the apparent low homology dataset. ![]()
2 Available from http://www.ccls.columbia.edu/compbio/svm-pairwise ![]()
Received on November 13, 2007; revised on March 3, 2008; accepted on March 27, 2008
| REFERENCES |
|---|
|
|
|---|
Albert J, Chib S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc (1993) 88:669–679.[CrossRef][Web of Science]
Andreeva A, et al. Scop database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res (2004) 32:226–229.[CrossRef]
Andrieu C. An introduction to MCMC for machine learning. Mach. Learn (2003) 50:5–43.[CrossRef]
Baker D, Sali A. Protein structure prediction and structural genomics. Science (2001) 294:93–96.
Chou K. Using amphiphilic pseudo-amino acid composition to predict enzyme subfamily classes. Bioinformatics (2005) 21:10–19.
Chou K, Zhang C. Prediction of protein structural classes. Crit. Revi. Biochem. Mol. Biol (1995) 30:275–349.[CrossRef]
Denison D.GT, et al. Bayesian Methods for Nonlinear Classification and Regression (2002) UK: Wiley Series in Probability and Statistics, West Sussex.
Ding C, Dubchak I. Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics (2001) 17:349–358.
Dubchak I, et al. Prediction of protein folding class using global decsription of amino acid sequence. Proc. Natl Acad. Sci (1995) 92:8700–8704.
Girolami M, Rogers S. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Comput (2006) 18:1790–1817.[CrossRef][Web of Science]
Girolami M, Zhong M. Data integration for classification problems employing Gaussian process priors. In: Advances in Neural Information Processing Systems 19,—Schölkopf B, Platt J, Hoffman T, eds. (2007) Cambridge, MA: MIT Press. 465–472.
Jaakkola T, et al. Using the fisher kernel method to detect remote protein homologies. In: Proceedings of the Seventh International Conference on Inteligent Systems in Molecular Biology (1999) AAAI Press.
Lanckriet G.RG, et al. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res (2004a) 5:27–72.
Lanckriet G.RG, et al. A statistical framework for genomic data fusion. Bioinformatics (2004b) 20:2626–2635.
Leslie CS, et al. Mismatch string kernels for discriminative protein classification. Bioinformatics (2004) 20:467–476.
Lewis DP, et al. Nonstationary kernel combination. In: 23rd International Conference on Machine Learning (2006a) Pittsburgh, PA: ACM. 553–560.
Lewis DP, et al. Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure. Bioinformatics (2006b) 22:2753–2760.
Liao L, Noble WS. Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comput. Biol (2003) 6:857–868.
Lingner T, Meinicke P. Remote homology detection based on oligomer distances. Bioinformatics (2006) 22:2224–2231.
Lo Conte L, et al. Scop: a structural classification of proteins database. Nucleic Acids Res (2000) 28:2257–259.
Melvin I, et al. Multi-class protein classification using adaptive codes. J. Mach. Learn. Res (2007) 8:1557–1581.[Web of Science]
Raval A, et al. A bayesian network model for protein fold and remote homologue recognition. Bioinformatics (2002) 18:788–801.
Saigo H, et al. Protein homology detection using string alignment kernels. Bioinformatics (2004) 20:1682–1689.
Shahbaba B, Neal RM. Nonlinear models using dirichlet process mixtures. Technical Report 0707 (2007) Department of Statistics, University of Toronto.
Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis (2004) Cambridge, England, UK: Cambridge University Press.
Shen H.-B, Chou K.-C. Ensemble classifier for protein fold pattern recognition. Bioinformatics (2006) 22:1717–1722.
Sonnenburg S, et al. Large scale multiple kernel learning. J. Mach. Learn. Res (2006) 1:1–18.[CrossRef]
This article has been cited by other articles:
![]() |
W. Fu, P. Ray, and E. P. Xing DISCOVER: a feature-based discriminative method for motif search in complex genomes Bioinformatics, June 15, 2009; 25(12): i321 - i329. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||








