

Bioinformatics Advance Access originally published online on October 25, 2006
Bioinformatics 2006 22(24):3047-3053; doi:10.1093/bioinformatics/btl545

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Kalman filtering for disease-state estimation from microarray data

János Z. Kelemen *, Attila Kertész-Farkas 1, András Kocsor 1 and László G. Puskás

Laboratory of Functional Genomics, Biological Research Centre, Hungarian Academy of Sciences, Szeged Temesvári krt. 62, H-6726, Hungary
1 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Aradi vértanúk tere 1. H-6720 Szeged, Hungary

*To whom correspondence should be addressed.


    ABSTRACT
 TOP
 ABSTRACT
 1 INTRODUCTION
 2 METHODS AND DATASETS
 3 RESULTS AND DISCUSSION
 4 CONCLUSIONS
 REFERENCES
 

Motivation: In this paper, we propose using the Kalman filter (KF) as a pre-processing step in microarray-based molecular diagnosis. Incorporating the expression covariance between genes is important in such classification problems, since this represents the functional relationships that govern tissue state. Failing to fulfil such requirements may result in biologically implausible class prediction models. Here, we show that employing the KF to remove noise (while retaining meaningful covariance and thus being able to estimate the underlying biological state from microarray measurements) yields linearly separable data suitable for most classification algorithms.

Results: We demonstrate the utility and performance of the KF as a robust disease-state estimator on publicly available binary and multi-class microarray datasets in combination with the most widely used classification methods to date. Moreover, using popular graphical representation schemes we show that our filtered datasets also have an improved visualization capability.

Contact: kelli{at}nucleus.szbk.u-szeged.hu.

Supplementary information: www.inf.u-szeged.hu/~kfa/kalman06/


    1 INTRODUCTION
 
Global transcription profiling with DNA microarray technology has led to a deeper understanding of sophisticated cellular processes. Pathological alteration, as a complex biological process, is constantly being studied in this way in a quest to find key drug targets. The gene expression-based molecular classification of cancer subtypes has also been shown to have the potential for reliable diagnosis, either by complementing the traditional clinical, morphological and histo-pathological approaches or as an alternative procedure. However, large datasets comprising the simultaneous expression levels of thousands of genes monitored under diverse circumstances still constitute a great challenge for biologists as well as computational algorithm developers. In recent years the processing of high-throughput biological data has evolved into a highly interdisciplinary field and a large number of machine learning algorithms have been proposed to automate difficult tasks, such as medical diagnosis from gene expression profiles. The following briefly reviews the best known of these algorithms, as they have been employed in bioinformatics in general and in microarray data classification in particular.

The support vector machine (SVM) classifier (Vapnik, 1998) is one of the most popular supervised learning algorithms and has been used effectively in computational biology, including protein remote homology detection (Jaakkola et al., 1999), microarray gene expression analysis (Brown et al., 2000), the recognition of translation start sites (Zien et al., 2000), functional classification of promoter regions, the prediction of protein–protein interactions and peptide identification from mass spectrometry data (Noble, 2004).

The artificial neural networks (ANNs) approach was originally developed with the aim of modelling information processing and learning in the brain (Bishop, 1995; Hertz et al., 1991; Rumelhart et al., 1986). Within the bioinformatics area this supervised non-linear learner has been employed for instance in biological sequence analysis, the recognition of signal peptide cleavage sites, gene recognition (Baldi and Brunak, 2001), the prediction of protein functional domains (Murvai et al., 2001) and the classification of cancer subtypes (Khan et al., 2001).

The nearest-neighbour algorithm (Dudoit et al., 2002; Fix and Hodges, 1951) is a simple class prediction technique, which achieves high performance without a priori assumptions. This method has been used for protein classification (Liao and Noble, 2003) as well as cancer diagnosis (Ramaswamy et al., 2001).

The random forest (RF) technique is a recently proposed meta-classifier method, which is becoming ever more popular in areas of computational biology such as drug discovery (Remlinger, 2003) (http://www.samsi.info/200304/dmml/kickoffpresentations/remlinger.pdf) and tumour classification (Shi et al., 2005).

The procedure of molecular classification itself is based on the fact that gene expression profiles work as a surrogate for a biological state. Still, living cells are inherently dynamic; hence microarray measurements capture a large amount of expression variance. A large number of environmental error sources also corrupt the gene expression data, even though normalization procedures are meant to reduce such influences. These two types of variation alter the true gene expression states associated with the particular diseases in question. Under such circumstances the Kalman state estimator provides a reasonable framework for pre-processing the expression data by removing the noise and estimating the multivariable noise-free tumour-specific states. The Kalman filter (KF) (Kalman, 1960; Welch and Bishop, 1995; Grewal and Andrews, 2001) is a powerful mathematical tool that has been widely used in many fields of engineering, from systems and control theory to signal processing, owing to its robustness even when the normality assumption is violated. It has also been used in supervised learning as well as in a myriad of real-world applications. Its applications in the bioinformatics field, however, have been limited (Cui et al., 2005) and have not taken advantage of its full potential as a multivariate signal processor. Our aim here was to adapt the linear KF to handle microarray data in order to reduce the noise level. Using the supervised learning tools introduced earlier, we tested the performance of the filtered datasets in classification setups. We also investigated whether the visualization of such filtered datasets yields a more comprehensible view of separate classes.


    2 METHODS AND DATASETS
 
In this section we describe the customized KF procedure employed in this study. We also present the datasets and the algorithms employed for classification and visualization, and explain how their performances were evaluated.

2.1 The KF
The KF is based on the assumption of a continuous system that can be modelled as a normally distributed random process X, with mean $\hat{x}$ (the state) and variance P (the error covariance):

$$X \sim N(\hat{x}, P) \tag{1}$$

The KF furthermore assumes that the output of the system can be modelled as a random process Z that is a linear function of the state $\hat{x}$ plus an independent, normally distributed, zero-mean white noise process V,

$$Z = H\hat{x} + V \tag{2}$$
where $V \sim N(0, R)$ and $E\{XV\} = 0$. H represents the system output matrix.

For our study we model the microarray data flow using the following simplified discrete time state-space representation of Equations (1) and (2):

$$x_{k+1} = x_k + w_k, \qquad z_k = x_k + v_k \tag{3}$$

The first equation is a linear form of (1) containing the addition of an innovation process $W \sim N(0, Q)$. Vectors $w_k$ and $v_k$ may be interpreted as the modelling error (i.e. the deviation from a mean, stem-state towards the particular biological states in question) and measurement noise, respectively, the latter comprising the previously mentioned functional and experimental variances. Note that since the state transition matrix equals the unit matrix I, as does the output matrix H, they have been omitted for simplicity. Given the models of the white noise processes W and V (Q and R, respectively) and the array measurements $z_k$, the aim of the KF here is to estimate the state vectors $x_k$ containing noise-free gene expression data.

Considering the microarray profiling process as stationary (i.e. its statistical properties remain constant over time), the Kalman iterative estimation will converge to the steady-state KF, in which case the error covariance can be computed by solving the discrete algebraic Riccati equation:

$$P = P - P(P + R)^{-1}P + Q \tag{4}$$
Hence, the Kalman gain is given by:

$$K = P(P + R)^{-1} \tag{5}$$
The above equations are greatly simplified due to the omission of the state transition and output matrices for the same reason as noted previously. Finally, the estimated expression state vector is

$$\hat{x}_k = \hat{x}_k^- + K(z_k - \hat{x}_k^-) \tag{6}$$
where $\hat{x}_k^-$ is an estimate of $x_k$ based on the previous samples.
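As a small worked illustration of the steady-state filter, the following Python sketch treats a single gene (the scalar case). It assumes the standard discrete algebraic Riccati equation with unit state-transition and output matrices, for which the steady-state variance solves P² − QP − QR = 0. The function names and the numerical values are illustrative assumptions and are not taken from the study.

```python
import math


def steady_state_gain(q, r, iterations=200):
    """Iterate the scalar Riccati recursion P <- P - P*P/(P + R) + Q.

    With unit state-transition and output matrices the recursion converges
    to the steady-state error variance P; the gain is then K = P / (P + R).
    """
    p = q  # any positive starting value works for this recursion
    for _ in range(iterations):
        p = p - p * p / (p + r) + q
    return p, p / (p + r)


def closed_form_variance(q, r):
    """Positive root of P**2 - Q*P - Q*R = 0 (scalar steady state)."""
    return (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0


def kalman_update(prior, measurement, gain):
    """Move the prior estimate towards the measurement by the gain K."""
    return prior + gain * (measurement - prior)


# Illustrative values only: innovation variance Q = 1, noise variance R = 2.
p, k = steady_state_gain(q=1.0, r=2.0)
estimate = kalman_update(prior=10.0, measurement=14.0, gain=k)
```

With these values the recursion settles at P = 2 and K = 0.5, so the estimate lies halfway between the prior and the measurement. A larger R lowers the gain and keeps the estimate closer to the prior.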

2.1.1 Filter tuning
Given the training vector set, the prior estimate $\hat{x}_k^-$ can be chosen as the average of the class means, where for each class the means are computed from the member samples. We further use the training set to initialize and tune the two KF parameters, namely Q and R. To reduce the dimensionality of the problem, we performed singular value decomposition (Alter et al., 2000):

[Formula] (7)

The rows of Y are eigengenes and capture most of the variance of the original training dataset, while the columns correspond to the samples. The covariance matrix Q of the innovations can thus be obtained as the between-class covariance (i.e. the covariance of the class means with [Formula] subtracted) evaluated on the reduced dimensionality training set Y. The measurement noise model R is estimated as a weighted form of the within-class covariance of Y (i.e. the covariance of Y with the class means subtracted). To avoid over-fitting we tune these parameters by introducing some uncertainty variance such that Q = Q + qI and R = R + rI. Our test runs led us to conclude empirically that for single channel raw intensity array data (i.e. Affymetrix) q = Q11 and r = R11 give reasonably good performance. Here the index 11 refers to the first eigengene, usually considered as the offset of the microarray dataset, in which case it has a quite small variance. For expression log-ratio data (usually coming from dual channel cDNA chips) or very sparse expression matrices these parameters yield acceptable results when we choose:

[Formula] (8)

n being the number of training samples. With the tuned parameters we compute the low-dimensional Kalman gain $K_Y$ using Equations (4) and (5). Finally, from Equations (6) and (7), the filtered gene-expression state vector is given by:

[Formula] (9)
where $z_k$ now spans the entire dataset, including both training and test measurements.
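The tuning steps can be illustrated with a deliberately simplified, one-feature Python sketch. It is not the multivariate procedure of this section: there is no singular value decomposition, the within-class variance is pooled without the weighting mentioned above, and the uncertainty terms `q_extra` and `r_extra` are arbitrary placeholders rather than the tuned values of the study. All names and numbers are hypothetical.

```python
import math


def class_means(values_by_class):
    return {label: sum(v) / len(v) for label, v in values_by_class.items()}


def tune_filter(values_by_class, q_extra, r_extra):
    """Return (prior, Q, R) for one feature from labelled training values."""
    means = class_means(values_by_class)
    prior = sum(means.values()) / len(means)  # average of the class means
    # Between-class variance: spread of the class means around the prior.
    q = sum((m - prior) ** 2 for m in means.values()) / len(means)
    # Within-class variance: spread of the samples around their class mean.
    residuals = [x - means[label]
                 for label, v in values_by_class.items() for x in v]
    r = sum(e * e for e in residuals) / len(residuals)
    # Added uncertainty, mirroring Q = Q + qI and R = R + rI.
    return prior, q + q_extra, r + r_extra


def steady_state_gain(q, r):
    """Scalar steady-state gain K = P / (P + R), with P**2 = Q * (P + R)."""
    p = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return p / (p + r)


def kalman_filter(measurements, prior, gain):
    """Apply the steady-state update to every measurement."""
    return [prior + gain * (z - prior) for z in measurements]


train = {"A": [1.0, 3.0], "B": [5.0, 7.0]}  # made-up training values
prior, q, r = tune_filter(train, q_extra=0.5, r_extra=0.5)
gain = steady_state_gain(q, r)
# Filter training and test measurements alike (6.0 is an unseen sample).
filtered = kalman_filter([1.0, 3.0, 5.0, 7.0, 6.0], prior, gain)
```

In one dimension the filter can only shrink each value towards the prior. In the multivariate setting the gain is a matrix, so directions with a large between-class variance relative to the noise are retained while noisy directions are damped.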

2.2 Classifier algorithms
The SVM classifier is one of the most popular learners; it computes the hyperplane with the largest margin between two classes (Vapnik, 1998). In our experiments the SVMLight software (Joachims, 1999) implemented in Matlab was used with a linear kernel.

The ANN classifier is another popular learner, consisting of connected artificial neurons built in a multi-layer structure (Bishop, 1995). In our study a three-layer ANN was used and the number of sigmoid output neurons within the hidden layer was determined by testing. Empirically we found that the best results were obtained with 25 hidden neurons. The ANN was part of the WEKA software package (Witten and Frank, 1999).

The nearest-neighbour (1NN) classifier is a very simple and fast algorithm, which classifies a new sample by calculating the distance to the nearest training case. To measure the distance between samples, we chose the Euclidean metric. The method can be easily extended to k-neighbours (kNN), but our tests showed that increasing k did not significantly improve performance. The 1NN is typically outperformed by the previous two learners on the raw microarray data.
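The 1NN rule with the Euclidean metric is short enough to sketch directly. The following Python fragment is a minimal illustration with made-up two-feature samples; the function names and data are assumptions for the example and do not come from the software used in the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_1nn(train_samples, train_labels, sample):
    """Return the label of the training sample closest to `sample`."""
    distances = [euclidean(t, sample) for t in train_samples]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[nearest]


# Made-up expression vectors with two features per sample.
train_samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["AML", "AML", "ALL", "ALL"]
prediction = predict_1nn(train_samples, train_labels, [4.5, 4.0])
```

Because the prediction depends on a single neighbour, noisy features can easily change which training sample is nearest, which is one reason a noise-reducing pre-processing step may help this learner.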

The RF technique is a combination of decision trees, such that each tree is grown on a bootstrap sample of the training set. For each node the split is chosen from m << M variables (where M is the total number of features) selected at random from an independent, identical distribution out of the feature set (Breiman, 2001). In our experiments, 20 trees were used and m was set to log(D + 1), where D is the number of inputs. The software that we used for it was part of the WEKA (Witten and Frank, 1999) package.

The algorithm parameters not mentioned above were used with their default values within the above mentioned software packages.

2.3 Datasets
We tested the Kalman filtering-classification scheme on a number of publicly available datasets, which are summarized in Table 1.



 
Table 1 Features of datasets

 
The leukaemia (ALL-AML) dataset of Golub et al. (1999) is a popular dataset and is often used to test binary classification algorithms. Using the original sample annotation we partitioned this dataset into three leukaemia classes. Hence the dataset consisted of T lineage acute lymphoblastic leukaemia (T-ALL), B lineage acute lymphoblastic leukaemia (B-ALL) and acute myeloid leukaemia (AML) samples. In our study we included two other leukaemia datasets: the mixed lineage leukaemia (MLL) dataset (Armstrong et al., 2002) and the paediatric acute lymphoblastic leukaemia (Leukaemia) dataset (Yeoh et al., 2002). The former consists of acute lymphoblastic leukaemia (ALL) and AML samples along with ALLs carrying a chromosomal translocation involving the MLL gene. The latter is composed of B-ALL subtypes expressing BCR-ABL, E2A-PBX1 and TEL-AML1, respectively, a hyper-diploid karyotype, as well as MLL, T-ALL and a novel leukaemia subtype.

The ‘various tumour types’ (Tumours) dataset (Ramaswamy et al., 2001) is considered a difficult dataset and consists of 14 classes of tumours: breast, prostate, lung, colorectal, lymphoma, bladder, melanoma, uterus, leukaemia, renal, pancreas, ovary, mesothelioma and central nervous system tumours. The dataset (LC) of Gordon et al. (2002) contains microarray data that account for two distinct pathological alterations of the lung: malignant pleural mesothelioma and adenocarcinoma.

The small, round blue cell tumours (SRBCT) of childhood dataset (Khan et al., 2001) includes a training set of neuroblastoma, rhabdomyosarcoma, Burkitt lymphoma and the Ewing family of tumours samples and an independent test set that, besides the samples belonging to the training classes, also contains samples that should not be classified into any of these tumour types.

Van't Veer et al. (2002) provide a dataset (BC) consisting of samples coming from breast cancer patients that were clustered by the original authors into two classes according to the patient's response to adjuvant therapy: relapse and non-relapse.

2.4 Evaluation of classifier performance
For multi-class datasets the one-versus-rest technique was used, i.e. for every class an independent binary learner was built where the class member elements were treated as positive and the rest of the elements as negative. For each class-specific learner we evaluated the so-called class accuracy. A test sample was assigned to the class whose corresponding learner gave the highest score. The accuracy for the whole dataset is the ratio of the number of correctly classified samples to the total number of samples. We should mention that in the four-class SRBCT dataset there were 25 test samples, five of which were not members of any of the training classes. We expected each of the class-specific learners to reject these samples. The described procedure, however, will necessarily assign them to the closest classes, which results in an apparent decrease in performance. Owing to these five cases, the mean of the class accuracies is shown for the SRBCT dataset.

The performance evaluation was carried out, among other measures, via standard receiver operating characteristic (ROC) analysis, which is based on the ranking of the objects to be classified (Gribskov and Robinson, 1996). The analysis was performed by plotting sensitivity versus 1 – specificity at various threshold values, and the resulting curve was integrated to give an ‘area under the curve’ (AUC) value. We note that for a perfect ranking AUC = 1.0 while for a random ranking AUC = 0.5. In our experiments we calculated the AUC for each class. The ROC score for a dataset was computed as the mean of the class AUCs. Here, however, only the Accuracy and the ROC scores are listed in the tables. The complete analysis results can be found on the supplementary material website. The results include the following detailed learning performance measures for each class, dataset and method: F-measure, geometric mean (G-mean), break-even point, the rates of true positives, false positives, true negatives and false negatives, and the elapsed time for training and testing in seconds. Recall (equivalent to sensitivity) and Precision measures are also shown. In addition to the rank-based ROC analysis, we evaluated the Lift, which is a measure of how much better the model is than a random one (Piatesky-Shapiro and Steingold, 2000). We also calculated the area under the Lift curve, called Lift AUC. Note that for a random classifier the Lift AUC is 1.0; there is no upper bound, and a higher value represents better classifier performance. The median rate of false positives (RFP) score is the fraction of negative test samples that score as high as, or better than, the median-scoring positive sample. This score was used by Jaakkola et al. (1999) in evaluating the Fisher–SVM method.

The Recall/Precision curve is similar to the ROC curve, but in this case the x-axis is the Recall and the y-axis is the Precision. A higher Recall/Precision AUC corresponds to a better classifier performance.
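Three of the quantities described in this section (the one-versus-rest decision, the rank-based AUC and the median RFP score) can be sketched in a few lines of Python. This is an illustrative sketch only: the AUC is computed as the fraction of correctly ordered positive/negative pairs, which equals the area under the ROC curve, and the scores, labels and function names are made up for the example.

```python
import statistics


def one_versus_rest_predict(scores_by_class):
    """Assign the class whose binary learner gives the highest score."""
    return max(scores_by_class, key=scores_by_class.get)


def roc_auc(scores, labels):
    """Area under the ROC curve from a ranking (ties count one half).

    Equals the fraction of (positive, negative) pairs ranked correctly.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def median_rfp(scores, labels):
    """Fraction of negatives scoring at least as high as the median positive."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    median = statistics.median(positives)
    return sum(1 for n in negatives if n >= median) / len(negatives)


# Made-up learner outputs for one test sample and one binary problem.
predicted = one_versus_rest_predict({"T-ALL": -0.2, "B-ALL": 0.7, "AML": 0.1})
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
rfp = median_rfp(scores, labels)
```

For this toy ranking seven of the nine positive/negative pairs are ordered correctly, so the AUC is 7/9, and one of the three negatives scores at least as high as the median positive, giving a median RFP of 1/3.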

2.5 Feature selection and visualization
Recursive feature elimination (RFE) is a recently proposed feature selection method described in Guyon et al. (2002). The method seeks to choose the ‘best’ m features that lead to the largest margin of class separation using an SVM classifier. In this study, the RFE algorithm was used as part of the Spider package (Weston et al., 2006, http://www.kyb.tuebingen.mpg.de/bs/people/spider/main.html).

RFE was employed with a linear kernel SVM, included in the same software package. Using RFE, several numbers of features (i.e. genes) were selected and evaluated with each learning method on both original and filtered datasets. The names of the selected genes are available within the online Supplementary material. In order to visualize the original and the filtered datasets three graphical representation methods were used. The locally linear embedding (LLE) is a distance preserving non-linear mapping from the original high-dimensional space into a lower dimensional space. Using this method (Roweis and Saul, 2000) we mapped the datasets into a 2D space with the ‘number of neighbourhoods’ parameter being set to the number of samples. The datasets were also visualized using the RadViz (Brunsdon et al., 1998, http://www.agocg.ac.uk/reports/visual/casestud/brunsdon/brunsdon.pdf) algorithm where the features (i.e. the genes) are represented as anchors that are equally spaced around the unit circle and the samples are represented as points inside the unit circle. Their positions depend on the gene expression values: the higher the value for a gene, the more the anchor attracts the corresponding point. This method works best with relatively few (~3–20) features. Finally, the heat map with hierarchical clustering was also employed as a graphical representation of the expression data matrices where the values taken by a feature were represented as grayscale in a 2D map.
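The RadViz mapping described above can be sketched as follows. This minimal Python example assumes the common formulation in which each sample is placed at the weighted average of the anchor positions, the weights being its non-negative feature values; the exact normalization used in the study is not stated, so this is an assumption, as are the function names and values.

```python
import math


def radviz_anchors(n_features):
    """Place one anchor per feature, equally spaced on the unit circle."""
    return [(math.cos(2.0 * math.pi * j / n_features),
             math.sin(2.0 * math.pi * j / n_features))
            for j in range(n_features)]


def radviz_point(values, anchors):
    """Weighted average of the anchors, weights being the feature values.

    Values are assumed to be non-negative (e.g. scaled to [0, 1]) so that
    the point stays inside the unit circle.
    """
    total = sum(values)
    if total == 0.0:
        return (0.0, 0.0)  # no feature pulls: leave the sample at the centre
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)


anchors = radviz_anchors(3)                           # three selected genes
balanced = radviz_point([0.5, 0.5, 0.5], anchors)     # equal pull everywhere
pulled = radviz_point([1.0, 0.0, 0.0], anchors)       # only the first gene
```

A sample with equal values for all three genes lands at the centre of the circle, while a sample expressing only the first gene lands on that gene's anchor.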


    3 RESULTS AND DISCUSSION
 
We applied Kalman filtering to the previously described datasets and, for a comparative study, evaluated the SVM, ANN, 1NN and RF supervised learning methods on the full gene set. Table 2 summarizes the Accuracy and ROC scores we obtained. The KF clearly improves the classification results of the ANN, 1NN and RF. The SVM results were boosted in 64% of the overall scores.



 
Table 2 Comparison of the classification performance on the original and the Kalman filtered datasets

 
To assess the significance of filtering on microarray data classification we performed paired two-sample t-tests to compare the accuracies and ROC scores of the classification procedures on the original datasets with their counterparts in the KF case. The t-statistic was applied in a one-tailed fashion, testing against the alternative hypothesis that the mean of the accuracies/ROC scores produced by a certain method on the raw datasets is less than the mean of the matched performance measures on the pre-processed datasets. Table 3 shows that with 95% confidence the KF approach significantly improves the accuracy or the ROC score.

In our study we also compared the KF scheme with a different approach to multivariate filtering. Principal component analysis (PCA) based filtering consists of removing the non-significant variance components computed using the eigen-decomposition of the covariance matrix of the training set. The PCA results with SVM are shown in Table 2. As opposed to PCA, the KF retains the dataset in the original gene space and is also a supervised procedure from a classification point of view. This point is made clear by the P-values in Table 3. In the SVM framework, the PCA filtered datasets did not yield any improvement at a significance level of 0.05 in accuracy/ROC score compared to the original data. Using the same learning algorithm, the KF shows a significant accuracy increase over the PCA technique.

The advantage of such a pre-processing approach here is not just a better classification performance, but also an improved visualization capability of the data. Figure 1a depicts the original AML-ALL dataset while Figure 1b depicts the Kalman filtered dataset. The LLE representation clearly shows that the classes are more delineated with filtering than without. The heat map with hierarchical clustering presented in Figure 2 demonstrates how effectively the noise of the features was removed by the KF technique. The standard deviation of the gene expression values was reduced in each class and the tumour groups were separated into distinct clusters.
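The paired, one-tailed t-test used for the significance assessment can be sketched in Python as follows. The accuracy values are invented purely for illustration and are not the results of Table 2; the comparison against a tabulated critical value stands in for a P-value computation, and the function name is hypothetical.

```python
import math


def paired_t_statistic(raw, filtered):
    """t statistic for the mean of the paired differences filtered - raw."""
    diffs = [f - r for r, f in zip(raw, filtered)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1


# Illustrative accuracies on seven datasets (NOT the values of the study).
raw = [0.90, 0.85, 0.70, 0.95, 0.80, 0.75, 0.65]
filtered = [0.94, 0.90, 0.78, 0.96, 0.86, 0.80, 0.72]

t_value, dof = paired_t_statistic(raw, filtered)
# One-tailed 5% critical value of Student's t with 6 degrees of freedom.
T_CRITICAL_DF6 = 1.943
significant = t_value > T_CRITICAL_DF6
```

The test is one-tailed because the alternative hypothesis is directional: the mean performance on the raw data is lower than on the filtered data. Pairing by dataset removes the large between-dataset differences in difficulty from the comparison.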


 
Fig. 1 The original (a) and the Kalman filtered (b) AML-ALL dataset visualized by LLE.

 


 
Fig. 2 The heat map representation of the AML-ALL dataset. The first pair shows the original dataset and the second pair shows the filtered dataset.

 



 
Table 3 Significance test results

 
3.1 Marker genes
A common goal in microarray data classification for diagnostic purposes is to select a minimal number of genes that could work as signatures for specific tumours. The RFE feature selection method was evaluated on the original and the Kalman filtered datasets to test whether filtering could help find more reliable subsets of such marker genes. The results we obtained, summarized in Tables 4 and 5, show that the number of Kalman filtered features necessary for a good discrimination of tumour types is smaller than the size of the raw feature set required for a similar performance. The same result is noticeable in Figure 3 where, in a three-best-feature setup, the MLL classes are well separated in the KF data but overlap in the original vector set. Figure 4 shows a heat map visualization of the MLL dataset with 50 selected features. While on the training set the KF removes the measurement noise, which results in clearly separated tumour groups, the variance of the test set is also noticeably diminished by the filter. Note that the selected genes from the original and the filtered datasets are distinct.


 
Fig. 3 Visualization of the original (left side) and the Kalman filtered (right side) MLL dataset. In (a) the RadViz method was used on three genes selected by RFE and plotted on the unit circle. The same genes were used with LLE in (b).

 


 
Fig. 4 Heat map of the best 50 genes selected by RFE from the MLL dataset. On the Kalman filtered dataset (right) the features are less noisy and the three classes are further apart than in the original dataset (left).

 



 
Table 4 The accuracies obtained via SVM depending on the number of selected features

 



 
Table 5 The ROC scores obtained by SVM depending on the selected features via RFE

 
For the names of the selected genes and the visualizations for the datasets used in this study see the Supplementary material site. To compare the quality of features selected from the original datasets with the filtered ones, the Fisher separation ratio (FSR) was used.

The FSR is a scalar which is large when the between-class covariance is large and when the within-class covariance is small. Out of the many possible choices of criterion (Bishop, 1995) our ratio was defined as [Formula], where tr(·) denotes the trace of a matrix and $S_B$ and $S_W$ are the between- and within-class scatter matrices, respectively (Fukunaga, 1990). Here the between-class scatter matrix is the scatter of the class mean vectors around the overall mean vector, while the within-class scatter matrix denotes the weighted average scatter of the covariance matrices of the sample vectors belonging to each class.
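A separation ratio of this kind can be sketched in Python. The sketch assumes the ratio of traces tr(S_B)/tr(S_W), which is one common form of the criterion and may differ from the expression used in the study; it also assumes that classes are weighted by their relative sizes and that class covariances use the 1/n normalization. Since only traces are needed, no matrices are formed. The data and function names are made up.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equally long vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def fisher_separation_ratio(samples_by_class):
    """tr(S_B) / tr(S_W) with classes weighted by their relative size."""
    all_samples = [s for group in samples_by_class.values() for s in group]
    n_total = len(all_samples)
    overall = mean_vector(all_samples)
    trace_between = 0.0
    trace_within = 0.0
    for group in samples_by_class.values():
        weight = len(group) / n_total
        centre = mean_vector(group)
        # Trace of the outer product (m_c - m)(m_c - m)^T is a squared norm.
        trace_between += weight * sum((c - o) ** 2
                                      for c, o in zip(centre, overall))
        # Trace of the class covariance is the mean squared distance
        # of the class members from their class mean.
        scatter = sum(sum((x - c) ** 2 for x, c in zip(s, centre))
                      for s in group) / len(group)
        trace_within += weight * scatter
    return trace_between / trace_within


# Two made-up datasets with the same class means but different spread.
noisy = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[4.0, 0.0], [6.0, 0.0]]}
tight = {"A": [[0.5, 0.0], [1.5, 0.0]], "B": [[4.5, 0.0], [5.5, 0.0]]}
fsr_noisy = fisher_separation_ratio(noisy)
fsr_tight = fisher_separation_ratio(tight)
```

Halving the within-class spread while keeping the class means fixed quadruples the ratio (4 versus 16 here), which is the behaviour expected of a measure that rewards compact, well separated classes.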

Table 6 lists the FSR scores for 10 features independently selected from each dataset. The significantly larger scores (P = 0.0245 obtained from a t-test, as described previously) produced by the KF features demonstrate the greater predictive power of the estimated expression data that best define the causal biological states.



 
Table 6 FSR on 10 features selected via RFE

 

    4 CONCLUSIONS
 
The KF is thus a systemic approach to filtering: each gene's expression is estimated using the variances of all the individual features, on the assumption that many genes reflect the biological state of the sample through the transcriptional network. Hence, it remains for further study (e.g. PCR analysis) to assess whether the selected features can also independently predict and diagnose a tumour outcome. The performance of the KF technique here depends essentially on the tuning of the covariance matrices Q and R. Our choice of parameters proved to be reasonable for classification, although an improvement based on larger training data or better tuning formulae is possible. The filtering of one dataset took only a few seconds of CPU time; hence the technique presented here is a fast and scalable method for pre-processing microarray data. A Matlab script implementation of the filtering procedure is available on the Supplementary information website.


    Acknowledgments
 
This work was supported by a grant from the Hungarian Ministry of Economy and Transport (GVOP-3.1.1-2004-05-0119/3.0). A.K. was supported by the János Bolyai fellowship of the Hungarian Academy of Sciences.

Conflict of Interest: none declared.


    FOOTNOTES
 
Associate Editor: Martin Bishop

Received on August 14, 2006; revised on October 18, 2006; accepted on October 18, 2006

    REFERENCES
 

    Alter, O., et al. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl Acad. Sci. USA, 97, 10101–10106[Abstract/Free Full Text].

    Armstrong, S.A., et al. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet, . 30, 41–47[CrossRef][Web of Science][Medline].

    Baldi, P. and Brunak, S. Bioinformatics: The Machine Learning Approach, (2001) , Cambridge, MA MIT Press.

    Bishop, C.M. Neural Networks for Pattern Recognition, (1995) , Oxford Clarendon Press.

    Breiman, L. (2001) Random forests. Mach. Learn, . 45, 5–32[CrossRef].

    Brown, M.P.S., et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 97, 262–267[Abstract/Free Full Text].

    Brunsdon, C., et al. (1998) An investigation of methods for visualizing highly multivariate datasets. Case Studies of Visualization in The Social Sciences, 55–80.

    Cui, Q., et al. (2005) Characterizing the dynamic connectivity between genes by variable parameter regression and Kalman filtering based on temporal gene expression data. Bioinformatics, 21, 1538–1541[Abstract/Free Full Text].

    Dudoit, S., et al. (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc, . 97, 77–87[CrossRef][Web of Science].

    Fix, E. and Hodges, J. (1951) Discriminatory analysis, nonparametric discrimination: Consistency properties. Technical report, USAF School of Aviation Medicine, , Randolph Field, Texas.

    Fukunaga, K. Introduction to Statistical Pattern Recognition (2nd edn), (1990) , San Diego Academic Press.

    Golub, T.R., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537[Abstract/Free Full Text].

    Gordon, G.J., et al. (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res, . 62, 4963–4967[Abstract/Free Full Text].

    Grewal, M.S. and Andrews, A.P. Kalman Filtering: Theory and Practice Using MATLAB, 2nd edn, (2001) , NY John Wiley & Sons.

    Gribskov, M. and Robinson, N.L. (1996) Use of Receiver Operating Characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem, . 20, 25–33[CrossRef][Web of Science][Medline].

    Guyon, I., et al. (2002) Gene selection for cancer classification using support vector machines. Mach. Learn., 46, 389–422.

    Hertz, J., et al. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA.

    Jaakkola, T., et al. (1999) Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology. AAAI Press, California, pp. 149–158.

    Joachims, T. (1999) Making large-scale SVM learning practical. In Schoelkopf, B., Burges, C. and Smola, A. (Eds.), Advances in Kernel Methods: Support Vector Learning. MIT Press, Boston, MA.

    Kalman, R.E. (1960) A new approach to linear filtering and prediction problems. Trans. ASME-J. Basic Eng., 82, 35–45.

    Khan, J., et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med., 7, 673–679.

    Liao, L. and Noble, W.S. (2003) Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comput. Biol., 10, 857–868.

    Murvai, J., et al. (2001) Prediction of protein functional domains from sequences using artificial neural networks. Genome Res., 11, 1410–1417.

    Noble, W.S. (2004) Support vector machine applications in computational biology. In Schoelkopf, B., Tsuda, K. and Vert, J.-P. (Eds.), Kernel Methods in Computational Biology. MIT Press, Cambridge, MA, pp. 71–92.

    Piatetsky-Shapiro, G. and Steingold, S. (2000) Measuring lift quality in database marketing. ACM SIGKDD Expl. Newslett., 2, 76–80.

    Ramaswamy, S., et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. USA, 98, 15149–15154.

    Remlinger, S.K. (2003) Introduction and application of random forest on high throughput screening data from drug discovery.

    Roweis, S. and Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

    Rumelhart, D.E., et al. (1986) Learning internal representations by error propagation. In Rumelhart, D.E., McClelland, J.L. and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge, MA, pp. 318–362.

    Shi, T., et al. (2005) Tumor classification by tissue microarray profiling: random forest clustering applied to renal cell carcinoma. Mod. Pathology, 18, 547–557.

    van't Veer, L.J., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.

    Vapnik, V.N. (1998) Statistical Learning Theory. John Wiley & Sons, NY.

    Welch, G. and Bishop, G. (1995) An introduction to the Kalman filter. Technical Report TR95-041, Department of Computer Science, University of North Carolina, Chapel Hill.

    Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.

    Weston, J., et al. (2006) The Spider 1.71.

    Yeoh, E.J., et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1, 133–143.

    Zien, A., et al. (2000) Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16, 799–807.

