Bioinformatics Advance Access originally published online on November 16, 2007
Bioinformatics 2008 24(2):258-264; doi:10.1093/bioinformatics/btm550
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Algebraic stability indicators for ranked lists in molecular profiling
1FBK, via Sommarive 18, I-38100 Povo (Trento), 2DISI, University of Genova, via Dodecaneso 35, I-16146 Genova and 3DIT, University of Trento, via Sommarive 14, I-38100 Povo (Trento), Italy
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the difference in lists may be bewildering, also due to the presence of modules of functionally related genes. Methods for assessing stability understand the dependency of the markers on the data or on the predictor's type and help selecting solutions.
Results: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and an aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The addressed issues are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis.
Availability: Supplementary Material and software are available at the address http://biodcv.fbk.eu/listspy.html
Contact: furlan{at}fbk.eu
Supplementary information: Supplementary data are available at Bioinformatics online.
| 1 INTRODUCTION |
|---|
|
|
|---|
We discuss the problem of measuring the stability of a ranked feature list in functional genomics. Stability (measure of robustness to data perturbations) has only recently arisen as a major issue when applied to feature selection and ranking processes. In fact, selecting a reliable set of biomarkers is as important as achieving predictive classification (Baker and Kramer, 2006), and the two goals cannot always be pursued together (Marchiori, 2006). The concept of stability is indeed central in machine learning and in other workfields (see Wichmann and Kamholz, 2006 for an example in linguistics). Discussions of the feature selection stability problem can be found in Davis et al. (2006) for the microarray case and in Kalousis et al. (2005) for mass spectrometry data. Many attempts ran into the same difficulty: the resulting lists of biomarkers are very unstable with the slightest perturbation of the training set producing drastically different markers. Such issue arises when using methods indicated for error assessment and model selection on high-throughput data. Designed to reduce variance and avoid selection bias, they are structured in schemes of replicated experiments on versions of the original dataset obtained by stratified resampling and partitions in training and test subsets (Ambroise and McLachlan, 2002; Molinaro et al., 2005). In a binary classification task on microarray data (e.g. disease versus control), such a scheme provides average of errors on test sets for models trained on k genes from the corresponding training sets. The no-information case (e.g. the maximal instability situation characterized by random lists) can be used as baseline by running the same analysis with randomized class labels. The effects of such schemes on feature ranking are hard to treat (Simon, 2006): in practice, if the system includes a feature ranking method, a different list of best k biomarkers will be obtained at each resampling.
The task we address in this article consists in defining a measure of similarity among the output ranked lists and devising a family of computational tools for applications. This measure reflects indeed the stability of all the steps in the profiling process by gauging the variability of feature rank due to admissible perturbations of the sample data. We will consider some specific aspects of genome-wide data. In the microarrays data case, a large number of spots are typically not relevant for the classification problem, while many others are highly correlated or possibly clones. Furthermore, some genes can be grouped together in functionally related modules. The stability methods we present in this article are designed to include feature modules. Moreover, to focus on the most informative features, similarity among partial sublists (called top-k lists) will be modeled.
The problem of comparing ranked lists is actively studied in different contexts, e.g. in voting theory or in web document retrieval, and it is emerging in bioinformatics. The methods employed to introduce similarity measures are heterogeneous. We mention harmonic and spectral analysis (Lawson et al., 2006; Saari, 2001), rank aggregations (DeConde et al., 2006; Dwork et al., 2001; Fagin et al., 2003), string theory (Cormode et al., 2001; Gusfield, 1997) and set-theory (Lottaz et al., 2006; Yang et al., 2006). Not all methods are suitable for modeling stability of biomarker lists, either because of computational inefficiency on high-throughput data (e.g. the harmonic analysis methods) or because they are aimed at handling different data types (as in the case of string algorithms for sequence matching).
We base our proposal on the algebraic theory of symmetric groups and adopt the Canberra distance as a disarray measure (see Section 2). The embedding of a distance problem into the framework of permutation group theory was first introduced in Critchlow (1985) and then refined in Diaconis (1988). Once the distance is defined, we consider the symmetric matrix of all mutual distances between pairs of lists and derive stability indicators from its distribution.
We naturally derive the definition of a distance measure among partial lists from the one on global lists. The group theoretical framework for the partial lists (Critchlow, 1985; Fagin et al., 2003) is then used to model a higher stability weight for top-ranked features in lists and to hatch the presence of irrelevant features. We then generalize the approach to reduce the effect of rank switches between functionally related genes. We test the new indicators on different datasets, synthetic, microarrays and proteomic mass spectra (see Sections 3–4).
The proposed indicators can be applied to other contexts in which ranked lists are commonly used, such as information retrieval systems or for enrichment analysis (see Supplementary Material). In Section 4 we test our indicators in combination with GSEA (Subramanian et al., 2005), one of the most used application for gene-set enrichment analysis. Details on theory, software and additional examples are given in Supplementary Material.
| 2 SYSTEM AND METHODS |
|---|
|
|
|---|
2.1 Notation
Let D be a dataset consisting of n samples described by a set
p
1000 after preprocessing. With B, we indicate the number of replicated experiments required by complete validation of profiling. They consist of instances of classification and feature ranking: we also call them runs. At least B = 100 runs are used in the experiments described in Section 4. At each replicate i = 1 ... B, we assume that a ranking process sorts the features according to their importance in building the i-th classifier and an ordered list Li is produced. Let
i( j ) be the rank (position) of feature Fj in Li: we call dual list of Li the permutation
|
2.2 Extraction and position numbers
Within this framework, some basic information can be inferred by counting on
|
|
|
|
|
|
For a given k, the numbers ek( j ) and ak( j ) induce a ranking of the features. We can use it as a first stability criterion: high ek and low ak indicate features extracted often in top positions, an element of stability for the study described by the set of lists
.
2.3 The optimal list
We combine the definitions in Subsection 2.2 and propose a simple solution to encode the ranking information coming from all the lists in
into a single optimal list (possibly not in
).
For each k, consider as the ranking criterion the extraction number ek( j ) in decreasing order, and, when ties occur, the position number ak( j ) in increasing order. The criterion defines a dual list
, which correspond to a list of features called the optimal top-k list of
. When k = p, the complete lists are considered and trivially it is ep( j ) = B for all Fj; thus
is determined only by the ap( j ) average values.
The optimal list definition can be linked to the Borda count, an algorithm for optimally merging a set of lists that is well known in voting theory (Borda, 1781; Saari, 2001). Given a set of B ranked lists on p candidates, the Borda count associates to each candidate Fj a score s ( j ) given by the total number of candidates with higher position over all lists. The Borda optimal list is then derived by ranking candidates with higher scores. In Supplementary Material 2, it is shown that s ( j ) = B(p – ap ( j )), i.e. ranking according to such decreasing score is equivalent to ranking for increasing average position.
2.4 The union number
For each k, a first natural measure of disarray for the set
is given by the number
k of different features occurring in all the B top-k lists. This quantity, called the union number, can be formally defined as follows:
|
|
k(
min{p,kB} for each k; obviously
p(
k we derive the union number indicator defined as the sequence
|
| (1) |
k, the less stable is
k
p}. Note that a similar idea was introduced in Yang et al. (2006), where (a function of) the number of common genes in the top-k and the bottom-k sublists of differential gene expression lists was considered to account for strongly up-regulated and bottom-regulated genes at the opposite ends of the lists. Here, the ranking is derived from classification and thus the most discriminating features are expected to stay at the top. The above defined ek, ak and IU provide a first estimate of variability in a set of lists; however, if a finer analysis is required, more complex methods need to be employed. Metrics are proposed in Subsection 2.5. Moreover, if intersection is used instead of union, an indicator keeping track of the number of genes common to all top-k sublists is defined for each k. Intersection is commonly used, but it is less informative than union for small values of k (consider for instance a set of all but one identical lists).
2.5 Canberra distance
Metrics between ranked lists require an additional condition called right-invariance (Critchlow, 1985) to make it independent from a relabeling (see Supplementary Material 1). Among the classic statistical distances (Fig. 2), the Euclidean distance is not right-invariant and thus not a metric for Sp. Admissible metrics on Sp are the Spearman's footrule, the Spearman's rho and the Kendall's tau. A variation of the Spearman's rank correlation measure was employed in Kalousis et al. (2005) to study list stability. As a second constraint specific to list comparison, we require that variations in lower portions of the lists should be less relevant than those in the top. A natural solution is to consider weighting factors. In particular, we develop our theory on a weighted version of Spearman's footrule, e.g. the Canberra distance:
|
|
|
|
|
We provide the expected (average) value of the Canberra metric on Sn in Theorem 1 (Supplementary Material 5). It can be approximated up to terms o(1) as
|
| (2) |
104 it is about 10 – 3. The result in Equation (2) is used to characterize a set of ranked lists in the maximal instability case.
2.6 List distance indicators
Given a set of lists
of p features and an integer k
p, the computation of all mutual top-k distances leads to the construction of a distance matrix Mk
(B x B,
+ ); let µk be the mean of all its
non-trivial values:
|
| (3) |
|
| (4) |
|
| (5) |
We remark that the indicators defined in this section are independent from the particular classifier and feature ranking algorithm, since only the set of lists is used as the input for the indicator, regardless of the ranking method. Finally, the pointwise ratios of IU and
with their no-information curves [
k = p and
, respectively] are independent from p and k. The resulting normalized indicators ÎU and
can be then used to compare experiments on different datasets (see Supplementary Material 6 for details).
2.6.1 Feature modules
It can be relevant to take into account the presence of mutually related groups of features (modules) and reduce the penalization in case of a rank variation within a module than other permutations. The interest for this additional constraint may be due to biological background or to technical reasons. In case of microarray data, it is often the case that highly correlated genes may be functionally related or just swapped for geometrical reasons (due to the nature of the employed classifier) during the ranking process. The constraint can be implemented by making the distance independent from the ordering of the features inside the module: the mathematical formulation is detailed in Supplementary Material 7. Defining one or more feature modules will take care of the intermodule instability when computing the stability indicator. Therefore, the result is always a curve describing a more stable situation than in the corresponding case without modules. Thus, the introduction of this constraint allows the computation of a more realistic value of the stability indicator, as evidenced in the experiments in Subsections 4.1 and 4.3.
| 3 DATA DESCRIPTION |
|---|
|
|
|---|
3.1 Synthetic data
A family of synthetic datasets f X/Y has been created to simulate a binary classification problem on N samples described by Y features, of which X are discriminant. The features are modeled from the Gaussian
), with mean µ fixed according to the sample label for the discriminant features and sampled from the uniform
i, j), where mi, j and
i, j are defined as follows: |
|
|
|
3.2 Proteomics ovarian cancer data
We considered a proteomic pattern dataset, produced at Keck Laboratory with a Micromass MALDI-L/R instrument on 77 controls and 93 ovarian cancer samples, as described and used in Wu et al. (2006). Here, we consider the linear analyzer data in the region of 3450–28000 Da; by preprocessing all spectra with the methods described in Barla et al. (2006), 123 peaks were identified and intensities at these peaks were used as features.
3.3 Microarray breast cancer data
A microarray dataset of samples from patients with invasive breast carcinoma (Sotiriou et al., 2006) was also considered for studying stability of gene signatures and of the corresponding enriched ranked lists. The platform is the Affymetrix U133A GeneChips; after preprocessing, the data matrix includes 183 cases described by 22 215 gene expression values. The profiling of Estrogen Receptor (ER) status is the task, given 149 ER+ and 34 ER– cases.
3.4 Supplementary data
In the Supplementary Material, two additional microarray datasets are used: the Cardiogenomics Mouse Model of Myocardial Infarction (MI: 37 samples, 12 488 genes) and the Huntington's Disease Data (HD: 31 samples, 5186 genes). The given task, on both datasets, is gene profiling for discriminating disease and control samples.
| 4 RESULTS |
|---|
|
|
|---|
In this section, we compute the list distance indicators on one synthetic and four high-throughput real molecular profiling tasks described in Section 3 to demonstrate the behavior of the proposed indicators in a set of practical situations (predictive classification, presence of modules, comparison of alternative biomarker lists and gene enrichment). We also provide a few indications for an effective usage of the indicator curves as supporting tools for choices among classifiers and feature selection algorithms. Further experiments are shown in Supplementary Material 8–10.
All experiments are carried out in a complete validation methodology aimed at avoiding the selection bias effect (Ambroise and McLachlan, 2002; Furlanello et al., 2003). In brief, B different training-test splits of the original dataset are prepared and, for each instance, a feature selection process is performed on the training part. The resulting ranked list is used to build classification models of different feature set sizes and accuracy is evaluated on the left-out set of test samples. Global accuracy on the whole dataset is obtained by averaging over the B runs. Different types of support vector machine (SVM) classifiers and of recursive feature elimination (RFE) procedures were used with metaparameters chosen by cross-validation or bootstrap.
4.1 Predictive profiling on synthetic data
In this experiment, we study the basic properties of the list indicators in the classification of datasets f 10/100 and f 30/100. In both tasks, we consider B = 100 runs, a linear SVM classifier with regularizer C = 100, and single-step RFE as the feature ranking algorithm. On both datasets less than five features are sufficient to reach perfect classification. In Table 1, features are ranked in terms of the extraction (ek) and position (ak) numbers (see Subsection 2.2).
|
We compare the results for the top-k sublist and the complete list in both tasks. All discriminative features of the f 10/100 dataset are the best ranked on the top-9 sublist and on the complete lists; 7 out of 10 features occur in all the top-10 lists, i.e. they have ek = 100. For the f 30/100 dataset, the profiling process selects discriminant features at the top of the list both for the partial top-10 and for the complete case, with high ek.
In general, a large number of features with high ek indicate that many genes are consistently among the best ranked throughout the whole list set, thus forming a subset of well discriminating features for the employed classifier. When considering complete lists, the above consideration still holds for features with low values of ak.
In Figure 3, the curves defined by the two indicator on f 10/100 and f 30/100 are displayed for increasing top-k list sizes. Both for IU and
, the curves show a sudden change of slope near the abscissa value X (the number of discriminant features in f X/Y). Note that the union number indicator attains the upper bound Y soon after the critical value, while the mean indicator does not saturate. The latter may thus be used for finer analyses, such as the comparison of lists produced by different classifiers. As shown in Figure 3b, for the complete lists (p = 100), the mean indicator values are
for f 10/100, and
for f 30/100; both values are much smaller than the
, discussed in the consistency experiment Supplementary Material 8. Difference in stability with respect to the baseline case can be used to detect no-information cases and compare methods.
|
We therefore consider the response of the mean stability indicator
|
4.2 High-throughput data applications
4.2.1 Comparing classifiers
The stability indicators can be used to compare list sets produced by different profiling methods. Here, we consider the effect of classifiers and of ranking procedures on lists of features for the proteomics ovarian cancer dataset (see Subsection 3.2). A stability analysis is applied to an experiment introduced in Merler et al. (2007), where different SVM and RFE methods are studied. In particular, we compare linear SVM and the Terminated Ramp kernel (TR-SVM), coupled with RFE (ranking recomputed at each step) and 1RFE (ranking computed only once). Three profiling studies were performed in complete validation with B = 400 runs, with regularizer C = 1000 and C = 1 for the linear SVM and TR-SVM, respectively. The results are displayed in Figure 5. The average test error (ATE) computed by complete validation is not far from the no-information error rate (45.29%) for this problem. TR-SVM with 1RFE is more accurate, confirming that choosing a suitable kernel is a good strategy in hard classification tasks. The new list stability indicators are also computed. As a baseline for stability, from Equation (5) we consider the expected values for
|
Stability with TR-SVM is consistently better than with SVM, either with the 1RFE or the full RFE versions. Moreover, the IU reaches its maximum value at a much slower rate with the more complex kernel. The analysis indicates that the TR-SVM method is providing a set of more accurate and stable candidate peaks in this profiling task.
4.2.2 Accuracy and stability
Although they share a common trend in many cases, accuracy and stability are independent measures. In easy classification tasks, alternative methods can give negligible gains in accuracy but more stable lists. As an example, the Breast Cancer dataset in Subsection 3.3 is analyzed with a linear SVM (C = 0.1) with Entropy based RFE (E–RFE; reverting to RFE for the last 100 steps) as in Furlanello et al. (2003), and with a TR-SVM (C = 0.01) with 1RFE. With both methods, the mean indicator is distant from the no-information curve
defined by Equation (5), whose value on the complete list is about 8581.
As shown by Figure 6, the resulting ATE curves are quite similar for the two methods but the improvement obtained by the TR-SVM classifier in terms of list stability is relevant even for small k.
|
4.2.3 A diagnostic plot
We compare simultaneously the five experiments above in terms of accuracy and normalized stability in the ATE versus
|
The definition of
4.3 Applications to enrichment
For enrichment analysis, we considered the GSEA algorithm, using the Normalized Enrichment Scores (NES) as the ranking value. We applied the stability indicator to enriched ranked lists of gene sets as computed with GSEA from ranked lists of genes. To prove the consistency of the procedure, we randomly extracted a set G of ng genes included in the gene sets MSigDB collection, and constructed the set
of l random permutations Li of G. Then, we tested
for enrichment against a set GS of
gene sets from those in the entire collection in MSigDB including at least one element of G. Thus, we obtained a set ![]()
of l ranked gene sets. For this experiment, we set ng = 1000, l = 100 and
.
As shown in Figure 8a, both the obtained ranked lists and the enriched lists are highly unstable (
), also after factoring out the gene sets as feature modules and with different weighting parameters p as in Subramanian et al. (2005) (correlation value for the gene ranked j-th fixed to
). Then we constrained the admissible permutations of genes so that positions s,t in Li satisfies ||s – t||
z, and compared to the resulting sets of ranked enriched lists (z = 20 and same parameters as above). The constrain produces more stable gene lists, and consistently stable enriched lists (Fig. 8b).
|
Finally, in the same setup, we considered gene set invariant permutations, i.e. those exchanging genes within the same gene set. Here, we used ng = 100 and the ngs = 124 gene sets sharing at least two gene with the ranked gene lists. Results are shown in Figure 9. While the permuted gene lists are unstable (
|
In Supplementary Material 12, the stability analysis is applied to enrichment gene sets derived from the Breast Cancer dataset (see Section 3.3).
| 5 CONCLUSIONS |
|---|
|
|
|---|
We introduced a new framework of algebraic indicators to assess the stability of biomarker lists. The more refined indicators come from an adapted metric theory on permutation groups, and they can be used in diverse high-throughput studies. The indicators allow comparisons of list sets produced by different classification/ranking systems, also on different datasets, thus adding stability as a measure of interest for genomic signatures in addition to accuracy.
Results on synthetic data consistently explain test cases, and those on high-throughput data show applicability in current practice. An open source software implementation of the stability indicator is made available and it has adequate efficiency for practical tasks: in fact, computing time on a PentiumD workstation for the analysis of about 80 values of the indicator for 400 lists of 20 000 features (we recall that the algorithm is independent from sample size) amounts to
1 h. A summary of computing performance for different list sizes can be found in Table 6 in Supplementary Material 13.
The approach can further accommodate bioinformatic knowledge on feature modules. Known relations among genes (e.g. function groups as defined in GO or KEGG or explicit correlation effects) can be injected in the algorithm by suitably defining one or more feature modules. In such cases, the rank changes within one module do not contribute to the stability indicator. This approach can be generalized by introducing weighting schemas for module-specific costs of rank change.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
Part of the method was studied in two B.Sc. theses by S. Maragnoli and A. Peretti. We thank A. Caranti for hints on the Borda count and the harmonic analysis methods, C. Joachim, E. Manduchi and C. Stoeckert for reading earlier versions of the article and providing useful suggestions. Research supported by AIRC.
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Limsoon Wong
Received on May 7, 2007; revised on October 12, 2007; accepted on October 31, 2007
| REFERENCES |
|---|
|
|
|---|
Ambroise C, McLachlan G. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA (2002) 99:6562–6566.
Baker S, Kramer B. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics (2006) 7:407.[CrossRef][Medline]
Barla A, et al. Proteome profiling without selection bias. In: Proceedings of IEEE-CBMS 2006 (2006) 941–946.
Borda J. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences (1781) Année MDCCLXXXI.
Cormode G, et al. Permutation editing and matching via embeddings. In: Proceedings of ICALP 01 (2001) Springer. 481–492.
Critchlow D. Metric Methods for Analyzing Partially Ranked Data. LNS 34 (1985) Springer.
Davis C, et al. Reliable gene signatures for microarray classification: assessment of stability and performance. Bioinformatics (2006) 22:2356–2363.
DeConde R, et al. Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. (2006) 5. Article 15.
Diaconis P. Group representations in probability and statistics. Institute of Mathematical Statistics Lecture Notes – Monograph Series 11 (1988) IMS.
Dwork C, et al. Rank aggregation for the web. In: Proceedings of IWWWCC-WWW10 (2001) 613–622.
Fagin R, et al. Comparing top-k lists. SIAM J. Discrete Math. (2003) 17:134–160.[CrossRef]
Furlanello C, et al. Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics (2003) 4:54.[CrossRef][Medline]
Gusfield D. Algorithms on Strings, Trees and Sequences. CUP (1997).
Kalousis A, et al. Stability of feature selection algorithms. In: Proceedings of IEEE-ICDM 05 (2005) 218–225.
Lawson B, et al. Spectral analysis of the supreme court. Math. Mag. (2006) 79:340–346.
Lottaz C, et al. OrderedList – a Bioconductor package for detecting similarity in ordered gene lists. Bioinformatics (2006) 22:2315–2316.
Marchiori E. Feature selection, SVM-based classification and application to mass spectrometry data analysis. Bioinformatics Data Analysis and Tools. Lecture Notes (2006).
Merler S, et al. Deriving the kernel from training data. Proceedings MCS 2007, LNCS (2007) 4472. Springer. 72–81.
Molinaro A, et al. Prediction error estimation: a comparison of resampling methods. Bioinformatics (2005) 21:3301–3307.
Saari D. Chaotic Elections! A Mathematician Looks at Voting (2001) AMS.
Simon R. Development and evaluation of therapeutically relevant predictive classifiers using gene expression profiling. J. Natl Cancer Inst. (2006) 98:1169–1171.
Sotiriou C, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl Cancer Inst. (2006) 98:262–272.
Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA (2005) 102:15545–15550.
Wichmann S, Kamholz D. A stability metric for typological features. Sprachtypologie und Universalienforschung (2006) in press.
Wu B, et al. Ovarian cancer classification based on mass spectrometry analysis of sera. Cancer Inform. (2006) 2:123–132.
Yang X, et al. Similarities of ordered gene lists. J. Bioinform. Comput. Biol. (2006) 4:693–708.[CrossRef][Medline]
This article has been cited by other articles:
![]() |
A.-L. Boulesteix and M. Slawski Stability and aggregation of ranked gene lists Brief Bioinform, September 1, 2009; 10(5): 556 - 568. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Barla, G. Jurman, S. Riccadonna, S. Merler, M. Chierici, and C. Furlanello Machine learning methods for predictive proteomics Brief Bioinform, March 1, 2008; 9(2): 119 - 128. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||









