Bioinformatics Advance Access originally published online on September 25, 2007
Bioinformatics 2007 23(20):2708-2715; doi:10.1093/bioinformatics/btm414
Clustering by soft-constraint affinity propagation: applications to gene-expression data
Institute for Scientific Interchange, Viale Settimio Severo 65, Villa Gualino, I-10133 Torino, Italy
*To whom correspondence should be addressed.
| ABSTRACT |
|---|
|
|
|---|
Motivation: Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar all other data points of the same cluster refer to, and exemplars have to refer to themselves. Albeit its proved power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data.
Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative, accurate and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Further on, it allows to extract sparse gene expression signatures for each cluster.
Contact: leone{at}isi.it, sumedha{at}isi.it and weigt{at}isi.it
| 1 INTRODUCTION |
|---|
|
|
|---|
Clustering based on a measure of similarity is a crucial problem which appears throughout scientific data analysis. For an overview see Jain et al. (1999). Recently, a powerful algorithm called Affinity Propagation (AP) based on message passing was proposed by Frey and Dueck (2007a). As reported impressively in the original publication, this algorithm achieves a considerable improvement over standard clustering methods like K-means (MacQueen, 1967), spectral clustering (Shi and Malik, 2000) and super-paramagnetic clustering (Blatt et al., 1996; Shental et al., 2003).
Based on an ad hoc pairwise similarity function between data points, AP seeks to identify each cluster by one of its elements, the so-called exemplar. Each point in the cluster refers to this exemplar, each exemplar is required to refer to itself as a self-exemplar. This hard constraint forces clusters to appear as stars of radius one: there is only one central node, and all other nodes are directly connected to it. Subject to this constraint, AP seeks at maximizing the overall similarity of all data points to their exemplars. The solution to this hard combinatorial task is approximated following the ideas of belief-propagation (Kschischang, 2001; Yedidia et al., 2005). One of the important points of AP is its computational efficiency: whereas a naive implementation of belief propagation for N data points leads to O(N3) messages which have to be determined self-consistently, the elegant formulation of Frey and Dueck allows to work with O(N2) messages only. Therefore, the algorithm is feasible even in the presence of very large data sets.
Albeit its impressive power in a wide range of applications (Frey and Dueck, 2007a), AP in its present form suffers from a number of drawbacks. The most important ones related to the present work are:
- The hard constraint in AP relies strongly on cluster-shape regularity: elongated or irregular multi-dimensional data might have more than one simple cluster center. AP may force division of single clusters into separate ones.
- Since all data points in a cluster must point to the same exemplar, all information about both the internal structure and the hierarchical merging/dissociation of clusters is lost.
- AP has robustness limitations: a small perturbation of similarities may influence the choice of one or few exemplars, and the hard constraint may trigger an avalanche leading to a different partitioning of the data set into clusters. This point is particularly important in the presence of noise in the data as, e.g. in microarray measurements.
- AP forces each exemplar to point to itself. A relaxation of the hard constraint may allow for cluster structures including second- or higher-order pointing processes.
These problems may be solved by modifying the original optimization task of AP. As a first step, we relax the hard constraint by introducing a finite penalty term for each constraint violation. This softening can be chosen in a way that the computational complexity of the algorithm remains unchanged, but its performance on biological test sets is improved considerably. Moreover, relaxing the constraint helps in gaining valuable insight into the hierarchical structure of the clustering, increasing result robustness at the same time. By tuning the cluster number we see the merging of two clusters into a single one, or the dissociation of single clusters into two separated ones.
| 2 THE ALGORITHM |
|---|
|
|
|---|
Given a set
) between the pairs xµ,x
,µ,
{1,...,N}. The choice of the similarity measure is heuristic, it depends on the nature of data to be clustered. In the case of high-dimensional data as present in gene-expression analysis, similarity may be measured by the Pearson correlation coefficient or the negative pairwise Euclidean distance. However, for the algorithm described below it is not even necessary that the similarities are symmetric.
The algorithm searches for a mapping
which maps each data point µ to its exemplar cµ which itself is a data point. This mapping shall minimize the cost function (or energy)
which equals minus the overall similarity of the data points to their exemplars. In the original AP algorithm c is restricted by N hard constraints: whenever a data point is selected as an exemplar by another data point, it has to be its own self-exemplar.
In this setting, we need to specify self-similarities S(µ,µ). They describe the availability of data points for being self-exemplars (and thus to serve as a cluster center). Since all data points are a priori equally suitable to play such a role, one is naturally led to choose all self-similarities to have some common value
. The free model parameter
acts like a chemical potential in statistical physics, setting the prior likelihood of the number of self-exemplars (and of separated clusters consequently). For very small value of
, every data point prefers to be its own exemplar, and the number of clusters equals the number of data points. In the opposite extreme case of large
, self-exemplars have high cost in E1[c]. All data points are collected in one large cluster with a single exemplar. For intermediate values
acts as a tuning parameter for the cluster number which decreases monotonously with
. Frey and Dueck argue that, if the data set has some underlying structure, the correct clustering can be identified by a comparably broad range of values of the self-similarity in which the inferred cluster structure does not change. If data are not sparse and clusters are symmetrically shaped, then affinity propagation works very well and produces the correct clustering in a very short time.
Finding the cost minimum of E1[c] subject to the self-exemplar constraint is a computationally hard task. It can be achieved exactly only for very small systems. The central idea of AP is therefore to identify the exemplars using message passing (belief propagation, BP) as a heuristic strategy (Yedidia et al., 2005): real-valued messages between pairs of data points are updated recursively until a stable clustering emerges. The original AP equations are a direct application of BP [or, equivalently, Max-Sum (Kschischang et al., 2001)] to the clustering problem.
There are two types of messages exchanged between data points (Frey and Dueck, 2007a): the responsibility r(µ,
) is sent from data point µ to candidate exemplar
; it reflects the accumulated evidence that µ chooses
as its cluster exemplar. The availability a(µ,
) is sent from candidate exemplar
to datum µ; it reflects the appropriateness for
to be an exemplar for µ as a result of the self-exemplar constraint. As mentioned before, the original AP imposes constraints on exemplars to be self-exemplars. We modify the algorithm of Frey and Dueck by softening this hard constraint. We write the constraint attached to a given data point µ as follows, with p
[0,1]:
|
| (1) |
, without being a self-exemplar. The hard-constraint limit of Frey and Dueck is recovered by setting p to zero. For p = 1,
Finally, we introduce a positive real-valued parameter ß weighing the relative importance of the cost minimization with respect to the constraints. In a statistical-physics perspective, this parameter can be seen as a formal inverse temperature. Its introduction allows us to define the probability of an arbitrary clustering c as
|
| (2) |
|
| (3) |
can be determined component-wise by maximizing the marginal probabilities,
|
| (4) |
2.1 The SCAP equations
In the limit
, Equation (2) becomes concentrated to the true cost minima. Looking at Equation (3) it becomes obvious that p has to scale as
in order to have some non-trivial effect. In this limit, we find the SCAP equations (with
, see Section 4):
|
| (5) |
|
| (6) |
In practice, this means that variables are discouraged to be self-exemplars beyond a given threshold, even in the case someone is already pointing at them. The resulting clustering is more stable and obviously allows for a hierarchical structure where
can point to µ that can point to
, etc. Also loops are possible. In most of the tests performed (both on artificial and biological cancer data) clusters are almost tree-like besides a dimer.
2.2 Efficient implementation of the algorithm
The iterative solution of Equations (5) can be implemented in the following way:
- Define the similarity S(µ,
) for each set of data points. Choose the values of the self-similarity
and of the constraint strength
. Initialize all a(µ,
) = r(µ,
) = 0
- For all
, first update the N responsibilities r(µ,
) and then the N availabilities a(
,µ), using Equations (5).
- Identify the exemplars cµ by looking at the maximum value of r(µ,
) + a(µ,
) for given µ, according to Equation (6).
- Repeat steps 2–3 till there is no change in exemplars for a large number of iterations (we used 10–100 iterations). If not converged after Tmax iterations (typically 100–1000), stop the algorithm.
Three notes are necessary at this point:
- Step 3 is formulated as a sequential update: for each data point µ, all outgoing responsibilities and then all incoming availabilities are updated before moving to the next data point. In numerical experiments this was found to converge faster and in a larger parameter range than the damped parallel update suggested by Frey and Dueck (2007a). Dependence of the result on initial conditions was not observed.
- The naive implementation of the update Equations (5) requires 2N2 updates, each one of computational complexity O(N). A factor N can be gained by first computing the unrestricted max and sum once for a given µ, and then implying the restriction only inside the internal loop over
. Like this, the total complexity of a global update is O(N2) and thus feasible even for very large data sets.
- Belief propagation on loopy graphs is not guaranteed to converge. We observe, however, efficient convergence of the sequential update over wide parameter ranges. To handle the possibility of non-convergence, we have introduced a cutoff in the number of iterations. If this is reached, the algorithm stops, and the actual parameter combination is discarded.
2.3 Extracting cluster signatures
In many clustering tasks input data consist of high-dimensional vectors, a specific example being genome-wide microarrays. Frequently only few components of these vectors carry useful information about the cluster structure, extracting such cluster signatures is of crucial importance in understanding the mechanisms behind the cluster structure.
In the following, we will use the specific case of microarray data. Therefore we use the notion gene for a component of the input vector, even if at this stage the discussion is still general. The total number of genes is denoted by G. We propose a simple measure of the influence of single genes on the total similarity measure of a cluster, as compared to random choices of the exemplar selection c. For simplicity, we assume the similarity between data points
and
to be additive in single-gene contributions
|
| (7) |
Having found a clustering given by the exemplar selection c, we can calculate the similarity of a cluster C defined as a connected component of the directed graph given by c. It is given by
|
| (8) |
|
| (9) |
|
| (10) |
|
| (11) |
|
| (12) |
| 3 APPLICATION TO BIOLOGICAL DATA |
|---|
|
|
|---|
3.1 Iris data
The data consist of measurements of sepal length, sepal width, petal length and petal width, performed for 150 flowers, chosen from three species of the flower Iris. It is a benchmark problem for clustering (Duda and Hart, 1973). Super-paramagnetic clustering is able to cluster 125 of the data points correctly, leaving 25 points unclustered (Blatt et al., 1996).
When we apply AP on Iris data, we identify three clusters making 16 errors. With SCAP, we identify them with just nine errors. We use the Manhattan distance measure for the similarity function, i.e.
.
We saw that the species Iris Setosa separates without any errors. On increasing the value of
, the Iris Setosa cluster stays intact and the clusters for Versicolor and Virginica merge with each other, reflecting the fact that they are closer to each other than to Setosa. The errors occur because some samples from these species were closer to samples from other species than to their own.
3.2 Brain cancer data
We used a test data set monitoring the expression levels of more than 7000 genes for 42 patients, which were previously correctly classified into five diagnosis types by an a posteriori assessment method (Pomeroy et al., 2002) (10 medulloblastoma, 10 malignant glioma, 10 atypical teratoid/rhabdoid tumors, 4 normal cerebella and 8 primitive neuroectodermal tumors – PNET). Each array was filtered, log-normalized to mean zero and variance one, resulting in G = 6010 genes. Due to this choice Pearson correlation and negative square Euclidean distance are equivalent. The diagnosis information was not used during clustering, but only for checking the algorithmic outcome.
3.2.1 Imposing five clusters in AP and SCAP
Since we knew that the correct clustering was to identify five different patterns, our first approach was to tune
and
in order to get five clusters. First, we fixed
to infinity (original AP) and changed
finding around 
120 the desired number of five clusters with eight errors. The error was calculated a posteriori by counting every data point which referring to an exemplar of a different diagnosis. Next we fixed
to a sufficiently large value (the result becomes insensitive on
once the latter takes large values), and we changed
. In this case, for
we got six clusters with eight errors, for
four clusters with again eight errors. Five clusters were not found to correspond to any extended
-region. Note that in both cases all errors occur in the last cluster: samples supposed to take diagnosis 5 (PNET) rarely find an exemplar of the same class. Instead they distribute over the other four diagnoses.
3.2.2 Clustering with AP
Then, instead of fixing the number of clusters, we changed
continuously for
. We counted the number of clusters and of errors as a function of
, see Figure 1. The algorithm ground state (configuration of maximum marginals values) in the limit of
is a single cluster.
|
The first non-trivial clustering occurs when the number of clusters remain unchanged for a stable range of
values. In this preliminary study, we took that to be the actual predicted data clustering. Hence, by looking at Figure 1, we would conclude that there are three well-distinguishable clusters in the present data set. Look, however, to the number of errors: it is found to be 14–15 in this range, basically due to the wrong assignment of two entire classes to only three exemplars. Four or five clusters can be imposed and lead to lower error values, but require fine-tuning of
.
3.2.3 Clustering with SCAP
We then fix
to be very large and change only
. For
we start with seven clusters, this number decreases rapidly as
increases, see Figure 2. As before, the point at which the number of clusters is robust against changes in
was taken as the best SCAP clustering. From Figure 2 we conclude that SCAP identifies four clusters. The number of errors in classification is eight.
|
Right from
|
Note that SCAP also provides information about the internal organization inside the clusters. We find, e.g. that the misclassified patterns are always peripherical cluster elements. No other data point refers to them. This information is lost in AP. Due to the hard constraint all points belonging to the same cluster refer to the same exemplar, and information about the internal cluster structure is not contained in c
. A graphical representation of the cluster structure in this case is contained in the Supplementary Material. In Pomeroy et al. (2002), data were clusterized using hierarchical clustering. Even if the overall cluster structure is similar to the one we found, there is no clear-cut clustering into 4–5 classes, some arrays (well-clustered with SCAP) were only added at very late stages of hierarchical clustering. The global nature of SCAP leads to a better clustering performance than the local and greedy hierarchical clustering.
Another interesting point comes from the comparison of our clustering results with the supervised classification results of Dettling (2004). There, a number of state-of-the-art classification algorithms are applied, with training sets containing 2/3, test sets containing 1/3 of the data points. Dettling finds that the minimal generalization error made is 23.8%, corresponding to ca. ten errors on a data set of cardinality 42. It is interesting to note that SCAP in the clustering corresponding to four clusters makes only eight errors. Note that training in Dettling (2004) is done on a subset of patterns, but supervision in this case seems to add no valuable information to the unsupervised clustering results.
Last but not least, we use the procedure described above to extract cluster signatures in the most stable case of four clusters depicted in Figure 3. The lists of the highest ranking genes together with their relevance value Ii(C) is given in the Supplementary Material. The number of statistically relevant genes (we consider a threshold Ii(C)
3) depends on the cluster and is largest for the normal tissue (42 genes), it is much smaller in particular for the first cluster (ca. 4 genes). If we take the first 15–25 genes per cluster, i.e an overall signature of 60–100 genes, we already find basically the same clustering as before, only two new errors of previously well-assigned patterns appear. At gene signatures 120–240, only one of these errors survives. We therefore find that the signature found in this way carries most of the information needed for the clustering. Note also that, due to the fact that in an unsupervised way we did not separate the fifth diagnosis type into a single cluster, we do not have by definition a cluster signature for this cancer type.
3.3 Other benchmark cancer data
3.3.1 Lymphoma cancer data
We used a data set of 62 patients for 4026 genes, showing three different diagnosis (Alizadeh et al., 2000). In the limit of
going to infinity, we find the first non-trivial clustering for
between 150 and 250. In this regime AP group data into three sets, making with three errors. For very high
and varying
, the three-groups clustering becomes more stable and robust, while the algorithm makes just one assignment prediction error. In this case, Dettling finds a minimal generalization error of 0.95%, corresponding to less than one error in 62 patterns. Supervision adds some information, even if clustering itself makes only one error.
3.3.2 SRBCT cancer data
This set has 63 samples with 2308 genes and 4 expression diagnosis patterns (Kahn et al., 2001). For
going to infinity, the best tuning-robust estimates groups cluster data into 5 clusters making as many as 22 errors. On the other hand, with finite
, SCAP finds a regime of four clusters, making only seven assignment errors. Here, Dettling reports only 1.24% generalization error in supervised classification, corresponding to less than one error on 63 patterns. Classification thus performs considerably better than clustering alone.
3.3.3 Leukemia
This set has 72 samples with 3571 genes and 2 diagnoses (Golub et al., 1999). In the case of infinite
, the original AP groups data into two clusters with four errors, while for variable
(fixing
very large) SCAP finds two clusters with two errors. Also classification leads to 2.5% of errors, a result which is slightly better than our clustering result.
| 4 METHODS |
|---|
|
|
|---|
In the process of choosing exemplars, we need to calculate marginals
|
| (13) |
Looking at Figures 4 and 5, BP computes beliefs
(c) for the marginal probabilities as products of messages
coming from each compatibility constraint, times the local prior computed as the exponential of the similarity between point µ and its putative exemplar c. Up to overall normalization, we write:
|
| (14) |
alone forces µ to select exemplar c. It can be calculated via the following self-consistent equations
|
| (15) |
|
| (16) |
were absent in Equation (2). These probabilities are called cavity probabilities, because the disregarding of one data point/constraint effectively carves a cavity in the original factor graph. The link direction of functions As and Bs is shown if Figure 4 together with the problem's; factor graph. Figure 5 shows a pictorial representation of the flow of messages (15) and (16).
|
|
Along the lines of Frey and Dueck (2007a and b), but bearing in mind the modified form for the compatibility constraints, Equation (15) can be simplified after a few manipulations in the following way, depending on cases:
|
| (17) |
It is remarkable that the number of effectively independent quantities present in Equation (15) is much smaller than the apparent O(N3) real valued numbers. Indeed, functional messages A take only two different values:
, that from now on will be called A(
,µ) to avoid index redundancy, and
independently on c, as long as it is
µ. The exchange of indexes in A(µ,
) is pure convention, but it has been introduced for coherency with the definition of availabilities given in Frey and Dueck, (2007a). It follows immediately from the normalization condition that
.
For the cavity probability functions, manipulation of Equation (16) involving the use of the last normalization condition and of the normalization constant rescaling, leads to
|
|
|
| (18) |
|
| (19) |
|
|
,µ) and R(µ,
) alone. Indeed, the effective dependence on the exemplar choice is dropped, and the computational size of the problem reduces by a factor N. The set of equations of A and R can be solved iteratively via the BP algorithm. The case in which one is interested not in the whole form of the posterior probability function, but only in retaining information about the most probable exemplar chosen by each data point, can be seen to be equivalent to taking the |
| (20) |
|
| (21) |
|
| (22) |
|
| (23) |
|
| (24) |
|
| (25) |
|
| (26) |
|
| (27) |
|
| (28) |
|
| (29) |
leading to SCAP Equations (5).1 After convergence, marginals can be written (Frey and Dueck, 2007a; Yedidia, 2005) as
|
| (30) |
| 5 DISCUSSION |
|---|
|
|
|---|
Affinity Propagation is a new powerful tool for unsupervised clustering. It has many very strong points. First it is very efficient, convergence to the final clustering is very fast, the latter appears to be independent on the initialization of messages. Second, due to its hard constraints AP identifies exemplars which are prototypical data points representing a whole cluster.
This last point is, however, also a first limitation of the original AP algorithm. If clusters cannot be well represented by a single cluster exemplar, AP has to fail. The hard constraint renders the algorithm greedy, and small fluctuations in the similarity measure may trigger avalanches in the exemplar choice leading to different clusterings for only slightly modified model parameters.
We have introduced a soft-constraint version of affinity propagation which is able to cure a part of these problems without loosing the efficiency of the original AP:
- By relaxing the hard constraint on clusters exemplars, we could introduce a parameter (
) controlling the algorithm greediness.
is a better tuning parameter than
(it is more informative and leads to more robust and stable clustering) and it is easier to interpret the statistical meaning of its tuning process.
- Clusters are more robust than in the original formulation of the algorithm. Moreover, even though a second a priori free parameter is introduced, the overall dependence of the algorithm on free parameters is reduced, and an optimal tuning strategy naturally emerges.
- The cluster structure can be efficiently probed. This concerns the internal structure of the clusters since SCAP is able to identify central and peripherical nodes of each clusters, as well as the hierarchical organization leading to a process of cluster merging if cluster number is reduced by looking to less fine structures.
- In the case of high-dimensional data, the relation between data points and their exemplars can be used to extract a sparse cluster signature. In the case of brain tumors, we have found that 20–40 genes per cluster are sufficient to reproduce almost the same clustering as found using all genes.
We conclude that SCAP is more efficient than AP in particular in the case of noisy, irregularly organized data—and thus in biological applications concerning microarray data. The computational efficiency of SCAP allows us to treat also very large data sets.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
We acknowledge very useful discussions with Alfredo Braunstein, Andrea Pagnani and Riccardo Zecchina. M.L. would like to thank the Malawi Polytechnic for hospitality during the preparation of the manuscript. The work of S. and M.W. was supported by the EC via the STREP GENNETEC (Genetic networks: emergence and complexity).
Conflict of Interest: none declared.
| FOOTNOTES |
|---|
Associate Editor: Trey Ideker
1 Note that the last change of variable via the subtraction of the overall quantity maxµ
is redundant if self-responsibilities are negative, as it is usually the case. ![]()
Received on May 24, 2007; revised on July 25, 2007; accepted on August 9, 2007
| REFERENCES |
|---|
|
|
|---|
Alizadeh A, et al. Distinct types of diffuse large B-cell-lymphoma identified by gene expression profiling. Nature (2000) 403:503–511.[CrossRef][Medline]
Blatt M, et al. Super-paramagnetic clustering of data. Phys. Rev. Lett (1996) 76:3251.[CrossRef][Web of Science][Medline]
Braunstein A, Zecchina R. Learning by message passing in networks of discrete synapses. Phys. Rev. Lett (2006) 96:030201.[CrossRef][Medline]
Dettling M. BagBoosting for tumor classification with gene expression data. Bioinformatics (2004) 20:3583–3593.
Duda RO, Hart PE. Pattern Classification and Scene Analysis (1973) New York: Wiley.
Frey JF, Dueck D. Clustering by passing messages between data points. Science (2007a) 315:972–976.
Frey JF, Dueck D. Mixture modeling by affinity propagation. (2007b) Freely available at books.nips.cc/papers/files/nips18/NIPS2005_0799.pdf.
Golub T, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999) 286:531–537.
Hartmann AK, Weigt M. Phase Transitions in Combinatorial Optimization Problems (2005) Berlin: Wiley-VCH.
Jain AK, et al. Data clustering: a review. ACM Comput. Surv (1999) 31:264–323.[CrossRef]
Kabashima Y. A CDMA multiuser detection algorithm on the basis of belief propagation. J. Phys. A (2003) 36:11111–11121.[CrossRef]
Khan J, et al. Classification and diagnostic prediction of cancer using gene expression profiling and artificial neural networks. Nat. Med (2001) 6:673–679.[CrossRef][Web of Science]
Kschischang FR, et al. Factor graphs and the sum-product algorithm. IEEE Trans. Inform. Theory (2001) 47:1.
MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations. Le Cam LM, Neyman J, eds. (1967) Vol. 1. Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley CA: University of California Press. 281–297.
Mézard M, et al. Analytic and algorithmic solution of random satisfiability problems. Science (2002) 297:812.
Pomeroy S, et al. Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature (2002) 415:436–442.[CrossRef][Medline]
Shental N, et al. Pairwise clustering and graphical models. (2003) In 17th International Conference on Neural Information Processing Systems - NIPS. Canada: Vancouver.
Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell (2000) 22:888–905.[CrossRef]
Yedidia JS, et al. Belief propagation and generalizations. IEEE Trans. Inform. Theory (2005) 51:2282.[CrossRef]
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



–





