使用Affinity Propagation进行聚类

2014-09-21

Affinity Propagation，可以翻译为近邻传播。

它的思路是通过迭代的方法计算出一个数据点更适合和另外哪一个数据点属于一个簇。

AP的论文：Clustering by Passing Messages Between Data Points。

http://www.psi.toronto.edu/index.php?q=affinity%20propagation提供了相关的论文、数据集、源代码。

近邻传播聚类Affinity Propagation是一篇少有的关于AP的中文博文。

http://scikit-learn.org/stable/modules/clustering.html#affinity-propagation中如此介绍AP：

Algorithm description: The messages sent between points belong to one of two categories. The first is the responsibility [latex]r(i, k)[/latex], which is the accumulated evidence that sample k should be the exemplar for sample i. The second is the availability [latex]a(i, k)[/latex] which is the accumulated evidence that sample [latex]i[/latex] should choose sample [latex]k[/latex] to be its exemplar, and considers the values for all other samples that [latex]k[/latex] should be an exemplar. In this way, exemplars are chosen by samples if they are (1) similar enough to many samples and (2) chosen by many samples to be representative of themselves.

More formally, the responsibility of a sample [latex]k[/latex] to be the exemplar of sample [latex]i[/latex] is given by:

[latex]
r(i, k) \gets s(i,k) - max_{k' \neq k}[a(i+k') + s(i+k')]
[/latex]

Where [latex]s(i, k)[/latex] is the similarity between samples [latex]i[/latex] and [latex]k[/latex]. The availability of sample [latex]k[/latex] to be the exemplar of sample [latex]i[/latex] is given by:

[latex]
a(i, k) \gets min[0, r(k,k)+\sum_{i' \neq i, i' \neq k} r(i', k)]
[/latex]

To begin with, all values for [latex]r[/latex] and [latex]a[/latex] are set to zero, and the calculation of each iterates until convergence.

http://www.biomedcentral.com/1471-2105/10/99中如此介绍AP：

Affinity Propagation (AP) identifies cluster centers, or exemplars, from the graph, which in some sense are a representative member of the cluster. Initially, all nodes are considered as exemplars, though each node is manually assigned a "preference" that it should be chosen as an exemplar. If no prior knowledge is available on which nodes should be favored as exemplars, then all nodes can be assigned the same preference value – where the magnitude can be used to control cluster granularity. For each node i and each candidate exemplar k, AP computes the "responsibility" [latex]r(i, k)[/latex], which indicates how well suited [latex]k[/latex] is as an exemplar for i, and the "availability" [latex]a(i, k)[/latex] reflecting the evidence that [latex]i[/latex] should choose [latex]k[/latex] as an exemplar.

[latex]
r(i, k) \gets s(i,k) - max_{k' \neq k}[a(i+k') + s(i+k')]
[/latex]

[latex]
a(i, k) \gets min[0, r(k,k)+\sum_{i' \neq i, i' \neq k} r(i', k)]
[/latex]

Where the matrix [latex]s(i, k)[/latex] denotes the similarity (eg. edge weight) between the two nodes [latex]i[/latex] and [latex]k[/latex], and the diagonal of this matrix contains the preferences for each node. The above two equations are iterated until a good set of exemplars emerges. Each node [latex]i[/latex] can then be assigned to the exemplar [latex]k[/latex] which maximizes the sum [latex]a(i, k) + r(i, k)[/latex], and if [latex]i = k[/latex], then [latex]i[/latex] is an exemplar. A damping factor between 0 and 1 is used to control for numerical oscillations.

不过上面少介绍了[latex]a(k,k)[/latex]：

[latex]
a(k,k) \gets \sum_{i' \neq k} max(0, r(i', k)) [/latex]