Content-Based Roles

Will Holcomb

18 June 2014

I'm working through the following paper, and I want to tease out some of the math and step through it to understand it better.

The idea is that instead of looking at the graph of social connections to evaluate someone, you describe the content of their interactions and then cluster it.

The axes along which content is described are:

The vectors that result from concatenating these features are then clustered using fuzzy c-means clustering. "Fuzzy" means that each vector belongs to every cluster, to a varying degree, rather than to exactly one.

A set of feature vectors for \( n \) users, \( X = \{ x_1, x_2, \ldots, x_n \} \), where each \( x_i = ( PE, BE, AS, AF, RE ) \), is grouped into \( c \) roles, \( \tilde{F} = \{ \tilde{F_1}, \tilde{F_2}, \ldots, \tilde{F_c} \} \), by minimizing:

\begin{equation} J_m =\sum\limits_{j=1}^{n} \sum\limits_{i=1}^{c} (\mu_{ij})^m D(x_j, c_i) \end{equation}

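Where \( \mu_{ij} \) is the degree of membership of user vector \( x_j \) in cluster \( i \), \( c_i \) is the center of cluster \( i \), \( D \) is a distance measure between a user vector and a cluster center, and \( m \) is an exponent greater than 1 that controls how fuzzy the memberships are.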
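To make the objective concrete, here is a minimal Python sketch that evaluates \( J_m \) for a toy example. It assumes \( D \) is the squared Euclidean distance and \( m = 2 \); neither choice is stated in these notes, and the function names and sample numbers are made up for illustration.

```python
def squared_distance(x, c):
    """Squared Euclidean distance; an assumed choice for D."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def objective(X, C, U, m=2.0):
    """J_m = sum over users j and clusters i of (U[i][j] ** m) * D(X[j], C[i])."""
    return sum(
        (U[i][j] ** m) * squared_distance(X[j], C[i])
        for j in range(len(X))
        for i in range(len(C))
    )


# Two toy 2-D "users" and two cluster centers (made-up numbers).
X = [(0.0, 0.0), (1.0, 1.0)]
C = [(0.0, 0.0), (1.0, 1.0)]

# U[i][j] is the membership of user j in cluster i; each column sums to 1.
U_fuzzy = [[0.5, 0.5],
           [0.5, 0.5]]
U_crisp = [[1.0, 0.0],
           [0.0, 1.0]]

J_fuzzy = objective(X, C, U_fuzzy)  # every user half in each cluster
J_crisp = objective(X, C, U_crisp)  # every user fully in its nearest cluster
```

With these numbers the evenly split memberships give a larger \( J_m \) than the crisp assignment, which is the sense in which minimizing \( J_m \) pulls memberships toward nearby centers.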

The process is:

  1. Pick \( c \) random cluster centers: \( C \)
  2. Calculate the membership of each user vector, \( x_j \), in each cluster, \( i \): \( \mu_{ij} \)
  3. Find the centroids of the clusters, \( C^\prime \)
  4. Compare \( J_m \) and \( J_m^\prime \); if the difference is greater than \( \epsilon \), repeat from step 2 using \( C^\prime \) as the centers
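The steps above can be sketched in plain Python. This is an illustration under several assumptions, not the paper's implementation: \( D \) is taken to be the squared Euclidean distance, the membership and centroid updates are the standard fuzzy c-means formulas (they are not spelled out in these notes), the initial centers are drawn from the data points, and every function name as well as the `max_iter` and `seed` parameters is hypothetical.

```python
import random


def sq_dist(x, c):
    # Assumed D: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(x, c))


def memberships(X, C, m):
    """U[i][j]: membership of user vector j in cluster i (standard FCM update)."""
    U = [[0.0] * len(X) for _ in C]
    for j, x in enumerate(X):
        d = [sq_dist(x, c) for c in C]
        zeros = [i for i, v in enumerate(d) if v == 0.0]
        if zeros:  # x lies exactly on a center: split membership among those centers
            for i in zeros:
                U[i][j] = 1.0 / len(zeros)
            continue
        for i in range(len(C)):
            U[i][j] = 1.0 / sum(
                (d[i] / d[k]) ** (1.0 / (m - 1.0)) for k in range(len(C))
            )
    return U


def centroids(X, U, m):
    """Weighted means of the user vectors, with weights U[i][j] ** m."""
    dims = len(X[0])
    C = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        C.append(tuple(
            sum(wj * x[k] for wj, x in zip(w, X)) / total for k in range(dims)
        ))
    return C


def objective(X, C, U, m):
    return sum(
        U[i][j] ** m * sq_dist(X[j], C[i])
        for j in range(len(X))
        for i in range(len(C))
    )


def fuzzy_c_means(X, c, m=2.0, eps=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    C = rng.sample(X, c)           # step 1: random centers (assumed: c data points)
    U = memberships(X, C, m)       # step 2: memberships for the current centers
    J = objective(X, C, U, m)
    for _ in range(max_iter):
        C = centroids(X, U, m)     # step 3: new centers C'
        U = memberships(X, C, m)
        J_new = objective(X, C, U, m)
        if abs(J - J_new) <= eps:  # step 4: stop once J_m has settled
            break
        J = J_new
    return C, U


# Toy data: two well-separated groups of 2-D points (made-up numbers).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
C, U = fuzzy_c_means(X, c=2)
```

Each column of `U` sums to 1, so a user's memberships can be read as how strongly that user matches each role. The `max_iter` cap is only a safety net in case \( J_m \) keeps changing by more than \( \epsilon \).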

The input set of messages is partitioned by time, and for each partition a set of membership vectors can be generated, one per user: \( m_j = \{ \mu_{1j}, \mu_{2j}, \ldots, \mu_{cj} \} \). These are then quantized to vectors \( \{ (r_1,q(\mu_{1j})), (r_2,q(\mu_{2j})), \ldots, (r_c,q(\mu_{cj})) \} \) according to the following rule:

\begin{equation} q(x) = \begin{cases} L;& 0 \leq x < 0.25\\ M;& 0.25 \leq x < 0.75\\ H;& 0.75 \leq x \leq 1 \end{cases} \end{equation}
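A minimal Python sketch of this quantization step follows. The function names and the default role labels `r1`, `r2`, … are hypothetical, and a value that lands exactly on 0.25 or 0.75 is assigned to the higher level here, which is an assumption about how the boundaries should be read.

```python
def quantize(mu):
    """Map a membership value in [0, 1] to 'L', 'M', or 'H'."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership must lie in [0, 1]")
    if mu < 0.25:
        return "L"
    if mu < 0.75:
        return "M"
    return "H"


def quantize_memberships(m_j, roles=None):
    """Pair each role label r_i with the quantized membership q(mu_ij)."""
    if roles is None:
        roles = [f"r{i + 1}" for i in range(len(m_j))]
    return [(r, quantize(mu)) for r, mu in zip(roles, m_j)]


# One user's membership vector over three roles (made-up numbers).
example = quantize_memberships([0.1, 0.6, 0.3])
# example == [("r1", "L"), ("r2", "M"), ("r3", "M")]
```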