Update on Testing Direction
We are currently focused on testing our modified algorithms on networks that have ground truths. In this way, we can compare the quality of the algorithms’ clustering outputs against the ground truth, and objectively determine which algorithms performed better in which situations.
To compare clustering outputs against ground truths, we use normalized mutual information [1], the Meila index [2], the Rand index [3],split-join distance [4], and the adjusted Rand index [5]. Initial tests show inconsistent results; the modified algorithms perform well (better than Louvain-modularity and CNM-modularity) on some graphs, but considerably worse on others.
One of the graphs we’ve used for testing is Zachary’s karate club network. It is an undirected, unweighted graph of 34 nodes and 78 edges, and models the network of friendships between members of a karate club [6]. This is the ground truth of Zachary’s karate club; the nodes are partitioned into two groups centered around two people (the instructor, at node 1, and the club president, at node 34).
In this test case, across all the aforementioned clustering comparison metrics, implementations of algorithms using coverage outperformed the implementations using modularity. However, the implementations using performance and silhouette did markedly worse. In the visualization of the clustering outputs below, one can actually see an improvement between the modularity implementations and the coverage implementations.
Here is the clustering by Louvain-modularity; it identified 4 communities:
Here is the clustering by CNM-modularity; it has found 3 communities:
Here is the clustering by both LM-coverage and CNM-coverage (they found the same clustering), which has 3 communities and is “less different” than the ground truth than the two modularity outputs above:
- L. Danon, A. Diaz-Guilera, J. Duch, & A. Arenas. “ Comparing community structure identification”. Journal of Statistical Mechanics: Theory and Experiment, vol 2005, P09008, 2005(09).
- M. Meilă. “Comparing clusterings by the variation of information”. Learning theory and kernel machines, 173-187., Springer Berlin Heidelberg, 2003.
- M.W. Rand. “Objective criteria for the evaluation of clustering methods”. Journal of the American Statistical association, 66(336), 846-850, 1971.
- S. Dongen. “Performance criteria for graph clustering and Markov cluster experiments”. Technical Report INSR0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.
- S. Wagner, & D. Wagner. “Comparing clusterings: an overview”. Karlsruhe: Universität Karlsruhe, Fakultät für Informatik, 2007.
- W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).