More Ground-Truth Testing
I just completed additional testing on the ground-truth graphs. Specifically, I’ve calculated the metric scores on output graphs in order to determine if the Louvain-{x metric} variant of the algorithm actually outputs the clustering with the best x metric score. For example, we’d want to know if for some reason Louvain-silhouette created a clustering that has a better modularity score than Louvain-modularity. The calculator is called MetricScoreCalculator.java.
The main issue I had when creating the calculator came from computing the silhouette index score of output communities.
The method that writes the clustering to the .graph file is called writeOutputFile() in ModularityOptimizer.java. Before it writes the clustering out, it orders the clusters by decreasing size. For example, the following is the original clustering found by the Louvain method optimizing silhouette index:
cluster 0 contains 14 nodes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13
cluster 1 contains 1 node: 14
cluster 2 contains 1 node: 15
cluster 3 contains 1 node: 16
cluster 4 contains 1 node: 17
cluster 5 contains 1 node: 21
cluster 6 contains 15 nodes: 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, and 33
And the following is the clustering when the clusters are reordered:
cluster 0 contains 15 nodes: 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, and 33
cluster 1 contains 14 nodes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and 13
cluster 2 contains 1 node: 14
cluster 3 contains 1 node: 15
cluster 4 contains 1 node: 16
cluster 5 contains 1 node: 17
cluster 6 contains 1 node: 21
PROBLEM
——————–
When computing the metric scores of output communities using MetricScoreCalculator.java, the scores for coverage, modularity, and performance are not affected by this reordering. However, the score for silhouette index is affected by this reordering. For example, the following are the SI scores for the karate network before and after the reordering:
before reordering: SI = 0.3556
after reordering: SI = 0.0237
This is due to the adjacency matrix. In each pass of the algorithm, the adjacency matrix is condensed such that the clusters now represent nodes. Because the clusters are reordered, this affects the silhouette index score calculation.
SOLUTION
——————–
Because the reordered clusters are used in the analysis, I’ve changed the recorded silhouette index scores to reflect the reordering. I’ve also modified the louvain algorithm so that if silhouette index is optimized, the SI score is automatically updated after the reordering.