Louvain-Performance: Weights of Nonexistent Edges
One of the tricky things about calculating performance is that it factors in nonexistent edges between vertices in different communities. The more of these edges exist, the higher the overall performance score, because they count as correctly-interpreted edges. That is, since the two vertices are not directly connected, we were “correct” to put them in separate communities.
For unweighted graphs, each nonexistent edge has weight 1 and simply contributes 1 towards the numerator in the performance formula. For weighted graphs, it’s a fair deal more complicated. One method is to assume that non-existing edges have maximum weight. However, this neglects the weight of the intercluster edges. Instead, we use a scaling parameter M (detailed in Gaertler 2005). I am currently working on figuring out how to calculate M, and how to change the parts of my code that calculate performance to use M.
As a side note, even if the original input graph is unweighted, the Louvain algorithm’s implementation ensures that after the first pass, the algorithm operates on weighted graphs. This is because in the second phase of the i‘th pass, a new graph is formed, in which each community is condensed into a single vertex for the (i+1)’th pass. In this condensing, intra-community edges (e_1, e_2, …, e_k) become a weighted self loop e, where weight(e) = weight(e_1) + weight (e_2) + … + weight(e_k). The weights of inter-community edges (i_1, i_2, …, i_k) between some separate communities a and b become a single weighted edge (a,b), where weight(a,b) = weight(i_1) + weight (i_2) + … + weight(i_k).