[Update 1] Louvain-Performance: Weights of Nonexistent Edges
Earlier, I wrote a post on how the performance metric on weighted graphs uses a meaningful maximum M to account for the weights of nonexistent edges. At that time, my challenges were (1) figuring out how to calculate M, and (2) changing my code to account for M.
(1) After more review of the literature and of Chertov 2012’s implementation of performance on weighted graphs, I conclude that M differs depending on the data set. That is, it is a semantically meaningful maximum for that data set, chosen by the researchers. For example, in a data set where edge weights represent the probability of choosing that edge, the meaningful maximum would be 1: all edge weights fall between 0 and 1, so M = 1 is a genuine upper bound.
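To make the role of M concrete, here is a minimal sketch of a weighted performance computation. It assumes one common generalization: intra-community edges are credited their weight, inter-community non-edges are credited M, and the total is normalized by M * n(n-1)/2. This is a plausible reading, not necessarily Chertov 2012’s exact formula, and the edge representation (a dict keyed by node pairs) is my own choice:

```python
from itertools import combinations

def weighted_performance(nodes, weights, community, M):
    """Weighted performance of a partition.

    nodes: iterable of node ids.
    weights: dict mapping frozenset({u, v}) -> edge weight (absent = no edge).
    community: dict mapping node -> community id.
    M: the meaningful maximum edge weight for this data set.
    """
    score = 0.0
    pairs = 0
    for u, v in combinations(list(nodes), 2):
        pairs += 1
        w = weights.get(frozenset((u, v)))
        if community[u] == community[v]:
            if w is not None:
                score += w   # intra-community edge: credit its weight
        elif w is None:
            score += M       # inter-community non-edge: credit the maximum
    return score / (M * pairs)

# With probability weights, M = 1 is the natural choice:
weights = {frozenset(("a", "b")): 0.9, frozenset(("c", "d")): 0.8,
           frozenset(("b", "c")): 0.1}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
print(weighted_performance("abcd", weights, community, M=1.0))  # ~0.783
```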
The probability example is a tidy one. Unfortunately, choosing M for graphs whose weight range is large and contains high outliers presents a harder problem: an M large enough to bound the outliers dwarfs the typical edge weight, so every correctly classified edge contributes little relative to M, skewing performance downward. I was unable to find examples of how others dealt with such data sets, partly because performance is so rarely used as a metric. At this point, my suggestion is to consider removing the outlier edges, or altering their weights (manually or algorithmically), as in the sketch below.
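For illustration, here is one way outlier weights might be tamed before choosing M: clip every weight at a high percentile. The 99th-percentile cutoff is an arbitrary choice of mine, not something drawn from the literature, and whether clipping, removal, or manual adjustment is appropriate depends on what the weights mean:

```python
import math

def clip_weights(weights, percentile=99.0):
    """Clip every edge weight at the given percentile (nearest-rank method).

    weights: non-empty dict of edge -> weight.
    Returns (clipped_weights, cap); the cap is then a defensible M.
    """
    values = sorted(weights.values())
    k = max(0, math.ceil(percentile / 100.0 * len(values)) - 1)
    cap = values[k]
    clipped = {e: min(w, cap) for e, w in weights.items()}
    return clipped, cap
```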
In summary, because the meaningful maximum depends so much on the intricacies of the input data, we will not have the algorithm calculate M. Instead, we take M as a command-line parameter.
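For example, with Python’s argparse the parameter might be wired in as follows; the flag name --max-weight and the surrounding interface are hypothetical, meant only to illustrate the approach, not my actual driver script:

```python
import argparse

parser = argparse.ArgumentParser(description="Louvain with weighted performance")
parser.add_argument("graph_file", help="path to the input edge list")
parser.add_argument("--max-weight", type=float, required=True,
                    help="meaningful maximum M for this data set's edge weights")
args = parser.parse_args()

M = args.max_weight  # threaded through to the performance computation
```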
(2) I have altered my code to take M as a parameter and to use it in the first pass of the algorithm. We now face another question: how do we update M in subsequent passes of the Louvain algorithm? Every pass of Louvain creates a new, condensed graph with different edge weights, due to new self-loops and to edges whose weights are the sums of the inter-community edge weights from the earlier graph.
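To make the problem concrete, here is a sketch of the condensation step, using the same dict-of-edge-weights representation as above (this is a generic rendering of the step, not my code verbatim): each community becomes a node, parallel inter-community edges have their weights summed, and intra-community weight accumulates on a self-loop. That summing is exactly why the condensed graph’s weights can exceed the original M.

```python
from collections import defaultdict

def condense(weights, community):
    """One Louvain condensation step.

    weights: dict frozenset({u, v}) -> weight on the current graph
             (a self-loop on u is keyed by frozenset({u})).
    community: dict mapping node -> community id.
    Returns the condensed graph's weights, keyed the same way.
    """
    new_weights = defaultdict(float)
    for edge, w in weights.items():
        members = list(edge)
        u = members[0]
        v = members[1] if len(members) == 2 else u  # unpack self-loops
        key = frozenset((community[u], community[v]))
        new_weights[key] += w  # summed weights can exceed the original M
    return dict(new_weights)
```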