[Update 2] Louvain-Performance and Self-Loops
Since I decided not to modify the performance metric to account for self-loops, we are left with a Louvain-performance algorithm that always arrives at a very unhelpful “best” clustering: put all of the nodes in the same community. Luckily, the nature of the Louvain method is to iterate in “passes”, creating a new clustering at each pass. While the ultimate conclusion of Louvain-performance is not helpful, the intermediate clusterings may be. While Louvain isn’t making the best clustering decisions at each step, as it can’t account for self-loops, it is still making legitimate clustering decisions based on existing inter-community and intra-community edges.
To account for the intermediate clusterings, the Louvain-performance-intermediate algorithm is:
- Let Louvain-performance run to the end, ignoring self-loops.
- At every “pass,” remember the clustering.
- Calculate the performance score associated with each “unfurled” intermediate clustering.
- Return, in terms of the input graph, the best intermediate clustering.
For example:
- Run Louvain-performance.
- Consider starting, at pass 1, with a network G’ of 5 nodes and 10 edges. The algorithm chooses some best clustering C’, that has 3 communities.
- We condense the network, and pass 2 runs on a network G” of 3 nodes and 5 edges. The algorithm chooses some clustering C”, with 2 communities.
- We condense again, and pass 3 runs on a network G”’ of 2 nodes and 3 edges. The algorithm chooses a clustering C”’, putting both nodes into the same community.
- We condense for a final time, into G””: 1 node and no inter-node edges.
- Calculate performance for intermediate clusterings.
- Calculate the performance of G’ given clustering C’.
- Expand G” back into the network G’, using the information in C’. Now calculate the performance of G’ (not G”!) given clustering C” (not C’!).
- Expand G”’ into the network G’, using information from G” (post expansion) and C”. Calculate the performance of G’ given clustering C”’.
- Expand G”’ into G’. Calculate the performance of G’ given clustering C””.
- Return the intermediate clustering with the best performance. For example, suppose Pass 3’s clustering C”’ of G‘ is the best: return C”’.
Essentially, Louvain-performance-intermediate cleans up the overzealousness of the Louvain-performance algorithm by finding the pass at which evaluation should have stopped. In later analysis, we can compare Louvain-other metrics with both Louvain-performance and Louvain-performance-intermediate. My predictions are:
- Louvain-performance is basically useless, as it will always recommend putting all the nodes into a single community.
- Louvain-performance-intermediate is potentially useful, but requires additional computation and probably won’t output better clusterings than Louvain-modularity, given that it can’t account for self-loops at each intermediate step.