[Update 2] Louvain-Performance and Self-Loops

January 14, 2016 · by Christina Tong · in News

Since I decided not to modify the performance metric to account for self-loops, we are left with a Louvain-performance algorithm that always arrives at a very unhelpful “best” clustering: put all of the nodes in the same community. Luckily, the nature of the Louvain method is to iterate in “passes”, creating a new clustering at each pass. While the ultimate conclusion of Louvain-performance is not helpful, the intermediate clusterings may be. While Louvain isn’t making the best clustering decisions at each step, as it can’t account for self-loops, it is still making legitimate clustering decisions based on existing inter-community and intra-community edges.

To account for the intermediate clusterings, the Louvain-performance-intermediate algorithm is:

Let Louvain-performance run to the end, ignoring self-loops.
- At every “pass,” remember the clustering.
Calculate the performance score associated with each “unfurled” intermediate clustering.
Return, in terms of the input graph, the best intermediate clustering.

For example:

Run Louvain-performance.
- Consider starting, at pass 1, with a network G’ of 5 nodes and 10 edges. The algorithm chooses some best clustering C’, that has 3 communities.
- We condense the network, and pass 2 runs on a network G” of 3 nodes and 5 edges. The algorithm chooses some clustering C”, with 2 communities.
- We condense again, and pass 3 runs on a network G”’ of 2 nodes and 3 edges. The algorithm chooses a clustering C”’, putting both nodes into the same community.
- We condense for a final time, into G””: 1 node and no inter-node edges.
Calculate performance for intermediate clusterings.
- Calculate the performance of G’ given clustering C’.
- Expand G” back into the network G’, using the information in C’. Now calculate the performance of G’ (not G”!) given clustering C” (not C’!).
- Expand G”’ into the network G’, using information from G” (post expansion) and C”. Calculate the performance of G’ given clustering C”’.
- Expand G”’ into G’. Calculate the performance of G’ given clustering C””.
Return the intermediate clustering with the best performance. For example, suppose Pass 3’s clustering C”’ of G‘ is the best: return C”’.

Essentially, Louvain-performance-intermediate cleans up the overzealousness of the Louvain-performance algorithm by finding the pass at which evaluation should have stopped. In later analysis, we can compare Louvain-other metrics with both Louvain-performance and Louvain-performance-intermediate. My predictions are:

Louvain-performance is basically useless, as it will always recommend putting all the nodes into a single community.
Louvain-performance-intermediate is potentially useful, but requires additional computation and probably won’t output better clusterings than Louvain-modularity, given that it can’t account for self-loops at each intermediate step.

Tags: Louvain, performance, self-loops