[Update 2] Louvain-Performance and Self-Loops

Since I decided not to modify the performance metric to account for self-loops, we are left with a Louvain-performance algorithm that always arrives at a very unhelpful “best” clustering: put all of the nodes in the same community. Luckily, the nature of the Louvain method is to iterate in “passes”, creating a new clustering at each pass. While the ultimate conclusion of Louvain-performance is not helpful, the intermediate clusterings may be. While Louvain isn’t making the best clustering decisions at each step, as it can’t account for self-loops, it is still making legitimate clustering decisions based on existing inter-community and intra-community edges.

To account for the intermediate clusterings, the Louvain-performance-intermediate algorithm is:

  • Let Louvain-performance run to the end, ignoring self-loops.
    • At every “pass,” remember the clustering.
  • Calculate the performance score associated with each “unfurled” intermediate clustering.
  • Return, in terms of the input graph, the best intermediate clustering.

For example:

  • Run Louvain-performance.
    • Consider starting, at pass 1, with a network G’ of 5 nodes and 10 edges. The algorithm chooses some best clustering C’, that has 3 communities.
    • We condense the network, and pass 2 runs on a network G” of 3 nodes and 5 edges. The algorithm chooses some clustering C”, with 2 communities.
    • We condense again, and pass 3 runs on a network G”’ of 2 nodes and 3 edges. The algorithm chooses a clustering C”’, putting both nodes into the same community.
    • We condense for a final time, into G””: 1 node and no inter-node edges.
  • Calculate performance for intermediate clusterings.
    • Calculate the performance of G’ given clustering C’.
    • Expand G” back into the network G’, using the information in C’. Now calculate the performance of G’ (not G”!) given clustering C” (not C’!).
    • Expand G”’ into the network G’, using information from G” (post expansion) and C”. Calculate the performance of G’ given clustering C”’.
    • Expand G”’ into G’. Calculate the performance of G’ given clustering C””.
  • Return the intermediate clustering with the best performance. For example, suppose Pass 3’s clustering C”’ of G‘ is the best: return C”’.

Essentially, Louvain-performance-intermediate cleans up the overzealousness of the Louvain-performance algorithm by finding the pass at which evaluation should have stopped. In later analysis, we can compare Louvain-other metrics with both Louvain-performance and Louvain-performance-intermediate. My predictions are:

  • Louvain-performance is basically useless, as it will always recommend putting all the nodes into a single community.
  • Louvain-performance-intermediate is potentially useful, but requires additional computation and probably won’t output better clusterings than Louvain-modularity, given that it can’t account for self-loops at each intermediate step.