Running the Data Set
Our data set consists of 52 networks. The networks range in size from 34 vertices and 78 edges (Zachary’s karate club network) to 4,847,571 vertices and 68,993,773 edges (LiveJournal social network). We ran two tests on our data set: metric valuation and comparison.
Metric Valuation:
We ran each network in the data set through our algorithm variations to compute a community graph and its corresponding metric score. Recall that we have six algorithm variations: Louvain with coverage optimization, Louvain with modularity optimization, Louvain with performance optimization, Louvain with silhouette index optimization, CNM with coverage optimization, and CNM with modularity optimization.
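To make the metric scores concrete, here is a minimal Python sketch of three of the metrics named above (coverage, modularity, and performance), assuming an undirected, unweighted graph without self-loops or duplicate edges. It is an illustration rather than our actual implementation; the function name `partition_metrics` and the toy graph (two triangles joined by one edge) are made up for this example, and the silhouette index is left out.

```python
from collections import Counter

def partition_metrics(edges, community_of):
    """Return (coverage, modularity, performance) for a partition.

    edges: list of (u, v) pairs of an undirected simple graph.
    community_of: dict mapping every vertex to its community label.
    """
    m = len(edges)
    n = len(community_of)

    intra_edges = Counter()   # edges with both endpoints in the community
    degree_sum = Counter()    # sum of vertex degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1

    intra = sum(intra_edges.values())

    # Coverage: fraction of edges that fall inside a community.
    coverage = intra / m

    # Modularity: sum over communities of L_c/m - (d_c / 2m)^2.
    modularity = sum(
        intra_edges[c] / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )

    # Performance: correctly classified vertex pairs over all pairs, i.e.
    # intra-community edges plus inter-community non-edges.
    sizes = Counter(community_of.values())
    total_pairs = n * (n - 1) // 2
    same_pairs = sum(s * (s - 1) // 2 for s in sizes.values())
    inter_non_edges = (total_pairs - same_pairs) - (m - intra)
    performance = (intra + inter_non_edges) / total_pairs

    return coverage, modularity, performance

# Hypothetical toy input: two triangles connected by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
COMMUNITY_OF = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

print(partition_metrics(EDGES, COMMUNITY_OF))
```

On this toy graph the only edge that counts against the partition is the bridge, so coverage is 6/7, modularity is 5/14, and performance is 14/15.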
Some issues we encountered:
- Input network is too large to process
  - Solution: Increase the Java heap space (for example, with the JVM’s -Xmx option).
- Input network is still too large after increasing the heap
  - Solution: None yet, but we’re looking into running these networks on external servers.
- Multiple component networks
  - Solution: The CNM algorithm does not accept networks with more than one connected component, so we removed these networks from our test suite.
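The multiple-component check can be sketched as a breadth-first search that counts connected components. This is a Python illustration under the assumption that a network is given as a vertex list plus an undirected edge list; `count_components` and `is_single_component` are hypothetical helper names, not part of our code base.

```python
from collections import defaultdict, deque

def count_components(vertices, edges):
    """Count connected components of an undirected graph with BFS."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = set()
    components = 0
    for start in vertices:
        if start in seen:
            continue
        # Each unseen start vertex begins a new component.
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components

def is_single_component(vertices, edges):
    """True if the network could be kept in a single-component test suite."""
    return count_components(vertices, edges) == 1

# Hypothetical inputs: a path (connected) and two disjoint edges (not connected).
print(is_single_component([0, 1, 2], [(0, 1), (1, 2)]))
print(is_single_component([0, 1, 2, 3], [(0, 1), (2, 3)]))
```

An alternative to dropping such networks would be to keep only the largest component, but that changes the input and is not what we did here.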
Comparison:
The community graphs that we computed in the previous test are compared against each other. We have already run the comparison test on the networks that have a ground truth; for those, we compare each output against the ground truth (see Christina’s post for more information). For the networks without a ground truth, we compare the outputs for a single network against each other.
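As a rough picture of the pairwise comparison for networks without a ground truth, the Python sketch below compares every pair of outputs for one network. This post does not say which similarity measure is used, so the Rand index here is only a stand-in chosen for illustration, and the names `rand_index`, `compare_all`, and the output labels are hypothetical. The pair loop is quadratic in the number of vertices, so it is only practical for small networks.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of vertex pairs on which two partitions agree.

    Two partitions agree on a pair if both put the two vertices in the
    same community, or both put them in different communities.
    Both dicts must cover the same set of at least two vertices.
    """
    agree = 0
    total = 0
    for u, v in combinations(sorted(labels_a), 2):
        same_a = labels_a[u] == labels_a[v]
        same_b = labels_b[u] == labels_b[v]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total

def compare_all(outputs):
    """Score every unordered pair of outputs computed for one network."""
    return {
        (a, b): rand_index(outputs[a], outputs[b])
        for a, b in combinations(sorted(outputs), 2)
    }

# Hypothetical outputs of two algorithm variations on a 4-vertex network.
OUTPUTS = {
    "louvain_modularity": {0: 0, 1: 0, 2: 1, 3: 1},
    "cnm_modularity": {0: 0, 1: 0, 2: 0, 3: 1},
}

print(compare_all(OUTPUTS))
```

The same function works for the ground-truth case by passing the ground-truth assignment as one of the two labelings.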