The performance of modularity maximization in practical contexts

Good, Benjamin H.; de Montjoye, Yves-Alexandre; Clauset, Aaron

arXiv:0910.0165v1 (physics)

[Submitted on 1 Oct 2009 (this version), latest version 1 Apr 2010 (v2)]

Abstract: Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization are not well understood. Here, we present a broad and systematic characterization of its performance in practical situations. First, we generalize and clarify the recently identified resolution limit phenomenon. Second, we show that the modularity function Q exhibits extreme degeneracies: that is, the modularity landscape admits an exponential number of distinct high-scoring solutions and does not typically exhibit a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Q_max for infinitely modular networks, showing that it depends strongly on the size of the network and the number of module-like subgraphs it contains. Finally, using three real-world examples of metabolic networks, we show that the degenerate solutions can fundamentally disagree on the composition of even the largest modules. Together, these results significantly extend and clarify our understanding of this popular method. In particular, they explain why so many heuristics perform well in practice at finding high-scoring partitions, why these heuristics can disagree on the composition of the identified modules, and how the estimated value of Q_max should be interpreted. Further, they imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. We conclude by discussing avenues for mitigating these behaviors, such as combining information from many degenerate solutions or using generative models.
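
The resolution limit and degeneracy claims in the abstract can be illustrated with a small sketch. It assumes the standard Newman-Girvan form of the modularity function, Q = sum over modules of [e_i/m - (d_i/2m)^2], where e_i is the number of edges inside module i, d_i is the total degree of its nodes, and m is the number of edges in the network; the abstract itself does not spell this formula out. The ring-of-cliques test network and all function names (`ring_of_cliques`, `clique_partition`, `modularity`) are hypothetical choices made for this illustration, not code from the paper.

```python
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Edge list for cliques arranged in a ring, neighbours joined by one edge."""
    edges = []
    for c in range(num_cliques):
        base = c * clique_size
        # all edges inside clique c
        edges.extend(combinations(range(base, base + clique_size), 2))
        # single edge from the last node of clique c to the first node of the next
        next_base = ((c + 1) % num_cliques) * clique_size
        edges.append((base + clique_size - 1, next_base))
    return edges


def clique_partition(num_cliques, clique_size, group=1, shift=0):
    """Module label per node: merge `group` consecutive cliques, rotated by `shift`.

    Assumes num_cliques is divisible by group.
    """
    return [
        ((node // clique_size + shift) % num_cliques) // group
        for node in range(num_cliques * clique_size)
    ]


def modularity(edges, membership):
    """Q = sum_i [ e_i / m - (d_i / 2m)^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # e_i: edges with both endpoints in module i
    degree = {}    # d_i: summed degree of the nodes in module i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Illustrative parameters (an assumption of this sketch): 30 cliques of 5 nodes.
edges = ring_of_cliques(30, 5)
q_single = modularity(edges, clique_partition(30, 5, group=1))
q_pairs = modularity(edges, clique_partition(30, 5, group=2))
q_pairs_shifted = modularity(edges, clique_partition(30, 5, group=2, shift=1))

print(f"one clique per module: Q = {q_single:.4f}")
print(f"cliques merged in pairs: Q = {q_pairs:.4f}")
print(f"pairs, rotated by one: Q = {q_pairs_shifted:.4f}")
```

On this network, merging adjacent cliques in pairs scores higher (Q of about 0.888) than the intuitive partition with one clique per module (Q of about 0.876), which is the resolution limit at work. The two ways of pairing neighbours around the ring score identically yet assign every clique a different partner, a small instance of the degeneracy the abstract describes.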

Comments: 17 pages, 12 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Disordered Systems and Neural Networks (cond-mat.dis-nn); Physics and Society (physics.soc-ph); Molecular Networks (q-bio.MN); Quantitative Methods (q-bio.QM)
Cite as: arXiv:0910.0165 [physics.data-an] (or arXiv:0910.0165v1 [physics.data-an] for this version)
DOI: https://doi.org/10.48550/arXiv.0910.0165

Submission history

From: Aaron Clauset
[v1] Thu, 1 Oct 2009 13:06:41 UTC (1,542 KB)
[v2] Thu, 1 Apr 2010 10:44:26 UTC (1,234 KB)
