Abstract: |
The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable. In this work, we therefore study the difference between the structural information and the von Neumann graph entropy, which we call the entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we prove for the first time that the entropy gap is between 0 and log_2 e for any undirected unweighted graph. Consequently, we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. This approximation is further applied to two entropy-related tasks: network design and graph similarity measure, where a novel graph similarity measure and the corresponding fast algorithms are proposed. Meanwhile, we show empirically and theoretically that maximizing the von Neumann graph entropy can effectively hide the community structure, and then propose an alternative metric called spectral polarization to guide the community obfuscation. Our experimental results on graphs of various scales and types show that the entropy gap remains very small across a wide range of simple and weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. 
It is at least two orders of magnitude faster than SLaQ (Tsitsulin et al., 2020) with comparable accuracy. Our structural-information-based methods also exhibit superior performance in downstream tasks such as entropy-driven network design, graph comparison, and community obfuscation. |
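
The two quantities compared in the abstract can be illustrated with a minimal NumPy sketch. It assumes the usual conventions: the combinatorial Laplacian L = D - A, eigenvalues and degrees both normalized by the total degree, and base-2 logarithms. The function names (`shannon_entropy`, `structural_information`, `von_neumann_entropy`) and the dense eigendecomposition are illustrative choices, not the paper's implementation.

```python
import numpy as np

def shannon_entropy(p, tol=1e-12):
    """Shannon entropy in bits; entries at or below tol are treated as zero."""
    p = np.asarray(p, dtype=float)
    p = p[p > tol]
    return float(-(p * np.log2(p)).sum())

def structural_information(adj):
    """Entropy of the normalized degree sequence (assumed definition)."""
    deg = adj.sum(axis=1)
    return shannon_entropy(deg / deg.sum())

def von_neumann_entropy(adj):
    """Entropy of the Laplacian spectrum scaled by trace(L) = total degree."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    eig = np.linalg.eigvalsh(lap)      # L is symmetric, so eigenvalues are real
    eig = np.clip(eig, 0.0, None)      # remove tiny negative round-off
    return shannon_entropy(eig / deg.sum())

# Toy example (hypothetical input): the complete graph on 4 nodes.
adj = np.ones((4, 4)) - np.eye(4)
h_struct = structural_information(adj)
h_vn = von_neumann_entropy(adj)
gap = h_struct - h_vn
print(f"structural information = {h_struct:.4f}")
print(f"von Neumann entropy    = {h_vn:.4f}")
print(f"entropy gap            = {gap:.4f} (bound: {np.log2(np.e):.4f})")
```

The structural information needs only the degree sequence, while the exact von Neumann entropy in this sketch requires a full eigendecomposition, which is the scalability contrast the abstract refers to.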