Taylor & Francis Group
uasa_a_1246365_sm8657.pdf (208.63 kB)

Network Cross-Validation for Determining the Number of Communities in Network Data

journal contribution
Posted on 2016-10-21, 20:19. Authored by Kehui Chen and Jing Lei.

The stochastic block model (SBM) and its variants are popular tools for analyzing large network data with community structure. In this article, we develop an efficient network cross-validation (NCV) approach to determine the number of communities, as well as to choose between the regular stochastic block model and the degree-corrected block model (DCBM). The proposed NCV method is based on a block-wise node-pair splitting technique, combined with an integrated step of community recovery using sub-blocks of the adjacency matrix. We prove that the probability of under-selection vanishes as the number of nodes increases, under mild conditions satisfied by a wide range of popular community recovery algorithms. The solid performance of our method is also demonstrated in extensive simulations and two data examples. Supplementary materials for this article are available online.
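The block-wise node-pair splitting idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the recovery algorithm, the loss, or the number of folds, so the code below assumes spectral clustering on a rectangular sub-block (training rows, all columns), a squared-error loss on the held-out diagonal block, and three folds. All function names (`simulate_sbm`, `kmeans`, `make_folds`, `ncv_loss`, `select_num_communities`) are hypothetical, and only the regular SBM is covered, not the DCBM.

```python
import numpy as np


def simulate_sbm(n, K, p_in, p_out, rng):
    """Toy data: symmetric 0/1 adjacency matrix from a K-block SBM."""
    labels = rng.integers(0, K, size=n)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(float), labels


def kmeans(X, K, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def make_folds(n, V, rng):
    """Randomly split the n nodes into V disjoint folds."""
    return np.array_split(rng.permutation(n), V)


def ncv_loss(A, K, folds, rng):
    """Held-out squared error for K communities (assumed loss, SBM only)."""
    n = A.shape[0]
    total = 0.0
    for held in folds:
        train = np.setdiff1d(np.arange(n), held)
        # Community recovery from a sub-block: training rows, all columns.
        # The held-out block A[held, held] is never used for fitting.
        _, _, Vt = np.linalg.svd(A[train, :], full_matrices=False)
        labels = kmeans(Vt[:K].T, K, rng)  # one label per node
        # Block-wise edge sums and pair counts over the same sub-block.
        S = np.zeros((K, K))
        C = np.zeros((K, K))
        for a in range(K):
            rows = train[labels[train] == a]
            for b in range(K):
                cols = np.flatnonzero(labels == b)
                S[a, b] = A[np.ix_(rows, cols)].sum()
                # Drop self-pairs (i, i), which only occur when a == b.
                C[a, b] = len(rows) * len(cols) - (len(rows) if a == b else 0)
        # Pool (a, b) and (b, a) so the estimate is symmetric (a simplification).
        B = (S + S.T) / np.maximum(C + C.T, 1)
        # Predictive loss on held-out node pairs only.
        P_hat = B[np.ix_(labels[held], labels[held])]
        iu = np.triu_indices(len(held), k=1)
        total += np.sum((A[np.ix_(held, held)][iu] - P_hat[iu]) ** 2)
    return total


def select_num_communities(A, candidates, V=3, seed=0):
    """Return the candidate K with the smallest cross-validated loss."""
    rng = np.random.default_rng(seed)
    folds = make_folds(A.shape[0], V, rng)
    losses = {K: ncv_loss(A, K, folds, rng) for K in candidates}
    return min(losses, key=losses.get), losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, _ = simulate_sbm(150, 3, 0.4, 0.05, rng)
    K_hat, losses = select_num_communities(A, candidates=[1, 2, 3, 4, 5])
    print("selected K:", K_hat)
```

The point of the rectangular split is that every held-out node still has edges to training nodes, so it can be assigned a community label even though its edges to other held-out nodes are reserved for evaluation.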
