A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs

Tang, Minh; Athreya, Avanti; Sussman, Daniel L.; Lyzinski, Vince; Park, Youngser; Priebe, Carey E.

doi:10.6084/m9.figshare.3405667

twosample_semipar_supplement.pdf (7.25 MB)

Version 2 2017-04-25, 05:40

Version 1 2016-05-27, 02:01

journal contribution

posted on 2017-04-25, 05:40 authored by Minh Tang, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, Youngser Park, Carey E. Priebe

Two-sample hypothesis testing for random graphs arises naturally in neuroscience, social networks, and machine learning. In this article, we consider a semiparametric problem of two-sample hypothesis testing for a class of latent position random graphs. We formulate a notion of consistency in this context and propose a valid test for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions or have generating latent positions that are scaled or diagonal transformations of one another. Our test statistic is a function of a spectral decomposition of the adjacency matrix for each graph, and our test procedure is consistent across a broad range of alternatives. We apply our test procedure to real biological data: in a test-retest dataset of neural connectome graphs, we are able to distinguish between scans from different subjects; and in the C. elegans connectome, we are able to distinguish between chemical and electrical networks. The latter example is a concrete demonstration that our test can have power even for small sample sizes. We conclude by discussing the relationship between our test procedure and generalized likelihood ratio tests. Supplementary materials for this article are available online.
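
As a rough illustration of the kind of statistic the abstract describes, the sketch below embeds each graph through the leading eigenpairs of its adjacency matrix and measures the distance between the two embeddings after an optimal orthogonal alignment. This is an assumption-laden sketch, not the article's procedure: the function names (adjacency_spectral_embedding, procrustes_distance, sample_rdpg), the two-block latent positions, the graph size, and the use of an unnormalized Procrustes distance are all illustrative choices, and no rejection threshold is computed.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed a symmetric adjacency matrix into R^d using its d
    largest-magnitude eigenvalues and the matching eigenvectors."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def procrustes_distance(X, Y):
    """Return min over orthogonal W of ||X W - Y||_F.
    The minimizer is W = U V^T, where X^T Y = U S V^T."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt
    return np.linalg.norm(X @ W - Y, ord="fro")

def sample_rdpg(X, rng):
    """Sample an undirected, loop-free random dot product graph whose
    edge probabilities are the inner products of the rows of X."""
    P = np.clip(X @ X.T, 0.0, 1.0)
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
    return upper + upper.T

# Hypothetical example: two-block latent positions (null) versus a
# single shared latent position for every vertex (alternative).
rng = np.random.default_rng(0)
n, d = 400, 2
X_null = np.vstack([np.tile([0.8, 0.2], (n // 2, 1)),
                    np.tile([0.2, 0.8], (n // 2, 1))])
X_alt = np.tile([0.5, 0.5], (n, 1))

A1 = sample_rdpg(X_null, rng)   # first graph
A2 = sample_rdpg(X_null, rng)   # same latent positions as A1
B = sample_rdpg(X_alt, rng)     # different latent positions

emb_A1 = adjacency_spectral_embedding(A1, d)
stat_null = procrustes_distance(emb_A1, adjacency_spectral_embedding(A2, d))
stat_alt = procrustes_distance(emb_A1, adjacency_spectral_embedding(B, d))
print(f"same latent positions:      {stat_null:.3f}")
print(f"different latent positions: {stat_alt:.3f}")
```

The orthogonal alignment is needed because latent positions in a random dot product graph are identifiable only up to an orthogonal transformation. In practice the statistic would still have to be compared against a critical value, for example one obtained by resampling graphs from the estimated latent positions; that step is omitted here.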