Properties and approximate p-value calculation of the Cramer test

Telford, Alison; Taylor, Charles C.; Wood, Henry M.; Gusnanto, Arief

doi:10.6084/m9.figshare.12191082.v1

gscs_a_1754820_sm6193.pdf (462.63 kB)

journal contribution

posted on 2020-04-24, 11:35 authored by Alison Telford, Charles C. Taylor, Henry M. Wood, Arief Gusnanto

Two-sample tests are probably the most commonly used tests in statistics. These tests generally address one aspect of the samples' distribution, such as the mean or variance. When the null hypothesis is that two distributions are equal, the Anderson–Darling (AD) test, which was developed from the Cramer–von Mises (CvM) test, is generally employed. Unfortunately, we find that the AD test often fails to identify true differences when the differences are complex, that is, when they involve not only the mean, variance and/or skewness but also multi-modality. In such cases, we find that the Cramer test, a modification of the CvM test, performs well. However, the adoption of the Cramer test in routine analysis is hindered by the fact that the mean, variance and skewness of the test statistic are not available, which makes it difficult to calculate the associated p-value. To address this, we propose a new method for obtaining a p-value by approximating the distribution of the test statistic by a generalized Pareto distribution. With this approximation, the calculation of the p-value is much faster than with, for example, the bootstrap method, especially for large n. We have observed that this approximation enables the Cramer test to have proper control of the type-I error. A simulation study indicates that the Cramer test is as powerful as other tests in simple cases and more powerful in more complicated cases.
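To make the idea of a generalized Pareto approximation concrete, the following is a minimal sketch in Python, not the authors' procedure. It rests on several assumptions. The statistic is taken in its univariate energy-distance form (up to a constant scaling convention). The function names `cramer_statistic` and `gpd_tail_p_value` and the parameters `n_perm` and `tail_frac` are hypothetical. The generalized Pareto parameters are fitted to the upper tail of a small permutation null sample, which is a swapped-in technique: the abstract does not state how the parameters are obtained, and it is possible the paper avoids resampling altogether.

```python
import numpy as np
from scipy.stats import genpareto


def cramer_statistic(x, y):
    """Two-sample statistic in energy-distance form for 1-D samples (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    # Mean absolute differences between and within the samples.
    d_xy = np.abs(x[:, None] - y[None, :]).mean()
    d_xx = np.abs(x[:, None] - x[None, :]).mean()
    d_yy = np.abs(y[:, None] - y[None, :]).mean()
    return m * n / (m + n) * (d_xy - 0.5 * d_xx - 0.5 * d_yy)


def gpd_tail_p_value(x, y, n_perm=500, tail_frac=0.2, seed=0):
    """Approximate p-value using a generalized Pareto fit to the null tail.

    Assumption: the null distribution is sampled by permuting group labels,
    and only the top `tail_frac` of that sample is modelled by the GPD.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = cramer_statistic(x, y)

    pooled = np.concatenate([x, y])
    m = len(x)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(pooled)
        null[b] = cramer_statistic(perm[:m], perm[m:])

    threshold = np.quantile(null, 1.0 - tail_frac)
    if observed <= threshold:
        # Observed value lies in the bulk: use the empirical p-value.
        return (1 + np.sum(null >= observed)) / (n_perm + 1)

    # Fit the GPD to exceedances over the threshold, location fixed at 0.
    exceedances = null[null > threshold] - threshold
    shape, _, scale = genpareto.fit(exceedances, floc=0)
    # P(T >= t) is approximately P(T > u) * P(T - u >= t - u | T > u).
    return tail_frac * genpareto.sf(observed - threshold, shape, loc=0, scale=scale)
```

In this sketch the tail fit allows p-values smaller than 1/(n_perm + 1) to be estimated from a modest number of permutations. If the fitted shape parameter is negative, the fitted distribution has a finite upper bound and the returned p-value can be exactly zero for very large observed statistics.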