Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis

Lin, Ray S.; Lin, Ji; Roychoudhury, Satrajit; Anderson, Keaven M.; Hu, Tianle; Huang, Bo; Leon, Larry F; Liao, Jason J.Z.; Liu, Rong; Luo, Xiaodong; Mukhopadhyay, Pralay; Qin, Rui; Tatsuoka, Kay; Wang, Xuejing; Wang, Yang; Zhu, Jian; Chen, Tai-Tsang; Iacona, Renee

doi:10.6084/m9.figshare.11310851.v2

Version 2 2020-01-27, 15:48

Version 1 2019-12-03, 17:29

journal contribution

posted on 2020-01-27, 15:48 authored by Ray S. Lin, Ji Lin, Satrajit Roychoudhury, Keaven M. Anderson, Tianle Hu, Bo Huang, Larry F Leon, Jason J.Z. Liao, Rong Liu, Xiaodong Luo, Pralay Mukhopadhyay, Rui Qin, Kay Tatsuoka, Xuejing Wang, Yang Wang, Jian Zhu, Tai-Tsang Chen, Renee Iacona

The log-rank test is most powerful under proportional hazards (PH). In practice, non-PH patterns are often observed in clinical trials, for example in immuno-oncology; therefore, alternative methods are needed to restore the efficiency of statistical testing. Three categories of testing methods were evaluated: weighted log-rank tests; Kaplan–Meier curve-based tests (including the weighted Kaplan–Meier test and restricted mean survival time); and combination tests (including the Breslow test, Lee’s combo test, and the MaxCombo test). Nine scenarios representing PH and various non-PH patterns were simulated, and the power, Type I error, and effect estimate of each method were compared. In general, all tests control Type I error well. No single test is most powerful across all scenarios. In the absence of prior knowledge about whether the hazards are proportional or which non-PH pattern applies, the MaxCombo test is relatively robust across patterns. Because the treatment effect changes over time under non-PH, a single measure may not comprehensively represent the overall profile of the treatment effect. Thus, multiple measures of the treatment effect should be prespecified as sensitivity analyses to describe the totality of the data. Supplementary materials for this article are available online.
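The abstract names the weighted log-rank and MaxCombo tests without defining them. The sketch below is an illustration only, not the authors' code. It computes Fleming-Harrington FH(rho, gamma) weighted log-rank Z statistics, which use the weight w(t) = S(t-)^rho (1 - S(t-))^gamma where S is the pooled Kaplan–Meier estimate. A MaxCombo-style statistic is then formed as the maximum of several such Z statistics. Several choices here are assumptions: the four-test set FH(0,0), FH(0,1), FH(1,0), FH(1,1) (a common choice), the sign convention (positive Z favors group 1), and the use of a Monte Carlo draw from the estimated correlation matrix in place of exact multivariate normal integration. The helper names `fh_logrank_z`, `maxcombo`, and `_cholesky` are hypothetical.

```python
import math
import random

def _event_table(time, event, group):
    """Per distinct event time: (at risk, at risk in group 1, deaths, deaths in group 1)."""
    rows = []
    for t in sorted({ti for ti, di in zip(time, event) if di == 1}):
        n = sum(1 for ti in time if ti >= t)
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == 1)
        d = sum(1 for ti, di in zip(time, event) if ti == t and di == 1)
        d1 = sum(1 for ti, di, g in zip(time, event, group) if ti == t and di == 1 and g == 1)
        rows.append((n, n1, d, d1))
    return rows

def fh_logrank_z(time, event, group, rho_gamma):
    """Hypothetical helper: FH(rho, gamma) weighted log-rank Z statistics and their correlation.

    group is 1 (e.g. experimental) or 0 (control); positive Z means fewer group-1
    events than expected under H0.
    """
    s_minus = 1.0  # pooled Kaplan-Meier S(t-) just before the current event time
    weights = [[] for _ in rho_gamma]
    scores, variances = [], []
    for n, n1, d, d1 in _event_table(time, event, group):
        for k, (rho, gamma) in enumerate(rho_gamma):
            weights[k].append(s_minus ** rho * (1.0 - s_minus) ** gamma)
        p1 = n1 / n
        scores.append(d * p1 - d1)  # expected minus observed events in group 1
        # Hypergeometric variance of group-1 deaths at this time (0 if only one at risk).
        variances.append(d * p1 * (1 - p1) * (n - d) / (n - 1) if n > 1 else 0.0)
        s_minus *= 1.0 - d / n
    m = len(rho_gamma)
    # Covariance of two weighted score statistics: sum of w_a * w_b * V_j.
    cov = [[sum(wa * wb * v for wa, wb, v in zip(weights[a], weights[b], variances))
            for b in range(m)] for a in range(m)]
    z = []
    for k in range(m):
        if cov[k][k] <= 0:
            raise ValueError(f"zero variance for FH{rho_gamma[k]}")
        z.append(sum(w * u for w, u in zip(weights[k], scores)) / math.sqrt(cov[k][k]))
    corr = [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(m)] for a in range(m)]
    return z, corr

def _cholesky(a):
    """Lower-triangular Cholesky factor, with a tiny floor on the diagonal for near-singular input."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 1e-12))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def maxcombo(time, event, group, n_sim=100_000, seed=1):
    """Hypothetical one-sided MaxCombo sketch: max of four FH Z statistics, Monte Carlo p-value."""
    rho_gamma = [(0, 0), (0, 1), (1, 0), (1, 1)]  # assumed test set
    z, corr = fh_logrank_z(time, event, group, rho_gamma)
    z_max = max(z)
    chol = _cholesky(corr)
    rng = random.Random(seed)
    m = len(z)
    exceed = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # x ~ N(0, corr): the null joint distribution of the standardized statistics.
        x = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        if max(x) >= z_max:
            exceed += 1
    return z, z_max, exceed / n_sim
```

FH(0,0) reduces to the ordinary log-rank test. FH(0,1) up-weights late differences, as in a delayed treatment effect, and FH(1,0) up-weights early differences. Swapping the group labels negates every Z, so this one-sided version is directional. A two-sided variant would compare max |Z_k| against max |x_k| in the simulation.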