Updates to data versions and analytic methods influence the reproducibility of results from epigenome-wide association studies
Biomedical research has grown increasingly collaborative through the sharing of consortium-level epigenetic data. Because consortia preprocess data before distribution, new processing pipelines can produce different versions of the same dataset. Similarly, analytic frameworks evolve to incorporate cutting-edge methods and best practices. However, it remains unknown how changes in data and analytic versions alter the results of epigenome-wide analyses, which could affect the replicability of epigenetic associations. We therefore assessed the impact of these changes using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. We analysed DNA methylation from two data versions, processed using separate preprocessing and analytic pipelines, and examined associations between each of seven childhood adversities or prenatal smoking exposure and DNA methylation at age 7. We performed two sets of analyses: (1) epigenome-wide association studies (EWAS); and (2) the Structured Life Course Modelling Approach (SLCMA), a two-stage method that models time-dependent effects. SLCMA results were also compared across two analytic versions. Changes in data version affected both EWAS and SLCMA analyses, yielding different associations at conventional p-value thresholds. However, the magnitude and direction of associations were generally consistent between data versions, regardless of p-values. Differences were especially apparent in analyses of childhood adversity, whereas smoking associations were more consistent even when judged by significance thresholds. Changes in SLCMA analytic version similarly altered the top associations, but time-dependent effects remained concordant. Overall, alterations to data and analytic versions influenced the results of epigenome-wide analyses. Our findings highlight that the magnitude and direction of effects are better measures of replication and stability than p-value thresholds.
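The contrast between threshold-based and effect-based comparisons can be made concrete with a small sketch. The helper `compare_versions` below is hypothetical and is not the authors' pipeline. It assumes each data version yields one effect estimate and one p-value per CpG, aligned to the same CpG order, and that `alpha` is an illustrative significance threshold. It reports (i) the Jaccard overlap of CpGs passing `alpha` in either version, (ii) the fraction of CpGs whose effects share the same sign, and (iii) the Pearson correlation of the effect estimates.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VersionComparison:
    jaccard_significant: float  # overlap of CpGs passing alpha (threshold-based)
    sign_concordance: float     # fraction of CpGs with effects in the same direction
    effect_correlation: float   # Pearson r of effect estimates (magnitude agreement)


def compare_versions(
    beta_a: Sequence[float],
    beta_b: Sequence[float],
    p_a: Sequence[float],
    p_b: Sequence[float],
    alpha: float = 1e-7,  # hypothetical threshold, chosen only for illustration
) -> VersionComparison:
    """Compare per-CpG results from two data versions (hypothetical helper)."""
    n = len(beta_a)
    if n < 2 or not (len(beta_b) == len(p_a) == len(p_b) == n):
        raise ValueError("inputs must be aligned and contain at least two CpGs")

    # Threshold-based replication: which CpGs pass alpha in each version.
    sig_a = {i for i in range(n) if p_a[i] < alpha}
    sig_b = {i for i in range(n) if p_b[i] < alpha}
    union = sig_a | sig_b
    jaccard = len(sig_a & sig_b) / len(union) if union else float("nan")

    # Direction: a positive product means both estimates share a strict sign.
    concordance = sum(1 for a, b in zip(beta_a, beta_b) if a * b > 0) / n

    # Magnitude: Pearson correlation computed from centred sums.
    mean_a = sum(beta_a) / n
    mean_b = sum(beta_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(beta_a, beta_b))
    var_a = sum((a - mean_a) ** 2 for a in beta_a)
    var_b = sum((b - mean_b) ** 2 for b in beta_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("effect estimates must vary to compute a correlation")
    r = cov / math.sqrt(var_a * var_b)

    return VersionComparison(jaccard, concordance, r)
```

With toy values (not ALSPAC results) such as `compare_versions([0.02, -0.01, 0.03, 0.005], [0.018, -0.012, 0.025, 0.004], [1e-8, 0.2, 1e-9, 0.5], [2e-7, 0.3, 5e-8, 0.6])`, only half of the CpGs significant in either version are significant in both (Jaccard 0.5), yet every effect has the same sign and the estimates are almost perfectly correlated. This is the kind of pattern in which a p-value cut-off suggests instability while magnitude and direction agree.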