Gaps in current methods to detect polymorphic CpGs from Illumina Infinium human methylation microarrays and exploring their potential impact in multi-EWAS analyses
DNA methylation (DNAm) epigenome-wide association studies (EWAS) have been performed in populations of diverse ethnicities to discover novel biomarkers associated with various diseases, such as cancers, autoimmune diseases, and neurological disorders. However, genetic polymorphisms can influence DNAm levels, resulting in methylation quantitative trait loci (meQTL). These effects can be direct, altering the sequence of the methylation (CpG) site itself, or, in the case of array-based measures, indirect, altering the interaction between the detection probe and its binding site. Given that the frequencies of genetic variants associated with meQTL can differ between population groups, these effects have the potential to confound EWAS observations, particularly in multi-ethnic populations. In this study, we analysed publicly available DNA methylation profiles (450K array) comprising 1342 individuals from 6 distinct ancestral groups. We investigated two tools (GapHunter and MethylToSNP) specifically designed to identify CpG sites that may be influenced by genetic variation. Results from this aggregated trans-ancestral epigenome-wide dataset suggest that both tools fail to consistently identify not only rarer (MAF < 0.05) genetic variant effects but also more than half of the sites predicted to be associated with variants with much higher allele frequencies (MAF > 0.2). In addition, concordance between GapHunter and MethylToSNP in the detection of polymorphic CpGs is relatively low. Screening CpG site associations from EWAS with either of these tools is therefore unlikely to be a robust or comprehensive means of identifying all genetic variant confounding effects.
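As an illustration of what concordance between the two tools means in practice, the following minimal Python sketch compares two lists of flagged CpG probes. It is not the analysis performed in this study: the abstract does not state which concordance measure was used, so the Jaccard index here is an assumption, and the function name `concordance` and the probe identifiers are hypothetical placeholders.

```python
def concordance(flagged_a, flagged_b):
    """Summarise the overlap between two collections of flagged CpG probe IDs.

    The Jaccard index (shared / union) is an assumed choice of measure;
    it is not taken from the study.
    """
    a, b = set(flagged_a), set(flagged_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),      # flagged by both tools
        "only_a": len(a - b),       # flagged by the first tool only
        "only_b": len(b - a),       # flagged by the second tool only
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical probe IDs, for illustration only (not real results).
gaphunter_hits = ["cg0000001", "cg0000002", "cg0000003"]
methyltosnp_hits = ["cg0000002", "cg0000003", "cg0000004", "cg0000005"]

summary = concordance(gaphunter_hits, methyltosnp_hits)
print(summary)
```

A low Jaccard value alongside large tool-specific counts would correspond to the situation described above, in which each tool flags many sites that the other does not.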