2022 Volume 86 Issue 6 Pages 986-992
Background: Tobacco smoking is a leading preventable cause of morbidity and mortality worldwide; still, the success rate of smoking cessation is low in general. From the viewpoint of public health and clinical care, an objective biomarker of long-term smoking behavior is sought.
Methods and Results: This study assessed DNA methylation as a biomarker of smoking in a hospital setting through a combination of molecular approaches including genetic, DNA methylation and mRNA expression analyses. First, in an epigenome-wide association study involving Japanese individuals with chronic cardiovascular disease (n=94), genome-wide significant smoking association was identified at 2 CpG sites on chromosome 5, with the strongest signal at cg05575921 located in intron 3 of the aryl-hydrocarbon receptor repressor (AHRR) gene. Highly significant (P<1×10−27) smoking–cg05575921 association was validated in 2 additional panels (n=339 and n=300). For the relationship of cg05575921 methylation extent with time after smoking cessation and cumulative cigarette consumption among former smokers, smoking-related hypomethylation was found to remain for ≥20 years after smoking cessation and to be affected by multiple factors, such as cis-interaction of genetic variation. There was a significant inverse correlation (P=0.0005) between cg05575921 methylation extent and AHRR mRNA expression.
Conclusions: The present study results support that reversion of AHRR hypomethylation can be a quantifiable biomarker for progress in and observance of smoking cessation, although some methodological points need to be considered.
Tobacco smoking is a leading preventable cause of morbidity and mortality in the world. Active smoking in adults increases the risk for a wide spectrum of chronic illness, such as cardiovascular and respiratory disease, and various forms of cancer. As self-reported smoking information is prone to inaccuracies due to recall bias and under-reporting,1 an objective biomarker of long-term smoking behavior has been sought. Among current smokers, cotinine, a primary metabolite of nicotine, is a reliable measure of nicotine exposure,2 but is not suited for the estimation of long-term smoking behavior due to its short half-life of 15–24 h.
With the advent of epigenome-wide association studies (EWAS) in the last decade, it has become known that tobacco smoking has a broad effect on DNA methylation at many loci across the genome,3 and that smoking-related changes in gene expression are epigenetically regulated.4 Among EWAS-identified loci, the aryl-hydrocarbon receptor repressor (AHRR) gene has been noted as a promising candidate gene, with smoking-associated hypomethylation of AHRR CpG sites (e.g., cg05575921) detectable in DNA from peripheral lymphocytes and pulmonary macrophages.5,6 AHRR encodes a transcription factor that can repress the aryl-hydrocarbon receptor pathway, which regulates the cytochrome P450 (CYP)-mediated catabolism of xenobiotics such as dioxin and polyaromatic hydrocarbons contained in tobacco smoke.5 Although the exact relationship between AHRR hypomethylation and modulation of the CYP pathway remains unclear, AHRR hypomethylation is considered to be a logical choice as a long-term biomarker of exposure to tobacco smoke (Figure 1A).7 Still, most previous studies have focused on the epidemiological relevance of smoking-related epigenetic changes. In light of these circumstances, we performed an integrative assessment of the DNA methylation signature as a potential biomarker for smoking in clinical practice.
Figure 1. Effect of smoking status on DNA methylation. (A) The role of AHRR in the xenobiotic detoxification pathway. Manhattan plot (B) and quantile-quantile plot (C) for epigenome-wide association studies (EWAS) of smoking in a Japanese population. In the Manhattan plot, a horizontal line indicates a genome-wide significance level (P=9×10−8), which 2 CpGs at AHRR (cg05575921 and cg23576855) attained. (D) Validation of high concordance (r=0.903, P<10−4) between 2 analytical methods (Illumina EPIC array and ddPCR) for cg05575921 methylation. Distribution of cg05575921 methylation extent is shown separately by smoking status (current, former and never smokers) in panels 2 (E) and 3 (F). ddPCR, droplet digital PCR.
This study was approved by the institutional ethics review board, and participants provided written informed consent. The procedures followed were in accordance with the Declaration of Helsinki and the ethical standards of the institutional committee on human experimentation at the National Center for Global Health and Medicine (NCGM). Adult patients of Japanese descent were recruited via 2 separate projects: a bioresource collection project focused on cardiovascular disease, named BIO-CVD, and the NCGM Biobank project (Table). We made a detailed assessment of coronary artery disease (CAD) for samples in BIO-CVD, which we used for both EWAS of smoking (panel 1, n=94) and a subsequent multiomics analysis (of methylation, SNP and mRNA) on AHRR (panel 2, n=339). To validate the findings in BIO-CVD, we additionally used NCGM Biobank samples (panel 3, n=300), obtained from patients who visited NCGM for treatment of chronic illness including hypercholesterolemia, for methylation and SNP analyses. In BIO-CVD, whose participants were consecutively enrolled between 2014 and 2018, we initially selected 96 individuals for EWAS in panel 1 and then preferentially selected CAD patients for panel 2, such that part of the samples (n=31) overlapped between the panels, which allowed us to assess the concordance of the 2 analytical methods.
Characteristic | Panel 1 | Panel 2 | Panel 3 |
---|---|---|---|
No. of individuals (F/M) | 94 (29/65) | 339 (47/292) | 300 (155/145) |
Age, years | 60.3±1.2 | 67.8±0.6 | 64.3±0.7 |
Smoking habit, n (%) | | | |
Never | 39 (41) | 88 (26) | 145 (48) |
Former | 31 (33) | 169 (50) | 115 (38) |
Current | 24 (26) | 82 (24) | 40 (13) |
Complication, n (%) | | | |
Hypertension | 51 (54) | 259 (76) | 155 (52) |
Diabetes | 23 (24) | 131 (39) | 57 (19) |
Hypercholesterolemia | 45 (48) | 239 (71) | 300 (100) |
CAD | 46 (49) | 295 (87) | 9 (3) |
Samples in panels 1 (for EWAS) and 2 (for multiomics analysis) are derived from the BIO-CVD project, where part of the samples (n=31) overlap between panels 1 and 2. Samples in panel 3 (for methylation and SNP analyses) are derived from the NCGM Biobank project. CAD, coronary artery disease; CVD, cardiovascular disease; EWAS, epigenome-wide association studies; F, female; M, male; NCGM, National Center for Global Health and Medicine; SNP, single nucleotide polymorphism.
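As a quick consistency check on the Table, the whole-number percentages for smoking habit can be recomputed from the raw counts. The sketch below is illustrative only: the dictionary layout and the function name `status_percentages` are assumptions introduced here, and the counts are transcribed from the Table.

```python
# Counts transcribed from the Table (never, former and current smokers per panel).
smoking_counts = {
    "Panel 1": {"never": 39, "former": 31, "current": 24},
    "Panel 2": {"never": 88, "former": 169, "current": 82},
    "Panel 3": {"never": 145, "former": 115, "current": 40},
}


def status_percentages(counts):
    """Return each smoking status as a whole-number percentage of the panel total."""
    total = sum(counts.values())
    return {status: round(100 * n / total) for status, n in counts.items()}


for panel, counts in smoking_counts.items():
    # Print the panel size and the rounded percentage for each status.
    print(panel, sum(counts.values()), status_percentages(counts))
```

The panel totals and rounded percentages reproduce the values listed in the Table.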
Self-reported smoking behavior, including age of smoking initiation, age of smoking cessation (for former smokers) and the number of daily consumed cigarettes for a given period of time, was recorded from each individual. Based on this information, participants were categorized into three smoking statuses: never, former and current smokers.
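The derived variables used later in the analysis (smoking status, cumulative consumption in pack-years, and years after cessation) can be sketched from such self-reported items as follows. This is a minimal illustration, not the study's actual procedure: the `SmokingRecord` layout is hypothetical, a single constant daily consumption is assumed over the whole smoking period, and one pack is taken as 20 cigarettes (the conventional pack-year definition).

```python
from dataclasses import dataclass
from typing import Optional

CIGARETTES_PER_PACK = 20  # conventional pack size in the pack-year definition


@dataclass
class SmokingRecord:  # hypothetical record layout, for illustration only
    age: float                       # current age in years
    age_started: Optional[float]     # None for never smokers
    age_quit: Optional[float]        # None if never smoked or still smoking
    cigarettes_per_day: float = 0.0  # assumed constant over the smoking period


def smoking_status(rec: SmokingRecord) -> str:
    """Classify a participant as a never, former or current smoker."""
    if rec.age_started is None:
        return "never"
    return "former" if rec.age_quit is not None else "current"


def pack_years(rec: SmokingRecord) -> float:
    """Cumulative consumption: (packs per day) x (years smoked)."""
    if rec.age_started is None:
        return 0.0
    end_age = rec.age_quit if rec.age_quit is not None else rec.age
    years_smoked = max(end_age - rec.age_started, 0.0)
    return rec.cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def years_since_cessation(rec: SmokingRecord) -> Optional[float]:
    """Years elapsed since quitting; defined only for former smokers."""
    if smoking_status(rec) != "former":
        return None
    return rec.age - rec.age_quit


example = SmokingRecord(age=65, age_started=20, age_quit=45, cigarettes_per_day=20)
print(smoking_status(example), pack_years(example), years_since_cessation(example))
```

When daily consumption changed over time, pack-years would instead be summed over each reported period.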
DNA Methylation Assay
Genomic DNA was extracted from the buffy coat and stored at −80℃. In preparation for the DNA methylation assay, 1 μg of genomic DNA per sample was bisulfite-converted using the EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA). Genome-wide methylation profiling was performed on DNA derived from peripheral blood mononuclear cells with the Illumina EPIC array (Illumina Inc., San Diego, CA, USA), which contains >485,000 CpG sites. Low-quality CpG sites (detection P value >0.01) were excluded from the analysis. Two individuals were excluded due to a lack of sufficient information on smoking.
Droplet digital PCR (ddPCR) assay in the QX200 Droplet Digital PCR system (Bio-Rad, Carlsbad, CA, USA) was used to measure methylation extent at the AHRR cg05575921 site in panels 2 and 3 using a set of previously reported primers6 plus originally designed probes (Supplementary Table 1).
mRNA Expression Analysis
Total RNA was purified with the PAXgene Blood RNA Kit (Becton, Dickinson and Company, Franklin Lakes, NJ, USA) from whole blood of current smokers in panel 2 (n=81) and subjected to quantitative PCR (qPCR) with TaqMan assays (ThermoFisher Scientific, Waltham, MA, USA; Assay ID: Hs01005075_m1 for AHRR and Hs01060665_g1 for actin-beta).
SNP–Methylation Association Analysis
To explore cis-interaction between genetic variation and DNA methylation at AHRR as a potential confounder, separate from reverse causation (i.e., smoke exposure influencing DNA methylation), we carried out SNP–methylation association analysis using genome-wide genotype data (n=315 in panel 2 and n=234 in panel 3) from the Infinium OmniExpress-24 BeadChip Array (Illumina Inc.). The analysis was performed separately for never, former and current smokers (Supplementary Table 2), followed by meta-analysis. A cis distance of ≤500 kb from cg05575921 (a region that contains 1,797 SNPs with minor allele frequency ≥0.05) was used to screen for cis-interaction effects.
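The pooling step can be illustrated with a short sketch. It assumes the common inverse-variance form of fixed-effect meta-analysis (the weighting scheme is an assumption; the text specifies only that a fixed-effect model was used), and the per-stratum effect sizes in the usage lines are made-up values, not study results.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (beta, standard_error) pairs.

    Returns the pooled effect, its standard error, the z statistic and a
    two-sided P value from the standard normal distribution.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-stratum SNP effects on methylation (beta, SE), for illustration:
# e.g. one estimate per smoking status within each panel.
strata = [(0.10, 0.05), (0.20, 0.05)]
beta, se, z, p = fixed_effect_meta(strata)
print(f"pooled beta={beta:.3f}, SE={se:.4f}, z={z:.2f}, P={p:.2e}")
```

Strata with smaller standard errors (typically the larger subgroups) receive proportionally more weight in the pooled estimate.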
Statistical Analysis
Results are shown as mean±SEM unless otherwise indicated. Association analyses were performed in a linear regression model. In EWAS of smoking, we used an M-value of DNA methylation as a dependent variable and smoking status (never=0, former=1, current=2), sex and age as independent variables; for the Illumina EPIC array, a genome-wide significance level was set at P=9×10−8.8 In the analysis of a region-wide SNP-methylation association, linear regression was performed separately by panel and smoking status, and the results were combined by fixed-effect meta-analysis, where P<2.8×10−5 (≈0.05/1,797) was considered significant with Bonferroni correction for multiple comparisons. In other instances, P<0.05 was deemed statistically significant.
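Two of the quantities above lend themselves to a short worked example: the M-value, which is the base-2 logit of the methylation Beta-value, and the Bonferroni-corrected threshold for the 1,797 SNPs in the cis region. The function names below are illustrative assumptions; the numeric coding of smoking status follows the text.

```python
import math

# Numeric coding of smoking status used as an independent variable (from the text).
SMOKING_CODE = {"never": 0, "former": 1, "current": 2}


def beta_to_m(beta: float) -> float:
    """Convert a methylation Beta-value (0 < beta < 1) to an M-value: log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("Beta-value must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


print(beta_to_m(0.5))                              # 0.0: half-methylated site
print(f"{bonferroni_threshold(0.05, 1797):.2e}")   # 2.78e-05
```

The second value rounds to the 2.8×10−5 threshold quoted for the region-wide SNP-methylation analysis.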
The proportion of smokers (combining former and current smokers) was higher in panel 2 (74%) than in panels 1 (59%) and 3 (52%), in accordance with the respective proportion of CAD (Table).
In EWAS of smoking, we successfully identified a genome-wide significant association at 2 CpG sites on chromosome 5, with the strongest signal at cg05575921 located in intron 3 of AHRR (Figure 1B,1C). After validating the high concordance rate (r=0.903, P<10−4) between 2 analytical methods (Illumina EPIC array and ddPCR; Figure 1D), we measured cg05575921 methylation in panels 2 (Figure 1E) and 3 (Figure 1F), and found significant differences in cg05575921 methylation extent between smoking statuses (P=2.5×10−28 and P=5.1×10−33 by ANOVA for panels 2 and 3, respectively). Among never smokers, cg05575921 methylation extent was distributed in a much narrower range (Figure 2A,2B) than among current and former smokers (Figure 1E and 1F).
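The between-group comparison reported here is a one-way ANOVA across the three smoking statuses. As a minimal sketch of what that test computes, the function below derives the F statistic and its degrees of freedom from grouped values; the methylation percentages in the usage lines are made-up numbers for illustration, not study data, and converting F to a P value would additionally require the F distribution (for example from a statistics library).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical cg05575921 methylation extents (%) by smoking status.
never = [88.0, 90.5, 87.2, 89.1]
former = [80.3, 84.9, 77.5, 86.0]
current = [62.1, 70.4, 58.8, 66.3]
f_stat, df_b, df_w = one_way_anova_f([never, former, current])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F indicates that the differences between group means are large relative to the spread within groups.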
Figure 2. cg05575921 methylation among never and former smokers. For never smokers, distribution of cg05575921 methylation extent (y-axis) is shown against participant age (x-axis) separately by sex: men (A) and women (B). In (A) and (B), the values indicate Beta-values of cg05575921 methylation (mean±SE). For former smokers, distribution of cg05575921 methylation extent is shown by years after smoking cessation (C) and cumulative cigarette consumption in pack-years (D). In the x-axis of (C) and (D), the numbers enclosed in parentheses indicate the number of subjects for each category. (E) Mean of cg05575921 methylation extent (z-axis) is shown for subgroups of former smokers, classified by the combination of 2 variables (years after smoking cessation in x-axis; pack-years (P.Y.) in y-axis). For more details, see Supplementary Table 3.
We then looked at the relationship of cg05575921 methylation extent with time (in years) after smoking cessation and cumulative cigarette consumption (in pack-years) among former smokers in panel 2. We found a gradually increasing trend of cg05575921 methylation extent with time since cessation (Figure 2C), and also an inverse association between cumulative cigarette consumption (from low to high dose) and cg05575921 methylation extent (Figure 2D). These 2 types of variables, when simultaneously considered, influenced the reversibility of cg05575921 hypomethylation in an additive manner. Nevertheless, it appeared to take ≥20 years before smoking-related AHRR cg05575921 hypomethylation in former smokers could revert after smoking cessation, irrespective of the smoking quantity (Figure 2E).
Next, we examined correlations between cg05575921 methylation extent, self-reported smoking behavior and AHRR mRNA expression among current smokers (Figure 3). Despite no apparent relationship observable between cumulative cigarette consumption and cg05575921 methylation extent among all current smokers in panel 2 (r=2.6×10−4, P=0.998, Figure 3A and 3C), there was a tendency of an inverse association between the 2 variables in a subgroup (current smokers with pack-years <100) of panel 2 (r=−0.197, P=0.090, Figure 3A) as well as in panel 3, in which all current smokers fell into pack-years <100 (r=−0.249, P=0.144, Figure 3B). Detailed examination further revealed that cg05575921 methylation extent would show a unimodal, relatively-broad distribution (kurtosis=0.42, skewness=0.13) in a combined population of current smokers (Figure 3D), and no significant tendency of association with smoking duration (r=−0.088, P=0.499, Figure 3E). In the SNP–methylation association analysis, we identified region-wide significant cis-interaction effects at 6 SNPs located in the AHRR gene (P=4.3×10−6 at rs2672777 and 5 proxy SNPs in modest linkage disequilibrium; Supplementary Figure and Supplementary Table 2).
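The descriptive statistics quoted in this paragraph (Pearson's r, skewness and kurtosis) can be sketched as below. The moment-based (population) estimators and the excess-kurtosis convention (0 for a normal distribution) are assumptions, since the text does not state which estimators were used; the data in the usage lines are made-up values.

```python
import math


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5 (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pack-years and methylation extents (%) for a few current smokers.
pack_years = [10, 25, 40, 55, 70]
methylation = [72.0, 68.5, 66.0, 67.2, 61.4]
print(pearson_r(pack_years, methylation), skewness(methylation), excess_kurtosis(methylation))
```

Sample-adjusted estimators, as reported by many statistics packages, would give slightly different skewness and kurtosis values for the same data.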
Figure 3. cg05575921 methylation among current smokers. For current smokers, scatter plots show relationships between cg05575921 methylation extent and cumulative cigarette consumption (pack-years) in panels 2 (A) and 3 (B), with the linear regression depicted by dotted lines. In (A), there was no significant difference in distribution of cg05575921 methylation extent between CAD cases (in blue circles) and non-CAD subjects (in orange circles) in the range of <100 pack-years. (C) For panel 2, the relationship between 2 variables is also demonstrated by mean (±SEM) values in subgroups of current smokers according to cumulative cigarette consumption. (D) Histogram shows a unimodal distribution of cg05575921 methylation extent among part of the current smokers with smoking intensity of 1 pack/day (n=61 from a combined panel 2+3). (E) For part of the current smokers with smoking intensity of 1 pack/day, distribution of cg05575921 methylation extent (y-axis) is shown against years of smoking (x-axis). (F) Scatter plot shows a significant inverse correlation (P=0.0005, r=−0.411) between cg05575921 methylation extent (y-axis) and AHRR mRNA expression (x-axis), which does not appear to differ between CAD cases (in blue circles) and non-CAD subjects (in orange circles). CAD, coronary artery disease.
Moreover, among current smokers in panel 2, there was a significant inverse correlation (P=0.0005) between cg05575921 methylation extent and AHRR mRNA expression, irrespective of CAD status (Figure 3F).
We have identified AHRR cg05575921 methylation to be most significantly associated with smoking status through EWAS in a Japanese population, and have validated that smoking-related cg05575921 hypomethylation remains for a long period of time, ≥20 years, after smoking cessation (Figure 2E). This is in accordance with recent findings by a pooled meta-analysis in Asia;9 that is, adverse effects of tobacco smoking persist more than 2 decades. In contrast, cg05575921 hypomethylation appears to become evident from a relatively early stage of smoking; that is, <10 years, among current smokers (Figure 3A,3B).
In the present study, we also report 3 novel findings. First, among current smokers with extremely high (e.g., ≥100 pack-years) cigarette consumption, cg05575921 methylation extent is not necessarily reduced, compared to that among current smokers with relatively modest (e.g., <100 pack-years) cigarette consumption (Figure 3A–C), indicating the presence of individual differences in the dose-effect relation. Second, although modest in studied sample size, there is no apparent difference in adverse cardiovascular outcomes of smoking-related epigenetic changes at AHRR; that is, the degree of inverse association between cg05575921 methylation and AHRR mRNA expression does not differ between current smokers with and without CAD (Figure 3F). Third, smoking habits modulate genetic effects on AHRR cg05575921 methylation, highlighting the need for individual considerations of effective lifestyle intervention.
A fundamental question is whether smoking-related DNA methylation signature is a better indicator of the adverse health outcomes of smoking than self-reported smoking behavior or serum/urine cotinine levels. The challenge with resolving this question is that the development of smoking-related disorders such as CAD can be influenced by individual differences in nicotine metabolism. Roughly, 60–80% of the variability in nicotine clearance is heritable, with CYP2A6 genotype most strongly affecting smoking intensity.10,11 Smokers titrate their nicotine levels by cigarette consumption, number and volume of puffs and depth of inhalation. Accordingly, faster nicotine metabolizers smoke more cigarettes per day and inhale a greater puff volume during ad libitum smoking.12
Tobacco smoking causes endothelial dysfunction and atherosclerosis in the vascular wall, although a detailed understanding of the proatherogenic nicotine action remains elusive. Nevertheless, DNA methylation has recently gained prominence as a key biological mechanism, through which chemicals in tobacco smoke regulate smoke-related changes in gene expression. It has been reported that >20% of all genes show evidence of smoking-induced abnormal methylomic regulation13 and that part of such genes are associated with mRNA expression changes.3
DNA methylation patterns are affected by multiple latent factors, including genetic variation, environmental changes, and heritable and non-heritable changes in other epigenetic processes (e.g., chromatin structure).14 The integrated nature of epigenetics may account for a unimodal distribution of cg05575921 methylation extent among current smokers. In this context, it has been assumed that the susceptibility of CpG sites to undergo smoking-induced epigenetic alterations is to a certain extent modified by genetic variation.13 Also, several studies have shown that DNA methylation in some of the EWAS-identified loci is driven by local genetic variation,15,16 as observed in this study for rs2672777 and its proxy SNPs at AHRR.
Functional characterization has been made for the differentially methylated region (DMR) at AHRR (chr5:373,378–373,556 hg19),17 in which cg05575921 and 7 other CpGs are included and display similar directional changes in smoking-associated DNA methylation. Of these CpGs, cg05575921 is the one included in the Illumina EPIC array and has turned out to represent the AHRR DMR in EWAS. The AHRR DMR is located in a potential poised enhancer region, characterized by the presence of both activating (H3K27ac/H3K4me1) and repressive (H3K27me3) histone modifications in circulating monocytes. It has been reported that the relationship between local methylation state and mRNA expression at AHRR may be mediated by transcription of enhancer RNA,18 which is suggested to play an important role in gene regulation,19 although the precise mechanisms remain unclear.
Thus far, a number of EWAS have reported significant association of AHRR cg05575921 methylation extent with non-smoking traits such as C-reactive protein,20 lung function21 and post-traumatic stress disorder (PTSD)22 (Supplementary Table 4). Although some of the non-smoking trait–cg05575921 associations can be primarily mediated through smoking-induced epigenetics, the others (e.g., PTSD) are assumed to be independent of smoking status, reflecting pleiotropy.
In summary, we have assessed DNA methylation as a new tool that could supplement self-report or existing biomarkers for smoking, aiming to increase sensitivity and specificity in clinical applications. Although smoking cessation is the single most effective healthcare intervention, the success rate of smoking cessation in a general population is only 10%.23 In this respect, reversion of AHRR hypomethylation is expected to be a quantifiable biomarker for progress in and observance of smoking cessation. Thus, our data not only contribute to basic science but also help improve the application of the DNA methylation signature to the medical care of real-world patients.
We are grateful to staff of the Research Institute, National Center for Global Health and Medicine, for their technical assistance with DNA preparation and epigenetic analysis.
None.
This study was supported by a grant (19A2004) from the National Center for Global Health and Medicine.
The ethics committee of the National Center for Global Health and Medicine approved this study (reference numbers 222 and 292).
The deidentified participant data will not be shared, except for those presented in the manuscript.
Please find supplementary file(s);
http://dx.doi.org/10.1253/circj.CJ-21-0958