Population-Based Impact of Smoking, Drinking, and Genetic Factors on HDL-cholesterol Levels in J-MICC Study Participants

Background Environmental and genetic factors are suggested to exhibit factor-based associations with HDL-cholesterol (HDL-C) levels. However, the population-based effects of environmental and genetic factors have not been compared clearly. We conducted a cross-sectional study using data from the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study to evaluate the population-based impact of smoking, drinking, and genetic factors on low HDL-C. Methods Data from 11,498 men and women aged 35–69 years were collected for a genome-wide association study (GWAS). Sixty-five HDL-C-related SNPs with genome-wide significance (P < 5 × 10^−8) were selected from the GWAS catalog, of which seven representative SNPs were defined, and the population-based impact was estimated using population attributable fraction (PAF). Results We found that smoking, drinking, daily activity, habitual exercise, egg intake, BMI, age, sex, and the SNPs CETP rs3764261, APOA5 rs662799, LIPC rs1800588, LPL rs328, ABCA1 rs2575876, LIPG rs3786247, and APOE rs429358 were associated with HDL-C levels. The gene-environment interactions with smoking and drinking were not statistically significant. The PAF for low HDL-C was the highest in men (63.2%) among the non-genetic factors and in rs3764261 (31.5%) among the genetic factors, and the PAFs of smoking and drinking were 23.1% and 41.8%, respectively. Conclusion The present study showed that the population-based impact of genomic factor CETP rs3764261 for low HDL-C was higher than that of smoking and lower than that of drinking.


INTRODUCTION
Low serum levels of HDL-cholesterol (HDL-C) are associated with an increased risk of cardiovascular disease (CVD). 1,2 As clinically available drugs that can enhance HDL-C levels are limited, genetic and environmental factors play an important role in the alleviation of CVD risk. Smoking, alcohol intake, physical activity, BMI, and dietary intake have been confirmed to be environmental factors that affect HDL-C levels. [3][4][5][6] The effects of genetic factors, such as single nucleotide polymorphisms (SNPs) in various enzyme-encoding genes, on HDL-C levels have also been reported. 7 Although the regulation of HDL-C metabolism is a complex process, enzymes in the reverse cholesterol transport (RCT) system, such as ABCA1, LCAT, cholesteryl ester transfer protein (CETP), hepatic lipase (LIPC), APOA1/C3/A4/A5, scavenger receptor class B type I (SCARB1), and LPL, play a major role in it. 2 Multiple SNPs have been reported to be associated with HDL-C levels, and among the genes harboring such SNPs, the genetic variants of CETP have been observed to exert a greater influence on HDL-C levels. [8][9][10][11] Furthermore, besides associations with SNPs in RCT-related genes, associations with several other SNPs, such as those in genes encoding endothelial lipase (LIPG) and APOE, which are related to lipoprotein dynamism, have been reported. 10,12 The majority of studies on environmental and genetic factors that affect HDL-C levels focus on factor-based association with respect to individual risk and susceptibility, and the population-based impact of environmental and genetic factors on HDL-C levels has not been clearly evaluated. The population-based impact of a factor is an important aspect for public health. The population-based impact of various environmental factors on HDL-C levels can be estimated based on the strength of the association and the prevalence of each factor.
However, the population-based impact of genetic factors is difficult to estimate, because several SNPs are detected in each enzyme-encoding gene; the strength of the association of each SNP with HDL-C levels will differ, and the prevalence of the allele containing each SNP will differ as well. Therefore, studies that investigate the combined effect of HDL-C-related SNPs limit their assessment to certain representative SNPs. 9 Furthermore, gene-environment interaction may influence HDL-C levels as well. 13,14 Among environmental factors, smoking and drinking habits significantly decrease and increase HDL-C levels, respectively. 2,9,15 These factors are suitable candidates for the estimation of the population-based impact of environmental factors on HDL-C levels, while also taking into account the interaction with genetic factors. In such cases, GWAS are suitable for evaluating the overall picture. GWAS of HDL-C-related SNPs in various ethnic populations, including the Japanese population, have been performed previously, and the reported HDL-C-related SNPs are listed in the GWAS catalog. 16,17 To investigate the population-based impact of smoking, drinking, and genetic factors on low HDL-C, we conducted a relatively large cross-sectional study using data on environmental factors and GWAS from the Japan Multi-Institutional Collaborative Cohort (J-MICC) Study.

Study population
The J-MICC Study is a large-scale cohort study that commenced in 2005; it investigates the host- and environment-related factors that affect cancer and other lifestyle-related diseases. [18][19][20] In brief, data on the lifestyles and medical history of participants were collected using questionnaires, while blood samples and health checkup results were collected during the baseline survey after written informed consent was obtained. The participants were recruited from among health-checkup examinees of local governments, private companies, and health checkup centers; people who responded to study announcements directed at regional residents; and first-visit outpatients at a cancer center. The subjects (n = 14,555) of the GWAS selected from among the J-MICC Study participants were aged 35–69 years and were from 11 prefectures of Japan (Chiba, Shizuoka, Aichi, Shiga, Kyoto, Tokushima, Fukuoka, Saga, Nagasaki, Kagoshima, and Okinawa); participants were selected by ten research institutes and universities. The present study excluded subjects lacking information on HDL-C levels (all participants [n = 2,296] from the Chiba study region and the Aichi Cancer Center and some participants [n = 187] from other institutes), smoking (n = 180), or drinking (n = 24), as well as those who withdrew from the study (n = 21). Some subjects met more than one exclusion criterion. The final number of eligible subjects was 11,498 (the dataset used in the present study was fixed on March 12, 2020, version 20200312).
The ethics committees of Nagoya University Graduate School of Medicine, Kagoshima University Graduate School of Medical and Dental Sciences, and other participating institutes and universities approved the protocol.

Questionnaire survey
A standardized structured questionnaire was used in the J-MICC Study to collect information regarding lifestyle factors and medical history of the subjects. 19 The questionnaire was evaluated by trained staff to ensure completeness and consistency.

HDL-C level assessment
Venous blood samples were collected from the subjects in sitting position during a period of fasting. The mean duration of fasting was 9.8 h. The blood samples were separated into serum, plasma, and buffy coat fractions, and stored directly at −80°C on the day of sampling. The serum HDL-C levels were measured at the respective institutes for health checkup or medical examination in each study region. 21

Quality of samples and SNPs during genotyping
DNA was extracted from the buffy coat fractions using a BioRobot M48 Workstation (Qiagen Group, Tokyo, Japan) at Nagoya University, using samples from all regions except Fukuoka and KOPS (Kyushu and Okinawa Population Study); DNA was extracted from the samples from these two regions at Kyushu University using an automatic nucleic acid isolation system (NA-3000; Kurabo, Co., Ltd, Osaka, Japan). Next, the DNA samples were genotyped at the RIKEN Center for Integrative Medicine using a HumanOmniExpressExome-8 v1.2 BeadChip array (Illumina Inc., San Diego, CA, USA). The 463 low-quality DNA samples were excluded from the analysis. The subjects for whom sex information in the questionnaire was inconsistent with that revealed by the genotyping results were excluded. Furthermore, the identity-by-descent method implemented in the PLINK 1.9 software 22 was used to identify close relationship pairs (pi-hat >0.1875), and one sample from each pair was excluded. The subjects (n = 34) with non-Japanese estimated ancestries 23 were also excluded by principal component analysis (PCA) 24 using a 1,000 Genomes reference panel (phase 3). 25 SNPs with a genotype call rate <0.98, a Hardy-Weinberg equilibrium exact test P-value <1 × 10^−6, a minor allele frequency (MAF) <0.01, or a departure from the allele frequency computed from the 1,000 Genomes Phase 3 EAS (East Asian) samples, as well as non-autosomal SNPs, were excluded.
Such quality control filtering resulted in 14,091 individuals and 570,162 SNPs.
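The SNP-level filters described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (call rate, Hardy-Weinberg P-value, MAF, autosomal location); the class and function names are hypothetical, the actual filtering was done with dedicated genetics software, and the check against 1,000 Genomes EAS allele frequencies is omitted because its threshold is not stated.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; constant names are illustrative only.
MIN_CALL_RATE = 0.98   # SNPs with call rate < 0.98 are excluded
MIN_HWE_P = 1e-6       # SNPs with HWE exact test P < 1e-6 are excluded
MIN_MAF = 0.01         # SNPs with minor allele frequency < 0.01 are excluded
AUTOSOMES = {str(i) for i in range(1, 23)}


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container)."""
    snp_id: str
    chromosome: str    # "1".."22", "X", "Y", or "MT"
    call_rate: float
    hwe_p: float
    maf: float


def passes_qc(snp: SnpStats) -> bool:
    """Return True if the SNP survives the stated pre-imputation filters."""
    return (
        snp.chromosome in AUTOSOMES
        and snp.call_rate >= MIN_CALL_RATE
        and snp.hwe_p >= MIN_HWE_P
        and snp.maf >= MIN_MAF
    )


if __name__ == "__main__":
    example = SnpStats("rs_example", "16", call_rate=0.995, hwe_p=0.4, maf=0.2)
    print(passes_qc(example))
```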

Genotype imputation and post-imputation quality control
The imputation of genotypes in autosomal chromosomes was performed using SHAPEIT2 26 and Minimac3 27 software with the 1,000 Genomes reference panel (phase 3). 25 The imputation procedure yielded 47,109,431 SNPs from the 570,162 genotyped SNPs.
The SNPs with imputation quality r^2 < 0.3 were excluded in the post-imputation quality control step. The number of eligible SNPs was 12,617,547.

Selection of HDL-C-related SNPs
On August 27, 2019, HDL-C-related SNPs were systematically selected from the GWAS catalog (https://www.ebi.ac.uk/gwas/) (the database of published GWAS), which included 499 SNPs from all ethnic populations. 16,17 Next, 65 of these SNPs, which had P-values of genome-wide significance (P < 5 × 10^−8) in the present analysis, were selected for the present study (eTable 1). The Q-Q plot showed that the distribution of the observed −log10(P-value) of the 65 SNPs clearly differed from that of the expected −log10(P-value) (Figure 1). Although the association for rs921919 in SCARB1 (12q24.31) reached genome-wide significance, this SNP was not included in the present analysis because it had not previously been reported to be associated with HDL-C levels and was not listed in the GWAS catalog. Other SNPs in SCARB1 listed in the GWAS catalog were not genome-wide significant in the present analysis.
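As a rough illustration of how the points of such a Q-Q plot are obtained, the sketch below pairs sorted observed −log10(P) values with the values expected under the null hypothesis. The plotting position (i − 0.5)/n is an assumption (a common convention, not stated in the text), and the function names and example P-values are hypothetical.

```python
import math


def expected_neg_log10_p(n: int) -> list[float]:
    """Expected -log10(P) under the null for n tests, largest value first.

    Assumes uniform P-values with plotting positions (i - 0.5) / n.
    """
    return [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]


def qq_points(p_values: list[float]) -> list[tuple[float, float]]:
    """Return (expected, observed) -log10(P) pairs, largest values first."""
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = expected_neg_log10_p(len(observed))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Made-up P-values purely for illustration.
    for exp, obs in qq_points([0.5, 1e-8, 0.01]):
        print(f"expected={exp:.3f} observed={obs:.3f}")
```

Points lying far above the diagonal (observed much larger than expected) correspond to the kind of departure from the null distribution described for the 65 SNPs.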

Statistical analysis
The subjects were divided into two categories based on the smoking status ("never" and "former" [≥1 year] vs "current" [including smokers within 1 year after quitting]), because the HDL-C levels apparently differed between subjects with the "current" and "never" statuses, and with respect to the duration after quitting. The subjects were also divided into two categories based on the drinking status (non-, former, and current moderate drinkers [<20 g/day] vs current heavy drinkers [≥20 g/day]), as the Japanese Ministry of Health, Labour and Welfare recommends alcohol intake in moderation (at <20 g/day); the HDL-C levels apparently differed between the two categories. 28 The duration and intensity of daily activity (hard work and walking) and the frequency and intensity of habitual exercise were used to estimate the metabolic equivalents (METs). The estimation of METs·h/day was based on the duration and intensity of each activity, with 3.0 for walking, 3.3 for light exercise, 4.0 for moderate exercise, 4.5 for heavy work, and 8.0 for heavy exercise. 29 Daily activity was classified as <8.25 METs·h/day and ≥8.25 METs·h/day at the median value. Habitual exercise was classified as <0.728 METs·h/day and ≥0.728 METs·h/day at the median value. Egg intake was selected as a representative HDL-C-related dietary factor. 2,9 BMI was divided into two categories with comparable numbers of male and female subjects in each. The association between HDL-C levels (continuous) and non-genetic factors, such as smoking and drinking habits, was tested using multivariate linear regression analysis after adjusting for the following HDL-C-related factors: age (<57 vs ≥57 years), sex, smoking, drinking, daily activity, habitual exercise, egg intake, and BMI. Dummy variables of 0 and 1 were used for all independent variables.
Statistical analyses for non-genetic factors were performed using Stata software (version 12; Stata Corp., College Station, TX, USA), and differences with P-value <0.05 were considered statistically significant.
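The METs·h/day calculation described above amounts to a weighted sum of hours spent per activity. The following minimal sketch uses the intensity weights and the daily-activity median cutoff given in the text; the dictionary keys, function names, and the example hours are hypothetical, and the way questionnaire answers were converted into hours per day is not specified here.

```python
# MET intensity weights quoted in the text; key names are illustrative.
MET_INTENSITY = {
    "walking": 3.0,
    "light_exercise": 3.3,
    "moderate_exercise": 4.0,
    "heavy_work": 4.5,
    "heavy_exercise": 8.0,
}

DAILY_ACTIVITY_CUTOFF = 8.25  # METs*h/day, the median reported in the text


def mets_hours_per_day(hours_per_day: dict[str, float]) -> float:
    """Sum of (MET intensity x hours per day) over the reported activities."""
    return sum(MET_INTENSITY[name] * hours for name, hours in hours_per_day.items())


def is_high_daily_activity(mets_h_per_day: float) -> bool:
    """Dummy-variable style split at the median cutoff (>= cutoff is 'high')."""
    return mets_h_per_day >= DAILY_ACTIVITY_CUTOFF


if __name__ == "__main__":
    # Hypothetical subject: 2 h of walking and 0.5 h of heavy work per day.
    total = mets_hours_per_day({"walking": 2.0, "heavy_work": 0.5})
    print(total, is_high_daily_activity(total))
```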
The 65 selected HDL-C-related SNPs were divided into seven categories based on the gene and cytoBand groups (eTable 1). The Manhattan plot for total SNPs in the present GWAS consistently showed seven peaks with genome-wide significance, with the exception of a single peak corresponding to rs921919 in SCARB1, which reached genome-wide significance but is not listed in the GWAS catalog (Figure 2). Next, the SNP with the highest coefficient and lowest P-value in each of the seven groups was selected, giving seven representative SNPs. The association between HDL-C levels (continuous) and genetic factors, and the interaction were tested using multivariate linear regression analysis in EPACTS v3.2.6 software (https://genome.sph.umich.edu/wiki/EPACTS), after adjusting for the HDL-C-related factors and first five principal components. Dummy variables of 0, 0.5, and 1 were used for the number of alternative alleles (0, 1, and 2) as independent variables in order to compare the coefficients with those of the non-genetic factors (dummy variables of 0 and 1), and the coefficients and 95% confidence intervals (CIs) were estimated. Associations with P < 5 × 10^−8 were considered statistically significant in the GWAS. We applied the Bonferroni correction (P < 0.00077) when evaluating the interactions of smoking or drinking with the 65 SNPs to reduce the chance of a type I (alpha) error due to multiple hypothesis testing.
Figure 1 (Q-Q plot). The vertical and horizontal axes indicate observed and expected −log10(P value) for tests of association between SNPs and HDL-C, respectively. GWAS, genome-wide association study; HDL-C, high-density lipoprotein cholesterol; SNP, single-nucleotide polymorphism.
The population-based impact of the non-genetic and genetic factors was estimated using population attributable fraction (PAF). 30,31 First, the odds ratio (OR) for low HDL-C (<40 mg/dL) was estimated, and the PAF was calculated as PAF = P × (OR − 1)/OR, where P is the proportion of the exposure in subjects with low HDL-C. The reference exposure group was defined as those with the minimum risk for low HDL-C, ie, smoking habit ("never" and "former" [≥1 year]), drinking habit (≥20 g alcohol/day), daily activity (≥8.25 METs·h/day), habitual exercise (≥0.73 METs·h/day), egg intake (≥3 times/week), BMI (<23.0 kg/m^2), age (<57 years), and sex (women) in the non-genetic factors; and rs3764261, rs662799, rs1800588, rs328, and rs3786247 (referent and alternative allele hetero-genotype, and alternative allele homo-genotype), as well as rs2575876 and rs429358 (referent allele homo-genotype), in the genetic factors. Dummy variables of 0 and 1 were used for both the non-genetic and genetic factors. When the PAF of the combined SNPs was estimated, each individual was categorized by the number of high-risk genotypes for low HDL-C carried across six SNPs, regardless of which SNPs were involved (ie, 0–1 SNPs as the reference, 2 SNPs, 3 SNPs, and 4–6 SNPs). The SNP rs1800588 was excluded from this accumulation analysis, because its OR for low HDL-C was not statistically significant. The ORs and their 95% CIs were estimated using a logistic model after adjusting for age, sex, smoking, drinking, daily activity, habitual exercise, egg intake, and BMI.
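The arithmetic behind three of the quantities in this section can be sketched briefly: the Bonferroni threshold for 65 interaction tests, the 0/0.5/1 coding of alternative-allele counts, and the PAF. The PAF function assumes the standard case-based form PAF = P × (OR − 1)/OR, which matches the definition of P as the proportion exposed among subjects with low HDL-C but should be read as an assumption; all function names and the example inputs are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def allele_dosage_code(n_alt_alleles: int) -> float:
    """Code 0, 1, or 2 alternative alleles as 0, 0.5, or 1.

    With this coding the regression coefficient is the difference between
    the two homozygous genotypes, comparable to a 0/1 dummy variable.
    """
    if n_alt_alleles not in (0, 1, 2):
        raise ValueError("number of alternative alleles must be 0, 1, or 2")
    return n_alt_alleles / 2


def paf_from_cases(p_exposed_among_cases: float, odds_ratio: float) -> float:
    """PAF = P x (OR - 1) / OR, with the OR used in place of the relative risk.

    Assumed formula: P is the proportion of cases (low HDL-C) who are exposed.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return p_exposed_among_cases * (odds_ratio - 1.0) / odds_ratio


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 65))   # about 0.00077, as in the text
    print([allele_dosage_code(k) for k in (0, 1, 2)])
    # Illustrative inputs only, not values from the study.
    print(paf_from_cases(0.5, 2.0))
```

Because P enters multiplicatively, a factor with a modest OR can still have a large PAF when most cases are exposed, which is why PAF rankings need not follow OR rankings.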
The HDL-C levels varied for each genotype group based on the smoking and drinking status (Table 4). The highest HDL-C level (mean 74.6; 95% CI, 70.8-78.4 mg/dL) was observed in heavy drinkers with the rs3764261 alternative homo-genotype (AA), while the lowest was observed in current smokers with the rs662799 referent homo-genotype (GG) and hetero-genotype (GA). The gene-environment interactions between the seven SNPs and smoking/drinking were not statistically significant; the lowest P-value, 0.004, was above the Bonferroni-corrected threshold (P < 0.00077). The interactions were not statistically significant for any of the 65 SNPs selected from the GWAS catalog (eTable 1). No significant interaction was observed in the subgroup analysis based on sex (data not shown in eTable 1).
The ORs for low HDL-C were statistically significant for several non-genetic factors, including smoking, drinking, BMI, age, and sex, and for six of the seven SNPs (all except rs1800588) among the genetic factors (Table 5). The PAF for low HDL-C in the non-genetic factors was the highest in men (63.2%), and the PAFs of smoking and drinking were 23.1% and 41.8%, respectively. The PAF for low HDL-C in the genetic factors was the highest in rs3764261 (31.5%), which was higher than that of smoking and lower than that of drinking. The PAFs of three SNPs (25.5%) and 4-6 SNPs (23.7%), according to the number of SNPs with a high-risk genotype for low HDL-C, were similar to that of smoking, although the ORs for low HDL-C showed an apparent increasing trend with the number of SNPs with a higher-risk genotype (P < 0.001).

DISCUSSION
In the present study, we observed significant associations between HDL-C levels and smoking, drinking, daily activity, habitual exercise, egg intake, BMI, age, sex, and seven SNPs in CETP, APOA5, LIPC, LPL, ABCA1, LIPG, and APOE. The PAFs for low HDL-C, as a measure of population-based impact, were the highest in men among the non-genetic factors and in CETP rs3764261 among the genetic factors. The PAF of this genetic factor was higher than that of smoking and lower than that of drinking.
Genetic factors that affect HDL-C levels, such as SNPs, are primarily associated with genes that encode enzymes from the RCT system, such as ABCA1, LCAT, CETP, LIPC, APOA1/C3/A4/A5, SCARB1, and LPL. 2,7 The SNPs in the corresponding genes, except those in LCAT and SCARB1, were considered among the seven major SNPs selected in the present analysis. The SNPs in SCARB1 were not included because the two SNPs with genome-wide significance were not listed in the GWAS catalog, and the lowest P-value for the SCARB1 SNP (rs838886) listed in the catalog did not reach genome-wide significance (P = 7.34 × 10^−8; data not shown in eTable 1). As the MAF of LCAT was less than 0.01, the SNPs of LCAT were excluded from the GWAS analysis. The SNPs in LIPG and APOE, which are associated with HDL-C production via a system different from RCT, were also considered among the seven major SNPs. 10,12 The genetic variants of CETP were observed to exhibit the most significant influence on HDL-C levels, which was concordant with findings from previous reports. [8][9][10] Cigarette smoking is associated with lower HDL-C levels, even though the mechanisms are yet to be completely elucidated. Certain studies have shown that smoking is related to ApoA1 concentration 13 and CETP activity 14 ; however, these results could be considered controversial. 32,33 Alcohol consumption is reported to be associated with increased expression of ABCA1 34 and a higher APOA1 concentration 35 in peripheral blood and a lower CETP activity. 36 In the present study, the interaction of the 65 and seven SNPs with drinking was not statistically significant after Bonferroni correction was applied. Previous studies reported significant association of alcohol consumption and polymorphisms in multiple genes (CETP, APOA1/A2, LPL, ADH3, ADH1, and ALDH2) with HDL-C levels. [37][38][39][40][41] The association between CETP and ABCA1 expression and alcohol consumption has also been reported in previous studies, but the mechanism is not clear. 34,36 However, no genome-wide significance was reported in the gene-alcohol interaction for CETP, APOA5, LIPC, and LPL in a particular GWAS. 42 The interaction between each SNP and smoking was also not statistically significant after Bonferroni correction was applied. These results suggest that genetic factors may have a minor or negligible impact on the interaction with drinking and smoking.
Several studies have previously reported the association between SNPs and HDL-C levels, which have been listed in the GWAS catalog. In the present study, we selected the 498 SNPs listed in the GWAS results that were a part of the J-MICC Study and observed 65 SNPs with genome-wide significance for the analysis. We selected seven SNPs according to the gene and cytoBand groups. The Manhattan plot for total SNPs consistently showed seven peaks, except that for SCARB1. These observations support the proposition that the seven SNPs are appropriate representatives of the SNPs associated with HDL-C levels in the present analysis.
In the present study, we investigated the population-based impact of both non-genetic and genetic factors on low HDL-C, using PAF. The OR for low HDL-C was used as the relative risk when the PAF was calculated, because the prevalence of low HDL-C was obtained from the baseline general population and its rate was relatively low (5.0% in both sexes). 30,31 To the best of our knowledge, studies investigating the PAF for low HDL-C with non-genetic and/or genomic factors have not yet been conducted. The highest PAFs were observed in men among the non-genetic factors and in CETP rs3764261 among the genetic factors. The PAF of this genetic factor was higher than that of smoking and lower than that of drinking. These observations suggest that, from a public health perspective, the population-based impact of genomic factors for low HDL-C is comparable to that of non-genetic factors.
The strength of this study is that the population-based impact of non-genetic and genetic factors on HDL-C levels was evaluated simultaneously using data from an adequate number of subjects and total gene information. To our knowledge, this is the first comprehensive report on the population-based impact of the abovementioned factors.
Meanwhile, the present study has several limitations. First, a causal relationship was not confirmed, as this is a cross-sectional study. Second, atheroprotective and non-atheroprotective HDL particles were jointly considered as total HDL-C. The two fractions of HDL2-C and HDL3-C have different effects on CVD risk. 2 Third, the present study selected seven representative SNPs to estimate the population-based impact; the impact may have been overestimated because the SNP with the highest coefficient in each gene and cytoBand group was selected as the representative. Fourth, the replication test on GWAS was not conducted, because the present study used information from the GWAS catalog in which the association between SNPs and HDL-C levels had been estimated and published previously. Fifth, the effect of residual SNPs (those apart from the 65 SNPs), referred to as "missing heritability", was not considered. The polygenic risk score may support the estimation of this effect. 43 Sixth, PAF is valid only in the absence of confounding and/or effect modification. 30 The lack of data on unknown confounders may lead to misestimation of the true PAF, the extent of which depends on the magnitude of confounding. 31 Furthermore, a PAF estimate is restricted by time and population and depends on the quality and representativeness of the exposure and risk data.
In conclusion, the present study demonstrated that the population-based impact of genomic factor CETP rs3764261 for low HDL-C was higher than that of smoking and lower than that of drinking.