Two polymorphisms, rs2046210 and rs3803662, are associated with breast cancer risk in a Vietnamese case-control cohort

Breast cancer is the most common cancer in women worldwide. Breast tumorigenesis encompasses both extrinsic and intrinsic factors. Among intrinsic aspects, the appearance of DNA variation can cause genetic instability, which may lead to carcinogenesis. Genome-wide association studies have found several potential breast cancer-associated single nucleotide polymorphisms (SNPs) in many different populations. Among these, seven (rs2046210, rs1219648, rs3817198, rs3803662, rs889312, rs10941679 and rs13281615) have been shown to be significantly associated with breast cancer risk in various populations including those very similar to the Vietnamese. Here, therefore, we have investigated the relationship between these SNPs and breast cancer risk in a Vietnamese population case-control cohort. Real-time PCR high-resolution melt analysis was performed to genotype 300 breast cancer cases and 325 healthy controls, and the association between the seven SNPs and breast cancer risk was determined by analyzing the differences in allelic and genotypic frequencies between case and control groups using R software. While five of the seven showed no association with breast cancer, there was a relationship between the other two SNPs, rs2046210 and rs3803662, and the risk of developing this disease in Vietnamese women. The A allele is the risk allele for both rs2046210 (OR [95% CI] = 1.43 [1.14-1.78], P = 0.0015) and rs3803662 (OR [95% CI] = 1.45 [1.16-1.83], P = 0.001). We conclude that two polymorphisms, rs2046210 in ESR1 and rs3803662 in TNRC9, are associated with breast cancer risk in the Vietnamese population.


INTRODUCTION
Breast cancer is one of the most common malignancies worldwide. This is a polygenic disease in which intrinsic factors play an essential role in disease etiology (Nathanson et al., 2001; Balmain et al., 2003). High-penetrance breast cancer susceptibility genes, such as BRCA1 and BRCA2, are the cause of 20% of the familial risk, which constitutes only about 5-10% of breast cancer cases in the general population due to their low mutation occurrence (< 0.1%) (Easton, 1999; Ripperger et al., 2009). The remaining cases related to genetic causes may involve medium- and low-penetrance breast cancer susceptibility genes whose variation is exhibited in a population with high frequency (> 0.1%). Although the effects of individual single-nucleotide polymorphisms (SNPs) are often weak and pose a low breast cancer risk, this risk can be influenced significantly when SNPs interact with each other (Pharoah et al., 2002). With a higher frequency in the population and higher association with risk, SNPs in the low-penetrance genes have greater potential to become genetic markers of breast cancer in the population (Ripperger et al., 2009). In this study, seven SNPs in seven low-penetrance genes were investigated due to their known function in breast cancer development. In particular, four genes, estrogen receptor 1 (ESR1), fibroblast growth factor receptor 2 (FGFR2), lymphocyte-specific protein 1 (LSP1) and mitogen-activated protein kinase kinase kinase 1 (MAP3K1), are involved in the mitogen-activated protein kinase pathway, which plays a crucial role in numerous fundamental cellular processes such as proliferation, differentiation, motility, the stress response, apoptosis and survival (Huang et al., 1997; Santen et al., 2002; Pham et al., 2013; Ornitz and Itoh, 2015).
The other three genes are plasmacytoma variant translocation 1 (PVT1), mitochondrial ribosomal protein S30 (MRPS30) and trinucleotide-repeat-containing 9 (TNRC9)/TOX high mobility group box family member 3 (TOX3), which function in apoptosis and DNA repair (Guan et al., 2007;Shan et al., 2013;Quigley et al., 2014). Thus, SNPs that occur in these genes may cause abnormal expression of an encoded protein, which, in turn, could lead to aberrant proliferation of breast tissue cells and finally result in an increase in breast cancer risk.
In addition to these association studies, many meta-analyses have been conducted to increase statistical power, to resolve conflicting results between individual studies, and to improve estimates of the size of the effect (Hunter and Schmidt, 1990). The results have demonstrated consistency, with all seven SNPs shown to be significantly associated with an increased risk of breast cancer among different ethnic groups, including Asians and Europeans, with P values lower than 0.03 and OR (95% CI) values ranging from 1.07 (1.01-1.13) to 1.62 (1.44-1.83) (Wang et al., 2013; Zheng et al., 2014; Tang et al., 2016; Wang et al., 2016; Yang et al., 2016; Zhang et al., 2016; Hu et al., 2017). Interestingly, two SNPs, rs889312 and rs13281615, were shown not to be associated with breast cancer risk in Africans (Zheng et al., 2014; Zhang et al., 2016).
There has not yet been an adequate evaluation of these SNPs in other populations, including Vietnamese. Thus, in this study, we investigated these seven SNPs to determine the association between these SNPs and breast cancer risk in a sample of the Vietnamese population with the aim to add to the literature and to improve the management of breast cancer in the future, not only in Vietnam, but all over the world.

MATERIALS AND METHODS
Subjects
Breast cancer cases were identified by the presence of a malignant tumor in the breast of patients who underwent surgery in the Oncology Hospital in Ho Chi Minh City, Vietnam. Average age of the patients was 47.8 ± 4.7 years. Healthy controls were healthy female volunteers who were confirmed to be cancer-free by an annual health check. Average age of the healthy control group was 46.3 ± 5.0 years. Blood samples were collected from all participants (controls were matched to the cases according to age and ethnic grouping, i.e., Kinh Vietnamese). This study was approved by the Ethical Committee of Oncology Hospital-HCMC Vietnam under the decision number 177/HÐÐÐ-CÐT, 18th November 2014.
Blood samples were collected from 300 breast cancer cases and 325 healthy controls. Genomic DNA was extracted from whole blood using a salting-out method following Hue et al.'s protocol (Hue et al., 2012) with some modifications. DNA samples were evaluated by spectrophotometry using the NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, USA) to determine DNA concentration and purity (Huberman, 1995;Sahota et al., 2007;Chacon-Cortes and Griffiths, 2014). The purity of the DNA samples was validated by the value of the 260 nm/280 nm absorbance ratio, which should be from 1.7 to 2.0 (Sambrook and Russell, 2001).
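The purity criterion described above can be written as a simple filter. The following is a minimal sketch, assuming the absorbance readings are available as plain numbers; the function name and the sample readings are hypothetical and are not taken from the study.

```python
def passes_purity_check(a260: float, a280: float,
                        low: float = 1.7, high: float = 2.0) -> bool:
    """Return True when the A260/A280 ratio lies within [low, high]."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return low <= ratio <= high


# Hypothetical readings (A260, A280), for illustration only.
readings = {
    "sample_01": (0.95, 0.50),  # ratio 1.90, inside the accepted range
    "sample_02": (0.80, 0.55),  # ratio about 1.45, below the accepted range
}
accepted = [name for name, (a260, a280) in readings.items()
            if passes_purity_check(a260, a280)]
```

Samples falling outside the range would presumably be re-extracted or excluded; the text does not state which.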

SNP genotyping
The DNA sequence region of the candidate SNPs was obtained from GenBank (Cerutti et al., 2016), and primers for real-time PCR high-resolution melt (HRM) analysis were designed using Primer3plus, Umelt Hets (https://www.dna.utah.edu/hets/umh.php), Primer-Blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and OligoAnalyzer 3.1. Optimization was performed for each set of primers to determine suitable conditions for HRM analysis where three different genotypes of each SNP could be recognized easily. The SNP genotyping assays using optimal HRM analysis were performed using LightCycler 480 High-Resolution Melting Master (Roche Diagnostics, Germany) on a LightCycler 96 Instrument with a 96-well thermal block (Roche Diagnostics). The optimal designed primer sequences are shown in Table 1. PCR and HRM analysis was performed in a 6-μl reaction containing 1X LightCycler 480 High-Resolution Melting Dye, 0.2 μM forward and reverse primers, optimal MgCl2 concentration for each SNP (Table 2), 10-20 ng of genomic DNA and PCR-grade water. PCR amplification consisted of an initial pre-incubation at 95 °C for 300 s followed by 40 cycles of three steps: denaturation at 95 °C for 30 s, annealing at the optimal annealing temperature for each SNP for 30 s (Table 2), and elongation at 72 °C for 30 s. HRM analysis was carried out in a four-step protocol including 95 °C for 90 s, 40 °C for 60 s, 65 °C for 30 s, and 95 °C for 1 s in continuous acquisition mode with temperature ramp at 0.04 °C/s. Finally, the reactions were cooled at 4 °C. One negative control and three positive controls (genotypes confirmed by Sanger sequencing) were included in each run. Typical HRM genotyping results are shown in Fig. 1.
To identify the genotype of a sample, four criteria based on the melting curve and the value obtained from the DNA amplification through the real-time PCR HRM were taken into consideration: (1) the amplification value (Ct value), which must be lower than 30; (2) the melting peaks, between which the ΔT of two homozygotes must be higher than 0.05; (3) the normalized melting curves, which must show that the heterozygote curve cuts the homozygote with the lower Tm in the middle and does not cut the other homozygote curve; and (4) the alternative plot, in which the lower-Tm homozygote is set as the baseline and the heterozygote curve must divide into two sides of the baseline. The homozygote with the higher Tm curve must be on the upper side (Fig. 1).
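The two numerical criteria, (1) and (2), lend themselves to an automated pre-screen before curves are inspected. The sketch below is only an illustration under stated assumptions: the class and function names are hypothetical, the Tm values are invented, and criteria (3) and (4), which concern curve shape, are not modelled.

```python
from dataclasses import dataclass

CT_MAX = 30.0        # criterion (1): Ct must be lower than 30
MIN_DELTA_TM = 0.05  # criterion (2): Tm difference between the two homozygotes


@dataclass
class HrmRun:
    """Melting temperatures of the two homozygous controls in one run."""
    tm_homozygote_low: float
    tm_homozygote_high: float


def run_is_resolvable(run: HrmRun) -> bool:
    """Criterion (2): the homozygote melting peaks must be separated enough."""
    return (run.tm_homozygote_high - run.tm_homozygote_low) > MIN_DELTA_TM


def sample_is_callable(ct: float, run: HrmRun) -> bool:
    """Criteria (1) and (2) only; curve-shape criteria need visual review."""
    return ct < CT_MAX and run_is_resolvable(run)


# Hypothetical values, for illustration only.
good_run = HrmRun(tm_homozygote_low=80.1, tm_homozygote_high=80.4)
poor_run = HrmRun(tm_homozygote_low=80.10, tm_homozygote_high=80.12)
```

A sample passing this pre-screen would still need its normalized and difference curves checked against criteria (3) and (4).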
All SNPs in the Primary Set, comprising 151 breast cancer cases and 161 healthy controls that were randomly selected from the Full Set, were initially genotyped to identify genotype and allele frequencies for examination of the potential polymorphism and association. Further validation for potential associated SNPs was performed in the remaining 149 breast cancer cases and 164 healthy controls to make up the Full Set of 300 breast cancer cases and 325 healthy controls.
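The two-stage design can be illustrated with a random draw of the Primary Set from the Full Set. This is a minimal sketch: the sample identifiers, the seed and the function name are hypothetical, and the text does not describe how the random selection was actually carried out.

```python
import random


def split_primary(ids, n_primary, seed=0):
    """Randomly pick n_primary IDs; return (primary, remaining) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    primary = set(rng.sample(ids, n_primary))
    remaining = [i for i in ids if i not in primary]
    return sorted(primary), remaining


# Hypothetical identifiers; only the group sizes come from the text.
case_ids = [f"case_{i:03d}" for i in range(1, 301)]        # 300 cases
control_ids = [f"control_{i:03d}" for i in range(1, 326)]  # 325 controls

primary_cases, remaining_cases = split_primary(case_ids, 151)
primary_controls, remaining_controls = split_primary(control_ids, 161)
```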

Statistical analysis
The association between the examined SNPs and breast cancer risk was analyzed statistically using R version 3.3.2. The threshold to determine a statistically significant association was set at P = 0.05. Genotype and allele frequencies of each SNP in the population were calculated as percentages. Initially, Hardy-Weinberg equilibrium (HWE) was used to evaluate allele distribution for identification of unexpected population or genotyping biases (Hardy, 1908; Guo and Thompson, 1992). This test was used for validation of suitable genotyping data for association analysis. Thereafter, a chi-squared test was performed to evaluate differences in genotype and allele frequencies between cases and controls in the population (Fisher and Yates, 1963). The OR and 95% CI were obtained to estimate the relation between the allele or genotype and the disease risk.
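The three calculations named above can be sketched in a few lines. The example is written in Python rather than R and rests on several assumptions that the text does not confirm: HWE is checked with a chi-squared goodness-of-fit test (an exact test may have been used instead), the allelic test is a Pearson chi-squared test without continuity correction, and the CI is a Wald interval on the log odds ratio. All counts are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-squared statistic for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


def hwe_chi2(n_aa, n_ab, n_bb):
    """Goodness-of-fit test of genotype counts against HWE proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, 1)


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Chi-squared test, OR and Wald 95% CI for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return stat, chi2_sf(stat, 1), odds_ratio, (lower, upper)


# Hypothetical counts, for illustration only.
hwe_stat, hwe_p = hwe_chi2(25, 50, 25)
stat, p_value, odds_ratio, ci = allelic_test(120, 80, 100, 100)
```

Note that R's `chisq.test` applies a continuity correction to 2x2 tables by default, so results from this sketch may differ slightly from those obtained with default R settings.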

RESULTS
Primary investigation of seven selected SNPs for further analysis
Genotyping results from 151 cases and 161 controls were analyzed to ensure the selected SNPs were potential polymorphisms for association analysis in the Full Set. All seven SNPs were highly variable polymorphisms in the Primary Set of the Vietnamese population, with the frequency of the minor allele ranging around 19% (Table 3). In addition, the HWE test showed an equal distribution of the genotypes in the population with the P values being > 0.05 (Table 3), particularly in the control group. The chi-squared test for the Primary Set indicated that there was no significant difference (P > 0.05) between cases and controls for both allelic and genotypic frequencies of five SNPs: rs1219648, rs3817198, rs889312, rs10941679 and rs13281615 (Table 4). There was thus no association between these SNPs and the risk of breast cancer in the Primary Set. The minor allele of each SNP was suspected to be the risk allele; however, in this case, the minor alleles of these five SNPs may have no effect or a weak effect on disease risk. These primary results did not support further analysis in the bigger sample size due to their low potential to become markers for breast cancer risk.
In contrast, the remaining two SNPs, rs2046210 and rs3803662, showed a strong association with risk of breast cancer in this small sample set. The allelic association analysis demonstrated significant differences between cases and controls, with P values of 0.01 and 0.03, respectively. The A allele for both SNPs was associated with a 1.51-fold increase in breast cancer risk (Table 3). However, the genotypic analysis did not support the association: a chi-squared test showed that the genotype containing the suspected risk allele does not affect the risk of breast cancer even when it carries two risk alleles (P > 0.05) (Table 4). The small sample size may be a limitation of this analysis. Nevertheless, together with the information about the strong association between the risk allele and the disease, the SNPs rs2046210 and rs3803662 were considered more likely to be potential SNPs for further analysis. The allelic analysis, in this case, supports the association analysis in the Full Set to confirm the potential of associated SNPs.
Validation of two SNPs significantly associated with breast cancer risk in the Primary Set
Genotyping was continued in the remaining samples to evaluate the Full Set of 300 cases and 325 controls, and the frequencies of alleles and genotypes were assessed (Table 5). Statistical analysis showed significant differences between cases and controls in the Vietnamese population cohort at both the allelic level (P = 0.001) and the genotypic level for rs2046210 and rs3803662 (P = 0.007 and P = 0.004, respectively). The chi-squared P values for genotype and allele frequencies obtained from the Full Set were 10 times smaller than those obtained from the Primary Set (Table 4). Further association analysis of genetic models of rs2046210 and rs3803662 was carried out (Table 6). In the allelic model, the A allele of rs2046210 (OR [95% CI] = 1.43 [1.14-1.78], P = 0.0015) and rs3803662 (OR [95% CI] = 1.45 [1.16-1.83], P = 0.001) strongly increased the risk of breast cancer development. The allelic and genotypic analysis of the Full Set again confirmed the strong associations of these two potential SNPs and breast cancer risk in Vietnamese women.
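As a consistency check on the allelic-model results, a P value can be approximately recovered from a reported OR and its 95% CI. The sketch below assumes the intervals are Wald intervals on the log scale and that the test is two-sided; neither is stated in the text, and the function name is hypothetical. Only the ORs and CIs are taken from the text.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value implied by an OR and its 95% CI."""
    # Width of the CI on the log scale gives the standard error of log(OR).
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return se, z, p


reported = {
    "rs2046210": (1.43, 1.14, 1.78),
    "rs3803662": (1.45, 1.16, 1.83),
}
implied = {snp: p_from_or_ci(*values) for snp, values in reported.items()}
for snp, (se, z, p) in implied.items():
    print(f"{snp}: SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the implied P values are of the same order of magnitude as the reported ones; small differences are expected from rounding of the published ORs and interval limits.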
To increase specificity regarding which genotypes could increase breast cancer risk, the genotypic model analysis was conducted for rs2046210 (Table 7) and rs3803662 (Table 8). A breast cancer risk association for both SNPs was observed in the additive and dominant models. In the additive model, the presence of the homozygous AA genotype elevated the risk of the disease by 2.05-fold for rs2046210 (OR [95% CI] = 2.05 [1.31-3.22], P = 0.0017) and 2.17-fold for rs3803662 (OR [95% CI] = 2.17 [1.36-3.45], P = 0.001). On the other hand, the AG genotype appears not to be associated with breast cancer risk. This additive analysis indicated that the A allele of each SNP may have a recessive effect in its contribution to the risk. This was confirmed by the dominant model analysis, which demonstrated that carrying either one or two copies of allele A of rs2046210 and rs3803662 is associated with a 1.56-fold increase (OR [95% CI] = 1.56 [1.12-2.17], P = 0.0089) and a 1.53-fold increase (OR [95% CI] = 1.53 [1.05-2.21], P = 0.025) in disease risk, respectively.
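The genotypic models differ only in how the three genotypes are grouped before a 2x2 odds ratio is computed. The sketch below illustrates this under stated assumptions: GG is taken as the reference genotype, A as the risk allele, the CI is a Wald interval, and the genotype counts are invented for illustration (the actual counts are in Tables 7 and 8, which are not reproduced here). The per-genotype comparisons correspond to what the text reports under the additive model.

```python
import math


def odds_ratio(case_exp, case_ref, ctrl_exp, ctrl_ref):
    """OR and Wald 95% CI for exposed vs. reference genotype groups."""
    value = (case_exp * ctrl_ref) / (case_ref * ctrl_exp)
    se = math.sqrt(1 / case_exp + 1 / case_ref + 1 / ctrl_exp + 1 / ctrl_ref)
    lower = math.exp(math.log(value) - 1.96 * se)
    upper = math.exp(math.log(value) + 1.96 * se)
    return value, lower, upper


def genotypic_models(cases, controls):
    """cases/controls: dicts of counts keyed by 'GG', 'AG', 'AA'."""
    return {
        # Each genotype compared separately with the GG reference.
        "AG vs GG": odds_ratio(cases["AG"], cases["GG"],
                               controls["AG"], controls["GG"]),
        "AA vs GG": odds_ratio(cases["AA"], cases["GG"],
                               controls["AA"], controls["GG"]),
        # Dominant model: carriers of at least one A allele vs. GG.
        "dominant": odds_ratio(cases["AG"] + cases["AA"], cases["GG"],
                               controls["AG"] + controls["AA"], controls["GG"]),
        # Recessive model: AA vs. everyone else.
        "recessive": odds_ratio(cases["AA"], cases["GG"] + cases["AG"],
                                controls["AA"], controls["GG"] + controls["AG"]),
    }


# Hypothetical genotype counts, for illustration only.
example_cases = {"GG": 40, "AG": 40, "AA": 20}
example_controls = {"GG": 50, "AG": 40, "AA": 10}
models = genotypic_models(example_cases, example_controls)
```

Comparing the per-genotype, dominant and recessive ORs side by side is one way to judge which mode of inheritance best describes the contribution of the A allele.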

DISCUSSION
In this study, we have genotyped seven SNPs located in seven genes involved in various breast cancer pathways including proliferation, differentiation, motility, the stress response, apoptosis and DNA repair (Huang et al., 1997; Santen et al., 2002; Guan et al., 2007; Pham et al., 2013; Shan et al., 2013; Quigley et al., 2014; Ornitz and Itoh, 2015) in a Vietnamese breast cancer case-control cohort. When comparing the allelic and genotypic frequencies from this study with those in Kinh Vietnamese in Ho Chi Minh City (1000 Genomes), we found that they corresponded closely (Table 3). This indicates that genotyping for these SNPs was reliable. Among the genotyped SNPs, five (rs1219648, rs3817198, rs889312, rs10941679 and rs13281615) were found to not show an association with breast cancer risk, but the remaining two (rs2046210 and rs3803662) showed a strong association with the risk of breast cancer in our Vietnamese population cohort.
In the literature, five SNPs (rs1219648, rs3817198, rs889312, rs10941679, and rs13281615) were shown to be strongly associated with breast cancer risk in Europeans and Asians (Antoniou et al., 2009; Shan et al., 2012; Wang et al., 2013; Sawyer et al., 2014; Siddiqui et al., 2014; Campa et al., 2015; Ghoussaini et al., 2016; Mazhar et al., 2016; Tang et al., 2016; Zhang et al., 2016; Hein et al., 2017), as well as in populations similar to the Vietnamese, namely Chinese and Taiwanese populations (Liang et al., 2008; Long et al., 2010; Zheng et al., 2010; Kuo et al., 2017). Furthermore, among these populations, these five SNPs have been shown to be associated with an increased risk of breast cancer with the highest OR (95% CI) of 3.06 (1.79-5.25). Nevertheless, in this study, we found that these SNPs were not significantly associated with breast cancer risk in our Vietnamese cohort, with P values of genotype and allele frequencies being higher than 0.05 (Table 4). These contradictory results indicate that the five SNPs are associated with breast cancer risk in a specific ethnicity and population (Mizoo et al., 2013). It is possible that these SNPs are related to breast cancer progression, instead of disease risk, due to their being located in low-penetrance genes: FGFR2, LSP1 and MAP3K1 are involved in proliferation and differentiation, while MRPS30 and PVT1 play a role in apoptosis and DNA repair pathways. However, the specific mechanism by which these SNPs affect the risk of breast cancer remains unclear.
The two remaining SNPs, rs2046210 and rs3803662, were found to be associated with breast cancer risk in our Vietnamese cohort, with P values less than 0.007 (Table 6). In addition, these results correlate with other association studies of rs2046210 and rs3803662 in Europeans and Asians, especially Japanese, with P values less than 0.03 and the highest OR (95% CI) being 2.16 (1.32-3.59) (Cai et al., 2011; Mizoo et al., 2013; Mazhar et al., 2016; Wang et al., 2016; Hu et al., 2017). Therefore, these two SNPs are likely to increase the risk of breast cancer up to two-fold in Vietnamese women (Tables 7 and 8), as in other ethnic groups. Furthermore, they may serve as disease biomarkers for breast cancer risk in the Vietnamese population.
An explanation for the association of these SNPs and breast cancer has recently been proposed. A study conducted by Dunbier et al. (Dunbier et al., 2011) reported that rs2046210 is located near chromosome 6 open reading frame 97 (C6ORF97), which is immediately upstream of the gene encoding ERα (ESR1), a high-susceptibility gene. This open reading frame was highly correlated with ESR1 (Spearman correlation coefficient Rs = 0.67), indicating that when the expression of C6ORF97 increases, the expression of ESR1 also rises. In other words, because rs2046210 is situated within the binding site of a transcription factor, the risk A allele may perturb the binding site, thereby preventing the transcription factor from binding to the target DNA and resulting in the inhibition or induction of ERα. In the latter case, overexpression of ERα might be the result. Notably, a report by Drury et al. indicated that rs2046210 plays a role in the high-level expression of ERα (Drury et al., 2009). Therefore, rs2046210 may be involved in the aberrant expression of ESR1, which raises the ERα level and could eventually lead to breast cancer. SNP rs3803662 is located downstream of TOX3/TNRC9 in the short arm of chromosome 16 at position 52,552,429. Shan et al.'s study suggested that TOX3 and BRCA1 are alternately expressed in breast cancer cells (Shan et al., 2013), and demonstrated an opposing regulation in breast cancer for these two genes. Other studies have shown that abnormal BRCA1 activity can lead to the inactivation of tumor suppressor genes such as PTEN and TP53 (Tomlinson et al., 1998; Saal et al., 2008), along with increased expression of ERBB2 (HER2) and the c-Myc oncogene (Brodie et al., 2001). Therefore, the inhibition of BRCA1 via TOX3 may lead to a series of downstream changes. High expression of TOX3 likely plays a crucial role in breast cancer progression, especially in estrogen receptor-positive mammary epithelial cells (Seksenyan et al., 2015).
Therefore, variation in the region regulating TOX3 expression, including rs3803662, is suspected to affect the expression of TOX3, leading to modulation of the BRCA-related breast cancer risk. However, further functional studies are required to confirm the influence of rs3803662 on the TOX3 level in breast cancer development.

CONCLUSION
Investigation of an association between the selected seven SNPs and breast cancer risk in a Vietnamese population cohort showed that rs1219648, rs3817198, rs889312, rs10941679 and rs13281615 were unlikely to be significant breast cancer risk markers, while rs2046210 and rs3803662 were likely to be so, as shown in other populations. These two SNPs are candidates for further studies to evaluate their function in regulating the development of breast cancer, and may be used as biomarkers for early detection, diagnosis and treatment of breast cancer.
This research was funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number C2016-18-01. The authors would like to thank the Oncology Hospital-HCMC for their contribution to collecting samples.