The Journal of Toxicological Sciences
Online ISSN : 1880-3989
Print ISSN : 0388-1350
ISSN-L : 0388-1350
Original Article
Air pollution and COPD: Unveiling the mechanisms through network toxicology and transcriptomics
Dong Song, Lin Xie, Xuege Gao, Yushan Chen, Chunjun Zhong, Huicong Li, Shaofeng Zhan, Leshen Lian

2026 Volume 51 Issue 4 Pages 275-294

Abstract

Over the past five decades, air pollution has posed a growing threat to human health, particularly affecting the respiratory system. This study aims to investigate the potential molecular mechanisms underlying the relationship between exposure to air pollutants and the development of COPD and to identify potential gene targets that may play a key role in this process. In this study, researchers used several publicly available databases to obtain target genes related to air pollutants and COPD, determined the overlapping genes between them, and performed GO and KEGG enrichment analyses to elucidate the underlying mechanisms. Cross-validation was performed using multiple datasets from the Gene Expression Omnibus (GEO) database to screen out candidate targets, and molecular docking techniques were used to investigate molecular interactions between candidate targets and air pollutants. Candidate targets were subsequently validated and analyzed using immune cell infiltration analysis, single-cell transcriptome data, risk prediction model construction and clinical data to further elucidate their relationship with COPD. Findings suggest that HDAC9, DPP9 and KCNN4 are candidate targets of air pollutants that are potentially involved in COPD development. These results offer new insights into the potential molecular mechanisms linking air pollution exposure to COPD and underscore the need for further in-depth research on air pollution issues.

INTRODUCTION

As global industrialization continues to accelerate, the threat to human respiratory health from exposure to environmental pollutants has escalated to a critical public health and safety concern. Data from the World Health Organization (WHO) and Global Burden of Disease (GBD) studies indicate that air pollution emerged as the second leading cause of death from non-communicable diseases globally in 2019, accounting for approximately 6.7 million deaths and $4.6 trillion in lost economic output (GBD 2019 Risk Factor Collaborators, 2020; WHO, 2024). Homeostatic imbalances in the respiratory system — the first physiologic barrier to pollutant exposure — show significant correlations with the pathologic course of chronic pulmonary diseases. Typical pollutants, such as gaseous pollutants (NO2, O3), occupational dust (SiO2), and volatile organic compounds (benzene, toluene, formaldehyde), can trigger a systemic inflammatory cascade response. This occurs through molecular events like oxidative stress, immune regulatory imbalance, and epigenetic reprogramming, which ultimately lead to irreversible airway remodeling (Sin et al., 2023; Christenson et al., 2022). Among many respiratory diseases, chronic obstructive pulmonary disease (COPD) has attracted much attention due to its high lethality and progressive, deteriorating nature (Barnes et al., 2015). It has become the third leading cause of death globally, with more than 3.3 million deaths annually. Additionally, direct healthcare expenditures due to COPD are expected to exceed $4.3 trillion by 2050 (GBD 2019 Chronic Respiratory Diseases Collaborators, 2023; Chen et al., 2023). Recent cohort studies have shown that exposure to environmental pollutants contributes up to 50% to the population-attributable fraction of COPD, with a particularly dominant effect observed in the non-smoking population (Sin et al., 2023). 
The Lancet Commission report of September 2022, “Working to Eliminate Chronic Obstructive Pulmonary Disease”, states that eliminating chronic obstructive pulmonary disease is largely dependent on eradication of risk factors for lung disease, especially environmental risk factors such as indoor and outdoor air pollution (Stolz et al., 2022). However, to date, about 99% of the world's population still lives in areas with severe air pollution, and air pollution contributes up to 40% of COPD mortality (Health Effects Institute, 2023). All these studies indicate a strong causal association between the toxic components in environmental pollutants and the onset and progression of COPD.

At present, although some progress has been made on the effects of exposure to environmental pollutants on the pathogenesis and prognosis of COPD, the relevant mechanistic studies in this area still need to be further strengthened. At the same time, the analysis of toxicological mechanisms for single environmental pollutants fails to fully elucidate the networked effects of multi-component synergism in complex environmental exposures. In recent years, the development of network toxicology has provided new research ideas for exploring the complex relationship between environmental factors and diseases. It has also become an effective means of identifying key molecular targets and pathways of diseases through the integration of multi-omics data analysis.

To address these research gaps, this study selected nine common high-risk air pollutants (SO2, O3, NO2, SiO2, benzo[a]pyrene, etc., as shown in Table 1) and integrated network toxicology with multi-omics techniques to construct a “Toxin-Target-Pathway” network. Researchers conducted a preliminary investigation of the potential molecular targets and pathway mechanisms by which air pollutant exposure induces COPD, and validated the findings with molecular docking. Furthermore, the relationship between the candidate targets and the disease was explored through single-cell transcriptome data analysis, risk prediction modeling, and analysis of clinical data. The study flowchart is shown in Fig. 1. The results of this study not only provide a preliminary elucidation of the potential pathogenesis of environmental pollution-associated COPD, but also offer new ideas for the early prevention and treatment of COPD and the development of potential medicines. They also underscore the necessity for further in-depth research on air pollution issues.

Table 1. Specific Sources of 9 Common High-Risk Air Pollutants.


Fig. 1

Study design overview.

MATERIALS AND METHODS

Assessment of toxicity and identification of in vivo metabolites for air pollutants

The chemical structures and related molecular information of nine air pollutants were retrieved from the PubChem database (https://pubchem.ncbi.nlm.nih.gov). Subsequently, the toxicity prediction of the compounds was carried out using the ADMETLAB 3.0 platform (https://admetlab3.scbdd.com) and the ProTox3 database (https://tox.charite.de/protox3), with a focus on their specific toxic effects on respiratory tissues. To identify the key in vivo metabolites of the nine air pollutants, we then performed a systematic search of literature databases including PubMed and Web of Science. The search strategy combined each pollutant name with key terms such as “biomarker”, “metabolite”, “adduct”, and “human biomonitoring/exposure”. We reviewed relevant literature encompassing in vitro exposure simulations, human/animal exposure studies, and multi-omics analyses to determine the primary active metabolite forms of the nine air pollutants.

Collection of air pollutants target genes

In the target prediction stage, a multi-database cross-validation strategy was used to improve prediction accuracy. The standardized molecular structures of the pollutants were imported into four prediction platforms: TargetNet (http://targetnet.scbdd.com), Swiss Target Prediction (http://www.swisstargetprediction.ch/), SEA (https://sea.bkslab.org/) and Superpred (http://prediction.charite.de/), with species strictly limited to “Homo sapiens”. Prediction results from various platforms were integrated to build a comprehensive target dataset. Subsequently, the Uniprot database (https://www.uniprot.org/) was used to normalize the target names (including gene name normalization, homologous protein merging and non-coding RNA exclusion), and finally to establish the specific target gene sets for the nine pollutants.
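The integration step described above can be illustrated with a minimal sketch. It assumes each platform's output has already been reduced to a list of gene symbols; the function name `merge_targets`, the example gene lists, and the upper-casing used as a stand-in for UniProt-based name normalization are all illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def merge_targets(predictions):
    """Union gene symbols across platforms and record which platforms support each gene."""
    support = defaultdict(set)
    for platform, genes in predictions.items():
        for gene in genes:
            # Crude normalization (assumption): a real workflow would map names via UniProt.
            symbol = gene.strip().upper()
            if symbol:
                support[symbol].add(platform)
    return dict(support)


# Hypothetical per-platform predictions for one pollutant (made-up lists).
predictions = {
    "TargetNet": ["HDAC9", "kcnn4 ", "NOS2"],
    "SwissTargetPrediction": ["HDAC9", "DPP9"],
    "SEA": ["DPP9", "NOS2"],
    "SuperPred": ["KCNN4"],
}

merged = merge_targets(predictions)
for symbol in sorted(merged):
    print(symbol, sorted(merged[symbol]))
```

Keeping the supporting platforms for each gene, rather than only the union, makes it possible to inspect later how many independent predictors agree on a given target.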

Collection of COPD-related target genes

The search term “chronic obstructive pulmonary disease” was used to collect COPD-related target genes from GeneCards (https://www.genecards.org), DRUGBANK (https://go.drugbank.com/drugs), TTD (http://db.idrblab.net/ttd/), and OMIM (https://omim.org/). Genes ranked in the top 25% by GeneCards Score, together with all targets obtained from the other three databases, were merged and de-duplicated to obtain the final set of COPD-related target genes.
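The selection rule above (top 25% of GeneCards by Score, plus every target from the other three sources, then de-duplication) could be sketched as follows. The gene names and scores are placeholders, and rounding the 25% cut-off upward is an assumption, since the text does not state how fractional counts were handled.

```python
import math


def top_fraction_by_score(scores, fraction=0.25):
    """Return the genes whose score ranks in the top `fraction` of all scored genes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = math.ceil(len(ranked) * fraction)  # rounding up is an assumption
    return set(ranked[:keep])


def build_disease_gene_set(genecards_scores, *other_sources, fraction=0.25):
    """Merge the filtered GeneCards genes with all genes from the other sources."""
    genes = top_fraction_by_score(genecards_scores, fraction)
    for source in other_sources:
        genes |= set(source)  # set union removes duplicates
    return genes


# Placeholder inputs (hypothetical names and scores).
genecards_scores = {
    "GENE_A": 90.0, "GENE_B": 55.0, "GENE_C": 30.0, "GENE_D": 12.0,
    "GENE_E": 8.0, "GENE_F": 5.0, "GENE_G": 2.0, "GENE_H": 1.0,
}
drugbank = ["GENE_C", "GENE_X"]
ttd = ["GENE_X"]
omim = ["GENE_H"]

copd_genes = build_disease_gene_set(genecards_scores, drugbank, ttd, omim)
print(sorted(copd_genes))
```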

GO/KEGG enrichment analysis

The set of target genes for air pollutants was intersected with the set of COPD target genes to obtain the potential target genes through which air pollutants may act on COPD. Using the “clusterProfiler”, “enrichplot” and “org.Hs.eg.db” packages in R, these potential target genes were analyzed by Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Statistically significant enriched terms and pathways were screened (adjusted P < 0.05), and the results were visualized. These analyses revealed the potential roles of the target genes in biological processes, molecular functions and cellular components, as well as the signaling pathways in which they are involved. Plots were produced with an online data analysis and visualization platform (https://www.bioinformatics.com.cn), which provides convenient tools for presenting enrichment results.
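The enrichment step can be made concrete with a small sketch. Over-representation analysis is conventionally a one-sided hypergeometric test followed by multiple-testing correction; the version below uses Benjamini-Hochberg adjustment, which is an assumption about what the corrected P value refers to. All gene sets and the background size are made up for illustration, and the code is not a reimplementation of the R packages named above.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical inputs: overlap of pollutant and disease genes, then two toy pathways.
pollutant_genes = {"G1", "G2", "G3", "G4", "G5"}
disease_genes = {"G2", "G3", "G4", "G9"}
overlap = pollutant_genes & disease_genes  # potential target genes

background_size = 100  # assumed size of the annotated gene universe
pathways = {"pathway_A": {"G2", "G3", "G50"}, "pathway_B": {"G60", "G61"}}

raw = []
for name, members in pathways.items():
    hits = len(overlap & members)
    raw.append(hypergeom_sf(hits, background_size, len(members), len(overlap)))

for name, p_adj in zip(pathways, benjamini_hochberg(raw)):
    print(name, p_adj, "significant" if p_adj < 0.05 else "not significant")
```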

Validation of potential target genes through the GEO database

To validate the candidate genes, two independent COPD-related datasets (GSE21359 and GSE37147) were retrieved from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in each dataset were identified using GEO2R, with a significance threshold of P < 0.05. These DEGs constituted the validation sets. The potential target genes were successively intersected with each validation set. The resulting intersected genes were then analyzed for significant differential expression between COPD patients and healthy controls (P < 0.05), and the expression differences were visualized using violin plots generated with Sangerbox 3.0. Subsequently, the DEGs obtained from the two intersection steps were further intersected to identify common target genes. The diagnostic performance of these common genes was evaluated using Receiver Operating Characteristic (ROC) curve analysis implemented with the “pROC” package in R. Genes demonstrating an area under the curve (AUC) > 0.7—indicating good to high diagnostic accuracy—were ultimately selected as the candidate targets.
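The AUC criterion used for candidate selection has a simple interpretation that a short sketch makes explicit: the AUC equals the probability that a randomly chosen patient has a higher expression value than a randomly chosen control, with ties counted as one half. The code below computes it directly from that definition. Taking the larger of AUC and 1 - AUC, so that down-regulated genes are scored on the same scale, is an assumption about how the direction was handled, and the expression values are invented.

```python
def roc_auc(case_values, control_values):
    """Probability that a random case exceeds a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def diagnostic_auc(case_values, control_values):
    """Direction-free AUC, so lower-in-disease genes are scored on the same scale."""
    auc = roc_auc(case_values, control_values)
    return max(auc, 1.0 - auc)


def passes_threshold(case_values, control_values, threshold=0.7):
    """Apply the AUC > 0.7 selection rule described in the text."""
    return diagnostic_auc(case_values, control_values) > threshold


# Invented expression values for one gene in patients and controls.
copd_expr = [7.9, 8.4, 8.1, 8.8, 7.6]
control_expr = [7.2, 7.5, 7.8, 7.0, 8.0]
print(diagnostic_auc(copd_expr, control_expr), passes_threshold(copd_expr, control_expr))
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of the size used here; a rank-based formula would be preferable for large datasets.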

Molecular docking validation

To evaluate the interaction strength between the target proteins encoded by the candidate target genes and the air pollutant components, molecular docking was performed between the three target proteins and the nine pollutants, including their relevant in vivo metabolites. Small-molecule structures of the pollutants were downloaded from PubChem in “sdf” format and prepared in Autodock Tools 1.5.7 by adding all hydrogens, setting the molecules as ligands, automatically assigning charges, and defining torsional bonds. Next, protein structures of the core targets were screened in the Uniprot and RCSB PDB databases and downloaded in “pdb” format. The proteins were then prepared in Autodock Tools 1.5.7 by removing water and solvent molecules, adding all hydrogens, and setting them as receptors. The processed receptors and ligands were loaded into Autodock Tools 1.5.7, and autogrid4 and autodock4 were run in turn for semi-flexible docking. The stability of binding between small molecules and proteins was determined by calculating their binding energy. Binding energy values below 0 kcal·mol−1 indicate spontaneous binding, and values of -5 kcal·mol−1 or lower suggest a more stable binding capacity between the ligand and receptor. Finally, the results were exported to a “pdbqt” file and visualized using Pymol.
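The two binding-energy cut-offs stated above can be written down as a small helper. The second function is an addition not taken from the study: it converts an energy into an approximate dissociation constant through the standard relation ΔG = RT ln Kd, purely to give a feel for the scale. Docking scores are only rough estimates of binding free energy, so the resulting numbers should be read as order-of-magnitude illustrations. The function names and the 298.15 K default are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def classify_binding(energy_kcal_mol):
    """Label a docking energy using the thresholds given in the text."""
    if energy_kcal_mol <= -5.0:
        return "stable"
    if energy_kcal_mol < 0.0:
        return "spontaneous"
    return "unfavorable"


def estimated_kd_molar(energy_kcal_mol, temperature_k=298.15):
    """Approximate Kd (mol/L) from dG = RT ln(Kd); illustrative only for docking scores."""
    return math.exp(energy_kcal_mol / (R_KCAL * temperature_k))


for energy in (-7.31, -5.0, -3.42):
    print(energy, classify_binding(energy), f"{estimated_kd_molar(energy):.2e} M")
```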

Immune infiltration analysis

The “CIBERSORT” algorithm, run in R, was used to analyze immune infiltration in the selected COPD and healthy samples from GSE21359 and to compare the infiltration of 22 immune cell types between the two groups. The results were visualized using the R packages “pheatmap” and “corrplot”. Additionally, the correlation between candidate targets and immune cell infiltration was assessed using Spearman correlation analysis in R and visualized using lollipop plots.
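For readers unfamiliar with the correlation measure used here, the following is a minimal sketch of Spearman's rank correlation: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. The sample values are invented, and the code is a teaching illustration, not the routine used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented example: gene expression versus estimated fraction of one immune cell type.
expression = [5.1, 6.3, 6.9, 7.4, 8.0]
cell_fraction = [0.20, 0.17, 0.12, 0.13, 0.05]
print(spearman_rho(expression, cell_fraction))
```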

Single-cell analysis of candidate targets

The transcript expression levels of candidate targets in human lung cells were annotated using consensus-based tissue gene data and single-cell transcriptomics data from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/). This analysis explored gene expression across various lung cells.

Evidence of genetic regulation in candidate targets (eQTL Analysis)

To evaluate whether the expression of the candidate targets is under genetic regulation and to identify potential functional variants, we performed an expression quantitative trait locus (eQTL) analysis based on authoritative consolidated databases such as the eQTL Catalogue and PheWeb. Data from lung tissue, the most pathologically relevant site, were prioritized and supplemented with whole blood data. Our analysis was restricted to cis-eQTLs within a 1 Mb window upstream and downstream of each gene, requiring genome-wide significance (P < 5 × 10−8). For each candidate gene, we extracted the lead significant variant, recording its rsID, effect allele, effect direction (beta value), association P-value, and tissue source to obtain direct evidence of genetic regulation.
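The filtering rules above (cis window of 1 Mb on either side of the gene, P < 5 × 10−8, then the lead variant) can be sketched as follows. The record layout, field names, coordinates and variant identifiers are hypothetical placeholders; real eQTL resources use their own schemas.

```python
GENOME_WIDE_P = 5e-8
CIS_WINDOW_BP = 1_000_000


def lead_cis_eqtl(records, gene, gene_start, gene_end):
    """Return the most significant cis-eQTL record for a gene, or None if none qualifies."""
    lower = gene_start - CIS_WINDOW_BP
    upper = gene_end + CIS_WINDOW_BP
    hits = [
        r for r in records
        if r["gene"] == gene
        and lower <= r["pos"] <= upper
        and r["pvalue"] < GENOME_WIDE_P
    ]
    if not hits:
        return None
    return min(hits, key=lambda r: r["pvalue"])


# Hypothetical association records for one placeholder gene.
records = [
    {"gene": "GENE_X", "rsid": "rs_demo_1", "pos": 1_200_000, "pvalue": 1e-12, "beta": 0.30, "tissue": "lung"},
    {"gene": "GENE_X", "rsid": "rs_demo_2", "pos": 1_900_000, "pvalue": 1e-20, "beta": -0.20, "tissue": "lung"},
    {"gene": "GENE_X", "rsid": "rs_demo_3", "pos": 4_000_000, "pvalue": 1e-40, "beta": 0.50, "tissue": "lung"},
    {"gene": "GENE_X", "rsid": "rs_demo_4", "pos": 1_500_000, "pvalue": 1e-5, "beta": 0.10, "tissue": "lung"},
]

lead = lead_cis_eqtl(records, "GENE_X", gene_start=1_000_000, gene_end=1_100_000)
print(lead["rsid"], lead["beta"], lead["pvalue"], lead["tissue"])
```

In this toy input the variant with the smallest P value lies outside the cis window and is therefore ignored, which shows why the window filter has to be applied before picking the lead signal.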

Disease risk modeling based on candidate targets

Based on the GSE37147 dataset, researchers constructed a nomogram model of the candidate targets using the R package “rms”. Quantitative scoring based on the expression levels of the candidate targets enables assessment of the risk of developing COPD. The calibration of the model predictions was evaluated by plotting calibration curves. In addition, decision curve analysis was performed using the R package “rmda” to identify optimal risk thresholds for clinical intervention. Clinical impact curves were subsequently generated to quantify the concordance between model-predicted high-risk cases and the observed incidence.
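The quantity plotted in a decision curve is the net benefit, conventionally defined at a risk threshold pt as TP/n - (FP/n) × pt/(1 - pt), where individuals with predicted risk at or above pt are treated as high risk. The sketch below evaluates that formula on invented predictions; it illustrates the calculation only and does not reproduce the model or the R package output.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of classifying predicted risk >= threshold as high risk."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: everyone is classified as high risk."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Invented predicted risks and observed outcomes (1 = COPD, 0 = control).
risks = [0.9, 0.8, 0.3, 0.1]
outcomes = [1, 0, 1, 0]

for pt in (0.2, 0.5):
    print(pt, net_benefit(risks, outcomes, pt), net_benefit_treat_all(outcomes, pt))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, the value of classifying no one as high risk.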

Analysis of the relationship between the expression of candidate targets and patients' pulmonary function

Researchers analyzed the association between the expression levels of candidate targets and pulmonary function in COPD patients using clinical data from the GSE37147 dataset. COPD patients were stratified into high- and low-expression groups based on the expression of candidate targets. Differences in FEV1% predicted between the groups were statistically evaluated. P < 0.05 was considered statistically significant.

RESULTS

Assessment of toxicity and identification of in vivo metabolites for air pollutants

The toxicity of the nine air pollutants was evaluated using two toxicity profiling tools. Based on predefined criteria, a pollutant was classified as toxic if respiratory toxicity was identified by at least one platform. Notably, all nine air pollutant components were identified as respiratory toxicants (Table 2). Following this, the major in vivo metabolites of the pollutant components were identified through a review of multiple databases and relevant literature. These data are compiled in Table 3. It should be noted that for some pollutants (e.g., ozone, silica, nitrogen dioxide), the predominant toxicity mechanisms rely on their physicochemical properties (e.g., oxidativity, particulate irritation) rather than on canonical enzymatic metabolic conversion; therefore, they lack typical active metabolites.

Table 2. Molecular weight, SMILES structure and respiratory effects of air pollutants.


Table 3. Nine air pollutants and their major reactive metabolite forms.


Target genes related to air pollutants

By integrating target prediction data from TargetNet, Swiss Target Prediction, SEA and Superpred databases, researchers identified potential target genes for each air pollutant. Following removal of redundant entries, 364 target genes were ultimately associated with air pollutants.

Target genes related to COPD

By systematically integrating COPD-associated targets from four independent databases (GeneCards, DRUGBANK, TTD, OMIM) with deduplication, researchers identified 1997 COPD-associated target genes. Subsequently, Venn diagram analysis intersecting these genes with the 364 air pollution-associated targets revealed 47 potential target genes that may mediate pollutant-induced COPD pathogenesis (Fig. 2A).

Fig. 2

Enrichment analysis of potential target genes. A. Venn diagram of the 47 potential target genes. B. GO enrichment analysis of the 47 potential target genes. C. KEGG enrichment analysis of the 47 potential target genes.

GO and KEGG enrichment analysis

GO enrichment analysis of 47 potential target genes revealed distinct functional patterns: In terms of biological processes (BP), significant enrichment was found in processes such as positive regulation of interferon-α production, monocyte and leukocyte proliferation. Cellular components (CC) predominantly localized to membrane microdomains, membrane rafts, and the exterior of the plasma membrane. Regarding molecular function (MF), the genes were mainly associated with nitric oxide synthase activity, tetrahydrobiopterin binding, and arginine binding (Fig. 2B). KEGG pathway enrichment analysis demonstrated that the genes were significantly enriched in key pathways, including pathways in cancer, chemokine signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Toll-like receptor signaling pathway, Th17 cell differentiation signaling pathway, IL-17 signaling pathway, Th1 and Th2 cell differentiation signaling pathway, HIF-1 signaling pathway, VEGF signaling, PI3K-Akt signaling pathway and necroptosis (Fig. 2C). These findings imply that air pollutants might influence the development and progression of COPD by modulating pathways related to inflammation, immune cell differentiation, cellular necrosis, neovascularization, and carcinogenesis.

Validation of potential target genes

Comprehensive differential expression analysis utilizing the GSE21359 cohort (23 COPD patients versus 41 healthy controls) identified 8,494 significant DEGs (Fig. 3A) (P < 0.05). Replication in the GSE37147 ex-smoker cohort (57 COPD patients versus 82 healthy controls) revealed 3,343 significant DEGs (Fig. 3B) (P < 0.05). The intersection of DEGs from the validation set GSE21359 with the potential target genes yielded 21 overlapping genes. Among these, 14 had significant intergroup differences (P < 0.05) (Fig. 3D). Further analysis showed that three of these 14 genes also had significant intergroup differences (P < 0.05) in the validation set GSE37147 (Fig. 3E), namely HDAC9, DPP9 and KCNN4 (Fig. 3C). The ROC curve analysis demonstrated that all three genes had acceptable diagnostic accuracy, as evidenced by an AUC value greater than 0.7 for each (Fig. 3F-G). Therefore, HDAC9, DPP9 and KCNN4 were identified as candidate targets.

Fig. 3

Validation of key target genes. A. Volcano plot of dataset GSE37147. B. Volcano plot of dataset GSE21359. C. The 3 validated key target genes. D. Differential expression of potential target genes between groups in dataset GSE21359. E. Differential expression of potential target genes between groups in dataset GSE37147. F. ROC curve analysis of the 3 key target genes in dataset GSE21359. G. ROC curve analysis of the 3 key target genes in dataset GSE37147.

Molecular docking

Molecular docking simulations demonstrated spontaneous binding of all nine pollutants to the three candidate proteins (Fig. 4A). Benzo[a]pyrene demonstrated favorable binding affinities with HDAC9, DPP9, and KCNN4, with binding energies ≤ -5.0 kcal·mol−1 (Fig. 4B). Meanwhile, this study found that the metabolically active species often exhibited binding affinities comparable to or stronger than their parent pollutants (Table 4). For instance, 3-hydroxybenzo[a]pyrene showed a binding energy of -7.31 kcal/mol to DPP9 (vs. -7.2 kcal/mol for the parent) and -5.42 kcal/mol to HDAC9 (vs. -5.16 kcal/mol), suggesting that metabolically activated forms may amplify COPD pathology via enhanced epigenetic regulation (HDAC9) or inflammasome activation (DPP9). Its binding to KCNN4 (-6.55 kcal/mol) was also stronger than some parent compounds. In contrast, metabolites of small gaseous pollutants (e.g., sulfuric acid, formic acid) exhibited consistently low binding energies (-3.42 to -5.94 kcal/mol), in line with their parent compounds. These findings collectively suggest that these air pollutant components or their metabolically active constituents within the body can directly interact with candidate proteins, thereby influencing the pathogenesis of COPD.

Fig. 4

Molecular docking simulations reveal pollutant-protein interactions. A. Comparative binding affinities of nine air pollutants against HDAC9, DPP9, and KCNN4. B. Molecular docking of benzo[a]pyrene with DPP9. The left panel shows the overall protein structure as a ribbon diagram, colored by a continuous gradient from the N-terminus (blue) to the C-terminus (red). The binding site is indicated by a dotted circle. The right inset provides a detailed view of the binding pocket, with benzo[a]pyrene (ligand) shown in yellow stick representation and key interacting residues labeled. The molecular surface is colored by electrostatic potential using the interpolated charge scale (blue, positive; red, negative). C. Molecular docking of benzo[a]pyrene with KCNN4. The tetrameric structure of KCNN4 is shown with its four subunits individually colored (Chain A: blue, B: orange, C: gray, D: green). Benzo[a]pyrene (yellow sticks) is depicted within the binding site. D. Molecular docking of benzo[a]pyrene with HDAC9. The single-chain HDAC9 protein is shown in green, with benzo[a]pyrene (yellow sticks) in its binding pocket.

Table 4. Results of molecular docking for metabolically active products.


Immune infiltration analysis

Immune infiltration analysis was used to compare immune cell infiltration between COPD patients and healthy individuals and to examine the relationship between candidate target expression and immune cell infiltration. Patients with COPD had significantly higher infiltration of resting dendritic cells, eosinophils, and M0 macrophages than healthy individuals (P < 0.05), and significantly lower infiltration of resting memory CD4+ T cells and naïve B cells (Fig. 5A-B). Further analysis indicated that HDAC9 expression was significantly negatively correlated (P < 0.05) with infiltration of neutrophils, γδ T cells, and naïve B cells (Fig. 5C); DPP9 expression was significantly negatively correlated with infiltration of resting memory CD4+ T cells (P < 0.05) (Fig. 5D); KCNN4 expression was significantly negatively correlated with infiltration of γδ T cells and resting memory CD4+ T cells (P < 0.05), and significantly positively correlated with infiltration of CD8+ T cells (Fig. 5E). These findings imply that air pollutants, after acting on the candidate targets, might influence immune responses in COPD patients by modulating immune cell infiltration, potentially affecting the chronic persistence and exacerbation of the disease.

Fig. 5

Immune infiltration. A-B. Differences in the infiltration of 22 immune cell types between the disease group and the healthy group. “con” represented the healthy group, and “treat” represented the COPD patient group. C. Correlation between HDAC9 expression and the infiltration of 22 immune cell types. D. Correlation between DPP9 expression and the infiltration of 22 immune cell types. E. Correlation between KCNN4 expression and the infiltration of 22 immune cell types.

Single-cell analysis of candidate targets

Human single-cell transcriptome data suggested that DPP9 and KCNN4 were abundantly expressed in various lung cells, including type I alveolar epithelial cells, type II alveolar epithelial cells, macrophages, endothelial cells, smooth muscle cells, granulocytes, and fibroblasts. They were most highly expressed in type I alveolar epithelial cells and endothelial cells, followed by type II alveolar epithelial cells (Fig. 6B-C). In contrast, HDAC9 was most highly expressed in endothelial cells, followed by macrophages and B cells (Fig. 6A). These findings imply that air pollutants might contribute to processes such as airway remodeling and destruction of alveolar structure in COPD by inducing aberrant expression of the candidate targets in lung tissue cells and thereby altering cell function.

Fig. 6

Expression levels of the 3 key target genes in human lung single-cell transcriptomes. A. Expression level of HDAC9 in human lung cells. B. Expression level of KCNN4 in human lung cells. C. Expression level of DPP9 in human lung cells.

Genetic regulatory evidence for candidate targets (eQTL Analysis)

To assess whether HDAC9, DPP9, and KCNN4 are functionally regulated by genetic variation, we retrieved cis-eQTL information for these genes in lung tissue and whole blood from publicly available eQTL resources. As summarized in Table 5, significant eQTLs were identified for all three genes. HDAC9 (rs2107595) and DPP9 (rs12610495) showed their most significant association signals in lung tissue (P < 5 × 10−10), with consistent effect directions. A strong eQTL signal for KCNN4 was observed in whole blood (rs16926535, P < 1 × 10−30). These results provide genetic evidence that the expression levels of the identified candidate targets are heritable traits and that their regulatory loci are active in disease-relevant tissues, offering upstream support for their potential functional roles in COPD pathogenesis.

Table 5. Summary of eQTL Query Results.


Disease risk modeling based on candidate targets

The researchers constructed a nomogram model to further investigate the disease risk prediction ability of the three candidate targets. The risk of developing COPD could be predicted based on the expression levels of the target genes (Fig. 7A), and the accuracy of the model was assessed by calibration curves. The calibration curves showed that the actual curves fit the ideal curves well, indicating that the model’s predictions were reliable (Fig. 7B). Clinical decision curves suggested that decision-making using the model yielded substantial net benefits across risk thresholds of 0.02-0.99 (Fig. 7C). Clinical impact curves showed a high degree of overlap between the high-risk individuals as determined by the model and those who actually developed COPD when the risk threshold was >0.7 (Fig. 7D). These findings suggest that the risk of developing clinical COPD may be predicted by assessing the expression levels of HDAC9, DPP9 and KCNN4.

Fig. 7

Disease risk prediction model. A. Nomogram model. B. Calibration curves. C. Clinical decision curve. D. Clinical impact curve.

Relationship between candidate targets and lung function in COPD patients

To investigate whether the expression of HDAC9, DPP9 and KCNN4 affects the decline of lung function in patients with COPD, researchers analyzed the relationship between the expression of these candidate targets and patients' lung function. The FEV1% of patients in the HDAC9 high-expression cohort was significantly lower than that in the low-expression cohort (P < 0.001) (Fig. 8A). The FEV1% of patients in the DPP9 and KCNN4 high-expression cohorts was lower than that in the corresponding low-expression cohorts, but the differences were not statistically significant (P > 0.05) (Fig. 8B-C). These findings suggest that HDAC9 overexpression may be related to the accelerated decline of lung function in COPD patients.

Fig. 8

Correlation analysis between key target genes and patient clinical characteristics. A. Correlation between HDAC9 expression and patient FEV1%. B. Correlation between KCNN4 expression and patient FEV1%. C. Correlation between DPP9 expression and patient FEV1%.

DISCUSSION

According to the World Health Organization, about 90% of urban residents worldwide are exposed to excessive air pollution levels (Gordon et al., 2014; Schraufnagel et al., 2019; Kampa and Castanas, 2008). Air pollution is closely associated with a variety of respiratory diseases, among which COPD is considered to be one of the most affected by air pollution (Guan et al., 2016). The 2017 Global Burden of Disease Study stated that approximately 34.6% of COPD deaths worldwide are attributable to air pollution (Soriano et al., 2020). Air pollution now ranks as the second-leading risk factor for COPD development (Chinese Medical Association, 2024). However, the exact molecular mechanisms linking air pollution to COPD remain elusive. In this study, by integrating network toxicology, molecular docking, and transcriptomics data, the researchers provide a preliminary view of the potential molecular mechanisms underlying the link between air pollutant exposure and COPD.

The researchers chose nine common air pollutant constituents that cover multiple indoor and outdoor sources of air pollution, including secondhand smoke, oil combustion, industrial emissions, home renovations, and cooking fuels. Through preliminary network toxicology analysis, 47 genes were identified as potential targets through which air pollutants act on COPD. KEGG and GO enrichment analyses suggest that air pollutants may influence immune cell differentiation, cell necrosis, neovascularization, and inflammatory responses by regulating pathways such as Toll-like receptor signaling pathway (Sidletskaya et al., 2021), Th17 cell differentiation signaling pathway (Silva et al., 2020), HIF-1 signaling pathway (Fu and Zhang, 2018; Zhang et al., 2018), VEGF signaling, PI3K-Akt signaling pathway (Liu et al., 2023), and necroptosis (Wang et al., 2023), thereby promoting COPD development. All these pathways and processes are closely linked to COPD pathogenesis. The researchers selected transcriptomics data from COPD patients and healthy individuals in GSE21359 for the first validation of potential target genes. Because most COPD patients have a smoking history, and to minimize the influence of smoking on the study, the researchers also analyzed 57 COPD patients and 82 healthy individuals from the smoking cessation cohort in GSE37147, thereby enhancing the study's specificity. Through these two validations, the researchers identified HDAC9, DPP9, and KCNN4 as candidate targets significantly associated with air pollution-related COPD.

Molecular docking serves as a critical step in translating the abstract association among “pollutant components-target proteins-diseases” into concrete, visualizable molecular interaction models, thereby generating testable hypotheses for subsequent mechanistic studies. In this study, HDAC9, DPP9, and KCNN4 were identified as candidate targets associated with COPD development under exposure to nine air pollutant components. Their corresponding proteins were subsequently investigated as molecular targets in docking simulations. Molecular docking revealed that all three candidate proteins spontaneously bound to the nine pollutant components. Notably, benzo[a]pyrene, a product of incomplete combustion, exhibited particularly strong binding affinity. These results indicated that the structural pockets of the target proteins could accommodate these pollutants, suggesting that air pollutants may influence COPD progression not only indirectly via upstream signaling but also through direct functional disruption of these proteins. This provides novel mechanistic insights into the link between air pollutant exposure and COPD. As a polycyclic aromatic hydrocarbon (PAH) and one of the world’s most potent carcinogens with significant respiratory toxicity, benzo[a]pyrene demonstrated stable binding to all three target proteins through interactions with multiple key amino acid residues in molecular docking analyses (Bukowska et al., 2022). The three binding pockets were composed of numerous hydrophobic amino acids, particularly those with localized positive charges near functionally critical regions, such as TYR78, TRP105, LEU146, ALA141, VAL275, and PHE278. Beyond dominant hydrophobic interactions, residues such as TYR78 and VAL275 also engaged in π–π stacking (Pi-Pi Stacked/T-shaped), π–cation (Pi-Cation), and π–alkyl (Pi-Alkyl) interactions with benzo[a]pyrene.
These non-covalent binding modes are characteristic of PAH–protein associations and may induce conformational changes or functional impairment of the targets, potentially amplifying COPD-related pathologies, including HDAC9-mediated epigenetic dysregulation, DPP9-related inflammasome activation, or KCNN4 channel dysfunction. These findings collectively suggested that benzo[a]pyrene likely exerts its pathogenic effect via a multi-target, network-based mechanism, simultaneously disrupting key biological networks including ion channel signaling (KCNN4) and metabolic regulation (HDAC9, DPP9), rather than through a single target. Benzene, toluene, and formaldehyde, all volatile organic compounds (VOCs), also exhibited strong binding affinity in docking studies. Their high diffusivity and metabolic activity suggest they may perturb the local microenvironment of target proteins through diverse mechanisms, thereby broadly affecting protein function (Chaoyan Ou 2005). In contrast, gaseous pollutants (SO2, NO2, O3, CO), characterized by low molecular weights and distinct electronic structures, tended to bind at specific solvent-accessible sites or key amino acid residues. This binding mode may allow them to directly modulate enzymatic activity or ion channel gating. Finally, molecular docking indicated that inhalable particulate matter such as SiO2 (Choi et al., 2013) could also spontaneously bind to the target proteins. This suggests that nanoscale particulates may impair protein stability or interfere with protein-protein interaction networks at a macroscopic level. Taken together, these findings reflect the real-world scenario of mixed pollutant exposure. Pollutants with distinct physicochemical properties can engage core targets such as HDAC9, DPP9, and KCNN4 through synergistic or complementary molecular mechanisms. By disrupting multiple biological processes in concert, they collectively drive COPD pathogenesis.
This study thereby offers novel insights into the intricate pathophysiological network that links complex air pollutant exposure to COPD. Furthermore, by extending the molecular docking analysis to include the metabolically active components of air pollutants, this study refines the investigative framework to better approximate biological conditions. Our findings revealed that active metabolites often exhibit binding affinities comparable to or stronger than those of their parent pollutants. This underscores the pivotal role of metabolic transformation in environmental toxicology: for pollutants requiring metabolic activation (e.g., benzo[a]pyrene), toxicity is likely mediated by their active metabolites, whereas for those acting via direct physicochemical properties (e.g., certain reactive gases), the parent forms may be the primary toxic species. This deepens understanding of the molecular mechanisms by which complex environmental exposures influence COPD and suggests that future experimental validation should fully account for the actual forms of pollutants present in the body and their metabolic processes.
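To make the notion of "spontaneous" binding concrete, the sketch below assumes that a docking score is read as an approximate binding free energy in kcal/mol, so that a negative value indicates favorable binding and the relation ΔG = RT ln(Kd) gives a rough dissociation constant. That interpretation, the function names, and the example value of -7.0 kcal/mol are assumptions for illustration; they are not taken from the study, and docking scores are only coarse estimates of true free energies.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)


def is_spontaneous(binding_energy_kcal):
    """Binding is thermodynamically favorable when the free energy is negative."""
    return binding_energy_kcal < 0.0


def estimated_kd_molar(binding_energy_kcal, temperature_k=298.15):
    """Convert a binding free energy to a dissociation constant via dG = RT ln(Kd)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return math.exp(binding_energy_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Hypothetical score, chosen only to illustrate the conversion.
score = -7.0
print(is_spontaneous(score))                 # True
print(f"{estimated_kd_molar(score):.2e} M")  # on the order of 1e-5 M or lower
```

Because Kd depends exponentially on the score, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in estimated affinity, which is why small score differences between pollutants should be interpreted cautiously.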

Molecular docking confirmed that all three candidate targets can spontaneously bind to the nine air pollutant components, especially benzo[a]pyrene, which is derived from incomplete combustion of organic matter and exhibits strong binding affinity for all three target proteins. Benzo[a]pyrene is ranked as one of the three most potent carcinogens in the world and is highly toxic to the respiratory system (Bukowska et al., 2022). Further analysis of the relationship between the three candidate targets and COPD, through subsequent single-cell analyses, immune infiltration analyses, and risk prediction model construction, has shown a strong correlation between HDAC9, DPP9, and KCNN4 and COPD, indicating their potential as biomarkers for predicting COPD risk. The eQTL analysis in this study provides crucial genetic support for the candidate targets. The findings reveal that HDAC9 and DPP9 exhibit significant genetic regulatory signals in lung tissue, and that their associated loci are themselves known risk variants for lung diseases. This dual role suggests they are not merely pollutant targets but also have baseline expression levels influenced by genetic predisposition. Regulatory signals for KCNN4 are more prominent in blood, consistent with its immune regulatory function. Critically, these findings bridge external environmental exposure with internal genetic background: individual genetic variants may pre-set the expression levels of genes like HDAC9 and DPP9, thereby potentially modulating an individual’s response threshold and disease susceptibility upon pollutant exposure. This offers a concrete molecular bridge for understanding the complex “gene-environment” interactions in COPD pathogenesis. Meanwhile, the researchers also found that increased HDAC9 expression was significantly correlated with reduced lung function (FEV1%) in COPD patients.
These findings provide a novel basis for understanding and exploring the molecular mechanisms underlying air pollutant-induced COPD and offer potential molecular targets for early prevention and treatment of the disease.

HDAC9 (histone deacetylase 9) is a gene encoding a histone deacetylase involved in the regulation of immune cell activation, angiogenesis, and inflammatory response (Asare et al., 2025). Many studies have shown that HDAC9 plays a role in a variety of chronic diseases, such as cardiovascular disease, cancer, liver fibrosis, and autoimmune diseases (Hu et al., 2020; Yang et al., 2021). However, the relationship between HDAC9 and COPD remains unclear, with few relevant studies available. The current study identified HDAC9 as a possible candidate regulatory gene for COPD and linked it to air pollution through bioinformatic analysis. In cardiovascular diseases, HDAC9 has been identified as a major risk factor for several atherosclerotic diseases. It exacerbates inflammation and atherosclerosis progression by activating the NLRP3 inflammasome, which increases IL-1β levels (Cao et al., 2014; Malhotra et al., 2019; Fernández-Ruiz, 2020). IL-1β plays a key role in COPD inflammation and significantly correlates with disease severity (Pauwels et al., 2011; Ran et al., 2025). Additionally, among the pathological changes in COPD, pulmonary vascular remodeling stands out as a core feature. The thickening and hardening of the pulmonary vascular intima serve as key mechanisms for pulmonary hypertension development and impaired gas exchange (Hu et al., 2024; Balbirsingh et al., 2022; Fabbri et al., 2023). Thus, the relationship between HDAC9 and COPD may be closely related to airway inflammation and pulmonary vascular remodeling. In addition, Zheng et al. (2024) demonstrated that inhibiting HDAC9 improved skeletal muscle wasting and boosted regeneration in a COPD mouse model. Xie et al. (Xie, 2022) indicated that HDAC9 is implicated in the development of emphysema in COPD mice by modulating the immunosuppressive function of Treg cells. Thus, HDAC9 shows some promise in the mechanistic study and treatment of COPD.

DPP9 (dipeptidyl peptidase 9) is a member of the S9b serine peptidase family that plays a role in inflammatory regulation, immunomodulation, cell death, and memory regulation, and is closely associated with cancer, immune disorders, viral infections, and chronic inflammatory diseases (Nguyen et al., 2025). Several studies have observed a link between DPP9 and lung inflammation. DPP9 inhibitors reduce eosinophil infiltration in asthmatic mouse airways, with protective effects against asthma (Schade et al., 2008; Moecking et al., 2021). In pulmonary fibrosis mouse models, DPP9 inhibitors alleviate lung inflammation and collagen deposition (Egger et al., 2017; Liu and Qi, 2020). Moreover, increased DPP9 expression has been detected in human peripheral blood during moderate-to-severe COVID-19, and DPP9 overexpression exacerbates acute respiratory tract inflammation in COVID-19 (Sharif-Zak et al., 2022). However, several other studies have shown that DPP9 can inhibit the release of downstream inflammatory factors such as IL-1β and IL-18 by suppressing activation of the NLRP1 inflammasome (Okondo et al., 2018; Zhong et al., 2018; Henderson et al., 2021; Harapas et al., 2022; Hollingsworth et al., 2021). This indicates that the regulation of lung inflammation by DPP9 may be bidirectional. Beyond its potential in modulating chronic lung inflammation, DPP9 can also regulate the NRF2 signaling pathway in conjunction with Keap1, which affects the antioxidant capacity of lung tissues (Bolgi et al., 2022; Chang et al., 2023). This may be closely related to the elevated oxidative stress observed in COPD patients. When combined with the present study's findings, DPP9 shows potential in regulating chronic inflammatory diseases in the lungs. Its association with COPD likely involves the aforementioned mechanisms, which warrant further exploration and validation.

KCNN4 (potassium calcium-activated channel subfamily N member 4) encodes a calcium-activated potassium channel widely expressed in immune cells, epithelial cells, and smooth muscle cells. It participates in inflammatory responses, mucus secretion, and smooth muscle contraction by regulating cell membrane potential and calcium signaling (Allegrini et al., 2025). KCNN4 inhibitors (e.g., TRAM-34) have been shown to attenuate airway remodeling and airway inflammation by modulating calcium homeostasis and inhibiting histamine-induced contraction of airway smooth muscle cells (ASMCs) (Vega et al., 2020). The anti-inflammatory effects of glucocorticoids, one of the main therapeutic agents for acute exacerbations of COPD, also involve modulation of potassium ion channels in airway epithelial cells (Zaidman et al., 2017; Amrani et al., 2020). Studies in KCNN4-silenced mouse models have demonstrated reduced inflammatory lung injury and neutrophilic inflammation, as well as improved mucociliary clearance. Silencing KCNN4 reduces mucus secretion, neutrophil infiltration, and emphysema in mouse lungs (Vega et al., 2020), all of which are core pathological features of COPD (Agustí et al., 2023; Barnes, 2016). Although few studies have directly demonstrated the relationship between KCNN4 and COPD, its potential as a novel therapeutic target warrants further investigation.

In this study, we performed immune cell infiltration analysis using GEO datasets and single-cell analysis with the HAP database. Our results indicated that elevated expression of HDAC9, DPP9, and KCNN4 in COPD patients was associated with immune cell infiltration, and their expression levels varied across lung cell types. Integrating these findings with prior enrichment and molecular docking results, we propose that exposure to air pollutants may exert its effects by modulating upstream signaling pathways, by inducing dysregulation of key genes (e.g., HDAC9, DPP9, and KCNN4) in alveolar epithelial cells, by directly interfering with the function of the proteins these genes encode, and by simultaneously provoking immune dysregulation in lung tissue. Collectively, these processes ultimately lead to chronic airway inflammation, airway remodeling, and irreversible airflow limitation. The pathogenesis of COPD involves the dysfunction of key structural and immune cells in the lung. As the first line of defense, alveolar epithelial cells are directly exposed to pollutants. Their extensive injury, apoptosis, abnormal repair, goblet cell hyperplasia, and excessive mucus production collectively constitute key drivers of early COPD pathogenesis (Zhi and Wang, 2018; Ruaro et al., 2021; Raby et al., 2023). Dysfunction of pulmonary endothelial cells is a critical feature in COPD pathogenesis. This impairment leads to increased vascular permeability (inflammatory exudation), structural remodeling of pulmonary vessels, and the development of pulmonary hypertension (Brassington et al., 2019; Gredic et al., 2021). These alterations collectively compromise the alveolar-capillary barrier integrity, initiating and propagating the inflammatory cascade. A prominent feature of COPD is the substantial infiltration of macrophages into the lungs. Upon cigarette smoke stimulation, these alveolar macrophages aggregate and become hyperactivated. 
They subsequently release abundant proteases that degrade alveolar elastin, directly contributing to emphysema formation (Gharib et al., 2018), and concurrently secrete pro-inflammatory cytokines that intensify the chronic inflammatory process (Lee et al., 2021). Previous research has linked DPP9 to oxidative stress and pulmonary inflammation (Chang et al., 2023; Wang et al., 2024; Nguyen et al., 2025), KCNN4-mediated ion channel activity to inflammatory cell infiltration, mucus hypersecretion, and smooth muscle contraction (Vega et al., 2020), and HDAC9 to pulmonary inflammation and vascular remodeling (Hui Xu et al., 2023; Zeng et al., 2024). This suggests that air pollutant exposure may influence the expression of these genes (e.g., HDAC9, DPP9, KCNN4) in alveolar epithelial and endothelial cells, thus regulating central pathological processes such as chronic inflammation, vascular remodeling, and mucus hypersecretion. In parallel, pollutants can directly bind to and impair the function of the critical target proteins, accelerating the progression toward irreversible airflow limitation. Our most integrative insights derive from the immune cell infiltration analysis and its correlation with the candidate targets. We observed an imbalanced immune microenvironment in COPD lungs, marked by excessive innate immune activation (elevated resting dendritic cells, eosinophils, and M0 macrophages) (Rutgers et al., 2000; Kohler et al., 2019; Pang and Liu, 2024) and a relative suppression of adaptive immunity (decreased resting memory CD4+ T cells and naive B cells) (Sharma et al., 2009; Jacobs et al., 2022). Interestingly, the expression levels of HDAC9, DPP9, and KCNN4 showed significant correlations with specific immune subsets. For instance, the negative correlation of HDAC9 with neutrophils and naive B cells may point to a novel role in epigenetically suppressing lymphocyte function and survival (Yan et al., 2011; Haofang Yuan, 2025).
Similarly, DPP9's negative correlation with resting memory CD4+ T cells suggests a potential influence on T cell homeostasis via its protease activity (Johnson et al., 2020), while the positive correlation between KCNN4 and CD8+ T cells could indicate an ion channel-mediated mechanism affecting cytotoxic T cell function (Koch Hansen et al., 2014; Chimote et al., 2018). Collectively, our findings indicate that HDAC9, DPP9, and KCNN4 are likely candidate regulators shaping the COPD-specific immune microenvironment. Based on the integrated analyses and molecular docking results, we propose a mechanistic hypothesis of “pollutant exposure - cellular functional/epigenetic alterations - disease” to explain how airborne pollutant exposure drives COPD progression. Specifically, we posit that multiple air pollutants (and their metabolites) act upon candidate targets (HDAC9, DPP9, KCNN4) in alveolar epithelial and endothelial cells. This interaction triggers epigenetic dysregulation, ion channel dysfunction, and altered protein activity. These molecular dysregulations collectively contribute to the characteristic immune microenvironment imbalance and tissue remodeling in COPD, ultimately leading to chronic inflammation, mucus secretion, and irreversible airflow limitation.
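The gene-versus-immune-subset correlations discussed above can be illustrated with a rank-based coefficient. The sketch below assumes a Spearman correlation, which the text does not specify, and uses invented toy values rather than study data; the function names are likewise hypothetical. It ranks both variables (averaging ties) and then computes a Pearson correlation on the ranks.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Invented values: gene expression vs. estimated fraction of one immune subset.
expression = [5.1, 6.3, 7.0, 8.2]
cell_fraction = [0.20, 0.15, 0.11, 0.04]
print(round(spearman(expression, cell_fraction), 3))  # -1.0
```

A rank-based measure is a common choice here because estimated cell fractions are bounded and often skewed; a coefficient near -1 or +1 indicates a monotonic association but says nothing about causality.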

This study innovatively employed a network toxicology approach to investigate the molecular mechanisms linking complex air pollutant exposure to COPD. By integrating transcriptomics data and molecular docking, we identified and analyzed candidate molecular targets. Furthermore, leveraging multiple databases, we simulated the systemic connections between pollutant exposure and disease pathogenesis, including alterations in cellular function, immune cell infiltration, and target protein activity. This work proposes a novel, coherent hypothesis to explain how air pollution drives COPD progression, addressing a gap in mechanistic understanding. It translates the complex “environmental exposure-disease” association into a series of specific molecular targets and pathways amenable to experimental validation.

However, this exploratory analysis based on public databases retains several limitations. First, the core molecular docking method primarily simulates non-covalent interactions of stable small molecules, whereas the mechanisms of multiple pollutants involved (e.g., highly reactive gases, covalently modifiable formaldehyde, and particulate matter) may exceed the method's conventional simulation scope. Thus, docking results should be regarded as hypothesis generation rather than definitive predictions. Second, constrained by complex in vivo toxicokinetic processes and data availability, we were unable to perform calibration linking computed pollutant concentrations to actual biologically effective concentrations in the lungs. Finally, all findings stem from bioinformatics inference and remain unverified by experimentation. These limitations clearly position this study as “systemic hypothesis generation,” pointing the way toward subsequent mechanistic experiments and more precise exposure-effect studies.

In conclusion, this study preliminarily uncovered potential molecular mechanisms underlying the association between compositionally complex air pollutants and COPD by integrating multiple approaches, including network toxicology, transcriptomic data, molecular docking, and immune infiltration analysis. The results indicate that air pollutants may drive COPD development by affecting processes such as immune cell differentiation, cell necrosis, neovascularization, and inflammatory responses. HDAC9, DPP9, and KCNN4 may serve as candidate molecular targets of air pollutants in COPD development. These results offer novel theoretical insights into the impact of air pollutants on COPD, and these candidate targets may also represent potential molecular targets for early warning and targeted therapy of COPD. Future studies will further explore the relationship between these key genes and COPD, as well as their potential clinical applications.

ACKNOWLEDGMENTS

We would like to express our sincere gratitude to all the staff members of the databases utilized in this study for their valuable contributions.

Funding

The study was supported by the National TCM Advantage Special Construction Project (Pulmonary Department of the First Affiliated Hospital of Guangzhou University of Chinese Medicine), the Guangdong Provincial Natural Science Foundation (2024A1515012160), and the Dongguan Science and Technology of Social Development Program (20231800935372, 20221800906122).

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The datasets used in this study are all publicly available.

Author contributions

Dong Song: Methodology, Writing – original draft, Data curation, Formal analysis; Lin Xie: Translation, Proofreading; Xuege Gao: Methodology, Software, Data curation, Formal analysis; Yushan Chen: Writing – original draft, Data curation, Formal analysis, Proofreading; Chunjun Zhong: Methodology, Supervision, Validation; Shaofeng Zhan: Funding acquisition, Methodology, Supervision; Huicong Li: Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing; Leshen Lian: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision.

Ethical approval and consent to participate

The data utilized in this study were exclusively sourced from publicly accessible databases, primarily including the Gene Expression Omnibus (GEO) database and the PubChem database, among others. Given that these databases comprise de-identified data freely available to the public, the study did not necessitate ethical approval or the procurement of informed consent from participants. The data were exclusively employed for research objectives and adhered strictly to pertinent ethical standards governing the utilization of publicly accessible datasets.

Patient consent for publication

Not applicable.

REFERENCES
 
2026 Author(s)

This article is licensed under a Creative Commons [Attribution 4.0 International] license.
https://creativecommons.org/licenses/by/4.0/