Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Identification of a disease-specific gene expression profile of children with acute asthma by weighted gene co-expression network analysis
Yan Luo Jing WangWei LuYang LiuYun HuangDichun Luo
著者情報
キーワード: asthma, children, hub gene, modules, network
ジャーナル オープンアクセス HTML
電子付録

2020 年 95 巻 6 号 p. 315-321

詳細
ABSTRACT

Asthma is one of the most common diseases, with a high prevalence among children. To date, systemic co-expression analysis for this disease has not been undertaken to explain its pathogenesis. Here we identified differentially expressed genes (DEGs) in 87 samples, and then constructed co-expression modules via weighted gene co-expression network analysis (WGCNA) and investigated the functional enrichment of co-expressed genes in terms of Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG). Meanwhile, protein–protein interaction (PPI) network and miRNA–transcription factor–target (miRNA–TF–target) regulatory network analyses were performed to screen hub genes. As a result, 3,469 DEGs were identified in this study, of which 1,860 genes were up-regulated and 1,609 genes were down-regulated. Using WGCNA, we identified two key modules, named MEbrown and MEblue, that may play important roles in asthma. Functional enrichment analysis revealed that MEbrown was enriched in 37 KEGG pathways and 472 biological processes (BPs), while MEblue was enriched in 16 KEGG pathways and 449 BPs. From PPI and miRNA–TF–target regulatory network analysis, a total of 31 TFs, seven miRNAs and 28 nodes were identified. Our findings should provide a framework of therapeutic targets for treating children with acute asthma.

INTRODUCTION

Asthma is a common chronic disease, globally affecting about 300 million people of all ages, and its prevalence is still rising (Masoli et al., 2004). Pediatric asthma is especially concerning, with up to 9.5% of children having been affected (Akinbami et al., 2016). Asthma in children places a heavy burden not only on the children’s families, but also on the health care system of the whole society (Asher and Pearce, 2014).

Pediatric asthma is usually classified into four stages: acute onset, chronic persistent, clinical remission and other (Hoch et al., 2019). The acute onset stage is also known as the “attack” period. Children with acute asthma exhibit increasing chest tightness, shortness of breath, coughing and/or wheezing, along with dyspnea. They may even experience respiratory failure, putting their lives at risk (Pavord et al., 2018). Thus, identifying robust biomarkers that can aid asthma diagnosis or enable better prognostic prediction may help to improve prognosis and children’s quality of life.

With the development of bioinformatic approaches, systematic description and high-throughput data analysis are now widely used to provide important clues for molecular mechanism studies (Nam et al., 2009). Moreover, bioinformatic analysis can also reveal a systematic biological process that traverses different functional networks in different disease models (Chasman et al., 2016). In asthma-related research, however, only a few studies have investigated gene expression profiles. For example, Katayama et al. (2020) have identified a unique acute wheeze-specific gene module in blood from preschool wheezers. This module was associated with vitamin D levels and asthma medication. Another study from Aoki et al. (2009) showed expression profiles of genes related to asthma exacerbations. There were 137/16 genes that were significantly up-/down-regulated during an asthma exacerbation, of which 62 were also differentially expressed during upper respiratory infection. However, little is known about the gene expression profile of children with acute asthma. Here, we used a novel tool, weighted gene co-expression network analysis (WGCNA), to explore the molecular interaction mechanism and correlation networks of this disease.

As an effective bioinformatics method, WGCNA is able to clarify synergetic expressed modules at the transcriptome level. More importantly, it can be used to identify relationships between gene networks (Langfelder and Horvath, 2008). WGCNA has been successfully applied in different disease models to generate correlation networks, further to identifying candidate biomarkers or therapeutic targets (Pei et al., 2017). In the present study, based on datasets from the Gene Expression Omnibus (GEO), we used WGCNA to analyze differentially expressed genes (DEGs) from 87 samples. Key gene modules associated with DEGs were then identified. Further, the biological functions and pathways of genes in the key modules were analyzed. These informative genes identified in our study offer new insights into the pathogenesis of asthma; moreover, our findings may be beneficial to clinical treatment of children with acute asthma.

MATERIALS AND METHODS

Data collection and preprocessing

Datasets related to childhood asthma were obtained from the NCBI GEO (http://www-ncbi-nlm-nih-gov.ezproxy.lb.polyu.edu.hk/geo) with accession number GSE103166. GSE103166 consists of 87 samples, comprising 56 samples of children with acute asthma/wheeze and 31 control samples. The platform used was GPL23961 [HuGene-2_1-st] Affymetrix Human Gene 2.1 ST Array [HuGene21st_Hs_ENTREZG_20.0.0]. The GEO database download has been standardized. The limma package in R (http://www.bioconductor.org/packages/release/bioc/html/limma.html) was used to identify DEGs (children with asthma versus control) from the candidate genes. P < 0.05 was the cut-off criterion.

Screening children with acute asthma-related modules based on WGCNA

WGCNA (Langfelder and Horvath, 2008) is a typical biological algorithm for constructing gene co-expression networks (Bacchetti and Leung, 2002; Yin et al., 2018). We used WGCNA to analyze the expression values of the DEGs obtained in the previous screening in each group, and screened the modules and genes associated with acute asthma in children.

First, the trait-based node significance measure was calculated:   

S mn =| cor ( m,n ) |
where m and n are the expressed genes m and n, respectively. cor is the Pearson coefficient of these two vectors. This transition improved the reliability of the co-expression network by giving more weight to the strong connections and reducing the importance of the weak connections in the predicted co-expression network.

Next, a power function was applied to correlate the adjacency of genes, where β is soft-thresholding powers:   

a mn = power ( S mn ,β )
The adjacency matrix was then converted to a topological matrix using the following formula:   
w mn = l mn + a mn min{ k m , k n }3333333334e+1- a mn
lmn represents the sum of the adjacency coefficient area of the nodes connected to genes m and n, and km is the sum of the adjacency numbers of the nodes individually connected to gene m. The topological properties were also confirmed. Module identification was then accomplished with the dynamic tree cut method. Modules with high similarities were identified and merged together with a height cut-off of 0.95. Furthermore, P-values of gene expression difference between the asthma and control groups were evaluated with Student’s t-test, and log-transformed P-values were used to determine statistical significance. The mean value of gene significance for all genes in a particular module was defined as module significance.

Functional enrichment analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed using the R package clusterProfiler for significant modules. The gene lists of modules were uploaded, and we obtained the results for biological process (BP) and KEGG pathways. P < 0.05 was regarded as significant.

Protein–protein interaction (PPI) network analysis

To construct a PPI network of DEGs, we first used the STRING (Szklarczyk et al., 2015) (version: 10.0, http://www.string-db.org/) database to predict and analyze the interactions between proteins encoded by DEGs. Cytoscape (Shannon et al., 2003) (version: 3.2.0, http://www.cytoscape.org/) software was then used to conduct PPI network analysis.

MicroRNA–transcription factor–target (miRNA–TF–target) co-regulation network analysis

In the PPI network, we used the overrepresentation enrichment analysis (ORA) enrichment method in WebGestalt (http://www.webgestalt.org/) to perform miRNA– TF–target enrichment prediction for genes with degree values > 5. The miRNA–TF–target co-regulation network was constructed by Cytoscape (Shannon et al., 2003).

RESULTS

Identification of DEGs between children with acute asthma and healthy samples

To identify DEGs, gene microarray data of 87 samples were obtained from the GEO database (http://www-ncbi-nlm-nih-gov.ezproxy.lb.polyu.edu.hk/geo). After standardization and gene annotation, the gene expression distribution of each sample was depicted as S1. A total of 3,469 DEGs were identified, among which 1,860 genes were up-regulated and 1,609 genes were down-regulated in the asthma group. A gene expression heatmap and volcano plot are shown in Fig. 1A and Fig. 1B, respectively.

Fig. 1.

Gene expression heatmap (A) and volcano plot (B) of children with acute asthma. Red represents up-regulated genes and green represents down-regulated genes compared with control.

Identification of key modules based on WGCNA

To clarify asthma-related key modules, WGCNA was performed on the 3,469 DEGs. Prior to WGCNA, in order to have relatively balanced scale independence and mean connectivity of the WGCNA, we analyzed network topology for various soft-thresholding powers. In Fig. 2A, we took power as 8 when the correlation index reached 0.90. The hierarchical cluster dendrogram of the 3,469 genes was then analyzed (Fig. 2B). We set the MEDissThres as 0.25 to merge similar modules, and seven modules were generated (Fig. 2C), which are shown in Table 1. Interaction-based relationships for the seven modules are displayed as a heatmap in Fig. 2D, and the results revealed that there is a high level of independence among the modules.

Fig. 2.

WGCNA analysis based on DEGs and identification of key modules. (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). The left panel displays the influence of soft-thresholding power (x-axis) on scale-free fit index (y-axis). The right panel shows the influence of soft-thresholding power (x-axis) on mean connectivity (degree, y-axis). (B) Cluster dendrogram of genes, with dissimilarity based on topological overlap, together with assigned module colors. Dynamic tree cutting was applied to identify modules by dividing the dendrogram at significant branch points. Modules are displayed with different colors in the horizontal bars below the dendrogram, with gray representing unassigned genes. (C) Diagram of correlation of module colors with asthma. Colors represent modules and the y-axis represents gene significance. (D) Heatmap plot of the adjacencies in the hub gene network. In the heatmap, red represents high adjacency (positive correlation), while blue represents low adjacency (negative correlation).

Table 1. Correlation-values and P-values of different modules
ModuleCorrelationP-value
MEbrown0.365.19E-04
MEblack−0.366.71E-04
MEgreen−0.441.91E-05
MEblue−0.417.12E-05
MEred0.366.41E-04
MEyellow0.391.88E-04
MEgray0.621.92E-10

Data for key modules are shown in bold type.

Taken together, the brown module (MEbrown) and blue module (MEblue) were considered to be the key modules. Gene significance versus module membership plots of these two models are shown in Supplementary Fig. S1.

Functional enrichment analysis of genes in the MEbrown and MEblue modules

For functional enrichment analysis, we used KEGG pathway enrichment analysis and GO-BP methods separately. First, the genes included in MEbrown and MEblue modules were displayed in a heatmap plot (Fig. 3A). A total of 326 and 2,170 genes were identified in MEbrown and MEblue modules, respectively. In enrichment analysis, MEbrown was enriched in 37 KEGG pathways and 472 BPs, while MEblue was enriched in 16 KEGG pathways and 449 BPs. The results are shown in Fig. 3B, with detailed information provided in Table 2.

Fig. 3.

Functional enrichment analysis, PPI network analysis and miRNA–TF–target regulatory network. (A) Heatmap plot of gene expression in MEblue (left panel) and MEbrown (right panel). Red and green represent up- and down-regulated genes, respectively, compared with control. (B) Enrichment results for MEblue (upper panels) and MEbrown module genes (lower panels) for BP-top10 (bottom left panels) and KEGG-top10 (right panels). (C) PPI network analysis for MEbrown and MEblue. Circles and squares represent up- and down-regulated genes, respectively, while brown and blue mark genes in MEbrown and MEblue. Node size is based on the PPI degree value. (D) miRNA–TF–target regulatory network. Circles and squares indicate up- and down-regulated genes, respectively. Brown and blue indicate genes in MEbrown and MEblue. Yellow hexagons indicate TFs and green triangles indicate miRNAs incorporated into this network. Node size is based on the degree value.

Table 2. KEGG pathway and GO enrichment analysis of MEblue and MEbrown
ModuleIDDescriptionCountP-value
BlueKEGGhsa04710Circadian rhythm103.97E-04
hsa00514Other types of O-glycan biosynthesis113.21E-03
hsa04120Ubiquitin-mediated proteolysis234.44E-03
hsa04340Hedgehog signaling pathway116.34E-03
hsa04110Cell cycle201.28E-02
hsa04211Longevity regulating pathway151.99E-02
hsa04960Aldosterone-regulated sodium reabsorption82.07E-02
hsa05168Herpes simplex virus 1 infection602.40E-02
hsa04141Protein processing in endoplasmic reticulum242.45E-02
hsa00310Lysine degradation112.73E-02
BPGO:0006486Protein glycosylation465.47E-06
GO:0043413Macromolecule glycosylation465.47E-06
GO:0070085Glycosylation461.68E-05
GO:0048511Rhythmic process492.02E-05
GO:0009101Glycoprotein biosynthetic process545.46E-05
GO:0006900Vesicle budding from membrane229.02E-05
GO:0061458Reproductive system development631.04E-04
GO:0046785Microtubule polymerization181.09E-04
GO:0048193Golgi vesicle transport551.30E-04
GO:0031109Microtubule polymerization or depolymerization231.39E-04
BrownKEGGhsa04640Hematopoietic cell lineage31.30E-03
hsa05146Amoebiasis22.15E-02
hsa04668TNF signaling pathway22.56E-02
hsa05418Fluid shear stress and atherosclerosis23.81E-02
BPGO:0072073Kidney epithelium development65.68E-09
GO:0001822Kidney development79.89E-09
GO:0072001Renal system development71.42E-08
GO:0001655Urogenital system development73.22E-08
GO:0072009Nephron epithelium development58.94E-08
GO:0035850Epithelial cell differentiation involved in kidney development41.26E-07
GO:0061005Cell differentiation involved in kidney Development42.93E-07
GO:0072006Nephron development53.35E-07
GO:0072160Nephron tubule epithelial cell differentiation34.74E-07
GO:0032835Glomerulus development45.13E-07

PPI network analysis of MEbrown and MEblue modules

Next, PPI network analysis was used to predict and analyze the interactions between proteins encoded by DEGs in MEbrown and MEblue. As shown in Fig. 3C, there were 151 nodes and 231 interaction pairs in the whole network. The top 20 nodes of this network are listed in Table 3.

Table 3. Top 20 nodes based on PPI-degree
NodeDescriptionDegree
NOTCH1down-brown16
GNL3up-blue13
PECAM1down-brown13
ARG1up-brown11
CATdown-brown10
LIFdown-blue10
KITLGup-blue9
C3AR1up-blue9
CFPdown-brown9
CFTRup-blue9
PROM1down-blue8
SCNN1Gup-blue7
MMEdown-brown7
SOCS3down-brown7
CD79Adown-blue7
POSTNup-blue7
MUC13up-blue6
ST6GAL1up-blue6
MEF2Cdown-brown6
CD1Cdown-blue6

miRNA–TF–target regulatory network

To determine whether miRNAs and TFs are responsible for the observed altered gene expression in the MEbrown and MEblue modules, the miRNA–TF–target regulatory network was obtained. As shown in Fig. 3D, a total of 31 TFs, seven miRNAs and 28 nodes were identified. Moreover, there were 38 pairs of regulatory relationships in this network, of which five were up-regulated (round shape, including KITLG, SCNN1G, CFTR, POSTN and ST6GAL1) while six were down-regulated (square shape, including NOTCH1, CD79A, MEF2C, PROM1, LIF and SOCS3).

DISCUSSION

Acute asthma in children is considered to be a disease with a high prevalence. To achieve early diagnosis and treatment of pediatric asthma, more knowledge about the disease mechanism and gene expression profile is urgently needed. In the present study, by searching the GEO database, we have identified 3,469 DEGs, of which 1,860 were up-regulated and 1,609 were down-regulated (Supplementary Fig. S2). We then applied WGCNA to identify the key modules and regulators of gene expression among DEGs.

Most existing bioinformatic tools focus only on unweighted networks, whereas WGCNA can be used for both weighted and unweighted correlation networks, and to further explore the module (cluster) structure in a network. Moreover, it also can be used to rank genes or modules in independent data sets (Langfelder and Horvath, 2008). So far, this approach has been widely used with different disease-related datasets. For example, in glioblastoma multiforme, Yang et al. (2018) performed WGCNA to analyze 1,913 DEGs between a glioblastoma group and a control group and found five modules that are mainly enriched in the inflammatory response. Based on WGCNA, Yin et al. (2018) identified key pathways and genes in different stages of hepatocellular carcinoma. In the modules they identified were genes enriched in energy metabolism, protein ubiquitination and ephrin receptor signaling pathways. In our study, the first step was to identify the key modules based on WGCNA. To this end, network topology analysis for various soft-thresholding powers (Fig. 2A) and dynamic tree-cutting analysis (Fig. 2B) were conducted. This yielded seven modules (Fig. 2D), among which two modules, the brown (MEbrown) and blue module (MEblue), were considered to be the key modules for further analysis (Fig. 2D). A total of 326 and 2,170 genes were included in MEbrown and MEblue modules, respectively (Fig. 3A).

To investigate the networks of these two key modules, functional enrichment analysis was performed using KEGG pathway enrichment analysis and GO-BP methods. MEbrown was enriched in 37 KEGG pathways and 472 BPs, while MEblue was enriched in 16 KEGG pathways and 449 BPs. Genes related to protein glycosylation, macromolecule glycosylation and circadian rhythm were most enriched in MEblue, while genes related to regulation of cell−cell adhesion, liver development, osteoclast differentiation and the C-type lectin receptor signaling pathway were more enriched in MEbrown (Fig. 3B). PPI network analysis was subsequently used to predict and analyze the interactions between proteins encoded by DEGs in MEbrown and MEblue. PPI networks are tools to understand cell functions, disease machinery and drug design/repositioning, because they can be used to map the relationships between species through conserved pathways and protein complexes (Athanasios et al., 2017; Miryala et al., 2018). In our study, we identified 151 nodes and 231 interaction pairs in the whole PPI network (Fig. 3C). Finally, based on previous analysis, a miRNA–TF–target regulatory network was generated (Fig. 3D), with 31 TFs, seven miRNAs and 28 nodes identified. Moreover, for the hub genes, five genes (KITLG, SCNN1G, CFTR, POSTN and ST6GAL1) were up-regulated, and six genes (NOTCH1, CD79A, MEF2C, PROM1, LIF and SOCS3) were down-regulated. A previously published study demonstrated that the Notch1–GATA3 signaling pathway is involved in inflammation in asthma (Chong et al., 2014). The authors showed that Notch1 and Notch2 receptors play an important role in regulating allergic airway inflammation. These receptors might act as therapeutic options in the treatment of allergic asthma.

In summary, our study used WGCNA network analysis to identify co-expression modules for genes of children with acute asthma. Two modules, named MEbrown and MEblue, were found to be highly enriched for genes involved in many pathways. Furthermore, we identified 11 hub genes that are potential biomarkers for diagnosis and treatment of children with acute asthma.

REFERENCES
 
© 2020 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.
https://creativecommons.org/licenses/by/4.0/legalcode
feedback
Top