Identiﬁcation of a disease-speciﬁc gene expression proﬁle of children with acute asthma by weighted gene co-expression network analysis

Asthma is one of the most common diseases, with a high prevalence among children. To date, systemic co-expression analysis for this disease has not been undertaken to explain its pathogenesis. Here we identiﬁed differentially expressed genes (DEGs) in 87 samples, and then constructed co-expression modules via weighted gene co-expression network analysis (WGCNA) and investigated the functional enrichment of co-expressed genes in terms of Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG). Meanwhile, protein–protein interaction (PPI) network and miRNA–transcription factor–target (miRNA–TF– target) regulatory network analyses were performed to screen hub genes. As a result, 3,469 DEGs were identiﬁed in this study, of which 1,860 genes were up-regulated and 1,609 genes were down-regulated. Using WGCNA, we identiﬁed two key modules, named MEbrown and MEblue, that may play important roles in asthma. Functional enrichment analysis revealed that MEbrown was enriched in 37 KEGG pathways and 472 biological processes (BPs), while MEblue was enriched in 16 KEGG pathways and 449 BPs. From PPI and miRNA–TF–target regulatory network analysis, a total of 31 TFs, seven miRNAs and 28 nodes were identiﬁed. Our ﬁndings should provide a framework of therapeutic targets for treating children with acute asthma.


INTRODUCTION
Asthma is a common chronic disease, globally affecting about 300 million people of all ages, and its prevalence is still rising (Masoli et al., 2004). Pediatric asthma is especially concerning, with up to 9.5% of children having been affected (Akinbami et al., 2016). Asthma in children places a heavy burden not only on the children's families, but also on the health care system of the whole society (Asher and Pearce, 2014).
Pediatric asthma is usually classified into four stages: acute onset, chronic persistent, clinical remission and other (Hoch et al., 2019). The acute onset stage is also known as the "attack" period. Children with acute asthma exhibit increasing chest tightness, shortness of breath, coughing and/or wheezing, along with dyspnea. They may even experience respiratory failure, putting their lives at risk (Pavord et al., 2018). Thus, identifying robust biomarkers that can aid asthma diagnosis or enable better prognostic prediction may help to improve prognosis and children's quality of life.
With the development of bioinformatic approaches, systematic description and high-throughput data analysis are now widely used to provide important clues for molecular mechanism studies (Nam et al., 2009). Moreover, bioinformatic analysis can also reveal a systematic biological process that traverses different functional networks in different disease models (Chasman et al., 2016). In asthma-related research, however, only a few studies have investigated gene expression profiles. For example, Katayama et al. (2020) have identified a unique acute wheeze-specific gene module in blood from preschool wheezers. This module was associated with vitamin D Edited by Hie Lim Kim levels and asthma medication. Another study from Aoki et al. (2009) showed expression profiles of genes related to asthma exacerbations. There were 137/16 genes that were significantly up-/down-regulated during an asthma exacerbation, of which 62 were also differentially expressed during upper respiratory infection. However, little is known about the gene expression profile of children with acute asthma. Here, we used a novel tool, weighted gene co-expression network analysis (WGCNA), to explore the molecular interaction mechanism and correlation networks of this disease.
As an effective bioinformatics method, WGCNA is able to clarify synergetic expressed modules at the transcriptome level. More importantly, it can be used to identify relationships between gene networks (Langfelder and Horvath, 2008). WGCNA has been successfully applied in different disease models to generate correlation networks, further to identifying candidate biomarkers or therapeutic targets (Pei et al., 2017). In the present study, based on datasets from the Gene Expression Omnibus (GEO), we used WGCNA to analyze differentially expressed genes (DEGs) from 87 samples. Key gene modules associated with DEGs were then identified. Further, the biological functions and pathways of genes in the key modules were analyzed. These informative genes identified in our study offer new insights into the pathogenesis of asthma; moreover, our findings may be beneficial to clinical treatment of children with acute asthma.

MATERIALS AND METHODS
Data collection and preprocessing Datasets related to childhood asthma were obtained from the NCBI GEO (http://www-ncbi-nlm-nih-gov.ezproxy.lb.polyu.edu.hk/ geo) with accession number GSE103166. GSE103166 consists of 87 samples, comprising 56 samples of children with acute asthma/wheeze and 31 control samples. The platform used was GPL23961 [HuGene-2_1-st] Affymetrix Human Gene 2.1 ST Array [HuGene21st_Hs_ ENTREZG_20.0.0]. The GEO database download has been standardized. The limma package in R (http:// www.bioconductor.org/packages/release/bioc/html/limma. html) was used to identify DEGs (children with asthma versus control) from the candidate genes. P < 0.05 was the cut-off criterion.
Screening children with acute asthma-related modules based on WGCNA WGCNA (Langfelder and Horvath, 2008) is a typical biological algorithm for constructing gene co-expression networks (Bacchetti and Leung, 2002;Yin et al., 2018). We used WGCNA to analyze the expression values of the DEGs obtained in the previous screening in each group, and screened the modules and genes associated with acute asthma in children.
First, the trait-based node significance measure was calculated: where m and n are the expressed genes m and n, respectively. cor is the Pearson coefficient of these two vectors. This transition improved the reliability of the co-expression network by giving more weight to the strong connections and reducing the importance of the weak connections in the predicted co-expression network. Next, a power function was applied to correlate the adjacency of genes, where β is soft-thresholding powers: a power mn S mn ,E The adjacency matrix was then converted to a topological matrix using the following formula: w l a k k e a mn mn mn m n mn ` min , 3333333334 1 l mn represents the sum of the adjacency coefficient area of the nodes connected to genes m and n, and k m is the sum of the adjacency numbers of the nodes individually connected to gene m. The topological properties were also confirmed. Module identification was then accomplished with the dynamic tree cut method. Modules with high similarities were identified and merged together with a height cut-off of 0.95. Furthermore, P-values of gene expression difference between the asthma and control groups were evaluated with Student's t-test, and logtransformed P-values were used to determine statistical significance. The mean value of gene significance for all genes in a particular module was defined as module significance.
Functional enrichment analysis Gene Ontology (GO)   and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed using the R package clusterProfiler for significant modules. The gene lists of modules were uploaded, and we obtained the results for biological process (BP) and KEGG pathways. P < 0.05 was regarded as significant.

RESULTS
Identification of DEGs between children with acute asthma and healthy samples To identify DEGs, gene microarray data of 87 samples were obtained from the GEO database (http://www-ncbi-nlm-nih-gov. ezproxy.lb.polyu.edu.hk/geo). After standardization and gene annotation, the gene expression distribution of each sample was depicted as S1. A total of 3,469 DEGs were identified, among which 1,860 genes were up-regulated and 1,609 genes were down-regulated in the asthma group. A gene expression heatmap and volcano plot are shown in Fig. 1A and Fig. 1B, respectively.
Identification of key modules based on WGCNA To clarify asthma-related key modules, WGCNA was performed on the 3,469 DEGs. Prior to WGCNA, in order to have relatively balanced scale independence and mean connectivity of the WGCNA, we analyzed network topol-ogy for various soft-thresholding powers. In Fig. 2A, we took power as 8 when the correlation index reached 0.90. The hierarchical cluster dendrogram of the 3,469 genes was then analyzed (Fig. 2B). We set the MEDis-sThres as 0.25 to merge similar modules, and seven modules were generated (Fig. 2C), which are shown in Table  1. Interaction-based relationships for the seven modules are displayed as a heatmap in Fig. 2D, and the results revealed that there is a high level of independence among the modules. Taken together, the brown module (MEbrown) and blue module (MEblue) were considered to be the key modules. Gene significance versus module membership plots of these two models are shown in Supplementary Fig. S1.
Functional enrichment analysis of genes in the MEbrown and MEblue modules For functional enrichment analysis, we used KEGG pathway enrichment analysis and GO-BP methods separately. First, the genes included in MEbrown and MEblue modules were displayed in a heatmap plot (Fig. 3A). A total of 326 and 2,170 genes were identified in MEbrown and MEblue modules, respectively. In enrichment analysis, MEbrown was enriched in 37 KEGG pathways and 472 BPs, while MEblue was enriched in 16 KEGG pathways and 449 BPs. The results are shown in Fig. 3B, with detailed information provided in Table 2.
PPI network analysis of MEbrown and MEblue modules Next, PPI network analysis was used to predict and analyze the interactions between proteins encoded by DEGs in MEbrown and MEblue. As shown in Fig. 3C, there were 151 nodes and 231 interaction pairs in the whole network. The top 20 nodes of this network are listed in Table 3.
miRNA-TF-target regulatory network To determine whether miRNAs and TFs are responsible for the observed altered gene expression in the MEbrown and MEblue modules, the miRNA-TF-target regulatory network was obtained. As shown in Fig. 3D, a total of 31 TFs, seven miRNAs and 28 nodes were identified. Moreover, there were 38 pairs of regulatory relationships in this network, of which five were up-regulated (round shape, including KITLG, SCNN1G, CFTR, POSTN and ST6GAL1) while six were down-regulated (square shape, including NOTCH1, CD79A, MEF2C, PROM1, LIF and SOCS3).

DISCUSSION
Acute asthma in children is considered to be a disease with a high prevalence. To achieve early diagnosis and treatment of pediatric asthma, more knowledge about the disease mechanism and gene expression profile is urgently needed. In the present study, by searching the GEO database, we have identified 3,469 DEGs, of which 1,860 were up-regulated and 1,609 were down-regulated ( Supplementary Fig. S2). We then applied WGCNA to identify the key modules and regulators of gene expression among DEGs.
Most existing bioinformatic tools focus only on unweighted networks, whereas WGCNA can be used for both weighted and unweighted correlation networks, and to further explore the module (cluster) structure in a network. Moreover, it also can be used to rank genes or modules in independent data sets (Langfelder and Horvath, 2008). So far, this approach has been widely used with different disease-related datasets. For example, in glioblastoma multiforme, Yang et al. (2018) performed WGCNA to analyze 1,913 DEGs between a glioblastoma group and a control group and found five modules that are mainly enriched in the inflammatory response. Based on WGCNA, Yin et al. (2018) identified key pathways and genes in different stages of hepatocellular carcinoma. In the modules they identified were genes enriched in energy metabolism, protein ubiquitination and ephrin receptor signaling pathways. In our study, the first step was to identify the key modules based on WGCNA. To this end, network topology analysis for various soft-thresholding powers ( Fig. 2A) and dynamic tree-cutting analysis (Fig. 2B) were conducted. This yielded seven modules (Fig. 2D), among which two mod- ules, the brown (MEbrown) and blue module (MEblue), were considered to be the key modules for further analysis (Fig. 2D). A total of 326 and 2,170 genes were included in MEbrown and MEblue modules, respectively (Fig. 3A).
To investigate the networks of these two key modules, functional enrichment analysis was performed using KEGG pathway enrichment analysis and GO-BP methods. MEbrown was enriched in 37 KEGG pathways and 472 BPs, while MEblue was enriched in 16 KEGG pathways and 449 BPs. Genes related to protein glycosylation, macromolecule glycosylation and circadian rhythm were most enriched in MEblue, while genes related to regulation of cell − cell adhesion, liver development, osteoclast differentiation and the C-type lectin receptor signaling pathway were more enriched in MEbrown (Fig. 3B). PPI network analysis was subsequently used to predict and analyze the interactions between proteins encoded by DEGs in MEbrown and MEblue. PPI networks are tools to understand cell functions, disease machinery and drug design/repositioning, because they can be used to map the relationships between species through conserved pathways and protein complexes (Athanasios et al., 2017;Miryala et al., 2018). In our study, we identified 151 nodes and 231 interaction pairs in the whole PPI network (Fig. 3C). Finally, based on previous analysis, a miRNA-TF-target regulatory network was generated (Fig. 3D), with 31 TFs, seven miR-NAs and 28 nodes identified. Moreover, for the hub genes, five genes (KITLG, SCNN1G, CFTR, POSTN and ST6GAL1) were up-regulated, and six genes (NOTCH1, CD79A, MEF2C, PROM1, LIF and SOCS3) were downregulated. A previously published study demonstrated that the Notch1-GATA3 signaling pathway is involved in inflammation in asthma (Chong et al., 2014). The authors showed that Notch1 and Notch2 receptors play an important role in regulating allergic airway inflammation. These receptors might act as therapeutic options in the treatment of allergic asthma.
In summary, our study used WGCNA network analysis to identify co-expression modules for genes of children with acute asthma. Two modules, named MEbrown and MEblue, were found to be highly enriched for genes involved in many pathways. Furthermore, we identified 11 hub genes that are potential biomarkers for diagnosis and treatment of children with acute asthma.