The Tohoku Journal of Experimental Medicine
Online ISSN : 1349-3329
Print ISSN : 0040-8727
ISSN-L : 0040-8727
Regular Contribution
Mining the Biomarkers and Associated-Drugs for Esophageal Squamous Cell Carcinoma by Bioinformatic Methods
Xiuying Kuang, Zhihui Liu

2022 Volume 256 Issue 1 Pages 27-36

Abstract

Esophageal squamous cell carcinoma (ESCC) has limited treatment outcomes and a poor prognosis. This study aimed to screen potential biomarkers and drugs for ESCC. Firstly, GSE26886, GSE111044 and GSE77861 were downloaded from the Gene Expression Omnibus (GEO) database. Next, the differentially expressed genes (DEGs) between cancer and noncancerous tissues were analyzed with GEO2R. Gene Ontology (GO) annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, protein-protein interaction (PPI) analysis and hub gene screening were then conducted with bioinformatic methods. Lastly, the hub genes and potential drugs were verified with GEPIA2 and the QuartataWeb database. In total, 13 up-regulated genes and 81 down-regulated genes were identified. In GO terms, the DEGs were mainly associated with cell proliferation, cell migration and cell differentiation. The DEGs were not enriched in any KEGG pathway. After validation of the hub genes, the expression trends of nine genes (FLG, COL1A1, COL1A2, PSCA, SCEL, PPL, ACPP, CNFN, and A2ML1) remained unchanged. Moreover, higher COL1A1 or COL1A2 expression in ESCC patients was associated with poor prognosis. Finally, five drugs used for promoting blood coagulation were identified. These drugs could possibly show anticancer effects by promoting blood coagulation or inhibiting vascular formation in cancers, which offers a novel idea for the treatment of ESCC.

Introduction

Esophageal cancer (EC), a malignancy of the digestive system, is the eighth most common cancer and is associated with a high mortality rate globally (Arnold et al. 2015). Esophageal squamous cell carcinoma (ESCC) is one of the subtypes of EC, which are defined according to epidemiology and histopathology (Forde and Kelly 2013). ESCC remains responsible for the vast majority of EC cases (Liu et al. 2021), though its incidence is declining in most countries.

The prognosis of ESCC is poor, owing to the lack of effective early biomarkers and treatment strategies (Zhang and Jain 2020; Liu et al. 2021; Yang et al. 2021). The overall 5-year survival rate is less than 25% in patients with late-stage ESCC. By contrast, the 5-year survival rate is more than 85% for patients with ESCC diagnosed at an early stage (Lordick et al. 2016). Endoscopy is often used as the primary diagnostic tool to screen for ESCC; however, its application is limited, owing to its serious side effects and dependence on the skill of the endoscopist (Marginean and Dhanpat 2020; Su et al. 2020; Zhang and Jain 2020). Moreover, due to its invasive nature, some patients refuse to undergo endoscopy (Lin et al. 2013). Therefore, identification of effective methods for the diagnosis of ESCC is imperative.

High-throughput sequencing has been broadly applied in many diseases, particularly in cancers (Freimanis and Oade 2020; Kamio et al. 2020; Pillay et al. 2020). Various public databases provide clinical and sequencing data that are uploaded after analysis. Reanalysis of these data through bioinformatic methods can provide new insights into cancers. Some bioinformatic studies on ESCC have been conducted (Zhang et al. 2020; Xue et al. 2021; Ye et al. 2021), but all of them attempted to mine a single-gene signature related to cancer. A single gene may not be sufficient to understand the cancer, because tumorigenesis is a multi-factor, multi-stage, and multi-gene process. Moreover, these studies did not address the association between genes and targeted drugs. Thus, the diagnosis of ESCC remains challenging, resulting in poor prognosis of patients.

The idea of “repurposing old drugs” has received considerable attention from researchers aiming to develop effective cancer therapies. Old drugs are defined as drugs approved by the Food and Drug Administration (FDA) or those with safe clinical application. For example, metformin, a widely prescribed antidiabetic drug used to treat type II diabetes, has shown tumor-suppressive properties (Quinn et al. 2013; Yue et al. 2014; Anisimov 2016). Interestingly, the combination of metformin with targeted drugs has been reported to improve the efficacy of targeted therapies in cancers (Morales and Morris 2015; Armstrong et al. 2021; Cunha Junior et al. 2021; Jang et al. 2021). Therefore, we hypothesized that old drugs can be screened through bioinformatic methods to identify new anticancer drugs for ESCC.

In this study, comprehensive bioinformatic analyses were used to discover potential biomarkers and available drugs for ESCC. Firstly, three microarray datasets in the Gene Expression Omnibus (GEO) database were selected and analyzed. Then, the differentially expressed genes (DEGs) between ESCC and healthy groups were identified. Furthermore, Gene Ontology (GO) annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, and protein-protein interaction (PPI) analysis were conducted on the DEGs through several bioinformatic methods. Finally, the potential biomarkers and correlated pathways, as well as the drugs that might be associated with ESCC, were identified. The candidate genes and drugs identified in the study may serve as promising diagnostic biomarkers or therapeutic targets for ESCC.

Materials and Methods

Microarray data information

The GEO database holds a large amount of gene expression data, including high-throughput sequencing, hybridization array and microarray data (Barrett et al. 2013). GSE26886, GSE111044 and GSE77861 were downloaded from the GEO database. The platform of the microarray datasets was GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). GSE26886 comprised 21 cancer tissue samples and 9 normal tissue samples. GSE111044 comprised 3 cancer tissue samples and 3 normal tissue samples. GSE77861 comprised 7 cancer tissue samples and 7 normal tissue samples (Table 1).

Table 1.

A Summary of microarray datasets from Gene Expression Omnibus (GEO) database.

Identification of DEGs

The DEGs between ESCC and normal specimens were identified with GEO2R. The cut-off criteria were set as |logFC| > 2 and P value < 0.05. Then, the DEGs common to the three datasets were screened with Venn software. Among these DEGs, logFC < −2 indicated down-regulated genes and logFC > 2 indicated up-regulated genes.
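The screening rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow (they used GEO2R and Venn software): the row layout, the helper names `split_degs` and `common_degs`, and the toy gene names and values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row layout: (gene symbol, logFC, P value).
# Real input would come from a GEO2R result table.
Row = Tuple[str, float, float]


def split_degs(rows: List[Row], lfc_cut: float = 2.0,
               p_cut: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Apply the cut-offs |logFC| > 2 and P < 0.05 to one dataset."""
    up, down = set(), set()
    for gene, logfc, p in rows:
        if p >= p_cut:
            continue  # not significant
        if logfc > lfc_cut:
            up.add(gene)      # up-regulated in tumor
        elif logfc < -lfc_cut:
            down.add(gene)    # down-regulated in tumor
    return up, down


def common_degs(datasets: Dict[str, List[Row]]) -> Tuple[Set[str], Set[str]]:
    """Intersect up- and down-regulated sets across all datasets (the Venn overlap)."""
    ups, downs = zip(*(split_degs(rows) for rows in datasets.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Toy values, invented for illustration only.
toy = {
    "dataset_1": [("GENE_A", 3.1, 0.001), ("GENE_B", -2.5, 0.01), ("GENE_C", 2.2, 0.20)],
    "dataset_2": [("GENE_A", 2.4, 0.03), ("GENE_B", -3.0, 0.002), ("GENE_C", 2.8, 0.01)],
    "dataset_3": [("GENE_A", 2.1, 0.04), ("GENE_B", -2.2, 0.04), ("GENE_D", -4.0, 0.001)],
}
common_up, common_down = common_degs(toy)
print(common_up, common_down)
```

In the toy input, GENE_C fails the P value cut-off in one dataset and GENE_D appears in only one dataset, so neither survives the intersection.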

Gene ontology analysis and KEGG pathway enrichment analysis

Gene Ontology (GO) analysis is used to annotate the biological properties of genes from genomic or high-throughput transcriptomic data (Dalmer and Clugston 2019). KEGG comprises various databases, including genomes, biological pathways, diseases and drugs (Kanehisa et al. 2017). The Enrichr website (Kuleshov et al. 2016) was used to perform the GO analysis and KEGG pathway analysis (P < 0.05).
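For readers unfamiliar with enrichment testing, the following sketch shows a generic over-representation test based on the hypergeometric distribution (a one-sided Fisher exact test). It is a minimal illustration and is not claimed to reproduce Enrichr's exact scoring; the function names, the background size, and the gene identifiers are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance of seeing at least k term genes among n DEGs
    drawn from a background of N genes, K of which belong to the term."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def term_p_value(degs: Set[str], term_genes: Set[str], background: Set[str]) -> float:
    """One-sided over-representation P value for a single GO term or pathway."""
    degs = degs & background
    term = term_genes & background
    overlap = len(degs & term)
    return hypergeom_sf(overlap, len(background), len(term), len(degs))


# Toy example with invented identifiers: 20 background genes,
# a term with 5 genes, and 4 DEGs of which 3 fall in the term.
background = {f"g{i}" for i in range(20)}
term = {"g0", "g1", "g2", "g3", "g4"}
degs = {"g0", "g1", "g2", "g10"}
p = term_p_value(degs, term, background)
print(round(p, 4), p < 0.05)
```

A term would be reported as enriched when its P value falls below the chosen threshold (P < 0.05 in this study); in practice a multiple-testing correction is usually applied as well.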

The PPI network construction and hub genes selected

The STRING database (http://string-db.org) (Crosara et al. 2018) was used to retrieve the interactions among the DEGs. Then, the PPI network was analyzed and visualized with the Cytoscape software (http://www.cytoscape.org/). The top 10 hub genes were selected with the plugin cytoHubba (Chin et al. 2014) (scores > 2).
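As a rough illustration of hub gene selection, the sketch below ranks nodes of an undirected PPI network by degree (number of interaction partners) and keeps those with a score above 2. Degree is only one of several ranking methods offered by cytoHubba, and the text does not state which one was used, so the choice of degree is an assumption; the function name `rank_by_degree` and the node labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_degree(edges: Iterable[Tuple[str, str]], top_n: int = 10,
                   min_score: int = 2) -> List[Tuple[str, int]]:
    """Rank nodes by degree; keep nodes scoring above min_score, at most top_n."""
    # Treat the network as undirected: drop self-loops and duplicate edges.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter(node for edge in unique for node in edge)
    # Sort by descending degree, then by name for a stable order.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(node, score) for node, score in ranked if score > min_score][:top_n]


# Invented toy network; labels are placeholders, not real interactions.
toy_edges = [
    ("N1", "N2"), ("N1", "N3"), ("N1", "N4"),
    ("N2", "N3"), ("N2", "N5"), ("N2", "N4"),
    ("N3", "N1"),  # duplicate of ("N1", "N3") in reverse order
]
hubs = rank_by_degree(toy_edges)
print(hubs)
```

Here N2 has four partners and N1 has three, so only these two nodes pass the score cut-off.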

Expression and prognosis analysis of hub genes in GEPIA2

GEPIA2 (http://gepia2.cancer-pku.cn/), a web-based tool, interactively analyzes the gene expression profiles of normal and cancer samples (Tang et al. 2019). The expression of the hub genes was reconfirmed with boxplot analyses. Kaplan-Meier curves were used to assess the prognosis of ESCC patients according to hub gene expression.
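The Kaplan-Meier estimate behind such survival curves can be sketched as follows. This is a minimal illustration, not the GEPIA2 implementation: splitting patients at the median expression value is an assumption, and the helper names and all sample values are invented.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical sample layout: (expression, follow-up time, event),
# where event is 1 for recurrence/progression and 0 for censoring.
Sample = Tuple[float, float, int]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored subjects leave the risk set
    return curve


def curves_by_median(samples: List[Sample]):
    """Split samples at the median expression and estimate one curve per group."""
    cut = median(s[0] for s in samples)
    high = [s for s in samples if s[0] > cut]
    low = [s for s in samples if s[0] <= cut]
    km = lambda grp: kaplan_meier([s[1] for s in grp], [s[2] for s in grp])
    return km(high), km(low)


# Invented toy cohort.
toy = [(1.0, 10, 0), (2.0, 8, 1), (3.0, 9, 0),
       (6.0, 2, 1), (7.0, 3, 1), (8.0, 5, 0)]
high_curve, low_curve = curves_by_median(toy)
print(high_curve, low_curve)
```

In this toy cohort the high-expression group drops to about 0.67 at time 2 and about 0.33 at time 3, whereas the low-expression group only drops to about 0.67 at time 8. A log-rank test would normally be used to compare the two curves.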

The construction of drug-gene interaction

QuartataWeb (http://quartata.csb.pitt.edu/) supports searching for information on drug-gene interactions (Li et al. 2020). The hub genes identified via GEPIA2 were uploaded into the database to screen for existing drugs or compounds.

Statistics analysis

The moderated t-test was applied to identify DEGs. P < 0.05 was considered statistically significant.

Results

Identification of DEGs

DEGs (2055 in GSE26886, 239 in GSE77861 and 1478 in GSE111044) were identified with GEO2R. The overlap among the three datasets contained 94 genes, as shown in the Venn diagram (Fig. 1). As a result, 13 up-regulated genes and 81 down-regulated genes were obtained (Table 2).

Fig. 1.

A total of 94 differentially expressed genes (DEGs) in the three datasets (GSE26886, GSE77861 and GSE111044).

Different colors represent different datasets. (A) The 13 common up-regulated DEGs. (B) The 81 common down-regulated DEGs.

Table 2.

The identified differentially expressed genes (DEGs).

Gene ontology and KEGG pathway enrichment analysis

In the Biological Process (BP) annotation, the DEGs were mainly enriched in “Positive regulation of cell proliferation”, “Skeletal system development”, “Lipoxin metabolic process”, “Proteolysis”, “Regulation of peptidase activity”, “Negative regulation of cytokine-mediated signaling pathway”, “Skin morphogenesis”, and “Keratinocyte differentiation”. In the Cellular Component (CC) annotation, they were enriched in “Extracellular exosome”, “Extracellular space”, “Extracellular region”, “Collagen type I trimer”, “Proteinaceous extracellular matrix”, and “Extracellular matrix”. In the Molecular Function (MF) annotation, they clustered in “Monooxygenase activity”, “Iron ion binding”, “Serine-type peptidase activity”, “Extracellular matrix structural constituent”, “Cytokine activity”, “Platelet-derived growth factor binding”, and “Cyclin-dependent protein serine/threonine kinase regulator activity”. Additionally, the DEGs were not enriched in any KEGG pathway according to the cut-off criteria (Table 3).

Table 3.

The Gene ontology (GO) analysis.

*BP, Biological process; CC, Cellular component; MF, Molecular function.

The construction of PPI and hub genes analysis

As shown in Fig. 2A and B, blue nodes represent down-regulated genes, while red nodes represent up-regulated genes. The network contained 41 genes (nodes) and 39 edges. The top 10 hub genes were FLG, TMPRSS2, COL1A1, COL1A2, PSCA, SCEL, PPL, ACPP, CNFN, and A2ML1 (Fig. 2B). All parameters in cytoHubba were left at their default values.

Fig. 2.

The construction of protein-protein interaction (PPI) network and hub genes analysis.

(A) The PPI networks of differentially expressed genes. (B) The top 10 genes in the PPI networks. Red represents upregulated genes while blue represents downregulated genes.

Validation in the TCGA and GTEx projects

The hub genes were uploaded into the GEPIA2 website to verify their reliability in ESCC. Of the ten hub genes, nine (FLG, COL1A1, COL1A2, PSCA, SCEL, PPL, ACPP, CNFN, and A2ML1) were reconfirmed via GEPIA2, and their expression trends remained unchanged. COL1A1 and COL1A2 expression was significantly higher in ESCC tissues than in normal tissues (Fig. 3). Moreover, ESCC patients with higher COL1A1 or COL1A2 expression had worse disease-free survival than those with lower expression (Fig. 4), which implied that COL1A1 and COL1A2 had significant relevance to the progression of ESCC.

Fig. 3.

Boxplot graphs showing the expression levels of hub genes in tumor and normal tissues of esophageal squamous cell carcinoma (ESCC) patients.

*P < 0.05. ESCA, esophageal carcinoma.

Fig. 4.

Kaplan-Meier curves for disease-free survival in the esophageal squamous cell carcinoma (ESCC) patients stratified by high or low expression of (A) COL1A1 and (B) COL1A2.

TPM, transcripts per million; HR, hazard ratio.

The construction of drug-gene interaction

The top 10 hub genes were uploaded into the QuartataWeb database for drug-gene interaction analysis. As shown in Fig. 5A, only COL1A1 and COL1A2 were identified and matched to 22 predicted drugs. Among the 22 drugs, the top 5 medications approved by the FDA, namely “Collagenase clostridium histolyticum”, “Halofuginone”, “Vonicog Alfa”, “Von Willebrand Factor Human”, and “Clove oil”, were selected (Table 4). In addition, COL1A1 and COL1A2 were involved in “Platelet activation”, “ECM-receptor interaction”, “PI3K-Akt signaling pathway” and other KEGG pathways (Fig. 5B).

Fig. 5.

The construction of drug-gene interaction and functional analysis of hub genes.

(A) The potential drugs targeting the hub genes. (B) The KEGG pathways associated with the 2 hub genes.

Table 4.

The significant drugs targeted to hub genes.

Discussion

Despite advances in surgical and medical therapy for ESCC, it remains the sixth leading cause of cancer-related deaths (Arnold et al. 2015). High mortality in ESCC can be attributed to insufficient early-stage detection methods, chemotherapy resistance, and a high risk of recurrence and metastasis (Chen et al. 2019). Therefore, the identification of reliable biomarkers for the diagnosis and treatment of ESCC is urgently required. With the rapid development of bioinformatic methods, a large amount of microarray and sequencing data has been generated, which provides a comprehensive and convenient strategy for screening genetic alterations and identifying the molecular mechanisms relevant to the diagnosis and prognosis of cancers (Barrett et al. 2013; Arnold et al. 2015; Kuleshov et al. 2016; Chen et al. 2019; Li et al. 2020).

In this study, GSE26886, GSE111044 and GSE77861 were analyzed with GEO2R to identify DEGs between ESCC tissues and normal tissues. A total of 94 DEGs (13 up-regulated genes and 81 down-regulated genes) were screened. In the Biological Process (BP) annotation, these genes were mainly enriched in terms related to cell proliferation, cell migration and cell differentiation, such as “Positive regulation of cell proliferation”, “Skeletal system development”, and “Lipoxin metabolic process”. In the Cellular Component (CC) annotation, these genes were mainly enriched in “Extracellular exosome”, “Extracellular space”, “Extracellular region”, “Collagen type I trimer”, “Proteinaceous extracellular matrix”, and “Extracellular matrix”, which are closely related to the extracellular microenvironment. In the Molecular Function (MF) annotation, the genes clustered in “Monooxygenase activity”, “Iron ion binding”, “Serine-type peptidase activity”, “Extracellular matrix structural constituent”, “Cytokine activity”, “Platelet-derived growth factor binding”, and “Cyclin-dependent protein serine/threonine kinase regulator activity”, which are related to the communication between cells and the extracellular matrix. These results suggested that the DEGs played an oncogenic role by promoting cancer cell proliferation. Additionally, the DEGs were not enriched in any KEGG pathway according to the cut-off criteria.

The PPI network contained 41 genes (nodes) and 39 edges. The top 10 hub genes (FLG, TMPRSS2, COL1A1, COL1A2, PSCA, SCEL, PPL, ACPP, CNFN and A2ML1) were selected using cytoHubba. Of the 10 hub genes, nine (FLG, COL1A1, COL1A2, PSCA, SCEL, PPL, ACPP, CNFN and A2ML1) were reconfirmed using GEPIA2, and their expression trends remained unchanged. COL1A1 and COL1A2 expression was significantly higher in ESCC tissues than in normal tissues. Moreover, ESCC patients with higher COL1A1 or COL1A2 expression had poorer disease-free survival than those with lower expression, which implied that COL1A1 and COL1A2 have significant relevance to ESCC progression.

Type I collagen, which typically consists of collagen type I α 1 (COL1A1) and collagen type I α 2 (COL1A2), is one of the components of the extracellular matrix (Yamauchi et al. 2018) and is considered to be associated with tumor invasion and progression. COL1A1 and COL1A2 are abnormally expressed in some types of cancer (Zhang et al. 2018). In medulloblastoma and colorectal cancer, the COL1A1 and COL1A2 mRNA expression levels were increased, whereas in melanoma and bladder cancer, the COL1A2 expression level was decreased (Liu et al. 2017). One study demonstrated a correlation between high COL1A1 and COL10A1 expression and poor prognosis in ESCC patients (Zhang et al. 2018). The disease-free survival analysis in this study indicated that ESCC patients with high COL1A1 or COL1A2 expression had a higher risk of disease recurrence or progression than those with lower expression. Meanwhile, COL1A1 and COL1A2 were enriched in “Platelet activation”, “ECM-receptor interaction” and “PI3K-Akt signaling pathway”, which play a crucial role in tumorigenesis. Therefore, COL1A1 and COL1A2 may induce the invasion and metastasis of ESCC.

Despite the vital role of COL1A1 and COL1A2 in tumorigenesis, only a few drugs have been designed to target these two genes. In this study, we screened old drugs to identify new anticancer drugs by using a bioinformatic approach. The “old drugs” were defined as drugs approved by the FDA or those with safe clinical application. COL1A1 and COL1A2 matched 22 predicted drugs. Among these 22 drugs, the top 5 drugs approved by the FDA, namely “Collagenase clostridium histolyticum”, “Halofuginone”, “Vonicog Alfa”, “Von Willebrand Factor Human”, and “Clove oil”, were selected. “Collagenase clostridium histolyticum” is an enzyme produced by the bacterium Clostridium histolyticum (Brown 2017). This enzyme degrades collagen plaques and is thus beneficial for the treatment of Dupuytren’s contracture (Grazina et al. 2019) and Peyronie’s disease (Randhawa and Shukla 2019). The topical formulation is used for the debridement of necrotic tissue due to burns or chronic ulcers. “Halofuginone” can suppress the expression of collagen α1(I) and matrix metalloproteinase 2 (MMP-2). Halofuginone can also inhibit cell proliferation and extracellular matrix deposition (Sundrud et al. 2009). “Vonicog alfa” is used for the control of bleeding episodes in patients with von Willebrand disease. It contains only the “Von Willebrand Factor Human” and thus offers the flexibility to administer coagulation factor VIII separately if needed. Von Willebrand disease is an inherited disorder characterized by the deficiency or dysfunction of the von Willebrand factor (vWF) (Singal and Kouides 2016). Due to this deficiency, the blood cannot clot effectively, and affected patients are prone to prolonged or excessive bleeding. “Clove oil” possesses anticancer, anti-inflammatory and antioxidant properties (Cortés-Rojas et al. 2014).

Most of these drugs are used for controlling bleeding, suppressing cell proliferation and extracellular matrix deposition, and promoting blood coagulation. Meanwhile, vascular formation supplies nutrients to the tumor, which promotes its development, metastasis and recurrence (Wang et al. 2016). These drugs could possibly show anticancer effects by promoting blood coagulation or inhibiting vascular formation in cancers, which offers a novel idea for the treatment of ESCC.

In summary, COL1A1 and COL1A2, which contribute to the tumorigenesis and prognosis of ESCC, might be potential biomarkers for ESCC diagnosis and prognosis. Additionally, potential drugs associated with these hub genes were identified, which offers a novel idea for the treatment of ESCC. However, our interpretations are limited by a considerable risk of bias, and the drugs should be verified in relevant experimental models in the future.

Acknowledgments

We thank all the public databases and websites used in this paper: the GEO database, the TCGA database, the KEGG database, the Enrichr website, the Search Tool for the Retrieval of Interacting Genes (STRING) website, and QuartataWeb.

Conflict of Interest

The authors declare no conflict of interest.

© 2022 Tohoku University Medical Press

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND 4.0). Anyone may download, reuse, copy, reprint, or distribute the article without modifications or adaptations for non-profit purposes if they cite the original authors and source properly.
https://creativecommons.org/licenses/by-nc-nd/4.0/