Bioinformatic analysis of differentially expressed genes as prognostic markers in pheochromocytoma and paraganglioma tumors

The pathogenesis of pheochromocytoma and paraganglioma (PCPG) catechol-amine-producing tumors is exceedingly complicated. Here, we sought to identify important genes affecting the prognosis and survival rate of patients suffering from PCPG. We analyzed 95 samples obtained from two microarray data series, GSE19422 and GSE60459, from the Gene Expression Omnibus (GEO) reposi-tory. First, differentially expressed genes (DEGs) were identiﬁed by comparing 87 PCPG tumor samples and eight normal adrenal tissue samples using R language. The GEO2R tool and Venn diagram software were applied to the Database for Annotation, Visualization and Integrated Discovery (DAVID) to analyze Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO). We further employed Cytoscape with the Molecular Complex Detection (MCODE) tool to make protein–protein interactions visible for the Search Tool for Retrieval of Interacting Genes (STRING). These procedures resulted in 30 candidate DEGs, which were subjected to Kaplan–Meier analysis and validated by Gene Expression Proﬁling Interactive Analysis (GEPIA) to determine their inﬂuence on overall survival rate. Finally, we identiﬁed ALDH3A2 and AKR1B1, two genes in the glycerolipid metabolism pathway, as being particularly enriched in PCPG tumors and correlated with T and B tumor-inﬁltrating immune cells. Our results suggest that these two DEGs are closely associated with the prognosis of malignant PCPG tumors.


Pheochromocytoma
and paraganglioma (PCPG) tumors originate from sympathetic nerve-derived adrenal medulla cells, as well as extra-adrenal thoracic and abdominal accessory ganglia, and show high vascularization and catecholamine secretion (Manger, 2006;Dahia, 2014). Patients with PCPG display symptoms including severe hypertension and sweating, which can lead to serious complications such as heart disease or cerebral hemorrhage. PCPG is a rare neuroendocrine tumor and about 10-20% of PCPG is metastatic, which results in an approximate 50% reduction in survival rate (Ghosal et al., 2020). The survival rate of patients with malignant PCPG remains low due to poor prognosis and lack of adequate treatment options. Currently, diagnosis of PCPG is difficult and malignant tumors are only detected when a metastasis is discovered. Since PCPG primary tumors may cause serious metabolic disorder that contributes to decreased survival rate (Martucci and Pacak, 2014), further investigations to identify novel prognostic molecular biomarkers to aid diagnosis, develop adequate therapeutics and understand the mechanism of PCPG pathogenesis are therefore of critical importance.
Previous studies have shown that some long noncoding RNAs (lncRNAs) are overexpressed in pan-cancers including PCPG (Posa et al., 2016;Wang et al., 2020) and are thus potential biomarkers. However, factors such as low abundance of circulating lncRNAs in body fluids, cost of analysis and low throughput mitigate against using lncRNAs as useful biomarkers for PCPG. To more efficiently explore the pathogenesis, diagnosis and therapy of cancers, newer techniques including gene chips have been widely utilized to detect differentially expressed genes (DEGs) between malignant tumors or sarcomas and normal tissues (Severgnini et al., 2006). These techniques have enriched the discovery of novel cancer biomarkers, which can be stored in databases. Consequently, these data will be applicable in the discovery of reliable clues for deeper research on cancers in the future. Bioinformatic tools have revealed new biomarkers of PCPG (Burnichon et al., 2011;Crona et al., 2013Crona et al., , 2015, proving that the extensive use of bioinformatic techniques can produce valuable information on the pathogenesis of cancers. In this paper, we screened two PCPG data series, GSE19422 and GSE60459, from the Gene Expression Omnibus (GEO), a public gene microarray repository. We utilized the online GEO2R and R language to analyze differentially regulated genes in each series. Next, we employed Venn diagram software to detect common DEGs in both series. We also applied the Database for Annotation, Visualization and Integrated Discovery (DAVID) to check the molecular function (MF), cellular component (CC), biological process (BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs. Finally, we confirmed key genes and their edges from the DEGs using a protein-protein interaction (PPI) network online tool as well as the Cytoscape MCODE (Molecular Complex Detection) tool. To test whether these key genes are applicable to the prognosis of PCPG, we re-analyzed them using the Kaplan-Meier Plotter online database to show the relationship between their expression and patient survival rate. The key genes highlighted by Kaplan-Meier analysis were imported into Gene Expression Profiling Interactive Analysis (GEPIA) to compare their expression between PCPG tumors and normal adrenal tissue These manipulations resulted in eight DEGs, whose potential relationship with PCPG was analyzed using KEGG pathway enrichment. This yielded two DEGs, ALDH3A2 and AKR1B1, that are highly enriched in glycerolipid metabolism and that correlate closely with T and B lymphocytes of tumor-infiltrating immune cells (TIICs). We thus applied bioinformatic methods to identify two sensitive and effective biomarkers in PCPG patients that may provide useful clues regarding the mechanism of pathogenesis, diagnosis and treatment of PCPG tumors.

MATERIALS AND METHODS
Gene expression microarray We downloaded the gene expression profiles in GSE19422 and GSE60459 for PCPG tumors and normal adrenal tissue from the GEO (www.ncbi.nlm.nih.gov/geo). We analyzed 95 samples in total, comprising 84 PCPG tumor samples and five normal adrenal tissue samples from GSE19422, based on the GPL6480 platform, and three PCPG tumor samples and three normal adrenal tissue samples from GSE60459, based on GPL13607. Thus, eight adrenal tissues were used as control and 87 tumor samples were used individually to identify DEG levels.
Differential expression analysis We applied R language and GEO2R online tools to analyze DEGs between PCPG tumor specimens and normal adrenal specimens, with the P-value cut-off set to < 0.05 and |logFC| set to > 2. The DEGs from these two data sets were then analyzed by Venn software to identify common genes DEGs with logFC > 2 were regarded as up-regulated, while those with logFC < -2 were considered as downregulated genes (Feng et al., 2019).
Gene Ontology and pathway enrichment analyses of DEGs Gene Ontology (GO) analysis was performed to identify the functional roles of DEGs and proteins in (The Gene Ontology Consortium, 2000). KEGG, which is widely used in analyzing functional proteins and genes, was applied to define the nodes of molecular networks (Kanehisa et al., 2017).
DAVID is used to integrate DNA, RNA and protein functions (Zou et al., 2019). DAVID was used to describe the enrichment of MF, BP and CC.
PPI network and module analysis PPI networks provide information about how proteins interact with other proteins to carry out their cellular functions. We used the STRING database to make a comprehensive annotation and analysis of the functional relationship of genes (Szklarczyk et al., 2017). Cytoscape was applied to visualize the PPI network. In addition, the MCODE tool was applied to screen every module of the interaction (degree cutoff = 2, max. depth = 100, k-core = 2, and node score cutoff = 0.2) (Feng et al., 2019).
Kaplan-Meier survival analysis The prognostic impacts of the genes were checked with the Kaplan-Meier Plotter (https://kmplot.com/analysis), a public web tool for confirming the influence of a large group of genes on cancer survival whose sources include the GEO, EGA and TCGA databases. We also applied GEPIA, a tool for delivering overall functional results (Tang et al., 2017). Expressed RNA sequence data for 9,736 tumors and 8,587 normal samples, including 182 PCPG tumor patients and three normal people, from the TCGA and GTEx projects are served by GEPIA.

Tumor-infiltrating immune cells analysis
The prognostic results for the selected genes were checked using the Tumor Immune Estimation Resource (TIMER) database tool, which contains abundant information on TIICs and gene expression levels for 23 types of cancers from the TCGA database. We analyzed the corre- Fig. 1. DEGs in two PCPG data series. (A) Differentially expressed genes (DEGs) of data series GSE19422 and GSE60459. P-values (-log10) are plotted against fold change (FC; log2). Up-regulated (log2FC > 2, P < 0.05) and down-regulated (log2FC < -2, P < 0.05) genes are denoted by red and blue points, respectively, and unchanged genes by gray points. (B) Venn diagrams of the DEGs in GSE19422 and GSE60459. Common sets of 30 up-regulated genes (left; logFC > 0) and 121 down-regulated genes (right; logFC < 0) were identified at the intersections of the two data series. lation of eight screened significant survival-related key genes' expression with PCPG TIICs (B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages and dendritic cells). Correlation of tumor purity level with key gene expression was also considered. We also analyzed the correlation of different somatic copy number alterations (SCNAs) of the eight key genes with tumor immune lymphocytes in PCPG tumors using the SCNA module to assess whether the mutation of these genes will influence the PCPG tumor microenvironment. Genomic Identification of Significant Targets in Cancer (GISTIC 2.0) was utilized to define SCNAs that contained deep deletion (-2), arm-level deletion (-1), diploid/normal (0), armlevel gain (1) and high amplification (2). The infiltration levels of tumor immune lymphocytes were compared between PCPG tumors and normal tissue using the twosided Wilcoxon rank-sum test (Kärjä et al., 2005).

Identification of common DEGs in PCPG tumors
In total, we identified 635 DEGs from GSE19422 and 1,402 from GSE60459 using the GEO2R online tool, and their expression was obtained by the R software package with |logFC| > 2 and P < 0.05 as cut-off points. We found 343 up-regulated genes and 292 down-regulated genes in GSE19422, and 474 up-regulated genes and 928 downregulated genes in GSE60459 (Fig. 1A). Common DEGs in these two datasets were identified by Venn diagram software. As demonstrated in Fig. 1B, the Venn diagram revealed 151 common DEGs. In all, there were 30 upregulated genes (logFC > 2) and 121 down-regulated genes (logFC < -2) in PCPG tumor tissues (Table 1).

GO and KEGG pathway enrichment analysis of DEGs in PCPG tumors
We conducted functional annotation of the 151 up-and down-regulated DEGs using the DAVID tool. In the BP category, up-regulated DEGs were enriched remarkably in the regulation of cell communication, while down-regulated DEGs were enriched in the cellular response to zinc ion, steroid metabolic process, C21-steroid hormone biosynthetic process, cholesterol metabolic and oxidation-reduction process, cellular response to cadmium ion, negative regulation of growth, and sterol metabolic process, among others (Table  2). CC category genes revealed a detectable enrichment of up-regulated DEGs in the regulation of synaptic vesicle membrane, cell junction, and integral component of membrane, while down-regulated DEGs were enriched in endoplasmic reticulum membrane, organelle membrane, integral component of plasma membrane, plasma membrane and perinuclear region of cytoplasm.
KEGG pathway analysis indicated that DEGs in PCPG tumors were enriched in, for example, metabolic pathways, chemical carcinogenesis, steroid hormone biosynthesis, metabolism of xenobiotics by cytochrome P450, mineral absorption, drug metabolism -cytochrome P450 and glutathione metabolism ( Fig. 2 and Supplementary  Table S1).
Human PPI network analysis We subjected the 151 DEGs to PPI multiple protein identifier analysis, which demonstrated 93 nodes and 139 edges (Fig. 3A). Fiftyeight DEGs were not incorporated into the PPI network analysis. Furthermore, the Cytoscape MCODE tool was applied for deeper analysis, which yielded 30 key genes among the 93 nodes (Fig. 3B).   Kaplan-Meier Plotter and GEPIA analysis of key genes We used the Kaplan-Meier Plotter website to check the association with patient survival rate of these 30 key DEGs in PCPG tumors, and found that eight genes have a significant influence on survival. In contrast, the other 22 genes had no significant impact on survival (P > 0.05) ( Table 3 and Fig. 4). Next, we applied GEPIA to compare the expression levels of these eight genes between 182 PCPG tumor patients and 3 normal people. Interestingly, we found that four of the eight were highly expressed in PCPG tumors. In comparison, the other four genes displayed low expression in PCPG tumors relative to normal adrenal tissue samples (Table 4 and Fig. 5).
KEGG pathway enrichment analysis To identify potential pathways involving these eight genes, we utilized KEGG pathway enrichment to analyze them with the DAVID tool (Table 5 and Fig. 6). We found that two key genes, ALDH3A2 and AKR1B1, were enriched in the glycerolipid metabolism pathway, but there was no significant KEGG pathway association for the other six genes. Since AKR1B1 and ALDH3A2 may influence the survival rate of PCPG cancer patients via glycerolipid metabolism, they therefore have greater potential than the other six genes to be biomarkers for PCPG tumors.
Correlation of key genes' expression with PCPG tumor-infiltrating immune cells analysis TIICs are also an independent predictor of survival rate in many cancers. Therefore, we investigated whether the key genes revealed by our KEGG analysis correlate with PCPG-infiltrating lymphocytes using the TIMER tool and TCGA database. PCPG tumor purity was significantly positively correlated with KCNH2 expression but negatively correlated with TBXA2R expression. For the infiltrating lymphocytes, B cell infiltration level was predominantly positively correlated with expression of the ALDH3A2 and AKR1B1 genes, which were also enriched in the glycerolipid metabolism pathway. CD4 + T cell and dendritic cell infiltration were significantly positively correlated with TBXA2R expression. For the infiltration  4. Prognostic information for the 30 core genes. The Kaplan-Meier Plotter online tool was used to acquire prognostic information for the 30 core genes; the eight genes displayed were associated with significantly different survival rates (P < 0.05).
by macrophages, AKR1B1 and KCNH2 genes showed significant positive correlation. Intriguingly, all eight of the key DEGs showed weak or no significant correlations with neutrophil and CD8 + T cells in PCPG tumors (Fig. 7). For the correlation between SCNA and abundance of immune infiltrates in PCPG tumors, the infiltration by CD8 + T cells was increased by arm-level deletion of P2RY2 and decreased by arm-level deletion of ALDH3A2, arm-level gain of GNG4 and high amplification of TBXA2R. Further, the infiltration by CD4 + T cells was increased by arm-level deletion of PCSK2 and P2RY2. Infiltration by macrophages was increased by arm-level deletion of P2RY2 and SCG3, and decreased by high amplification of GNG4 (Fig. 8). Genes with high expression in PCPG (P < 0.05) PCSK2 SCG3 TBXA2R GNG4 Genes with low expression in PCPG (P < 0.05) ALDH3A2 AKR1B1 KCNH2 P2RY2 Fig. 5. Eight key genes are significantly differentially expressed in PCPG patients. To further confirm the level of expression of these genes in PCPG patients compared to normal people, the eight genes associated with different survival rates were analyzed using the GEPIA website. All eight genes had a significantly different expression level in PCPG cancer samples compared to normal specimens (P < 0.05). PCPG tumor tissues are shown in red and normal adrenal tissues in gray.

DISCUSSION
Aldehyde dehydrogenase 3 family member A2 (ALDH3A2), also known as fatty aldehyde dehydrogenase, has been shown to play an important role in lipid metabolism, including oxidation of fatty aldehyde, and lipid detoxification (Rizzo and Craft, 1991;González-Fernández et al., 2016). The ALDH3A2 gene consists of 10 exons that span approximately 30.5 kb and are located 50-85 kb from ALDH3 (Rogers et al., 1997). There are 19 different isozymes in the human ALDH family with similar functions (Marcato et al., 2011). They are biomarkers in cancer stem cells that show not only tumor features of aggressiveness and potential prognosis (Ma and Allan, 2011;Marcato et al., 2011), but also vital effects on the self-protection, differentiation and expansion of these cancer stem cells (Ma and Allan, 2011). Mutations in the ALDH3A2 gene in humans result in lipid metabo-lism errors leading to Sjögren-Larsson syndrome (Rizzo and Craft, 1991;Carney et al., 2004;Rizzo and Carney, 2005;Didona et al., 2007). There have been reports implicating ALDH3A2 especially in ovarian and prostate cancers. These reports indicate that ALDH3A2 is highly expressed in ovarian tumors and primary prostate cancers relative to normal tissues (van den Hoogen et al., 2010;Saw et al., 2012). On the other hand, ALDH3A2 was reported to be down-regulated in breast cancer, lung adenocarcinoma, lung squamous cell carcinoma, esophageal squamous cell carcinoma and HNSC, compared to normal tissues (Chang et al., 2018). These observations indicate that ALDH3A2 is an important gene in the progression of cancers.
Human aldose reductase (AKR1B1) plays an important role in glucose metabolism and lipid-aldehydemediated cell regulation (Barski et al., 2008). AKR1B1 also influences cell differentiation and growth by protect-    8. Comparison of tumor infiltration levels among PCPG tumors with different SCNAs for the eight key genes. Box plots represent the distributions of each copy number level of the indicated cells in PCPG tumors. Blue plots represent arm-level deletion (-1), gray plots represent diploid/normal (0), orange plots represent arm-level gain (1) and red plots represent high amplification (2). Armlevel deletion of PCSK2 and SCG3 decreased CD4 + T cell and macrophage infiltration, respectively, in tumors compared to normal tissue. Arm-level deletion of ALDH3A2 decreased both CD8 + T cell and CD4 + T cell infiltration in PCPG tumors compared to normal tissue. The SCNAs of AKR1B1 and KCNH2 played no role in tumor immune cell infiltration.
ing against lipid peroxidation and steroidogenesis (Aida et al., 2000;Lefrançois-Martinez et al., 2004;Baba et al., 2009). AKR1B1 has been proposed to play varying roles in different types of cancers and is mostly related to abundant cell processes in tumors (Saraswat et al., 2006). Tammali et al. (2009) reported that AKR1B1 plays a role in colon cancer progression. Further, AKR1B1 was shown to be highly expressed in basal-like breast cancer, which is one of the breast cancers with the poorest prognosis (Wu et al., 2017), as well as in the development and progression of liver and lung cancers (Nakarai et al., 2015). Conversely, in prostate or endometrial cancers, AKR1B1 expression was significantly down-regulated relative to normal tissues (Laffin and Petrash, 2012;Hevir et al., 2013). Also, outer cortical fiber cells that expressed a high level of AKR1B1 appeared to show a dramatic change called the epithelial-to-mesenchymal transition (Yadav et al., 2011;Zablocki et al., 2011). AKR1B1 has also been shown to inhibit the migration of inflammatory cells (Yadav et al., 2007) and, hence, could increase the survival of tumor cells. AKR1B1 can also aid in the survival of tumor tissue because of its involvement in the regulation of epidermal growth factor, fibroblast growth factor, hypoxia-inducible factor-1α and vascular endothelial growth factor, which leads to angiogenesis (Tammali et al., 2011a(Tammali et al., , 2011b. In addition, the tumor immune microenvironment is a factor that influences prognosis in patients with malignant tumors (Binnewies et al., 2018). Therefore, it has great potential to be a prognostic and effective biomarker for cancers (Taube et al., 2018). We report here that ALDH3A2 and AKR1B1 were associated with T cell and B cell infiltration in PCPG tumors. Although the infiltration by T and B lymphocytes in cancers may improve patient survival (Ruffini et al., 2009), studies in prostate cancer have suggested that the lack of TIICs predicted a high risk of progression and recurrence of tumors (Vesalainen et al., 1994). In contrast, intracapsular carcinoma correlated with weak expression of TIICs, while perineural invasion and capsular invasion correlated with strong expression of TIICs in prostate cancer (Vesalainen et al., 1994;Kärjä et al., 2005). The regulation of DEGs in TIICs is also an independent predictor of the prognosis of cancers in patients.
However, whether ALDH3A2 and AKR1B1 can serve as a prognostic marker for PCPG has not been investigated, and the correlation of both genes with tumor immune microenvironment in PCPG tumors has not been previously reported. Here, using bioinformatics analysis, we revealed ALDH3A2 and AKR1B1 as potential biomarkers, being down-regulated in PCPG tumors relative to normal adrenal tissues. The low expression of ALDH3A2 caused better prognosis, but the low expression of AKR1B1 resulted in poorer prognosis. Results from the Kaplan-Meier Plotter online tool showed that low expression of both genes could play key roles in the progression of PCPG and infiltration of immune cells into the tumor microenvironment. However, this needs further verification through a series of experiments. Nonetheless, data reported in this paper should provide useful information regarding potential biomarkers as well as insights into the biological mechanisms of malignant PCPG. As a limitation, some of the DEGs we screened may be from immune cells, but this should have little effect on the results because, in PCPG patients, not only the tumor cells but also the "complexity" of the PCPG tumor influences patient survival rates. We are now undertaking experiments to confirm our gene expression data in tumor cells and TIICs, to explore the mechanism by which AKR1B1 and ALDH3A2 influence the survival rate of PCPG patients.

CONCLUSIONS
Previous studies have confirmed that AKR1B1 and ALDH3A2 play important roles in the growth and development of diverse cancer types, but little research has been conducted on the relationship between AKR1B1, ALDH3A2 and PCPG tumors. Therefore, this paper reveals potentially useful biomarkers of PCPG tumors as well as new research directions for PCPG.
We thank Zhenyu Dang (Fudan University), Tianchen Zhou (University of Iowa) and Jinrui Wen (Imperial College London) for their support and encouragement. This research was funded by the National Natural Science Foundation of China, grant number 81871577.