Biological and Pharmaceutical Bulletin
Online ISSN : 1347-5215
Print ISSN : 0918-6158
ISSN-L : 0918-6158
Regular Articles
Discovery and Verification of Key Liver Cancer Genes and Alternative Splicing Events Based on Second-Generation Sequencing Data Analysis
Mengqi CuiMiao BaiLihua ZhengYongli BaoLuguo SunChunlei YuYing SunZhenbo SongGuannan WangZhenxiang YuYuxin LiYanxin Huang
著者情報
ジャーナル フリー HTML
電子付録

2021 年 44 巻 10 号 p. 1433-1444

詳細
Abstract

Hepatocellular carcinoma (HCC) is the most common malignant liver disease in the world. Existing screening and early diagnosis methods are not highly sensitive for HCC, and patients are likely to develop the disease to the middle and advanced stages before being diagnosed. Therefore, finding new and efficient diagnosis and treatment methods has become an urgent problem. We aimed at finding and verifying new liver cancer markers by combining informatics analysis with experimental exploration to provide new ideas and methods for the diagnosis and treatment of clinical liver cancer. We used two different bioinformatic pipelines to analyze sequencing data of clinical liver cancer samples and identify differentially expressed genes and key variants after combining them with The Cancer Genome Atlas sequencing data. Then, we explored the functions and mechanisms of the key variants to identify potential liver cancer markers. Through bioinformatic analysis of sequencing data, 139 differentially expressed genes were found, including 53 upregulated genes and 86 downregulated genes. Through enrichment and alternative splicing event analysis of sequencing data, we found nine key variants with exon skipping events. Metallothionein 1E (MT1E)-203 was found to be a key variant that influenced cell proliferation through the p53 cell cycle pathway through cell viability and proliferation assays, and MT1E-203 lost the ability to bind two zinc ions due to exon skipping according to the structure prediction of MT1E-203. MT1E-203 is a potential biomarker for HCC and may play an important role in the diagnosis and treatment of HCC.

INTRODUCTION

Hepatocellular carcinoma (HCC) is the most common type of liver cancer and the third most frequent cause of cancer worldwide.1) The patient survival rate of HCC is extremely poor, which is mainly due to the asymptomatic progression of HCC at the early stage and its high metastasis potential at the late stage.2) Conventional chemotherapies have no significant impact on overall survival, and only a small fraction of HCC patients at the early stage are eligible for curative surgical resection.3) Therefore, the discovery of accurate and efficient biomarkers is of great significance for the early diagnosis and treatment of HCC. In recent years, to solve the problems faced by HCC screening, some biomarkers have been discovered, including Alpha-fetoprotein (AFP), Des-gamma-carboxy prothrombin (DCP), Glypican-3 (GPC-3), Insulin-like growth factor-2 (IGF-2), etc. Conventional biomarkers have the disadvantages of low sensitivity and low specificity, which restricts the detection rate of HCC. The discovery of new biomarkers is expected to change the status of early diagnosis and treatment of liver cancer.

In addition, carcinogenesis of hepatocytes is accompanied by mutations in genes and changes in posttranscriptional processing. Alternative splicing (AS), as a posttranscriptional modification mechanism, is widely found in eukaryotes. More than 95% of human multiexonic protein-encoding genes have AS events. Five basic modes of AS are exon skipping, mutually exclusive exons, alternative 5′ donor sites, alternative 3′ acceptor sites, and intron retention. AS is regarded as a major mechanism for increasing the functional complexity and diversity of proteins in the human genome.4) AS of mRNA is closely related to the occurrence and development of many diseases. However, studies of the relationship between AS and HCC are rare.

In addition, the main method used to study AS in the past was RT-PCR technology. However, this technology has low throughput and includes a large amount of noise, which is limits that allow RT-PCR to only be able to characterize known splicing events.5) With the continuous innovation of technology and instruments, second-generation sequencing technology has become a powerful tool for cancer and AS research due to its high throughput and high accuracy. RNA-sequence (seq), which is based on Illumina’s high-output and short-read sequencing platform, enables the detection of the overall transcriptional activity of any species at the single nucleotide level and can accurately recognize AS sites and provide comprehensive transcriptome information.

High-throughput RNA-seq produces a large quantity of reads data that requires proper analysis to enable meaningful integration. A number of analytical tools have been developed to assemble and quantify differential data.6) Currently, a common protocol used for RNA-seq data analysis includes TopHat7) -Cufflinks8) -Cuffdiff, which are free open source software programs developed for the assembly and integration of RNA-seq data, including the identification of new genes and splicing variants.8) HISAT9) -StringTie10) -Ballgown11) is an emerging pipeline that is faster and more memory efficient than TopHat-Cufflinks-Cuffdiff. Furthermore, a number of computational methods for identifying AS events have been developed, such as DEXSeq, rMATs, and MISO.12) rMATs is a new type of AS event analysis method that can handle different types of data with high accuracy.13)

In this study, the calculating method we used combined two different bioinformatic analysis pipelines having the advantages of the each methods. After analyzing the RNA-seq data from the liver cancer tissues and paracancerous tissues, we found differentially expressed genes and transcripts, and combined the analysis results of alternative splicing to find the key variant Metallothionein 1E (MT1E)-203. Afterwards, we verified the results and explore its mechanism of action. These findings will contribute to the discovery of new HCC detection markers and therapeutic targets.

MATERIALS AND METHODS

Data Collection and Identification of Differential Expressed Transcripts

The RNA-seq data of HCC samples used in this study was obtained from the previous research in our laboratory. The RNA-seq data was from 6 HCC patients in China–Japan Union Hospital of Jilin University, including HCC tissue and adjacent liver tissue, which is in line with the Helsinki Declaration. The other HCC specimen’s data were obtained from TCGA database (https://portal.gdc.cancer.gov/).

Survival Analysis of Differential Expressed Genes

The Kaplan–Meier Plotter online database (http://kmplot.com/) was used to perform the survival analysis of selected differentially expressed genes. Calculate the hazard ratio (HR) with a 90% confidence interval and log rank p-value and display it in the upper right corner of the graph.

Cell Culture

The human hepatocellular carcinoma cell line HepG2, Huh7 and the human liver cell line L02 stocked in this laboratory were used in this study, which were cultured in 1640 medium containing 10% fetal bovine serum at 37 °C under 5% CO2 containing humidified atmosphere.

RT-PCR and Real Time RT-PCR

The mRNA of HepG2 and L02 cells were isolated and reversely transcribed using Trizol method and served as templates for real time PCR using the SYBR Kit. Real time PCR used the Applied Biosystems 7300 system. The experiment was performed in 3 replicates independently. The primer pairs were used to sequentially amplify the chosen transcripts (Supplementary Table S2). Primers were synthesized by GENEWIZ company (https://www.genewiz.com.cn/).

Vector Construction and Over-Expression of Transcripts

In order to verify the calculation results of the sequencing data, we selected a high expression transcript MT1E-203 to construct over-expression and knockdown vector, and constructed MT1E-203 over-expressing HepG2 cell and MT1E-203 knock-down L02 cell by lipofection. We cloned the cDNA of MT1E-203 and small interfering RNA (siRNA) (Supplementary Table S2) into the pEGFP-N1 plasmid respectively, transfected HepG2 cells and L02 cells with Lipofectamine 2000 transfection reagent, used HepG2 cells and L02 cells transfected into empty vector as a negative control, and screened for stable clones with G418 resistance using G418 eukaryotic screening markers.

3-(4, 5-Dimethylthiazol-2-yl)-2,5-diphenyltetrazolium Bromide (MTT) Assay

Three groups of HepG2 cells which were not transfected, transferred into empty vectors, and transferred into the sequence of interest were inoculated into 96-well plates at a density of 6 × 103/well, and three replicate wells were set for each cell. After culturing for 48 h, we added 5 mg/mL MTT (20 µL/well) to the wells for 4 h, then added 200 µL of dimethyl sulfoxide (DMSO) to the treated cells and detected optical density (OD) at 490 nm using a microplate reader.

Cell Cycle and Apoptosis Analysis

We use Cell Cycle and Apoptosis Analysis Kit (Beyotime Biotechnology Inc., Shanghai, China). According to the instructions, we first collected untreated, transferred empty vector and transferred into the target gene three groups of cells, washed twice with cold phosphate buffered saline (PBS), fixed with cold 70% ethanol for 12 h. After washing twice with cold PBS, we used ribonuclease (RNase)A, Propidium iodide staining solution and staining buffer to be a staining solution. Then 500 µL of staining solution was added to each tube of cells and incubated at 37 °C for 30 min in the dark. Finally, the sample was determined by flow cytometry.

Bromodeoxyuridine-Enzyme-Linked Immunosorbent Assay (ELISA)

By measuring 5-bromo-2-deoxyUridine (BrdU) incorporation, the DNA synthesis in proliferating cells was determined. After transfected with pEGFP-N1 or pEGFP-N1-MT1E-203, HepG2 cells were seeded in 96-well culture plates at a density of 6 × 103cells/well, cultured for 4 h, then incubated with a final concentration of 10 μM BrdU labeling solution for 24 h according to Cell Proliferation ELISA, BrdU(colorimetric) Kit (Roche Inc., Switzerland). When the incubation period ended, the medium was removed, the cells were fixed for 30 min at FixDenat, washed three times with 1 × washing buffer, incubated with Anti-BrdU-POD working solution for 90 min, then washed three times with 1 × washing buffer again, incubated with substrate solution for 30 min in the dark, then add 25 µL/well 1M/L H2SO4, 1 min on the shaker at 300 rpm. Finally, measure the absorbance of the samples in the ELISA reader at 450 nm.

Colony Formation Assay

The transfected cells (MT1E-203 and Negative control (NC)) were maintained for 5 d at 37 °C. On day 6, the cells were washed using PBS, added 4% paraformaldehyde and fixed 30 min at room temperature. Finally, the colonies were stained with crystal violet and photographed.

Western Blot and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis

Whole cell lysates from cells or homogenized tissues were prepared. The primary antibody was incubated at 4 °C for 12 h, and the secondary antibody was incubated at room temperature for 2 h. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as controls. Protein action pathways refer to the data in KEGG database. KEGG (Kyoto Encyclopedia of Genes and Genomes) (https://www.kegg.jp/).

Antibodies

Antibodies used in the study are listed as follows: anti-Cyclin A (sc-751, 1 : 1000), anti-Cyclin D (sc-450, 1 : 1000), anti-Cyclin E (sc-198, 1 : 1000) were purchased from Santa Cruz Biotechnology (DE, U.S.A.); anti-CDK4 (66950-1-immunoglobulin (Ig), 1 : 1000), anti-CDK2 (60312-1-Ig, 1 : 1000), anti-P53 (60283-2-Ig, 1 : 1000), anti-P-P53 (28961-1-AP, 1 : 1000), anti-P21 (60214-1-Ig, 1 : 1000), and anti-GAPDH (60004-1-Ig, 1 : 1000) were purchased from Protein-tech Group (Wuhan, China).

Protein Structure Prediction

I-TASSER and SWISS-MODEL were used to predict the structure of the protein translated from MT1E-203.14,15) PYMOL software was used to synthesize protein structures and look for metal ion binding sites.

Data Availability

The datasets generated during and analysed during the current study are available in the Gene Expression Omnibus (GEO) repository. https://www.ncbi.nlm.nih.gov/gds GEO ID of sequencing raw data is GSE136846. The Cancer Genome Atlas (TCGA) liver cancer sequencing data was downloaded in https://portal.gdc.cancer.gov/ using SangerBox (http://sangerbox.com/).

Ethics Approval and Consent to Participate

The study was done after agreement from the local ethics committee and with the patients’ informed consent and the process was line with declaration of Helsinki. Our institution’s committee on human research gave approval for this study, and all participants gave informed consent. The full name of committee is Ethics Committee of China–Japan Union Hospital of Jilin University.

Statistical Analysis

The data were processed using one-way ANOVA with SPSS software (SPSS Inc., Chicago, IL, U.S.A.), and all data shown are presented as the mean ± standard error (S.E.) of at least three independent experiments. *** p < 0.001, ** p < 0.01, and * p < 0.05 were considered statistically significant.

RESULTS

Identification of Differentially Expressed Genes in HCC

FastQC was used to perform quality checks of fasta files (Supplementary Fig. S1). Based on the fastQC results, we removed low-quality reads and retained high-quality reads for further research. Then, we used the TopHat-Cufflinks-Cuffdiff (TCC) and Hisat-Stringtie-Ballgown (HSB) pipelines to analyze the sequencing data of 6 pairs of liver cancer tissues and paracancerous tissues from patients, used Perl (version 5.10.1) and python (version 2.7.11) to correct and normalize the data, and finally, used R (version 3.4.2) to screen and integrate the results, with the cutoff criteria of an adjusted p < 0.05 and log > 1. High-quality reads were mapped to the Homo sapiens genomes hg19 and GRCh37. After assembly and annotation, we found 547 differentially expressed transcripts through the TCC pipeline and 1396 differential transcripts through the HSB pipeline. A total of 157 differentially expressed transcripts were found through both pipelines (Supplementary Table S1). Then, we compared the results with RNA-seq data downloaded from the TCGA database and identified 139 differentially expressed genes, which contained 53 upregulated genes and 86 downregulated genes (Table 1 and Fig. 1A). Figure 1B shows a flow chart of the bioinformatics analysis in this study, and Fig. 1C shows the expression of 139 genes in the three different data sets, which were used for further screening.

Table 1. A Total of 139 Differential Expressed Genes (DEGs) Were Obtained by Bioinformatic Analysis of RNA-Seq Data
Gene NameGene IDFold change (TCC)Fold change (HSB)Fold change (TCGA)Control
SLC6A19ENSG00000174358−6.67122−10.51334−13.06191
GMNNENSG000001123122.2391810.085743.0271
HMMRENSG000000725713.754343.202263.39391
MT1EENSG00000169715−4.6642−4.88026−3.283821
KRT19ENSG00000171345−7.83913−0.08243−4.883461
PCP4ENSG000001830365.819026.2547311.701281
PRKDCENSG000002537291.660173.238433.035461
ECT2ENSG000001143463.709113.91733.839291
GPC3ENSG000001472576.6955848.201315.732071
NEK2ENSG000001176504.593894.921424.420221
IGF2BP3ENSG000001362314.175744.4132610.922551
TOP2AENSG000001317473.789327.873845.883991
LINC01296ENSG000002252103.740672.186934.507331
CDKN3ENSG000001005263.566197.413843.765031
KIFC1ENSG000002376493.551416.404784.628181
CDC6ENSG000000948043.508484.250986.850571
RRM2ENSG000001718483.464615.984635.281571
NPTX2ENSG000001062363.456866.678477.418231
E2F1ENSG000001014123.271542.540693.717321
SPC24ENSG000001618883.245493.330693.307921
NDC80ENSG000000809863.17314.256594.866351
CCNB1ENSG000001340573.148673.772133.758351
CAP2ENSG000001121863.093444.429273.037711
ILDR2ENSG000001431952.957273.091745.079591
CDCA5ENSG000001466702.740622.879474.884851
SPC25ENSG000001522532.722753.74793.917461
WNK4ENSG000001265622.686082.814437.445441
DSCC1ENSG000001369822.541952.352815.179761
TPX2ENSG000000883252.357673.507184.1461
MCM4ENSG000001047382.2639613.568393.963241
STMN1ENSG000001176322.242124.705953.275881
RAB11FIP4ENSG000001312422.224934.630954.709891
ATAD2ENSG000001568022.1995.325763.904131
LINC00890ENSG000002608022.190884.826533.95211
LRRC1ENSG000001372692.014422.2833.98151
TMCO3ENSG000001504031.925784.784022.391771
RNASEH2AENSG000001048891.816533.003512.333241
KPNA2ENSG000001824811.813364.902234.045551
NCAPG2ENSG000001469181.738583.135672.109371
MCM6ENSG000000760031.693962.122533.779621
MPP7ENSG000001500541.682422.00191.746411
RAP2AENSG000001252491.657742.060462.339291
FEN1ENSG000001684961.603732.976682.230661
DNAJB11ENSG000000905201.492833.118311.636191
CHAF1AENSG000001676701.481652.661282.624691
GMFBENSG000001970451.443242.035421.868561
XPO5ENSG000001245711.31822.398741.774661
MCM5ENSG000001002971.307872.694992.598171
CCT3ENSG000001634681.306361.864522.122151
C8orf33ENSG000001823071.26523.016542.375931
TOPBP1ENSG000001637811.246641.650372.228281
AP3B1ENSG000001328421.243141.362092.365651
ZNF318ENSG000001714671.229692.178951.37131
NRASENSG000002132811.19441.903772.193851
IMPAD1ENSG000001043311.18272.655031.260941
CASC3ENSG000001083491.161722.048182.137261
PLIN2ENSG00000147872−1.57993−9.02079−0.962211
SLC9B2ENSG00000164038−1.58984−3.84617−1.859141
APOC3ENSG00000110245−1.6752−2.03541−1.884961
C1RLENSG00000139178−1.72071−3.09084−1.135581
APOA5ENSG00000110243−1.87453−6.04383−0.687271
DUSP1ENSG00000120129−1.95823−1.38867−1.260481
ELFN1ENSG00000225968−1.96331−1.96993−3.07361
CPED1ENSG00000106034−2.1205−4.57252−3.035671
HAO2ENSG00000116882−2.15061−3.1158−6.907451
CD4ENSG00000010610−2.2155−3.21416−1.987481
SIK1ENSG00000142178−2.27454−1.80335−1.921511
JUNENSG00000177606−2.28904−2.72961−2.901381
GLYATL1ENSG00000166840−2.30506−3.70031−6.673881
RNF125ENSG00000101695−2.35564−4.72963−2.575571
ZFP36ENSG00000128016−2.38677−2.51381−1.177231
SLC25A25ENSG00000148339−2.39061−2.11789−1.767581
GHRENSG00000112964−2.39418−3.19779−2.121691
FNDC5ENSG00000160097−2.40277−1.06861−0.415241
TUBE1ENSG00000074935−2.44288−1.83835−2.344431
BASP1ENSG00000176788−2.4948−3.00603−1.480371
AXLENSG00000167601−2.54167−3.08151−2.856171
SIGLEC9ENSG00000129450−2.57102−4.86738−0.757311
SOD3ENSG00000109610−2.58894−3.18319−1.666141
ITGA9ENSG00000144668−2.58901−3.32054−0.06281
LTBP4ENSG00000090006−2.60075−1.22862−0.419921
SCD5ENSG00000145284−2.62197−2.56407−2.111321
SLCO1B3ENSG00000111700−2.6842−2.69802−5.587641
FMODENSG00000122176−2.77173−2.79249−1.152351
CCDC8ENSG00000169515−2.77829−3.85332−0.450751
ZG16ENSG00000174992−2.84472−3.94831−6.664021
CYR61ENSG00000142871−2.88197−1.55566−1.587641
ATOH8ENSG00000168874−2.93244−2.67086−0.462231
NAPSBENSG00000131401−2.93861−2.66964−0.638141
EPHA3ENSG00000044524−2.97377−3.25992−0.522221
LIFRENSG00000113594−2.98691−3.08581−2.164991
SPINT2ENSG00000167642−3.00095−1.64562−0.555951
CBFA2T3ENSG00000129993−3.00888−3.77525−1.866231
LRATENSG00000121207−3.05516−2.1515−0.277351
HK3ENSG00000160883−3.09838−4.88762−1.227051
CCDC3ENSG00000151468−3.13399−1.3851−0.365631
CLEC11AENSG00000105472−3.16525−2.90822−0.351321
LUMENSG00000139329−3.27485−7.87725−2.10991
BGNENSG00000182492−3.29078−1.26777−0.23821
PDGFRAENSG00000134853−3.33398−2.44174−1.622911
FLRT2ENSG00000185070−3.33691−3.07343−0.087571
DNASE1L3ENSG00000163687−3.34188−2.87391−0.614331
MUC6ENSG00000184956−3.36666−3.1778−4.554441
F3ENSG00000117525−3.55746−1.52165−0.241521
EMILIN1ENSG00000138080−3.57648−2.06645−0.099911
NPY1RENSG00000164128−3.5873−1.80612−3.003351
CFDENSG00000197766−3.64701−2.44716−0.412181
LINC01093ENSG00000249173−3.66018−2.4372−10.67321
MT1LENSG00000260549−3.71369−2.74345−0.50471
CETPENSG00000087237−3.71383−2.38768−2.835551
ID1ENSG00000125968−3.74935−2.1337−0.197211
BBOX1ENSG00000129151−3.93893−1.66785−6.667241
FOSBENSG00000125740−3.99534−1.09388−7.072721
OLFML3ENSG00000116774−4.00913−1.05727−1.120121
COLEC11ENSG00000118004−4.0223−1.19888−2.110791
LY6EENSG00000160932−4.10135−5.33369−3.084551
CYP1A2ENSG00000140505−4.20516−9.26267−15.893381
NAT2ENSG00000156006−4.23592−7.93172−1.641861
PTGDSENSG00000107317−4.24983−4.1844−0.110551
JCHAINENSG00000132465−4.27411−3.62121−0.814341
HGFACENSG00000109758−4.3641−1.2476−4.717111
FOSENSG00000170345−4.5662−3.03939−4.973041
CXCL12ENSG00000107562−4.67191−7.73448−3.316561
MT1FENSG00000198417−4.81179−1.43756−1.491581
EGR1ENSG00000120738−4.90293−5.01988−4.211031
ECM1ENSG00000143369−4.97399−1.05713−2.028281
CCL21ENSG00000137077−4.98326−5.45316−2.612531
KRT7ENSG00000135480−4.98511−8.73022−3.03661
DCNENSG00000011465−5.04744−6.68938−1.200061
C7ENSG00000112936−5.04798−4.11697−1.909331
NGFRENSG00000064300−5.08919−12.36062−1.633631
MT1GENSG00000125144−5.57737−2.62115−5.133941
STAB2ENSG00000136011−6.01684−2.33873−4.568771
DPTENSG00000143196−6.18095−2.84831−3.64961
SMIM24ENSG00000095932−6.25947−6.88606−5.055381
COLEC10ENSG00000184374−6.27785−6.38766−3.995171
CRHBPENSG00000145708−6.28137−3.74193−6.636381
IGF2ENSG00000167244−6.74572−2.80769−7.811021
MARCOENSG00000019169−8.38192−2.64823−6.317541
Fig. 1. Bioinformatics Analysis of Sequencing Data

(A) Venn diagrams (R Version 3.6.1) of 139 differentially expressed genes identified in RNA-seq data. TCC: TopHat-Cufflinks-Cuffdiff; HSB: HISAT-Stringtie-Ballgown. (B) Flow chart of the bioinformatics analysis of sequencing data. (D) Expression of 139 differentially expressed genes in RNA-seq data. The X axis shows the name of every five genes. (D) GO analysis of the DEGs in HCC. The DEGs were classified into three functional groups: extracellular structure organization, mitotic nuclear division and sister chromatid segregation. DEGs: differential expressed genes. HCC: Hepatocellular carcinoma. (E) Signaling pathway enrichment analysis of the DEGs. The DEGs were mainly enriched in three pathways: diseases associated with glycosaminoglycan metabolism, initial triggering of complement and the mitotic G1-G2/S phases. (F) DEG protein–protein interaction network complex analysis. Using the online database STRING, a total of 106 DEGs were filtered into protein–protein interaction network complexes. (Color figure can be accessed in the online version.)

Function and Pathway Enrichment of Differential Expressed Genes (DEGs)

Gene ontology (GO) enrichment analysis based on the R package “ggplot” was used to explore the function of the DEGs (Fig. 1D). The DEGs were mainly enriched in 3 functional groups: extracellular structure organization, mitotic nuclear division and sister chromatid segregation. The R package “ggplot” was also used to perform pathway enrichment analysis of the DEGs (Fig. 1E). The main enriched pathways of the DEGs included diseases associated with glycosaminoglycan metabolism, initial triggering of complement and the mitotic G1-G2/S phases. To explore the interaction level of crucial genes associated with HCC, the online STRING and Cytoscape software were used to derive protein protein interaction (PPI) information and plot the results (Fig. 1F and Supplementary Fig. S3). In the STRING results, a total of 133 nodes and 409 edges were obtained. We found that genes such as GMNN, ECT2, PRKDC, MT1E, HMMR, and KRT19 were located in the center of the network, and determination of their functions deserves further study.

Identification and Analysis of AS Events of Key Genes

Based on the gene expression and PPI results, we found 3 DEGs, namely, GMNN, PRKDC, and ECT2. The expression of these genes in the HepG2 cell line (human hepatocellular carcinoma cell line) and L02 cell line (human normal liver cell line) was measured by RT-PCR and quantitative (q)RT-PCR. We found that GMNN, PRKDC, and ECT2 were upregulated in the HepG2 cell line, which was consistent with the bioinformatics analysis results of the RNA-seq data (Fig. 2A). Furthermore, GPC3 and AFP are proven markers of liver cancer, and both were upregulated in our RNA-seq results. From this point of view, the analysis results obtained from the calculation methods in this study are credible and can be used in follow-up experiments.

Fig. 2. Verification of the Bioinformatics Analysis Results

(A) The expression levels of GMNN, PRKDC and ECT2 in L02 and HepG2 cells were examined by RT-PCR and real-time RT-PCR. Significance was considered at p < 0.001 by the t-test. Full-length gels are shown in Supplementary Fig. S5. (B) Pie chart showing the results of rMATs analysis, with SE events accounting for the highest proportion of total events. (C) Five types of alternative splicing events were found by rMATs software. (D) The expression levels of SLC6A19-201, HMMR-204, GHR-202, MT1E-203, PCP4-201, and KRT19-201 in L02 and HepG2 cells were examined by RT-PCR and real-time qPCR. Significance was considered p < 0.001 by the t-test. (E) Prognostic values of the nine key genes in hepatocellular carcinoma patients. HR, hazard ratio. (Color figure can be accessed in the online version.)

To identify alternative splicing events at the transcript level, we performed alternative splicing analysis on 12 sequencing files. A total of 64476 alternative splicing events were found using rMATs software. Five types of AS events were found in the RNA-seq samples. The most frequent AS type found in the annotated isoforms was skipped exons (SEs), which accounted for more than two-thirds of the variants, followed by mutually exclusive exons (MXE) and alternative 3′ and 5′ splice sites (A3S and A5S), which accounted for 24%. Retained introns (RIs) only contributed 6% of the AS events (Fig. 2B). Figure 2C shows 5 different AS events. Referring to the results of the rMAT analysis, we selected 6 key variants from the DEGs with skipping exon events for further study: SLC6A19-201, HMMR-204, GHR-202, MT1E-203, PCP4-201, KRT19-201 (Supplementary Fig. S2 and Supplementary Table S3). Among them, four variants, SLC6A19-201, GHR-202, MT1E-203 and KRT19-201, were downregulated in tumor tissues, and HMMR-204 and PCP4-201 were upregulated in tumor tissues. To verify this result, we examined the expression of these 6 variants in the HepG2 and L02 cell lines, and the experimental results were consistent with the calculated results (Fig. 2D). We found that the expression levels of MT1E-203 in the HepG2 and L02 cell lines were significantly different, and the number of MT1E gene variants was relatively small, so we chose this variant for subsequent experiments and exploration.

So far, 9 genes have been successfully verified. To explore the correlation among these genes and liver cancer, we analyzed the correlation between the expression of these 9 genes and the survival time of patients. Prognostic data of the key genes were obtained from the Kaplan–Meier Plotter online database (http://kmplot.com/analysis/). Abnormal expression of the key genes resulted in a poor prognosis of HCC (Fig. 2E). The three most influential genes were HMMR (HR = 2.36 [1.65–3.38], p = 1.2e-06), ECT2 (HR = 2.13 [1.48–3.05], p = 2.6e-05) and GMNN (HR = 1.32 [0.93–1.89], p = 0.12). According to the results, these 9 differentially expressed genes may be important in HCC. The specific mechanisms of these genes deserve further study

From the results of RT-PCR and qRT-PCR, we found that the analytical pipelines used in this study had high credibility and accuracy. In searching for possible therapeutic targets for liver cancer, we selected MT1E-203, which had the fewest isoforms of the six variants and played an important role in PPIs and pathways, for further study.

Function Exploration of MT1E-203

To investigate the role of MT1E-203 in HCC, we explored the function of MT1E-203. First, we constructed the overexpression vector pEGFP-N1-MT1E-203 for MT1E-203 and the empty vector pEGFP-N1 and transfected the overexpression vector and the empty vector into the HCC cell line HepG2 and Huh7, respectively. Supplementary Fig. S4 shows the vector pEGFP-N1-MT1E-203 and restriction sites. Then, two pairs stably transfected cell lines were constructed through G418 pressure screening, namely HepG2-MT1E-203 (recombinant vector) group and HepG2-NC (empty vector) group, Huh7-MT1E-203 group and Huh7-NC group. In addition, we constructed MT1E-203 knockdown cell lines L02-MT1E-203 and L02-NC using RNA interference (RNAi).

Next, we explored the relationship between the expression of MT1E-203 and proliferation of HCC cells and normal liver cells. According to the MTT and BrdU-ELISA results, the viability and proliferation ability of the overexpression group (HepG2-MT1E-203 and Huh7-MT1E-203) cells decreased significantly with time compared with those of the NC group (HepG2-NC and Huh7-NC) (Figs. 3A, B). The viability and proliferation ability of the knockdown group (L02-MT1E-203) cells increased with time compared with the NC group (L02-NC) (Fig. 3C). According to the results of the colony formation assay, the number of clones in the overexpression group was significantly lower than that in the NC group (Figs. 3D, E). On the contrary, knockdown of MT1E-203 in L02 cells significantly increased the number and size of clones, indicating that knockdown of MT1E-203 could promote cell proliferation (Fig. 3F). Overexpression of MT1E-203 increased cell apoptosis and caused cell cycle arrest in the S phase. Western blot analysis was performed to evaluate the change in expression of cell proliferation-related proteins caused by overexpression of MT1E-203 (Figs. 3G, H). The results showed that, compared with the NC group, the expression levels of p53 and p21 proteins were upregulated in the overexpression group, and the expression levels of cyclin A, cyclin E, cyclin D, CDK2, and CDK4 downregulated. In addition, we found that the overexpression of MT1E-203 significantly enhanced the phosphorylation of P53 protein, which improved the stability and activity of P53. It is inferred that MT1E-203 affects cell proliferation through p53 cell cycle-related pathways, which arrests the cell cycle in the G1 and S phases (Fig. 3K). In addition, flow cytometry was used to detect apoptosis markers and cell cycle markers in the NC group and the overexpression group (Fig. 3I). According to the results, the number of early and late apoptotic cells increased, the number of G2 phase cells decreased and the number of S phase cells increased in the overexpression group compared with the NC group (Fig. 3J).

Fig. 3. Exploring the Function of MT1E-203

(A) The effects of MT1E-203 on HepG2 cell apoptosis were assessed by the MTT assay, and the relative numbers of cells at 0 d are presented. The effects of MT1E-203 on HepG2 cell proliferation were assessed by the BrdU-ELISA assay. (B) MTT and BrdU-ELISA results in Huh7 cell line. (C) MTT and BrdU-ELISA results in L02 cell line. (D) Colony formation assays of MT1E-203 stably overexpressing HepG2 cells. (E) Colony formation assays of MT1E-203 stably overexpressing Huh7 cells. (F) Colony formation assays of MT1E-203 stably knockdown L02 cells. (G) Protein levels of Cyclin E, CDK2, Cyclin D, CDK4, P53, and P21 in the HepG2 cell stable overexpression group, NC group and HepG2 cells. Full-length images are provided in Supplementary Fig. S6. (H) Protein levels of Cyclin E, CDK2, Cyclin D, CDK4, P53, and P21 in the HepG2 cell stable overexpression group, NC group and Huh7 cells. (I) Apoptosis was measured by flow cytometry in HepG2 cells stably overexpressing MT1E-203. The histogram of cell apoptosis shows the proportion of cells in the four periods from Q1 to Q4. (J) The cell cycle was measured by flow cytometric analysis in HepG2 cells stably overexpressing MT1E-203, and the histogram of the cell cycle shows the proportion of cells in the G1, G2, and S at three stages. (K) MT1E-203 may affect the P53 cell signaling pathway. (Color figure can be accessed in the online version.)

Changes in the MT1E-203 Protein Structure Due to AS Events

To explore whether alternative splicing events can cause changes in protein structure, we downloaded the translated protein sequences of MT1E-201 and MT1E-203 that did not undergo alternative splicing in the Protein Data Bank (PDB) database, and their names were P04732 and H3BU80, respectively, and then, we used the protein structure prediction software I-TASSER and the SWISS-MODEL model (Fig. 4A). It was found that due to the occurrence of exon skipping events, the H3BU80 protein expressed by MT1E-203 lacked the key sequence from cysteine 13 to cysteine 34, thereby losing the ability to bind one sulfide ion and two zinc ions. In addition, we used Ligplus software to analyze the binding sites of zinc ions and sulfide ions and found that the loss of binding ability was caused by the deletion of the sequence near the binding site (Fig. 4B).

Fig. 4. Predicting the Protein Structures of P04732 and H3BU80

(A) Protein structures of P04732 and H3BU80. The yellow balls are metal sulfide ions, and the gray balls are metal zinc ions. (B) Binding site analysis of metal sulfide and zinc ions in the two proteins. There are 5 sulfide ion binding sites and 2 zinc ion binding sites in P04732, while there are only 4 sulfide ion binding sites in H3BU80. Two structure files are provided in the Supplementary information. (Color figure can be accessed in the online version.)

DISCUSSION

Although there is much basic and clinical research on the molecular origin and related pathways of liver cancer, research progress on gene expression caused by AS has been slow, mainly because the traditional method used to discover AS events is affected by gene expression, gene length and difficult primer design. These constraints lead to lower efficiency and accuracy of experiments. AS discovery methods based on RNA-seq data analysis take advantage of the high throughput and high accuracy of RNA-seq, greatly improving the efficiency of identifying AS events.

In this study, the combination of clinical liver cancer and paracancerous sample sequencing data and TCGA data resulted in the identification of 139 differentially expressed genes. Among these genes, GPC3 and AFP are frequently reported genes that are related to HCC. Studies have found that GPC3 is normally expressed in the placenta, mammary gland, mesothelium, lung, and kidney.16) In addition, a study has found that in normal liver cells, GPC3 is hardly expressed and is highly expressed in liver cancer cells, so GPC3 has become a liver cancer biomarker for the screening and diagnosis of liver cancer.17) AFP is the most commonly used biomarker for HCC diagnosis in clinical practice, and it is also used for recurrence and prognosis of HCC.18) Based on the results of the GO analysis, we identified 9 core genes. RT-PCR and qRT-PCR confirmed that GMNN, SLC6A19, HMMR, MT1E, GHR, KRT19, PCP4, PRKDC, and ECT2 were differentially expressed in liver cancer and normal liver cells. After combining all the variable splicing events obtained by rMAT analysis, we found 6 candidate transcripts, SLC6A19-201, HMMR-204, GHR-202, MT1E-203, PCP4-201, and KRT19-201, for subsequent experiments and selected the transcript MT1E-203 to construct an overexpression vector, which had the highest level of expression of the candidate transcripts. Overexpression of MT1E-203 was found to result in cell cycle arrest and cell apoptosis according to the MTT and flow cytometry results. Subsequently, WB experiments showed that some proteins related to apoptosis and proliferation changed in expression due to overexpression of MT1E-203. Metallothionein 1E (MT1E) is the name of a family of low-molecular weight intracellular metalloproteins with a high affinity for heavy metal ions, such as zinc, cadmium, copper, mercury, and platinum.19) We used protein structure prediction software to simulate the protein structure of the MT1E gene variant referring to the structure of the MT1E gene in the PDB (Protein Data Bank) database. MT1E lost its binding site due to SE events. The ability of to bind zinc ions may have an effect on cell metabolism and proliferation. Shen et al.20) analyzed gene expression in 606 tumors and 550 nontumor samples from GEO. From these samples, 173 DEGs were identified, and through GO analysis and PPI analysis, the final 10 core genes were identified. However, the SHEN study only carried out bioinformatics analysis and did not involve validation by biological experiments. Therefore, our research is undoubtedly more convincing. In addition, Luo et al.21) used RNA-seq to analyze 20 clinical samples and explored SRSF2-regulated AS events without adequate bioinformatics analysis. For expression analysis of a single gene, the traditional RNA-seq analysis method is still credible, but for comparison and screening of multiple genes and multiple transcripts, a single analysis method has drawbacks. For example, the research of Yang et al.22) directly uses the SAM file downloaded from the GEO database for analysis. The mapping process of the sequencing data is not considered, and some important information may be lost. Various calculation and analysis methods meet the needs of different studies. In this study, analysis methods of two kinds of sequencing data were combined to avoid errors caused by a single method. Among the analysis methods used, the HSB pipeline takes less time and saves disk space, but the TCC pipeline produces more result files and obtains more comprehensive sequence information. TopHat maps reads on a genome and then assembles the reads into full-length transcripts. Cufflinks quantifies the expression level of each transcript. Then, Cuffdiff integrates differentially expressed transcripts and their differential expression ratios. In the HSB pipeline, HISAT aligns the RNA-seq reads to a genome and finds cleavage sites. Stringtie integrates the results from the previous step and estimates the level of gene expression. Ballgown used rigorous statistical methods for differential expression analysis of genes and variants.23) This research innovatively used two calculation pipelines to make the results more closely resembled the reference value. By combining complete DEG analysis and AS analysis, the entire bioinformatics process was fast and accurate. Finally, the key variant MT1E-203, which is a potential liver cancer marker, was verified through biological experiments.

In summary, the combination of the two sequencing data analysis methods used in this study are reported for the first time, and the prediction of the structure and binding site of the splice variant is the highlight of this study. Numerous studies have shown that changes in shear may cause a variety of diseases, including cancer. This study analyzed differential expression between gene variants. The high accuracy of sequencing technology facilitates the discovery of more significant shear variants through the study of their pathways and functions leading to cancer. The use of molecular targets or molecular markers can lead to the early diagnosis of cancer and can improve the treatment and cure rate of liver cancer.

CONCLUSION

In summary, we identified 139 DEGs by bioinformatics analysis of clinical sequencing samples and TCGA database samples. Then, 6 key variants were identified by alternative splicing analysis, and we found that the variant MT1E-203 was expressed at low levels in liver cancer tissues and cell lines. Overexpression of MT1E-203 reduced the viability and proliferation of liver cancer cells and blocked the cell cycle by affecting the P53 gene pathway. In addition, the protein structure of MT1E-203 might have been changed due to exon skipping, leading to the loss of sites for binding zinc ions. MT1E-203 is a potential liver cancer marker and therapeutic target.

It is inferred that the bioinformatic analysis methods used in this study are credible. This study provides a meaningful method for screening differential genes and analyzing alternative splicing. The analysis process is expected to be extended to research on other diseases.

Acknowledgments

We are grateful to the patients who contributed the clinical samples to this study, and we thank all the teachers and students of the National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University for their support and assistance in this research. This work was supported by the Natural Science Foundation of Jilin Province, P. R. China [Grant No. 20180101242JC], National Natural Science Foundation of China [Grant No. 81700709], Science and Technology Development Research Foundation of Jilin Province [Grant No. 20180520105JH], and Systems Biology Research on Genome and Transcriptome of Stem Cells [Grant No. 2017030] of Jilin Province Sunbird Regenerative Medical Engineering Co., Ltd.

Author Contributions

Yanxin Huang and Yuxin Li: conceiving and designing the research. Mengqi Cui: the major contributor in writing the manuscript, performing the research. Miao Bai: performing Western blot experiment and protein structure prediction. Zhenxiang Yu, Lihua Zheng, Yongli Bao, Luguo Sun, Chunlei Yu, Ying Sun, Zhenbo Song and Guannan Wang: providing technical and material assistance for this article. All authors reviewed the manuscript.

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Materials

The online version of this article contains supplementary materials.

REFERENCES
 
© 2021 The Pharmaceutical Society of Japan
feedback
Top