Time series clustering analysis of genes during osteogenic differentiation of human mesenchymal stem cells

To investigate the gene expression pattern and related biological changes during osteogenic differentiation of human mesenchymal stem cells (hMSCs), we downloaded expression data for four uninduced hMSC samples, and 12 osteogenic induction samples at day 2, 8, 12 or 25, in the GSE37558 dataset. Differentially expressed genes (DEGs) between groups were screened, followed by short time-series expression miner (STEM) analysis and weighted gene co-expression network analysis (WGCNA). Osteogenic differentiation-related genes were extracted from the GeneCards database. Next, functional enrichment was performed, and protein–protein interaction (PPI) and lncRNA–miRNA–mRNA networks were constructed. Compared to uninduced hMSC samples, 163, 341, 447 and 537 DEGs were found in osteogenic induction samples at day 2, 8, 12 and 25, respectively, showing a sustainably increased trend. From STEM, WGCNA and the Gene-Cards database, a total of 107 key genes associated with osteogenic differentiation were screened; these genes were enriched in biological processes, such as ossification, ECM–receptor interaction, vasculature development, cartilage development and bone mineralization, as well as the Wnt signaling pathway and the chemokine signaling pathway. The PPI network identified four hub genes, STAT5A, TWIST1, FOXO1 and LEP. The lncRNA–miRNA–mRNA network revealed regulatory axes for STAT5A, FOXO1 and LEP. Three and two regulatory axes were found for STAT5A and LEP, respectively. Multiple regulatory axes for FOXO1 were found, such as MIR155HG–miR-223–FOXO1. This study identifies candidate key targets that may play important roles in regulating osteogenic differentiation of hMSCs, and provides novel information to further investigate the molecular regulation mechanism. More experiments are required to evaluate the effects of these genes on osteogenic differentiation of hMSCs.


INTRODUCTION
Bone defect is the loss of part of the bone, and is a common orthopedic clinical symptom. Every year, tens of millions of patients worldwide suffer from bone defects caused by severe trauma, fracture associated with infection, improper post-fracture treatment, bone tumors or other complications. Small bone defects can be repaired by the body, but large bone defects are difficult to heal spontaneously. There is no single definition of what constitutes a critical-sized defect; this is dependent on various factors, such as the absolute versus relative size, anatomical location, and whether there is circumferential loss of bone (Nauth et al., 2018). Although there has been substantial progress, the treatment of large bone defects is still an urgent problem to be solved due to factors such as large surgical trauma, long treatment period and complication (Baldwin et al., 2019;Qasim et al., 2019;Zhang et al., 2020b).
Bone tissue engineering is considered to be an effective method for the treatment of large bone defects, and seed cells are an important part of bone tissue engineering (Wubneh et al., 2018;Shi et al., 2019;Zhao et al., 2020). Mesenchymal stem cells (MSCs) are a type of stem cells isolated from bone marrow, with strong proliferation and differentiation ability, that can differentiate into osteoblasts, adipocytes, chondroblasts, endothelial cells, nerve cells and other tissue cells under different induction conditions (Arthur and Gronthos, 2020). MSCs have been widely used in the field of bone reconstruction and regeneration, and are the best seed cells for the treatment of bone defects because they are convenient to isolate, are easy to amplify and maintain stably in vitro, and display low immunogenicity and good osteogenic differentiation performance (Li et al., 2019a;Lin et al., 2019). However, a long process is required for MSCs to differentiate into osteoblasts in vitro, and their proliferation ability and activity decrease during the process of induction and differentiation, which is not conducive to their role in the treatment of bone defects (Li et al., 2018;Ramesh, 2021). Therefore, it is necessary to further explore the key genes involved in osteogenic differentiation of MSCs and their regulatory mechanisms.
Several studies have explored the genome-level changes and hub genes involved in osteogenic differentiation based on microarray techniques and bioinformatics research methods (Fan et al., 2020;Khodabandehloo et al., 2020;Wu et al., 2021). These studies revealed hub genes during osteogenic differentiation mainly by screening differentially expressed genes (DEGs), and then investigating their functions and protein-protein interaction (PPI) networks. Weighted gene co-expression network analysis (WGCNA) uses the concepts of systems biology to find similarities of expression between genes, and identifies highly correlated genes as a gene module, which helps to explore the overall behavior of biological systems (Langfelder and Horvath, 2008). Using WGCNA, Liu et al. (2021) investigated the co-expression network for osteogenic/adipogenic differentiation of MSCs and identified six hub genes. However, the timing of gene expression is precisely regulated at different stages of osteoblast formation (Zhang et al., 2015), and focusing on changes at a certain time point during osteogenic differentiation may neglect some important regulators in this process. Short time-series expression miner (STEM) is a clustering analysis tool that is widely used to study a range of dynamic biological processes, such as development and differentiation, which can determine statistically significant timedependent gene expression profiles based on data for a large number of genes and several time points (Ernst and Bar-Joseph, 2006;Fan et al., 2021).
To investigate changes of gene expression pattern during osteogenic differentiation more comprehensively, an integrated bioinformatics analysis was conducted. Based on the GSE37558 microarray dataset, DEGs at different time points (days 2, 8, 12 and 25 versus day 0) during osteogenic differentiation of human (h)MSCs were first screened and then analyzed by time series clustering to reveal the tendency of DEGs to vary over differentiation time points using STEM. In addition, co-expression gene modules related to osteogenic differentiation were identified by both WGCNA and GeneCards. Common genes in these analyses were considered as important candidate genes to further identify hub genes for osteogenic differentiation of hMSCs. Our study provides potential targets and novel clues for the differentiation of hMSCs.

RESULTS
Genes differentially expressed at each time point of osteogenic differentiation There were 163, 341, 447 and 537 DEGs in Day 2 vs. Day 0, Day 8 vs. Day 0, Day 12 vs. Day 0 and Day 25 vs. Day 0 samples, respectively (Fig. 1A), of which 97 were common genes, suggesting that increasing numbers of genes were differentially expressed as osteogenic differentiation progressed. There were a total of 647 DEGs after the removal of duplicates (Supplementary Table S1). Next, these DEGs were clustered using STEM, and six significant gene profiles were obtained (Fig. 1B), from which 306 genes in profiles 41, 43 and 38 showed similar up-regulated expression patterns and were merged (named red profiles), and 174 genes in profiles 9, 19 and 0 showed similar down-regulated expression patterns and were merged (named green profiles).
WGCNA identifies osteogenic differentiation time point-correlated gene modules Gene modules that correlated highly with osteogenic differentiation time point were analyzed using WGCNA. Based on the scale-free topology network criterion, the soft threshold (power = 10) was selected when the square value of  the correlation coefficient reached 0.85 for the first time ( Fig. 2A). Genes were clustered following the dynamic tree cut algorithm with parameter settings minModule-Size = 30, MEDissThres = 0.3, and six highly correlated gene modules were obtained (Fig. 2B). Correlation analysis revealed that two gene modules were markedly correlated with osteogenic differentiation time ( Fig. 2C): the black module (r = 0.88, P = 5E-06) and the turquoise module (r = − 0.85, P = 4E-05).
Common genes between STEM profiles and WGCNA modules Venn analysis revealed 412 common genes between significant STEM gene profiles and WGCNA modules, including 301 genes between the red profiles and the black module, and 111 genes between the green profiles and the turquoise module (Fig. 3A). Functional enrichment analysis indicated that the common genes between the red profiles and the black module were markedly involved in extracellular structure organization, extracellular matrix (ECM) organization, ossification, vascular smooth muscle contraction, etc. (Fig. 3B). Genes common to the green profiles and the turquoise module were markedly implicated in chemotaxis-related biological processes, such as regulation of leukocyte/monocyte chemotaxis, positive regulation of chemotaxis and regulation of leukocyte migration (Fig. 3C).
Key genes related to osteogenic differentiation To further screen key genes related to osteogenic differentiation, we searched the GeneCards database with the search term "osteogenic differentiation". Genes retrieved from the GeneCards database were then merged with STEM profiles and WGCNA modules, and 107 common genes were screened and considered as key genes related to osteogenic differentiation (Fig. 4A). The common genes among GeneCards, green profiles and the turquoise module were markedly implicated in chemotaxis-related biological processes, the Wnt signaling pathway and the chemokine signaling pathway (Fig. 4B). The common genes among GeneCards, red profiles and the black module were markedly involved in ossification, regulation of ossification, extracellular structure/matrix organization, cartilage development, bone mineralization, the Wnt signaling pathway, ECM-receptor interaction, etc. (Fig. 4C).
PPI network The 107 common genes selected above were input to the STRING database to predict PPIs, and 95 interactions involving 65 genes were found as shown in the PPI network (Fig. 5A). Among these, four hub genes were identified by the cytoHubba algorithm (for details, see Materials and Methods), including signal transducer and activator of transcription 5A (STAT5A), twist family bHLH transcription factor 1 (TWIST1), forkhead box O1 (FOXO1) and leptin (LEP). These four genes were all markedly up-regulated in osteogenic induction samples (Fig. 5B). In addition, expression of the four hub genes was markedly up-regulated with increasing differentiation time points (Fig. 5C).

DISCUSSION
MSCs are ideal seed cells for bone tissue engineering due to their convenient isolation and culture and multilineage differentiation potential (Confalonieri et al., 2018). Osteogenic differentiation of MSCs is a multistep process, and is regulated by multiple transcriptional factors and pathways. Elucidating the gene expression profile during osteogenic differentiation of MSCs contributes to evaluating and improving scaffold materials and their effectiveness in inducing bone formation (Zheng et al., 2021). In this study, we explored the key genes and their functional involvement in hMSCs at different time points (day 2, 8, 12 and 25) during osteogenic differentiation. Compared to uninduced hMSC samples, the number of genes that were differentially expressed in osteogenic induction samples increased sustainably at day 2, 8, 12 and 25. Genes that were sustainably upregulated during osteogenic differentiation were mainly involved in ossification, extracellular structure/matrix organization, ECM-receptor interaction, vasculature development, cartilage development and bone mineralization.
The ECM is a non-cellular three-dimensional polymer network, composed of biomacromolecules that are synthe- sized and secreted by cells, on the cell surface or between cells. It is rich in collagen, laminin, elastin, proteoglycan and various cytokines (Theocharis et al., 2016). The ECM plays an important role in maintaining cell homeostasis and tissue development by providing a living environment for different types of cells in vivo and influencing their biological activities (Walker et al., 2018).  transplanted ECM material onto a titanium substrate and cultured bone marrow MSCs, and found that the ECM could regulate their morphology, proliferation and osteogenic differentiation. Angiogenesis is a key step in the process of bone regeneration, during which MSCs differentiate into chondrocytes, and, as blood vessels grow in, the cartilage begins to mineralize. Blood vessels provide not only oxygen and nutrients to bone tissue, but also stem cells for remodeling and repair of bone tissue (Diomede et al., 2020). Our results support these findings.
Among these key genes, four hub genes were identified: STAT5A, TWIST1, FOXO1 and LEP. STAT5A is a member of the STAT transcription factor family, which has been implicated in regulating osteogenic differentiation (Levy et al., 2010;Joung et al., 2012;Zhang et al., 2020a). For example, Zhang et al. (2020a) suggested that leptin could activate the JAK/STAT signaling pathway in MSCs to enhance BMP9-induced osteogenesis. Various studies have indicated the importance of TWIST1 (Miraoui et al., 2010;Quarto et al., 2015;Fan et al., 2022) and LEP (Xu et al., 2016;Zhao et al., 2016) in the regulation of osteogenic differentiation: LEP reportedly alters the differentiation fate of chondrogenic progenitor cells and triggers their   6. lncRNA-miRNA-mRNA network. The lncRNA-miRNA-mRNA network for hub genes, in which red circular nodes represent hub genes identified from the PPI network, and orange hexagonal nodes and blue diamond nodes represent miRNAs and lncRNAs, respectively. senescence (Zhao et al., 2016), and promotes osteoblast differentiation of BMSCs (Xu et al., 2016); meanwhile, inhibiting the expression of TWIST1 could promote the osteogenic differentiation of periodontal MSCs  and adipose tissue-derived stem cells (Quarto et al., 2015). FOXO1 transcription factor, a member of the Fox family, is mainly involved in cell proliferation, differentiation, angiogenesis and other biological processes, and multiple studies have also reported the involvement of FOXO1 during osteogenic differentiation Chen et al., 2019;Yang et al., 2021). In our study, these four genes were all markedly up-regulated in osteogenic induction samples in comparison with uninduced hMSC samples, suggesting that they play vital roles in osteogenic differentiation of hMSCs, as reported in the above studies.
lncRNA and miRNA are non-coding RNAs, constituting about 90% of the RNA transcripts of the human genome. Although lacking protein-coding potential, they are regarded as important regulators of various cellular function and processes, including osteogenic differentiation . In general, miRNAs exert their functions by binding to the 3′ UTR of target mRNAs to regulate the transcription of mRNAs. lncRNAs act as competitive endogenous RNAs to bind to miRNAs, whereby they can regulate the expression of target mRNAs. Investigation of lncRNA-miRNA-mRNA regulatory networks contributes to revealing the underlying mechanisms of osteogenic differentiation (Liu et al., 2022). Therefore, an lncRNA-miRNA-mRNA regulatory mechanism for the identified hub genes was investigated. We found an lncRNA-miRNA-mRNA regulatory mechanism for STAT5A, LEP and FOXO1. In particular, multiple regulatory axes for FOXO1 were found, such as MIR155HG-miR-223-FOXO1. Interactions between miR-223 and FOXO1 have been investigated. Han et al. (2019) indicated that miR-223 contributed to neuroblastoma cell proliferation and invasion by targeting FOXO1, and Li et al. (2019b) reported that miR-223 contributed to maintaining functional B cells in diabetes by targeting FOXO1. Lee et al. (2018) further showed that miR-223 regulated the expression of FOXO1 by binding to the 3′ UTR of FOXO1 mRNA. Additionally, Yang et al. (2022) suggested that the lncRNA MIR155HG could facilitate nucleus pulposus cell pyroptosis by sponging miR-223. However, there has been no study to investigate the MIR155HG-miR-223-FOXO1 regulatory axis, or the role(s) of lncRNA MIR155HG in osteogenic differentiation.
There are two limitations to this study. 1) The sample size is small. According to the WGCNA tutorial, WGCNA is not recommended for data sets consisting of fewer than 15 samples, and at least 20 samples is recommended if possible. Although there are other studies with a sample size below 20 (Wlodarczyk et al., 2017), and even below 15 (Wang et al., 2020b), we acknowledge that more robust and refined results are usually obtained by including more samples in any analysis. 2) Confirmatory experiments are lacking. More experiments are required to validate the expression of the identified hub genes and their molecular regulation mechanism, as well as their biological function, in osteogenic differentiation of hMSCs. Despite these limitations, the present study revealed four hub genes during osteogenic differentiation, and their expression increased with the time of osteogenic differentiation, indicating that they were active throughout the whole process of differentiation. We further identified an lncRNA-miRNA-mRNA regulatory network. However, the majority of known regulatory axes have not been investigated in osteogenic differentiation. This provides potential targets to investigate the regulation of expression of these hub genes in osteogenic differentiation, thereby contributing to exploring the mechanism of osteogenic differentiation, which is important to develop bone regeneration and stem cell therapy.
We explored the key genes and their functions in hMSCs at different time points (day 2, 8, 12 and 25) during osteogenic differentiation. A total of 107 genes associated with osteogenic differentiation were screened, and these genes are mainly involved in ossification, ECMreceptor interaction, vasculature development, cartilage development and bone mineralization. Four hub genes, STAT5A, TWIST1, FOXO1 and LEP, were identified, and their regulatory mechanisms were explored. These results provide a basis for elucidation of gene expression profiles during osteogenic differentiation of hMSCs, and also provide novel targets to investigate the molecular mechanism of this differentiation.

MATERIALS AND METHODS
Data acquisition and preprocessing The series expression matrix of hMSC osteogenic differentiation was downloaded from the Gene Expression Omnibus database with accession number GSE37558. A total of 16 samples in the GSE37558 dataset were used in this analysis, including four uninduced hMSC samples, and 12 osteogenic induction samples at day 2, 8, 12 or 25 with three samples for each time point. The Affy package (version 1.56.0) in R 3.6.1 was utilized for data preprocessing. Probes were annotated based on annotation files provided by the GPL6947 platform. Probes that matched no gene symbol were removed, and the mean value was regarded as the final expression when multiple probes matched to one gene symbol.
Short time-series expression miner (STEM) Differentially expressed genes (DEGs) in Day 2 vs. Day 0, Day 8 vs. Day 0, Day 12 vs. Day 0 and Day 25 vs. Day 0 samples were screened using the limma package (version 3.34.9) with the threshold of |logFC| > 1 and adjusted P < 0.05. The DEGs at different time points were merged, and then clustered using STEM software (version 1.3.11); the parameters were set as: gene expression correlation coefficient in each cluster > 0.7, P < 0.05 and gene annotation source set as homo sapiens. Finally, the significant gene clusters with similar expression patterns were merged and used for subsequent analysis.
Weighted gene co-expression network analysis (WGCNA) Highly correlated gene modules that targeted osteogenic differentiation were analyzed using the WGCNA package (version 1.61) in R. First, the top 25% genes with wide variation between samples were selected as the input genes for WGCNA analysis. Then, to balance the relationship between mean connectivity and scale independence, the soft-thresholding power β was determined based on the scale-free topology network criterion. Next, genes were clustered following the dynamic tree cut algorithm with parameter settings minModule-Size = 30, MEDissThres = 0.3. Finally, the eigengene of each module was determined, followed by Pearson correlation analysis between eigengene and osteogenic differentiation time. The most highly correlated modules with P < 0.01 and |r| > 0.7 were selected for further analysis.
Screening of osteogenic differentiation-related genes Genes involved in osteogenic differentiation were searched for in the GeneCards database (http://www. genecards.org/) with the search term "osteogenic differ-entiation". Venn analysis was then performed for the selected genes from the above three analyses (GeneCards database, STEM analysis and WGCNA analysis).
Functional enrichment analysis To investigate the biological functions and signal pathway involvement of selected genes, Gene Ontology annotation terms and KEGG pathways were enriched using the clusterProfiler package (version 3.16.0). Significant enrichment was determined using a cut-off value of P < 0.05 and count ≥ 2.
Protein-protein interaction (PPI) network Interactions among genes were analyzed utilizing the STRING database (version 11.5) with PPI score setting as 0.7. The PPI network was visualized using Cytoscape software (version 3.6.1). Hub genes of the PPI network were then identified with the cytoHubba plug-in provided by Cytoscape. On the basis of the cytoHubba algorithm, the degree, maximal clique centrality (MCC), maximum neighborhood component (MNC), density of MNC (DMNC) and edge percolated component (EPC) of each gene were calculated, and hub genes were identified by referencing all five terms.
Construction of an lncRNA-miRNA-mRNA network miRNAs that targeted hub genes were predicted using the miRWalk 3.0 tool, in which miRNA-mRNA pairs with score > 0.95, as recorded in the TargetScan and miRDB databases, were selected. Long non-coding RNAs (lncRNAs) that targeted the above selected miRNAs were predicted using the ENCORI tool, in which lncRNA-miRNA pairs were selected if the values of their refine results with high stringency of CLIP data were ≥ 5. The obtained lncRNA-miRNA pairs and miRNA-mRNA pairs were finally integrated to construct an lncRNA-miRNA-mRNA network.

DECLARATIONS
Funding: None. Conflicts of interest: The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethics approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent: None. Authors' contributions: Yaqiong Li and Jun Wang carried out the conception and design of the research, and Yaqiong Li participated in the acquisition of data. Jun Wang carried out the analysis and interpretation of data. Yaqiong Li and Jun Wang conceived the study, and participated in its design and coordination and drafted the manuscript. All authors read and approved the final manuscript.