SELL and IFI44 as potential biomarkers of Sjögren’s syndrome and their correlation with immune cell inﬁltration

The onset of Sjögren’s syndrome (SS) is hidden, early diagnosis is difﬁcult, and the disorder seriously endangers the physical and mental health of affected peo-ple. This study aims to identify potential biomarkers of SS and to investigate the characteristics of immune cell inﬁltration. We used four SS gene expression proﬁle data series from the Gene Expression Omnibus database, and applied bioinformatics analysis and machine learning algorithms to screen two biomarkers, SELL (L-selectin) and IFI44 (interferon-induced protein 44), from 101 differentially expressed genes. The two-gene model comprising SELL and IFI44 showed good diagnostic ability for SS in the training set (AUC = 0.992) and veriﬁcation set (AUC = 0.917). Analysis of inﬁltrating immune cells in SS identiﬁed naive B cells, resting CD4 memory T cells, activated CD4 memory T cells, gamma delta T cells, M0 macrophages, M1 macrophages, plasma cells, CD8 T cells, activated NK cells and monocytes as candidate participants in the SS process. Furthermore, SELL was associated with M2 macrophages, activated CD4 memory T cells, gamma delta T cells, resting NK cells and plasma cells, while IFI44 was associated with activated mast cells, resting NK cells, resting mast cells and CD8 T cells. This study demonstrates that SELL and IFI44 can serve as good diagnostic markers for SS and may also be new diagnostic and therapeutic targets for SS.


INTRODUCTION
Sjögren's syndrome (SS) is among the most common chronic systemic autoimmune diseases, but its pathogenesis is still not fully understood. According to whether or not it occurs with other autoimmune diseases, it can be divided into primary and secondary SS, and women are more commonly affected than men (Dolcino et al., 2019). The typical clinical manifestations of SS include exocrinopathy, resulting in dry eyes and mouth, and extra-glandular systemic manifestations such as arthralgias, fatigue, interstitial nephritis and pulmonary fibrosis, as well as pulmonary hypertension, central and peripheral involvement of the nervous system, and vasculitis (Tzioufas et al., 2012;Ngo et al., 2016). However, due to the mild early symptoms of SS and the lack of reliable early diagnostic markers, it is often ignored, and most patients have a poor clinical prognosis. Therefore, it is important to identify novel biomarkers and determine the molecular mechanisms of SS to improve the early diagnosis and treatment of patients with SS, thus improving their prognosis.
To explore the pathogenesis of SS, many recent studies have focused on the role of infiltrating immune cells. Zhou et al. (2012) found a critical involvement of macrophage infiltration in the development of SSassociated dry eye. In the primary SS mouse model, CD8 T cells are the main T cells of the infiltrated fluid gland, and targeted depletion of CD8 T cells significantly reduces tissue damage and restores salivary gland functions even in mice with established disease (Gao et al., 2019). Therefore, from the point of view of the immune system, evaluating the infiltration of immune cells is valuable for elucidating the molecular mechanism of SS and exploring new immunotherapeutic targets. CIBER-SORT is a method for quantifying cell type composition from tissue gene expression datasets based on immune cell signatures (Newman et al., 2015). It has been widely used to analyze immune cell infiltration in various diseases such as osteoarthritis (Deng et al., 2020), colorectal cancer (Ye et al., 2019) and atherosclerosis (Xu and Yang, 2020). However, so far, there have been no reports on the use of CIBERSORT to analyze the infiltration of SS immune cells.
In this study, we downloaded and analyzed SS microarray data series from the Gene Expression Omnibus (GEO) database, and screened them for candidate biomarkers of SS. We then used the CIBERSORT algorithm to analyze the relative levels of 22 kinds of immune cells in SS tissues and normal tissues, and also analyzed the relation-ship between markers and immune cells, so as to better understand the molecular immune mechanism underlying the occurrence and development of SS.
Data processing and screening of differentially expressed genes (DEGs) Series matrix files and corresponding microarray platforms were downloaded from GEO, and the four series matrix files were annotated with an official gene symbol by microarray platforms. We used the average value of all probes as the expression value of the single corresponding gene. The GSE7451, GSE40611 and GSE23117 gene expression matrices were Fig. 1. Volcano plot of DEGs. The abscissa represents log 2 FC and the ordinate represents − log10(adjusted P-value). Red, gray and blue dots represent up-regulated, non-significant and down-regulated DEGs, respectively. merged into one file, and batch correction was applied by implementing the ComBat algorithm with the sva package (version 3.6.1) (Parker et al., 2014). We used the limma R package (Ritchie et al., 2015) to screen DEGs, and the ggplot2 package (Ginestet, 2011) to draw a volcano plot of DEGs. Only genes with |log2FC| > 1.5 and an adjusted P < 0.05 were considered DEGs.
Functional correlation analysis Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using the clusterProfiler package (Yu et al., 2012). Only terms with a false discovery rate (FDR) and P < 0.05 were considered significant.
Biomarker screening and validation Least absolute shrinkage and selection operator (LASSO) regression analysis was conducted using the glmnet package (Friedman et al., 2010) in R to reduce high-dimensional data, and the optimal predictive features for SS were selected from the DEGs. Logistic regression methods were based on forward selection (likelihood ratio), and genes with a P-value ≤ 0.05 were included in the final logistic regression models (IBM SPSS version 19). A P-value (two-sided) < 0.05 was considered to be statistically significant. To assess the diagnostic value of the prediction model in SS, GSE127952 was used as an independent validation set, and receiver operating characteristic (ROC) curve analysis was completed using the pROC package in R (Robin et al., 2011).
Evaluation of immune cell infiltration CIBERSORT was used to compare and analyze the integrated gene expression matrix data and estimate the relative proportion of 22 infiltrating immune cell types in SS. Each sample was deconvoluted and its P-value determined. Samples with P < 0.05 were selected, and the putative proportions of infiltrating immune cell types were obtained. The ggplot2 package was used for principal component analysis (PCA) of immune cell infiltration matrix data, the MANOVA test was used to evaluate differences in infiltrating immune cells between groups, and the violin diagram was used to show differences in immune cell infiltration. The corrplot package (Friendly, 2002) was used to draw a correlation heat map between immune cells.
Correlation analysis between biomarkers and immune cells Spearman correlation analysis was used to analyze the correlation between diagnostic markers and infiltrating immune cells. The analyses made use of the packages ggplot2, ggpubr and ggExtra. P < 0.05 was

RESULTS
Screening and functional analysis of DEGs Three microarray data series were merged and used to extract DEGs. A total of 101 DEGs were obtained, as shown in the volcano plot of up-and down-regulated genes ( Fig.   1), and the top 10 and 20 up-regulated genes (sorted by adjusted P-value) are identified in Fig. 1 and listed in Table 1, respectively. The GO analysis results show that DEGs are mainly related to defense response to virus, response to virus and type I interferon signaling pathway (biological process); external side of plasma membrane, MHC protein complex and endocytic vesicle membrane (cellular component); and cytokine receptor binding, cytokine activity and G protein-coupled receptor binding (molecular function) ( Fig. 2A). KEGG analysis results are shown in Fig. 2B; DEG-rich pathways mainly include Epstein-Barr virus infection, cytokine-cytokine receptor interaction and hematopoietic cell lineage.
Immune cell infiltration PCA cluster analysis showed that there was a significant difference in the infiltration of immune cells between SS and normal tissues (MANOVA, F = 33.36, P < 0.001) (Fig. 5A). The heat maps of 22 types of immune cells showed that there was a strong positive correlation between CD8 T cells and activated NK cells. Plasma cells, gamma delta T cells and resting CD4 memory T cells had a strong negative correlation. CD8 T cells and gamma delta T cells also had a negative correlation (Fig. 5B). The violin plot of the difference in immune cell infiltration showed that, compared with the normal control sample, naive B cells, resting CD4 memory T cells, activated CD4 memory T cells, gamma delta T cells, M0 macrophages and M1 macrophages infiltrated more, while plasma cells, CD8 T cells, activated NK cells and monocytes infiltrated less (Fig. 5C).

DISCUSSION
SS is a chronic inflammatory autoimmune disease characterized by lymphocytic infiltration at the exocrine glands and other extra-glandular sites, resulting in dryness of the mouth and eyes (Hwang et al., 2013). Early diagnosis of SS remains challenging due to the lack of sensitive and specific biomarkers. In recent years, it has been found that immune cell infiltration plays an important role in the development of SS (Zhou et al., 2012;Gao et al., 2019). Therefore, it is of great value to identify sensitive and specific molecular markers for SS and to analyze the infiltration of immune cells to improve the prognosis of SS patients.
We downloaded SS expression profile data series from the GEO database and identified a total of 101 DEGs. GO enrichment analysis showed that the DEGs were mainly related to defense response to virus, response to virus and type I interferon signaling pathway. KEGG pathway enrichment analysis implied that DEGs were mainly involved in Epstein − Barr virus infection and cytokine − cytokine receptor interaction. These results are consistent with a previous study (Khuder et al., 2015), indicating that the approaches used in the present study are reliable.
To screen biomarkers that are relevant to SS, we used LASSO regression (Suykens and Vandewalle, 1999) and logical regression (forward selection) to improve the accuracy of bioinformatics analysis. Finally, SELL and IFI44 were identified as diagnostic markers of SS. The twogene model composed of SELL and IFI44 showed good predictive value in both the training set (AUC = 0.992) and the verification set (AUC = 0.917). SELL encodes a cell surface adhesion molecule, L-selectin, that belongs to the adhesion/homing receptor family and plays a role in promoting the migration of leukocytes to lymphoid organs (Wedepohl et al., 2012). In the mouse model, L-selectin can regulate the entry of neutrophils into joint tissue, which is related to the severity of inflammation (Sarraj et al., 2006). Compared with controls, the saliva of SS patients contained an increased level of soluble L-selectin (Kabeerdoss et al., 2016), which participates in the migration of lymphocytes from the blood to the inflammatory lacrimal glands through the adhesion pathway (Mikulowska-Mennis et al., 2001). SELL may play an important role in the occurrence and development of SS disease. Type I IFN plays an important role in the innate immune system by inhibiting viral replication, activating natural killer cells, boosting the generation and activation of dendritic cells and enhancing antibody responses (Wildenberg et al., 2008), related to a variety of autoimmune diseases (Yao et al., 2013;Thorlacius et al., 2018). In primary SS (pSS) patients, the interferon signal is associated with a higher disease activity index score. IFI44 is a type I IFN-induced protein with increased expression in SS, which may be related to the innate immune response after viral infection (Brkic et al., 2013;DeDiego et al., 2019), and IFI44 may therefore be a therapeutic target for reducing virus-mediated diseases and controlling diseases associated with excessive immune signaling.
Immune cell infiltration is related to the occurrence and development of many diseases. Previous studies have shown that naive B cells in pSS increase significantly, which is related to disease activity (Szabó et al., 2016); CD4 + T cells can activate IL-21 and promote the proliferation of T and B lymphocytes, and their activation is related to the pathogenesis of pSS and disease activity. The high sensitivity of B cells and the secretion of inflammatory factors are closely related (Parrish-Novak et al., 2000;Busch et al., 2012). One study found that high expression of gamma delta T cells in the peripheral blood of patients with pSS may play an auxiliary role in inducing B cells to secrete immunoglobulins and may participate in the pathological immune response of pSS (Gerli et al., 1993). Yoshimoto et al. (2019) also found that most lip epithelial sections of SS patients showed cellular edema infiltrated by macrophages, which may be one of the characteristics of SS. pSS patients have elevated plasma cell levels in the peripheral blood, and the number of IgG4 + plasma cells that infiltrate the salivary glands is negatively correlated with disease characteristics; the higher the IgG4 + expression in plasma cells, the lower the positive rates of serum anti-SSA antibodies, anti-SSB antibodies, antinuclear antibodies and rheumatoid factor Brokstad et al., 2018). The absolute count of CD8 T cells in the peripheral blood of patients with pSS is significantly reduced. The impaired lymphocyte distribution may be related to genetically determined lymphopenia or lymphocyte migration from the periphery to inflammatory sites, and/or to increased susceptibility to apoptosis (Sudzius et al., 2015). In patients with pSS, the NK cell number, NK cell killing activity and the expression of activating receptors CD2 and NKG2D in peripheral blood were significantly decreased, and the expression of NKp46 and the percentage of apoptotic NK cells were significantly increased. This may be the result of apoptotic death, and may contribute to impaired NK cell activity in patients with pSS (Izumi et al., 2006). Different from the results of this study, Yang et al. (2017) found that the increase of mononuclear cells in patients with pSS was significantly related to ESR, CRP, IgG and IgA. It can be seen that the occurrence and development of SS are the results of the joint action of a variety of immune cells. Exploring SS-related infiltrating cells can provide a new target for SS treatment, but further experimental data validation is still required.
The correlation analysis between diagnostic markers and immune cells showed that M2 macrophages, activated CD4 memory T cells, gamma delta T cells, resting NK cells and plasma cells were associated with SELL, while activated mast cells, resting NK cells, resting mast cells and CD8 T cells were associated with IFI44. Studies have shown that CD4 + T cells release multiple cytokines, which are the key factors for the production of autoantibodies, lymphocyte infiltration, and even the development of lymphoma (Maehara et al., 2012;Nocturne and Mariette, 2015;Mingueneau et al., 2016). Dendritic cells and mast cells play an important role in SS. Dendritic cells are related to the severity of SS (Manoussakis et al., 2007;Zhou and McNamara, 2014;Hillen et al., 2019). Therefore, we speculate that SELL will increase activated CD4 memory T cells, M2 macrophages and gamma delta T cells, and reduce resting NK cells and plasma cells; IFI44 will increase activated mast cells and resting NK cells, and reduce resting mast cells and CD8 T cells to contribute to the occurrence and development of SS. These hypotheses need further study to clarify the complex interactions between genes and immune cells.
There have been previous bioinformatics analyses of SS salivary gland tissue. Among our top 20 DEGs, 13 genes (SELL, IFIT3, XAF1, SAMD9L, CMPK2, EPSTI1, STAT1, EVI2A, APBB1IP, SAMD9, SAMHD1, CXCL13 and MS4A1) were identified by Song et al. (2014) through meta-analysis, and nine (SELL, IFIT3, PTPRC, SAMD9L, IFIT1, EPSTI1, IFI44, STAT1 and CXCL13) were identified by Khuder et al. (2015); our findings are thus highly consistent with these previous results. The difference from those reports is that our present research used machine learning algorithms to improve the accuracy of the analysis, and by narrowing down the candidate genes to two (SELL and IFI44) we have established a two-gene model. Besides, we analyzed the infiltrating immune cells in SS through the CIBERSORT algorithm and explored the correlation between SELL, IFI44 and these infiltrating immune cells, which should provide clues for the immunotherapy of SS. However, our research has some limitations. CIBERSORT analysis is based on limited genetic data, which may deviate from heterogeneous interactions between cells, disease-induced disorders, or phenotypic plasticity (Deng et al., 2020). Second, our sample size is relatively small, so the analysis of the results may not be powerful enough. The reliability of these results needs confirmation through further study.
In conclusion, our study identified a two-gene model that showed satisfactory performance in predicting SS patients. We also explored the infiltration behavior of immune cells in SS. SELL, IFI44 and these infiltrating immune cells may play an important role in the etiology and development of SS. Further study will help to achieve the goal of SS immunotherapy and will improve the level of immunomodulatory therapy in patients with SS.