2018 Volume 93 Issue 6 Pages 221-228
Human lifespan is determined by genetic and environmental factors. Potential longevity genes are neither specific nor reproducible, and longevity-related genes are constantly confused with age-related genes. To distinguish specific age- and longevity-related genes, we analyzed a Gene Expression Omnibus (GEO) dataset established by the Leiden Longevity Study. The individuals were classified into longevity (mean age, 93.4 ± 3.0 years), longevity offspring (60.8 ± 6.1) and control (61.9 ± 6.9) groups. The series matrix files were downloaded, and average expression values were calculated. Differentially expressed genes (DEGs) between longevity and control groups and those between longevity and their offspring were identified by GEO2R online. A total of 507 longevity- and 755 age-related DEGs were visualized using a Venn diagram. Weighted gene co-expression network analysis (WGCNA) was performed on the longevity- and age-related DEGs. Age-related color modules and genes were identified. However, no longevity-related modules or genes were found. The green module, with 46 age-related DEGs, was the most biologically significant to age and aging. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and protein–protein interaction pathway analyses were conducted on these 46 DEGs, which are mainly enriched in B cell activation and receptor signaling pathways. CR2, VPREB3, MS4A1 and CCR6 were considered the most crucial candidate genes for aging.
The human lifespan is a complex phenotype influenced by genetic and environmental factors; the heritable component accounts for approximately one-fourth of the variation (Herskind et al., 1996). The appearance of long-lived people clusters in families enhances the genetic impact on extreme lifespan (Willcox et al., 2006). Thus, researchers have strived to identify genetic differences between long-lived and average-lived individuals. Numerous candidate longevity genes, such as CTGF, EGFR, FOXO3A, APOE, SIRT1, AKT1 and IGF-2R, were discovered (Deelen et al., 2013; Li et al., 2016a; Donlon et al., 2017). However, only several of them can be replicated (Broer et al., 2015; Dato et al., 2017), and different outcomes were observed in different studies (Li et al., 2016b; Lin et al., 2016a), indicating that candidate genes may not be highly specific to longevity. Longevity candidate genes are confused with age and aging as they are often involved in age-related diseases and aging-associated conditions (Budovsky et al., 2009; Tacutu et al., 2011). Genes operate in a cooperative manner in genome-wide epistatic interactions in topological networks, which can either enhance or weaken a gene’s biological function and specificity (Brown et al., 2014; Bloom et al., 2015). Network-assistant analysis of genetics and gene expression is a promising approach for identifying specific genes (Chesler et al., 2005; Hubner et al., 2005). Weighted gene co-expression network analysis (WGCNA) is an effective systems biology method that describes how genes work interactively and provides insights into the correlation between heterogeneous traits and genetics on the basis of genetic networks (Ghazalpour et al., 2006; Langfelder and Horvath, 2008); it has been widely applied in finding biomarkers and candidate genes of complex diseases (Osterhoff et al., 2014; Morrow et al., 2017). Thus, based on transcriptome data, WGCNA was applied to analyze available datasets in the Gene Expression Omnibus (GEO) database to identify specific longevity- and aging-related genes.
The expression profile dataset GSE16717 was uploaded and shared by the Leiden Longevity Study on the GEO database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16717). The study was performed on 150 individuals based on the GPL2895 platform (50 longevity: males = 26, females = 24, mean age = 93.4 years; 50 controls: males = 24, females = 26, mean age = 61.9 years; and 50 longevity offspring: males = 25, females = 25, mean age = 60.8 years). Preprocessed series matrix files (background correction, quantile normalization and probe summarization) were downloaded, and probe sets that showed no correspondence to any gene symbols were removed. When multiple probe sets corresponded to the same gene, the average expression value was calculated, and was considered the expression value of the gene.
Differentially expressed gene (DEG) identificationDEGs between longevity and control and between longevity and their offspring were identified by GEO2R online (https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE16717). The pairs of DEGs were presented as a complement set and as an intersection set using a Venn diagram. The DEGs between longevity and control were influenced by age and genetics, whereas those between longevity and their offspring were influenced by age only as they shared the same family heritability. Thus, the complement set in longevity and control presented the longevity-related DEGs, and the overlap in the intersection set presented the age-related DEGs.
WGCNA on DEGsThe DEGs were clustered into differently colored modules by WGCNA on the basis of dissimilarity measure (1-topological overlap measure (TOM), TOM ≥ 0.15) (Kogelman and Kadarmideen, 2014). Gene significance (GS), module significance (MS) and module membership (MM) were determined. The significant modules and genes were identified according to the highest MS and GS values. In the WGCNA manual (Langfelder and Horvath, 2008), a significant absolute GS value indicates a biologically important gene, the MS value denotes the average absolute GS value for all genes in each module, and the MM value shows the correlation between the gene expression profile and module eigengene in each module.
Gene Ontology (GO) and pathway enrichment analysis of the DEGs in the most significant moduleThe DEGs in the most significant module were analyzed online by the Database for Annotation, Visualization and Integrated Discovery (https://david.ncifcrf.gov/summary.jsp) (Huang et al., 2009) for GO and pathway enrichment. P or adjusted P ≤ 0.05 and fold enrichment ≥ 10 indicate significant differences.
Protein–protein interaction (PPI) analysis of DEGsThe DEGs in the most significant module were analyzed using the STRING database to determine gene interactions at the protein level (http://string.embl.de/cgi/input.pl?UserId=vbIZutBFUXxs&sessionId=u98tgB1kBtoL&input_page_show_search=on) (Szklarczyk et al., 2017). The most interactive genes were identified as those with an interaction score ≥ 0.4 (medium confidence).
Statistical analysisAn F test was performed to compare age between groups, and a chi-square test was applied to analyze gender distribution. Benjamini–Hochberg’s procedure was applied to adjust the P value in GEO2R. P or adjusted P ≤ 0.05 indicates a significant difference.
A total of 150 individuals were recruited and classified into longevity, offspring and control groups, and Table 1 provides baseline information about these participants. Mean levels of the continuous variable and distributions of the classification variable were compared among groups. Significant differences were observed in age (longevity > offspring, longevity > control, P < 0.001), except between the offspring and control groups. No differences were observed in gender distribution.
| Variable | Longevity (N = 50) | Offspring (N = 50) | Control (N = 50) | P-value |
|---|---|---|---|---|
| Age (years)a | 93.4 ± 3.0 | 60.8 ± 6.1 | 61.9 ± 6.9 | < 0.001 |
| Males/females | 26/24 | 24/26 | 25/25 | 0.92 |
An overlapping Venn diagram of the DEGs is shown in Fig. 1. A total of 1,394 DEGs were observed between longevity and offspring groups, and another 1,262 were detected between longevity and control groups (adjusted P < 0.05), as revealed by GEO2R online. No DEG was found between control and offspring groups (adjusted P ≥ 0.05). In total, 755 DEGs in the overlap were age-related, whereas 507 DEGs in the complement set of longevity and control were longevity-related.

Overlapping Venn diagram for the DEGs between longevity and offspring and between longevity and control groups. The blue circle represents the 1,394 DEGs between the longevity and offspring groups, and the pink circle represents the 1,262 DEGs between longevity and control groups. A total of 755 age-related DEGs belonged to the interactive set, and 507 longevity-related DEGs were allocated to the complement set of longevity and control groups.
Hierarchical clustering dendrograms and gene modules partitioned by WGCNA are displayed in Fig. 2. The 755 age-related DEGs were clustered into six color modules (turquoise, blue, brown, yellow, green and gray), and the 507 longevity-related DEGs were clustered into two modules (turquoise and gray). No distinct longevity-related modules or genes were observed with poor module–trait absolute correlation, r < 0.4 (mild correlation; hierarchical clustering dendrograms not shown). Each branch in the plot corresponds to a gene, and the genes located at the branch tips display the highest gene interconnection in the module and are the key genes in the co-expression network.

Gene dendrogram and color modules for 755 age-related DEGs in the interactive set. Differently colored modules (x-axis) correspond to gene clusters. The gray module represents genes that cannot be clustered into distinct modules based on gene dissimilarity measure. The y-axis denotes the height of the dissimilarity measure (1-TOM), and each branch corresponds to a gene. A total of 755 age-related DEGs were clustered into six distinct modules: turquoise, green, blue, brown, yellow and gray.
Five distinct modules (turquoise, green, blue, brown and yellow) were related to age with P < 0.001. Table 2 shows the significant modules and the corresponding module-age correlations. For the 755 age-related DEGs, the turquoise module was positively correlated with age, whereas the blue, brown, yellow and green modules were negatively correlated with age. The green module was considered the most significant with the highest MS value (MS = 0.45, P < 0.001), indicating that it is negatively correlated with age and aging (r = −0.59, P < 0.001). The genes in the gray module could not be placed in any of the other five color modules, and were therefore excluded from further analysis. No module was related to gender. For the 507 longevity-related DEGs (not included in Table 2), only the turquoise module was identified, and it was negatively related to lifespan (MS = 0.30, P < 0.001; r = −0.38, P < 0.001).
| Module | ||||||
|---|---|---|---|---|---|---|
| turquoise | green | blue | brown | yellow | gray | |
| Gene number | 258 | 46 | 225 | 121 | 91 | 14 |
| Correlation | 0.58 | −0.59 | −0.45 | −0.49 | −0.5 | NA |
| P-value | 4×10−10 | 2×10−10 | 2×10−8 | 3×10−7 | 1×10−7 | NA |
| MS value | 0.36 | −0.45 | −0.32 | −0.29 | −0.38 | NA |
| P-value | 1×10−3 | 1×10−4 | 7×10−4 | 5×10−4 | 5×10−4 | NA |
NA, not applicable.
We constructed a scatterplot of GS (y-axis) versus MM (x-axis) for the 46 age-related DEGs in the green module (Fig. 3). According to the WGCNA manual (Langfelder and Horvath, 2008), in modules related to a trait of interest, genes with high MM often feature high GS. The scatterplot indicated that the DEGs exhibited a highly significant correlation with age and aging (r = 0.6, P = 1 × 10−5).

Scatterplot of MM versus GS for the 46 age-related DEGs in the green module. The x-axis represents MM, and the y-axis represents GS. Each green circle in the plot represents a gene.
Table 3 shows the top GO terms and enrichment pathways for the age-related DEGs in the green module with P ≤ 0.05 and fold enrichment > 10. These age-related DEGs were mainly classified into biological processes and molecular functions and, notably, were enriched in GO:0042113~B cell activation and hsa04662: B cell receptor signaling pathways.
| Category | Term | Gene count | P | Genes | Fold enrichment |
|---|---|---|---|---|---|
| GOTERM_BP | GO:0002583~regulation of antigen processing and presentation of peptide antigen | 2 | 0.03 | HLA-DOA, ABCB4 | 76 |
| GOTERM_BP | GO:0007638~mechanosensory behavior | 2 | 0.03 | CNTNAP2, STRBP | 76 |
| GOTERM_BP | GO:0042113~B cell activation | 7 | < 0.001 | LOC102723996, CR2, CCR6, MS4A1, CD79B, BANK1, IGHM | 13 |
| GOTERM_BP | GO:0002455~humoral immune response mediated by circulating immunoglobulin | 3 | 0.02 | CR2, FCER2, IGHM | 12 |
| GOTERM_BP | GO:0016064~immunoglobulin mediated immune response | 4 | 0.005 | CR2, CCR6, FCER2, IGHM | 11 |
| GOTERM_BP | GO:0019724~B cell mediated immunity | 4 | 0.006 | CR2, CCR6, FCER2, IGHM | 10 |
| GOTERM_MF | GO:0023026~MHC class II protein complex binding | 2 | 0.03 | MS4A1, HLA-DOA | 62 |
| GOTERM_MF | GO:0023023~MHC protein complex binding | 2 | 0.04 | MS4A1, HLA-DOA | 52 |
| KEGG_PATHWAY | hsa04662: B cell receptor signaling pathway | 3 | 0.01 | CR2, RASGRP3, CD79B | 17 |
| KEGG_PATHWAY | hsa04640: Hematopoietic cell lineage | 3 | 0.02 | CR2, FCER2, MS4A1 | 14 |
A PPI plot for the age-related DEGs in the green module is depicted in Fig. 4. CR2 (interaction score = 0.58), CCR6 (0.90), BANK1 (0.40), SPIB (0.48), FCER2 (0.58), VPREB3 (0.68), MS4A1 (0.44) and CD79B (0.68) were the top interactive genes, with interaction scores ≥ 0.4 (medium confidence).

PPI plot for the most interactive age-related DEGs in the green module. The circles represent genes, and the lines between circles represent the relationship network of these genes.
The top 10 DEGs of interest in the most significant green module according to the absolute GS value and MM value are listed in Table 4. Combining these data with the GO and KEGG pathway analysis, CR2, VPREB3, MS4A1 and CCR6 were identified as crucial candidate aging genes.
| Gene | GS | P-value | MM | P-value |
|---|---|---|---|---|
| CR2 | −0.64 | 6.80e-13 | 0.88 | 5.62e-34 |
| VPREB3 | −0.59 | 7.33e-11 | 0.87 | 1.60e-31 |
| ID3 | −0.57 | 5.35e-10 | 0.83 | 1.78e-25 |
| MS4A1 | −0.55 | 4.28e-09 | 0.88 | 8.49e-33 |
| TCL1A | −0.53 | 1.12e-08 | 0.85 | 1.33e-28 |
| CCR6 | −0.53 | 1.35e-08 | 0.79 | 5.97e-23 |
| OSBPL10 | −0.52 | 2.54e-08 | 0.80 | 8.54e-24 |
| USPL1 | −0.52 | 2.74e-08 | 0.83 | 1.21e-24 |
| HS3ST1 | −0.52 | 3.43e-08 | 0.87 | 2.32e-32 |
| SPIB | −0.52 | 3.59e-08 | 0.93 | 4.71e-43 |
WGCNA is a systems biological network analysis approach that identifies function connections based on gene co-expression similarities in the interactome. In this study, we performed WGCNA on a dataset to identify age- and longevity-related genes. An age-related module (green module) and 46 highly interconnected genes were found to be significantly associated with aging status, and from the high correlation between GS and MM we concluded that the green module and DEGs played important biological roles in progressive aging. By contrast, no significant longevity-related modules or genes were detected because of poor correlation (r < 0.4) according to the criterion of WGCNA. In this study, the gene expression profile was significant to aging, while no DEGs were found to be significantly related to extreme long lifespan or longevity.
We found that the age-related DEGs in the green module were significantly enriched in B cell activation and receptor signaling pathways and other immunity-related signaling pathways, thereby revealing a close relationship between aging and immune system function. Aging is a phenomenon characterized by a general decline in physiological functions (Avery et al., 2014), and its causes are generally attributed to the increased inflammatory status known as “inflammatory aging”, which promotes inflammatory mediators and weakens immune system function (Cevenini et al., 2010; Salvioli et al., 2013). Individuals with reduced immune system function are susceptible to infectious diseases and age-related diseases, such as carcinomas, neurodegeneration and cardiovascular disorders (Salvioli et al., 2013). Another typical feature of aging is a markedly decreasing B lymphocyte volume as well as changes in the relative frequencies of different B cell subsets in the peripheral blood circulation; this aging feature contributes to poor humoral immunity and a poor ability to produce high-affinity protective antibodies (Strindhall et al., 2007; Ademokun et al., 2010; Riley, 2013; Lin et al., 2016b), suggesting that these changes in the immune system cause high disease susceptibility in elderly people, in whom aging is the initial cause of diseases.
A number of genes in the green module, such as MS4A1, CD79B, BANK1, IGHM, CR2, CCR6 and RASGRP3, whose transcription products are either cell surface antigens or markers on the B cell, play important roles in regulating B cell activation and proliferation (Uchida et al., 2004; Ernst et al., 2005). In our study, these key genes displayed a descending expression trend with increasing age (GS < 0). In addition, several key genes are related to diseases when they are abnormally expressed. BANK1 is down-regulated in patients with systemic lupus erythematosus (SLE) (Dang et al., 2016), which is characterized by abnormal T and B cell responses. IGHM is down-regulated in colorectal cancer and idiopathic pulmonary fibrosis (Yang et al., 2012; Pentheroudakis et al., 2015). CCR6 is highly expressed in the B cells of SLE patients (Lee et al., 2017) and colon cancer patients (Kapur et al., 2016); RASGRP3 is highly expressed in breast cancer (Nagy et al., 2012), glioblastoma (Lee et al., 2015), prostate cancer (Zeng et al., 2014), uveal melanoma (Chen et al., 2017) and other malignant carcinomas. Overall, key genes are both age-related and disease-related and are expressed differentially relative to varying health conditions in the aging process.
Age-related key genes are highly interactive and play a central role in the progress of biological aging. CCR6 and VPREB3 are located downstream of CR2 and MS4A1; the latter two, which regulate the expression of various downstream genes and participate in crucial biological pathways (Fig. 4), are central nodes in the PPI network. Based on the GS values, MM values, PPI scores and disease pathway analysis, CR2, VPREB3, MS4A1 and CCR6 are considered the most crucial candidate aging genes, and experimental work is warranted to verify the correlation between key gene expression profiles and aging.
In conclusion, we failed to find candidate longevity genes but identified aging genes that are down-regulated with aging. As well as the transcriptome, epigenetics is a pressing issue in longevity and aging research. Future studies should explore crucial longevity biomarkers in DNA methylation, lncRNA and other fields.
We thank the researchers from the Leiden Longevity Study, who shared the dataset openly online. We also appreciate Yuancheng Chen from the Center for Bioinformatics and Genomics of Tulane University for her professional advice. This study was supported by grant number 81760577 from the Natural Science Foundation of China, the Innovation Project of Guangxi Graduate Education, and grant numbers YCBZ2012012 and 2017KY0122 from the Guangxi Education Department.