2015 Volume 91 Issue 8 Pages 410-422
Rheumatoid arthritis (RA) is a common autoimmune disease that results in significant morbidity. As with other complex disorders, genome-wide association studies (GWASs) have greatly contributed to the current understanding of RA etiology. In this review, we describe the genetic configuration of RA as revealed primarily through GWASs and their meta-analyses. In addition, we discuss the pathologic mechanisms of RA as suggested by the findings of genetic and functional studies of individual RA-associated genes, including HLA-DRB1, PADI4, PTPN22, CCR6 and FCRL3, and the potential use of genetic information for RA treatment in clinical practice.
Rheumatoid arthritis (RA) is one of the most common forms of autoimmune arthritis, with a prevalence of 0.3–1.1% in populations with European ancestry and 0.1–0.5% in those with Asian ancestry.1) RA is characterized by the inflammation of synovial joint tissues and the formation of rheumatoid pannus, which is capable of destroying adjacent cartilage and bone and causing subsequent joint deformity. Hyperplasia of the synovial lining cell layer is a hallmark of RA. Additional characteristic features of RA synovitis include infiltration of immune cells and neo-angiogenesis. Mononuclear cellular infiltrates are typically present in RA deep in the synovial lining layer. These infiltrates are comprised predominantly of macrophages and lymphocytes, which can form aggregates. CD4+, CD45RO+ memory T lymphocytes are common, especially in the areas between aggregates. B lymphocytes and immunoglobulin producing plasma cells are present within and between aggregates and in foci.
Sera from the majority of RA patients contain autoantibodies, such as rheumatoid factor (RF) or anti-citrullinated protein antibodies (ACPAs). The presence of these autoantibodies leads to the classification of RA as an autoimmune disease. Although RF is also found in other autoimmune diseases and immunological conditions, such as chronic infection and inflammation, the presence of ACPAs is highly specific, an indication that autoimmunity to citrullinated proteins is a key factor of RA pathogenesis.2) However, approximately 20% of RA patients lack these autoantibodies, suggesting etiologic heterogeneity of the disease. In clinical practice, the development of biologics that target inflammatory cytokines or molecules on the immune cells has dramatically improved the outcome of RA, although a certain proportion of patients do not respond to these therapies. The limitations of current RA therapies reveal the need for further investigation of disease pathogeneses and identification of new therapeutic targets.
The relative risks of RA in monozygotic twins (λ MZ) and siblings of affected individuals (λ sib) are 12 to 62 and 2 to 17, respectively.3) Consequently, the heritability of RA is estimated to be approximately 60%.4) Therefore, there is strong evidence regarding the existence of genetic factors that increase susceptibility to RA. Indeed, the human leucocyte antigen (HLA) region, the major histocompatibility complex (MHC) in humans, was initially suggested as an RA-associated locus in 1976. The HLA complex allows the immune system to distinguish between self and non-self (foreign) entities. Nonetheless, since HLA-DRB1 was identified as an RA-associated molecule in 1986, several linkage studies have demonstrated that non-MHC genes are also associated with RA.5),6)
Serological studies revealed that the frequency of the serotypes of HLA-DR4, one of HLA class II genes, is higher in RA patients compared with controls.7) Other serotypes, such as DR1, are also associated with increased risk for RA, although the increase in risk is moderate compared with that of DR4. The sequencing of HLA-DRB1, which encodes the polymorphic β-chain of the DR molecule, revealed that RA risk HLA alleles differ among ethnic groups. In populations of European ancestry, HLA-DRB1*04:01, *04:04, and *01:01 are the most frequent RA risk alleles, whereas in East Asian populations, HLA-DRB1*04:05 and *01:01 are the most frequent RA risk alleles.1) In addition, several subtypes of the DR4 allele, such as *04:02 and *04:03 appear to confer protection against the disease. These observations led to the widely accepted hypothesis that a conserved amino acid sequence (i.e., QKRAA/QRRAA/RRRAA) spanning residues 70–74 in the third hypervariable region of the β chain (which is referred to as a “shared epitope [SE]”) is associated with RA susceptibility.8)
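The shared epitope hypothesis can be illustrated as a simple motif lookup. The following is a minimal sketch, assuming a hypothetical helper that receives the amino acid sequence of a mature DR β chain numbered from residue 1 and checks whether residues 70–74 match one of the three motifs named above. The toy sequences are alanine-padded placeholders, not real HLA-DRB1 sequences.

```python
# Shared epitope (SE) motifs at residues 70-74 of the DR beta chain, as listed in the text.
SE_MOTIFS = frozenset({"QKRAA", "QRRAA", "RRRAA"})


def residues_70_74(beta_chain: str) -> str:
    """Return residues 70-74 (1-based, inclusive) of a mature beta-chain sequence."""
    if len(beta_chain) < 74:
        raise ValueError("sequence is too short to contain residues 70-74")
    return beta_chain[69:74]


def carries_shared_epitope(beta_chain: str) -> bool:
    """True if residues 70-74 match one of the SE motifs."""
    return residues_70_74(beta_chain).upper() in SE_MOTIFS


# Hypothetical placeholder sequences (assumption: alanine padding, not real alleles).
toy_se = "A" * 69 + "QKRAA" + "A" * 20
toy_non_se = "A" * 69 + "AAAAA" + "A" * 20

print(carries_shared_epitope(toy_se))      # True
print(carries_shared_epitope(toy_non_se))  # False
```

A lookup of this kind only encodes the classical SE definition; it does not capture the unequal risk conferred by different SE alleles.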
The importance of ACPAs in RA has become evident over the last decade. A strong association between SE alleles and ACPAs in RA patients has been found in several populations,9)–11) suggesting that DR molecules encoded by SE alleles are actually involved in the presentation of citrullinated peptides to T cells. In fact, a study with human DR4–transgenic mice reported that citrullinated peptides show increased affinity for DR molecules and subsequently activate CD4 T cells.12) On the other hand, the association between pathology and citrullinated autoantigens is poorly defined because clinical laboratory testing of sera from RA patients is dependent on an artificial cyclic-citrullinated peptide that could react with multiple citrullinated self-proteins. However, highly relevant target molecules, such as fibrinogen, α-enolase, vimentin, immunoglobulin binding protein (BiP), and type II collagen, are expressed in the synovial joint tissues.13) Indeed, CD4 T cells specific for citrullinated vimentin and aggrecan have been observed in the peripheral blood of HLA-DR4 (*04:01)-positive RA patients and healthy individuals.
The primary autoantigens could potentially differ among individuals. Patient serum samples can be expected to exhibit spreading of these epitopes, with an increase in the recognition of citrullinated antigens, before the onset of RA.14) Differences in antibody profiles among patients could depend on multiple genetic and environmental factors. Cigarette smoking is an example of an environmental factor that substantially increases the risk of ACPA appearance, and gene-environment interactions between the HLA-DRB1 SE allele and smoking have been reported.15),16) The combination of smoking and genetic factors, including HLA-DRB1, may determine the specificity of ACPAs in RA patients.17)
The significant RA risk conferred by HLA-DRB1 SE alleles has been confirmed in multiple ethnic populations, and the SE hypothesis has been generally accepted for a few decades. However, several attempts have been made to reclassify HLA-DRB1 alleles in ways that predict RA risk more precisely than the SE hypothesis. Two recent studies clarified that the amino acids at HLA-DRβ1 positions 11 and 13, in addition to the classical SE residues at positions 70–74, are independently associated with RA, which may explain the higher risk associated with DR4 (*04:01/*04:04/*04:05) compared with DR1 (*01:01).18)
The classical association of RA with SE alleles has been confirmed in several studies using ACPA-positive RA patients and controls. However, SE alleles differ in the strength of their effect on genotypic risk. The group of DR4 SE alleles (*04SE) appears to confer the strongest risk, whereas HLA-DRB1*07, HLA-DRB1*08, HLA-DRB1*11, HLA-DRB1*13, and HLA-DRB1*03 are more often low risk than high risk or neutral in European populations. The association between HLA-DRB1 and ACPA-negative RA has not been extensively studied due to the high prevalence of ACPA-positive RA. HLA-DRB1*03:01, which is not an SE allele, is reportedly a risk allele for ACPA-negative RA in European populations, suggesting that the contribution of HLA-DRB1 alleles is distinctly different in ACPA-negative RA.19),20) In fact, a recent study of ACPA-negative patients that statistically adjusted for the clinical heterogeneity of ACPA-negative RA identified two independent association signals in HLA-DRB1 and HLA-B gene products: serine 11 (encoded by DRB1*03) and aspartate 9 (encoded by HLA-B*08), providing additional evidence that ACPA-positive and ACPA-negative RA are genetically distinct.21)
GWASs have been performed for many years following a prototype that was first performed in Japan and covered only gene encoding regions (no intergenic spaces) (Fig. 1).22) In its current form, ∼1 million single-nucleotide polymorphisms (SNPs) are genotyped for affected patients (cases) and controls. GWASs performed using commercially available DNA array technology have led to the identification of RA loci with a larger effect size than in previous linkage-based studies. These individual GWASs identified a number of RA-susceptibility loci, including PADI4,23) PTPN22,24) TNFAIP3,25) TRAF1/C5,26) REL,27) CCR628) and FCRL3.29) However, each of these GWASs lacked sufficient statistical power to detect loci with moderate effect sizes. To overcome this limitation, meta-analyses of GWASs of both European and Asian populations have been conducted. These studies have increased the number of known risk loci.30),31)
Fig. 1. RA susceptibility genes identified by different methodological categories.
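The gain in statistical power from combining individual GWASs can be sketched with a fixed-effect, inverse-variance meta-analysis of per-study log odds ratios. This is a generic textbook approach and is not necessarily the method used in the cited meta-analyses. The allele counts below are hypothetical, and the function names are illustrative.

```python
import math


def log_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Log odds ratio and its standard error (Woolf's method) from allele counts."""
    log_or = math.log((case_risk * ctrl_other) / (case_other * ctrl_risk))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return log_or, se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(estimates):
    """Combine (log OR, SE) pairs, weighting each study by 1 / SE^2."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, two_sided_p(pooled / pooled_se)


# Hypothetical allele counts for one SNP in two studies:
# (risk alleles in cases, other alleles in cases, risk alleles in controls, other alleles in controls)
study_a = log_odds_ratio(600, 1400, 500, 1500)
study_b = log_odds_ratio(1250, 2750, 1050, 2950)

for name, (beta, se) in (("study A", study_a), ("study B", study_b)):
    print(name, "OR =", round(math.exp(beta), 3), "p =", two_sided_p(beta / se))

pooled, pooled_se, pooled_p = fixed_effect_meta([study_a, study_b])
print("pooled OR =", round(math.exp(pooled), 3), "p =", pooled_p)
```

Because the pooled standard error is smaller than that of either study, a moderate effect that falls short of significance in each study alone can reach it in the combined analysis.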
One of the largest meta-analyses of autoimmune disease GWASs was completed in 2014. The collaborative effort produced a multi-ethnic meta-analysis, using more than 100,000 subjects (29,880 RA cases and 73,758 controls) of European or Asian ancestry, which identified 101 RA risk loci.32) A systematic evaluation of the overlap between RA risk genes and the target genes of drugs approved for RA treatment revealed a significant network of connections (Fig. 2). Empirically, these findings suggest that disease-associated genes are promising resources for drug development. These results also highlight the potential of human genetics not only to identify novel disease susceptibility loci, but also to contribute to the discovery of novel drugs.33)
Fig. 2. Relative enrichment of RA risk genes in the targets of RA drugs. RA risk genes and the genes in protein-protein interaction (PPI) are significantly enriched in overlap with target genes of approved RA treatment drugs, compared to overlap with target genes of existing drugs for all diseases. Representative connections between RA risk genes and targets of RA treatment drugs are labeled.
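One generic way to ask whether a gene set overlaps drug targets more than expected by chance is a one-sided hypergeometric test. The sketch below is a stand-in for illustration and does not reproduce the analysis of the cited study. All counts are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` are drug targets."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


def fold_enrichment(population, successes, draws, observed):
    """Observed overlap divided by the overlap expected by chance."""
    expected = draws * successes / population
    return observed / expected


# Hypothetical counts: 20,000 genes in total, 30 targets of approved drugs,
# 100 risk genes, 3 of which are drug targets.
print(fold_enrichment(20000, 30, 100, 3))
print(hypergeom_tail(20000, 30, 100, 3))
```

The fold enrichment gives the size of the effect and the tail probability its significance; both depend on how the background gene set is defined.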
Although a GWAS can identify disease risk loci, it cannot directly identify the responsible genes or disease-causing variants. The initial findings of a GWAS only imply that a disease-causing variant is in linkage disequilibrium with the identified polymorphism. The disease-causing variants could affect the function of the responsible genes by introducing stop codons or frame-shift mutations, changing the amino acid sequence, affecting alternative splicing, or regulating the level of transcript expression. Of almost 100 risk-associated SNPs in non-HLA RA risk loci, 16% are in linkage disequilibrium with missense SNPs, indicating that the majority of causal variants in the risk loci could affect splicing or the level of gene expression. In fact, 44 out of 100 RA-risk SNPs were found in cis-acting expression quantitative trait loci (cis-eQTL) identified in peripheral blood mononuclear cells.34) This suggests that a substantial fraction of the disease-causing variants in the risk loci affect the expression level of genes in cis. Similar observations have been reported for other autoimmune diseases.
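Linkage disequilibrium between an associated SNP and a candidate causal variant is commonly summarized by the squared correlation r². The following is a minimal sketch of that calculation from haplotype and allele frequencies; the frequencies are hypothetical and the function name is illustrative.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic variants.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")
    return d * d / denom


# Hypothetical frequencies.
print(ld_r_squared(0.30, 0.30, 0.30))  # alleles always travel together
print(ld_r_squared(0.20, 0.30, 0.40))  # partial linkage disequilibrium
```

A high r² means that the genotyped SNP is a good proxy for the other variant, which is why an association signal alone cannot distinguish between them.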
As described above, meta-analyses of GWASs have identified more than 100 non-HLA RA risk loci. Although the effect of each individual locus is moderate, detailed analyses of an individual locus to identify a disease-causing variant and its effect on the relevant gene are expected to enhance our understanding of the disease (Fig. 3). Here, we discuss the examples of RA risk genes and their role in the pathogenesis of RA.
Fig. 3. Genetic factors involved in rheumatoid arthritis. RA risk genes are presented in italic type. Genes are placed in the cells where their functions in disease pathogenesis have been investigated by in vitro or in vivo studies. Mϕ: macrophages, Th1: T helper 1 cells, Th17: T helper 17 cells, Treg: regulatory T cells, ACPA: anti-citrullinated protein antibody.
In 2003, we reported peptidylarginine deiminase type 4 (PADI4) as a susceptibility gene for RA in a Japanese population.23) PADI4 is a member of the PADI gene family and encodes an enzyme that converts arginine residues (peptidylarginine) to citrulline residues (peptidylcitrulline) in a posttranslational modification. PADI4 is highly expressed in bone marrow, macrophages, neutrophils and monocytes.35),36) Peptidylcitrulline is an important molecule in RA, because it is a target antigen of ACPAs and only PADs (the proteins translated from PADI genes) can generate peptidylcitrulline, via modification of protein substrates. Through in vitro assays, we found that transcripts of the risk haplotype of PADI4 are more stable than those of the non-risk haplotype, suggesting that increased expression and function of PAD4 could increase the risk of RA.
The association of PADI4 variants with RA susceptibility has been replicated in Asian populations.37)–39) However, it has not been consistently replicated in European populations,40) although recent meta-analyses suggest that PADI4 is also a risk gene in European populations. This suggests ethnic differences in the contribution of PADI4 to susceptibility. One reason for the lack of a definite association of PADI4 with RA in European populations may be a gene-environment interaction specific to Asian populations. Cigarette smoking is one candidate, as the smoking rate in Asian males is notably higher than in European males. Direct evidence of a link between PADI4 and smoking came from the finding of citrullinated peptides in bronchoalveolar lavage cells and increased expression of PADI enzymes in smokers but not in non-smokers.41) In addition, a gene-environment interaction between PADI4 polymorphisms and smoking has been observed in both European and Asian populations,42),43) in which the effect size (odds ratio) of PADI4 polymorphisms on disease risk has been shown to be prominently higher in male smokers. Altogether, the disparity in the contribution of PADI4 to RA between Asians and Europeans might be caused by the difference in the prevalence of smoking in these ethnic groups.
Of the RA-associated common variants outside the HLA region that have been identified by both conventional association studies and GWAS, a missense variant of the protein tyrosine phosphatase nonreceptor 22 (PTPN22) gene exerts the strongest effect.24),40) This missense variant (PTPN22 R620W) has been reported to be associated with over 20 different autoimmune diseases in European populations, including systemic lupus erythematosus (SLE), type 1 diabetes, and Graves' disease.44) On the other hand, this variant is very rare or is not polymorphic in Asian and African populations45),46) and provides another example of genetic heterogeneity among populations. PTPN22 encodes lymphoid tyrosine phosphatase (LYP), which dephosphorylates the phosphotyrosine residues of target proteins in lymphocytes. The disease-associated variant exchanges the arginine residue for tryptophan at position 620. In vitro expression of the R620W risk allele interferes with the physical association between LYP and C-terminal Src kinase (CSK), resulting in increased LYP activity.
In 2010, chemokine (C–C motif) receptor 6 (CCR6) was identified as an RA-associated gene in Asian and European populations.28),30) In particular, rs3093024 in the CCR6 gene was the second largest genetic risk allele in a Japanese RA population. A dinucleotide polymorphism (DNP) in the 5′-flanking region influences the binding of nuclear proteins and enhances the transcriptional activity of CCR6. The risk allele exhibits greater enhancing activity and the level of CCR6 transcription is higher in cells with the risk genotype. This variant, CCR6DNP, is also associated with the positive status of IL-17A in the serum of RA patients, suggesting that CCR6DNP influences the activity of Th17 cells. In an animal model, Ccr6+ Th17 cells play a role in the pathogenesis of arthritis.47) Although these data strongly support the hypothesis that Th17 cells are involved in RA, an association between the CCR6 locus and disease development has only been confirmed for Crohn’s and Basedow’s diseases and not for other Th17-related diseases, such as psoriasis and multiple sclerosis, in which a STAT3 variant increases the risk.
Fc receptor-like 3 (FCRL3) is a member of the FCRL gene family (FCRL1-6) that exhibits a high structural homology to classical Fc receptors. A regulatory SNP in FCRL3 is associated with susceptibility for RA, autoimmune thyroid disease and SLE in a Japanese population.48) This association was replicated with Japanese populations in both an independent RA cohort49) and studies using patients with additional autoimmune disorders such as autoimmune pancreatitis and chronic graft-versus-host disease. As the association between FCRL3 and Graves' disease has also been reproduced in European populations,50) the FCRL3 polymorphism appears to be a common risk factor for Asian and European populations with Graves' disease. In contrast, inconsistent results have been shown for RA patients of European ancestry. The lack of association in RA patients from several different European populations may reflect a relatively lower contribution of FCRL3 polymorphisms in European populations than in Asian populations, as was observed with the PADI4 polymorphism, and may signal that each individual study requires greater statistical power to detect an association.
FCRL expression is regulated during B-cell differentiation. In particular, germinal center B-cells express high levels of FCRL3. The cytosolic portion of FCRL3 contains four tyrosine residues that potentially function as an immunoreceptor tyrosine-based activation motif (ITAM) or as an immunoreceptor tyrosine-based inhibitory motif (ITIM). An in vitro study using murine FcγRIIB/human FCRL3 chimeric protein revealed that FCRL3 potentially inhibits BCR-mediated signalling through the activity of SH-2 domain-containing phosphatases SHIP, SHP-1, and SHP-2, which are recruited to the ITIMs of FCRL3.
The responsible SNP is located in the promoter region of FCRL3 at −169 bp from the transcription initiation site. This SNP has been shown to alter the binding affinity of NF-κB and regulate gene expression. High levels of FCRL3 expression were observed on B-lymphocytes carrying the risk allele, indicating that a quantitative gain-of-function in FCRL3 leads to disease onset. Given the potential inhibitory activity of FCRL3 on BCR signalling, augmented inhibition of BCR signalling in individuals with the disease-susceptible genotype may lead to autoimmunity.
GWAS has proven to be a powerful tool in identifying risk loci in common diseases.51) However, it is increasingly obvious that common variants explain only a small proportion of the heritability of these diseases. Thus, the remaining heritability might be explained by the effects of rare variants, which are usually defined as variants with minor allele frequencies <1%. The signals from rare variants are difficult to detect in a conventional GWAS because genetic markers used in this type of study are mainly common variants. However, the development of next-generation sequencing technologies has enabled re-sequencing of the entire genome.52) The contribution of each individual variant to disease development would potentially be higher for a rare missense variant than for a common missense variant, which motivates the sequencing of protein-coding regions. In a recent study attempting to determine the roles of rare variants in RA, deep exon sequencing of 25 biological candidate genes from GWAS-identified loci resulted in the identification of an accumulation of missense variants in the IL2RA and IL2RB genes.53) A more comprehensive approach involves whole-exome sequencing, which decodes all protein-coding genes. However, to date, no study has succeeded in identifying RA-associated rare variants using whole-exome sequencing. This might be due to insufficient statistical power, but it could also reflect the possibility that rare variants do not explain a substantial part of the heritability of common diseases.
The use of genetic data is essential because RA is a very heterogeneous disease with variable outcomes that require different treatment regimes. The heterogeneity can be explained, at least in part, by genetic factors. Accordingly, the specific combination of genetic factors in an individual can determine the outcome of the disease. In this regard, GWAS data can be used to predict the disease phenotype of an individual. Phenotype prediction has been largely investigated in RA with respect to disease severity and drug response.
In fact, the nature of progressive joint damage in RA varies among individuals. Patients with more rapid progression of joint damage apparently need more extensive therapy, such as the early use of biologics. Disease severity can be monitored by scoring the degree of joint damage using radiographic imaging. Regression analyses can be performed to test the associations between changes in radiologic scores and variant genotypes. The most extensively investigated gene so far is HLA-DRB1, which appears to have the strongest effect on disease severity.54) Potential associations between various candidate genes and disease severity have also been reported.55)–57) A recent analysis involving a Japanese cohort demonstrated that polymorphisms in PADI4 are associated with radiographic progression of RA.58) Furthermore, a GWAS on the radiological progression rate in autoantibody-positive RA patients recently identified an association with a SNP in the SPAG16 gene, which is expressed in the synovial tissues of RA patients.59) Although these pieces of evidence indicate that genetic variants can influence the severity of RA, individual alleles of these genes could not sufficiently predict disease severity for use in clinical practice.
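The regression analysis mentioned above can be sketched, in its simplest additive form, as an ordinary least-squares fit of the change in radiographic score on the number of risk alleles carried (0, 1 or 2). The data are hypothetical, and real analyses would typically adjust for covariates such as disease duration, which this sketch omits.

```python
def least_squares(x, y):
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical patients: risk-allele dosage and yearly change in radiographic score.
dosage = [0, 0, 1, 1, 2, 2]
score_change = [1.0, 2.0, 3.5, 4.0, 6.0, 6.5]

slope, intercept = least_squares(dosage, score_change)
print("estimated change in progression per risk allele:", slope)
```

The slope is the per-allele effect on progression; a test of whether it differs from zero is the association test for that variant.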
Another important clinical issue with genetic information is individual drug response. The advances in biologic therapies such as treatment with anti-TNF antibodies have dramatically improved the treatment of RA. Although a substantial proportion of patients (20–40%) will not respond to a certain biologic therapy, some patients who do not respond to one biologic therapy (e.g., anti-TNF antibodies) may respond to another (e.g., anti-IL-6R antibodies). Accordingly, if the response to a certain biologic agent can be predicted using genetic information, unnecessary cost and potential side effects can be reduced. Data from several GWASs examining the response to anti-TNF antibody therapy have suggested associations between drug response and genes involved in signaling, including the CD84 locus.60),61) However, similar to the prediction of disease severity, individual loci cannot fully predict the drug response of an individual. So far, the results obtained for predictions of disease severity and drug response clearly indicate that single genetic factors are not powerful predictors of clinical phenotype, and that combinatorial approaches using multiple genetic factors and possible environmental information must be established.
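One simple form that a combinatorial approach could take is an additive score that sums risk-allele counts across loci, each weighted by its log odds ratio. The sketch below is an illustration only and is not a method described in this review; the SNP identifiers, weights and genotypes are hypothetical.

```python
import math


def additive_score(weights, dosages):
    """Weighted sum of risk-allele dosages (0, 1 or 2) over all scored variants."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weight * dosages[snp] for snp, weight in weights.items())


# Hypothetical per-allele odds ratios, converted to log odds ratio weights.
weights = {
    "snp_a": math.log(1.5),
    "snp_b": math.log(1.2),
    "snp_c": math.log(1.1),
}

# Hypothetical genotype of one patient, coded as risk-allele counts.
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

print(additive_score(weights, patient))
```

Environmental information such as smoking status could enter the same kind of model as an additional weighted term, although the weights would need to be estimated and validated in independent cohorts.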
Kazuhiko Yamamoto graduated from the University of Tokyo School of Medicine in 1977. After finishing his training and residency in internal medicine and rheumatology, he conducted basic research in the laboratory of Prof. T. Tada in Tokyo and Prof. G. Haemmerling in Heidelberg. He then started his own research projects and also served as a clinical rheumatologist at the University of Tokyo Hospital. He then moved to St. Marianna University School of Medicine as an associate professor and to the Medical Institute of Bioregulation, Kyushu University as a professor. He is now a professor and chairman of the Department of Allergy and Rheumatology, the University of Tokyo Graduate School of Medicine. He is also serving as a team leader of the Laboratory of Autoimmune Diseases, Center for Integrative Medical Sciences, RIKEN. Dr. Yamamoto’s research interests include characterization of newly identified regulatory T cells and genetic analyses of rheumatoid arthritis and other autoimmune diseases.
Yukinori Okada was born in Shizuoka in 1980. He graduated from the University of Tokyo in 2005, and received his Ph.D. from the University of Tokyo in 2011. He worked as a research fellow at the Brigham and Women’s Hospital at Harvard Medical School and the Broad Institute. Since 2013, he has been a Tenure-track Junior Associate Professor at Tokyo Medical and Dental University. He has multiple backgrounds as a rheumatologist, a statistician, and a bioinformatician. His research theme is the elucidation of the mechanisms by which genetic variants in populations affect complex human diseases. Through active collaboration with researchers in genetics, Dr. Okada has conducted numerous genome-wide association studies and identified a number of novel genetic risk loci. Dr. Okada also has experience in analyzing genetic polymorphisms in the major histocompatibility complex (MHC) region and HLA genes, which have substantial impacts on the genetics of human diseases. His recent research interests include the development of novel analytic methods in genetics and bioinformatics that integrate Big Data obtained by the latest high-throughput technologies, such as next generation sequencing. Application of his methods has contributed to novel drug discovery, disease biomarker identification, and personalized medicine (Okada Y. et al. Nature 2014, Nature Genetics 2015). He received the President award of the University of Tokyo in 2012, the 49th Baelz award (second) in 2012, and the Young Scientists’ Prize, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2015.
Akari Suzuki was born in Kitakyushu in 1972. She graduated from Kyushu University in 1994. She received her Ph.D. degree in 2001 from Kyushu University under the supervision of Prof. Kenshi Hayashi. In 2001, she moved to RIKEN, where Prof. Kazuhiko Yamamoto was a leader in the Laboratory for Rheumatic Diseases in the SNP Research Center. She has principally focused on the identification of genetic risk factors using case-control association studies and has contributed greatly to our understanding of causative genes for rheumatoid arthritis. Peptidylarginine deiminase type 4, which she identified in 2003, is known as one of the genetic risk factors of rheumatoid arthritis (RA). She also showed that deficiency of the PADI4 gene suppressed arthritis scores in a mouse model of RA. Her recent research theme is enhancing understanding of physiological disease mechanisms by integrating GWAS and RNA-seq gene expression data for functional characterization of disease-causative variants. She was awarded the 49th Baelz award (second) in 2012.
Yuta Kochi was born in Regina, Canada, and raised in Yokohama. He graduated from the University of Tokyo in 1999. After the residency programs in internal medicine and rheumatology at the University of Tokyo Hospital, he received his Ph.D. from the University of Tokyo in 2005. Currently, he is a Deputy Team Leader of the Laboratory for Autoimmune Diseases at the Center for Integrative Medical Sciences, RIKEN. His area of interest is the genetics of autoimmune diseases and their pathological mechanisms. He received the Young Scientists’ Prize, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2014. He is also an ancient history enthusiast who enjoys visiting ancient monuments around the world.