2021 Volume 70 Issue 2 Pages 44-50
SARS-CoV-2 whole-genome sequencing of samples from COVID-19 patients is useful for informing infection control. Datasets of these genomes assembled from multiple hospitals can give critical clues to regional or national trends in infection. Herein, we report a lineage summary based on data collected from hospitals located in the Tokyo metropolitan area. We performed SARS-CoV-2 whole-genome sequencing of specimens from 198 patients with COVID-19 at 13 collaborating hospitals located in the Kanto region. Phylogenetic analysis and fingerprinting of the nucleotide substitutions were performed to differentiate and classify the viral lineages. More than 90% of the identified strains belonged to Clade 20B, which has been prevalent in European countries since March 2020. Only two lineages (B.1.1.284 and B.1.1.214) were found to be predominant in Japan. However, one sample from a COVID-19 patient admitted to a hospital in the Kanto region in November 2020 belonged to the B.1.346 lineage of Clade 20C, which has been prevalent in the western United States since November 2020. The patient had no history of overseas travel or any known contact with anyone who had travelled abroad. Consequently, the Clade 20C strain belonging to the B.1.346 lineage appeared likely to have been imported from the western United States to Japan across the strict quarantine barrier. B.1.1.284 and B.1.1.214 lineages were found to be predominant in the Kanto region, but a single case of the B.1.346 lineage of clade 20C, probably imported from the western United States, was also identified. These results illustrate that a decentralized network of hospitals offers significant advantages as a highly responsive system for monitoring regional molecular epidemiologic trends.
SARS-CoV-2 whole-genome sequencing of samples from COVID-19 patients is useful for informing infection control.1 When multiple COVID-19-positive cases occur simultaneously in a hospital, it is critical to determine whether newly diagnosed patients have nosocomial infection or community infection. For nosocomial infection, a thorough contact tracing of healthcare workers and inpatients is essential, whereas, in the case of community infection, such persistent intrahospital surveillance may not be necessary.
Datasets of whole viral genomes assembled from multiple hospitals can give critical clues to regional or national trends. The national surveillance system developed in the United Kingdom clarified that more than 1000 lineages had spread during the pre-lockdown period of high travel volumes and few restrictions on international travel.2 Recently, genomic surveillance systems have successfully identified two variants of concern, i.e., 501Y.V1 (B.1.1.7 originating in the UK) and 501Y.V2 (B.1.351 originating in South Africa), both of which spread rather quickly.3,4 In Japan, five laboratories (including our group) have deposited whole viral genome data in the public sequence database GISAID (https://www.gisaid.org/): a total of 9943 sequences have been deposited, and some results of national surveys have been summarized.5,6 These studies indicated the importance of SARS-CoV-2 genome sequencing analysis for the prevention of outbreaks during the early pandemic period (March to April 2020).
In the present study, SARS-CoV-2 whole-genome data were generated using specimens from 198 patients with COVID-19 admitted to 13 collaborating hospitals located in the Tokyo metropolitan area. Data from the specimens were accumulated and utilized for infection control and showed that the predominant lineages of the SARS-CoV-2 strains were limited to only two closely related lineages, B.1.1.284 and B.1.1.214. This finding indicates that the quarantine system has been relatively successful. However, one strain derived from a patient with nosocomial COVID-19 infection belonged to the B.1.346 lineage, which has been prevalent in the western United States of America (USA) since winter 2020. We suspect that this episode is attributable to undetected imported infections from the USA.
The study protocol was approved by the Institutional Review Board of Keio University School of Medicine (approval number: 20200062) in association with each collaborating hospital's review board and was conducted according to the principles of the Declaration of Helsinki.
DNA sequencing method
Whole viral genome sequences were determined as described previously.1 Polymerase chain reaction-based amplification was performed using Artic ncov-2019 primers in addition to a number 72-mutant primer, version 3 (https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V3/nCoV-2019.tsv) in two multiplex reactions according to the globally accepted nCoV-2019 sequencing protocol (https://www.protocols.io/view/ncov-2019-sequencingprotocol-bbmuik6w). The sequencing library for amplicon sequencing was prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA). Paired-end sequencing was performed on the MiSeq platform (Illumina, San Diego, CA, USA). The bioinformatic pipeline used in this study, the mutation calling pipeline for amplicon-based sequencing of the SARS-CoV-2 viral genome, is available at https://cmg.med.keio.ac.jp/sars-cov-2/. All single nucleotide substitutions, including non-synonymous and synonymous mutations, were annotated using ANNOVAR software and were assessed using VarSifter (https://research.nhgri.nih.gov/software/VarSifter/). The analytic protocol was described previously in our publication submitted on November 24, 2020, and is available on the preprint server medRxiv.7
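As an illustration of what calling single nucleotide substitutions against a reference involves, the following is a minimal sketch in Python. It is not the pipeline available at the URL above: it assumes that the sample consensus has already been aligned to the reference (gaps written as "-"), and the function name `call_substitutions` is hypothetical.

```python
from typing import List, Tuple

# (1-based reference position, reference base, sample base)
Substitution = Tuple[int, str, str]


def call_substitutions(ref_aln: str, sample_aln: str) -> List[Substitution]:
    """List single nucleotide substitutions between two aligned sequences.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use "-" for gaps. Ambiguous bases (e.g. N) and
    gaps are skipped rather than reported.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have the same length")

    valid = set("ACGT")
    calls: List[Substitution] = []
    ref_pos = 0
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r == "-":
            # Insertion relative to the reference: no reference coordinate.
            continue
        ref_pos += 1
        if r in valid and s in valid and r != s:
            calls.append((ref_pos, r, s))
    return calls


if __name__ == "__main__":
    # Toy alignment, not real SARS-CoV-2 sequence.
    print(call_substitutions("ACGTAC-GT", "ATGTNCAGA"))
```

Positions are counted on the reference only, so an insertion in the sample does not shift the coordinates of later substitutions.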
Genetic clade or lineage naming in phylogenetic tree analyses
Phylogenetic tree analysis was performed locally using the Augur program available from Nextstrain (https://nextstrain.org/) and genome sequence data obtained in the current study as well as data available from the global database EpiCoV hosted at GISAID.8 Nextclade (https://clades.nextstrain.org/) was also used to generate fingerprinting patterns to visualize the SARS-CoV-2 sequence alignments and similarities/identities among samples.
We used the international genetic clade nomenclature system defined by Nextstrain.org (https://virological.org/t/updated-nextstain-sars-cov-2-clade-naming-strategy/581) (Fig. 1). We also used the software Phylogenetic Assignment of Named Global Outbreak Lineages (Pangolin; https://cov-lineages.org/index.html) to assign viral lineages in an automatic and precise manner.9
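The dotted lineage names assigned by Pangolin encode ancestry, which is why B.1.1.284 and B.1.1.214 can be read as sister lineages. The following Python sketch shows this reading of the names; it is an illustration only, the helper names are hypothetical, and it assumes names without alias prefixes (aliased names would first need to be expanded, which is not done here).

```python
from typing import List


def lineage_path(lineage: str) -> List[str]:
    """Return a lineage and all of its ancestors, oldest first.

    Example: "B.1.1.284" -> ["B", "B.1", "B.1.1", "B.1.1.284"].
    """
    parts = lineage.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def common_ancestor(a: str, b: str) -> str:
    """Return the deepest lineage name shared by both paths ("" if none)."""
    shared = ""
    for x, y in zip(lineage_path(a), lineage_path(b)):
        if x != y:
            break
        shared = x
    return shared


if __name__ == "__main__":
    print(common_ancestor("B.1.1.284", "B.1.1.214"))  # the two predominant lineages
    print(common_ancestor("B.1.1.284", "B.1.346"))    # predominant vs. imported lineage
```

Under this reading, the two predominant lineages share the parent B.1.1, whereas B.1.346 shares only B.1 with them.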
Fig. 1. International genetic clades defined by Nextstrain.org. The Wuhan strain was defined as 19A, from which the larger clades branch in a tree-like manner. The currently prevalent British mutant strain was defined as 20I (501Y.V1) and the South African strain as 20H (501Y.V2).
Whole viral genome sequencing was carried out on samples from 198 COVID-19 patients collected between March 2020 and November 2020. In total, 189 (95.5%) of the viral genomes belonged to Clade 20B of the Nextstrain clade system. Ninety-eight of 189 (51.9%) were classified as B.1.1.284 (referred to as Clade 20B-T in our previous publication7) whereas 60 (31.7%) were classified as B.1.1.214. These two lineages differ by six nucleotide substitutions. The total numbers of samples in the GISAID database corresponding to B.1.1.284 and B.1.1.214 are 4183 and 1070, respectively, as has been previously reported.7 The vast majority of B.1.1.284 and B.1.1.214 cases were found in Japan. Outside Japan, B.1.1.284 and B.1.1.214 are extremely rare: only eight B.1.1.284 samples have been detected (two in Thailand, two in South Korea, two in Australia, one in Singapore, and one in Hong Kong), and only three B.1.1.214 samples have been detected (two in Australia and one in the USA). Among our 198 samples, other lineages of Clade 20B were also detected: 15 from the B.1.1.285 lineage, 10 from the B.1.1.114 lineage, 4 from the B.1.1.119 lineage, and 1 from the B.1.1.163 lineage.
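The percentages quoted above follow directly from the reported counts. A short Python check, using only the numbers given in the text (the helper name `share` is hypothetical), is:

```python
def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


if __name__ == "__main__":
    TOTAL_GENOMES = 198   # all sequenced samples
    CLADE_20B = 189       # genomes assigned to Clade 20B

    print("Clade 20B among all genomes:", share(CLADE_20B, TOTAL_GENOMES))  # 95.5
    print("B.1.1.284 within Clade 20B:", share(98, CLADE_20B))              # 51.9
    print("B.1.1.214 within Clade 20B:", share(60, CLADE_20B))              # 31.7
```

Note that the lineage percentages use the 189 Clade 20B genomes as the denominator, not all 198 samples.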
The B.1.346 lineage belonging to clade 20C that likely emerged in New York10 was detected in one patient with confirmed COVID-19 infection at a hospital in the Kanto region in November 2020 (Fig. 2, B.1.346). The patient had no history of overseas travel or known contact with anyone who had travelled abroad. This strain has 18 single nucleotide substitutions compared with the original Wuhan SARS-CoV-2 sequence (ID. NC_045512.2). The functional relevance of the single nucleotide substitutions is summarized in Table 1.
Fig. 2. Relationship between Japan and the world in terms of Clade 20C. Internationally, more than 90% of Clade 20C cases have been detected in the USA. The whole genome sequence of a sample collected from one COVID-19 positive patient at a hospital in the Kanto area at the end of November 2020 was found to be very similar to a virus strain prevalent in the western United States (a member of 20C, B.1.346), according to phylogenetic tree analysis.
Position in reference to the Wuhan strain | Reference allele | Variant allele | Protein | Amino acid change | Annotation | Note |
---|---|---|---|---|---|---|
241 | C | T | 5′UTR | NA | NA | |
1059 | C | T | NSP2 | T85I | Non-structural protein 2 | 20C signature |
3037 | C | T | NSP3 | F106F | Predicted phosphoesterase, papain-like proteinase | 20A signature |
6948 | A | C | NSP3 | N1410T | Predicted phosphoesterase, papain-like proteinase | Common among five strains from the western USA |
12820 | A | G | NSP9 | L45L | ssRNA-binding protein | |
14408 | C | T | NSP12b | P314L | RNA-dependent RNA polymerase, post-ribosomal frameshift | 20A signature |
14587 | G | T | NSP12b | A374S | RNA-dependent RNA polymerase, post-ribosomal frameshift | |
15324 | C | T | NSP12b | N619N | RNA-dependent RNA polymerase, post-ribosomal frameshift | Common among five strains from the western USA |
15403 | T | C | NSP12b | R645R | RNA-dependent RNA polymerase, post-ribosomal frameshift | Common among five strains from the western USA |
16762 | C | T | NSP13 | L176F | Helicase | |
17637 | A | G | NSP13 | K467K | Helicase | |
17880 | A | G | NSP13 | Q548Q | Helicase | |
20762 | C | T | NSP16 | T35I | 2′-O-Ribose methyltransferase | |
23403 | A | G | S | D614G | Spike | 20A signature |
24337 | C | T | S | N925N | Spike | Common among five strains from the western USA |
25563 | G | T | ORF3a | Q57H | ORF3a protein | 20C signature |
26735 | C | T | M | Y71Y | Membrane | Common among four strains from the western USA |
28887 | C | T | N | T205I | Nucleocapsid protein | |
NSP, non-structural polyprotein; ORF, open reading frame; ssRNA, single-strand RNA; NA, not assessed.
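The residue numbers in the amino acid change column can be related to the genome positions by simple arithmetic once the start coordinate of the gene is known. The Python sketch below illustrates this for four structural and accessory genes. The gene start coordinates are an assumption taken from the GenBank annotation of NC_045512.2 rather than from this article, and the names `GENE_STARTS` and `codon_number` are hypothetical. Proteins cleaved from ORF1ab (the NSP rows) use their own numbering and are not covered.

```python
# 1-based genome coordinate of the first base of each start codon.
# Assumption: values from the GenBank annotation of NC_045512.2.
GENE_STARTS = {
    "S": 21563,
    "ORF3a": 25393,
    "M": 26523,
    "N": 28274,
}


def codon_number(gene: str, genome_pos: int) -> int:
    """Return the 1-based codon (residue) number containing `genome_pos`.

    Only the start coordinate is checked; the caller is responsible for
    ensuring that the position lies before the end of the gene.
    """
    start = GENE_STARTS[gene]
    if genome_pos < start:
        raise ValueError(f"position {genome_pos} lies upstream of {gene}")
    return (genome_pos - start) // 3 + 1


if __name__ == "__main__":
    # (gene, genome position, amino acid change as listed in Table 1)
    rows = [
        ("S", 23403, "D614G"),
        ("S", 24337, "N925N"),
        ("ORF3a", 25563, "Q57H"),
        ("M", 26735, "Y71Y"),
        ("N", 28887, "T205I"),
    ]
    for gene, pos, change in rows:
        print(gene, pos, change, "-> codon", codon_number(gene, pos))
```

For each of these five rows, the computed codon number agrees with the residue number given in the table.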
In total, 159 strains belonging to the B.1.346 lineage have been deposited in the GISAID database (see Acknowledgment Table). Most of these strains were found in the western United States. Detailed examination of the phylogenetic tree revealed that four strains were very closely related (Fig. 2). The genomic data strongly suggest that this strain was imported into Japan across the quarantine barrier. GISAID data from Japan indicate that the last clade 20C strains detected in Japan were in March and May 2020, but none were detected thereafter, except for quarantine cases. These findings support the notion that the newly identified clade 20C strain in this study was indeed imported from overseas.
Comparison of the fingerprinting patterns of the B.1.346 lineage strains detected in Japan in this study, those detected in the western United States, and a representative clade 20C strain detected in Japan in the spring of 2020 (Fig. 3) further supports the notion that the strain of B.1.346 lineage belonging to clade 20C was imported into Japan across the quarantine barrier from the western United States. There is no evidence that the newly detected clade 20C strain in the Kanto region shows altered transmissibility or virulence, unlike variants of concern such as 501Y.V1 (B.1.1.7) in the UK and 501Y.V2 (B.1.351) in South Africa.
Fig. 3. Nextclade analysis of B.1.346 showing a fingerprinting diagram with past Japanese 20C strains. The B.1.346 strain found in this study in November 2020 was very similar to five strains from the western United States in terms of mutation sites. Compared with strains belonging to Clade 20C found in domestic quarantine cases around May 2020, the majority of the strains have different mutation sites, although there are some similarities.
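The idea behind such a fingerprint comparison can be sketched with set operations in Python. This is an illustration rather than the Nextclade procedure: the 18 substitutions of the Kanto strain are taken from Table 1, whereas `HYPOTHETICAL_STRAIN` is invented for demonstration and is not a GISAID record; the function name `compare_fingerprints` is likewise hypothetical.

```python
from typing import Dict, Set

# Substitutions of the B.1.346 strain from this study (Table 1),
# written as <reference base><position><variant base>.
KANTO_B1346: Set[str] = {
    "C241T", "C1059T", "C3037T", "A6948C", "A12820G", "C14408T",
    "G14587T", "C15324T", "T15403C", "C16762T", "A17637G", "A17880G",
    "C20762T", "A23403G", "C24337T", "G25563T", "C26735T", "C28887T",
}

# Hypothetical comparison strain, invented for illustration only.
HYPOTHETICAL_STRAIN: Set[str] = {
    "C241T", "C1059T", "C3037T", "A6948C", "C14408T", "C15324T",
    "T15403C", "A23403G", "C24337T", "G25563T", "C26735T", "G29000A",
}


def compare_fingerprints(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Split two substitution sets into shared and strain-specific sites."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    result = compare_fingerprints(KANTO_B1346, HYPOTHETICAL_STRAIN)
    for key in ("shared", "only_a", "only_b"):
        print(key, len(result[key]), sorted(result[key]))
```

A large shared set with few strain-specific sites corresponds to the visually similar fingerprints described in the caption above.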
From the lineage analysis of 198 whole viral genomes accumulated from 13 hospitals in the Kanto region, we draw two conclusions. First, most of the strains belonged to B.1.1.284 and B.1.1.214. Second, at least one strain was imported from the USA. We could not determine the importation route of this foreign strain (B.1.346), but we suspect that it slipped through airport quarantine and then spread from person to person.
The fact that only two strains were predominant in the Kanto region supports the notion that Japan’s quarantine system was relatively successful after the national lockdown in April and May of 2020. The observation of an extremely limited number of predominant lineages (i.e., only two) is in sharp contrast to the situation in the UK and other countries where numerous strains are prevalent.11,12 In view of the emergence of potentially virulent strains (501Y) in the UK and South Africa, a further thorough and strict quarantine system is warranted in Japan.
The implications of the detection in Japan of a viral strain prevalent in the western United States are twofold. First, no quarantine policy is perfect. If the policy is adjusted to become less strict, close follow-up genomic monitoring will be mandatory. Because of the internationally unprecedented uniformity of the domestic viral strains in Japan, detection of foreign strains may be relatively easy, as exemplified by the single patient in the current report. Second, if foreign strains do enter Japan in the near future, it may be relatively easy to pinpoint the geographic origin of the incoming strain by the number of nucleotide substitutions in the SARS-CoV-2 genome.
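One simple way to act on the number of nucleotide substitutions is to rank candidate reference strains by how many substitutions separate them from the query genome. The Python sketch below illustrates this under stated assumptions: the reference names and their substitution sets are illustrative toy data, the function names are hypothetical, and the distance ignores sequencing coverage and phylogeny, so it is no substitute for the phylogenetic tree analysis used in this study.

```python
from typing import Dict, List, Set, Tuple


def substitution_distance(a: Set[str], b: Set[str]) -> int:
    """Number of substitutions present in exactly one of the two genomes."""
    return len(a ^ b)


def closest_references(
    query: Set[str], references: Dict[str, Set[str]], top: int = 3
) -> List[Tuple[str, int]]:
    """Rank reference strains by substitution distance to the query.

    Ties are broken alphabetically by name so the output is deterministic.
    """
    ranked = sorted(
        ((name, substitution_distance(query, subs)) for name, subs in references.items()),
        key=lambda item: (item[1], item[0]),
    )
    return ranked[:top]


if __name__ == "__main__":
    # Toy data for illustration; names and sets are not GISAID records.
    query = {"C241T", "C1059T", "A23403G", "G25563T"}
    references = {
        "ref_west_usa": {"C241T", "C1059T", "A23403G", "G25563T", "C24337T"},
        "ref_japan_20b": {"C241T", "A23403G", "G28881A", "G28882A", "G28883C"},
    }
    print(closest_references(query, references))
```

With uniform domestic strains, an imported genome would be expected to sit at a large distance from every domestic reference and at a small distance from references in its region of origin.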
Overall, the viral genomic monitoring system for in-hospital infections reported herein enabled us to swiftly unravel regional and national trends. In Japan, a national centralized network system composed of the National Institute of Infectious Diseases in Tokyo, public health centers, and public health institutes was put in place for the purpose of viral genome surveillance. This report illustrates that an agile decentralized network composed of multiple hospitals mutually sharing molecular genomic data in near real-time provides robust benefits to public health under the conditions of the present COVID-19 pandemic. Molecular genomic data obtained through such a system can be swiftly reflected in the national decision-making process for public health practices, including the strictness of quarantine measures or the implementation of lockdowns in response to the detection of SARS-CoV-2 variants of concern.
We downloaded the full nucleotide sequences of the SARS-CoV-2 genomes from the GISAID database (https://www.gisaid.org/). A table of the contributors is available below (Acknowledgment Table). We have uploaded the full nucleotide sequences of our cohort to the GISAID database.
We thank all the patients and healthcare workers who have fought against COVID-19. This work was supported by the Keio Donner Project and is devoted to the late Professor Shibasaburo Kitasato, the founder of Keio University School of Medicine. We also thank SRL, Inc. Funding provided by Keio Gijuku Academic Development Funds and the Japan Agency for Medical Research and Development (AMED JP20he0622043, K.K. as the Lead) was used to acquire consumables and to carry out deep sequencing of viral genomes. The cost of consumables was also supported in part by Ryoshoku-Kenkyu-kai, which aims to tackle infectious disease control (M.S.). The design and data analyses of this study were performed independently of the funding agencies.
The authors have no conflicts of interest to report.