The Tohoku Journal of Experimental Medicine
Online ISSN : 1349-3329
Print ISSN : 0040-8727
ISSN-L : 0040-8727
Regular Contribution
Comparison of Insertion, Deletion, and Point Mutations in the Genomes of Human Adenovirus HAdvC-2 and SARS-CoV-2
Tetsuya Akaishi

2022 Volume 258 Issue 1 Pages 23-27

Abstract

Virus genome mutation profiles with insertion, deletion, and point mutations have recently been revealed to differ remarkably between viruses. In RNA viruses such as human coronaviruses or influenza viruses, genome samples collected two to three decades apart usually show point mutations in 10-20% of the bases, while the rate of insertion and/or deletion mutations (indels) depends largely on the virus. This study evaluates the mutation profiles of DNA viruses by comparing a recently sampled genome of human adenovirus species C type 2 (isolate SG06/HAdvC2/2016) with a genome of the same species sampled in the 1970s. The comparison identified insertions totaling 23 bases at seven sites and deletions totaling 22 bases at nine sites. The longest indels were 6-base insertions in E2B and L4. All indels in the coding regions were in-frame mutations with lengths in multiples of three bases. In the non-coding regions, the lengths of the indels ranged from 1 to 4 consecutive bases. Long indels with more than 10 consecutive bases, which comprise nearly half of the indels in the SARS-CoV-2 genome, were absent. Outside the indel sites, the point mutation rate was approximately 0.3%, which was significantly lower than in RNA viruses. In summary, the estimated point mutation rate in human adenovirus species C type 2 (HAdvC-2) was over 10 times lower than in RNA viruses. Unlike the relatively long indels in the SARS-CoV-2 genome, the indels in HAdvC-2 were short, with 6 or fewer consecutive bases.

Introduction

The genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has recently been reported to include several medium-to-long insertion and/or deletion mutations (indels) with 10 or more consecutive bases (Andersen et al. 2020; Akaishi 2022). The long indels in the SARS-CoV-2 genome are unevenly distributed across the genome and concentrated in the non-structural protein 3 and spike (S) genes. Among these indels, one of approximately 40 consecutive bases at the S1/S2 junction has been considered a key player in the acquisition of enhanced transmissibility of the virus to humans by creating a discriminative polybasic cleavage motif (Wrobel et al. 2020; Sasaki et al. 2021; Naveca et al. 2022). Although indels may have contributed significantly to the evolutionary process of SARS-CoV-2, the characteristics of indels in the genomes of other DNA and RNA viruses, such as their length and distribution, remain generally unknown. Between two RNA viruses of the same species sampled two to three decades apart, it is thought that 10-20% of bases will display point mutations and 0-5% of bases will be affected by indels (Akaishi 2022). However, the exact rate and size of indels in DNA viruses, including the human adenovirus (HAdv), are currently unknown. Therefore, this study compared two HAdv genomes with the aim of clarifying the viral mutation profile and estimating the potential role of indels in the evolutionary process of DNA viruses.

Methods

Genome sequences

The sequences of the two human adenovirus species C type 2 (HAdvC-2) genomes were obtained from the GenBank database of the US National Institutes of Health (https://www.ncbi.nlm.nih.gov/genbank/). The selected isolate of the more recent HAdvC-2, SG06/HAdvC2/2016 (GenBank: MN513342.1) (Coleman et al. 2020), was sampled in Singapore in the years 2012-2015. The earlier reference sequence was sampled and combined in the 1970s (GenBank: J01917.1) (Zain et al. 1979a, b; Hérissé et al. 1981; Gingeras et al. 1982).

Point mutation rate

The point mutation rate in the SG06/HAdvC2/2016 genome was calculated by dividing the number of bases with point mutations by the total number of bases, after excluding all confirmed sites with indels. To evaluate how the point mutation rate differs by position in the genome, the rolling average of point mutations (± 50 bases) was calculated at each base position and displayed as a line graph of the point mutation rate across the whole genome of the virus. The point mutation rate in HAdvC-2 was then compared with the rates in SARS-CoV-2 and influenza A viruses (Akaishi 2022).
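The calculation described above can be sketched in Python as follows. This is only an illustrative sketch, not the author's actual procedure: it assumes the two genomes are already aligned with gaps written as '-', the function name `point_mutation_profile` is hypothetical, the toy sequences are invented, and truncating the window at the genome ends is an assumption the text does not specify.

```python
def point_mutation_profile(ref, query, half_window=50):
    """Return (overall_rate, rolling_rates) for two aligned sequences.

    Columns with a gap ('-') in either sequence are treated as indel
    sites and excluded before substitutions are counted.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    # 1 = substituted base, 0 = identical base; gap columns are dropped.
    flags = [
        int(r.lower() != q.lower())
        for r, q in zip(ref, query)
        if r != "-" and q != "-"
    ]
    if not flags:
        raise ValueError("no comparable positions")
    overall = sum(flags) / len(flags)

    # Prefix sums let each window average be read off in constant time.
    prefix = [0]
    for flag in flags:
        prefix.append(prefix[-1] + flag)

    rolling = []
    for i in range(len(flags)):
        lo = max(0, i - half_window)                # window is truncated
        hi = min(len(flags), i + half_window + 1)   # at the sequence ends
        rolling.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return overall, rolling


# Toy alignment (hypothetical, not data from the study).
ref_aln = "acgtacgt--acgt"
qry_aln = "acgtaagtttacgc"
rate, profile = point_mutation_profile(ref_aln, qry_aln, half_window=2)
```

With `half_window=50`, each value in `profile` is the fraction of substituted bases among up to 101 neighboring positions, which corresponds to the line graph described in the text.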

Statistical analyses

The distribution of the size of indels in HAdvC-2 or SARS-CoV-2 was described using the median and interquartile range (IQR; 25th-75th percentiles). The distributions of the size of indels in the two viruses were compared using the Mann-Whitney U test. The point mutation rate and the ratio of indels to point mutations were evaluated using either a chi-square test or Fisher’s exact test, according to the number of bases with each type of mutation. Statistical significance was set at p < 0.05. Statistical analyses were performed using R Statistical Software (version 4.0.5; R Core Team, Vienna, Austria).
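As a rough illustration of the descriptive statistics and the rank-based comparison named above, the following Python sketch computes a median with IQR and the Mann-Whitney U statistic using only the standard library. The study itself used R; the function names and the two toy samples here are hypothetical, and the "inclusive" quantile method is an assumption chosen because it should match the default quantile definition in R.

```python
from statistics import median, quantiles


def median_iqr(sizes):
    """Return (median, 25th percentile, 75th percentile) of indel sizes."""
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    return median(sizes), q1, q3


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical indel sizes in bases (toy values, not the study data).
group_a = [1, 2, 2, 3, 3, 6]
group_b = [3, 7, 8, 23, 40]

summary_a = median_iqr(group_a)
u_b = mann_whitney_u(group_b, group_a)
u_a = mann_whitney_u(group_a, group_b)
```

The two U values always sum to the number of pairs (`len(group_a) * len(group_b)`); a U close to that maximum means one sample tends to hold the larger values. Obtaining a p-value would require the null distribution of U, for which a library routine such as `scipy.stats.mannwhitneyu` is normally used.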

Results

Indels in HAdvC-2

Compared with the HAdvC-2 genome from the 1970s, the genome of SG06/HAdvC2/2016 had a total of 16 sites with indels: 7 sites with insertions for a total of 23 bases and 9 sites with deletions for a total of 22 bases. The base sequences of each indel site are listed in Table 1. The longest indels were 6-base insertions in E2B and L4. All indels in the coding regions had lengths in multiples of three bases and therefore did not cause frameshift mutations. The length of indels in the non-coding regions ranged from 1 to 4 consecutive bases, and there were no indels longer than 6 consecutive bases across the entire HAdvC-2 genome. Some of the indels were associated with repeated sequences, such as 5′-cttcttcttctt-3′ (base 9,375-9,386; deletion of “ctt”), 5′-gatgatgatgat-3′ (base 16,661-16,672; deletion of “gat”), or 5′-ccaccacca-3′ (base 28,437-28,445; deletion of “cca”). The median (IQR) of the size of indels (i.e., insertion, deletion, or insertion-and-deletion) in SARS-CoV-2 was 7.5 (3-23) bases, whereas that in HAdvC-2 was 2 (2-3) bases. The distributions of the size of indels in HAdvC-2 and SARS-CoV-2 are shown in Fig. 1a. The size of indels was significantly larger in SARS-CoV-2 than in HAdvC-2 (p = 0.0020, Mann-Whitney U test). Examples of indel sites in the genomes of HAdvC-2 and SARS-CoV-2 are shown in Fig. 1b. The HAdvC-2 example is a 6-base in-frame insertion, and the SARS-CoV-2 example is an in-frame insertion-and-deletion mutation in which consecutive bases are exchanged and the involved sequence is shortened from 10 to 7 nucleotides.
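The in-frame criterion and the repeat association described above can be checked with a short Python sketch. The helper names `is_in_frame` and `tandem_repeat_count` are hypothetical; the three sequence stretches and deleted units are the ones quoted in the text, and the sketch only tests whether each quoted stretch is a pure tandem repeat of the deleted unit.

```python
def is_in_frame(indel_length):
    """An indel preserves the reading frame when its length is a multiple of 3."""
    return indel_length % 3 == 0


def tandem_repeat_count(region, unit):
    """Return the number of copies if region is a pure tandem repeat of unit, else 0."""
    if not unit or len(region) % len(unit) != 0:
        return 0
    copies = len(region) // len(unit)
    return copies if unit * copies == region else 0


# Repeat-associated deletions quoted in the text: (stretch, deleted unit).
repeat_sites = [
    ("cttcttcttctt", "ctt"),
    ("gatgatgatgat", "gat"),
    ("ccaccacca", "cca"),
]

for stretch, unit in repeat_sites:
    copies = tandem_repeat_count(stretch, unit)
    print(f"{stretch}: {copies} copies of '{unit}', "
          f"in-frame deletion: {is_in_frame(len(unit))}")
```

Because each repeat unit here is three bases long, removing one copy changes the sequence length by a multiple of three, which is why these deletions do not shift the reading frame.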

Table 1.

List of indels in the genome of human adenovirus species C type 2 (HAdvC-2).

A total of 16 indel sites were confirmed in the genome of SG06/HAdvC2/2016. The reference viral genome was obtained from HAdvC-2 sampled in the 1970s (GenBank: J01917.1). All indels in the coding regions were in-frame mutations formed of multiples of three bases. All indels were short with 6 or fewer consecutive bases. There were no medium-to-large indels.

Fig. 1.

Indels and point mutation rate in the genomes of HAdvC-2 and SARS-CoV-2.

The genome-wide mutation profiles in the genomes of HAdvC-2 and SARS-CoV-2 are shown. (a) Violin plots of the size of each indel in the genomes of HAdvC-2 and SARS-CoV-2. The size of indels was significantly larger in SARS-CoV-2 than in HAdvC-2. (b) Examples of indels in the genomes of HAdvC-2 and SARS-CoV-2. The example in the HAdvC-2 genome is a 6-base in-frame insertion, and the example in the SARS-CoV-2 genome is an in-frame insertion-and-deletion mutation in which the involved sequence is shortened from 10 to 7 nucleotides. (c) Genome-wide point mutation rate and distribution of indels in the HAdvC-2 genome. The solid blue line indicates the base-position-oriented point mutation rate, which was calculated as the rolling average of the point mutations in nearby (± 50) bases at each position. Transparent blue areas show the gene regions with relatively high incidences of point mutations. The broken blue and red lines show the distributions of insertion and deletion mutations, respectively.

b, base; E, early; HAdvC-2, human adenovirus species C type 2; HAP, hexon-associated protein; indels, insertion and/or deletion mutations; L, late; nsp3, non-structural protein 3; SARS-CoV, severe acute respiratory syndrome coronavirus; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

Point mutations in HAdvC-2

After excluding the 23 bases at the indel sites, the point mutation rate in the overall genome of SG06/HAdvC2/2016 was approximately 0.3% (91 of 35,901 bases). As shown in Table 2, the point mutation rate did not differ significantly between the genes or between the coding and non-coding regions, and the rate was significantly lower than in the SARS-CoV-2 and H1N1 influenza A genomes (p < 0.0001 for both, per chi-square test or Fisher’s exact test). The ratio of bases altered by indels to bases with point mutations in HAdvC-2 was not significantly different from that in SARS-CoV-2 (p = 0.2467, chi-square test; Cramer’s V = 0.014) but was significantly higher than in the H1N1 influenza A virus (p < 0.0001, Fisher’s exact test; Cramer’s V = 0.439).
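The arithmetic behind the reported rate, and the effect size quoted with the tests, can be sketched in Python as follows. The counts 91 and 35,901 come from the text; the 2x2 table is a hypothetical toy example because the underlying counts for the other viruses are not given here. The function names are illustrative, and the chi-square is computed without a continuity correction, which is an assumption and may differ from the settings used in the study.

```python
from math import sqrt


def rate_percent(mutated, total):
    """Point mutation rate as a percentage of compared bases."""
    return 100.0 * mutated / total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def cramers_v_2x2(a, b, c, d):
    """Cramer's V; for a 2x2 table it reduces to sqrt(chi2 / n)."""
    n = a + b + c + d
    return sqrt(chi_square_2x2(a, b, c, d) / n)


# Counts reported in the text: 91 substituted bases among 35,901 compared bases.
hadv_rate = rate_percent(91, 35901)
print(f"HAdvC-2 point mutation rate: {hadv_rate:.2f}%")

# Hypothetical toy table (rows: two viruses; columns: two mutation types).
toy_v = cramers_v_2x2(10, 20, 20, 10)
```

The unrounded rate is about 0.25%, which rounds to the 0.3% quoted in the text. Cramer's V ranges from 0 (no association) to 1, so a value such as 0.014 indicates a negligible difference while 0.439 indicates a substantial one.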

The base-position-oriented rolling average of the point mutation rate (± 50 bases) across the SG06/HAdvC2/2016 genome is shown in Fig. 1c. Although the overall point mutation rate was significantly lower than in the RNA viruses, non-coding regions of HAdvC-2 had a higher incidence of point mutations and indels than the coding regions. Across the whole genome, the rolling average of point mutation rates did not surpass 5.0%, suggesting a much lower mutation frequency in DNA viruses than RNA viruses.

Table 2.

Mutation profiles in human adenovirus species C type 2 (HAdvC-2) from 2012-2015 compared with the 1970s.

Mutation profiles in RNA viruses, SARS-CoV-2 and H1N1 influenza A virus, are shown for comparison. The point mutation rates in RNA viruses were over 10 times higher than those in the double-stranded DNA virus HAdvC-2. In contrast to the SARS-CoV-2 genome, all the indels in the HAdvC-2 genome were short, with 6 or fewer consecutive bases.

*The point mutation rate was calculated after excluding sites with indels.

E, early; H, hemagglutinin; HAdvC-2, human adenovirus species C type 2; HAP, hexon-associated protein; indel, insertion and/or deletion; L, late; N, neuraminidase; SARS-CoV, severe acute respiratory syndrome coronavirus; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

Discussion

The results of the present study clearly demonstrate a much lower point mutation rate in the genome of HAdvC-2, a double-stranded DNA virus, than in the genomes of RNA viruses, such as SARS-CoV-2 or H1N1 influenza A. Furthermore, there were no medium-to-long indels with 10 or more consecutive bases across the entire HAdvC-2 genome. This implies that medium-to-long indels, which are frequently observed in the SARS-CoV-2 genome, are not a universal phenomenon among viruses and could be a rare phenomenon specific to a few viruses, including betacoronaviruses. Another notable finding of the present study is that some indels were associated with repeated sequences. Although such short tandem repeats are sparse in virus genomes, they could be hot spots for the occurrence of indels, as has been previously suggested in both prokaryotes and eukaryotes (Darvasi and Kerem 1995; McDonald et al. 2011). In particular, when the tandem repeat unit is three bases long, the resultant indels do not cause frameshift mutations and therefore have a higher probability of avoiding removal through natural selection. Additionally, the results of the present study demonstrate that indels 3-6 consecutive bases long are, for some viruses including HAdvC-2, not rare in the natural environment. Such relatively long indels may be more prevalent than previously thought and may play a role in the evolutionary process of these viruses (Brown 2002). Future studies elucidating the mechanisms of relatively long indels in virus genomes are warranted.

The current study had several limitations. First, this study selected only HAdvC-2 to evaluate double-stranded DNA virus mutation profiles and did not evaluate other DNA viruses. Therefore, it remains uncertain whether the findings of the present study can be generalized to other DNA viruses. Another limitation was that the recently reported medium-to-long indels in the SARS-CoV-2 genome have not yet been verified, and the existence of long indels has been confirmed in only the SARS-CoV-2 and bat coronavirus RaTG13 genomes. As a result, it remains unknown whether long indels are universally observed across all betacoronaviruses.

In conclusion, the point mutation rate in HAdvC-2, a double-stranded DNA virus, was over 10 times lower than in RNA viruses, such as SARS-CoV-2 or influenza A. All indels in the HAdvC-2 genome were short, with the involvement of 6 or fewer consecutive bases, and all indels in the coding regions had in-frame insertions or in-frame deletions. Some indels occurred within the repeated sequence of a 3-base unit, and such short tandem repeat sequences may be hot spots for the occurrence of indels in HAdvC-2.

Author Contributions

T.A. conceived the study, analyzed the data, and drafted the manuscript.

Conflict of Interest

The author declares no conflict of interest.

References
 
© 2022 Tohoku University Medical Press

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND 4.0). Anyone may download, reuse, copy, reprint, or distribute the article without modifications or adaptations for non-profit purposes if they cite the original authors and source properly.
https://creativecommons.org/licenses/by-nc-nd/4.0/