Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Role of error catastrophe in transmission ability of virus
Naoyuki Takahata Hirotaka Sugawara
Author information
JOURNAL OPEN ACCESS FULL-TEXT HTML
Supplementary material

2022 Volume 97 Issue 5 Pages 237-246

Details
ABSTRACT

The role played by error catastrophe is explicitly taken into account in a mathematical formulation to analyze COVID-19 data. The idea is to combine the mathematical genetics formalism of the error catastrophe of mutations in virus gene loci with the standard model of epidemics, which lacks the explicit incorporation of the effect of mutation on the spreading of viruses. We apply this formalism to the case of SARS-CoV-2 virus. We assume the universality of the error catastrophe in the process of analyzing the data. This means that some basic parameter to describe the error catastrophe is independent of which group (country or city) we deal with. Concretely, we analyze Omicron variant data from South Africa and then analyze cases from Japan using the same value of the basic parameter derived in the South Africa analysis. The excellent fit between the two sets of data, one from South Africa and the other from Japan, using the common values of genetic parameters, justifies our assumption of the universality of these parameters.

INTRODUCTION

In usual analysis of virus spreading based on the standard model of epidemiology by Kermack and McKendrick (1927), we ignore the possible role played by mutations, at least in the explicit form of the model. One of the present authors published two papers (Sugawara, 2021a, 2021b) analyzing the world-wide COVID-19 situation based on the standard model, through which he learned that the standard model does not properly incorporate the role played by mutations. Because of this, some difficulty was encountered in interpreting the result of the analysis.

A simplistic assumption of the absence of mutations in the virus genome is still made even for recently developed models in which, for example, multiple pathogen strains interact with each other through cross-immunity (see Kucharski et al., 2016 for a review) or strains interfere through the removal of individuals from the susceptible pool after an acute infection (e.g., Rohani et al., 2002). Meanwhile, the importance of the random nature of pathogen mutations and the emergence of new strains have been well recognized and studied by simulation under the assumption that a host’s risk of infection is proportional to the genetic distance between the infecting phenotype and the closest phenotype in the host’s immune history (Bedford et al., 2012). Recently, Miller et al. (2022) explored an existing agent-based model for influenza A/H3N2 that incorporates mutation effects into both the transmission rate and the antigenic space in the standard model.

We present here a way to remedy the current situation for the presence of largely deleterious but occasionally advantageous antigenic mutations by introducing into the scheme of the standard model some elements showing the vital role played by mutations before, during and after the period of extensive virus spreading. When we analyze the data for variant strains of SARS-CoV-2 such as Alpha, Beta, Gamma, Delta, Omicron BA.1 and Omicron BA.2, we notice that temporal changes of the number of infected hosts show an interesting similarity among the different variant strains. In this regard, we emphasize an outstanding role played by error catastrophe (Eigen, 1971): viruses can improve their transmission ability by mutations, but it seems that there is a maximum ability they can reach due to errors during the DNA/RNA replication processes. After they reach the maximum (error threshold), both spontaneous and forced mutations (by an enzyme such as APOBEC3 (Sadeghpour et al., 2021), as was recently proposed) continue to cause errors in the copying process of the virus multiplication and thus reducing the transmission ability.

In the following sections, we first explain the biological justification of our model in more detail. We then formulate the virus mutation process in mathematical terms, and go on to apply this mathematical scheme to actual data analysis, taking infection data from Tokyo as an example. The last section is devoted to the analysis of Omicron data from South Africa and Japan. Appendices 1 and 2 deal with the issue of symbiosis and the generality in our formalism, respectively.

BIOLOGICAL JUSTIFICATION OF THE MODEL

As stated in the Introduction, analyses of the data for variant strains of SARS-CoV-2 show that temporal changes of the number of infected hosts exhibit an interesting similarity among the different variant strains. So far, no clear explanation of this phenomenon exists within the framework of mathematical epidemiology. The data also show similar decaying patterns of the infection numbers for a particular variant once they reach the maximum in various countries where different epidemic policies are adopted, including vaccination rates. These similarities especially of the decaying patterns, which are independent of the kind of variant strains or of epidemic policies, seem to be explainable by assuming that the reason is on the side of viruses concerned.

A possible model is what we call an error catastrophe model (Eigen, 1971), which claims that errors accumulated in the form of mutations make it hard for the virus multiplication process to be smoothly performed. The infection rate of a virus such as SARS-CoV-2, which has a high rate of mutation, will be strongly influenced by the errors and its decaying patterns are largely determined by them. In the case of SARS-CoV-2, which allows APOBEC3 enzymes to directionally mutate C to U, error catastrophe may be particularly important. We do not need to treat this RNA editing by APOBEC3 distinctly from the usual mutations that occur during replication (Sadeghpour et al., 2021). It simply emphasizes the importance of error catastrophe and, therefore, it is hidden in our mathematical formulation.

The genome size of SARS-CoV-2 is about 30 kb, three times larger than that of typical RNA viruses such as influenza viruses and HIV. Despite being an RNA virus, SARS-CoV-2 accumulates mutations comparatively slowly because the virus displays high-fidelity replication by expressing the 3′-5′ exoribonuclease nsp14-ExoN (Smith et al., 2013). However, if nsp14-ExoN is disrupted by self-mutagenesis or external agency, the mutation rate may increase 15-fold (reviewed by Robson et al., 2020; Thakur et al., 2022). Nearly half of the nucleotide changes in SARS-CoV-2 are C to U transitions and they occur preferentially in both 5′U/A and 3′U/A flanking sequence contexts that are favored motifs of the human APOBEC3 protein family of cytidine deaminases. As a result, SARS-CoV-2 may mutate at a rate of 3×10−4/site/year, or 9/genome/year, consistent with a mean of 5.5–9.5 nucleotide differences between variants (Simmonds, 2020; Ratcliff and Simmonds, 2021). Such abundant and context-dependent mutations likely hinder proper replication and lower the power of infection, although by unknown molecular mechanisms, and allow maladaptive amino acid changes to accumulate, leading to suicidal explosion of the virus or error catastrophe. Even so, antigenic escape can occur and challenge pandemic control efforts. This unique paradigm of SARS-CoV-2 genome evolution is not incorporated by standard models of molecular epidemiology.

In the past, efforts have been made to incorporate pathogen evolution into the framework of epidemic models (SIR or its extension); for example, see section 3 of Kucharski et al. (2016). Compared to the reports cited there, our approach is unique in its emphasis on error catastrophe.

MATHEMATICAL FORMULATION

The purpose of this section is to combine mathematical formulation of the mutation process with the standard model of mathematical epidemiology. We start with a formulation of the mutation process and properly combine it with the standard model of epidemiology. Our target space is a well-defined population of humans such as a country or a city. We concentrate on the genes that are relevant for the transmission of the virus. This means that we focus mostly, but not exclusively, on mutations in the spike proteins.

We start by defining xi to be the number of viruses that belong to the i-th variant. Here, “variant” means not just a prominent variant such as Delta or Omicron, but any population of viruses that has mutations distinguishing it from the original virus. Let ai denote the rate at which variant i reproduces (ai does not necessarily mean the infection rate of variant i), and let Qij denote the probability of a virus of variant i mutating to variant j. The rate of change of xj is then given by   

d x j dt = i a i Q ij x i .

Suppose we group one kind of variants that have higher infection rates into type X and another kind that have lower infection rates into type Y. We respectively denote yi, bi and Rij rather than xi, ai and Qij for the latter case. Then, we obtain   

j d x j dt = i,j a i ( 1- Q ij ) x i + i,j b i R ij y i
and   
j d y j dt = i,j b i ( 1- R ij ) y i + i,j a i Q ij x i .

To simplify the discussion, we assume that ai, bi, Qij, and Rij are all independent of suffix “i”. We then obtain   

dx dt =a( 1-Q ) x+bRy [3.1]
and   
dy dt =b( 1-R ) y+aQx. [3.2]

Here, we have   

x= j x j ,   y= j y j ,   Q= j Q ij ,   R= j R ij .

In the two-component vector form, Eqs. [3.1] and [3.2] become   

d dt ( x y ) =( a( 1-Q ) bR aQ b( 1-R ) ) ( x y ) Ω( x y ) . [3.3]

Here, 1+a is the average reproduction rate of variants in type X and 1+b is that for type Y. Both a and b can be positive or negative considering replication errors. We note that all the parameters in the matrix Ω may depend on time t. However, if the time dependence is small, we can use an adiabatic approximation by considering the constant parameters that will be given a small time dependence after solving the differential equation [3.1], [3.2] or [3.3].

The number of infected hosts I at time t of our target population will change according to the following formula:   

dI dt = ω x x- ω y y [3.4]
where ωx and ωy are constants. The eigenvalues of matrix Ω are given by   
det( a( 1-Q ) -ξ bR aQ b( 1-R ) -ξ ) =0 [3.5]
which gives   
ξ= 1 2 [ a( 1-Q ) +b( 1-R ) ± { a( 1-Q ) -b( 1-R ) } 2 +4abQR ]. [3.6]

In general, ξ is a complex number:   

ξ= ξ R ±i ξ I [3.7]
with   
ξ R = 1 2 { a( 1-Q ) +b( 1-R ) } [3.8a]
and   
ξ I = 1 2 [ -4abQR- { a( 1-Q ) -b( 1-R ) } 2 ]. [3.8b]

We obtain   

Ω= P -1 ( ξ R +i ξ I 0 0 ξ R -i ξ I ) P [3.9]
where P is the rows of the eigenvectors of Ω. Eq. [3.3] gives   
( x y ) = P -1 ( Aexp{( ξ R +i ξ I )dt} Bexp{( ξ R -i ξ I )dt} ) . [3.10]

The general real solution is given as   

x=Cexp( 0 t ξ R dt ) sin( 0 t ξ I dt+ k 1 ) [3.11a]
and   
y=Dexp( 0 t ξ R dt ) sin( 0 t ξ I dt+ k 2 ) [3.11b]
where C and D are constants. We may assume that the time dependence of ξR+I is not large and we ignore the contributions after the linear term in the power series expansion of the eigenvalues in terms of t for the later analysis of the data. Although a more precise treatment such as perturbation theory developed in particle physics is available, it will be explored elsewhere. We note that we have   
tan( k 2 - k 1 ) = ξ I   ξ R -a( 1-Q ) = -4abQR- { a( 1-Q ) -b( 1-R ) } 2 -a( 1-Q ) +b( 1-R ) [3.12a]
and   
D=| bQ aR |C. [3.12b]

Principle of error catastrophe

We see from Eqs. [3.11] that both x and y have an exponential factor exp( 0 t ξ R dt ) . This means (due to Eq. [3.4]) that for positive ξR, the infection number behaves wildly, oscillation becoming larger and larger in size. For vanishing ξR, we have a periodical oscillation as in a normal influenza virus infection. The error catastrophe is defined to be the case when we have ξR < 0. A condition for this to happen for time-independent ξR is given by   

ξ R = 1 2 { a( 1-Q ) +b( 1-R ) }<0. [3.13]

We assume that Eq. [3.13] is valid for SARS-CoV-2 virus, at least for x > x0 (error threshold). There will be periods when this condition is not valid, especially at the beginning of a surge of certain virus variants. The change of the infection number given in Eq. [3.4] is written as   

dI dt = ω x x- ω y y= e ξ R t [ ρsin( ξ I t+ k 1 ) -λsin( ξ I t+ k 2 ) ] [3.14a]
and   
ρ= ω x C,   λ= ω y D. [3.14b]

Here, for simplicity, we assumed the eigenvalues to be time independent. An adiabatic approximation will allow us to include moderate time dependence of the eigenvalues and it will be used in our actual analysis of the data in later sections. To see the behavior of dI/dt for a positive or a negative ξR, we plot Eq. [3.14a] in Fig. 1A and 1B for two cases.

Fig. 1.

Behavior of Eq. [3.14a] for daily cases (the ordinate) against days (t) elapsed since the beginning of infection (the abscissa). (A) ξR = 0.2, ξI = 0.1, ρ =5 , λ = −0.1, k1 = 0, k2 = 0.83. (B) ξR = −0.1, ξI = 0.1, ρ = 1, λ = 0.1, k1 = 1.0, k2 = 0. (C) ξR = 0.2 for 0 ≤ t ≤ 17 and ξR = −0.1 for 17 ≤ t ≤ 39. The other parameters are common: ξI = 0.1, ρ = 1, λ = 0.1, k1 = 1.0, k2 = 0.

The values chosen in Fig. 1 correspond to the following basic parameters:   

Q=R= 3 4 ,      b=-a+ε [3.15a]
with   
ξ R =ε( 1-Q ) = ε 4 [3.15b]
  
ξ I = 1 2 { ( a- ε 2 ) 2 - 3 ε 2 8 } [3.15c]
  
tan( k 2 - k 1 ) = ξ I ξ R - a 4 [3.15d]
and ε = 0.8 for case (A) in Fig. 1A and ε = −0.4 for case (B) in Fig. 1B. We also plot Eq. [3.14a] when ξR changes from 0.2 to −0.1 as the spreading peaks. Concretely, we choose (A) initially, and (B) immediately after the peak of spreading. We see from these plots that:

(1) There is a maximum value for the change of infection number for either case of ξR > 0 or ξR < 0

(2) Whether the positive ξR is retained (A) or it changes to a negative value (B), that is, whether dI/dt chooses the case depicted in Fig. 1C, depends on the actual data.

Incidentally, we observe that any city or country, except perhaps for China, has experienced multiple peaks in the COVID-19 daily infection number. It is likely that each peak is caused by a new variant, which is sometimes drastically new, such as the Delta or Omicron type, and sometimes just a small number of mutations different from the preceding variant.

Incorporating the standard model of epidemiology

Our method to combine the above formalism on mutation and the standard model of epidemiology, often called the SIR model as mentioned earlier, is straightforward. The SIR model is defined to be the following set of differential equations for S (susceptible), I (infected) and R (recovered):   

dS/dt=-βIS/N [3.16a]
  
dI/dt=βIS/N-γI [3.16b]
  
dR/dt=γI. [3.16c]

Here, β IS/N is the rate of decrease of the susceptible S, γ is the recovery rate and S + I + R = N. We assume that N is a constant (the total population of a body we want to discuss). On the other hand, we have Eq. [3.4], which defines the infection numbers in terms of two types of variants X and Y defined by Eqs. [3.1] and [3.2]. By regarding Eqs. [3.16a] and [3.16c] as defining S and R, the only useful equation is Eq. [3.16b]:   

dI dt =β IS N -γI. [3.17]

Assuming that most of the members of the group are susceptible, that is, S/N ≈ 1, we obtain   

dI dt =( β-γ ) I. [3.18]

Our idea is that we additively combine epidemiologic Eq. [3.18] with genetic Eq. [3.4]. Unlike Bedford et al. (2012), an implicit assumption is that the infection rate is solely determined by pathogen variants and independent of both the host immune history and possible time-dependent antigenic mutation rates. We further assume that infection spreads not only from the officially recognized infected in proportion to I, but from the officially unrecognized infected and/or partly as zoonosis. We then have the equation   

dI dt =ΓI+ ω x x- ω y y [3.19]
with   
Γ=β-γ. [3.20]

The daily infection number is defined as   

Daily infection number dI dt . [3.21]

The condition that the daily infection number is non-negative is given by dI dt 0 . This gives I ≥ (−ωxx + ωyy)/Γ, which can be satisfied as long as the accumulated daily infection is large enough. However, sometimes it may happen that the inequality is not satisfied. In this case, we can fit the data of this period separately from the other periods.

We solve linear differential Eq. [3.19] for I using the ordinary method of variation of parameters. We assume I in the form of I(t) = f(teΓt, substitute it into both sides, and obtain the result of f(t):   

dI dt = df( t ) dt Γ e Γt +ΓI=ΓI+ ω x x- ω y y    =Γ e Γt [ f( 0 ) + 0 t ( ω x x- ω y y ) e -Γτ dτ ]+ ω x x- ω y y    =Γ e Γt [ f( 0 ) +ρ{ H( t+ k 1 ξ I ) -H( k 1 ξ I ) }    -λ{ H( t+ k 2 ξ I ) -H( k 2 ξ I ) } ]    + ω x x- ω y y [3.22]
where we used Eq. [3.14a] and defined H(x) as   
H( x ) = e ( ξ R -Γ)x ( - ξ I cos[ ξ I x ]+( ξ R -Γ)sin[ ξ I x ] ) ( ξ R -Γ ) 2 + ξ I 2 . [3.23]

We note in Eq. [3.22] that the multiplicative factor f(t) reflects the interacting effect of mutations during the transmission in a population (Appendix 2). Even though the two terms in the right-hand side of Eq. [3.19] are assumed to be additive, the differential equation mixes their contributions.

ANALYSIS OF ACTUAL DATA

In this section, we analyze some data based on Eq. [3.22] with Eq. [3.23]. We see that the formula has the generic form:   

dI dt = c 1 e Γt + e ξ R t { c 2 sin( ξ I t ) + c 3 cos( ξ I t ) } [4.1]
where ci (i = 1, 2, 3) are certain constants. We note that ξR and ξI may be linear functions of t as discussed in the previous model section. We see that the first term on the right-hand side represents the contribution of non-genetic factors such as vaccination, physical distancing and PCR testing. The second term represents the genetic effect. Here and subsequently, we use these terminologies as far as the generic form is concerned. It is to be noted that the generic form merely singles out the two seemingly independent factors. Because of this, when Eq. [4.1] is used for overall data fitting, it does not allow us to assess relative contributions of genetic and non-genetic factors.

Tokyo 5th wave

We analyze the Tokyo 5th wave data to see how well our new way of analysis works, especially in comparison with the standard epidemiology model in Eqs. [3.16]. The Tokyo data on daily infection numbers in the 5th wave period are given in https://www.google.com/search?q=tokyo+data+on+covid-19&oq=tokyo+data&aqs=chrome.1.69i57j35i39j0i512l3j0i30l5.9499j0j15&sourceid=chrome&ie=UTF-8.   

Data list for Tokyo 5th wave={ 453, 619, 317, 716, 896, 830, 1008, 1359, 3177, 2195, 4498, 4989, 4377, 4392, 4227, 3168, 968, 1273, 763, 253, 299, 200 }. [4.2]

Data [4.2] show 22 daily cases in units of 5 days from June 18 to October 1, 2021, so that the entire period spans 110 days (Figs. 2, 3).

Fig. 2.

Tokyo daily infection numbers from June 18 to October 1, 2021 (ordinate) against days (t) elapsed since June 18, 2021 (abscissa).

Fig. 3.

Daily infection numbers in Tokyo (ordinate) against days (t) since June 18, 2021 (abscissa): (A) during the initial 40 days of the 5th wave expressed in Eq. [4.3], (B) during the middle 25 days expressed in Eq. [4.4], (C) during the later stage of the Tokyo 5th wave as expressed in Eq. [4.5], and (D) during the entire Tokyo 5th wave period fit by Eq. [4.9].

We fit the data with Eq. [4.1] using the nonlinear fitting of Mathematica. We found that we cannot fit the data for the entire period of the 5th wave if we retain the time-independent ξR. This is reasonable because the error threshold (ξR = 0) cannot be reached at the start of the spreading; it is achieved sometime during the spreading process. We use two strategies to fit the data:

(1) We split the data into three periods and fit the data in each period separately.

(2) We assume a simple form of time dependence for ξR. The advantage of this method is that we can calculate from the data when the error threshold is reached.

We start with the method (1). We divide the period of the Tokyo 5th wave into three periods: (A) first 40 days, (B) middle 25 days, and (C) later 45 days. The result of the fitting is as follows:

(A) period:   

dI dt =0.00018 e 0.406t -133.7 e 0.039t sin[ 0.016t ]    +419.2 e 0.039t cos[ 0.016t ]. [4.3]

The curve [4.3] is depicted in Fig. 3A. Note that ξR = 0.039 is positive, indicating that the error threshold has not yet been reached.

(B) period:   

dI dt =34.49 e -0.044t -23.68 e 0.0998t sin[ 0.0196t ]    +87.13 e 0.0998t cos[ 0.0196t ]. [4.4]

Note that ξR = 0.0998 is even larger than the value in the initial period and still below the threshold (Fig. 3B).

(C) period:   

dI dt =3550 e -0.0026t +3171 e -0.0093t sin[ 0.056t ]    -5652 e -0.0093t cos[ 0.056t ]. [4.5]

We see that ξR = −0.0093 is negative in this period, satisfying the error catastrophe condition. Eq. [4.5] is depicted in Fig. 3C.

We learned above that the value of ξR is dependent on which period we consider. This suggests that we may assume some simple form of time dependence for ξR and also for Γ in Eq. [4.1] (see the discussion following Eqs. [3.3] and [3.11], as well as Appendix 1). The form we assume is the following:   

ξ R = ξ R0 ( 1-At ) [4.6]
  
Γ= Γ 0 ( 1-Bt ) . [4.7]

We then fit the entire data with a single function:   

c 1 e Γ 0 ( 1-Bt ) t + e ξ R0 ( 1-At ) t { c 2 sin[ ξ I t ]+ c 3 cos[ ξ I t ] }. [4.8]

The result of the fitting is given as:   

dI dt =453 e 0.024( 1-0.0099t ) t +3.77 e 0.216( 1-0.0078t ) t sin[ 0.041t ]    -2.32 e 0.216( 1-0.0078t ) t cos[ 0.041t ]. [4.9]

We see from Eq. [4.9] that ξR changes from positive to negative during the 5th wave period. We can calculate the time of crossing the error threshold from:   

ξ R =0.216( 1-0.0078 t t ) =0 [4.10]
namely, tt = 128 days after the initiation of the Delta variant spreading. Although this value of tt indicates the later decay phase of the spreading, we see from Eq. [4.9] that other factors and even the non-genetic term contribute to the precise determination of the maximum position of the infection numbers. Fitting of the entire period by Eq. [4.9] is shown in Fig. 3D. The non-genetic term takes the maximum value at tn = 1 2B ≈ 51 and the genetic term does so at tg = 1 2A ≈ 64, half of tt = 128. The fitting describes clearly the battleground of virus vs. human immune system. Initially, the virus is winning, but eventually it reaches the maximum spreading power. The APOBEC3 deaminases involved in the human innate immune system (Robson et al., 2020; Simmonds, 2020; Ratcliff and Simmonds, 2021) continually force mutation of the viral RNA, which cannot avoid more and more errors so that the spreading power starts to shrink.

ANALYSIS OF DATA FROM SOUTH AFRICA AND JAPAN

We learned in the previous section that it is possible to fit the data of an entire period of a certain wave if we take into account the time dependence of ξR and Γ (infection rate β minus recovery rate γ in Eq. [3.20]). We adopt this strategy rather than dividing a period into several sub-periods.

The Omicron data were still insufficient when we were writing this paper, and the following analysis should be taken as preliminary rather than final. The Omicron variant was first found in South Africa. We therefore begin our analysis of the Omicron variant in South Africa. The data from November 29, 2021 to January 13, 2022 are given in https://www.worldometers.info/coronavirus/country/south-africa/.   

Data list for South Africa={ 2273, 4393, 8561, 15535,   16055, 16366, 10000, 6381, 13147, 19842, 22383, 19017,  17153, 37875, 13288, 20000, 26389, 24785, 20713,  16080, 15465, 8511, 15423, 21098, 21156, 13000, 14824,  5603, 3778, 6000, 9020, 12978, 11754, 9793, 6000,  3076, 8078, 11106, 9858, 9259, 7759, 4482,  2409, 4500, 6760, 5917 }. [5.1]

The data of the entire 46 cases are depicted in Fig. 4A. We fit the data with a function that is exactly the same as the one in Eq. [4.8], which is used to explain the entire period of the Tokyo 5th wave. The result of the fitting gives:   

dI dt =78.71 e 0.38( 1-0.0156t ) t +207.4 e 0.2015( 1-0.0207t ) t sin[ 0.072t ]    +4548 e 0.2015( 1-0.0207t ) t cos[ 0.072t ]. [5.2]
Fig. 4.

(A) South Africa data for daily infections (ordinate) against days (t) starting from November 29, 2021 (abscissa). (B) South Africa Omicron data (the same as (A)) and the nonlinear fitting by Eq. [5.2]. (C) Extension of (B) to a much later time.

We see from Eq. [5.2] that   

ξ R =0.2015( 1-0.0207t ) . [5.3]

Eq. [5.3] satisfies the error catastrophe condition of ξR < 0 after tt = 48 days. This value of 48 days is much smaller than the Tokyo 5th wave (presumed to be the Delta type) value of tt = 128 days, indicating that Omicron infection is more powerful and in turn it reaches the error threshold much faster than the Delta type. We show the curve described by Eq. [5.2] in Fig. 4B. We extend the nonlinear fitting in Fig. 4B to a later time as in Fig. 4C. We see from Figs. 4B, 4C that Omicron variant infection has a much steeper increase but also a sharp decrease due to the earlier arrival at the error threshold. Again, this transition happens at tg = 24, which is earlier than tt = 48. The non-genetic term takes the maximum value at tn = 32 (Figs. 4B, 4C).

Universality of the parameters ξR and ξI

Before fitting the Japan data, we discuss the nature of the parameters ξR and ξI. As shown above, the time to reach the error threshold is calculated using the time dependence of ξR. We obtained the value tt = 128 days for the Delta variant using the Tokyo data and tt = 48 days for the Omicron variant using the South Africa data. We assume that these values depend on the variants we consider, and not or very little on any other external effect such as vaccination and physical distancing. The reason is that ξR and ξI are genetic parameters and not epidemic parameters. They are more or less inherent in the RNA of the virus and can also depend on the genetically determined human immune system. The other parameters in Eq. [4.1] can depend on vaccination, physical distancing, PCR testing followed by segregation, etc.. The above argument allows us to use the values of ξR and ξI obtained from the South Africa data to analyze the Omicron data from other regions, although there may be some difference in the values of ξR and ξI due to population structure, such as how much of particular genomic segments of Neanderthals has introgressed into the DNA of the group being dealt with (Zeberg and Pääbo, 2020a, 2020b, 2021). Likewise, it is likely that virus epidemics and host adaptive responses have been limited to local populations. In the case of coronavirus, interactions between virus and human genes started more than 20,000 years ago but have left genomic signatures only in East Asians (Souilmi et al., 2021). In the following analysis of the Japan data, we use the South Africa ξR and ξI for simplicity.

The validity or the usefulness of the universality of ξR and ξI can be checked by analysis using an index such as the AIC. Along the same lines, we can also consider (calculate the AIC of) the limiting case of all the genetic parts of the parameters vanishing. These investigations are left to our future research.

Japan’s case: the 6th wave

We analyze the Japan data for the daily infection number starting from December 30, 2021 to January 20, 2022 when the Omicron variant was starting to dominate. We use the same ξR and ξI obtained in the South Africa analysis. Adding the numbering of dates in the first column of the data list, we take the daily cases for the 6th wave of Japan from https://www.worldometers.info/coronavirus//country/japan/.   

Data for 6th wave of Japan={ { 1, 466 }, { 2, 490 },  { 3, 463 }, { 4, 504 }, { 5, 600 }, { 6, 783 }, { 7, 1256 },  { 8, 2506 }, { 9, 4194 }, { 10, 5983 }, { 11, 7930 },  { 12, 8144 }, { 13, 6394 }, { 14, 9000 }, { 15, 12243 },  { 16, 17940 }, { 17, 21241 }, { 18, 22707 }, { 19, 26881 },  { 20, 22000 },  { 21, 29862 }, { 22, 39841 } }. [5.4]

We fit this with the following function, which is exactly the same as the one used in the South Africa analysis with an untested assumption that even the non-genetic parameter (B in Eq. [4.7]) is universal:   

dI dt = c 1 e Γ 0 ( 1-0.0156t ) t + c 2 e 0.2015( 1-0.0207t ) t sin[ 0.072t ]    + c 3 e 0.2015( 1-0.0207t ) t cos[ 0.072t ]. [5.5]

The result of the fitting gives   

dI dt =0.00072 e 1.158( 1-0.0156t ) t +2116 e 0.2015( 1-0.0207t ) t sin[ 0.072t ]    +496.1 e 0.2015( 1-0.0207t ) t cos[ 0.072t ]. [5.6]

Data [5.4] and Eq. [5.6] are plotted in Fig. 5A. From Fig. 5B, we see that the Japan’s daily infection number rises to approximately 100,000 and then starts decreasing. Our analysis of the Japan data from December 30, 2021 to January 20, 2022 in Data [5.4] correctly predicted the maximum daily infection number to be approximately 100,000. However, the actual speed of the decline from the peak number is much lower than predicted by the analysis. This is due to the invalidity of the untested assumption that even the non-genetic parameters are universal.

Fig. 5.

(A) Japan daily infection numbers (ordinate) and the nonlinear fitting by Eq. [5.6] against days (t) elapsed since December 30, 2021 (abscissa). (B) Extension of (A) to a later time.

Analysis of later data from Japan

We investigate the situation by analyzing the data from January 1 to March 31, 2022 (every 3 days). Here, we invalidate the untested assumption that the non-genetic parameters are universal. Using the same format as in the case of the 6th wave of Japan, we take the data for the infection number from https://www.worldometers.info/coronavirus//country/japan/. As in Data [5.4], the numbering of dates is presented in the first column of the following list, but here in units of 3 days.   

Later data of Japan={ { 1, 2505 }, { 2, 8144 }, { 3, 6829 },   { 4, 21241 }, { 5, 23351 }, { 6, 44638 }, { 7, 47176 },  { 8, 69736 }, { 9, 82159 }, { 10, 93388 } { 11, 94431 },  { 12, 7170 }, { 13, 10097 }, { 14, 80234 }, { 15, 95603 },  { 16, 77135 }, { 17, 66373 }, { 18, 71254 }, { 19, 55243 },  { 20, 70958 }, { 21, 55417 }, { 22, 62747 }, { 23, 52002 },  { 24, 50781 }, { 25, 49210 }, { 26, 29657 }, { 27, 47937 },  { 28, 32916 }, { 29, 53544 } }. [5.7]

We fit the data with the following function:   

dI d( 3τ ) = c 1 e Γ 0 ( 1-B×3τ ) 3τ + c 2 e 0.2015( 1-0.0207×3τ ) 3τ sin[ 0.072×3τ ]    +  c 3 e 0.2015( 1-0.0207×3τ ) 3τ cos[ 0.072×3τ ]. [5.8]

The meaning of both sides in Eq. [5.8] is the following:

(1) Compared to the previous fitting curve given in Eq. [5.5], we see that we use 3τ rather than t to designate the time variable (here we have data points of every 3 days as in Data [5.7] and τ ranges from 1 to 29).

(2) Otherwise, the only difference is the insertion of the parameter B in the first term. This is the most important deviation from Eq. [5.5] where the untested assumption of the non-genetic term was made.

(3) The universality of the mutation contribution is maintained: the second and third terms are the same as in Eq. [5.5].

The result of the fitting gives:   

dI d( 3τ ) =6882 e 0.2431( 1-0.0269τ ) τ +2886 e 0.6046( 1-0.0621τ ) τ sin[ 0.216τ ]    -3106 e 0.6046( 1-0.0621τ ) τ cos[ 0.216τ ]. [5.9]

This nonlinear fitting is shown in Fig. 6A. We extend the function in Eq. [5.9] to τ = 50 (t = 150 days after January 1, 2022, i.e., the end of May). It is depicted in Fig. 6B. The genetic and non-genetic terms take their respective maximum values at τg = 8.05 and τn = 18.6. In the daily unit, these correspond to tg = 24 and tn = 56, respectively. Similarly, the error threshold occurs at tt = 48 days after January 1, the same as in the case of South Africa.

Fig. 6.

(A) Nonlinear fitting of the Japan data on daily infections by Eq. [5.9] (ordinate) against τ (in units of 3 days) from January 1 to March 31 (abscissa, corresponding to the numbering of dates in the first column in Data [5.7]). (B) Extension of (A) from early January to the end of May.

Tentative summary of the analysis of the later data from Japan

(1) As one can see by comparing Eqs. [5.6] and [5.9], the difference between the previous analysis and the present one is mostly in the first term representing the contribution of non-genetic factors such as the physical distancing effect. The previous best fitting resulted in minimizing the first term, whereas the current fitting has a much larger contribution from this term.

(2) The exponential factor of the first term in the current analysis [exp{0.2431(1−0.0269τ)τ}] in Eq. [5.9] behaves in approximately the same way as that in the previous analysis [exp{1.158(1−0.0156t)t}] in Eq. [5.6], indicating that the structure of non-genetic factors was correctly taken into account in both analyses. The difference is in the overall factor of the first term (0.00072 vs. 6882).

(3) As one can see from Fig. 6B, there will be no 7th wave in Japan if our analyses are correct. Unfortunately, however, it appears that before Omicron BA. 2 completely disappeared, it had gradually been replaced by a new Omicron variant strain, BA. 5, which caused the 7th wave in Japan in the summer of 2022.

ACKNOWLEDGMENTS

One of the authors (H. S.) would like to express his sincere gratitude to Professor Toshimichi Ikemura for his valuable discussions and guidance. We thank two anonymous reviewers for their comments and constructive criticisms, and Dr. Ian Smith for his editorial supervision.

REFERENCES
 
© 2023 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.
https://creativecommons.org/licenses/by/4.0/legalcode
feedback
Top