2022 Volume 98 Issue 2 Pages 87-92
A year and a half has passed since the outbreak of the COVID-19 pandemic. Mathematical models to predict infection are expected and many studies have been conducted. In this study, a new interpretation was created that could reproduce the daily positive cases in Tokyo using only a simple SIR model. In addition, the data on the ratio of transfer to delta variants could also be simulated. It is anticipated that this interpretation will be a basis for the development of forecasting methods.
Over the past year and a half, COVID-19 has caused significant damage to the health of people and economic activities around the world.1),2) As a countermeasure, technology to forecast the spread of infection using mathematical models is indispensable. The SIR model (susceptible, infected, recovered) has been proposed as a mathematical approach,3)–5) and further development, such as the SEIR model, has been advanced.6)–8)
Despite the simplicity and effectiveness of SIR-based models in predicting a variety of other infectious diseases, communities in many countries have reported inconformity in their applicability to COVID-19.9)–15) Therefore, it is necessary to further improve the SIR model and to develop a prediction tool for common infectious diseases, including COVID-19.
In this study, the applicability of the SIR model was scrutinized using data from newly positive individuals in Tokyo. A noteworthy point of view is the condition that the system in which the model holds is completely mixed.16)
In the basic SIR model, differential equations of the susceptible fraction (S(t)) infected fraction (I(t)) and recovered fraction (R(t)) for each time are shown as follows:
\begin{equation} \frac{dS}{dt} = -\beta \mathit{SI}, \end{equation} | [1] |
\begin{equation} \frac{dI}{dt} = \beta \mathit{SI} - \gamma I, \end{equation} | [2] |
\begin{equation} \frac{dR}{dt} = \gamma I, \end{equation} | [3] |
\begin{equation} S + I + R = 1. \end{equation} | [4] |
R0 is the basic reproduction number, as follows:
\begin{equation} \text{R}0 = \frac{\beta}{\gamma}. \end{equation} | [5] |
Furthermore, the fraction of new positives (daily positive cases) is denoted by P(t), and since it is the reduction of the susceptible fraction per day, it can be obtained by the following equation:
\begin{equation} P = -\frac{dS}{dt} = \beta \mathit{SI}. \end{equation} | [6] |
\begin{equation} S = \eta \exp \left(-\frac{\beta}{\gamma}R\right), \end{equation} | [7] |
\begin{equation} \eta = 1 - I(0). \end{equation} | [8] |
If the spread of infection starts with one person, the initial condition (I(0)) in a fractional representation requires:
\begin{equation} I(0) = \frac{1}{N}. \end{equation} | [9] |
An important point is found in Eq. [6]. It is that the number of new positive cases (P) is proportional to the number of infected (I). The fact that the proportionality coefficient (βS) has the same value around all infected persons is an approximation using a constant (β) and an average ratio of the susceptibility. In the region where this approximation using the average value (mean approximation) holds, the calculated values of the theory should correspond closely to the measured values. Furthermore, since the range of this region is not infinite, it must be closed at the boundary of a region with a lower average infection rate. If, on the other hand, the average value of the external region is high, it will be one of the fluctuations in the fast and large external spread of infection. The same argument has to be repeated for the larger region. Within a closed region, there should be a saturation effect as the spread of infection progresses. In the mathematical model, this saturation effect is expressed by the proportionality coefficient (S).
In order to determine the population size of the region, it is necessary to determine the characteristics of such a closed system from real data. The characteristics are as follows: exponential rising due to a constant infection rate and exponential falling due to a saturation effect. Each closed region with these characteristics is a kind of “subcommunity”, and its population can be estimated from the mean approximation model. If the mean approximation is adequate, this indicates that there is sufficient mixing in the region. Next, how far the region of sufficient mixing extends is important information to be acquired, which is to be inferred from actual data. Therefore, applying the SIR model to an unreasonably large system makes it difficult to reconcile the shapes, and it is worthwhile only in a regional range where the recovered and infected are completely mixed. It is the condition of “Complete Mixing”16) that the SIR model requires. We define the people included in this complete mixed region as a “Basic Community”.
Eventually, a larger population region should be an aggregation of these perfectly mixed subcommunities. To be precise, it should be possible to express the shape of the data over a long period of time in large population regions by a linear combination of functions of the SIR model solution. This method allows us to extract basic information about the community that is required to adapt the mathematical model to the data of a new positive case: the size of the population (N), the number of communities present at the same time (m), the infection rate (β), and the time when the infection started (1st day).
According to the above concept, the actual number of daily positive cases in the basic community is represented as
\begin{equation} \psi (t,N,\beta ,\eta) = N \cdot P(t,\beta ,\eta), \end{equation} | [10] |
Figure 1 shows the time variation of the number of I(t) in each parameter. As the population of the basic community increases, the number of infected people increases proportionally, and at R0 = 2.7, 26,000 infected people (hospitalized patients) are estimated for a total community of 100,000 people. For R0 = 2.7, 2.0 and 1.5, there is a steep slope in the spread of infection after about 2, 3 and 6 months, respectively.
Time variation of the number of infected people for each R0, N and γ = 0.1.
Figure 2 shows a profile of new positive patients. The maximum number of new positive cases in a community with a population of 100,000 at R0 = 2.7, 2.0, 1.5 is 3340, 1750 and 660, respectively. Corresponding to each parameter, the peak position of P is shifted by about 8 days earlier than I.
Daily positive cases (P) for each parameter.
The solution (P) of the SIR model for the i-th basic community is expressed by the following function:
\begin{equation} \psi_{i} = \psi (t_{i},N_{i},\text{R}0_{i},\eta_{i}), \end{equation} | [11] |
\begin{equation} \varPsi = \sum_{i}\psi_{i}. \end{equation} | [12] |
Figure 3 shows the data for daily positive cases in Tokyo (available from Tokyo Metropolitan Government’s new coronavirus countermeasure website (https://stopcovid19.metro.tokyo.lg.jp/, in Japanese and English) and NHK website (https://www3.nhk.or.jp/news/special/coronavirus/data/pref/tokyo.html, in Japanese)) and the theoretical curve Ψ that reproduces the data. Each parameter that makes up the theoretical curve (Ψ) is shown in Table 1. As a result, Fig. 3 shows a high degree of agreement between the data for Tokyo and the calculated values of the summed SIR model solution.
Daily positive cases in Tokyo and theoretical curve Ψ. The SIR solution (ψi) of the basic community used to combine Ψ is shown in the figure.
Solution No. |
1st day | β (R0 = β/γ) |
N | m | Remarks Wave order, variant type |
---|---|---|---|---|---|
ψ1 | 2020/2/5 | 0.22 | 6000 | 1 | 1st |
ψ2 | 2020/4/24 | 0.17 | 5000 | 5 | 2nd |
ψ3 | 2020/7/20 | 0.21 | 5000 | 1 | none |
ψ4 | 2020/8/10 | 0.16 | 5000 | 9 | 3rd |
ψ5 | 2020/11/28 | 0.31 | 8000 | 4 | 3rd |
ψ6 | 2021/1/11 | 0.18 | 20000 | 3 | 4th |
ψ7 | 2021/5/6 | 0.2 | 6500 | 5 | 5th |
ψ8 | 2021/5/28 | 0.235 | 40000 | 4 | 5th, δ (L452R) |
In Table 1, the second category “1st day” means the first day of the outbreak, and m is the number of “basic communities” that are simultaneously spreading the infection. The notable point is that the first day of the outbreak is during the peak period of the previous expansion wave. At this time, it indicates that the buds that will cause the spread of infection a few months later have metastasized to other basic communities. In the table, solutions ψ3 and ψ4 are not identified as large waves in Japan.
The solutions (P) of ψ7 and ψ8 in Table 1 are the two spreads of infection included in the fifth wave. Using the value of R0 in the table and the size of the basic community, the ratio of the delta mutant strains issued by the Tokyo Metropolitan Government (data available from https://www.bousai.metro.tokyo.lg.jp/_res/projects/default_project/_page_/001/015/548/63/20210916_10.pdf (in Japanese)) was reproduced as shown in Fig. 4. The good agreement between Tokyo’s data and our calculations in the time course of the transition to the delta mutant strain suggests that we were able to accurately separate the two infection peaks within the larger peak of the fifth wave.
Proportion of positive cases of the L452R(delta) variant compared to other strains. The staircase data are the actual values of the one-week average in Tokyo. The solid line shows the ratio of the calculated values of the other strains (ψ7) to the delta strains (ψ8).
Based on the Japanese Ministry of Health, Labour and Welfare’s report (Guideline for Medical Treatment of New Coronavirus Infection (COVID-19), edition 5.3 (2021), available from https://www.mhlw.go.jp/content/000825966.pdf (in Japanese)) that the infectivity of COVID-19 lasts for 7 to 10 days, the recovery rate was set to γ = 0.1 (1/day) in the calculation of these SIR model solutions.
The reported targets of analysis so far seem to be large and complex infectious situations, such as national and local governments.9),17) In such a large system, the SIR model is impossible to adapt without a convincing mean approximation. The nonconformity of the SIR model reported in Italy,13) China10),14) and Iran9) is attributed to the inability to represent the variation in details due to the large population, N. In order for the SIR model to decompose the data, some of the features of the model need to be exposed in the shape of the data, and it is easier to find exposures in small communities than in large ones.
The overall spread of an infection, even in a large system, is explainable by a composite model that combines several basic communities. It is, for example, to create a mathematical model of Japan as a whole by superimposing the surrogate functions of the SIR solution constructed for individual prefectures.
The parameter set given in Table 1 is one of the solutions of the mathematical model for understanding the infection situation in Tokyo, a city with a population of 14 million. The communities that actually caused the spread of infection are classified into eight (ψ1 to ψ8) locations by the start time. It is estimated from the corresponding peaks that there are tens or hundreds of thousands of people. The members forming this basic community are connected by face-to-face relationships in human behavior, which does not imply a community separated by physical space. Although such a community is constantly recombining, it is regarded as if it had always existed as a mathematical model of infection. In this picture, Tokyo appears to be as follows: there are many basic communities with high infection rates scattered among a huge community with low infection rates spread all over the place, as well as several neighboring communities with high infection rates which were infected in a chain reaction. It’s just as if piles of dead grass dotting a lawn are ignited sequentially.
The fact that the 1st day of infection is near to the preceding peak, the probability of “the fire jumping to the next pile” is highest at the time of the most intense burning of the previous pile. As found in Fig. 2, the time to reach the peak is three to four months for R0 = 2.0, and the half-width of the peak is within two months. With this feature, the spread of the infection is discrete, like a series of waves that are triggered sequentially.
One basic community is sufficient to explain the fifth wave, yet, in addition, two communities, ψ7 and ψ8, are provided to explain the data on the delta strain (L452R). Consequently, the infection spread that appears to be one basic community sometimes becomes a combination of several communities as more detailed data is acquired.
In regards to the peak of the fifth wave, which produced a large number of infections, the infection rate (β = 0.235) is not much different from the previous peak, so the conspicuous difference is the large population of one basic community.
The basic community is a closed system that is instantaneously affected by recovered and infected people under completely mixed conditions. The time required for that complete mixing is at least until the peak of the infection is reached, since it is sufficient that the peak is formed due to the saturation effect. Within this closed system, the number of infections rises exponentially, reaches a maximum value, and then drops off exponentially. If the population of the basic community is even larger, the time dependence of the number of infected people will exhibit steeper ups and downs, even for the same infection rate. The reason why the mathematical model fits so closely with the data on the number of infections in Tokyo is due to the simplicity of the model, which consists of a few isolated large well-formed waves.
Ultimately, any analysis of the actual situation of an epidemic should focus on elucidating the underlying basic communities and their trends. It will then be possible to forecast infections, and to avoid diffusion of the disease among communities as a preventive measure.
Edited by Toshimitsu YAMAZAKI, M.J.A.
Correspondence should be addressed: K. Maki, Sasazuka 2-5-2-806, Shiroi, Chiba 270-1426, Japan (e-mail: makik@pri.nir.jp).
susceptible, infected, recovered