We are at a unique moment in history that will decide the future trajectory of our country. Preventive measures taken to control the pandemic of COVID-19, the disease caused by the SARS-CoV-2 virus, have brought our economy to a grinding halt. America must now decide when and how to end the lockdown and restart the economy.
The right decision requires accurate data on the spread of the virus. However, there is a lot that we do not know.
- We do not know how fast the infection spreads (i.e. what is the basic reproduction number R0?).
- We do not know when the first case of the infection occurred in the U.S. Between the time we first learned of the pandemic and the time President Donald Trump stopped all flights to and from China, there were over 1,300 direct flights from 17 cities in China, including Wuhan (the epicenter), to the U.S. carrying around 430,000 passengers. This makes it highly probable that the spread of the infection started much earlier than we think.
- We don’t know how many people are actually infected and how far (or close) we are to achieving herd immunity.
- We don’t understand the trade-off between the potential lives saved and the lives lost due to the lockdown. In fact, the lockdown is really just a delay in order to buy time for providers to get better prepared for the impact of COVID-19.
All of these unknowns are crucial in helping make the right decision. Without answers to these questions, any decision made is likely to be faulty and costly, both in lives and dollars. The most apparent solution here would be to rapidly increase testing in order to gather the data that is needed to bring us closer to the answer.
The Testing Challenges
There are currently two types of tests. The first type looks for the presence of viral RNA, which indicates the presence of the virus itself. The method used is Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR), which can be highly sensitive in detecting SARS-CoV-2. However, it has a number of limitations:
- It requires a high-quality nasal swab that contains sufficient amounts of viral RNA. This can be challenging since the level of viral RNA varies among patients.
- The test result can vary depending on the stage of the infection.
- Administering the test requires trained personnel to collect the samples. The quality of samples varies significantly based on the skills of the personnel.
- Without sufficient viral RNA, the test can result in a false negative reading.
- The analysis of the collected sample involves a sequence of steps to extract the RNA and perform Polymerase Chain Reaction (PCR) tests. For a small number of tests (say a few thousand), this is not a problem. However, in a global pandemic such as the one we are facing, we would potentially need to test millions of people. That volume would overwhelm and quickly exhaust our testing capacity.
The second type of test looks for the presence of human antibodies (proteins belonging to the immunoglobulin class). These antibodies are more stable than viral RNA and easier to collect, since they are uniformly distributed in the blood. Further, the test samples remain stable during storage, transportation, and final analysis. The specimens can easily be collected with only minor discomfort to the patient from the blood draw. Fortunately, the test procedure presents minimal risk to the person collecting the sample, as the patient is no longer infectious.
Recently, the FDA approved one such test, developed by Abbott Labs, that can return results in less than 10 minutes. Abbott shipped over half a million test kits on April 11th and is currently manufacturing 50,000 per day, with the intent of increasing production to 2 million per month. According to Abbott Labs, there are 200 m2000 test instruments, each capable of performing 470 tests per day. This gives the hospitals the capacity to perform 10,000 antibody tests per day. However, we do not have enough test instruments available to run tests in the quantities needed and must therefore use the available tests judiciously.
Using the Right Data
A great deal of the data regularly released by various outlets is inaccurate and should not be used to answer the four key questions presented above. Unfortunately, many government decisions were based on incorrect or unreliable data, for the following reasons:
- Most test kits manufactured in China and distributed to many countries around the world were faulty. For example, as many as 80 percent of the 150,000 portable test kits that China delivered to the Czech Republic earlier this month were defective.
- There is no global standardization of testing and its protocols. Testing methodologies vary significantly from country to country, with each country either developing its own test kits or sourcing them from others.
- What counts as a death from COVID-19 is defined differently in different countries, and in the U.S. it now varies from state to state or even from hospital to hospital.
- The numbers being reported by China are not reliable and are being highly downplayed for political reasons.
The first and most important thing we need to understand is the actual mortality rate of COVID-19, which is defined as:
Mortality Rate = Number of Dead / Number of Infected
The problem here is that the numerator is not reliably reported while the denominator cannot be known without thorough testing. The only way we can know the true number of infected is through massive testing, something that was not possible using the original virus tests. The development of the tests that detect the presence of antibodies can improve our ability to collect the right data and help answer this question.
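To see why the denominator matters so much, the following minimal sketch evaluates the ratio defined above for a fixed death count while varying the share of infections that testing actually detects. The death and case counts are hypothetical and chosen only for illustration, and `detected_fraction` is an assumed parameter, not a measured one.

```python
def mortality_rate(deaths: float, infected: float) -> float:
    """Mortality rate as defined in the text: deaths divided by infections."""
    if infected <= 0:
        raise ValueError("infected must be positive")
    return deaths / infected


# HYPOTHETICAL illustration: 100 deaths and 2,000 confirmed cases.
deaths = 100
confirmed_cases = 2_000

for detected_fraction in (1.0, 0.5, 0.1):
    # If testing finds only a fraction of all infections, the true number
    # of infected is the confirmed count divided by that fraction.
    true_infected = confirmed_cases / detected_fraction
    rate = mortality_rate(deaths, true_infected)
    print(f"detected {detected_fraction:.0%} of infections: "
          f"mortality rate {rate:.2%}")
```

The same death count yields a rate ten times lower when only one infection in ten is detected, which is why broad antibody testing changes the picture so sharply.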
Fortunately, there are a few sources of reliable data that can be used as a basis for this rate. These sources include the data obtained from the Diamond Princess and Iceland outbreaks, and two other more recently published results, one based on infections among healthcare workers published by the CDC and the other from a study performed by Stanford University.
Diamond Princess Data
On February 4th, 2020, the passengers on the Diamond Princess received the news that 10 people on board had tested positive for COVID-19. This was the beginning of a one-month ordeal. Every individual on board the cruise ship was ordered to quarantine in place. On February 28th, the Japan Ministry of Health reported 705 cases of COVID-19, and on the same day the CDC announced that 44 passengers who had been flown to the U.S. had also tested positive. It became apparent that the number of cases was growing rapidly.
Figure 1 summarizes the Diamond Princess data used to estimate the infection and mortality rates. The crude mortality rate was 12 deaths out of 712 infected, or 1.69%, about half the 3.4% initially reported by the World Health Organization (WHO). However, this number is not an accurate indicator for the general population because the average passenger age was around 65, whereas the average age of the U.S. population is around 38. The rate therefore needs to be adjusted to correct for the disproportionate number of passengers over 60. Applying this correction (see Figure 2), we arrive at an estimated Case Fatality Rate (CFR) of 0.72%, roughly one-fifth of the 3.4% initially reported by the WHO.
Figure 1. Number of COVID-19 patients on Diamond Princess Feb. 11 to Feb. 25, 2020
source: Statista Research Department, Apr 11, 2020 (https://www.statista.com/statistics/1099517/japan-coronavirus-patients-diamond-princess/)
Figure 2: Estimating the Case Fatality Rate (CFR) from Diamond Princess data by compensating for higher age profile of the passengers.
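The age bands and weights behind Figure 2 are not given in the text, so the following is only a sketch of one common way to make such a correction, direct age standardization. The crude rate uses the 12 deaths and 712 infections quoted above. Every age-specific rate and population share in the code is a hypothetical placeholder, so the adjusted value it prints is illustrative and is not a reproduction of the 0.72% estimate.

```python
# Crude rate from the figures quoted in the text.
crude_cfr = 12 / 712

# HYPOTHETICAL age-specific fatality rates (deaths per infection).
# These are placeholders, not values taken from the Diamond Princess data.
age_specific_cfr = {"0-39": 0.000, "40-59": 0.002, "60-79": 0.015, "80+": 0.080}

# HYPOTHETICAL share of each age band in the reference population
# (for example, a general population much younger than the passengers).
reference_share = {"0-39": 0.52, "40-59": 0.26, "60-79": 0.18, "80+": 0.04}


def age_adjusted_cfr(rates: dict, shares: dict) -> float:
    """Weight each band's rate by its share of the reference population."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(rates[band] * shares[band] for band in rates)


adjusted_cfr = age_adjusted_cfr(age_specific_cfr, reference_share)
print(f"crude CFR:    {crude_cfr:.2%}")
print(f"adjusted CFR: {adjusted_cfr:.2%}  (illustrative inputs only)")
```

Because the older bands carry most of the risk but a small share of a younger reference population, the standardized rate comes out well below the crude one.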
Iceland Data
The COVID-19 pandemic reached Iceland in February 2020. By April 12th, 2020, there were 1,711 identified cases, of whom 8 had died. This results in a case fatality rate of 0.47%. On April 8th, Iceland had recorded more recoveries than new infections for two successive days, and the country’s head epidemiologist announced that it was highly likely that the outbreak had reached its peak.
CDC Published Healthcare Worker Data
More recently, the CDC published some preliminary data on COVID-19 cases among healthcare workers (U.S. Department of Health and Human Services/Centers for Disease Control and Prevention MMWR / April 17, 2020 / Vol. 69 / No. 15). Nearly 9,300 U.S. healthcare workers (median age of 42) contracted COVID-19, and 27 died. The associated case fatality rate for this group is 0.29%. 73% of those infected were women. A third of those who died were over 65 years old and a third had underlying conditions. Women accounted for a smaller share of the deaths than men (38% vs. 62%), which lowers the overall case fatality rate because women made up most of the cases. Adjusting for the higher proportion of women, we arrive at an adjusted CFR of 0.41%.
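The text does not spell out how the adjustment was made. One reading that arrives at the same figure is to compute the fatality rate separately for women and men and then reweight the two to an equal split, as sketched below. The 50/50 reweighting is an assumption, and "nearly 9,300" is treated as exactly 9,300 for the arithmetic.

```python
cases = 9_300               # "nearly 9,300" infected healthcare workers
deaths = 27
female_case_share = 0.73    # 73% of those infected were women
female_death_share = 0.38   # women accounted for 38% of the deaths

crude_cfr = deaths / cases

# Sex-specific fatality rates implied by the shares above.
cfr_female = (deaths * female_death_share) / (cases * female_case_share)
cfr_male = (deaths * (1 - female_death_share)) / (cases * (1 - female_case_share))

# ASSUMPTION: reweight to a population that is half women and half men.
adjusted_cfr = 0.5 * cfr_female + 0.5 * cfr_male

print(f"crude CFR:    {crude_cfr:.2%}")
print(f"female CFR:   {cfr_female:.2%}")
print(f"male CFR:     {cfr_male:.2%}")
print(f"adjusted CFR: {adjusted_cfr:.2%}")
```

Under these assumptions the crude rate rounds to 0.29% and the reweighted rate to 0.41%, matching the figures in the text.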
Stanford University Study for Santa Clara County
A study conducted by Stanford University used an antibody blood test to estimate the number of people infected with COVID-19 in Santa Clara County. The results indicated that 2.49% to 4.16% of the population had been infected by April 1st. Santa Clara had 1,094 confirmed cases with 50 reported deaths, a CFR of 4.6%. Projected onto the total population of Santa Clara County, the Stanford results imply between 48,000 and 81,000 infections, which is 50 to 85 times the 956 cases confirmed at the time of the study. Dividing the 50 reported deaths by these totals gives a CFR of between 0.06% and 0.10%, which suggests it is in the same range as that of influenza. (www.medrxiv.org/content/10.1101/2020.04.14.20062463v1)
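The arithmetic behind these ranges can be checked in a few lines. The sketch uses only numbers quoted in the paragraph above, and the variable names are illustrative.

```python
deaths = 50                  # reported deaths in Santa Clara County
confirmed_for_ratio = 956    # confirmed cases used for the 50x to 85x comparison
infected_estimates = {"low": 48_000, "high": 81_000}

results = {}
for label, infected in infected_estimates.items():
    results[label] = {
        # How many estimated infections there are per confirmed case.
        "times_confirmed": infected / confirmed_for_ratio,
        # Fatality rate when deaths are divided by all estimated infections.
        "fatality_rate": deaths / infected,
    }

for label, row in results.items():
    print(f"{label}: {row['times_confirmed']:.0f}x confirmed cases, "
          f"fatality rate {row['fatality_rate']:.2%}")
```

Note that the low infection estimate gives the high end of the fatality range, and the high infection estimate gives the low end.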
Given the above set of data points, we can estimate the CFR to be 0.4%, which is close to the figure derived from the CDC data but nearly an order of magnitude lower than that reported by the World Health Organization. Of course, the mortality rate strongly depends on how quickly patients are treated and on the level of care each of them receives. The sequence of events on the Diamond Princess suggests that quarantining in place may have deprived passengers of critically needed treatment, potentially increasing the mortality rate. It is likely safe to assume that the case mortality will fall further as more treatment options become available.
On April 19th, 2020, the reported number of deaths in the U.S. reached 40,000. With an estimated CFR of 0.4%, the implied number of infected is about 10 million (40,000 / 0.004), significantly higher than the 764,000 cases identified as of the same day. Because tests are performed only on individuals with symptoms (a guideline set by the CDC), this implies that the actual number of infected is about 13 times the number of reported cases, and that the unrecorded cases were either asymptomatic or so mild that they did not meet the CDC’s criteria for testing.
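This back-calculation can be written out as a short sketch. It assumes, as the text does, that a single CFR of 0.4% applies to all infections to date; the function name is illustrative.

```python
def implied_infections(deaths: float, cfr: float) -> float:
    """Invert CFR = deaths / infected to estimate the number infected."""
    if not 0 < cfr <= 1:
        raise ValueError("cfr must be a fraction in (0, 1]")
    return deaths / cfr


reported_deaths = 40_000     # reported deaths on April 19th, 2020
reported_cases = 764_000     # identified cases on the same day
assumed_cfr = 0.004          # the 0.4% estimate from the text

infected = implied_infections(reported_deaths, assumed_cfr)
undercount = infected / reported_cases

print(f"implied infections: {infected:,.0f}")
print(f"ratio to reported cases: {undercount:.1f}x")
```

The result is sensitive to the assumed rate: halving the CFR doubles the implied number of infections.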
Estimating the Basic Reproduction Number R0
The SIR model is a set of differential equations that has been used to predict the progression of diseases, especially airborne ones, since it was introduced in 1927 by Kermack and McKendrick. S, I, and R represent the numbers of susceptible, infected, and recovered individuals. The number of dead can then be obtained by multiplying the mortality rate by the number of recovered, R. The model requires two key parameters, often referred to as β and γ. γ is the inverse of the duration of the disease (for COVID-19 we assume this to be 14 days), so γ = 1/14 = 0.0714 per day. β is related to the reproduction number by β = γ R0. Using this model, we obtained the reproduction number that most closely reproduces the actual death counts in the initial phase of the disease, before any social distancing was introduced. Of course, social distancing slows the progression of the disease and thereby reduces the effective reproduction number.
Figure 3: Estimating R0 by comparing result of SIR Model with actual death rate.
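As a rough illustration of the model described above, the sketch below integrates the standard SIR equations with simple forward Euler steps and derives deaths as the mortality rate times R, as the text describes. The population of 330 million, the single initial infection, and the step size are assumptions made for the example; the integration scheme is a convenient choice, not necessarily the one used to produce Figure 3.

```python
def simulate_sir(r0, population, days, duration_days=14.0,
                 initial_infected=1.0, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (S, I, R) tuples, one per day from day 0 to `days`.
    """
    gamma = 1.0 / duration_days      # recovery rate, 1/14 per day by default
    beta = gamma * r0                # transmission rate, beta = gamma * R0
    s, i, r = population - initial_infected, initial_infected, 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removed = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removed
            r += new_removed
        history.append((s, i, r))
    return history


# ASSUMED inputs: U.S. population of roughly 330 million, one initial case.
history = simulate_sir(r0=4.6, population=330e6, days=30)
cfr = 0.004
deaths_by_day = [cfr * r for (_, _, r) in history]
print(f"infected on day 30: {history[-1][1]:,.0f}")
print(f"modelled cumulative deaths on day 30: {deaths_by_day[-1]:.1f}")
```

Comparing `deaths_by_day` with the reported death counts for the same days is what allows R0 to be tuned until the two curves agree.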
The calculated R0 appears to be within the range of the coronavirus family; it is slightly lower than that of MERS-CoV but higher than that of SARS-CoV-1. See Figure 4 below.
Figure 4: Comparison of R0 for different Coronavirus.
The R0 computed by matching the SIR model to the mortality observed over the initial 30 days depends slightly on the assumed CFR. In this computation, R0 was 5.0, 4.7, and 4.5 for CFRs of 0.1%, 0.3%, and 0.5%, respectively.
Given R0, when will we be at the peak of the epidemic?
Allowing the infection to take its course would result in a higher peak in the number of infected. Social distancing, lockdown, proper hand hygiene, wearing face masks, and other infection prevention actions can cumulatively reduce the effective reproduction number, R0. As a result of these actions, one can see:
- A delay in the onset of the peak infection (occurs at a later time).
- A reduction in the peak infection. This is helpful in preventing the healthcare facilities from getting overwhelmed with a surge of infected patients.
- Additional time to develop better treatments, though probably not enough to develop a vaccine.
Figure 5: Reduction in the number infected with change in R0 (“Flattening of the Curve”)
It is important to note how short a time it would take to reach the peak infection rate with R0 = 4.5. This implies that if the first infection occurred in early January 2020, then by the time the lockdown was imposed (the middle of March 2020) we may already have been very close to the peak infection rate. Furthermore, given that the first infection in China occurred in November 2019 and flights continued between Wuhan and major U.S. cities, it is likely that the first U.S. infection occurred sometime in December 2019. This would imply that the peak infection may have occurred earlier in March.
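The effect of R0 on the timing and height of the peak can be explored with a small SIR sketch. The population of 330 million, the single initial infection, the 14-day duration, and the forward Euler integration are all assumptions made for the example, so the printed days are indicative only.

```python
def sir_peak(r0, population=330e6, duration_days=14.0,
             initial_infected=1.0, dt=0.1, max_days=1000):
    """Return (day of peak infection, peak infected fraction) for an SIR model."""
    gamma = 1.0 / duration_days
    beta = gamma * r0
    s, i = population - initial_infected, initial_infected
    peak_i, peak_day = i, 0.0
    for step in range(1, int(max_days / dt) + 1):
        new_infections = beta * s * i / population * dt
        new_removed = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removed
        if i > peak_i:               # track the highest infected count seen
            peak_i, peak_day = i, step * dt
    return peak_day, peak_i / population


peaks = {r0: sir_peak(r0) for r0 in (4.5, 3.0, 2.0)}
for r0, (day, fraction) in peaks.items():
    print(f"R0 = {r0}: peak around day {day:.0f}, "
          f"{fraction:.0%} of the population infected at once")
```

Lower values of R0 push the peak later and make it lower, which is the "flattening" effect described above, while a value near 4.5 puts the peak within roughly three months of the first case under these assumptions.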
The reported data from a number of groups, including:
- Diamond Princess passengers and crew,
- Iceland, where a significant number of tests were performed, and
- CDC data on healthcare workers,
provide a reasonably accurate estimate of the total number of infected and therefore allow us to estimate the mortality rate of SARS-CoV-2. Based on these calculations, the CFR is estimated to be 0.4%, while the Stanford study indicates that it may be around 0.1%, putting it closer to that of influenza. However, all of these results were based on a limited number of tests and need to be confirmed with larger test populations.
Using an error minimization technique, the parameters of the SIR model were varied to find the values that give the least error relative to the COVID-19 deaths actually reported in the first 30 days in the U.S. The results gave an R0 in the range of 4.5 to 5.0 as the assumed mortality rate varied from 0.5% down to 0.1%. With a CFR of 0.4%, R0 is 4.6. This is considerably higher than what has been suggested in the past.
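The text does not say which minimization technique was used, and the reported death counts are not reproduced here. The sketch below therefore uses a plain grid search with a sum-of-squared-errors criterion, applied to a synthetic death series generated by the model itself, which only checks that the search recovers a known R0. The population, initial infection count, and candidate grid are assumptions.

```python
def cumulative_deaths(r0, cfr, days=30, population=330e6,
                      duration_days=14.0, dt=0.1):
    """Daily cumulative deaths from an SIR model: cfr times the removed count R."""
    gamma = 1.0 / duration_days
    beta = gamma * r0
    s, i, r = population - 1.0, 1.0, 0.0   # ASSUMED single initial infection
    steps_per_day = round(1 / dt)
    deaths = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removed = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removed
            r += new_removed
        deaths.append(cfr * r)
    return deaths


def fit_r0(observed, cfr, candidates):
    """Return the candidate R0 with the smallest sum of squared errors."""
    def squared_error(r0):
        model = cumulative_deaths(r0, cfr, days=len(observed))
        return sum((m - o) ** 2 for m, o in zip(model, observed))
    return min(candidates, key=squared_error)


# SYNTHETIC "observations": generated with R0 = 4.6 and CFR = 0.4%.
observed = cumulative_deaths(r0=4.6, cfr=0.004)
candidates = [round(3.0 + 0.1 * k, 1) for k in range(31)]   # 3.0 to 6.0

best = fit_r0(observed, cfr=0.004, candidates=candidates)
best_low_cfr = fit_r0(observed, cfr=0.001, candidates=candidates)
print(f"best R0 assuming CFR 0.4%: {best}")
print(f"best R0 assuming CFR 0.1%: {best_low_cfr}")
```

Fitting the same death series with a lower assumed CFR requires more infections, and hence a higher R0, which is the direction of the dependence reported in the text.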
Finally, given the large number of flights between Wuhan and major U.S. cities, including New York, Seattle, San Francisco, and Los Angeles, it is not unreasonable to assume that the first case of COVID-19 occurred earlier than the assumed case of January 15th, 2020. The lockdown went into effect about 60 days after that first reported case (around March 15th, 2020). Based on the estimated R0, the peak infection should occur 80 to 85 days after the first case. However, if the first case actually occurred sometime in December, then it is possible that many COVID-19 cases went undetected, or were even diagnosed as pneumonia or severe flu, between November 2019 and February 2020.
It is imperative to identify the individuals who may have been previously exposed and test them for the presence of COVID-19 antibodies. This will help provide the data needed to answer two key questions: (a) did the infection arrive earlier than assumed, and (b) how widely did the infection spread, and how close are we to achieving herd immunity? This information will be critical to many decisions to be made in the near future.
Bahram Nour-Omid is the Executive Chairman of the Board and Cofounder of Vitalacy, Inc.