Statistical Analysis - Regression Analysis

In this section, we present results from several statistical models used to explore the features of COVID-19 in Canada. The primary purpose is to demonstrate that different modeling strategies can be used to analyze the COVID-19 data. We hope these studies shed light on the complex features and the development of COVID-19 in Canada. When interpreting the results, readers should keep in mind the associated model assumptions, some of which may be untestable.

Objective

We are interested in understanding which factors may be associated with COVID-19 fatality in Canada. To this end, we conduct statistical analyses of the fatality recorded in Canada up to May 6, 2020.

Assumption and Model

The infection fatality rate (IFR) of COVID-19 is defined as

$$\mathrm{IFR} = \frac{\text{number of deaths}}{\text{number of confirmed cases}},$$

where the number of deaths and the number of confirmed cases are taken up to May 6, 2020. With access limited to the public data sources listed in Table 1, we examine the relationship between the IFR and the potential risk factors displayed in Table 1.
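As a small illustration of the ratio described above, the following sketch computes the rate from cumulative counts. The function name `fatality_rate` and the counts are hypothetical and made up for illustration; they are not taken from the data sources in Table 1.

```python
def fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Return deaths divided by confirmed cases, both cumulative to the same date."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


# Made-up counts for a hypothetical region (not real data).
example_rate = fatality_rate(deaths=50, confirmed_cases=1000)
print(f"Fatality rate: {example_rate:.1%}")
```

Both counts must refer to the same cutoff date (May 6, 2020 in this analysis) for the ratio to be meaningful.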

Table 1: Information of Potential Risk Factors.
(CCDSS: the Canadian Chronic Disease Surveillance System; GC: Government of Canada)

Descriptions of the risk factors are as follows:

Number of seniors: the number of residents aged 65 or over per 10,000 people.
Average temperature: the average temperature from February to April.
Ischemic heart disease: the age-standardized incidence rate of ischemic heart disease per 100,000 people aged 20 years and older.
Number of hospital establishments: the number of hospital establishments per 10,000 people in a province or city.
Smoking prevalence: the smoking prevalence in a province, including both regular and non-regular smokers.
Number of tests: the number of tests per 10,000 people done by May 6, 2020.
Number of physicians: the number of family medicine and general practice physicians per 10,000 people in a province or city.

First, we employ the Linear Regression Model (LRM) to describe the relationship between the seven possible risk factors and the IFR. Second, without imposing a specific model form, we use the Multiple Index Model (MIM) to study the relationship between the potential risk factors and the IFR, fitting the data with an adaptive Lasso penalized Sliced Inverse Regression method.
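The two modeling strategies can be sketched roughly as follows. This is a minimal illustration, not the analysis reported here: it uses ordinary least squares for the LRM and plain (unpenalized) sliced inverse regression in place of the adaptive Lasso penalized version, and it runs on synthetic data with seven predictors. The function names `fit_ols` and `sir_directions`, the exponential link, and the sample size are all assumptions made for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns estimates and standard errors."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - p - 1)        # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta, np.sqrt(np.diag(cov))


def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Plain sliced inverse regression (no penalty); returns unit-norm directions."""
    n, p = X.shape
    # Standardize predictors using Sigma^{-1/2} from an eigendecomposition.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Slice the sorted response and average the standardized predictors per slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale.
    _, vecs = np.linalg.eigh(M)
    B = inv_sqrt @ vecs[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)


# Synthetic data (not the COVID-19 data): a single index with a nonlinear link.
rng = np.random.default_rng(0)
n, p = 1000, 7
X = rng.normal(size=(n, p))
b_true = np.zeros(p)
b_true[[0, 3]] = 1 / np.sqrt(2)
y = np.exp(X @ b_true) + 0.1 * rng.normal(size=n)

beta_hat, se_hat = fit_ols(X, y)
b_hat = sir_directions(X, y, n_slices=10, n_dirs=1)[:, 0]
alignment = abs(b_hat @ b_true)                 # |cosine| between estimate and truth
print("OLS estimates:", np.round(beta_hat, 3))
print("SIR direction:", np.round(b_hat, 3))
print("Alignment with true direction:", round(float(alignment), 3))
```

The sketch relies on a large synthetic sample. With only a handful of provinces or cities, the covariance estimate in plain sliced inverse regression becomes unstable, which is one motivation for a penalized variant.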

Findings and Discussion

The analysis results are reported in Table 2. The LRM results show no evidence of a linear relationship between these factors and the IFR. However, the MIM results suggest that the risk factors x₁ (the number of seniors), x₄ (the number of hospital establishments), and x₇ (the number of physicians) are associated with the IFR of COVID-19; there is no evidence to support an association between the other risk factors and the IFR. Because the MIM estimates are direction vectors forming an estimated basis of the index subspace, they are identified only up to sign and scale, so the positive (or negative) sign of an estimate does not indicate a positive (or negative) association.
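The point about signs can be checked numerically. In the sketch below, a direction vector and its negation span the same one-dimensional subspace, and an index model with an unknown link fits equally well under either sign because the link can absorb the flip. The vector `b` and the link `f` are hypothetical choices for illustration, not estimates from Table 2.

```python
import numpy as np


def projection_matrix(b):
    """Orthogonal projection onto the line spanned by the vector b."""
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return (b @ b.T) / (b.T @ b).item()


# Hypothetical direction vector (not an estimate from the analysis).
b = np.array([0.6, 0.0, 0.0, -0.8])

# b and -b define exactly the same subspace.
same_subspace = np.allclose(projection_matrix(b), projection_matrix(-b))

# With an unknown link, y = f(b'x) equals g((-b)'x) for g(t) = f(-t).
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 4))
f = np.exp
g = lambda t: f(-t)
same_fit = np.allclose(f(x @ b), g(x @ (-b)))

print(same_subspace, same_fit)
```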

The findings here show that different modeling choices may yield different results. While the LRM is easy to use, it cannot capture a nonlinear relationship between the possible risk factors and the IFR. The MIM is flexible enough to accommodate an unknown form of association among the variables, but its results should be interpreted with caution because of the limited size of the data. Due to limited data access, our analyses are conducted only on a small sample consisting of fatality information at the province or city level. As more data become available, using the MIM to identify the characteristics of COVID-19 may become more reliable when the data are stratified at a finer scale with more accurate information on the risk factors.

Table 2: Data Analysis using the Linear Regression Model (LRM) and the Multiple Index Model (MIM).
("EST" and "SE" represent the estimate and standard error, respectively.)