# Calculating COVID-19 infection statistics using Nspire

The recent worldwide outbreak of COVID-19 is alarming. Using data published by the Johns Hopkins CSSE, we can perform some basic regression analysis on the Nspire calculator to get a rough picture of the outbreak.

The graph below shows daily infections outside of China.

The data look more like exponential growth than linear growth. Using the Statistics functions built into the Nspire, the r² value of the exponential regression (0.91) is clearly higher than that of the linear regression (0.81).
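
The same comparison can be sketched off the calculator. This is a minimal Python sketch, assuming the exponential model y = a·bˣ is fitted by a straight line through (x, ln y), which is a common way to do exponential regression (and, I believe, close to what the TI calculators do); the r² it reports is therefore the r² of the transformed fit. The daily counts below are hypothetical, not the CSSE data.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy * sxy / (sxx * syy)

def exponential_fit(xs, ys):
    """Fit y = a * b**x via a linear fit of ln(y) on x.

    The returned r^2 belongs to the transformed (x, ln y) fit."""
    c, d, r_squared = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(c), math.exp(d), r_squared

# Hypothetical daily counts, for illustration only.
days = list(range(1, 11))
cases = [3, 5, 6, 10, 14, 20, 29, 41, 60, 85]

_, _, r2_lin = linear_fit(days, cases)
a, b, r2_exp = exponential_fit(days, cases)
print(f"linear r^2 = {r2_lin:.3f}, exponential r^2 = {r2_exp:.3f}")
print(f"fitted model: y = {a:.3f} * {b:.3f}^x")
```

A higher r² for the exponential fit on curving data like this is the same signal the calculator gives above.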

For epidemiological analysis there are well-established mathematical models which, when fed with accurate data, allow better descriptions and even reliable predictions. One of the indices from these models is the basic reproduction number, R0, which indicates how many uninfected individuals, on average, a single infected individual will go on to infect. So far, estimates for COVID-19 range from 1.4 to 6.6.
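
To see why the spread of that estimate matters, here is a deliberately naive Python sketch: it assumes every case infects exactly R0 others in the next generation, so generation n has R0ⁿ new cases. It ignores immunity, interventions and the shrinking pool of uninfected people, so it only illustrates the meaning of R0 (growth when R0 > 1, decline when R0 < 1); it is not an epidemiological model.

```python
def naive_generation_counts(r0, generations):
    """New infections per generation if every case infects exactly r0 others.

    Deliberately naive: no immunity, no interventions, unlimited susceptibles."""
    counts = [1.0]  # generation 0: a single infected person
    for _ in range(generations):
        counts.append(counts[-1] * r0)
    return counts

for r0 in (1.4, 6.6):  # low and high ends of the quoted estimates
    counts = naive_generation_counts(r0, 5)
    print(r0, [round(c, 1) for c in counts])
```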

# Variance Inflation Factors in R

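Variance Inflation Factors (VIF) are used to detect multicollinearity, and R can compute them with the `vif()` function from the `car` package. The VIF of the j-th predictor is given by:

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

where $R_j^2$ is the coefficient of determination obtained by regressing the j-th predictor on all the other predictors.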

To use this function in R:

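```r
library(car)    # vif() is provided by the car package
vif(fit)        # fit: a multiple regression model, e.g. from lm()
sqrt(vif(fit))  # square root of VIF: inflation factor of each coefficient's standard error
```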
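
As a cross-check of the formula, here is a minimal Python sketch for the special case of exactly two predictors, which is an assumption made to keep it short. Regressing x1 on x2 then gives $R_j^2 = r^2$ (the squared correlation of the two predictors), so both predictors share VIF = 1 / (1 − r²). The predictor values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2,
    so both predictors have VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: x2 is almost 2 * x1, i.e. strongly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
v = vif_two_predictors(x1, x2)
print(f"VIF = {v:.1f}, sqrt(VIF) = {math.sqrt(v):.2f}")
```

With more than two predictors each $R_j^2$ needs its own auxiliary regression, which is what `car::vif()` takes care of.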

# Durbin-Watson statistic in TI-84

Unlike the more sophisticated TI-89 and Nspire, the TI-84 does not include the Durbin-Watson statistic. However, calculating it is fairly straightforward using list functions.

This regression statistic is given by

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

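where e is the list of residuals. To obtain this list (using the multiple regression example from a previous post), take the difference between the actual values and the values predicted by the regression formula (Y7 below). The sign convention does not matter here, because the statistic depends only on squared terms.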

Finally, evaluate the formula below to get the answer.
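
For checking the calculator result, here is a minimal Python sketch of the same statistic; the actual and fitted values are hypothetical. d ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive and values toward 4 negative autocorrelation.

```python
def residuals_from(actual, predicted):
    """e_t = actual_t - predicted_t (flipping the sign does not change d)."""
    return [y - yhat for y, yhat in zip(actual, predicted)]

def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical actual and fitted values, for illustration only.
actual = [10.0, 12.0, 11.5, 14.0, 15.5, 15.0]
predicted = [10.4, 11.6, 12.3, 13.5, 14.8, 15.9]
e = residuals_from(actual, predicted)
print(f"d = {durbin_watson(e):.3f}")
```

On the TI-84 itself the same quantity should be obtainable from a residual list L with the list functions `ΔList(` and `sum(`, as `sum(ΔList(L)²)/sum(L²)` (L being a placeholder list name).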

# Coefficient of determination for Multiple linear regression in TI-84 Plus

After determining the parameters of a multiple linear regression on the TI-84 (which has no built-in function for this calculation), the coefficient of determination can also be calculated easily using the TI-84's rich set of list functions. Following the previous example, the dependent variable is in the Sales list, and the two independent variables are in the Size and Dist lists.

The Yhat list is prepared first. This list stores the values predicted with the regression parameters determined in the previous installment.

Next, the means of Y and Yhat are calculated and stored in a handy list S.

Then the three lists SYY, SYhYh and SYYh are calculated.

The result is obtained by the formula below.
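
A minimal Python sketch of the same computation, assuming SYY and SYhYh are the sums of squared deviations of Y and Yhat from their means and SYYh is the sum of their cross-products, so that R² = SYYh² / (SYY · SYhYh), the squared correlation between Y and Yhat. For a least-squares fit with an intercept this equals 1 − SSE/SST, which the second function uses as a cross-check. The data and the fitted line are hypothetical.

```python
def r_squared_from_fit(y, yhat):
    """R^2 as the squared correlation of Y and Yhat: SYYh^2 / (SYY * SYhYh)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_yhat = sum(yhat) / n
    syy = sum((a - mean_y) ** 2 for a in y)
    syhyh = sum((b - mean_yhat) ** 2 for b in yhat)
    syyh = sum((a - mean_y) * (b - mean_yhat) for a, b in zip(y, yhat))
    return syyh ** 2 / (syy * syhyh)

def r_squared_from_sse(y, yhat):
    """Cross-check: R^2 = 1 - SSE / SST (equal to the above for OLS with an intercept)."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

# Hypothetical data and its least-squares line yhat = 2.2 + 0.6x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
yhat = [2.2 + 0.6 * xi for xi in x]
print(f"{r_squared_from_fit(y, yhat):.3f} {r_squared_from_sse(y, yhat):.3f}")  # 0.600 0.600
```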

# White test in TI Nspire and R

The White test is a statistical test for heteroskedasticity: its null hypothesis is that the residual variance is constant (homoskedasticity). It works by regressing the squared residuals on the regressors, their squares and their cross-products. Although it is not among the built-in functions, the TI Nspire can compute this test, since the residuals can be recalled after running a regression. An example with multiple regression is shown below.

A scatter plot for visual inspection of heteroskedasticity.

The calculation on the data set in spreadsheet mode:

And the same test in R:
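
For readers without either tool, here is a minimal Python sketch of the White test for the simplest case of a single regressor, which is an assumption that keeps it short (with one regressor there are no cross-products). The squared residuals are regressed on 1, x and x², and LM = n·R² of that auxiliary regression is compared with a chi-square distribution with 2 degrees of freedom, whose survival function is exp(−LM/2). The data are hypothetical and heteroskedastic by construction.

```python
import math

def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(y, fitted):
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def white_test(x, y):
    """White test for y = b0 + b1*x; returns (LM, p-value) with 2 degrees of freedom."""
    n = len(x)
    b0, b1 = ols([[1.0, xi] for xi in x], y)
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    aux_rows = [[1.0, xi, xi * xi] for xi in x]              # auxiliary regressors
    g = ols(aux_rows, e2)
    fitted = [sum(gi * ri for gi, ri in zip(g, row)) for row in aux_rows]
    lm = n * r_squared(e2, fitted)
    return lm, math.exp(-lm / 2.0)  # chi-square(2) survival function

# Hypothetical data whose scatter grows with x.
x = [float(v) for v in range(1, 21)]
y = [2.0 + 3.0 * xi + (xi if i % 2 else -xi) for i, xi in enumerate(x)]
lm, p = white_test(x, y)
print(f"LM = {lm:.2f}, p = {p:.4g}")
```

A small p-value rejects homoskedasticity. With several regressors the auxiliary regression grows (all squares and cross-products) and the degrees of freedom change accordingly.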

# Quick residual plot in TI Nspire

When working with regression analysis, a residual plot is a handy tool for gaining insight through visualization. The TI Nspire provides easy and convenient access to these plots in just a few clicks.

Using a simple linear regression as an example below:

Opening the menu 4:Analyze > 7:Residuals shows two options for residual plots: Show Residual Squares and Residual Plots. The resulting plots are shown below.

# Multiple linear regression in TI-84 Plus

Advanced features like multiple linear regression are not included in the TI-84 Plus SE. However, obtaining the regression parameters needs nothing more than some built-in matrix operations, and the steps are easy. As a simple example, consider two independent variables x1 and x2 in a multiple regression analysis.

First, the values are entered into lists, which are later converted into matrices. L1 and L2 hold x1 and x2, and L3 holds the dependent variable.

Convert the lists into matrices using the `List>matr()` function. L1 through L3 are converted to matrices C through E.

Create a matrix of all 1s with the same dimensions as L1 and L2. Then use the `augment()` function to build a matrix whose first column is L1 (matrix C), second column is L2 (matrix D), and third column is the all-1s matrix. In this example the result is stored in matrix F. Since `augment()` takes only two arguments at a time, the calls have to be nested.

The resulting matrix F and its transpose are shown below.

Finally, the following formula is used to obtain the parameters of the multiple regression:

$$\hat{\beta} = (F^{\mathsf{T}} F)^{-1} F^{\mathsf{T}} E$$

which is entered on the TI-84 as `([F]ᵀ*[F])⁻¹*[F]ᵀ*[E]`.

The parameters appear in the result matrix, so the multiple regression equation is

`y = 41.51x1 - 0.34x2 + 65.32`
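
The same matrix steps can be mirrored in a minimal Python sketch: build F with columns x1, x2 and 1 (as `augment()` does), then evaluate (FᵀF)⁻¹FᵀE. The data are hypothetical and generated exactly from y = 2x1 − 3x2 + 5, so the result can be checked by eye.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (m must be square and non-singular)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical data generated from y = 2*x1 - 3*x2 + 5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [2 * a - 3 * b + 5 for a, b in zip(x1, x2)]

F = [[a, b, 1.0] for a, b in zip(x1, x2)]  # columns: x1, x2, 1 (like nested augment)
E = [[v] for v in y]                       # column matrix (like List>matr)
Ft = transpose(F)
beta = matmul(matmul(inverse(matmul(Ft, F)), Ft), E)
print([round(b[0], 6) for b in beta])      # coefficients of x1, x2, constant
```

The explicit inverse mirrors the calculator formula; for larger or ill-conditioned problems, solving the normal equations directly (or using a QR decomposition) is numerically preferable.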

See also this installment for determining the coefficient of determination of a multiple linear regression, also on the TI-84.

# Comparing bug prediction methods by logistic growth and Gompertz curve in Nspire

Analysis can be performed on a sample data set of cumulative bug counts collected over 12 days, to obtain model parameters for future prediction. Columns A and B hold the data, and the standard Nspire logistic regression function is executed in columns C and D to obtain the parameters a, b, c. Column E holds the values of the logistic function computed not with the Nspire built-in parameters but with parameters obtained separately using the Nelder-Mead program from the previous post.

Besides logistic regression there are other models for prediction. One of them is a sigmoid function called the Gompertz function, which is applied to the same data set so that its parameters can be compared with those of the more common logistic function. Since the parameters are obtained in a similar fashion as for the logistic function, i.e. by minimizing the sum of squared errors, the Nelder-Mead program can be reused. After obtaining the parameters, the function values on the data set are calculated and shown in column F.

The application of the Nelder-Mead program to obtain the parameters of the logistic regression is shown below. First the logi function is declared, then the sum of squared errors is defined in the numfunc_logi function, which in turn is passed to the nm function so that the Nelder-Mead algorithm can find its minimum. As shown below, the results are the same as those of the Nspire built-in logistic regression function (a = 64.003, b = 9.0317, c = 0.33644), although the Nspire formula names a, b, c differently.

Applying the Nelder-Mead program to obtain the parameters of the Gompertz function works the same way.
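
The same procedure can be sketched in Python. This is a textbook Nelder-Mead (reflection 1, expansion 2, contraction 0.5, shrink 0.5), not the author's Nspire nm program. The model forms a/(1 + b·e^(−cx)) for the logistic and a·e^(−b·e^(−cx)) for the Gompertz function are assumptions about the parametrization, and the "bug counts" are synthetic values from a known logistic curve, not the data in the spreadsheet.

```python
import math

def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000):
    """Minimize f starting from x0 with the standard Nelder-Mead simplex method."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # initial simplex: perturb one coordinate at a time
        p = list(x0)
        p[i] += step if p[i] == 0 else step * p[i]
        simplex.append(p)
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])  # best first
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        fr = f(reflected)
        if fr < values[0]:  # try expanding further in the same direction
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            fe = f(expanded)
            simplex[-1], values[-1] = (expanded, fe) if fe < fr else (reflected, fr)
        elif fr < values[-2]:  # reflection beats the second-worst point
            simplex[-1], values[-1] = reflected, fr
        else:  # outside or inside contraction toward the centroid
            target = reflected if fr < values[-1] else worst
            contracted = [c + 0.5 * (t - c) for c, t in zip(centroid, target)]
            fc = f(contracted)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = contracted, fc
            else:  # shrink every point toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (q - b) for b, q in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    best = min(range(n + 1), key=lambda k: values[k])
    return simplex[best], values[best]

def logistic(params, x):
    a, b, c = params  # assumed form: a / (1 + b*e^(-c*x))
    return a / (1.0 + b * math.exp(-c * x))

def gompertz(params, x):
    a, b, c = params  # assumed form: a * e^(-b*e^(-c*x))
    return a * math.exp(-b * math.exp(-c * x))

def sse_for(model, xs, ys):
    """Objective: sum of squared errors of model(params, x) over the data."""
    def objective(params):
        try:
            return sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys))
        except (OverflowError, ZeroDivisionError):
            return float("inf")  # reject parameters where the model blows up
    return objective

def fit(model, xs, ys, x0):
    objective = sse_for(model, xs, ys)
    params, _ = nelder_mead(objective, x0)
    return nelder_mead(objective, params)  # one restart from the best point

# Synthetic cumulative counts from a known logistic curve (not the post's data).
days = list(range(1, 13))
bugs = [logistic([60.0, 10.0, 0.4], d) for d in days]

params_logi, sse_logi = fit(logistic, days, bugs, [50.0, 5.0, 0.5])
params_gomp, sse_gomp = fit(gompertz, days, bugs, [50.0, 5.0, 0.5])
print("logistic:", [round(v, 4) for v in params_logi], "SSE =", sse_logi)
print("Gompertz:", [round(v, 4) for v in params_gomp], "SSE =", sse_gomp)
```

Restarting once from the best point is a common safeguard, because Nelder-Mead can stall before reaching the minimum.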

The bug counts and the fitted values of both functions are plotted in the graph below, alongside the logistic regression curve. Hard to tell which of the two functions is better?

It turns out one guess is better than the other. As the Ru calculation below shows, the Gompertz function provides a slightly better fit in this bug prediction case. To calculate it, obtain the one-variable statistics of the bug data (only the sum of squared deviations, stat.SSX, is needed), then plug in the other values accordingly. As with the R² coefficient in regression analysis, the larger the value, the better the prediction. In this case, 0.9248 from Gompertz outperforms 0.9107 from logistic.
Eduguesstimate is what I’d call this conclusion 😉
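
A minimal Python sketch of such a comparison, assuming Ru is computed as 1 − SSE/SSX, where SSX is the sum of squared deviations of the observed counts (stat.SSX) and SSE is the sum of squared errors of a fitted curve. The formula is my reading of the description above, and the observed and fitted values are hypothetical.

```python
def ru_value(observed, fitted):
    """Assumed definition: Ru = 1 - SSE / SSX (SSX as in the Nspire's stat.SSX)."""
    mean = sum(observed) / len(observed)
    ssx = sum((y - mean) ** 2 for y in observed)                  # sum of squared deviations
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))     # sum of squared errors
    return 1.0 - sse / ssx

# Hypothetical cumulative counts and two hypothetical fitted curves.
observed = [5, 12, 25, 40, 52, 58]
fit_one = [6, 13, 24, 38, 51, 59]
fit_two = [4, 14, 27, 41, 50, 57]
print(f"fit one: {ru_value(observed, fit_one):.4f}, fit two: {ru_value(observed, fit_two):.4f}")
```

Under this definition a perfect fit gives 1 and a flat line at the mean gives 0, mirroring how R² behaves.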