The Variance Inflation Factor (VIF) function is available in R for determining the existence of multicollinearity. The VIF of the j-th predictor is given by:

VIF_{j} = 1 / (1 - R_{j}^{2})

where R_{j}^{2} is the coefficient of determination from regressing the j-th predictor on all the other predictors. To use this function in R (provided by the `car` package):

`vif(fit)`

`sqrt(vif(fit))`
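A minimal runnable sketch, using the built-in `mtcars` data set purely for illustration (the data and predictors are not from the original post):

```r
library(car)  # provides vif()

# Multiple regression with three predictors
fit <- lm(mpg ~ disp + hp + wt, data = mtcars)

# VIF per predictor; values well above 5-10 suggest multicollinearity
vif(fit)

# Some texts use sqrt(VIF): values above 2 warrant a closer look
sqrt(vif(fit))
```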


Unlike the more sophisticated TI-89 and Nspire, the TI-84 does not include the Durbin-Watson statistic. Yet calculating it is fairly straightforward using list functions.

This regression statistic is given as

d = Σ_{t=2}^{n} (e_{t} - e_{t-1})^{2} / Σ_{t=1}^{n} e_{t}^{2}

where e is the list of residual values. To obtain this list (using a previous multiple regression example), simply subtract the predicted values given by the regression formula (Y7 below) from the actual values:

Finally, run the formula below for the answer.
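For cross-checking off the calculator, the statistic is a one-liner in R (assuming `fit` is a fitted `lm` model):

```r
# Residuals of the fitted model
e <- residuals(fit)

# Durbin-Watson: squared successive differences over squared residuals;
# values near 2 suggest little autocorrelation
sum(diff(e)^2) / sum(e^2)
```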

After determining the parameters of a multiple linear regression on the TI-84 (which has no direct built-in support for this calculation), the coefficient of determination can also be easily calculated using the rich set of list functions the TI-84 supports. Following the previous example, the dependent variable is in the Sales list, and the two independent variables are in the Size and Dist lists.

The Yhat list is prepared first. This list stores the predicted values using the regression parameters determined in the previous installment.

Next, the means of Y and Yhat are calculated and stored in a handy list S.

Furthermore, three lists SYY, SYhYh, and SYYh are calculated, holding the squared deviations of Y, the squared deviations of Yhat, and their cross-products respectively.

The result is obtained by the formula below.
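The same computation in R, as a minimal sketch (assuming `y` holds the Sales values and `yhat` the predictions, mirroring the calculator lists):

```r
# Deviations from the respective means
dy  <- y - mean(y)
dyh <- yhat - mean(yhat)

# R^2 as the squared correlation between y and yhat:
# (sum of cross-products)^2 over the product of the sums of squares
r2 <- sum(dy * dyh)^2 / (sum(dy^2) * sum(dyh^2))
r2

cor(y, yhat)^2  # equivalent one-liner
```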

The White test is a statistical test for determining whether heteroskedasticity is present in a data set; the null hypothesis is homoskedasticity. The test is based on the variance of the residual values. The TI Nspire is capable of computing this test even though it is not among the built-in functions, since the residual values can be recalled from regression tests. An example involving multiple regression is shown below.

A scatter plot for visual inspection of heteroskedasticity.

The calculation on the data set in spreadsheet mode.

And in R.
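A minimal sketch of the same test in R (the model `fit` and the predictors `x1`, `x2` in a data frame `df` are hypothetical names for illustration):

```r
# Squared residuals from the original regression
u2 <- residuals(fit)^2

# Auxiliary regression of the squared residuals on the predictors,
# their squares, and their cross-product (White's specification)
aux <- lm(u2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2), data = df)

# LM statistic: n * R^2 of the auxiliary regression, chi-squared
# distributed with df equal to the number of auxiliary regressors (5)
lm_stat <- nrow(df) * summary(aux)$r.squared
pchisq(lm_stat, df = 5, lower.tail = FALSE)  # p-value
```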

When working with regression analysis, a residual plot is a handy tool for gaining insight through visualization. The TI Nspire provides easy and convenient access to these plots in just a few clicks.

Using a simple linear regression as an example below:

Accessing the menu **4:Analyze > 7:Residuals** shows the two options for residual plots, **Show Residual Squares** and **Show Residual Plot**. The plotting output is shown below.
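For comparison, the equivalent plot takes a couple of lines in R (again assuming a fitted `lm` model `fit`):

```r
# Residuals against fitted values; any visible pattern hints at misspecification
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)  # dashed reference line at zero
```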

Advanced features like multiple linear regression are not included in the TI-84 Plus SE. However, obtaining the regression parameters requires nothing more than some built-in matrix operations, and the steps are very easy. As a simple example, consider two independent variables x_{1} and x_{2} for a multiple regression analysis.

Firstly, the values are input into lists and later turned into matrices. L1 and L2 are x_{1} and x_{2}, and L3 is the dependent variable.

Convert the lists into matrices using the `List▶matr()` function. L1 through L3 are converted to matrices C through E.

Create a matrix of all 1s with the same dimension as L1 / L2. Then use the `augment()` function to build a matrix whose first column is L1 (matrix C), second column is L2 (matrix D), and third column is the all-1s matrix. In this example the result is stored to matrix F. Notice that since `augment()` takes only two arguments at a time, the function calls have to be chained.

The resulting matrix F and its transpose look like below.

Finally, the following formula is used to obtain the parameters for the multiple regression

([F]^{t} * [F])^{-1} * [F]^{t} * [E]

The parameters are expressed in the result matrix and therefore the multiple regression equation is

y = 41.51x_{1} - 0.34x_{2} + 65.32
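As a cross-check, the same normal-equations computation can be reproduced in R (the vectors `x1`, `x2`, and `y` stand in for lists L1 to L3):

```r
# Design matrix with columns x1, x2, and a column of 1s (matrix F on the TI-84)
X <- cbind(x1, x2, 1)

# Normal equations: (X'X)^-1 X'y yields the regression parameters
beta <- solve(t(X) %*% X) %*% t(X) %*% y
beta

# lm() should agree with the matrix computation
coef(lm(y ~ x1 + x2))
```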

See also this installment on determining the coefficient of determination in a multiple linear regression setting, also using the TI-84.

Analysis can be performed on a sample data set of cumulative bug counts collected over 12 days to obtain parameters to fit models for future prediction. Columns A and B hold the data, with the standard Nspire logistic regression function executed on columns C and D to obtain the parameters a, b, c. Column E holds the values of the logistic function, but not the one built into the Nspire; instead, the parameters are obtained separately using the Nelder-Mead program from the previous post.

There are other models besides logistic regression for prediction, one being a sigmoid called the Gompertz function, which is applied here to the same data set to obtain parameters for comparison with the more common logistic function. Since the parameters are obtained in a similar fashion to the logistic function, i.e. by minimizing the sum of squared errors, the Nelder-Mead program can be reused. After obtaining the parameters, the function values on the data set are calculated and shown in column F.

The application of the Nelder-Mead program to obtain the parameters of the logistic regression is shown below. First the **logi** function is declared, and the sum of squared errors is declared in the **numfunc_logi** function, which in turn is passed to the **nm** function to obtain the minimum via the Nelder-Mead algorithm. As shown below, the results are exactly the same as those from the Nspire built-in logistic regression function (a=64.003, b=9.0317, c=0.33644, albeit the Nspire formula names a, b, c differently).
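The same idea can be sketched in R, where `optim()` defaults to Nelder-Mead; the vectors `day` and `bugs` stand in for the 12 data points, and the parameterization of `logi` below is an assumption mirroring the post's **logi** function:

```r
# Logistic curve: a is the ceiling, b and c control the shape and growth rate
logi <- function(t, p) p[1] / (1 + p[2] * exp(-p[3] * t))

# Sum of squared errors against the observed cumulative bug counts
sse_logi <- function(p) sum((bugs - logi(day, p))^2)

# Nelder-Mead (optim's default method) from a rough starting point
fit_logi <- optim(c(max(bugs), 10, 0.3), sse_logi)
fit_logi$par  # fitted a, b, c
```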

The application of the Nelder-Mead program to obtain the parameters of the Gompertz function is similar.
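Under the same assumptions, the Gompertz form y = a*e^(-b*e^(-c*t)) drops straight into the same machinery:

```r
# Gompertz curve: also sigmoid, but asymmetric about its inflection point
gomp <- function(t, p) p[1] * exp(-p[2] * exp(-p[3] * t))

sse_gomp <- function(p) sum((bugs - gomp(day, p))^2)

fit_gomp <- optim(c(max(bugs), 5, 0.3), sse_gomp)
fit_gomp$par  # fitted a, b, c
```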

The number of bugs and the fitted values for both functions are plotted in the graph below, alongside the logistic regression curve. Hard to tell which of the two functions is better?

It turns out some guesses are better than others. As the calculation of the *Ru* value below shows, the Gompertz function provides a slightly better fit in this bug prediction case. To calculate it, obtain the one-variable stats from the bug data (only the sum of squared deviations, **stat.SSX**, is needed), and then plug in the other values accordingly. Similar to the R coefficient in regression analysis, the larger the value, the better the prediction; in this case, 0.9248 from Gompertz outperformed 0.9107 from logistic.
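In R terms, and assuming *Ru* follows the usual coefficient-of-determination pattern 1 - SSE/SS_total (an assumption, since the exact Ru formula lives in the screenshot), the comparison continues from the two fits above:

```r
# Total sum of squared deviations of the observed counts (stat.SSX on the Nspire)
ss_total <- sum((bugs - mean(bugs))^2)

# One goodness-of-fit value per model; closer to 1 is better
ru_logi <- 1 - sse_logi(fit_logi$par) / ss_total
ru_gomp <- 1 - sse_gomp(fit_gomp$par) / ss_total
c(logistic = ru_logi, gompertz = ru_gomp)
```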

*Eduguesstimate* is what I’d call this conclusion 😉