Category Archives: RStudio

Wald Test for Logistic Regression in R

Running the Wald test in R is much simpler than on the TI-Nspire calculator. Using the same sample data set:
[Screenshot: R session with the sample data set]

The survey package provides the function regTermTest() for the Wald test. It reports the F statistic and the p-value.
[Screenshot: regTermTest() output]
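The sketch below uses the built-in mtcars data (transmission type am modeled on hp and wt) as a stand-in, because the original sample data set is not reproduced here. It fits the logistic regression with glm(), computes the Wald statistic for wt by hand, and, if the survey package is installed, runs regTermTest() on the same term.

```r
# Sketch: Wald test for one coefficient of a logistic regression.
# Assumption: mtcars stands in for the post's sample data set.
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Wald statistic by hand: W = (estimate / SE)^2, compared with a chi-square(1)
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]
W   <- (est / se)^2
p_wald <- pchisq(W, df = 1, lower.tail = FALSE)
cat("Wald chi-square =", round(W, 3), " p =", signif(p_wald, 3), "\n")

# The same term tested with survey::regTermTest(), which reports an F statistic
# and p-value; its reference distribution may differ slightly from chi-square(1).
if (requireNamespace("survey", quietly = TRUE)) {
  print(survey::regTermTest(fit, ~ wt))
}
```

The hand-computed p-value should agree with the Pr(>|z|) column of summary(fit), since a squared standard-normal z statistic follows a chi-square distribution with one degree of freedom.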


Stochastic Gradient Descent in R

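Stochastic Gradient Descent (SGD) is an optimization method commonly used in machine learning, especially for training neural networks. As the name implies, it minimizes a function by repeatedly stepping against the gradient, where the gradient is estimated from one randomly chosen observation (or a small batch) at a time rather than from the whole data set.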
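To make the idea concrete, here is a from-scratch sketch (not the sgd package itself) for least-squares linear regression, assuming simulated data and a hand-picked decaying learning rate of 0.05 / epoch. Each update uses the gradient of the squared error of a single observation, visited in random order, and the result is compared with the exact fit from lm().

```r
# Sketch: plain stochastic gradient descent for least-squares linear regression.
# Assumptions: simulated data, hand-picked step-size schedule (0.05 / epoch).
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)    # true intercept 2, slope 3
X <- cbind(1, x)                       # design matrix with an intercept column

theta <- c(0, 0)                       # starting values
for (epoch in 1:20) {
  lr <- 0.05 / epoch                   # decaying step size
  for (i in sample(n)) {               # visit observations in random order
    err <- sum(X[i, ] * theta) - y[i]  # residual for a single observation
    theta <- theta - lr * err * X[i, ] # step against the gradient of err^2 / 2
  }
}
theta <- unname(theta)
ols <- unname(coef(lm(y ~ x)))         # exact least-squares solution for comparison
print(rbind(sgd = theta, lm = ols))
```

The decaying step size lets the early epochs move quickly towards the minimum while the later, smaller steps reduce the jitter caused by single-observation gradients.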

In R, the sgd package implements this method. As a warm-up after upgrading R and RStudio, I took it for a test drive.

[Screenshot: test drive of the sgd package in RStudio]

Running the documentation example:
[Screenshot: output of the sgd documentation example]

Running the included demo for logistic regression:
[Screenshot: output of the logistic regression demo]
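Along the same lines as the logistic regression demo, this is a from-scratch sketch of SGD on the log-loss (again not the sgd package's own code), assuming simulated data with true coefficients -1 and 2 and a hand-picked step-size schedule. The final estimates are compared with the maximum-likelihood fit from glm().

```r
# Sketch: stochastic gradient descent for logistic regression, checked against glm().
# Assumptions: simulated data and a hand-picked step-size schedule (0.1 / epoch).
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))  # true coefficients (-1, 2)
X <- cbind(1, x)

theta <- c(0, 0)
for (epoch in 1:30) {
  lr <- 0.1 / epoch
  for (i in sample(n)) {
    mu <- plogis(sum(X[i, ] * theta))           # predicted probability for observation i
    theta <- theta - lr * (mu - y[i]) * X[i, ]  # gradient of the log-loss for one observation
  }
}
theta <- unname(theta)
mle <- unname(coef(glm(y ~ x, family = binomial)))
print(rbind(sgd = theta, glm = mle))
```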

Data input for ANOVA in TI-Nspire and R

On the TI-Nspire CX, the Lists & Spreadsheet application provides a convenient, Excel-like interface for data input.

[Screenshot: data entry in Lists & Spreadsheet]

Each column can be given a name, so the data can be recalled from the Calculator application and passed to statistical functions. Using a sample from the classic TI-89 statistics guidebook that tests for interaction between two factors with a two-way ANOVA, the TI-Nspire CX produces the same output.

[Screenshots: two-way ANOVA on the TI-Nspire CX]

In R, data are usually imported from a CSV file with read.csv(). Other formats, including SPSS and Excel files, can also be read through add-on packages. For casual data entry, where typing at the command line is sufficient, raw values are usually stored in a vector with c(). Preparing ANOVA data this way is less straightforward, because a plain vector carries no information about which factor levels each value belongs to.
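A small sketch of the contrast, assuming the built-in ToothGrowth data written to a temporary CSV file stand in for a real data file: read.csv() brings the grouping columns along with the values, whereas a vector typed with c() holds only the numbers.

```r
# Sketch: CSV import keeps grouping columns; a c() vector holds values only.
# Assumption: ToothGrowth, written to a temporary file, stands in for a real CSV.
tmp <- tempfile(fileext = ".csv")
write.csv(ToothGrowth, tmp, row.names = FALSE)

tg <- read.csv(tmp, stringsAsFactors = TRUE)  # text column 'supp' becomes a factor
str(tg)

# Casual command-line entry: just the numbers, no factor information attached
len <- c(4.2, 11.5, 7.3, 5.8)
length(len)
```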

To carry out the ANOVA, factors are used together with the data vector. Below is the same TI example worked in R. First, the data vector is entered in order of club (c1 = driver, c2 = five iron), then brand (b1, b2, b3), where the digit after the hyphen is the sample number, i.e.
{c1,b1-1}; {c1,b1-2}; {c1,b1-3}; {c1,b1-4};
{c1,b2-1}; {c1,b2-2};…
{c2,b1-1}; {c2,b1-2};…

Two factor variables are then created: one for club (twelve 1s followed by twelve 2s), and one for brand (each of 1 to 3 repeated four times, once per sample, with the whole sequence then repeated for the second club).
[Screenshot: creating the club and brand factors in R]
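The two patterns described above can be generated with rep(), or equivalently with gl(), instead of typing all 24 values. This is a sketch; the names club and brand are assumptions, since the original commands appear only in a screenshot.

```r
# Sketch: factor variables for a 2 (club) x 3 (brand) design with 4 samples per cell,
# ordered by club, then brand, then sample number (24 values in total).
club  <- factor(rep(1:2, each = 12))                 # twelve 1s, then twelve 2s
brand <- factor(rep(rep(1:3, each = 4), times = 2))  # 1,1,1,1,2,2,2,2,3,3,3,3, twice

# gl(n, k, length) generates the same level patterns directly
club_gl  <- gl(2, 12)
brand_gl <- gl(3, 4, 24)
```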

Together, the two factor variables record, for the nth data value, which level of club and which level of brand it belongs to (effectively its index along each factor). This is easier to see in the following table:
[Table: each data value's position alongside its club and brand levels]
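A similar table can be printed directly in R. This sketch recreates the two factors (with the assumed names club and brand) so that it runs on its own, then cross-tabulates them to confirm the design is balanced.

```r
# Sketch: each data value's position next to its factor levels.
club  <- factor(rep(1:2, each = 12))
brand <- factor(rep(rep(1:3, each = 4), times = 2))
layout <- data.frame(index = seq_along(club), club = club, brand = brand)
print(layout)

# A balanced design has 4 values in every club x brand cell
print(table(layout$club, layout$brand))
```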

Finally, the two-way ANOVA is performed with the following commands:
[Screenshot: two-way ANOVA commands and output in R]
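The golf data are not reproduced here, so this sketch runs the same kind of two-way ANOVA with interaction on R's built-in ToothGrowth data, which also has a balanced 2 x 3 layout (supp with two levels, dose with three). In the model formula, supp * dose expands to both main effects plus their interaction.

```r
# Sketch: two-way ANOVA with interaction on a balanced 2 x 3 design.
# Assumption: ToothGrowth stands in for the golf data (supp: 2 levels, dose: 3 levels).
tg <- ToothGrowth
tg$dose <- factor(tg$dose)                # dose is stored as numeric; treat it as a factor

fit <- aov(len ~ supp * dose, data = tg)  # same as supp + dose + supp:dose
summary(fit)
```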

Interaction plot in R:
[Screenshots: interaction plot in R]
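A sketch of an interaction plot with base R's interaction.plot(), again assuming the ToothGrowth stand-in data. The plotted points are cell means, and roughly parallel lines suggest little interaction between the two factors.

```r
# Sketch: interaction plot for the ToothGrowth stand-in data.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)
with(tg, interaction.plot(x.factor = dose, trace.factor = supp, response = len,
                          fun = mean, xlab = "dose", ylab = "mean len",
                          trace.label = "supp"))

# The same cell means as a dose x supp matrix
cell_means <- with(tg, tapply(len, list(dose, supp), mean))
print(cell_means)
```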

Useful R commands and techniques

A collection of some frequently used R commands.

Creating a column for categorical group assignment. The example below assigns A, B or C according to the values in an existing column.
[Screenshot: grouping example]
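One way to do this in base R is cut(), sketched here on the built-in mtcars data; the cut points of 15 and 25 mpg are arbitrary assumptions, not the values from the original screenshot.

```r
# Sketch: add a categorical A/B/C column from an existing numeric column.
# Assumptions: mtcars as example data; the cut points 15 and 25 are arbitrary.
df <- mtcars
df$group <- cut(df$mpg,
                breaks = c(-Inf, 15, 25, Inf),
                labels = c("A", "B", "C"),
                right = TRUE)   # intervals (-Inf,15], (15,25], (25,Inf]
table(df$group)

# Equivalent nested ifelse(), which gives a character column instead of a factor
df$group2 <- ifelse(df$mpg <= 15, "A", ifelse(df$mpg <= 25, "B", "C"))
```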

Similarly, a new column can be derived from the values of an existing column.
[Screenshot: derived column example]
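A minimal sketch with the built-in mtcars data, where wt is recorded in units of 1000 lb: one column is derived by direct assignment, and another with transform(), which is convenient when several derived columns are added at once.

```r
# Sketch: new columns derived from existing ones.
# Assumption: mtcars as example data; wt is in units of 1000 lb.
df <- mtcars
df$wt_kg <- df$wt * 1000 * 0.45359237              # pounds to kilograms
df <- transform(df, power_to_weight = hp / wt_kg)  # hp per kg
head(df[, c("wt", "wt_kg", "hp", "power_to_weight")])
```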

Stratified sampling with the sampling package.
[Screenshot: stratified sampling example]
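A sketch of stratified sampling without replacement, assuming the built-in iris data with Species as the stratum and 5 rows per stratum. The sampling::strata() part runs only if the package is installed; a base-R equivalent using split() and sample() follows it.

```r
# Sketch: stratified simple random sampling without replacement, 5 rows per Species.
# Assumption: iris as example data.
set.seed(123)
dat <- iris[order(iris$Species), ]   # strata() expects data sorted by the strata columns

if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(dat, stratanames = "Species",
                        size = c(5, 5, 5), method = "srswor")
  sample_pkg <- sampling::getdata(dat, s)  # selected rows plus ID_unit, Prob, Stratum
  print(table(sample_pkg$Species))
}

# Base-R equivalent: split by stratum, sample within each, recombine
sample_base <- do.call(rbind, lapply(split(dat, dat$Species),
                                     function(d) d[sample(nrow(d), 5), ]))
table(sample_base$Species)
```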

In RStudio, clicking a data frame under the Environment tab invokes View() on it, and the data frame is then displayed neatly in tabular form in the data viewer.
[Screenshot: data frame shown in the RStudio data viewer]

A data frame at a glance: head(), str() and summary().
[Screenshot: head(), str() and summary() output]
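A quick sketch of the three functions on the built-in mtcars data.

```r
# Sketch: a quick look at a data frame (mtcars used as an example).
head(mtcars, n = 3)   # first rows (default n = 6)
str(mtcars)           # dimensions, column types and first values
summary(mtcars$mpg)   # min, quartiles, mean and max for one column
```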