Category Archives: RStudio

Looking for insights from Fitbit data with R

With a month’s worth of Fitbit data, it’s about time to harvest some insights from this technology-packed wristband.

r-fitbit4


library(lubridate)
library(dplyr)
fitbitdata = read.csv('fitbit.csv')
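# Day of week as 1-7 with Monday = 1, so the numbers match the labels assigned on the next line
fitbitdata <- fitbitdata %>% mutate(dow = wday(Date, week_start = 1))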
fitbitdata$dowlabel <- factor(fitbitdata$dow,levels=1:7,labels=c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"),ordered=TRUE)
fitbitdata$scyl <- as.factor(as.integer(fitbitdata$distance)/max(fitbitdata$distance))
head(fitbitdata)
c1 <- rainbow(7)
c2 <- rainbow(7, alpha=0.4)
c3 <- rainbow(7, v=0.8)
# Transparency comes from the c2 colours; base graphics has no alpha parameter, so that argument is dropped
boxplot(fitbitdata$steps~fitbitdata$dowlabel, col=c2, medcol=c3, whiskcol=c1, staplecol=c3, boxcol=c3, outcol=c3, pch=23, cex=2)

r-fitbit3

The number of steps and the distance traveled per day are collected from the Fitbit phone app, converted into CSV format, and then loaded into R for analysis. With a few lines of R code to draw a box plot for day-of-week analysis, this data set and a third-party statistical package will fill the gap until the Fitbit app offers something more sophisticated.
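
As a rough illustration of the day-of-week analysis described above, here is a minimal sketch in base R only. It uses simulated step counts rather than real Fitbit data, and the column names Date and steps are assumed to match the CSV used earlier.

```r
set.seed(123)
# Simulated stand-in for a month of Fitbit data (not real measurements)
sim <- data.frame(
  Date  = seq(as.Date("2018-06-01"), by = "day", length.out = 30),
  steps = round(rnorm(30, mean = 8000, sd = 2000))
)
# "%u" gives the ISO weekday number: 1 = Monday ... 7 = Sunday
sim$dow <- as.integer(format(sim$Date, "%u"))
sim$dowlabel <- factor(sim$dow, levels = 1:7,
                       labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
# Median steps per weekday, i.e. the centre line of each box in the box plot
median_steps <- tapply(sim$steps, sim$dowlabel, median)
print(median_steps)
boxplot(steps ~ dowlabel, data = sim, col = rainbow(7, alpha = 0.4))
```

Deriving the weekday number with an explicit Monday start keeps the numeric codes and the Mon to Sun labels aligned.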

RStudio Server and Apache

rstudio2

To install R and RStudio Server on Ubuntu:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'
sudo apt-get update
sudo apt-get install r-base
sudo -i R
wget https://download2.rstudio.org/rstudio-server-1.1.453-amd64.deb
sudo gdebi rstudio-server-1.1.453-amd64.deb

To configure Apache 2.4 to proxy RStudio, first install the required modules.

sudo apt-get install libapache2-mod-proxy-html
sudo apt-get install libxml2-dev

Edit the configuration file 000-default.conf to add the following, assuming RStudio runs on the default port 8787 and the preferred path is /rstudio:

<Proxy *>
	Allow from localhost
</Proxy>

RewriteEngine on
RewriteCond %{HTTP:Upgrade} =websocket
RewriteRule /rstudio/(.*) ws://localhost:8787/$1  [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket
RewriteRule /rstudio/(.*) http://localhost:8787/$1 [P,L]
ProxyPass /rstudio/ http://localhost:8787/
ProxyPassReverse /rstudio/ http://localhost:8787/
ProxyRequests Off

Finally, enable the required modules and restart Apache.


sudo a2enmod rewrite && sudo a2enmod proxy && sudo a2enmod proxy_http && sudo a2enmod proxy_wstunnel && sudo service apache2 restart

Wald Test for Logistic Regression in R

Running the Wald test in R is much simpler than using the Nspire calculator. With the same sample data set:
rwald1

The survey package provides a function for the Wald test, “regTermTest”, which calculates the F statistic and p-value.
rwald2
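
For readers who want something to run, here is a minimal sketch of a Wald test on a logistic regression. The built-in mtcars data and the model am ~ wt + hp are stand-ins chosen for illustration, not the sample data set from the post. The statistic is first computed by hand in base R, then regTermTest is called only if the survey package happens to be installed.

```r
# Logistic regression on the built-in mtcars data (a stand-in data set)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Wald statistic for the single term "wt": (estimate / standard error)^2
est <- coef(fit)["wt"]
se  <- sqrt(vcov(fit)["wt", "wt"])
wald_chisq <- unname((est / se)^2)
# Chi-square reference distribution with 1 degree of freedom
p_chisq <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)
print(c(wald = wald_chisq, p = p_chisq))

# The survey package's regTermTest reports an F statistic and p-value for the same term
if (requireNamespace("survey", quietly = TRUE)) {
  print(survey::regTermTest(fit, "wt"))
}
```

The hand-computed value is the square of the z value shown by summary(fit) for the same coefficient.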

Stochastic Gradient Descent in R

Stochastic Gradient Descent (SGD) is an optimization method commonly used in machine learning, especially in neural networks. As the name implies, it aims to minimize a function by following its gradient downhill, estimating that gradient from randomly chosen observations.

In R, the sgd package serves this purpose. As a warm-up for the newly upgraded R and RStudio, it is taken as the target of a test drive.

R-sgd1

Running the documentation example.
R-sgd2

Running the included demo for logistic regression.
R-sgd3
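
To make the idea behind SGD concrete, the following is a hand-written sketch in base R. It does not use the sgd package, and the simulated data, coefficients and learning rate are all assumptions chosen for illustration. It fits a simple linear regression by updating the coefficients one observation at a time.

```r
set.seed(42)
n <- 1000
x <- cbind(1, rnorm(n))      # design matrix: intercept column plus one predictor
beta_true <- c(2, -3)        # hypothetical coefficients used to simulate y
y <- as.vector(x %*% beta_true + rnorm(n, sd = 0.1))

beta <- c(0, 0)              # starting point
lr <- 0.01                   # learning rate (step size)
for (epoch in 1:5) {
  for (i in sample(n)) {     # visit the observations in random order
    resid <- sum(x[i, ] * beta) - y[i]
    # Gradient of the squared error 0.5 * resid^2 for this single observation
    beta <- beta - lr * resid * x[i, ]
  }
}
print(beta)
```

Each update uses the gradient from a single data point, which is what distinguishes the stochastic variant from ordinary gradient descent over the full data set.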

Data input for ANOVA in TI nspire and R

In the TI nspire CX, the Lists & Spreadsheet application provides a convenient Excel-like list interface for data input.

anova-datainput1

The data can also be named by columns and recalled from the Calculator application. Statistical functions can then be applied. Using a sample from the classical TI-89 statistics guide book on determining the interaction between two factors using 2-way ANOVA, the same output is obtained from the TI nspire CX.

anova-datainput2

anova-datainput3

anova-datainput4

In R, data are usually imported from a CSV file using the read.csv() command. Other formats, including SPSS and Excel, are also supported. For more casual data entry where command-line input suffices, raw data are usually stored in a list variable (a vector, in R terms) using the c() command. ANOVA on data entered this way is less straightforward, because the analysis also needs to know which factor level each value stored in the list variable belongs to.
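
The two entry styles mentioned above can be sketched as follows. The column names and values are made up for illustration, and read.csv() is given inline text instead of a file name so the snippet is self-contained.

```r
# read.csv() normally takes a file name; text = is used here to avoid needing a file
csv_text <- "group,value
a,10.5
a,11.2
b,9.8"
dat <- read.csv(text = csv_text)
str(dat)

# Casual entry at the command line with c(): values only, no grouping information
value <- c(10.5, 11.2, 9.8)
length(value)
```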

To accomplish the ANOVA, factor data types are used in conjunction with the list variable. Below is the same TI example completed in R. First, we define the list variable ordered by club (c1 = driver, c2 = five iron) and then by brand (b1-, b2-, b3-, with the last digit as the sample number), i.e.
{c1,b1-1}; {c1,b1-2}; {c1,b1-3}; {c1,b1-4};
{c1,b2-1}; {c1,b2-2};…
{c2,b1-1}; {c2,b1-2};…

Two Factor variables are then created, one for club (with twelve 1’s followed by twelve 2’s), and another for brand (1 to 3 each repeating four times for each sample, and then completed by another identical sequence).
anova-datainput-r1

These two Factor variables essentially represent the position (or index, in array terms) of the nth data value with respect to the factor it belongs to, and can be better visualized in the following table.
anova-datainput-r3
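
One way to build the two Factor variables described above is with rep(). This is a sketch that follows the stated layout (twelve values per club, four samples per brand); the variable names are assumptions and may differ from those in the original screenshots.

```r
# Club: twelve 1's followed by twelve 2's
club <- factor(rep(1:2, each = 12))
# Brand: 1 to 3, each repeated four times, then the whole sequence repeated for the second club
brand <- factor(rep(rep(1:3, each = 4), times = 2))

# One row per data value, showing which club and brand the nth value belongs to
layout <- data.frame(index = 1:24, club = club, brand = brand)
print(layout)

# Every club/brand combination should contain four samples
print(table(club, brand))
```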

Finally, the 2-way ANOVA can be performed using the following commands.
anova-datainput-r2

Interaction plot in R.
anova-r-interactionplot1 anova-r-interactionplot2
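
The whole workflow can be sketched end to end as below. The distances are simulated, not the guide book's data, and the variable names are assumptions, so the numbers will not match the TI output shown earlier.

```r
set.seed(7)
club  <- factor(rep(1:2, each = 12), labels = c("driver", "five iron"))
brand <- factor(rep(rep(1:3, each = 4), times = 2))
# Simulated distances (hypothetical values, for illustration only)
distance <- rnorm(24, mean = ifelse(club == "driver", 230, 180), sd = 8)

# club * brand expands to both main effects plus their interaction
fit <- aov(distance ~ club * brand)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Mean distance per brand, with one trace per club
interaction.plot(brand, club, distance)
```

Roughly parallel traces in the interaction plot suggest little interaction between the two factors, while crossing traces suggest the opposite.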

CART in R

Decision tree classification can be done with ease in R with the help of the rpart package. Additionally, visualization, the natural outcome of any decision tree, can be conveniently achieved with the rpart.plot package.
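
A minimal sketch of a classification tree with rpart follows. The built-in iris data is a stand-in chosen for illustration, and the plot falls back to base graphics if rpart.plot is not installed.

```r
library(rpart)

# Classification tree on the built-in iris data (a stand-in data set)
tree <- rpart(Species ~ ., data = iris, method = "class")
print(tree)

# Accuracy on the training data, which is optimistic compared with held-out data
pred <- predict(tree, iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)

# Draw the tree: rpart.plot if available, otherwise the base plot method from rpart
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::rpart.plot(tree)
} else {
  plot(tree, margin = 0.1)
  text(tree, use.n = TRUE)
}
```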

Useful R commands and techniques

A collection of some frequently used R commands.

Creating a column for categorical group assignment, for example assigning A/B/C according to some existing values.
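
One common way to do this is cut(). The sketch below uses a hypothetical score column and made-up cut points.

```r
df <- data.frame(score = c(35, 62, 88, 50, 95))   # hypothetical values
# Intervals are right-closed by default: (-Inf,50], (50,80], (80,Inf]
df$group <- cut(df$score, breaks = c(-Inf, 50, 80, Inf), labels = c("C", "B", "A"))
print(df)
```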

Similarly, creating a new column of values derived from an existing column.
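
A short sketch of a derived column, using hypothetical distances in kilometres:

```r
df <- data.frame(distance_km = c(3.2, 5.0, 7.4))   # hypothetical values
# Direct assignment of a derived column
df$distance_mi <- df$distance_km * 0.621371
# transform() does the same and can add several columns in one call
df <- transform(df, long_day = distance_km > 4)
print(df)
```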

Stratified sampling using the sampling package.
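
The sketch below draws five rows per stratum from the built-in iris data, a stand-in data set. It first does so in base R so that it runs anywhere, then repeats the design with strata() and getdata() from the sampling package if that package is installed. The sample size of five per stratum is an arbitrary choice.

```r
set.seed(1)
# Base R: sample 5 row indices without replacement within each Species
idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species), sample, size = 5))
strat_base <- iris[idx, ]
print(table(strat_base$Species))

# The same design with the sampling package, if it is available
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(iris, stratanames = "Species",
                        size = c(5, 5, 5), method = "srswor")
  strat_pkg <- sampling::getdata(iris, s)
  print(table(strat_pkg$Species))
}
```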

In RStudio, the View command can be invoked by clicking a variable under the Environment tab; the data frame is then listed neatly in tabular format in the data frame window.

Data frame at a glance – head, str, summary.
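
A quick sketch of these commands on the built-in mtcars data frame (a stand-in data set); View() is only called in an interactive session such as RStudio.

```r
first_rows <- head(mtcars, 3)   # first three rows
print(first_rows)
str(mtcars)                     # column types and a preview of the values
print(summary(mtcars$mpg))      # five-number summary plus the mean
if (interactive()) View(mtcars) # opens the tabular viewer
```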
