Category Archives: RStudio

Looking for insights from Fitbit data with R

With a month’s worth of Fitbit data, it’s about time to harvest some insights from this technology-packed wristband.

r-fitbit4


library(lubridate)
library(dplyr)
fitbitdata = read.csv('fitbit.csv')
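# week_start = 1 numbers the days from Monday (wday() defaults to Sunday = 1), matching the labels below
fitbitdata <- fitbitdata %>% mutate(dow = wday(Date, week_start = 1))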
fitbitdata$dowlabel <- factor(fitbitdata$dow,levels=1:7,labels=c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"),ordered=TRUE)
fitbitdata$scyl <- as.factor(as.integer(fitbitdata$distance)/max(fitbitdata$distance))
head(fitbitdata)
c1 <- rainbow(7)
c2 <- rainbow(7, alpha=0.4)
c3 <- rainbow(7, v=0.8)
boxplot(fitbitdata$steps~fitbitdata$dowlabel, col=c2, medcol=c3, whiskcol=c1, staplecol=c3, boxcol=c3, outcol=c3, pch=23, cex=2, alpha=fitbitdata$scyl)

r-fitbit3

The number of steps and the distance traveled per day are collected from the Fitbit phone app, converted into CSV format, and then loaded into R for analysis. With a few lines of R code to draw a box plot for day-of-week analysis, this data set and a third-party statistical package will fill the gap until the Fitbit app offers something more sophisticated.
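The box plot can be complemented with a numeric summary per weekday. Below is a minimal sketch on made-up data, assuming the same steps and dowlabel columns as in the code above; the data frame name fitbit_demo and the simulated values are placeholders, not real Fitbit readings.

```r
# Simulated stand-in for the Fitbit export (hypothetical values, 28 days).
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
fitbit_demo <- data.frame(
  dowlabel = factor(rep(days, times = 4), levels = days, ordered = TRUE),
  steps    = round(rnorm(28, mean = 8000, sd = 2000))
)

# Median daily steps for each day of the week.
median_steps <- tapply(fitbit_demo$steps, fitbit_demo$dowlabel, median)
print(median_steps)
```

The medians line up with the thick bars inside each box of the box plot, so the table is a quick way to read off exact values.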


RStudio Server and Apache

rstudio2

To install R and RStudio Server on Ubuntu:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'
sudo apt-get update
sudo apt-get install r-base
sudo -i R
sudo apt-get install gdebi-core
wget https://download2.rstudio.org/rstudio-server-1.1.453-amd64.deb
sudo gdebi rstudio-server-1.1.453-amd64.deb

To configure Apache 2.4 as a proxy for RStudio, first install the required modules.

sudo apt-get install libapache2-mod-proxy-html
sudo apt-get install libxml2-dev

Edit the configuration file 000-default.conf and add the following, assuming RStudio runs on the default port 8787 and the preferred path is /rstudio:

<Proxy *>
	Allow from localhost
</Proxy>

RewriteEngine on
RewriteCond %{HTTP:Upgrade} =websocket
RewriteRule /rstudio/(.*) ws://localhost:8787/$1  [P,L]
RewriteCond %{HTTP:Upgrade} !=websocket
RewriteRule /rstudio/(.*) http://localhost:8787/$1 [P,L]
ProxyPass /rstudio/ http://localhost:8787/
ProxyPassReverse /rstudio/ http://localhost:8787/
ProxyRequests Off

Finally, enable the required modules and restart Apache.


sudo a2enmod proxy && sudo a2enmod proxy_http && sudo a2enmod proxy_wstunnel && sudo a2enmod rewrite && sudo service apache2 restart

Stochastic Gradient Descent in R

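Stochastic Gradient Descent (SGD) is an optimization method commonly used in machine learning, especially in neural networks. As the name implies, it aims to minimize a function: it repeatedly steps against a gradient estimated from a randomly chosen observation rather than from the full data set.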
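To make the idea concrete, here is a hand-rolled sketch in base R that fits a straight line by SGD. It is an illustration only and not how any particular package implements it; the synthetic data, the fixed learning rate, and the number of epochs are all assumptions picked for this toy problem.

```r
# Hand-rolled SGD for least-squares linear regression (illustrative only).
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.1)   # synthetic data: intercept 2, slope 3

theta <- c(0, 0)   # parameters: (intercept, slope)
rate  <- 0.05      # fixed learning rate, chosen by hand for this toy problem

for (epoch in 1:20) {
  for (i in sample(n)) {                  # visit observations in random order
    pred  <- theta[1] + theta[2] * x[i]
    grad  <- (pred - y[i]) * c(1, x[i])   # gradient of 0.5 * (pred - y)^2
    theta <- theta - rate * grad          # step against the gradient
  }
}
print(theta)
```

The estimates should land close to the intercept and slope used to generate the data, with a little jitter left over because the step size is never reduced.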

In R, there is an sgd package for this purpose. As a warm-up for the newly upgraded R and RStudio, it is taken for a test drive here.

R-sgd1

Running the documentation example.
R-sgd2

Running the included demo for logistic regression.
R-sgd3

Data input for ANOVA in TI nspire and R

On the TI nspire CX, the Lists & Spreadsheet application provides a convenient Excel-like list interface for data input.

anova-datainput1

The data can also be named by columns and recalled from the Calculator application. Statistical functions can then be applied. Using a sample from the classical TI-89 statistics guide book on determining the interaction between two factors using 2-way ANOVA, the same output is obtained from the TI nspire CX.

anova-datainput2

anova-datainput3

anova-datainput4

In R, data are usually imported from a CSV file using the read.csv() command. Other formats, including SPSS and Excel, are also supported. For more casual data entry where command-line input suffices, raw data are usually stored in a vector using the c() command. Entering data for ANOVA this way is less straightforward, because the analysis needs to know which factor level each value in the vector belongs to.

To accomplish the ANOVA, factor data types are used in conjunction with the data vector. Below is the same TI example completed in R. First, we define the data vector ordered by club (c1 = driver, c2 = five iron) and then by brand (b1-, b2-, b3-, with the last digit as the sample number), i.e.
{c1,b1-1}; {c1,b1-2}; {c1,b1-3}; {c1,b1-4};
{c1,b2-1}; {c1,b2-2};…
{c2,b1-1}; {c2,b1-2};…

Two Factor variables are then created, one for club (with twelve 1’s followed by twelve 2’s), and another for brand (1 to 3 each repeating four times for each sample, and then completed by another identical sequence).
anova-datainput-r1
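The two factor variables described above can be built with rep(). This is a small sketch of one way to do it; the variable names club and brand are assumptions, since the original commands are only available as a screenshot.

```r
# Club: twelve 1's (driver) followed by twelve 2's (five iron).
club <- factor(rep(1:2, each = 12))

# Brand: 1, 2, 3 each repeated four times, with that block repeated for both clubs.
brand <- factor(rep(rep(1:3, each = 4), times = 2))

# Cross-tabulate to confirm there are four samples in every club/brand cell.
cell_counts <- table(club, brand)
print(cell_counts)
```

The cross-tabulation is a cheap sanity check that the factor vectors line up with the order in which the data values were entered.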

These two Factor variables essentially give the position (or index, in array terms) of the nth data value with respect to the factor it belongs to, and can be better visualized in the following table.
anova-datainput-r3

Finally, the 2-way ANOVA can be performed using the following commands.
anova-datainput-r2
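For readers without the screenshot, a two-way ANOVA with interaction can be sketched with aov() as follows. The response values here are random placeholders and not the data from the TI guide book, and the names distance, club, and brand are assumptions.

```r
set.seed(7)
club  <- factor(rep(1:2, each = 12))
brand <- factor(rep(rep(1:3, each = 4), times = 2))

# Placeholder response values; NOT the data from the TI guide book.
distance <- rnorm(24, mean = 200, sd = 10)

# club * brand expands to club + brand + club:brand (the interaction term).
fit <- aov(distance ~ club * brand)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

With 2 clubs, 3 brands, and 24 observations, the table has 1, 2, and 2 degrees of freedom for club, brand, and the interaction, leaving 18 for the residuals.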

Interaction plot in R.
anova-r-interactionplot1
anova-r-interactionplot2
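A plot of this kind can be drawn with interaction.plot() from the stats package. The sketch below again uses random placeholder values rather than the guide book data, and the axis labels are assumptions.

```r
set.seed(7)
club  <- factor(rep(1:2, each = 12), labels = c("driver", "five iron"))
brand <- factor(rep(rep(1:3, each = 4), times = 2))
distance <- rnorm(24, mean = 200, sd = 10)   # placeholder values

# Cell means are what the plot draws: one line per club, one point per brand.
cell_means <- tapply(distance, list(brand, club), mean)
print(cell_means)

interaction.plot(x.factor = brand, trace.factor = club, response = distance,
                 xlab = "Brand", ylab = "Mean distance", trace.label = "Club")
```

Roughly parallel lines suggest little interaction between the two factors, while lines that cross or diverge suggest the effect of brand depends on the club.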

Useful R commands and techniques

A collection of some frequently used R commands.

Creating a column for categorical group assignment. The example below assigns A/B/C according to existing values.
R-Technique1-grouping
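One way to do this kind of grouping in base R is cut(). The data frame, the break points, and the labels below are all hypothetical and not taken from the screenshot.

```r
scores <- data.frame(value = c(12, 47, 63, 88, 35, 71))   # hypothetical values

# cut() bins a numeric column; intervals are closed on the right by default,
# so the groups are (-Inf, 40], (40, 70], and (70, Inf).
scores$group <- cut(scores$value, breaks = c(-Inf, 40, 70, Inf),
                    labels = c("C", "B", "A"))
print(scores)
```

The result is a factor column, so it can go straight into table(), boxplot(), or a model formula.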

Similarly, a new column can be derived from the values of an existing column.
R-Technique2-new-dependent-column
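A derived column can be added either by direct assignment or with transform(). The data frame and column names in this sketch are made up for illustration.

```r
trips <- data.frame(distance_km = c(5, 12.5, 21.1),
                    minutes     = c(30, 70, 125))   # hypothetical values

# Direct assignment: average speed in km/h.
trips$speed_kmh <- trips$distance_km / (trips$minutes / 60)

# transform() lets the expression refer to columns without the trips$ prefix.
trips <- transform(trips, pace_min_per_km = minutes / distance_km)
print(trips)
```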

Sampling using the sampling package with strata.
R-Technique5a-sampling
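As a dependency-free alternative to the sampling package, stratified sampling can also be sketched in base R. This is a swapped-in technique and not the package's own strata() function; the use of the built-in iris data and the sample size of 5 per stratum are assumptions.

```r
set.seed(123)

# Split row numbers by stratum, then draw 5 rows without replacement from each.
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)
picked <- unlist(lapply(rows_by_species, function(rows) sample(rows, size = 5)))

stratified <- iris[picked, ]
print(table(stratified$Species))
```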

In RStudio, the View command can be invoked by clicking a variable under the Environment tab; the data frame is then listed neatly in tabular format in the data frame window.
R-Technique4-view

Data frame at a glance – head, str, summary.
R-Technique3-head-str-summary
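The three commands can be tried on any data frame. The sketch below uses the built-in mtcars data set as a stand-in, which is an assumption and not the data shown in the screenshot.

```r
first_rows <- head(mtcars, 3)         # first few rows
print(first_rows)

str(mtcars)                           # column types and a preview of the values

mpg_summary <- summary(mtcars$mpg)    # min, quartiles, mean, and max
print(mpg_summary)
```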