Monthly Archives: July 2016

Performance gain by GPU in Theano

A very basic timing comparison of a Theano setup, training a multilayer perceptron on the MNIST dataset. Using the GPU gives a performance gain of almost 18%, and a t-test shows that the improvement is statistically significant.
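As a rough illustration of how such a timing comparison could be evaluated, here is a minimal sketch of Welch's t-test in plain Python. The function names `relative_gain` and `welch_t` are hypothetical, and the sketch assumes the timings are collected as two lists of per-run durations in seconds; it is not the procedure used in the original post.

```python
import math
import statistics


def relative_gain(cpu_times, gpu_times):
    """Fractional reduction in mean run time when moving from CPU to GPU."""
    cpu_mean = statistics.mean(cpu_times)
    gpu_mean = statistics.mean(gpu_times)
    return (cpu_mean - gpu_mean) / cpu_mean


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Welch's variant does not assume equal variances. Each sample needs
    at least two measurements with some spread.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance of each group divided by its size
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df
```

The returned t statistic and degrees of freedom would then be compared against a t-distribution table (or passed to a statistics package) to obtain a p-value.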



Exploring Theano with Keras

Theano needs no introduction in the field of deep learning. It is based on Python and supports CUDA. Keras is a library that wraps the complexity of Theano to provide a high-level abstraction for developing deep learning solutions.

Installing Theano and Keras is easy, and there are tons of resources available online. However, my primary CUDA platform is Windows, so most standard guides, which are based on Linux, required some adaptation. The most notable changes are setting the PATH variable properly and using the Visual Studio command prompt.

The basic installation steps are setting up CUDA, then a scientific Python environment, and then Theano and Keras. cuDNN is optional and requires a GPU with Compute Capability 3.0 or higher; unfortunately my GPU is a bit old and does not meet this requirement.


Some programs on the Windows platform ran into errors that turned out to be library-related issues. For example, this one failed to compile in Spyder, and the problem can be resolved by using the Visual Studio Cross Tools Command Prompt.

The Nvidia profiler checking the performance of the GPU while running the Keras example of the MNIST digits with an MLP.

Training neural network using Nelder-Mead algorithm on TI Nspire

In this installment the Nelder-Mead method is used to train a simple neural network for the XOR problem. The network consists of 2 inputs, 1 output, and 2 hidden layers, and is fully connected. In mainstream practice, back propagation and evolutionary algorithms are much more popular for training neural networks on real-world problems. Nelder-Mead is used here just out of curiosity, to see how this general optimization routine performs in a neural network setting on the TI Nspire.

The sigmoid function is declared as a TI Nspire function.

For the XOR problem, the inputs are defined as two lists, and the expected output in another.

The activation functions for each neuron are declared.
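The definitions described so far can be sketched in Python as follows. This is an illustration only, not a transcription of the TI Nspire code: it assumes a single hidden layer with two neurons (nine parameters in total), and the names `IN1`, `IN2`, `EXPECTED`, `hidden1`, `hidden2`, `output` and the hand-picked weight list are all hypothetical.

```python
import math

# XOR truth table: the inputs as two lists, the expected output in another
IN1 = [0, 0, 1, 1]
IN2 = [0, 1, 0, 1]
EXPECTED = [0, 1, 1, 0]


def sigmoid(z):
    """Logistic function, written so that large |z| cannot overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# w holds 9 parameters: two weights and a bias for each of the 3 neurons
def hidden1(w, a, b):
    return sigmoid(w[0] * a + w[1] * b + w[2])


def hidden2(w, a, b):
    return sigmoid(w[3] * a + w[4] * b + w[5])


def output(w, a, b):
    return sigmoid(w[6] * hidden1(w, a, b) + w[7] * hidden2(w, a, b) + w[8])


# Hand-picked (not trained) weights: hidden1 acts as OR, hidden2 as NAND,
# and the output neuron as AND, which together give XOR.
HAND_PICKED = [20, 20, -10, -20, -20, 30, 20, 20, -30]

for a, b, target in zip(IN1, IN2, EXPECTED):
    print(a, b, target, output(HAND_PICKED, a, b))
```

The hand-picked weights only show that a network of this shape can represent XOR; finding such weights automatically is the job of the training step.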

To train the network, the sum of squared errors is used as the objective function that the Nelder-Mead algorithm minimizes. Random numbers are used as the initial parameters.

Finally, the resulting weights and biases are obtained by running the Nelder-Mead program.
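For readers without a TI Nspire, the training step might look like the Python sketch below. It is a compact textbook Nelder-Mead (reflection, expansion, contraction, shrink) with assumed coefficients 1, 2, 0.5 and 0.5, applied to the sum of squared errors of an assumed 2-2-1 network. The names `sse`, `predict` and `nelder_mead`, the initial simplex step and the iteration limit are hypothetical choices, not taken from the original program.

```python
import math
import random

XOR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # (input1, input2, expected)


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(p, a, b):
    """Assumed 2-2-1 network; p holds 6 weights and 3 biases."""
    h1 = sigmoid(p[0] * a + p[1] * b + p[2])
    h2 = sigmoid(p[3] * a + p[4] * b + p[5])
    return sigmoid(p[6] * h1 + p[7] * h2 + p[8])


def sse(p):
    """Sum of squared errors over the four XOR patterns."""
    return sum((predict(p, a, b) - t) ** 2 for a, b, t in XOR)


def nelder_mead(f, x0, step=1.0, max_iter=2000, tol=1e-10):
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each axis
    simplex = [list(x0)] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_ref = f(reflected)
        if f_ref < values[0]:
            expanded = along(2.0)
            f_exp = f(expanded)
            if f_exp < f_ref:
                simplex[-1], values[-1] = expanded, f_exp
            else:
                simplex[-1], values[-1] = reflected, f_ref
        elif f_ref < values[-2]:
            simplex[-1], values[-1] = reflected, f_ref
        else:
            # Outside contraction if the reflection beat the worst, else inside
            contracted = along(0.5 if f_ref < values[-1] else -0.5)
            f_con = f(contracted)
            if f_con < min(f_ref, values[-1]):
                simplex[-1], values[-1] = contracted, f_con
            else:
                # Shrink every vertex halfway toward the best one
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


random.seed(0)  # fixed seed only so that the sketch is repeatable
start = [random.uniform(-1.0, 1.0) for _ in range(9)]
weights, error = nelder_mead(sse, start)
print("initial SSE:", sse(start), "final SSE:", error)
```

Nelder-Mead never makes the best vertex worse, so the final error is at most the initial one, but it can stall in a poor local minimum; restarting from a different random starting point is a common workaround.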

The comparison graph of the performance of the Nelder-Mead trained XOR neural network against expected values.


Compiling CUDA in command line

To get better acquainted with Unix-style CUDA development, the following command lines and related environment settings were found to work on the current Visual Studio based platform.

cd /d "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5\1_Utilities\deviceQuery"

set path=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin;%path%

nvcc -arch=sm_21 -I..\..\common\inc deviceQuery.cpp -o deviceQuery


A command-line environment on the Microsoft Visual Studio platform that has CUDA properly set up (like the one used for approximating the value of pi here) relies on native commands like msbuild, as below.

msbuild BlackScholes_vs2010.sln /t:rebuild


Adding Google authenticator support to RFID password keeper

The latest upgrade to my DIY gadget, which keeps passwords and logs in to Windows with a swipe of an RFID card, is support for Google Authenticator.

In brief, the gadget is an Atmel based micro-controller connected to an RFID card reader, packaged in the form-factor of a name card holder. With a USB connection to any Windows based PC, all I need to do to log in is to wave my card.

Although it supports static passwords only, this gadget has served me well over the years. More recently I have found myself relying more on dynamic authentication, like the one-time passwords provided by Yubikey and Google Authenticator.

These are proven technologies for multi-factor authentication, and I trust them with many of my Amazon AWS based Linux hosts.

Even though Google Authenticator is already very user friendly via its Android app, to use it I have to pull the phone out of my pocket, start the app, read the six-digit code, and then type it in as quickly as possible.

To make life easier, I recently upgraded this RFID gadget to support one-time passwords for Google Authenticator. In terms of programming the upgrade is easy, as the algorithm is open (TOTP, RFC 6238) and there are a handful of libraries available. The obvious hurdle for implementing TOTP on this gadget is the lack of a real-time clock (RTC), which the micro-controller needs to compute the authentication code. Although most RTC modules are compact these days, fitting one more PCB into this already cramped gadget is not easy.
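To show why the current time is essential, here is a minimal Python sketch of the TOTP computation from RFC 6238, using the settings Google Authenticator uses by default (HMAC-SHA1, 30-second time step, 6 digits). It is not the gadget's firmware; the function names `totp` and `decode_secret` are hypothetical.

```python
import base64
import hashlib
import hmac
import struct


def decode_secret(secret_b32):
    """Decode a Base32 shared secret as shown by most authenticator setups."""
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # restore padding if it was stripped
    return base64.b32decode(cleaned)


def totp(secret, unix_time, step=30, digits=6):
    """Return the TOTP code for a raw secret (bytes) at a Unix timestamp."""
    # The moving factor is the number of whole time steps since the epoch
    counter = int(unix_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte selects
    # a 4-byte window, whose top bit is masked off
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Test vector from RFC 6238, Appendix B (SHA-1, 8 digits, time = 59 s)
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```

Because the counter is derived from the Unix time, a device whose clock is off by more than a step or so produces codes the server will reject, which is why the gadget needs a reliable time source.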

So for now I will settle for an alternative: since the micro-controller supports serial communication, the PC host itself can easily provide the time source with the simple PowerShell script below:

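# Current UTC time as a Unix timestamp (seconds since the epoch)
$utctime=[int][double]::Parse($(Get-Date -date (Get-Date).ToUniversalTime() -uformat %s))
# Serial port to the gadget: COM7, 9600 baud, no parity, 8 data bits, 1 stop bit
$port= new-Object System.IO.Ports.SerialPort COM7,9600,None,8,1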

Just run this script to feed the timestamp, and then swipe the RFID card as usual at the SSH prompt asking for Google authenticator verification code. Happy with this upgrade.
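The same idea can be sketched in Python for hosts without PowerShell. The wire format here is purely an assumption (the timestamp as ASCII decimal digits followed by a newline); the actual format expected by the gadget's firmware is not described in this post. The names `time_sync_message` and `send_time` are hypothetical.

```python
import io
import time


def time_sync_message(unix_time=None):
    """Build the message: UTC Unix timestamp as ASCII digits plus a newline.

    The message layout is an assumption made for this sketch.
    """
    if unix_time is None:
        unix_time = time.time()  # seconds since the epoch, independent of time zone
    return f"{int(unix_time)}\n".encode("ascii")


def send_time(stream, unix_time=None):
    """Write the message to any object that has a write(bytes) method."""
    message = time_sync_message(unix_time)
    stream.write(message)
    return message


# Demonstration with an in-memory stream instead of a real serial port
demo = io.BytesIO()
send_time(demo)
print(demo.getvalue())
```

With the third-party pyserial package installed, the stream could be a port object such as `serial.Serial("COM7", 9600)`, which mirrors the port settings used in the PowerShell script.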


Traffic accident images from

This traffic accident involving five vehicles happened last month and was caught on the CCTV camera. The still-image video below is constructed from data available to the public at, the same data source for an analysis of camera images using Nvidia GPU deep learning on the Amazon cloud.