# Category Archives: machine learning

# Deep Learning with the Movidius Neural Compute Stick

The Movidius Neural Compute Stick is supported on the Raspberry Pi 3 Model B, which is close to a dream deployment for low-power applications. In their original configuration, the combined dimensions of the Movidius and the Raspberry Pi may not fit some commonly available IP56 waterproof enclosures, but with a few ultra-short extension cables and adapters, packing them inside, even together with a lithium battery pack, should be easy.

Deep learning is a breakthrough in artificial intelligence. With its roots in neural networks, it has been made practical by advances in modern computing hardware and sophisticated integrated-circuit technology.

Deep learning is a branch of the wider field of machine learning. Leading development frameworks include TensorFlow and Caffe. Pattern recognition is a practical application of machine learning in which photos or videos are analysed by machine to produce usable output, as if a human had done the analysis. The GPU has been a favourite hardware choice: its specialized architecture delivers supreme processing power not only for graphics but also, as the neural network community has found, for training and inference. A previous installment covered deploying an Amazon Web Services GPU instance to analyse real-time traffic camera images using Caffe.

To bring this kind of machine learning power to IoT, Intel shrank and packaged a specialized Vision Processing Unit into the form factor of a USB thumb drive in the Movidius™ Neural Compute Stick.

It sports an ultra-low-power Vision Processing Unit (VPU) inside an aluminium casing and weighs only 30 g (without the cap). Support for the Raspberry Pi 3 Model B makes it a very attractive add-on for development projects involving AI applications on that platform.

In the form factor of a USB thumb drive, the specialized VPU geared for machine learning lets the Movidius act as an AI accelerator for the host computer.

To put the Neural Compute Stick into action, the SDK that Movidius provides via git is required. Although the SDK targets Ubuntu, Windows users with VirtualBox can easily install it in an Ubuntu 16.04 VM.
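As a rough sketch of that setup, installing the SDK (assuming the original NCSDK v1 repository layout, which has since been archived) looks something like this; the examples step needs the stick plugged in:

```shell
# Clone the Movidius NCSDK and install it (Ubuntu 16.04)
git clone https://github.com/movidius/ncsdk.git
cd ncsdk
make install     # installs the SDK toolchain and Python API
make examples    # builds and runs the bundled examples (stick required)
```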

While the SDK comes with many examples and the setup is a walk in the park, running those examples is not so straightforward, especially in a VM. There are points to note, from making the stick visible to the VM (including the USB 3 and filter settings in VirtualBox) to the actual execution of the provided sample scripts. Some examples require two sticks to run. Developers should be comfortable with Python, Unix make and git commands, and installing packages in Ubuntu.
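The VirtualBox side of this can be scripted; a sketch using `VBoxManage`, assuming a VM named "ubuntu16" (the stick identifies itself under Intel Movidius's USB vendor ID 03e7; a second filter may be needed because the device re-enumerates after boot-up):

```shell
# Enable the USB 3 (xHCI) controller; requires the VirtualBox Extension Pack
VBoxManage modifyvm "ubuntu16" --usbxhci on

# Add a USB filter so the Movidius stick is captured by the VM
VBoxManage usbfilter add 0 --target "ubuntu16" --name "Movidius" --vendorid 03e7
```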

The results from the examples in the SDK alone are quite convincing, considering the form factor of the stick and its electrical power consumption. The stick literally “kept its cool” throughout the test drive, unlike the FPGA stick I occasionally use for bitcoin mining, which turns really hot.

# Logistic Regression – from Nspire to R to Theano

Logistic regression is a very powerful tool for classification and prediction. It works very well on linearly separable problems. This installment recaps its practical implementation, from the traditional maximum-likelihood perspective to the more machine-learning-flavoured neural network approach, and from handheld calculator to GPU cores.

The heart of the logistic regression model is the logistic function. It takes any real value and returns a value in the range 0 to 1, which is ideal for a binary classifier. The following is a graph of this function.
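In code the logistic function is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(logistic(0.0))                     # midpoint: 0.5
print(logistic(-10.0), logistic(10.0))   # large magnitudes saturate towards 0 and 1
```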

## TI Nspire

In the TI Nspire calculator, logistic regression is provided as a built-in function but is limited to a single variable. For multi-variable problems, custom programming is required to apply optimization techniques that determine the coefficients of the regression model. One such approach, shown below, uses the Nelder-Mead method on the TI Nspire.

Suppose a data set from university admission records has four attributes (independent variables: *SAT score, GPA, Interview score, Aptitude score*) and one outcome (“*Admission*”) as the dependent variable.

Through the use of a Nelder-Mead program, the logistic function is first defined as *l*. It takes all the regression coefficients (*a1, a2, a3, a4, b*), the dependent variable (*s*), and the independent variables (*x1, x2, x3, x4*), and simply returns the logistic probability. Next, the function to optimize in the Nelder-Mead program is defined as *nmfunc*. This is the likelihood function built on the logistic function; since Nelder-Mead is a minimization algorithm, its negative is taken. On completion of the program run, the regression coefficients in the result matrix are available for prediction, as in the following case of a sample data point with [SAT=1500, GPA=3, Interview=8, Aptitude=60].
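The same idea can be sketched in Python with SciPy's Nelder-Mead implementation. The synthetic data below is an assumed stand-in for the admission records (the true coefficients and sample size are illustrative), and the parameter layout [a1, a2, a3, a4, b] mirrors the Nspire program:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Synthetic stand-in for the admission data: 4 predictors, binary outcome
n = 200
X = rng.normal(size=(n, 4))
true_w = np.array([1.5, -1.0, 0.5, 2.0])
y = (1 / (1 + np.exp(-(X @ true_w + 0.3))) > rng.random(n)).astype(float)

def neg_log_likelihood(params):
    # params = [a1, a2, a3, a4, b]; negative log-likelihood, since
    # Nelder-Mead minimizes
    w, b = params[:4], params[4]
    p = 1 / (1 + np.exp(-(X @ w + b)))
    eps = 1e-12                       # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

res = minimize(neg_log_likelihood, np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000, "fatol": 1e-8})
print(res.x)   # fitted [a1, a2, a3, a4, b]
```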

## R

In R, a sophisticated statistical package, the calculation is much simpler. For the sample case above, it takes just a few lines of commands to invoke the built-in logistic model.

## Theano

Apart from the traditional methods, modern advances in computing paradigms have made it possible to couple neural networks with specialized hardware, for example the GPU, to solve these problems far more efficiently, especially on huge volumes of data. The Python library Theano supports and enriches these calculations through optimization and symbolic expression evaluation. It also features compiler capabilities for CUDA and integrates a Computer Algebra System into Python.

One of the examples that comes with the Theano documentation uses logistic regression to showcase various Theano features. It first initializes a random set of data as the sample input and outcome using *numpy.random*. The regression model is then created by defining the expressions required for the logistic model, including the logistic function and the likelihood function. Lastly, using the *theano.function* method, the symbolic expression graph coded for the regression model is compiled into callable objects for training the neural network and for subsequent prediction.

A nice feature of Theano is the pretty printing of the expression model in a tree-like text format. This is a feel-like-home reminiscence of my days reading SQL query plans when tuning database queries.

# CUDA, Theano, and Antivirus

Most ubiquitous antivirus products monitor new processes from executables in real time and will attempt to terminate their execution if they are deemed a potential threat. Some of these products simply do a signature match, while others perform more sophisticated heuristic or intelligent scanning.

However, antivirus sometimes turns up a false positive. This is rare, but many software developers will have experienced the slow-down caused merely by the suspending and scanning of a new build from their favorite IDE. Recently my antivirus product let me know it has been very edgy about some Theano Python programs, raising this scan alert.

This rang a bell. I remember the same thing happening when working with CUDA in Visual Studio, and a test with some sample CUDA programs quickly confirmed my memory.

The solution to get rid of the scan is quite simple. Most antivirus products offer an option to whitelist certain programs from being scanned; on my Avast installation, simply adding the full file path of the nvcc output does the trick. Note that doing so poses a certain security risk, as it essentially neutralizes the protection for that path. To compensate, the whitelist path should be set as precisely as possible, for example by including a wildcard filename rather than excluding a whole directory.
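As an illustration only (the exact path depends on the Theano version and user profile, and the compile-directory layout here is an assumption), such an exclusion entry might look like:

```
C:\Users\<user>\AppData\Local\Theano\compiledir_*\*.pyd
```

The wildcard filename keeps the exclusion narrow, so only the compiled modules themselves bypass scanning rather than everything in the directory tree.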

One last interesting point: one of the great benefits of Theano is its high-level abstraction over the CUDA layer, under which it compiles parts of the model to GPU executables using nvcc. Comparing profiling results obtained before and after the antivirus whitelisting shows an improvement not only in overall speed but also in compile time. Here is the before-whitelisting profiling result:

```
Function profiling
==================
  Message: train
  Time in 10000 calls to Function.__call__: 1.122200e+01s
  Time in Function.fn.__call__: 1.089400e+01s (97.077%)
  Time in thunks: 1.069661e+01s (95.318%)
  Total compile time: 5.477000e+01s
    Number of Apply nodes: 17
    Theano Optimizer time: 3.589900e+01s
    Theano validate time: 4.000425e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 1.772000e+00s
    Import time 1.741000e+00s
```

and the whitelisted, optimized results:

```
Function profiling
==================
  Message: train
  Time in 10000 calls to Function.__call__: 9.727999e+00s
  Time in Function.fn.__call__: 9.469999e+00s (97.348%)
  Time in thunks: 9.293550e+00s (95.534%)
  Total compile time: 2.827000e+00s
    Number of Apply nodes: 17
    Theano Optimizer time: 1.935000e+00s
    Theano validate time: 1.999855e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 3.799987e-02s
    Import time 2.199984e-02s
```

If Avast only scanned the executable before it started executing, there would be no improvement at all in compile time. From the profiling breakdown it seems more likely that Avast scans the GPU executable when Theano creates it on the file system. Turning off Avast’s “File Shield”, with no whitelisting, triggered no scan alert, which confirmed the suspicion.

# Experimenting with convergence time in neural network models

After setting up Keras and Theano and running some basic benchmarks on the Nvidia GPU, the next step in getting a taste of neural networks through these deep learning frameworks is to compare them against a model that solves the same problem (an XOR classification) on a modern calculator, the TI Nspire, using the Nelder-Mead algorithm to converge the neural network weights.

A sample SGD configuration in Keras/Theano with 30000 iterations converged in around 84 seconds, while the TI Nspire completed with comparable results in 19 seconds. This is not a fair comparison, of course, as there are many parameters that can be tuned in the model.
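The calculator-side approach can be sketched in Python too: fit a tiny 2-2-1 network to XOR by handing its mean-squared error to SciPy's Nelder-Mead minimizer. The network layout and parameter packing here are assumptions for illustration, not the Nspire program itself:

```python
import numpy as np
from scipy.optimize import minimize

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(params):
    # 2-2-1 network: 9 parameters packed as [W1 (4), b1 (2), W2 (2), b2 (1)]
    W1 = params[:4].reshape(2, 2)
    b1 = params[4:6]
    W2 = params[6:8]
    b2 = params[8]
    h = sigmoid(X @ W1 + b1)
    return sigmoid(h @ W2 + b2)

def loss(params):
    return np.mean((forward(params) - y) ** 2)

rng = np.random.default_rng(0)
x0 = rng.uniform(-2, 2, 9)
res = minimize(loss, x0, method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000})
print(res.fun)   # final mean-squared error over the four XOR cases
```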

# Performance gain by GPU in Theano

A very basic timing comparison was made of a Theano setup using the MNIST dataset with a multilayer perceptron. A performance gain of almost 18% is achieved with the GPU, and the improvement is significant under a t-test.
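The significance test itself is a two-sample t-test on repeated run times. The numbers below are illustrative synthetic samples, not the measurements from this post; real values would come from repeatedly timing the MNIST MLP on CPU versus GPU:

```python
import numpy as np
from scipy import stats

# Illustrative timing samples in seconds (assumed, not measured):
# GPU runs drawn roughly 18% faster than CPU runs
rng = np.random.default_rng(1)
cpu = rng.normal(10.0, 0.2, 10)
gpu = rng.normal(8.2, 0.2, 10)

# Welch's t-test: does the mean run time differ significantly?
t, p = stats.ttest_ind(cpu, gpu, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3g}")
```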

# Exploring Theano with Keras

Theano needs no introduction in the field of deep learning. It is based on Python and supports CUDA. Keras is a library that wraps the complexity of Theano to provide a high-level abstraction for developing deep learning solutions.

Installing Theano and Keras is easy, and there are tons of resources available online. However, my primary CUDA platform is Windows, so most standard guides, which are based on Linux, required some adaptation. Most notable are the proper setting of the PATH variable and the use of the Visual Studio command prompt.

The basic installation steps include setting up CUDA, a scientific Python environment, and then Theano and Keras. CuDNN is optional and requires Compute Capability greater than 3.0, a requirement my somewhat old GPU unfortunately does not meet.
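For reference, Theano reads its settings from a `.theanorc` file in the user's home directory; a minimal fragment for a Windows/CUDA setup of that era might look like this (the Visual Studio path is illustrative and depends on the installed version):

```
[global]
device = gpu
floatX = float32

[nvcc]
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
```

Pointing `compiler_bindir` at the Visual Studio compiler is what lets nvcc find `cl.exe` outside the Visual Studio command prompt.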

Some programs on the Windows platform encountered errors that turned out to be library-related issues. One that failed to compile under Spyder, for example, could be resolved by running from the Visual Studio Cross Tools Command Prompt.

The Nvidia profiler can be used to check the performance of the GPU while running the Keras MNIST-digits example with the MLP.