Tag Archives: Python

Deep Learning with the Movidius Neural Compute Stick


Deep learning is a breakthrough in artificial intelligence. With its roots in neural networks, it has gained new possibilities from advances in modern computing hardware and sophisticated integrated circuit technology.

Deep learning is a branch of machine learning, itself an exciting area of AI. Leading development frameworks include TensorFlow and Caffe. Pattern recognition is a practical application of machine learning in which photos or videos are analysed by machine to produce usable output, as if a human had done the analysis. The GPU has been a favorite choice for its specialized architecture, which delivers processing power not only for graphics but also for neural network workloads. A previous installment covered how to deploy an Amazon Web Services GPU instance to analyse real-time traffic camera images using Caffe.

To bring this kind of machine learning power to IoT, Intel shrank and packaged a specialized Vision Processing Unit into the form factor of a USB thumb drive in the Movidius™ Neural Compute Stick.

It sports an ultra-low-power Vision Processing Unit (VPU) inside an aluminium casing and weighs only 30g (without the cap). Support for the Raspberry Pi 3 Model B makes it a very attractive add-on for development projects involving AI applications on this platform.

In the form factor of a USB thumb drive, the specialized VPU in the Movidius, geared for machine learning, acts as an AI accelerator for the host computer.

To put this neural compute stick into action, the SDK provided by Movidius, available as a git download, is required. Although this SDK runs on Ubuntu, Windows users can easily install it in an Ubuntu 16.04 VM under VirtualBox.

While the SDK comes with many examples and the setup is a walk in the park, running these examples is not so straightforward, especially on a VM. There are points to note, from making the stick available in the VM (including the USB 3 and device filter settings in VirtualBox) to the actual execution of the provided sample scripts. Some examples require two sticks to run. Developers should be comfortable with Python, Unix make and git commands, and installing plugins in Ubuntu.

The results from the examples in the SDK alone are quite convincing, considering the form factor of the stick and its electrical power consumption. This neural compute stick literally "kept its cool" throughout the test drive, unlike the FPGA stick I occasionally use for bitcoin mining, which turns really hot.


What does endianness have to do with Bitcoins

The order in which bytes appear is called endianness in computer technology. The term stems from processor architecture design: for example, x86 and the classic 6502 are little endian, while S/360 and SPARC are big endian. ARM processors, like the one powering the Beagleboard SBC that I have happily used for everything from Yubikey to the R statistics package, can be configured to run either way.

At the end of the day, programs are compiled and linked into instructions for the hardware processor to execute. But that is not the end of the story for software developers. Apart from hardware instruction sets, there is also endianness in files. Any developer who has been involved in any form of low-level file processing, in classic or modern programming languages alike, should be very familiar with this.

Take the bitcoin file as an example: the hex dump below is the genesis block, with the timestamp field highlighted in yellow.


On file it reads 29AB5F49, but because the field is stored in little-endian order, the value should be interpreted as 495FAB29 in hexadecimal, which is 1231006505 in decimal. Converted into a human-readable date, this timestamp is 3 January 2009, 18:15:05 UTC.
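
The conversion can be reproduced with a minimal Python sketch that uses only the standard library. The variable names are illustrative and not taken from the bitcoin source. It takes the four bytes as stored on file, interprets them as a little-endian unsigned integer, and formats the result as a UTC date.

```python
from datetime import datetime, timezone

# The four timestamp bytes exactly as they appear in the hex dump.
raw = bytes.fromhex("29AB5F49")

# Interpret them as a little-endian unsigned 32-bit integer.
timestamp = int.from_bytes(raw, byteorder="little")

# Unix time counts seconds since 1970-01-01 00:00:00 UTC.
when = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(hex(timestamp))    # 0x495fab29
print(timestamp)         # 1231006505
print(when.isoformat())  # 2009-01-03T18:15:05+00:00
```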

It is quite trivial to convert from one byte order to the other in most programming languages. In classic C, something as simple as a byte-swapping macro will do the job.


In Python:
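
A minimal sketch using the standard library's `struct` module. The helper name `swap32` is illustrative only; it assumes the value fits in an unsigned 32-bit integer.

```python
import struct

def swap32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    # Pack as little-endian ("<I"), then read the same bytes back as big-endian (">I").
    return struct.unpack(">I", struct.pack("<I", value))[0]

on_file = 0x29AB5F49
print(hex(swap32(on_file)))  # 0x495fab29

# When the raw bytes are at hand, unpacking them as little-endian
# gives the value directly, with no explicit swap.
raw = bytes.fromhex("29AB5F49")
(timestamp,) = struct.unpack("<I", raw)
print(timestamp)  # 1231006505
```

Swapping twice returns the original value, so the same helper converts in both directions.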


Exploring Theano with Keras

Theano needs no introduction in the field of deep learning. It is based on Python and supports CUDA. Keras is a library that wraps the complexity of Theano to provide a high-level abstraction for developing deep learning solutions.

Installing Theano and Keras is easy, and there are tons of resources available online. However, my primary CUDA platform is Windows, so most standard guides, which are based on Linux, required some adaptation. Most notable are the proper setting of the PATH variable and the use of the Visual Studio command prompt.
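
One quick sanity check for the PATH setting is to confirm that both the CUDA compiler and the Visual Studio compiler are visible from the environment Python is launched in. The sketch below assumes the executables are named `nvcc` and `cl`, as is usual for the CUDA toolkit and Visual C++; the helper name `find_build_tools` is illustrative.

```python
import shutil

def find_build_tools(names=("nvcc", "cl")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

# Report which compilers the current environment can see.
for tool, path in find_build_tools().items():
    print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

Running this from a plain command prompt and again from the Visual Studio command prompt shows whether the latter is what makes `cl` available.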

The basic installation steps include setting up CUDA, a scientific Python environment, and then Theano and Keras. cuDNN is optional and requires a Compute Capability of 3.0 or higher; unfortunately, my GPU is a bit old and does not meet this requirement.


Some programs on the Windows platform encountered errors that turned out to be library-related issues. One of them, which failed to compile in Spyder, was resolved by using the Visual Studio Cross Tools Command Prompt.

The Nvidia profiler checking the performance of the GPU while running the Keras MNIST digits example with an MLP.