
Building an interactive journey time map from data.gov.hk Traffic Speed Map data with Google Maps API

[Image: rtm-40-all-region]

Practical applications of road traffic data processing have been the subject of past installments here, including a traffic jam classifier using GPU deep learning on surveillance camera images and a mash-up of a CCTV still image feed with Google Maps. Both are centered around the data source provided by data.gov.hk, an initiative to support big data development by disseminating government data of various types to the public through the Internet.

This installment will walk through another mash-up, a real-time interactive journey time map built with the Google Maps API. Simply put, it is an overlay of real-time road saturation level information on an interactive digital map. A demo is available here. As a public demo, it is limited to routes in one of the three local regions (Hong Kong Island), largely in consideration of the free daily Google API account quota. For personal use, Google's free daily quota should be more than enough for most needs.
[Image: rtm-21-colorcode]

It works just as well in the 3D Earth view.
[Image: rtm-12-perspective-arrows-en]

Zoom in for a clean view of travel time on color-coded road sections, with arrows indicating the direction of travel.
[Image: rtm-11-morescreens-zoom]

Traffic Speed Map data from data.gov.hk

The Transport Department here publishes road travel time data on data.gov.hk for selected road sections. It is not clear from that site how the published average speeds are collected and collated. However, it is conceivable that apart from traditional roadside vehicle sensors and automatic CCTV vehicle speed estimation, public transport operators such as bus companies might contribute real-time data from their fleets to improve overall accuracy.

The page at data.gov.hk provides all the information necessary to consume this data feed, including the URL of the XML Traffic Speed Map file (which is updated every 2 minutes), the schema of the XML file, and some supporting documents with the coordinates of each route's origin and destination.

As the XML schema suggests, the XML file is structured in a pretty straightforward way. It includes all routes in one single file under the element jtis_speedmap (JTIS is probably an acronym for Journey Time Indication System, a decade-old infrastructure aimed at informing motorists, via giant dot matrix displays mounted on highway gantries, of the estimated travel time for certain routes). In this XML file, each route is represented by one jtis_speedmap node, which carries the data associated with that route, including the route record identifier LINK_ID.

<jtis_speedmap>
	<LINK_ID>3006-30069</LINK_ID>
	<REGION>K</REGION>
	<ROAD_TYPE>URBAN ROAD</ROAD_TYPE>
	<ROAD_SATURATION_LEVEL>TRAFFIC AVERAGE</ROAD_SATURATION_LEVEL>
	<TRAFFIC_SPEED>27</TRAFFIC_SPEED>
	<CAPTURE_DATE>2017-06-16T18:30:35</CAPTURE_DATE>
</jtis_speedmap>

Most of these data elements are self-explanatory. For more details, there is a separate PDF file describing the data elements in detail. In addition, a table available in an Excel file includes the origin and destination coordinates of each route.
[Image: rtm-06-exceldata]

In keeping with the local authority's geodetic practices, the coordinates in this file are expressed in the HK80 grid, so translation to Google Maps' WGS84 system is required. Further details of the translation are explained in this previous post.
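As a minimal sketch, the conversion can be done in Python with pyproj, assuming EPSG:2326 for the HK80 grid (the sample coordinates below are made up for illustration):

# pip install pyproj
from pyproj import Transformer

# EPSG:2326 is the Hong Kong 1980 Grid System; EPSG:4326 is WGS84.
hk80_to_wgs84 = Transformer.from_crs("EPSG:2326", "EPSG:4326", always_xy=True)

easting, northing = 836055.0, 815982.0        # hypothetical HK80 grid coordinates (metres)
lon, lat = hk80_to_wgs84.transform(easting, northing)
print(lat, lon)                               # WGS84 latitude/longitude for Google Maps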

Google Maps Directions

Because the data from data.gov.hk only provide the start and end point of each route (indirectly, via the accompanying documentation in PDF and Excel), they are not sufficient for color-coded plotting of road sections on a digital map, a feature common to most standard GIS visualization applications for road network monitoring. Ordinarily, this plotting feature would not be possible without working with complex GIS data structures.

To address this data insufficiency, we are going to utilize a publicly available technology, the Google Maps Directions API. With this Google service, routing between two points on the road network is as simple as providing the coordinates of the two endpoints. There are many options available from this service, but we are only interested in the "DRIVING" travel mode here.

The route from Google Maps Directions, however, is not guaranteed to represent the exact route the Traffic Speed Map data measure speed for, but it should serve as a reasonably accurate approximation and can be further fine-tuned manually. For example, route 910-931 results in a much longer route from Google's routing service than is likely in reality, as shown below in the "as-is" route from the XML data (left) and the fine-tuned result, a judgement call made after assessing the road network (right):

It should be noted that even after fine-tuning, there are still some "interesting" routes. But without further information, such as waypoints from the data provider (the local Transport Department), to resolve these ambiguities, we will settle for this for now.

To make calls to the Google Maps API, a Google API service account is required. In addition, a secret key has to be generated via the Google API Console. With this secret key embedded in the Javascript code, the Google Maps API is ready for action. There is a good online collection of samples for reference.

Integrating Google Maps Directions with Traffic Speed Map data at data.gov.hk

Sticking with the principle of keeping things simple, as in a previous article on Google Maps integration, the target is to keep everything inside one single HTML file with embedded Javascript.

The first step is to embed the static data from data.gov.hk into the Javascript code. Data that are unlikely to change often are considered static; in this case, that means the coordinates of the routes available in the PDF and Excel files previously mentioned. Using simple awk scripts, the following code fragment is generated for these data. The hyphens the Traffic Speed Map file uses in its link IDs necessitated some character substitution legwork during code generation, as the ID is used as a variable name suffix and therefore has to conform to Javascript variable naming conventions, which prohibit hyphens.

// start dynamic data code generation

// -- route ID 722-50059
origin722_50059= new google.maps.LatLng(22.2859946618326,114.15524436017);
destination722_50059= new google.maps.LatLng(22.28686575617,114.153536622643);

...
// end dynamic data code generation
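For illustration, the generation step amounts to something like the following Python stand-in for the awk script (the file name and column layout of routes.csv are hypothetical):

import csv

# Hypothetical columns: LINK_ID, origin lat, origin lng, destination lat, destination lng
# (coordinates already converted to WGS84). Hyphens in the link ID are swapped
# for underscores so the ID can serve as a Javascript variable name suffix.
with open("routes.csv") as f:
    for link_id, olat, olng, dlat, dlng in csv.reader(f):
        suffix = link_id.replace("-", "_")
        print("origin%s= new google.maps.LatLng(%s,%s);" % (suffix, olat, olng))
        print("destination%s= new google.maps.LatLng(%s,%s);" % (suffix, dlat, dlng))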


The next step is to build the method that reads the XML file available on the data.gov.hk site. Because of the cross-domain security restrictions enforced by modern browsers, the XML file from the remote data.gov.hk server is mirrored to my web server using cron jobs, and monkeyed with a little so as not to violate those browser restrictions.

wget -q http://resource.data.one.gov.hk/td/speedmap.xml -O speedmap.xml
gawk "NR==2{print \"<jtis_speedlist>\"}NR!=2{print}" speedmap.xml > speedmap_clean.xml


The entry below is added to the crontab to run the XML file mirroring every 2 minutes, matching the update frequency of the Traffic Speed Map at data.gov.hk.


*/2 * * * * /home/ubuntu/syncrtm.sh


Now it is time for the XML request method.

	function readXML() {
		var xhttp = new XMLHttpRequest();
		xhttp.onreadystatechange = function() {
			if (this.readyState == 4 && this.status == 200) {
				liveDataXml = xhttp.responseXML;
			}
		};
		// use speedmap_clean for javascript xml parser CORS issues.
		xhttp.open("GET", "speedmap_clean.xml", true);
		xhttp.send();
	}


To support the extraction of specific data elements from the XML data source, some helper functions such as extractSpeed() and extractSaturation() are prepared. These methods use XPath expressions, as exemplified below, to retrieve the values of interest from the XML.

var path = "/jtis_speedlist/jtis_speedmap[LINK_ID[text()='" + id + "']]/TRAFFIC_SPEED";
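As a quick sanity check, the same expression can be tried offline against the mirrored file, for instance with Python's lxml (a testing aid only; the live page uses the browser's own XML facilities):

from lxml import etree

doc = etree.parse("speedmap_clean.xml")
link_id = "3006-30069"
path = "/jtis_speedlist/jtis_speedmap[LINK_ID[text()='%s']]/TRAFFIC_SPEED" % link_id
print(doc.xpath(path)[0].text)   # e.g. "27"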


The TRAFFIC_SPEED value is extracted for display along with the LINK_ID as a tooltip on each route marker on mouseover. The ROAD_SATURATION_LEVEL value is color-coded as red, amber, or green to render the route on the base map. At this point both the static data (i.e. the coordinates of the routes) and the dynamic data (i.e. the XML file with speed and saturation) are ready. Plotting a route is simply a matter of calling the Google Maps API "route" method of the DirectionsService class. The callback function receives the response object and, in my implementation, calls another method to render the routes and markers on the map. This is where the parsed data from the Traffic Speed Map XML file come into play: setting the color code according to saturation and labeling the marker's tooltip with speed information.

directionsService.route({
  origin: originLatLng,
  destination: destinationLatLng,
  travelMode: 'DRIVING'
}, function(response, status) {
    if (status === 'OK') { // Update the route on screen
        updateRouteDisplay(...);
        console.debug('Route service called and updated display');
    }
....


As there will be quite a lot of routes to plot on the same map, with some overlapping sections according to the Traffic Speed Map data, it certainly makes sense to present higher-value information (congested road sections) without it being covered by the less valuable. A simple trick for this is to set the z-index of each route to a value that decreases as its speed increases. Adding directional arrows along the route also helps prevent route information from being overlooked where overlapping routes are rendered.

var zIndexInvSpeed = 1000 - parseInt(speed, 10);

directionsDisplay = new google.maps.DirectionsRenderer({
  polylineOptions: {
    strokeColor: color,
    strokeWeight: 8,
    zIndex: zIndexInvSpeed,
    icons: [{repeat: '150px', icon: {path: google.maps.SymbolPath.FORWARD_OPEN_ARROW}}]
  }
});


Event listeners are attached to each of the markers for a simple filtering function. On clicking a marker, the map hides every route but the one clicked. This allows some interaction for viewers to focus on a specific route of interest within a road network full of color-coded road sections.

Another feature allows viewers to cycle through overlapping endpoints while filtering. Imagine a position where two routes join up, that is, where the end of route A is also the start of route B. On the map, the markers rendered by Google Maps will certainly overlap. Allowing users to cycle through by simply clicking again greatly improves their ability to drill down by filtering without being hindered by the overlap. Shown below is an example of clicking on the end of route 930-931, which coincides with the start of route 931-4650. The overall effect is to filter down to 930-931 first, then to 931-4650, and finally back to no filtering (show all).
[Image: rtm-27-cycling-overlapping]

The following screen shows the tooltip of a marker, with the route ID and the speed from the Traffic Speed Map XML data.
[Image: rtm-15-tooltip]

Color-coded road sections.
[Images: rtm-18-colorcode, rtm-17-colorcode, rtm-26-colorcode]

The core of the Javascript file is a loop that updates the map at a specific interval according to the Traffic Speed Map XML data. Due to the request rate limits imposed by the Google Maps API (a fair restriction), deliberate delays are set in my implementation between each DirectionsService route call, even though these calls only have to be made once on startup for each route.

Finally, the secret API key for the Google Maps API must be included as a parameter when loading its library, along with the callback method.

<!-- Your GoogleMap API key goes here -->
<script async defer src="https://maps.googleapis.com/maps/api/js?key=YOUR_SECRET_KEY&callback=initMap">
</script>


Transportation and traffic analysis is an interest I am particularly fond of. The study of travel time has been a serious topic in academia as well as in the transportation industry, and is a very complex cross-disciplinary problem. For the general public, it is still fun to integrate your own free solution, in exactly the way you want it.

Logistic Regression – from Nspire to R to Theano

Logistic regression is a very powerful tool for classification and prediction. It works very well with linearly separable problems. This installment will recap its practical implementation, from the traditional maximum likelihood perspective to the more machine-learning-style neural network approach, and from handheld calculators to GPU cores.

The heart of the logistic regression model is the logistic function. It takes in any real value and returns a value in the range from 0 to 1, which is ideal for a binary classifier. The following is a graph of this function.
[Image: theanologistic1]
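For reference, the function itself is $\sigma(x) = \frac{1}{1 + e^{-x}}$.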

TI Nspire

In the TI Nspire calculator, logistic regression is provided as a built-in function but is limited to a single variable. For multi-variable problems, custom programming is required to apply optimization techniques that determine the coefficients of the regression model. One such approach, shown below, uses the Nelder-Mead method on the TI Nspire.

Suppose a data set from university admission records has four attributes (independent variables: SAT score, GPA, Interview score, Aptitude score) and one outcome ("Admission") as the dependent variable.
[Image: theano-new1]

Through the use of a Nelder-Mead program, the logistic function is first defined as l. It takes the regression coefficients (a1, a2, a3, a4, b), the dependent variable (s), and the independent variables (x1, x2, x3, x4), and simply returns the logistic probability. Next, the function to optimize in the Nelder-Mead program is defined as nmfunc. This is the likelihood function built on the logistic function. Since Nelder-Mead is a minimization algorithm, the negative of this function is taken. On completion of the program run, the regression coefficients in the result matrix are available for prediction, as in the following case of a sample data point with [SAT=1500, GPA=3, Interview=8, Aptitude=60].

[Image: theanologistic2(nspire1)]
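In symbols, writing $p_i$ for the logistic probability of observation $i$ with outcome $s_i$, the quantity handed to Nelder-Mead is the negated likelihood, in practice usually its logarithm:

$-\ell(a_1,\dots,a_4,b) = -\sum_i \left[ s_i \ln p_i + (1 - s_i)\ln(1 - p_i) \right], \qquad p_i = \frac{1}{1 + e^{-(a_1 x_{1i} + a_2 x_{2i} + a_3 x_{3i} + a_4 x_{4i} + b)}}$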

R

In R, a sophisticated statistical package, the calculation is much simpler. For the sample case above, it takes just a few lines of commands to invoke its built-in logistic model (the glm function with a binomial family).

[Image: theano-new2]

Theano

Apart from the traditional methods, modern advances in computing paradigms have made it possible for neural networks, coupled with specialized hardware such as GPUs, to solve these problems much more efficiently, especially on huge volumes of data. The Python library Theano supports and enriches these calculations through optimization and symbolic expression evaluation. It also features compiler capabilities for CUDA and integrates a Computer Algebra System into Python.

One of the examples that comes with the Theano documentation applies logistic regression to showcase various Theano features. It first initializes a random set of data as the sample inputs and outcomes using numpy.random. Then the regression model is created by defining the expressions required for the logistic model, including the logistic function and the likelihood function. Lastly, with the theano.function method, the symbolic expression graph coded for the regression model is compiled into callable objects for training the network and for subsequent prediction.

[Image: theanologistic5(theano1)]
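Condensed from that documentation example, the essentials look like this:

import numpy
import theano
import theano.tensor as T

rng = numpy.random
N, feats = 400, 784                                # sample size and feature count

# random training data: N input rows and N binary outcomes
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

x = T.dmatrix("x")
y = T.dvector("y")
w = theano.shared(rng.randn(feats), name="w")      # weights persist across calls
b = theano.shared(0., name="b")                    # bias

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))            # logistic probability
prediction = p_1 > 0.5
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)  # cross-entropy loss
cost = xent.mean() + 0.01 * (w ** 2).sum()         # regularized cost
gw, gb = T.grad(cost, [w, b])

# compile the symbolic graph into callable objects
train = theano.function([x, y], [prediction, xent],
                        updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function([x], prediction)

for i in range(10000):                             # train the model
    pred, err = train(D[0], D[1])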

A nice feature of Theano is the pretty printing of the expression model in a tree-like text format. It is a feel-like-home reminiscence of my days reading SQL query plans to tune database queries.

[Image: theanologistic5(theano2)]
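That tree comes from Theano's pretty printer; something along the lines of theano.printing.debugprint(predict), applied to the compiled function from the sketch above, produces it.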


CUDA, Theano, and Antivirus

Most ubiquitous antivirus products monitor new processes from executables in real time and will attempt to terminate their execution if deemed a potential threat. Some of these antivirus products simply do a signature match, while others do more sophisticated heuristic or intelligent scanning.

However, there are times when the antivirus turns up a false positive. This is rare, but many software developers must have experienced the slow-down caused merely by the suspending and scanning of a new build from their favorite IDE. Recently my antivirus product let me know it has been very edgy about some Theano Python programs, with this scan alert.

[Image: theanoavast2]

This rings a bell. I remember the same thing happening when working with CUDA on Visual Studio, and a test with some sample CUDA programs quickly confirmed my memory.

[Image: theanoavast3]

The solution to get rid of the scan is quite simple. Most antivirus products have an option to whitelist certain programs from being scanned, and on my Avast installation, simply adding the full file path of the nvcc output does the trick. Note that doing so may pose certain security risks, as it essentially neutralizes the protection. To compensate for the increased risk, the whitelist path should be set as precisely as possible, for example by including a wildcard filename as shown below.

[Image: theanoavast1]

One last interesting point: since one of the great benefits of Theano is its high-level abstraction of the CUDA layer, it compiles to GPU executables on the CUDA platform using nvcc. Comparing the profiling results obtained before and after antivirus whitelisting shows improvement not only in overall speed but also in compile time. See the before-whitelisting profiling result:

 Function profiling
==================
 Message: train
 Time in 10000 calls to Function.__call__: 1.122200e+01s
 Time in Function.fn.__call__: 1.089400e+01s (97.077%)
 Time in thunks: 1.069661e+01s (95.318%)
 Total compile time: 5.477000e+01s
 Number of Apply nodes: 17
 Theano Optimizer time: 3.589900e+01s
 Theano validate time: 4.000425e-03s
 Theano Linker time (includes C, CUDA code generation/compiling): 1.772000e+00s
 Import time 1.741000e+00s

and the whitelisted, optimized results:

Function profiling
==================
 Message: train
 Time in 10000 calls to Function.__call__: 9.727999e+00s
 Time in Function.fn.__call__: 9.469999e+00s (97.348%)
 Time in thunks: 9.293550e+00s (95.534%)
 Total compile time: 2.827000e+00s
 Number of Apply nodes: 17
 Theano Optimizer time: 1.935000e+00s
 Theano validate time: 1.999855e-03s
 Theano Linker time (includes C, CUDA code generation/compiling): 3.799987e-02s
 Import time 2.199984e-02s

If Avast only scanned the executable before it started executing, there should be no improvement at all in compilation time. It seems more likely, from the profiling breakdowns, that Avast scans the GPU executable upon its creation on the file system by Theano. Turning off Avast's "File Shield" with no whitelisting triggered no scan alert, confirming the suspicion.

Bitcoin inflation

[Image: bitcoinaltera]
Bitcoins are created as the reward for successfully mining a block. The current reward as of writing is 12.5 BTC per block.

Referring to the bitcoin history, the current number of bitcoins in circulation is around 16,378,375, and a year ago it was 15,628,475. So 749,900 bitcoins were mined in between.

A block is successfully mined roughly every 10 minutes. On this assumption, in one year there should be 24 * 60 / 10 * 12.5 * 365 bitcoins created. That is 657,000, and this theoretical value doesn't quite add up with the history, amounting to only 87.6% of the historic figure. Close, but off by more than 10%.

It turns out there is a mechanism built into Bitcoin to halve the reward periodically, as a form of inflation control. A month from now will mark the anniversary of the last reward halving, from 25 BTC to 12.5 BTC. This change is by design and has happened twice as expected, from 50 to 25, then to 12.5 as of the time of writing.

The first block receiving 12.5 BTC was block 420,000 on 2016-07-09, according to the original constant set in stone by the creator of Bitcoin:

Consensus.nSubsidyHalvingInterval = 210000;
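In other words, the reward drops by half every 210,000 blocks. A simplified sketch of the schedule (the actual client computes this with integer bit shifts and eventually lets the subsidy reach zero):

def block_subsidy(height, interval=210000):
    # Block reward in BTC, halving every `interval` blocks.
    return 50.0 / (2 ** (height // interval))

print(block_subsidy(0))        # 50.0  (genesis era)
print(block_subsidy(210000))   # 25.0  (first halving)
print(block_subsidy(420000))   # 12.5  (second halving, 2016-07-09)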

Therefore, during the period above in which 749,900 bitcoins were mined, the reward was 25 BTC for part of the time and 12.5 BTC for the rest.

A rough calculation shows the current reward of 12.5 BTC per block applied for 334 days of that 365-day period. Revising the calculation of the theoretical value gives 712,800 bitcoins. Taking this into account, the estimate improves from 88% to 95% of the historical value.
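The arithmetic can be replayed in a few lines of Python:

blocks_per_day = 24 * 60 // 10          # one block roughly every 10 minutes

# naive estimate: 12.5 BTC throughout the whole 365-day period
naive = blocks_per_day * 12.5 * 365
print(naive, naive / 749900)            # 657000.0, ~0.876

# revised estimate: 25 BTC for the 31 days before the 2016-07-09 halving,
# 12.5 BTC for the remaining 334 days
revised = blocks_per_day * (31 * 25 + 334 * 12.5)
print(revised, revised / 749900)        # 712800.0, ~0.950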

What does endianness have to do with Bitcoins

The order in which bytes appear is called endianness in computer technology. The term is prominent in processor architecture design: for example, x86 and the classic 6502 are little-endian, while S/360 and SPARC are big-endian. ARM processors, like the one powering the BeagleBoard SBC featured here in posts from Yubikey to the R statistics package, can be configured to run either way.

At the end of the day, programs are compiled and linked into instruction sets for the hardware processor to execute. But that is not the end of the story for software developers. Apart from the hardware instruction set, there is also endianness in files. Any developer who has been involved in any form of low-level file processing, in classic or modern programming languages alike, should be very familiar with this.

Take the bitcoin block file as an example: the hex dump below is the genesis block, with the timestamp field highlighted in yellow.

[Image: bitcoinraw02]

On file it reads 29AB5F49, but on account of endianness this value should be interpreted as 495FAB29 in hexadecimal, with the corresponding decimal value 1231006505. Converting this decimal timestamp into a human-readable date:
[Image: bitcoinraw03]

It is quite trivial to convert from one to the other in most programming languages; a classic C example as simple as the macro below will do the job.

[Image: bitcoinraw04]

In Python:

[Image: bitcoinraw07]
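The Python version boils down to something like this sketch (the exact snippet in the screenshot may differ):

import struct
from datetime import datetime, timezone

raw = bytes.fromhex("29AB5F49")                       # bytes as they appear on disk
(ts,) = struct.unpack("<I", raw)                      # read as little-endian uint32
print(hex(ts))                                        # 0x495fab29
print(ts)                                             # 1231006505
print(datetime.fromtimestamp(ts, tz=timezone.utc))    # 2009-01-03 18:15:05+00:00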

Project in history – Going wireless for a Tamiya kit

This project was completed around 2011. It may look ancient now that everyone is crazy about drones, but it brought me so much fun that I am still fond of the memories today and would like to share them here.

The goal back then was simple: a wired Tamiya kit + Arduino + Bluetooth + Android phone = wireless remote-controlled bulldozer. That is, integrate the wireless and accelerometer capabilities of an Android phone to control the kit. As far as I can recall there was no such term as STEM back then, but this sort of project can be seen in syllabuses today under that exciting subject.

The name Tamiya must ring a bell for everyone who has gotten their feet wet in the remote control modelling world. Tamiya made great kits at very reasonable prices for beginners through intermediates. I was lucky enough to own and build one of their classics, a bulldozer kit, many years ago, and it remains one of the best parts of my memory even though that bulldozer itself is nowhere to be seen now.

Below is pretty much the same kit, which I came across from Tamiya around 2011 as a Christmas gift from my wife.

The finished mods. It is a Bluetooth enabled Tamiya kit that can be controlled from an Android phone.

The Tamiya kit came with wired on-off switches to control the kit for forward-backward-left-right maneuvers.

The classic kit I had decades ago (also from Tamiya) had a wooden base with metal gears. These are now replaced by plastic, which is easier to work with, but honestly I still miss the wood and metal. By the way, no power drill was needed for the classic wooden kit; I built mine comfortably with a screwdriver and brute force.

The core that made the move from wired to wireless possible is an Arduino Mega board, mounted with ease to the baseboard (white) on the Tamiya kit.

The communication module is an HC-05 Bluetooth board, alongside the H-bridge (SN754410) on an Arduino-compatible extension board. The design is pretty standard: separate the control circuit from the power circuit, which has to drive two 3V motors with power requirements the Arduino board cannot handle.


Prototype in testing.

A variant configuration mounts a mobile phone on the kit to stream real-time video over WiFi, with control of the kit via Bluetooth from the viewing PC. It was a very popular mod back then, and the experience of controlling a Mars rover at home was simply fantastic.

LU decomposition in TI-84

As an extension to a previous entry on doing LU decomposition on the Nspire and in R, the TI-84 is covered here. There is no built-in function for this as on the Nspire, but there are many programs available online, most of them employing a simple Doolittle algorithm without pivoting.

The non-pivoting program described here for the TI-84 series comes with a twist. No separate L and U matrix variables are used; the calculations are done in place on the original input matrix A, and the end results for both L and U are stored in this input matrix as well. This is made possible by the fact that the L and U matrices of the decomposition are triangular (with L's unit diagonal implied). Therefore, at the small price of some mental interpretation of the program output, this program takes up less memory and runs a little faster than most simple LU decomposition programs available online for the same class of calculator. In a simple benchmark with a 5 x 5 matrix, this program took 2 seconds while another standard program took 2.7 seconds.
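The same in-place scheme, sketched in Python for readability (the calculator program follows the same loop structure in TI-Basic):

def lu_inplace(a):
    # Doolittle LU decomposition without pivoting, done in place.
    # On return, the upper triangle of a (including the diagonal) holds U,
    # the strict lower triangle holds the multipliers of L, and L's unit
    # diagonal is implied rather than stored.
    n = len(a)
    for k in range(n):                       # eliminate below the k-th pivot
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]               # multiplier l_ik overwrites a_ik
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a

A = [[2.0, 1.0], [4.0, 5.0]]
lu_inplace(A)   # [[2.0, 1.0], [2.0, 3.0]]: U = [[2,1],[0,3]], L = [[1,0],[2,1]]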

The input matrix A.
[Image: lu84-1]

Results are stored in the same matrix.
[Image: lu84-2]

L, U, and verification.
[Image: lu84-3]