Monthly Archives: August 2015

Working with Mahalanobis distance in TI Nspire CX

The Mahalanobis distance is an important measure in statistical analysis. Unlike the common Euclidean distance, it scales each dimension by its variability (standard deviation) and accounts for the correlation between variables. The TI Nspire has no built-in function for Mahalanobis distance, but it can be calculated easily using the matrix operations available.

mahalanobis1

The example uses independent variables x1 and x2 and a dependent variable y. First, the covariance matrix is obtained either from the first inverse matrix equation above or from the next one, where d is defined row-wise as x1, x2, and a last row of 1’s.

mahalanobis2

Once the covariance matrix is determined, the Mahalanobis distance for x1, x2 can be computed with the equation above, which is a summation of distances multiplied by the number of observations minus one. The sum function is applied to the matrix only for convenience of input and display, as the equivalent summation expression can be very long.

Chebyshev approximation of normal distribution probability

Apart from the Taylor series approximation, another numerical technique, considered by many to be the de facto method, is the Chebyshev approximation. In TI Nspire, the calculation can be done as below. The piecewise function selects the result depending on whether d is positive. For better performance, some constants, such as the square root of 2π, can be pre-calculated instead of being evaluated each time as shown below. Intensive calculations will definitely benefit from pre-calculated values.

ChebyshevApprox
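As a cross-check outside the calculator, here is a minimal Python sketch. It assumes the approximation meant here is the five-term polynomial formula 26.2.17 from Abramowitz and Stegun, which is commonly given under this name; the screenshot may use a different form. The function name is hypothetical, and the square root of 2π is pre-calculated once, as suggested above.

```python
import math

# Coefficients of Abramowitz & Stegun formula 26.2.17 (assumed to be the
# approximation the post refers to).
P = 0.2316419
A = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)  # pre-calculated constant

def normal_cdf_approx(d):
    """Approximate the standard normal CDF at d."""
    if d < 0:
        # The polynomial is only valid for d >= 0; use symmetry otherwise.
        return 1.0 - normal_cdf_approx(-d)
    k = 1.0 / (1.0 + P * d)
    poly = 0.0
    for a in reversed(A):  # Horner evaluation of a1*k + a2*k^2 + ... + a5*k^5
        poly = (poly + a) * k
    density = INV_SQRT_2PI * math.exp(-0.5 * d * d)
    return 1.0 - density * poly

def normal_cdf_exact(d):
    """Reference value from the error function, for comparison."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
```

Comparing `normal_cdf_approx(d)` with `normal_cdf_exact(d)` for a few values of d is a quick way to confirm the coefficients were typed correctly; the published error bound for this formula is below 1e-7.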

Approximating normal distribution density function using Taylor series on TI Nspire CX CAS

On the TI Nspire CX CAS, the Taylor series is available as the Calculus Series function taylor(). The following applies it to approximate the cumulative standard normal distribution, using an order of 12 in the taylor() call below.

taylor4
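The same polynomial can be written out by hand. The sketch below assumes the expansion is taken about 0, where the series for the cumulative standard normal distribution is 1/2 + (1/√(2π)) Σ (-1)^k x^(2k+1) / (k! 2^k (2k+1)); whether the screenshot expands about the same point is an assumption, and the function names are hypothetical.

```python
import math

def taylor_normal_cdf(x, order=12):
    """Maclaurin polynomial of the standard normal CDF, truncated at x**order."""
    total = 0.0
    k = 0
    # Term k is (-1)^k * x^(2k+1) / (k! * 2^k * (2k+1)); keep powers <= order.
    while 2 * k + 1 <= order:
        total += (-1) ** k * x ** (2 * k + 1) / (
            math.factorial(k) * 2 ** k * (2 * k + 1)
        )
        k += 1
    return 0.5 + total / math.sqrt(2.0 * math.pi)

def normal_cdf_exact(x):
    """Reference value from the error function, for comparison."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With order 12 the highest power kept is x^11. The polynomial tracks the exact value closely near zero but drifts quickly further out (it is already far off at x = 3), so it is best suited to small arguments.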

k-means clustering using TI Nspire

k-means is probably the simplest clustering algorithm, and it can easily be implemented using the built-in TI Basic. Source data can be edited in the default spreadsheet editor. A sample data set with two attributes, length and weight, for three pet types (dogs, cats, and rabbits) is used for testing. Three centroids are used.

The following is a scatter plot using the default TI Nspire Data & Statistics application.

knn0

The same plot after defining weight and length.

knn3

Running the k-means program. The dist_to_cluster matrix contains, for each row of data, the distance to each centroid: the first to third columns give the distance to the corresponding centroid. The last column is the assigned cluster, determined by the minimum of the three distances. The last command on the screen transposes the matrix so that the fourth column can be fetched as a row and pasted back into the spreadsheet for comparison.

knn1
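The TI Basic listing itself is not reproduced here, so the following is only a rough Python sketch of plain k-means (Lloyd's algorithm) that builds a table in the same layout as dist_to_cluster: one distance per centroid, followed by the number of the nearest centroid. The function names, the choice of starting centroids, and the (length, weight) values are all made up for illustration and are not the post's data.

```python
import math

def distance_table(points, centroids):
    """One row per point: distance to each centroid, then the 1-based nearest cluster."""
    table = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        table.append(dists + [dists.index(min(dists)) + 1])
    return table

def kmeans(points, centroids, max_iter=100):
    """Repeat assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        table = distance_table(points, centroids)
        updated = []
        for j, old in enumerate(centroids, start=1):
            members = [p for p, row in zip(points, table) if row[-1] == j]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, distance_table(points, centroids)

# Illustrative (length, weight) pairs only, three loose groups.
pets = [(30, 4), (32, 5), (28, 4.5),
        (60, 25), (65, 30), (62, 27),
        (20, 1.5), (22, 2), (18, 1.2)]
# Starting centroids are picked by hand here: one point from each group.
final_centroids, dist_to_cluster = kmeans(pets, [pets[0], pets[3], pets[6]])
clusters = [row[-1] for row in dist_to_cluster]
```

Taking the last entry of each row, as in `clusters` above, plays the same role as transposing the matrix and fetching its fourth column on the calculator.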

A comparison chart using R. R’s default plot looks better than the Nspire’s. The labels are grouped automatically by the cluster found.

knn2