The Mahalanobis distance is an important measure in statistical analysis. Unlike the common Euclidean distance, it takes into account the standard deviation of each variable and the correlation between variables. The TI Nspire has no built-in function for the Mahalanobis distance, but it can be calculated easily with the available matrix operations.
The example uses two independent variables, x1 and x2, and a dependent variable y. First, the covariance matrix is obtained either from the first inverse matrix equation above or from the next one, where d is a matrix whose rows are x1, x2, and a last row of 1’s.
Once the covariance matrix is determined, the Mahalanobis distance for x1, x2 can be calculated with the above equation, which is a summation of distances multiplied by the number of observations minus one. The sum function is applied to the matrix only for convenience of input and display, as the equivalent summation expression can be very long.
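For readers without the calculator at hand, the same idea can be sketched in Python. This is not the Nspire matrix expression from the post. It is a minimal illustration that assumes the sample covariance matrix (divisor n - 1) and exactly two variables, so that the 2x2 inverse can be written out by hand. The function name mahalanobis_2d is hypothetical.

```python
from math import sqrt

def mahalanobis_2d(x1, x2):
    """Mahalanobis distance of each point (x1[i], x2[i]) from the sample mean.

    Assumption: the covariance matrix is the sample covariance (divisor n - 1).
    """
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n

    # Entries of the 2x2 sample covariance matrix.
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)

    # Closed-form inverse of the 2x2 covariance matrix.
    det = s11 * s22 - s12 * s12
    i11, i22, i12 = s22 / det, s11 / det, -s12 / det

    distances = []
    for a, b in zip(x1, x2):
        u, v = a - m1, b - m2
        # Quadratic form (x - mean)^T * inverse(S) * (x - mean).
        d2 = i11 * u * u + 2 * i12 * u * v + i22 * v * v
        distances.append(sqrt(d2))
    return distances
```

A quick sanity check on any result: with the sample covariance, the squared distances always add up to (n - 1) times the number of variables, here 2(n - 1).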
Apart from the approximation by means of the Taylor function, another widely used numerical technique for the cumulative standard normal distribution is the Chebyshev approximation. In TI Nspire, the calculation can be done as below. The piecewise function selects the result depending on whether d is positive. For better performance, some constants, such as the square root of 2π, can be pre-calculated instead of being evaluated each time as shown below. Intensive calculations benefit from such pre-calculated values.
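The steps above can be sketched in Python. The coefficients used in the post are not reproduced in the text, so this sketch assumes the commonly cited five-term polynomial formula from Abramowitz and Stegun (26.2.17), which may differ from the expression entered on the calculator. The names norm_cdf_poly, INV_SQRT_2PI, P and B are hypothetical.

```python
from math import exp, pi, sqrt

# Pre-calculated constant, as suggested in the text: 1 / sqrt(2 * pi).
INV_SQRT_2PI = 1.0 / sqrt(2.0 * pi)

# Assumed constants from Abramowitz and Stegun, formula 26.2.17.
P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def norm_cdf_poly(d):
    """Approximate the cumulative standard normal distribution at d."""
    x = abs(d)
    t = 1.0 / (1.0 + P * x)

    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5.
    poly = 0.0
    for b in reversed(B):
        poly = (poly + b) * t

    upper = 1.0 - INV_SQRT_2PI * exp(-0.5 * x * x) * poly

    # Piecewise step: the formula is valid for d >= 0, so use symmetry otherwise.
    return upper if d >= 0 else 1.0 - upper
```

The branch on the sign of d plays the role of the piecewise function, and the module-level constant avoids recomputing the square root on every call.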
On the TI Nspire CX CAS, the Taylor series is available as the Calculus Series function taylor(). The following applies it to approximate the cumulative standard normal distribution, using an order of 12 in the Taylor function below.
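As a rough cross-check outside the calculator, the same kind of polynomial can be built in Python. This sketch assumes the expansion is taken about 0 and integrates the series of the normal density term by term, keeping powers of x up to the given order. It does not call a CAS, and the function name norm_cdf_taylor is hypothetical.

```python
from math import factorial, pi, sqrt

def norm_cdf_taylor(x, order=12):
    """Taylor polynomial (about 0) of the cumulative standard normal distribution.

    Uses 1/2 + (1/sqrt(2*pi)) * sum((-1)^k * x^(2k+1) / (k! * 2^k * (2k+1))),
    keeping only terms whose power of x does not exceed `order`.
    """
    total = 0.0
    k = 0
    while 2 * k + 1 <= order:
        total += (-1) ** k * x ** (2 * k + 1) / (factorial(k) * 2 ** k * (2 * k + 1))
        k += 1
    return 0.5 + total / sqrt(2.0 * pi)
```

Because the polynomial is truncated, it is accurate near 0 and degrades as |x| grows, which is one reason to consider other approximations for the tails.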
K-means is probably the simplest clustering algorithm, and it can be implemented easily in the built-in TI-Basic. Source data can be edited in the default spreadsheet editor. A sample data set with two attributes, length and weight, for three pet types (dogs, cats, and rabbits) is used for testing. Three centroids are used, one per cluster.
The following is a scatter plot made with the default TI Nspire Data & Statistics application.
The same plot after defining weight and length.
Running the k-means program. The dist_to_cluster matrix holds, for each row of data, the distance to each centroid: the first to third columns give the distance to the corresponding centroid, and the last column is the assigned cluster, chosen as the centroid with the minimum distance. The last command on the screen transposes the matrix so that the fourth column can be fetched as a row and pasted back into the spreadsheet for comparison.
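The layout of the dist_to_cluster matrix can be illustrated with a short Python sketch. It is not a port of the TI-Basic program, which is not shown in the text. It assumes Euclidean distance, caller-supplied starting centroids, and clusters numbered from 1. The function names assign and kmeans are hypothetical, and the points in the usage lines are made up for illustration, not the pet data from the spreadsheet.

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign(points, centroids):
    """Build rows of [d1, ..., dk, cluster], like the dist_to_cluster matrix."""
    table = []
    for p in points:
        d = [dist(p, c) for c in centroids]
        # The last column is the cluster with the minimum distance (1-based).
        table.append(d + [d.index(min(d)) + 1])
    return table

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        table = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, row in zip(points, table) if row[-1] == j + 1]
            if members:
                # New centroid is the mean of the member points, per attribute.
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Made-up illustrative points (length, weight) and starting centroids.
sample = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
final_centroids, dist_to_cluster = kmeans(sample, [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)])
labels = [row[-1] for row in dist_to_cluster]  # the "fourth column" as a row
print(labels)
```

Extracting the last element of every row plays the same role as transposing the matrix and taking the fourth column on the calculator.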
A comparison chart produced with R. The default plot from R looks better than the Nspire’s, and the labels are grouped automatically by the cluster found.