Tag Archives: Nelder-Mead

Logistic Regression – from Nspire to R to Theano

Logistic regression is a very powerful tool for classification and prediction. It works very well with linearly separable problems. This installment recaps its practical implementation, from the traditional maximum likelihood perspective to the more machine-learning-oriented neural network approach, and from a handheld calculator to GPU cores.

The heart of the logistic regression model is the logistic function. It takes in any real value and returns a value in the range from 0 to 1, which is ideal for a binary classifier. The following is a graph of this function.
theanologistic1
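As a minimal illustration, the function can be written in a few lines of Python (NumPy used here for vectorized evaluation):

```python
import numpy as np

def logistic(z):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(0.0))   # 0.5, the midpoint of the curve
print(logistic(-6.0))  # close to 0
print(logistic(6.0))   # close to 1
```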

TI Nspire

In the TI Nspire calculator, logistic regression is provided as a built-in function but is limited to a single variable. For multivariate problems, custom programming is required to apply optimization techniques that determine the coefficients of the regression model. One such approach, shown below, is the Nelder-Mead method implemented as a TI Nspire program.

Suppose a data set from university admission records has four attributes (independent variables: SAT score, GPA, Interview score, Aptitude score) and one outcome ("Admission") as the dependent variable.
theano-new1

Through the use of a Nelder-Mead program, the logistic function is first defined as l. It takes the regression coefficients (a1, a2, a3, a4, b), the dependent variable (s), and the independent variables (x1, x2, x3, x4), and simply returns the logistic probability. Next, the function to optimize in the Nelder-Mead program is defined as nmfunc. This is the likelihood function built on the logistic function. Since Nelder-Mead is a minimization algorithm, the negative of this function is taken. On completion of the program run, the regression coefficients in the result matrix are available for prediction, as in the following case of a sample applicant with [SAT=1500, GPA=3, Interview=8, Aptitude=60].

theanologistic2(nspire1)
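For comparison, the same maximum likelihood fit can be sketched in Python with SciPy's Nelder-Mead implementation. The data below are made-up placeholders standing in for the admission records, not the actual data set, and in practice the features would typically be scaled first:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder admission data: columns are SAT, GPA, Interview, Aptitude
X = np.array([[1400, 3.2, 7, 55],
              [1500, 3.8, 9, 70],
              [1100, 2.5, 4, 40],
              [1300, 3.0, 6, 60],
              [1600, 3.9, 8, 75]], dtype=float)
y = np.array([1, 1, 0, 0, 1], dtype=float)  # admission outcome

def neg_log_likelihood(theta):
    a, b = theta[:4], theta[4]
    p = 1.0 / (1.0 + np.exp(-(X @ a + b)))      # logistic probability
    p = np.clip(p, 1e-12, 1 - 1e-12)            # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Nelder-Mead minimizes, so the negative log-likelihood is passed in,
# mirroring the negated likelihood used in the Nspire program.
result = minimize(neg_log_likelihood, x0=np.zeros(5), method="Nelder-Mead")
a, b = result.x[:4], result.x[4]

# Predict for a sample applicant [SAT=1500, GPA=3, Interview=8, Aptitude=60]
sample = np.array([1500, 3, 8, 60], dtype=float)
print(1.0 / (1.0 + np.exp(-(sample @ a + b))))
```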

R

In R, a sophisticated statistical package, the calculation is much simpler. For the sample case above, it takes just a few lines of commands to invoke the built-in logistic model.

theano-new2

Theano

Apart from the traditional methods, modern advances in computing have made it possible to solve these problems much more efficiently with neural networks coupled with specialized hardware such as GPUs, especially on huge volumes of data. The Python library Theano supports and enriches these calculations through optimization and symbolic expression evaluation. It also features compiler capabilities for CUDA and integrates computer algebra into Python.

One of the examples that comes with the Theano documentation depicts the application of logistic regression to showcase various Theano features. It first initializes a random set of sample inputs and outcomes using numpy.random. The regression model is then created by defining the expressions required for the logistic model, including the logistic function and the likelihood function. Lastly, using the theano.function method, the symbolic expression graph coded for the regression model is compiled into callable objects for training the network and for subsequent prediction.

theanologistic5(theano1)
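For readers without the screenshots, a condensed sketch in the spirit of the documentation example looks roughly like the following; the data sizes, learning rate, and number of training steps here are assumptions, not the exact values used above:

```python
import numpy
import theano
import theano.tensor as T

rng = numpy.random
N, feats, steps = 400, 784, 10000            # assumed sizes and step count
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

x = T.dmatrix("x")
y = T.dvector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))      # logistic function
prediction = p_1 > 0.5
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)   # cross-entropy likelihood
cost = xent.mean() + 0.01 * (w ** 2).sum()   # plus L2 regularization
gw, gb = T.grad(cost, [w, b])

# Compile the symbolic graph into callable training and prediction functions
train = theano.function(inputs=[x, y], outputs=[prediction, xent],
                        updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

for i in range(steps):
    pred, err = train(D[0], D[1])
```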

A nice feature of Theano is the pretty printing of the expression model in a tree-like text format, a feel-like-home reminiscence of my days reading SQL query plans for tuning database queries.

theanologistic5(theano2).PNG
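For reference, output like the above can be produced with Theano's printing helpers, assuming the prediction expression and compiled predict function from the sketch earlier:

```python
import theano.printing

# Compact mathematical form of the symbolic expression
print(theano.printing.pprint(prediction))

# Indented tree of the expression graph, read much like a query plan
theano.printing.debugprint(predict)
```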

 

Experimenting with convergence time in neural network models

After setting up Keras and Theano and running some basic benchmarks on the Nvidia GPU, the next step in getting a taste of neural networks through these deep learning libraries is to compare them with a model solving the same problem (an XOR classification) on a modern calculator, the TI Nspire, using the Nelder-Mead algorithm to converge the neural network weights.

A sample SGD configuration in Keras on Theano with 30000 iterations converged in around 84 seconds, while the TI Nspire completed with comparable results in 19 seconds. This is not a fair comparison of course, as there are many parameters that can be tuned in the model.
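The exact Keras settings used for the timing are not reproduced here, but a minimal XOR model along these lines illustrates the setup; the layer sizes, learning rate, loss, and iteration count are assumptions, and parameter names vary slightly across Keras versions:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential()
model.add(Dense(2, input_dim=2, activation='sigmoid'))  # hidden layer
model.add(Dense(1, activation='sigmoid'))               # output layer
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.5))

model.fit(X, y, epochs=30000, verbose=0)   # 30000 iterations, as in the timing above
print(model.predict(X))                    # should approach [0, 1, 1, 0]
```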

keras-nspire4

 

Training neural network using Nelder-Mead algorithm on TI Nspire

In this installment the Nelder-Mead method is used to train a simple neural network for the XOR problem. The network consists of 2 inputs, 1 output, and 2 hidden layers, and is fully connected. In mainstream practice, back propagation and other gradient-based or evolutionary algorithms are much more popular for training neural networks on real-world problems. Nelder-Mead is used here just out of curiosity, to see how this general optimization routine performs in a neural network setting on the TI Nspire.
neural-network-xor1

The sigmoid function is declared as a TI Nspire function.
neural-network-xor2

For the XOR problem, the inputs are defined as two lists, and the expected outputs in another.
neural-network-xor3

The activation functions for each neuron are declared.
neural-network-xor4

To train the network, the sum-of-squared-error function is fed into the Nelder-Mead algorithm for minimization. Random numbers are used as the initial parameters.
neural-network-xor6
neural-network-xor7
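The same training scheme can be sketched in Python with SciPy's Nelder-Mead routine; this sketch assumes a single hidden layer of two sigmoid neurons, which may differ from the exact topology in the screenshots:

```python
import numpy as np
from scipy.optimize import minimize

# XOR inputs and expected outputs, as two input lists and one output list
x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
t  = np.array([0, 1, 1, 0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(p, a, b):
    # p = [w1..w6, b1, b2, b3]: weights and biases of an assumed 2-2-1 network
    h1 = sigmoid(p[0] * a + p[1] * b + p[6])
    h2 = sigmoid(p[2] * a + p[3] * b + p[7])
    return sigmoid(p[4] * h1 + p[5] * h2 + p[8])

def sse(p):
    # Sum of squared error between network output and expected output
    return np.sum((forward(p, x1, x2) - t) ** 2)

# Random initial parameters, as in the Nspire program
p0 = np.random.uniform(-1, 1, size=9)
result = minimize(sse, p0, method="Nelder-Mead", options={"maxiter": 20000})

print(result.x)                    # trained weights and biases
print(forward(result.x, x1, x2))   # should approach [0, 1, 1, 0]
```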

Finally, the resulting weights and biases are obtained from running the Nelder-Mead program.
neural-network-xor8

The following graph compares the output of the Nelder-Mead-trained XOR neural network against the expected values.
neural-network-xor9

 

Optimizing optimization code in TI-84 program

As shown in a previous installment on running Nelder-Mead optimization of the Rosenbrock function on the TI-84, its hardware is not up to the challenge for problems of this kind. It is nevertheless an interesting case for learning how to write optimized code, since this program took a considerable amount of time (minutes) when most TI-84 programs complete in seconds.

The problem used in the setting is the Rosenbrock function:
nelder-mead-84-code-optimization3
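In its standard two-variable form (which the screenshot above is assumed to show), the function is f(x, y) = (1 - x)² + 100(y - x²)², with a global minimum of 0 at (1, 1).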

Two techniques were used to try to optimize the code. The first is to in-line the function (in this case the Rosenbrock function). In the original version, this function is stored as a string and evaluated whenever necessary using the TI-84 expr() function. The idea of in-lining, common in other languages, is borrowed here to see if it works on the TI-84. As shown in line 44 below, the Rosenbrock function replaces the original expr() call.

nelder-mead-84-code-optimization2b
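A rough Python analogue of the idea (not TI-84 code) is to time the same expression evaluated from a string, as expr() does, against the in-lined expression; the string version pays a parsing cost on every call, which is why a gain was expected even though the TI-84 run did not show one:

```python
import timeit

setup = "x, y = 1.2, 1.1"

# Function kept as a string and re-parsed on every call, like expr() on the TI-84
via_eval = timeit.timeit("eval('(1 - x)**2 + 100*(y - x**2)**2')",
                         setup=setup, number=100000)

# Same expression in-lined, so it is parsed only once
inlined = timeit.timeit("(1 - x)**2 + 100*(y - x**2)**2",
                        setup=setup, number=100000)

print(via_eval, inlined)
```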

Another technique is to remove unnecessary comments. For most modern languages this is never a concern, but on a simple calculator it is speculated that valuable processing time may be spent parsing comments that contribute nothing to the algorithm.
nelder-mead-84-code-optimization1

Both programs returned the same results. To compare the gain in speed, the same initial parameters and tolerance were used in both the original and the code-optimized program. Surprisingly, the in-lining did not yield any gain, while removing all the ":" characters used for indentation did improve the speed. The original program averaged 183 seconds, while the cleaned-up version averaged only 174 seconds, around 95% of the original run time.
nelder-mead-84-code-optimization4

 

 

 

Using Nelder-Mead to solve for Markov chain stationary distribution

Taking the same problem from previous installments, another approach using the Nelder-Mead algorithm is tried and successfully obtains the same answer. After formulating the equations to pass to the Nelder-Mead program in TI Nspire, which required the same algebraic manipulation as the previous linsolve() approach, another more intuitive formulation is tried. The second one works directly on the matrix in the minimization equation and therefore renders the algebraic manipulation unnecessary.

The first setting is as below.
markov_stationarydist_NM1

The second setting, which works directly on the matrix, is shown below.
markov_stationarydist_NM2
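A sketch of the second, direct formulation in Python with SciPy's Nelder-Mead is shown below; the transition matrix here is a made-up placeholder, since the actual chain from the earlier installments is only in the screenshots:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder 3-state transition matrix; rows sum to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

def objective(pi):
    # Directly measure how far pi is from satisfying pi·P = pi,
    # with a penalty keeping the entries a valid distribution.
    residual = pi @ P - pi
    penalty = (np.sum(pi) - 1.0) ** 2 + np.sum(np.minimum(pi, 0.0) ** 2)
    return np.sum(residual ** 2) + penalty

result = minimize(objective, x0=np.full(3, 1/3), method="Nelder-Mead")
print(result.x / np.sum(result.x))   # stationary distribution estimate
```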

Observations on effect of initial parameter of Nelder-Mead to solve linear problem

Out of curiosity, a test program is created to iterate over the initial parameters for the Nelder-Mead program in TI Nspire, to see if there are any effects or patterns in the outcome of the algorithm. The Nelder-Mead program is set to solve the following inequalities:

nmsimplexparmest1

The test case iterates the initial values x and y from 1 to 10, resulting in 100 tests. The absolute error of the Nelder-Mead result relative to the known optimal value (33) is plotted in the chart below.

nmsimplexparmest2

nmsimplexparmest3
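A sketch of the same test harness in Python is given below; the objective itself is the penalized problem shown in the screenshot above and is deliberately left as a parameter:

```python
import numpy as np
from scipy.optimize import minimize

def error_grid(objective, known_optimum):
    """Run Nelder-Mead from every integer starting point (x, y) with
    x, y in 1..10 and record the absolute error against the known optimum."""
    errors = np.zeros((10, 10))
    for i, x0 in enumerate(range(1, 11)):
        for j, y0 in enumerate(range(1, 11)):
            result = minimize(objective, np.array([x0, y0], dtype=float),
                              method="Nelder-Mead")
            errors[i, j] = abs(result.fun - known_optimum)
    return errors

# Usage: errors = error_grid(problem_function, 33), where problem_function is
# the penalized formulation from the screenshot above.
```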

Solving linear programming problem with Nelder-Mead method

For solving linear programming problems, the simplex method is often applied to search for the solution. The Nelder-Mead method, on the other hand, is mostly applied as a non-linear search technique. It would be interesting to see how well it handles a linear programming problem previously solved using the simplex method on the TI-84.

The Nelder-Mead method is run on the TI Nspire CX CAS with an NM program written in TI Basic. The program accepts arguments including the name of the function to maximize, as a string, and a list of initial parameters for the Nelder-Mead algorithm. The objective itself is declared as a piecewise function that returns the value of the function to maximize inside the feasible region while penalizing values that violate any constraint (the inequalities of the standard simplex formulation).
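A sketch of the same penalty idea in Python is given below; the objective and constraints here are hypothetical stand-ins (chosen so that the optimum sits at (200, 400), matching the solution quoted below), and the objective is negated because SciPy's Nelder-Mead minimizes:

```python
import numpy as np
from scipy.optimize import minimize

def objective(v):
    # Hypothetical stand-in LP: maximize 3x + 4y
    # subject to x + 2y <= 1000, 3x + y <= 1000, x >= 0, y >= 0
    x, y = v
    violation = (max(0.0, x + 2*y - 1000) + max(0.0, 3*x + y - 1000)
                 + max(0.0, -x) + max(0.0, -y))
    if violation > 0:
        return 1e6 + violation          # penalty for breaking any constraint
    return -(3*x + 4*y)                 # negate: Nelder-Mead minimizes

result = minimize(objective, x0=np.array([10.0, 10.0]), method="Nelder-Mead",
                  options={"maxiter": 5000})
print(result.x, -result.fun)            # expect a point close to (200, 400)
```

With a penalty wall like this, Nelder-Mead tends to stop near rather than exactly on the constrained optimum, so a restart or tighter tolerances may be needed to get closer.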

The previous TI-84 program using the simplex method obtained {200,400} as the solution. The Nelder-Mead program returned a solution very close to it.

nmsimplex