Tag Archives: logistic regression

Logistic Regression – from Nspire to R to Theano

Logistic regression is a very powerful tool for classification and prediction. It works very well with linearly separable problems. This installment will attempt to recap its practical implementation, from the traditional maximum likelihood perspective to the more machine-learning-style neural network approach, and from handheld calculators to GPU cores.

The heart of the logistic regression model is the logistic function 1/(1 + e^(-x)). It takes in any real value and returns a value in the range from 0 to 1, which is ideal for a binary classifier. The following is a graph of this function.
theanologistic1
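
In code, the function is a one-liner. A minimal Python sketch with numpy:

import numpy as np

def logistic(x):
    # the logistic (sigmoid) function: maps any real value into (0, 1)
    return 1 / (1 + np.exp(-x))

print(logistic(0))                  # 0.5 at the midpoint
print(logistic(-6), logistic(6))    # approaches 0 and 1 at the extremes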

TI Nspire

In the TI Nspire calculator, logistic regression is provided as a built-in function but is limited to a single variable. For multi-variable problems, custom programming is required to apply optimization techniques to determine the coefficients of the regression model. One such application, shown below, uses the Nelder-Mead method on the TI Nspire.

Suppose a data set from university admission records has four attributes (independent variables: SAT score, GPA, interview score, aptitude score) and one outcome (“Admission“) as the dependent variable.
theano-new1

Through the use of a Nelder-Mead program, the logistic function is first defined as l. It takes the regression coefficients (a1, a2, a3, a4, b), the dependent variable (s), and the independent variables (x1, x2, x3, x4), and simply returns the logistic probability. Next, the function to optimize in the Nelder-Mead program is defined as nmfunc. This is the likelihood function built on the logistic function; since Nelder-Mead is a minimization algorithm, the negative of this function is taken. On completion of the program run, the regression coefficients in the result matrix are available for prediction, as in the following case of a sample data point with [SAT=1500, GPA=3, Interview=8, Aptitude=60].

theanologistic2(nspire1)
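
The same fit can be reproduced off-calculator. Below is a minimal Python sketch using scipy's Nelder-Mead minimizer on the negative log-likelihood; the admission records and values are hypothetical stand-ins, not the actual data behind the screenshots.

import numpy as np
from scipy.optimize import minimize

# Hypothetical admission records: columns are SAT, GPA, interview, aptitude
X = np.array([[1400, 3.2, 7, 55],
              [1100, 2.4, 5, 40],
              [1520, 3.9, 9, 70],
              [1250, 3.0, 6, 52],
              [ 990, 2.1, 4, 35],
              [1450, 3.6, 8, 66],
              [1300, 3.1, 6, 50],
              [1380, 3.4, 7, 60]])
y = np.array([1, 0, 1, 1, 0, 0, 0, 1])  # admission outcomes

def neg_log_likelihood(theta):
    # theta = (a1, a2, a3, a4, b), as in the Nspire program
    z = X @ theta[:4] + theta[4]
    p = 1 / (1 + np.exp(-z))            # logistic probability
    p = np.clip(p, 1e-12, 1 - 1e-12)    # guard against log(0)
    # Nelder-Mead minimizes, so return the negative of the log-likelihood
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
a1, a2, a3, a4, b = res.x

# Predict a new applicant, e.g. SAT=1500, GPA=3, Interview=8, Aptitude=60
xnew = np.array([1500, 3, 8, 60])
print(1 / (1 + np.exp(-(xnew @ res.x[:4] + b))))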

R

In R, a sophisticated statistical package, the calculation is much simpler. For the sample case above, it takes just a few lines of commands to invoke the built-in logistic model.

theano-new2
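
For a rough Python equivalent of R's glm with the binomial family, statsmodels offers a similarly terse fit. The data below is randomly generated as a hypothetical stand-in for the admission records:

import numpy as np
import statsmodels.api as sm

# Synthetic stand-in data: 100 hypothetical records with 4 attributes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = (rng.random(100) < 1 / (1 + np.exp(-(X @ true_w)))).astype(int)

# Fit the logistic model by maximum likelihood, analogous to
# R's glm(admission ~ ., family = binomial)
model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.params)     # intercept and coefficients
print(model.summary())  # coefficient table with z-statistics and P-values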

Theano

Apart from the traditional methods, modern advances in computing have made possible neural networks coupled with specialized hardware, for example GPUs, for solving these problems much more efficiently, especially on huge volumes of data. The Python library Theano supports and enriches these calculations through optimization and symbolic expression evaluation. It also features compiler capabilities for CUDA and integrates computer algebra system features into Python.

One of the examples that comes with the Theano documentation depicts the application of logistic regression to showcase various Theano features. It first initializes a random set of data as the sample input and outcome using numpy.random. The regression model is then created by defining the expressions required for the logistic model, including the logistic function and the likelihood function. Lastly, by using the theano.function method, the symbolic expression graph coded for the regression model is compiled into callable objects for training the neural network and for subsequent prediction.

theanologistic5(theano1)
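
In outline, the tutorial code reads roughly as follows (a sketch from memory of the Theano documentation example; the screenshot above shows the authoritative version):

import numpy
import theano
import theano.tensor as T

rng = numpy.random

N, feats = 400, 784          # sample size and dimensionality
# random sample input and outcome, as in the tutorial
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

# symbolic variables and shared model state
x = T.dmatrix("x")
y = T.dvector("y")
w = theano.shared(rng.randn(feats), name="w")   # weights
b = theano.shared(0., name="b")                 # bias

# expression graph for the logistic model
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))         # logistic probability
prediction = p_1 > 0.5
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)   # cross-entropy loss
cost = xent.mean() + 0.01 * (w ** 2).sum()          # plus L2 regularization
gw, gb = T.grad(cost, [w, b])                       # symbolic gradients

# compile the symbolic graph into callable objects
train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

for i in range(10000):       # training loop
    pred, err = train(D[0], D[1])

# pretty-print the expression graph in tree-like text form
theano.printing.debugprint(prediction)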

A nice feature of Theano is the pretty printing of the expression model in a tree-like text format. This is a feel-like-home reminiscence of my days reading SQL query plans when tuning database queries.

theanologistic5(theano2).PNG

 


Stochastic Gradient Descent in R

Stochastic Gradient Descent (SGD) is an optimization method commonly used in machine learning, especially for neural networks. As the name implies, it is a stochastic variant of gradient descent, aimed at minimizing a function by following gradient estimates computed from one sample (or a small batch) at a time.

In R, there is an sgd package for this purpose. As a warm-up for the newly upgraded R and RStudio, it is taken as the target of a test drive.

R-sgd1

Running the documentation example.
R-sgd2

Running the included demo for logistic regression.

R-sgd3
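
For reference, the idea behind SGD can be sketched in a few lines of Python with numpy. This is a hypothetical minimal SGD loop for logistic regression, not the sgd package's actual implementation:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4 features, true weights known only to the generator
N = 1000
X = rng.normal(size=(N, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (rng.random(N) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

w = np.zeros(4)
lr = 0.1
for epoch in range(20):
    for i in rng.permutation(N):          # one randomly ordered sample per update
        p = 1 / (1 + np.exp(-(X[i] @ w))) # logistic prediction for this sample
        w -= lr * (p - y[i]) * X[i]       # gradient of the single-sample loss

print(w)   # should approach true_w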

Overclocking Casio fx-9860gII to the max with portable power bank

As discovered previously, the overclocking program ftune performed better when a USB cable was plugged in, but it was not clear back then whether the data link or the extra power supply contributed to the performance boost. It is now confirmed that the power supply alone will do the trick.

The test is carried out using a Casio Basic program running a parameter search for a logistic regression equation using the Nelder-Mead algorithm. A portable power bank is plugged into the USB port of the Casio fx-9860gII overclocked with ftune.

The tested portable USB power bank has a 4000 mAh capacity and 1 A output, and is fully charged. There are 4 blue LED indicator lights.
casiousbpoweroverclock1

The power bank is turned on.
casiousbpoweroverclock2

The test result showed that the program run with the power bank finished in 46 seconds, while the one without finished in 93 seconds, roughly twice as fast on external power.

Using a program in Casio fx-9860GII for matrix operations

To implement the Wald test, as previously done on the TI Nspire CX CAS, on the Casio 9860, some programming helps build the matrices more easily. In Casio Basic, lists and matrices are accessed by index, and by looping through For loops, values from formula calculations can be assigned to each element. This results in the same matrix constructed with the constructMat() function on the Nspire.

wald-casio3

The rest of the calculations are the same.

wald-casio1

wald-casio2

Wald Test for Logistic Regression

The Wald test can be demonstrated using the example from the previous post on the likelihood ratio test for logistic regression. Again, assume a confidence level of 95%. The hypothesis setting is a little different because this test targets an individual parameter in the regression model:

Null hypothesis: A1=0
Alternative hypothesis: A1≠0

To test the hypothesis, the variance-covariance matrix of the coefficient estimates is obtained using the following equation; the Wald statistic for A1 is then its squared estimate divided by the corresponding diagonal entry (its variance):

(ma × mb × maᵀ)⁻¹

where ma and mb are matrices defined as below.

ma consists of the x1 and x2 variables as its first and second rows, and all 1s as the third row. In the TI Nspire, the colAugment function is convenient for constructing a matrix from multiple lists.

wald1

The next step involves determining the y hat values from the regression model for each data row.

wald2

Once determined, the matrix mb can be defined as below. It is a diagonal matrix whose entries correspond to y_hat × (1 - y_hat) for each data row. Notice how constructMat works with the piecewise expression for this diagonal matrix.

wald3

The calculation can then be performed. The final result, as shown in the second equation below, is then used to determine the P-value of the χ² distribution with 1 degree of freedom.

wald4

Since the P-value is less than 0.05, the conclusion is to reject the null hypothesis A1=0 and accept the alternative hypothesis A1≠0.
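
For reference, the whole Wald computation can be sketched in Python with numpy and scipy; the data arrays and fitted coefficients below are hypothetical stand-ins for the example's values:

import numpy as np
from scipy.stats import chi2

# Hypothetical data and fitted coefficients from the regression model
x1 = np.array([2.0, 3.5, 1.0, 4.2, 2.8, 3.9, 1.5, 3.1])
x2 = np.array([1.0, 0.5, 2.2, 0.8, 1.9, 0.3, 2.5, 1.1])
y  = np.array([0,   1,   0,   1,   1,   1,   0,   1  ])
a1, a2, b = 1.2, -0.8, -1.5    # stand-ins for the Nelder-Mead estimates

# ma: x1 and x2 as first and second rows, all 1s as the third row
ma = np.vstack([x1, x2, np.ones_like(x1)])

# y hat from the regression model for each data row
y_hat = 1 / (1 + np.exp(-(a1 * x1 + a2 * x2 + b)))

# mb: diagonal matrix of y_hat * (1 - y_hat)
mb = np.diag(y_hat * (1 - y_hat))

# variance-covariance matrix of the estimates
cov = np.linalg.inv(ma @ mb @ ma.T)

# Wald statistic for A1 and its chi-square P-value (1 degree of freedom)
wald = a1 ** 2 / cov[0, 0]
print(wald, chi2.sf(wald, df=1))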

Likelihood ratio test for Logistic Regression

Using a previous example on logistic regression, the likelihood ratio can be calculated to estimate the goodness of fit of the parameters in the regression model. Assume a confidence level of 95% and the hypothesis setting below:

Null hypothesis: A1=A2=0.
Alternative hypothesis: Not A1=A2=0.

Firstly, recall the parameters of the logistic regression are obtained by the Nelder-Mead method:

lr-likelihood-ratio1

Since the maximum likelihood approach is used, the maximum likelihood value, L, is obtained:

lr-likelihood-ratio2

The success and failure counts are obtained, and then substituted into the equation below together with the maximum likelihood.

lr-likelihood-ratio3

Finally, the χ² P-value for the statistic obtained above is determined, with 2 degrees of freedom.

lr-likelihood-ratio4

Since the P-value is smaller than 0.05, the conclusion is to reject the null hypothesis A1=A2=0 and accept the alternative hypothesis.
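
For reference, the likelihood ratio procedure can be sketched in Python; the maximized log-likelihood and the success and failure counts below are hypothetical stand-ins for the values in the screenshots, and the null model is taken to be the usual intercept-only model:

import numpy as np
from scipy.stats import chi2

# Stand-ins for the values read off the calculator
log_L = -12.3        # maximized log-likelihood of the fitted model (hypothetical)
s, f = 14, 6         # success and failure counts (hypothetical)
n = s + f

# Log-likelihood of the null model (intercept only: constant probability s/n)
log_L0 = s * np.log(s / n) + f * np.log(f / n)

# Likelihood ratio statistic, chi-square with 2 degrees of freedom (A1 and A2)
G = -2 * (log_L0 - log_L)
print(G, chi2.sf(G, df=2))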