05 Feb 2017, 17:00

Improved Logistic Classifier in C#

Last weekend I implemented a logistic classifier for use in a side-project I’ve been working on for the last few months (stay tuned on that). For anyone reading along (probably no one), you may recall I was quite enamoured with how cleanly C# allowed the mathematics to be expressed in code.

Unfortunately, the implementation is annoyingly slow for the quantity of data I want to use it on, so this weekend I’ve built an optimised version. It’s approximately 20 times faster. Here’s what makes it faster:

  1. Unnecessary heap allocations and copying of data are avoided. I would guess this is the primary optimisation, accounting for roughly a 10x improvement in performance. LINQ is the culprit here, and the functional niceties it provides have been replaced by iterative loops (see the first sketch after this list).

  2. The cost function is never calculated (only its gradient w.r.t. $\theta$). I would guess this accounts for a further 2x improvement. I was using the cost function to adaptively decrease the alpha step size, and also to exit the training loop early once a good enough solution was found. I still wanted to decrease alpha as the current solution nears a local minimum, as I’ve found this results in faster training times. However, this is now done by a fixed proportion each cycle, specified using the halfLives parameter (see the second sketch after this list). Also, training now always executes a fixed number of iterations.

  3. Array indexing is avoided in favour of unsafe code blocks and pointer arithmetic. I would guess this only accounts for a small improvement, as I suspect standard loop constructs are heavily optimised by the C# compiler.
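
To illustrate the first point, the change is roughly of this shape (a hypothetical sketch, not the library’s actual code): the LINQ version allocates enumerator objects on the heap on every call, whereas the plain loop allocates nothing.

    using System.Linq;

    // LINQ version: concise, but each call heap-allocates enumerator objects.
    static double DotLinq(double[] theta, double[] x)
    {
        return theta.Zip(x, (t, xi) => t * xi).Sum();
    }

    // Loop version: the same dot product with no allocations at all.
    static double DotLoop(double[] theta, double[] x)
    {
        double h = 0.0;
        for (int j = 0; j < theta.Length; ++j)
        {
            h += theta[j] * x[j];
        }
        return h;
    }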
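
And for the second point, the fixed-proportion decay might look something like this (again just a sketch, under my reading of halfLives as "the number of times alpha halves over the whole run"):

    // Hypothetical sketch: shrink alpha by a fixed proportion each iteration
    // so that it halves `halfLives` times over the full run.
    double alpha = 0.5;
    const int iterations = 100000;
    const double halfLives = 3.0;
    double decay = Math.Pow(0.5, halfLives / iterations);

    for (int iter = 0; iter < iterations; ++iter)
    {
        // ... compute the gradient and step theta by -alpha * gradient ...
        alpha *= decay;
    }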

Finally, I’ve added a new feature. Some of the classification tasks I have are pretty unbalanced (relative class frequency is 10:1 in the worst case), so I wanted to play with allowing some of the training data to have more impact on the optimisation than others. There is a new parameter weightClass0 which specifies the weight of class 0 training data relative to class 1.
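
For example, one plausible way to pick weightClass0 (a sketch, not something the library does for you) is to make the two classes contribute equal total weight to the gradient:

    // With a 10:1 imbalance in favour of class 0, this yields
    // weightClass0 = 0.1, so each class contributes equally overall.
    int class1Count = ys.Count(y => y > 0.5);   // requires System.Linq
    int class0Count = ys.Length - class1Count;
    double weightClass0 = (double)class1Count / class0Count;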

Parameters that work well for my data set are:

  • iterations ~ 100k
  • regularisation constant (lambda) ~ 0 (I’ve not found overfitting to be a problem)
  • theta - zero vector
  • alpha ~ 0.5
  • halfLives ~ 3

Code is available on GitHub and I’ve packaged up the library and put it on NuGet. Most of the calculation is done in the Gradient method, the new implementation of which is shown below:

    // Computes the gradient of the weighted, regularised logistic cost w.r.t.
    // theta, writing the result into thetaGradientOut. Each row of xs is one
    // training example and ys holds the 0/1 labels.
    unsafe public static void FastGradient(double[,] xs, double[] ys, double[] theta, double lambda, double weightClass0, double[] thetaGradientOut)
    {
        int thetaLen = theta.Length;

        // Pin the arrays so the GC cannot move them while raw pointers are held.
        fixed (double* theta_fixed = theta)
        fixed (double* ys_fixed = ys)
        fixed (double* xs_fixed = xs)
        fixed (double* thetaGradient_fixed = thetaGradientOut)
        {
            double* ysPtr = ys_fixed;
            double* xsPtr = xs_fixed;
            double* resultPtr = thetaGradient_fixed;
            double* thetaPtr = theta_fixed;

            // Zero the output gradient.
            for (int j=0; j<thetaLen; ++j)
            {
                *(resultPtr++) = 0.0;
            }

            double weightTotal = 0.0;
            for (int i=0; i<ys.Length; ++i)
            {
                // Class 1 examples get unit weight; class 0 examples are
                // scaled by weightClass0.
                double weight = ys[i] > 0.5 ? 1.0 : weightClass0;
                weightTotal += weight;

                // h = theta . x_i; xsPtr walks along row i of xs, and its
                // start is saved for the accumulation pass below.
                double* xsPtr_save = xsPtr;
                double h = 0;
                thetaPtr = theta_fixed;
                for (int j=0; j<thetaLen; ++j)
                {
                    h += *(thetaPtr++) * *(xsPtr++);
                }
                double delta = Sigmoid(h) - *(ysPtr++);

                // Accumulate weight * (sigmoid(theta . x_i) - y_i) * x_i.
                resultPtr = thetaGradient_fixed;
                for (int j=0; j<thetaLen; ++j)
                {
                    *(resultPtr++) += weight * delta * *(xsPtr_save++);
                }
            }

            // Normalise by the total weight, and add the regularisation term to
            // every component except the first (the bias term is not regularised).
            resultPtr = thetaGradient_fixed;
            *(resultPtr++) /= weightTotal;
            thetaPtr = theta_fixed + 1;
            for (int j=1; j<thetaLen; ++j)
            {
                *resultPtr += lambda * *(thetaPtr++);
                *(resultPtr++) /= weightTotal;
            }
        }
    }
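
Sigmoid here is just the standard logistic function, $\sigma(h) = 1 / (1 + e^{-h})$. A single gradient-descent step using this method might look like the following (a hypothetical driver sketch; the variable names are assumptions, not the library’s API):

    // One gradient-descent step. Each row of xs is assumed to start with a
    // constant 1.0 bias column, and theta and thetaGradient have one entry
    // per column of xs.
    double[] thetaGradient = new double[theta.Length];
    FastGradient(xs, ys, theta, lambda, weightClass0, thetaGradient);
    for (int j = 0; j < theta.Length; ++j)
    {
        theta[j] -= alpha * thetaGradient[j];
    }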

That’s it!

Again, just wanted to put out a quick writeup of this - time to get back to using it in my super-exciting side project.