Monday, 24 June 2013

Generating Credit Card Numbers/Tokens for Testing - Fast

An ex-colleague of mine who works as a team lead/principle developer in a large organisation keeps his own blog.  He recently posted about one of our ancient problems of validating credit card numbers in .NET using the well known Luhn check algorithm. In the post he introduces the theory and also introduces a card number/token generation algorithm to create credit card numbers. It was a problem which I was once asked to look at but did not have the time so left it. It had been picked up by different folk in the organisation, finished off for their purposes (including DB Unit Tests) and I saw it again for the first-time in two years the other day. It works fine, but then within it I saw the mathematical equivalent of a 'code smell' and it was this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public string GenerateCardToken()
{
    string cardNum = string.Empty;
 
    for (int j = 0; j < random.Next(3) + 13; j++)
    {
        cardNum += random.Next(0, 10).ToString();
    }
 
    int c = 0;
    string fullNum;
 
    while (!(fullNum = string.Format("{0}{1}", cardNum, c)).LuhnCheck())
    {
        c++;
    }
 
    return fullNum;
}

It made me think that I needed to look into this, as I really don't like the way it generates the check digit at  lines 13 to 16 inclusive.

Coders out there who are not optimisation geeks will ask what is wrong with that?

The problem with this part of the algorithm, which locates the check digit, is that it is a form of trial and error when there is a mathematical solution to the problem. Trial and error elements which simply increment a variable result in wasted loop cycles, wasted time and wasted computing. In the case of small scale projects or unit testing, this isn't too much of a problem until you want to make your tests fast and have lots of credit card data you need to generate.

Stephen did a good job of showing you the basics of Luhn. So I won't cover them again. Plus, he also shows his experience of generating valid numbers from partial cards, so that you can pretend to be a VISA or MasterCard customer to test that any card identification algorithms work. Functionally, this is further than I will go today. Hence, I would recommend reading his post if you want a more comprehensive overview of the subject.

I also won't concentrate on code level optimisations, use of LINQ, using Bitwise arithmetic, data types or anything else, as there are many good resources out there already for it. I am interested in finding a more performant way of doing this that against this algorithm. So if you are ready, let's up the anti.

The Maths

Firstly, given we are constraining ourselves to modulo arithmetic, once you have calculated the Luhn sums then the check digit is unique! To prove it, consider the following sequence of 16 digits, which form some card number C. The 15th x is shown at the front of the sequence and is the leftmost digit on a credit card.


A valid credit card number is anything that passes a Luhn check. The Luhn check is also pretty simple. Summing the result of doubling the odd digits (and adding the resulting digits if necessary) and then calculating the check digit by finding the difference between 10 and the units in the result.

What's the problem?

To optimise this, there are two steps here that can cause coders some problems. The first is the check for whether or not the doubling has resulted in a two digit number such as 14 (in which case the Luhn sum would  include 1 + 4 = 5) or not. Well, there is an elegant solution and that is that for any two digit number, if you modulo the result with 9, you get the two digit addition. Try it:

11 = 2 (mod 9)
34 = 7 (mod 9)
19 = 1 (mod 9)
...

Why does this work?

In number theory this works because any number can be expressed as a sum of some parts (units, tens, hundreds...). You have definitely been introduced to this before, I guarantee it.. unless you have never been educated in your life and if not, how are you reading this?

Because of that, 28 can be expressed as:

28 = (2)(10) + (8)(1)

And this generalises to any number 'ab' being expressed as:

'ab' =  10a + b

The Luhn algorithm adds the digits together, hence the sum is a + b. This then means that the difference between them is:

'ab' - (a + b) = 10a + b - a - b = 9a

What this means is that every single two digit number you have, which requires the summing of the two digits when subtracted from the original number (which is effectively what the Luhn check does to get the remainder) is a multiple of 9. It's always a multiple of 9. If it is always a multiple of 9, then taking the modulus of the original number will give you the remainder relative to that division by 9. So the mod is all we have to do once we have doubled the odd numbers in the zero based sequence.

For those I have worked with in my time, who I have played a few 'mathemagical' tricks on, they may recognise this from one of my mind-reading tricks. I hope you can see that maths is more than just puzzle solving! Mark my words, the nuclear bunker I have full of a 5 years supply of baked beans will also come in handy one day! :-D

So that's that one then, what about the check digit?

The check digit isn't really any harder. All you have to do is find the next multiple of 10 up from your Luhn sum of the digits 0 to 14. This can be by the use of the units or as I preferred to do it, multiply the sum by 9 and take the modulo 10 of that number. This becomes the 16th digit (at position 15). After all, that is how the check digit is calculated. It ultimately becomes the following, where L is a Luhn operator:



I know you don't like these symbols, so some code for you.

        public string GenerateCardTokenOptimised()
        {
            int[] checkArray = new int[15];
            
            var cardNum = new int[16];
 
            for (int d = 14; d >= 0; d--)
            {
                cardNum[d] = _random.Next(0, 9);
                checkArray[d] = ( cardNum[d] * (((d+1)%2)+1)) % 9;
            }
 
            cardNum[15] = ( checkArray.Sum() * 9 ) % 10;
 
            var sb = new StringBuilder(); 
 
            for (int d = 0; d < 16; d++)
            {
                sb.Append(cardNum[d].ToString());
            }
 
            return sb.ToString();
        }


That's it. Nothing more nothing less. I have tried to keep it fairly readable, but as I have implied, you can definitely make improvements, especially to the code quality and method name which is very poor.

How does it compare?

Method

Created a class library and placed both pieced of code within it. Developed a test project to test the generation of 1,000 card numbers. The tick times were taken and placed in a string builder to output to two separate files at the end of the run. This was repeated 6 times in total with the runs in either order (optimised first then unoptimised first).

Results

Pertinent unit-tests passed

Run No Optimised Unoptimised Speed Increase
1 20067 180036 8.971744655
2 30012 180025 5.99843396
3 30065 220097 7.320705139
4 19994 230033 11.50510153
5 19990 220012 11.00610305
6 30054 220014 7.320622879


Conclusion

Well, I think we can see from the above that the more optimised design developed though a solid and quite simply mathematical process can deliver benefits that far out-weigh straight coding. This basic process, neglecting code optimisations, gained increases of between nearly 6 and 11.5 times are possible if we sit down and think through the problem. Companies such as Google look for developers who can problem solve to this degree as they rely on their systems to be fast. I tend to prefer to bear in mind that my unit tests will also be running with other people's tests and the faster we can generate sets of data, the faster out tests will run and the faster our feedback loops will be (note, this can be used for common testing too).





0 comments:

Post a Comment

Whadda ya say?