
Monday, 24 June 2013

Generating Credit Card Numbers/Tokens for Testing - Fast

An ex-colleague of mine who works as a team lead/principal developer in a large organisation keeps his own blog. He recently posted about one of our ancient problems of validating credit card numbers in .NET using the well-known Luhn check algorithm. In the post he introduces the theory and also introduces a card number/token generation algorithm to create credit card numbers. It was a problem which I was once asked to look at but did not have the time, so I left it. It had been picked up by different folk in the organisation, finished off for their purposes (including DB Unit Tests) and I saw it again for the first time in two years the other day. It works fine, but then within it I saw the mathematical equivalent of a 'code smell' and it was this:

public string GenerateCardToken()
{
    string cardNum = string.Empty;

    for (int j = 0; j < random.Next(3) + 13; j++) // the bound is re-rolled on every pass
    {
        cardNum += random.Next(0, 10).ToString(); // one random digit, 0 to 9
    }

    int c = 0;
    string fullNum;

    while (!(fullNum = string.Format("{0}{1}", cardNum, c)).LuhnCheck()) // try c = 0, 1, 2, ...
    {
        c++;
    }

    return fullNum;
}

It made me think that I needed to look into this, as I really don't like the way it generates the check digit in the while loop at lines 13 to 16 inclusive.

Coders out there who are not optimisation geeks will ask what is wrong with that?

The problem with this part of the algorithm, which locates the check digit, is that it is a form of trial and error when there is a mathematical solution to the problem. Trial and error elements which simply increment a variable result in wasted loop cycles, wasted time and wasted computing. In the case of small scale projects or unit testing, this isn't too much of a problem until you want to make your tests fast and have lots of credit card data you need to generate.

Stephen did a good job of showing you the basics of Luhn. So I won't cover them again. Plus, he also shows his experience of generating valid numbers from partial cards, so that you can pretend to be a VISA or MasterCard customer to test that any card identification algorithms work. Functionally, this is further than I will go today. Hence, I would recommend reading his post if you want a more comprehensive overview of the subject.

I also won't concentrate on code level optimisations, use of LINQ, bitwise arithmetic, data types or anything else, as there are many good resources out there already for that. I am interested in finding a more performant way of doing this than the algorithm above. So if you are ready, let's up the ante.

The Maths

Firstly, given we are constraining ourselves to modulo arithmetic, once you have calculated the Luhn sums the check digit is unique! To prove it, consider the following sequence of 16 digits, which form some card number C. The digit x_15 is shown at the front of the sequence and is the leftmost digit on a credit card.

$$C = x_{15}\,x_{14}\,x_{13}\,\dots\,x_{1}\,x_{0}$$

A valid credit card number is anything that passes a Luhn check. The Luhn check is also pretty simple: sum the digits, doubling the odd digits first (and adding the two resulting digits together where the doubling gives a two digit number), then calculate the check digit by finding the difference between 10 and the units of the result (a result ending in 0 gives a check digit of 0).
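
The check described above can be sketched as a small C# extension method. The name LuhnCheck mirrors the call in the listing earlier, but the original implementation is not shown in this post, so the body below (and the class name LuhnExtensions) is an assumption rather than the code that was actually used.

```csharp
using System;

public static class LuhnExtensions
{
    // Returns true when the digit string passes the Luhn check.
    // Hypothetical stand-in for the LuhnCheck() extension used earlier.
    public static bool LuhnCheck(this string number)
    {
        if (string.IsNullOrEmpty(number))
        {
            return false;
        }

        int sum = 0;
        bool doubleIt = false; // the rightmost digit (the check digit) is not doubled

        // Walk from the check digit leftwards, doubling every second digit.
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char ch = number[i];
            if (ch < '0' || ch > '9')
            {
                return false; // only plain digits are accepted in this sketch
            }

            int value = ch - '0';
            if (doubleIt)
            {
                value *= 2;
                if (value > 9)
                {
                    value -= 9; // same as adding the two digits, e.g. 14 -> 1 + 4 = 5
                }
            }

            sum += value;
            doubleIt = !doubleIt;
        }

        return sum % 10 == 0;
    }
}
```

With this in scope, "4111111111111111".LuhnCheck() returns true, and changing the last digit to any other value makes it return false.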

What's the problem?

To optimise this, there are two steps here that can cause coders some problems. The first is the check for whether or not the doubling has resulted in a two digit number such as 14 (in which case the Luhn sum would include 1 + 4 = 5). Well, there is an elegant solution: for any two digit number whose digits add up to less than 9, taking the number modulo 9 gives you the two digit addition. Try it:

11 ≡ 2 (mod 9)
34 ≡ 7 (mod 9)
14 ≡ 5 (mod 9)
...

The one case to watch is a multiple of 9: doubling a 9 gives 18, whose digits add up to 9 while 18 modulo 9 is 0 (and an undoubled 9 behaves the same way), so that case has to be handled separately.

Why does this work?

In number theory this works because any number can be expressed as a sum of its parts (units, tens, hundreds...). You have definitely been introduced to this before, I guarantee it... unless you have never been educated in your life and, if not, how are you reading this?

Because of that, 28 can be expressed as:

28 = (2)(10) + (8)(1)

And this generalises to any number 'ab' being expressed as:

'ab' =  10a + b

The Luhn algorithm adds the digits together, hence the sum is a + b. This then means that the difference between them is:

'ab' - (a + b) = 10a + b - a - b = 9a

What this means is that for every two digit number, the difference between the number and the sum of its two digits is a multiple of 9. It's always a multiple of 9. The number and its digit sum therefore leave the same remainder when divided by 9, so whenever the digit sum is less than 9, taking the original number modulo 9 gives you the digit sum directly. Apart from the multiple-of-9 case (9 and 18, where the digit sum is 9 but the remainder is 0), the mod is all we have to do once we have doubled the odd digits in the zero based sequence.
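
To make the argument concrete, the sketch below puts the digit sum next to two modulo-9 shortcuts for the values that doubling a single digit can produce (0 to 18). The class and method names are illustrative only and do not come from the original code.

```csharp
using System;

public static class DigitSumDemo
{
    // Adds the tens and units digits of a value between 0 and 99.
    public static int DigitSum(int value)
    {
        return (value / 10) + (value % 10);
    }

    // Plain remainder: equals the digit sum only while that sum is below 9.
    public static int PlainMod9(int value)
    {
        return value % 9;
    }

    // Shifted remainder: for 0 to 18 this equals the digit sum,
    // because it maps 9 and 18 to 9 rather than 0.
    public static int ShiftedMod9(int value)
    {
        return value == 0 ? 0 : 1 + (value - 1) % 9;
    }

    // One line of comparison output for a single value.
    public static string Describe(int value)
    {
        return string.Format("{0}: digit sum {1}, % 9 gives {2}, shifted gives {3}",
            value, DigitSum(value), PlainMod9(value), ShiftedMod9(value));
    }
}
```

Calling DigitSumDemo.Describe for each doubled digit shows the plain remainder agreeing with the digit sum for 10, 12, 14 and 16 and disagreeing only at 18, while the shifted form agrees throughout.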

Those I have worked with in my time, who I have played a few 'mathemagical' tricks on, may recognise this from one of my mind-reading tricks. I hope you can see that maths is more than just puzzle solving! Mark my words, the nuclear bunker I have full of a 5 year supply of baked beans will also come in handy one day! :-D

So that's that one then, what about the check digit?

The check digit isn't really any harder. All you have to do is find the next multiple of 10 up from the Luhn sum of the first 15 digits (array positions 0 to 14 in the code below); the check digit is the gap between the two. This can be found from the units of the sum or, as I preferred to do it, by multiplying the sum by 9 and taking that number modulo 10 (9 is congruent to -1 modulo 10, so both give the same digit). This becomes the 16th digit (at position 15). After all, that is how the check digit is calculated. It ultimately becomes the following, where L is a Luhn operator:
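
Written out as a sketch, and assuming the indexing of the sequence introduced earlier (x_15 is the leftmost digit, so x_0 is the check digit, which the code stores at array position 15), the calculation is roughly:

```latex
L_i(x) =
\begin{cases}
x,      & i \text{ even} \\
2x,     & i \text{ odd and } 2x \le 9 \\
2x - 9, & i \text{ odd and } 2x > 9
\end{cases}
\qquad
S = \sum_{i=1}^{15} L_i(x_i)
\qquad
x_0 = (9S) \bmod 10
```

Because 9 is congruent to -1 modulo 10, this gives S + x_0 ≡ 0 (mod 10), and only one digit between 0 and 9 can satisfy that, which is why the check digit is unique.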



I know you don't like these symbols, so some code for you.

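        // Requires: using System.Linq; using System.Text;
        public string GenerateCardTokenOptimised()
        {
            // Luhn contribution of each of the first 15 digits.
            int[] checkArray = new int[15];

            var cardNum = new int[16];

            for (int d = 14; d >= 0; d--)
            {
                // The upper bound of Random.Next is exclusive, so (0, 10) yields 0 to 9.
                cardNum[d] = _random.Next(0, 10);

                // Even array positions (counting from the left) are doubled.
                int v = cardNum[d] * (((d + 1) % 2) + 1);

                // Digit sum of v; a plain v % 9 would turn 9 and 18 into 0.
                checkArray[d] = v == 0 ? 0 : 1 + (v - 1) % 9;
            }

            // Multiplying by 9 (-1 modulo 10) gives the gap to the next multiple of 10.
            cardNum[15] = (checkArray.Sum() * 9) % 10;

            var sb = new StringBuilder(16);

            for (int d = 0; d < 16; d++)
            {
                sb.Append(cardNum[d]);
            }

            return sb.ToString();
        }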


That's it. Nothing more, nothing less. I have tried to keep it fairly readable but, as I have implied, you can definitely make improvements, especially to the code quality and the method name, which is very poor.

How does it compare?

Method

Created a class library and placed both pieces of code within it. Developed a test project to test the generation of 1,000 card numbers. The tick times were taken and placed in a string builder to output to two separate files at the end of the run. This was repeated 6 times in total, with the runs in either order (optimised first, then unoptimised first).
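
A rough sketch of that kind of timing harness is shown below. The original test project is not shown in this post, so everything here is an assumption: the generator is passed in as a Func<string>, the names TokenTimer, TimeGeneration and SpeedIncrease are hypothetical, and Stopwatch ticks are used although the post does not say which tick source was measured.

```csharp
using System;
using System.Diagnostics;

public static class TokenTimer
{
    // Calls the generator 'count' times and returns the elapsed Stopwatch ticks.
    public static long TimeGeneration(Func<string> generator, int count)
    {
        if (generator == null)
        {
            throw new ArgumentNullException("generator");
        }

        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < count; i++)
        {
            generator();
        }

        stopwatch.Stop();
        return stopwatch.ElapsedTicks;
    }

    // Ratio of the two timings, as in a "speed increase" column.
    public static double SpeedIncrease(long unoptimisedTicks, long optimisedTicks)
    {
        return (double)unoptimisedTicks / optimisedTicks;
    }
}
```

Given an instance of whichever class holds the two methods, TokenTimer.TimeGeneration(instance.GenerateCardTokenOptimised, 1000) would time one run of 1,000 numbers, and dividing the unoptimised figure by the optimised one gives the ratio.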

Results

Pertinent unit-tests passed

Run No   Optimised (ticks)   Unoptimised (ticks)   Speed Increase (x)
1        20067               180036                8.971744655
2        30012               180025                5.99843396
3        30065               220097                7.320705139
4        19994               230033                11.50510153
5        19990               220012                11.00610305
6        30054               220014                7.320622879


Conclusion

Well, I think we can see from the above that the more optimised design, developed through a solid and quite simple mathematical process, can deliver benefits that far outweigh straight coding. Neglecting code optimisations, this basic process ran between nearly 6 and 11.5 times faster, and gains like that are possible if we sit down and think through the problem. Companies such as Google look for developers who can problem solve to this degree as they rely on their systems to be fast. I tend to prefer to bear in mind that my unit tests will also be running with other people's tests and the faster we can generate sets of data, the faster our tests will run and the faster our feedback loops will be (note, this can be used for common testing too).





Wednesday, 12 June 2013

Deploying from SVN through Jenkins to AWS Elastic Beanstalk (Part 2): Linking Jenkins to AWS EB

Following on from Part 1 of the blog on integrating Jenkins with AWS EB, this concludes the two part series by focussing on the AWS deployment process within Jenkins. At the end we will briefly outline the limitations of AWSDeploy to EB and what you would need to look to if your needs are more complex.

So what does Jenkins look like?

Setting up Jenkins basically requires the following steps:
  1. Create a 'New Job' in Jenkins
  2. Fill in the URL of your SVN repository
  3. Decide on Jenkins' polling frequency for SVN
  4. Fill in the batch process details, including MSBuild.exe and AWSDeploy.exe CLI, with appropriate parameters
  5. Save the job

Step 1: Create a 'New Job' in Jenkins

Nice and easy: from the main Jenkins page, click "New Job" on the left hand menu.

When the new job page loads, fill in the details of the job you wish to create. You'll want to create a free-style software project.

I normally prefix the job name with 'RELEASE-' but for this demo, I am being a bit slapdash. When you are happy with it, click the OK button.

fig 1 - Creating a 'New Job' in Jenkins

Step 2: Configure the Job

Jenkins then moves to the job configuration page which allows you to set up the SCM (in this case SVN), the Build Steps and Post Build Actions.

Hence:

1. Fill in a display name under "Advanced Project Options" if you want it

2. Under "Source Code Management" select the Subversion radio button and enter your repository URL. Jenkins will poll your SCM and, if you need to enter SVN credentials, a validation help link appears which you can click to enter them.
fig 2 - Links to enter your server's credentials. This server doesn't exist - just showing the link
3. Select your method of authentication. In my case, I chose Username/Password and then entered my credentials. Important note: the site runs unsecured by default. Once you have entered the credentials, click the OK button.

fig 3 - Subversion authentication in Jenkins
4. Complete the rest of Jenkins' SVN config. I always check out a fresh copy so I don't have previous builds lying around and potentially making a mess of the build. Note, you can also check out multiple URLs if you have dependent projects. This can, of course, also be done through an svn:externals definition.

5. Build Triggers - This is an important feature that allows you to configure what sets off the build. For example, if you have a dependency chain of projects/solutions which are already set up as Jenkins jobs, you can choose which project(s) need to complete before this job runs. Projects are separated by commas.

Here you should select the "Poll SCM" option. This then brings up a text box where you can enter the polling frequency in a cron-like syntax. Jenkins polls your SCM (SVN here) and stores the latest revision that it builds. If the revision number has changed on a poll, it then starts a checkout and build process.

Alternatively you can choose to build periodically, or combine the two.

6. The build steps - This is probably the key element when it comes to deploying to AWS. For simplicity, I have put all the steps in this demo into one build step so that it completes them in one go. However, you would normally want to split them out into multiple steps, such as 'Build', 'Test', 'Deploy' so you can stop at any point. You can pick up the build artefacts between the steps by using the same workspace environment variables.

Putting it all together still works, as long as an error code is returned, but it isn't neat.

Clicking on the "Add build step" button opens up a text area where you can enter a list of Windows console commands. In my case, they are:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe "%WORKSPACE%\AProject.sln" /p:Configuration=Release /p:Platform="Any CPU" /m
cd "%WORKSPACE%\AProject"
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe /t:Package /p:Configuration=Release
"C:\Program Files (x86)\AWS Tools\Deployment Tool\awsdeploy.exe" /w /r /v /DDeploymentPackage="%WORKSPACE%\AProject\obj\Release\Package\AProject.zip" "%WORKSPACE%\AProject\AWSDeploy.txt"

You will notice the %WORKSPACE% environment variable in this series of commands. This is the Jenkins workspace that the code has been checked out to. There are a number of environment variables, and they are:

The following variables are available to shell scripts

BUILD_NUMBER
The current build number, such as "153"

BUILD_ID
The current build id, such as "2005-08-22_23-59-59" (YYYY-MM-DD_hh-mm-ss)

BUILD_DISPLAY_NAME
The display name of the current build, which is something like "#153" by default.

JOB_NAME
Name of the project of this build, such as "foo" or "foo/bar"

BUILD_TAG
String of "jenkins-${JOB_NAME}-${BUILD_NUMBER}". Convenient to put into a resource file, a jar file, etc for easier identification.

EXECUTOR_NUMBER
The unique number that identifies the current executor (among executors of the same machine) that's carrying out this build. This is the number you see in the "build executor status", except that the number starts from 0, not 1.

NODE_NAME
Name of the slave if the build is on a slave, or "master" if run on master

NODE_LABELS
Whitespace-separated list of labels that the node is assigned.

WORKSPACE
The absolute path of the directory assigned to the build as a workspace.

JENKINS_HOME
The absolute path of the directory assigned on the master node for Jenkins to store data.

JENKINS_URL
Full URL of Jenkins, like http://server:port/jenkins/

BUILD_URL
Full URL of this build, like http://server:port/jenkins/job/foo/15/

JOB_URL
Full URL of this job, like http://server:port/jenkins/job/foo/

SVN_REVISION
Subversion revision number that's currently checked out to the workspace, such as "12345"

SVN_URL
Subversion URL that's currently checked out to the workspace.
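
Jenkins passes these variables to the processes it launches in a build step, so a small console tool run from the job can read them as well as a batch script can. The sketch below builds an identification label from a few of them; the BuildInfo class, the label format and the fallback values are assumptions for illustration only.

```csharp
using System;

public static class BuildInfo
{
    // Reads an environment variable, falling back when it is not set
    // (for example when the tool is run outside Jenkins).
    public static string Get(string name, string fallback)
    {
        string value = Environment.GetEnvironmentVariable(name);
        return string.IsNullOrEmpty(value) ? fallback : value;
    }

    // Combines the job name, build number and SVN revision into one label.
    public static string Describe()
    {
        return string.Format("{0} build {1} (r{2})",
            Get("JOB_NAME", "local"),
            Get("BUILD_NUMBER", "0"),
            Get("SVN_REVISION", "unknown"));
    }
}
```

Run inside a job, BuildInfo.Describe() would produce a label built from the values Jenkins sets; run on a developer machine it falls back to the placeholder values.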

The batch commands perform the following functions.

The 1st command builds the solution containing the project, using the release configuration. In a normal project I at least use the web.Debug.config and web.Release.config files, whose MSBuild match and replace mark-up is applied to the main web.config file. A tutorial blog can be found here.

The 2nd command enters into the directory of the built project. This is just so we can run the package verb for MSBuild.

The 3rd builds a standard deployment package out of the project.

The 4th line is the AWS Deployment line. This actually carries out the deployment of the package created in batch step 3 previously using the AWSDeploy.txt file we created from within Visual Studio.

Post-build actions
A post build action in Jenkins is an event that takes place after the build steps. They occur regardless of whether or not the build has completed successfully and encompass one or more of the steps shown in the screengrab below.

fig 4 - The Jenkins post-build actions.
In email post-build steps, Jenkins allows emails to be sent to users for unstable builds (it builds but tests fail) and build failures, and separate emails to be sent to the people who broke the build.

You'll notice from the screenshot that there is the option of publishing JUnit tests. Obviously, MSTest results are not exactly in the same format. However, you can run a build step to convert MSTest's TRX to HTML files using tools such as the MS Test Report Generator. There is a similar process for NUnit results.

Recommendations:
Don't forget to set up e-mail alerts within Jenkins for job events as a "Post-build action". You can also tie events to an RSS feed. In either case, you have the option of using them for failed builds and all builds, plus the email benefits identified above. Note though, Jenkins doesn't automatically set up an (or your) email server. You have to have a mail service/daemon running to send emails.

Also, I definitely recommend post-build steps to publish your test results. This may include a step to post to your build monitor if you have one. Maybe naming and shaming the individual breaking the build... not that you want a harassment suit on your company's hands of course ;-)

Step 3: Run the darn thing!

When you check in some code, at the intervals specified for the Jenkins cron, Jenkins will poll SVN, clean checkout the code, build, test and deploy the code. Any failure will cause the build history to include the usual red light.

fig 5 - Build history, including failures 
You can, of course, run the build using the "Build Now" menu link in the top left menu to kick it off manually. In either case, there is a significantly increased delay over any local process, but that is to be expected since the deployment is going over the wire and EB is incrementally building the environment (it would be longer first time out; I've had to wait over 10 minutes sometimes).

The following is the example deployed to AWS using the above process. As well as using AWS Elastic Beanstalk, for my own work, I have a Route 53 entry to the domain name and a back end DB.

fig 6 - Deployed EB example site

Conclusion

Jenkins has always been a really easy to use system for CI. I've found it easier to use than ThoughtWorks Go. However, both Go and TFS give you more customisation options out of the box. You can certainly expand Jenkins using custom plugins. Obviously, there is quite a bit more to do to set up a CI system from the above. But it gives you a framework to modify and work with. 

Also, the use of the AWSDeploy.exe makes it really easy to deploy fairly simple environments. 

Limitations
When you get up to the level of having to manage large, custom architectures and deployments, where detailed configuration and set-up of multiple AWS resources are required, with EC2, ELB, SQS, S3, SWF, Redshift and big data instances for example, a basic AWSDeploy text file won't cut it any more.

At that point, you would need to consider moving from just using an EB setup to using CloudFormation to set up a more complex cloud architecture. As it happens, EB does basically generate a JSON document which is used as the CloudFormation template. You can see this if you go to the CloudFormation stacks page in the AWS Console. I suspect there will be times when you'll need to roll your own.