Friday, September 05, 2008
Yet even more stupid benchmarks
Yet another silly optimization problem. This time, from a silly coding challenge to find the number of integers expressible with unique digits (that is, no single digit repeats) in a base-10 representation up to the value 10,000,000,000 (there are 8,877,690 such numbers, by the way).
The neatest and fastest solution was final program on this page, written in C#. It generates only such numbers; it doesn't try to test each number. Since I don't use C#, I decided to translate the code in C to play around with it. Wasn't all that hard:
#include <stdio.h> #include <stdlib.h> int total = 0; const int pre[(1 << 10) + 1] /* = { ... } */ ; void generate2( int maxlen, int currentlen, int availabledigits, int currentvalue ) { int last = (currentlen == maxlen - 1); int x = availabledigits; while(x != 0) { int digit = pre[x ^ (x & (x - 1))]; x &= (x - 1); if (digit == 0 && currentvalue == 0) continue; if (last) ++total; else generate2( maxlen, currentlen + 1, availabledigits & ~(1 << digit), (currentvalue * 10) + digit ); } } int main(int argc,char *argv[]) { int len; for (len = 1 ; len <= 10 ; len++) generate2(len,0,0xFFF >> 2,0); printf("total: %d\n",total); return EXIT_SUCCESS; }
I pregenerated the pre[]
array since I wanted this to run as
fast as possible. The code used to generate the array:
for (i = 0 ; i <= 10 ; i++) pre[1 << i] = i;
Anyway, once written and compiled (gcc -O4 -fomit-frame-pointer
f.c
) it ran in about 0.2 seconds (average run) on a 2.6GHz machine. Fast, but I could go faster by
running it across the two CPUs in the box. I was expecting about half the runtime,
since this is easily parallelizable.
It ran in about 0.16 seconds, a rather disappointing ¾ time. I
commented out the code in generate2()
just to test the overhead
of threading and syncronization and that isn't a factor (program ran in
0.001 seconds).
Undaunted, I decided to try one of the quad-core boxes at The Office. Reworked the code a bit to split the load between four CPUs as evenly as possible, and ran some tests.
0.13 seconds on average. Still not quite half the speed.
Hmmm …