Monday, May 21, 2007
Small forays into multiprocessing programming
% a parallel sudoku solver. each cell on the grid is a separate
% process, aware of - and negotiating with - its "neighbours" (those
% cells it must not conflict with).
% a cell is very simple - it contains just two integer values as
% "mutable" state: the current digit and a "confidence" in that value.
% the negotiation protocol is defined below. cells "ping" the
% controller when the exchange results in a change of value. the
% controller stops the system when there has been no change in any
% value for CONTROL_TIMEOUT (currently 1s).
% this is terribly, terribly brain-dead and inefficient (takes several
% - sometimes tens of - minutes to solve the test case on my newish
% linux box). tuning the doubt parameter may help slightly (a value
% of 0.1 seem to work).
% i think some very interesting graphics could be generated from this
% code - think of the solution as a kind of crystallisation, with
% competing centres of nucleation. please contact me
% (andrew@acooke.org) if interested...
Parallel Sudoku Solver in Erlang :o)
It's an intriguing little program, written in Erlang, to solve a certain class of puzzles (Smirk might find this interesting).
But it does point out the problems of multiprocess coding:
There's a bug. I don't know what it is, but there is one - at least one non-certain cell is stuck at a fixed value.
It's not easy, even in seemingly simple programs.
I came across this little gem of a program (this version in C):
/* count the Pythagorean triples with c <= MAX */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MAX 32768

int main(void)
{
  double a;
  double b;
  double c;
  int    i = 0;

  for (c = 2 ; c <= MAX ; c++)
  {
    for (b = 1 ; b < c ; b++)
    {
      a = sqrt(c * c - b * b);   /* a is integral exactly when (a,b,c) is a triple */
      if (trunc(a) == a) i++;
    }
  }

  printf("%d\n",i);
  return(EXIT_SUCCESS);
}
This counts the number of Pythagorean triples within the first 32,768 integers. An “easy” way to speed up this program is to spread the calculations across multiple processors. Easy enough: just break the main loop across the number of processors. So a uniprocessor box will do all 32,768 iterations, while on a dual-processor box each processor will do 16,384 iterations, etc.
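Something like the following sketch is what that naive split looks like (this uses POSIX threads purely for illustration; it's not the code I actually ran):

/* a sketch of the naive split (not the code actually used): each
 * thread gets an equal-sized slice of the outer loop and counts
 * triples on its own.  compile with something like:
 *      cc -O2 naive.c -lm -lpthread
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>

#define MAX      32768
#define NTHREADS 2

struct slice
{
  double lo;      /* first value of c for this slice */
  double hi;      /* last value of c for this slice  */
  int    count;   /* triples found in this slice     */
};

static void *count_triples(void *arg)
{
  struct slice *s = arg;
  double        a;
  double        b;
  double        c;

  s->count = 0;
  for (c = s->lo ; c <= s->hi ; c++)
  {
    for (b = 1 ; b < c ; b++)
    {
      a = sqrt(c * c - b * b);
      if (trunc(a) == a) s->count++;
    }
  }
  return NULL;
}

int main(void)
{
  pthread_t    tid [NTHREADS];
  struct slice work[NTHREADS];
  int          total = 0;
  int          i;

  for (i = 0 ; i < NTHREADS ; i++)
  {
    /* equal-sized slices of the outer loop: the "easy" partitioning */
    work[i].lo = 2 + i * (MAX / NTHREADS);
    work[i].hi = (i == NTHREADS - 1) ? MAX : work[i].lo + MAX / NTHREADS - 1;
    pthread_create(&tid[i],NULL,count_triples,&work[i]);
  }

  for (i = 0 ; i < NTHREADS ; i++)
  {
    pthread_join(tid[i],NULL);
    total += work[i].count;
  }

  printf("%d\n",total);
  return(EXIT_SUCCESS);
}

Each thread writes into its own slot, so there's no locking to worry about; the main thread just adds up the counts after joining.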
The first attempt broke the calculations just like that: 1–16,383 on one processor, 16,384–32,768 on the second. And much to my surprise, the runtime went from 28 seconds for a single-processor run to a mere 27 seconds across two processors.
Well then …
It took some thinking about what's actually going on to find the way to properly portion out the work (which I'll leave as an exercise for the reader to figure out). And even then the speedup isn't quite linear: on a quad-processor box the time went from 28 seconds running on a single processor to 12 seconds across all four, when the expected time would be 7 seconds. Oddly enough, the total user time accumulated went from 28 seconds on a uniprocessor to 44 seconds across all four, which makes accounting sense (user time is summed across every processor doing work), but it's still odd to see.
Even on a problem that seems simple to partition, there are still “gotchas” along the way.
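For the curious, here's a rough way to see where the gotcha comes from (a back-of-the-envelope sketch, not the actual program): count how many times the inner loop runs in each half of the naive split. The second half ends up with roughly three times the work of the first, so one processor finishes early and sits idle while the other does most of the work.

/* a back-of-the-envelope sketch: count the inner-loop iterations that
 * each equal-sized half of the outer loop actually performs.
 */

#include <stdio.h>
#include <stdlib.h>

#define MAX 32768

int main(void)
{
  long long lower = 0;   /* work for c in 2 .. MAX/2       */
  long long upper = 0;   /* work for c in MAX/2 + 1 .. MAX */
  long      c;

  for (c = 2 ; c <= MAX ; c++)
  {
    /* the inner loop of the triple counter runs c - 1 times */
    if (c <= MAX / 2)
      lower += c - 1;
    else
      upper += c - 1;
  }

  printf("lower half: %lld inner iterations\n",lower);
  printf("upper half: %lld inner iterations\n",upper);
  return(EXIT_SUCCESS);
}

That works out to 134,209,536 iterations for the lower half versus 402,644,992 for the upper, which is why splitting the loop down the middle barely helped.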