Tuesday, July 10, 2007
There's a reason why a feature that takes five minutes to implement takes ten hours to code …
It seemed like an easy feature to add to the Quick and Dirty B-Movie Plot Generator. All I wanted was the ability to bookmark a particular plot.
What I didn't realize was that the method I used, saving the random number seed, was completely and utterly the wrong thing to do, and that the feature isn't quite as easy to implement as I first thought.
Easy to implement features, aren't.
My first clue? I “bookmarked” a particular plot on my development server:
Sheffie Les Mastenbrook is a mute world-famous card sharp from a doomed world. Susette Marie-Jeanne Sanford-Wright is a virginal green-skinned werewolf with nothing left to lose. And together, they must defuse a bomb in less than a day that's a thousand miles away.
(the bold parts? I'll get to that—and by the way, this is plot #1134)
Later on, I decided to check on the “bookmarked” plot on my deployment server, and got:
Sheffie Les Mastenbrook is a fast talking hunchbacked ad-exec on the run. Susette Marie-Jeanne Sanford-Wright is an orphaned short-sighted boxer with the power to bend men's minds. And together, they must fight crime.
Oops. (Bold text denotes differences between the two versions)
It seems that the random number generator between the two servers isn't quite the same; enough to get the names the same (which are the first things to be generated) but diverge right after that. So, if I wish to keep the URLs valid, I can't change servers, or even update the current server!
From this, I realized another problem—I can't change the data files the program uses! This stems from how I use the random numbers I get, and for this, I need a slight digression.
In C (which is what the program is written in), I call
rand() to get a random number.
rand() takes no
parameters, and returns an integer between 0 and some upper bound
RAND_MAX (which is a system and compiler dependant value). The
man page (on my development server, which is an older Linux distribution)
has this to say on using
rand() (sad to say, but the man page
on current Linux distributions excludes this excellent bit of advice):
The versions of
srand()in the Linux C Library use the same random number generator as
srandom(), so the lower-order bits should be as random as the higher-order bits. However, on older
rand()implementations, the lower-order bits are much less random than the higher-order bits.
In Numerical Recipes in C: The Art of Scientific Computing (William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling; New York: Cambridge University Press, 1990 (1st ed, p. 207)), the following comments are made:
If you want to generate a random integer between 1 and 10, you should always do it byj=1+(int) (10.0*rand()/(RAND_MAX+1.0));
and never by anything resemblingj=1+((int) (1000000.0*rand()) % 10);
(which uses lower-order bits).
Random-number generation is a complex topic. The Numerical Recipes in C book (see reference above) provides an excellent discussion of practical random-number generation issues in Chapter 7 (Random Numbers).
For a more theoretical discussion which also covers many practical issues in depth, please see Chapter 3 (Random Numbers) in Donald E. Knuth's The Art of Computer Programming, volume 2 (Seminumerical Algorithms), 2nd ed.; Reading, Massachusetts: Addison-Wesley Publishing Company, 1981.
Since I'm picking a random entry from N items, I basically do
n = (int)(((double)rand() / (double)RAND_MAX) * (double)N);
I add an element to the list, the value of
N changes, and
for a given output of
rand() you get a diffent answer than with
the previous, smaller
And it doesn't matter where I add an item. Add a new occupation for our characters at the end of the list (on the development server) and for plot #1134 you now get:
Sheffie Les Mastenbrook is a mute world-famous inventor from a doomed world. Susette Marie-Jeanne Sanford-Wright is a virginal green-skinned vampire hunter with nothing left to lose. And together, they must defuse a bomb in less than a day that's a thousand miles away.
This “easy feature” is turning out not to be so easy. And this on a simple program. It's stuff like this that makes programming take longer than expected.
The proper solution is to record the fifteen random numbers generated and make that part of the URL, and make sure that additions to the various lists appear at the end, but we finally get “bookmarkable” plots.
Oh, I just realized—only to lose the “bookmarked plots” if the template (and thus, the number of random numbers selected) changes.