Monday, March 13, 2000
Tumbling through Code
I'm still working on the tumbler code and it's more interesting (read: complicated) than I originally thought. Basically, I think I want too much here.
I already parse Bible notations, and to that I want to add a date-based reference system for the journal here. The Bible notation is of the form:
book `.' chapter `:' verse
while the date-based version is:
year `/' month `/' day `.' entry
It's easy enough to specify multiple unit separators, but I do want to maintain a canonical form for the search engines—I'd rather not pollute them with multiple references to the same page, so if someone were to request Genesis.1.1 (note the period instead of a colon), they would be redirected (via a permanent redirection) to Genesis.1:1 (note the colon). Similar for the date tumblers.
So now, while I can accept multiple unit separators, I need to keep track of which are the preferred ones and which aren't, and do redirection accordingly. Doing this without making the code a horrendous mess is not easy.
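To make the bookkeeping concrete, here is a minimal sketch in Python of accepting several separators while emitting a single canonical form. It is not the actual tumbler code. The names `BIBLE`, `DATE` and `canonicalize` are hypothetical, and so is the set of alternate separators accepted (`.` or `:` for the Bible form; `/`, `.` or `-` for dates), since the entry only spells out the period-for-colon case.

```python
import re

# Hypothetical patterns: each unit boundary accepts more than one separator.
# Which alternates to accept is an assumption made for this sketch.
BIBLE = re.compile(r"^([A-Za-z]+)[.:](\d+)[.:](\d+)$")
DATE = re.compile(r"^(\d{4})[/.-](\d{1,2})[/.-](\d{1,2})[.:](\d+)$")

def canonicalize(ref):
    """Return (canonical_form, needs_redirect), or None if ref does not parse."""
    m = BIBLE.match(ref)
    if m:
        book, chapter, verse = m.groups()
        # Preferred form: book `.' chapter `:' verse
        canonical = f"{book}.{chapter}:{verse}"
        return canonical, canonical != ref
    m = DATE.match(ref)
    if m:
        year, month, day, entry = m.groups()
        # Preferred form: year `/' month `/' day `.' entry
        canonical = f"{year}/{month}/{day}.{entry}"
        return canonical, canonical != ref
    return None
```

A request handler would answer with a permanent redirection whenever `needs_redirect` is true and serve the page otherwise. For instance, `canonicalize("Genesis.1.1")` gives `("Genesis.1:1", True)`, while `canonicalize("Genesis.1:1")` gives `("Genesis.1:1", False)`.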
Then there is the spelling correction (at least as far as the Electric King James goes)—someone can still have a correctly formatted reference to a book, say Eklesiastics.1:3 and yet it isn't correct. It's not E-K-L-E-S-I-A-S-T-I-C-S, it's E-C-C-L-E-S-I-A-S-T-E-S (don't worry, I can't spell either). In that case, I can detect what the user was most likely trying to get to and again, send a redirection to Ecclesiastes.1:3. But that's something else I need to keep track of.
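One way to guess the intended book is fuzzy string matching. The sketch below uses `difflib.get_close_matches` from Python's standard library. That is a swapped-in technique for illustration, not necessarily how the Electric King James does it. The short `BOOKS` list, the `correct_book` name and the 0.6 cutoff are all assumptions.

```python
import difflib

# Hypothetical, abbreviated book list; a real table would hold every book.
BOOKS = ["Genesis", "Exodus", "Ecclesiastes", "Esther", "Ezekiel"]

def correct_book(name, cutoff=0.6):
    """Return (book, was_corrected), or None if nothing is close enough."""
    by_lower = {book.lower(): book for book in BOOKS}
    key = name.lower()
    if key in by_lower:
        # Spelled correctly (ignoring case): no correction needed.
        return by_lower[key], False
    # Ask difflib for the single closest candidate at or above the cutoff.
    matches = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    if matches:
        return by_lower[matches[0]], True
    return None
```

Here `correct_book("Eklesiastics")` returns `("Ecclesiastes", True)`, and the `True` is the flag that says a redirection to the corrected reference is in order.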
Eight versions of the tumbler code later, I think I have it working, but I decide to ask for a second opinion. So I ask Mark how he would do it.
“A single unit separator, and if a user specified the wrong unit specifier, it's an error that is reported back to the user,” he said.
“What type of error? 404? Technically it's not found,” I said.
“Maybe not a 404, but an error page should come back, possibly saying `This is how you need to form the request,' ” he said.
“I hate programs like that, Mark. They can detect the error, they can even correct for the error yet they don't.”
“I'm for strict parsing rules and if they're not correct, it's an error.”
In one sense, his way is easier for the programmer: it's this format or it's an error. The code is easier to write and possibly maintain, but it makes more work for the user. My way is harder to write, get correct and possibly maintain, but it is more forgiving of human input error and tries to do the Right Thing.
Coincidentally, Mark doesn't like computers that attempt to do The Right Thing. Can't say I blame him much; many programs that attempt to do The Right Thing fail miserably all around. And he does have a point in that my tumbler code may be trying too hard to be general use, what with flags being passed back and forth.
Which explains the eight versions of code.
I think I finally have it though.