The Boston Diaries

Technical Details

Tumblers

Much like the Electric King James, this online journal (or weblog, or whatever you want to consider it) is an exploration in document storage and reference on the WWW. Except for a few pages like the one you are reading, the site is entirely dynamic, and you can obtain as much or as little of the journal as you want.

You do this by specifying exactly what you want to read as part of the URL. For instance, to obtain the second entry for December 9th, 1999, you grab:

http://boston.conman.org/1999/12/09.2

But for the entire day, you would use:

http://boston.conman.org/1999/12/09

And the entire month:

http://boston.conman.org/1999/12

And consequently, for the year:

http://boston.conman.org/1999

At this point, it may seem that I'm following a directory structure, where the year is the top-level directory, followed by the months and days, with the individual entries as files, but I'm not. Unlike a regular directory structure, you can also request ranges and obtain an arbitrary collection of entries. For instance, the day I helped a friend move, August 9th, 2000, consists of entries two through eleven:

http://boston.conman.org/2000/08/09.2-11

Or maybe my vacation through Northern Florida looking for ghosts:

http://boston.conman.org/2000/08/11-15

That happened between August 11th and 15th, 2000. Actually, it happened between:

http://boston.conman.org/2000/08/10.2-15.5

The range specification works across months and years as well. Also, if you specify a range where the end point is earlier than the start point, say,

http://boston.conman.org/2000/08/15.5-10.2

then the entries come out in reverse chronological order, much as most blogs present them. Technically, the handling of ranges was the hardest problem to deal with (in fact, the Electric King James Bible doesn't handle reverse ranges at all!) and took almost 2,500 lines of C code. What I didn't mention above is that under certain conditions, namely a request for a single year, month, day or entry, navigation to the previous and next entries is handled automatically.

There are some 670 lines of C code to parse the range and form it into what I term tumblers, a concept I modified from work done by Ted Nelson. So far, the tumblers used in the Electric King James aren't compatible with the tumblers used in the Boston Diaries, and neither is compatible with the Xanalogical tumblers of Ted Nelson. But like I said, both are experiments and each uses a different addressing scheme for particular entries.
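To make the idea concrete, here is a much smaller sketch of how a path such as /2000/08/10.2-15.5 might be turned into a start and a stop point. It is not the code behind this site: the names struct unit, struct tumbler and tumbler_parse() are hypothetical, a field of 0 is assumed to mean "not specified", and the end of a range is assumed to replace only the last unit (or the day and entry) of the start. The text above doesn't show the syntax for ranges that cross a month or year boundary, so the sketch doesn't attempt them, and it doesn't reject trailing garbage.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: a field of 0 means "not specified". */
struct unit {
    int year, month, day, entry;
};

struct tumbler {
    struct unit start;
    struct unit stop;
    int ranged;   /* non-zero if the path contained a '-' */
    int reverse;  /* non-zero if stop sorts before start  */
};

/* Order two units chronologically: <0, 0 or >0, like strcmp(). */
static int unit_cmp(const struct unit *a, const struct unit *b)
{
    if (a->year  != b->year)  return a->year  - b->year;
    if (a->month != b->month) return a->month - b->month;
    if (a->day   != b->day)   return a->day   - b->day;
    return a->entry - b->entry;
}

/* Parse "/YYYY[/MM[/DD[.N]]][-END]"; returns 0 on success, -1 on error. */
int tumbler_parse(struct tumbler *t, const char *path)
{
    char  buf[64];
    char *dash;
    int   got, n, a, b;

    if (strlen(path) >= sizeof buf) return -1;
    strcpy(buf, path);
    memset(t, 0, sizeof *t);

    /* Split the start from the optional end of the range. */
    dash = strchr(buf, '-');
    if (dash != NULL) *dash++ = '\0';

    /* got tells us how specific the start is: 1 = year ... 4 = entry. */
    got = sscanf(buf, "/%d/%d/%d.%d", &t->start.year, &t->start.month,
                 &t->start.day, &t->start.entry);
    if (got < 1) return -1;

    /* The stop point inherits everything the end does not override. */
    t->stop = t->start;
    if (dash == NULL) return 0;          /* a single tumbler */

    t->ranged = 1;
    n = sscanf(dash, "%d.%d", &a, &b);
    if (n == 2 && got >= 3)      { t->stop.day = a; t->stop.entry = b; }
    else if (n == 1 && got == 4) t->stop.entry = a;
    else if (n == 1 && got == 3) t->stop.day   = a;
    else if (n == 1 && got == 2) t->stop.month = a;
    else if (n == 1 && got == 1) t->stop.year  = a;
    else return -1;

    t->reverse = unit_cmp(&t->stop, &t->start) < 0;
    return 0;
}
```

Calling tumbler_parse() on "/2000/08/15.5-10.2" fills in a start of August 15th, entry 5, a stop of August 10th, entry 2, and sets the reverse flag, which is all the later stages would need to know to emit the entries backwards.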

I do, however, make a distinction between a tumbler that specifies a range and one that doesn't, a so-called single tumbler. In the Boston Diaries, the distinction between a single tumbler and a ranged tumbler isn't that great, and right now it is only used to enable navigation on single tumbler specifications; no Next or Previous links appear on ranged selections. Once that selection is made, I then create a range internally to simplify the code; for instance:

http://boston.conman.org/1999/12

This is classified internally as a single tumbler, but a range is created:

/1999/12/04.1-15.4

This forms the main loop, which reads in each entry and formats it on the same page. Doing it this way allows me to handle even the specification for a single entry as a loop; instead of looping over, say, twenty entries, it just loops over one. There are already enough special cases to handle (such as printing out the entries in reverse).
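The "everything is a range" idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: entries are addressed here by a position in one flat, chronologically ordered sequence (which is not how the site stores them), and walk_entries(), emit_fn, struct trace and record() are hypothetical names. A single entry is simply a range whose two ends are equal, and a reverse range is one whose first position is larger than its last.

```c
#include <stddef.h>

/* Called once per entry; stands in for "read it in and format it". */
typedef void (*emit_fn)(size_t index, void *userdata);

/* Visit every position from first to last inclusive, in either
 * direction, and return how many entries were emitted.            */
size_t walk_entries(size_t first, size_t last, emit_fn emit, void *userdata)
{
    size_t count = 0;
    size_t i     = first;

    for (;;) {
        emit(i, userdata);
        count++;
        if (i == last) break;          /* a single entry stops here */
        if (first < last) i++;         /* forward range             */
        else              i--;         /* reverse range             */
    }
    return count;
}

/* Example callback: remember the order in which entries were seen. */
struct trace {
    size_t seen[16];
    size_t n;
};

void record(size_t index, void *userdata)
{
    struct trace *t = userdata;
    if (t->n < 16) t->seen[t->n++] = index;
}
```

With this shape, walk_entries(3, 3, ...) emits one entry and walk_entries(5, 2, ...) emits positions 5, 4, 3 and 2, so the caller never needs a separate code path for single entries.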

Storage

The storage of each entry is actually trivial compared to some of the other issues involved in the code. Each year is a directory; below that, each month is a directory; and below that, each day is a directory. The actual files are stored at the bottom level. There is one numbered file for each entry (the body). There are also three other files: one containing the titles (called, appropriately enough, titles), one containing the keywords (currently called class for classification, but it's really a set of keywords) and one containing the authors (called authors). Each of these is a simple text file with one line per entry, so the first line of each file is the meta information for the first entry, the second line for the second entry, and so on.
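As a rough sketch of that layout (not the site's actual code), the two helpers below build the path to a file inside a day's directory and pull the line for a given entry out of one of the meta files. The function names, the base directory argument and the zero-padded directory names are all assumptions; the text above only says that years, months and days are nested directories.

```c
#include <stdio.h>
#include <string.h>

/* Build "<base>/YYYY/MM/DD/<name>"; returns 0 on success, -1 if it
 * does not fit. Zero-padded directory names are an assumption.      */
int entry_path(char *out, size_t size, const char *base,
               int year, int month, int day, const char *name)
{
    int n = snprintf(out, size, "%s/%04d/%02d/%02d/%s",
                     base, year, month, day, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Copy line number `entry` (1-based) of a meta file such as titles,
 * class or authors into out, without the newline. Returns 0 on
 * success, -1 if the file has no such line. Long lines are truncated. */
int read_meta_line(FILE *fp, int entry, char *out, size_t size)
{
    int    line = 1;
    int    c;
    size_t len  = 0;

    if (entry < 1 || size == 0) return -1;
    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (line == entry) break;  /* finished the wanted line  */
            line++;
        } else if (line == entry && len + 1 < size) {
            out[len++] = (char)c;      /* keep room for the '\0'    */
        }
    }
    out[len] = '\0';
    return (line == entry && (c == '\n' || len > 0)) ? 0 : -1;
}
```

Because the meta files are line-oriented, looking up the title of the second entry of a day is just read_meta_line(fp, 2, ...) on that day's titles file; the body file only has to be opened if the entry is actually displayed.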

I did it that way to keep the memory consumption down: reading in an entry only loads the title, keywords and author, and the body is loaded only if it's actually needed. It works, but I'm not entirely wedded to the format. I've recently started reading up on XML and XSL and I'm thinking it may be worthwhile to use those, in which case I'll need to merge the title, keywords and author into the entry files themselves. But that's still down the line.

The real mess is how I reference them internally. Right now, it's a linked list of days, with each day containing an array of pointers to that day's entries. That's fine until you start specifying individual entries, and then it gets rather messy. Add in the fact that you can specify a reverse order and the term quagmire comes to mind. I've often thought of doing a linked list of entries instead, but then you have to track changes in the day, month and year, which, while not impossible, isn't all that clean either. It would, however, make the generation of the RSS file easier (since it only reports the last 15 entries, not the past seven days the main page displays).
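For illustration only, here is one way the "linked list of days, each with an array of entry pointers" arrangement could look, along with the kind of walk the RSS case needs. The structure fields, the newest-to-oldest prev link and the function last_entries() are assumptions made for this sketch, not the site's real definitions.

```c
#include <stddef.h>

/* Hypothetical entry record; only the title is kept here. */
struct entry {
    const char *title;
};

/* One node per day, linked from the newest day back to older ones. */
struct day {
    struct day    *prev;     /* previous (older) day, NULL at the end */
    int            year, month, day;
    size_t         count;    /* number of entries on this day         */
    struct entry **entries;  /* entries[0] is the day's first entry   */
};

/* Collect up to `want` of the most recent entries, newest first, by
 * walking days backwards and each day's array from its last slot.
 * Returns the number of entries stored in out.                       */
size_t last_entries(const struct day *newest, size_t want,
                    const struct entry **out)
{
    size_t            got = 0;
    const struct day *d;

    for (d = newest; d != NULL && got < want; d = d->prev) {
        size_t i = d->count;
        while (i > 0 && got < want)
            out[got++] = d->entries[--i];
    }
    return got;
}
```

Even this small walk needs two nested loops and two cursors (a day and a slot within it), which hints at why a flat list of entries would make "the last 15 entries" simpler to produce.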

Formatting

The output is generated by a template system of my own devising. It's simple and is not based on HTML or XML; it's more a macro substitution type system. When a macro substitution token is found, it's looked up in a table and the appropriate code is called. The template is actually broken up into multiple files so it's not all that easy to work with (when I said it was simple, I meant simple to code). There are eleven files that make up the templates used to display the page (two for the RSS file) and about 30 callbacks.
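A macro substitution system of this general kind can be sketched as a table that maps token names to callbacks. Everything specific below is an assumption made for the example: the %{name}% token syntax, the callback signature, the two sample macros and the choice to expand into a memory buffer are not taken from the real templates.

```c
#include <stddef.h>
#include <string.h>

/* A callback returns the text that replaces its token. */
typedef const char *(*macro_fn)(void *ctx);

struct macro {
    const char *name;
    macro_fn    fn;
};

/* Two sample callbacks; ctx is assumed to point at {title, author}. */
static const char *cb_title(void *ctx)  { return ((const char **)ctx)[0]; }
static const char *cb_author(void *ctx) { return ((const char **)ctx)[1]; }

const struct macro macros[] = {
    { "title",  cb_title  },
    { "author", cb_author },
};

/* Append n bytes of s to out, keeping it NUL-terminated. */
static int append(char *out, size_t size, size_t *len,
                  const char *s, size_t n)
{
    if (*len + n >= size) return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Copy tmpl to out, replacing each %{name}% with the result of the
 * matching callback. Unknown tokens expand to nothing. Returns 0 on
 * success, -1 if out is too small.                                   */
int expand(const char *tmpl, char *out, size_t size,
           const struct macro *table, size_t count, void *ctx)
{
    size_t len = 0;

    if (size == 0) return -1;
    out[0] = '\0';
    while (*tmpl != '\0') {
        const char *open  = strstr(tmpl, "%{");
        const char *close = open ? strstr(open + 2, "}%") : NULL;
        size_t      i;

        if (close == NULL)            /* no complete token left */
            return append(out, size, &len, tmpl, strlen(tmpl));
        if (append(out, size, &len, tmpl, (size_t)(open - tmpl)) != 0)
            return -1;
        for (i = 0; i < count; i++) { /* look the token up in the table */
            size_t nlen = strlen(table[i].name);
            if (nlen == (size_t)(close - open - 2)
                && memcmp(open + 2, table[i].name, nlen) == 0) {
                const char *text = table[i].fn(ctx);
                if (append(out, size, &len, text, strlen(text)) != 0)
                    return -1;
                break;
            }
        }
        tmpl = close + 2;
    }
    return 0;
}
```

The appeal of this design is that adding a new piece of page content means adding one row to the table and one small function, with no change to the expansion loop itself.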

The actual formatting is done by CSS; the HTML is very straightforward, and there are no <TABLE> tags used in the generation of these pages. The reason it looks so…plain (okay, ugly) in Netscape 4.X is that the actual style sheet is hidden (using a trick I picked up somewhere) so that Netscape 4.X doesn't blow up. And it should validate as HTML 4.01, although some of the earlier entries probably need some cleaning up.
