The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, September 25, 2023

Changing the historical record of my blog

Twenty-one years ago I was worried about losing the historical presentation of my blog, both because it was template driven and because of the use of CSS. Changes that affect everything at once certainly appeared quite Orwellian to me, although I might be in a very small minority in worrying about this.

And yet, I've tweaked the CSS quite a bit since I wrote that. I figure I'm not changing the content, so it's okay, right?

It was over a year ago when I noticed that a lot of my earlier entries had the initial paragraph shifted over to the left, due to a change in the template file I made around 2003. The old template had an initial <P> tag so I didn't have to type it, and the new one removed said tag. That left maybe a thousand posts (give or take) that needed fixing. I started doing the job manually at first, then gave up at the sheer number of posts to fix. Again, it was not changing the content but fixing the presentation. And it bothered me that there were posts that weren't formatted correctly.

About a week or two ago, I realized that the markup I used for foreign words:

<span lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</span>

is probably not semantically sound HTML. I even wrote about that issue twenty years ago, and now realize it should be:

<i lang="de" title="My hovercraft is full of eels">Mein Luftkissenfahrzeug ist voller Aale</i>

Around the same time, I read up on the “proper” use of <BLOCKQUOTE> and learned that the attribution should appear outside the blockquote, not inside as I've been doing for years. I was even doing The Right Thing™ when I first started blogging, but changed for some reason I've long forgotten.

And then several days ago, I noticed the sample BASIC code was incorrect and it was bugging me—the keyword THEN would always show up as THENNOT. How that happened is a topic for another post, but in the meantime, I decided to fix the issue without mentioning it. The change didn't change the intended meaning of the post, it was fixing incorrect output, not saying we were always at war with Eastasia.

After that, I decided to go back and fix the “formatting” issues in the blog. I have code that will read entries and parse the HTML I use into an AST (or should it be a DOM, even though I'm using Lua, not JavaScript?) which I use to generate the Gopher and Gemini versions. To fix the initial paragraph issue, all I needed to do was identify the entries that didn't start with a <P> tag and just prefix the raw text with said tag.
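A minimal sketch of that check, written in Python rather than the Lua the blog's own code uses. The function name `ensure_leading_p` is hypothetical, and the sketch assumes it is only run on entries from the old-template era, where the body always opens with paragraph text.

```python
import re

# Matches an opening <p> tag (any case, with or without attributes) at the
# start of the text, ignoring leading whitespace. The [\s>] keeps <pre> and
# <param> from being mistaken for <p>.
_STARTS_WITH_P = re.compile(r'\s*<p[\s>]', re.IGNORECASE)

def ensure_leading_p(raw: str) -> str:
    """Prefix the raw entry text with <p> unless it already starts with one.

    Hypothetical helper for illustration; not the blog's actual code.
    """
    if _STARTS_WITH_P.match(raw):
        return raw
    return "<p>" + raw
```

Running it a second time on an already-fixed entry leaves the text alone, so the pass is safe to repeat.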

To update the HTML for foreign words, it was enough to identify entries with <SPAN LANG="language"> and with some sed magic, switch it to read <I LANG="language"> (and fix the corresponding closing tags). It's just fixing the semantics of the HTML, not changing the past, right?
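The awkward part of that substitution is the closing tag: only the `</span>` that pairs with a `lang` span should become `</i>`. Here is a rough Python sketch of one way to keep the pairs straight with a stack. It is not the sed script actually used. The name `span_lang_to_i` is made up, and the sketch assumes well-nested tags and no `>` inside attribute values.

```python
import re

# Any opening or closing <span> tag; group 1 is "/" for a closing tag,
# group 2 is the raw attribute text.
TAG = re.compile(r'<(/?)span\b([^>]*)>', re.IGNORECASE)
# A lang attribute (the \b keeps hreflang= from matching).
HAS_LANG = re.compile(r'\blang\s*=', re.IGNORECASE)

def span_lang_to_i(html: str) -> str:
    """Rewrite <span lang=...>...</span> as <i lang=...>...</i>.

    Spans without a lang attribute are left alone, including their
    closing tags. Hypothetical helper for illustration only.
    """
    stack = []  # one entry per open span: True if it was rewritten to <i>

    def swap(match):
        closing, attrs = match.group(1), match.group(2)
        if not closing:
            rewrite = bool(HAS_LANG.search(attrs))
            stack.append(rewrite)
            return f"<i{attrs}>" if rewrite else match.group(0)
        # Closing tag: mirror whatever was done to the matching opener.
        rewrite = stack.pop() if stack else False
        return "</i>" if rewrite else match.group(0)

    return TAG.sub(swap, html)
```

The stack is what lets a plain `<span class="...">` wrapped around a foreign phrase keep its own `</span>` while the inner pair changes.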

The fix for the <BLOCKQUOTE> issue wasn't quite so easy—I still had over 700 entries that needed to be fixed, so I ended up writing code that would spit out the parsed HTML back into HTML. It would have been easy to output it as:


<p>I've been following the various Linux <abbr title="Initial Public Offerin
g">IPO</abbr>s and today I see that <a class="external" href="http://www.val
inux.com/">VA Linux Systems</a> had their <a class="external" href="http://d
ailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO today.</a>. 
 Briefly, it IPOed (can you verb a TLA?  Can you verb the word “verb?” Whate
ver … ) at US$30 and opened at US$299.  Inbloodysane.</p><p><a class="extern
al" href="http://www.andover.net/">Andover.Net</a> wasn't nearly as inbloody
sane.</p>

one long line—the browsers don't care, but I do if I ever have to go back and edit this. Instead, I want the output to still be editable:

<p>I've been following the various Linux <abbr title="Initial Public
Offering">IPO</abbr>s and today I see that <a class="external"
href="http://www.valinux.com/">VA Linux Systems</a> had their <a
class="external"
href="http://dailynews.yahoo.com/h/nm/19991209/bs/markets_valinux_1.html">IPO
today.</a>. Briefly, it IPOed (can you verb a TLA? Can you verb the word
“verb?” Whatever … ) at US$30 and opened at US$299. Inbloodysane.</p>

<p><a class="external" href="http://www.andover.net/">Andover.Net</a> wasn't
nearly as inbloodysane.</p>
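The wrapped form above can be approximated with Python's standard `textwrap` module. This is only a sketch under assumptions: the width of 78 and the names `wrap_block` and `serialize_blocks` are invented here, and the blog's real serializer is written in Lua and may behave differently. Content inside <PRE> must not go through this, since it collapses whitespace.

```python
import textwrap

def wrap_block(html: str, width: int = 78) -> str:
    """Re-wrap one serialized block (e.g. a <p>...</p>) at whitespace only.

    Long tokens such as href="..." are never split; they get a line of
    their own even if that line exceeds the width.
    """
    normalized = " ".join(html.split())  # collapse runs of whitespace
    return textwrap.fill(
        normalized,
        width=width,
        break_long_words=False,   # never cut a URL in half
        break_on_hyphens=False,   # only break at spaces
    )

def serialize_blocks(blocks, width: int = 78) -> str:
    """Join wrapped blocks with a blank line between them."""
    return "\n\n".join(wrap_block(b, width) for b in blocks) + "\n"
```

Because breaks only ever replace existing whitespace, a browser renders the wrapped text the same as the single long line.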

That meant handling not only <P> but all the block level tags in HTML, <BLOCKQUOTE>, <TABLE>, <DL> (which I use for emails and screenplay dialog), <UL>, <OL>, and <PRE>. Now that I have that working, I can identify the citation paragraphs for blockquotes, and move them to the appropriate location.
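Moving the citation could look something like the following Python sketch over a toy tree. Everything about it is an assumption: the `Node` class stands in for whatever the real parser produces, and treating a trailing `<p class="cite">` as the citation is a made-up convention, since the post doesn't say how citation paragraphs are recognized.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy element node; children are Node instances or plain strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def is_citation(node) -> bool:
    # Assumption: a citation is a <p class="cite">. The real marker may differ.
    return (isinstance(node, Node) and node.tag == "p"
            and node.attrs.get("class") == "cite")

def hoist_citations(parent: Node) -> None:
    """Move a trailing citation paragraph from inside each blockquote to
    just after it, recursing through the whole tree."""
    result = []
    for child in parent.children:
        result.append(child)
        if not isinstance(child, Node):
            continue
        cite = None
        # Detach this blockquote's own citation before recursing, so a
        # citation hoisted out of a nested blockquote is not moved twice.
        if (child.tag == "blockquote" and child.children
                and is_citation(child.children[-1])):
            cite = child.children.pop()
        hoist_citations(child)
        if cite is not None:
            result.append(cite)
    parent.children = result
```

After a pass like this, the tree can be written back out as HTML with the attribution sitting as a sibling right after the blockquote.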

I'm about to do that, yet I'm still a bit hesitant. Yes, it's just fixing the semantic presentation, but now that I have the code to read and write HTML, future mass changes are easy to do.

I'm probably thinking too much on this.

I think.


Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.