The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, July 18, 2002

A blogger's HTML

An article about the Dublin Core at CONTENU.nu got me thinking about the problem of indexing weblogs. The major problem is that there is no semantic markup to include meta-information in the body of a webpage. Sure, you can include meta-information in the <HEAD> section, using both <META> and <LINK> tags, and that's fine when the page in question is about a single topic.

But a weblog has several, mostly unrelated entries on a single page, with the rare weblog having several nearly article-length entries on the main page (and by extension, the archive pages). Google indexes these pages as if each were on a single topic and, as a result, you get fodder for the Disturbing Search Requests.

There are heuristics that can be used to index a weblog page, but it would be nice to have some defined way to mark individual entries, with the ability to include meta-information for each entry. I had intended for my software here to build up the <META> tags (since I do include keywords/classification for each entry I write) and while that may be viable for up to a week's worth of entries on a page, it starts getting silly for a month, and for a whole year? It's just not practical.

But from the Dublin Core article, I ended up at the W3C site and came across XHTML 1.1, which is still being worked on, but (and this is the exciting part here) this version of XHTML can be extended (unlike XHTML 1.0, even though the name says it's extensible). It's completely modular, so new variants of XHTML (for example, one extended with MathML) can be constructed from bits and pieces of existing XHTML modules.

So in the future, it may be possible to extend XHTML to include meta-information in the middle of a page, instead of just in the <HEAD> section (sorry, <head> section—XHTML uses lower case for tags). So instead of having to parse code like:


<h3><a class="local" id="2002/07/16.1" href="/2002/07/16.1">The Ins
and Outs of Calculating Browser Usage</a></h3>

<!-- programming, statistics, web browsers, web log files -->

<p>

I spent the past few ...

<h2><a class="local" id="2002/07/14" href="/2002/07/14">Sunday, July
14, 2002</a></h2>

<h3><a class="local" id="2002/07/14.1" href="/2002/07/14.1">Probability</a></h3>

<p>

...

An indexer can, instead, have an easier time with:


<entry>
<head>
<meta name="keywords" content="programming, statistics,
              web browsers, web log files" />
<link rel="permalink" href="/2002/07/16.1" />
<link rel="next"      href="/2002/07/17.1" />
<link rel="previous"  href="/2002/07/14.1" />
</head>
<body>

<p>

I spent the past few ...

</body>
</entry>

<entry>
<head>
<meta name="keywords" content="daily life, web pages, home pages,
              six degrees of separation, Tom Hoylrod" />
<link rel="permalink" href="/2002/07/14.1" />
<link rel="next"      href="/2002/07/16.1" />
<link rel="previous"  href="/2002/07/13.1" />
</head>
<body>

...

</body>
</entry>
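
To give a rough idea of why per-entry markup like this would be easier on an indexer, here is a minimal sketch in Python. It assumes the hypothetical <entry> markup above is written as well-formed XML (empty elements closed) and wrapped in a made-up <entries> root element; the function name index_entries and the shape of its result are illustrative only, not anything a real search engine does.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment using the proposed <entry> markup. The <entries>
# wrapper is an assumption, added only so a stock XML parser accepts it.
PAGE = """\
<entries>
  <entry>
    <head>
      <meta name="keywords" content="programming, statistics,
            web browsers, web log files" />
      <link rel="permalink" href="/2002/07/16.1" />
      <link rel="next" href="/2002/07/17.1" />
      <link rel="previous" href="/2002/07/14.1" />
    </head>
    <body><p>I spent the past few ...</p></body>
  </entry>
  <entry>
    <head>
      <meta name="keywords" content="daily life, web pages, home pages" />
      <link rel="permalink" href="/2002/07/14.1" />
    </head>
    <body><p>...</p></body>
  </entry>
</entries>
"""


def index_entries(xml_text):
    """Return one dict per <entry> holding its keywords and rel -> href links."""
    results = []
    for entry in ET.fromstring(xml_text).iter("entry"):
        keywords = []
        for meta in entry.findall("head/meta"):
            if meta.get("name") == "keywords":
                for raw in meta.get("content", "").split(","):
                    # Collapse the whitespace left over from wrapped attribute values.
                    word = " ".join(raw.split())
                    if word:
                        keywords.append(word)
        links = {link.get("rel"): link.get("href")
                 for link in entry.findall("head/link")}
        results.append({"keywords": keywords, "links": links})
    return results


if __name__ == "__main__":
    for item in index_entries(PAGE):
        print(item["links"].get("permalink"), item["keywords"])
```

Each entry comes back with its own keywords and its own permalink, so nothing has to be guessed from heading levels or comments.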

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.
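
The day and month forms described above can be sketched in a few lines of Python. The function name archive_url and its parameters are made up for illustration, and the ".N" suffix for a single entry is an assumption inferred from links such as /2002/07/16.1. The arbitrary-portion-of-time form is left out because its format isn't spelled out here.

```python
BASE = "https://boston.conman.org/"


def archive_url(year, month, day=None, entry=None):
    """Build a link to a month, a day, or (assumed form) one entry on a day."""
    path = f"{year:04d}/{month:02d}"
    if day is not None:
        path += f"/{day:02d}"
        if entry is not None:
            # Assumed from hrefs like /2002/07/16.1: entry number after a dot.
            path += f".{entry}"
    return BASE + path


if __name__ == "__main__":
    print(archive_url(2000, 8, 1))   # a single day
    print(archive_url(2000, 8))      # the whole month
```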

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.