The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, Debtember 30, 2002

Semantic HTML

There's quite the buzz in the weblogging community over Mark Pilgrim's use of the <CITE> tag (among other more esoteric tags in HTML). It's a nice idea, but all the standard says about <CITE> is:

CITE:
    Contains a citation or a reference to other sources.

HTML 4.0 § 9.2.1 Phrase elements

That, plus a few scant and quite trivial examples. I'm not sure of the exact usage of the <CITE> tag. Take the following:

In Snowcrash, Neal Stephenson explored the implications of neuro-linguistic hacking …

Now, am I supposed to mark that up like:

In <CITE>Snowcrash</CITE>, Neal Stephenson explored the implications of neuro-linguistic hacking ...

Because I'm citing the book Snowcrash? So, along those lines, if I had instead written it as:

Neal Stephenson, in his book Snowcrash, explored the implications of neuro-linguistic hacking …

Would I then mark it up as:

<CITE>Neal Stephenson</CITE>, in his book Snowcrash, explored the implications of neuro-linguistic hacking ...

since now I'm emphasizing Neal Stephenson over the book? But the book was written by Neal Stephenson so should it instead be:

In <CITE>Snowcrash</CITE>, <CITE>Neal Stephenson</CITE> explored the implications of neuro-linguistic hacking ...

Okay, so it's a contrived example, but generating semantically correct markup isn't trivial and expecting the general public to get it correct is asking a bit too much. As one person pointed out, given a hypothetical tag like <EDITOR>, is it:

<EDITOR>Joe Blow</EDITOR>

or

<EDITOR>vi</EDITOR>

(except when it's <EDITOR>Frontpage</EDITOR> but I won't go there)?

There are other semi-obscure tags for semantic mark-up and, fortunately, most of them are less ambiguous in their usage: <CODE> is for marking up computer source code, and <SAMP> is for program output. Unfortunately, the HTML spec lists both <CODE> and <SAMP> as inline tags, not block tags, which really restricts their use. I'm not sure what the W3C was thinking when they made <CODE> and <SAMP> inline. Since inline content has its line breaks and indentation collapsed, using <CODE> to mark up code fragments will turn something like:

for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");

into:

for (i = 0 ; types[i].sl != NULL ; i++) { if (strstr(filename,types[i].sl) != NULL) return(types[i].sl); } return("text/plain");

Nice, huh?
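For the curious, here is a rough sketch in C of the collapsing that produces the one-line mess above. It is a simplified model, not how any particular browser does it: it assumes whitespace means space, tab, form feed, carriage return and line feed, and the names is_html_space() and collapse_whitespace() are made up for this illustration.

```c
#include <stddef.h>

/* Whitespace characters in this simplified model of HTML text. */
static int is_html_space(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

/* Hypothetical helper: copy src to out, turning every run of whitespace
 * into a single space and dropping leading and trailing whitespace.
 * Returns 0 on success, -1 if out (of outsize bytes) is too small, in
 * which case the contents of out should not be used. */
int collapse_whitespace(const char *src, char *out, size_t outsize)
{
  size_t pos     = 0;
  int    pending = 0; /* whitespace seen since the last character written? */

  if (outsize == 0)
    return -1;

  for ( ; *src != '\0' ; src++)
  {
    if (is_html_space(*src))
    {
      pending = 1;
      continue;
    }

    /* emit one space for the whole run, but never at the very start */
    if (pending && pos > 0)
    {
      if (pos + 1 >= outsize)
        return -1;
      out[pos++] = ' ';
    }
    pending = 0;

    /* always keep one byte free for the terminating NUL */
    if (pos + 1 >= outsize)
      return -1;
    out[pos++] = *src;
  }

  out[pos] = '\0';
  return 0;
}
```

Feed it the six-line fragment and every line break and every bit of indentation comes back as a single space, which is exactly the problem.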

Dougal Campbell suggests using:

CODE
{
  white-space: pre;
}

Which sounds good, but doesn't work. The CSS2 spec states that white-space applies only to block-level elements, which <CODE> isn't (remember, it's “inline”). To work, you really need:

CODE
{
  display:     block;
  white-space: pre;
}

Which works fine in Mozilla, but fails in IE 5.x (which is most likely a bug) and in Lynx, which doesn't even look at the CSS file (and it looks like I have one regular reader who uses Lynx). As much as I would love to use <CODE> and <SAMP> for semantically better mark-up, I'm afraid I'm still stuck with using <PRE>; otherwise I'll end up with:

<CODE>for (i = 0 ; types[i].sl != NULL ; i++)</CODE><BR>
<CODE>{</CODE><BR>
<CODE>  if (strstr(filename,types[i].sl) != NULL)</CODE><BR>
<CODE>    return(types[i].sl);</CODE><BR>
<CODE>}</CODE><BR>
<CODE>return("text/plain");</CODE><BR>

Which is silly. (Okay, it's easy enough to write some code to automatically convert the source code, but semantically, does it even make sense?)
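Just to show how little code that automatic conversion would take, here is a minimal sketch in C. It is not what this site actually runs; the names append() and code_to_html() are invented for the example. It also makes two assumptions the hand-written mark-up above does not: it escapes &, < and > so the source survives as HTML, and it turns leading spaces into &nbsp; so the indentation isn't collapsed away.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append the string s to out at *pos without
 * writing past outsize bytes. Returns 0 on success, -1 if it won't fit. */
static int append(char *out, size_t outsize, size_t *pos, const char *s)
{
  size_t n = strlen(s);

  if (n >= outsize - *pos) /* keep room for the terminating NUL */
    return -1;
  memcpy(out + *pos, s, n);
  *pos += n;
  out[*pos] = '\0';
  return 0;
}

/* Hypothetical converter: wrap each line of src in <CODE>...</CODE><BR>.
 * Returns 0 on success, -1 if out (of outsize bytes) is too small. */
int code_to_html(const char *src, char *out, size_t outsize)
{
  size_t pos    = 0;
  int    open   = 0; /* inside a <CODE> element? */
  int    indent = 0; /* still in the line's leading spaces? */

  if (outsize == 0)
    return -1;
  out[0] = '\0';

  for ( ; *src != '\0' ; src++)
  {
    char        one[2] = { *src , '\0' };
    const char *text   = one;

    if (!open)
    {
      if (append(out,outsize,&pos,"<CODE>") < 0)
        return -1;
      open   = 1;
      indent = 1;
    }

    if (*src == '\n')
    {
      if (append(out,outsize,&pos,"</CODE><BR>\n") < 0)
        return -1;
      open = 0;
      continue;
    }

    if (*src == ' ' && indent)
      text = "&nbsp;"; /* assumption: preserve indentation this way */
    else
    {
      indent = 0;
      switch (*src)
      {
        case '&': text = "&amp;"; break;
        case '<': text = "&lt;";  break;
        case '>': text = "&gt;";  break;
        default:  break;
      }
    }

    if (append(out,outsize,&pos,text) < 0)
      return -1;
  }

  /* close the last line if the source didn't end with a newline */
  if (open && append(out,outsize,&pos,"</CODE><BR>\n") < 0)
    return -1;
  return 0;
}
```

So the mechanical part is easy. Whether one <CODE> element per line means anything semantically is still the open question.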

The upshot of all this rambling about semantically correct HTML? Um … not much really. I won't be changing the mark-up I use too much since I do lose the visual appearance in most browsers (although I may try giving the <CITE> tag a bit of a go).

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, which makes the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.