Semantic HTML

Monday, Debtember 30, 2002

There's quite the buzz in the weblogging community over Mark Pilgrim's use of the <CITE> tag (among other more esoteric tags in HTML). It's a nice idea, but all the standard says about <CITE> is:

CITE:
Contains a citation or a reference to other sources.

HTML 4.0 § 9.2.1 Phrase elements

The spec follows that with only a few scant and quite trivial examples, so I'm not sure of the exact usage of the <CITE> tag. Take the following:

In Snowcrash, Neal Stephenson explored the implications of neuro-linguistic hacking …

Now, am I supposed to mark that up like:

In <CITE>Snowcrash</CITE>, Neal Stephenson explored the implications of neuro-linguistic hacking ...

Because I'm citing the book Snowcrash? So, along those lines, if I had instead written it as:

Neal Stephenson, in his book Snowcrash, explored the implications of neuro-linguistic hacking …

Would I then mark it up as:

<CITE>Neal Stephenson</CITE>, in his book Snowcrash, explored the implications of neuro-linguistic hacking ...

since now I'm emphasizing Neal Stephenson over the book? But the book was written by Neal Stephenson so should it instead be:

In <CITE>Snowcrash</CITE>, <CITE>Neal Stephenson</CITE> explored the implications of neuro-linguistic hacking ...

Okay, so it's a contrived example, but generating semantically correct markup isn't trivial and expecting the general public to get it correct is asking a bit too much. As one person pointed out, given a hypothetical tag like <EDITOR>, is it:

<EDITOR>Joe Blow</EDITOR>

or

<EDITOR>vi</EDITOR>

(except when it's <EDITOR>FrontPage</EDITOR> but I won't go there)?

There are other semi-obscure tags for semantic mark-up and, fortunately, most of them are less ambiguous in their usage: <CODE> is for marking up computer source code, and <SAMP> is for program output. Unfortunately, the HTML spec lists both <CODE> and <SAMP> as inline elements, not block elements, which really restricts their use. I'm not sure what the W3C was thinking when they made <CODE> and <SAMP> inline. Using <CODE> to mark up code fragments will turn something like:

for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");

into:

for (i = 0 ; types[i].sl != NULL ; i++) { if (strstr(filename,types[i].sl) != NULL) return(types[i].sl); } return("text/plain");

Nice, huh?
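To see why, here's a rough sketch in C of the whitespace collapsing a browser applies to inline content: every run of spaces, tabs and newlines becomes a single space. The function name collapse_whitespace() and its buffer-based interface are made up for illustration, and a real browser's handling has more cases than this.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst, turning every run of
 * whitespace (spaces, tabs, newlines) into one space. At most
 * dstsize - 1 characters are written, followed by a NUL.
 * Returns the number of characters written, not counting the NUL. */
size_t collapse_whitespace(char *dst, size_t dstsize, const char *src)
{
  size_t n        = 0;
  int    in_space = 0;

  if (dstsize == 0)
    return(0);

  for ( ; *src != '\0' && n + 1 < dstsize ; src++)
  {
    if (isspace((unsigned char)*src))
    {
      /* only the first character of a run produces output */
      if (!in_space)
        dst[n++] = ' ';
      in_space = 1;
    }
    else
    {
      dst[n++] = *src;
      in_space = 0;
    }
  }

  dst[n] = '\0';
  return(n);
}
```

Feed it the C fragment above and you get back a single run-on line like the one shown, which is all an inline element gets unless something tells the browser to preserve whitespace.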

Dougal Campbell suggests using:

CODE
{
  white-space: pre;
}

Which sounds good, but doesn't work. The CSS spec states that white-space applies only to block-level elements, which <CODE> isn't (remember, it's “inline”). To work, you really need:

CODE
{
  display:     block;
  white-space: pre;
}

Which works fine in Mozilla, but fails in IE 5.x (which is most likely a bug) and in Lynx, which doesn't even look at the CSS file (and it looks like I have one regular reader who uses Lynx). As much as I would love to use <CODE> and <SAMP> for semantically better mark-up, I'm afraid I'm still stuck with using <PRE>; otherwise I'll end up with:

<CODE>for (i = 0 ; types[i].sl != NULL ; i++)</CODE><BR>
<CODE>{</CODE><BR>
<CODE>  if (strstr(filename,types[i].sl) != NULL)</CODE><BR>
<CODE>    return(types[i].sl);</CODE><BR>
<CODE>}</CODE><BR>
<CODE>return("text/plain");</CODE><BR>

Which is silly. (Okay, it's easy enough to write some code to automatically convert the source code, but semantically, does it even make sense?)
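For what it's worth, that automatic conversion might look something like the sketch below. The name code_to_html() is hypothetical, and two choices here are assumptions rather than anything the markup above requires: &, < and > are escaped so the code can't be mistaken for tags, and leading spaces become &nbsp; so the indentation survives whitespace collapsing.

```c
#include <stdio.h>

/* Write one character, escaping the three characters that HTML
 * treats specially in text. */
static void put_escaped(FILE *out, int c)
{
  switch (c)
  {
    case '&': fputs("&amp;", out); break;
    case '<': fputs("&lt;", out);  break;
    case '>': fputs("&gt;", out);  break;
    default:  fputc(c, out);       break;
  }
}

/* Hypothetical helper: write each line of src to out as
 * <CODE>line</CODE><BR>. Leading spaces are written as &nbsp;
 * (an assumption, to keep the indentation visible). */
void code_to_html(FILE *out, const char *src)
{
  while (*src != '\0')
  {
    int leading = 1;

    fputs("<CODE>", out);
    for ( ; *src != '\0' && *src != '\n' ; src++)
    {
      if (leading && *src == ' ')
        fputs("&nbsp;", out);
      else
      {
        leading = 0;
        put_escaped(out, *src);
      }
    }
    fputs("</CODE><BR>\n", out);

    /* step over the newline that ended this line, if any */
    if (*src == '\n')
      src++;
  }
}
```

Calling code_to_html(stdout, text) on a source fragment prints one <CODE> … </CODE><BR> line per input line. It's mechanical enough, which is rather the point: the markup it produces says "six separate bits of code" when what was meant was "one block of code".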

The upshot of all this rambling about semantically correct HTML? Um … not much really. I won't be changing the mark-up I use too much since I do lose the visual appearance in most browsers (although I may try giving the <CITE> tag a bit of a go).

The Boston Diaries
