Monday, Debtember 30, 2002
Semantic HTML
There's quite the buzz in the weblogging community over Mark Pilgrim's use of the
<CITE> tag (among other more esoteric tags in HTML). It's a nice idea, but
all the standard says about <CITE> is:
CITE:
Contains a citation or a reference to other sources.
HTML 4.0 § 9.2.1 Phrase elements
And only a few scant and quite trivial examples. I'm not sure of the exact
usage of the <CITE> tag. In the following:
In Snowcrash, Neal Stephenson explored the implications of neuro-linguistic hacking …
Now, am I supposed to mark that up like:
In <CITE>Snowcrash</CITE>, Neal Stephenson explored the implications of neuro-linguistic hacking ...
Because I'm citing the book Snowcrash? So, along those lines, if I had instead written it as:
Neal Stephenson, in his book Snowcrash, explored the implications of neuro-linguistic hacking …
Would I then mark it up as:
<CITE>Neal Stephenson</CITE>, in his book Snowcrash, explored the implications of neuro-linguistic hacking ...
since now I'm emphasizing Neal Stephenson over the book? But the book was written by Neal Stephenson so should it instead be:
In <CITE>Snowcrash</CITE>, <CITE>Neal Stephenson</CITE> explored the implications of neuro-linguistic hacking ...
Okay, so it's a contrived example, but generating semantically correct
markup isn't trivial and expecting the general public to get it correct is
asking a bit too much. As one person pointed out, given a hypothetical tag
like <EDITOR>, is it:
<EDITOR>Joe Blow</EDITOR>
or
<EDITOR>vi</EDITOR>
(except when it's <EDITOR>FrontPage</EDITOR> but I won't go there)?
There are other semi-obscure tags for semantic mark-up and, fortunately,
most of them are less ambiguous in their usage: <CODE> is for marking up
computer source code, and <SAMP> is for program output. Unfortunately, the
HTML spec lists both <CODE> and <SAMP> as inline tags, not block tags, which
really restricts their use. I'm not sure what the W3C was thinking when they
made <CODE> and <SAMP> inline. Using <CODE> to mark up code fragments will
turn something like:
for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");
into:
for (i = 0 ; types[i].sl != NULL ; i++) { if (strstr(filename,types[i].sl) != NULL) return(types[i].sl); } return("text/plain");
Nice, huh?
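The collapse happens because, by default, a browser treats any run of spaces, tabs and newlines inside an inline element as a single space. Here's a rough Python approximation of that rule. The function name is made up for illustration, and real browsers follow more detailed rules than this one-liner.

```python
def collapse_whitespace(text: str) -> str:
    """Approximate default inline rendering: every run of whitespace
    (spaces, tabs, newlines) is replaced by a single space, and leading
    and trailing whitespace is dropped."""
    return " ".join(text.split())


# The C fragment as it would be typed into the page source.
SOURCE = """\
for (i = 0 ; types[i].sl != NULL ; i++)
{
  if (strstr(filename,types[i].sl) != NULL)
    return(types[i].sl);
}
return("text/plain");
"""

if __name__ == "__main__":
    # Prints the whole fragment on one line, which is roughly what the
    # reader sees when it sits inside a bare <CODE> element.
    print(collapse_whitespace(SOURCE))
```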
Dougal Campbell suggests using:
CODE { white-space: pre; }
Which sounds good, but doesn't work. The CSS spec states that white-space
is only valid for a display type of “block”, which <CODE> isn't (remember,
it's “inline”). To work, you really need:
CODE { display: block; white-space: pre; }
Which works fine in Mozilla, but fails for IE 5.x (which is most likely a
bug) and Lynx, which doesn't even look at the CSS file (and it looks like I
have one regular reader who uses Lynx). As much as I would love to use
<CODE> and <SAMP> for semantically better mark-up, I'm afraid I'm still
stuck with using <PRE>; otherwise I'll end up with:
<CODE>for (i = 0 ; types[i].sl != NULL ; i++)</CODE><BR>
<CODE>{</CODE><BR>
<CODE> if (strstr(filename,types[i].sl) != NULL)</CODE><BR>
<CODE> return(types[i].sl);</CODE><BR>
<CODE>}</CODE><BR>
<CODE>return("text/plain");</CODE><BR>
Which is silly. (Okay, it's easy enough to write some code to automatically convert the source code, but semantically, does it even make sense?)
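For what it's worth, the automatic conversion could look something like the Python sketch below. The function name is hypothetical. Turning leading spaces into `&nbsp;` is an assumption I've added so the indentation survives whitespace collapsing. Tabs are left alone, so this is only a starting point.

```python
import html


def code_lines_to_html(source: str) -> str:
    """Wrap each line of source code in its own <CODE> element,
    followed by <BR>, escaping the characters HTML treats specially."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Assumption: keep indentation by emitting one non-breaking
        # space per leading space; ordinary spaces would be collapsed.
        body = "&nbsp;" * indent + html.escape(stripped, quote=False)
        out.append(f"<CODE>{body}</CODE><BR>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = 'if (a < b)\n  return("text/plain");'
    print(code_lines_to_html(sample))
```

It runs, but all it does is mechanically generate the same per-line mark-up, so the semantic question is still open.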
The upshot of all this rambling about semantically correct HTML? Um … not much really.
I won't be changing the mark-up I use too much since I do lose the visual
appearance in most browsers (although I may try giving the <CITE> tag a bit
of a go).