Wednesday, November 19, 2003
Hypertext editing and the Semantic Web
There's an interesting discussion about Jason Kottke's new design for his weblog and it brings up a topic I was thinking about earlier today.
Blogging software in general has made publishing new web pages (or entries) easier, reducing a several-step process to the click of a button. But what hasn't gotten any easier is the actual creating, or editing, of HTML content. I've talked about this before: I sometimes have problems writing hypertext because the act of creating the hyperlink isn't seamless, yet if I skip creating hyperlinks as I write and wait until I'm done, I may forget exactly what it was I wanted to link to.
Some markup, say, <EM> or <STRONG>, can be handled invisibly, as it has been for years in more traditional editors. So for example, I could be typing along when, bam! I want to emphasize something, so I hit ALT-E and start typing, hitting ALT-E again when done. But hypertext, and any possible metadata associated with said hypertext, is harder to streamline like that.
For instance, when I quote a passage:
Oh, and I'd just like to point out that I'm not bashing any current weblog software for not being flexible enough or being wrong or whatever. As Anil has said, it's harder than just saying that a particular tool should do this or that. In fact, I love MT (not to mention the army of plug-in developers who put out these fantastic plug-in for free) more than ever for the amazing amount of flexibility and control that is possible (with a bit of work).
It's actually quite a bit of work for me. First I cut-n-paste the quote from the webpage into the editor I use, then go through and clean it up (changing double quotes to two single backticks or two regular single quotes, which my software will then pick up and change to “ and ” respectively, and adding any appropriate HTML), and then add the <BLOCKQUOTE> with appropriate attributes:
<BLOCKQUOTE CITE="http://www.kottke.org/03/11/kottke-redesign#8304" TITLE="the redesign continues ... ">
And adding the attribution line
<P CLASS="cite"> <CITE> <A CLASS="external" HREF="http://www.kottke.org/03/11/kottke-redesign#8304"> Jason Kottke </A> </CITE> </P>
I used to place this outside the <BLOCKQUOTE> but recently I moved it inside the <BLOCKQUOTE>; I'm not sure which I like better. How would you automate this? Partly by integrating the editor with the browser and passing along more information in the cut buffer (like the URL and title of the page where the text is selected), but the main issue is one of layout, like I mentioned above. Context-sensitive templates for pasting, perhaps? And how do you handle links? Same way? A key sequence for pasting a blockquote and a separate one for a link? All I do know is that the HTML WYSIWYG editors I've seen have never handled links cleanly. Want a link? Highlight the text, select link, type in the URL, and forget about having other attributes like TITLE or CLASS; or perhaps not, but there are other buttons to select to set those, and by the time you're done it would have been easier to type the actual code than to have the editor so helpfully do it for you.
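To make the "templates for pasting" idea a bit more concrete, here is a rough Python sketch of what such an editor might do once the browser hands over the selected text along with the page's URL and title. The function names (smart_quotes, paste_blockquote, paste_link) are made up for illustration, the attribution goes inside the blockquote only because that's my current habit, and none of this is how any existing editor works.

```python
from html import escape


def smart_quotes(text: str) -> str:
    """Turn two backticks / two single quotes into curly double quotes."""
    return text.replace("``", "\u201c").replace("''", "\u201d")


def paste_blockquote(text: str, url: str, title: str, author: str) -> str:
    """Fill a blockquote template from what the cut buffer could carry."""
    cite = escape(url, quote=True)
    # Escape the text without touching quote characters, so that the
    # backtick and single-quote pairs survive for smart_quotes().
    body = smart_quotes(escape(text, quote=False))
    return (
        f'<BLOCKQUOTE CITE="{cite}" TITLE="{escape(title, quote=True)}">\n'
        f"<P>{body}</P>\n"
        f'<P CLASS="cite"><CITE><A CLASS="external" HREF="{cite}">'
        f"{escape(author, quote=False)}</A></CITE></P>\n"
        "</BLOCKQUOTE>"
    )


def paste_link(text: str, url: str, title: str) -> str:
    """Same idea for a plain link: one keystroke, all attributes filled in."""
    return (
        f'<A CLASS="external" HREF="{escape(url, quote=True)}" '
        f'TITLE="{escape(title, quote=True)}">{escape(text, quote=False)}</A>'
    )
```

Bind paste_blockquote to one key sequence and paste_link to another and most of the hand work described above goes away, assuming the browser can actually be talked into passing the URL and title along.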
The discussion at Kottke's site is about applying different layouts to different types of posts (posts about movies are formatted one way, book reviews another, and regular posts yet another) and how to trigger the appropriate template for the type of post. Granted, the software used, Movable Type, is geared more toward people who don't care to learn HTML or type it by hand, so having a different layout for different posts is a bit more difficult to achieve than with, say, mod_blog, where one pretty much has to know HTML to format posts. But there's a tradeoff to be made: since I use HTML raw (so to speak) I can go in and fudge the formatting as I see fit. My PhotoFriday posts (yes, I've seriously slacked off on those) used a different format than my regular posts and it was easy enough to handle: a new division, some definitions in the CSS file and there you go.
But the cost is that this isn't automatic. I don't have a menu item or a keyboard sequence to designate “this is a PhotoFriday post” in much the same way I don't have a menu item or keyboard sequence that says “these are a series of photos to display sequentially” or “here is a section of text I'm quoting from this web page.” Mind you, I wouldn't mind such an editor, and if done to my liking it would certainly make editing of posts much easier than it is now (and right now, I'm looking at all this text I've written so far, pretty much sans HTML and somewhat dreading having to go back and format it, but since I did skip the HTML formatting I had an easier time getting this out without forgetting what I wanted to mention, although hopefully I'll remember all the links I wanted to add).
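For what it's worth, the "this is a PhotoFriday post" menu item wouldn't have to do much. A minimal Python sketch, assuming a made-up table of post types and CSS class names (POST_CLASSES and wrap_post are illustrative, not anything mod_blog or Movable Type provides):

```python
# Hypothetical post types mapped to the CSS class of the wrapping division.
POST_CLASSES = {
    "photofriday": "photofriday",
    "review": "review",
}


def wrap_post(body: str, kind: str = "regular") -> str:
    """Wrap a post body in a <DIV> for its type; regular posts pass through."""
    css_class = POST_CLASSES.get(kind)
    if css_class is None:
        return body
    return f'<DIV CLASS="{css_class}">\n{body}\n</DIV>'
```

The layout itself still lives in the CSS file; all the editor has to remember is which division goes with which kind of post.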
Now, having finally formatted what I have, I will also say that this lack of good hypertext (or HTML) editors will also have an effect on the Semantic Web. There's been quite a bit of stir lately over the Semantic Web (stirred by Clay Shirky's essay, The Semantic Web, Syllogism, and Worldview) but except for a few diehard people who add semantic information to their webpages, it won't really take off until we get good HTML editors that will automagically include the required semantic information for us, and I don't see that happening any time soon.
For example, if you are using a web browser that supports the <ACRONYM> tag, you may notice that the TLAs and ETLAs are lightly underlined (at least, that appears to be the default for IE and Mozilla) and that if you mouse over them, the acronym is expanded in a small text window, giving you the meaning. I add that, by hand, to every acronym I use and yes, it does get to be a pain. I could automate that, but the problem there is that computers are rather bad at figuring out context. With only 17,576 TLAs available, there is definitely going to be some overlap. Take, for instance, IRA.
While the IRA may take actions against US interests that would affect Alice's (a member of the IRA) IRA, can an automated process work out which expansion of IRA should be used for each instance? Just ask yourself that question next time you ask YER computer two check you're spelling.
And while I'll probably never use the letters “I,” “R,” and “A” I would like to note that WAP, as a technical acronym, has two close meanings. There is WAP, which is a proprietary and expensive replacement for HTTP for cellphones, and WAP, which is how I get my laptop onto the network here in the Facility in the Middle of Nowhere, and while I tend to mention WAP quite often, I don't think I'll ever use WAP as I think it's quite silly (and I pity the person who has to read that paragraph in a browser that doesn't support the <ACRONYM> tag).
I suppose acronym expansion could work as spell checking does now: come across a potential TLA and, if it isn't expanded, offer up a choice of possible expansions, which may help to prevent IRA GERSHWIN from becoming an Individual Retirement Account GERSHWIN (fahrfenugen).
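Something along those lines might look like the following Python sketch. The expansion table is a tiny made-up stand-in, and suggest and expand are hypothetical names. The important part is that the program only offers choices and a human picks one (or none).

```python
import re

# 26**3 == 17,576 possible TLAs, so collisions are inevitable.
# Hypothetical expansion table; a real one would be far larger.
EXPANSIONS = {
    "IRA": ["Irish Republican Army", "Individual Retirement Account"],
    "WAP": ["Wireless Application Protocol", "Wireless Access Point"],
}

TOKEN = re.compile(
    r"<ACRONYM\b[^>]*>.*?</ACRONYM>"  # already expanded: skip the whole element
    r"|<[^>]+>"                       # any other tag: skip its attributes too
    r"|\b([A-Z]{3,4})\b",             # a candidate TLA or ETLA
    re.DOTALL,
)


def suggest(text):
    """Yield (offset, acronym, choices) for each untagged, known acronym."""
    for match in TOKEN.finditer(text):
        word = match.group(1)
        if word and word in EXPANSIONS:
            yield match.start(1), word, EXPANSIONS[word]


def expand(text, offset, word, title):
    """Wrap one occurrence in <ACRONYM>; title=None leaves it untouched."""
    if title is None:
        return text
    tagged = f'<ACRONYM TITLE="{title}">{word}</ACRONYM>'
    return text[:offset] + tagged + text[offset + len(word):]
```

Note that the sketch will still flag the IRA in IRA GERSHWIN; it's the "none of the above" choice (title=None) that keeps Mr. Gershwin from becoming a retirement account.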
And now I'm off to format what I've written since the last portion I've formatted. I would kill for a decent HTML editor that does The Right Thing™.