The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, October 29, 2023

Adventures in Utext

There is one point on the ASCII ↔︎ JS spectrum that I haven’t seen, and it’s one that, as I use Unicode in more complex ways on Gwern.net and have learned how many obscure features or characters Unicode has, I increasingly think has been neglected: only UTF-8 text rendered by a monospace font. Not ASCII, not a weird subset of SGML, not troff, not raw terminal codes, not bitmaps encoded in ASCII—just UTF-8. This document format does only what pure Unicode text can do—but does everything that pure Unicode can do, which turns out to be a lot. What if we take Unicode literally, but not seriously?

Your typical plain text output strips all formatting. At the most ambitious, it might have a Unicode superscript or fraction. But we can do so much more!

Utext: Rich Unicode Documents · Gwern.net

That was an interesting read (your mileage may vary).

To generate the gopher and Gemini versions of my blog, I parse the HTML and generate either plain text (for gopher) or Gemtext for Gemini. And I'm still not entirely happy with the output. For emphasized text, I would translate that to “*emphasized*”, which is … okay, I guess? And for deleted text—that was a harder to deal with, and I ended up with “[DELETED-deleted-DELETED]” text.

There's no excuse for that.

But after reading about Utext, and Uncode's COMBINING SHORT STROKE OVERLAY and COMBINING LOW LINE I thought I might try using those for some typographical niceties that you don't normally get with plain text. And that's when I learned that not all virtual terminals support all of Unicode all that well. And wraping text is … not that trivial anymore.

Ah well. For now, it seems to be working, but it remains to be seen if I like the results.

Update on Friday, December 8th, 2023

I reverted this change due to issues.

Obligatory Picture

[The future's so bright, I gotta wear shades]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.