Monday, November 13, 2017
It shouldn't be this hard to support another syndication feed format
A few days ago I came across a new syndication feed format (like RSS or Atom)—JSON Feed:
We — Manton Reece and Brent Simmons — have noticed that JSON has become the developers’ choice for APIs, and that developers will often go out of their way to avoid XML. JSON is simpler to read and write, and it’s less prone to bugs.
So we developed JSON Feed, a format similar to RSS and Atom but in JSON. It reflects the lessons learned from our years of work reading and publishing feeds.
See the spec. It’s at version 1, which may be the only version ever needed. If future versions are needed, version 1 feeds will still be valid feeds.
It's not like I need another syndication format, and it's still unclear just how popular JSON Feed really is, but hey, I thought, it should be pretty easy to add this. It looks simple enough:
{
  "version": "https://jsonfeed.org/version/1",
  "title": "My Example Feed",
  "home_page_url": "https://example.org/",
  "feed_url": "https://example.org/feed.json",
  "items": [
    {
      "id": "2",
      "content_text": "This is a second item.",
      "url": "https://example.org/second-item"
    },
    {
      "id": "1",
      "content_html": "<p>Hello, world!</p>",
      "url": "https://example.org/initial-post"
    }
  ]
}
I just need to add another entry to the template section of the configuration file, create a few template files, and as they say in England, “the brother of your mother is Robert” (how they know my mother's brother is Robert, I don't know; the English are weird like that).
But the issue is filling in the content_text field. The first issue: JSON is encoded using UTF-8. For me, that's not an issue, as I'm using UTF-8 (and even before I switched to using UTF-8, I was using ASCII, which is valid UTF-8 by design). But in theory, someone could be using mod_blog with some other encoding scheme, which means an invalid JSON Feed unless fed through a character set conversion routine, which I don't support in mod_blog.
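A minimal sketch of what such a character set conversion routine might look like in C, assuming the source encoding is ISO-8859-1 (the simplest case, since every byte maps directly to the Unicode code point of the same value). The function name `latin1_to_utf8` and its snprintf-style return convention are hypothetical and are not taken from mod_blog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of mod_blog: convert a NUL-terminated
 * ISO-8859-1 string to UTF-8. Bytes 0x80-0xFF become two-byte sequences
 * (0x80-0x9F are treated as C1 controls, not as Windows-1252).
 * Returns the number of bytes the full conversion needs, excluding the
 * NUL, so a return value >= dstsize means the output was truncated. */
size_t latin1_to_utf8(const char *src, char *dst, size_t dstsize)
{
  size_t need      = 0;  /* bytes required for the whole input */
  size_t out       = 0;  /* bytes actually written to dst      */
  int    truncated = 0;

  assert(src != NULL);
  assert(dst != NULL || dstsize == 0);

  for ( ; *src != '\0'; src++)
  {
    unsigned char c   = (unsigned char)*src;
    size_t        len = (c < 0x80) ? 1 : 2;

    /* Emit only whole sequences, always leaving room for the NUL. */
    if (!truncated && out + len < dstsize)
    {
      if (c < 0x80)
        dst[out++] = (char)c;
      else
      {
        dst[out++] = (char)(0xC0 | (c >> 6));
        dst[out++] = (char)(0x80 | (c & 0x3F));
      }
    }
    else
      truncated = 1;

    need += len;
  }

  if (dstsize > 0)
    dst[out] = '\0';
  return need;
}
```

Calling it once with a zero-sized buffer gives the size to allocate. Any other source encoding would need a lookup table or a library such as iconv rather than this bit shuffling.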
But even assuming I did, that still doesn't mean I'm out of the woods.
Suppose this was my content:
<p>"Hello," said the politician, lying.</p>

<p>"Back up!" I said, using my left hand to quickly cover my wallet in my back pocket. "You aren't getting any money from me!"</p>
If you check the syntax of JSON, you'll see that the double quote character " needs to be converted to \". A similar transformation is required for the blank line, being converted to \n. And I have no code written in mod_blog for such conversions.
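As a rough sketch of the conversion described above, the following C function escapes text for use inside a JSON string. It assumes the input is already valid UTF-8 and passes bytes of 0x80 and above through untouched. The name `json_escape` and the snprintf-style return value are assumptions made for illustration; nothing here comes from mod_blog itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of mod_blog: escape NUL-terminated
 * UTF-8 text so it can sit between the quotes of a JSON string.
 * Returns the number of bytes the full result needs, excluding the
 * NUL, so a return value >= dstsize means the output was truncated. */
size_t json_escape(const char *src, char *dst, size_t dstsize)
{
  size_t need      = 0;
  size_t out       = 0;
  int    truncated = 0;

  assert(src != NULL);
  assert(dst != NULL || dstsize == 0);

  for ( ; *src != '\0'; src++)
  {
    unsigned char c = (unsigned char)*src;
    char          esc[7];  /* longest escape is \uXXXX plus NUL */
    size_t        len;

    switch (c)
    {
      case '"':  strcpy(esc, "\\\""); break;
      case '\\': strcpy(esc, "\\\\"); break;
      case '\n': strcpy(esc, "\\n");  break;
      case '\r': strcpy(esc, "\\r");  break;
      case '\t': strcpy(esc, "\\t");  break;
      default:
        if (c < 0x20)  /* remaining control characters */
          snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
        else
        {
          esc[0] = (char)c;
          esc[1] = '\0';
        }
        break;
    }

    len = strlen(esc);

    /* Emit only whole escapes, always leaving room for the NUL. */
    if (!truncated && out + len < dstsize)
    {
      memcpy(dst + out, esc, len);
      out += len;
    }
    else
      truncated = 1;

    need += len;
  }

  if (dstsize > 0)
    dst[out] = '\0';
  return need;
}
```

The blank line between the two paragraphs is two line breaks, so it comes out as `\n\n`. The HTML tags themselves need no treatment here, since `<` and `>` are ordinary characters in a JSON string.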
It's not like it would be that much code to write. When I added support for RSS and Atom, I had to write code. But it irks me that I have to special case a lot of string processing.
Yes, yes, I know: mod_blog is written in C, which is a horrible choice for string processing. But even if I picked a language better suited to the task, I would still have to write code to manually transform strings from, say, ISO-8859-1 to UTF-8 and code to convert HTML to a form of non-HTML:
&lt;p&gt;"Hello," said the politician, lying.&lt;/p&gt;

&lt;p&gt;"Back up!" I said, using my left hand to quickly cover my wallet in my back pocket. "You aren't getting any money from me!"&lt;/p&gt;
(Not to get all meta, but to display the first example HTML, I had to encode it into the non-HTML you see above, and to display the non-HTML you see above, I have to encode the non-HTML into non-non-HTML, or in other words, convert the output yet again. So, to show a simple & in this page, I have to encode it as &amp;, and to show that, I have to encode it as &amp;amp;, in ever deepening layers of Inception-like encoding. By the way, that was encoded as &amp;amp;amp;, just for your information.)
I spent way too much time trying to generalize a solution, only to ultimately reject the code. I'll probably just add the code I need to support JSON Feed and call it a day, because solving the issue once and for all is just too much work.