The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, January 24, 2023

Notes on a seriously first world problem

“채널 하나 둘 셋 …”

“Why is the TV speaking Korean?”

“채널 하나 둘 넷 …”

“I don't know. It just started happening!”

“채널 하나 둘 다섯 … ”

“Let me see … wait! The configuration menu is also in Korean!”

“당연하지 …”

“I guess we're just going to have to learn Korean.”

“무아하하하하 …”


Notes on an overheard conversation about “Muskrat Love” as it played on satellite radio

“If you would have asked me who sang that, I wouldn't have been able to answer.”

“Wow! Captain and Tennille! That takes me back.”

“Me too.”

“Do you want to know what else the Captain and Tennille reminds me of?”

“What?”

“The Bionic Watermelon.”

“What?”

The Bionic Watermelon.”

“You are weird, sir.”

Monday, January 23, 2023

A few small differences

I received the following patch for my DNS library:

I am hoping to use this library to encode and decode mDNS queries and responses. It seems that the mDNS is mostly the same as unicast DNS, except for a few small differences which I aim to add to this PR as I encounter them.

Mdns mods by oviano · Pull Request #13 · spc476/SPCDNS

Those “few small differences” turn out not to be so small.

The main RFCs for mDNS appear to be RFC-6762 and RFC-6763 and to support them in full requires breaking changes to my library. The first are a bunch of flags, defined in RFC-6762 and it affects pretty much the entire codebase. The first deals with “Questions Requesting Unicast Responses.” Most flags are defined in the header section, but for this, it's “the top bit in the class field of a DNS question as the unicast-response bit.” And because mDNS specifically allows multiple questions, it's seems like it could be set per-question, and not per the request as a whole, as the RFC states: “[w]hen this bit is set in a question, it indicates that the querier is willing to accept unicast replies in response to this specific query, as well as the usual multicast responses.” To me, that says, “each resource record needs a flag for a unicast reponse.” The other bit the “outdated cache entry” bit. which again applies to individual resource records and not to the request as a whole. And again, to me, that says, “each resoure record needs a flag to invalidate previously cached values.”

How to handle this … well, one way would be to a Boolean field to each resource record type to hide protocol details (which was the point in this library frankly). But that can break existing code as the new fields will need initialization:

dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN;
domain.uc    = true; /* we want unicast reply */

/* and the other flag */

dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN;
addr.ttl     = 0;
addr.ic      = true; /* invalidate cache data */
addr.address = address;

and document that the uc and ic fields are for mDNS use; if you aren't using mDNS, then they should be set to false.

Another approach is to leak protocol details and require the user to do something like:

/* We're making a query and want a unicast reply */
dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN | UNICAST_REPLY;

/* We're replying to a query and want to invalidate this record */
dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN | INVALIDATE_CACHE;
addr.ttl     = 0;
addr.address = address;

And that's a less-breaking change, but on the decoding side, I still need some form of flag in the structure to indicate these flags were set because otherwise data is lost.

I'm not sure which approach is best. The first does a better job of hiding the DNS protocol details, but breaks more code. The second is less breaking, as I could ignore any cache flags on encoding, but it leaks details of DNS encoding to user code. I tend to favor the first but I really dislike the breaking aspect of it. And That's just the first RFC.

The other RFC utilizes what I consider to be an implementation detail of the DNS protocol to radically alter how I handle text resource records. The RFC that defined modern DNS, RFC-1035, describes the format for a text resource record, but is silent as to semantics.

Individual resource records come with a 16-bit length, so in theory, a resource record could be up to 65535 bytes in size, but it's rare to get a record that size. The base type of a text resource record is a “string.” and RFC-1035 defines a “string” as one byte for the length, followed by that many bytes as the contents. The length of a “string” is defined as one byte, which limits the length of 255 bytes in size. This means, in practice, that a text resource record can contain several “strings.”

How SPCDNS handles this now is that I assume a text resource record only has one value—a string:

typedef struct dns_txt_t        /* RFC-1035 */
{
  char const  *name;
  dns_type_t   type;
  dns_class_t  class;
  TTL          ttl;
  size_t       len;
  char const  *text;
} dns_txt_t;

When encoding such a record, I break the given string into as few DNS “strings” as possible. Give this a 300 byte string, and you get two DNS “strings” encoded, one being 255 byte long, and the other one 45 bytes long. Upon decoding, all the strings in a single text resource record are concatenated into a single string. As I said, DNS-1035 doesn't go into the semantics of a text resource record, and I did what I felt was best.

RFC-6763 uses the DNS “string” encoding for semantic information:

Apple TV - Office._airplay._tcp.local.	   10	IN	TXT	(
	"acl=0"
	"btaddr=00:00:00:00:00:00"
	"deviceid=A8:51:AB:10:21:AE"
	"fex=1d9/St5/FbwooQ"
	"features=0x4A7FDFD5,0xBC157FDE"
	"flags=0x18644"
	"gid=F014C3FF-1420-4374-81DE-237CD6892579"
	"igl=1"
	"gcgl=1"
	"model=AppleTV14,1"
	"protovers=1.1"
	"pi=c6fe9e6e-cec2-44c8-9c66-8994c6ad47"
	"depsi=4A342DB4-3A0C-47A6-9143-9F6BF83F0EDD"
	"pk=5ab1ac3988a6a358db0a6e71a18d31b8d525ec30ce81a4b7b20f2630449f6591"
	"srcvers=670.6.2"
	"osvers=16.2"
	"vv=2"
	)

I have to admit, this is ingenious—each DNS “string” here defines a name/value pair. But I did not see this use at all.

I wonder how much code out there dealing with DNS packets (not specifically mDNS) would treat these records:

	IN	TXT	"v=spf1 +mx +ip4:71.19.142.20/32 -all" 
	IN	TXT	"google-site-verification=XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXX"

the same way as:

	IN	TXT	(
		"v=spf1 +mx +ip4:71.19.142.20/32 -all" 
		"google-site-verification=XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXX"
	)

The first returns two text resource records, each consisting of a single DNS “string,” the second one text resource record but with two DNS “strings.” My gut feeling is “not many would deal with the second format” but I can't know that for sure.

And changing how I deal with text resource records in SPCDNS would be a major breaking change.

This is one change I really don't know how to approach.

Thursday, January 19, 2023

The good news? Somebody wants to use my blogging engine. The bad news? Somebody wants to use my blogging engine

Over the 23 year history of mod_blog, I've given up on the notion of anyone other than me using it. There was only one other person who used it for just a few months before deciding blogging wasn't for him and that was way back in 2002. So it was completely by surprise that I recently received a bug report on it.

Oh my … someone else is trying to use it.

I never did fully document it. And there are, as I'm finding, an amazing number of things I'm assuming about the environment, such as:

And that's just off the top of my head. There's probably more assumptions made that I'm just not thinking of. It's issues like these where one can spend 90% of the time writing 90% of the code, and then spend another 90% of the time writing the final 10% of the code and documentation.

I'm also amused by the timing. Back in August, I removed a ton of optional code that I never used, and because no one else was using mod_blog, it was just sitting there untested. And now someone wants to use the code.

Heh.

But also, gulp! I've got 23 years of experience with the code, so I know all the ins and outs of using it. Documenting this? So someone else can use this? Good lord!

Monday, January 16, 2023

The other SFTP that never was

For reasons, I'm doing some research into the history of FTP when I come across an RFC for SFTP. Only this isn't the SFTP that is used today, but instead the Simple File Transfer Protocol from 1984. Unlike TFTP, it uses TCP, and unlike FTP, it only uses a single network connection.

But this bit is why I'm writing about this:

Random Access

Pro: Wouldn't it be nice if (WIBNIF) SFTP had a way of accessing parts of a file?

Con: Forget it, this is supposed to be SIMPLE file transfer. If you need random access use real FTP (oops, real FTP doesn't have random access either – invent another protocol?).

Resolution: I have not made any provision for Random Access.

That “other protocol” would take several more years to be invented, and then take over the networking world.

Thursday, January 12, 2023

It's probably a good thing some malformed URLs are considered “valid”

It seems it's all too easy to generate double slashes in the path component of a URL, because I received via email a report that my current feed files all had that issue.

Sigh.

I made a change a few months ago in how I internally store the base URL of my blog. It used to be that I did not store the trailing slash (so that "https://boston.conman.org/" would be stored as "https://bost.conman.org") so I had code to keep adding it back in when generating links. I changed the code to store the tailing slash, but missed one section of code because I don't subscribe to any of my feed files and didn't notice the issue.

I also fixed an actual crashing bug. All I have to say about that is that web robots are quite good at generating really garbage requests using a variety of methods—it's like free fuzz testing! Woo hoo! Sob!

Wednesday, January 11, 2023

It's apparently a valid URL, despite it being malformed in my opinion

I've had a few posts make it to the front page of Lobsters. Lobsters supports webmention, yet I never received a webmention for those two posts. I checked the logs and yes, they were received but I rejected them with a “bad request.” It took a bit of sleuthing, but I found the root cause—the URL of my post was, accoring to my code, invalid. Lobsters was sending in a URL of the form https://boston.conman.org//2023/01/02.1—notice the two slashes in front of the path. My code was having none of that.

I'm not sure why Lobsters was sending a URL of that form as previous webmentions worked fine, but when I checked previous submissions to Lobsters I saw some of the links had a double slash in the path portion. As it's considered valid by the What Working Group? “living standard,” I ended up having to accept what I consider a malformed URL.

Sigh.

Monday, January 09, 2023

An epiphany about bloated web pages might be the result of a dumb network

I was scared by an epiphany I had the other day when I read a quote by John Carmack. But before I get to the quote and the ephiphany, I need to give some background to understand where I was, and where I am.

First, for the years I was working for The Corporation (and later, The Enterprise), I was in essense, working in telephony networking, and I was never a fan of telephony networking (the Protocol Stack From Hell notwithstanding).

Basically, the paradigm in telephony is a “smart network” and a “dumb edge.” All the “intelligence” of an application on telephony is on the network side of things—the “edge” here being the device the end user interacts with. In the old days, this was an on-off switch, a microphone and a speaker. Later models this device included a tone generator. So any features needed to be handled on the network side because the end user device (the “edge”) was incapable of doing much at all. If a person wants a new feature, they have to get it implemented on the entire network, or it's effectively not supported at all (because there's not much one can do with an on-off switch, speaker, microphone and a tone generator).

Contrast this with the Internet—it's a “dumb network” with a “smart edge”—all the network has to do is sling packets back and forth, not concerning itself with the contents. The “edge” in this case is (was?) a general purpose computer that can be programmed to do just about anything. So if a person wants a new feature, all that's needed is a program on at least two endpoints and said feature exists—there's no need to inform the rest of the network of it, as long as the “dumb network” can do its job and sling the data between the two endpoints. Want an alternative to the web? Just do it. Want an alternative to IRC? Just do it.

Second, I have always had a hard time understanding why people keep insisting on writing bespoke web browsers in JavaScript that just show text, when the user is already using a web browser has already been written to display text. The poster child for this (in my opinion) is the Portland Pattern Repository, a large repository of programming wisdom, that, for whatever reason, Ward Cunningham (creator of the site) felt that a normal web browser wasn't good enough to browse a text-only website and thus demands the latest and greatest in JavaScript conformance to view text. He's free to do so, but I find it annoying that I can no longer read a site I enjoyed (and even contributed to), just because I haven't updated my browser for the past twenty minutes. I'm not even asking to participate in editing the site any more, I just want to read it!

And finally we get to the John Carmack quote:

It is amusing to consider how much of the world you could serve something like Twitter to from a single beefy server if it really was just shuffling tweet sized buffers to network offload cards. Smart clients instead of web pages could make a very large difference.

John Carmack Tweet

Oh crap.

“Smart clients”—“smart edge.”

“Web pages”—“data.”

My dislike of the Portland Pattern Repository just got ran over by my liking of dumb networks and smart edges.

Ward Cunningham wants a smarter edge to view his site (and to “improve server performance” if you read the comments in the web page returned from the site) and I can't begrudge him that—I like smart edges! It makes more sense to me than a smart network. But at the same time, I want a web site to just return text to a “dumb browser,” even if the browser I'm using is not particularly dumb.

Do we, in fact, have too much intelligence in web servers? Do we want to push all the intelligence to the client? Do I have to reconcile my love of simple web clients and intelligent web servers with my love of the dumb network and smart edges? (And to spell it out—the “network” in this analogy is the web server and the “edge” is the web browser) Where does the simplicity need to reside?


Discussions about this entry

Obligatory Picture

[It's the most wonderful time of the year!]

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2023 by Sean Conner. All Rights Reserved.