Tuesday, January 24, 2023
Notes on a seriously first world problem
“채널 하나 둘 셋 …” (“Channel one two three …”)
“Why is the TV speaking Korean?”
“채널 하나 둘 넷 …” (“Channel one two four …”)
“I don't know. It just started happening!”
“채널 하나 둘 다섯 …” (“Channel one two five …”)
“Let me see … wait! The configuration menu is also in Korean!”
“당연하지 …” (“Of course …”)
“I guess we're just going to have to learn Korean.”
“무아하하하하 …” (“Mwahahahaha …”)
Notes on an overheard conversation about “Muskrat Love” as it played on satellite radio
“If you would have asked me who sang that, I wouldn't have been able to answer.”
“Wow! Captain and Tennille! That takes me back.”
“Me too.”
“Do you want to know what else the Captain and Tennille reminds me of?”
“What?”
“The Bionic Watermelon.”
“What?”
“You are weird, sir.”
Monday, January 23, 2023
A few small differences
I received the following patch for my DNS library:
I am hoping to use this library to encode and decode mDNS queries and responses. It seems that the mDNS is mostly the same as unicast DNS, except for a few small differences which I aim to add to this PR as I encounter them.
Those “few small differences” turn out not to be so small.
The main RFCs for mDNS appear to be RFC-6762 and RFC-6763, and supporting them in full requires breaking changes to my library. The first issue is a pair of flags, defined in RFC-6762, that affect pretty much the entire codebase. One deals with “Questions Requesting Unicast Responses.” Most flags are defined in the header section, but this one uses “the top bit in the class field of a DNS question as the unicast-response bit.” And because mDNS specifically allows multiple questions, it seems it could be set per question, and not for the request as a whole, as the RFC states: “[w]hen this bit is set in a question, it indicates that the querier is willing to accept unicast replies in response to this specific query, as well as the usual multicast responses.” To me, that says, “each question needs a flag for a unicast response.” The other is the “outdated cache entry” bit, which applies to individual resource records and not to the request as a whole. And again, to me, that says, “each resource record needs a flag to invalidate previously cached values.”
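To make the overlap concrete, here's a minimal sketch (my own illustration, not code from SPCDNS) of how those mDNS flags share the 16-bit class field with the actual class value:

/* RFC-6762 overlays its flags onto the class field: the top bit of a
   question's class is the unicast-response bit, and the top bit of a
   resource record's class is the cache-flush ("outdated cache entry")
   bit.  The remaining 15 bits hold the real class. */

#include <stdbool.h>
#include <stdint.h>

#define MDNS_TOP_BIT    0x8000u /* QU bit in questions, cache-flush in records */
#define MDNS_CLASS_MASK 0x7FFFu

static bool mdns_top_bit(uint16_t class)
{
  return (class & MDNS_TOP_BIT) != 0;
}

static uint16_t mdns_class(uint16_t class)
{
  return (uint16_t)(class & MDNS_CLASS_MASK);
}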
How to handle this … well, one way would be to add a Boolean field to each resource record type to hide the protocol details (which was the point of this library, frankly). But that can break existing code, as the new fields will need initialization:
dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN;
domain.uc    = true; /* we want unicast reply */

/* and the other flag */

dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN;
addr.ttl     = 0;
addr.ic      = true; /* invalidate cache data */
addr.address = address;
and document that the uc and ic fields are for mDNS use; if you aren't using mDNS, then they should be set to false.
Another approach is to leak protocol details and require the user to do something like:
/* We're making a query and want a unicast reply */

dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN | UNICAST_REPLY;

/* We're replying to a query and want to invalidate this record */

dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN | INVALIDATE_CACHE;
addr.ttl     = 0;
addr.address = address;
And that's a less-breaking change, but on the decoding side, I still need some form of flag in the structure to indicate these flags were set, because otherwise data is lost.
I'm not sure which approach is best. The first does a better job of hiding the DNS protocol details, but breaks more code. The second is less breaking, as I could ignore any cache flags on encoding, but it leaks details of the DNS encoding into user code. I tend to favor the first, but I really dislike the breaking aspect of it. And that's just the first RFC.
The other RFC utilizes what I consider to be an implementation detail of the DNS protocol to radically alter how I handle text resource records. The RFC that defined modern DNS, RFC-1035, describes the format for a text resource record, but is silent as to semantics.
Individual resource records come with a 16-bit length, so in theory a resource record could be up to 65,535 bytes in size, though it's rare to get a record that large. The base type of a text resource record is a “string,” and RFC-1035 defines a “string” as one byte for the length, followed by that many bytes of content. A one-byte length limits a “string” to 255 bytes, which means, in practice, that a text resource record can contain several “strings.”
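In other words, the data portion of a text resource record is just a sequence of length-prefixed “strings.” A sketch of walking them (hypothetical names, not the actual SPCDNS decoder):

/* Walk the DNS "strings" in a text record's data: each "string" is a
   one-byte length followed by that many bytes of content. */

#include <stddef.h>

typedef void (*txt_string_cb)(char const *text,size_t len,void *user);

static int txt_walk(unsigned char const *rdata,size_t rdlen,txt_string_cb cb,void *user)
{
  size_t i = 0;

  while (i < rdlen)
  {
    size_t len = rdata[i++]; /* one-byte length prefix */

    if (i + len > rdlen)
      return -1; /* truncated "string"--malformed record */

    cb((char const *)&rdata[i],len,user);
    i += len;
  }

  return 0;
}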
How SPCDNS handles this now is that I assume a text resource record only has one value—a string:
typedef struct dns_txt_t /* RFC-1035 */
{
  char const  *name;
  dns_type_t   type;
  dns_class_t  class;
  TTL          ttl;
  size_t       len;
  char const  *text;
} dns_txt_t;
When encoding such a record, I break the given string into as few DNS “strings” as possible. Give this a 300-byte string, and you get two DNS “strings” encoded, one 255 bytes long and the other 45 bytes long. Upon decoding, all the “strings” in a single text resource record are concatenated into a single string. As I said, RFC-1035 doesn't go into the semantics of a text resource record, and I did what I felt was best.
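The encoding side of that looks something like this sketch (again hypothetical names, not the actual SPCDNS code):

/* Split a text value into DNS "strings": 255-byte chunks, each with a
   one-byte length prefix.  A 300-byte value becomes a 255-byte "string"
   and a 45-byte "string", using 302 bytes total.  (A truly empty value
   would still need a single zero-length "string", elided here.) */

#include <stddef.h>
#include <string.h>

static size_t txt_encode(unsigned char *dest,char const *text,size_t len)
{
  size_t used = 0;

  while (len > 0)
  {
    size_t chunk = (len > 255) ? 255 : len;

    dest[used++] = (unsigned char)chunk; /* the length prefix */
    memcpy(&dest[used],text,chunk);
    used += chunk;
    text += chunk;
    len  -= chunk;
  }

  return used;
}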
RFC-6763 uses the DNS “string” encoding for semantic information:
Apple TV - Office._airplay._tcp.local. 10 IN TXT (
        "acl=0"
        "btaddr=00:00:00:00:00:00"
        "deviceid=A8:51:AB:10:21:AE"
        "fex=1d9/St5/FbwooQ"
        "features=0x4A7FDFD5,0xBC157FDE"
        "flags=0x18644"
        "gid=F014C3FF-1420-4374-81DE-237CD6892579"
        "igl=1"
        "gcgl=1"
        "model=AppleTV14,1"
        "protovers=1.1"
        "pi=c6fe9e6e-cec2-44c8-9c66-8994c6ad47"
        "depsi=4A342DB4-3A0C-47A6-9143-9F6BF83F0EDD"
        "pk=5ab1ac3988a6a358db0a6e71a18d31b8d525ec30ce81a4b7b20f2630449f6591"
        "srcvers=670.6.2"
        "osvers=16.2"
        "vv=2"
)
I have to admit, this is ingenious—each DNS “string” here defines a name/value pair. But I did not see this use coming at all.
I wonder how much code out there dealing with DNS packets (not specifically mDNS) would treat these records:
IN TXT "v=spf1 +mx +ip4:71.19.142.20/32 -all"
IN TXT "google-site-verification=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
the same way as:
IN TXT (
"v=spf1 +mx +ip4:71.19.142.20/32 -all"
"google-site-verification=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
)
The first returns two text resource records, each consisting of a single DNS “string”; the second returns one text resource record with two DNS “strings.” My gut feeling is “not many would deal with the second format,” but I can't know that for sure.
And changing how I deal with text resource records in SPCDNS would be a major breaking change.
This is one change I really don't know how to approach.
Thursday, January 19, 2023
The good news? Somebody wants to use my blogging engine. The bad news? Somebody wants to use my blogging engine
Over the 23-year history of mod_blog, I've given up on the notion of anyone other than me using it. There was only one other person who used it, for just a few months before deciding blogging wasn't for him, and that was way back in 2002. So it came as a complete surprise when I recently received a bug report on it.
Oh my … someone else is trying to use it.
I never did fully document it. And there are, as I'm finding, an amazing number of things I'm assuming about the environment, such as:
- That it's running under Apache. I do make use of the environment variable $DOCUMENT_ROOT, which technically is Apache-specific (it's not documented in the CGI RFC, “The Common Gateway Interface Version 1.1”) and, as I found out over the years, the variables $REDIRECT_REMOTE_USER and $REDIRECT_BLOG_CONFIG. Other web servers might not define those, or might work differently. I don't know—I have only ever used mod_blog with Apache.

- How to configure Apache to run mod_blog. I wanted to hide the fact that I'm running a CGI program to drive my blog, not for “security-through-obscurity” reasons, but for “easy to understand and modify the URL” reasons. I think a URL like https://boston.conman.org/2023/01/19.1 looks much nicer than https://boston.conman.org/boston.cgi?2023/01/19.1 (and never mind the hideousness of https://boston.conman.org/cgi-bin/boston.cgi?year=2023&month=1&day=19&entry=1 that it could have been). The other benefit is that if I ever do get around to making mod_blog an actual Apache module (which was my original intent), links won't break.

- As such, I use Apache's RewriteRule to map all requests through mod_blog. The code base also assumes this, as it relies upon the environment variable $PATH_INFO always being set, which isn't a given, depending upon how a CGI program is referenced via the web.

- The environment variable $BLOG_CONFIG is set to the configuration file. The configuration file can be specified either on the command line or via the environment variable. I added the environment variable option to avoid having to embed the location in the executable, or expose it in the query portion of a URL. And again, this comes back to the previous point—how to configure this under Apache (SetEnv is the answer; see the sketch after this list). I also have it set in my own (command-line) environment, as it makes testing easy. It also makes it easy to fix spelling mistakes on the server, as I can edit the files directly, which leads into the next point.

- All the files used by mod_blog are readable and writable by the program. My blog is, as far as I can tell, unique in that I can send in posts via email, in addition to a web page. Email support, for me, was non-negotiable. I get to use my preferred editor for writing, and by posting via email, everything is handled automatically. I'm not aware of any other blogging system set up this way, and it's only viable because I run my own email server on the same box as my web server. The issue becomes one of permissions: the web server runs as its own user; email is delivered as the user of the recipient; both can add new posts. I solved that issue by making mod_blog always run under my userid (it's “setuid” for the technically proficient). This means I don't have to make a bunch of files world-writable on my server. I can edit the files directly as me, and I can add entries via the web, email, or as a file from the command line (which mod_blog also supports).
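For illustration, the Apache side of all that amounts to something like the following sketch—the path and the exact rewrite pattern here are my own invention, not from any mod_blog documentation:

# Hypothetical vhost configuration: point mod_blog at its
# configuration file ...
SetEnv BLOG_CONFIG /path/to/blog.conf
# (after a rewrite, Apache hands this to the CGI program as
# REDIRECT_BLOG_CONFIG---which is why that variable shows up above)

# ... and route every request through the CGI program, so $PATH_INFO
# is always set (excluding the program itself to avoid a rewrite loop).
RewriteEngine On
RewriteCond   %{REQUEST_URI} !^/cgi-bin/
RewriteRule   ^/(.*)$ /cgi-bin/boston.cgi/$1 [L]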
And that's just off the top of my head. There are probably more assumptions I'm just not thinking of. It's issues like these where one can spend 90% of the time writing 90% of the code, and then spend another 90% of the time writing the final 10% of the code and documentation.
I'm also amused by the timing.
Back in August, I removed a ton of optional code that I never used, and because no one else was using mod_blog, it was just sitting there untested.
And now someone wants to use the code.
Heh.
But also, gulp! I've got 23 years of experience with the code, so I know all the ins and outs of using it. Documenting this? So someone else can use this? Good lord!
Monday, January 16, 2023
The other SFTP that never was
For reasons, I'm doing some research into the history of FTP when I come across an RFC for SFTP (RFC-913). Only this isn't the SFTP in use today, but instead the Simple File Transfer Protocol from 1984. Unlike TFTP, it uses TCP, and unlike FTP, it only uses a single network connection.
But this bit is why I'm writing about this:
Random Access
Pro: Wouldn't it be nice if (WIBNIF) SFTP had a way of accessing parts of a file?
Con: Forget it, this is supposed to be SIMPLE file transfer. If you need random access use real FTP (oops, real FTP doesn't have random access either – invent another protocol?).
Resolution: I have not made any provision for Random Access.
That “other protocol” would take several more years to be invented, and then take over the networking world.
Thursday, January 12, 2023
It's probably a good thing some malformed URLs are considered “valid”
It seems it's all too easy to generate double slashes in the path component of a URL—I received a report via email that my current feed files all had that issue.
Sigh.
I made a change a few months ago in how I internally store the base URL of my blog. It used to be that I did not store the trailing slash (so "https://boston.conman.org/" would be stored as "https://boston.conman.org"), so I had code to keep adding it back in when generating links. I changed the code to store the trailing slash, but missed one section of code; because I don't subscribe to any of my feed files, I didn't notice the issue.
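The failure mode is the classic one when joining URL components—if both sides contribute a slash, you get two. Something like this sketch (not the actual mod_blog code, just the shape of the bug):

#include <stdio.h>

int main(void)
{
  char const *baseurl = "https://boston.conman.org/"; /* now stored with the slash */
  char const *path    = "2023/01/12.1";
  char        link[256];

  /* the old habit: add the slash back in */
  snprintf(link,sizeof(link),"%s/%s",baseurl,path);
  puts(link); /* https://boston.conman.org//2023/01/12.1 --- oops */

  /* correct, now that the base URL keeps its trailing slash */
  snprintf(link,sizeof(link),"%s%s",baseurl,path);
  puts(link); /* https://boston.conman.org/2023/01/12.1 */

  return 0;
}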
I also fixed an actual crashing bug. All I have to say about that is that web robots are quite good at generating really garbage requests using a variety of methods—it's like free fuzz testing! Woo hoo! Sob!
Wednesday, January 11, 2023
It's apparently a valid URL, despite it being malformed in my opinion
I've had a few posts make it to the front page of Lobsters. Lobsters supports webmention, yet I never received a webmention for those posts. I checked the logs, and yes, they were received, but I rejected them with a “bad request.” It took a bit of sleuthing, but I found the root cause—the URL of my post was, according to my code, invalid.

Lobsters was sending in a URL of the form https://boston.conman.org//2023/01/02.1—notice the two slashes in front of the path. My code was having none of that.
I'm not sure why Lobsters was sending a URL of that form, as previous webmentions worked fine, but when I checked previous submissions to Lobsters, I saw that some of the links had a double slash in the path portion. As it's considered valid by the What Working Group? “living standard,” I ended up having to accept what I consider a malformed URL.
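What makes the double slash “valid” is that the WHATWG parser simply treats it as an empty path segment. A quick illustration (my own sketch, not my webmention code):

#include <stdio.h>
#include <string.h>

int main(void)
{
  char const *path = "//2023/01/02.1"; /* the path portion Lobsters sent */
  char const *seg  = path + 1;         /* skip the leading '/' */

  while (seg != NULL)
  {
    char const *end = strchr(seg,'/');
    size_t      len = (end != NULL) ? (size_t)(end - seg) : strlen(seg);

    if (len == 0)
      puts("(empty segment---this is what my code rejected)");
    else
      printf("%.*s\n",(int)len,seg);

    seg = (end != NULL) ? end + 1 : NULL;
  }

  return 0;
}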
Sigh.
Monday, January 09, 2023
An epiphany about bloated web pages might be the result of a dumb network
I was scared by an epiphany I had the other day when I read a quote by John Carmack. But before I get to the quote and the epiphany, I need to give some background to understand where I was, and where I am.
First, for the years I was working for The Corporation (and later, The Enterprise), I was, in essence, working in telephony networking, and I was never a fan of telephony networking (the Protocol Stack From Hell notwithstanding).
Basically, the paradigm in telephony is a “smart network” and a “dumb edge.” All the “intelligence” of a telephony application is on the network side of things—the “edge” here being the device the end user interacts with. In the old days, that was an on-off switch, a microphone, and a speaker; later models of the device included a tone generator. So any feature needed to be handled on the network side, because the end-user device (the “edge”) was incapable of doing much at all. If a person wants a new feature, they have to get it implemented on the entire network, or it's effectively not supported at all (because there's not much one can do with an on-off switch, speaker, microphone, and tone generator).
Contrast this with the Internet—it's a “dumb network” with a “smart edge”—all the network has to do is sling packets back and forth, not concerning itself with the contents. The “edge” in this case is (was?) a general purpose computer that can be programmed to do just about anything. So if a person wants a new feature, all that's needed is a program on at least two endpoints and said feature exists—there's no need to inform the rest of the network of it, as long as the “dumb network” can do its job and sling the data between the two endpoints. Want an alternative to the web? Just do it. Want an alternative to IRC? Just do it.
Second, I have always had a hard time understanding why people keep insisting on writing bespoke web browsers in JavaScript that just show text, when the user is already using a web browser that has already been written to display text. The poster child for this (in my opinion) is the Portland Pattern Repository, a large repository of programming wisdom, whose creator, Ward Cunningham, felt, for whatever reason, that a normal web browser wasn't good enough to browse a text-only website, and thus the site demands the latest and greatest in JavaScript conformance just to view text. He's free to do so, but I find it annoying that I can no longer read a site I enjoyed (and even contributed to), just because I haven't updated my browser in the past twenty minutes. I'm not even asking to participate in editing the site any more—I just want to read it!
And finally we get to the John Carmack quote:
It is amusing to consider how much of the world you could serve something like Twitter to from a single beefy server if it really was just shuffling tweet sized buffers to network offload cards. Smart clients instead of web pages could make a very large difference.
Oh crap.
“Smart clients”—“smart edge.”
“Web pages”—“data.”
My dislike of the Portland Pattern Repository just got run over by my liking of dumb networks and smart edges.
Ward Cunningham wants a smarter edge to view his site (and to “improve server performance” if you read the comments in the web page returned from the site) and I can't begrudge him that—I like smart edges! It makes more sense to me than a smart network. But at the same time, I want a web site to just return text to a “dumb browser,” even if the browser I'm using is not particularly dumb.
Do we, in fact, have too much intelligence in web servers? Do we want to push all the intelligence to the client? Do I have to reconcile my love of simple web clients and intelligent web servers with my love of the dumb network and smart edges? (And to spell it out—the “network” in this analogy is the web server and the “edge” is the web browser) Where does the simplicity need to reside?