The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, January 01, 2023

“I love the smell of black powder at night”

I think I'm resigned to the fact that every January 1st and July 4th I have to suffer living in a war zone. Unlike past years, Bunny and I decided to head outside and at least enjoy the show. The most impressive ones were at the other end of our street—huge ones shot perhaps only a couple hundred feet in the air, if that, with the still-lit embers nearly hitting the nearby roofs, accompanied by the loud thunderclap a second or two later.

Ouch.

Other than the loud explosions all around us, it was otherwise a quiet New Year.

Yeah.

HAPPY NEW YEAR!

Monday, January 02, 2023

Some notes on working with old C code

Recent postings have concerned testing software–therefore, how can we test trek? A goal is good to have, as that may influence what we test and how we go about it. Say we want to modernize the source code,

int main(argc, argv)
int argc;
char **argv;
{
    int ch;
    ...

perhaps on account of it using rather old C that modern C compilers might eventually come to hate. A code refactor may break the game in various ways; whether the code still compiles is a test, but may not catch the logic bugs we may introduce: sorting a list the wrong way, doubling damage numbers, things like that.

Trek

The post is briefly about testing the old game Star Trek that was popular in the 70s; in this case, it's been ported to C, probably sometime in the 80s. The game is interactive, so writing any form of tests (much less “unit tests”) will be challenging, to say the least (the post does go on to say that using expect would probably be in order).

I have a bit of experience here with older C code. At one point, I got interested in Viola, an early graphical web browser from 1992, and it's the type of C code that gives C a bad name. It was written in the transition period between K&R C and ANSI C and had to cater to both, so function prototypes were spotty at best. It also made several horrible assumptions about primitive data types—mainly that pointers, integers and long integers were all interchangeable, and there were plenty of cases where signed quantities were compared to unsigned quantities.

Horrible stuff.

The first thing I did was rewrite the Makefile to simplify it. The original build system was a mess of scripts and over 7,000 lines of make across 48 files; I was able to get it down to just 50 lines of make in one file. Of course, I'm only interested in getting this to compile on POSIX systems with X Windows, so it's easy to simplify the build system.

Second step—crank the C compiler warnings to 11, and fix all the warnings. That's when I started finding all the buried bodies—longs and pointers interchanging, questionable casts, signed and unsigned comparisons, and much, much more. I got it to the point where it works on a 32-bit system, and it compiles on a 64-bit system but promptly crashes. And by “works” I mean I can browse gopher with it, but trying to surf the modern web is laughably disastrous.

But I digress—the first step is to crank the compiler warnings to 11 and fix all the warnings, then convert the K&R C function declarations to ANSI C.
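
For the trek code quoted above, the conversion is mostly mechanical. The K&R-style definition of main() becomes (body elided, as in the quote):

int main(int argc, char **argv)
{
    int ch;
    ...

And “cranking the warnings to 11” with GCC means something along the lines of gcc -Wall -Wextra -pedantic, which will also flag any call site that disagrees with the newly converted prototypes.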

Does that count as “refactoring?” I personally don't think so—it's just mechanical changes and maybe fixing a few declarations from plain int to unsigned int (or size_t). And once that is done, then you can think about refactoring the code.


It still surprises me what some find difficult to do

There have been ongoing discussions in Gemini about a webmention-like mechanism, so I was intrigued by this statement:

The problem here is, that this mechanism includes some script that adds some complexity to the maintenance of the gemini capsule. As bacardi55 writes:

I do know that asking capsule owners to deploy a script will be the biggest concern here, but I guess there is always a "price to pay" … Yes it will require a CGI script, but it should be doable even with a small bash script to not add too much complexity in maintaining a capsule.

I agree that some kind of programming and scripting will be necessary to get notified. However I think that we can do it at least without a CGI-script. Here is the way I think I have found.

Gemlog responses - bacardi55's concept without CGI

And he goes on to implement a scheme that adds complexity to the configuration of the server, plus the issues with scheduling a program to scan the logfiles for Gemini requests. I've done the logfile scanning for “Project: Wolowizard” and “Project: Lumbergh,” and it was not an easy thing to set up. Okay, in my case, it was checking the logs in real time to see if messages got logged as part of testing, but that aside, checking the logs for requests might not be straightforward. In this case, it sounds like he has easy access to the log files—but that is not always the case. There have been plenty of systems I've come across where normal users just don't have access to the logs (and I find it annoying, but that's a rant for another time). Then there's scheduling a script to run on a regular schedule. In the past, this would be cron and the bizarre syntax it uses, but I'm not sure what the new hipster Linux systemd way is these days (which itself is a whole can of worms).

And it's not like the CGI script has to be all that difficult. Here's a script that should work (it's somewhat untested—I have the concept tested and running on my Gemini server, as an extension to said server, and the CGI script below is based upon that extension):

#!/usr/bin/env lua

query = os.getenv("QUERY_STRING") or "" -- may be unset, so default to empty
if query == "" then
  io.stdout:write("10 URL to send\r\n")
else
  query = query:gsub("%%%x%x", -- decode URL encoded data
            function(c)
              return string.char(tonumber(c:sub(2),16))
            end)
  mail = io.popen("/usr/sbin/sendmail me","w") -- send email (pipe opened for writing)
  if mail then
    mail:write(string.format([[
From: <me> (replace with real email address)
To: <me>
Subject: Mention, sir!

%s
]],query))
    mail:close()
    io.stdout:write("20 text/plain\r\nIt has been accepted.\r\n")
  end
end

os.exit(0)

Yes, this relies upon the email recipient to check that the URI has the proper link, but it's simple and to the point. The only issue here is getting the Gemini server to run this script when /.well-known/mention is requested, and I feel that is easier than dealing with scanning logfiles and running cron jobs, but that's me.

As far as the actual proposal itself goes, I don't have much to comment about, except to say I don't like the mandated text it uses in the link. I think just finding the link should be good enough. Even better, in my mind, would be two links, much like webmention uses, but that doesn't seem to be a popular opinion.


Wednesday, January 04, 2023

Thoughts on an implementation of Gemini mentions

The other day I didn't have much to say about the Gemini Mentions proposal. Now that I've implemented it for my Gemini site (the code has been updated extensively since the other day), I have more thoughts.

First, having the location locked to /.well-known/mention works fine for single-user sites, but it doesn't work that well for sites that host multiple users under a single domain. Alice, who has pages under gemini://example.com/alice/, may want to participate with Gemini mentions. So might Dave, under gemini://example.com/dave/. Bob, who has pages under gemini://example.com/bob/, doesn't care, nor does Carol, under gemini://example.com/carol/. How do you manage gemini://example.com/.well-known/mention when half the users want it and the other half don't? Having the ability to specify individual endpoints, say with a CGI script, would at least let Alice and Dave participate without having to bug the example.com admin to install a service under a single location.

Second, not every person may want every page to receive a mention. I know I don't—I want to restrict mentions to the blog portion of my Gemini site. The proposal only states that “a capsule owner MUST implement a basic endpoint: /.well-known/mention,” but it says nothing about limiting which pages can be the target of a mention. I suppose having a link to /.well-known/mention on a page could indicate that the page can receive mentions, but the implication is that the endpoint link doesn't have to be mentioned at all. For now, I just filter requests to my blog entries, and for other pages I return a “bad request.”

Third, I'm still unsure about sending a single URI. My implementation does scan the given URI for links to my blog, and will grab the first link that matches a blog entry (ignoring any other links to my Gemini site—see the point above). Sending two links, as a webmention does, provides some form of check on the request.

Fourth, I don't check for the “RE:” in the link text, as I don't think it's needed. The specification implies it has to be “RE:” (in all caps), but I can see “Re:” and “re:” being used as well, because humans are going to human and be lazy (or trollish and use “rE:” just to mess with people; or not include it at all).
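
If I were going to check, though, a case-insensitive comparison would cover all those variations. A sketch (not my actual code, which skips the check entirely):

#include <stdbool.h>
#include <strings.h> /* strncasecmp() is POSIX */

/* accept "RE:", "Re:", "re:" and even "rE:" */
static bool has_re_marker(char const *text)
{
  return strncasecmp(text,"RE:",3) == 0;
}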

I also did a second implementation that addresses all these points (and the code for this version is very similar to the other one). I guess I'll see which one becomes more popular.


Friday, January 06, 2023

“The street finds its own uses for things.”

There's a little bit of pushback on the whole Gemini mentions concept. Sandra wrote:

I had Atom and was pretty happy with that and people were like “why don’t you implement Gemini too” and I did and it was a bee and a half because back then almost no Gemini server supported different languages for different pages without serious hoops and then gmisub and then broken redirects and then dir traversal and then this and then that and then the other and after a while it’s all hacking and no writing.

I really, really don’t wanna implement this and that means either there’s a non-zero amount of grumpy grognards who don’t wanna do it (in which case you’re gonna have to use the other methods anyway, like Cosmos), so there’s no point in doing it, or I’m gonna get dragged kicking and screaming into doing it which I really hope does not happen.

I think bacardi55 is cool and I haven’t wanted to say anything about the project out of the “if you can’ [sic] say anything nice…” principle but then it seemed as if it were picking up steam and getting implemented.

Gemini mention, an ongoing discussion

I'm not familiar with the “was a bee and a half” idiom, but I suspect it means something like “annoying,” given the context. And if supporting Gemini was “annoying,” then why even continue with it? The issues brought up, like the lack of per-page language support, were found by people trying to use Gemini, finding issues, and solving them. It would have been easy for most of the issues to be ignored, thanks to Gemini's “simplicity of implementation über alles.” That would not have been a good idea in the long term, and thus, Gemini gets complex.

And Gemini mentions aren't mandatory, just like not every website supports webmentions. Don't like it? Don't bother with it. Taken to the limit, “I really hope does not happen” applied to Gemini means Gemini doesn't exist (and there are plenty of people who questioned the concept of Gemini).

And as bacardi55 said:

The main reason I "jumped" into this "issue" can be reduced to one sentence: I did it for me :)

Why did I work on gemini mention

If others find it useful, so be it. As William Gibson said: “The street finds its own uses for things.” Besides, given my past experience with the Gemini community, I think there will be only two sites supporting Gemini mentions.

Sunday, January 08, 2023

Today's date happens more frequently on Sunday than any other day of the week

Five years ago, I posted that January 8th is less likely to occur on a Monday. At the time, I just accepted it, but when I came across that post again a few days ago, I figured I should actually see if it's true. I ran the numbers from 1583 (the first full year under the Gregorian calendar) to now:

Number of times January 8th fell on a day of the week, since 1583
Sunday 65
Friday 64
Tuesday 64
Wednesday 63
Thursday 62
Saturday 62
Monday 61

What are the odds I'd find this result on a Sunday? [High, given your results. —Editor] [Har har. —Sean] I was expecting the results to be nearly equal. I also find it funny that the actual average, 63, happens on Wednesday, the most average day of the week (you see, Wednesday being in the middle of the week and the average is … oh bother!). I wonder what causes this?
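
Running the numbers is easy enough to reproduce. Here's a sketch of the sort of program involved (using Sakamoto's day-of-week method; not necessarily the exact code I used, but it should reproduce the table above):

#include <stdio.h>

/* Sakamoto's method: 0 = Sunday, 1 = Monday, ... 6 = Saturday.
   Valid for Gregorian dates. */
static int day_of_week(int y,int m,int d)
{
  static int const t[] = { 0 , 3 , 2 , 5 , 0 , 3 , 5 , 1 , 4 , 6 , 2 , 4 };
  if (m < 3) y--;
  return (y + y/4 - y/100 + y/400 + t[m - 1] + d) % 7;
}

int main(void)
{
  static char const *const days[] =
  {
    "Sunday" , "Monday" , "Tuesday" , "Wednesday" ,
    "Thursday" , "Friday" , "Saturday"
  };
  int count[7] = { 0 };

  for (int year = 1583 ; year <= 2023 ; year++)
    count[day_of_week(year,1,8)]++;

  for (int i = 0 ; i < 7 ; i++)
    printf("%-10s %d\n",days[i],count[i]);

  return 0;
}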


Monday, January 09, 2023

An epiphany about bloated web pages might be the result of a dumb network

I was scared by an epiphany I had the other day when I read a quote by John Carmack. But before I get to the quote and the epiphany, I need to give some background to understand where I was, and where I am.

First, for the years I was working for The Corporation (and later, The Enterprise), I was, in essence, working in telephony networking, and I was never a fan of telephony networking (the Protocol Stack From Hell notwithstanding).

Basically, the paradigm in telephony is a “smart network” and a “dumb edge.” All the “intelligence” of an application in telephony is on the network side of things—the “edge” here being the device the end user interacts with. In the old days, this was an on-off switch, a microphone and a speaker. Later models of this device included a tone generator. So any features needed to be handled on the network side, because the end user device (the “edge”) was incapable of doing much at all. If a person wants a new feature, they have to get it implemented on the entire network, or it's effectively not supported at all (because there's not much one can do with an on-off switch, speaker, microphone and a tone generator).

Contrast this with the Internet—it's a “dumb network” with a “smart edge”—all the network has to do is sling packets back and forth, not concerning itself with the contents. The “edge” in this case is (was?) a general purpose computer that can be programmed to do just about anything. So if a person wants a new feature, all that's needed is a program on at least two endpoints and said feature exists—there's no need to inform the rest of the network of it, as long as the “dumb network” can do its job and sling the data between the two endpoints. Want an alternative to the web? Just do it. Want an alternative to IRC? Just do it.

Second, I have always had a hard time understanding why people keep insisting on writing bespoke web browsers in JavaScript that just show text, when the user is already using a web browser that has already been written to display text. The poster child for this (in my opinion) is the Portland Pattern Repository, a large repository of programming wisdom, whose creator, Ward Cunningham, felt for whatever reason that a normal web browser wasn't good enough to browse a text-only website, and thus it demands the latest and greatest in JavaScript conformance just to view text. He's free to do so, but I find it annoying that I can no longer read a site I enjoyed (and even contributed to), just because I haven't updated my browser in the past twenty minutes. I'm not even asking to participate in editing the site any more, I just want to read it!

And finally we get to the John Carmack quote:

It is amusing to consider how much of the world you could serve something like Twitter to from a single beefy server if it really was just shuffling tweet sized buffers to network offload cards. Smart clients instead of web pages could make a very large difference.

John Carmack Tweet

Oh crap.

“Smart clients”—“smart edge.”

“Web pages”—“data.”

My dislike of the Portland Pattern Repository just got run over by my liking of dumb networks and smart edges.

Ward Cunningham wants a smarter edge to view his site (and to “improve server performance” if you read the comments in the web page returned from the site) and I can't begrudge him that—I like smart edges! It makes more sense to me than a smart network. But at the same time, I want a web site to just return text to a “dumb browser,” even if the browser I'm using is not particularly dumb.

Do we, in fact, have too much intelligence in web servers? Do we want to push all the intelligence to the client? Do I have to reconcile my love of simple web clients and intelligent web servers with my love of the dumb network and smart edges? (And to spell it out—the “network” in this analogy is the web server, and the “edge” is the web browser.) Where does the simplicity need to reside?


Wednesday, January 11, 2023

It's apparently a valid URL, despite it being malformed in my opinion

I've had a few posts make it to the front page of Lobsters. Lobsters supports webmention, yet I never received a webmention for those posts. I checked the logs, and yes, they were received, but I rejected them with a “bad request.” It took a bit of sleuthing, but I found the root cause—the URL of my post was, according to my code, invalid. Lobsters was sending in a URL of the form https://boston.conman.org//2023/01/02.1—notice the two slashes in front of the path. My code was having none of that.

I'm not sure why Lobsters was sending a URL of that form as previous webmentions worked fine, but when I checked previous submissions to Lobsters I saw some of the links had a double slash in the path portion. As it's considered valid by the What Working Group? “living standard,” I ended up having to accept what I consider a malformed URL.

Sigh.

Thursday, January 12, 2023

It's probably a good thing some malformed URLs are considered “valid”

It seems it's all too easy to generate double slashes in the path component of a URL, because I received via email a report that my current feed files all had that issue.

Sigh.

I made a change a few months ago in how I internally store the base URL of my blog. It used to be that I did not store the trailing slash (so "https://boston.conman.org/" would be stored as "https://boston.conman.org"), and I had code to keep adding it back when generating links. I changed the code to store the trailing slash, but missed one section of code, and because I don't subscribe to any of my own feed files, I didn't notice the issue.
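
The fix is the usual slash dance when joining the base URL to a path, something along these lines (a sketch, not the actual mod_blog code):

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

/* join base and path with exactly one '/' between them, no matter
   which side (if either) already has one */
static void url_join(char *out,size_t outlen,char const *base,char const *path)
{
  size_t blen       = strlen(base);
  bool   base_slash = (blen > 0) && (base[blen - 1] == '/');
  bool   path_slash = (path[0] == '/');

  if (base_slash && path_slash)
    snprintf(out,outlen,"%s%s",base,&path[1]); /* drop the extra slash */
  else if (!base_slash && !path_slash)
    snprintf(out,outlen,"%s/%s",base,path);    /* supply the missing one */
  else
    snprintf(out,outlen,"%s%s",base,path);     /* already exactly one */
}

int main(void)
{
  char url[256];
  url_join(url,sizeof(url),"https://boston.conman.org/","/2023/01/02.1");
  puts(url); /* https://boston.conman.org/2023/01/02.1 */
  return 0;
}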

I also fixed an actual crashing bug. All I have to say about that is that web robots are quite good at generating really garbage requests using a variety of methods—it's like free fuzz testing! Woo hoo! Sob!

Monday, January 16, 2023

The other SFTP that never was

For reasons, I'm doing some research into the history of FTP when I come across an RFC for SFTP. Only this isn't the SFTP that is used today, but instead the Simple File Transfer Protocol from 1984. Unlike TFTP, it uses TCP, and unlike FTP, it only uses a single network connection.

But this bit is why I'm writing about this:

Random Access

Pro: Wouldn't it be nice if (WIBNIF) SFTP had a way of accessing parts of a file?

Con: Forget it, this is supposed to be SIMPLE file transfer. If you need random access use real FTP (oops, real FTP doesn't have random access either – invent another protocol?).

Resolution: I have not made any provision for Random Access.

That “other protocol” would take several more years to be invented, and then take over the networking world.

Thursday, January 19, 2023

The good news? Somebody wants to use my blogging engine. The bad news? Somebody wants to use my blogging engine

Over the 23-year history of mod_blog, I've given up on the notion of anyone other than me using it. There was only one other person who used it, for just a few months before deciding blogging wasn't for him, and that was way back in 2002. So it came as a complete surprise when I recently received a bug report on it.

Oh my … someone else is trying to use it.

I never did fully document it. And there are, as I'm finding, an amazing number of things I'm assuming about the environment, such as:

And that's just off the top of my head. There are probably more assumptions that I'm just not thinking of. It's issues like these where one can spend 90% of the time writing 90% of the code, and then spend another 90% of the time writing the final 10% of the code and documentation.

I'm also amused by the timing. Back in August, I removed a ton of optional code that I never used, and because no one else was using mod_blog, it was just sitting there untested. And now someone wants to use the code.

Heh.

But also, gulp! I've got 23 years of experience with the code, so I know all the ins and outs of using it. Documenting this? So someone else can use this? Good lord!

Monday, January 23, 2023

A few small differences

I received the following patch for my DNS library:

I am hoping to use this library to encode and decode mDNS queries and responses. It seems that the mDNS is mostly the same as unicast DNS, except for a few small differences which I aim to add to this PR as I encounter them.

Mdns mods by oviano · Pull Request #13 · spc476/SPCDNS

Those “few small differences” turn out not to be so small.

The main RFCs for mDNS appear to be RFC-6762 and RFC-6763, and to support them in full requires breaking changes to my library. The first issue is a pair of flags, defined in RFC-6762, that affect pretty much the entire codebase. The first flag deals with “Questions Requesting Unicast Responses.” Most flags are defined in the header section, but this one is “the top bit in the class field of a DNS question as the unicast-response bit.” And because mDNS specifically allows multiple questions, it seems like it could be set per-question, and not for the request as a whole, as the RFC states: “[w]hen this bit is set in a question, it indicates that the querier is willing to accept unicast replies in response to this specific query, as well as the usual multicast responses.” To me, that says, “each question needs a flag for a unicast response.” The other flag is the “outdated cache entry” bit, which again applies to individual resource records and not to the request as a whole. And again, to me, that says, “each resource record needs a flag to invalidate previously cached values.”
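
For concreteness, on the wire both flags live in the top bit of the 16-bit class field. A sketch of the bit-twiddling involved (the names are mine for illustration; they're not part of SPCDNS):

#include <stdbool.h>
#include <stdint.h>

/* RFC-6762: in a question, the top bit of the class is the
   unicast-response bit; in a resource record, the same bit is the
   cache-flush ("outdated cache entry") bit */
#define MDNS_UNICAST_RESPONSE 0x8000u
#define MDNS_CACHE_FLUSH      0x8000u
#define DNS_CLASS_MASK        0x7FFFu

/* decoding: strip the flag from the wire-format class field */
static void decode_class(uint16_t wire,uint16_t *pclass,bool *pflag)
{
  *pflag  = (wire & MDNS_UNICAST_RESPONSE) != 0;
  *pclass = wire  & DNS_CLASS_MASK; /* e.g. CLASS_IN, which is 1 */
}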

How to handle this … well, one way would be to add a Boolean field to each resource record type to hide the protocol details (which was the point of this library, frankly). But that can break existing code, as the new fields will need initialization:

dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN;
domain.uc    = true; /* we want unicast reply */

/* and the other flag */

dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN;
addr.ttl     = 0;
addr.ic      = true; /* invalidate cache data */
addr.address = address;

and document that the uc and ic fields are for mDNS use; if you aren't using mDNS, then they should be set to false.

Another approach is to leak protocol details and require the user to do something like:

/* We're making a query and want a unicast reply */
dns_question_t domain;

domain.name  = host;
domain.type  = RR_A;
domain.class = CLASS_IN | UNICAST_REPLY;

/* We're replying to a query and want to invalidate this record */
dns_a_t addr;

addr.name    = host;
addr.type    = RR_A;
addr.class   = CLASS_IN | INVALIDATE_CACHE;
addr.ttl     = 0;
addr.address = address;

And that's a less-breaking change, but on the decoding side, I still need some form of flag in the structure to indicate these flags were set because otherwise data is lost.

I'm not sure which approach is best. The first does a better job of hiding the DNS protocol details, but breaks more code. The second is less breaking, as I could ignore any cache flags on encoding, but it leaks details of DNS encoding into user code. I tend to favor the first, but I really dislike the breaking aspect of it. And that's just the first RFC.

The other RFC utilizes what I consider to be an implementation detail of the DNS protocol to radically alter how I handle text resource records. The RFC that defined modern DNS, RFC-1035, describes the format for a text resource record, but is silent as to semantics.

Individual resource records come with a 16-bit length, so in theory a resource record could be up to 65,535 bytes in size, but it's rare to get a record that large. The base type of a text resource record is a “string,” and RFC-1035 defines a “string” as one byte for the length, followed by that many bytes as the contents. Since the length is a single byte, a “string” is limited to 255 bytes. This means, in practice, that a text resource record can contain several “strings.”

How SPCDNS handles this now is that I assume a text resource record only has one value—a string:

typedef struct dns_txt_t        /* RFC-1035 */
{
  char const  *name;
  dns_type_t   type;
  dns_class_t  class;
  TTL          ttl;
  size_t       len;
  char const  *text;
} dns_txt_t;

When encoding such a record, I break the given string into as few DNS “strings” as possible. Give this a 300-byte string, and you get two DNS “strings” encoded, one being 255 bytes long, and the other 45 bytes long. Upon decoding, all the strings in a single text resource record are concatenated into a single string. As I said, RFC-1035 doesn't go into the semantics of a text resource record, and I did what I felt was best.
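
The encoding side of that is simple enough. A sketch of the chunking (not the actual SPCDNS encoder):

#include <stdint.h>
#include <string.h>

/* split text into DNS "strings": a one-byte length followed by at
   most 255 bytes of content, repeated until the text is exhausted;
   a 300 byte input becomes a 255 byte "string" plus a 45 byte one */
static size_t encode_txt(uint8_t *out,char const *text,size_t len)
{
  size_t used = 0;

  while(len > 0)
  {
    size_t chunk = len > 255 ? 255 : len;
    out[used++] = (uint8_t)chunk;   /* length byte */
    memcpy(&out[used],text,chunk);  /* then the contents */
    used += chunk;
    text += chunk;
    len  -= chunk;
  }
  return used;
}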

RFC-6763 uses the DNS “string” encoding for semantic information:

Apple TV - Office._airplay._tcp.local.	   10	IN	TXT	(
	"acl=0"
	"btaddr=00:00:00:00:00:00"
	"deviceid=A8:51:AB:10:21:AE"
	"fex=1d9/St5/FbwooQ"
	"features=0x4A7FDFD5,0xBC157FDE"
	"flags=0x18644"
	"gid=F014C3FF-1420-4374-81DE-237CD6892579"
	"igl=1"
	"gcgl=1"
	"model=AppleTV14,1"
	"protovers=1.1"
	"pi=c6fe9e6e-cec2-44c8-9c66-8994c6ad47"
	"depsi=4A342DB4-3A0C-47A6-9143-9F6BF83F0EDD"
	"pk=5ab1ac3988a6a358db0a6e71a18d31b8d525ec30ce81a4b7b20f2630449f6591"
	"srcvers=670.6.2"
	"osvers=16.2"
	"vv=2"
	)

I have to admit, this is ingenious—each DNS “string” here defines a name/value pair. But I did not foresee this use at all.

I wonder how much code out there dealing with DNS packets (not specifically mDNS) would treat these records:

	IN	TXT	"v=spf1 +mx +ip4:71.19.142.20/32 -all" 
	IN	TXT	"google-site-verification=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

the same way as:

	IN	TXT	(
		"v=spf1 +mx +ip4:71.19.142.20/32 -all" 
		"google-site-verification=XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXXXXXXX­XXX"
	)

The first returns two text resource records, each consisting of a single DNS “string”; the second returns one text resource record, but with two DNS “strings.” My gut feeling is “not many would deal with the second format,” but I can't know that for sure.

And changing how I deal with text resource records in SPCDNS would be a major breaking change.

This is one change I really don't know how to approach.

Tuesday, January 24, 2023

Notes on an overheard conversation about “Muskrat Love” as it played on satellite radio

“If you would have asked me who sang that, I wouldn't have been able to answer.”

“Wow! Captain and Tennille! That takes me back.”

“Me too.”

“Do you want to know what else the Captain and Tennille reminds me of?”

“What?”

“The Bionic Watermelon.”

“What?”

“The Bionic Watermelon.”

“You are weird, sir.”


Notes on a seriously first world problem

“채널 하나 둘 셋 …” [“Channel one two three …”]

“Why is the TV speaking Korean?”

“채널 하나 둘 넷 …” [“Channel one two four …”]

“I don't know. It just started happening!”

“채널 하나 둘 다섯 …” [“Channel one two five …”]

“Let me see … wait! The configuration menu is also in Korean!”

“당연하지 …” [“Of course …”]

“I guess we're just going to have to learn Korean.”

“무아하하하하 …” [“Mwahahahaha …”]



Copyright © 1999-2024 by Sean Conner. All Rights Reserved.