The Boston Diaries

Wednesday, January 01, 2025

Guess who made predictions for 2025? Can you say “Nostradamus?” I knew you could

Of course Nostradamus has predictions for 2025! When hasn't he had predictions for any given year?

Sigh.

So far, checking a few of the articles, not many have bothered to print the quatrains in question, and the one article (of which I hesitate to link to) I found that displays a translation of the quatrain, never bothered to list which quatrain it is.

And because the quatrains listed are translated, it's hard to locate the original in Nostradamus' writings.

For instance, this quatrain:

When the coin of leather rules,
The markets shall tremble,
The crescent and brass unite,
Gold and silver lose their value.

Doesn't seem to exist at all. Checking the version of Nostradamus at Project Gutenberg:

XXV.

French.

Par guerre longue tout l’exercite espuiser,
Que pour Soldats ne trouveront pecune,
Lieu d’Or, d’Argent cair on viendra cuser,
Gaulois Ærain, signe croissant de Lune.

English.

By a long War, all the Army drained dry,
So that to raise Souldiers they shall find no Money,
Instead of Gold and Silver, they shall stamp Leather,
The French Copper, the mark of the stamp the new Moon.

ANNOT.

This maketh me remember the miserable condition of many Kingdoms, before the west-Indies were discovered; for in Spain Lead was stamped for Money, and so in France in the time of King Dagobert, and it seemeth by this Stanza, that the like is to come again, by reason of a long and tedious War.

The true prophecies or prognostications of Michael Nostradamus, physician to Henry II. Francis II. and Charles IX. Kings of France, and one of the best astronomers that ever were.

This is the only quatrain where “leather” appears. And there's nothing in that quatrain about gold and silver losing their value. Moving on, another quatrain from the article I was able to locate:

4. The Surge of Natural Disasters

Nostradamus warned of a year marked by hurricanes, tsunamis, and earthquakes, driven by geological instability, solar activity, and climate change. His depiction of “hollow mountains” and poisoned waters paints a grim picture of devastation, particularly in vulnerable regions like the Amazon rainforest.

“Garden of the world near the new city,
In the path of the hollow mountains:
It will be seized and plunged into the Tub,
Forced to drink waters poisoned by sulfur.”

The confluence of these natural calamities could accelerate global efforts to combat climate change and reimagine disaster resilience. Yet, the cost in lives, resources, and environmental destruction underscores the urgent need for collective action before catastrophe becomes routine.

And let's see what the commentary from the 1600s said about this quatrain:

XLIX.

French.

Jardin du Monde aupres de Cité neufve,
Dans le chemin des Montagnes cavées,
Sera saisi & plongé dans la Cuve,
Beuvant par force eaux Soulphre envenimées.

English.

Garden of the World, near the new City,
In the way of the digged Mountains,
Shall be seized on, and thrown into the Tub,
Being forced to drink Sulphurous poisoned waters.

ANNOT.

This word Garden of the World, doth signifie a particular person, seeing that this Garden of the World was seized on and poisoned in a Tub of Sulphurous water, in which he was thrown.

The History may be this, that Nostradamus passing for a Prophet and a great Astrologer in his time, abundance of people came to him to know their Fortunes, and chiefly the Fathers to know that of their Children, as did Mr. Lafnier, and Mr. Cotton, Father of that renowned Jesuit of the same name, very like then that Mr. du Jardin having a son did ask Nostradamus what should become of him, and because his son was named Cosmus, which in Greek signifieth the World, he answered him with these four Verses.

Garden of the World, for Cosmus of the Garden, In his travels shall be taken hard by the New City, in a way that hath been digged between the Mountains, and there shall be thrown in to a Tub of poisoned Sulphurous water to cause him to die, being forced to drink that water which those rogues had prepared for him.

Those that have learned the truth of this History, may observe it here. This ought to have come to pass in the last Age, seeing that the party mentioned was then born when this Stanza was written, and this unhappy man being dead of a violent death, there is great likelyhood, that he was not above forty years old.

There is another difficulty, to know which is that new City, there being many of that name in Europe, nevertheless the more probable is, that there being many Knights of Maltha born in Provence (the native Countrey of our Author) it may be believed that by the new City he meaneth the new City of Maltha called la Valete, hard by which there is paths and ways digged in the Mountains, which Mountains are as if it were a Fence and a Barricado against the Sea, or else this Cosmus might have been taken by Pyrats of Algiers, and there in the new City of the Goulette be put to death in the manner aforesaid.

Nothing about it being 2025 when this comes to pass. Nothing about hurranes, tsunamis or earthquakes. It's almost as if Nostradamus was being intentionally vague about his prophesies. It could very well be about Naples, Italy, seeing how it's on the coast nestled in between volcanoes.

Or maybe Los Angeles. Yes, it's Los Angeles, land of Shake and Bake.

Of the other five “Nostradamus prophesies” mention in the aricle, none were written by the man. It's almost as if one could just make up Nostradamus prophesies. Why not?

HAPPY NEW YEAR!

Friday, January 03, 2025

It's more like computer security theater than actual security

In w3m, to edit a form textarea,
    ...
    f = fopen(tmpf, "w");
    if (f == NULL) {
        /* FIXME: gettextize? */
        disp_err_message("Can't open temporary file", FALSE);
        return;
    }
    if (fi->value)
        form_fputs_decode(fi->value, f);
    fclose(f);

    if (exec_cmd(myEditor(Editor, tmpf, 1)->ptr))
            goto input_end;
    ...
exec_cmd is some setup and teardown around a system(3) call with the user's editor and the temporary file. This is not good for security, as it allows w3m to execute by default anything. One tentative improvement would be to only allow w3m to execute a wrapper script, something like
    #!/bin/sh
    exec /usr/bin/vi -S "$@"
or some other restricted editor that cannot run arbitrary commands nor read from ~/.ssh and send those files off via internet connections. This is better, but why not disallow w3m from running anything at all?
    if (pledge(
          "cpath dns fattr flock inet proc rpath stdio tty unveil wpath",
          NULL) == -1)
       err(1, "pledge");
Here we need the “proc” (fork) allow so downloads still work, but “exec” is not allowed. This makes it a bit harder for attackers to run arbitrary programs. An attacker can still read various files, but there are also unveil restrictions that very much reduce the access of w3m to the filesystem. An attacker could make DNS and internet connections, though fixing that would require a different browser design that better isolates the “get stuff from the internet” parts from the “try to parse the hairball that is HTML” code, probably via imsg_init(3) on OpenBSD, or differently complicated to download to a directory with one process and to parse it with another. That way, a HTML security issue would have a more difficult time in getting out to the interwebs.

Security Hoop

What I find annoying is the lack of any type of attack as an example. It's always “data from da Intarwebs bad!” without regard to how it's bad. The author just assumes that hackers out there have some magical way of executing code on their computer just by the very act of downloading a file. The assumption that some special sequence of HTML can open a network connection to some control server in Moscow or Beijing or Washington, DC and siphon off critical data is just … I don't know, insane to me. Javascript, yes, I can see that happening. But HTML?

And then I recall the time that Microsoft added code to their programs to scan JPEG images for code and automatically execute it, and okay, I can see why maybe the cargo cult security mumbo-jumbo exists.

What I would like to see how opening a text editor with the contents of an HTML <TEXTAREA> could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?

One possible route would be ECMA-35 escape sequences, specifically the DCS and OSC sequences (which could be used to control devices or the operating system respectively), although I don't know of any terminal emulator today that supports them. Microsoft did add an escape sequence to reprogram the keyboard (ESC “[” key-code “;” string “p”) but that's in the “private use” area set aside for vendors.

This particular attack vector might work if the editor is running under a terminal or terminal emulator that support it, and the editor in question doesn't remove or escape the raw escape sequence codes. I tried a few text editors on the following text (presented as a hexadecimal dump to show the raw escape sequence):

00000000: 54 68 69 73 20 69 73 20 1B 5B 34 31 6D 72 65 64 This is .[41mred
00000010: 1B 5B 30 6D 20 74 65 78 74 2E 0A 0A             .[0m text...

None of the editors I tried (which are all based on the command line and thus, use escape sequences themselves to display text on a terminal) displayed red text. The escape sequence wasn't run as an escape sequence.

Another attack might embedding editor-specific commands within the text. This is a common aspect of some editors, like vi. And I can see this being concerning, especially if the commands one can set in a text file include accessing arbitrary files or running commands.

A third attack could be an attempt to buffer overflow the editor, either by sneaking in a huge download (like say, a file with a single one gigabyte line) or erroneous input (for example, if the editor expects a line to end with a CR and LF, send an LF then CR). Huge input is a bit harder to hide, but suble erroneous input could cause issues.

This is why I feel such articles are bad—by not talking about actual threats they enforce a form of “learned helplessness.” Everything is dangerous and we must submit to onerous measures to keep ourselves safe. Sprinkling calls to pledge() aren't the answer. Yes, it helps, but not thinking critically about security leads to a worse experience overall, such as having to manually edit a file which would still be subject to all three of the above attacks anyway. By identifying the attacks, then a much better way to mitigate the attacks could be found (in this case, an editor that strips out escape sequences and does not support embedded commands; and yes, I know I have a minority opinion here—sigh).

And to address the bit about parsing HTML—is parsing really that fraught with danger? All you need to parse HTML is to follow the explicit (and in excruciating detail) HTML5 specification. How hard can that be?

Saturday, January 04, 2025

It's still cargo cult computer security

My first question to you, as someone who is, shall we say, “sensitive” to security issues, why are you exposing a network based program to the Internet without an update in the past 14 years?

Granted, measures such as ASLR and W^X can make life more difficult for an attacker, and you might notice w3m crashing as the attackers try to get the stars to line up for their ROP gadget to work as you (or some automation) try to download a malicious page over and over. Or, you could get unlucky and they are now running whatever code they want, or reading all your files.

Attacks

I have my own issues with ASLR (I think it's the wrong thing to do—much better would have been to separate the stack into two, a return stack and a parameter (or data) stack, but I suspect we won't ever see such an approach because of the entrenchment of the C ABI) so I won't get into this.

What I would like to see how opening a text editor with the contents of an HTML <TEXTAREA> could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?

Where is your formal verification for the lack of errors?

I did not assert the code was free of error. I was asking for examples of actual attacks.

Otherwise, there is some amount of code executed to make that textarea work, all of which is the “actual attack surface”. If you look at the CVE for w3m (nevermind the code w3m uses from SSL, curses, iconv, intl, libc, etc.) one may find:

Format string vulnerability in the inputAnswer function in file.c in w3m before 0.5.2, when run with the dump or backend option, allows remote attackers to execute arbitrary code via format string specifiers in the Common Name (CN) field of an SSL certificate associated with an https URL.

w3m before 0.3.2.2 does not properly escape HTML tags in the ALT attribute of an IMG tag, which could allow remote attackers to access files or cookies.

Buffer overflow in w3m 0.2.1 and earlier allows a remote attacker to execute arbitrary code via a long base64 encoded MIME header.

Was that so hard?

The first bug you mention, the “format string vulnerability” seems to be related to this one-line fix (and yes, I did download the source code for this):

@@ -1,4 +1,4 @@
-/* $Id: file.c,v 1.249 2006/12/10 11:06:12 inu Exp $ */
+/* $Id: file.c,v 1.250 2006/12/27 02:15:24 ukai Exp $ */
 #include "fm.h"
 #include <sys/types.h>
 #include "myctype.h"
@@ -8021,7 +8021,7 @@ inputAnswer(char *prompt)
 	ans = inputChar(prompt);
     }
     else {
-	printf(prompt);
+	printf("%s", prompt);
 	fflush(stdout);
 	ans = Strfgets(stdin)->ptr;
     }

It would be easy to dimiss this as a rookie mistake, but I admit, it can be hard to use C safely, which is why I keep asking for examples and in some cases, even a proof-of-concept so others can understand how it works, and how to mitigate them.

But just keep crying pledge() and see how things improve.

The second bug you mentioned seems to be CVE-2002-1335, which is 23 years old by now and none of the links on that page show any details about this bug. I also fail to see how this could lead to an “arbitrary file access” back to the attacker unless there's some additional JavaScript required. The constant banging on the pledge() drum does nothing to show how such an attack works so as to educate programmers on what to look for and how to think about mitigations. When I asked “What are the actual attack surfaces?” I actually meant that. How does this lead to an “arbitrary file access?” It always appears to be “just assume the nukes have been launched” type of rhetoric. It doesn't help educate us “dumb” programmers. Please, tell me, how is this exploitable? Or is that forbidden knowledge not to be given out for fear it will be used by those less intentioned?

This is the crux of my frustration here—all I see is “programs bad, mmmmmmkay?” and magic pixie dust to solve the issues.

I've had to explain to programmers in a well regarded CSE department recently why their code was … sub-optimal. Less polite words could be used. They were running remote, user-supplied strings through a system(3) call, and it took a few emails to convince them that this was kind of bad.

And I can bitch about having to teach opererations how to configure syslog and “no, we can't have a single configuration file for two different, geographical sites and besides, we maintain the configuration files, not you!” so this cuts both ways.

Moreover, it's fairly simple to pledge and unveil a process to remove classes of system calls (such as executing other programs) or remove access to swathes of the filesystem (so an attacker will have a harder time to run off with your SSH keys).

…

And how, exactly, is adding pledge and unveil onerous? …

Easy huh?

The man page doesn't say anything about limiting calls to open(). It appears that is handled by unveil() which doesn't seem all that easy to me:

… Directories are remembered at the time of a call to unveil(). This means that a directory that is removed and recreated after a call to unveil() will appear to not exist.

…

unveil() use can be tricky because programs misbehave badly when their files unexpectedly disappear. In many cases it is easier to unveil the directories in which an application makes use of files.

unveil(2) - OpenBSD manual pages

To me, I read “in some cases, code may be difficult to debug.”

And while it may be easy for you to add a call to unveil() or pledge(), I assure you that it's not at all easy for the kernel to support such calls. Now, in addition to all the normal Unix checks that need to happen (and in the past, gone wrong on occasion) that a whole slew of new checks need to be added which complicate the kernel. Just as an example, pass “dns” promise to pledge() and the calls to socket(), connect(), sendto() and recvfrom() are disabled until the file /etc/resolv.conf is opened. Then they're enabled, but probably only to allow UDP port 53 through. Unless the “inet” promise is given, then socket(), connect(), etc. are allowed. That's … a lot of logic to puzzle through. And as someone who doesn't trust programmers (as you stated), this isn't a problem for you?

As a programmer, it can also make it hard to reason about some scenarios—like, if I use “stdio” promise, but not the “inet” promise, can I open files served up by NFS? I mean, probably, but “probably” isn't “yes” and there are a lot of programming sins commited because “it worked for me.”

I did say that using pledge() helps, but it doesn't solve all attacks. For instance, there's not special promise I can give to pledge() that states “I will not send escape codes to the terminal” even though that's an attack vector, espcially if the terminal in question supports remapping the keyboard! Any special recomendations for that attack? Do I really need to embed \e[13;"rm -rf ~/*"p to drive the point home?

Also (because I do not use OpenBSD) do I still have access to every system call after this?

pledge(
    " stdio rpath wpath cpath  dpath     tmppath inet   mcast"
    " fattr chown flock unix   dns       getpw   sendfd recvfd"
    " tape  tty   proc  exec   prot_exec settime ps     vminfo"
    " id    pf    route wroute audio     video   bpf    unveil"
    "  error");

If not, why not? That's a potential area to look for bugs.

How, exactly, is adding pledge and unveil to w3m “helplessness”, and then iterating on that design as one gains more experience?

As you said yourself: “I do not trust programmers (nor myself) to not write errors, so look to pledge and unveil by default, especially for ‘runs anything, accesses remote content’ browser code.” What am I to make of this, except for “Oh, all I have to do is add pledge() and unveil() to my program, and then it'll be safe to execute!”

In my opinion, banging on the pledge() drum doesn't help educate programmers on potential problems. It doesn't help programmers to write code to be anal when dealing with input. It doesn't help programmers to think about potential exploits. It just punts the problem with magic pixie dust that will solve all the problems.

… It took much less time to add to w3m than writing this post did; most of the time for w3m was spent figuring out how to disable color support, kill off images, and to get the CFLAGS aright. It is almost zero maintenance once done and documented.

What, exactly, is your threat model? Because that's … I don't know what to say. You remove features just because they might be insecure. I guess that's one way to approach security. Another approach might be to cut the network cable.

I only ask as I was hacked once. Bad. Lost two servers (file system wiped clean), almost lost a third. And you know what? Not only did it not change my stance around computer security, there wasn't a XXXXXXXXXX thing I could do about it either! It was an inside job. Is that part of your threat model?

By the way, /usr/bin/vi -S is used to edit the temporary file. This does a pledge so that vi cannot run random programs.

But what's stopping an attacker from adding commands to your ~/.bashrc file to do all the nasty things it wants do to the next time you start a shell? That's the thing—pledge() by itself won't stop all attacks, but by dismissing the question of “what attack surfaces” can lead one to believe that all that's needed is pledge(). It leads (in my opinion) to a false sense of security.

It is rather easy to find CVE for errors in HTML parsing code, besides the “did not properly escape HTML tags in the ALT attribute” thing w3m was doing that lead to arbitrary file access.

CVE-2021-23346, CVE-2024-52595, CVE-2022-0801, CVE-2021-40444, CVE-2024-45338, CVE-2022-24839, CVE-2022-36033, CVE-2023-33733, …

You might want to be more careful in the future, as one of those CVE's you listed has nothing do to with parsing HTML. I'll leave it as an exercise for you to find which one it is.

I also get the feeling that we don't see eye-to-eye on this issue, which is normal for me. I have some opinions that are not mainstream, are quite nuanced, and thus, aren't easy to get across (ask me about defensive programming sometime).

My point with all this—talk about computer security is all cargo cultish and is not helping with actual computer security. And what is being done is making other things way more difficult than it should be.

Sunday, January 05, 2025

Security Theater

Also, Linux is getting a landlock thing, which sounds maybe a bit like unveil. Are they likewise deluded, or maybe there's something useful about this class of security thingymabobber, especially with “defense in depth” in mind?

Tradeoffs

An aspect I think you are discounting is the effort required to implement the mitigations. While plege() and unveil() are simple to use, their implementation is anything but. Just from reading the man pages, it appears there are exceptions, and then exceptions to the exceptions, that must be supported. What makes Linux or OpenBSD different than other pieces of software, like openssl?

Sure, such things help overall but as you state, there are tradeoffs—and a big one I see is adding complexity to an already complex system. And in my experience, security makes it harder to diagnose issues (one exaple from work—a piece of network equipment was “helpfully” filtering network traffic for exploits, making it difficult to test our software properly, you know, in the absense of such technology).

A different take is that pledge and unveil, along with the various other security mitigations, hackathons, and so forth, are a good part of a healthy diet. Sure, you can still catch a cold, but it may be less bad, or have fewer complications.

I also think you are discounting the risk compensation that this may cause With all these mitigations, what incentives are there for a programmer to be careful in writing code? One area I think we differ in is just how much of a crutch such technology becomes.

If you don't want that defense in depth, eh, you do you.

It's less that I don't want defense in depth (and it's sad to live in a world where that needs to be the default stance) but that you can do everything “by the book” and still get blindsided. I recall the time in the early 90s when I found myself logged into the university computer I used and saw myself also logged in from Russia, all because of a Unix workstation in a different department down the hall had no root password and running a program sniffing the network (for more perspective—at the time the building was wired with 10-Base-2, also known as “cheap-net,” in which all traffic is transmitted to all stations, and the main campus IT department was more concerned with its precious VAX machine than supporting departments running Unix).

My first encounter with the clown show that is “computer security” came in the late 90s. At the time, I was working at a small web-hosting company when a 500+ page report was dumped on my desk (or rather, a large PDF file in my email) with the results of a “PCI compliance scan” on our network. It was page after page of “Oh My God! This computer has an IP address! This computer responds to ping requests! Oh My God! This computer has a web site on it! And DNS entries! Oh My XXXXXXX God! You handle email!”

For. Every. Single. Web. Site. And. Computer. On. Our. Network.

It was such an obviously low effort report with so much garbage, it was difficult to pull out the actual issues with our network. You know what would have been nice? Recognition what we were a web hosting company in addition to handling email and DNS for our customers. Maybe a report broken down by computer, maybe in a table format like:

Hypothetical report of a network scan
IP address	protocol/port	port name	notes
192.0.2.10	ICMP echo	ping	see Appendix A
	TCP port 22	SSH	UNEXPECTED—see Appendix D
	TCP port 25	SMTP	Maybe consolidate email to a single server—see Appendix B
	TCP port 53	DNS	DNS queries resolve—see Appendix C
	UDP port 53	DNS	DNS queries resolve—see Appendix C
	TCP port 80	HTTP
	TCP port 443	HTTPS
192.0.2.11	ICMP echo	ping	see Appendix A
	TCP port 22	SSH	UNEXPECTED—see Appendix D
	TCP port 25	SMTP	Maybe consolidate email to a single server—see Appendix B
	TCP port 53	DNS	DNS queries resolve—see Appendix C
	UDP port 53	DNS	DNS queries resolve—see Appendix C
	UDP port 69	TFTP	UNEXPECTED—see Appendix D
	TCP port 80	HTTP
	TCP port 443	HTTPS

Where Appendix A could explain why supporting ping is questionable, but allowable, Appendix B could explain the benefits of consolidating email on a machine that doesn't serve email, and Appendix C could explain the potential data leaks of a DNS server that resolves non-authoritative domains, which in our case, was the real issue with our scan but was buried in just a ton of nonsense results with the assumption that we have no clue what we're doing (at least, that's how I read the 500+ page report).

The hypothetical report above shows SSH being open on the boxes—fair enough. A common security measure to to have a “SSH jump server” that is specifically hardened to only expose SSH one one host, and the rest only accept SSH connections on a (preferrably) separate “management” interface with private IP addresses. And oh, we're running TFTP on a box—again we should probably have a separate system on a “management” interface running TFTP to backup our router configs.

But such a measured, actionable report takes real work to generate. Much much easier to just dump a raw network scan with scary jargon.

And since then, most talk of “computer security” has, in my experience, been mostly of the breathless “Oh My God You're Pwned!” scare tactic variety.

My latest encounter with “computer security” came a few years ago at The Ft. Lauderdale Office of the Corporation, when our new Overlords wanted to change how we did things. The CSO visited and informed us that they were going to change how we did security, and in the process make our jobs much more difficult. It turns out it wasn't because our network or computers were insecure—no! Our network had a higher score (according to some networking scoring company—think of the various credit scoring companies but for corporate networks) than our new parent company (almost a perfect score). No, it came down to “that's not how we do things. We're doing it, our way!” And “their way” was just checking off a list of boxes on some list as cheaply as possible.

I think another way we differ is in how much we think “computer security” has become a cargo cult.

Update on Monday, January 6^th, 2025

This thread on Lobsters is a perfect example of the type of discussion I would like to see around security. Especially on-point is this comment: “… the [question] I was actually asking: ‘Why is it dangerous, so I can have a better mental model of danger in the future?’”

Tuesday, January 07, 2025

I am Socrates

I tried reading this with an open mind, but then I came across this:

This is a very easy fix. If I paste the error back into the LLM it will correct it. Though in this case, as I’m reading the code, it’s quite clear to me that I can just delete the line myself, so I do.

Via Lobsters, How I program with LLMs

My initial reaction to this was Woah there buddy! Are you sure you want to use your brain? Yes, caustic sarcasm is not a pretty reaction but I am what I am. [A reactionary cynical neo-Luddite? —Editor] [Shut up you! —Sean] Further down the page, the author presents some code the LLM wrote and then says:

Exactly the sort of thing I would write!

And I'm like, Yeah, you have 30 years of programming experience backing that up. What about programmers today who don't have that experience? They just accept what's given to them uncritically? [Yup, A reactionary cynical new-Luddite. —Editor] [Sigh. —Sean] At least the code in question were unit tests and it wasn't he who had to write unit tests for AI written code (which was my fear just prior to leaving The Enterprise).

But reading further, I can't help but think of Socrates:

For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.

Plato rejects writing by the mouth of Socrates

While that's true to some degree, over the past 2½ millenium since then, it's been, overall and in my opinion, a positive thing. But then again, writing and books have been a part of my world since I was born, so it's the natual part of the way the world works:

Anything that is in the world when you're born is normal and ordinary and is just a natural part of the way the world works. Anything that's invented between when you're fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it. Anything invented after you're thirty-five is against the natural order of things.

Douglas Adams, The Salmon of Doubt

Can you guess I'm older than thirty-five?

So I'm resigned to the fact that this is our new reality—programmers will use AI (against my better judgement but nobody asked me—it really is alien to my way of thinking) and it's for the future to see if it was worth it in the long term.

But in the mean time, I am Socrates (and no, the irony that his thoughts on writing were written down is not lost on me).

Friday, January 17, 2025

These robots enable employment

An incredible video about the development of robots not solely controlled by software but by people that enable them to work jobs they otherwise could not do so. While I guess you could technically call these “robots,” they come across more as “waldos,” devices that enable people to physically work from a remote location. In any case, I think it's a fantastic use of technology.

Saturday, January 18, 2025

I bet this comes with an automatic compacting bit-bucket for disposing of all that network noise

Setting up a media server on a PC or using a computer as a network audio renderer (endpoint) is easy nowadays. But the problem with computers is that they were never designed with audio in mind. While there are improvements for USB-based playback available (such as our JCAT USB Card FEMTO or JCAT USB Isolator), the network controller part of a PC remains noisy. JCAT delivers the solution with the NET Card FEMTO – the ultimate network interface designed specifically for transferring high-quality audio over LAN.

…

The sound image becomes crystal-clear: transparent, quiet, smooth and yet full of fine details you have never heard before. It will allow you to experience music at much deeper level.

NET CARD FEMTO - JCAT . precision sounds.

There are times when I think, are there people who actually buy this stuff? And yet, I come across this page:

The XACT PHANTOM™ USB cable is the ultimate choice for discerning audiophiles seeking unparalleled precision and natural sound. Handcrafted with meticulous attention to detail, each cable takes over 7 hours to complete, ensuring unmatched quality and performance. Our proprietary design includes precise mechanical and impedance pairing of the conductors, as well as a highly specialized twisting process. This meticulous construction is key to eliminating interference and preserving the purity of the audio signal.

The XACT PHANTOM™ USB cable features custom-designed aluminum connectors, engineered to provide a secure and stable connection. The result is a cable that delivers remarkable clarity, preserving the full natural richness of your music across the entire frequency range.

PHANTOM CABLES – XACT Audio

And now I'm thinking, I'm in the wrong industry! What's wrong with separating rich-yet-stupid audiophiles from their money? It's just too bad that the market for Eberhard Faber Design Art Marker No. 255 has, if you'll pardon the pun, dried up.

Sunday, February 02, 2025

Artisanal code to solve an issue only I have

Update on Tuesday, February 4^th, 2025

The code presented below has a bug that has been fixed.. The code linked to below contains the current fixed code. That is all.

Now on with the original post …

I'm still using my ISP, despite the repeated letters that my service will go away at some point. But in the meantime, they keep reissuing a new IP address every so often just to reiterate their dedication to their serving up a dynamic IP address at no addtional cost to me. One of the casualties of their new policy is the monitoring of the system logs on my public server. I used to send syslog output from my public server to my development system at home, just to make it easier to keep an eye on what's happening. No more.

What I needed was a reverse-proxy type of thing—where the client (my development machine) connects to the server, then the server sends a copy of the logs down the connection. A separate program would be easier to write then to modify the exiting syslog daemon I'm using. It was a simple matter of telling the syslog daemon to forward a copy of all the logs to another program on the same system. Then I just had to write that program. To begin with, I need to load some modules:

local syslog  = require "org.conman.syslog"
local signal  = require "org.conman.signal"
local errno   = require "org.conman.errno"
local tls     = require "org.conman.nfl.tls"
local nfl     = require "org.conman.nfl"
local net     = require "org.conman.net"

The nfl module is my “server framework” for network based servers. Each TCP or TLS connection will be run on its own Lua thread, making the code easier to write than the typical “callback hell” that seems to be popular these days. I still need to make some low-level network calls, so I need the net module as well.

On to the configuration:

local SYSLOG  = "127.0.0.1"
local HOST    = "brevard.conman.org"
local CERT    = "brevard.conman.org.cert.pem"
local KEY     = "brevard.conman.org.key.pem"
local ISSUER  = "/C=US/ST=FL/O=Conman Laboratories/OU=Security Division/CN=Conman Laboratories CA/emailAddress=ca@conman.org"
local clients = {}

I didn't bother with a configuration file. This whole code base exists to solve an issue I have as simply as possible. At this point, a configuration file is overkill. The SYSLOG variable defines the address this server will use to accept output from syslog. Due to the way my current syslog daemon works, the port number it uses to forward logs is hard coded, so no need to specify the port. I'm going to run this over TLS because, why not? The tls module makes it easy to use, and it will make authentication trivial for this program. The CERT and KEY are the certificates needed, and these are generated by some scripts I wrote to play around with running my own simple certificate authority. My server is set to accept certificates signed by my simple certificate authority, which you can see in the definition of the ISSUER variable.

The clients variable is to track the the clients that connect to collect syslog output. Even though I'll only ever have one client, it's easy enough to make this an array.

local laddr = net.address(SYSLOG,'udp',514)
local lsock = net.socket(laddr.family,'udp')
lsock:bind(laddr)

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co in pairs(clients) do
      nfl.schedule(co,data)
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

And now we create the local socket to receive output from syslog, and then add the socket to a table of sockets the framework uses, telling it to handle “read-ready” events. The data is read and then for each thread (Lua calls them “coroutines”) in the clients list, we schedule said thread to run with the data received from syslog.

local okay,err = tls.listen(HOST,514,client_main,function(conf)
  conf:verify_client()
  return conf:keypair_file(CERT,KEY)
     and conf:protocols("tlsv1.3")
end)

if not okay then
  syslog('error',"tls.listen()=%s",err)
  os.exit(1,true)
end

signal.catch('int')
signal.catch('term')

nfl.server_eventloop(function() return signal.caught() end)
os.exit(0,true)

And before we get to the routine that handles the clients, this is the code that creates a listening socket for TLS connections. We configure the listening socket to require the client send a certificate of its own (this is half of the authentication routine) and the certificates required to secure the connection, and the minimum protocol level. There's some error checking, setting up to catch some signals, then we start the main loop of the framework, which will terminate upon receiving a SIGINT (interrupt) or SIGTERM (terminate).

And finally, the code that runs on each TLS connection:

local function client_main(ios)
  ios:_handshake()
  
  if ios.__ctx:peer_cert_issuer() ~= ISSUER then
    ios:close()
    return
  end
  
  syslog('info',"remote=%s",ios.__remote.addr)
  clients[ios.__co] = true
  
  while true do
    local data = coroutine.yield()
    if not data then break end
    local okay,errmsg = ios:write(data,'\n')
    if not okay then
      syslog('error',"tls:read() = %s",errmsg)
      break
    end
  end
  
  syslog('info',"remote=%s disconnecting",ios.__remote.addr)
  clients[ios.__co] = nil
  ios:close()
end

The handshake is required to ensure that the client certificate is fully sent before we can check the issuer of said certificate. This is the extent of my authentication—I check that the certificate is issued from my simple certificate authority and not just any random but valid certificate being presented. Yes, there is a chance someone could forge a certificate claiming to be from my simple certificate authority, but to get such a certificate, some real certificate authority would need to issue someone else a certificate that maches the issuer on my certificates. I'm not seeing that happening any time soon (and if that happens, there are bigger things I need to worry about).

Once I've authenticated the certificate, I then pause the thread, waiting for data from the UDP socket (see above). If there's no data, then the client has dropped the connection and we exit out of the loop. We then write the data from syslog to the client and if that fails, we exit out of the loop.

Once out of the loop, we close the connection and that's pretty much all there is to it.

Yes, I realize that the calls to syslog() will be sent to the syslog daemon, only to be passed back to this program, but at least there's a log of this on the server.

I should also note that I do not attempt to track which logs have been sent and which haven't—that's a deliberate design decision on my part and I can live with missing logs on my development server. The logs are still recorded on the server itself so if it's important, I still have them, and this keeps this code simple.

The client code on my development server is even simpler:

local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"
local tls    = require "org.conman.net.tls"
local net    = require "org.conman.net"

local SYSLOG = "192.168.1.10"
local HOST   = "brevard.conman.org"
local CERT   = "/home/spc/projects/CA/ca/intermediate/certs/sean.conner.cert.pem"
local KEY    = "/home/spc/projects/CA/ca/intermediate/private/sean.conner.key.pem"

Again, load the required modules, and configure the program. Much like the server, having a configuration file for this is way overkill, thus the above variables.

signal.catch('int')
signal.catch('term')

local addr = net.address(SYSLOG,'udp',514)
local sock = net.socket(addr.family,'udp')

connect(sock,addr)

The code sets up some signal handlers, creates a socket to send the data to syslog and calls the main function.

local function connect(sock,addr)
  local ios,err = tls.connect(HOST,514,function(conf)
  return conf:keypair_file(CERT,KEY)
     and conf:protocols("tlsv1.3")
  end)
  
  if not ios then
    io.stderr:write("Failure: ",err," retrying in a bit ...\n")
    clock.sleep(1)
  else
    io.stderr:write("\n\n\nConnected\n")
    main(ios,sock,addr)
  end
  
  if not signal.caught() then
    return connect(sock,addr)
  end
end

The connect() function tries to connect to the server with the given certificates. If it fails (and I expect this to happen when I get reassigned an IP address) it waits for a bit and retries again. If the connection succeeds though:

local function main(ios,sock,addr)
  for data in ios:lines() do
    if signal.caught() then
      ios:close()
      os.exit(0,true)
    end
    sock:send(addr,data)
  end  
  ios:close()
end

The code just loops, reading lines from the server and then sending them directly to the syslog daemon. Any errors (like the IP address got reassigned so the connection drops) the loop ends, we close the connection and return, falling into the retry loop in the connect() function.

In case anyone is interested, here's the source code for the server and the client.

And now some metacommentary on the artisanal code I just wrote

When I wrote the two programs to retrieve output from syslog from my public server, the thing I did not do use was any AI program (aka Cat) to help with the design nor the code. It was a simple problem with a straightforward solution and it's sad to think that more and more programmers are reaching for Cat for even simple programs.

I wonder though—is the popularity of Cat because of business demands that incentivize quick hacks to get new features and/or bug fixes and deincentivize deep knowledge or methodical implementations? Because of the constant churn of frameworks du jour and languages with constantly changing implementations? Because of sprawling code bases that not a single person can understand as a whole? Because businesses want to remove expensive programmers who might say “no”?

Anyway …

I don't expect the code I wrote to be of use for anyone else. The issue I'm solving is probably unique to me (and to the death of the true peer-to-peer Internet but I digress). But I also feel that such simple programs, ones that can be thought of as “disposable” almost, are not popular these days.

Although I'll admit that could be just a bias I'm forming from some forums I hang out on. These programs are too simple, there's no need for Docker (which is what? A tar file with some required files for use with a custom program to get around the fact that shared libraries are a mess?) or Kubernetes (which is what? A Google project Google doesn't even use but convinced enough people it's required to run at Google Scale?). Yeah, there are a few dependencies but not the hundreds you would get from using something like NodeJS.

I don't know … I'm just … in a mood.

Tuesday, February 04, 2025

Concurrency is tricky

As I was writing the previous entry I got the nagging feeling that something wasn't quite right with the code. I got distracted yesterday helping a friend bounce programming issues off me, but after that, I was able to take a good look at the code and figured out what I did wrong.

Well, not “wrong” per se, the code as it worked—it's just that it could fail catastrophically in the right conditions (or maybe wrong conditions, depending upon your view).

But first, a bit about how my network sever framework works. The core bit of code is this:

local function eventloop(done_f)
  if done_f() then return end

  -- calculate and handle timeouts
  -- each coroutine that timed out is
  -- scheduled to run on the RUNQUEUE,
  -- with nil and ETIMEDOUT.

  SOCKETS:wait(timeout)
  for event in SOCKETS:events() do
    event.obj(event)
  end

  while #RUNQUEUE > 0 do
    -- run each coroutine in the run queue until
    -- it eithers yields or returns (meaning
    -- it's finished running).
  end

  return eventloop(done_f)
end

Details are emitted full gory details here but in general, the event loop calls a passed in function to check if we need to shut down, then calculates a timeout value while checking for coroutines that registered a timeout. If any did, we add the coroutine to a run queue with nil and ETIMEDOUT to inform the resuming coroutine that it timed out. Then we scan a set of network sockets for activity with SOCKETS:wait() (on Linux, this ends up calling epoll_wait(); BSDs with kqueue() and most other Unix systems with poll()). We then call the handling function for each event. These can end up creating new coroutines and scheduling coroutines to run (these will be added to the run queue). And then for each coroutine in the run queue, we run it. Lather, rinse, repeat. Simple enough.

Now, on to the code I presented. This code registers a function to run when the given UDP socket recieves a packet of data, and schedules a number of coroutines waiting for data to run. This happens in the eventloop() function.

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co in pairs(clients) do
      nfl.schedule(co,data) -- problem
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

I've noted a problematic line of code here.

And now the core of the routine to handle a TLS connection. This code yields to receive data, then writes the data to the TLS socket.

  while true do
    local data = coroutine.yield()
    if not data then break end
    local okay,errmsg = ios:write(data,'\n') -- <<< HERE
    if not okay then
      syslog('error',"tls:read() = %s",errmsg)
      break
    end
  end

I've marked where the root cause lies, and it's pretty subtle I think. The core issue is that ios:write() here could block, because the kernel output buffer is full and we need to wait for the kernel to send it. But the code that handles the UDP socket just assumes that the TLS coroutine is ready for more data. If ios:write() blocks and more UDP data comes on, the coroutine is prematurely resumed with the data, but that's just taken by the TLS thread as the write being successful, then yielding and then things get … weird, as the UDP side and the TLS side are now out of sync with each other. This, fortunately, hasn't trigger on me.

Yet.

It could, if too much was being logged to syslog. I wrote the following code to test it out:

#include <syslog.h>

#define MSG " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXUZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"

int main(void)
{
  int i;
  for (i = 0 ; i < 500 ; i++)
    syslog(LOG_DEBUG,"%3d " MSG MSG MSG,i);
  return 0;
}

And sure enough, the ~~spice~~ data stopped flowing.

What I needed to do was queue up the log messages to a given client, and only schedule it to run when it's waiting for more data. A few failed attempts followed—they were all based on scheduling the TLS thread when X number of messages were queued up (I tried one, then zero; neither worked). It worked much better by using a flag to indicate when the TLS coroutine wanted to be scheduled or not.

The UDP socket code is now:

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co,queue in pairs(clients) do
      table.insert(queue,data)
      if queue.ready then
        nfl.schedule(co,true)
      end
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

The client list now contains a list of logs to send, along with a flag that the TLS coroutine sets indicating if it needs running or not. This takes advantage of Lua's tables which can have a hash part (named indices) and an array part, so we can include a flag in the queue.

And now the updated TLS coroutine:

local function client_main(ios)
  local function main()
    while #clients[ios.__co] > 0 do
      local data     = table.remove(clients[ios.__co],1)
      local okay,err = ios:write(data,'\n')
      if not okay then
        syslog('error',"tls:write()=%s",err)
        return
      end
    end
    
    clients[ios.__co].ready = true
    if not coroutine.yield() then
      return
    end
    clients[ios.__co].ready = false
    return main()
  end
  
  ios:_handshake()
  
  if ios.__ctx:peer_cert_issuer() ~= ISSUER then
    ios:close()
    return
  end
  
  syslog('info',"remote=%s",ios.__remote.addr)
  clients[ios.__co] = { ready = false }
  main()
  clients[ios.__co] = nil
  syslog('info',"remote=%s disconnecting",ios.__remote.addr)
  ios:close()
end

The core of the routine, the nested function main() does the real work here. When main() starts, the flag for queue readiness is false. It then runs through its input queue sending data to the client. Once that is done, it sets the queue readiness flag to true and then yields. Once it resumes, it sets the queue readiness flag to 'false' and (through a tail call) starts over again.

This ensures that logs are queued properly for delivery, and running the C test program again showed it works.

Tuesday, February 11, 2025

Two videos on how we figured out our solar system just based on obversations alone, long before we left the surly bonds of Earth

The video “Terence Tao on how we measure the cosmos” was very interesting to watch, as Terence goes into depth on how people in the past, and by past, I mean the distant past, figured out the earth was a sphere, how big that sphere was, and even reasoned that the earth went around the sun, long before the Christian Church even existed! And the method that Kepler used to figure out the orbits of Earth and the planets, when at the time we didn't quite know the distance to them, and all we had were positions in the sky to go by.

Incredible.

Also, a second video on how the moons of Jupiter (yes, it's not at all about Pluto despite the title) revealed much about how our solar system works. It even revealed that light had a finite speed.

I think if these methods were more widely known, how we figured out the shape of the Earth, the size of the moon and sun, and how orbits worked, then people wouldn't have the mistaken belief of a flat earth holding up the firmaments.

Update on Tuesday, March 18^th, 2025

Part Two of “Terence Tao on how we measure the cosmos” has been released.

I never got the memo on “copyover servers”

There’s only so much you can do with builder rights on someone else’s MUD. To really change the game, you needed to be able to code, and most MUDs were written “real languages” like C. We’d managed to get a copy of Visual C++ 6 and the CircleMUD source code, and started messing about. But the development cycle was pretty frustrating — for every change, you had to recompile the server, shut it down (dropping everyone’s connections), bring it back up, and wait for everyone to log back in.

Some MUDs used a very cool trick to avoid this, called “copyover” or “hotboot”. It’s an idiom that lets a stateful server replace itself while retaining its PID and open connections. It seemed like magic back then: you recompiled the server, sent the right command, everything froze for a few seconds, and (if you were lucky) it came back to life running the latest code. The trick is simple but I can’t find a detailed write-up, so I wanted to write it out while I thought of it.

Via Lobsters, How Copyover MUD Servers Worked | Blog | jackkelly.name

Somehow, in all my years of programming (and the few years I was looking into the source code of various MUDs back in the early 90s) I never came across this method of starting an updated version of a server without losing any network connections. In hindsite, it's an obvious solution—it just never occured to me to do this.

Discussions about this entry

Lazy Reading for 2025/02/23 – DragonFly BSD Digest

Saturday, March 01, 2025

Fixing a 27 year old bug that only now just got triggered

I will, from time to time, look at various logs for errors. And when I looked at the error log for my web server, intermixed with errors I have no control over like this:

[Tue Feb 25 10:41:19.504140 2025] [ssl:error] [pid 16571:tid 3833293744] [client 206.168.34.92:47678] AH02032: Hostname literature.conman.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Tue Feb 25 12:39:33.768053 2025] [ssl:error] [pid 16408:tid 3892042672] [client 167.94.146.59:50798] AH02032: Hostname hhgproject.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Sat Mar 01 05:34:44.029898 2025] [core:error] [pid 21954:tid 3841686448] [client 121.36.96.194:53710] AH10244: invalid URI path (/cgi-bin/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/bin/sh)
[Sat Mar 01 05:34:45.077056 2025] [core:error] [pid 23369:tid 3875257264] [client 121.36.96.194:53722] AH10244: invalid URI path (/cgi-bin/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/bin/sh)

I found a bunch of errors that I found concerning:

[Sun Feb 23 10:14:54.644036 2025] [cgid:error] [pid 16408:tid 3715795888] [client 185.42.12.144:51022] End of script output before headers: contact.cgi, referer: https://www.hhgproject.org/contact.cgi
contact.cgi: src/Cgi/UrlDecodeChar.c:41: UrlDecodeChar: Assertion `((*__ctype_b_loc ())[(int) ((*src))] & (unsigned short int) _ISxdigit)' failed.

It's obvious that a call to assert() failed in the function UrlDecodeChar() due to some robot failing to encode a web request properly. Let's see what the code is actually doing:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    assert(isxdigit(*src));
    assert(isxdigit(*(src+1)));
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The problem was using assert() to check the results of some I/O—that's not what assert() is for. I think I was being lazy when I used those assertions and didn't bother with the proper coding practice of returning an error. Curious as to when I added this code, I checked the history and from December 3^rd, 2004:

char UrlDecodeChar(char **psrc)
{
  char *src;
  int	c;

  ddt(psrc  != NULL);
  ddt(*psrc != NULL);

  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    ddt(isxdigit(*src));
    ddt(isxdigit(*(src+1)));
    c	 = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The history in the current repository goes no further back due to losing my CVS repositories and it's interesting to see that this function is the same as it was back then (with the difference of using my own version of assert() called ddt() back in the day). Some further sluthing convinced me that I wrote this code back in 1997. This function is old enough to not only vote, be drafted, get drunk, and sign contracts, but be removed from its parents health insurance!

Good lord!

It's not how I would write that function today.

It's even more remarkable that I haven't seen this assert() trigger in all those years.

The fix was easy:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit(*src))   return '\0';
    if (!isxdigit(*src+1)) return '\0';
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

And propagating the error back up the call chain. This does result in a new major version for CGILib since I do follow semantic versioning since this is, technically speaking, a change in the public API even though this is less than 10 lines of code (out of 8,000+).

Update on Monday, October 20^th, 2025

It wasn't quite that easy …

Monday, March 03, 2025

Yelling at clouds

I will admit—these are kneejerk reactions, but they're honestly my reactions to reading the following statements. I know, I know, hanging onions off our belt is long out of style.

And get off my lawn!

Anyway … statment the first:

Think jq, but without having to ask an LLM to write the query for you.

Via Lobsters, A float walks into a gradual type system

So … using jq is so hard you need to use a tool that will confabulate ¼ of the time in order to construct a simple query? Is that what you are saying? That you can't be bothered to use your brain? Just accept the garbage spewed forth by a probabilistic text slinger?

Really?

And did you use an LLM to help write the code? If not, why not?

Sigh.

And statement the second:

… and most importantly, coding can be social and fun again.

Via Lobsters, introducing tangled

If I had known that programming would become a team sport, I, an introvert, would have choosen a different career. Does XXXXXXX everything have to be social? Why can't it just be fun? I need to be micromanaged as well?

A quirk of the Motorola 6809 assemblers

I just learned an interesting bit of trivia about 6809 assembly language on a Discord server today. When Motorola designed the 6809 assembler, they made a distinction between the use of n,PC and n,PCR in the indexing mode. Both of those make a reference based off the PC register, but in assembly language they defined, using n,PC means use the literal value of n as the distance, whereas n,PCR means generate the distance between n and the current value of the PC register.

I never knew that.

I just looked and all the materials I had on the 6809 use the n,PCR method everywhere, yet when I wrote my assembler, I only support n,PC and it always calculates the distance. I think I forgot that it should have been n,PCR because on the 68000 (which I also programmed, and was also made by Motorola) it always used n,PC.

And I don't think I'll change my assembler as there does exist a method to use an arbitrary value of n as a distance: LDA (*+3)+n,PC. The asterisk evaluates to the address of the current instruction, and by adding 3 you get the address of the next instruction, which in the PC-relative addressing mode, is a distance of 0. Then n will be the actual offset used in the instruction. Yes, it's a bit convoluted, but it's a way to get how Motorola originally defined n,PC.

And apparently, Motorola defined it that way to make up for less intelligent assemblers back in the day due to memory constraints. We are long past those days.

Tuesday, March 18, 2025

Measuring the cosmos, part II

Last month, I mentioned part one of how we measured the night sky, and now, part two of “Terence Tao on how we measure the cosmos”.

A network of bloggers, a reel of YouTubers and other collective nouns

While I just made up the “network of bloggers” and “reel of YouTubers,” other collective nouns for groups, like a gaggle of geese, a murder of crows, or a pod of whales, are not quite as old as they may seem, and were largely made up just a few hundred years ago, and there were a lot more than we use today, according to this video. Neat.

Who serves whom?

The narrative around these bots is that [AIs] are there to help humans. In this story, the hospital buys a radiology bot that offers a second opinion to the human radiologist. If they disagree, the human radiologist takes another look. In this tale, AI is a way for hospitals to make fewer mistakes by spending more money. An AI assisted radiologist is less productive (because they re-run some x-rays to resolve disagreements with the bot) but more accurate.

In automation theory jargon, this radiologist is a "centaur" – a human head grafted onto the tireless, ever-vigilant body of a robot

Of course, no one who invests in an AI company expects this to happen. Instead, they want reverse-centaurs: a human who acts as an assistant to a robot. The real pitch to hospital is, "Fire all but one of your radiologists and then put that poor bastard to work reviewing the judgments our robot makes at machine scale."

Pluralistic: AI can't do your job (18 Mar 2025) – Pluralistic: Daily links from Cory Doctorow

This has always been my fear of the recent push of LLM backed AI—not that they would help me do my job better, but that I existed to help it do its job better (if I'm even there).

Wednesday, March 19, 2025

How I vibe code

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Via Flutterby, Andrej Karpathy on X

Good Lord! If you thought software today was bloated and slow, this sounds like it would produce software that is gigantically glacial in comparison (and by “embrace exponentials” I think he means “accept code with O(n²), O(2ⁿ) or even O(n!) behavior”).

That's not how I would “vibe code.” No, to me, “vibe coding” is:

Don't necessarily worry about the behavior of the code—make it work but at least try to avoid O(2ⁿ) or worse algorithms, then make it right, then fast.
Don't use version control! If you make a mistake and need to revert, revert by hand, or carry on through the bad code. And avoid using directores like “src.1/”, “src.2/“ or “src-no-really-this-works/”—that's still a form of version control (albeit a poor man's version control). Power through your mistakes.
Don't bother with “unit tests,” “integration tests,” TDD or even BDD. I'm not saying don't test, just don't write tests. Want to refactor? Go ahead—bull through the changes, or don't. It's your code. Yes, this does mean mostly manual testing, and having a file of test data is fine—just don't write test code.
Format the code however you want! Form your own opinions on formatting. Have some soul in your code for once.
This isn't a team sport, so no pair programming! This is vibe coding, not vibe partying.
Remember the words of Bob Ross: “we don't make mistakes, just happy little accidents.”
Go with the flow. Just Do It™!

Now that I think about it, this is pretty much how programmers wrote code on home computers in the late 70s/early 80s. Funny that. But just blindly accepting LLM-written code? Good luck in getting anything to run correctly.

Sheesh.

Friday, March 21, 2025

A different approach to blocking bad webbots by IP address

Web crawlers for LLM-based companies, as well as some specific solutions to blocking them, have been making the rounds in the past few days. I was curious to see just how many were hitting my web site, so I ran a few queries over the log files. To ensure consistent results, I decided to query the log file for last month:

Quick summary of results for February 2025
total requests	468439
unique IPs	24654

Top 10 requests per IP
IP	Requests
4.231.104.62	43242
198.100.155.33	26650
66.55.200.246	9057
74.80.208.170	8631
74.80.208.59	8407
216.244.66.239	5998
4.227.36.126	5832
20.171.207.130	5817
8.29.198.26	4946
8.29.198.25	4807

(Note: I'm not concerned about protecting any privacy here—given the number of results, there is no way these are any individual. These are all companies hitting my site, and if companies are mining their data for my information, I'm going to do the same to them. So there.)

But it became apparent that it's hard to determine which requests are coming from a single entity—it's clear that a company can employ a large pool of IP addresses to crawl the web, and it's hard to figure out what IPs are under control of which company.

Or is it?

An idea suddenly hit me—a stray thought from the days when I was wearing a network admin hat I recalled that BGP routing basically knows the network boundaries for networks as it's based on policy routing via ASNs. I wonder if I could map IP addresses to ASNs? A quick search and I found my answer—yes! Within a few minutes, I had converted a list of 24,654 unique IP addresses to 1,490 unique networks, I was then able to rework my initial query to include the ASN (or rather, the human readable version instead of just the number):

Requests per IP/ASN
IP	Requests	AS
4.231.104.62	43242	MICROSOFT-CORP-MSN-AS-BLOCK, US
198.100.155.33	26650	OVH, FR
66.55.200.246	9057	BIDDEFORD1, US
74.80.208.170	8631	CSTL, US
74.80.208.59	8407	CSTL, US
216.244.66.239	5998	WOW, US
4.227.36.126	5832	MICROSOFT-CORP-MSN-AS-BLOCK, US
20.171.207.130	5817	MICROSOFT-CORP-MSN-AS-BLOCK, US
8.29.198.26	4946	FEEDLY-DEVHD, US
8.29.198.25	4807	FEEDLY-DEVHD, US

Now, I was curious as to how they identified themselves, so I reran the query to include the user agent string. The top eight identified themselves consistently:

Requests per Agent
Agent	Requests
Go-http-client/2.0	43236
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/132.0.0.0 Safari/537.36	26650
WF search/Nutch-1.12	9057
Mozilla/5.0 (compatible; ImagesiftBot; +imagesift.com)	8631
Mozilla/5.0 (compatible; ImagesiftBot; +imagesift.com)	8407
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)	5998
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)	5832
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)	5817

The last two, however had a changing user agent string:

Identifiers for 8.29.198.26
Agent	Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )	1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )	1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )	938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )	811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )	94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )	17

Identifiers for 8.29.198.25
Agent	Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )	1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )	1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )	905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )	741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )	90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )	11

I'm not sure what the difference is between polling and fetching (checking the URLs shows two identical pages, only differing in “Poller” and “Fetcher.” But looking deeper into that is for another post.

The next request I did was to see how many IPs (that hit my site in February) map to a particular ASN, and the top 10 are:

IPs per AS
AS	Count
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN	4034
AMAZON-02, US	1733
HWCLOUDS-AS-AP HUAWEI CLOUDS, HK	1527
GOOGLE-CLOUD-PLATFORM, US	996
COMCAST-7922, US	895
AMAZON-AES, US	719
TENCENT-NET-AP-CN Tencent Building, Kejizhongyi Avenue, CN	635
MICROSOFT-CORP-MSN-AS-BLOCK, US	615
AS-VULTR, US	599
ATT-INTERNET4, US	472

So Alibaba US crawled my site from 4,034 different IP addresses—I haven't done the query to figure out how many requests each ASN did, but it should be a straightforward thing to just replace IP address with the ASN to get a better count of which company is crawling my site the hardest.

And now I'm thinking, I wonder if instead of a form of ad-hoc banning of single IP addresses, or blocking huge swaths of IP addresses (like 47.0.0.0/8, it might not be better to block per ASN? The IP to ASN mapping service I found makes it quite easy to get the ASN of an IP address (and to map the ASN to an human-readable name), Instead of, for example, blocking 101.32.0.0/16, 119.28.0.0/16, 43.128.0.0/14, 43.153.0.0/16 and 49.51.0.0/16 (which isn't an exaustive list by any means) just block IPs belonging to ASN 132203, otherwise known as “TENCENT-NET-AP-CN Tencent Building, Kejizhongyi Avenue, CN.”

I don't know how effective that idea is, but the IP-to-ASN site I found does offer the information via DNS, so it shouldn't be that hard to do.

Discussions about this entry

Lazy Reading for 2025/04/13 – DragonFly BSD Digest

A deeper dive into mapping web requests via ASN, not by IP address

I went ahead and replaced IP addresses with ASNs in the log file to find the network that sent the most requests to my blog for the month of February.

Top 10 networks requesting a page from blog
MICROSOFT-CORP-MSN-AS-BLOCK, US	78889
OVH, FR	31837
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN	25019
HETZNER-AS, DE	23840
GOOGLE-CLOUD-PLATFORM, US	21431
CSTL, US	17225
HURRICANE, US	15495
AMAZON-AES, US	14430
FACEBOOK, US	13736
AKAMAI-LINODE-AP Akamai Connected Cloud, SG	12673

Even though Alibaba US has the most unique IPs hitting my blog, Microsoft is still the network making the most requests. So let's see how Microsoft presents itself to my web server. Here are the user agents it sends:

Web agents from the Microsoft Network
agent	requests
Go-http-client/2.0	43236
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)	23978
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36	7953
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0	2955
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot	210
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot	161
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)	123
'DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)'	122
Python/3.9 aiohttp/3.10.6	28
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.36 Safari/537.36	14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.114 Safari/537.36	14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68	10
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)	10
DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html)	10
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36	6
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.143 Safari/537.36	6
python-requests/2.32.3	5
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36	5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36	4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0	4
DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)	4
Twingly Recon	3
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)	3
Mozilla/5.0 (compatible; Twingly Recon; twingly.com)	3
python-requests/2.28.2	2
newspaper/0.9.1	2
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36	2
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b	2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36	2
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) Bot	1
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/)	1
Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 skype-url-preview@microsoft.com	1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36	1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36	1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48	1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) Bot	1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/)	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) Bot	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/)	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) Bot	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/)	1

The top result comes from a single IP address and probably requires a separate post about it, since it's weird and annoying. But the rest—you got Bing, you got OpenAI, you got several Mastodon instances—it seems like most of these are from Microsoft's cloud offering. A mixture of things.

What about Facebook?

Web agents from Facebook
agent	requests
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)	13497
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)	207
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36	12
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/59.0	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 Edg/132.0.0.0	2
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36	2

Hmm … looks like I have a few readers at Facebook, but other than that, nothing terribly interesting.

Alibaba, on the other hand, is frightening. Out of 25,019 requests, it presented 581 different user agents. From looking at what was requested, I don't think it's 500 Chinese people reading my blog—it's defintely bots crawling my site (and amusingly, there are requests to /robots.txt file, but without a proper user agent to go by, it's hard to block it via that file).

I can think of one conclusion here—if you do filter by ASN, it can help tremendously, but it also comes with possibly blocking legitimate traffic.

Discussions about this entry

Lazy Reading for 2025/04/13 – DragonFly BSD Digest

Still no information on who “The Knowledge AI” is or was

Back in July 2019 I was investigating some bad bots on my website when I came across the bot that identified itself simply as “The Knowledge AI” that was the number one robot hitting my site. Most bots that identify themselves will give a URL to a page that describes their usage like Barkrowler (to pick one that recently crawled my site). But not so “The Knowledge AI”. That was all it said, “The Knowledge AI”. It was very hard to Google, but I wouldn’t be surprised if it was OpenAI.

The earliest I can find “The Knowledge AI” crawling my site was April of 2018, and despite starting on April 16th, it was the second most active robot that month. In May it was the number one bot, and it stayed there through October of 2022, after which it pretty much dropped—from 32,000+ in October of 2022 to 85 in November of 2022 (about 4½ years). It was sporadic, showing up in single digit hits until January of 2024. It may be still crawling my site, but if it is, it is no longer identifying itself.

I don’t know if “The Knowledge AI” was an LLM company crawling, but if it was, not giving a link to explain the bot is suspicious. It’s the rare crawler that doesn’t identify itself with at least a URL to describe it. The fact that it took the number one crawling spot on my site for 4 ½ years is suspicious. As robots go, it didn’t affect the web server all that much (I’ve come across worse ones), and well over 90% of its requests were valid (unlike MJ12, which had a 75% failure rate). And my /robots.txt file doesn’t exclude any robot from scanning, so I can’t really complain about it.

My comment on “Mitigating SourceHut's partial outage caused by aggressive crawlers | Lobsters”

Even though the log data is a few years old, I don't think that IPs change from ASN to ASN all that much (but I could be wrong on that). I checked the IPs used by “The Knowledge AI” in May 2018, and in October 2022, and they didn't change that much. They were still the same /24 networks across that time.

Looking up the information today is very disappointing—Hurricane Electric LLC., a backbone provider.

So no real information about who “The Knowledge AI” might have been.

Sigh.

Now a bit about feed readers

There are a few bots acting less than optimally that aren't some LLM-based company scraping my site. I think. Anyway, the first one I mentioned:

Identifiers for 8.29.198.26
Agent	Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )	1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )	1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )	938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )	811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )	94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )	17

Identifiers for 8.29.198.25
Agent	Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )	1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )	1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )	905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )	741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )	90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )	11

This is feedly, a company that offers a news reader (and I'd like to thank the 67 subscribers I have—thank you). The first issue I have about this client is the apparent redundant requests from six different clients. An issue because I only have three different feeds, the Atom feed, the RSS feed and the the JSON feed. The poller seems to be acting correctly—16 subscribers to my Atom feed and 6 to the RSS feed. The other four? The fetchers? I'm not sure what's going on there. There's one for the RSS feed, and three for the Atom feed. And one of them is a typo—it's requesting “//index.atom” instead of the proper “/index.atom” (but apparently Apache allows it). How do I have 16 subscribers to “/index.atom” and another 37 for “/index.atom”? What exactly, is the difference between the two? And can't you fix the “//index.atom” reference? To me, that's an obvious typo, one that could be verified by retreiving both “/index.atom” and “//index.atom” and seeing they're the same.

Anyway, the second issue I have with feedly is their apparent lack of caching on their end. They do not do a conditional request and while they aren't exactly slamming my server, they are making multiple requests per hour, and for a resource that doesn't change all that often (excluding today that is).

Then there's the bot at IP address 4.231.104.62. It made 43,236 requests to get “/index.atom”, 5 invalid requests in the form of “/gopher://gopher.conman.org/0Phlog:2025/02/…” and one other valid request for this page. It's not the 5 invalid requests or the 1 valid request that has me weirded out—it's the 43,236 to my Atom feed. That's one request every 55 seconds! And even worse—it's not a conditional request! Of all the bots, this is the one I feel most like blocking at the firewall level—just have it drop the packets entirely.

At least it supports compressed results.

Sheesh.

As for the rest—of the 109 bots that fetched the Atom feed at least once per day (I put the cut off at 28 requests or more durring February), only 31 did so conditionally. That's a horrible rate. And of the 31 that did so conditionally, most don't support compression. So on the one hand, the majority of bots that fetch the Atom feed do so compressed. On the other hand, it appears that the bots that do fetch conditionally most don't support compression.

Sigh.

Update on Tuesday, August 5^th, 2025

I have an update on the Feedly bot.

Wednesday, March 26, 2025

Notes on blocking spam by filtering on ASN

So now that I can classify IP addresses by ASN, I thought I might see how it could help with spam email. I'm already using an ansi-spam agent to cut down on spam, so maybe filtering by ASN could cut down even more. The last time I looked into additional means of spam avoidance, the use of SPF wasn't worth the effort.

And I'm afraid the effort of blocking via ASN won't be worth the effort either. Looking over email attempts over the past month, the top 10 networks who sent email to my server, from 5,181 individual emails:

Top 10 emailers to my server
AS	Count
IOMART-AS,	375
IDNIC-IDCLOUDHOST-AS-ID	369
PAIR-NETWORKS,	263
MICROSOFT-CORP-MSN-AS-BLOCK,	246
AS-COLOCROSSING,	152
EMERALD-ONION,	124
GOOGLE,	122
SPARKPOST,	120
TZULO,	112
AMAZON-02,	106

Unlike the web (or even Gemini or gopher) there isn't one dominant network here—it's all spread out. I don't think it's really worth the effort to block via ASN for spam. At least for my email server.

Discussions about this entry

Lazy Reading for 2025/04/13 – DragonFly BSD Digest

Thursday, April 03, 2025

God, I feel like I'm an old man wearing a tin foil hat yelling at the world

A few months ago our ice maker broke and the upshot—we got charged for a replacement unit that was never installed and we're using ice trays.

Last week, our dryer stopped working. It turned on, you can set the controls, but it fails to spin up and do any actual drying. We thought about maybe getting it repaired, but in all likelyhood, by the time we pay for the repair man and the replacement widget that isn't working, we would probably be looking at the price of a new dryer anyway. And if it can't be fixed, then we're definitely out the prirce of a new dryer and the repair man.

So early this week we went out to one of the two remaining home improvement stores (and it wouldn't surprise me to find out they're both owned by the same shadow company—can you tell I'm gettin cynical here?) and bought a replacement. Of course, we couldn't just buy a new dryer—no. We had to buy a new electrical cord (what? It doesn't come as part of the dryer?) and a new dryer hose (what? What's wrong with the one we already have? This stuff is pretty much standardized by now). Then there was the delivery fee, the installation fee, and the removal-of-the-old-unit fee. All that tacked on an additional 25% on the shelf price, not including taxes.

Maybe it would have been cheaper to get it repaired.

Anyway, today we received the unit. Two men unloaded the new dryer from the truck, brought it inside the garage, took the old dryer off the “pedestal” (which we purchased several years ago when we got a new washer and dryer—each are on a pedestal that acts as storage space), and then said they would NOT install the new dryer on the old pedestal. In fact, they insisted they COULD NOT install on the older pedestal, and then they left.

Well, thank you very much.

We called the home improvement store and they said it was company policy not to install a dryer on a pre-existing pedetal. They didn't say which company mandated this policy—the manufacturer of the dryer (which, at this point, I wouldn't be surprised if there was only one shadow company that owned all the appliance manufacturers) or the home improvement store. And of course, the removal of the pedestal would require an additional removal-of-the-old-unit fee, because it was considered a separate unit. It didn't matter that we had a perfectly good pedestal whose size matches the new dryer. And it didn't matter that the new dryer hose matched the size of the existing hose. Oh, and if we installed the dryer ourselves, we would immediately void the one-year warrantee on the dryer.

It seems like our only real choices were to spend even more money on a new pededstal, more money for delivery, more money to remove the old pedestal, more money to install the new pedestal. and probably an additional fee to install the new dryer (which we had already paid for installation) to install the dryer on said pedestal, or we could elect to void the warrantee and install the dryer on our own.

Well, XXXX me!

Bunny thinks I'm too cynical, but at this point, how do you not be cynical?

We ended up calling a friend over to help install the dryer on the old pedestal. It fit fine. So did the old hose. I'm not sure if it's even worth attempting to get the installation fee back—they might just void our warrantee on the spot (I can hear Bunny calling out “Stop being so cynical!”). I'm glad we didn't spring for the three or five year warrantees—the unit will probably last that long (and no longer, because then the same shadow company that owns all the appliance manufacturers gets another sale, and thus the line goes up, because XXXX you, think of the poor billionaires).

I hope it lasts longer. I don't want to be this cynical.

Wednesday, April 16, 2025

Extreme automobiles, Ft. Lauderdale edition

Bunny and I stopped by Josef & Joseph to have them fix Bunny's watch (it turned out to just need a new battery). Inside is a large collection of clocks, watches and jewelry. Outside, however, was this!

[A silver car, longer than a limousine with an assortment of ornamants adorning the outside] “That's not a limousine, this is a limousine!”

[The frontend of a silver car with nine, count them, nine, headlights spanning the front grill] I don't think I'd want this barrelling towards me from behind at night!

[The other side of a three-times longer than normal silver car with three baloon-like things leaned up against it] This is so unusual of a car that three aliens have come down to examine it closely.

[Closeup detail of the car, sporting emblems from VW, BMW, and Cadilac, with a model air plane strapped to the top, all silver] I don't think those wings are capable of supporting flight. Probably for the best.

When I asked about the car inside, I was informed that indeed, it could drive, so it's a working car, made of a VW, BMW and Cadilac welded together and painted silver. And it was delivered as part of an Elvis Presley exhibit at Josef & Joseph. What Elvis has to do with a Mad-Max inspired limousine is beyond me. But it's there, and it's awesome.

It's also been there so long you can see it on Google's street view!

Monday, June 02, 2025

I've just implemented a Forth system

It started out innocently enough—I just wanted to know how to implement the Forth word DOES>. I ended up implementing ANS Forth for the 6809, as you do.

I've had a fascination with Forth since college. I have a copy of both the first edition and the second edition of Starting Forth. I have a copy of Thinking Forth I also have a copy of Threaded Interpretive Languages with the Robert Tinney cover art. I even wrote my own Forth-like langauge in college, which I used for a class project and a few work-relelated programs.

But the one concept I couldn't figure out how to implement was the Forth word DOES>. As a user of Forth, using DOES> is pretty easy and one of those things never thought about, much like closures to a JavaScript programmer. But implementing it? That was a problem.

So in April I set out to do that, using the 6809 (because why not?) with the intent of figuring it out. I made a stab with it a few years ago, writing just enough of a Forth implementation in C and got it working. Barely. I wanted to do better.

And boy, did I.

I didn't intend on implementing ANS Forth, but once I “got” DOES> done (and I know the grammar on that doesn't quite work, but that's Forth for you) I had enough of a system up that sure—why not finish it?

It's a classical indirect thread coded Forth interpreter. They tend to be the easiest to write and the most compact. They aren't exactly the fastest though. And the 6809 is unique among the 8-bit CPUs in that Forth is a very good match for it. Forth requires two stacks, the 6809 has two stack pointers. There are two index registers, so one can be used as the Forth instruction pointer, and the other one for use. It even has some limited 16-bit arithmatic operations. So it's a good match for Forth.

It just took a bit longer than expected. I ended up implementing 254 Forth words (in Forth, a “word“ is like a function) out of the possible 435—I wanted a Forth independent of any existing operating system, so I skipped implementing a few wordsets (Forth jargon for “module” or “library”). The only routines that need to be provided are a character input routine, a character output routine, and a routine to return back to the operating system. I also didn't implement floating point, as that would take up a considerable amount of space (the IEEE-754 routines for the 6809 Motorola placed into the public domain clock in at 8K and all that gives you is addition, subtraction, multiplcation, division and square roots—and the ANS Forth standard for floating point requires a lot more).

I also tried writing as much of it using ANS Forth standard words as possible, only it turned out to be less than I thought using the restrictions I placed upon myself (mainly—avoid using non-standard Forth words but more on that in a later post).

Then it took a while to get it to pass the test suite—the semantics of some Forth words are a bit tricker than I expected, and some words worked completely differently than how I expected them to work, but again, I'll be going into detail about that later. I also had to debug the test suite as it made some unwarrented assumptions about the Forth environment, mainly case-insensitivity (mine is case-sensitive, which is allowed by the Forth standard) and line length (mine is limited to 80 characters, again, the minimum allowed by the Forth standard). And the occastional outright bug (mostly typos).

Fun times.

Anyway, expect a flurry of posts about implemeting an ANS Forth system.

Discussions about this entry

Lazy Reading for 2025/06/08 – DragonFly BSD Digest

Wednesday, June 04, 2025

The basics of an indirect threaded code ANS Forth implementation

I need to get into the habit of writing prose more often.

Before I go into the implementation of my ANS Forth system, I need to define somethings. First, a bit about Forth terminology—a Forth “word” can be thought of as a function or subroutine of other languages, but it's a bit more than that—it can also refer to a variable, a constant, the equivalent of a key word. It's a fluid concept, but thinking of a “word” as a function won't be too far off. Also, a collection of Forth words is collected into a wordset (or if you are reading older documentation about Forth, a “dictionary”). You can think of a “dictionary” as a library, or maybe a module, of code.

Second, Forth is a stack-based language. There are rarely any explicit parameters, just data placed on a stack, known as the “data stack.” Because of this, expressions are written in Reverse Polish Notation (RPN)—data is specified first, then the operator.

Third, there's a second stack, the “return stack” that is pretty much what it sounds like—it records the return address when calling into a Forth word.

Forth, the langauge is ridiculously easy to parse—a token is just a collection of characters separated by space. So not only is my_variable_name a valid name for a Forth word, so is my-variable-name, #array, 23-skidoo and even 3.14 valid names for Forth words. Yes, that means it's easy to do stupid things like define the token “1” to be 2 (you know, for when 1 equals two for large values of 1). But this also makes Forth trivial to parse. In fact, a Forth interpreter is nothing more than:

begin
	name = parse_token()
	word = lookup(name)
	if (word)
		execute(word)
	else
	{
		if (valid_number(name,current_base))
			push_number(convert(name,current_base))
		else
			error()
	}
repeat

That's it. That's a Forth interpreter more or less. Compiling Forth isn't much harder:

begin
	name = parse_token()
	word = lookup(name)
	if (word)
	{
		if (immediate(word))
			execute(word)
		else
			compile_into_definition(word)
	}
	else
	{
		if (valid_number(name,current_base))
			add_code_to_push(convert(name,current_base))
		else
			error()
	}
repeat

A Forth word can be marked as “immediate,” which just means the word is executed during compilation mode rather than being compiled, and this is how the Forth compiler can be extended. A Forth word like IF or BEGIN is just another Forth word, albeit marked as “immediate” so it can do its job when compiling.

The details about switching between interpreting and compiling will be covered in a later post, but it's not a difficult as it may seem.

And one bit about ANS Forth in particular—the standard defines a collection of “wordsets,” most of which are optional in an implementation. The minimum required is the Core Word set. The rest, including the Core Extension words, are optional. I did not implement all the wordsets, but again, more on that later.

Anyway, my ANS Forth system is a classic indirect threaded code (ITC) implementation. As I mentioned, it's easy to implement but perhaps not the fastest of implementation styles. As this is on the MC6809, I used a typical-for-the-6809 register use for my Forth system:

Register	Type	Forth usage
`D`	16-bit accumulator	Free for use (top of stack is kept in memory)
`X`	16-bit index register	execution token of word being run, free for use
`Y`	16-bit index register	Forth IP register
`U`	16-bit index register/user stack	data stack pointer
`S`	16-bit index register/system stack	return stack pointer

I elected to keep the top of stack in memory and not in the D register. I'm not sure if this effects the speed any, but it was easier, implementation wise, to keep the top of stack always in memory.

The Forth words in my implementation are either primitives, that is, written in assembly language, or secondaries, that is, written in Forth. Here's the format of a primitive word, +:

forth_core_plus                 ;  n1|u2 n2|u2 -- n3|u3 )
                fdb     forth_core_star_slash_mod	; previous word in dictionary
                fdb     .xt - .name			; length of name
.name           fcc     "+"				; name
.xt             fdb     .body				; execution token of word
.body           ldd     ,u++				; body of word
                addd    ,u
                std     ,u
                ldx     ,y++
                jmp     [,x]

The first thing to notice is the label. Given how difficult naming is in Computer Science, I decided to use the canonical name of each Forth Word as defined by the standard. All the labels start with “forth_,” and are followed by the wordset (in this case, “core_”) followed by the actual Forth word. Yes, this makes for some long labels, but I don't have to think about naming, which is nice. The .xt label (which is a “local label”—this defines the full label of forth_core_plus.xt) defines the “execution token” (xt) which is defined as “a value that identifies the execution semantics of a definition.” Here, this is a pointer to a function that provides the execution semantics of the word and in this case, just points to the “body” of the Forth word, which adds the top two elements of the data stack.

Of particular note are the last two instructions:

	ldx	,y++
	jmp	[,x]

Get used to seeing this fragment, it's used a lot—this is used to execute the next word in the program. As stated above, the Y register is the Forth IP, and this loads the X register with the xt of the next word, then jumps to the code that handles this word. The [,X] bit informs the CPU that we are jumping through a function pointer and not directly to the code (if we did that, JMP ,X, that would turn this from “indirect threaded code” to “direct threaded code”—and yes, that is the only difference between an ITC and DTC implementation).

The overall format of a word is a link to the previous word in the dictionary (here to */MOD), followed by a 16-bit length, the text that makes up the word, followed by the xt and then the body of the definition. You might be wondering why I would use a full 16-bits for the length on an 8-bit CPU—wouldn't that waste space? Yes, it would, especially given that in this implementation, a Forth word is restricted to just 31 characters, but I need a way to mark some information about each word, like if it's an immediate word or not. And while an 8-bit length where the largest value would be 31 giving me three bits for flag values, I ended up needed a few more than just three. So 16 bits for the length gives me a potential of 11 bits to use as flags.

A word written in Forth will have a different xt and body, for example, the very next word in the dictionary:

forth_core_plus_store           ; ( n|u a-addr -- )
                fdb     forth_core_plus
                fdb     .xt - .name
.name           fcc     "+!"
.xt             fdb     forth_core_colon.runtime
        ;===============================
        ; : +!  DUP @ ROT + SWAP ! ;
        ;===============================
                fdb     forth_core_dupe.xt
                fdb     forth_core_fetch.xt
                fdb     forth_core_rote.xt
                fdb     forth_core_plus.xt
                fdb     forth_core_swap.xt
                fdb     forth_core_store.xt
                fdb     forth_core_exit.xt

The xt here points to the label forth_core_colon.runtime. This is usually called DOCOLON but I'm being explicit here—Forth words defined in Forth are created by the Forth word :, and this label, forth_core_colon.runtime, implements the runtime portion of said word. The body of this word is then an array of execution tokens of various Forth words comprising the definition. The last word of all Forth words defined this way end with a call to corth_core_exit.xt.

The : runtime function looks like:

forth_core_colon		; ( C: "<spaces>name" -- colon-sys ) E ( i*x -- j*x )
		fdb	forth_core_two_swap
		fdb	.xt - .name
.name		fcc	":"
.xt		fdb	.body
.body		...		; I'll get to this bit of the code in a later post

.runtime	pshs	y
		leay	2,x
		ldx	,y++
		jmp	[,x]

Here, the runtime will push the Forth IP (which is the Y register) onto the return stack, set the Y register to the body of the word being executed and that two instruction sequence that goes to the next word to execute.

And the function EXIT looks like:

forth_core_exit			; E ( -- ) ( R: nest-sys -- )
		fdb	forth_core_execute
		fdb	_NOINTERP :: .xt - .name
.name		fcc	"EXIT"
.xt		fdb	.body
.body		puls	y	; restore the Forth IP
		ldx	,y++	; and execute next word
		jmp	[,x]

The first thing to notice here is the _NOINTERP flag. EXIT is defined as having no interpretation semantics, so typing EXIT outside of a word being defined is meaningless. Yes, this flag is used, and it does generate an error, but that again, is a later post. I should also mention that the :: here is a special operator in my assembler. The left hand side defines the most significant byte of a 16-bit quantity, and the right hand side defines the least significant byte. It's short hand for _NOINTERP * 256 + (.xt - .name).

The second thing to notice is that the Forth IP (again, the Y register) is pulled from the return stack, and we yet again, run that two instruction sequence to run the next word.

And this is pretty much the entire execution engine of Forth.

In fact, I wrote this bit first, even before writing code to compile Forth (and yes, I hand-compiled all Forth code, so I had something to compare against when I eventually got to compiling).

So on the one hand, Forth is an easy language to implement and can be quite small. On the other hand, ANS Forth has some subtle semantics that make for some … interesting implementation details and isn't that easy to implement, as we shall see over the coming posts.

Discussions about this entry

Thursday, June 05, 2025

Avoiding Roko's basilisk, part II

The other day I came across this comment on Lobsters:

On a personal level I have helped various people get value out of AI tools where they initially did not understand how to use it properly. But that setting is more of a 1:1 for a specific situation. For generic how to use agentic tools, there are so many articles already. Peter Steinberger has a multi hour talk online of him using an army of agents to write on his project.

If someone has a specific situation where they failed using an agent, ideally with some open source code, I would be happy to have a look at it. It’s just hard to engage on abstract “does not work for me” posts.

Comment on “AI Changes Everything”

I failed using an agent a few months ago. It was on an open source project of mine. Perhaps mitsuhiko would be happy to have a look at it. So I replied.

And mitsuhiko was happy to look at it.

Or rather, spend a few minutes telling his “coding agent” to look at the code and let it do its thing. So I took a look.

Development was done on a Mac, which doesn't have the vm86() system call, so his agent, “Claude,” started writing an 8086 emulator. Or I should say, an 80386 emulator since that's the most common architecture these days. It also came up with a few tests and once it those tests were working, it stopped.

When I tried the code, attempting to run RACTER.EXE, it just sat there, turning my computer into a space heater. Looking a bit further, I saw there was an option for debug output (but the option appears at the end of the command line, not after the command itself, like every other command on Unix). Then I saw line after line of

...
Execute: 2010:0020: 8B
Unhandled opcode at 2010:0020: 8B
Execute: 2010:0021: EC
Unhandled opcode at 2010:0021: EC
Execute: 2010:0022: 81
Unhandled opcode at 2010:0022: 81
Execute: 2010:0023: EC
Unhandled opcode at 2010:0023: EC
Execute: 2010:0024: 02
Unhandled opcode at 2010:0024: 02
Execute: 2010:0025: 00
Unhandled opcode at 2010:0025: 00
Execute: 2010:0026: 9A
Unhandled opcode at 2010:0026: 9A
Execute: 2010:0027: C2
Unhandled opcode at 2010:0027: C2
Execute: 2010:0028: 10
Unhandled opcode at 2010:0028: 10
Execute: 2010:0029: 52
Unhandled opcode at 2010:0029: 52
Execute: 2010:002A: 24
Unhandled opcode at 2010:002A: 24
Execute: 2010:002B: 9A
Unhandled opcode at 2010:002B: 9A
Execute: 2010:002C: A2
Unhandled opcode at 2010:002C: A2
Execute: 2010:002D: 19
Unhandled opcode at 2010:002D: 19
Execute: 2010:002E: 52
Unhandled opcode at 2010:002E: 52
...

To say I was underwhelmed is an understatement.

The thread somewhat petered out.

I noticed today that mitsuhiko gave it another attempt. He put the whole thing into Docker so he could run under a Linux VM, and the code now could run enough of RACTER.EXE to display the banner:

[spc]lucy:/tmp/racter>/tmp/NaNoGenMo-2015/C/msdos RACTER.EXE



          .-----------------------------------------------------,
          |                                                     |
          |            A CONVERSATION WITH RACTER               |
          |                                                     |
          |       COPYRIGHTED BY INRAC CORPORATION, 1984        |
          | PORTIONS COPYRIGHTED BY MICROSOFT CORPORATION, 1982 |
          |                   ...........                       |
          `-----------------------------------------------------'




Hello, I'm Racter.  You are?  
>Sean
Sean

But that's it. It's still chugging along, turning my computer into a space heater. I'm still unimpressed.

This isn't to fault mitsuhiko. I'm sure he finds value in AI agents coding for him, but I think this was way out of his bailiwick, which is why he didn't bother to understand what I was trying to attempt. “Claude” got to the point of printing the banner from RACTER.EXE and stopped, because I think that's all it was instructed to do, besides attempting to buffer the input.

I'll close this out with the last few comments in the thread:

Sean: What type of programming do you do? Or rather, what type of programming do you have Claude do for you? Because I am still unconvinced it will be any benefit to the programming I do.
mitsuhiko: Right now I’m building a backend for a prototype of the next project I’m working on. That is a rather complex web application using both Python and Rust. Over the last year or so I used it quite a bit to extend minijinja (but that wasn’t agentic yet).
Sean: Ah, stuff that is definitely over-represented in the training sets. Gotcha.
mitsuhiko: Considering that I’m doing a very fringe thing I’m not so sure that this is a very accurate assessment :)
Sean: Python, Rust and web applications are over-represented in the training sets. The 6809, RACTER.EXE and ANS Forth aren’t. What you are doing might be novel, but the tech being used isn’t. The stuff I described isn’t novel (well, maybe having RACTER and Eliza chat, but I was riffing on an article written in the 80s about doing that) but using tech that (in my opinion) is novel (that is, not mainstream). There’s a difference.

I do appreciate the attempt though.

Update on Friday, June 6^th, 2025 at 3:06 AM

One last comment from mitsuhiko in the thread: “I had excellent results with completely niche technology too. For as long as you have a way for the machine to validate it’s [sic] outputs it can even program in languages that you just invented.”

I think I'll have to keep this in mind for next time.

Monday, June 09, 2025

Implementing DOES> in Forth, the entire reason I started this mess

The issue I had with DOES> isn't that it's hard to use—it's just that I had no idea how one would go about implementing it, much like Javascript programmers use closures without having to think about how they're implemented (even if they're aware of closures in the first place). So, before going into how it works, a sample from Starting Forth is in order.

: STAR 42 EMIT ;

: .ROW CR 8 0 DO
    DUP 128 AND IF STAR ELSE SPACE THEN 2*
  LOOP DROP ;

: SHAPE CREATE 8 0 DO C, LOOP
  DOES> DUP 7 + DO I C@ .ROW -1 +LOOP CR ;

HEX 18 18 3C 5A 99 24 24 24 SHAPE MAN

These two words support the example. The first word, STAR just prints a asterisk (42 is the ASCII code for the word). The second word, .ROW, takes an 8-bit value and for each bit, if it's a 1, prints an asterisk, otherwise, it prints a space. DO LOOP is Forth's for loop by the way. The next word, SHAPE is the interesting one. But first, we need to discuss CREATE.

This word creates a new entry in the Forth dictionary by reading the next word (defined as a collection of non-space letters) in the input as the name. It then gives the newly created word a default action of pushing the address of the body of the word into the stack. Going ahead a bit, the word MAN just after CREATE is run will look like this (in assembly):

man		fdb	shape	; link to next word
		fdb	.xt - .name
.name		fcc	'man'
.xt		fdb	forth_core_create.runtime
.body

When MAN is run, the address of .body will be pushed onto the stack. CREATE is typically used to create “smart data structures”—data structures that know how to do some action.

Now, getting back to the example, when SHAPE is run, the first thing it does is call CREATE to create a new word, then it compiles 8 values off the top of the stack into the body of the newly created word. Just prior to DOES>, MAN will look like:

man		fdb	shape	; link to next word
		fdb	.xt - .name
.name		fcc	'man'
.xt		fdb	forth_core_create.runtime
.body		fcb	$24
		fcb	$24
		fcb	$24
		fcb	$99
		fcb	$5A
		fcb	$3C
		fcb	$18
		fcb	$18

Now we get to DOES>. Due to the nature of what it does, DOES> is an immediate word—that is, its executing during compilation to do the voodoo that it do. Um, does. Somehow, it needs to modify the newly created word to not only push the address of its body onto the stack, but execute the code that appears after itself. So the code to be executed needs to be compiled and stored somewhere, and somehow MAN (in this example) needs to run this code.

And this was the problem I had with the word—how does this all work? Even the well known JonesForth, implemented as an ITC, didn't bother with implementing DOES> (and now that I have implemented DOES>, I suspect I know why JonesForth didn't implement it).

The runtime portion of CREATE just pushes the address of the body of the word into the stack. The data bytes following the xt have no meaning in and of themselves (even as code it's nonsensical). I did a search and found only one page that describes how to implement DOES>, but:

it was part three of a series of articles describing how Forth's are implemented;
using terminology no longer used by the ANS Forth standard;
attempting to describe how to implement Forth on several different CPU architectures;
using a few different methods (like ITC, DTC and STC);
and on this page, a wierd side trip through another Forth word ;CODE.

It wasn't an exactly easy source to read, but between part three and part one, I was able to puzzle it out (and it makes much more sense now that I've done it). Now I can discribe the result using a single architecture (6809) and a single implementation (ITC). The trick here is to realize that DOES> has a temporal aspect unlike any other Forth word.

Most immediate words in Forth have two temporal aspects—at the time of compilation, and later at runtime. For instance, IF's compile time aspect is to compile a conditional jump into the word, and the runtime aspect is to do said conditional jump (at least, it does so in my implementation). But DOES> has three temporal aspects:

: SHAPE CREATE ...a DOES> ( time 1 ) ...b ;
...c SHAPE MAN (time 2 )
MAN (time 3 )

At time 1, we are compiling a word that creates other words (so at this point, CREATE is compiled, not run). The compiler looks up DOES>, notices that it's an immediate word and executes it. DOES> at this point needs to include code to cause SHAPE to stop executing, then somehow leave … something … behind for time 2, and somehow compile the rest of the code ...b for later execution.

At time 2, we're defining a new word. CREATE has been called and the initialization code for this new word …a has been executed. At this point, DOES> needs to modify the new word … somehow … to execute the code that followed it at time 1.

And at time 3, the word created is run and somehow, it needs to know where the code to run is located. But going back to what CREATE and the inialization code left us:

man		fdb	shape	; link to next word
		fdb	.xt - .name
.name		fcc	'man'
.xt		fdb	forth_core_create.runtime
.body		fcb	$24
		fcb	$24
		fcb	$24
		fcb	$99
		fcb	$5A
		fcb	$3C
		fcb	$18
		fcb	$18

What can be done?

The easy answer—DOES> updates the xt of the newly created word at time 2. Where is this xt created? At time 1. And when is it uses? At time 3.

Here's what happens.

DOES> is an immediate word. When it runs at time 1, it compiles into the current word (in this example, SHAPE) the xt of its runtime. So SHAPE will look like this:

shape		fdb	dot_row	; link to next word
		fdb	.xt - .name
.name		fcc	'shape'
.xt		fdb	forth_core_colon.runtime
		fdb	forth_core_create.xt
		fdb	forth_core_literal.runtime_xt
		fdb	8
		fdb	forth_core_literal.runtime_xt
		fdb	0
		fdb	forth_core_do.runtime_xt
.L1		fdb	forth_core_literal.runtime_xt
		fdb	128
		fdb	forth_core_and.xt
		fdb	forth_core_if.runtime_xt
		fdb	.L2
		fdb	dot_row.xt
		fdb	forth_core_ext_again.runtime_xt
		fdb	.L3
.L2		fdb	forth_core_space.xt
.L3		fdb	forth_core_two_star.xt
		fdb	forth_core_loop.runtime_xt
		fdb	.L1
		fdb	forth_core_drop.xt
		fdb	forth_core_does.runtime_xt

(Note: here you can see that literal numbers have the LITERAL runtime action, that IF compiles to its runtime action. There are two Forth words that pretty much do the same thing—AHEAD does an unconditional branch forward, and AGAIN does an unconditional branch backwards; they basically both do an unconditional branch, so I picked one to handle both internally and I picked AGAIN for this. More on this in a later post.)

To create the new xt that words created by SHAPE will use (or any word that includes DOES>) it then lays out a single instruction, JSR forth_core_create.does_hook (more on this in a bit). It then exits, keeping the compiler “on” so the rest of the code that follows DOES> gets compiled into the word (SHAPE in this case). This is all DOES> does (man, that sounds weird) at time 1. At the end, SHAPE looks like:

shape		fdb	dot_row	; link to next word
		fdb	.xt - .name
.name		fcc	'shape'
.xt		fdb	forth_core_colon.runtime
		fdb	forth_core_create.xt
		fdb	forth_core_literal.runtime_xt
		fdb	8
		fdb	forth_core_literal.runtime_xt
		fdb	0
		fdb	forth_core_do.runtime_xt
.L1		fdb	forth_core_literal.runtime_xt
		fdb	128
		fdb	forth_core_and.xt
		fdb	forth_core_if.runtime_xt
		fdb	.L2
		fdb	dot_row.xt
		fdb	forth_core_ext_again.runtime_xt
		fdb	.L3
.L2		fdb	forth_core_space.xt
.L3		fdb	forth_core_two_star.xt
		fdb	forth_core_loop.runtime_xt
		fdb	.L1
		fdb	forth_core_drop.xt
		fdb	forth_core_does.runtime_xt

.does		jsr	forth_core_create.does_hook	; !!!

		fdb	forth_core_dupe.xt
		fdb	forth_core_literal.runtime_xt
		fdb	7
		fdb	forth_core_plus.xt
		fdb	forth_core_do.runtime_xt
.L4		fdb	forth_core_i.xt
		fdb	forth_core_c_fetch.xt
		fdb	dot_row.xt
		fdb	forth_core_literal.runtime_xt
		fdb	-1
		fdb	forth_core_ext_plus_loop.runtime_xt
		fdb	.L4
		fdb	forth_core_c_r.xt
		fdb	forth_core_exit.xt

Now we execute SHAPE. Things go along until we get to forth_core_does.runtime_xt. At this point, the Y register is pointing to the JSR forth_core_create.does_hook (see the previous installment for why this is—but to recap: the Y register is the Forth IP). We get the xt of the newly created word (and yes, I had to modify CREATE to stash this for later use) to replace the default xt. At this point, MAN now looks like:

man		fdb	shape	; link to next word
		fdb	.xt - .name
.name		fcc	'man'
.xt		fdb	shape.does
.body		fcb	$24
		fcb	$24
		fcb	$24
		fcb	$99
		fcb	$5A
		fcb	$3C
		fcb	$18
		fcb	$18

Then the DOES> runtime basically does a Forth return, ending the execution of SHAPE. Thus ends the steps that happen at time 2.

When MAN executes, it executes JSR forth_core_create.does_hook. This is a small extension to forth_core_create that does the double duty of pushing the address of the body onto the stack, and setting things up to run the Forth code compiled just after that instruction:

forth_core_create
		fdb	forth_core_c_r
		fdb	.xt - .name
.name		fcc	"CREATE"
.xt		fdb	.body
.body		...		; not important right now

.does_hook	puls	d	; pull return address of the stack
		pshs	y	; push Forth IP onto return stack
		tfr	d,y	; point to DOES> code
.runtime	leax	2,x	; get body from xt
		pshu	x	; push into the stack
		ldx	,y++	; NEXT
		jmp	[,x]

The forth_core_create.does_hook pulls the return address (from the JSR instruction) from the stack—this contains the Forth code after DOES> that needs to run. We then push the existing Y register onto the stack, then set Y to the Forth code to execute. This leads right into forth_core_create.runtime, which pushes the body of the word (in this case, MAN) onto the stack, and then jumps into the code following the DOES>.

And at the end of all this, you get:

MAN
   **   
   **   
  ****  
 * ** * 
*  **  *
  *  *  
  *  *  
  *  *
 OK

I suspect the reason why JonesForth didn't implement DOES> has to do with the direct subroutine call in the middle of a Forth word. This only works if memory is both writable and exectuable, and modern systems tend to disallow that. There might be a way around this, but I haven't yet bothered to figure it out. I'm just happy to have figured it out as it is.

Discussions about this entry

DOES> RECURSE doesn't DOES> RECURSE does't DOES> RECURSE …

Recursion in Forth isn't as straitforward as you would think. The obvious:

: FOO	... FOO .. ;

doesn't work. It will either error out as FOO isn't found, or it will call the previous definition of FOO if it exists. This is a quirk of Forth, and it one reason why globals aren't as much of an issue as they are in other languages—if you define the word REPEAT it won't break existing code that called REPEAT, they will just keep using the old version of REPEAT while new words will use the new version. In fact, the ANS Forth standard says as much: “The current definition shall not be findable in the dictionary until [colon] is ended.” Thus the reason for the word RECURSE, an immedate word (which is run durring compilation, not compiled) to exist in Forth—to do recursion.

This was an easy word to implement:

forth_core_recurse		; ( -- )
		fdb	forth_core_r_fetch
		fdb	_IMMED | _NOINTERP :: .xt - .name
.name		fcc	"RECURSE"
.xt		fdb	.body
.body		ldx	forth__here		; get current comp location
		ldd	forth__create_xt	; get xt of current word
		std	,x++			; recurse
		stx	forth__here
		ldx	,y++			; NEXT
		jmp	[,x]

So the above would be written as:

: FOO ... RECURSE ... ;

And the resulting code would look like:

foo		fdb	...
		fdb	.xt - .name
.name		fcc	"FOO"
.xt		fdb	forth_core_colon.runtime
.body		fdb	dot_dot_dot.xt
		fdb	foo.xt		; FOO's xt
		fdb	dot_dot_dot.xt
		fdb	forth_core_exit.xt

The only reason I'm mentioning this word is because of this bit from the Standard: “An ambiguous condition exists if RECURSE appears in a definition after DOES>.” There's a reason for that—depending upon the implementation, it may be impossible to do recursion after DOES>. Why?

In my Forth implementation, the code following DOES> doesn't have an xt to reference. The xt of any word is the address of the .xt field. So using the example from my explaination of DOES>, the xt of MAN would be of its .xt field:

man		fdb	shape	; link to next word
		fdb	.xt - .name
.name		fcc	'man'
.xt		fdb	shape.does ; the XT of this word is this address
.body		fcb	$24
		fcb	$24
		fcb	$24
		fcb	$99
		fcb	$5A
		fcb	$3C
		fcb	$18
		fcb	$18

But the problem is—that address doesn't exist until the word is defined! If, for example, the definition of SHAPE used RECURSE:

: SHAPE CREATE 8 0 DO C, LOOP
  DOES> ... RECURSE ... ;

when RECURSE is executed, there is no xt for it to use. We can't use the xt for SHAPE—that's not the word we want to recurse on. And we can't use the address of shape.does because that's not an actual xt. And the code following DOES> can be shared by multiple words:

... SHAPE MAN
... SHAPE FACE-HUGGER
... SHAPE ALIEN
... SHAPE FLAME-THROWER

so there's no single xt that RECURSE could use when compiling the code after DOES> (never mind the fact that that happens before the words that use the code are created).

So, in my Forth implementation, no RECURSE after DOES>. Which is fine, because it's an ambiguous condition.

Could I make it work? Maybe. But it would be a lot of work for a feature that Forth programmers can't rely upon anyway.

Wednesday, June 11, 2025

More or fewer, many or less

At Chez Boca, Bunny is the prescriptivist in the household, and I the descriptivist. So while “Grammar rules you can stop sticking to” meshed with my biases, Bunny remained unconvinced with a small exception towards not ending a sentence with a preposition.

But the majority of our discussion centered around the use of “fewer” and “less.” The rule Bunny was taught was to use “fewer” for a countable number of items and “less” for uncountable or fungible items. For example, we have fewer cookies around here because we had less flour to make them (I originally ended this sentence with “to work with” but I wanted to avoid ending with a preposition). I always say “less” but I suspect this has less to do with my descriptivism and more to do with programming, where x < 3 is translated to “x is less than three.” It just seems weird to say “x is fewer than three,” despite most numbers on a computer system being countable, if potentially large (the only exceptions would be ±inifinty and NaN).

I also wondered about the opposites of “fewer” and “less.” When I asked Bunny for the opposite for “fewer” she said “more,” and when asked for the opposite of “less” she also said “more.” To her, the word “more” could be applied to either countable items, like “I need more cookies,” and for fungible items, like “I need more flour.” But that struck me as odd—why separate words for “a smaller number or amount of” and not for “a greater number or amount of?” Why does “more” get a pass for both concepts, and not something like “many” for countable items, and “more” for fungible items? Why the rule for “less” and “fewer?” I need many cookies, and I need more flour to make them.

After our discussion, I thought about this for a bit. While Robert Baker made this distinction in 1770 (per the video), I have to wonder why he felt the distinction needed to be made, applying “fewer” to numbers rather than “less.”

At first, I thought it may have something to do with the Norman conquest of England. As my 1924 copy of Roget's Treasury Of Words says: “[i]t is interesting to note that the French names for different kinds of food became restricted to the cooked meats; while the English names were reserved for the living animals.” It also noted the act of word doubling—using both the Norman-French and Saxxon terms, such as humble and lowly, poor and needy, act and deed, aid and abet, use and wont, will and testament, and assault and battery.

Could this be a reason for the distinction between “fewer” and “less?”

It's not due to the Norman invasion that's for sure.

While looking through my copy of the Oxford English Dictionary, I found the word “less” is an Old English word from Northumbria, having been a word in both Old Frisian and Old Teutonic. The usage meaning “smaller quantity” didn't first appear until 1314. And as Oxford states, the opposite is “more.”

The word “few” is also an Old English word, also in Old Frisian and Old Teutonic but importantly, not from Northumbria! It's meaning of “smaller quantity” or “a small number” is documented from around 900, and it's “antithesis” (as Oxford calls it) is “many!”

How about that?

But I'm now of the opinion that Robert Baker wanted to signal he wasn't part of the hoi polloi and came up with a pointless distinction. Bunny remains unconvinced of my theory.

Thursday, July 31, 2025

Surrealism is where you find it

For reasons, we had a package that needed to be sent via UPS. All I had to do was drop it off at the local UPS store around the corner from Chez Boca.

I arrive. Adorning the windows of the establishment in the strip mall were Federal Express banners; nothing at all about the store said “UPS” to me. I double checked Google Maps and it said this was a UPS location. I entered the store.

It was small. The counter was cluttered with tons of paper, an obvious scale and a computer terminal. Buried in the clutter was a desk bell. I tapped the top, it rang out with its distinctive bell sound.

No one answered.

Off to my left was an open doorway, through which I could see a fancy high-backed chair, the type of which one might find at Versailles. I walked through the doorway to find a section of the store under heavy construction. I have no idea why a fancy French-style chair was in the room though. Did the foreman of the construction crew use it to sit there and bark orders when they were there?

I walked back to the cluttered desk, rang the desk bell a few times and called out “Hello?”

Still no answer.

Puzzled, I maybe thought the employees were all in the back of the store and couldn't hear me or the bell. There was a door behind the desk that was partially open, so I walked over to it and took a peek through the door way, still calling out “Hello?”

Through the partially open door I saw a large stuffed chair which for some reason registered as a dentists' chair even though it wasn't one, was just sitting there in the middle of a hallway, facing the the wall. Beyond it was a T-intersection with another hallway. I wonder what's with the chairs? I asked myself.

Still no answer, either for the rhetorical question about the chair situation, nor from any employee at the supposed UPS store.

I called out “Hello?” yet again.

“Hello!” said a voice.

It took me a few seconds to realize the voice was coming from behind me, and when I tured, I saw an employee walking in through the front door. “I'm sorry, I was out checking in a U-Haul truck,” she said. We passed each other, she walking behind the cluttered desk, and me waling from behind the cluttered desk.

“Do you handle UPS here?” I asked.

“Yes,” she said.

“Here you go,” I said, handing her the package.

She took it, slapped it on the scale and scanned the packing label on it. “That's funny,” she said, “it's saying it's already been dropped off.”

“That's a good trick,” I said. “It was originally dropped off at our house, and now we need to send it back.”

She kept working at the computer terminal, puzzled at first, then a look of clarity crossed her face. “Ah, normally this would be picked up by a UPS driver. It's all good now.” And with that, she handed me a receipt.

In all my excitement over handing off the package, I forgot to ask about the chairs. Darn it all.

Friday, August 01, 2025

Keep calm and carry on

I haven't been writing nearly as much as I would like due to my own, well, not exactly fear, I'm not sure what the best word for this is, because of LLMs being shoved down the collective throats of everybody. It's as if the rest of the world has decided to double down on crazy pills and if you aren't also taking crazy pills, you yourself are crazy and should be shunned from society.

Or something to that effect.

It just seemed so pointless to keep writing about my ANS Forth implementation when Forth itself is a rather niche language that has less “training material” than Python, Go or Rust, running for a criminally underrated 8-bit CPU that probably has less “training data” than Forth, for what? A world that has decided that expertice is an outdated concept that should be handed off to a glorified random number generator? That “time to market” has to be minimized to such a degree that programmers should use every short cut they can, which includes LLMs, is now The Right Thing™ to do? [Never mind that being “first to market” hasn't stopped companies like Microsoft, which to my recollection, has never been first to market with anything other than a commercial version of BASIC back in 1975, or Apple, which wasn't the first to market a home computer (with or without a GUI), or Google, which wasn't even in the first two dozen of web search engines, or ~~Facebook~~ Meta, which wasn't the first social website, from becoming some of the largest companies in the world. “Time to market” my XXX!]

I also find it worrying that of all the development tools created to “help” programmers with their jobs, it's LLMs that, again in my experience, has been the only one that has been mandated from the C-suite that everybody must use it! I've never found IDEs to be useful myself, yet I've never had an employer demand I use one. So why do LLMs get shoved down our throats? I just don't understand it. With IDEs, individual developers, or maybe even a team, can decide that the use of an IDE with worth the investment and I have no problem with that. And I feel the same should be for LLMs—those developers who feel an LLM is worth using should be able to use them. But being pushed by the C-suite? When the C-suite probably has no idea how programming works? It must be some herd mentality pushed by hype.

So yes, I haven't been inclined to write much about programming because who XXXXXXX cares when LLMs will do it all for us? But then I have to forcibly remind myself that the Orange Site isn't indicative of the industry as a whole and I should just keep calm and carry on. And follow through on my own advice.

Discussions about this entry

Saturday, August 02, 2025

Two more bits of surrealism while out and about in Boca Raton

I was out driving to grocery store when I saw an old man walking on the street (our neighborhood is sans sidewalks—typical for Lower Sheol), white hair, white moustache, white beard, set off by the entirely black suit he's wearing, along with a heavy black outer coat. Mind you, it's about 90°F outside (30°C for those of you who are not living in one of the three countries in the world to still use the Imperial System). That's not the oddest thing though. No, he's carrying what looks like a small wheel covered completely in fur.

It turns out it's a type of hat the man wears on certain religeous holidays. I wasn't sure what religeous holiday it was, but whatever it was, it apparently requires a hat that looks like a furry wheel while wearing all black.

On a larger road, I fall behind a man riding a motorcycle. The man in question is not wearing a shirt, which, okay, I wouldn't go shirtless on a motorcycle, but hey, it's his choice for a potentially bad case of road rash (to be fair though—he was wearing a helmet, so he wasn't completely blasé about it). Again, that wasn't the odd thing. No, the odd bit was the series of large dots covering just the right side of his back. There were a bit too uniform in shape and placement to be some form of skin condition, yet they didn't look like tatoos either. It almost looked as if he had the right side of his back cupped.

Monday, August 04, 2025

When vibe coding, isn't the source code the prompt?

I've been thinking about “vibe coding” (probably overthinking) and how that might effect development in odd ways. And by “vibe coding” I mean in its original meaning, “where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.” Since February, I've come across several projects, some commercial, that have been “vibe coded” in such a manner. And I found myself asking myself, what's the source code in this case? What file should be checked into source control? And my answer was “the prompts, of course.”

I have a project where I need to parse HTML, and I wrote some PEG code to do it. But the code I checked into source control wasn't the resulting C sludge that came out of the tool, but the PEG code itself—that is the source of … um … the source, as it were. If I want to change the parser, I don't change the C code, I change the PEG code and regenerate the C code. I don't necessarily care about the C code output, much like I no longer necessarily care about the assembly output from the C compiler. And there are many DSLs out there that “compile” into some other code like C or Rust, and in those cases, it's the file that contains the DSL that is checked into source control, not the resulting output of the tool.

So how is that any different from “vibe coding,” where “you fully give in to the vibes … and forget that the code even exists?” It's not the output that you necessarily care about, but the input. So, when “vibe coding,” the source code is the prompt or prompts. And it's that source code, the prompts, that you should therefore check into source control.

I can hear the arguments, even from the pro-AI side, that this is a silly concept to even contemplate and you should check in the resulting output into source control. But, isn't a selling point that AI will improve to the point where programming will change fundamentally? That in the future, all you have to do is prompt the computer to “write a content management system for a website where updates can be made via email” and have the AI do the work? In such a scenario, it's the prompt that matters, not the resulting Rust/Go/Javescript/Python sludge that comes out. Another arguement against this would be that it would play hell with reproducable builds, but again, that can be solved by AI, right? That's the end game for this, right? To have AI write the code for us?

Full disclaimer: I find this horrifying and hope it doesn't come to pass, but I feel this is the logical outcome from “vibe coding.”

Discussions about this entry

Tuesday, August 05, 2025

A bit of a deep dive into the Feedly bot

This Lobsters thread got me looking at Feedly again. Last month, the Feedly bot made the following requests to the Atom feed for my site:

Feedly Agents for July 2025
Resource	Response	Agent	Requests
`/index.atom`	200	Feedly/1.0	2058
`//index.atom`	200	Feedly/1.0 (poller)	86
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )	19
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )	19
`//index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )	19
`/index.atom`	200	Feedly/1.0 (poller; 37 subscribers; )	11
`/index.atom`	200	Feedly/1.0 (+https://feedly.com/poller.html; 37 subscribers; )	1
`/index.atom`	200	Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )	1

One suggestion for one request for /index.atom having 37 subscribers, and another for /index.atom having 16 subscribers was one was originally for http: and the other for https:. That's a decent explanation, given we have 8 subscribers for //index.atom, telling me that Feedly is treating http://boston.conman.org/index.atom, https://boston.conman.org/index.atom and https://boston.conman.org//index.atom as entirely separate feeds, even though I now redirect http: to https:. But even though I do redirect http: to https:, it's with a temporary redirect, not a permanent one (because I'm still wary about making the redirect permanent) so that one is totally on me; the //index.atom is obviously a typo so that one is totally on Feedly.

I still can't tell the difference between the fetcher and the poller. Even the pages describing the two are identical, except one says “Fetcher” and the other says “Poller.” That's just really weird. And what's with the plain “Feedly/1.0” bot?

The 200 response means that Feedly did not do a conditional fetch of the feed (Feedly can ask “Did the file change since I last requested it?” and my server can reply with either “Yes, here it is” with a 200 response, or “No, it did not” with a 304 response). I did go back before I {^2022/12/04.1 switch to https:) and there, (from November of 2022) I get a completely different Feedly:

Feedly Agents for November 2022
Resource	Respose	Agent	Requests
`/index.atom`	304	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 27 subscribers; like FeedFetcher-Google)	2656
`/index.atom`	304	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 28 subscribers; like FeedFetcher-Google)	477
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 27 subscribers; like FeedFetcher-Google)	40
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 28 subscribers; like FeedFetcher-Google)	8

Aside from picking up one more subscription from Feedly, it's what I would expect—most requests are conditional with none of that “fetcher/poller” stuff. I can explain the current lack of conditional requests on the http: to https: redirect throwing off the request code, since that seems to hold true starting with December of 2022. But in June of 2023, when I get my first https: subscriber, there is no conditional requests:

Feedly Agents for June 2023
Resource	Response	Agent	Requests
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 32 subscribers; )	227
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 1 subscribers; )	136
`/index.atom`	200	Feedly/1.0 (+http://www.feedly.com/fetcher.html; 33 subscribers; )	74

So maybe their https: request code fails to do conditional requests? Odd, but it does explain why there are no conditional requests. It also appears that they tuned their polling down from November of 2022 to June 2023. Seems like a few easy bugs to fix to me.

But then in September of 2023, the number of requests for the https: version shoots up over 4,000, almost 6,000 in October 2023, and in November of 2023 the “poller” first shows up, only to go away in December of 2023, only to show up again in March 2024 and stick around from there. So it's clear to me that the backend at Feedly changed, and from my point of view, not for the better.

Tuesday, August 12, 2025

I'm afraid Benjamin just won't be reclaiming his property any time soon

I think I have to resign myself to perpetual confusion over people not knowing their own email addresses (and I could have picked easily a dozen other posts about this). This case, it's not some other Sean Conner not knowing their own email address, or a person attempting to mail some other Sean Conner not knowing the proper email address, but a non-Sean Conner not knowing their own email address. This time it's a Benjamin who is attempting to reclaim some property while signing up for rent assistance who also has an Electronic Arts gaming account.

I suppose it could be some elaborate troll but that's quite a bit of work to set up. Maybe some sort of tactic to get me to click on some malware laden link? Whatever it is, it's just bizarre.

Wednesday, August 13, 2025

A first world problem that is partially my own fault, but I must also lay some blame to the Monopolistic Phone Company who took untold billions from the government for infrastructure upgrades, failed to do so while at the same time driving out other DSL providers

Nearly a year ago I received the first letter from my ISP about it ditching DSL service and replacing it with “wireless service.” To me, “wireless” is bad enough but to further state that it was easy to set up from a “phone app” was just XXXXXXX icing on the cake. What if I didn't have a “phone” to put such a “phone app” on? So I kept putting the letters into the round file.

Then a little over three months ago my router just spontaneously reset to it's factory settings and I couldn't get the DSL back up. I called the ISP and trying to get up and running again, I relented to get the “wireless service” only to realize after I hung up that I had use the wrong password. I was then back on DSL. Several days later the “wireless service device” showed up at our doorstep, but I knew that just be looking at it funny, the DSL would be immediately turned off and I would be forced to use it. So I put it into a corner of the office and pretended to forget about it.

Then in late July, Bunny received two texts from the ISP. The first one said that we best return the unit within 90 days of receiving it or face additional charges. Six hours later, the ISP sent a text saying they haven't received the unit, and we were facing additional charges.

And then today, we're offline.

The DSL has solid green lights. It's just that I can't get authenticated to get onto the Internet. I then saw the “wireless service unit” in the corner of the office and decided to bite the perverbial bullet. I installed the “phone app” on Bunny's phone (since I don't have a “phone” capable of running a “phone app”) and tried to activate the unit.

I could not get the “wireless service unit” activated.

It took about two hours on the phone, but the upshot is—my ISP, the Monopolistic Phone Company that took billions in “handouts” from the government to improve their infrastructure (you know, fiber everywhere? Except here in Boca Raton, and for large portions of the US in general) and also used their position to force third party DSL providers out of business, has finally decided to call it quits on DSL, and because I did not set up the “wireless service unit” within 90 days of receiving it, it was “cancelled.” So I have to return that unit, and get a new “wireless service unit” sent to me. Oh, and if I can't run the “phone app” I don't have to worry, all I have to do is call them up and have them configure things on my behalf.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX!

Until it arrives (anytime between Friday and Monday), we are without DSL here at Chez Boca.

I also dread installing the “wireless service unit” because it's probably been designed assuming everything I use here is wireless (it's not) and that it's going to control the routing (I already have a network set up XXXX you very much). I mean, it may be possible to keep my existing network up and running, but at this point, I'm too cynical to even keep my hopes up.

Thursday, August 14, 2025

“I've got a bad feeling about this”

While on the phone with my ISP, I was told I could just take the now cancelled “wireless service unit” to the closest UPS store, give them the account number and they would take care of returning it. No charge.

At least there's that.

So I sent to my local UPS store and unlike last time the clerk was behind the desk, browsing the Intarwebs on her smart phone. “I'm here to return the Monopolistic Phone Company ‘wireless service unit,”,” I said, dropping the box on the counter.

“I'm sorry,” the clerk said, “we don't handle these at this location. You want the UPS store across the street.”

“Excuse me?”

“Yes, you can even just make it out from here, over there,” she said as she pointed. “We get a steady stream of people in here returning them. I even returned mine.”

“You did?”

“Yup, it sucked!”

“Oh.”

“Yup,” she said, “I much prefer using my phone.”

“Oh. Thank you.” I then left the store and navigated my way across the street.

This does not bode well.

I'm at the UPS store across the street. It's larger and not as cluttered as the previous UPS store. it's even closer to Chez Boca than the previous store. How did I not know about this one? Anyway, “I'm here to return the Monopolistic Phone Company ‘wireless service unit’,” I said, dropping the box on the counter.

“Okay,” said the clerk. She picked up the box, scanned it a few times and that was that.

“Done?” I asked.

“Done.”

“Do you get may people returning these?”

“All the time,” she said.

This really doesn't bode well.

Friday, August 15, 2025

Notes on an overheard conversation while at dinner

“No, I'm the one that goes ‘woo-hoo’ when you push my stomache. Woo-hoo!”

“We have a weird and wonderful relationship.”

“I'm weird, and you're wonderful?”

“Yup.”

Tuesday, August 19, 2025

It was as bad as I feared, but not in the way I was expecting

My friend Mark wrote me, asking why I was not with Comcast. I answered it was more the devil I know than the devil I don't know. Intertia is hard to overcome at times.

The “wireless service unit” finally showed up at Chez Boca yesterday. I had the “phone app” to configure the device installed on Bunny's phone and I think that was my first major mistake. I should have ignored the marketing crap about the “easy to use phone app” and configured it manually via the web interface that most home routers come with these days.

But alas, I did not do that. I opened the box, took the unit out, and started following the steps on the “phone app.” I got as far as the sign in page.

The “wireless service unit” is right next to me. Why do I need to “sign in” to The Monopolistic Phone Company to configure a unit right next to me? I tried creating a sign in account, but didn't get very far. Since the service is in Bunny's name, I tried using her email address XXXXX@XXXXXXXX.conman.org but the Monopolistic Phone Company apparently doesn't like a third level .org domain. It accepted sean@conman.org though.

But even that wasn't enough to get signed in.

So I did the second worse mistake of the day—I called tech support. I spent the first half hour getting to the right technician only to have the phone call dropped. The second call was an hour and a half and consisted of me repeating the following over and over again:

“No, I do not read email on my phone.”

“I can't check email on my phone since I never set it up.”

“No, as I have stated, I cannot check my email on my phone.”

“No, I cannot check my email since I don't have Internet access! That's what I'm trying to set up!”

“What do you mean I have to use my email password to sign in? That's for email, not this site!”

“Yes, I know my email address is my ID, but that doesn't mean your system will accept my email password.”

“What do you mean check my email on the web? I don't use the web to check my email.”

“Okay, I've got my laptop hooked up to the ‘wireless service unit” and I still haven't received your email yet.”

“Yes, I can receive email. I've received several pieces of emails in the time you've tried sending me email.”

“No, I cannot check my email on my phone.”

“No, I don't read my email on the web.”

“Gmail? You want me to log into Gmail? Look, I know that 95% of people use Gmail, but I'm in that 5% that don't use Gmail.”

“Okay, I'll click on the link you sent to my phone for a video call. Okay, see? There's my email client—no email from you.”

“You want me to log into Gmail on the web? Did you not hear me? I do not have a Gmail account.”

“Okay, see where it says ‘Sign up for Gmail”? Oh, now you believe me?”

“Okay, can you give me the number for Comcast?”

Did I mention that was an hour and a half long conversation?

It was clear during the conversation that the “wireless service unit” was preconfigured with default settings, enought for my laptop (and Bunny's laptop) to get onto the Internet. After hanging up, I did what I should have done and went to https://192.168.1.254 (as printed on the back of the unit) and configure it via its web interface (I'm now convinced that the “phone app” is nothing more than a web browser that loads the web app from the “wireless service unit”).

It was at this point that I have to move three book shelves to get to the mess of existing wires supporting the now dead DSL link. My existing wireless router decided to have a fit and for a good half hour I thought it was dead (no, it just took an exceptionally long time to reboot—yes, I was having a grand ol' time here).

Over the course of a few hours, I was able to get the “wireless service unit” configured to get everything back online. I had to dig for the advanced Wi-Fi settings to get it to send different SSID for the 2.4GHz and 5GHz radio bands (something my old wireless router did automatically). I wish it came with more than two Ethernet ports, but I'm really only using one port anyway (it's the principle). It may be faster than the DSL, but I can't really tell since I'm not downloading huge files nor a hardcore gamer. ~~The only things I don't like is the lack of multicast support (which I did use but can't now)~~ (there is multicast support, somewhat) and I think it drops inactive NAT TCP connections a bit too quickly for my liking (still testing that).

The only positive thing is that the “wireless service unit” supports IPv6 and not only did it assign a (I hope) static IPv6 address, but it's a /64, so now my main computers all have a public IP address, something I haven't had in 25 years.

I'll see how today goes, but I'll double check the phone number for Comcast.

Wednesday, August 20, 2025

Notes on an overheard conversation while at lunch with friends

“Could I please have a menu?”

“Wait! What? A menu?”

“Yes.”

“Really?”

“Yes.”

“Oooookay … here you go.”

“Thank you.”

“Did Hell freeze over or something?”

“I just feel like having something different today.”

“…”

“I'll have the pork sandwich, with cornbread.”

“I … I … I don't know how to act. Okay, pork … cornbread … ”

“Thank you.”

“A. just said you were ordering something different? Is that true?”

“Yes, I ordered something different today.”

“Really?”

“Yes.”

“…”

“I just felt like having something different.”

“I … I don't know how to act.”

“That seems to be happening a lot here today.”

More issues with the Monopolistic Phone Company's “wireless service unit”

Day three of the “wireless service unit” and it's becoming clear why so many people are returning it. Bunny's laptop seems to drop consistently from the network, and I can't figure out why it's happening. She needs to turn off Wi-Fi, then turn it back on again, and she can use the Internet for some random period of time.

There was a spontaneous reboot of the “wireless service unit” last night. The thing takes an unusually long time to boot up. We're talking a solid five minutes or so.

The signal strength of the internet connection fluctuates wildly throughout the day.

And the only positive aspect I found, the IPv6 assignment, isn't what I thought it was. It, too, is reassigned at random like the IPv4 address. There is no need for that to happen! The IPv6 address space is large enough to almost assign every grain of sand on Earth its own IPv6 /64 network address space! There is no reason to reassign IPv6 addresses at all!

Sheesh.

Thursday, August 21, 2025

“Bro, ban me at the IP level if you don't like me!”

More and more I think I'm coming around to Alex Schroeder's Butlerian Jihad. For reasons, I'm looking into web activity and so far, the top webbot this month is one identifying itself as “Thinkbot,” which may be related to this AI company but I can't be sure. Here's how it identifies itself: “Mozilla/5.0 (compatible; Thinkbot/0.5.8; +In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)”.

Seriously, that's it. No URL to read up on it. It doesn't look at the robots.txt file. Just “bro, ban me at the IP level if you don't like me!”

Yeah, block its IP address. You mean the 74 unique addresses it used this month alone? Checking each IP address for the ASN it's from shows the 74 address coming from 41 (41!) network blocks!

A further check showed that all the network blocks are owned by one organization—Tencent. I'm seriously thinking that the CCP encourage this with maybe the hope of externalizing the cost of the Great Firewall to the rest of the world. If China scrapes content, that's fine as far as the CCP goes; If it's blocked, that's fine by the CCP too (I say, as I adjust my tin foil hat).

In any case, I added the following network blocks to my “badbots firewall rule set:”

43.130.0.0/18
43.130.64.0/18
43.130.128.0/19
43.130.160.0/19
43.131.0.0/18
43.132.192.0/18
43.133.64.0/19
43.134.128.0/18
43.135.0.0/18
43.135.64.0/18
43.135.192.0/19
43.153.0.0/18
43.153.192.0/18
43.154.64.0/18
43.154.128.0/18
43.154.192.0/18
43.155.0.0/18
43.155.128.0/18
43.156.192.0/18
43.157.0.0/18
43.157.64.0/18
43.157.128.0/18
43.159.128.0/19
43.163.64.0/18
43.164.192.0/18
43.165.128.0/18
43.166.128.0/18
43.166.224.0/19
49.51.132.0/23
49.51.140.0/23
49.51.166.0/23
101.32.0.0/20
101.32.48.0/20
101.33.64.0/19
119.28.64.0/19
119.28.128.0/20
129.226.160.0/19
150.109.32.0/19
150.109.96.0/19
170.106.32.0/19
170.106.176.0/20

The above list probably doesn't exhaustively enummerate Tencent's network block ownership, but it's a start. The above covers 476,590 unique IP addresses (excluding the base network and broadcast address for each network block). I think it's bad that I had to do this, but with the current landscape of the Internet, it seems inevitable. We can't have nice things it seems.

Discussions about this entry

Commenting runtime state changes

As I was banning Thinkbot, I saw the previous entries in the “badbots firewall rule set”. The first one was banning a particularly bad Gemini bot that would make an invalid empty request only to immediately follow up with a valid request, for every request it made! That was the first bot I actually banned, and it was very recent ban too—June 19^th.

But it was the second entry on the list that puzzled me:

Chain badbot (1 references)
    pkts      bytes target     prot opt in     out     source               destination         
       0        0 DROP       tcp  --  *      *       77.25.18.172         0.0.0.0/0           tcp dpt:1965 
     138     8280 DROP       all  --  *      *       185.177.72.0/24      0.0.0.0/0

(the count of 0 for the first rule—I had to reboot my server recently for reasons I'm still trying to resolve). I will have to go through the log archives to see why I banned the 185.177.72.0/24 network, and that reminded me of an idea I had years ago but never did anything about it.

Twenty-eight years ago (sigh) I wrote the greylist daemon (source code, and for the record, I'm still using it). It tracks a tuple of sending host, from address, to address and the default is to just greylist (that is, artifically delay) a tuple never seen before. But you can override the default behavior for the hosts, from address and to address. So for instance, I can reject hosts:

gld-mcp>iplist reject 206.214.64.0/19

But now, years later, why did I ban that network? I mean, I did set it at some point:

gld-mcp>show iplist
       106 GREYLIST         0.0.0.0         0.0.0.0
         0   ACCEPT       64.12.0.0     255.255.0.0
         0   ACCEPT    64.233.160.0   255.255.224.0
         0   ACCEPT     66.94.224.0   255.255.224.0
         0   ACCEPT      66.102.0.0   255.255.240.0
         0   ACCEPT    66.163.160.0   255.255.224.0
         0   ACCEPT     66.218.64.0   255.255.224.0
         0   ACCEPT  66.220.144.128 255.255.255.128
         0   ACCEPT     66.249.80.0   255.255.240.0
         0   ACCEPT     66.249.64.0   255.255.224.0
         0   ACCEPT    66.252.224.0   255.255.252.0
         0   ACCEPT     69.63.176.0   255.255.240.0
         0   ACCEPT     69.147.64.0   255.255.192.0
         0   ACCEPT      70.34.16.0   255.255.240.0
         0   ACCEPT     72.14.192.0   255.255.192.0
         0   ACCEPT      74.125.0.0     255.255.0.0
         0   ACCEPT       127.0.0.1 255.255.255.255
         0   ACCEPT    140.211.11.3 255.255.255.255
         0   ACCEPT     149.174.0.0     255.255.0.0
         0   REJECT     172.128.0.0     255.128.0.0
         0   ACCEPT     192.168.0.0     255.255.0.0
         0   ACCEPT   204.127.217.0   255.255.255.0
         0   ACCEPT     204.127.0.0     255.255.0.0
         0   ACCEPT    205.152.58.0   255.255.254.0
         0   ACCEPT   205.188.156.0   255.255.254.0
         0   ACCEPT     205.188.0.0     255.255.0.0
         0   REJECT    206.214.64.0   255.255.224.0
         0   ACCEPT    207.115.11.0 255.255.255.192
         0   ACCEPT     207.115.0.0   255.255.192.0
         0   ACCEPT   207.171.188.0   255.255.255.0
         9   ACCEPT    209.85.128.0   255.255.128.0
         0   ACCEPT    209.131.32.0   255.255.224.0
         0   ACCEPT     216.39.48.0   255.255.240.0
         0   ACCEPT    216.239.32.0   255.255.224.0

but there's no indication of when, or why. A fews years of use, and I wish I had added a way to comment such entries. For instance, I blocked 172.128.0.0/16 at some point, but since then, the block is now owned by Microsoft in the United Kingdom. I think I can remove that block now (maybe?).

And I think that iptables (and related commands, I think the preferred firewall interface for Linux is now nftables? Good lord, the churn in this industry is insane) having a way to add comments might be nice, like:

# iptables -A badbots --comment "Thinkbot daring me to ban it 2025-08-21" -s 43.131.0.0/18 -j DROP

I don't know, it's just a random idea I have.

Friday, August 22, 2025

I did not get the memo, because if I did get the memo, it went straight into the spam bucket

My biggest spam folder by far is the one for emails addressed to my registrar email address. It gets spammed because I've been using it for years, back at the time when WHOIS information was public and you had to pay extra to hide it. So any emails sent to my registrar email address not from my registrar got filtered into a specific spam folder (and yes, I think I might be the only person to filter spam into different folders). Email from my actual registrar is filtered into another folder, where I get notices about upcoming domains expiring.

Today I just happened to notice in the registrar spam folder that Network Solutions was sending me emails about an expired domain. That's weird, I thought. I haven't used Network Solutions in decades. I first used Network Solutions back in the late 90s, but I pretty quickly switched to Dotster which I found to be decent enough, and when they finally stopped the upsells for services I don't want or need, they got bought out by Web.com several years ago. Web.com was … meh. Very slow, not many upsells, but damned if I could change my payment method throught their website (I had to update my credit card info with a new expiration date and security code and it was impossible to do so via the web; even tech support had issues with changing it so I had to switch to a new card—literally, how hard is it to give you money, Web.com?). So Network Solutions telling me I have an expired domain just seemed weird.

But I decided to check with Web.com anyway, and … WHAT THE XXXX?

It's now Network Solutions?

In the three months since I renewed a domain, Web.com got bought out by Network Solutions?

No wonder I didn't get the memo, it went straight to spam.

Many upsells, and (because I did not get the memo in time) an additional “reactivation fee” for the now-expired domain. And as icing on the perverbial payment cake, I couldn't change my credit card info.

Sigh.

Monday, August 25, 2025

A neat idea, but I can see this leading to the Balkanization of the Internet

I did not expect my recent post about blocking a Chinese web bot to generate such a large discussion at Hacker News and to a lesser degree, Lobsters (for the record, I did not submit my post to either site). Reading over the comments at both sites, I think my favorite comment so far is:

"I'm seriously thinking that the CCP encourage this with maybe the hope of externalizing the cost of the Great Firewall to the rest of the world. If China scrapes content, that's fine as far as the CCP goes; If it's blocked, that's fine by the CCP too (I say, as I adjust my tin foil hat)."

Then turn the tables on them and make the Great Firewall do your job! Just choose a random snippet about illegal Chinese occupation of Tibet or human rights abuses of Uyghur people each time you generate a page and insert it as a breaker between paragraphs. This should get you blocked in no time :)

Hacker News comment on “Bro, ban me at the IP level if you don't like me”

There's also the 1989 Tiananmem Square massacre, Falun Gong persecution and last but not least, that Xi Jinping looks like Winnie-the-Pooh. I was considering the idea to include such information in each post, but then in a follow-up comment to that one:

I just tried this, i took some strings about Falun Gong and the Tianmen thing from the chinese wikipedia and put them into my SSH server banner. The connection attempts from the Tencent AS ceased completely, but now they come from Russia, Lithuania and Iran instead.

Other Hacker News comment on “Bro, ban me at the IP level if you don't like me”

Russia and Iran don't surprise me, but Lithuania? Weird. But ultimately, yes, I get China to block me, only for them to outsource their scraping to other countries.

Sigh.

Discussions about this entry

Wednesday, August 27, 2025

My post about banning a Chinese web bot? Apparently, that was really about me being shadow banned from Hacker News … which is news to me

Oh my! This is hilarious:

Developer gets shadowbanned by Hacker News and asks for a real IP ban instead

Sean Conner discovered he's been shadowbanned from Hacker News and would rather just be banned at the IP level if they don't want him there. SEAN'S BLOG POST | HACKER NEWS DISCUSSION

Unsupervised Learning NO. 495

I can see how the author might get that from if they only read the headline on Hacker News, but (to his credit) he spelled both my first name and last name correct, so he must have clicked through to my post (where my name shows up at the bottom of the page) but not read the actual post.

Wow!

But given the author writes about AI, perhaps he had his AI write the summary for him. Given the two errata he mentions about his previous newsletter, if he does use an AI, perhaps he needs a word or two with it …

Friday, August 29, 2025

Some more notes on the “wireless service unit”

After much experimentation, I found out that the “wireless service unit” the Monopolistic Phone Company sent us to replace the DSL does in fact support multicasting, although it's a bit more pedantic about it than any other router I've encountered so far. The address I used to use, 239.255.0.1, falls into the “administratively scoped” category of multicast addresses, and I picked it because I wanted a multicast address that was scoped. The “wireless service unit” isn't something I fully control, so it rejected that range of multicast addresses. In fact it appeared that it didn't like any multicast address that could, in theory, be routed.

Of course it exhibited different behavior with different blocks. Most of the blocks it would work for just under five minutes, then fail. I found this out by writing some very simple programs—one to send some data once per second to a multicast address, and one to receive the data. I would run the sender on two computers, and the listeners also on the two computers. Both the listeners would receive data from both senders, and then as they approached five minutes, they would only receive multicast packets from the sender running on the same computer.

It was only when I switched to using the one non-routable multicast address block, 224.0.0.0/24, did things Just Work™. 224.0.0.0/24 is categorized as “local subnetwork” and is not routable.

Sigh.

So now I'm able to use the multicast program I was using before, since it was always local to my home network anyway.

Other notes about the “wireless service unit”—it's reporting pages suck. There's the “event report” page, which dumps data like:

No.	Data/Time	SoureIP	DestinationIP	Proto	Reason
1	2025/08/…	XXXXXXXXXXXXXXX	XXXXXXXXXXXXXXX	TCP	Generic Discards
2	2025/08/…	XXXXXXXXXXXXXXX	XXXXXXXXXXXXXXX	TCP	Generic Discards
3	2025/08/…	XXXXXXXXXXXXXXX	XXXXXXXXXXXXXXX	UDP	Generic Discards
4	2025/08/…	XXXXXXXXXXXXXXX	XXXXXXXXXXXXXXX	TCP	Generic Discards

TCP and UDP traffic is being stopped, but what TCP and UDP traffic? No indication, and there's no way to configure what is logged. Lovely.

The other log report is the list of current NAT sessions. It's more useful as it includes source address, destination address, NAT address, protocol, port numbers, and lifetime, but the table itself is capped to a maximum width, so making the browser window wider doesn't show more columns. Horizontal scrolling for the win? I guess? Sigh.

The port-forwarding feature is wonky. On my old router, I could set incoming packets from the Internet to TCP port 22 to be forwarded to my development machine. On the “wireless service unit,” however, setting that up means all traffic to TCP port 22 gets forwarded to my development machine, even on the local network! I mean … yeah … it works, but it's not public traffic that gets forwarded, all traffic gets forwarded. I can work around that but it's annoying.

The “wireless service unit” has also spontaneously rebooted itself a couple of times. Not enough for a pattern to emerge, but enough to be very annoying. And one time it failed to obtain an IPv6 address (which shouldn't change in my opinion but then again, I don't run the Monopolistic Phone Company) and I had to power cycle it to get IPv6 back.

And I can't shake the feeling that it's doing something to my DNS queries, even though I'm running a local DNS server …

Saturday, August 30, 2025

Notes on an overheard conversation between a prescriptivist and a descriptivist

“I'm going to the grocery story, so if you wake up from your nap and I'm not here, that's where I will be at.”

“No. Don't say that.”

“You don't want me to go to the store?”

“No, don't end your sentances with a preposition.”

“Really? How should I have said that?”

“‘That's where I will be.’ You don't need the ‘at.’”

And the next thing you'll tell me is to stop splitting my infinitives.”

“Pththththththththt.”

Saturday, September 06, 2025

Notes on an overheard conversation of two people opening a package

“Oh, it's a mug from Sunny Farms, in Sequim, Washington.”

“It's pronounced ‘skwim.’”

“Pardon?”

“It's pronounced ‘skwim.’”

“So, you're saying the ‘E’ is silent?”

“Yes.”

“…”

“It's Washington! What can I say?”

Sunday, September 07, 2025

There's a van Gogh joke in here somewhere

Bunny went to the local Michaels Craft store to pick up a project, and she returned with a small gift for me:

[An image of a pencil eraser in the shape of a human ear] “'Ere you go! Earase to your ear's content!”

An eraser (or as it states on the package, a gomme à effacer) in the shape of a human ear. And it also appears not to be for kids between newly born and 12 years old if I'm reading the small graphic icon in the corner of the package correctly. Maybe not for kids less than a year old? I don't know, it just has a small kid face with the text “0–12,” encased in a circle with a slash.

I … don't know what to say, other than “Thank you!” but aside from that … I have nothing.

Thursday, September 11, 2025

Some musings on the Metric system

I just watched this amusing video “Why Didn't America Go Metric? Now I Finally Get It” where Busted Knuckle Woodworks goes into the history of the matric system and why the US doesn't use it. Yes, it goes into the whole “pirates stole the metric system from the US” story, but it also mentions the late 1800s Pyramid Power movement that also put the kibosh on the metric system here in the US (and one I had not heard and sadly, such mystical thinking is still very much in the main stream). And it's interesting that the Imperial System is still in main stream use in the UK.

But one advantage I see for the Imperial System is that it's mostly based on factors of two and three, like two cups per pint, two pints per quart, three teaspoons per tablespoon, and three barlycorns per inch. The downside of that advantage is the sheer number of units available, like drams, furlongs, pennyweights and gills. I don't think it's that bad though.

For instance, the chocolate ice cream recipie I've been using contains the following ingrediants:

1 cup whole milk
1 pint heavy cream (or 2 cups)
1 teaspoon vanilla extract
¾ cup sugar (using 1 cup makes the ice cream too soft in my experience)
¾ cup cocoa powder

In the metric system, you get some really weird values though:

237mL whole milk
474mL heavy cream
5mL vanilla extract
150 grams sugar (using 200 grams makes the ice cream too soft in my experience)
72 grams cocoa powder

I think the amount for cocoa powder is correct, as I found answers from 75‥95g of cocoa powder per cup, and as I like chocolate, I used ¾ the upper value.

I suppose one could get by with:

250mL whole milk
500mL heavy cream
5mL vanilla extract
150 grams sugar
70 grams cocoa powder

for more “round” amounts in metric. I do wonder if such rounding up (or even down) might affect the results though (probably not). Personally, I find the Imperial version easier to remember, but that might be bias on my part.

Wednesday, September 17, 2025

To think that my previous pair of glasses lasted nearly twenty years, and yet my current pair almost two

So there I was, sitting in the Compute Room at Chez Boca cleaning my glasses. I was holding them in my left hand, and I had finished cleaning the left lens and had started cleaning the right lens when …

SNAP—

I now had a literal pair of glasses. The bridge on my glasses had snapped cleanly in half.

Well, gosh darn it, I thought to myself. What I said out loud, though, was “XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX!”

I did my best tape job, and then Bunny drove us (because I wasn't about to drive with a pair of glasses that might fall apart at any second) to the store where I had purchased my glasses.

Of course they don't sell those frames any more. They still sell Flexon frames, just not the ones I have. And because they can't just take the existing lenses and recut them for new glasses (because I now have progressive lenses) it'll take a bit over a week for the new glasses to be ready.

Aaaaaaaaaahhh!

In the meantime, it was suggested I try going to a jewelry store to see if they could temporarily solder the two halves together, because the clear tape clearly wasn't cutting it.

So then Bunny and I went to a nearby jewelry store, and while they didn't have the equipment to do that, they did suggest we head out to International Jewelers Exchange out in West Boca. It's a large, flea-market like jewelry store with multiple merchants inside, some of which do repairs.

West Boca and the International Jewelers Exchange it is!

Asking around, we found Mike, who also does eye glass repairs. When he saw the glasses, he had bad news for us. “That's titanium—my equipment doesn't get hot enough to properly repair it,” he said. “You can try the laser and welding store upstairs, they might have the proper equipment.”

So we trudged upstairs to the laster and welding store and asked. “Sorry, no,” that guy said. “I mean, we can try soldering titanium, but any slight pressure and ‘POP!’ it will fail. Our equipment just isn't hot enough to properly handie it.”

Well, I thought, gosh darn it. This time, I managed not to say “XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX” out loud.

So now I'm back home at Chez Boca. The 3M micropore tape and a small segment of a toothpick for structural support appears to be working, although my glasses are now a bit floppy. But it's holding, and it should be fine until next week.

“XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX!”

Wednesday, September 24, 2025

I wonder how long it will be until my new registrar is bought out

Over a week ago, I saw the registrar that Dan Lyke uses was bought out (at least he got the memo) and it prompted me to look at alternative registrars. After some consideration, I signed up with Porkbun, a registrar out of the Pacific Northwest. They're about half the price of Network Solutions, and they don't upsell on every link, which is nice. And also unlike Web.com and Network Solutions, they make it easy to update credit card information so you can, you know, give them money!

So on the 16^th, I logged into Network Solutions to start a domain transfer. I picked a domain that wasn't that important to me just to test the process out and get a feel for how it works.

First, Network Solutions said it wouild take up to three days to generate a transfer token, but it was more like five days. Then they took yet another four days to give me time to ponder my decision to leave them (no! I want to leave now!). But once the nine days were over, the domain transferred without incident. I just have to be paitent with Network Solutions.

Now it's just a matter of transferring the rest of my domains over.

Now that's a keyboard

Via Lobsters, I came across this incredibly insane Japanese keyboard:

[A closeup of a gargantuan Japanese keyboard] Just this portion of the keyboard represents only 6% of all the keys

The photo itself is from DeskThority, and it's described as an an Alps CP10SJ550A kanji keyboard from Japan. It has 542 keys and weighs around 27 pounds (12kg for those with a sane measurement system). It's an insane keyboard, and … I kind of want one.

I'm not sure what I would use it for. I suppose I could map each ASCII code to its own key, map syntatic constructs for multiple languages to its own key, and still have some left over for … oh … I don't know … simulating a piano.

I don't even know where one would get such a keyboard, but I'm sure Bunny can find one in time for Christmas (muahahahahahahaha!).

Thursday, September 25, 2025

Maybe these will last longer than two years

Eight days later and I finally have my new glasses!

[Self-portrait with new glasses] Glasses. Titanium, not steel.

I no longer have to tape them to my head to keep them on. Even better, I can now clean my glasses with cursing like a sailor.

Woot!

This is my markup language. There are plenty of others, but this is mine

The Lobster's Blog Carnival is up, and the theme is “What have you made for yourself?” While I have plenty of programs I wrote, there's one that I specifically wrote for my own use: MOPML.

I wrote it to make writing blog entries easier for me. For twenty years, I was hand-crafting HTML for each entry and I finally got tired of it. I wanted an easier way to make entries, so I started down the path of implementing my own markup language. Existing languages like Markdown or AsciiDOC didn't appeal to me and were a bit too generic in how they did things. I also wanted to steal ideas from TeX and Org Mode, as well as some ideas I had to support tags like <ABBR> (which not many sites bother doing).

As I already had twenty years of entries in HTML, one design goal was not to store the entries as MOPML, but to keep them their final HTML-rendered state. This meant that I could play around with the syntax of MOPML and not have to worry about breaking existing entries. Besides, if I had to edit a post after publication, I can edit the HTML directly; I have been doing that for years anyway. For the impementation, I chose Lua, specifically so I could use LPEG.

The TeX inpsired syntax are for items like M-dashes, where I can type three dashes like --- and get a single M-dash on output: —. Or even type typographical quotes where I can type ``This is quoted'' and get “This is quoted”. I even extended that so that when I type “1/2” I get “½”. It's also easy to add new entries to the particular parsing rule.

And while I was inpired by Org Mode for things such as tables and block quotes, I did not care for the syntax, so I changed it to suit my needs. A table is easy to generate:

#+table This is a caption
*header	foo	bar	baz
**footer	foo	bar	baz
Entry 1	3	14	15
Entry 2	92	62	82
Entry 3	8	-1	4
#-table

The #+table starts a table defintion, and is followed by an optional caption. A header row is marked by a starting asterisk, and a footer row is marked by two asterisks. Each field is separated by a tab character. The above example will produce the following table:

This is a caption
header	foo	bar	baz
footer	foo	bar	baz
Entry 1	3	14	15
Entry 2	92	62	82
Entry 3	8	-1	4

The above sample is yet another Org Mode inspried block:

#+source MOPML
#+table This is a caption
*header	foo	bar	baz
**footer	foo	bar	baz
Entry 1	3	14	15
Entry 2	92	62	82
Entry 3	8	-1	4
#-table
#-source

(For the record, I did have to go in after rendering this post and fix the above example, but I never intended to nest #+source blocks in the first place.)

I also have a defined block for when I quote email:

#+email
From: John Doe <{{johndoe@example.net}}>
To: sean@conman.org
Subject: Re: Morbi in lorem ut lectus accumsan
        placerat. Morbi TLA enim id turpis
Date: Mon, 1 Apr 2019 18:12:41 +0200

Lorem ipsum dolor sit amet, consectetur adipiscing elit.  Donec gravida
justo et aliquam lobortis.

#-email

The From: header has another formatting quirk—the {{ and }} denote text that is to be censored in the output. This is how I get those XXXXXXX censor bars in my posts. The above will render the block as:

From
John Doe <XXXXXXXXXXXXXXXXXXX>

To
sean@conman.org

Subject
Re: Morbi in lorem ut lectus accumsan placerat. Morbi TLA enim id turpis

Date
Mon, 1 Apr 2019 18:12:41 +0200

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec gravida justo et aliquam lobortis.

The Markdown-inspried bits are mostly with inline markup, such as *emphasis* for emphasis, and `code` for code. I never did like the Markdown or Org Mode syntax for links, so I played around with it for quite a bit until I got a syntax I like. So when I want to link to this page, I type {^https://www.conman.org/people/spc/about/ this page}. It's a minimal syntax that isn't likely to appear in normal text. And if I ever do want to include a “}” in a link, I can always escape it like {^/ sample \} text} to get sample } text.

But I think the best feature is how I handle abbreviations. HTML contans the <ABBR> tag to semantically mark up TLAs and what not. I wish most web authors would do this, as it would make reading about the MPZ easier to understand, and most browsers on the market will show a tooltip with the TITLE attribute if you hover over it.

I was moaning about this way back in 2003, and I finally have a method I'm happy with. All I do is include a block of abbreviations at the top of the post:

abbr:	HTML	HyperText Markup Language
	MOPML	My Own Private Markup Language
	LPEG	Lua Parsing Expression Grammar
	URL	Uniform Resource Locator
	TLS	Three Letter Acronyms
	MPZ	Medial Palisade Zone

The code will read this block and generate the LPEG code to recognize the acronym, such as TLA, and generate the appropriate HTML: <abbr title="Three Letter Acronym">TLA</abbr>, thus giving us our TLA with semantic markup.

I even solved what I called the IRA problem back in 2003, you know, when the IRA steals the IRAs from members of the IRA; or in other words, when you have the same TLA that maps to different meanings. And I can even mention IRA GERSHWIN without fear of it becoming Initial Risk Assessment GERSHWIN. The IRA problem is solved with yet another block definition at the top of the post:

abbr2:	IRAa	IRA	Irish Republican Army
	IRAr	IRA	International Reading Association
	IRAm	IRA	Individual Retirement Account

So I type IRAa and the code will generate <abbr title="Irish Republican Army">IRA</abbr>.

I suppose I could always include definitions of common TLAs I use in the code itself, but it hasn't been that big of an issue for me to just define the TLAs I use in the post itself.

That's pretty much all I have for a markup language. Yes, it's tailored to what I write and how I want to present it. I don't expect anyone to use this engine as it makes sense to me, but maybe not to you. And that's the point, this is for me to use. I made this for myself. And I'm lucky enough to be able to do so.

Friday, September 26, 2025

Yet more notes on web bot activity

For the past few months, every other week my server (which hosts this blog) would just go crazy for a day and require a full reboot to get back to normal. I haven't tracked down a root cause for this, but I do suspect it has to do with web bot activity increasing over the past few months. I ran a query over the logs for August, generating the number of requests per second and here are the top ten results:

timestamp	host	RPS
26/Aug/2025:03:26:36 -0400	76.14.125.194	740
26/Aug/2025:03:26:29 -0400	76.14.125.194	735
26/Aug/2025:03:26:35 -0400	76.14.125.194	697
26/Aug/2025:03:26:37 -0400	76.14.125.194	693
26/Aug/2025:03:25:54 -0400	76.14.125.194	666
26/Aug/2025:03:25:53 -0400	76.14.125.194	607
26/Aug/2025:03:26:28 -0400	76.14.125.194	589
26/Aug/2025:03:26:38 -0400	76.14.125.194	576
26/Aug/2025:03:26:17 -0400	76.14.125.194	574
26/Aug/2025:03:25:49 -0400	76.14.125.194	539

Websites like Google or MyLinkedFaceTikInstaPinMeTokBookTrestSpaceGramInWe might be able to handle loads like this, but I'm running a blog on a single server. These numbers are insane! Fortunately, this level of activity didn't last for long, but it certainly made it “interesting” on my server for a few minutes:

# requests per minute
timestamp	RPM
03:23	27
03:24	72
03:25	4752
03:26	11131
03:27	1185
03:28	58
03:29	26

It's looking like spikes in activity might be a reason for my server freaking out.

Apache doesn't come with a way to limit IP connections. A search lead me to mod_limitipconn, a simple module that limits an IP address to a maximum number of concurrent connections. There's nothing about rate limiting per se, but it can't hurt, and it's a simple enough to install.

So earlier this week, I installed it. I set a maximum connection limit of 30—that is, no single IP address can connect more than 30 times concurrently. I just picked a high enough number (possibly too high) to still allow legitimate traffic through while keeping the worst abuse away. The code as downloaded will return a “503 No Service” when it kicks in, but I changed it to return a “429 Too many requests” which better reflects the actual situation (I think the code was originally written before 429 was a valid response code).

And it's working. It's already caught 18 bots (or rather, bots with distinct IP addresses), and they are all from the same ASN: GOOGLE-CLOUD-PLATFORM, US (and the user agents are all obviously forged). But what's curious about these is that a subset of the requests include a referrer URL. Most browsers these days restrict sending the referring link, or outright don't send it at all (to respect privacy). So to see them is unusual by a web bot.

Even more curious is these referring links have nothing to do with the link being referenced. There are, so far this month, 147 requests from the GOOGLE-CLOUD-PLATFORM ASN sending a referrer to Slashdot. And I don't mean to a page on Slashdot, but to the main page of Slashdot. There are also referrers to Cisco (on 201 requests), Petrobras (on 581 requests), NBC News (on 221 requests) among 435 other websites being referenced on requests. I don't understand the reasoning here. It's not like I'll let through a request just because it came from Slashdot. I don't publish referring links. I know sites used to publish referring links back in the day, and spammers used this to gain Page Rank for their own pages (or for their clients) but that can't be worth it these days? Can it? Are these still old bots running but long forgotten? What is the angle here?

Anyway, I'll have to wait and see if limiting IP connections will solve my server issues. I do hope that's all it is.

Oh, it's a bug on my side that prevents full conditional requests

I'm still pouring through web sever log files and I'm noticing that many of the feed readers fetching my various feeds files aren't using conditional requests. I was in the process of writing to the author of one of them describing the oversight when I noticed that particular feed reader using both methods of conditional requests: the If-Modified-Since: header and the If-None-Match header in conjunction with a HEAD request. I thought I should test that with my web server, just to make sure it was not a bug on my side.

It's a bug on my side!

Specifically, an Apache bug where compressed output interferes with the If-None-Match method. There is a workaround though:

RequestHeader edit "If-None-Match" '^"((.*)-gzip)"$' '"$1", "$2"'

That rewrites the incoming If-None-Match header to work around the bug. Now maybe that whole conditional request thang with my webserver will now work properly.

Sigh.

Wednesday, October 01, 2025

Why do they even bother with /robots.txt?

It's still too early to see if limiting IP connections to my webserver is doing any good, but it's catching bots that are hammering a bit too hard. And the bots that are getting caught so far are all from the GOOGLE-CLOUD-PLATFORM ASN (34.174.0.0/17 for the record). Two dozen different IP addresses so far this day, with about half a dozen different forged user agents.

But what I noticed (and this happened last month too!) is that an individual IP address will come in like a wrecking ball, making as many requests in as short amount of time as it can, but that each such sequence will start with a request of /robots.txt. Like this:

34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" -/- (-%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/access.html HTTP/1.1" 200 4476 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4458/16246 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/copyright.html HTTP/1.1" 200 3860 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3842/14775 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET //2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4568/16282 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09 HTTP/1.1" 200 14654 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14636/44761 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5628 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5610/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/08/01 HTTP/1.1" 200 5675 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5657/19414 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /1999/12/04.1 HTTP/1.1" 200 4363 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4345/15504 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET / HTTP/1.1" 200 14583 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14565/44552 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/21.1 HTTP/1.1" 200 4421 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4403/16165 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/technical.html HTTP/1.1" 200 7028 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7010/23332 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.json HTTP/1.1" 200 15951 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15933/47681 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22 HTTP/1.1" 200 5538 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5520/18605 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /archive/ HTTP/1.1" 200 5171 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5153/30006 (17%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/24.1 HTTP/1.1" 200 4958 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4940/17340 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /refs/glossary.html HTTP/1.1" 200 3844 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3826/14690 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.atom HTTP/1.1" 200 16863 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16845/61183 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/26.2 HTTP/1.1" 200 4499 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4481/16293 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/02 HTTP/1.1" 200 15624 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15606/50795 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/11 HTTP/1.1" 200 13793 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13775/42797 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/07 HTTP/1.1" 200 34729 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 34711/108151 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/04 HTTP/1.1" 200 28253 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 28235/82627 (34%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2014/08 HTTP/1.1" 200 6190 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6172/20372 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2008/02 HTTP/1.1" 200 25223 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25205/76745 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/08 HTTP/1.1" 200 16603 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16585/52801 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/11 HTTP/1.1" 200 5089 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5071/17989 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/17 HTTP/1.1" 200 5420 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5402/18527 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/06 HTTP/1.1" 200 8392 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8374/27002 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2005/03 HTTP/1.1" 200 31336 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 31318/98041 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/12 HTTP/1.1" 200 13553 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13535/41052 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2012/08 HTTP/1.1" 200 7131 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7113/24140 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/history.html HTTP/1.1" 200 6247 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6229/20734 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/06 HTTP/1.1" 200 14118 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14100/46770 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2021/11 HTTP/1.1" 200 18020 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 18002/57817 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2007/11 HTTP/1.1" 200 25827 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25809/77770 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2004/09 HTTP/1.1" 200 23195 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 23177/70110 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2009/09 HTTP/1.1" 200 13535 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13517/41367 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/09 HTTP/1.1" 200 32956 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 32938/98767 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/07 HTTP/1.1" 200 17319 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 17301/54361 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2017/07 HTTP/1.1" 200 8116 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8098/26633 (30%)

And that's it. A request for /robots.txt, then two seconds of requests, then gone.

I mean, I guess it's nice that it looks at /robots.txt, but how do I control it? Do I use Mozilla? Windows? AppleWebKit? KHTML? Gecko? Chrome? Safari? Or is it checking /robots.txt looking for places explicitly disallowed? (no, none of the bots from GOOGLE-CLOUD-PLATFORM has made requests to locations I've disallowed—I checked). Maybe for hints about speed? Maybe?

But I still don't get the referrer though. This case was https://google.com/, but I've also seen other referring links, like https://riotgames.com/ and https://northwestern.edu/.

Weird.

Monday, October 13, 2025

I don't think it's news to anyone out there that one should avoid Network Solutions for domain registration and probably for anything else as well

The rest of my domains have been transfered away from Network Solutions. The process wasn't hard. It wasn't even really tedious, it just took a bunch of waiting.

I would log into Network Solutions, click past a bunch of needless notifications and upsells, and request a domain transfer. I would then get a chance to renew the domain for the low-low price of $19.95, which technically is cheaper, but still twice the price that my new registar, Porkbun, charges. Click past that, and I would have to wait up to four days in order to change my mind before Network Solutions send the transfer key. You know, to keep me from making a rash decision to stop paying them money.

Once I got the transfer key, I would then transfer in the domain to Porkbun. Network Solutions would then send an email 24 hours later, informing me that I have four days to change my mind, but I should talk to one of their “transfer specialists” to help transfer my domain, because Network Solutions is adamant that I don't rush into transfering my domains away from them and thus, stop paying them.

Four days after that, I would receive email from both Network Solutions and Porkbun that the domain (or domains actually) transfered over. So the process was mostly a waiting game on the part of Network Solutions.

Now that I'm no longer using Network Solutions for domain registration, I want to delete my account there. Of course, there's no link on the Network Solutions to delete my account, you know, to keep me from making a rash decision. Nope, I have to call to talk to an “account specialist” to do that deed. ~~And it turns out, there is no way for them to delete my account. None. Nada. Zip.~~

Let that sink in—there is no way to delete your Network Solutions account!

They're damn adamant that I keep my account, just in case!

The best I can do is delete my credit card information. You know, the same credit card information that you can't update what-so-ever. In reality, they have to manually delete the credit card information from my “from now until the Heat Death of the Universe” account at Network Solutions.

Good Lord. What a clown show!

Update on Thursday, October 16^th, 2025

Account deletion at Network Solutions is a bit more nuanced than I thought.

Thursday, October 16, 2025

So account deletion at Network Solutions is a bit more nuanced than I was led to believe

I was, perhaps, a bit harsh with my criticisms of account deletion at Network Solutions. I noticed that they had yet to remove the billing information, so I called back to get an update and this time I was able to talk to a “billing specialist.” They still had to manually delete the credit card info (which they did while I waited) but I also learned a bit more about their policies about account deletion. If there's no services on an account, it will be deleted after a month (the “billing speciallist” wasn't exactly clear on this, but it sounded like a month) of inactivity, which, okay, I can see that. I just wish that was a bit more visible on the site, both to reassure those that want to leave Network Solutions, and to warn those that do use Network Solutions that no billing activity on their account can lead to automatic deletion.

So they move up from “clown show” to “annoying to use.”

Monday, October 20, 2025

The fix wasn't easy, or C precedence bites

For the past decade now, I've done a Christmas release for mod_blog (only missing the years 2019 and 2023), just beacuse. I was poking around the codebase looking for changes I could make for Christmas this year, and well, I got a bit impatient and have just now released a version in time for Halloween. And it's scary—I'm removing features!

I've removed features in the past, like no longer supporting “ping servers” when it became clear it wasn't worth it, or automatically updating LinkedTikFaceMyInstaPinMeGramWeTokInBookSpaceTrest when InstaPinTikMyLinkedFaceMeTrestBookGramWeInSpaceTok changed how it works often enough to make it annoying for me to continue. But this time … this time it's different. This is removing functionality that has existed in the code base since the beginning!

To make it easier to write entries, I had code within mod_blog to process the input—mostly what existed was to convert sequences like ``quoted'' to “quoted” and “...” to “…”, but with an option to add <P> tags around logical paragraphs. But given that I now use my own markup language, I rarely used the web interface (like, I can count on my fingers the number of times I've used it and still have a few left over should give an indication of how little I use it) and the code just sat there, unused. So unused that in fixing one bug I introduced another bug in the code I fixed!

To recap, here's the original code:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    assert(isxdigit(*src));
    assert(isxdigit(*(src+1)));
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

and the “fixed” version:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit(*src))   return '\0';
    if (!isxdigit(*src+1)) return '\0';
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

I don't fault you if you can't spot the bug. I only found it when testing the web interface to ensure it wasn't completely broken with the conversion code removed ~~(instead it's now only mostly broken but that's an interesting case in and of itself and requires its own post)~~. The bug is in this line of code:

if (!isxdigit(*src+1)) return '\0';

The issue is due to C's precedence rules and dereferencing rules. The code above is parsed as src[0] + 1 instead of the src[1] that I was intending. When I modified the function, changing the calls to assert() into actual code to return an error (I typed in the new code as that's faster than modifying the existing code) I … kind of missed that.

Oh, who am I kidding? I totally missed that. But because I don't use the web interface this bug went unnoticed. Sigh.

For the record, I changed the code to read:

if (!isxdigit(src[0])) return '\0';
if (!isxdigit(src[1])) return '\0';
c    = ctohex(src[0]) * 16 + ctohex(src[1]);

Another long time feature I've removed is email notification. I added it early on where you could submit your email address to get notified of posts, but spammers and a lack of outside interest pretty much put the kibosh on that. As I still have three users of the email notification (one is me, one is Bunny, and one is one other person whom I'm not sure still reads the emails but they haven't bounced yet) I don't want to drop support completely, so now the email notifications are sent via the hook mechanism I added a few years ago.

In total, I removed over 3,000 lines of code from mod_blog. Granted, over 2,000 of them were in one function that was removed, but still, it's 3,000 lines of code I don't have to worry about any more.

Still, it's a bit scary to remove features that have been there for so long, and thus, a Halloween release.

Discussions about this entry

It worked, but it failed

In posting the previous post I encounted an interesting bug!

It wasn't in mod_blog per se, but in the hook running after an entry has been added, and therein is the bug—the entry was successfully added, but the hook failed.

The hook program failed due to a compilation error that was only triggered when it ran. I took the email notification code from mod_blog and turned it into a program. I also linked to the bloging core of mod_blog to avoid having to duplicate the code to read the configuration (the email notification block is now ignored by mod_blog itself), and because the configuration format is Lua, a compiler option is needed to support Lua modules written in C—basically, -rdynamic to allow C-based Lua modules to call Lua functions (which I allow, and need, to support my particular configuration).

This is the root cause of the issue.

But in the meantime, because the hook failed to run, the script I use that uses the HTTP PUT method received a status of “500 Internal Server Error,” the entry was stored, but none of the statically generated files (index.html and the various feed files) were generated, nor email sent.

Once I figured out what happened, it was easily remedied, but that still leaves the question of what should happen? I intended the add entry post-hook to handle situations like notifications, so in this case, if the hook fails, normal processing should proceed, but how to send back that the entry post-hook failed? Looking over the HTTP status codes, perhaps I could return a “202 Accepted” when the entry post-hook fails, with some information about the failure. That could work.

Tuesday, October 21, 2025

The actual root cause of yesterday's bug were laid over twenty years ago

Yesterday, I found the root cause of a bug but I did not go into details about how that bug slipped into production (so to speak). That's easy—the configuration of mod_blog differ between my development server and public server.

On my public server, I have the following bit of code in the configuration:

process = require("org.conman.process")

-- --------------------------------------------------------------------
-- process limits added because an earlier version of the code actually
-- crashed the server it was running on, due to resource exhaustion.
-- --------------------------------------------------------------------

process.limits.hard.cpu  = "10m" -- 10 minutes
process.limits.hard.core =  0    -- no core file
process.limits.hard.data = "20m" -- 20 MB

-- --------------------------------------------------------
-- We now resume our regularly scheduled config file
-- --------------------------------------------------------

I load a module to configure bits of the environment that mod_blog runs in. The configuration file on the development server does not have such code. So when I compiled the email notification program, the fact that I did not include the -rdynamic compiler option was not an issue when I ran my tests.

Yes, a case where there was a difference between development and production that allowed a bug to slip through. So I decided to dig a bit deeper. A few days ago I explained why I had such directives in my configuration file when I was asked why didn't I use Apache's RLimitMEM directive. I answered that the cause of adding the process limits happened pretty early in the use of mod_blog and that I didn't recall seeing such a directive in Apache at the time.

But I did get curious as to when Apache might have added the RLimitMEM directive. I started this site using Apache 1.3 (when that was the current version of Apache—I've been blogging for quite a long time) and I was thinking that the RLimitMEM directive may have been added around version 2.0. In my archives, I found a copy of Apache 1.3.9 and wouldn't you know it—RLimitMEM existed!

Sigh.

I could have avoided yesterday's issue had I only read a bit further into the Apache documentation back in the day.

Friday, October 31, 2025

For Hallowe'en, I'm half hoping we get all the kids so we have no candy left, and half hoping we get no kids so we have all the candy left

For the past few years, fewer and fewer kids have been showing up on our door step looking for candy. That doesn't stop Bunny from being opimistic and overbuying candy just in case scores of kids come by and get candy, least we get our house redecorated with eggs and toilet paper. Me, I've always been surprised when any kids show up these days on Hallowe'en. So it was that Bunny bought way more candy than I felt we needed.

We ended up with three kids showing up. In one group. At around 7:30 pm. At least we avoided carving a pumpkin this year, opting instead for my ceramic, pre-carved pumpkin.

The good news?

Bunny bought candy we like.

The bad news?

Bunny bought candy we like.

It won't go to waste, but it will go to our waist.

Monday, November 03, 2025

Limitations of a two-pass assembler

I've come to realize that supporting foward references in a two-pass assembler isn't always easy. The simple case of forward references I support:

	lda	#alpha
alpha	equ	5

On pass 1, alpha isn't defined, but by pass two, we have its value—5.

With this code, however:

	lda	#alpha
alpha	equ	bravo
bravo	equ	5

alpha is undefined on line 1, and it remains undefined even on line 2 because we haven't defined bravo yet. Thus when we end pass 1, alpha is still undefined. That it took me two years to even stumble across this issue is a bit surprising to me. I just haven't written 6809 assembly code like this.

Can I fix this? If I add another pass, probably. If I don't want to add another pass … I don't know. I would have to track expressions that aren't fully defined in pass 1, which could be a lot of work for an issue that might not come up all that often (if my own code is to go by). I mean, things can get quite pathological:

		lda	#Alpha
Alpha 		equ	Bravo+1
Bravo           equ	Charlie+1
Charlie		equ	Delta+1
Delta		equ	Echo+1
Echo		equ	Foxtrot+1
Foxtrot		equ	Golf+1
Golf		equ	Hotel+1
Hotel		equ	India+1
India		equ	Juliet+1
Juliet		equ	Kilo+1
Kilo		equ	Lima+1
Lima		equ	Mike+1
Mike		equ	November+1
November	equ	Oscar+1
Oscar		equ	Papa+1
Papa		equ	Quebec+1
Quebec		equ	Romeo+1
Romeo		equ	Sierra+1
Sierra		equ	Tango+1
Tango		equ	Uniform+1
Uniform		equ	Victor+1
Victor		equ	Whiskey+1
Whiskey		equ	Xray+1
Xray		equ	Yankee+1
Yankee		equ	Zulu+1
Zulu 		equ	1

lsawm (part of LWTools) does properly handle this pathological case but it does six passes, not two. The other 6809 assembler I have, an older one written back in the 90s, doesn't and issues deceptive error messages, so it's not like I'm the only one to not handle this properly.

As of now, I just issue an error and let the programmer deal with it.

XXV.

ANNOT.

4. The Surge of Natural Disasters

XLIX.

ANNOT.

Update on Monday, January 6th, 2025

Update on Tuesday, February 4th, 2025

Update on Tuesday, March 18th, 2025

Discussions about this entry

Update on Monday, October 20th, 2025

Obligatory Sidebar Links

Discussions about this entry

Discussions about this entry

Update on Tuesday, August 5th, 2025

Discussions about this entry

Discussions about this entry

Discussions about this entry

Update on Friday, June 6th, 2025 at 3:06 AM

Discussions about this entry

Discussions about this entry

Discussions about this entry

Discussions about this entry

Discussions about this entry

Developer gets shadowbanned by Hacker News and asks for a real IP ban instead

Update on Thursday, October 16th, 2025

Discussions about this entry

Update on Monday, January 6^th, 2025

Update on Tuesday, February 4^th, 2025

Update on Tuesday, March 18^th, 2025

Update on Monday, October 20^th, 2025

Update on Tuesday, August 5^th, 2025

Update on Friday, June 6^th, 2025 at 3:06 AM

Update on Thursday, October 16^th, 2025