The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, January 01, 2025

Guess who made predictions for 2025? Can you say “Nostradamus?” I knew you could

Of course Nostradamus has predictions for 2025! When hasn't he had predictions for any given year?

Sigh.

So far, checking a few of the articles, not many have bothered to print the quatrains in question, and the one article I found (which I hesitate to link to) that displays a translation of a quatrain never bothered to list which quatrain it is.

And because the quatrains listed are translated, it's hard to locate the original in Nostradamus' writings.

For instance, this quatrain:

When the coin of leather rules,
The markets shall tremble,
The crescent and brass unite,
Gold and silver lose their value.

Doesn't seem to exist at all. Checking the version of Nostradamus at Project Gutenberg:

XXV.

French.

Par guerre longue tout l’exercite espuiser,
Que pour Soldats ne trouveront pecune,
Lieu d’Or, d’Argent cuir on viendra cuser,
Gaulois Ærain, signe croissant de Lune.

English.

By a long War, all the Army drained dry,
So that to raise Souldiers they shall find no Money,
Instead of Gold and Silver, they shall stamp Leather,
The French Copper, the mark of the stamp the new Moon.

ANNOT.

This maketh me remember the miserable condition of many Kingdoms, before the west-Indies were discovered; for in Spain Lead was stamped for Money, and so in France in the time of King Dagobert, and it seemeth by this Stanza, that the like is to come again, by reason of a long and tedious War.

The true prophecies or prognostications of Michael Nostradamus, physician to Henry II. Francis II. and Charles IX. Kings of France, and one of the best astronomers that ever were.

This is the only quatrain where “leather” appears. And there's nothing in that quatrain about gold and silver losing their value. Moving on, another quatrain from the article I was able to locate:

4. The Surge of Natural Disasters

Nostradamus warned of a year marked by hurricanes, tsunamis, and earthquakes, driven by geological instability, solar activity, and climate change. His depiction of “hollow mountains” and poisoned waters paints a grim picture of devastation, particularly in vulnerable regions like the Amazon rainforest.

“Garden of the world near the new city,
In the path of the hollow mountains:
It will be seized and plunged into the Tub,
Forced to drink waters poisoned by sulfur.”

The confluence of these natural calamities could accelerate global efforts to combat climate change and reimagine disaster resilience. Yet, the cost in lives, resources, and environmental destruction underscores the urgent need for collective action before catastrophe becomes routine.

And let's see what the commentary from the 1600s said about this quatrain:

XLIX.

French.

Jardin du Monde aupres de Cité neufve,
Dans le chemin des Montagnes cavées,
Sera saisi & plongé dans la Cuve,
Beuvant par force eaux Soulphre envenimées.

English.

Garden of the World, near the new City,
In the way of the digged Mountains,
Shall be seized on, and thrown into the Tub,
Being forced to drink Sulphurous poisoned waters.

ANNOT.

This word Garden of the World, doth signifie a particular person, seeing that this Garden of the World was seized on and poisoned in a Tub of Sulphurous water, in which he was thrown.

The History may be this, that Nostradamus passing for a Prophet and a great Astrologer in his time, abundance of people came to him to know their Fortunes, and chiefly the Fathers to know that of their Children, as did Mr. Lafnier, and Mr. Cotton, Father of that renowned Jesuit of the same name, very like then that Mr. du Jardin having a son did ask Nostradamus what should become of him, and because his son was named Cosmus, which in Greek signifieth the World, he answered him with these four Verses.

Garden of the World, for Cosmus of the Garden, In his travels shall be taken hard by the New City, in a way that hath been digged between the Mountains, and there shall be thrown in to a Tub of poisoned Sulphurous water to cause him to die, being forced to drink that water which those rogues had prepared for him.

Those that have learned the truth of this History, may observe it here. This ought to have come to pass in the last Age, seeing that the party mentioned was then born when this Stanza was written, and this unhappy man being dead of a violent death, there is great likelyhood, that he was not above forty years old.

There is another difficulty, to know which is that new City, there being many of that name in Europe, nevertheless the more probable is, that there being many Knights of Maltha born in Provence (the native Countrey of our Author) it may be believed that by the new City he meaneth the new City of Maltha called la Valete, hard by which there is paths and ways digged in the Mountains, which Mountains are as if it were a Fence and a Barricado against the Sea, or else this Cosmus might have been taken by Pyrats of Algiers, and there in the new City of the Goulette be put to death in the manner aforesaid.

Nothing about it being 2025 when this comes to pass. Nothing about hurricanes, tsunamis or earthquakes. It's almost as if Nostradamus was being intentionally vague about his prophecies. It could very well be about Naples, Italy, seeing how it's on the coast nestled in between volcanoes.

Or maybe Los Angeles. Yes, it's Los Angeles, land of Shake and Bake.

Of the other five “Nostradamus prophecies” mentioned in the article, none were written by the man. It's almost as if one could just make up Nostradamus prophecies. Why not?

HAPPY NEW YEAR!

Friday, January 03, 2025

It's more like computer security theater than actual security

In w3m, to edit a form textarea,

    ...
    f = fopen(tmpf, "w");
    if (f == NULL) {
        /* FIXME: gettextize? */
        disp_err_message("Can't open temporary file", FALSE);
        return;
    }
    if (fi->value)
        form_fputs_decode(fi->value, f);
    fclose(f);

    if (exec_cmd(myEditor(Editor, tmpf, 1)->ptr))
            goto input_end;
    ...

exec_cmd is some setup and teardown around a system(3) call with the user's editor and the temporary file. This is not good for security, as it allows w3m, by default, to execute anything. One tentative improvement would be to only allow w3m to execute a wrapper script, something like

    #!/bin/sh
    exec /usr/bin/vi -S "$@"

or some other restricted editor that cannot run arbitrary commands nor read from ~/.ssh and send those files off via internet connections. This is better, but why not disallow w3m from running anything at all?

    if (pledge(
          "cpath dns fattr flock inet proc rpath stdio tty unveil wpath",
          NULL) == -1)
       err(1, "pledge");

Here we need the “proc” (fork) promise so downloads still work, but “exec” is not allowed. This makes it a bit harder for attackers to run arbitrary programs. An attacker can still read various files, but there are also unveil restrictions that very much reduce w3m's access to the filesystem. An attacker could still make DNS and internet connections, though fixing that would require a different browser design that better isolates the “get stuff from the internet” parts from the “try to parse the hairball that is HTML” code, probably via imsg_init(3) on OpenBSD, or, with different complications, by downloading to a directory with one process and parsing with another. That way, an HTML security issue would have a more difficult time getting out to the interwebs.
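To make that two-process idea concrete, here is a minimal sketch of the split, assuming OpenBSD (this is my own illustration of the idea, not w3m's actual code; the fetch and parse bodies are elided):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <err.h>
    
    int main(void)
    {
      int fd[2];
      
      if (pipe(fd) == -1) err(1,"pipe");
      
      pid_t pid = fork();
      if (pid == -1) err(1,"fork");
      
      if (pid == 0)
      {
        /* fetch process: network access, but no exec and no filesystem writes */
        close(fd[0]);
        if (pledge("stdio inet dns",NULL) == -1) err(1,"pledge");
        /* ... fetch the URL, write the raw HTML to fd[1] ... */
        exit(0);
      }
      
      /* parse process: filesystem access, but no network at all */
      close(fd[1]);
      if (pledge("stdio rpath wpath cpath",NULL) == -1) err(1,"pledge");
      /* ... read the hairball from fd[0] and parse it ... */
      if (waitpid(pid,NULL,0) == -1) err(1,"waitpid");
      return 0;
    }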

Security Hoop

What I find annoying is the lack of any type of attack as an example. It's always “data from da Intarwebs bad!” without regard to how it's bad. The author just assumes that hackers out there have some magical way of executing code on their computer just by the very act of downloading a file. The assumption that some special sequence of HTML can open a network connection to some control server in Moscow or Beijing or Washington, DC and siphon off critical data is just … I don't know, insane to me. Javascript, yes, I can see that happening. But HTML?

And then I recall the time that Microsoft added code to their programs to scan JPEG images for code and automatically execute it, and okay, I can see why maybe the cargo cult security mumbo-jumbo exists.

What I would like to see is how opening a text editor with the contents of an HTML <TEXTAREA> could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?

One possible route would be ECMA-35 escape sequences, specifically the DCS and OSC sequences (which could be used to control devices or the operating system respectively), although I don't know of any terminal emulator today that supports them. Microsoft did add an escape sequence to reprogram the keyboard (ESC “[” key-code “;” string “p”) but that's in the “private use” area set aside for vendors.

This particular attack vector might work if the editor is running under a terminal or terminal emulator that supports it, and the editor in question doesn't remove or escape the raw escape sequence codes. I tried a few text editors on the following text (presented as a hexadecimal dump to show the raw escape sequence):

00000000: 54 68 69 73 20 69 73 20 1B 5B 34 31 6D 72 65 64 This is .[41mred
00000010: 1B 5B 30 6D 20 74 65 78 74 2E 0A 0A             .[0m text...

None of the editors I tried (which are all command-line based and thus use escape sequences themselves to display text on a terminal) displayed red text. The escape sequence wasn't interpreted as an escape sequence.
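For anyone who wants to repeat the test, here's a throwaway C program (my own; any language that can write raw bytes would do) that produces exactly the bytes in the dump above:

    #include <stdio.h>
    
    int main(void)
    {
      /* the same bytes as the hex dump: ESC [ 4 1 m ... ESC [ 0 m */
      FILE *fp = fopen("escape-test.txt","wb");
      if (fp == NULL) { perror("escape-test.txt"); return 1; }
      fputs("This is \x1B[41mred\x1B[0m text.\n\n",fp);
      fclose(fp);
      return 0;
    }

Open escape-test.txt in the editor under test: if “red” shows up on an actual red background, the editor is passing raw escape sequences straight through to the terminal.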

Another attack might be embedding editor-specific commands within the text. This is a common feature of some editors, like vi. And I can see this being concerning, especially if the commands one can set in a text file include accessing arbitrary files or running commands.

A third attack could be an attempt to buffer overflow the editor, either by sneaking in a huge download (like say, a file with a single one-gigabyte line) or erroneous input (for example, if the editor expects a line to end with a CR and LF, send an LF then CR). Huge input is a bit harder to hide, but subtle erroneous input could cause issues.

This is why I feel such articles are bad—by not talking about actual threats they enforce a form of “learned helplessness.” Everything is dangerous and we must submit to onerous measures to keep ourselves safe. Sprinkling calls to pledge() isn't the answer. Yes, it helps, but not thinking critically about security leads to a worse experience overall, such as having to manually edit a file, which would still be subject to all three of the above attacks anyway. By identifying the attacks, a much better way to mitigate them can be found (in this case, an editor that strips out escape sequences and does not support embedded commands; and yes, I know I have a minority opinion here—sigh).

And to address the bit about parsing HTML—is parsing really that fraught with danger? All you need to parse HTML is to follow the explicit (and excruciatingly detailed) HTML5 specification. How hard can that be?

Saturday, January 04, 2025

It's still cargo cult computer security

My first question to you, as someone who is, shall we say, “sensitive” to security issues: why are you exposing a network-based program to the Internet without an update in the past 14 years?

Granted, measures such as ASLR and W^X can make life more difficult for an attacker, and you might notice w3m crashing as the attackers try to get the stars to line up for their ROP gadget to work as you (or some automation) try to download a malicious page over and over. Or, you could get unlucky and they are now running whatever code they want, or reading all your files.

Attacks

I have my own issues with ASLR (I think it's the wrong thing to do—much better would have been to separate the stack into two, a return stack and a parameter (or data) stack, but I suspect we won't ever see such an approach because of the entrenchment of the C ABI) so I won't get into this.

What I would like to see is how opening a text editor with the contents of an HTML <TEXTAREA> could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?

Where is your formal verification for the lack of errors?

I did not assert the code was free of error. I was asking for examples of actual attacks.

Otherwise, there is some amount of code executed to make that textarea work, all of which is the “actual attack surface”. If you look at the CVEs for w3m (never mind the code w3m uses from SSL, curses, iconv, intl, libc, etc.) one may find:

Was that so hard?

The first bug you mention, the “format string vulnerability” seems to be related to this one-line fix (and yes, I did download the source code for this):

@@ -1,4 +1,4 @@
-/* $Id: file.c,v 1.249 2006/12/10 11:06:12 inu Exp $ */
+/* $Id: file.c,v 1.250 2006/12/27 02:15:24 ukai Exp $ */
 #include "fm.h"
 #include <sys/types.h>
 #include "myctype.h"
@@ -8021,7 +8021,7 @@ inputAnswer(char *prompt)
 	ans = inputChar(prompt);
     }
     else {
-	printf(prompt);
+	printf("%s", prompt);
 	fflush(stdout);
 	ans = Strfgets(stdin)->ptr;
     }

It would be easy to dismiss this as a rookie mistake, but I admit, it can be hard to use C safely, which is why I keep asking for examples and in some cases, even a proof-of-concept so others can understand how it works, and how to mitigate them.
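And since I'm the one demanding proof-of-concept code, here's a minimal sketch of why printf(prompt) is dangerous. The prompt string is made up, and this shows the generic bug, not the actual w3m exploit path:

    #include <stdio.h>
    
    int main(void)
    {
      /* pretend this string arrived from the network */
      const char *prompt = "Continue? %x %x %x %n";
      
      printf("%s\n",prompt); /* safe: the input is treated as data */
      printf(prompt);        /* unsafe: the input is now a format string--
                                each %x leaks a word off the stack, and %n
                                writes through whatever pointer it finds
                                there, usually crashing the process (or
                                worse, if an attacker controls it) */
      return 0;
    }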

But just keep crying pledge() and see how things improve.

The second bug you mentioned seems to be CVE-2002-1335, which is 23 years old by now, and none of the links on that page show any details about this bug. I also fail to see how this could lead to “arbitrary file access” back to the attacker unless there's some additional JavaScript required. The constant banging on the pledge() drum does nothing to show how such an attack works so as to educate programmers on what to look for and how to think about mitigations. When I asked “What are the actual attack surfaces?” I actually meant that. How does this lead to “arbitrary file access?” It always appears to be “just assume the nukes have been launched” type of rhetoric. It doesn't help educate us “dumb” programmers. Please, tell me, how is this exploitable? Or is that forbidden knowledge, not to be given out for fear it will be used by the less well-intentioned?

This is the crux of my frustration here—all I see is “programs bad, mmmmmmkay?” and magic pixie dust to solve the issues.

I've had to explain to programmers in a well-regarded CSE department recently why their code was … sub-optimal. Less polite words could be used. They were running remote, user-supplied strings through a system(3) call, and it took a few emails to convince them that this was kind of bad.

And I can bitch about having to teach operations how to configure syslog and “no, we can't have a single configuration file for two different geographical sites and besides, we maintain the configuration files, not you!” so this cuts both ways.

Moreover, it's fairly simple to pledge and unveil a process to remove classes of system calls (such as executing other programs) or remove access to swathes of the filesystem (so an attacker will have a harder time to run off with your SSH keys).

And how, exactly, is adding pledge and unveil onerous? …

Easy huh?

The man page doesn't say anything about limiting calls to open(). It appears that is handled by unveil(), which doesn't seem all that easy to me:

… Directories are remembered at the time of a call to unveil(). This means that a directory that is removed and recreated after a call to unveil() will appear to not exist.

unveil() use can be tricky because programs misbehave badly when their files unexpectedly disappear. In many cases it is easier to unveil the directories in which an application makes use of files.

unveil(2) - OpenBSD manual pages

To me, I read “in some cases, code may be difficult to debug.”

And while it may be easy for you to add a call to unveil() or pledge(), I assure you that it's not at all easy for the kernel to support such calls. Now, in addition to all the normal Unix checks that need to happen (and which have, in the past, gone wrong on occasion), a whole slew of new checks needs to be added, complicating the kernel. Just as an example: pass the “dns” promise to pledge() and the calls to socket(), connect(), sendto() and recvfrom() are disabled until the file /etc/resolv.conf is opened. Then they're enabled, but probably only to allow UDP port 53 through. Unless the “inet” promise is also given, in which case socket(), connect(), etc. are allowed anyway. That's … a lot of logic to puzzle through. And as someone who doesn't trust programmers (as you stated), this isn't a problem for you?
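Here's roughly what that special-casing looks like from the program's side, as a minimal sketch assuming OpenBSD (the host name is just an example):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <err.h>
    
    int main(void)
    {
      struct addrinfo *res;
      int              rc;
      
      /* "dns" allows just enough of socket()/connect()/sendto()/recvfrom()
         (plus reads of the resolver's configuration files) for lookups */
      if (pledge("stdio dns",NULL) == -1)
        err(1,"pledge");
      
      rc = getaddrinfo("example.com","www",NULL,&res);
      if (rc != 0)
        errx(1,"getaddrinfo: %s",gai_strerror(rc));
      puts("resolved");
      freeaddrinfo(res);
      
      /* but a plain TCP socket()/connect() here, absent the "inet"
         promise, would get the process killed by the kernel */
      return 0;
    }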

As a programmer, it can also make it hard to reason about some scenarios—like, if I use the “stdio” promise, but not the “inet” promise, can I open files served up by NFS? I mean, probably, but “probably” isn't “yes” and there are a lot of programming sins committed because “it worked for me.”

I did say that using pledge() helps, but it doesn't solve all attacks. For instance, there's no special promise I can give to pledge() that states “I will not send escape codes to the terminal” even though that's an attack vector, especially if the terminal in question supports remapping the keyboard! Any special recommendations for that attack? Do I really need to embed \e[13;"rm -rf ~/*"p to drive the point home?
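Generating such a payload is trivial. Here's a sketch with a harmless command substituted for the destructive one; it only does anything on a terminal that honors the old ANSI.SYS-style key-remap sequence (modern emulators should ignore it):

    #include <stdio.h>
    
    int main(void)
    {
      /* remap keycode 13 (Enter) to type "echo hi"--the same shape as
         the sequence above, minus the rm -rf */
      printf("\x1B[13;\"echo hi\"p");
      return 0;
    }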

Also (because I do not use OpenBSD) do I still have access to every system call after this?

pledge(
    " stdio rpath wpath cpath  dpath     tmppath inet   mcast"
    " fattr chown flock unix   dns       getpw   sendfd recvfd"
    " tape  tty   proc  exec   prot_exec settime ps     vminfo"
    " id    pf    route wroute audio     video   bpf    unveil"
    "  error", NULL);

If not, why not? That's a potential area to look for bugs.

How, exactly, is adding pledge and unveil to w3m “helplessness”, and then iterating on that design as one gains more experience?

As you said yourself: “I do not trust programmers (nor myself) to not write errors, so look to pledge and unveil by default, especially for ‘runs anything, accesses remote content’ browser code.” What am I to make of this, except for “Oh, all I have to do is add pledge() and unveil() to my program, and then it'll be safe to execute!”

In my opinion, banging on the pledge() drum doesn't help educate programmers on potential problems. It doesn't help programmers write code that's anal when dealing with input. It doesn't help programmers think about potential exploits. It just punts the problem to magic pixie dust that will supposedly solve everything.

… It took much less time to add to w3m than writing this post did; most of the time for w3m was spent figuring out how to disable color support, kill off images, and to get the CFLAGS aright. It is almost zero maintenance once done and documented.

What, exactly, is your threat model? Because that's … I don't know what to say. You remove features just because they might be insecure. I guess that's one way to approach security. Another approach might be to cut the network cable.

I only ask as I was hacked once. Bad. Lost two servers (file system wiped clean), almost lost a third. And you know what? Not only did it not change my stance around computer security, there wasn't a XXXXX­XXXXX thing I could do about it either! It was an inside job. Is that part of your threat model?

By the way, /usr/bin/vi -S is used to edit the temporary file. This does a pledge so that vi cannot run random programs.

But what's stopping an attacker from adding commands to your ~/.bashrc file to do all the nasty things it wants to do the next time you start a shell? That's the thing—pledge() by itself won't stop all attacks, but dismissing the question of “what are the attack surfaces?” can lead one to believe that all that's needed is pledge(). It leads (in my opinion) to a false sense of security.

It is rather easy to find CVEs for errors in HTML parsing code, besides the “did not properly escape HTML tags in the ALT attribute” thing w3m was doing that led to arbitrary file access.

CVE-2021-23346, CVE-2024-52595, CVE-2022-0801, CVE-2021-40444, CVE-2024-45338, CVE-2022-24839, CVE-2022-36033, CVE-2023-33733, …

You might want to be more careful in the future, as one of those CVEs you listed has nothing to do with parsing HTML. I'll leave it as an exercise for you to find which one it is.

I also get the feeling that we don't see eye-to-eye on this issue, which is normal for me. I have some opinions that are not mainstream, are quite nuanced, and thus, aren't easy to get across (ask me about defensive programming sometime).

My point with all this—talk about computer security is all cargo cultish and is not helping with actual computer security. And what is being done is making other things way more difficult than they should be.

Sunday, January 05, 2025

Security Theater

Also, Linux is getting a landlock thing, which sounds maybe a bit like unveil. Are they likewise deluded, or maybe there's something useful about this class of security thingymabobber, especially with “defense in depth” in mind?

Tradeoffs

An aspect I think you are discounting is the effort required to implement the mitigations. While pledge() and unveil() are simple to use, their implementation is anything but. Just from reading the man pages, it appears there are exceptions, and then exceptions to the exceptions, that must be supported. What makes Linux or OpenBSD different from other pieces of software, like openssl?

Sure, such things help overall but as you state, there are tradeoffs—and a big one I see is adding complexity to an already complex system. And in my experience, security makes it harder to diagnose issues (one example from work—a piece of network equipment was “helpfully” filtering network traffic for exploits, making it difficult to test our software properly, you know, in the absence of such technology).

A different take is that pledge and unveil, along with the various other security mitigations, hackathons, and so forth, are a good part of a healthy diet. Sure, you can still catch a cold, but it may be less bad, or have fewer complications.

I also think you are discounting the risk compensation that this may cause. With all these mitigations, what incentives are there for a programmer to be careful in writing code? One area I think we differ in is just how much of a crutch such technology becomes.

If you don't want that defense in depth, eh, you do you.

It's less that I don't want defense in depth (and it's sad to live in a world where that needs to be the default stance) but that you can do everything “by the book” and still get blindsided. I recall the time in the early 90s when I was logged into the university computer I used and saw myself also logged in from Russia, all because a Unix workstation in a different department down the hall had no root password and was running a program sniffing the network (for more perspective—at the time the building was wired with 10-Base-2, also known as “cheap-net,” in which all traffic is transmitted to all stations, and the main campus IT department was more concerned with its precious VAX than with supporting departments running Unix).

My first encounter with the clown show that is “computer security” came in the late 90s. At the time, I was working at a small web-hosting company when a 500+ page report was dumped on my desk (or rather, a large PDF file in my email) with the results of a “PCI compliance scan” on our network. It was page after page of “Oh My God! This computer has an IP address! This computer responds to ping requests! Oh My God! This computer has a web site on it! And DNS entries! Oh My XXXXX­XX God! You handle email!”

For. Every. Single. Web. Site. And. Computer. On. Our. Network.

It was such an obviously low-effort report with so much garbage, it was difficult to pull out the actual issues with our network. You know what would have been nice? Recognition that we were a web-hosting company in addition to handling email and DNS for our customers. Maybe a report broken down by computer, maybe in a table format like:

Hypothetical report of a network scan

    IP address   protocol/port  port name  notes
    -----------  -------------  ---------  -----
    192.0.2.10   ICMP echo      ping       see Appendix A
                 TCP port 22    SSH        UNEXPECTED—see Appendix D
                 TCP port 25    SMTP       Maybe consolidate email to a single server—see Appendix B
                 TCP port 53    DNS        DNS queries resolve—see Appendix C
                 UDP port 53    DNS        DNS queries resolve—see Appendix C
                 TCP port 80    HTTP
                 TCP port 443   HTTPS
    192.0.2.11   ICMP echo      ping       see Appendix A
                 TCP port 22    SSH        UNEXPECTED—see Appendix D
                 TCP port 25    SMTP       Maybe consolidate email to a single server—see Appendix B
                 TCP port 53    DNS        DNS queries resolve—see Appendix C
                 UDP port 53    DNS        DNS queries resolve—see Appendix C
                 UDP port 69    TFTP       UNEXPECTED—see Appendix D
                 TCP port 80    HTTP
                 TCP port 443   HTTPS

Where Appendix A could explain why supporting ping is questionable, but allowable, Appendix B could explain the benefits of consolidating email on a machine that doesn't serve email, and Appendix C could explain the potential data leaks of a DNS server that resolves non-authoritative domains, which in our case, was the real issue with our scan but was buried in just a ton of nonsense results with the assumption that we have no clue what we're doing (at least, that's how I read the 500+ page report).

The hypothetical report above shows SSH being open on the boxes—fair enough. A common security measure is to have an “SSH jump server” that is specifically hardened, exposing SSH on only one host, with the rest accepting SSH connections only on a (preferably) separate “management” interface with private IP addresses. And oh, we're running TFTP on a box—again, we should probably have a separate system on a “management” interface running TFTP to back up our router configs.

But such a measured, actionable report takes real work to generate. Much, much easier to just dump a raw network scan with scary jargon.

And since then, most talk of “computer security” has, in my experience, been mostly of the breathless “Oh My God You're Pwned!” scare tactic variety.

My latest encounter with “computer security” came a few years ago at The Ft. Lauderdale Office of the Corporation, when our new Overlords wanted to change how we did things. The CSO visited and informed us that they were going to change how we did security, and in the process make our jobs much more difficult. It turns out it wasn't because our network or computers were insecure—no! Our network had a higher score (according to some network-scoring company—think of the various credit-scoring companies, but for corporate networks) than our new parent company's (almost a perfect score). No, it came down to “that's not how we do things. We're doing it our way!” And “their way” was just checking off boxes on some list as cheaply as possible.

I think another way we differ is in how much we think “computer security” has become a cargo cult.

Update on Monday, January 6th, 2025

This thread on Lobsters is a perfect example of the type of discussion I would like to see around security. Especially on-point is this comment: “… the [question] I was actually asking: ‘Why is it dangerous, so I can have a better mental model of danger in the future?’”

Tuesday, January 07, 2025

I am Socrates

I tried reading this with an open mind, but then I came across this:

This is a very easy fix. If I paste the error back into the LLM it will correct it. Though in this case, as I’m reading the code, it’s quite clear to me that I can just delete the line myself, so I do.

Via Lobsters, How I program with LLMs

My initial reaction to this was Woah there buddy! Are you sure you want to use your brain? Yes, caustic sarcasm is not a pretty reaction but I am what I am. [A reactionary cynical neo-Luddite? —Editor] [Shut up you! —Sean] Further down the page, the author presents some code the LLM wrote and then says:

Exactly the sort of thing I would write!

And I'm like, Yeah, you have 30 years of programming experience backing that up. What about programmers today who don't have that experience? They just accept what's given to them uncritically? [Yup, a reactionary cynical neo-Luddite. —Editor] [Sigh. —Sean] At least the code in question was unit tests, and it wasn't he who had to write unit tests for AI-written code (which was my fear just prior to leaving The Enterprise).

But reading further, I can't help but think of Socrates:

For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.

Plato rejects writing by the mouth of Socrates

While that's true to some degree, over the past 2½ millennia since then it's been, overall and in my opinion, a positive thing. But then again, writing and books have been a part of my world since I was born, so they're just a natural part of the way the world works:

Anything that is in the world when you're born is normal and ordinary and is just a natural part of the way the world works. Anything that's invented between when you're fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it. Anything invented after you're thirty-five is against the natural order of things.

Douglas Adams, The Salmon of Doubt

Can you guess I'm older than thirty-five?

So I'm resigned to the fact that this is our new reality—programmers will use AI (against my better judgement but nobody asked me—it really is alien to my way of thinking) and it's for the future to see if it was worth it in the long term.

But in the meantime, I am Socrates (and no, the irony that his thoughts on writing were written down is not lost on me).

Friday, January 17, 2025

These robots enable employment

An incredible video about the development of robots controlled not solely by software but by people, enabling them to work jobs they otherwise could not do. While I guess you could technically call these “robots,” they come across more as “waldos,” devices that enable people to physically work from a remote location. In any case, I think it's a fantastic use of technology.

Saturday, January 18, 2025

I bet this comes with an automatic compacting bit-bucket for disposing of all that network noise

Setting up a media server on a PC or using a computer as a network audio renderer (endpoint) is easy nowadays. But the problem with computers is that they were never designed with audio in mind. While there are improvements for USB-based playback available (such as our JCAT USB Card FEMTO or JCAT USB Isolator), the network controller part of a PC remains noisy. JCAT delivers the solution with the NET Card FEMTO – the ultimate network interface designed specifically for transferring high-quality audio over LAN.

The sound image becomes crystal-clear: transparent, quiet, smooth and yet full of fine details you have never heard before. It will allow you to experience music at much deeper level.

NET CARD FEMTO - JCAT . precision sounds.

There are times when I think, are there people who actually buy this stuff? And yet, I come across this page:

The XACT PHANTOM™ USB cable is the ultimate choice for discerning audiophiles seeking unparalleled precision and natural sound. Handcrafted with meticulous attention to detail, each cable takes over 7 hours to complete, ensuring unmatched quality and performance. Our proprietary design includes precise mechanical and impedance pairing of the conductors, as well as a highly specialized twisting process. This meticulous construction is key to eliminating interference and preserving the purity of the audio signal.

The XACT PHANTOM™ USB cable features custom-designed aluminum connectors, engineered to provide a secure and stable connection. The result is a cable that delivers remarkable clarity, preserving the full natural richness of your music across the entire frequency range.

PHANTOM CABLES – XACT Audio

And now I'm thinking, I'm in the wrong industry! What's wrong with separating rich-yet-stupid audiophiles from their money? It's just too bad that the market for Eberhard Faber Design Art Marker No. 255 has, if you'll pardon the pun, dried up.

Sunday, February 02, 2025

Artisanal code to solve an issue only I have

Update on Tuesday, February 4th, 2025

The code presented below has a bug that has since been fixed. The code linked below contains the current, fixed code. That is all.

Now on with the original post …

I'm still using my ISP, despite the repeated letters that my service will go away at some point. But in the meantime, they keep reissuing a new IP address every so often, just to reiterate their dedication to serving up a dynamic IP address at no additional cost to me. One of the casualties of their new policy is the monitoring of the system logs on my public server. I used to send syslog output from my public server to my development system at home, just to make it easier to keep an eye on what's happening. No more.

What I needed was a reverse-proxy type of thing—where the client (my development machine) connects to the server, then the server sends a copy of the logs down the connection. A separate program would be easier to write than modifying the existing syslog daemon I'm using. It was a simple matter of telling the syslog daemon to forward a copy of all the logs to another program on the same system. Then I just had to write that program. To begin with, I need to load some modules:

local syslog  = require "org.conman.syslog"
local signal  = require "org.conman.signal"
local errno   = require "org.conman.errno"
local tls     = require "org.conman.nfl.tls"
local nfl     = require "org.conman.nfl"
local net     = require "org.conman.net"

The nfl module is my “server framework” for network based servers. Each TCP or TLS connection will be run on its own Lua thread, making the code easier to write than the typical “callback hell” that seems to be popular these days. I still need to make some low-level network calls, so I need the net module as well.

On to the configuration:

local SYSLOG  = "127.0.0.1"
local HOST    = "brevard.conman.org"
local CERT    = "brevard.conman.org.cert.pem"
local KEY     = "brevard.conman.org.key.pem"
local ISSUER  = "/C=US/ST=FL/O=Conman Laboratories/OU=Security Division/CN=Conman Laboratories CA/emailAddress=ca@conman.org"
local clients = {}

I didn't bother with a configuration file. This whole code base exists to solve an issue I have as simply as possible. At this point, a configuration file is overkill. The SYSLOG variable defines the address this server will use to accept output from syslog. Due to the way my current syslog daemon works, the port number it uses to forward logs is hard coded, so no need to specify the port. I'm going to run this over TLS because, why not? The tls module makes it easy to use, and it will make authentication trivial for this program. The CERT and KEY are the certificates needed, and these are generated by some scripts I wrote to play around with running my own simple certificate authority. My server is set to accept certificates signed by my simple certificate authority, which you can see in the definition of the ISSUER variable.

The clients variable is to track the clients that connect to collect syslog output. Even though I'll only ever have one client, it's easy enough to make this an array.

local laddr = net.address(SYSLOG,'udp',514)
local lsock = net.socket(laddr.family,'udp')
lsock:bind(laddr)

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co in pairs(clients) do
      nfl.schedule(co,data)
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

And now we create the local socket to receive output from syslog, and then add the socket to a table of sockets the framework uses, telling it to handle “read-ready” events. The data is read and then for each thread (Lua calls them “coroutines”) in the clients list, we schedule said thread to run with the data received from syslog.

local okay,err = tls.listen(HOST,514,client_main,function(conf)
  conf:verify_client()
  return conf:keypair_file(CERT,KEY)
     and conf:protocols("tlsv1.3")
end)

if not okay then
  syslog('error',"tls.listen()=%s",err)
  os.exit(1,true)
end

signal.catch('int')
signal.catch('term')

nfl.server_eventloop(function() return signal.caught() end)
os.exit(0,true)

And before we get to the routine that handles the clients, this is the code that creates a listening socket for TLS connections. We configure the listening socket to require the client send a certificate of its own (this is half of the authentication routine) and the certificates required to secure the connection, and the minimum protocol level. There's some error checking, setting up to catch some signals, then we start the main loop of the framework, which will terminate upon receiving a SIGINT (interrupt) or SIGTERM (terminate).

And finally, the code that runs on each TLS connection:

local function client_main(ios)
  ios:_handshake()
  
  if ios.__ctx:peer_cert_issuer() ~= ISSUER then
    ios:close()
    return
  end
  
  syslog('info',"remote=%s",ios.__remote.addr)
  clients[ios.__co] = true
  
  while true do
    local data = coroutine.yield()
    if not data then break end
    local okay,errmsg = ios:write(data,'\n')
    if not okay then
      syslog('error',"tls:read() = %s",errmsg)
      break
    end
  end
  
  syslog('info',"remote=%s disconnecting",ios.__remote.addr)
  clients[ios.__co] = nil
  ios:close()
end

The handshake is required to ensure that the client certificate is fully sent before we can check the issuer of said certificate. This is the extent of my authentication—I check that the certificate is issued from my simple certificate authority and not just any random but valid certificate being presented. Yes, there is a chance someone could forge a certificate claiming to be from my simple certificate authority, but to get such a certificate, some real certificate authority would need to issue someone else a certificate that matches the issuer on my certificates. I'm not seeing that happening any time soon (and if that happens, there are bigger things I need to worry about).

Once I've authenticated the certificate, I then pause the thread, waiting for data from the UDP socket (see above). If there's no data, then the client has dropped the connection and we exit out of the loop. We then write the data from syslog to the client and if that fails, we exit out of the loop.

Once out of the loop, we close the connection and that's pretty much all there is to it.

Yes, I realize that the calls to syslog() will be sent to the syslog daemon, only to be passed back to this program, but at least there's a log of this on the server.

I should also note that I do not attempt to track which logs have been sent and which haven't—that's a deliberate design decision on my part and I can live with missing logs on my development server. The logs are still recorded on the server itself so if it's important, I still have them, and this keeps this code simple.

The client code on my development server is even simpler:

local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"
local tls    = require "org.conman.net.tls"
local net    = require "org.conman.net"

local SYSLOG = "192.168.1.10"
local HOST   = "brevard.conman.org"
local CERT   = "/home/spc/projects/CA/ca/intermediate/certs/sean.conner.cert.pem"
local KEY    = "/home/spc/projects/CA/ca/intermediate/private/sean.conner.key.pem"

Again, load the required modules, and configure the program. Much like the server, having a configuration file for this is way overkill, thus the above variables.

signal.catch('int')
signal.catch('term')

local addr = net.address(SYSLOG,'udp',514)
local sock = net.socket(addr.family,'udp')

connect(sock,addr)

The code sets up some signal handlers, creates a socket to send the data to syslog, and calls the connect() function, defined below.

local function connect(sock,addr)
  local ios,err = tls.connect(HOST,514,function(conf)
  return conf:keypair_file(CERT,KEY)
     and conf:protocols("tlsv1.3")
  end)
  
  if not ios then
    io.stderr:write("Failure: ",err," retrying in a bit ...\n")
    clock.sleep(1)
  else
    io.stderr:write("\n\n\nConnected\n")
    main(ios,sock,addr)
  end
  
  if not signal.caught() then
    return connect(sock,addr)
  end
end

The connect() function tries to connect to the server with the given certificates. If it fails (and I expect this to happen when I get reassigned an IP address) it waits for a bit and retries. If the connection succeeds though:

local function main(ios,sock,addr)
  for data in ios:lines() do
    if signal.caught() then
      ios:close()
      os.exit(0,true)
    end
    sock:send(addr,data)
  end  
  ios:close()
end

The code just loops, reading lines from the server and then sending them directly to the syslog daemon. On any error (like the connection dropping because my IP address got reassigned), the loop ends, we close the connection and return, falling into the retry loop in the connect() function.

In case anyone is interested, here's the source code for the server and the client.


And now some metacommentary on the artisanal code I just wrote

When I wrote the two programs to retrieve output from syslog from my public server, the thing I did not use was any AI program (aka Cat) to help with the design or the code. It was a simple problem with a straightforward solution, and it's sad to think that more and more programmers are reaching for Cat for even simple programs.

I wonder though—is the popularity of Cat because of business demands that incentivize quick hacks to get new features and/or bug fixes and disincentivize deep knowledge or methodical implementations? Because of the constant churn of frameworks du jour and languages with constantly changing implementations? Because of sprawling code bases that not a single person can understand as a whole? Because businesses want to remove expensive programmers who might say “no”?

Anyway …

I don't expect the code I wrote to be of use for anyone else. The issue I'm solving is probably unique to me (and to the death of the true peer-to-peer Internet but I digress). But I also feel that such simple programs, ones that can be thought of as “disposable” almost, are not popular these days.

Although I'll admit that could be just a bias I'm forming from some forums I hang out on. These programs are too simple, there's no need for Docker (which is what? A tar file with some required files for use with a custom program to get around the fact that shared libraries are a mess?) or Kubernetes (which is what? A Google project Google doesn't even use but convinced enough people it's required to run at Google Scale?). Yeah, there are a few dependencies but not the hundreds you would get from using something like NodeJS.

I don't know … I'm just … in a mood.

Tuesday, February 04, 2025

Concurrency is tricky

As I was writing the previous entry I got the nagging feeling that something wasn't quite right with the code. I got distracted yesterday helping a friend bounce programming issues off me, but after that, I was able to take a good look at the code and figured out what I did wrong.

Well, not “wrong” per se; the code worked. It's just that it could fail catastrophically under the right conditions (or maybe the wrong conditions, depending upon your view).

But first, a bit about how my network server framework works. The core bit of code is this:

local function eventloop(done_f)
  if done_f() then return end

  -- calculate and handle timeouts
  -- each coroutine that timed out is
  -- scheduled to run on the RUNQUEUE,
  -- with nil and ETIMEDOUT.

  SOCKETS:wait(timeout)
  for event in SOCKETS:events() do
    event.obj(event)
  end

  while #RUNQUEUE > 0 do
    -- run each coroutine in the run queue until
    -- it either yields or returns (meaning
    -- it's finished running).
  end

  return eventloop(done_f)
end

Details are omitted (the full gory details are here) but in general, the event loop calls a passed-in function to check if we need to shut down, then calculates a timeout value while checking for coroutines that registered a timeout. If any did, we add the coroutine to a run queue with nil and ETIMEDOUT to inform the resuming coroutine that it timed out. Then we scan a set of network sockets for activity with SOCKETS:wait() (on Linux, this ends up calling epoll_wait(); on BSDs, kqueue(); and on most other Unix systems, poll()). We then call the handling function for each event. These can end up creating new coroutines and scheduling coroutines to run (these will be added to the run queue). And then for each coroutine in the run queue, we run it. Lather, rinse, repeat. Simple enough.

Now, on to the code I presented. This code registers a function to run when the given UDP socket receives a packet of data, and schedules a number of coroutines waiting for data to run. This happens in the eventloop() function.

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co in pairs(clients) do
      nfl.schedule(co,data) -- problem
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

I've noted a problematic line of code here.

And now the core of the routine to handle a TLS connection. This code yields to receive data, then writes the data to the TLS socket.

  while true do
    local data = coroutine.yield()
    if not data then break end
    local okay,errmsg = ios:write(data,'\n') -- <<< HERE
    if not okay then
      syslog('error',"tls:read() = %s",errmsg)
      break
    end
  end

I've marked where the root cause lies, and it's pretty subtle, I think. The core issue is that ios:write() here could block, because the kernel output buffer is full and we need to wait for the kernel to send it. But the code that handles the UDP socket just assumes that the TLS coroutine is ready for more data. If ios:write() blocks and more UDP data comes in, the coroutine is prematurely resumed with the data, which the TLS thread takes as the write having succeeded; it then yields, and things get … weird, as the UDP side and the TLS side are now out of sync with each other. This, fortunately, hasn't triggered on me.

Yet.

It could, if too much was being logged to syslog. I wrote the following code to test it out:

#include <syslog.h>

#define MSG " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXUZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"

int main(void)
{
  int i;
  for (i = 0 ; i < 500 ; i++)
    syslog(LOG_DEBUG,"%3d " MSG MSG MSG,i);
  return 0;
}

And sure enough, the spice data stopped flowing.

What I needed to do was queue up the log messages to a given client, and only schedule it to run when it's waiting for more data. A few failed attempts followed—they were all based on scheduling the TLS thread when X number of messages were queued up (I tried one, then zero; neither worked). It worked much better by using a flag to indicate when the TLS coroutine wanted to be scheduled or not.

The UDP socket code is now:

nfl.SOCKETS:insert(lsock,'r',function()
  local _,data,err = lsock:recv()
  if data then
    for co,queue in pairs(clients) do
      table.insert(queue,data)
      if queue.ready then
        nfl.schedule(co,true)
      end
    end
  else
    syslog('error',"recv()=%s",errno[err])
  end
end)

The client list now contains a list of logs to send, along with a flag that the TLS coroutine sets indicating if it needs running or not. This takes advantage of Lua's tables which can have a hash part (named indices) and an array part, so we can include a flag in the queue.

And now the updated TLS coroutine:

local function client_main(ios)
  local function main()
    while #clients[ios.__co] > 0 do
      local data     = table.remove(clients[ios.__co],1)
      local okay,err = ios:write(data,'\n')
      if not okay then
        syslog('error',"tls:write()=%s",err)
        return
      end
    end
    
    clients[ios.__co].ready = true
    if not coroutine.yield() then
      return
    end
    clients[ios.__co].ready = false
    return main()
  end
  
  ios:_handshake()
  
  if ios.__ctx:peer_cert_issuer() ~= ISSUER then
    ios:close()
    return
  end
  
  syslog('info',"remote=%s",ios.__remote.addr)
  clients[ios.__co] = { ready = false }
  main()
  clients[ios.__co] = nil
  syslog('info',"remote=%s disconnecting",ios.__remote.addr)
  ios:close()
end

The core of the routine, the nested function main(), does the real work here. When main() starts, the flag for queue readiness is false. It then runs through its input queue, sending data to the client. Once that is done, it sets the queue readiness flag to true and then yields. Once it resumes, it sets the queue readiness flag back to false and (through a tail call) starts over again.

This ensures that logs are queued properly for delivery, and running the C test program again showed it works.

Tuesday, February 11, 2025

Two videos on how we figured out our solar system just based on observations alone, long before we left the surly bonds of Earth

The video “Terence Tao on how we measure the cosmos” was very interesting to watch, as Terence goes into depth on how people in the past, and by past, I mean the distant past, figured out the earth was a sphere, how big that sphere was, and even reasoned that the earth went around the sun, long before the Christian Church even existed! He also covers the method that Kepler used to figure out the orbits of Earth and the planets, when at the time we didn't quite know the distances to them, and all we had were positions in the sky to go by.

Incredible.

Also, a second video on how the moons of Jupiter (yes, it's not at all about Pluto despite the title) revealed much about how our solar system works. It even revealed that light had a finite speed.

I think if these methods were more widely known, how we figured out the shape of the Earth, the size of the moon and sun, and how orbits worked, then people wouldn't have the mistaken belief in a flat earth holding up the firmaments.

Update on Tuesday, March 18th, 2025

Part Two of “Terence Tao on how we measure the cosmos” has been released.


I never got the memo on “copyover servers”

There’s only so much you can do with builder rights on someone else’s MUD. To really change the game, you needed to be able to code, and most MUDs were written in “real languages” like C. We’d managed to get a copy of Visual C++ 6 and the CircleMUD source code, and started messing about. But the development cycle was pretty frustrating — for every change, you had to recompile the server, shut it down (dropping everyone’s connections), bring it back up, and wait for everyone to log back in.

Some MUDs used a very cool trick to avoid this, called “copyover” or “hotboot”. It’s an idiom that lets a stateful server replace itself while retaining its PID and open connections. It seemed like magic back then: you recompiled the server, sent the right command, everything froze for a few seconds, and (if you were lucky) it came back to life running the latest code. The trick is simple but I can’t find a detailed write-up, so I wanted to write it out while I thought of it.

Via Lobsters, How Copyover MUD Servers Worked | Blog | jackkelly.name

Somehow, in all my years of programming (and the few years I was looking into the source code of various MUDs back in the early 90s) I never came across this method of starting an updated version of a server without losing any network connections. In hindsight, it's an obvious solution—it just never occurred to me to do this.



Saturday, March 01, 2025

Fixing a 27-year-old bug that only now just got triggered

I will, from time to time, look at various logs for errors. And when I looked at the error log for my web server, intermixed with errors I have no control over like this:

[Tue Feb 25 10:41:19.504140 2025] [ssl:error] [pid 16571:tid 3833293744] [client 206.168.34.92:47678] AH02032: Hostname literature.conman.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Tue Feb 25 12:39:33.768053 2025] [ssl:error] [pid 16408:tid 3892042672] [client 167.94.146.59:50798] AH02032: Hostname hhgproject.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Sat Mar 01 05:34:44.029898 2025] [core:error] [pid 21954:tid 3841686448] [client 121.36.96.194:53710] AH10244: invalid URI path (/cgi-bin/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/bin/sh)
[Sat Mar 01 05:34:45.077056 2025] [core:error] [pid 23369:tid 3875257264] [client 121.36.96.194:53722] AH10244: invalid URI path (/cgi-bin/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/bin/sh)

I found a bunch of errors that concerned me:

[Sun Feb 23 10:14:54.644036 2025] [cgid:error] [pid 16408:tid 3715795888] [client 185.42.12.144:51022] End of script output before headers: contact.cgi, referer: https://www.hhgproject.org/contact.cgi
contact.cgi: src/Cgi/UrlDecodeChar.c:41: UrlDecodeChar: Assertion `((*__ctype_b_loc ())[(int) ((*src))] & (unsigned short int) _ISxdigit)' failed.

It's obvious that a call to assert() failed in the function UrlDecodeChar() due to some robot failing to encode a web request properly. Let's see what the code is actually doing:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    assert(isxdigit(*src));
    assert(isxdigit(*(src+1)));
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The problem was using assert() to check the results of some I/O—that's not what assert() is for. I think I was being lazy when I used those assertions and didn't bother with the proper coding practice of returning an error. Curious as to when I added this code, I checked the history and from December 3rd, 2004:

char UrlDecodeChar(char **psrc)
{
  char *src;
  int	c;

  ddt(psrc  != NULL);
  ddt(*psrc != NULL);

  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    ddt(isxdigit(*src));
    ddt(isxdigit(*(src+1)));
    c	 = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The history in the current repository goes no further back due to losing my CVS repositories, and it's interesting to see that this function is the same as it was back then (with the difference of using my own version of assert(), called ddt(), back in the day). Some further sleuthing convinced me that I wrote this code back in 1997. This function is old enough to not only vote, be drafted, get drunk, and sign contracts, but to be removed from its parents' health insurance!

Good lord!

It's not how I would write that function today.

It's even more remarkable that I haven't seen this assert() trigger in all those years.

The fix was easy:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit(*src))     return '\0';
    if (!isxdigit(*(src+1))) return '\0';
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

And propagating the error back up the call chain. This does result in a new major version for CGILib, since I do follow semantic versioning and this is, technically speaking, a change in the public API, even though it's less than 10 lines of code (out of 8,000+).

Monday, March 03, 2025

Yelling at clouds

I will admit—these are kneejerk reactions, but they're honestly my reactions to reading the following statements. I know, I know, hanging onions off our belt is long out of style.

And get off my lawn!

Anyway … statement the first:

Think jq, but without having to ask an LLM to write the query for you.

Via Lobsters, A float walks into a gradual type system

So … using jq is so hard you need to use a tool that will confabulate ¼ of the time in order to construct a simple query? Is that what you are saying? That you can't be bothered to use your brain? Just accept the garbage spewed forth by a probabilistic text slinger?

Really?

And did you use an LLM to help write the code? If not, why not?

Sigh.

And statement the second:

… and most importantly, coding can be social and fun again.

Via Lobsters, introducing tangled

If I had known that programming would become a team sport, I, an introvert, would have choosen a different career. Does XXXXX­XX everything have to be social? Why can't it just be fun? I need to be micromanaged as well?


A quirk of the Motorola 6809 assemblers

I just learned an interesting bit of trivia about 6809 assembly language on a Discord server today. When Motorola designed the 6809 assembler, they made a distinction between the use of n,PC and n,PCR in the indexing mode. Both of those make a reference based off the PC register, but in assembly language they defined, using n,PC means use the literal value of n as the distance, whereas n,PCR means generate the distance between n and the current value of the PC register.

I never knew that.

I just looked and all the materials I had on the 6809 use the n,PCR method everywhere, yet when I wrote my assembler, I only support n,PC and it always calculates the distance. I think I forgot that it should have been n,PCR because on the 68000 (which I also programmed, and was also made by Motorola) it always used n,PC.

And I don't think I'll change my assembler as there does exist a method to use an arbitrary value of n as a distance: LDA (*+3)+n,PC. The asterisk evaluates to the address of the current instruction, and by adding 3 you get the address of the next instruction, which in the PC-relative addressing mode, is a distance of 0. Then n will be the actual offset used in the instruction. Yes, it's a bit convoluted, but it's a way to get how Motorola originally defined n,PC.

And apparently, Motorola defined it that way to make up for less intelligent assemblers back in the day due to memory constraints. We are long past those days.

Tuesday, March 18, 2025

Measuring the cosmos, part II

Last month, I mentioned part one of how we measured the night sky, and now, part two of “Terence Tao on how we measure the cosmos”.


A network of bloggers, a reel of YouTubers and other collective nouns

While I just made up the “network of bloggers” and “reel of YouTubers,” other collective nouns for groups, like a gaggle of geese, a murder of crows, or a pod of whales, are not quite as old as they may seem, and were largely made up just a few hundred years ago, and there were a lot more than we use today, according to this video. Neat.


Who serves whom?

The narrative around these bots is that [AIs] are there to help humans. In this story, the hospital buys a radiology bot that offers a second opinion to the human radiologist. If they disagree, the human radiologist takes another look. In this tale, AI is a way for hospitals to make fewer mistakes by spending more money. An AI assisted radiologist is less productive (because they re-run some x-rays to resolve disagreements with the bot) but more accurate.

In automation theory jargon, this radiologist is a "centaur" – a human head grafted onto the tireless, ever-vigilant body of a robot

Of course, no one who invests in an AI company expects this to happen. Instead, they want reverse-centaurs: a human who acts as an assistant to a robot. The real pitch to hospital is, "Fire all but one of your radiologists and then put that poor bastard to work reviewing the judgments our robot makes at machine scale."

Pluralistic: AI can't do your job (18 Mar 2025) – Pluralistic: Daily links from Cory Doctorow

This has always been my fear of the recent push of LLM backed AI—not that they would help me do my job better, but that I existed to help it do its job better (if I'm even there).

Wednesday, March 19, 2025

How I vibe code

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Via Flutterby, Andrej Karpathy on X

Good Lord! If you thought software today was bloated and slow, this sounds like it would produce software that is gigantically glacial in comparison (and by “embrace exponentials” I think he means “accept code with O(n2), O(2n) or even O(n!) behavior”).

That's not how I would “vibe code.” No, to me, “vibe coding” is:

  1. Don't necessarily worry about the behavior of the code—make it work but at least try to avoid O(2n) or worse algorithms, then make it right, then fast.
  2. Don't use version control! If you make a mistake and need to revert, revert by hand, or carry on through the bad code. And avoid using directores like “src.1/”, “src.2/“ or “src-no-really-this-works/”—that's still a form of version control (albeit a poor man's version control). Power through your mistakes.
  3. Don't bother with “unit tests,” “integration tests,” TDD or even BDD. I'm not saying don't test, just don't write tests. Want to refactor? Go ahead—bull through the changes, or don't. It's your code. Yes, this does mean mostly manual testing, and having a file of test data is fine—just don't write test code.
  4. Format the code however you want! Form your own opinions on formatting. Have some soul in your code for once.
  5. This isn't a team sport, so no pair programming! This is vibe coding, not vibe partying.
  6. Remember the words of Bob Ross: “we don't make mistakes, just happy little accidents.”
  7. Go with the flow. Just Do It™!

Now that I think about it, this is pretty much how programmers wrote code on home computers in the late 70s/early 80s. Funny that. But just blindly accepting LLM-written code? Good luck in getting anything to run correctly.

Sheesh.

Friday, March 21, 2025

A different approach to blocking bad webbots by IP address

Web crawlers for LLM-based companies, as well as some specific solutions to blocking them, have been making the rounds in the past few days. I was curious to see just how many were hitting my web site, so I ran a few queries over the log files. To ensure consistent results, I decided to query the log file for last month:

Quick summary of results for February 2025
total requests 468439
unique IPs 24654
Top 10 requests per IP
IP Requests
4.231.104.62 43242
198.100.155.33 26650
66.55.200.246 9057
74.80.208.170 8631
74.80.208.59 8407
216.244.66.239 5998
4.227.36.126 5832
20.171.207.130 5817
8.29.198.26 4946
8.29.198.25 4807

(Note: I'm not concerned about protecting any privacy here—given the number of results, there is no way these are any individual. These are all companies hitting my site, and if companies are mining their data for my information, I'm going to do the same to them. So there.)

But it became apparent that it's hard to determine which requests are coming from a single entity—it's clear that a company can employ a large pool of IP addresses to crawl the web, and it's hard to figure out what IPs are under control of which company.

Or is it?

An idea suddenly hit me—a stray thought from the days when I was wearing a network admin hat I recalled that BGP routing basically knows the network boundaries for networks as it's based on policy routing via ASNs. I wonder if I could map IP addresses to ASNs? A quick search and I found my answer—yes! Within a few minutes, I had converted a list of 24,654 unique IP addresses to 1,490 unique networks, I was then able to rework my initial query to include the ASN (or rather, the human readable version instead of just the number):

Requests per IP/ASN
IP Requests AS
4.231.104.62 43242 MICROSOFT-CORP-MSN-AS-BLOCK, US
198.100.155.33 26650 OVH, FR
66.55.200.246 9057 BIDDEFORD1, US
74.80.208.170 8631 CSTL, US
74.80.208.59 8407 CSTL, US
216.244.66.239 5998 WOW, US
4.227.36.126 5832 MICROSOFT-CORP-MSN-AS-BLOCK, US
20.171.207.130 5817 MICROSOFT-CORP-MSN-AS-BLOCK, US
8.29.198.26 4946 FEEDLY-DEVHD, US
8.29.198.25 4807 FEEDLY-DEVHD, US

Now, I was curious as to how they identified themselves, so I reran the query to include the user agent string. The top eight identified themselves consistently:

Requests per Agent
Agent Requests
Go-http-client/2.0 43236
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/132.0.0.0 Safari/537.36 26650
WF search/Nutch-1.12 9057
Mozilla/5.0 (compatible; ImagesiftBot; +imagesift.com) 8631
Mozilla/5.0 (compatible; ImagesiftBot; +imagesift.com) 8407
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) 5998
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot) 5832
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot) 5817

The last two, however had a changing user agent string:

Identifiers for 8.29.198.26
Agent Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; ) 1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; ) 1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; ) 938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; ) 811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; ) 94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; ) 17
Identifiers for 8.29.198.25
Agent Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; ) 1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; ) 1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; ) 905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; ) 741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; ) 90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; ) 11

I'm not sure what the difference is between polling and fetching (checking the URLs shows two identical pages, only differing in “Poller” and “Fetcher.” But looking deeper into that is for another post.

The next request I did was to see how many IPs (that hit my site in February) map to a particular ASN, and the top 10 are:

IPs per AS
AS Count
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN 4034
AMAZON-02, US 1733
HWCLOUDS-AS-AP HUAWEI CLOUDS, HK 1527
GOOGLE-CLOUD-PLATFORM, US 996
COMCAST-7922, US 895
AMAZON-AES, US 719
TENCENT-NET-AP-CN Tencent Building, Kejizhongyi Avenue, CN 635
MICROSOFT-CORP-MSN-AS-BLOCK, US 615
AS-VULTR, US 599
ATT-INTERNET4, US 472

So Alibaba US crawled my site from 4,034 different IP addresses—I haven't done the query to figure out how many requests each ASN did, but it should be a straightforward thing to just replace IP address with the ASN to get a better count of which company is crawling my site the hardest.

And now I'm thinking, I wonder if instead of a form of ad-hoc banning of single IP addresses, or blocking huge swaths of IP addresses (like 47.0.0.0/8, it might not be better to block per ASN? The IP to ASN mapping service I found makes it quite easy to get the ASN of an IP address (and to map the ASN to an human-readable name), Instead of, for example, blocking 101.32.0.0/16, 119.28.0.0/16, 43.128.0.0/14, 43.153.0.0/16 and 49.51.0.0/16 (which isn't an exaustive list by any means) just block IPs belonging to ASN 132203, otherwise known as “TENCENT-NET-AP-CN Tencent Building, Kejizhongyi Avenue, CN.”

I don't know how effective that idea is, but the IP-to-ASN site I found does offer the information via DNS, so it shouldn't be that hard to do.


A deeper dive into mapping web requests via ASN, not by IP address

I went ahead and replaced IP addresses with ASNs in the log file to find the network that sent the most requests to my blog for the month of February.

Top 10 networks requesting a page from blog
MICROSOFT-CORP-MSN-AS-BLOCK, US 78889
OVH, FR 31837
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN 25019
HETZNER-AS, DE 23840
GOOGLE-CLOUD-PLATFORM, US 21431
CSTL, US 17225
HURRICANE, US 15495
AMAZON-AES, US 14430
FACEBOOK, US 13736
AKAMAI-LINODE-AP Akamai Connected Cloud, SG 12673

Even though Alibaba US has the most unique IPs hitting my blog, Microsoft is still the network making the most requests. So let's see how Microsoft presents itself to my web server. Here are the user agents it sends:

Web agents from the Microsoft Network
agent requests
Go-http-client/2.0 43236
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot) 23978
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36 7953
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0 2955
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot 210
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot 161
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html) 123
'DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)' 122
Python/3.9 aiohttp/3.10.6 28
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.36 Safari/537.36 14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.114 Safari/537.36 14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68 10
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html) 10
DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html) 10
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36 6
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.143 Safari/537.36 6
python-requests/2.32.3 5
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36 5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0 4
DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot) 4
Twingly Recon 3
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot) 3
Mozilla/5.0 (compatible; Twingly Recon; twingly.com) 3
python-requests/2.28.2 2
newspaper/0.9.1 2
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36 2
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b 2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 2
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) Bot 1
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) 1
Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 skype-url-preview@microsoft.com 1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36 1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48 1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) Bot 1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) 1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) Bot 1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) 1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) Bot 1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) 1

The top result comes from a single IP address and probably requires a separate post about it, since it's weird and annoying. But the rest—you got Bing, you got OpenAI, you got several Mastodon instances—it seems like most of these are from Microsoft's cloud offering. A mixture of things.

What about Facebook?

Web agents from Facebook
agent requests
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) 13497
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) 207
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 12
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/59.0 4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 Edg/132.0.0.0 2
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 2

Hmm … looks like I have a few readers at Facebook, but other than that, nothing terribly interesting.

Alibaba, on the other hand, is frightening. Out of 25,019 requests, it presented 581 different user agents. From looking at what was requested, I don't think it's 500 Chinese people reading my blog—it's defintely bots crawling my site (and amusingly, there are requests to /robots.txt file, but without a proper user agent to go by, it's hard to block it via that file).

I can think of one conclusion here—if you do filter by ASN, it can help tremendously, but it also comes with possibly blocking legitimate traffic.


Still no information on who “The Knowledge AI” is or was

Back in July 2019 I was investigating some bad bots on my website when I came across the bot that identified itself simply as “The Knowledge AI” that was the number one robot hitting my site. Most bots that identify themselves will give a URL to a page that describes their usage like Barkrowler (to pick one that recently crawled my site). But not so “The Knowledge AI”. That was all it said, “The Knowledge AI”. It was very hard to Google, but I wouldn’t be surprised if it was OpenAI.

The earliest I can find “The Knowledge AI” crawling my site was April of 2018, and despite starting on April 16th, it was the second most active robot that month. In May it was the number one bot, and it stayed there through October of 2022, after which it pretty much dropped—from 32,000+ in October of 2022 to 85 in November of 2022 (about 4½ years). It was sporadic, showing up in single digit hits until January of 2024. It may be still crawling my site, but if it is, it is no longer identifying itself.

I don’t know if “The Knowledge AI” was an LLM company crawling, but if it was, not giving a link to explain the bot is suspicious. It’s the rare crawler that doesn’t identify itself with at least a URL to describe it. The fact that it took the number one crawling spot on my site for 4 ½ years is suspicious. As robots go, it didn’t affect the web server all that much (I’ve come across worse ones), and well over 90% of its requests were valid (unlike MJ12, which had a 75% failure rate). And my /robots.txt file doesn’t exclude any robot from scanning, so I can’t really complain about it.

My comment on “Mitigating SourceHut's partial outage caused by aggressive crawlers | Lobsters”

Even though the log data is a few years old, I don't think that IPs change from ASN to ASN all that much (but I could be wrong on that). I checked the IPs used by “The Knowledge AI” in May 2018, and in October 2022, and they didn't change that much. They were still the same /24 networks across that time.

Looking up the information today is very disappointing—Hurricane Electric LLC., a backbone provider.

So no real information about who “The Knowledge AI” might have been.

Sigh.


Now a bit about feed readers

There are a few bots acting less than optimally that aren't some LLM-based company scraping my site. I think. Anyway, the first one I mentioned:

Identifiers for 8.29.198.26
Agent Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; ) 1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; ) 1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; ) 938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; ) 811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; ) 94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; ) 17
Identifiers for 8.29.198.25
Agent Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; ) 1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; ) 1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; ) 905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; ) 741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; ) 90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; ) 11

This is feedly, a company that offers a news reader (and I'd like to thank the 67 subscribers I have—thank you). The first issue I have about this client is the apparent redundant requests from six different clients. An issue because I only have three different feeds, the Atom feed, the RSS feed and the the JSON feed. The poller seems to be acting correctly—16 subscribers to my Atom feed and 6 to the RSS feed. The other four? The fetchers? I'm not sure what's going on there. There's one for the RSS feed, and three for the Atom feed. And one of them is a typo—it's requesting “//index.atom” instead of the proper “/index.atom” (but apparently Apache allows it). How do I have 16 subscribers to “/index.atom” and another 37 for “/index.atom”? What exactly, is the difference between the two? And can't you fix the “//index.atom” reference? To me, that's an obvious typo, one that could be verified by retreiving both “/index.atom” and “//index.atom” and seeing they're the same.

Anyway, the second issue I have with feedly is their apparent lack of caching on their end. They do not do a conditional request and while they aren't exactly slamming my server, they are making multiple requests per hour, and for a resource that doesn't change all that often (excluding today that is).

Then there's the bot at IP address 4.231.104.62. It made 43,236 requests to get “/index.atom”, 5 invalid requests in the form of “/gopher://gopher.conman.org/0Phlog:2025/02/…” and one other valid request for this page. It's not the 5 invalid requests or the 1 valid request that has me weirded out—it's the 43,236 to my Atom feed. That's one request every 55 seconds! And even worse—it's not a conditional request! Of all the bots, this is the one I feel most like blocking at the firewall level—just have it drop the packets entirely.

At least it supports compressed results.

Sheesh.

As for the rest—of the 109 bots that fetched the Atom feed at least once per day (I put the cut off at 28 requests or more durring February), only 31 did so conditionally. That's a horrible rate. And of the 31 that did so conditionally, most don't support compression. So on the one hand, the majority of bots that fetch the Atom feed do so compressed. On the other hand, it appears that the bots that do fetch conditionally most don't support compression.

Sigh.

Wednesday, March 26, 2025

Notes on blocking spam by filtering on ASN

So now that I can classify IP addresses by ASN, I thought I might see how it could help with spam email. I'm already using an ansi-spam agent to cut down on spam, so maybe filtering by ASN could cut down even more. The last time I looked into additional means of spam avoidance, the use of SPF wasn't worth the effort.

And I'm afraid the effort of blocking via ASN won't be worth the effort either. Looking over email attempts over the past month, the top 10 networks who sent email to my server, from 5,181 individual emails:

Top 10 emailers to my server
AS Count
IOMART-AS, 375
IDNIC-IDCLOUDHOST-AS-ID 369
PAIR-NETWORKS, 263
MICROSOFT-CORP-MSN-AS-BLOCK, 246
AS-COLOCROSSING, 152
EMERALD-ONION, 124
GOOGLE, 122
SPARKPOST, 120
TZULO, 112
AMAZON-02, 106

Unlike the web (or even Gemini or gopher) there isn't one dominant network here—it's all spread out. I don't think it's really worth the effort to block via ASN for spam. At least for my email server.

Obligatory Picture

Dad was resigned to the fact that I was, indeed, a landlubber, and turned the boat around yet again …

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.