The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Saturday, March 01, 2025

Fixing a 27-year-old bug that only just now got triggered

I will, from time to time, look at various logs for errors. And when I looked at the error log for my web server, intermixed with errors I have no control over like this:

[Tue Feb 25 10:41:19.504140 2025] [ssl:error] [pid 16571:tid 3833293744] [client 206.168.34.92:47678] AH02032: Hostname literature.conman.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Tue Feb 25 12:39:33.768053 2025] [ssl:error] [pid 16408:tid 3892042672] [client 167.94.146.59:50798] AH02032: Hostname hhgproject.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Sat Mar 01 05:34:44.029898 2025] [core:error] [pid 21954:tid 3841686448] [client 121.36.96.194:53710] AH10244: invalid URI path (/cgi-bin/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/bin/sh)
[Sat Mar 01 05:34:45.077056 2025] [core:error] [pid 23369:tid 3875257264] [client 121.36.96.194:53722] AH10244: invalid URI path (/cgi-bin/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/bin/sh)

I found a bunch of errors that concerned me:

[Sun Feb 23 10:14:54.644036 2025] [cgid:error] [pid 16408:tid 3715795888] [client 185.42.12.144:51022] End of script output before headers: contact.cgi, referer: https://www.hhgproject.org/contact.cgi
contact.cgi: src/Cgi/UrlDecodeChar.c:41: UrlDecodeChar: Assertion `((*__ctype_b_loc ())[(int) ((*src))] & (unsigned short int) _ISxdigit)' failed.

It's obvious that a call to assert() failed in the function UrlDecodeChar() due to some robot failing to encode a web request properly. Let's see what the code is actually doing:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    assert(isxdigit(*src));
    assert(isxdigit(*(src+1)));
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The problem was using assert() to check the results of some I/O; that's not what assert() is for. An assertion is for catching programmer errors, not for validating input that comes from outside the program. I think I was being lazy when I used those assertions and didn't bother with the proper coding practice of returning an error. Curious as to when I added this code, I checked the history and found this from December 3rd, 2004:

char UrlDecodeChar(char **psrc)
{
  char *src;
  int	c;

  ddt(psrc  != NULL);
  ddt(*psrc != NULL);

  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    ddt(isxdigit(*src));
    ddt(isxdigit(*(src+1)));
    c	 = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The history in the current repository goes no further back because I lost my CVS repositories. It's interesting to see that this function is the same as it was back then (the only difference being that it used my own version of assert(), called ddt(), back in the day). Some further sleuthing convinced me that I wrote this code back in 1997. This function is old enough to not only vote, be drafted, get drunk, and sign contracts, but be removed from its parents' health insurance!

Good lord!

It's not how I would write that function today.

It's even more remarkable that I haven't seen this assert() trigger in all those years.

The fix was easy:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit(*src))   return '\0';
    if (!isxdigit(*(src+1))) return '\0';
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

The rest of the work was propagating the error back up the call chain. This does result in a new major version for CGILib, because I follow semantic versioning and this is, technically speaking, a change in the public API, even though it is less than 10 lines of code (out of 8,000+).
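As a rough illustration of what "propagating the error back up the call chain" can look like, here is a minimal sketch. It is not the CGILib code: the names `hex_value()`, `url_decode_char()` and `url_decode_string()` are hypothetical, and it reports failure through a `bool` return value instead of returning '\0' the way the fix above does. It also casts to `unsigned char` before calling isxdigit(), since passing a negative `char` to the `<ctype.h>` functions is undefined.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: value (0..15) of a character already known to
   be a hex digit.  Assumes 'a'..'f' are contiguous, as in ASCII. */
static int hex_value(int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return tolower(c) - 'a' + 10;
}

/* Hypothetical variant of UrlDecodeChar(): decodes one character from
   *psrc into *out and advances *psrc.  Returns false on a malformed
   %-escape.  The assert()s still guard against programmer errors
   (NULL pointers); bad input from the network gets a return value. */
bool url_decode_char(char *out, const char **psrc)
{
  const char *src;
  char        c;

  assert(out   != NULL);
  assert(psrc  != NULL);
  assert(*psrc != NULL);

  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    /* The && short-circuits, so src[1] is never read when src[0] is
       the terminating NUL. */
    if (!isxdigit((unsigned char)src[0]) || !isxdigit((unsigned char)src[1]))
      return false;
    c    = (char)(hex_value((unsigned char)src[0]) * 16
                + hex_value((unsigned char)src[1]));
    src += 2;
  }
  *psrc = src;
  *out  = c;
  return true;
}

/* The caller passes the failure along instead of aborting.  dest must
   have room for strlen(src) + 1 bytes. */
bool url_decode_string(char *dest, const char *src)
{
  assert(dest != NULL);
  assert(src  != NULL);

  while (*src != '\0')
  {
    if (!url_decode_char(dest, &src))
      return false;
    dest++;
  }
  *dest = '\0';
  return true;
}
```

With something like this, a CGI script can answer a badly encoded request with an error response rather than dying before it has written any headers.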

Monday, March 03, 2025

Yelling at clouds

I will admit—these are kneejerk reactions, but they're honestly my reactions to reading the following statements. I know, I know, hanging onions off our belt is long out of style.

And get off my lawn!

Anyway … statement the first:

Think jq, but without having to ask an LLM to write the query for you.

Via Lobsters, A float walks into a gradual type system

So … using jq is so hard you need to use a tool that will confabulate ¼ of the time in order to construct a simple query? Is that what you are saying? That you can't be bothered to use your brain? Just accept the garbage spewed forth by a probabilistic text slinger?

Really?

And did you use an LLM to help write the code? If not, why not?

Sigh.

And statement the second:

… and most importantly, coding can be social and fun again.

Via Lobsters, introducing tangled

If I had known that programming would become a team sport, I, an introvert, would have chosen a different career. Does XXXXXXX everything have to be social? Why can't it just be fun? I need to be micromanaged as well?


A quirk of the Motorola 6809 assemblers

I just learned an interesting bit of trivia about 6809 assembly language on a Discord server today. When Motorola designed the 6809 assembler, they made a distinction between the use of n,PC and n,PCR in the indexing mode. Both of those make a reference based off the PC register, but in the assembly language they defined, n,PC means use the literal value of n as the distance, whereas n,PCR means generate the distance between n and the current value of the PC register.

I never knew that.

I just looked, and all the materials I have on the 6809 use the n,PCR method everywhere, yet when I wrote my assembler, I only supported n,PC, and it always calculates the distance. I think I forgot that it should have been n,PCR because the 68000 (which I also programmed, and which was also made by Motorola) always used n,PC.

And I don't think I'll change my assembler, as there does exist a method to use an arbitrary value of n as a distance: LDA (*+3)+n,PC. The asterisk evaluates to the address of the current instruction, and by adding 3 you get the address of the next instruction, which, in the PC-relative addressing mode, is a distance of 0. Then n will be the actual offset used in the instruction. Yes, it's a bit convoluted, but it's a way to get how Motorola originally defined n,PC.
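The arithmetic behind that trick can be sketched in a few lines of C. This is only an illustration, not code from my assembler: `pcr_offset()` is a hypothetical name, and the 3-byte instruction length is taken from the LDA example above.

```c
#include <assert.h>

/* Hypothetical sketch of the "always calculate the distance"
   behavior: the offset stored in the instruction is the distance from
   the address of the NEXT instruction to the target, wrapped to 16
   bits and returned as a signed value. */
long pcr_offset(unsigned long target, unsigned long insn_addr, unsigned long insn_len)
{
  unsigned long next;
  unsigned long diff;

  assert(insn_len > 0);

  next = (insn_addr + insn_len) & 0xFFFFuL;  /* PC after the instruction */
  diff = (target - next)        & 0xFFFFuL;  /* distance modulo 64K      */
  return diff >= 0x8000uL ? (long)diff - 0x10000L : (long)diff;
}
```

For an instruction at address `a`, writing the operand as `(a + 3) + n` makes the calculated distance come out to exactly `n`, which is what Motorola's literal n,PC would have stored. A target equal to `a + 3` gives a distance of 0.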

And apparently, Motorola defined it that way to make up for less intelligent assemblers back in the day due to memory constraints. We are long past those days.

Obligatory Picture

Dad was resigned to the fact that I was, indeed, a landlubber, and turned the boat around yet again …

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.