Saturday, March 01, 2025
Fixing a 27-year-old bug that only just now got triggered
I will, from time to time, look at various logs for errors. And when I looked at the error log for my web server, intermixed with errors I have no control over like this:
[Tue Feb 25 10:41:19.504140 2025] [ssl:error] [pid 16571:tid 3833293744] [client 206.168.34.92:47678] AH02032: Hostname literature.conman.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Tue Feb 25 12:39:33.768053 2025] [ssl:error] [pid 16408:tid 3892042672] [client 167.94.146.59:50798] AH02032: Hostname hhgproject.org provided via SNI and hostname 71.19.142.20 provided via HTTP have no compatible SSL setup
[Sat Mar 01 05:34:44.029898 2025] [core:error] [pid 21954:tid 3841686448] [client 121.36.96.194:53710] AH10244: invalid URI path (/cgi-bin/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/bin/sh)
[Sat Mar 01 05:34:45.077056 2025] [core:error] [pid 23369:tid 3875257264] [client 121.36.96.194:53722] AH10244: invalid URI path (/cgi-bin/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/%%32%65%%32%65/bin/sh)
I found a bunch of errors that were concerning:
[Sun Feb 23 10:14:54.644036 2025] [cgid:error] [pid 16408:tid 3715795888] [client 185.42.12.144:51022] End of script output before headers: contact.cgi, referer: https://www.hhgproject.org/contact.cgi
contact.cgi: src/Cgi/UrlDecodeChar.c:41: UrlDecodeChar: Assertion `((*__ctype_b_loc ())[(int) ((*src))] & (unsigned short int) _ISxdigit)' failed.
It's obvious that a call to assert() failed in the function UrlDecodeChar() due to some robot failing to encode a web request properly.
Let's see what the code is actually doing:
    char UrlDecodeChar(char **psrc)
    {
      char *src;
      char  c;

      assert(psrc  != NULL);
      assert(*psrc != NULL);

      src = *psrc;
      c   = *src++;
      if (c == '+')
        c = ' ';
      else if (c == '%')
      {
        assert(isxdigit(*src));
        assert(isxdigit(*(src+1)));
        c    = ctohex(*src) * 16 + ctohex(*(src+1));
        src += 2;
      }
      *psrc = src;
      return(c);
    }
The problem was using assert() to check the results of some I/O, and that's not what assert() is for.
I think I was being lazy when I used those assertions and didn't bother with the proper coding practice of returning an error.
Curious as to when I added this code, I checked the history and found this version from December 3rd, 2004:
    char UrlDecodeChar(char **psrc)
    {
      char *src;
      int   c;

      ddt(psrc  != NULL);
      ddt(*psrc != NULL);

      src = *psrc;
      c   = *src++;
      if (c == '+')
        c = ' ';
      else if (c == '%')
      {
        ddt(isxdigit(*src));
        ddt(isxdigit(*(src+1)));
        c    = ctohex(*src) * 16 + ctohex(*(src+1));
        src += 2;
      }
      *psrc = src;
      return(c);
    }
The history in the current repository goes no further back because I lost my CVS repositories. It's interesting to see that this function is the same as it was back then (apart from using my own version of assert(), called ddt(), back in the day).
Some further sleuthing convinced me that I wrote this code back in 1997.
This function is old enough to not only vote, be drafted, get drunk, and sign contracts, but be removed from its parents' health insurance!
Good lord!
It's not how I would write that function today.
It's even more remarkable that I haven't seen this assert() trigger in all those years.
The fix was easy:
    char UrlDecodeChar(char **psrc)
    {
      char *src;
      char  c;

      assert(psrc  != NULL);
      assert(*psrc != NULL);

      src = *psrc;
      c   = *src++;
      if (c == '+')
        c = ' ';
      else if (c == '%')
      {
        /* malformed escape: return '\0' as the error indicator */
        if (!isxdigit(*src))     return '\0';
        if (!isxdigit(*(src+1))) return '\0';
        c    = ctohex(*src) * 16 + ctohex(*(src+1));
        src += 2;
      }
      *psrc = src;
      return(c);
    }
The rest was propagating the error back up the call chain. This does result in a new major version of CGILib: I follow semantic versioning, and this is, technically speaking, a change in the public API, even though it touches fewer than 10 lines of code (out of 8,000+).
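As a rough illustration of what "propagating the error back up the call chain" could look like, here is a minimal sketch of a caller that decodes a whole string and reports failure instead of aborting. The function url_decode_inplace() and this version of ctohex() are hypothetical stand-ins written for the example, not CGILib's actual code. The casts to unsigned char are an extra precaution that the fix above does not include.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ctohex(): the value of one hex digit. */
int ctohex(char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return toupper((unsigned char)c) - 'A' + 10;
}

/* The fixed decoder: returns '\0' when a '%' escape is malformed. */
char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;

  assert(psrc  != NULL);
  assert(*psrc != NULL);

  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit((unsigned char)*src))     return '\0';
    if (!isxdigit((unsigned char)*(src+1))) return '\0';
    c    = (char)(ctohex(*src) * 16 + ctohex(*(src+1)));
    src += 2;
  }
  *psrc = src;
  return c;
}

/* Hypothetical caller: decode s in place; false means bad input. */
bool url_decode_inplace(char *s)
{
  char *src = s;
  char *dst = s;

  assert(s != NULL);
  while (*src != '\0')
  {
    char c = UrlDecodeChar(&src);
    if (c == '\0')
      return false; /* malformed escape, or an encoded NUL ("%00") */
    *dst++ = c;
  }
  *dst = '\0';
  return true;
}
```

With this sketch, a request containing something like "100%zz" makes url_decode_inplace() return false, so the CGI program can answer with an error page rather than dying on an assertion. One consequence of using '\0' as the error value is that a correctly encoded "%00" is rejected too, which may or may not be what a given caller wants.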
Monday, March 03, 2025
Yelling at clouds
I will admit, these are knee-jerk reactions, but they're honestly my reactions to reading the following statements. I know, I know, hanging onions off our belt is long out of style.
And get off my lawn!
Anyway … statement the first:
Think jq, but without having to ask an LLM to write the query for you.
Via Lobsters, A float walks into a gradual type system
So … using jq is so hard you need to use a tool that will confabulate ¼ of the time in order to construct a simple query? Is that what you are saying? That you can't be bothered to use your brain? Just accept the garbage spewed forth by a probabilistic text slinger? Really?
And did you use an LLM to help write the code? If not, why not?
Sigh.
And statement the second:
… and most importantly, coding can be social and fun again.
Via Lobsters, introducing tangled
If I had known that programming would become a team sport, I, an introvert, would have chosen a different career. Does XXXXXXX everything have to be social? Why can't it just be fun? I need to be micromanaged as well?
A quirk of the Motorola 6809 assemblers
I just learned an interesting bit of trivia about 6809 assembly language on a Discord server today.
When Motorola designed the 6809 assembler, they made a distinction between the use of n,PC and n,PCR in the indexing mode. Both of those make a reference based off the PC register, but in the assembly language they defined, using n,PC means use the literal value of n as the distance, whereas n,PCR means generate the distance between n and the current value of the PC register.
I never knew that.
I just looked and all the materials I have on the 6809 use the n,PCR method everywhere, yet when I wrote my assembler, I only supported n,PC and it always calculates the distance. I think I forgot that it should have been n,PCR because on the 68000 (which I also programmed, and was also made by Motorola) it always used n,PC.
And I don't think I'll change my assembler as there does exist a method to use an arbitrary value of n as a distance: LDA (*+3)+n,PC. The asterisk evaluates to the address of the current instruction, and by adding 3 you get the address of the next instruction, which in the PC-relative addressing mode, is a distance of 0. Then n will be the actual offset used in the instruction. Yes, it's a bit convoluted, but it's a way to get how Motorola originally defined n,PC.
And apparently, Motorola defined it that way to make up for less intelligent assemblers back in the day due to memory constraints. We are long past those days.
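The arithmetic behind the (*+3)+n trick can be sketched in a few lines of C. This is only an illustration of the calculation described above, not code from any real assembler. The function names are made up for the example, and the instruction length of 3 is taken from the LDA example (other encodings would have a different length).

```c
#include <assert.h>
#include <stdint.h>

/* Offset stored in the instruction when the assembler "calculates the
   distance": target minus the address of the next instruction, with
   wraparound in the 6809's 16-bit address space. */
uint16_t pcr_offset(uint16_t insn_addr, uint16_t insn_len, uint16_t target)
{
  assert(insn_len > 0);
  return (uint16_t)(target - (uint16_t)(insn_addr + insn_len));
}

/* The operand to write, (*+len)+n, so that an assembler which always
   calculates the distance ends up storing the literal value n. */
uint16_t literal_operand(uint16_t insn_addr, uint16_t insn_len, uint16_t n)
{
  assert(insn_len > 0);
  return (uint16_t)(insn_addr + insn_len + n);
}
```

For an instruction at address 0x1000 with a length of 3, a target of 0x1003 yields an offset of 0, and pcr_offset(0x1000, 3, literal_operand(0x1000, 3, 5)) comes back as 5, which is the literal offset the n,PC form was originally meant to express.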