The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Tuesday, October 21, 2025

The actual root cause of yesterday's bug was laid over twenty years ago

Yesterday, I found the root cause of a bug but I did not go into details about how that bug slipped into production (so to speak). That's easy—the configuration of mod_blog differs between my development server and my public server.

On my public server, I have the following bit of code in the configuration:

process = require("org.conman.process")

-- --------------------------------------------------------------------
-- process limits added because an earlier version of the code actually
-- crashed the server it was running on, due to resource exhaustion.
-- --------------------------------------------------------------------

process.limits.hard.cpu  = "10m" -- 10 minutes
process.limits.hard.core =  0    -- no core file
process.limits.hard.data = "20m" -- 20 MB

-- --------------------------------------------------------
-- We now resume our regularly scheduled config file
-- --------------------------------------------------------

I load a module to configure bits of the environment that mod_blog runs in. The configuration file on the development server does not have such code. So when I compiled the email notification program, the fact that I did not include the -rdynamic compiler option was not an issue when I ran my tests.

Yes, a case where there was a difference between development and production that allowed a bug to slip through. So I decided to dig a bit deeper. A few days ago I explained why I had such directives in my configuration file when I was asked why I didn't use Apache's RLimitMEM directive. I answered that I added the process limits pretty early in the use of mod_blog, and that I didn't recall seeing such a directive in Apache at the time.

But I did get curious as to when Apache might have added the RLimitMEM directive. I started this site using Apache 1.3 (when that was the current version of Apache—I've been blogging for quite a long time) and I was thinking that the RLimitMEM directive may have been added around version 2.0. In my archives, I found a copy of Apache 1.3.9 and wouldn't you know it—RLimitMEM existed!

Sigh.

I could have avoided yesterday's issue had I only read a bit further into the Apache documentation back in the day.
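
For what it's worth, something along these lines would probably have covered the CPU and memory limits back then (this is just a sketch of the directives, not a tested configuration, and there's no RLimit directive for core files):

# 10 minutes of CPU time, soft and hard limits
RLimitCPU 600 600

# 20 MB of memory, soft and hard limits
RLimitMEM 20971520 20971520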

Monday, October 20, 2025

It worked, but it failed

In posting the previous post I encountered an interesting bug!

It wasn't in mod_blog per se, but in the hook running after an entry has been added, and therein is the bug—the entry was successfully added, but the hook failed.

The hook program failed due to a compilation issue that was only triggered when it ran. I took the email notification code from mod_blog and turned it into a standalone program. I also linked it against the blogging core of mod_blog to avoid having to duplicate the code to read the configuration (the email notification block is now ignored by mod_blog itself). And because the configuration format is Lua, a compiler option is needed to support Lua modules written in C—basically, -rdynamic, to allow C-based Lua modules to call Lua functions (which I allow, and need, to support my particular configuration).

This is the root cause of the issue.
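
To give a rough idea of the difference (the file names here are made up, not the actual build), it comes down to the link step:

# without -rdynamic: builds and runs fine, until a C-based Lua module
# loaded at runtime needs the Lua API symbols linked into the program
cc -o notify notify.c blogcore.a -llua -lm -ldl

# with -rdynamic: the program's symbols are exported, so modules loaded
# via dlopen() can resolve the Lua API
cc -o notify -rdynamic notify.c blogcore.a -llua -lm -ldl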

But in the meantime, because the hook failed, the script I use to post entries via the HTTP PUT method received a status of “500 Internal Server Error.” The entry was stored, but none of the statically generated files (index.html and the various feed files) were generated, nor was any email sent.

Once I figured out what happened, it was easily remedied, but that still leaves the question of what should happen. I intended the add entry post-hook to handle situations like notifications, so in this case, if the hook fails, normal processing should proceed, but how do I send back the fact that the post-hook failed? Looking over the HTTP status codes, perhaps I could return a “202 Accepted” when the entry post-hook fails, with some information about the failure. That could work.
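
Something along these lines, perhaps (just a sketch of the idea, not anything mod_blog sends today):

HTTP/1.1 202 Accepted
Content-Type: text/plain

Entry stored, but the add entry post-hook failed (exit status 1).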


The fix wasn't easy, or C precedence bites

For the past decade now, I've done a Christmas release for mod_blog (only missing the years 2019 and 2023), just because. I was poking around the codebase looking for changes I could make for Christmas this year, and well, I got a bit impatient and have just now released a version in time for Halloween. And it's scary—I'm removing features!

I've removed features in the past, like no longer supporting “ping servers” when it became clear it wasn't worth it, or automatically updating Linked­Tik­Face­My­Insta­Pin­Me­Gram­We­Tok­In­Book­Space­Trest when Insta­Pin­Tik­My­Linked­Face­Me­Trest­Book­Gram­We­In­Space­Tok changed how it works often enough to make it annoying for me to continue. But this time … this time it's different. This is removing functionality that has existed in the code base since the beginning!

To make it easier to write entries, I had code within mod_blog to process the input—mostly what existed was to convert sequences like ``quoted'' to “quoted” and “...” to “…”, but with an option to add <P> tags around logical paragraphs. But given that I now use my own markup language, I rarely used the web interface (the fact that I can count on my fingers the number of times I've used it, and still have a few left over, should give an indication of how little I use it) and the code just sat there, unused. So unused that in fixing one bug I introduced another bug in the code I fixed!

To recap, here's the original code:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    assert(isxdigit(*src));
    assert(isxdigit(*(src+1)));
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

and the “fixed” version:

char UrlDecodeChar(char **psrc)
{
  char *src;
  char  c;
  
  assert(psrc  != NULL);
  assert(*psrc != NULL);
  
  src = *psrc;
  c   = *src++;
  if (c == '+')
    c = ' ';
  else if (c == '%')
  {
    if (!isxdigit(*src))   return '\0';
    if (!isxdigit(*src+1)) return '\0';
    c    = ctohex(*src) * 16 + ctohex(*(src+1));
    src += 2;
  }
  *psrc = src;
  return(c);
}

I don't fault you if you can't spot the bug. I only found it when testing the web interface to ensure it wasn't completely broken with the conversion code removed (instead, it's now only mostly broken, but that's an interesting case in and of itself and requires its own post). The bug is in this line of code:

if (!isxdigit(*src+1)) return '\0';

The issue is due to C's precedence rules: the unary dereference binds tighter than the addition, so the code above is parsed as src[0] + 1 instead of the src[1] that I was intending. When I modified the function, changing the calls to assert() into actual code to return an error (I typed in the new code, as that's faster than modifying the existing code), I … kind of missed that.

Oh, who am I kidding? I totally missed that. But because I don't use the web interface this bug went unnoticed. Sigh.
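
To spell out the parse (a tiny stand-alone demo, not part of mod_blog): unary * binds tighter than binary +, so *src+1 means (*src) + 1.

#include <ctype.h>
#include <stdio.h>

int main(void)
{
  const char *src = "F0"; /* the two characters following a '%' */

  /* what I wrote: isxdigit((*src) + 1), i.e. isxdigit('F' + 1), i.e. isxdigit('G') */
  printf("%s\n", isxdigit(*src+1)   ? "hex" : "not hex"); /* prints "not hex" */

  /* what I meant: isxdigit(src[1]), i.e. isxdigit('0') */
  printf("%s\n", isxdigit(*(src+1)) ? "hex" : "not hex"); /* prints "hex" */

  return 0;
}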

For the record, I changed the code to read:

if (!isxdigit(src[0])) return '\0';
if (!isxdigit(src[1])) return '\0';
c    = ctohex(src[0]) * 16 + ctohex(src[1]);

Another long-time feature I've removed is email notification. I added it early on so you could submit your email address to get notified of new posts, but spammers and a lack of outside interest pretty much put the kibosh on that. As I still have three users of the email notification (one is me, one is Bunny, and one is one other person who I'm not sure still reads the emails, but they haven't bounced yet), I don't want to drop support completely, so now the email notifications are sent via the hook mechanism I added a few years ago.

In total, I removed over 3,000 lines of code from mod_blog. Granted, over 2,000 of them were in one function that was removed, but still, it's 3,000 lines of code I don't have to worry about any more.

Still, it's a bit scary to remove features that have been there for so long, and thus, a Halloween release.



Thursday, October 16, 2025

So account deletion at Network Solutions is a bit more nuanced than I was led to believe

I was, perhaps, a bit harsh with my criticisms of account deletion at Network Solutions. I noticed that they had yet to remove the billing information, so I called back to get an update and this time I was able to talk to a “billing specialist.” They still had to manually delete the credit card info (which they did while I waited) but I also learned a bit more about their policies on account deletion. If there are no services on an account, it will be deleted after a month (the “billing specialist” wasn't exactly clear on this, but it sounded like a month) of inactivity, which, okay, I can see that. I just wish that was a bit more visible on the site, both to reassure those that want to leave Network Solutions, and to warn those that do use Network Solutions that no billing activity on their account can lead to automatic deletion.

So they move up from “clown show” to “annoying to use.”

Monday, October 13, 2025

I don't think it's news to anyone out there that one should avoid Network Solutions for domain registration and probably for anything else as well

The rest of my domains have been transferred away from Network Solutions. The process wasn't hard. It wasn't even really tedious, it just took a bunch of waiting.

I would log into Network Solutions, click past a bunch of needless notifications and upsells, and request a domain transfer. I would then get a chance to renew the domain for the low-low price of $19.95, which technically is cheaper, but still twice the price that my new registrar, Porkbun, charges. Click past that, and I would have to wait up to four days, in order to change my mind, before Network Solutions would send the transfer key. You know, to keep me from making a rash decision to stop paying them money.

Once I got the transfer key, I would then transfer the domain in to Porkbun. Network Solutions would then send an email 24 hours later, informing me that I had four days to change my mind, but that I should talk to one of their “transfer specialists” to help transfer my domain, because Network Solutions is adamant that I don't rush into transferring my domains away from them and thus stop paying them.

Four days after that, I would receive email from both Network Solutions and Porkbun that the domain (or domains, actually) had transferred over. So the process was mostly a waiting game on the part of Network Solutions.

Now that I'm no longer using Network Solutions for domain registration, I want to delete my account there. Of course, there's no link on the Network Solutions site to delete my account, you know, to keep me from making a rash decision. Nope, I have to call to talk to an “account specialist” to do that deed. And it turns out, there is no way for them to delete my account. None. Nada. Zip.

Let that sink in—there is no way to delete your Network Solutions account!

They're damn adamant that I keep my account, just in case!

The best I can do is delete my credit card information. You know, the same credit card information that you can't update whatsoever. In reality, they have to manually delete the credit card information from my “from now until the Heat Death of the Universe” account at Network Solutions.

Good Lord. What a clown show!

Update on Thursday, October 16th, 2025

Account deletion at Network Solutions is a bit more nuanced than I thought.

Wednesday, October 01, 2025

Why do they even bother with /robots.txt?

It's still too early to see if limiting IP connections to my webserver is doing any good, but it's catching bots that are hammering a bit too hard. And the bots that are getting caught so far are all from the GOOGLE-CLOUD-PLATFORM ASN (34.174.0.0/17 for the record). Two dozen different IP addresses so far this day, with about half a dozen different forged user agents.

But what I noticed (and this happened last month too!) is that an individual IP address will come in like a wrecking ball, making as many requests in as short an amount of time as it can, but that each such sequence will start with a request for /robots.txt. Like this:

34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" -/- (-%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/access.html HTTP/1.1" 200 4476 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4458/16246 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/copyright.html HTTP/1.1" 200 3860 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3842/14775 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET //2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4568/16282 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09 HTTP/1.1" 200 14654 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14636/44761 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5628 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5610/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/08/01 HTTP/1.1" 200 5675 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5657/19414 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /1999/12/04.1 HTTP/1.1" 200 4363 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4345/15504 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET / HTTP/1.1" 200 14583 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14565/44552 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/21.1 HTTP/1.1" 200 4421 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4403/16165 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/technical.html HTTP/1.1" 200 7028 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7010/23332 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.json HTTP/1.1" 200 15951 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15933/47681 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22 HTTP/1.1" 200 5538 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5520/18605 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /archive/ HTTP/1.1" 200 5171 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5153/30006 (17%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/24.1 HTTP/1.1" 200 4958 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4940/17340 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /refs/glossary.html HTTP/1.1" 200 3844 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3826/14690 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.atom HTTP/1.1" 200 16863 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16845/61183 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/26.2 HTTP/1.1" 200 4499 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4481/16293 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/02 HTTP/1.1" 200 15624 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15606/50795 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/11 HTTP/1.1" 200 13793 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13775/42797 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/07 HTTP/1.1" 200 34729 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 34711/108151 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/04 HTTP/1.1" 200 28253 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 28235/82627 (34%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2014/08 HTTP/1.1" 200 6190 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6172/20372 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2008/02 HTTP/1.1" 200 25223 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25205/76745 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/08 HTTP/1.1" 200 16603 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16585/52801 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/11 HTTP/1.1" 200 5089 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5071/17989 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/17 HTTP/1.1" 200 5420 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5402/18527 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/06 HTTP/1.1" 200 8392 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8374/27002 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2005/03 HTTP/1.1" 200 31336 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 31318/98041 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/12 HTTP/1.1" 200 13553 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13535/41052 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2012/08 HTTP/1.1" 200 7131 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7113/24140 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/history.html HTTP/1.1" 200 6247 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6229/20734 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/06 HTTP/1.1" 200 14118 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14100/46770 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2021/11 HTTP/1.1" 200 18020 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 18002/57817 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2007/11 HTTP/1.1" 200 25827 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25809/77770 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2004/09 HTTP/1.1" 200 23195 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 23177/70110 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2009/09 HTTP/1.1" 200 13535 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13517/41367 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/09 HTTP/1.1" 200 32956 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 32938/98767 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/07 HTTP/1.1" 200 17319 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 17301/54361 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2017/07 HTTP/1.1" 200 8116 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8098/26633 (30%)

And that's it. A request for /robots.txt, then two seconds of requests, then gone.

I mean, I guess it's nice that it looks at /robots.txt, but how do I control it? Do I use Mozilla? Windows? AppleWebKit? KHTML? Gecko? Chrome? Safari? Or is it checking /robots.txt looking for places explicitly disallowed? (no, none of the bots from GOOGLE-CLOUD-PLATFORM has made requests to locations I've disallowed—I checked). Maybe for hints about speed? Maybe?

But I still don't get the referrer though. This case was https://google.com/, but I've also seen other referring links, like https://riotgames.com/ and https://northwestern.edu/.

Weird.

Friday, September 26, 2025

Oh, it's a bug on my side that prevents full conditional requests

I'm still poring over web server log files and I'm noticing that many of the feed readers fetching my various feed files aren't using conditional requests. I was in the process of writing to the author of one of them describing the oversight when I noticed that particular feed reader using both methods of conditional requests: the If-Modified-Since: header and the If-None-Match: header, in conjunction with a HEAD request. I thought I should test that with my web server, just to make sure it was not a bug on my side.

It's a bug on my side!

Specifically, an Apache bug where compressed output interferes with the If-None-Match method. There is a workaround though:

RequestHeader edit "If-None-Match" '^"((.*)-gzip)"$' '"$1", "$2"'

That rewrites the incoming If-None-Match: header to work around the bug. Maybe now that whole conditional request thang with my webserver will work properly.
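
To illustrate (the ETag value here is made up): mod_deflate tacks a “-gzip” suffix onto the ETag of a compressed response, the feed reader sends that suffixed value back, and the edit above adds the unsuffixed value so that either one can match:

If-None-Match: "2ad6-63f9a8c1-gzip"

becomes

If-None-Match: "2ad6-63f9a8c1-gzip", "2ad6-63f9a8c1"

With that in place, an unchanged (but compressed) page can be answered with a “304 Not Modified” instead of being sent again in full.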

Sigh.


Yet more notes on web bot activity

For the past few months, every other week my server (which hosts this blog) would just go crazy for a day and require a full reboot to get back to normal. I haven't tracked down a root cause for this, but I do suspect it has to do with web bot activity increasing over the past few months. I ran a query over the logs for August, generating the number of requests per second and here are the top ten results:

timestamp host RPS
26/Aug/2025:03:26:36 -0400 76.14.125.194 740
26/Aug/2025:03:26:29 -0400 76.14.125.194 735
26/Aug/2025:03:26:35 -0400 76.14.125.194 697
26/Aug/2025:03:26:37 -0400 76.14.125.194 693
26/Aug/2025:03:25:54 -0400 76.14.125.194 666
26/Aug/2025:03:25:53 -0400 76.14.125.194 607
26/Aug/2025:03:26:28 -0400 76.14.125.194 589
26/Aug/2025:03:26:38 -0400 76.14.125.194 576
26/Aug/2025:03:26:17 -0400 76.14.125.194 574
26/Aug/2025:03:25:49 -0400 76.14.125.194 539

Websites like Google or My­Linked­Face­Tik­Insta­Pin­Me­Tok­Book­Trest­Space­Gram­In­We might be able to handle loads like this, but I'm running a blog on a single server. These numbers are insane! Fortunately, this level of activity didn't last for long, but it certainly made it “interesting” on my server for a few minutes:

# requests per minute
timestamp RPM
03:23 27
03:24 72
03:25 4752
03:26 11131
03:27 1185
03:28 58
03:29 26

It's looking like spikes in activity might be a reason for my server freaking out.

Apache doesn't come with a way to limit IP connections. A search led me to mod_limitipconn, a simple module that limits an IP address to a maximum number of concurrent connections. There's nothing about rate limiting per se, but it can't hurt, and it's simple enough to install.

So earlier this week, I installed it. I set a maximum connection limit of 30—that is, no single IP address can connect more than 30 times concurrently. I just picked a high enough number (possibly too high) to still allow legitimate traffic through while keeping the worst abuse away. The code as downloaded will return a “503 Service Unavailable” when it kicks in, but I changed it to return a “429 Too Many Requests,” which better reflects the actual situation (I think the code was originally written before 429 was a valid response code).
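
The configuration is minimal; something along these lines (a sketch using the module's MaxConnPerIP directive, not my exact config):

<IfModule mod_limitipconn.c>
  <Location />
    # no more than 30 simultaneous connections from any single IP address
    MaxConnPerIP 30
  </Location>
</IfModule>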

And it's working. It's already caught 18 bots (or rather, bots with distinct IP addresses), and they are all from the same ASN: GOOGLE-CLOUD-PLATFORM, US (and the user agents are all obviously forged). But what's curious about these is that a subset of the requests includes a referrer URL. Most browsers these days restrict sending the referring link, or outright don't send it at all (to respect privacy), so seeing one from a web bot is unusual.

Even more curious is that these referring links have nothing to do with the link being referenced. There are, so far this month, 147 requests from the GOOGLE-CLOUD-PLATFORM ASN sending a referrer to Slashdot. And I don't mean to a page on Slashdot, but to the main page of Slashdot. There are also referrers to Cisco (on 201 requests), Petrobras (on 581 requests), and NBC News (on 221 requests), among 435 other websites being referenced on requests. I don't understand the reasoning here. It's not like I'll let a request through just because it came from Slashdot. I don't publish referring links. I know sites used to publish referring links back in the day, and spammers used this to gain Page Rank for their own pages (or for their clients), but that can't be worth it these days, can it? Are these old bots, still running but long forgotten? What is the angle here?

Anyway, I'll have to wait and see if limiting IP connections will solve my server issues. I do hope that's all it is.

Thursday, September 25, 2025

This is my markup language. There are plenty of others, but this is mine

The Lobster's Blog Carnival is up, and the theme is “What have you made for yourself?” While I have plenty of programs I wrote, there's one that I specifically wrote for my own use: MOPML.

I wrote it to make writing blog entries easier for me. For twenty years, I was hand-crafting HTML for each entry and I finally got tired of it. I wanted an easier way to make entries, so I started down the path of implementing my own markup language. Existing languages like Markdown or AsciiDOC didn't appeal to me and were a bit too generic in how they did things. I also wanted to steal ideas from TeX and Org Mode, as well as some ideas I had to support tags like <ABBR> (which not many sites bother doing).

As I already had twenty years of entries in HTML, one design goal was not to store the entries as MOPML, but to keep them in their final HTML-rendered state. This meant that I could play around with the syntax of MOPML and not have to worry about breaking existing entries. Besides, if I have to edit a post after publication, I can edit the HTML directly; I have been doing that for years anyway. For the implementation, I chose Lua, specifically so I could use LPEG.

The TeX-inspired syntax is for items like M-dashes, where I can type three dashes like --- and get a single M-dash on output: —. There are also typographical quotes, where I can type ``This is quoted'' and get “This is quoted”. I even extended that so that when I type “1/2” I get “½”. It's also easy to add new entries to the particular parsing rule.

And while I was inspired by Org Mode for things such as tables and block quotes, I did not care for the syntax, so I changed it to suit my needs. A table is easy to generate:

#+table This is a caption
*header	foo	bar	baz
**footer	foo	bar	baz
Entry 1	3	14	15
Entry 2	92	62	82
Entry 3	8	-1	4
#-table

The #+table starts a table definition, and is followed by an optional caption. A header row is marked by a starting asterisk, and a footer row is marked by two asterisks. Each field is separated by a tab character. The above example will produce the following table:

This is a caption
header    foo   bar   baz
footer    foo   bar   baz
Entry 1     3    14    15
Entry 2    92    62    82
Entry 3     8    -1     4

The above sample is yet another Org Mode-inspired block:

#+source MOPML
#+table This is a caption
*header	foo	bar	baz
**footer	foo	bar	baz
Entry 1	3	14	15
Entry 2	92	62	82
Entry 3	8	-1	4
#-table
#-source

(For the record, I did have to go in after rendering this post and fix the above example, but I never intended to nest #+source blocks in the first place.)

I also have a defined block for when I quote email:

#+email
From: John Doe <{{johndoe@example.net}}>
To: sean@conman.org
Subject: Re: Morbi in lorem ut lectus accumsan
        placerat. Morbi TLA enim id turpis
Date: Mon, 1 Apr 2019 18:12:41 +0200

Lorem ipsum dolor sit amet, consectetur adipiscing elit.  Donec gravida
justo et aliquam lobortis.

#-email

The From: header has another formatting quirk—the {{ and }} denote text that is to be censored in the output. This is how I get those XXXXX­XX censor bars in my posts. The above will render the block as:

From
John Doe <XXXXX­XXXXX­XXXXX­XXXX>
To
sean@conman.org
Subject
Re: Morbi in lorem ut lectus accumsan placerat. Morbi TLA enim id turpis
Date
Mon, 1 Apr 2019 18:12:41 +0200

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec gravida justo et aliquam lobortis.

The Markdown-inspired bits are mostly inline markup, such as *emphasis* for emphasis, and `code` for code. I never did like the Markdown or Org Mode syntax for links, so I played around with it for quite a bit until I got a syntax I liked. So when I want to link to this page, I type {^https://www.conman.org/people/spc/about/ this page}. It's a minimal syntax that isn't likely to appear in normal text. And if I ever do want to include a “}” in a link, I can always escape it like {^/ sample \} text} to get sample } text.

But I think the best feature is how I handle abbreviations. HTML contains the <ABBR> tag to semantically mark up TLAs and whatnot. I wish most web authors would do this, as it would make reading about the MPZ easier, and most browsers on the market will show a tooltip with the TITLE attribute if you hover over it.

I was moaning about this way back in 2003, and I finally have a method I'm happy with. All I do is include a block of abbreviations at the top of the post:

abbr:	HTML	HyperText Markup Language
	MOPML	My Own Private Markup Language
	LPEG	Lua Parsing Expression Grammar
	URL	Uniform Resource Locator
	TLA	Three Letter Acronym
	MPZ	Medial Palisade Zone

The code will read this block and generate the LPEG code to recognize the acronym, such as TLA, and generate the appropriate HTML: <abbr title="Three Letter Acronym">TLA</abbr>, thus giving us our TLA with semantic markup.
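
As a rough illustration of the idea (a minimal sketch, not the actual MOPML code):

local lpeg = require "lpeg"

-- one abbreviation, hard-coded for the sake of the example
local title = "Three Letter Acronym"

local function expand(word)
  return string.format('<abbr title="%s">%s</abbr>', title, word)
end

-- substitute every occurrence of "TLA" in the input with its <abbr> markup
local abbr = lpeg.Cs(((lpeg.P"TLA" / expand) + lpeg.P(1))^0)

print(abbr:match("A good TLA is hard to find."))
-- A good <abbr title="Three Letter Acronym">TLA</abbr> is hard to find.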

I even solved what I called the IRA problem back in 2003, you know, when the IRA steals the IRAs from members of the IRA; or in other words, when you have the same TLA that maps to different meanings. And I can even mention IRA GERSHWIN without fear of it becoming Initial Risk Assessment GERSHWIN. The IRA problem is solved with yet another block definition at the top of the post:

abbr2:	IRAa	IRA	Irish Republican Army
	IRAr	IRA	International Reading Association
	IRAm	IRA	Individual Retirement Account

So I type IRAa and the code will generate <abbr title="Irish Republican Army">IRA</abbr>.

I suppose I could always include definitions of common TLAs I use in the code itself, but it hasn't been that big of an issue for me to just define the TLAs I use in the post itself.

That's pretty much all I have for a markup language. Yes, it's tailored to what I write and how I want to present it. I don't expect anyone to use this engine as it makes sense to me, but maybe not to you. And that's the point, this is for me to use. I made this for myself. And I'm lucky enough to be able to do so.


Maybe these will last longer than two years

Eight days later and I finally have my new glasses!

[Self-portrait with new glasses] Glasses.  Titanium, not steel.

I no longer have to tape them to my head to keep them on. Even better, I can now clean my glasses without cursing like a sailor.

Woot!

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.