The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, September 26, 2025

Yet more notes on web bot activity

For the past few months, every other week my server (which hosts this blog) would just go crazy for a day and require a full reboot to get back to normal. I haven't tracked down a root cause for this, but I do suspect it has to do with web bot activity increasing over the past few months. I ran a query over the logs for August, counting the number of requests per second from each host, and here are the top ten results:

timestamp host RPS
26/Aug/2025:03:26:36 -0400 76.14.125.194 740
26/Aug/2025:03:26:29 -0400 76.14.125.194 735
26/Aug/2025:03:26:35 -0400 76.14.125.194 697
26/Aug/2025:03:26:37 -0400 76.14.125.194 693
26/Aug/2025:03:25:54 -0400 76.14.125.194 666
26/Aug/2025:03:25:53 -0400 76.14.125.194 607
26/Aug/2025:03:26:28 -0400 76.14.125.194 589
26/Aug/2025:03:26:38 -0400 76.14.125.194 576
26/Aug/2025:03:26:17 -0400 76.14.125.194 574
26/Aug/2025:03:25:49 -0400 76.14.125.194 539
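
A count like the one in the table above could be produced along these lines. This is only a minimal Python sketch, not the query that was actually run; it assumes the logs are in Apache's Common or Combined Log Format, and the names `LOG_LINE`, `requests_per_second` and `top_bursts` are made up for illustration.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line (assumed format):
#   host ident user [timestamp] "request" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def requests_per_second(lines):
    """Count requests per (timestamp, host) pair.

    Apache timestamps have one-second resolution, so grouping on the
    raw timestamp string gives requests per second per host.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            host, timestamp = match.groups()
            counts[(timestamp, host)] += 1
    return counts


def top_bursts(lines, n=10):
    """Return the n busiest (timestamp, host) pairs with their counts."""
    return requests_per_second(lines).most_common(n)
```

Feeding it an open log file (`top_bursts(open(path))`, where `path` is wherever the access log lives) yields a list of `((timestamp, host), count)` pairs. Grouping on the timestamp with the seconds trimmed off would give per-minute counts instead.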

Websites like Google or My­Linked­Face­Tik­Insta­Pin­Me­Tok­Book­Trest­Space­Gram­In­We might be able to handle loads like this, but I'm running a blog on a single server. These numbers are insane! Fortunately, this level of activity didn't last for long, but it certainly made it “interesting” on my server for a few minutes:

# requests per minute
timestamp RPM
03:23 27
03:24 72
03:25 4752
03:26 11131
03:27 1185
03:28 58
03:29 26

It's looking like spikes in activity might be a reason for my server freaking out.

Apache doesn't come with a way to limit IP connections. A search led me to mod_limitipconn, a simple module that limits an IP address to a maximum number of concurrent connections. There's nothing about rate limiting per se, but it can't hurt, and it's simple enough to install.

So earlier this week, I installed it. I set a maximum connection limit of 30; that is, no single IP address can hold more than 30 concurrent connections. I just picked a high enough number (possibly too high) to still allow legitimate traffic through while keeping the worst abuse away. The code as downloaded will return a “503 Service Unavailable” when it kicks in, but I changed it to return a “429 Too Many Requests” which better reflects the actual situation (I think the code was originally written before 429 was a valid response code).
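
The idea behind a per-IP concurrency limit can be sketched in a few lines of Python. This is not mod_limitipconn's code (that is a C module running inside Apache); it is just an illustration of the technique, and `ConnectionLimiter` and `handle` are hypothetical names. The limit of 30 and the 429 response mirror the setup described above.

```python
import threading


class ConnectionLimiter:
    """Track concurrent connections per IP and refuse any over a limit."""

    def __init__(self, max_per_ip=30):
        self.max_per_ip = max_per_ip
        self._active = {}              # ip -> number of open connections
        self._lock = threading.Lock()

    def acquire(self, ip):
        """Return True if the connection is accepted, False if over the limit."""
        with self._lock:
            current = self._active.get(ip, 0)
            if current >= self.max_per_ip:
                return False
            self._active[ip] = current + 1
            return True

    def release(self, ip):
        """Mark one connection from this IP as finished."""
        with self._lock:
            current = self._active.get(ip, 0)
            if current <= 1:
                self._active.pop(ip, None)
            else:
                self._active[ip] = current - 1


def handle(limiter, ip, serve):
    """Serve a request, or answer 429 if the IP has too many open connections."""
    if not limiter.acquire(ip):
        return 429                     # Too Many Requests
    try:
        return serve()
    finally:
        limiter.release(ip)
```

Note that this only caps how many requests one address has in flight at once. A client that opens connections one after another, however quickly, never trips it, which is why it is not rate limiting per se.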

And it's working. It's already caught 18 bots (or rather, bots with distinct IP addresses), and they are all from the same ASN: GOOGLE-CLOUD-PLATFORM, US (and the user agents are all obviously forged). But what's curious about these is that a subset of the requests include a referrer URL. Most browsers these days restrict sending the referring link, or outright don't send it at all (to respect privacy). So seeing them from a web bot is unusual.

Even more curious is that these referring links have nothing to do with the link being requested. There are, so far this month, 147 requests from the GOOGLE-CLOUD-PLATFORM ASN sending a referrer to Slashdot. And I don't mean to a page on Slashdot, but to the main page of Slashdot. There are also referrers to Cisco (on 201 requests), Petrobras (on 581 requests), and NBC News (on 221 requests), among 435 other websites. I don't understand the reasoning here. It's not like I'll let through a request just because it came from Slashdot. I don't publish referring links. I know sites used to publish referring links back in the day, and spammers used this to gain PageRank for their own pages (or for their clients), but that can't be worth it these days? Can it? Are these still old bots running but long forgotten? What is the angle here?
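
A tally of referrers like the one above could be gathered roughly as follows. This is a sketch under stated assumptions, not how the numbers in this entry were produced: it assumes Combined Log Format, and it assumes the list of network prefixes belonging to the ASN is supplied from somewhere else (the code does no ASN lookup). The names `COMBINED` and `referrer_hosts` are invented for the example.

```python
import ipaddress
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format (assumed):
#   host ident user [time] "request" status bytes "referrer" "user-agent"
# Requests containing escaped quotes will not match and are skipped.
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"'
)


def referrer_hosts(lines, networks):
    """Count referrer hostnames on requests coming from the given networks.

    `networks` is a list of CIDR strings, e.g. the prefixes announced by
    one ASN; obtaining that list is outside the scope of this sketch.
    """
    nets = [ipaddress.ip_network(n) for n in networks]
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue
        host, referrer = match.groups()
        if referrer in ("", "-"):
            continue                   # no referrer was sent
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue                   # logged as a hostname, not an address
        if not any(addr in net for net in nets):
            continue
        try:
            name = urlsplit(referrer).hostname
        except ValueError:
            name = None                # malformed referrer URL
        counts[name or referrer] += 1
    return counts
```

Calling `referrer_hosts(open(path), prefixes).most_common()` would then list the referring sites from most to least frequent, with `path` and `prefixes` standing in for the real log file and prefix list.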

Anyway, I'll have to wait and see if limiting IP connections will solve my server issues. I do hope that's all it is.

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.