The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, October 01, 2025

Why do they even bother with /robots.txt?

It's still too early to see if limiting IP connections to my webserver is doing any good, but it's catching bots that are hammering a bit too hard. And the bots that are getting caught so far are all from the GOOGLE-CLOUD-PLATFORM ASN (34.174.0.0/17 for the record). Two dozen different IP addresses so far this day, with about half a dozen different forged user agents.
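For anyone wanting to check an address against that range by hand, here is a minimal sketch using Python's standard ipaddress module. The function name in_prefix is made up for illustration, and treating the one prefix quoted above as the whole range of interest is an assumption; this is not how the webserver itself does its limiting.

```python
import ipaddress

# The prefix quoted in the entry. Assumption: this single prefix is the
# only range we care about for the purposes of this sketch.
GCP_PREFIX = ipaddress.ip_network("34.174.0.0/17")

def in_prefix(addr, network=GCP_PREFIX):
    """Return True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(addr) in network

# A /17 on 34.174.0.0 covers 34.174.0.0 through 34.174.127.255.
print(in_prefix("34.174.101.217"))  # the address from the log excerpt
print(in_prefix("34.174.128.1"))    # just past the end of the /17
```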

But what I noticed (and this happened last month too!) is that an individual IP address will come in like a wrecking ball, making as many requests in as short a time as it can, and each such sequence starts with a request for /robots.txt. Like this:

34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" -/- (-%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/access.html HTTP/1.1" 200 4476 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4458/16246 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/copyright.html HTTP/1.1" 200 3860 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3842/14775 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET //2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4568/16282 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09 HTTP/1.1" 200 14654 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14636/44761 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5628 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5610/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/08/01 HTTP/1.1" 200 5675 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5657/19414 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /1999/12/04.1 HTTP/1.1" 200 4363 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4345/15504 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET / HTTP/1.1" 200 14583 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14565/44552 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/21.1 HTTP/1.1" 200 4421 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4403/16165 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/technical.html HTTP/1.1" 200 7028 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7010/23332 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.json HTTP/1.1" 200 15951 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15933/47681 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22 HTTP/1.1" 200 5538 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5520/18605 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /archive/ HTTP/1.1" 200 5171 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5153/30006 (17%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/24.1 HTTP/1.1" 200 4958 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4940/17340 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /refs/glossary.html HTTP/1.1" 200 3844 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3826/14690 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.atom HTTP/1.1" 200 16863 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16845/61183 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/26.2 HTTP/1.1" 200 4499 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4481/16293 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/02 HTTP/1.1" 200 15624 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15606/50795 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/11 HTTP/1.1" 200 13793 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13775/42797 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/07 HTTP/1.1" 200 34729 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 34711/108151 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/04 HTTP/1.1" 200 28253 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 28235/82627 (34%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2014/08 HTTP/1.1" 200 6190 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6172/20372 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2008/02 HTTP/1.1" 200 25223 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25205/76745 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/08 HTTP/1.1" 200 16603 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16585/52801 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/11 HTTP/1.1" 200 5089 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5071/17989 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/17 HTTP/1.1" 200 5420 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5402/18527 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/06 HTTP/1.1" 200 8392 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8374/27002 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2005/03 HTTP/1.1" 200 31336 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 31318/98041 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/12 HTTP/1.1" 200 13553 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13535/41052 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2012/08 HTTP/1.1" 200 7131 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7113/24140 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/history.html HTTP/1.1" 200 6247 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6229/20734 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/06 HTTP/1.1" 200 14118 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14100/46770 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2021/11 HTTP/1.1" 200 18020 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 18002/57817 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2007/11 HTTP/1.1" 200 25827 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25809/77770 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2004/09 HTTP/1.1" 200 23195 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 23177/70110 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2009/09 HTTP/1.1" 200 13535 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13517/41367 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/09 HTTP/1.1" 200 32956 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 32938/98767 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/07 HTTP/1.1" 200 17319 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 17301/54361 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2017/07 HTTP/1.1" 200 8116 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8098/26633 (30%)

And that's it. A request for /robots.txt, then two seconds of requests, then gone.
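That pattern (first request is /robots.txt, then a short burst) can be pulled out of a log mechanically. The following is a rough sketch, not the tooling actually used here: the regular expression is an assumption fitted to the log lines shown above, the names LINE, SAMPLE and summarize are hypothetical, and it assumes the lines arrive in chronological order. The two sample lines are copied from the excerpt.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: ip, two ignored fields, [timestamp], "METHOD path PROTO",
# status, size. Anything after the size is ignored.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Per IP: the first path requested, the request count, and the span in seconds."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
            hits[m["ip"]].append((ts, m["path"]))
    summary = {}
    for ip, reqs in hits.items():
        first_ts, first_path = reqs[0]   # relies on log order
        last_ts = reqs[-1][0]
        summary[ip] = {
            "first_path": first_path,
            "count": len(reqs),
            "seconds": (last_ts - first_ts).total_seconds(),
        }
    return summary

# Two lines taken verbatim from the excerpt above.
SAMPLE = [
    '34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 '
    '"https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" -/- (-%)',
    '34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 '
    '"https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4568/16282 (28%)',
]

for ip, info in summarize(SAMPLE).items():
    print(ip, info)
```

Any address whose first_path is /robots.txt and whose count is large relative to seconds fits the behavior described here.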

I mean, I guess it's nice that it looks at /robots.txt, but how do I control it? Do I use Mozilla? Windows? AppleWebKit? KHTML? Gecko? Chrome? Safari? Or is it checking /robots.txt looking for places explicitly disallowed? (no, none of the bots from GOOGLE-CLOUD-PLATFORM has made requests to locations I've disallowed—I checked). Maybe for hints about speed? Maybe?
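To make the "which name do I use?" problem concrete, here is a small sketch with Python's standard urllib.robotparser. The rules below are hypothetical and are not this site's actual /robots.txt, ExampleBot is an invented name, and Python's parser is only one implementation: it matches on the part of the user agent before the first slash, and whatever this particular crawler does with the file is unknown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The forged browser user agent from the log excerpt.
FORGED_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36")

def allowed(user_agent, path):
    """Ask the parser whether this user agent may fetch this path."""
    return parser.can_fetch(user_agent, path)

print(allowed("ExampleBot/1.0", "/2025/09"))  # named group applies: False
print(allowed(FORGED_UA, "/2025/09"))         # only the * group applies: True
print(allowed(FORGED_UA, "/private/x"))       # * group disallows it: False
```

A crawler that announces itself under a browser string leaves nothing specific to target, so the only group that can reach it is the catch-all one, and even that works only if the crawler chooses to honor it.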

I still don't get the referrer, though. In this case it was https://google.com/, but I've also seen other referring links, like https://riotgames.com/ and https://northwestern.edu/.

Weird.

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.