Wednesday, October 01, 2025
Why do they even bother with /robots.txt?
It's still too early to tell whether limiting IP connections to my webserver is doing any good, but it's catching bots that are hammering a bit too hard. And the bots that are getting caught so far are all from the GOOGLE-CLOUD-PLATFORM ASN (34.174.0.0/17 for the record). Two dozen different IP addresses so far today, with about half a dozen different forged user agents.
But what I noticed (and this happened last month too!) is that an individual IP address will come in like a wrecking ball, making as many requests in as short an amount of time as it can, but that each such sequence will start with a request for /robots.txt.
Like this:
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" -/- (-%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/access.html HTTP/1.1" 200 4476 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4458/16246 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /refs/copyright.html HTTP/1.1" 200 3860 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3842/14775 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET //2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5609/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4568/16282 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09 HTTP/1.1" 200 14654 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14636/44761 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5628 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5610/18725 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/08/01 HTTP/1.1" 200 5675 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5657/19414 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /1999/12/04.1 HTTP/1.1" 200 4363 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4345/15504 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET / HTTP/1.1" 200 14583 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14565/44552 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/21.1 HTTP/1.1" 200 4421 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4403/16165 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/technical.html HTTP/1.1" 200 7028 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7010/23332 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.json HTTP/1.1" 200 15951 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15933/47681 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/22 HTTP/1.1" 200 5538 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5520/18605 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /archive/ HTTP/1.1" 200 5171 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5153/30006 (17%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/09/24.1 HTTP/1.1" 200 4958 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4940/17340 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /refs/glossary.html HTTP/1.1" 200 3844 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 3826/14690 (26%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /index.atom HTTP/1.1" 200 16863 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16845/61183 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/26.2 HTTP/1.1" 200 4499 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 4481/16293 (27%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/02 HTTP/1.1" 200 15624 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 15606/50795 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/11 HTTP/1.1" 200 13793 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13775/42797 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/07 HTTP/1.1" 200 34729 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 34711/108151 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2000/04 HTTP/1.1" 200 28253 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 28235/82627 (34%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2014/08 HTTP/1.1" 200 6190 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6172/20372 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2008/02 HTTP/1.1" 200 25223 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25205/76745 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/08 HTTP/1.1" 200 16603 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 16585/52801 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/11 HTTP/1.1" 200 5089 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5071/17989 (28%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2025/09/17 HTTP/1.1" 200 5420 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 5402/18527 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2022/06 HTTP/1.1" 200 8392 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8374/27002 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2005/03 HTTP/1.1" 200 31336 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 31318/98041 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/12 HTTP/1.1" 200 13553 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13535/41052 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2012/08 HTTP/1.1" 200 7131 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 7113/24140 (29%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/history.html HTTP/1.1" 200 6247 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 6229/20734 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2019/06 HTTP/1.1" 200 14118 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 14100/46770 (30%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2021/11 HTTP/1.1" 200 18020 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 18002/57817 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2007/11 HTTP/1.1" 200 25827 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 25809/77770 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2004/09 HTTP/1.1" 200 23195 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 23177/70110 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2009/09 HTTP/1.1" 200 13535 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 13517/41367 (32%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2015/09 HTTP/1.1" 200 32956 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 32938/98767 (33%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2018/07 HTTP/1.1" 200 17319 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 17301/54361 (31%)
34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /2017/07 HTTP/1.1" 200 8116 "https://google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36" 8098/26633 (30%)
And that's it. A request for /robots.txt, then two seconds of requests, then gone.
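Spotting this pattern by eye gets old, so here is a minimal sketch of how it could be checked mechanically. It assumes an Apache-style combined log with entries in chronological order; the names LOG_RE and summarize_bursts are made up for this example, and the three sample lines are copied from the excerpt above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Leading fields of an Apache-style combined log line:
# client IP, two ignored fields, [timestamp], "METHOD path protocol", status, size.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

UA = ('"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36"')

# Three entries taken from the log excerpt in this post.
SAMPLE = [
    '34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /robots.txt HTTP/1.1" 200 34 "https://google.com/" ' + UA + ' -/- (-%)',
    '34.174.101.217 - - [01/Oct/2025:01:05:03 -0400] "GET /2022/09/22.1 HTTP/1.1" 200 5627 "https://google.com/" ' + UA + ' 5609/18725 (29%)',
    '34.174.101.217 - - [01/Oct/2025:01:05:04 -0400] "GET /about/ HTTP/1.1" 200 4586 "https://google.com/" ' + UA + ' 4568/16282 (28%)',
]

def summarize_bursts(lines):
    """Group requests by client IP and summarize each group.

    Assumes the lines are already in chronological order, as in a log file.
    """
    by_ip = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a log entry
        ip, stamp, _method, path, _status, _size = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        by_ip[ip].append((when, path))

    summary = {}
    for ip, hits in by_ip.items():
        first, last = hits[0][0], hits[-1][0]
        summary[ip] = {
            "requests": len(hits),
            "starts_with_robots": hits[0][1] == "/robots.txt",
            "span_seconds": (last - first).total_seconds(),
        }
    return summary

for ip, info in summarize_bursts(SAMPLE).items():
    print(ip, info)
```

Fed a whole day's log instead of SAMPLE, this would lump all of an address's visits together, so a real version would probably want to split on gaps between requests as well.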
I mean, I guess it's nice that it looks at /robots.txt, but how do I control it? Which user agent do I name in it? Mozilla? Windows? AppleWebKit? KHTML? Gecko? Chrome? Safari? Or is it checking /robots.txt looking for places explicitly disallowed? (No, none of the bots from GOOGLE-CLOUD-PLATFORM has made requests to locations I've disallowed; I checked.) Maybe for hints about speed? Maybe?
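For comparison, here is a sketch of how a client that actually honors the file could read it, using Python's standard urllib.robotparser. The robots.txt text below is hypothetical (it is not my real file), Crawl-delay is a non-standard directive that plenty of crawlers ignore, and nothing here says this particular bot works this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only (not this site's real file).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

# The forged browser User-Agent from the log excerpt.
FORGED_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36")

def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch(FORGED_UA, "/2025/09"))    # allowed by the catch-all group
print(rules.can_fetch(FORGED_UA, "/private/x"))  # blocked by the Disallow line
print(rules.crawl_delay(FORGED_UA))              # the speed hint, if one is given

# Python's parser compares group names against the part of the User-Agent
# before the first "/", so for this string a group named "Mozilla" applies.
mozilla_rules = load_rules("User-agent: Mozilla\nDisallow: /\n")
print(mozilla_rules.can_fetch(FORGED_UA, "/2025/09"))
```

With this parser, at least, the answer to the naming question would be Mozilla, which is useless in practice because it would also match nearly every real browser.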
I still don't get the referrer, though. In this case it was https://google.com/, but I've also seen other referring links, like https://riotgames.com/ and https://northwestern.edu/.
Weird.