Friday, March 21, 2025
A deeper dive into mapping web requests via ASN, not by IP address
I went ahead and replaced IP addresses with ASNs in the log file to find the network that sent the most requests to my blog for the month of February.
network | requests |
---|---|
MICROSOFT-CORP-MSN-AS-BLOCK, US | 78889 |
OVH, FR | 31837 |
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN | 25019 |
HETZNER-AS, DE | 23840 |
GOOGLE-CLOUD-PLATFORM, US | 21431 |
CSTL, US | 17225 |
HURRICANE, US | 15495 |
AMAZON-AES, US | 14430 |
FACEBOOK, US | 13736 |
AKAMAI-LINODE-AP Akamai Connected Cloud, SG | 12673 |
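For anyone wanting to try a similar tally, here is a minimal sketch of the idea, not the script actually used for this post. It assumes the client IP is the first field of each log line, and the `PREFIXES` table is a made-up stand-in (using documentation address ranges) for a real IP-to-ASN dataset; `network_for` and `requests_per_network` are hypothetical names.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-network table. A real one would be loaded from an
# IP-to-ASN dataset; these prefixes are reserved documentation ranges.
PREFIXES = [
    (ipaddress.ip_network("192.0.2.0/24"), "EXAMPLE-NET-A, US"),
    (ipaddress.ip_network("198.51.100.0/24"), "EXAMPLE-NET-B, FR"),
]

def network_for(ip_text):
    """Return the network name for an IP, using the longest matching prefix."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return "UNKNOWN"
    best = None
    for net, name in PREFIXES:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, name)
    return best[1] if best else "UNKNOWN"

def requests_per_network(log_lines):
    """Count requests per network, most active network first."""
    counts = Counter()
    for line in log_lines:
        if not line.strip():
            continue
        # Assumption: the client IP is the first whitespace-separated field.
        ip_text = line.split(maxsplit=1)[0]
        counts[network_for(ip_text)] += 1
    return counts.most_common()
```

A linear scan over the prefixes is fine for a sketch like this; a full routing table would likely want a trie or a sorted lookup instead.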
Even though Alibaba US has the most unique IPs hitting my blog, Microsoft is still the network making the most requests. So let's see how Microsoft presents itself to my web server. Here are the user agents it sends:
agent | requests |
---|---|
Go-http-client/2.0 | 43236 |
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot) | 23978 |
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36 | 7953 |
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0 | 2955 |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot | 210 |
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot | 161 |
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html) | 123 |
'DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)' | 122 |
Python/3.9 aiohttp/3.10.6 | 28 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.36 Safari/537.36 | 14 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.114 Safari/537.36 | 14 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68 | 10 |
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html) | 10 |
DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html) | 10 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36 | 6 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.143 Safari/537.36 | 6 |
python-requests/2.32.3 | 5 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36 | 5 |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 | 4 |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0 | 4 |
DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot) | 4 |
Twingly Recon | 3 |
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot) | 3 |
Mozilla/5.0 (compatible; Twingly Recon; twingly.com) | 3 |
python-requests/2.28.2 | 2 |
newspaper/0.9.1 | 2 |
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36 | 2 |
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b | 2 |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 | 2 |
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) Bot | 1 |
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) | 1 |
Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 skype-url-preview@microsoft.com | 1 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 | 1 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36 | 1 |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48 | 1 |
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) Bot | 1 |
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) | 1 |
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) Bot | 1 |
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) | 1 |
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) Bot | 1 |
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) | 1 |
The top result comes from a single IP address and probably deserves a separate post, since it's weird and annoying. As for the rest (you got Bing, you got OpenAI, you got DuckDuckGo, you got several Mastodon instances), it seems like most of these are running on Microsoft's cloud offering. A mixture of things.
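A per-network breakdown of user agents like the one above could be produced along these lines. This is only a sketch under a couple of assumptions: the log is in Combined Log Format (so the user agent is the last quoted field on the line), and `lookup` is some caller-supplied function that maps an IP address to a network name. Both `agents_for_network` and `lookup` are hypothetical names.

```python
import re
from collections import Counter

# Assumes Combined Log Format: the IP is the first field and the user agent
# is the last double-quoted field. Agents containing a literal double quote
# will not match this simple pattern and are skipped.
LINE = re.compile(r'^(\S+) .* "([^"]*)"\s*$')

def agents_for_network(log_lines, wanted, lookup):
    """Count user agents for requests whose IP maps to the wanted network."""
    counts = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if m is None:
            continue  # not a line this sketch understands
        ip_text, agent = m.groups()
        if lookup(ip_text) == wanted:
            counts[agent] += 1
    return counts.most_common()
```

Called as `agents_for_network(lines, "MICROSOFT-CORP-MSN-AS-BLOCK, US", lookup)`, it returns `(agent, requests)` pairs sorted by request count.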
What about Facebook?
agent | requests |
---|---|
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) | 13497 |
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) | 207 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 | 12 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 | 4 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 | 4 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 | 4 |
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/59.0 | 4 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 Edg/132.0.0.0 | 2 |
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 | 2 |
Hmm … looks like I have a few readers at Facebook, but other than that, nothing terribly interesting.
Alibaba, on the other hand, is frightening. Out of 25,019 requests, it presented 581 different user agents. From looking at what was requested, I don't think it's 500 Chinese people reading my blog; it's definitely bots crawling my site (and amusingly, there are requests for the /robots.txt file, but without a proper user agent to go by, it's hard to block them via that file).
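Comparing the number of requests against the number of distinct user agents is a quick way to spot a network like this. A minimal sketch, assuming the log has already been reduced to `(network, user_agent)` pairs; `agent_diversity` is a hypothetical helper name:

```python
from collections import defaultdict

def agent_diversity(records):
    """Map each network to (total requests, distinct user agents).

    `records` is an iterable of (network, user_agent) pairs.
    """
    requests = defaultdict(int)
    agents = defaultdict(set)
    for network, agent in records:
        requests[network] += 1
        agents[network].add(agent)
    return {net: (requests[net], len(agents[net])) for net in requests}
```

A network showing hundreds of distinct agents for a few tens of thousands of requests, as Alibaba does here, stands out right away in the result.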
I can think of one conclusion here: if you do filter by ASN, it can help tremendously, but it also risks blocking legitimate traffic.
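To make that trade-off concrete, here is a small sketch that splits requests into kept and dropped according to a network blocklist, so the collateral damage can be inspected before any block is put in place. It again assumes `(network, user_agent)` pairs as input, and `split_by_blocklist` is a hypothetical name.

```python
def split_by_blocklist(records, blocked_networks):
    """Split (network, user_agent) pairs into (kept, dropped) lists."""
    kept, dropped = [], []
    for network, agent in records:
        if network in blocked_networks:
            dropped.append((network, agent))
        else:
            kept.append((network, agent))
    return kept, dropped

# Blocking the whole Microsoft network drops the noisy Go client, but it
# also drops a Mastodon instance hosted on the same network.
records = [
    ("MICROSOFT-CORP-MSN-AS-BLOCK, US", "Go-http-client/2.0"),
    ("MICROSOFT-CORP-MSN-AS-BLOCK, US",
     "Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/)"),
    ("HETZNER-AS, DE", "python-requests/2.32.3"),
]
kept, dropped = split_by_blocklist(records, {"MICROSOFT-CORP-MSN-AS-BLOCK, US"})
```

Looking through `dropped` for agents worth keeping is one way to decide whether a network-wide block is acceptable.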