Thursday, July 11, 2019
Yet more observations about the MJ12Bot
I received a reply about MJ12Bot! Let's see …
- From: Majestic <XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX>
- To: Sean Conner <sean@conman.org>
- Subject: [Majestic] Re: Your robot is making bogus requests to my webserver
- Date: Thu, 11 Jul 2019 08:34:13 +0000
##- Please type your reply above this line -##
Oh … really? Sigh.
Anyway, the only questionable bit in the email was this line:
> The prefix `//` in a link of course refers to the same site as the current page, over the same protocol, so this is why these URLs are being requested back from your server.
which is … somewhat correct. It does mean “use the same protocol,” but the double slash denotes a “network path reference” (RFC 3986, section 4.2), where, at a minimum, a hostname is required. Whatever follows the two slashes names a host, not a path on the current site. If this is just a misunderstanding on the developers' part, it could explain the behavior I'm seeing.
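To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.parse`, which follows the RFC 3986 resolution rules. The base URL and link targets are hypothetical placeholders (they are not taken from my logs or from the email), so treat this as an illustration rather than a reproduction of what the bot does.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page a crawler is reading (an assumption for illustration).
base = "https://example.org/blog/2019/07/11.1"

# Network-path reference: the scheme comes from the base page,
# but the host comes from the link itself.
network_path = urljoin(base, "//cdn.example.com/img/logo.png")
print(network_path)
# https://cdn.example.com/img/logo.png

# Absolute-path reference (single slash): this is the form that
# stays on the same site as the current page.
absolute_path = urljoin(base, "/img/logo.png")
print(absolute_path)
# https://example.org/img/logo.png

# Parsed on its own, the text right after "//" is the authority (host),
# not the first segment of a path.
parts = urlsplit("//cdn.example.com/img/logo.png")
print(parts.netloc, parts.path)
# cdn.example.com /img/logo.png
```

A client that treats `//cdn.example.com/img/logo.png` as a path on the current host would end up requesting something the server never published, which would show up as a 404.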
And speaking of behavior, I decided to check the logs (again, using last month) one last time for two reports.
Top 10 user agents by total requests:

404 (not found) | 200 (okay) | Total requests | User agent |
---|---|---|---|
170 | 42676 | 46334 | The Knowledge AI |
21 | 36088 | 38097 | Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html) |
46 | 16633 | 17130 | Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/) |
5 | 15840 | 15928 | Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/) |
3 | 12304 | 12353 | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
36 | 8412 | 8929 | Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler) |
7 | 8428 | 8908 | Gigabot |
5680 | 2015 | 7872 | Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) |
28 | 6604 | 6942 | Barkrowler/0.9 (+http://www.exensa.com/crawl) |
0 | 4705 | 4737 | istellabot/t.1.13 |

Top 5 user agents by 404 responses:

404 (not found) | 200 (okay) | Total requests | User agent |
---|---|---|---|
5680 | 2015 | 7872 | Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) |
656 | 109 | 768 | Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/) |
177 | 45 | 553 | Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2) |
170 | 42676 | 46334 | The Knowledge AI |
120 | 0 | 120 | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) |
(Note: The number of 404s and 200s might not add up to the total—there might be other requests that returned a different status not reported here.)
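For anyone wanting to run similar reports, here is a rough sketch of one way to tally them. It is not the script I actually used. It assumes the log is in the Apache-style Combined Log Format, and the `tally` and `report` helpers and the sample lines are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed Combined Log Format:
# host ident user [date] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Return {user_agent: Counter({status: count})}, skipping unparseable lines."""
    per_agent = defaultdict(Counter)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            per_agent[m.group("agent")][m.group("status")] += 1
    return per_agent

def report(per_agent, key, limit):
    """Rows of (404s, 200s, total, agent), sorted by `key`, largest first."""
    rows = [
        (c["404"], c["200"], sum(c.values()), agent)
        for agent, c in per_agent.items()
    ]
    return sorted(rows, key=key, reverse=True)[:limit]

# Made-up sample lines, not real log data.
sample = [
    '192.0.2.1 - - [01/Jun/2019:00:00:01 -0400] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:02 -0400] "GET //missing HTTP/1.1" 404 128 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Jun/2019:00:00:03 -0400] "GET /feed HTTP/1.1" 304 0 "-" "OtherBot/2.0"',
]

counts = tally(sample)
by_total = report(counts, key=lambda row: row[2], limit=10)  # most active clients
by_404 = report(counts, key=lambda row: row[0], limit=5)     # most bad requests

for not_found, ok, total, agent in by_total:
    print(f"{not_found} | {ok} | {total} | {agent} |")
```

Because the total counts every status code, a row such as the 304 from `OtherBot/2.0` has a total larger than its 404 and 200 columns combined, which is the same effect described in the note above.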
MJ12Bot is the 8th most active client on my site, yet it has the top two spots for bad requests, beating out #3 by over an order of magnitude (about 35 times the amount, in fact, taking both versions together).
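The comparison can be checked directly from the table figures. This small sketch computes the ratio against the #3 entry for MJ12bot v1.4.8 alone and for both MJ12bot versions combined, plus the share of v1.4.8 requests that ended in a 404. The variable names are only for illustration.

```python
# Figures copied from the tables above.
mj12_v148_404, mj12_v148_total = 5680, 7872
mj12_v147_404 = 656
third_place_404 = 177  # the "MSIE 10.0; Windows NT 6.2" user agent

ratio_single = mj12_v148_404 / third_place_404
ratio_combined = (mj12_v148_404 + mj12_v147_404) / third_place_404
share_404 = mj12_v148_404 / mj12_v148_total

print(f"{ratio_single:.1f}x, {ratio_combined:.1f}x, {share_404:.0%}")
# 32.1x, 35.8x, 72%
```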
But I don't have to worry about it since the email also stated they removed my site from their crawl list. Okay … I guess?