The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, July 10, 2019

Some more observations about the MJ12Bot

I received another reply from MJ12Bot about their badly written bot; it said only that the person responsible for handling enquiries was out of the office for the day and that I should expect a response tomorrow. We shall see. In the meantime, I decided to check some of the other bots hitting my site and see how well they fare, request-wise. I'm using the logs from last month for this, so these results are for 30 days of traffic.

Top 10 bots hitting The Boston Diaries
requests percentage user agent
167235 70 Total (out of 239641)
46334 19 The Knowledge AI
38097 16 Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
17130 7 Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
15928 7 Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
12358 5 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
8929 4 Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
8908 4 Gigabot
7872 3 Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
6942 3 Barkrowler/0.9 (+http://www.exensa.com/crawl)
4737 2 istellabot/t.1.13
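
A tally like the one above could be produced with something along these lines. This is only a sketch: it assumes the logs are in Apache's “combined” format (the post doesn't say which format is in use), and `LOG_LINE` and `top_agents` are names made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# host ident user [time] "request" status bytes "referer" "user-agent"
# The regular expression is deliberately simple and will skip lines
# whose quoted fields contain embedded double quotes.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def top_agents(lines, n=10):
    """Return (total parsed requests, the n most common user agents)."""
    counts = Counter()
    total = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a line this sketch understands
        total += 1
        counts[match.group("agent")] += 1
    return total, counts.most_common(n)
```

Feeding it an open log file (for example `top_agents(open(path))`, with `path` being wherever the log lives) gives the total request count plus a list of `(user agent, requests)` pairs, from which the percentage column follows directly.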

So let's see some results:

Results of bot queries
Bot 200 % 301 % 304 % 400 % 403 % 404 % 410 % 500 % Total %
The Knowledge AI 42676 92.1 3352 7.2 0 0.0 127 0.3 4 0.0 170 0.4 5 0.0 0 0.0 46334 100.0
SemrushBot/3~bl 36088 94.7 1873 4.9 0 0.0 110 0.3 0 0.0 21 0.1 5 0.0 0 0.0 38097 100.0
BLEXBot/1.0 16633 97.1 208 1.2 124 0.7 114 0.7 0 0.0 46 0.3 5 0.0 0 0.0 17130 100.0
AhrefsBot/6.1 15840 99.4 78 0.5 0 0.0 4 0.0 0 0.0 5 0.0 0 0.0 1 0.0 15928 99.9
bingbot/2.0 12304 99.6 35 0.3 0 0.0 6 0.0 0 0.0 3 0.0 5 0.0 0 0.0 12353 99.9
MegaIndex.ru/2.0 8412 94.2 456 5.1 0 0.0 24 0.3 0 0.0 36 0.4 1 0.0 0 0.0 8929 100.0
Gigabot 8428 94.6 448 5.0 0 0.0 23 0.3 0 0.0 7 0.1 2 0.0 0 0.0 8908 100.0
MJ12bot/v1.4.8 2015 25.6 175 2.2 0 0.0 2 0.0 0 0.0 5680 72.2 0 0.0 0 0.0 7872 100.0
Barkrowler/0.9 6604 95.1 300 4.3 0 0.0 10 0.1 0 0.0 28 0.4 0 0.0 0 0.0 6942 99.9
istellabot/t.1.13 4705 99.3 28 0.6 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 4 0.1 4737 100.0
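
The per-status breakdown is just counting and dividing. A minimal sketch, assuming the `(user agent, status)` pairs have already been pulled out of the logs; `status_breakdown` is a hypothetical helper, not the script actually used for the table:

```python
from collections import Counter, defaultdict

def status_breakdown(records):
    """Map each agent to {status: (count, percent of that agent's requests)}.

    `records` is any iterable of (agent, status) pairs.
    """
    per_agent = defaultdict(Counter)
    for agent, status in records:
        per_agent[agent][status] += 1

    table = {}
    for agent, statuses in per_agent.items():
        total = sum(statuses.values())
        table[agent] = {
            status: (count, round(100.0 * count / total, 1))
            for status, count in statuses.items()
        }
    return table
```

Run on the MJ12bot counts from the table (2015 × 200, 175 × 301, 2 × 400, 5680 × 404), it reports 72.2 percent for status 404, matching the row above.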

Percentage-wise, of the top 10 bots hitting my blog (and in fact, these are the top ten clients hitting my blog), MJ12Bot is just bad: 72% of its requests come back as “not found” (404). It's hard to say what the second worst one is, but I'll have to give it to “The Knowledge AI” bot (and my search-fu is failing me in finding anything about this one). Percentage-wise, it's about on par with the others, but some of its requests are also rather odd:

It appears to be a problem similar to MJ12Bot's, but one that doesn't happen nearly as often.

Now, this isn't to say I don't have some legitimate “not found” (404) results. I did come across some valid 404 results on my own blog:

Some are typos, some are placeholders for links I forgot to add. And those I can fix. I just wish someone would fix MJ12Bot. Not because it's bogging down my site with unwanted traffic, but because it's just bad at what it does.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.
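
For anyone generating these links from a program, the day and month forms described above can be sketched as follows. The helper names `day_link` and `month_link` are invented for this example, and the arbitrary-range form is left out since its format isn't spelled out here.

```python
from datetime import date

BASE = "http://boston.conman.org/"

def day_link(day):
    """Permanent link for all entries on a given day."""
    return f"{BASE}{day:%Y/%m/%d}"

def month_link(year, month):
    """Link for an entire month: the same URL with the day left off."""
    return f"{BASE}{year:04d}/{month:02d}"

print(day_link(date(2000, 8, 1)))   # http://boston.conman.org/2000/08/01
print(month_link(2000, 8))          # http://boston.conman.org/2000/08
```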

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.