Friday, March 21, 2025
Now a bit about feed readers
There are a few bots acting less than optimally that aren't some LLM-based company scraping my site. I think. Anyway, the first one worth mentioning:
Agent | Requests |
---|---|
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; ) | 1667 |
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; ) | 1419 |
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; ) | 938 |
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; ) | 811 |
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; ) | 94 |
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; ) | 17 |
This is feedly, a company that offers a news reader (and I'd like to thank the 67 subscribers I have—thank you). The first issue I have with this client is the apparently redundant requests from six different clients. An issue because I only have three different feeds: the Atom feed, the RSS feed and the JSON feed. The poller seems to be acting correctly—16 subscribers to my Atom feed and 6 to the RSS feed. The other four? The fetchers? I'm not sure what's going on there. There's one for the RSS feed, and three for the Atom feed. And one of them is a typo—it's requesting “//index.atom” instead of the proper “/index.atom” (but apparently Apache allows it). How do I have 16 subscribers to “/index.atom” and another 37 for “//index.atom”? What, exactly, is the difference between the two? And can't you fix the “//index.atom” reference? To me, that's an obvious typo, one that could be verified by retrieving both “/index.atom” and “//index.atom” and seeing they're the same.
Anyway, the second issue I have with feedly is the apparent lack of caching on their end. They do not make conditional requests, and while they aren't exactly slamming my server, they are making multiple requests per hour for a resource that doesn't change all that often (excluding today, that is).
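A conditional request is not a lot of work: cache the `ETag` and `Last-Modified` headers from the previous fetch and echo them back as `If-None-Match` and `If-Modified-Since`; if the feed hasn't changed, the server answers `304 Not Modified` with no body. A minimal sketch of how a well-behaved poller could do this, using only Python's standard library (the function names are my own, not anything from feedly):

```python
# Sketch of a conditional feed fetch using only the standard library.
# The cached validators come from the previous response's headers.
import urllib.request
import urllib.error

def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from cached validators."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    req = urllib.request.Request(url,
                                 headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # 304 Not Modified—the cached copy is still good
            return None, etag, last_modified
        raise
```

On an unchanged feed the server sends a few hundred bytes of headers instead of the full document, which is the whole point.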
Then there's the bot at IP address 4.231.104.62. It made 43,236 requests to get “/index.atom”, 5 invalid requests in the form of “/gopher://gopher.conman.org/0Phlog:2025/02/…” and one other valid request for this page. It's not the 5 invalid requests or the 1 valid request that has me weirded out—it's the 43,236 to my Atom feed. That's one request every 55 seconds! And even worse—it's not a conditional request! Of all the bots, this is the one I feel most like blocking at the firewall level—just have it drop the packets entirely.
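Dropping it at the firewall is a one-liner—a sketch assuming a Linux box running iptables (the address is the one from my logs):

```shell
# Silently drop every packet from the misbehaving bot: no RST, no ICMP,
# nothing--its connection attempts just time out.
iptables -A INPUT -s 4.231.104.62 -j DROP
```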
At least it supports compressed results.
Sheesh.
As for the rest—of the 109 bots that fetched the Atom feed at least once per day (I put the cutoff at 28 or more requests during February), only 31 did so conditionally. That's a horrible rate. And of those 31, most don't support compression. So on the one hand, the majority of bots that fetch the Atom feed do so compressed; on the other hand, most of the bots that do fetch conditionally don't.
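Supporting compression is also little work: send `Accept-Encoding: gzip` with the request, and decompress the body when the response says `Content-Encoding: gzip`. A minimal sketch of the decoding half, standard library only (the helper name is mine):

```python
# Decompress a feed body if, and only if, the server marked it as gzipped.
import gzip
import io

def decode_body(raw, content_encoding):
    """Return the usable body bytes given the Content-Encoding header value."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw
```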
Sigh.