The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, March 21, 2025

Now a bit about feed readers

There are a few bots acting less than optimally that aren't some LLM-based company scraping my site. I think. Anyway, the first one I mentioned:

Identifiers for 8.29.198.26

Agent                                                               Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )          1667
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )           1419
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )        938
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )       811
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )         94
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )        17

Identifiers for 8.29.198.25

Agent                                                               Requests
Feedly/1.0 (+https://feedly.com/poller.html; 16 subscribers; )          1579
Feedly/1.0 (+https://feedly.com/poller.html; 6 subscribers; )           1481
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 6 subscribers; )        905
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 16 subscribers; )       741
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 8 subscribers; )         90
Feedly/1.0 (+http://www.feedly.com/fetcher.html; 37 subscribers; )        11

This is Feedly, a company that offers a news reader (and I'd like to thank the 67 subscribers I have; thank you). The first issue I have with this client is the apparently redundant requests from six different clients. It's an issue because I only have three different feeds: the Atom feed, the RSS feed and the JSON feed. The poller seems to be acting correctly: 16 subscribers to my Atom feed and 6 to the RSS feed. The other four? The fetchers? I'm not sure what's going on there. There's one for the RSS feed, and three for the Atom feed. And one of them is a typo: it's requesting “//index.atom” instead of the proper “/index.atom” (but apparently Apache allows it). How do I have 16 subscribers to “/index.atom” and another 37 for “/index.atom”? What, exactly, is the difference between the two? And can't you fix the “//index.atom” reference? To me, that's an obvious typo, one that could be verified by retrieving both “/index.atom” and “//index.atom” and seeing they're the same.
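The check described above could be sketched roughly like this in Python. This is only an illustration, not anything Feedly is known to do: the function names `normalize_feed_url` and `same_body` are made up for the example, and collapsing repeated slashes is an assumption that happens to hold for this server, which is exactly why the comparison step matters.

```python
import hashlib
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url: str) -> str:
    """Collapse runs of '/' in the path only; scheme and host are untouched."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def same_body(body_a: bytes, body_b: bytes) -> bool:
    """Compare two fetched documents by digest before merging subscriptions."""
    return hashlib.sha256(body_a).digest() == hashlib.sha256(body_b).digest()


if __name__ == "__main__":
    print(normalize_feed_url("https://boston.conman.org//index.atom"))
```

A reader service would only fold the two subscriptions together if both URLs returned the same bytes, since not every server treats “//” and “/” as the same path.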

Anyway, the second issue I have with Feedly is the apparent lack of caching on their end. They do not make conditional requests, and while they aren't exactly slamming my server, they are making multiple requests per hour for a resource that doesn't change all that often (excluding today, that is).
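For reference, a conditional request just means sending back the validators from the previous response and accepting a “304 Not Modified” in return. Here is a minimal sketch using Python's standard library. The helper names `conditional_headers` and `fetch_if_changed` are hypothetical, and it assumes the server returns an ETag or Last-Modified header to cache in the first place.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from whatever validators were cached last time."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: keep the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

A poller would store the returned validators next to its cached copy of the feed and pass them back in on the next run, so an unchanged feed costs a few hundred bytes instead of the whole document.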

Then there's the bot at IP address 4.231.104.62. It made 43,236 requests to get “/index.atom”, 5 invalid requests in the form of “/gopher://gopher.conman.org/0Phlog:2025/02/…” and one other valid request for this page. It's not the 5 invalid requests or the 1 valid request that has me weirded out—it's the 43,236 to my Atom feed. That's one request every 55 seconds! And even worse—it's not a conditional request! Of all the bots, this is the one I feel most like blocking at the firewall level—just have it drop the packets entirely.
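The arithmetic behind that rate is easy to check. The sketch below assumes the 43,236 requests were spread over the 28 days of February; the measurement window is not stated alongside the count, so treat the 28 as an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_interval(requests: int, days: int) -> float:
    """Average number of seconds between requests over the given window."""
    return days * SECONDS_PER_DAY / requests


# 43,236 requests over an assumed 28-day window.
interval = average_interval(43_236, 28)

if __name__ == "__main__":
    print(f"{interval:.2f} seconds between requests")
```

Under that assumption the interval comes out just under 56 seconds, which matches the figure above when truncated to whole seconds.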

At least it supports compressed results.

Sheesh.

As for the rest: of the 109 bots that fetched the Atom feed at least once per day (I put the cutoff at 28 requests or more during February), only 31 did so conditionally. That's a horrible rate. And of the 31 that did so conditionally, most don't support compression. So on the one hand, the majority of bots that fetch the Atom feed do so compressed. On the other hand, most of the bots that do fetch conditionally don't support compression.
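Nothing stops a client from doing both. As a rough sketch (the helper names `polite_headers` and `decode_body` are invented for this example), a fetcher can ask for gzip and send its cached validator in the same request, then decompress only when the server says the body is compressed:

```python
import gzip


def polite_headers(etag=None):
    """Ask for a compressed response and, when possible, a conditional one."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    return headers


def decode_body(body: bytes, content_encoding=None) -> bytes:
    """Undo gzip only when the Content-Encoding response header says so."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```

The value passed as `content_encoding` would come from the response's Content-Encoding header. A server is free to ignore Accept-Encoding, so the uncompressed path has to keep working too.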

Sigh.

Obligatory Picture

Dad was resigned to the fact that I was, indeed, a landlubber, and turned the boat around yet again …

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, which makes the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2025 by Sean Conner. All Rights Reserved.