The Boston Diaries

Friday, April 22, 2022

Notes on some extreme lawn ornaments, Brevard edition

Eight years ago (wow! Has it been that long? [Yes. —Editor] [Who asked you? —Sean]) while in Brevard, I took a picture of some extreme lawn ornaments—life sized plastic cows. I wrote the “eat moar chikin” image caption (if you hold your mouse over the image, it should pop up) because the cows reminded me of the cows used by Chick-fil-a.

I'm reading the Transylvania Times when I come across the article “Transylvanian of the Week: John Taylor.” He owns O.P. Taylor's, a well known toy store in the area, and he's the one with the life sized plastic cows in his front yard. Not only that, but he purchased them from the person who made them for Chick-fil-a. Little did I know that my caption was more correct than I thought.

Play stupid games, win stupid prizes

It's not only Gemini bots having issues with redirects. I'm poking around the logs from my webserver, when I scan all of them to see the breakdown of response codes my server is sending (for this month). And well … it's rather surprising:

Breakdown of HTTP response codes from all the sites I host
Status	Meaning	Count
Status	Meaning	Count
302	Found (moved temporarily)	253773
200	OK	178414
304	Not Modified	25552
404	Not Found	8214
301	Moved Permanently	6358
405	Method Not Allowed	1453
410	Gone	685
400	Bad Request	255
206	Partial Content	151
401	Unauthorized	48
500	Internal Server Error	24
403	Forbidden	4

I was not expecting that many temporary redirects. Was it some massive issue across all the sites? Or just a few? Well, it turned all of the temporary redirects were from one site: http://www.flummux.org/ (and no, I'm not linking to it as the reason why will become clear). I registered the domain way back in 2000 just as a place to play around with web stuff or to temporarly make files available without cluttering up my main websites. The site isn't meant to be at all serious.

Scanning the log file manually, I was seeing endless log entries like:

XXXXXXXXXXXXXXX - - [10/Apr/2022:20:55:05 -0400] "GET / HTTP/1.0" 302 284 "http://flummux.org/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MRA 4.6 (build 01425); .NET CLR 1.0.3705; .NET CLR 2.0.50727)" -/- (-%)

That log entry indicates a “browser” from IP address XXXXXXXXXXXXXXX, identifying itself as “Mozilla (yada yada)” on the 10^th of April, attempted to get the main page, as referred by http://flummux.org/. And for how many times this happened, broken down by browser:

Top five user agents making the troublesome requests
Count	User agent
127100	Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MRA 4.6 (build 01425); .NET CLR 1.0.3705; .NET CLR 2.0.50727)
126495	Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
42	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
36	CATExplorador/1.0beta (sistemes at domini dot cat; https://domini.cat/catexplorador/)
15	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0

Ah, two “browsers” that don't limit the number of redirects they follow. And amusingly enough, both agents came from the same IP address. Or maybe it's the same agent, just lying about what it is. Who knows? Well, aside from the author(s) of said “browser.”

But what was all horribly confusing to me why the server was issuing a temporary redirect. Yes, if you try to go to http://flummux.org/ the server will repond with a permanent redirect (status 301) to http://www.flummux.org/ (the reasons for that is to canonicalize the URLs and avoid the “duplicate content penalty” from Google—I set this all up years ago). But the site shouldn't redirect again. I can bring the site up in my browser without issue (which is a visual … pun? Commentary? Joke? on the line “The sky above the port was the color of television, tuned to a dead channel.”).

And then I remembered—back in 2016, I set things up such that if the browser sent in a referring link, the page would temporarily redirect back to the referring link (which is why I'm not linking to it—you would just be redirected right back to this page). I set that up on a lark for some reason that now esacapes me. So the above “browsers” kept bouncing back and forth between flummux.org and www.flummux.org. For a quarter of a million requests.

Sigh.

In other news, bugs are nothing more than an inattention to detail.

I have now wrapped my brain around how it got that link

Martin Chang replied to my post about Gemini crawlers, saying that it was his crawler that had sent links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1 and decided to look into the issue. Well, he did, and he found it wasn't his issue, but mine.

Oh my.

Okay, so how did I end up generating links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1?

This is, first and foremost, a blog on the web. Each entry is stored as HTML, and when a request is made via gopher or Gemini, the entries making up the request are retrieved and converted to the appropriate format. As part of that conversion, links to the blog itself have to be translated appropriately, and that's where the error happened.

So, for example, the links for the above entry are collected:

http://www.cisco.com/
http://it.slashdot.org/article.pl?sid=08/04/29/2254242
http://www.arin.net/
2008/04/30.1#fn-2008-04-30-1-1
http://www.barracudanetworks.com/
http://answers.yahoo.com/question/index?qid=20080219010714AAnF91Q

Those links with a URL scheme are passed through as is, but #4 is special, not only is it a relative link to my blog, but it also contains a URL fragment, and that's where things went pear-shaped. The code to do the URL translations parsed each link as a URL, but for relative links, I used the string, not the parsed URL structure. As such, the code didn't work so well with URL fragments, and thus, I ended up with links like gemini://gemini.conman.org/boston/2008/04/30/2008/04/30.1 (for the record, the same bug was in the gopher translation code as well).

The fix, as for most bugs, was easy once the core issue was identified. The other issues I talked about are, as far as I can tell, not stuff I can fix.

Friday, April 22, 2022

Notes on some extreme lawn ornaments, Brevard edition

Play stupid games, win stupid prizes

I have now wrapped my brain around how it got that link

Obligatory Picture

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer